Drop rows based on null values in certain columns (pandas)

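I know how to drop a row from a DataFrame when it contains all nulls or a single null, but can you drop a row based on the nulls in a specified set of columns?

For example, say I am working with data containing geographical info (city, latitude, and longitude) in addition to numerous other fields. I want to keep the rows that contain, at a minimum, a value for city OR values for both lat and long, but drop the rows that have null values for all three.

I could not find functionality for this in the pandas documentation. Any guidance would be appreciated.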

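You can use DataFrame.dropna, but instead of how='all' with subset=[...], you can use the thresh parameter, which sets the minimum number of non-NA values a row must contain in order to be kept. In the city, long/lat example, thresh=2 keeps every row with at least two non-null values among the three columns (thresh=1 would drop only the rows where all three are null). Note that thresh=2 also drops a row in which only city is set. Using the great data example set up by MaxU, we would do: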

import pandas as pd

## get the data
df = pd.read_clipboard()

## remove undesired rows (subset takes a flat list of column labels)
df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)

This yields:

In [5]: df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)
Out[5]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
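
To make the meaning of thresh concrete, here is a minimal sketch that rebuilds the sample frame by hand (instead of reading it from the clipboard) and compares thresh=1 with thresh=2. The variable names and the one-row only_city frame are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the example data used in this thread.
df = pd.DataFrame({
    "city": ["aaa", "bbb", np.nan, np.nan, np.nan],
    "latitude": [11.1111, np.nan, np.nan, 11.1111, np.nan],
    "longitude": [np.nan, 22.2222, np.nan, 33.333, 44.444],
    "a": [1, 5, 3, 1, 1],
    "b": [2, 6, 4, 2, 1],
})
cols = ["city", "latitude", "longitude"]

# thresh=1: keep rows with at least one non-null value among cols
at_least_one = df.dropna(subset=cols, thresh=1)
# thresh=2: keep rows with at least two non-null values among cols
at_least_two = df.dropna(subset=cols, thresh=2)

print(at_least_one.index.tolist())  # [0, 1, 3, 4]
print(at_least_two.index.tolist())  # [0, 1, 3]

# Hypothetical row with a city but no coordinates: thresh=2 drops it,
# although the question asks for such rows to be kept.
only_city = pd.DataFrame({"city": ["ccc"], "latitude": [np.nan], "longitude": [np.nan]})
print(len(only_city.dropna(subset=cols, thresh=2)))  # 0
```

So thresh=2 reproduces the desired output on this particular sample, but it counts non-null values and does not distinguish city from the coordinates.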

Try this:

In [25]: df
Out[25]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
2  NaN       NaN        NaN  3  4
3  NaN   11.1111    33.3330  1  2
4  NaN       NaN    44.4440  1  1

In [26]: df.query("city == city or (latitude == latitude and longitude == longitude)")
Out[26]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2

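If I understand the OP correctly, the row with index 4 must be dropped, because city is null and the two coordinates are not both non-null. So dropna() won't work "properly" in this case: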

In [62]: df.dropna(subset=['city','latitude','longitude'], how='all')
Out[62]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
4  NaN       NaN    44.4440  1  1   # this row should be dropped...
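
The query in this answer works because NaN never compares equal to itself, so an expression like latitude == latitude is False exactly where the value is missing. As a rough sketch of that idea, the snippet below builds the same condition twice, once with self-comparison and once with explicit notna() checks, and confirms that both masks agree on the hand-built sample frame. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the example data used in this thread.
df = pd.DataFrame({
    "city": ["aaa", "bbb", np.nan, np.nan, np.nan],
    "latitude": [11.1111, np.nan, np.nan, 11.1111, np.nan],
    "longitude": [np.nan, 22.2222, np.nan, 33.333, 44.444],
    "a": [1, 5, 3, 1, 1],
    "b": [2, 6, 4, 2, 1],
})

# Self-comparison: False wherever the value is NaN
self_equal = (df["city"] == df["city"]) | (
    (df["latitude"] == df["latitude"]) & (df["longitude"] == df["longitude"])
)

# The same condition with explicit null checks
explicit = df["city"].notna() | (df["latitude"].notna() & df["longitude"].notna())

print((self_equal == explicit).all())  # True
kept = df[explicit]
print(kept.index.tolist())  # [0, 1, 3]
```

The explicit form is longer, but it states the intent directly and does not depend on the reader knowing the NaN comparison rule.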

dropna has a parameter that applies the test only to a subset of columns:

df.dropna(axis=0, how='all', subset=['city', 'latitude', 'longitude'])

Using a boolean mask and a clever dot product (this one is for @Boud):

subset = ['city', 'latitude', 'longitude']
df[df[subset].notnull().dot([2, 1, 1]).ge(2)]

  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
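
The weights in the dot product encode the rule: city counts 2 and each coordinate counts 1, so a row scores at least 2 exactly when city is present or both coordinates are present. The sketch below spells out the intermediate score per row on the hand-built sample frame. It goes through NumPy arrays for the dot product, and the names weights, present and score are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the example data used in this thread.
df = pd.DataFrame({
    "city": ["aaa", "bbb", np.nan, np.nan, np.nan],
    "latitude": [11.1111, np.nan, np.nan, 11.1111, np.nan],
    "longitude": [np.nan, 22.2222, np.nan, 33.333, 44.444],
    "a": [1, 5, 3, 1, 1],
    "b": [2, 6, 4, 2, 1],
})
subset = ["city", "latitude", "longitude"]
weights = np.array([2, 1, 1])  # city counts double, each coordinate counts once

present = df[subset].notnull()           # boolean frame: True where a value exists
score = present.to_numpy().dot(weights)  # weighted count of present values per row
print(score.tolist())  # [3, 3, 0, 2, 1]

# score >= 2 holds when city is present, or when both coordinates are present
result = df[score >= 2]
print(result.index.tolist())  # [0, 1, 3]
```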

You can perform the selection with bitwise operators.

## create example data
import pandas as pd
df = pd.DataFrame({'City': ['Gothenburg', None, None], 'Long': [None, 1, 1], 'Lat': [1, None, 1]})

## bitwise/logical operators
~df.City.isnull() | (~df.Lat.isnull() & ~df.Long.isnull())
0     True
1    False
2     True
dtype: bool

## subset using above statement
df[~df.City.isnull() | (~df.Lat.isnull() & ~df.Long.isnull())]
         City  Lat  Long
0  Gothenburg  1.0   NaN
2        None  1.0   1.0
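
If the same kind of rule is needed for other column groups, the bitwise approach can be wrapped in a small helper. The function below is a hypothetical sketch, not part of pandas: keep_if_any_group_complete and its groups argument are names invented here. It keeps a row when at least one group of columns is completely non-null.

```python
import pandas as pd

def keep_if_any_group_complete(df, groups):
    """Keep rows where at least one group of columns has no nulls.

    Hypothetical helper for illustration; `groups` is a list of column lists.
    """
    mask = pd.Series(False, index=df.index)
    for cols in groups:
        # True for rows where every column in this group is non-null
        mask |= df[cols].notna().all(axis=1)
    return df[mask]

df = pd.DataFrame({'City': ['Gothenburg', None, None], 'Long': [None, 1, 1], 'Lat': [1, None, 1]})

# Keep rows that have a City, or both Lat and Long
result = keep_if_any_group_complete(df, [['City'], ['Lat', 'Long']])
print(result.index.tolist())  # [0, 2]
```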
