2017-06-14 125 views
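Delete rows with null values from a pandas DataFrame

I'm trying to delete a row from my DataFrame in which one column has a null value. Most of the help I can find deals with removing NaN values, which hasn't worked for me so far.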

Here is the DataFrame I've created:

# successfully created data frame
df1 = ut.get_data(symbols, dates) # column heads are 'SPY', 'BBD'

# can't get rid of the row containing a null value in column BBD
# tried each of these with the others commented out but always had an
# error, or sometimes I was able to get a new column of boolean values,
# but I just want to drop the row
df1 = pd.notnull(df1['BBD']) # meant to drop rows with a null value, not working
df1 = df1.drop(2010-05-04, axis=0) 
df1 = df1[df1.'BBD' != null] 
df1 = df1.dropna(subset=['BBD']) 
df1 = pd.notnull(df1.BBD) 


# I know the date to drop but still wasn't able to drop the row 
df1.drop([2015-10-30]) 
df1.drop(['2015-10-30']) 
df1.drop([2015-10-30], axis=0) 
df1.drop(['2015-10-30'], axis=0) 


with pd.option_context('display.max_rows', None):
    print(df1) 

Here is my output:

null val here

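Can someone please tell me how to drop this row, preferably both by identifying the row through its null value and by its date? I haven't been working with pandas for very long and I've been stuck on this for an hour. Any advice would be much appreciated.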

Answers

This should do the job:

df = df.dropna(how='any',axis=0) 

It drops every row (axis=0) that contains "any" null value.
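Since the question only cares about nulls in the `BBD` column, `dropna` can also be restricted with `subset`. A minimal sketch, assuming a small made-up frame (the values and dates below are hypothetical, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: one NaN in each column.
df1 = pd.DataFrame(
    {"SPY": [100.0, np.nan, 102.0, 103.0], "BBD": [10.0, 11.0, np.nan, 13.0]},
    index=pd.to_datetime(["2015-10-28", "2015-10-29", "2015-10-30", "2015-11-02"]),
)

# Drop only the rows whose 'BBD' value is missing; the NaN in 'SPY' is left alone.
by_column = df1.dropna(subset=["BBD"])

# Equivalent boolean-mask form: keep the rows where 'BBD' is not null.
by_mask = df1[df1["BBD"].notna()]

# For comparison, how='any' drops a row if any column is missing.
any_null = df1.dropna(how="any", axis=0)
```

Note that `pd.notnull(df1['BBD'])` on its own only returns the boolean mask. It has to be used to index the frame, as in `by_mask`, rather than assigned back to `df1`.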

Example:

import numpy as np
import pandas as pd

# Recreate a random DataFrame with NaN values
df = pd.DataFrame(index=pd.date_range('2017-01-01', '2017-01-10', freq='1d'))
# Average speed in miles per hour
df['A'] = np.random.randint(low=198, high=205, size=len(df.index))
df['B'] = np.random.random(size=len(df.index)) * 2

# Create dummy NaN values in 2 cells
df.iloc[2, 1] = None
df.iloc[5, 0] = None

print(df)
                A         B
2017-01-01  203.0  1.175224
2017-01-02  199.0  1.338474
2017-01-03  198.0       NaN
2017-01-04  198.0  0.652318
2017-01-05  199.0  1.577577
2017-01-06    NaN  0.234882
2017-01-07  203.0  1.732908
2017-01-08  204.0  1.473146
2017-01-09  198.0  1.109261
2017-01-10  202.0  1.745309

# Delete the rows containing the dummy NaN values
df = df.dropna(how='any', axis=0)

print(df) 

                A         B
2017-01-01  203.0  1.175224
2017-01-02  199.0  1.338474
2017-01-04  198.0  0.652318
2017-01-05  199.0  1.577577
2017-01-07  203.0  1.732908
2017-01-08  204.0  1.473146
2017-01-09  198.0  1.109261
2017-01-10  202.0  1.745309
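The question also asks how to drop a row by its date. A minimal sketch, assuming the frame has a `DatetimeIndex` (the frame below is hypothetical; only the date 2015-10-30 is taken from the question):

```python
import pandas as pd

# Hypothetical frame indexed by date.
df = pd.DataFrame(
    {"BBD": [10.0, 11.0, 12.0]},
    index=pd.date_range("2015-10-29", periods=3, freq="D"),
)

# An unquoted 2015-10-30 is integer subtraction (it evaluates to 1975), not a date,
# so the label is passed as a Timestamp instead.
target = pd.Timestamp("2015-10-30")

# drop returns a new frame; the result has to be assigned to take effect.
dropped = df.drop(target)

# Alternative: keep every row whose index differs from that date.
filtered = df[df.index != target]
```

Both forms leave the original `df` untouched, which may be why calling `df1.drop(...)` without assigning the result appeared to do nothing.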

For further details, see the reference.

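If everything is fine with your DataFrame, dropping NaNs should be this simple. If it still doesn't work, make sure your columns have the appropriate data types (pd.to_numeric comes to mind...).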
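One way the data-type check could play out, as a sketch: if the missing value was read in as the text 'null', the column holds strings and `dropna` has nothing to remove. The frame below is hypothetical and assumes that situation.

```python
import pandas as pd

# Hypothetical column where the missing value arrived as the text 'null'.
df1 = pd.DataFrame({"SPY": [100.0, 101.0, 102.0], "BBD": ["10.5", "null", "12.0"]})

# dropna sees 'null' as an ordinary string, so no row is removed here.
unchanged = df1.dropna(subset=["BBD"])

# errors='coerce' turns anything that cannot be parsed as a number into NaN,
# which dropna can then remove.
df1["BBD"] = pd.to_numeric(df1["BBD"], errors="coerce")
cleaned = df1.dropna(subset=["BBD"])
```

Checking `df1.dtypes` first is a quick way to tell whether this applies: a price column reported as `object` rather than `float64` suggests stray text values.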

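This doesn't work in my case.

My workaround was to include 'null' in the na_values parameter (['NaN', 'null']) passed to pandas.read_csv() when creating the df. I still have no solution for cases where that isn't possible.

See the updated answer with a working example.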
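The `na_values` workaround from the comments can be sketched roughly as follows. The CSV text and column names are made up for illustration, and recent pandas releases may already treat 'null' as missing by default, in which case listing it explicitly is only a safeguard.

```python
import io
import pandas as pd

# Hypothetical CSV text in which a missing price is written as the word 'null'.
csv_text = "Date,BBD\n2015-10-29,10.5\n2015-10-30,null\n2015-11-02,12.0\n"

# Tell read_csv which tokens mean "missing" so they are parsed as NaN.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Date",
    parse_dates=True,
    na_values=["NaN", "null"],
)

# With real NaN values in place, dropna removes the row as expected.
cleaned = df.dropna(subset=["BBD"])
```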
