2017-08-04

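Pandas: set value with iloc if null

I have a sample DataFrame and a function below. The function gets the coordinates of a "cell" and puts them in a tuple, together with the reason it was put there. I want this function to also change the value of a certain column.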

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': [np.nan, np.nan, 3, 4, 5, 5, 3, 1, 5, np.nan],
                    'B': [1, 0, 3, 5, 0, 0, np.nan, 9, 0, 0],
                    'C': [10, 0, 30, 50, 0, 0, 4, 10, 1, 0],
                    'D': [1, 0, 3, 4, 0, 0, 7, 8, 0, 1],
                    'E': [np.nan, 'Unassign', 'Assign', 'Ugly', 'Appreciate',
                          'Undo', 'Assign', 'Unicycle', 'Assign', 'Unicorn']})
print(df1)

highlights = []

def find_nan(list_col):
    for c in list_col:
        # if column is one of the dataframe's columns, go
        if c in df1.columns:
            # for each index x where column c of the dataframe is null, go
            for x in df1.loc[df1[c].isnull()].index:
                # append (row, column, reason) to the list of tuples
                highlights.append(
                    (x + 2, df1.columns.get_loc(c) + 1, f'{c} is Null in row {x + 2}'))
                df1.iloc[x, df1.columns.get_loc('E')] = f'{c} is blank in row {x + 2}'

# using the function above, check for all nulls in A and B;
# also store the coordinates and reason in a tuple and change the values of column 'E'
find_nan(['A', 'B'])

# output:
     A    B   C  D  E
0  NaN  1.0  10  1  A is blank in row 2
1  NaN  0.0   0  0  A is blank in row 3
2  3.0  3.0  30  3  Assign
3  4.0  5.0  50  4  Ugly
4  5.0  0.0   0  0  Appreciate
5  5.0  0.0   0  0  Undo
6  3.0  NaN   4  7  B is blank in row 8
7  1.0  9.0  10  8  Unicycle
8  5.0  0.0   1  0  Assign
9  NaN  0.0   0  1  A is blank in row 11

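What I'd like to do is add logic that appends the reason to the existing value if E is already populated, or simply sets the value if E is empty. Here is my problem: using df1.iloc I can't seem to check for null values.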

df1.iloc[0]['E'].isnull() returns AttributeError: 'float' object has no attribute 'isnull' (obviously).

To get around this I could use if np.isnan(df1.iloc[0]['E']), which evaluates to True, but if there is already a value in E I get a TypeError.
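A quick way to see the difference: pd.isna (alias pd.isnull) is a top-level pandas function that accepts any scalar, while np.isnan only accepts numeric input. The small Series below is made up for illustration; it just mimics column E, which mixes missing values and strings.

```python
import numpy as np
import pandas as pd

# Made-up column mixing a missing value, a string and None (like column 'E')
s = pd.Series([np.nan, 'Unassign', None])

# np.isnan only supports numeric input, so a string raises TypeError
try:
    np.isnan(s.iloc[1])
except TypeError as exc:
    print('np.isnan failed with', type(exc).__name__)

# pd.isna works on any scalar and returns a plain bool
for value in s:
    print(repr(value), pd.isna(value))
```

Because pd.isna is a function rather than a method, it also sidesteps the 'float' object has no attribute 'isnull' error.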

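Basically, this is the logic I want inside my function:

# pseudocode: "is null" is the check I can't figure out how to write
if df1.iloc[x]['E'] is null:
    df1.iloc[x, df1.columns.get_loc('E')] = f'{c} is blank in row {x + 2}'
else:
    df1.iloc[x, df1.columns.get_loc('E')] = df1.iloc[x]['E'] + f' and {c} is blank in row {x + 2}'

Expected output of my function, starting from the original DataFrame:

find_nan(['A','B'])
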
     A    B   C  D  E
0  NaN  1.0  10  1  A is blank in row 2
1  NaN  0.0   0  0  Unassign and A is blank in row 3
2  3.0  3.0  30  3  Assign
3  4.0  5.0  50  4  Ugly
4  5.0  0.0   0  0  Appreciate
5  5.0  0.0   0  0  Undo
6  3.0  NaN   4  7  Assign and B is blank in row 8
7  1.0  9.0  10  8  Unicycle
8  5.0  0.0   1  0  Assign
9  NaN  0.0   0  1  Unicorn and A is blank in row 11

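Using Python 3.6. This is part of a larger project with many more functions, which is why the "reasons" are being added and the index is offset by 2 for no apparent reason.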
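For comparison, here is a minimal sketch of the desired logic with pd.isnull used as the scalar check. It assumes the default RangeIndex (so index labels double as iloc positions, as in the code above), and it passes the DataFrame in as a parameter, which is a change from the original global-variable version; treat it as an illustration, not as the accepted answer's method.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [np.nan, np.nan, 3, 4, 5, 5, 3, 1, 5, np.nan],
                    'B': [1, 0, 3, 5, 0, 0, np.nan, 9, 0, 0],
                    'C': [10, 0, 30, 50, 0, 0, 4, 10, 1, 0],
                    'D': [1, 0, 3, 4, 0, 0, 7, 8, 0, 1],
                    'E': [np.nan, 'Unassign', 'Assign', 'Ugly', 'Appreciate',
                          'Undo', 'Assign', 'Unicycle', 'Assign', 'Unicorn']})

highlights = []

def find_nan(df, list_col):
    """Record null cells and write the reason into column 'E'.

    Assumes a default RangeIndex, so label x is also the iloc position.
    """
    e_pos = df.columns.get_loc('E')
    for c in list_col:
        if c not in df.columns:
            continue
        for x in df.loc[df[c].isnull()].index:
            highlights.append((x + 2, df.columns.get_loc(c) + 1,
                               f'{c} is Null in row {x + 2}'))
            reason = f'{c} is blank in row {x + 2}'
            current = df.iloc[x, e_pos]
            # pd.isnull accepts a scalar of any type, unlike np.isnan
            if pd.isnull(current):
                df.iloc[x, e_pos] = reason
            else:
                df.iloc[x, e_pos] = f'{current} and {reason}'

find_nan(df1, ['A', 'B'])
print(df1)
```

If a row is null in both A and B, the second column's reason is appended after the first with " and ", since E is no longer empty by then.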

Legumes avoided. – Alexander

Answers

Note that this was tested with Python 2, but I didn't notice anything that would prevent it from working in Python 3.

def find_nan(df, cols):
    if isinstance(cols, str):
        cols = [cols]  # Turn a single column name into a list.
    cols = [col for col in cols if col in df]  # Skip columns that don't exist.
    nulls = df[cols].isnull()  # Find all null values in the requested columns.
    df['E'] = df['E'].fillna('')  # Turn NaN values into an empty string.
    for col in cols:
        # If the column is null and column `E` already has a value, add " and ".
        df.loc[(nulls[col] & df['E'].str.len().astype(bool)), 'E'] += ' and '
        # For null column values, add "[Column name] is blank in row " to column `E`.
        df.loc[nulls[col], 'E'] += '{} is blank in row '.format(col)
        # For null column values, add the index location + 2 to column `E`.
        df.loc[nulls[col], 'E'] += (df['E'][nulls[col]].index + 2).astype(str)
    return df

>>> find_nan(df1, ['A', 'B']) 
     A    B   C  D  E
0  NaN    1  10  1  A is blank in row 2
1  NaN    0   0  0  Unassign and A is blank in row 3
2    3    3  30  3  Assign
3    4    5  50  4  Ugly
4    5    0   0  0  Appreciate
5    5    0   0  0  Undo
6    3  NaN   4  7  Assign and B is blank in row 8
7    1    9  10  8  Unicycle
8    5    0   1  0  Assign
9  NaN    0   0  1  Unicorn and A is blank in row 11

Replacing the nulls with an empty string did the trick. Although it isn't quite as "clean" a "solution" as I'd hoped, I'll accept it! – MattR
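The same empty-string idea can be packaged so that the caller's DataFrame is not modified and the " and " separator is decided per row. This is a sketch that assumes a default RangeIndex for the "row + 2" numbering; find_nan_vectorized and its target parameter are hypothetical names, and, like the answer above, it turns untouched NaN values in E into empty strings.

```python
import numpy as np
import pandas as pd

def find_nan_vectorized(df, cols, target='E'):
    """Hypothetical helper: append '<col> is blank in row <n>' to `target`."""
    if isinstance(cols, str):
        cols = [cols]
    cols = [c for c in cols if c in df.columns]  # skip unknown columns
    out = df[target].fillna('').astype(str)      # work on a copy of the column
    # Row numbers as strings, offset by 2 (assumes a default RangeIndex)
    row_numbers = pd.Series(df.index + 2, index=df.index).astype(str)
    for col in cols:
        mask = df[col].isnull()
        current = out[mask]
        # Only add ' and ' where the cell already has some text
        sep = current.ne('').map({True: ' and ', False: ''})
        reason = col + ' is blank in row ' + row_numbers[mask]
        out.loc[mask] = current + sep + reason
    return df.assign(**{target: out})            # original df is left untouched

demo = pd.DataFrame({'A': [np.nan, 1.0, np.nan],
                     'B': [np.nan, np.nan, 2.0],
                     'E': [np.nan, 'Assign', 'Unicorn']})
print(find_nan_vectorized(demo, ['A', 'B']))
```

Returning a new frame via assign is a design choice here; mutating df in place, as the answer does, works just as well if that fits the larger project better.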

A possible solution that avoids an explicit loop:

def val(ser):
    row_value = ser['index']
    check = ser[['A', 'B']].isnull()
    found = check[check == True]
    if len(found) == 1:
        found = found.index[0]
        if pd.isnull(ser['E']) == True:
            return found + ' is blank in row ' + str(row_value + 2)
        else:
            return str(ser['E']) + ' and ' + found + ' is blank in row ' + str(row_value + 2)
    else:
        return ser['E']

df1['index'] = df1.index
df1['E'] = df1.apply(val, axis=1)
print(df1.iloc[:, :5])

     A    B   C  D  E
0  NaN  1.0  10  1  A is blank in row 2
1  NaN  0.0   0  0  Unassign and A is blank in row 3
2  3.0  3.0  30  3  Assign
3  4.0  5.0  50  4  Ugly
4  5.0  0.0   0  0  Appreciate
5  5.0  0.0   0  0  Undo
6  3.0  NaN   4  7  Assign and B is blank in row 8
7  1.0  9.0  10  8  Unicycle
8  5.0  0.0   1  0  Assign
9  NaN  0.0   0  1  Unicorn and A is blank in row 11
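A side note on the helper 'index' column: inside DataFrame.apply(..., axis=1), each row Series carries its index label in row.name, so the extra column is not strictly needed. A minimal sketch on a made-up three-row frame (val_by_name is a hypothetical name, and only column A is checked to keep it short):

```python
import numpy as np
import pandas as pd

df_single = pd.DataFrame({'A': [np.nan, 4.0, np.nan],
                          'E': [np.nan, 'Ugly', 'Unicorn']})

def val_by_name(row):
    # With apply(axis=1), row.name is the row's index label,
    # so no helper 'index' column is required.
    if not pd.isna(row['A']):
        return row['E']
    reason = f'A is blank in row {row.name + 2}'
    return reason if pd.isna(row['E']) else f"{row['E']} and {reason}"

df_single['E'] = df_single.apply(val_by_name, axis=1)
print(df_single)
```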

EDIT: if multiple columns are NaN

def val(ser):
    check = ser[['A', 'B']].isnull()
    found = check[check].index
    if len(found) == 0:
        return ser['E']  # leave the value as-is (str() would turn NaN into 'nan')
    else:
        # when E was NaN, str() gives 'nan', and 'nan and' is removed here
        one = (str(ser['E']) + ' and ').replace('nan and', '')
        two = ' and '.join([str(x) for x in found])
        three = ' is blank in row ' + str(ser['index'] + 2)
        return (one + two + three).strip()

df1['index'] = df1.index
df1['E'] = df1.apply(val, axis=1)
print(df1.iloc[:, :5])



     A    B     C  D  E
0  NaN  NaN   NaN  1  A and B is blank in row 2
1  NaN  0.0   0.0  0  Unassign and A is blank in row 3
2  3.0  3.0  30.0  3  Assign
3  4.0  5.0  50.0  4  Ugly
4  5.0  0.0   0.0  0  Appreciate
5  5.0  0.0   0.0  0  Undo
6  3.0  NaN   4.0  7  Assign and B is blank in row 8
7  1.0  9.0  10.0  8  Unicycle
8  5.0  0.0   1.0  0  Assign
9  NaN  0.0   0.0  1  Unicorn and A is blank in row 11
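The 'nan and' string replacement works for this data, but it relies on the text form of NaN and would also strip the text "nan and" if it ever appeared inside an existing value. A sketch of an alternative that checks E with pd.isna and builds the message from a list instead, again using row.name in place of the helper column (val_multi and the sample frame are hypothetical):

```python
import numpy as np
import pandas as pd

def val_multi(row, cols=('A', 'B')):
    """Hypothetical variant of val that avoids the 'nan and' replace."""
    blanks = [c for c in cols if pd.isna(row[c])]
    if not blanks:
        return row['E']  # keep the original value (NaN stays NaN)
    reason = ' and '.join(blanks) + f' is blank in row {row.name + 2}'
    if pd.isna(row['E']):
        return reason
    return f"{row['E']} and {reason}"

df_multi = pd.DataFrame({'A': [np.nan, np.nan, 3.0],
                         'B': [np.nan, 0.0, np.nan],
                         'E': [np.nan, 'Unassign', 'Assign']})
df_multi['E'] = df_multi.apply(val_multi, axis=1)
print(df_multi)
```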

What happens if a row has more than one 'NaN' value? Also, there's no need for '== True'; for example, 'check[check]' works just fine. – Alexander

@Alexander Fair enough, just for fun. – DJK
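Alexander's point about '== True' is easy to check: a boolean Series can be used directly as a mask on itself. A tiny hand-made example, standing in for check = ser[['A', 'B']].isnull():

```python
import pandas as pd

# Hand-made boolean Series indexed by column name
check = pd.Series([True, False, True], index=['A', 'B', 'C'])

print(list(check[check].index))          # ['A', 'C']
print(list(check[check == True].index))  # ['A', 'C'], the comparison adds nothing
```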