Pandas error matching a string

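I have data like the SampleDf data below. I'm trying to check whether the values in one column of my dataframe contain 'sum', 'count', or 'Avg', and then create a new column whose value is 'sum', 'count', or 'Avg'. When I run the code below on my real dataframe, I get the error below. When I run dtypes on my real dataframe, it says all the columns are object. The code below is related to the post linked below. Unfortunately, I don't get the same error when I run the code on the SampleDf I've provided, but I can't post my whole dataframe.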

Post: Pandas and apply function to match a string

Code: 

import pandas as pd

SampleDf = pd.DataFrame(
    [
        ['tom', "Avg(case when Value1 in ('Value2') and [DateType] in ('Value3') then LOS end)"],
        ['bob', "isnull(Avg(case when XferToValue2 in (1) and DateType in ('Value3') and [Value1] in ('HM') then LOS end),0)"],
    ],
    columns=['ReportField', 'OtherField'],
)

search1 = 'Sum'
search2 = 'Count'
search3 = 'Avg'


def Agg_type(x):
    # Return the first aggregate name found in x, checked in the order Sum, Count, Avg
    if search1 in x:
        return 'sum'
    elif search2 in x:
        return 'count'
    elif search3 in x:
        return 'Avg'
    else:
        return 'Other'

SampleDf['AggType'] = SampleDf['OtherField'].apply(Agg_type)

SampleDf.head()


Error: 

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-17-a2b4920246a7> in <module>() 
    17   return 'Other' 
    18 
---> 19 SampleDf['AggType'] = SampleDf['OtherField'].apply(Agg_type) 
    20 
    21 #SampleDf.head() 

C:\Users\Name\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds) 
    2292    else: 
    2293     values = self.asobject 
-> 2294     mapped = lib.map_infer(values, f, convert=convert_dtype) 
    2295 
    2296   if len(mapped) and isinstance(mapped[0], Series): 

pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:66124)() 

<ipython-input-17-a2b4920246a7> in Agg_type(x) 
     8 
     9 def Agg_type(x): 
---> 10  if search1 in x: 
    11   return 'sum' 
    12  elif search2 in x: 

TypeError: argument of type 'float' is not iterable

Source

2017-07-08 ndderwerdo

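I can't reproduce your error with this data, although I can if I overwrite the ['OtherField'] column with a list of floats. Your function looks fine; the problem appears to be with your ['OtherField'] column. – cmaher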
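A minimal sketch of what this comment describes, assuming the real column holds missing values (NaN, which is a float) alongside the strings. The frame `df`, its contents, and the helper `contains_sum` are made up for illustration; only the column name and the search string come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one real string and one missing value (NaN is a float).
df = pd.DataFrame(
    {"OtherField": ["Avg(case when x then LOS end)", np.nan]},
    dtype=object,
)

# A column mixing strings and NaN still reports dtype object.
column_dtype = df["OtherField"].dtype


def contains_sum(x):
    # Same membership test as in Agg_type; it fails when x is a float.
    return "Sum" in x


try:
    df["OtherField"].apply(contains_sum)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(column_dtype)
print(error_message)
```

If the real data behaves like this sketch, dtypes would still say object even though some individual elements are floats.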

Is it okay if I provide a solution that works but is different from yours? Only because your code didn't raise any error for me either. –

@RayhaneMama Thank you for getting back to me. Yes, please provide your solution. – ndderwerdo

You can try this:

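import numpy as np

# na=False makes missing values (NaN) count as "no match"; without it np.where treats NaN as truthy
SampleDf['new_col'] = np.where(SampleDf.OtherField.str.contains("Avg", na=False), "Avg",
          np.where(SampleDf.OtherField.str.contains("Count", na=False), "Count",
            np.where(SampleDf.OtherField.str.contains("Sum", na=False), "Sum", "Nothing")))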

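Note that this gives the expected result only if you don't have both Avg and Count or Sum in the same string; when several are present, the first match in the order Avg, Count, Sum wins.
If you do have such strings, let me know and I'll look for a better approach.
And of course, if something doesn't fit your needs, please report back.
Hope this helps.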

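Explanation:

What happens is that you look for the indices where Avg appears inside the strings of the OtherField column and fill new_col with "Avg" at those indices. For the remaining rows (where there is no "Avg"), you look for Count and do the same, and then do the same again with Sum.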
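The explanation above can be traced on a small frame. This is only a sketch: the rows are invented, the mask names `has_avg`, `has_count`, and `has_sum` are hypothetical, and `regex=False` and `na=False` are additions for illustration rather than part of the answer as posted.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column name comes from the question.
df = pd.DataFrame(
    {"OtherField": ["Avg(x)", "Count(y)", "Sum(z)", "max(w)", "Sum(Avg(v))"]},
    dtype=object,
)

# One boolean mask per search term (plain substring match, NaN counts as no match).
has_avg = df["OtherField"].str.contains("Avg", regex=False, na=False)
has_count = df["OtherField"].str.contains("Count", regex=False, na=False)
has_sum = df["OtherField"].str.contains("Sum", regex=False, na=False)

# The outermost np.where decides first, so "Avg" takes priority over the others.
df["new_col"] = np.where(
    has_avg, "Avg",
    np.where(has_count, "Count", np.where(has_sum, "Sum", "Nothing")),
)

result = df["new_col"].tolist()
print(result)
```

The last row contains both Sum and Avg and is labelled "Avg", which illustrates why the ordering of the nested calls matters.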

Documentation:

np.where

pandas.Series.str.contains

Source

2017-07-08 18:42:59

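Thanks, that did the trick. Do you understand the difference between what my code is doing and what your code is doing? I'm trying to understand why the pandas version doesn't work. I've looked at the data, and the values look like strings, not floats. – ndderwerdo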
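One way to check this beyond eyeballing the data is to count the Python type of each element, since an object column can hold a mix of types. A minimal sketch on made-up data; the frame and the names `type_counts` and `non_strings` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value hidden among the strings.
df = pd.DataFrame({"OtherField": ["Avg(a)", np.nan, "Count(b)"]}, dtype=object)

# Count the Python type of every element in the column.
type_counts = df["OtherField"].apply(type).value_counts()

# Keep only the rows whose value is not a string.
is_string = df["OtherField"].apply(lambda v: isinstance(v, str))
non_strings = df[~is_string]

print(type_counts)
print(non_strings)
```

Any row that shows up in `non_strings` would be a candidate for the value that triggers the TypeError.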

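Yes, I actually hit this bug too, and I don't know why it pops up either; printing df.dtypes shows that I have no floats, but still.. I'm still looking for the reason and will let you know @ndderwerdo –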

@ndderwerdo I just tested your code and it works, so if it still doesn't work for you, I don't think it has anything to do with the code you posted. –
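One possible reading of the difference, assuming the real column contains missing values: `apply` hands every element to the function unchanged, so a NaN reaches the `in` test as a float, whereas the `.str` accessor deals with missing values itself. Under that assumption, a guarded variant of the function might look like the sketch below; `agg_type_safe` is a hypothetical name and the sample data is made up.

```python
import numpy as np
import pandas as pd


def agg_type_safe(x):
    # Hypothetical variant of Agg_type: anything that is not a string
    # (for example NaN) is labelled 'Other' before the substring tests run.
    if not isinstance(x, str):
        return "Other"
    if "Sum" in x:
        return "sum"
    if "Count" in x:
        return "count"
    if "Avg" in x:
        return "Avg"
    return "Other"


# Made-up data with a missing value in the middle.
df = pd.DataFrame(
    {"OtherField": ["Avg(a)", np.nan, "isnull(Sum(b),0)"]},
    dtype=object,
)
df["AggType"] = df["OtherField"].apply(agg_type_safe)

labels = df["AggType"].tolist()
print(labels)
```

The guard keeps the original Sum, Count, Avg priority and only changes what happens to non-string values.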
