2017-04-05 112 views
0

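Extracting a pattern from a pandas DataFrame column

I am extracting a pattern from a column of a DataFrame. Some rows contain the word "Oscar" and others contain "Oscars". How can I extract the number in both cases in a pandas DataFrame? Below is the line of code I use for the extraction. It gives an error.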

df['Oscar_Awards_Won'] = df['Awards'].str.extract('Won (\d+) (Oscar[s]?)', expand=True).fillna(0) 

I apologize for not posting sample data. Below is sample data from the Awards column. I am trying to extract the number of Oscars won.

Awards 
Won 3 Oscars. Another 234 wins & 312 nominations. 
Won 7 Oscars. Another 215 wins & 169 nominations. 
Won 11 Oscars. Another 174 wins & 113 nominations. 
Won 4 Oscars. Another 122 wins & 213 nominations. 
Won 3 Oscars. Another 92 wins & 150 nominations. 
Won 1 Oscar. Another 91 wins & 95 nominations. 
+5

Please provide sample data and the expected output. Read [*** MCVE ***](http://stackoverflow.com/help/mcve) and [*** HowToAsk ***](http://stackoverflow.com/help/how-to-ask) – piRSquared

+0

You are lucky your question has not been downvoted for not providing sample data. –

Answers

0

Is this what you need?

import pandas as pd 
df = pd.DataFrame({'a': [1,2,3,4], 'b': ['is Oscar','asd','Oscars','not an Oscars q']}) 

df['c'] = ['Won 3 Oscars. Another 234 wins & 312 nominations.', 
'Won 7 Oscars. Another 215 wins & 169 nominations.', 
'Won 11 Oscar. Another 174 wins & 113 nominations.', 
'Won 4 Oscars. Another 122 wins & 213 nominations.'] 

This line:

df['c'].str.extract(r'Won (\d+) Oscar[s]?', expand=True).fillna(0)

gives:

    0
0   3
1   7
2  11
3   4
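A likely reason the line in the question errors while this one does not: `str.extract` returns one column per capture group. The question's pattern has two groups, `(\d+)` and `(Oscar[s]?)`, so the result has two columns and cannot be assigned to the single column `Oscar_Awards_Won`. The sketch below only compares the shapes of the two results. The `awards` Series is a small stand-in built from two rows of the sample data.

```python
import pandas as pd

# Two rows taken from the sample data, one plural and one singular.
awards = pd.Series([
    'Won 3 Oscars. Another 234 wins & 312 nominations.',
    'Won 1 Oscar. Another 91 wins & 95 nominations.',
])

# Two capture groups: one result column per group.
two_groups = awards.str.extract(r'Won (\d+) (Oscar[s]?)', expand=True)

# One capture group: a single result column holding only the number.
one_group = awards.str.extract(r'Won (\d+) Oscar[s]?', expand=True)

print(two_groups.shape)  # (2, 2)
print(one_group.shape)   # (2, 1)
```

If the word has to be grouped for some reason, a non-capturing group such as `(?:Oscars?)` keeps the result at one column.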
+0

It does not work for my sample data. I have posted the sample data above – Harish

+0

It works for me. What kind of error do you get? –

+0

I only need the number before "Oscar" or "Oscars". It only works for "Oscars". The number before "Oscar" does not come through – Harish

0

This will also work, since you do not need to worry about the letter s anyway.

df['Oscar_Awards_Won'] = df['Awards'].str.extract(r'Won (\d+) Oscar', expand=True).fillna(0)
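One thing to watch with this approach: `str.extract` returns strings, so `fillna(0)` leaves a column that mixes strings such as `'3'` with the integer `0`. A minimal sketch of one way to get whole numbers instead, assuming the column is named `Awards` as in the question. The last row is a hypothetical entry with no Oscar win, added only to show the fallback to 0.

```python
import pandas as pd

df = pd.DataFrame({'Awards': [
    'Won 3 Oscars. Another 234 wins & 312 nominations.',
    'Won 11 Oscars. Another 174 wins & 113 nominations.',
    'Won 1 Oscar. Another 91 wins & 95 nominations.',
    '5 wins & 2 nominations.',  # hypothetical row without an Oscar win
]})

# expand=False with a single capture group returns a Series, not a DataFrame.
oscars = df['Awards'].str.extract(r'Won (\d+) Oscars?', expand=False)

# Convert the extracted strings to numbers; rows with no match become NaN,
# which is then replaced by 0 before casting to int.
df['Oscar_Awards_Won'] = pd.to_numeric(oscars).fillna(0).astype(int)

print(df['Oscar_Awards_Won'].tolist())  # [3, 11, 1, 0]
```

The raw string prefix `r'...'` keeps `\d` from being treated as a string escape by Python.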