2016-04-20 103 views
1

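Pandas list comprehension with an if statement

I want to loop through a column in my dataframe and, if a word from my list is present, write that word to a new column.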

Here is my data:

import pandas as pd 

d = {'title':pd.Series(['123','xyz']), 
'question':pd.Series(["Hi i want to buy orange and pear", "How much is the banana?"]) 
} 
df =pd.DataFrame(d) 

df
      question  title 
0 Hi i want to buy orange and pear 123 
1   How much is the banana? xyz 

Code:

#write to column if word exist: 

fruit_list=['orange','pear','banana'] 
for i in fruit_list: 
    df['fruit']=[i if i in qn for qn in df['question']] 

Expected output:

      question  title fruit 
0 Hi i want to buy orange and pear 123  orange 
1 Hi i want to buy orange and pear 123  pear 
2 How much is the banana?   xyz  banana 

Error:

SyntaxError: invalid syntax at the 'for' word. 

Answers

0

How about this? For each row it builds a list of the matching words, then expands the dataframe so that each row holds only one matching word.

fruit_list = ['orange', 'pear', 'banana'] 
df['word_match'] = df.question.str.findall(
    r'[\w]+').apply(set).apply(lambda my_set: list(my_set.intersection(fruit_list))) 
>>> df 
          question title  word_match 
0 Hi i want to buy orange and pear 123 [orange, pear] 
1   How much is the banana? xyz  [banana] 

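rows = [] 
for _, row in df.iterrows(): 
    # Emit one output row per matched word.
    for word in row['word_match']: 
        rows.append([row['question'], row['title'], word]) 
# Name the columns explicitly so they line up with the order used in each row.
>>> pd.DataFrame(rows, columns=['question', 'title', 'word_match']) 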
          question title word_match 
0 Hi i want to buy orange and pear 123  orange 
1 Hi i want to buy orange and pear 123  pear 
2   How much is the banana? xyz  banana 
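The expansion step can also be sketched with `DataFrame.explode`, which was added in pandas 0.25 and so was not available when this answer was written. This is only a sketch under that version assumption; it also builds the match list in `fruit_list` order rather than through a set, because a set intersection does not guarantee any particular order.

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['123', 'xyz'],
    'question': ["Hi i want to buy orange and pear",
                 "How much is the banana?"],
})
fruit_list = ['orange', 'pear', 'banana']

# Tokenize each question, then keep the fruits that appear among its words.
# Iterating over fruit_list keeps the order of matches deterministic.
df['word_match'] = df['question'].str.findall(r'\w+').apply(
    lambda words: [f for f in fruit_list if f in words])

# explode() emits one row per list element and repeats the original index,
# so the index is reset afterwards to get 0, 1, 2.
expanded = df.explode('word_match').reset_index(drop=True)
print(expanded)
```

Rows whose list is empty would come out of `explode` with a missing value in `word_match`; drop them with `dropna(subset=['word_match'])` if questions without any fruit should be excluded.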
2

I believe what you want is:

fruit_list=['orange','pear','banana'] 

df['fruit'] = [[f for f in fruit_list if f in qn] for qn in df['question']] 
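As background on the SyntaxError in the question: in a comprehension, a filtering `if` goes after the `for` clause, whereas an `if` placed before the `for` is a conditional expression and needs an `else`. A minimal sketch with plain lists (the names `questions`, `filtered` and `labelled` are illustrative, not from the question):

```python
questions = ["Hi i want to buy orange and pear", "How much is the banana?"]
fruit = 'orange'

# Filter form: the `if` follows the `for`, so non-matching items are dropped.
filtered = [fruit for qn in questions if fruit in qn]

# Conditional-expression form: the `if` precedes the `for` and must have an
# `else`, so the result keeps exactly one entry per question.
labelled = [fruit if fruit in qn else None for qn in questions]

print(filtered)   # ['orange']
print(labelled)   # ['orange', None]
```

Only the second form produces a list as long as the dataframe, which is what a column assignment requires. Note also that the loop in the question assigns `df['fruit']` on every pass, so each fruit would overwrite the result of the previous one.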
+0

With this, the first row of the output contains 2 fruits, but I want them as separate rows. – jxn

+1

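Ah, I didn't read the question properly. I don't know of a good way to flatten a dataframe that doesn't involve copying rows, so Asav's answer is probably the way to go. – lsankar4033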

2

How about this?

input = [{"question" : "Hi i want to buy orange and pear", "title" : 123} 
     , {"question" : "How much is the banana?", "title" : 456}] 
list_size = len(input) 

output = [] 

fruit_list=['orange','pear','banana'] 

for i in range(list_size): 
    fruits = [f for f in fruit_list if f in input[i].get("question")] 
    for f in fruits: 
        if not input[i].get("fruit"): 
            # First match: store it on the original dictionary.
            input[i]['fruit'] = f 
        else: 
            # Further matches: append a copy, otherwise the list would just hold
            # references to the same dictionary over and over again.
            # A separate name keeps the loop index i intact for a third match.
            new_row = input[i].copy() 
            new_row['fruit'] = f 
            input.append(new_row) 
print(input) 

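If you don't want to create a new object, the code above, which modifies the input list in place, will work. If it is OK to create a separate object for the output, the code becomes simpler.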

input = [{"question" : "Hi i want to buy orange and pear", "title" : 123} 
        , {"question" : "How much is the banana?", "title" : 456}] 
output = [] 
fruit_list=['orange','pear','banana'] 

for i in input: 
    fruits = [f for f in fruit_list if f in i.get("question")] 
    for f in fruits: 
        i['fruit'] = f 
        # Append a copy, otherwise the list would just hold references
        # to the same dictionary over and over again.
        output.append(i.copy()) 
print(output) 
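The second version above can also be sketched as a single nested comprehension that leaves the input dictionaries untouched. This is an alternative sketch, not the code from the answer; the name `records` is used here instead of `input` to avoid shadowing the built-in.

```python
records = [
    {"question": "Hi i want to buy orange and pear", "title": 123},
    {"question": "How much is the banana?", "title": 456},
]
fruit_list = ['orange', 'pear', 'banana']

# Build a new dictionary for every (record, fruit) pair where the fruit
# occurs in the question; the original records are not modified.
output = [
    {**rec, "fruit": f}
    for rec in records
    for f in fruit_list
    if f in rec["question"]
]
print(output)
```

Like the loops above, `f in rec["question"]` is a substring test, so `pear` would also match inside a longer word such as `pearl`. Splitting the question into words first would avoid that.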

Hope it helps.