How to extract substrings as a new column

I have a pandas DataFrame with a column named entity. When I print the column with:

df.entity

the output looks like this (I have 267 rows; these are just the first two):

[(East, NNP), (India, CTR), (Company, ORG)] 
[(Pasteur, ZZP)]

How can I get a new column where the output looks like this:

East, India, Company 
Pasteur

Source

2017-09-28 jennifer ruurs

Option 1
zip and iteration

df.assign(entity=[', '.join(next(zip(*r))) for r in df.entity]) 

       entity 
0 East, India, Company 
1    Pasteur
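
To see why `next(zip(*r))` picks out the words, here is a small sketch on a single cell of the column. The names `row`, `words`, `tags`, `first` and `joined` are only for illustration and are not part of the answer.

```python
# One cell of the `entity` column: a list of (word, tag) tuples.
row = [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')]

# zip(*row) transposes the pairs: first all the words, then all the tags.
words, tags = zip(*row)

# next(...) takes only the first tuple that zip produces, i.e. the words.
first = next(zip(*row))

# Joining that tuple gives the string wanted for the new column.
joined = ', '.join(first)
print(joined)
```

One thing to watch: if a cell holds an empty list, `zip(*[])` yields nothing and `next` raises `StopIteration`, whereas a plain comprehension such as `[x[0] for x in r]` would just produce an empty string after joining.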

Option 2
A comprehension-based, optimized version of @Zero's answer. It should be faster.

df.assign(entity=[', '.join([x[0] for x in r]) for r in df.entity]) 

       entity 
0 East, India, Company 
1    Pasteur
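
Whether the comprehension really is faster will depend on the data and the pandas version, so it may be worth measuring. A rough sketch, assuming the two sample rows repeated 1000 times (the size, the repeat count and the function names are arbitrary choices for illustration):

```python
import timeit

import pandas as pd

# Sample rows from the question, repeated to get a measurable workload.
rows = [
    [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
    [('Pasteur', 'ZZP')],
] * 1000
df = pd.DataFrame({'entity': rows})


def with_comprehension():
    # Plain list comprehension over the column.
    return [', '.join([x[0] for x in r]) for r in df.entity]


def with_apply():
    # The apply-based variant, converted to a list for comparison.
    return df.entity.apply(lambda x: ', '.join(t[0] for t in x)).tolist()


# Both approaches should produce identical strings.
assert with_comprehension() == with_apply()

t_comp = timeit.timeit(with_comprehension, number=20)
t_apply = timeit.timeit(with_apply, number=20)
print(f'comprehension: {t_comp:.4f}s, apply: {t_apply:.4f}s')
```

The numbers will vary from machine to machine, so treat them as a hint rather than a benchmark.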

Setup

import pandas as pd

df = pd.DataFrame(dict(
    entity=[
        [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
        [('Pasteur', 'ZZP')]
    ]))

Source

2017-09-28 21:24:53 piRSquared

Two 'for loop's :) nice – Wen

@Zero Functionally the same, but I'd almost always use comprehensions over 'apply' – piRSquared

Generators are very powerful. – Zero

Using apply

In [4697]: df.entity.apply(lambda x: ', '.join(t[0] for t in x)) 
Out[4697]: 
0 East, India, Company 
1     Pasteur 
Name: entity, dtype: object

Details

         entity 
0 [(East, NNP), (India, CTR), (Company, ORG)] 
1        [(Pasteur, ZZP)]
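
Since the question asks for a new column, the Series returned by `apply` can be assigned back to the frame. A minimal sketch, assuming the sample data from the question; the column name `entity_words` is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'entity': [
        [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
        [('Pasteur', 'ZZP')],
    ]
})

# Keep the original column and store the joined words next to it.
df['entity_words'] = df.entity.apply(lambda x: ', '.join(t[0] for t in x))
print(df['entity_words'].tolist())
```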

Source

2017-09-28 21:15:36 Zero

Very elegant! +1 – Vaishali

Here is another solution

df['New']=df.entity.apply(pd.Series).stack().apply(pd.Series).groupby(level=0)[0].agg(lambda x: ','.join(set(x))) 
df 
Out[74]: 
             entity     New 
0 [(East, NNP), (India, CTR), (Company, ORG)] India,Company,East 
1        [(Pasteur, ZZP)]    Pasteur

Data input

df=pd.DataFrame({'entity':[[('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],[('Pasteur', 'ZZP')] ]})
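
Note that `set` does not keep the original order, which is why the first row above comes out as India,Company,East. If the order matters, one possible variant is sketched below. It swaps in `Series.explode` (available in pandas 0.25 and later) and drops the `set` de-duplication, so repeated words are kept. It also assumes every cell holds a non-empty list of (word, tag) tuples.

```python
import pandas as pd

df = pd.DataFrame({
    'entity': [
        [('East', 'NNP'), ('India', 'CTR'), ('Company', 'ORG')],
        [('Pasteur', 'ZZP')],
    ]
})

words = (
    df['entity']
    .explode()             # one (word, tag) tuple per row, original index repeated
    .map(lambda t: t[0])   # keep only the word from each tuple
)

# Group by the original row index and join the words in their original order.
df['New'] = words.groupby(level=0).agg(', '.join)
print(df['New'].tolist())
```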

Source

2017-09-28 21:24:07 Wen

TypeError: Argument 'obj' has incorrect type (expected list, got Tree). I am getting this error –

@jenniferruurs I have added my input data – Wen
