Convert row/column values to a dictionary and then to a pandas DataFrame

I have a DataFrame `people` with two columns, `name` and `text`.

    name   text
0   Obama  Obama was the 44th president of the...
1   Trump  Donald J. Trump ran as a republican...

I only need to run some exploratory analysis on Obama.

obama = people[people['name'] == 'Obama'].copy()
obama.text

35817 Obama was the 44th president of the unit... 
Name: text, dtype: object

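How can I convert the text into a dictionary, stored in a new column, with the words as keys and their counts as values?
For example: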

    name   text                                     dictionary
0   Obama  Obama was the 44th president of the...  {'Obama':1, 'the':2,...}

Once that is done, how do I convert the dictionary into a separate DataFrame?
Expected:

    word   count
0   Obama  1
1   the    2


2016-11-18 Drj

You can use the `Counter` object from the `collections` module:

import collections 

people['dictionary'] = people.text.apply(lambda x: dict(collections.Counter(x.split())))

To convert one of these dictionaries into a DataFrame:

import pandas as pd
dictionary = people['dictionary'][0]  # the row whose index label is 0
pd.DataFrame(data={'word': list(dictionary.keys()), 'count': list(dictionary.values())})
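
For anyone who wants to run the two steps end to end, here is a minimal sketch. The two-row `people` frame and its full sentences are made-up sample data (the original text is truncated in the question), and `word_counts` is just an illustrative name. It selects the row with `.iloc[0]`, by position, so it does not depend on the index labels.

```python
import collections
import pandas as pd

# Hypothetical sample data standing in for the asker's `people` frame
people = pd.DataFrame({
    "name": ["Obama", "Trump"],
    "text": [
        "Obama was the 44th president of the United States",
        "Donald J. Trump ran as a republican",
    ],
})

# Step 1: one word-count dict per row, stored in a new column
people["dictionary"] = people["text"].apply(
    lambda x: dict(collections.Counter(x.split()))
)

# Step 2: take the first row by position and spread its dict into two columns
dictionary = people["dictionary"].iloc[0]
word_counts = pd.DataFrame({
    "word": list(dictionary.keys()),
    "count": list(dictionary.values()),
})
print(word_counts)
```

Note that `str.split()` splits on whitespace only, so punctuation stays attached to words and counting is case-sensitive.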


2016-11-18 03:02:10 nathanielobrown

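The first part worked like a charm. The second part, converting the dictionary to a DataFrame, kept giving me the error `'numpy.ndarray' object is not callable`. I finally solved it using `pd.DataFrame.from_dict(dictionary, orient="index")` – Drj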
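
The `from_dict` route from the comment above can be sketched as follows. The `dictionary` literal is a made-up stand-in for one row's word counts, and `counts` is an illustrative name. With `orient="index"` the words land in the index, so a couple of extra steps are needed to get the `word` / `count` layout the question asks for.

```python
import pandas as pd

# Hypothetical word counts, shaped like one row's `dictionary` value
dictionary = {"Obama": 1, "was": 1, "the": 2}

# orient="index" turns each key into a row label; the values form one column
counts = pd.DataFrame.from_dict(dictionary, orient="index")
counts.columns = ["count"]

# Move the words out of the index into a regular `word` column
counts = counts.rename_axis("word").reset_index()
print(counts)
```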

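Hmm, interesting. I expected `people['dictionary'][0]` to yield a dictionary, but it sounds like you are getting a pandas Series. Maybe you are using a different version of pandas. You could try using `DataFrame.loc` or `DataFrame.iloc`, as described [here](http://pandas.pydata.org/pandas-docs/stable/indexing.html). – nathanielobrown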

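Yes, you're right, it is indeed a Series; I got confused coming from `R`. I thought this was how pandas worked, but it looks like it is something version-specific. Anyway, the problem is solved now, and I'll try your suggestion as soon as I can. – Drj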
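
A small sketch of the Series-versus-dict distinction discussed in these comments. The one-row frame is made-up sample data, and the index label 35817 is borrowed from the output in the question on the assumption that the filtered frame keeps its original labels. One plausible reading of the error is that `.values()` was called on a Series, where `values` is a NumPy array attribute, not a method.

```python
import collections
import pandas as pd

# Hypothetical one-row frame; the label 35817 mimics the asker's filtered row
people = pd.DataFrame(
    {"name": ["Obama"],
     "text": ["Obama was the 44th president of the United States"]},
    index=[35817],
)
people["dictionary"] = people["text"].apply(
    lambda x: dict(collections.Counter(x.split()))
)

column = people["dictionary"]        # a pandas Series, not a dict
print(type(column.values))           # an attribute holding a NumPy array

first_by_position = column.iloc[0]   # first row, whatever its label is
first_by_label = column.loc[35817]   # same row, selected by index label
print(type(first_by_position))
```

Here `column[0]` would look for the index label 0, which does not exist in this frame, so `.iloc[0]` is the safer way to grab the first row after filtering.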
