Pandas DataFrame: stack values from multiple columns into a single column

Assume the following DataFrame:

  key.0 key.1 key.2  topic
1   abc   def   ghi      8
2   xab   xcd   xef      9

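How can I combine the values of all the key.* columns into a single column 'key', with each value paired with the topic of the row it came from? This is the result I want: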

   topic  key
1      8  abc
2      8  def
3      8  ghi
4      9  xab
5      9  xcd
6      9  xef

Note that the number of key.N columns depends on some external variable N.

Source

2015-12-19 borice

You can melt your DataFrame:

>>> keys = [c for c in df if c.startswith('key.')] 
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key') 

   topic variable  key
0      8    key.0  abc
1      9    key.0  xab
2      8    key.1  def
3      9    key.1  xcd
4      8    key.2  ghi
5      9    key.2  xef

This also tells you which column each key came from.
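As a self-contained sketch of the melt approach, the snippet below rebuilds the sample frame from the question and then reorders the melted rows so they are grouped by topic, as in the desired output. The stable sort and the variable names `melted` and `result` are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Pick up however many key.N columns the frame happens to have.
keys = [c for c in df.columns if c.startswith("key.")]

melted = pd.melt(df, id_vars="topic", value_vars=keys, value_name="key")

# melt orders rows by source column; a stable sort on topic groups them by
# topic while keeping the key.0, key.1, key.2 order inside each group.
result = (
    melted.sort_values("topic", kind="mergesort")
    .drop(columns="variable")
    .reset_index(drop=True)
)
print(result)
```

If the order of the rows does not matter, the sort can be dropped.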

As of v0.20, melt is also available as a method of pd.DataFrame:

>>> df.melt('topic', value_name='key').drop('variable', axis=1)

   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef

Source

2015-12-19 22:55:48 Alexander

Simple and very fast. Thank you. – borice

After trying various approaches, I found the following more or less intuitive, provided you understand the magic of stack:

# keep topic as index, stack other columns 'against' it 
stacked = df.set_index('topic').stack() 
# set the name of the new series created 
df = stacked.reset_index(name='key') 
# drop the 'source' level (key.*) 
df.drop('level_1', axis=1, inplace=True)

The resulting DataFrame is as required:

   topic  key
0      8  abc
1      8  def
2      8  ghi
3      9  xab
4      9  xcd
5      9  xef

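You may want to print the intermediate results to understand the whole process. If you don't mind having more columns than you need, the key steps are set_index('topic'), stack() and reset_index(name='key').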
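To make the intermediate results concrete, here is a minimal sketch that runs the three steps one at a time on the sample frame from the question. The names `indexed`, `long_df` and `result` are illustrative only.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Step 1: topic becomes the index, only the key.* columns remain.
indexed = df.set_index("topic")
print(indexed)

# Step 2: stack() pivots the columns into an inner index level, giving a
# Series indexed by (topic, original column name).
stacked = indexed.stack()
print(stacked)

# Step 3: reset_index turns both index levels back into columns and names
# the values column 'key'. The unnamed inner level shows up as 'level_1'.
long_df = stacked.reset_index(name="key")
print(long_df)

# Optional: drop the column that records which key.N each value came from.
result = long_df.drop("level_1", axis=1)
print(result)
```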

Source

2015-12-19 23:09:21 miraculixx

I can't seem to find any documentation on the 'name' parameter of 'reset_index'. Can you explain how it works? – imp9

It is [Series.reset_index()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.reset_index.html?highlight=reset_index) – miraculixx
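A small sketch of what `name` does may help here: the result of stack() is a Series, and `Series.reset_index` uses `name` as the label for the column that holds the Series values. The toy Series below is made up for illustration.

```python
import pandas as pd

# An unnamed Series with a named index, similar in shape to what stack() returns.
s = pd.Series(["abc", "def"], index=pd.Index([8, 8], name="topic"))

# Without name, the values column of an unnamed Series is labelled 0.
without_name = s.reset_index()
print(without_name.columns.tolist())

# With name, the values column gets the label you pass in.
with_name = s.reset_index(name="key")
print(with_name.columns.tolist())
```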

OK, since one of the current questions was marked as a duplicate of this one, I will answer here.

Using wide_to_long

pd.wide_to_long(df, ['key'], 'topic', 'age', sep='.').reset_index().drop('age', axis=1)
Out[123]: 
   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
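For reference, a self-contained sketch of the wide_to_long approach on the question's sample frame. It assumes the columns are literally named key.0, key.1, ..., so `sep='.'` is passed to split the stub from the numeric suffix. The suffix column name `n` and the explicit `sort_index()` are illustrative choices. Note that wide_to_long expects the `i` column (here topic) to identify each row uniquely.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# 'key' is the stub, '.' separates it from the numeric suffix, and the
# suffix is stored in a new index level called 'n' (a hypothetical name).
long_df = pd.wide_to_long(df, stubnames="key", i="topic", j="n", sep=".")

# Sort by (topic, n) so the rows are grouped by topic, then flatten.
result = long_df.sort_index().reset_index().drop(columns="n")
print(result)
```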

Source

2017-09-15 13:07:54 Wen
