2013-09-30

Create a new DataFrame from an existing column that contains lists (one new row per list element)

I have a DataFrame like this:

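import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['toto', 'tata', 'tati'], 'choices': 0})
df['choices'] = df['choices'].astype(object)
# .at writes into df itself; chained indexing like df['choices'][0] = ... may not on newer pandas
df.at[0, 'choices'] = [1, 2, 3]
df.at[1, 'choices'] = [5, 4, 3, 1]
df.at[2, 'choices'] = [6, 3, 2, 1, 5, 4]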

print(df) 

      choices name 
0   [1, 2, 3] toto 
1  [5, 4, 3, 1] tata 
2 [6, 3, 2, 1, 5, 4] tati 
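If the data is being built from scratch anyway, the lists can probably be passed straight to the constructor instead of filling cells one at a time. A minimal sketch producing the same `df` (on newer pandas the columns keep dict insertion order, so they print as `name, choices`):

```python
import pandas as pd

# A list of lists for 'choices' becomes an object column holding one list per row,
# so neither astype(object) nor per-cell assignment is needed
df = pd.DataFrame({
    'name': ['toto', 'tata', 'tati'],
    'choices': [[1, 2, 3], [5, 4, 3, 1], [6, 3, 2, 1, 5, 4]],
})
print(df)
```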

I want to build a DataFrame from df that looks like this:

   choice rank name 
0     1  0 toto 
1     2  1 toto 
2     3  2 toto 
3     5  0 tata 
4     4  1 tata 
5     3  2 tata 
6     1  3 tata 
7     6  0 tati 
8     3  1 tati 
9     2  2 tati 
10    1  3 tati 
11    5  4 tati 
12    4  5 tati 

Each new row should hold one value from a list together with that value's position (index) in the list.
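Because each new row is just one list element plus its position in that list, the ranks can be computed arithmetically without a Python loop. A minimal NumPy sketch, assuming every cell of `choices` is a plain Python list; the intermediate names `lengths`, `starts` and `ranks` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['toto', 'tata', 'tati'],
    'choices': [[1, 2, 3], [5, 4, 3, 1], [6, 3, 2, 1, 5, 4]],
})

lengths = df['choices'].map(len).to_numpy()              # [3, 4, 6]
# One output row per list element: repeat each name len(list) times
names = np.repeat(df['name'].to_numpy(), lengths)
# Flatten all lists into a single array, keeping their order
choices = np.concatenate(df['choices'].to_list())
# Offset where each list starts in the flattened array ([0, 3, 7]), repeated per element
starts = np.repeat(np.cumsum(lengths) - lengths, lengths)
# Rank = global position minus the start offset of the row's own list
ranks = np.arange(lengths.sum()) - starts

df2 = pd.DataFrame({'choice': choices, 'rank': ranks, 'name': names})
print(df2)
```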

This is what I'm doing now:

size = df['choices'].map(len).sum()  # total number of output rows
df2 = pd.DataFrame(index=range(size), columns=df.columns)
del df2['choices']
df2['choice'] = np.nan
df2['rank'] = np.nan

k = 0
for i in df.index:
    choices = df.at[i, 'choices']
    for rank, choice in enumerate(choices):
        # .at writes directly into df2; chained indexing (df2['name'][k] = ...) may not
        df2.at[k, 'name'] = df.at[i, 'name']
        df2.at[k, 'choice'] = choice
        df2.at[k, 'rank'] = rank
        k += 1
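If a Python-level loop is acceptable, a common alternative is to collect the rows in a plain list and construct the DataFrame in one call. This avoids preallocating `df2` and writing it cell by cell. A sketch, assuming the same `df` as above (`rows` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['toto', 'tata', 'tati'],
    'choices': [[1, 2, 3], [5, 4, 3, 1], [6, 3, 2, 1, 5, 4]],
})

# Build all (choice, rank, name) tuples first, then create the DataFrame once
rows = [
    (choice, rank, name)
    for name, choices in zip(df['name'], df['choices'])
    for rank, choice in enumerate(choices)
]
df2 = pd.DataFrame(rows, columns=['choice', 'rank', 'name'])
print(df2)
```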

But I would prefer a vectorized solution. Is that possible with Python/Pandas?

Answer

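In [4]: s = df.choices.apply(pd.Series).stack().dropna()  # apply pads short lists with NaN; dropna() removes any padding stack() keeps

In [5]: s.name = 'choices'  # needs a name to join

In [6]: del df['choices']

In [7]: df1 = df.join(s.reset_index(level=1))

In [8]: df1.columns = ['name', 'rank', 'choice']

In [9]: df1.sort_values(['name', 'rank']).reset_index(drop=True)  # DataFrame.sort() was removed; sort_values() replaces it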
Out[9]: 
    name rank choice 
0 tata  0  5 
1 tata  1  4 
2 tata  2  3 
3 tata  3  1 
4 tati  0  6 
5 tati  1  3 
6 tati  2  2 
7 tati  3  1 
8 tati  4  5 
9 tati  5  4 
10 toto  0  1 
11 toto  1  2 
12 toto  2  3 

This is related to this solution of mine, but in your case you keep the index (the rank) instead of discarding it.
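On pandas 0.25 and later, `DataFrame.explode` performs the list-to-rows step directly, which may be simpler than `apply(Series).stack()`. A sketch, assuming pandas >= 0.25 and the same `df` as in the question; the name `out` is illustrative, and unlike the answer above it keeps the original row order instead of sorting by name:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['toto', 'tata', 'tati'],
    'choices': [[1, 2, 3], [5, 4, 3, 1], [6, 3, 2, 1, 5, 4]],
})

# One row per list element; the original index label is repeated for each element
out = df.explode('choices')
# Position within each original list = running count per original index label
out['rank'] = out.groupby(level=0).cumcount().to_numpy()
# explode leaves an object column, so cast the values back to integers
out['choices'] = out['choices'].astype(int)
out = (out.rename(columns={'choices': 'choice'})
          .reset_index(drop=True)
          [['choice', 'rank', 'name']])
print(out)
```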


Pandas is awesome! So are StackOverflow and you ;-) Thanks – working4coins
