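How do I convert a list in a column into a vertical shape?

One column of my pandas DataFrame contains lists. I want to expand that column so that each list element gets its own row, as in the vertical shape shown below. How can I do this?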

Before (code):

import pandas as pd
df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [
        [10, 20],
        [1.2, 3.0, 2.75],
        ['tommy', 'tom']
    ]
})

Before (table):

    |col1  |col2   |col3|list            |
    |------|-------|----|----------------|
    |fruit |apple  |   1|[10, 20]        |
    |veicle|bycicle|   4|[1.2, 3.0, 2.75]|
    |animal|cat    |   2|['tommy', 'tom']|

After:

    |col1  |col2   |col3|list   |
    |------|-------|----|-------|
    |fruit |apple  |   1|10     |
    |fruit |apple  |   1|20     |
    |veicle|bycicle|   4|1.2    |
    |veicle|bycicle|   4|3.0    |
    |veicle|bycicle|   4|2.75   |
    |animal|cat    |   2|'tommy'|
    |animal|cat    |   2|'tom'  |

Note 1: The lists differ in length and in element type.

Note 2: I cannot modify the code that generates the DataFrame.

Thank you for reading.

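Source

2017-08-27 AkiraIsaka

Possible duplicate of [Explode lists with different lengths in Pandas](https://stackoverflow.com/questions/45885143/explode-lists-with-different-lengths-in-pandas) – Wen

Before asking, you could simply Google it: https://stackoverflow.com/questions/45885143/explode-lists-with-different-lengths-in-pandas/45886206#45886206 – Wen

Thanks for the useful link, and apologies for posting a duplicate question. I searched Google carefully but could not find that post. – AkiraIsaka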

I learned this cool trick from piR the other day, using np.repeat and np.concatenate:

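import numpy as np
# Repeat each row position once per element of that row's list
idx = np.arange(len(df)).repeat(df.list.str.len(), 0)
# Select the repeated rows (every column but the last) and attach the flattened lists
out = df.iloc[idx, :-1].assign(list=np.concatenate(df.list.values))
print(out)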

     col1     col2  col3   list
0   fruit    apple     1     10
0   fruit    apple     1     20
1  veicle  bycicle     4    1.2
1  veicle  bycicle     4    3.0
1  veicle  bycicle     4   2.75
2  animal      cat     2  tommy
2  animal      cat     2    tom
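To make the indexing step concrete, here is a minimal sketch of what np.repeat and np.concatenate do on their own. The lengths match the example data; the third list is replaced by hypothetical numbers ([7, 8]) so that the flattened array stays numeric, which is an assumption for illustration and not part of the original answer.

```python
import numpy as np

# List lengths per row, as in the example data: 2, 3 and 2 elements
lengths = np.array([2, 3, 2])

# Each row position is repeated once per element of that row's list
idx = np.arange(len(lengths)).repeat(lengths)
print(idx.tolist())

# Flatten the lists into one array (third list is a hypothetical stand-in)
flat = np.concatenate([[10, 20], [1.2, 3.0, 2.75], [7, 8]])
print(len(flat) == len(idx))
```

Because idx and flat have the same length, the rows selected with df.iloc[idx] line up one-to-one with the flattened values.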

Performance

Small

# Bharath
%timeit df.set_index(['col1','col2','col3'])['list'].apply(pd.Series).stack()\
       .reset_index().drop('level_3',axis=1)
100 loops, best of 3: 7.75 ms per loop 

# Mine 
%%timeit 
idx = np.arange(len(df)).repeat(df.list.str.len(), 0)  
out = df.iloc[idx, :-1].assign(list=np.concatenate(df.list.values))  
1000 loops, best of 3: 1.41 ms per loop

Large

df_test = pd.concat([df] * 10000) 

# Bharath 
%timeit df_test.set_index(['col1','col2','col3'])['list'].apply(pd.Series).stack()\ 
       .reset_index().drop('level_3',axis=1) 
1 loop, best of 3: 7.09 s per loop 

# Mine 
%%timeit 
idx = np.arange(len(df_test)).repeat(df_test.list.str.len(), 0)  
out = df_test.iloc[idx, :-1].assign(list=np.concatenate(df_test.list.values)) 
10 loops, best of 3: 123 ms per loop

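As a one-liner, Bharath's answer is short but slow. Here is an improvement that uses the DataFrame constructor instead of df.apply, for a roughly 200x speedup on large data: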

idx = df.set_index(['col1', 'col2', 'col3']).index
out = pd.DataFrame(df.list.values.tolist(), index=idx).stack()\
       .reset_index().drop('level_3', axis=1).rename(columns={0 : 'list'})

print(out) 

     col1     col2  col3   list
0   fruit    apple     1     10
1   fruit    apple     1     20
2  veicle  bycicle     4    1.2
3  veicle  bycicle     4      3
4  veicle  bycicle     4   2.75
5  animal      cat     2  tommy
6  animal      cat     2    tom

Small

100 loops, best of 3: 4.7 ms per loop

Large

10 loops, best of 3: 28.9 ms per loop
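The constructor approach works because pd.DataFrame pads ragged lists with missing values, which are then discarded when the frame is stacked into long form. The sketch below shows this on two numeric lists from the example. The explicit dropna() is an addition, not part of the answer above: whether stack() drops the padding on its own may depend on the pandas version, so this is an assumption made for robustness.

```python
import pandas as pd

lists = [[10, 20], [1.2, 3.0, 2.75]]

# Shorter lists are padded with NaN up to the length of the longest one
wide = pd.DataFrame(lists)
print(wide)

# Stack the columns into rows, then drop the padding explicitly
long = wide.stack().dropna()
print(long)
```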

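Source

2017-08-27 14:35:15

NumPy is really fast. It is hard to beat a NumPy answer. – Dark

@Bharathshetty Yes, but I did not expect pandas to be this slow. –

I used apply, so yes, it is a bit slow. I think apply always costs some performance. – Dark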

You can set_index the first three columns, apply pd.Series to the list column, and then stack the result.

df.set_index(['col1','col2','col3'])['list'].apply(pd.Series).stack().reset_index().drop('level_3',axis=1)

Output:

     col1     col2  col3      0
0   fruit    apple     1     10
1   fruit    apple     1     20
2  veicle  bycicle     4    1.2
3  veicle  bycicle     4      3
4  veicle  bycicle     4   2.75
5  animal      cat     2  tommy
6  animal      cat     2    tom
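For readers on a newer pandas, a shorter route may be available. The sketch below assumes pandas 0.25 or later, where DataFrame.explode exists; it is not one of the approaches from this thread and was not part of the timings above.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [[10, 20], [1.2, 3.0, 2.75], ['tommy', 'tom']],
})

# One output row per list element; the other columns are repeated.
# reset_index(drop=True) replaces the repeated original index with 0..n-1.
out = df.explode('list').reset_index(drop=True)
print(out)
```

The exploded column keeps the original elements with object dtype, so lists of different lengths and mixed types are handled without any padding step.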

Source

2017-08-27 14:41:38 Dark

Added some timings: https://stackoverflow.com/a/45906100/4909087 –

Here is roughly how to accomplish this. It is not an exact solution, but it should give you an idea of how to approach the task:

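import pandas as pd

original_df = df  # your starting DataFrame
rows = []  # collect one dict per list element
# now go through each row of the original df
for i in range(original_df.shape[0]):
    row_series = original_df.iloc[i]
    for item in row_series['list']:
        rows.append({'col1': row_series['col1'],
                     'col2': row_series['col2'],
                     'col3': row_series['col3'],
                     'list': item})
# DataFrame.append never modified the frame in place and was removed in pandas 2.0,
# so build the result once from the collected rows
new_df = pd.DataFrame(rows)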

Source

2017-08-27 14:58:58 Heapify
