Union of series within a grouped pandas column (where each value is itself a series)

I have a pandas DataFrame in which one column is itself a series of values. For example:

df.head() 

Col1 Col2 
1  ["name1","name2","name3"] 
1  ["name3","name2","name4"] 
2  ["name1","name2","name3"] 
2  ["name1","name5","name6"]

I need to combine the Col2 values within each Col1 group. I want something like this:

Col1 Col2 
1  ["name1","name2","name3","name4"] 
2  ["name1","name2","name3","name5","name6"]

I tried using groupby with:

.agg({"Col2":lambda x: pd.Series.append(x)})

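However, this raises an error saying that two arguments are required. I also tried using sum in the agg function. That fails too, with a "does not reduce" error.

How can I do this?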

Source

2016-11-22 Sarvo

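You can use groupby with apply and a custom function. It first flattens the nested lists with chain (the fastest solution), then removes duplicates with set, converts the result to a list, and finally sorts it: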

import pandas as pd 
from itertools import chain 

df = pd.DataFrame({'Col1': [1, 1, 2, 2],
                   'Col2': [["name1", "name2", "name3"],
                            ["name3", "name2", "name4"],
                            ["name1", "name2", "name3"],
                            ["name1", "name5", "name6"]]})

print(df)
   Col1                   Col2
0     1  [name1, name2, name3]
1     1  [name3, name2, name4]
2     2  [name1, name2, name3]
3     2  [name1, name5, name6]

print(df.groupby('Col1')['Col2']
        .apply(lambda x: sorted(list(set(list(chain.from_iterable(x))))))
        .reset_index())
   Col1                                 Col2
0     1         [name1, name2, name3, name4]
1     2  [name1, name2, name3, name5, name6]
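
To see what the lambda does for a single group, the steps can be traced in plain Python. This is only an illustrative sketch: `group_values` is a hypothetical stand-in for the Col2 values of the Col1 == 1 group, not something pandas exposes under that name.

```python
from itertools import chain

# Hypothetical stand-in for the Col2 values in the Col1 == 1 group
group_values = [["name1", "name2", "name3"],
                ["name3", "name2", "name4"]]

# Step 1: flatten one level of nesting into a single list
flat = list(chain.from_iterable(group_values))

# Step 2: drop duplicates (a set does not keep the original order)
unique = set(flat)

# Step 3: turn the set back into a list in sorted order
result = sorted(unique)

print(flat)
print(result)
```

Because `sorted` already returns a list, the intermediate `list(...)` calls in the one-liner are not strictly needed.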

The solution can be simplified further; only chain, set and sorted are needed:

print(df.groupby('Col1')['Col2']
        .apply(lambda x: sorted(set(chain.from_iterable(x))))
        .reset_index())

   Col1                                 Col2
0     1         [name1, name2, name3, name4]
1     2  [name1, name2, name3, name5, name6]
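
If the order in which names first appear matters more than alphabetical order, one possible variant replaces `sorted(set(...))` with `dict.fromkeys`, which keeps insertion order on Python 3.7+. This is a sketch and not part of the answer above; the helper name `ordered_union` is made up for illustration.

```python
import pandas as pd
from itertools import chain

df = pd.DataFrame({'Col1': [1, 1, 2, 2],
                   'Col2': [["name1", "name2", "name3"],
                            ["name3", "name2", "name4"],
                            ["name1", "name2", "name3"],
                            ["name1", "name5", "name6"]]})

def ordered_union(lists):
    """Hypothetical helper: flatten the lists and keep each value once,
    in the order it is first seen."""
    return list(dict.fromkeys(chain.from_iterable(lists)))

out = df.groupby('Col1')['Col2'].apply(ordered_union).reset_index()
print(out)
```

On the sample data the result happens to match the sorted version, since the names already appear in alphabetical order within each group.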

Source

2016-11-22 06:10:21 jezrael

Yes, you can't use .agg() on data like this. Anyway, here is my take on the problem, with some help from numpy. (Commented for clarity.)

import numpy as np
import pandas as pd

# Unique values of the grouping column "Col1"
groups = df["Col1"].unique()

# Dict that collects one entry per group for each output column
d = {}

for val in groups:

    # Record the group key (e.g., 1) under "Col1"
    d.setdefault("Col1", []).append(val)

    # Sub-DataFrame holding only the rows of this group
    sdf = df.loc[df["Col1"] == val]

    # Flatten the lists stored in "Col2" into one flat list,
    # so that lists of different lengths are handled as well
    col2_vals = [j for array in sdf["Col2"] for j in array]

    # Store the sorted unique values of this group under "Col2"
    d.setdefault("Col2", []).append(np.unique(col2_vals).tolist())

new_df = pd.DataFrame(d)

print(new_df)

A more condensed version would look like this:

d = {}
for val in df["Col1"].unique():
    d.setdefault("Col1", []).append(val)
    # Flatten this group's "Col2" lists, then keep the sorted unique values
    d.setdefault("Col2", []).append(np.unique([j for array in df.loc[df["Col1"] == val, "Col2"] for j in array]).tolist())
new_df = pd.DataFrame(d)
print(new_df)

To learn more about the dictionary method .setdefault() in Python, check out this related SO question.
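
As a quick illustration of the pattern used in the loop, here is a minimal sketch of how `setdefault` behaves when the key is missing versus already present. The dictionary contents are made up for the example.

```python
d = {}

# Key is missing: setdefault inserts the default ([]) and returns it,
# so the append lands in the newly stored list
d.setdefault("Col1", []).append(1)

# Key is present: setdefault ignores the default and returns the
# existing list, so the append extends it
d.setdefault("Col1", []).append(2)

print(d)  # {'Col1': [1, 2]}
```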

Source

2016-11-22 05:33:15 ralston
