2017-07-21 88 views

Get the last column of a pd.DataFrame and add it to another pd.DataFrame

I have an Excel file that looks like this:

CompanyName High Priority  QualityIssue 
Customer1   Yes    User 
Customer1   Yes    User 
Customer2   No    User 
Customer3   No    Equipment 
Customer1   No    Neither 
Customer3   No    User 
Customer3   Yes    User 
Customer3   Yes    Equipment 
Customer4   No    User 

I want to count how many times each CompanyName appears for each type of QualityIssue, and sort the counts in descending order.

For example, by using this code:

df.groupby(["CompanyName", "QualityIssue"]).size().to_frame('Count')

I get:

Out: 

CompanyName  QualityIssue Count 
Customer2   User   1 
Customer1   Neither   1 
Customer4   User   1 
Customer1   User   2 
Customer3   Equipment  2 
Customer3   User   2 

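Now let's say I have another copy of the above stored in memory.

What I want is to add the last column of the second query to the end of the first (in reality it won't be a copy of it; this is just an example):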

CompanyName  QualityIssue Count1 Count2 
Customer2   User   1  1 
Customer1   Neither   1  1 
Customer4   User   1  1 
Customer1   User   2  2 
Customer3   Equipment  2  2 
Customer3   User   2  2  

The problem here is that if I do

df['Count'] 

it does not print only that column; it prints everything, just like doing

print df 

So I can't find a way to get only the last column of a DataFrame in order to add it to another one.

Answers


A quick and easy way is to use groupby with size:

df.groupby(['CompanyName', 'QualityIssue']).size() 

CompanyName QualityIssue 
Customer1 Neither   1 
      User   2 
Customer2 User   1 
Customer3 Equipment  2 
      User   2 
Customer4 User   1 
dtype: int64 
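
To reproduce the output above without the Excel file, the question's table can be typed in by hand. This is only a sketch: the variable name counts is an assumption, and the real data would normally come from something like pd.read_excel.

```python
import pandas as pd

# Sample data copied by hand from the table in the question.
df = pd.DataFrame({
    "CompanyName": ["Customer1", "Customer1", "Customer2", "Customer3", "Customer1",
                    "Customer3", "Customer3", "Customer3", "Customer4"],
    "High Priority": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No"],
    "QualityIssue": ["User", "User", "User", "Equipment", "Neither",
                     "User", "User", "Equipment", "User"],
})

# size() counts the rows in each (CompanyName, QualityIssue) group and
# returns a Series whose index is built from the two grouping columns.
counts = df.groupby(["CompanyName", "QualityIssue"]).size()
print(counts)
```

The result is a Series rather than a DataFrame, which is why the group keys show up as index levels and not as ordinary columns.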

Suppose we have another one in memory:

c1 = df.groupby(['CompanyName', 'QualityIssue']).size() 
c2 = c1.copy() 

Then use pd.concat:

pd.concat([c1, c2], keys=['Count1', 'Count2']).unstack(0, fill_value=0) 

          Count1 Count2 
CompanyName QualityIssue     
Customer1 Neither   1  1 
      User    2  2 
Customer2 User    1  1 
Customer3 Equipment   2  2 
      User    2  2 
Customer4 User    1  1 
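
Since the question says the second result will not really be a copy, here is a small sketch of what fill_value=0 does when the two counts cover different groups. The numbers in c1 and c2 below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical counts: the two Series share only one group.
idx1 = pd.MultiIndex.from_tuples(
    [("Customer1", "User"), ("Customer2", "User")],
    names=["CompanyName", "QualityIssue"],
)
idx2 = pd.MultiIndex.from_tuples(
    [("Customer1", "User"), ("Customer3", "Equipment")],
    names=["CompanyName", "QualityIssue"],
)
c1 = pd.Series([2, 1], index=idx1)
c2 = pd.Series([5, 2], index=idx2)

# keys= adds an outer index level holding 'Count1' / 'Count2';
# unstacking that level turns it into columns, and fill_value=0
# fills in groups that are missing from one of the inputs.
combined = pd.concat([c1, c2], keys=["Count1", "Count2"]).unstack(level=0, fill_value=0)
print(combined)
```

Groups that exist in only one of the inputs get a 0 in the other column instead of NaN, so the counts stay integers.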

Use reset_index if you want the index levels back as columns in the dataframe proper.

pd.concat([c1, c2], keys=['Count1', 'Count2']).unstack(0, fill_value=0) \ 
    .reset_index() 

    CompanyName QualityIssue Count1 Count2 
0 Customer1  Neither  1  1 
1 Customer1   User  2  2 
2 Customer2   User  1  1 
3 Customer3 Equipment  2  2 
4 Customer3   User  2  2 
5 Customer4   User  1  1 
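
As for the original difficulty with df['Count']: selecting one column returns a Series that still carries the group keys as its index, so printing it shows the keys next to the numbers. The sketch below, on a cut-down version of the data, shows one possible way to take the last column by position and attach it to another frame. The names first, second, last_col and result are assumptions made for this example.

```python
import pandas as pd

# A cut-down version of the question's data.
df = pd.DataFrame({
    "CompanyName": ["Customer1", "Customer1", "Customer2"],
    "QualityIssue": ["User", "User", "User"],
})

first = df.groupby(["CompanyName", "QualityIssue"]).size().to_frame("Count1")
second = df.groupby(["CompanyName", "QualityIssue"]).size().to_frame("Count2")

# iloc[:, -1] picks the last column by position. The result is a Series
# indexed by the group keys, so print(last_col) shows the keys as well.
last_col = second.iloc[:, -1]

# Column assignment aligns on the index, so rows are matched by
# (CompanyName, QualityIssue) rather than by position.
first["Count2"] = last_col

result = first.reset_index()
print(result)
```

If only the bare numbers are wanted, last_col.to_numpy() drops the index. Assigning those raw values to another frame would then match rows by position only.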