2017-04-19 72 views
0

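Python: create a new DataFrame by grouping and summarizing columns

I have the following SQL query, which creates a new table summarizing the visit counts per type for each user ID. How can I create the same DataFrame in Python?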

create table User_Visits_summary as 
select user_id, 
sum(case when visit_type = 1 then 1 else 0 end) as Type_One_Counts, 
sum(case when visit_type = 2 then 1 else 0 end) as Type_Two_Counts, 
sum(case when visit_type = 3 then 1 else 0 end) as Type_Three_Counts, 
count(*) as Total_Visits 
from user_visits 
group by user_id 
+0

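Where and how is your data stored? Is 'user_visits' already a DataFrame / NumPy array / list in Python, or does it live in a SQL database that is already connected to Python? Recent versions of pandas can run SQL statements against a database connection directly (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html). However, translating the SQL statement into pandas indexing/aggregation statements would be more efficient (faster). – jberrio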
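For the database case, here is a minimal sketch of the `read_sql` route. It assumes the data sits in a SQLite table; the in-memory database and the sample rows are made up for illustration and are not from the question.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for a real database: an in-memory SQLite connection
conn = sqlite3.connect(":memory:")
conn.execute("create table user_visits (user_id text, visit_type integer)")
conn.executemany(
    "insert into user_visits values (?, ?)",
    [("A", 1), ("A", 1), ("A", 3), ("B", 2), ("B", 2)],  # made-up rows
)
conn.commit()

# Same aggregation as in the question, as a plain SELECT
query = """
select user_id,
       sum(case when visit_type = 1 then 1 else 0 end) as Type_One_Counts,
       sum(case when visit_type = 2 then 1 else 0 end) as Type_Two_Counts,
       sum(case when visit_type = 3 then 1 else 0 end) as Type_Three_Counts,
       count(*) as Total_Visits
from user_visits
group by user_id
order by user_id
"""

# read_sql runs the query on the connection and returns the result as a DataFrame
sql_summary = pd.read_sql(query, conn)
conn.close()

print(sql_summary)
```

This only helps when the table is in a database; it does not run SQL against a DataFrame that already exists in Python.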

+0

Yes, user_visits is already a DataFrame in Python. I want to create a new DataFrame called User_Visits_summary that captures what the SQL above does. – esteban

Answer

0

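The code below should create the same table as the SQL query. Read the comments in the code and step through it in debug mode to better understand what each line does. For a useful guide to pandas functions, take a look at this cheat sheet: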

https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf

import pandas as pd 

# example dataset 
user_visits = pd.DataFrame({'user_id' :['A','A','A','A','A','B','B','B','B'], 
          'visit_type':[ 1, 1, 3, 3, 3, 2, 2, 2, 2] }) 

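# This summary table already contains the data you want, but in 'long' format
# (one row per user_id/visit_type pair; the counts land in a column named 0)
User_Visits_summary = user_visits.groupby(['user_id','visit_type']).size().reset_index() 

# Here we pivot the table to get to your desired format. Pairs that never occur
# come out as NaN, so fill them with 0 to match the SQL result
User_Visits_summary = User_Visits_summary.pivot(index='user_id',columns='visit_type', values=0).fillna(0).astype(int) 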

# Calculate total from sub-totals in new column 
User_Visits_summary['Total_Visits'] = User_Visits_summary.sum(axis=1)  

# Some formatting 
User_Visits_summary.reset_index(inplace=True) 
User_Visits_summary.rename(columns={1:'Type_One_Counts', 
            2:'Type_Two_Counts', 
            3:'Type_Three_Counts'}, inplace=True) 

# Table ready 
print(User_Visits_summary) 
# ...too wide to paste... 
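Another way to get there, shown only as a sketch and not part of the code above, is to mirror the SQL more literally: build one 0/1 column per `case when` and sum them per user. The name `summary_alt` is made up for this example.

```python
import pandas as pd

# Same example data as in the answer
user_visits = pd.DataFrame({
    'user_id':    ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'visit_type': [1, 1, 3, 3, 3, 2, 2, 2, 2],
})

summary_alt = (
    user_visits.assign(
        # each 0/1 column plays the role of one "case when ... then 1 else 0 end"
        Type_One_Counts=user_visits['visit_type'].eq(1).astype(int),
        Type_Two_Counts=user_visits['visit_type'].eq(2).astype(int),
        Type_Three_Counts=user_visits['visit_type'].eq(3).astype(int),
        # a constant 1 per row sums up to count(*)
        Total_Visits=1,
    )
    .drop(columns='visit_type')
    .groupby('user_id', as_index=False)
    .sum()
)

print(summary_alt)
```

With the sample data, user A gets 2, 0, 3 and a total of 5, and user B gets 0, 4, 0 and a total of 4. A visit type that never occurs for a user comes out as 0, as in the SQL, and all three count columns exist even if a type is missing from the data entirely.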
+0

This works perfectly! Thank you! – esteban

+0

No worries... if it works, please mark the answer as accepted. Thanks – jberrio