2016-12-25 16 views

Filtering a DataFrame in a more efficient way

How can I write the following code in a more idiomatic pandas way?

majority_df = df[(df.voting_majority_status_fk == 4) & (df.other == True)] 
minority_df = df[(df.voting_majority_status_fk == 3)] 

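I need to keep only the vp_fk values that appear in majority_df but not in minority_df, and then take the unique rows of majority_df for those vp_fk values.

This is what I have now. How can I write it in a more pandas-like way?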

majority_vp_fk = set(majority_df.vp_fk) 
minority_vp_fk = set(minority_df.vp_fk) 

clean_majority_vp_fk = majority_vp_fk - minority_vp_fk 

clean_majority_df = majority_df[majority_df.vp_fk.isin(clean_majority_vp_fk)] 
clean_majority_df = clean_majority_df.drop_duplicates(subset=['probe_fk', 'vp_fk', 'masking_box_fk', 'product_fk']) 

Can you provide a small reproducible sample data set and the expected/desired final data set? – MaxU

Answer


Here is my "very theoretical" solution (it is hard to test without a sample data set):

minority_df = df[(df.voting_majority_status_fk == 3)] 
qry = "voting_majority_status_fk == 4 and other == True and vp_fk not in @minority_df.vp_fk" 
result = (df.query(qry) 
      .drop_duplicates(subset=['probe_fk', 'vp_fk', 'masking_box_fk', 'product_fk']))
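
Since no sample data was posted, here is a minimal sketch for checking the idea on a small invented DataFrame. The column names come from the question, but every value below is an assumption made up for illustration. It compares the set-based code from the question with a single boolean mask built with `isin`, which should select the same rows as the `query` string above.

```python
import pandas as pd

# Hypothetical sample data: column names are from the question,
# the values are invented purely for illustration.
df = pd.DataFrame({
    "voting_majority_status_fk": [4, 4, 4, 3, 4, 4],
    "other":                     [True, True, True, False, False, True],
    "vp_fk":                     [1, 1, 2, 2, 3, 4],
    "probe_fk":                  [10, 10, 20, 20, 30, 40],
    "masking_box_fk":            [100, 100, 200, 200, 300, 400],
    "product_fk":                [7, 7, 8, 8, 9, 9],
})
key_cols = ["probe_fk", "vp_fk", "masking_box_fk", "product_fk"]

# Set-based approach from the question, used here as the reference result.
majority_df = df[(df["voting_majority_status_fk"] == 4) & (df["other"] == True)]
minority_df = df[df["voting_majority_status_fk"] == 3]
clean_ids = set(majority_df["vp_fk"]) - set(minority_df["vp_fk"])
expected = (majority_df[majority_df["vp_fk"].isin(clean_ids)]
            .drop_duplicates(subset=key_cols))

# Single boolean mask: majority rows whose vp_fk never occurs in a minority row.
minority_vp = df.loc[df["voting_majority_status_fk"] == 3, "vp_fk"]
mask = (
    (df["voting_majority_status_fk"] == 4)
    & df["other"]
    & ~df["vp_fk"].isin(minority_vp)
)
result = df[mask].drop_duplicates(subset=key_cols)

print(result)
```

On this toy data both approaches keep the rows with vp_fk 1 and 4: vp_fk 2 is dropped because it also appears in a minority row, and the duplicated vp_fk 1 row is collapsed by `drop_duplicates`. Whether the mask is actually faster than the set difference on real data would need to be measured.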