2017-04-16

I have these two dataframes and want to remove the rows whose column values the two frames do not share:

Active:

Customer_ID | product_No| Rating 
7   | 111  | 3.0 
7   | 222  | 1.0 
7   | 333  | 5.0 
7   | 444  | 3.0 

User:

Customer_ID | product_No| Rating 
9   | 111  | 2.0 
9   | 222  | 5.0 
9   | 666  | 5.0 
9   | 555  | 3.0 

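I want to find the products that both users have rated (e.g. 111, 222) and remove every product that is not common to both (e.g. 444, 333, 555, 666). So the new dfs should look like this: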

Active:

Customer_ID | product_No| Rating 
7   | 111  | 3.0 
7   | 222  | 1.0 

User:

Customer_ID | product_No| Rating 
9   | 111  | 2.0 
9   | 222  | 5.0 

I don't know how to do this without writing a loop. Could you help me, please?

Here is my code so far:

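import pandas as pd 
# Load the ratings file with explicit column names (note: names=, not names[...])
ratings = pd.read_csv("ratings.csv", names=['Customer_ID', 'product_No', 'Rating']) 
# One frame per customer; filter on Customer_ID, since there is no 'UserID' column
active = ratings[ratings['Customer_ID'] == 7] 
user = ratings[ratings['Customer_ID'] == 9] 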

Answers

You can first get the common product_No values with a set intersection, then filter each original data frame with the isin method:

common_product = set(active.product_No).intersection(user.product_No) 

common_product 
# {111, 222} 

active[active.product_No.isin(common_product)] 

#Customer_ID product_No Rating 
#0   7   111  3.0 
#1   7   222  1.0 

user[user.product_No.isin(common_product)] 

#Customer_ID product_No Rating 
#0   9   111  2.0 
#1   9   222  5.0 
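A self-contained sketch of the same intersection-plus-isin approach, rebuilding the two frames from the question's tables instead of reading ratings.csv. The helper name `keep_common_products` is hypothetical, introduced only for illustration.

```python
import pandas as pd

# Example frames rebuilt from the question's tables
active = pd.DataFrame({"Customer_ID": [7, 7, 7, 7],
                       "product_No": [111, 222, 333, 444],
                       "Rating": [3.0, 1.0, 5.0, 3.0]})
user = pd.DataFrame({"Customer_ID": [9, 9, 9, 9],
                     "product_No": [111, 222, 666, 555],
                     "Rating": [2.0, 5.0, 5.0, 3.0]})


def keep_common_products(a, b, key="product_No"):
    """Hypothetical helper: return a and b filtered to the key values they share."""
    # Set intersection of the key values present in both frames
    common = set(a[key]).intersection(b[key])
    # isin builds a boolean mask, so no explicit Python loop over rows is needed
    return a[a[key].isin(common)], b[b[key].isin(common)]


active_common, user_common = keep_common_products(active, user)
print(active_common)
print(user_common)
```

Because the filtering is a vectorized boolean mask, the same helper works unchanged for frames of any size.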

I tried this with an INNER JOIN as follows:

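import pandas as pd 

df1 = pd.read_csv('a.csv') 
df2 = pd.read_csv('b.csv') 
print(df1) 
print(df2) 

# Inner join: keep only rows whose product_No appears in both frames
df_ij = pd.merge(df1, df2, on='product_No', how='inner') 
print(df_ij) 

# Split the joined frame back into one frame per input: select each side
# by its merge suffix, then restore the original column names
df_list = [] 
for suffx in ['_x', '_y']: 
    df_e = df_ij[['Customer_ID' + suffx, 'product_No', 'Rating' + suffx]] 
    df_e.columns = list(df1) 
    df_list.append(df_e) 

print(df_list[0]) 
print(df_list[1]) 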

It gives the following output:

# print df1 
    Customer_ID product_No Rating 
0   7   111  3 
1   7   222  1 
2   7   333  5 
3   7   444  3 

# print df2 
    Customer_ID product_No Rating 
0   9   111  2 
1   9   222  5 
2   9   777  5 
3   9   555  3 

# print the INNER JOINed df 
    Customer_ID_x product_No Rating_x Customer_ID_y Rating_y 
0    7   111   3    9   2 
1    7   222   1    9   5 

# print the first df you want, with common 'product_No' 
    Customer_ID product_No Rating 
0   7   111  3 
1   7   222  1 

# print the second df you want, with common 'product_No' 
    Customer_ID product_No Rating 
0   9   111  2 
1   9   222  5 

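An inner join selects the rows that are common to both dfs. Because the two dfs share column names, the joined df adds suffixes (_x and _y) to the columns that are not used in the join, so they can be told apart. You then just pick the columns with the appropriate suffix to get the final result you want.

There is a good example of INNER JOIN here.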
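To make the suffix behaviour visible, here is a minimal sketch that builds both frames in memory (instead of reading a.csv and b.csv) and passes merge's suffixes parameter explicitly. The suffix strings "_active" and "_user" are arbitrary choices for illustration; when suffixes is omitted, pandas uses ("_x", "_y").

```python
import pandas as pd

df1 = pd.DataFrame({"Customer_ID": [7, 7, 7, 7],
                    "product_No": [111, 222, 333, 444],
                    "Rating": [3.0, 1.0, 5.0, 3.0]})
df2 = pd.DataFrame({"Customer_ID": [9, 9, 9, 9],
                    "product_No": [111, 222, 666, 555],
                    "Rating": [2.0, 5.0, 5.0, 3.0]})

# how="inner" keeps only product_No values found in both frames.
# Non-key columns that share a name get the suffixes given here.
joined = pd.merge(df1, df2, on="product_No", how="inner",
                  suffixes=("_active", "_user"))
print(joined.columns.tolist())
# -> ['Customer_ID_active', 'product_No', 'Rating_active', 'Customer_ID_user', 'Rating_user']

# Each side can then be selected by its suffix
active_common = joined[["Customer_ID_active", "product_No", "Rating_active"]]
user_common = joined[["Customer_ID_user", "product_No", "Rating_user"]]
print(active_common)
print(user_common)
```

One thing to keep in mind with the join approach: if a product_No appears more than once in either frame, the inner join produces one row per matching pair, so the split-out frames can end up with more rows than the originals. The isin approach does not have this effect.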

Use query and reference the other dataframe with @:

Active.query('product_No in @User.product_No') 

    Customer_ID product_No Rating 
0   7   111  3.0 
1   7   222  1.0 

User.query('product_No in @Active.product_No') 

    Customer_ID product_No Rating 
0   9   111  2.0 
1   9   222  5.0 
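A self-contained variant of the query approach. Here each frame's product_No column is first bound to a local name (user_products and active_products are names chosen for illustration), and the query string refers to that name with @; `in` inside the query acts as a membership test on the column.

```python
import pandas as pd

active = pd.DataFrame({"Customer_ID": [7, 7, 7, 7],
                       "product_No": [111, 222, 333, 444],
                       "Rating": [3.0, 1.0, 5.0, 3.0]})
user = pd.DataFrame({"Customer_ID": [9, 9, 9, 9],
                     "product_No": [111, 222, 666, 555],
                     "Rating": [2.0, 5.0, 5.0, 3.0]})

# Local variables that the query strings reference with the @ prefix
user_products = user["product_No"]
active_products = active["product_No"]

# Keep rows whose product_No also occurs in the other frame
active_common = active.query("product_No in @user_products")
user_common = user.query("product_No in @active_products")
print(active_common)
print(user_common)
```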
Here is an answer to your question:

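import pandas as pd 
# Build the two example frames from the question
dict1={"Customer_id":[7,7,7,7], 
     "Product_No":[111,222,333,444], 
     "rating":[3.0,1.0,5.0,3.0]} 
active=pd.DataFrame(dict1) 
dict2={"Customer_id":[9,9,9,9], 
     "Product_No":[111,222,666,555], 
     "rating":[2.0,5.0,5.0,3.0]} 
user=pd.DataFrame(dict2) 
# The inner join keeps only products present in both frames;
# overlapping columns get _x (active) and _y (user) suffixes
df3=pd.merge(active,user,on="Product_No",how="inner") 
print(df3) 
# Split the joined frame back into one frame per customer
active=df3[["Customer_id_x","Product_No","rating_x"]] 
print(active) 
user=df3[["Customer_id_y","Product_No","rating_y"]] 
print(user)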
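The frames produced this way still carry the _x/_y merge suffixes in their column names. A minimal sketch for restoring the original names, using the same dict1/dict2 data; strip_suffix is a hypothetical helper, not a pandas function.

```python
import pandas as pd

active = pd.DataFrame({"Customer_id": [7, 7, 7, 7],
                       "Product_No": [111, 222, 333, 444],
                       "rating": [3.0, 1.0, 5.0, 3.0]})
user = pd.DataFrame({"Customer_id": [9, 9, 9, 9],
                     "Product_No": [111, 222, 666, 555],
                     "rating": [2.0, 5.0, 5.0, 3.0]})
df3 = pd.merge(active, user, on="Product_No", how="inner")


def strip_suffix(frame, suffix):
    """Hypothetical helper: drop a merge suffix from every column name that ends with it."""
    # rename accepts a callable that is applied to each column label
    return frame.rename(columns=lambda c: c[:-len(suffix)] if c.endswith(suffix) else c)


active_common = strip_suffix(df3[["Customer_id_x", "Product_No", "rating_x"]], "_x")
user_common = strip_suffix(df3[["Customer_id_y", "Product_No", "rating_y"]], "_y")
print(active_common)
print(user_common)
```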