2016-04-09 145 views
0

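I have 3 pandas DataFrames (similar to the ones shown below). I also have two lists, list ID_1 = ['sdf', 'sdfsdf', ...] and list ID_2 = ['kjdf', 'kldfjs', ...]. How do I select multiple columns from different pandas DataFrames?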

Table1:
      ID_1     ID_2   Value
0  PUFPaY9  NdYWqAJ  0.002
1  Iu6AxdB  qANhGcw  0.01
2  auESFwW  jUEUNdw  0.2345
3  LWbYpca  G3uZ_Rg  0.0835
4  8fApIAM  mVHrayg  0.0295

Table2: 
    ID_1 weight1 weight2 .....weightN 
0 PUFPaY9  
1 Iu6AxdB  
2 auESFwW 
3 LWbYpca  

Table3: 
    ID_2 weight1 weight2 .....weightN 
0 PUFPaY9  
1 Iu6AxdB  
2 auESFwW  
3 LWbYpca  

I would like to compute a DataFrame along the following lines:

for each x ID_1 in list1:
    for each y ID_2 in list2:
        if x-y exist in Table1:
            temp_row = (x[weights[i]] .* y[weights[i]])
            # here I want one-to-one multiplication: x[weight1]*y[weight1], x[weight2]*y[weight2]
            temp_row.append(value[x-y] in Table1)
            new_dataframe.append(temp_row)

return new_dataframe

The desired new_dataframe should look like Table4:

Table4: 
     weight1 weight2 weight3 .....weightN value 
    0   
    1   
    2  
    3  

What I am able to do right now is:

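new_df = df[(df.ID_1.isin(list1)) & (df.ID_2.isin(list2))]

With this I get all valid ID_1/ID_2 combinations and their values. But I don't know how to get the product of the weights from the two DataFrames (for each weight[i], without a loop).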

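From here the task is fairly easy with a loop: I can iterate over new_df (for each row in new_df) and look up weight[i to n] for ID_1 from Table2 and weight[i to n] for ID_2 from Table3. Then I can append their one-to-one multiplication, together with "value" from Table1, to a new FINAL_DF. But I don't want to loop; can we solve this in a smarter way?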
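For reference, the loop described above might look roughly like the following sketch. The frames table1, table2, table3, the lists list1 and list2, and every value in them are made-up stand-ins for Table1, Table2 and Table3, with only two weight columns; this is the baseline that a loop-free solution would need to reproduce.

```python
import pandas as pd

# Toy stand-ins for Table1, Table2 and Table3 (all names and values are made up).
table1 = pd.DataFrame({
    "ID_1": ["a1", "a2", "a3"],
    "ID_2": ["b1", "b2", "b3"],
    "Value": [0.5, 0.25, 0.1],
})
table2 = pd.DataFrame({"ID_1": ["a1", "a2"], "weight1": [2, 3], "weight2": [4, 5]})
table3 = pd.DataFrame({"ID_2": ["b1", "b2"], "weight1": [10, 20], "weight2": [30, 40]})
list1 = ["a1", "a2"]
list2 = ["b1", "b2"]

# Keep only the (ID_1, ID_2) pairs whose members are both in the lists.
new_df = table1[table1.ID_1.isin(list1) & table1.ID_2.isin(list2)]

# Index the weight tables by ID so each row can be looked up by label.
w2 = table2.set_index("ID_1")
w3 = table3.set_index("ID_2")

records = []
for row in new_df.itertuples(index=False):
    # weight1*weight1, weight2*weight2, ... for this (ID_1, ID_2) pair
    product = w2.loc[row.ID_1] * w3.loc[row.ID_2]
    record = product.to_dict()
    record["value"] = row.Value
    records.append(record)

final_df = pd.DataFrame(records)
print(final_df)
```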

+0

The question has been updated. I'm not sure whether we have an option that avoids loops. – impossible

+0

Please check my answer. – MaxU

Answer

0

Is this what you want?

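import io

import numpy as np
import pandas as pd

# Stand-in for Table2: one row per ID_1
data = """\
ID_1
PUFPaY9
aaaaaaa
Iu6AxdB
auESFwW
LWbYpca
"""
id1 = pd.read_csv(io.StringIO(data), sep=r'\s+')

# Stand-in for Table3: one row per ID_2
data = """\
ID_2
PUFPaY9
Iu6AxdB
xxxxxxx
auESFwW
LWbYpca
"""
id2 = pd.read_csv(io.StringIO(data), sep=r'\s+')

# Fill weight1..weight4 with random integers from 1 to 9
cols = ['weight{}'.format(i) for i in range(1,5)]
for c in cols:
    id1[c] = np.random.randint(1, 10, len(id1))
    id2[c] = np.random.randint(1, 10, len(id2))

# Index both frames by ID so the product aligns rows by label
id1.set_index('ID_1', inplace=True)
id2.set_index('ID_2', inplace=True)

# Element-wise product; IDs present in only one frame give NaN
df_mul = id1 * id2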

Step by step:

In [215]: id1
Out[215]:
         weight1  weight2  weight3  weight4
ID_1
PUFPaY9        8        9        1        1
aaaaaaa        6        1        9        2
Iu6AxdB        8        4        8        5
auESFwW        9        3        4        2
LWbYpca        7        7        1        8

In [216]: id2
Out[216]:
         weight1  weight2  weight3  weight4
ID_2
PUFPaY9        6        5        5        1
Iu6AxdB        1        5        4        5
xxxxxxx        1        2        6        4
auESFwW        3        9        5        5
LWbYpca        3        3        6        7

In [217]: id1 * id2
Out[217]:
         weight1  weight2  weight3  weight4
Iu6AxdB      8.0     20.0     32.0     25.0
LWbYpca     21.0     21.0      6.0     56.0
PUFPaY9     48.0     45.0      5.0      1.0
aaaaaaa      NaN      NaN      NaN      NaN
auESFwW     27.0     27.0     20.0     10.0
xxxxxxx      NaN      NaN      NaN      NaN
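
Note that id1 * id2 aligns on index labels, so it multiplies rows that carry the same ID in both frames. If the pairs should instead come from the ID_1/ID_2 combinations listed in Table1, one possible loop-free sketch is to merge the weights onto the filtered pairs and multiply the matching columns. Everything here is an assumption for illustration: the names table1, table2, table3, list1, list2, weight_cols and table4, the suffixes, and all of the values.

```python
import pandas as pd

# Toy stand-ins for Table1, Table2 and Table3 (all names and values are made up).
table1 = pd.DataFrame({
    "ID_1": ["a1", "a2", "a3"],
    "ID_2": ["b2", "b1", "b3"],
    "Value": [0.5, 0.25, 0.1],
})
table2 = pd.DataFrame({"ID_1": ["a1", "a2"], "weight1": [2, 3], "weight2": [4, 5]})
table3 = pd.DataFrame({"ID_2": ["b1", "b2"], "weight1": [10, 20], "weight2": [30, 40]})
list1 = ["a1", "a2"]
list2 = ["b1", "b2"]
weight_cols = ["weight1", "weight2"]

# Keep only the (ID_1, ID_2) pairs whose members are both in the lists.
pairs = table1[table1.ID_1.isin(list1) & table1.ID_2.isin(list2)]

# Inner merges attach the Table2 weights by ID_1 and the Table3 weights by ID_2.
# The second merge renames the overlapping weight columns with the suffixes.
merged = (
    pairs
    .merge(table2, on="ID_1")
    .merge(table3, on="ID_2", suffixes=("_t2", "_t3"))
)

# Multiply weight1 by weight1, weight2 by weight2, ... for every pair at once.
left = merged[[c + "_t2" for c in weight_cols]].to_numpy()
right = merged[[c + "_t3" for c in weight_cols]].to_numpy()
table4 = pd.DataFrame(left * right, columns=weight_cols)
table4["value"] = merged["Value"].to_numpy()
print(table4)
```

Because both merges are inner joins, pairs whose IDs are missing from either weight table are dropped rather than turned into NaN rows.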