2016-05-12 31 views
0

Find equivalent rows in pandas?

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame([
    {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'}, 
    {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'}, 
    {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'}, 
    {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'}, 
    {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'} 
]) 
              code   chemical  is_generic format
0  0101010C0AAAAAA  0101010C0        True   AAAA
1  0101010C0BBAAAA  0101010C0       False   AAAA
2  0101010F0AAAUAU  0101010F0        True   AUAU
3  0101010F0BCAAAU  0101010F0       False   AAAU
4  0101010G0AAABAB  0101010G0       False   ABAB

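I want to create a new DataFrame with one row for each code where is_generic is False. Then I want to add a column that holds, for each code, the code that has the same chemical and format but where is_generic is True: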

code             generic_equiv
0101010C0BBAAAA  0101010C0AAAAAA
0101010F0BCAAAU  0101010F0AAAUAU
0101010G0AAABAB  None

I know how to get a DataFrame with one row for each code where is_generic is False:

df1 = df[df['is_generic'] == False]

I think I want to do a conditional merge with df, but how do I do that?

+0

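Is it guaranteed that at most one is_generic=True row matches any particular is_generic=False row? Or could one non-generic have several generics? If there can be several, what should the output look like? –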

+0

@JohnZwinck Thanks! Yes, there is at most one. – Richard
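The "at most one generic per non-generic" guarantee can be checked before merging. The following is only a sketch, assuming the match keys are chemical and format as stated in the question; the names keys, generic and dupes are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame([
    {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
    {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
    {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
    {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
    {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'},
])

# Assumption: these are the columns a generic must share with its non-generic.
keys = ['chemical', 'format']

# Keep only the generic rows.
generic = df[df['is_generic']]

# Generic rows whose (chemical, format) pair occurs more than once.
dupes = generic[generic.duplicated(subset=keys, keep=False)]

# An empty frame means every non-generic can match at most one generic.
print(dupes.empty)
```

If dupes is not empty, a left merge would return several rows for the affected non-generic codes. pandas' merge also accepts validate='many_to_one', which raises an error when the right-hand keys are not unique.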

Answers

3

Here you go...

df = pd.DataFrame([ 
    {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'}, 
    {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'}, 
    {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'}, 
    {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'}, 
    {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'} 
]) 

groups = df.groupby('is_generic') 
pd.merge(groups.get_group(False), groups.get_group(True), on='chemical', how='left') 

Output...

    chemical           code_x  format_x  is_generic_x           code_y  format_y  \
0  0101010C0  0101010C0BBAAAA      AAAA         False  0101010C0AAAAAA      AAAA
1  0101010F0  0101010F0BCAAAU      AAAU         False  0101010F0AAAUAU      AUAU
2  0101010G0  0101010G0AAABAB      ABAB         False              NaN       NaN

   is_generic_y
0          True
1          True
2           NaN

Subset and rename the columns as you wish.
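As a rough sketch of that last step, the merge can be narrowed to the two columns the question asks for. This version assumes the match should be on both chemical and format, as the question states; the names keys, non_generic, generic and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([
    {'code': '0101010C0AAAAAA', 'chemical': '0101010C0', 'is_generic': True, 'format': 'AAAA'},
    {'code': '0101010C0BBAAAA', 'chemical': '0101010C0', 'is_generic': False, 'format': 'AAAA'},
    {'code': '0101010F0AAAUAU', 'chemical': '0101010F0', 'is_generic': True, 'format': 'AUAU'},
    {'code': '0101010F0BCAAAU', 'chemical': '0101010F0', 'is_generic': False, 'format': 'AAAU'},
    {'code': '0101010G0AAABAB', 'chemical': '0101010G0', 'is_generic': False, 'format': 'ABAB'},
])

# Assumption: a generic equivalent must share both of these columns.
keys = ['chemical', 'format']

# Non-generic rows form the left side of the merge.
non_generic = df.loc[~df['is_generic'], ['code'] + keys]

# Generic rows form the right side; rename 'code' up front to avoid _x/_y suffixes.
generic = (
    df.loc[df['is_generic'], ['code'] + keys]
    .rename(columns={'code': 'generic_equiv'})
)

# A left merge keeps every non-generic code; codes without a match get NaN.
result = non_generic.merge(generic, on=keys, how='left')[['code', 'generic_equiv']]
print(result)
```

With the sample data only 0101010C0BBAAAA finds a match. 0101010F0BCAAAU has format AAAU while the generic 0101010F0AAAUAU has format AUAU, so matching on both columns leaves it as NaN, whereas merging on chemical alone pairs them.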

+0

Thanks - this seems to work! I used on=['chemical', 'format'] to match on both columns. – Richard

0

Build one new DataFrame with only the True rows and another with only the False rows, then merge the two new DataFrames:

df1 = df[df['is_generic'] == True] 
df2 = df[df['is_generic'] == False] 
df3 = pd.merge(df1[['chemical','code']],df2[['chemical','code']],left_on='chemical',right_on='chemical',how='right') 
del df3['chemical'] 
df3.rename(columns={'code_x':'generic_equiv','code_y':'code'},inplace=True) 

Output:

     generic_equiv             code
0  0101010C0AAAAAA  0101010C0BBAAAA
1  0101010F0AAAUAU  0101010F0BCAAAU
2              NaN  0101010G0AAABAB
+0

Thanks, but I need to match on both chemical and format... – Richard