2017-08-08 80 views

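Update values in one DataFrame based on a matching column in another (pandas)

I have a file called pinkH1_ppm.txt, with the columns separated by spaces, that looks like this: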

2.H8 7.61004 0.3 
1.H8 8.13712 0.3 
3.H6 7.53261 0.3 
4.H8 7.49932 0.3 
5.H6 7.72158 0.3 
7.H8 8.16859 0.3 
6.H6 7.70272 0.3 
9.H8 8.1053 0.3 
8.H6 7.65014 0.3 
10.H6 7.5231 0.3 
11.H6 7.58213 0.3 
12.H6 7.72805 0.3 
13.H6 8.02977 0.3 
14.H6 7.69624 0.3 
15.H8 7.82994 0.3 
17.H8 7.24899 0.3 
18.H6 7.6439 0.3 
20.H8 7.78512 0.3 
19.H8 7.65501 0.3 
22.H8 7.47677 0.3 
23.H6 7.7306 0.3 
24.H6 7.80104 0.3 
25.H8 7.67295 0.3 
26.H6 7.67463 0.3 
27.H6 7.64807 0.3 
1.H1' 5.8202 0.3 
2.H1' 5.90291 0.3 
4.H1' 5.74125 0.3 
3.H1' 5.54935 0.3 
6.H1' 5.54297 0.3 
8.H1' 5.238 0.3 
11.H1' 5.50093 0.3 
10.H1' 5.426 0.3 
14.H1' 5.96177 0.3 
15.H1' 5.959 0.3 
17.H1' 5.75214 0.3 
19.H1' 5.681 0.3 
22.H1' 5.523 0.3 
24.H1' 5.55313 0.3 
25.H1' 5.70819 0.3 
27.H1' 5.74236 0.3 
26.H1' 5.48061 0.3 

I have another file called pinkH2_ppm.txt that looks like this:

5.H8 7.72158 0.3 
2.H8 7.70272 0.3 
7.H8 8.16859 0.3 
8.H6 7.65014 0.3 
9.H8 8.1053 0.3 
10.H6 7.5231 0.3 
12.H6 7.72805 0.3 
13.H6 8.02977 0.3 
14.H6 7.69624 0.3 
17.H8 7.24899 0.3 
16.H8 8.27957 0.3 
18.H6 7.6439 0.3 
19.H8 7.65501 0.3 
20.H8 7.78512 0.3 
21.H8 8.06057 0.3 
22.H8 7.47677 0.3 
23.H6 7.7306 0.3 
24.H6 7.80104 0.3 
5.H2' 4.2621 0.3 
7.H2' 4.54158 0.3 
9.H2' 4.50708 0.3 
12.H2' 3.76928 0.3 
13.H2' 4.67514 0.3 
16.H1' 4.52918 0.3 
18.H2' 4.71109 0.3 
20.H2' 4.63392 0.3 
21.H2' 4.65975 0.3 
23.H2' 4.27267 0.3 

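How can I check whether a value in the first column of pinkH1_ppm.txt equals a value in the first column of pinkH2_ppm.txt and, if they are equal, replace the second-column value in pinkH2_ppm.txt with the second-column value from pinkH1_ppm.txt?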
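The "check" half of the question maps onto `Series.isin`, which returns a boolean mask of rows whose label also appears in another Series. A minimal sketch, assuming the same whitespace-separated three-column layout; a few rows from each file are inlined with `io.StringIO` so it runs on its own, and the names `h1_text`, `h2_text`, `h1` and `h2` are illustrative only:

```python
import io

import pandas as pd

# A few rows copied from each file; real code would pass the file paths instead
h1_text = """2.H8 7.61004 0.3
1.H8 8.13712 0.3
5.H6 7.72158 0.3
"""
h2_text = """5.H8 7.72158 0.3
2.H8 7.70272 0.3
16.H8 8.27957 0.3
"""

cols = ["Atom", "ppm", "x"]
h1 = pd.read_csv(io.StringIO(h1_text), sep=r"\s+", header=None, names=cols)
h2 = pd.read_csv(io.StringIO(h2_text), sep=r"\s+", header=None, names=cols)

# Boolean mask: True where the atom label in pinkH2 also occurs in pinkH1
matches = h2["Atom"].isin(h1["Atom"])
print(h2.loc[matches, "Atom"].tolist())  # ['2.H8']
```

Matching is on the exact label string, so 5.H6 in pinkH1_ppm.txt does not match 5.H8 in pinkH2_ppm.txt.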

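For example, the first-column entry in the first row of pinkH1_ppm.txt (2.H8) matches the first-column entry in the second row of pinkH2_ppm.txt. Since 2.H8 is the same in both, I want to replace 7.70272 in pinkH2_ppm.txt with 7.61004 from pinkH1_ppm.txt, but I'm not sure how to do this with the pandas `ix` indexer.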
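For the 2.H8 example, one way to do the replacement without any indexer is to build a lookup Series (atom label to ppm) from pinkH1 and `map` it onto pinkH2's atom column. This is a sketch of an alternative technique, not the poster's approach; it assumes atom labels are unique within pinkH1, and the inline rows are copied from the data above:

```python
import io

import pandas as pd

cols = ["Atom", "ppm", "x"]
h1 = pd.read_csv(io.StringIO("2.H8 7.61004 0.3\n1.H8 8.13712 0.3\n"),
                 sep=r"\s+", header=None, names=cols)
h2 = pd.read_csv(io.StringIO("5.H8 7.72158 0.3\n2.H8 7.70272 0.3\n"),
                 sep=r"\s+", header=None, names=cols)

# Lookup Series indexed by atom label, holding pinkH1's ppm values
lookup = h1.set_index("Atom")["ppm"]

# map() yields NaN for atoms absent from pinkH1; fillna keeps the old ppm there
h2["ppm"] = h2["Atom"].map(lookup).fillna(h2["ppm"])
print(h2)
```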

Here is my code:

import pandas as pd

filename = 'pinkH1_ppm.txt'
ppmColor = 'pinkH2_ppm.txt'

# Keep only the first two columns (atom label, ppm).
# .ix was removed in pandas 1.0; .iloc selects columns by position.
df = pd.read_csv(filename, sep=r'\s+', header=None)
df = df.iloc[:, [0, 1]]
color = pd.read_csv(ppmColor, sep=r'\s+', header=None, names=['Atom', 'ppm', 'x'])

df.set_index(0, inplace=True)
color.set_index('Atom', inplace=True)
# update() aligns on index AND column labels; df's remaining column is labelled 1, not 'ppm'
color.update(df)

color.to_csv(ppmColor, sep=" ", header=False)
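`DataFrame.update` only overwrites cells whose index label and column label both exist in the other frame. In the code above, df's value column is labelled `1` while color's is `'ppm'`, so no columns overlap and color is left unchanged. A minimal sketch, assuming the same three-column files (inlined here so it runs standalone), that gives both frames the same labels before calling `update`:

```python
import io

import pandas as pd

cols = ["Atom", "ppm", "x"]
h1_text = "2.H8 7.61004 0.3\n1.H8 8.13712 0.3\n"
h2_text = "5.H8 7.72158 0.3\n2.H8 7.70272 0.3\n"

# Read both files with identical column names so update() can align them
df = pd.read_csv(io.StringIO(h1_text), sep=r"\s+", header=None, names=cols)
color = pd.read_csv(io.StringIO(h2_text), sep=r"\s+", header=None, names=cols)

df = df.set_index("Atom")[["ppm"]]  # keep only the column to copy across
color = color.set_index("Atom")

# Overwrites color's 'ppm' in place where atom labels match;
# atoms only in df are ignored, atoms only in color are untouched
color.update(df)
print(color)
```

With the real files, `color.to_csv(ppmColor, sep=" ", header=False)` would then write the index (the atom labels) back as the first column.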

I believe [this](https://stackoverflow.com/questions/19007383/compare-two-different-files-line-by-line-in-python) post will clarify the problem. – FussinHussin


@FussinHussin Thanks, but is there a way to do this with pandas? – user8358234


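@cᴏʟᴅsᴘᴇᴇᴅ I want to read in my file 'pinkH1_ppm.txt' and check whether any value in its first column matches a value in the first column of my second file 'pinkH2_ppm.txt'. If so, I want to replace the second-column value in 'pinkH2_ppm.txt' with the second-column value from 'pinkH1_ppm.txt' that has the same first-column value. – user8358234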

Answer

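import pandas as pd

filename = 'pinkH1_ppm.txt'
ppmColor = 'pinkH2_ppm.txt'

# Read both files with the same column names
df = pd.read_csv(filename, sep=r'\s+', header=None, names=['Atom', 'ppm', 'x'])
color = pd.read_csv(ppmColor, sep=r'\s+', header=None, names=['Atom', 'ppm', 'x'])

# Left merge: keep every row of color and attach df's ppm for the same atom (NaN if none)
color = pd.merge(color, df.loc[:, ['Atom', 'ppm']], how='left', on='Atom')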

After the merge, because there are two columns with the same name 'ppm', pandas renames them to 'ppm_x' (from color) and 'ppm_y' (from df):

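# Row labels where pinkH1 supplied a value
l = color[color['ppm_y'].notnull()].index.tolist()
# Copy those values into the original ppm column
color.loc[l, 'ppm_x'] = color.loc[l, 'ppm_y']
# Drop the helper column; rename() returns a new frame, so assign the result
color.drop('ppm_y', axis=1, inplace=True)
color = color.rename(columns={'ppm_x': 'ppm'})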
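A shorter variant of the last steps, sketched under the same assumptions: passing `suffixes` to `merge` names the two ppm columns explicitly, and `fillna` replaces the index-list step. The inline rows and the `_old`/`_new` suffixes are illustrative; with the real data you would read the two files and pass `ppmColor` to `to_csv` instead of a `StringIO` buffer.

```python
import io

import pandas as pd

cols = ["Atom", "ppm", "x"]
df = pd.read_csv(io.StringIO("2.H8 7.61004 0.3\n1.H8 8.13712 0.3\n"),
                 sep=r"\s+", header=None, names=cols)
color = pd.read_csv(io.StringIO("5.H8 7.72158 0.3\n2.H8 7.70272 0.3\n"),
                    sep=r"\s+", header=None, names=cols)

# Left merge keeps every row of color; suffixes label the overlapping 'ppm' columns
merged = color.merge(df[["Atom", "ppm"]], how="left", on="Atom",
                     suffixes=("_old", "_new"))

# Take pinkH1's value where one exists, otherwise keep the original value
merged["ppm"] = merged["ppm_new"].fillna(merged["ppm_old"])
color = merged[cols]

# Write back in the original space-separated, header-less layout
out = io.StringIO()
color.to_csv(out, sep=" ", header=False, index=False)
print(out.getvalue())
```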