2016-06-18 117 views

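Merging two files having the same "column name" and "different rows" using pandas in Python

I have two data files, a.csv and b.csv, available from pastebin. a.csv has 4 columns and some comments: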

# coating file for detector A/R 
# column 1 is the angle of incidence (degrees) 
# column 2 is the wavelength (microns) 
# column 3 is the transmission probability 
# column 4 is the reflection probability 
14.2 531.0 0.0618 0.9382 
14.2 532.0 0.07905 0.92095 
14.2 533.0 0.09989 0.90011 
14.2 534.0 0.12324 0.87676 
14.2 535.0 0.14674 0.85326 
14.2 536.0 0.16745 0.83255 
14.2 537.0 0.1837 0.8163 
# 
# 171 lines, 5 comments, 166 data 

The second file, b.csv, has two columns, one of which (the wavelength) is shared with a.csv, and a different number of rows:

# Version 2.0 - nm, [email protected] to 1, burrows+2006c91.21_T1350_g4.7_f100_solar 
# Wavelength(nm) Flambda(ergs/cm^s/s/nm) 
300.0 1.53345164121e-32 
300.1 1.53345164121e-32 
300.2 1.53345164121e-32 

# total lines = 20003, comment lines = 2, data lines = 20001 

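Now I want to merge these two files on the common wavelength column, which is the second column of a.csv (the wavelength should be the same in both files).

The output should look like: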

# coating file for detector A/R 
# column 1 is the angle of incidence (degrees) 
# column 2 is the wavelength (microns) 
# column 3 is the transmission probability 
# column 4 is the reflection probability 
# Version 2.0 - nm, [email protected] to 1, burrows+2006c91.21_T1350_g4.7_f100_solar 
# Wavelength(nm) Flambda(ergs/cm^s/s/nm) 
14.2 531.0 0.0618 0.9382 1.14325276212 
14.2 532.0 0.07905 0.92095 1.14557732058 

Note: the comments are merged as well.
In b.csv, wavelength 531.0 is at line number 2313.

How can we do this in Python?

My initial attempt was this:

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 
# Author : Bhishan Poudel 
# Date  : Jun 17, 2016 


# Imports 
from __future__ import print_function 
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 


# read in dataframes 
#====================================================================== 
# read in a file 
# 
infile = 'a.csv' 
colnames = ['angle', 'wave','trans','refl'] 
print('{} {} {} {}'.format('\nreading file : ', infile, '','')) 
df1 = pd.read_csv(infile, sep=r'\s+', header=None, skiprows=0,
                  comment='#', names=colnames, usecols=(0, 1, 2, 3))

print('{} {} {} {}'.format('df.head \n', df1.head(),'','')) 
#------------------------------------------------------------------ 


#====================================================================== 
# read in a file 
# 
infile = 'b.csv' 
colnames = ['wave', 'flux'] 
print('{} {} {} {}'.format('\nreading file : ', infile, '','')) 
df2 = pd.read_csv(infile, sep=r'\s+', header=None, skiprows=0,
                  comment='#', names=colnames, usecols=(0, 1))
print('{} {} {} {}'.format('df.head \n', df2.head(),'','\n')) 
#---------------------------------------------------------------------- 


result = df1.append(df2, ignore_index=True) 
print(result.head()) 
print("\n") 

Some useful links:
How to merge data frame with same column names
http://pandas.pydata.org/pandas-docs/stable/merging.html

Answers


If you want to merge the two datasets, you should use the .merge() method rather than .append():

result = pd.merge(df1,df2,on='wave') 

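The former joins the two DataFrames on a key column (similar to an SQL join), while the latter stacks the two DataFrames on top of each other. With the default inner join, only the wavelengths present in both files are kept.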
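To make the difference concrete, and to cover the part of the question about carrying the comment lines over, here is a minimal sketch. It uses short inline strings as stand-ins for a.csv and b.csv, and the flux values in the second string are made up for illustration. The helpers `read_table` and `comment_lines` are hypothetical names, not part of pandas. Stacking is done with `pd.concat`, since `DataFrame.append` was removed in pandas 2.0.

```python
import io

import pandas as pd

# Tiny stand-ins for a.csv and b.csv (the flux values are invented).
A_TEXT = """\
# coating file for detector A/R
# column 2 is the wavelength
14.2 531.0 0.0618 0.9382
14.2 532.0 0.07905 0.92095
"""
B_TEXT = """\
# Wavelength(nm) Flambda(ergs/cm^s/s/nm)
530.9 1.0
531.0 2.0
532.0 3.0
"""


def read_table(text, names):
    """Parse whitespace-separated data, skipping '#' comment lines."""
    return pd.read_csv(io.StringIO(text), sep=r"\s+", comment="#",
                       header=None, names=names)


def comment_lines(text):
    """Return the lines of `text` that start with '#'."""
    return [line for line in text.splitlines() if line.startswith("#")]


df1 = read_table(A_TEXT, ["angle", "wave", "trans", "refl"])
df2 = read_table(B_TEXT, ["wave", "flux"])

# Inner join (the default): keep only wavelengths present in both frames.
merged = pd.merge(df1, df2, on="wave")

# Stacking, by contrast, keeps every row and fills missing columns with NaN.
stacked = pd.concat([df1, df2], ignore_index=True)

# Write the comments of both inputs first, then the merged rows.
out = io.StringIO()
for line in comment_lines(A_TEXT) + comment_lines(B_TEXT):
    out.write(line + "\n")
merged.to_csv(out, sep=" ", header=False, index=False)
result_text = out.getvalue()
print(result_text)
```

With real files, the same steps would read the text from disk instead of from the inline strings. Note that the join matches the float values in `wave` exactly, so this assumes both files write the shared wavelengths identically (for example `531.0` in both).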
