2013-04-28

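Merging 2 CSV files

OK, I've read several threads here on Stack Overflow. I thought this would be fairly easy, but I'm finding that my grasp of Python still isn't good enough. I tried the example at How to combine 2 csv files with common column value, but both files have different number of lines, which helped, but I'm still not getting the result I'm after.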

Essentially, I have 2 CSV files that share a common first column, and I want to merge them, i.e.:

filea.csv

 
title,stage,jan,feb 
darn,3.001,0.421,0.532 
ok,2.829,1.036,0.751 
three,1.115,1.146,2.921 

fileb.csv

 
title,mar,apr,may,jun, 
darn,0.631,1.321,0.951,1.751 
ok,1.001,0.247,2.456,0.3216 
three,0.285,1.283,0.924,956 

output.csv (not what I get, but what I want):

 
title,stage,jan,feb,mar,apr,may,jun 
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751 
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216 
three,1.115,1.146,2.921,0.285,1.283,0.924,956 

output.csv (the output I actually get):

 
title,feb,may 
ok,0.751,2.456 
three,2.921,0.924 
darn,0.532,0.951 

The code I tried:

''' 
testing merging of 2 csv files 
''' 
import csv 
import array 
import os 

with open('Z:\\Desktop\\test\\filea.csv') as f: 
    r = csv.reader(f, delimiter=',') 
    dict1 = {row[0]: row[3] for row in r} 

with open('Z:\\Desktop\\test\\fileb.csv') as f: 
    r = csv.reader(f, delimiter=',') 
    #dict2 = {row[0]: row[3] for row in r} 
    dict2 = {row[0:3] for row in r} 

print str(dict1) 
print str(dict2) 

keys = set(dict1.keys() + dict2.keys()) 
with open('Z:\\Desktop\\test\\output.csv', 'wb') as f: 
    w = csv.writer(f, delimiter=',') 
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys]) 

Any help is greatly appreciated.


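Could you describe what you want more simply? Maybe something like: I want to merge the month columns, and take the remaining columns from file X. – juanpastas 2013-04-28 17:53:29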

Answers


You need to store all of the extra columns from each file in your dictionaries, not just one of them:

dict1 = {row[0]: row[1:] for row in r} 
... 
dict2 = {row[0]: row[1:] for row in r} 

Then, because the values in the dictionaries are lists, you just need to concatenate the lists:

w.writerows([[key] + dict1.get(key, []) + dict2.get(key, []) for key in keys]) 
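The fragments above rely on Python 2 (dict1.keys() + dict2.keys() only works when keys() returns lists) and write the rows in the arbitrary order of a set. A minimal Python 3 sketch of the same dictionary approach might look like the following. It assumes the question's data is held in strings rather than files, and the names load, merge, FILEA and FILEB are hypothetical. It also drops the empty column name produced by the trailing comma in fileb.csv's header, pads rows whose key appears in only one file, and keeps the rows in file order.

```python
import csv
import io

# Hypothetical in-memory copies of filea.csv and fileb.csv from the question.
# With real files, pass open("filea.csv", newline="") to csv.reader instead.
FILEA = (
    "title,stage,jan,feb\n"
    "darn,3.001,0.421,0.532\n"
    "ok,2.829,1.036,0.751\n"
    "three,1.115,1.146,2.921\n"
)
FILEB = (
    "title,mar,apr,may,jun,\n"
    "darn,0.631,1.321,0.951,1.751\n"
    "ok,1.001,0.247,2.456,0.3216\n"
    "three,0.285,1.283,0.924,956\n"
)


def load(text):
    """Map each row's first field to a list of its remaining fields."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    if not rows:
        return {}
    header = rows[0]
    # fileb.csv's header ends with a stray comma, giving an empty last name; drop it.
    while header and header[-1] == "":
        header.pop()
    width = len(header)
    # Pad or trim every row to the header's width so the columns line up.
    return {row[0]: (row + [""] * width)[1:width] for row in rows}


def merge(text_a, text_b):
    """Join two CSV texts on their first column and return the merged CSV text."""
    dict1, dict2 = load(text_a), load(text_b)
    # Fill with empty fields when a key is missing from one file.
    width1 = max(map(len, dict1.values()), default=0)
    width2 = max(map(len, dict2.values()), default=0)
    # Dicts keep insertion order (Python 3.7+), so the header row stays first.
    keys = list(dict1) + [k for k in dict2 if k not in dict1]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(
        [key] + dict1.get(key, [""] * width1) + dict2.get(key, [""] * width2)
        for key in keys
    )
    return out.getvalue()


print(merge(FILEA, FILEB), end="")
```

Run as-is, this prints the four lines of the desired output.csv from the question. To write a real file, pass the returned text to open("output.csv", "w", newline="").write(...).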

When I work with CSV files, I often use the pandas library. It makes things like this very easy. For example:

import pandas as pd 

a = pd.read_csv("filea.csv") 
b = pd.read_csv("fileb.csv") 
b = b.dropna(axis=1) 
merged = a.merge(b, on='title') 
merged.to_csv("output.csv", index=False) 

Some explanation follows. First, we read in the CSV files:

>>> a = pd.read_csv("filea.csv") 
>>> b = pd.read_csv("fileb.csv") 
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN

We see that b has an extra column of data (note that the first line of fileb.csv, "title,mar,apr,may,jun,", ends with an extra comma). We can get rid of that easily enough:

>>> b = b.dropna(axis=1) 
>>> b 
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000
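Where that extra column comes from can be seen with the standard csv module alone: the header of fileb.csv splits into six fields, the last one empty, while every data row has only five. The empty sixth name is what pandas reports as Unnamed: 5, and the missing sixth value in each data row is why that column is all NaN, which is what lets dropna(axis=1) remove it. A small sketch, assuming the file contents are held in a string named FILEB (a hypothetical name):

```python
import csv
import io

# fileb.csv from the question, held in a string for illustration.
FILEB = (
    "title,mar,apr,may,jun,\n"
    "darn,0.631,1.321,0.951,1.751\n"
    "ok,1.001,0.247,2.456,0.3216\n"
    "three,0.285,1.283,0.924,956\n"
)

rows = list(csv.reader(io.StringIO(FILEB)))
header, data = rows[0], rows[1:]
print(header)                  # ['title', 'mar', 'apr', 'may', 'jun', '']
print([len(r) for r in data])  # [5, 5, 5]
```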

Now we can merge a and b on the title column:

>>> merged = a.merge(b, on='title') 
>>> merged 
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000

And finally write it out:

>>> merged.to_csv("output.csv", index=False) 

which produces:

title,stage,jan,feb,mar,apr,may,jun 
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751 
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216 
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0 

How would you do the .merge on columns with different names? E.g., merging aTable on column A with bTable on column B. – 2014-05-21 13:28:48


@JorgeVidinha: if you have a new question, please ask a new question. If you ask it as a comment on a year-old question, nobody will see it. – DSM 2014-05-21 13:30:57
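For the record, pandas' merge accepts left_on= and right_on= for exactly this case, e.g. a.merge(b, left_on="A", right_on="B"). The same idea without pandas is sketched below with csv.DictReader. The function join_on, the column names A and B, and the sample data are hypothetical, taken from the wording of the comment. It is an inner join: rows whose key has no match in the other file are dropped. Only the first file's key column is kept, and if both files share another column name, the second file's value wins (pandas instead keeps both, adding _x and _y suffixes by default).

```python
import csv
import io


def join_on(text_a, key_a, text_b, key_b):
    """Inner-join two CSV texts whose key columns have different names."""
    # Index the second file by its key column (a later duplicate key replaces an earlier one).
    index_b = {row[key_b]: row for row in csv.DictReader(io.StringIO(text_b))}
    joined = []
    for row in csv.DictReader(io.StringIO(text_a)):
        match = index_b.get(row[key_a])
        if match is None:
            continue  # no partner row in the second file
        merged = dict(row)
        # Copy the second file's columns, leaving out its redundant key column.
        merged.update((k, v) for k, v in match.items() if k != key_b)
        joined.append(merged)
    return joined


# Hypothetical aTable keyed on column A and bTable keyed on column B.
A_TABLE = "A,x\n1,foo\n2,bar\n"
B_TABLE = "B,y\n2,baz\n3,qux\n"
print(join_on(A_TABLE, "A", B_TABLE, "B"))  # [{'A': '2', 'x': 'bar', 'y': 'baz'}]
```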


Elegant solution, even 4 years later. However, don't use the .dropna() method without how="all"; otherwise it may drop a column if ANY of its cells is empty. – WillardSolutions 2017-07-14 19:56:47
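In pandas that would be b.dropna(axis=1, how="all"); the default how="any" removes a column as soon as one of its cells is missing. The difference can be illustrated without pandas by the hypothetical helper drop_empty_columns below. It works on rows from csv.reader, treats an empty string (or a missing trailing field) as a missing value, and only checks data rows, not the header. It is a sketch of the semantics, not pandas' implementation.

```python
def drop_empty_columns(rows, how="all"):
    """Drop columns whose data cells are empty: all of them (how="all") or any of them (how="any")."""
    header, data = rows[0], rows[1:]
    width = len(header)
    # Pad short rows so a missing trailing field counts as an empty cell.
    padded = [row + [""] * (width - len(row)) for row in data]
    is_gap = all if how == "all" else any
    keep = [i for i in range(width) if not is_gap(row[i] == "" for row in padded)]
    return [[row[i] for i in keep] for row in [header] + padded]


rows = [["title", "mar", ""], ["darn", "0.631"], ["ok", ""]]
print(drop_empty_columns(rows, how="all"))  # [['title', 'mar'], ['darn', '0.631'], ['ok', '']]
print(drop_empty_columns(rows, how="any"))  # [['title'], ['darn'], ['ok']]
```

With how="any", the mar column is lost just because one of its cells is empty, which is the pitfall the comment warns about.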