2012-08-27 75 views
-1
file1.csv contains 2 columns: c11;c12 
file2.csv contains 2 columns: c21;c22 
Common column: c11, c21 

Example: how to combine 2 CSV files on the values of a common column when the two files have different numbers of rows

f1.csv

a;text_a    
b;text_b    
f;text_f    
x;text_x 

f2.csv

a;path_a 
c;path_c 
d;path_d 
k;path_k 
l;path_l 
m;path_m

Output f1 + f2:

a;text_a;path_a 
b;text_b;''
c;'';path_c 
d;'';path_d 
f;text_f;'' 
k;'';path_k 
l;'';path_l 
m;'';path_m 
x;text_x;'' 

How can I implement this in Python?

+2

What have you tried so far? –

+0

If this is all you need, take a look at the command-line 'join' tool: http://linux.die.net/man/1/join – eumiro

+0

Thanks for the suggestion, but an example of how to use the join command for this case would be very welcome – user1042891

Answers

3

This is fairly easy to do with the csv module:

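import csv

# Python 3 version: map the key column to the value column of each file.
with open('file1.csv', newline='') as f:
    r = csv.reader(f, delimiter=';')
    dict1 = {row[0]: row[1] for row in r}

with open('file2.csv', newline='') as f:
    r = csv.reader(f, delimiter=';')
    dict2 = {row[0]: row[1] for row in r}

# Union of the keys from both files, sorted to match the expected output.
keys = sorted(set(dict1) | set(dict2))
with open('output.csv', 'w', newline='') as f:
    w = csv.writer(f, delimiter=';')
    # A key missing from one file gets the placeholder '' in that column.
    w.writerows([key, dict1.get(key, "''"), dict2.get(key, "''")]
                for key in keys)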
+0

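Thanks for your response. I have one more question: what if file2.csv has 3 columns instead of 2, with everything else the same? Does that have a big impact on the code? – user1042891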
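On the three-column question, here is a minimal sketch rather than a definitive answer. It assumes the first column is still the key and keeps all remaining columns as a list, padding missing cells with the same '' placeholder. The helper names `read_keyed` and `outer_join` and the values in the third column of the sample are hypothetical.

```python
import csv

def read_keyed(lines, delimiter=";"):
    """Map column 0 to the list of remaining columns.

    Also returns the largest number of value columns seen, so that
    missing rows can be padded to the same width.
    """
    rows = {}
    width = 0
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        rows[row[0]] = row[1:]
        width = max(width, len(row) - 1)
    return rows, width

def outer_join(left_lines, right_lines, filler="''"):
    """Full outer join of two keyed CSV inputs, sorted by key."""
    left, left_width = read_keyed(left_lines)
    right, right_width = read_keyed(right_lines)
    merged = []
    for key in sorted(set(left) | set(right)):
        left_values = left.get(key, [])
        right_values = right.get(key, [])
        # Pad short or missing rows so every output row has the same width.
        left_values = left_values + [filler] * (left_width - len(left_values))
        right_values = right_values + [filler] * (right_width - len(right_values))
        merged.append([key] + left_values + right_values)
    return merged

# Sample data: f1 as in the question, f2 with a made-up third column.
F1 = "a;text_a\nb;text_b\nf;text_f\nx;text_x"
F2 = "a;path_a;1\nc;path_c;2"

merged = outer_join(F1.splitlines(), F2.splitlines())
for row in merged:
    print(";".join(row))
```

With real files you would pass the open file objects (opened with newline='') in place of the split strings, and write `merged` out with `csv.writer(..., delimiter=';').writerows(merged)`.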

0

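For merging multiple files (even more than 2) on one or more common columns, one of the better and more efficient approaches in Python is to use "brewery". You can even specify which fields should be considered for the merge and which fields should be kept.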

import brewery
from brewery import ds
import sys

sources = [ 
    {"file": "grants_2008.csv", 
    "fields": ["receiver", "amount", "date"]}, 
    {"file": "grants_2009.csv", 
    "fields": ["id", "receiver", "amount", "contract_number", "date"]}, 
    {"file": "grants_2010.csv", 
    "fields": ["receiver", "subject", "requested_amount", "amount", "date"]} 
] 

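Create a list of all fields, including a file name field that stores where each data record came from. Go through the source definitions and collect the fields: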

# Start with a "file" field that records which source each row came from.
all_fields = ["file"]

for source in sources:
    for field in source["fields"]:
        if field not in all_fields:
            all_fields.append(field)

out = ds.CSVDataTarget("merged.csv")
out.fields = brewery.FieldList(all_fields)
out.initialize()

for source in sources:
    path = source["file"]

    # Initialize data source: skip reading of headers.
    # Use XLSDataSource for XLS files.
    # We ignore the fields in the header, because we have set up fields
    # previously. We need to skip the header row.
    src = ds.CSVDataSource(path, read_header=False, skip_rows=1)
    src.fields = ds.FieldList(source["fields"])
    src.initialize()

    for record in src.records():
        # Add file reference into output, to know where the row comes from
        record["file"] = path
        out.append(record)

    # Close the source stream
    src.finalize()


cat merged.csv | brewery pipe pretty_printer 
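If brewery is not available, the same idea (collect the union of all fields, then append every row tagged with its source file) can be sketched with only the standard csv module. This is an illustrative sketch, not brewery's implementation: the function name `merge_sources`, the tuple layout of `sources`, and the in-memory sample rows are all hypothetical.

```python
import csv
import io

def merge_sources(sources, out_stream):
    """Merge CSV streams with differing columns into one CSV.

    sources: list of (name, fields, stream) tuples. Each stream is
    assumed to start with a header row, which is skipped because the
    field names are given explicitly.
    """
    # "file" comes first and records the origin of each row.
    all_fields = ["file"]
    for _, fields, _ in sources:
        for field in fields:
            if field not in all_fields:
                all_fields.append(field)

    # restval fills columns that a given source does not have.
    writer = csv.DictWriter(out_stream, fieldnames=all_fields,
                            restval="", lineterminator="\n")
    writer.writeheader()
    for name, fields, stream in sources:
        reader = csv.reader(stream)
        next(reader, None)  # skip the header row
        for row in reader:
            record = dict(zip(fields, row))
            record["file"] = name
            writer.writerow(record)
    return all_fields

# Made-up sample rows, held in memory so the sketch runs without files.
grants_2008 = io.StringIO("receiver,amount,date\nAcme,100,2008-01-05\n")
grants_2009 = io.StringIO(
    "id,receiver,amount,contract_number,date\n7,Beta,250,C-1,2009-03-02\n")

out = io.StringIO()
fields = merge_sources(
    [("grants_2008.csv", ["receiver", "amount", "date"], grants_2008),
     ("grants_2009.csv",
      ["id", "receiver", "amount", "contract_number", "date"], grants_2009)],
    out)
print(out.getvalue())
```

For real files, each stream would be `open(path, newline='')` and `out_stream` would be the opened output file.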