How to combine 2 CSV files on a common column value when the two files have different numbers of rows


file1.csv contains 2 columns: c11;c12 
file2.csv contains 2 columns: c21;c22 
Common column: c11, c21

Example:

f1.csv

a;text_a    
b;text_b    
f;text_f    
x;text_x

f2.csv

a;path_a 
c;path_c 
d;path_d 
k;path_k 
l;path_l 
m;path_m

Output f1 + f2:

a;text_a;path_a 
b;text_b;''
c;'';path_c 
d;'';path_d 
f;text_f;'' 
k;'';path_k 
l;'';path_l 
m;'';path_m 
x;text_x;''

How can I implement this in Python?

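Source

2012-08-27 user1042891

What have you tried so far? –

If this is all you need, take a look at the command-line 'join' tool: http://linux.die.net/man/1/join – eumiro

Thanks for the suggestion, but an example of how to use the join command for this case would be very welcome – user1042891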

This is fairly easy to do with the csv module:

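import csv

# Read each file into a dict mapping the key column to the value column.
with open('file1.csv', newline='') as f:
    r = csv.reader(f, delimiter=';')
    dict1 = {row[0]: row[1] for row in r if row}

with open('file2.csv', newline='') as f:
    r = csv.reader(f, delimiter=';')
    dict2 = {row[0]: row[1] for row in r if row}

# Union of the keys from both files, sorted to match the expected output.
keys = sorted(set(dict1) | set(dict2))
# Python 3: open in text mode with newline='' (Python 2 used 'wb' here).
with open('output.csv', 'w', newline='') as f:
    w = csv.writer(f, delimiter=';')
    # A key missing from one file gets the placeholder ''.
    w.writerows([key, dict1.get(key, "''"), dict2.get(key, "''")]
                for key in keys)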

Source

2012-08-27 11:15:41 BrtH

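Thank you for your response. I have one more question. What if file2.csv has 3 columns instead of 2, with everything else the same? Would that affect the code much? – user1042891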
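One way to handle a file with more than two columns is to store everything after the key column as a list and pad missing or short rows up to that file's width. The following is only a minimal sketch under assumptions: the helper names `read_keyed` and `outer_join`, the sample data, and the `size_*` column are hypothetical, and the key is assumed to be the first column in both files.

```python
import csv
import io


def read_keyed(fileobj, delimiter=';'):
    """Map the first column of each row to the list of remaining columns."""
    table = {}
    width = 0  # largest number of value columns seen in this file
    for row in csv.reader(fileobj, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        table[row[0]] = row[1:]
        width = max(width, len(row) - 1)
    return table, width


def outer_join(f1, f2, filler="''", delimiter=';'):
    """Return one merged row per key found in either input, sorted by key."""
    t1, w1 = read_keyed(f1, delimiter)
    t2, w2 = read_keyed(f2, delimiter)
    rows = []
    for key in sorted(set(t1) | set(t2)):
        left = t1.get(key, [])
        right = t2.get(key, [])
        # Pad missing or short sides so every output row has the same width.
        left = left + [filler] * (w1 - len(left))
        right = right + [filler] * (w2 - len(right))
        rows.append([key] + left + right)
    return rows


# Hypothetical sample data: the second input has three columns.
f1 = io.StringIO("a;text_a\nb;text_b\n")
f2 = io.StringIO("a;path_a;size_a\nc;path_c;size_c\n")
merged = outer_join(f1, f2)
for row in merged:
    print(';'.join(row))
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline='')`, and the rows could be written out with `csv.writer(..., delimiter=';').writerows(merged)`.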

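For merging multiple files (even more than 2) on one or more common columns, one of the best and most efficient approaches in Python is to use "brewery". You can even specify which fields should be considered for the merge and which fields should be kept.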

import brewery
from brewery import ds
import sys

sources = [ 
    {"file": "grants_2008.csv", 
    "fields": ["receiver", "amount", "date"]}, 
    {"file": "grants_2009.csv", 
    "fields": ["id", "receiver", "amount", "contract_number", "date"]}, 
    {"file": "grants_2010.csv", 
    "fields": ["receiver", "subject", "requested_amount", "amount", "date"]} 
]

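Create a list of all fields, adding a file name field to store information about the origin of the data records. Go through the source definitions and collect the fields: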

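# Start with a "file" field that records which source each row came from
all_fields = ["file"]

for source in sources:
    for field in source["fields"]:
        if field not in all_fields:
            all_fields.append(field)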

out = ds.CSVDataTarget("merged.csv") 
out.fields = brewery.FieldList(all_fields) 
out.initialize() 

for source in sources:
    path = source["file"]

    # Initialize data source: skip reading of headers
    # use XLSDataSource for XLS files
    # We ignore the fields in the header, because we have set-up fields
    # previously. We need to skip the header row.
    src = ds.CSVDataSource(path, read_header=False, skip_rows=1)
    src.fields = ds.FieldList(source["fields"])
    src.initialize()

    for record in src.records():
        # Add file reference into output - to know where the row comes from
        record["file"] = path
        out.append(record)

    # Close the source stream
    src.finalize()


cat merged.csv | brewery pipe pretty_printer

Source

2015-04-22 00:46:46
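For readers who do not have brewery installed, roughly the same steps can be sketched with the standard csv module. As far as the listing shows, it appends the rows of every source to one output with the union of all fields, rather than joining rows on a key. The sketch below follows that reading and makes several assumptions: the function name `merge_sources` and its `origin_field` parameter are hypothetical, each input is comma-separated, and each input has a header row that is skipped.

```python
import csv


def merge_sources(sources, out_path, origin_field="file"):
    """Append the rows of every source to one CSV with the union of fields."""
    # Collect the union of all field names, origin column first.
    all_fields = [origin_field]
    for source in sources:
        for field in source["fields"]:
            if field not in all_fields:
                all_fields.append(field)

    with open(out_path, "w", newline="") as out_file:
        # restval fills fields a source lacks; extra columns are ignored.
        writer = csv.DictWriter(out_file, fieldnames=all_fields,
                                restval="", extrasaction="ignore")
        writer.writeheader()
        for source in sources:
            with open(source["file"], newline="") as src_file:
                # Use the declared field names instead of the file's header.
                reader = csv.DictReader(src_file, fieldnames=source["fields"])
                next(reader, None)  # skip the header row
                for record in reader:
                    # Record which file the row came from.
                    record[origin_field] = source["file"]
                    writer.writerow(record)
    return all_fields
```

Called as `merge_sources(sources, "merged.csv")` with a `sources` list shaped like the one in the answer, this would write one row per input row, leaving empty cells for fields a given file does not have.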
