2012-12-05
3

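Merging CSV files in Python using a Python dictionary

Hi, I want to create a new CSV file that merges specific fields from two CSV files based on a common column (a primary key). I tried doing this in PowerShell and it works, but it is very slow: merging files of more than 5000 rows took over 30 minutes, so I'm trying it in Python instead. I'm quite new to this, so please go easy on me.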

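The two input files are infile.csv and checkfile.csv, and the columns of the output file are based on the columns of infile.csv. The code looks up values in checkfile.csv, creates outfile.csv, copies the columns from infile.csv, and needs to rewrite the values of two fields using the corresponding values from checkfile.csv. Details below: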

infile.csv:

"StockNumber","SKU","ChannelProfileID","CostPrice" 
"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33 
"10m_s-vid#CSE","2VV-10",3746,0.98 
"1RR-01#CSE","1RR-01",3746 
"1RR-01#PCAWS","1RR-01",3746, 
"1m_s-vid_ext#APTIIAMZ","2VV-101",3746,0.42 

checkfile.csv:

ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, Manufacturer, SupplierProductCode, ManuCode, LeadTime 
2VV-03,3MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.43,,930,CDL,2VV-03,2VV-03,1 
2VV-05,5MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.54,,1935,CDL,2VV-05,2VV-05,1 
2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,0.86,,1991,CDL,2VV-10,2VV-10,1 
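
The checkfile.csv header has a space after every comma. By default csv.DictReader keeps that space as part of each key, which is why the code in this thread indexes ' Stock' and ' CostPrice'. A minimal Python 3 sketch of the csv module's skipinitialspace option, which drops those spaces; the data is inlined with io.StringIO (an assumption, so the snippet runs without the real file):

```python
import csv
import io

# One header line and one data row copied from checkfile.csv.
CHECKFILE = (
    "ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, "
    "Manufacturer, SupplierProductCode, ManuCode, LeadTime\n"
    "2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,"
    "0.86,,1991,CDL,2VV-10,2VV-10,1\n"
)

# Default dialect: the space after each comma stays in the field name.
plain = next(csv.DictReader(io.StringIO(CHECKFILE)))
print(plain[" Stock"])                      # 1991

# skipinitialspace=True ignores whitespace right after each delimiter,
# so the header keys become "Stock", "CostPrice", and so on.
clean = next(csv.DictReader(io.StringIO(CHECKFILE), skipinitialspace=True))
print(clean["Stock"], clean["CostPrice"])   # 1991 0.86
```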

The outfile.csv I get is:

StockNumber,SKU,ChannelProfileID,CostPrice 
10m_s-vid#APTIIAMZ,2VV-10,"(' ',)", 
10m_s-vid#CSE,2VV-10,"(' ',)", 
1RR-01#CSE,1RR-01,"(' ',)", 
1RR-01#PCAWS,1RR-01,"(' ',)", 
1m_s-vid_ext#APTIIAMZ,2VV-101,"(' ',)", 

But the outfile.csv I need is:

StockNumber,SKU,ChannelProfileID,CostPrice 
10m_s-vid#APTIIAMZ,2VV-10,1991,0.86 
10m_s-vid#CSE,2VV-10,1991,0.86 
1RR-01#CSE,1RR-01 
1RR-01#PCAWS,1RR-01   
1m_s-vid_ext#APTIIAMZ,2VV-101 

My code:

import csv

with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    product_result = dict(
        ((v['ProductCode'], v[' Stock']), (v['ProductCode'], v[' CostPrice'])) for v in checkreader
    )

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)

        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        for item in reader:
            result = product_result.get(item['SKU'], " ")

            item['ChannelProfileID'] = result,
            item['CostPrice'] = result

            writer.writerow(item)
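
The "(' ',)" values in the output come from two details of the code above. A minimal Python 3 sketch that reproduces both on a single hand-typed record; the dict row is an assumption standing in for one record of checkfile.csv as DictReader would return it:

```python
import csv
import io

# One checkfile.csv record, reduced to the fields used here.
row = {"ProductCode": "2VV-10", " Stock": "1991", " CostPrice": "0.86"}

# Bug 1: each key is a (ProductCode, Stock) tuple, so looking up a bare SKU
# never matches and .get() falls back to its default " ".
product_result = dict(
    ((v["ProductCode"], v[" Stock"]), (v["ProductCode"], v[" CostPrice"]))
    for v in [row]
)
print(list(product_result))          # [('2VV-10', '1991')]
result = product_result.get("2VV-10", " ")
print(repr(result))                  # ' '

# Bug 2: the trailing comma in `x = result,` builds a one-element tuple.
channel_profile_id = result,
print(channel_profile_id)            # (' ',)

# csv stringifies the tuple with str() and quotes it because it contains a comma.
buf = io.StringIO()
csv.writer(buf).writerow(["2VV-10", channel_profile_id, result])

# Keying on ProductCode alone lets the SKU lookup succeed.
fixed = {v["ProductCode"]: (v[" Stock"], v[" CostPrice"]) for v in [row]}
print(fixed.get("2VV-10"))           # ('1991', '0.86')
```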

It's not clear what your question is, and it's not clear what the expected result should look like. – pillmuncher

Also, your infile header defines 4 fields, but one of the rows below has only 3. – pillmuncher

OK, I've now added the expected outfile.csv. As you can see, the ChannelProfileID and CostPrice fields should be populated, but they aren't. – Anike

Answers

3

You can make it a bit simpler:

import csv

with open('checkfile.csv', 'rb') as checkfile:
    product_result = {
        record['ProductCode']: record for record in csv.DictReader(checkfile)}

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()
        for item in reader:
            record = product_result.get(item['SKU'], None)
            if record:
                item['ChannelProfileID'] = record[' Stock'] # ???
                item['CostPrice'] = record[' CostPrice']
            else:
                item['ChannelProfileID'] = None
                item['CostPrice'] = None
            writer.writerow(item)

I'm not sure about the line I marked with ???.

Also, if you really do want to produce malformed CSV, feel free to omit the else clause.

I tested it with StringIO objects. It produced the result you specified, except for trailing commas on the rows that have no match in checkfile.

I used a Python 2.7 dict comprehension, since you tagged your question python-2.7.
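
The answer above targets Python 2.7 (binary 'rb'/'wb' modes). A minimal Python 3 sketch of the same logic, wrapped in a hypothetical merge() function that takes file objects so it can be exercised with io.StringIO, as the answer describes. In Python 3 the csv module expects text-mode files opened with newline='' instead of binary mode. The inline data is a trimmed copy of the question's files:

```python
import csv
import io


def merge(checkfile, infile, outfile):
    """Copy infile rows to outfile, filling ChannelProfileID and CostPrice
    from the checkfile record whose ProductCode equals the row's SKU."""
    # Index checkfile by its primary key (header keys keep their leading space).
    product_result = {
        record["ProductCode"]: record for record in csv.DictReader(checkfile)
    }
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, reader.fieldnames)
    writer.writeheader()
    for item in reader:
        record = product_result.get(item["SKU"])
        if record:
            item["ChannelProfileID"] = record[" Stock"]
            item["CostPrice"] = record[" CostPrice"]
        else:
            # DictWriter writes None as an empty field, hence the trailing commas.
            item["ChannelProfileID"] = None
            item["CostPrice"] = None
        writer.writerow(item)


checkfile = io.StringIO(
    "ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, "
    "Manufacturer, SupplierProductCode, ManuCode, LeadTime\n"
    "2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,"
    "0.86,,1991,CDL,2VV-10,2VV-10,1\n"
)
infile = io.StringIO(
    '"StockNumber","SKU","ChannelProfileID","CostPrice"\n'
    '"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33\n'
    '"1RR-01#CSE","1RR-01",3746\n'
)
outfile = io.StringIO()
merge(checkfile, infile, outfile)
print(outfile.getvalue())

# With real files in Python 3, open them in text mode with newline="":
# with open("checkfile.csv", newline="") as c, open("infile.csv", newline="") as i, \
#         open("outfile.csv", "w", newline="") as o:
#     merge(c, i, o)
```

The matched row gets 1991 and 0.86; the unmatched 1RR-01 row ends in two empty fields.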

Thanks! I'll +1 this as soon as I have enough reputation! – Anike

1
import csv

product_result = {}

with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    for v in checkreader:
        product_result[v['ProductCode']] = (v[' Stock'], v[' CostPrice'])

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        for item in reader:
            result = product_result.get(item['SKU'])
            if result:
                item['ChannelProfileID'], item['CostPrice'] = result
            else:
                item['ChannelProfileID'] = item['CostPrice'] = None

            writer.writerow(item)

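Thanks for the reply. So I'm converting the infile data into a tuple. But how do I write the dictionary value of the Stock field into ChannelProfileID, and the CostPrice value into CostPrice, in outfile.csv? – Anike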

To follow up: is it something like item['ChannelProfileID'] = result['Stock'], basically writing data from the dictionary into a specific CSV field? – Anike

result is a tuple, so you can only index it with integers; what I did in this example is sequence unpacking. – Talvalin
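
A minimal Python 3 sketch of the difference between indexing a tuple and unpacking it, using a hand-typed lookup entry (an assumption) shaped like the (Stock, CostPrice) tuples the second answer stores:

```python
# The second answer stores each checkfile record as a (Stock, CostPrice) tuple.
product_result = {"2VV-10": ("1991", "0.86")}
result = product_result.get("2VV-10")

# Tuples are indexed by integer position, not by field name.
print(result[0], result[1])          # 1991 0.86
try:
    result["Stock"]
except TypeError:
    print("a tuple cannot be indexed by a field name")

# Sequence unpacking assigns both positions in one statement.
item = {"StockNumber": "10m_s-vid#CSE", "SKU": "2VV-10",
        "ChannelProfileID": "3746", "CostPrice": "0.98"}
item["ChannelProfileID"], item["CostPrice"] = result
print(item["ChannelProfileID"], item["CostPrice"])   # 1991 0.86
```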