A few hundred megabytes isn't that much. Why not take a simple approach using the csv module and collections.defaultdict:
import csv
from collections import defaultdict

result = defaultdict(dict)
fieldnames = {"ID"}

for csvfile in ("file1.csv", "file2.csv", "file3.csv"):
    with open(csvfile, newline="") as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            id = row.pop("ID")
            for key in row:
                fieldnames.add(key)  # wasteful, but I don't care enough
                result[id][key] = row[key]
The resulting defaultdict looks like this:
>>> result
defaultdict(<class 'dict'>,
            {'001': {'SALARY': '25', 'SCHOOLS_ATTENDED': 'my Nice School', 'NAME': 'Jhon'},
             '002': {'SALARY': '40', 'SCHOOLS_ATTENDED': 'His lovely school', 'NAME': 'Doe'}})
You can then write this out to a single CSV file (not my prettiest work, but good enough):
with open("out.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, sorted(fieldnames))
    writer.writeheader()
    for item in result:
        result[item]["ID"] = item
        writer.writerow(result[item])
out.csv then contains:
ID,NAME,SALARY,SCHOOLS_ATTENDED
001,Jhon,25,my Nice School
002,Doe,40,His lovely school
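For reuse, the two steps above could be wrapped in a single function. This is only a sketch: the name merge_csv_by_id and its parameters are hypothetical and not part of the csv module, and it assumes every input file has a header row containing the key column and no rows with more fields than the header.

```python
import csv
from collections import defaultdict


def merge_csv_by_id(in_paths, out_path, key="ID"):
    """Merge rows from several CSV files that share a key column (hypothetical helper)."""
    merged = defaultdict(dict)  # key value -> combined columns from all files
    fieldnames = {key}
    for path in in_paths:
        with open(path, newline="") as infile:
            for row in csv.DictReader(infile):
                row_id = row.pop(key)
                fieldnames.update(row)      # add this row's column names to the set
                merged[row_id].update(row)  # later files overwrite duplicate columns
    with open(out_path, "w", newline="") as outfile:
        # restval fills in columns that a given ID never received
        writer = csv.DictWriter(outfile, sorted(fieldnames), restval="")
        writer.writeheader()
        for row_id, columns in merged.items():
            writer.writerow({key: row_id, **columns})
    return merged
```

Calling merge_csv_by_id(["file1.csv", "file2.csv", "file3.csv"], "out.csv") should write the same kind of file as the code above. Like the original, it keeps everything in memory, which is fine for a few hundred megabytes.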
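Why would anyone downvote this??? – Volatil3
Maybe because it shows a lack of research effort? It wasn't me, though. –
I think this is a duplicate question. You should always search before opening a new one. By the way, it wasn't me! http://stackoverflow.com/questions/17586573/python-combing-data-from-different-csv-files-into-one/17588521#17588521 –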