Merging single-line .dat files into one .csv file using Python

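I'm a beginner in the programming world and would like some tips on how to solve a challenge. Right now I have about 10,000 .dat files, each containing a single line with the following structure: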

Attribute1=Value&Attribute2=Value&Attribute3=Value...AttributeN=Value

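I have been trying to use Python and the csv library to convert these .dat files into one single .csv file.

So far I have been able to write something that reads all the files, stores the contents of each file in a new line, and replaces the "&" with ",". But since Attribute1, Attribute2 ... AttributeN are exactly the same for every file, I would like to make them into column headers and remove them from every other line.

Any tips on how to go about that?

Thanks!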

Source

2015-10-31 brenogil

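Since you are a beginner, I have prepared some code that works and is also easy to understand.

I assume you have all the files in a folder called "input". The code below should go in a script file next to that folder.

Keep in mind that this code is meant to show how a problem like this can be solved. Optimizations and sanity checks have been left out on purpose.

You may want to additionally check what happens when a value is missing in some line, when an attribute is missing, when the input is corrupted, and so on. :)

Good luck!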

import os 

# this function splits the attribute=value into two lists 
# the first list are all the attributes 
# the second list are all the values 
def getAttributesAndValues(line): 
    attributes = [] 
    values = [] 

    # first we split the input over the &
    # (strip() removes the trailing newline so it does not end up in the last value)
    attributeValues = line.strip().split('&')
    for attrVal in attributeValues:
        # we split the attribute=value over the first '=' sign
        # the left part goes to pair[0], the value goes to pair[1]
        pair = attrVal.split('=', 1)
        attributes.append(pair[0])
        values.append(pair[1])

    # return the attributes list and values list 
    return attributes,values 

# test the function using the lines beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttributeN=Value"
# print(getAttributesAndValues(line))

# this function appends the contents of a single input file to the output file
def writeToCsv(inFile, wfile="outFile.csv", delim=","):
    # the header is written only if the output file does not exist yet or is empty
    needsHeader = not os.path.exists(wfile) or os.path.getsize(wfile) == 0

    # 'with' closes both files automatically, even if an error occurs
    with open(inFile, 'r') as f_in, open(wfile, 'a') as f_out:
        # loop through every line in the file and write its values
        for line in f_in:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            header, values = getAttributesAndValues(line)

            # we write the header only once, before the first row
            if needsHeader:
                f_out.write(delim.join(header) + "\n")
                needsHeader = False

            # we write the values (join avoids a trailing delimiter)
            f_out.write(delim.join(values) + "\n")

# Read all the files in the input folder. os.listdir() never returns '.' or '..',
# so no entry has to be dropped; we simply keep the .dat files
allInputFiles = sorted(f for f in os.listdir('input') if f.endswith('.dat'))

# loop through all the files and write values to the csv file
for singleFile in allInputFiles:
    writeToCsv(os.path.join('input', singleFile))
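
The extra checks mentioned above (missing values, missing attributes, corrupted input) could be sketched with the standard csv module roughly as follows. This is only a sketch and not part of the answer's code: the helper names parse_line and merge_dat_files are made up for illustration, and leaving a missing attribute as an empty cell is an assumption about what you want.

```python
import csv
import glob
import os


def parse_line(line):
    """Split 'A=1&B=2' into a dict; pairs without '=' are skipped."""
    record = {}
    for pair in line.strip().split('&'):
        if '=' not in pair:
            continue  # corrupted or empty pair, ignore it
        key, value = pair.split('=', 1)
        record[key.strip()] = value.strip()
    return record


def merge_dat_files(in_dir, out_path):
    """Write one CSV row per .dat file found in in_dir; return the row count."""
    records = []
    for path in sorted(glob.glob(os.path.join(in_dir, '*.dat'))):
        with open(path, 'r') as f_in:
            record = parse_line(f_in.readline())
        if record:  # skip files that yielded nothing usable
            records.append(record)

    # header = every attribute seen, in order of first appearance
    header = []
    for record in records:
        for key in record:
            if key not in header:
                header.append(key)

    # restval fills the cell when a file lacks one of the attributes
    with open(out_path, 'w', newline='') as f_out:
        writer = csv.DictWriter(f_out, fieldnames=header, restval='')
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Calling merge_dat_files('input', 'outFile.csv') would then read the same folder as the script above. Using csv.DictWriter also means values that contain commas get quoted properly.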

Source

2015-10-31 17:24:11 afabijan

Thank you so much! Just as you intended, this code helped me solve my problem and gave me a little something to learn from. – brenogil

You're welcome! – afabijan

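Put the dat files in a folder called myDats. Put this script next to the myDats folder, along with a file called temp.txt. You will also need your output.csv. [That is, you will have output.csv, myDats, and mergeDats.py in the same folder]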

mergeDats.py

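import csv
import os

# Step 1: collect the single line of every .dat file into temp.txt, one line per file
with open("temp.txt", "w") as g:
    for name in sorted(os.listdir("myDats")):
        with open(os.path.join("myDats", name), "r") as f:
            g.write(f.readline().strip() + "\n")

# Step 2: turn every collected line into a dict of attribute -> value
rows = []
with open("temp.txt", "r") as h:
    for line in h:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pairs = (item.split("=", 1) for item in line.split("&"))
        rows.append({key: value for key, value in pairs})

# Step 3: write the header once, then one row per .dat file
# (in Python 2.x, open the file with 'wb' instead of 'w' and drop newline='')
if rows:
    with open("output.csv", "w", newline="") as output:
        w = csv.DictWriter(output, fieldnames=list(rows[0].keys()))
        w.writeheader()
        w.writerows(rows)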

Source

2015-10-31 16:33:20 AMACB

Thanks! Running this, I get: 'IOError: [Errno 2] No such file or directory: '1.dat'' – brenogil

That should fix it, try again – AMACB

but since the Attribute1,Attribute2...AttributeN are exactly the same for every file, I would like to make them into column headers and remove them from every other line.

input = 'Attribute1=Value1&Attribute2=Value2&Attribute3=Value3'

Once, for the first file:

','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))

For the contents of each file:

','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))

Maybe you need to trim the strings, I don't know how clean your input is.
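
Putting the two expressions together with the trimming might look roughly like this. It is a sketch only: the function names and the sample strings in the usage note are made up for illustration, and values that themselves contain commas would still need the csv module.

```python
def header_row(line):
    """Comma-joined attribute names of one 'A=1&B=2' line."""
    pairs = (item.split('=', 1) for item in line.strip().split('&'))
    return ','.join(k.strip() for k, v in pairs)


def value_row(line):
    """Comma-joined values of one 'A=1&B=2' line."""
    pairs = (item.split('=', 1) for item in line.strip().split('&'))
    return ','.join(v.strip() for k, v in pairs)


def to_csv_lines(lines):
    """Header from the first non-blank line, then one value row per line."""
    lines = [line for line in lines if line.strip()]
    if not lines:
        return []
    return [header_row(lines[0])] + [value_row(line) for line in lines]
```

For example, to_csv_lines(['A=1 & B=2\n', 'A=3&B=4']) gives the header line 'A,B' followed by '1,2' and '3,4'.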

Source

2015-10-31 16:40:05

Well, that's an interesting approach! I'll give it a try and let you know what happens. Thanks! – brenogil
