Reading .gz file contents with Python

I'm new to Python and am running into a problem reading the contents of .gz files.

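I have a folder full of .gz files that I pulled down programmatically using a private API. Each .gz file contains an .xml file, so I need to iterate through the directory and extract them.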
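A minimal sketch of that directory walk, assuming the .gz files sit directly in one folder and each decompresses to a single XML document. The function name extract_all and the pattern default are illustrative, not from the original post. shutil.copyfileobj streams the data in chunks, so large files are not read into memory all at once.

```python
import glob
import gzip
import os
import shutil


def extract_all(directory, pattern="*.gz"):
    """Decompress every file matching `pattern` in `directory`.

    Each output file gets the input name minus its ".gz" suffix.
    Returns the list of paths that were written.
    """
    written = []
    for gz_path in glob.glob(os.path.join(directory, pattern)):
        out_path = gz_path[:-3]  # strip the trailing ".gz"
        # Open the compressed input and a plain binary output, then stream between them.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

For example, extract_all("/path/to/downloads") would turn site_res_1.xml.gz into site_res_1.xml in the same folder.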

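The problem is that when I programmatically decompress these .gz files into their respective .xml versions, the files are created without error, and when I open one in TextWrangler it looks like a normal .xml file, but not when I view it in a hex editor. Also, when I open the .xml file programmatically and print its contents, it shows up as a bunch of (binary?) garbled text.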

By contrast, if I extract one of the files manually (i.e., with OS X rather than Python), the file looks the way I expect in the hex editor.
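One quick way to tell whether a supposedly decompressed file is really XML or is still compressed: every gzip stream begins with the two bytes 1f 8b (RFC 1952), which is exactly what a hex editor would show at offset 0. The helper name looks_gzipped is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

If this returns True for one of the extracted .xml files, the data inside is still gzip-compressed rather than plain XML.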

Here is my code snippet (the relevant imports weren't shown originally, but they are glob and gzip):

import glob
import gzip
import os

# siteid, resource and workingDir are defined earlier in the program
searchpattern = siteid + "_" + resource + "_*.gz"
for infile in glob.glob(os.path.join(workingDir, searchpattern)):
    print(infile)

    # Read the decompressed contents (https://docs.python.org/2/library/gzip.html)
    with gzip.open(infile, 'rb') as f:
        file_content = f.read()  # already a byte string; wrapping it in str() does not help
    print(file_content)  # This shows a bunch of garbled text

    # Write the contents we just read to a new (uncompressed) file
    newfilename = infile[:-3] + ".xml"  # the filename without ".gz", plus ".xml"
    with open(newfilename, 'wb') as fnew:
        fnew.write(file_content)

    # Delete the .gz version of the file
    # os.remove(infile)
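On Python 3 (the snippet above is Python 2), gzip.open(..., 'rb').read() returns bytes, and calling str() on bytes does not decode them; it produces their repr, such as b'<root/>'. To get text, decode explicitly. The UTF-8 default below is an assumption; the real encoding should come from the XML declaration. The function name read_gzip_text is hypothetical.

```python
import gzip


def read_gzip_text(path, encoding="utf-8"):
    """Decompress a .gz file and return its contents as text (Python 3)."""
    with gzip.open(path, "rb") as f:
        raw = f.read()  # bytes
    # Decode to str; str(raw) would instead give the "b'...'" repr.
    return raw.decode(encoding)
```

Alternatively, gzip.open(path, "rt", encoding="utf-8") opens the file in text mode directly (Python 3.3+).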


2015-02-09 Adam

So this turned out to be a silly mistake on my part, but I'll post the follow-up for anyone else who makes the same one.

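The problem was that, earlier in my program, I was compressing content that was already compressed. The API response was already gzip data, and I was writing it out with gzip.open(), which compressed it a second time; decompressing once therefore still left gzip data in the ".xml" files. So there is nothing wrong with the code snippet in this thread, and (technically) nothing wrong with the code that creates the .gz files either. As you can see below, the fix was to open the output file normally instead of with the gzip library.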
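The double-compression effect can be reproduced in a few lines. This sketch uses gzip.compress / gzip.decompress (Python 3.2+) on an in-memory sample document; the assumption is that the API response behaves like the once variable, i.e. it is already gzip data.

```python
import gzip

xml = b"<?xml version='1.0'?><root/>"

once = gzip.compress(xml)    # stands in for the API response (assumed already gzipped)
twice = gzip.compress(once)  # what writing it through gzip.open(..., "wb") produces

# Decompressing the double-compressed data one time yields gzip bytes, not XML.
after_one = gzip.decompress(twice)
print(after_one[:2] == b"\x1f\x8b")       # True: still a gzip stream
print(gzip.decompress(after_one) == xml)  # True: a second pass recovers the XML
```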

# Download and write the contents of each response to a .gz file
if limitCounter < limit or int(limit) == 0:
    print(_name + " " + scopeStartDate + " through " + scopeEndDate + " at " + href)
    response = api.get(href)
    gz_file_content = response.content  # the API already returns gzip-compressed bytes
    # gz_file = gzip.open(workingDir + _name, "wb")  # This breaks the program later: it compresses the data a second time
    with open(workingDir + _name, 'wb') as gz_file:  # This works: the bytes are written unchanged
        gz_file.write(gz_file_content)
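If files that were already double-compressed are lying around, one option (not what the poster did; the poster fixed the download step instead) is to keep decompressing while the data still starts with the gzip magic bytes. The name gunzip_all_layers and the max_layers guard are illustrative. Caveat: genuine content that happened to start with 1f 8b would be misread, which is unlikely for XML.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # gzip header signature (RFC 1952)


def gunzip_all_layers(data, max_layers=10):
    """Repeatedly gunzip `data` (bytes) while it still looks like a gzip stream."""
    for _ in range(max_layers):
        if not data.startswith(GZIP_MAGIC):
            break  # no more gzip layers
        data = gzip.decompress(data)
    return data
```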


2015-02-13 18:28:48 Adam

When I run this on a gzipped XML file, the program works without any problems.

If I gzip an XML file, extract it with this program, and compare the program's output with the original file, there is no difference.
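That round-trip check can be scripted. A sketch, assuming a local XML file to test with; the function name roundtrip_matches is hypothetical. filecmp.cmp with shallow=False compares the actual file contents rather than just size and modification time.

```python
import filecmp
import gzip
import os
import shutil
import tempfile


def roundtrip_matches(xml_path):
    """Gzip `xml_path` once, extract it once, and byte-compare with the original."""
    workdir = tempfile.mkdtemp()
    gz_path = os.path.join(workdir, os.path.basename(xml_path) + ".gz")
    out_path = os.path.join(workdir, "extracted.xml")
    try:
        # Compress exactly once, as a correct download step would.
        with open(xml_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Decompress once, as the question's loop does.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return filecmp.cmp(xml_path, out_path, shallow=False)
    finally:
        shutil.rmtree(workdir)  # clean up the temporary files
```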

The program does not add an extra ".xml" extension.
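Whether an extra ".xml" appears depends on the input names: infile[:-3] + ".xml" gives name.xml for name.gz but name.xml.xml for name.xml.gz. If downloaded names can take either form, a small helper avoids the doubled suffix. xml_output_name is a hypothetical name, not part of the original code.

```python
def xml_output_name(gz_path):
    """Map 'a.gz' -> 'a.xml' and 'a.xml.gz' -> 'a.xml' (no doubled suffix)."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz path: %r" % gz_path)
    base = gz_path[:-3]  # drop ".gz"
    return base if base.endswith(".xml") else base + ".xml"
```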


2015-02-10 13:12:05
