parse.unquote_plus TypeError

I am trying to format a file so that it can be inserted into a database. The file is originally compressed and about 1.3 MB in size. Each line looks like this:

398,%7EAnoniem+001%7E,543,480,7525010,1775,0

This is the code that parses the file:

Village = gzip.open(Root+'\\data'+'\\' +str(Newest_Date[0])+'\\' +str(Newest_Date[1])+'\\' +str(Newest_Date[2])\ 
       +'\\'+str(Newest_Date[3])+' village.gz'); 
Village_Parsed = str 
for line in Village: 
    Village_Parsed = Village_Parsed + urllib.parse.unquote_plus(line); 
print(Village.readline());

When I run the program I get this error:

    Village_Parsed = Village_Parsed + urllib.parse.unquote_plus(line);
  File "C:\Python31\lib\urllib\parse.py", line 404, in unquote_plus
    string = string.replace('+', ' ')
TypeError: expected an object with the buffer interface

Any idea what is wrong here? Thanks in advance for your help :)

2009-11-04 user202459

import gzip, os, urllib.parse 

archive_relpath = os.sep.join(map(str, Newest_Date[:4])) + ' village.gz' 
archive_path = os.path.join(Root, 'data', archive_relpath) 

with gzip.open(archive_path) as Village: 
    Village_Parsed = ''.join(urllib.parse.unquote_plus(line.decode('ascii')) 
          for line in Village) 
    print(Village_Parsed)

Output:

 
398,~Anoniem 001~,543,480,7525010,1775,0

Note: RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax says:

This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text.

So in the line.decode('ascii') fragment, 'ascii' should be replaced by whatever character encoding your text was actually encoded with.
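To make the encoding point concrete, here is a minimal sketch. It assumes the sample line from the question; the 'caf%C3%A9' value is a made-up example, not data from the file. Decoding the raw bytes as ASCII is safe for percent-encoded text, while the encoding argument of unquote_plus decides how the %XX octets become characters.

```python
from urllib.parse import unquote_plus

# Sample line from the question, as the bytes gzip would hand back.
raw = b"398,%7EAnoniem+001%7E,543,480,7525010,1775,0\n"

# Percent-encoded data consists of ASCII characters only, so this decode is safe.
text = raw.decode("ascii")

# '+' becomes a space and %7E becomes '~'.
parsed = unquote_plus(text.rstrip("\n"))

# Hypothetical non-ASCII value: the same octets give different text
# depending on which encoding is assumed for the %XX escapes.
as_utf8 = unquote_plus("caf%C3%A9", encoding="utf-8")
as_latin1 = unquote_plus("caf%C3%A9", encoding="latin-1")

print(parsed)
print(as_utf8, as_latin1)
```

If the escapes in your file were produced from UTF-8 text, the default encoding='utf-8' is probably what you want; otherwise pass the encoding that was actually used.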

2009-11-04 10:39:05 jfs

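@JFSebastian: Did you actually try it? I get the same error as the OP ... apart from his initialization problem, your code looks functionally equivalent to his, which returns bytes objects. – 2009-11-04 11:11:08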

@John Machin: I have tried it (now). I couldn't find 'unquote_plus_from_bytes', so we have to resort to the explicit 'bytes.decode' method. – jfs 2009-11-04 11:19:27

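Thank you, your solution works great, and thanks for pointing out my other mistakes (Machin and Sebastian). I'm not sure whether ascii is the character encoding that was used, but as far as I can tell it works without any problems. – user202459 2009-11-08 05:40:11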

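Problem 1 is that urllib.parse.unquote_plus doesn't like the line you are feeding it. The message should really be "please supply a str object" :-) I suggest you fix problem 2 below, and insert: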

print('line', type(line), repr(line))

immediately after your for statement, so that you can see what you are getting in line.

You will find that it returns bytes objects:

>>> [line for line in gzip.open('test.gz')] 
[b'nudge nudge\n', b'wink wink\n']

Using a mode of 'r' has very little effect:

>>> [line for line in gzip.open('test.gz', 'r')] 
[b'nudge nudge\n', b'wink wink\n']

I suggest that instead of passing line to the parsing routine, you pass line.decode('UTF-8') ... or whatever encoding was used when the gz file was written.
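Put together, the decode-then-unquote approach might look like the sketch below. The helper name read_village_lines and the throwaway village.gz file are assumptions made so the example runs on its own; they are not the OP's real paths. The second half relies on the 'rt' text mode of gzip.open, which only exists in Python 3.3 and later, so it would not have worked on the OP's Python 3.1.

```python
import gzip
import os
import tempfile
from urllib.parse import unquote_plus

def read_village_lines(path, encoding="utf-8"):
    """Return the unquoted lines of a gzip file (hypothetical helper)."""
    parsed = []
    with gzip.open(path) as archive:  # default mode is binary: lines are bytes
        for line in archive:
            # Decode to str first, because unquote_plus expects text.
            parsed.append(unquote_plus(line.decode(encoding)))
    return parsed

# Build a tiny throwaway archive so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "village.gz")
with gzip.open(sample_path, "wb") as archive:
    archive.write(b"398,%7EAnoniem+001%7E,543,480,7525010,1775,0\n")

lines = read_village_lines(sample_path)

# Python 3.3+ only: text mode lets gzip do the decoding for you.
with gzip.open(sample_path, "rt", encoding="utf-8") as archive:
    text_lines = [unquote_plus(line) for line in archive]

print(lines)
```

Either way, the point is the same: the parsing routine only ever sees str objects.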

Problem 2 is in this line:

Village_Parsed = str

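str is a type. You need an empty str object. To get one, you could call the type, i.e. str(), which is formally correct but impractical/unusual/scoffable/weird compared with using the string constant '' ... so do this: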

Village_Parsed = ''
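A small sketch of why the initialization matters, using made-up sample strings rather than the OP's data: adding a string to the type str fails, while starting from '' (or using ''.join) works.

```python
pieces = ["nudge nudge\n", "wink wink\n"]  # made-up sample lines

error = None
try:
    broken = str                  # the type itself, not an empty string
    broken = broken + pieces[0]   # a type cannot be added to a str
except TypeError as exc:
    error = exc

# Starting from an empty string works as intended.
result = ''
for piece in pieces:
    result = result + piece

# ''.join builds the same string without repeated concatenation.
joined = ''.join(pieces)

print(repr(error))
print(result == joined)
```

For a file of about 1.3 MB, the ''.join form is usually the more comfortable choice, since it avoids building a new string on every iteration.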

You also have problem 3: your last statement attempts to read the gz file after EOF.
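Problem 3 can be seen in a short sketch. It assumes an in-memory archive built with gzip.compress (available from Python 3.2) in place of the OP's file: once the for loop has consumed the stream, readline() has nothing left to return, and you would have to rewind to read the data again.

```python
import gzip
import io

# In-memory stand-in for the gz file, with made-up content.
payload = gzip.compress(b"nudge nudge\nwink wink\n")

with gzip.GzipFile(fileobj=io.BytesIO(payload)) as archive:
    lines = list(archive)           # iterating consumes the whole stream
    after_eof = archive.readline()  # at EOF: an empty bytes object
    archive.seek(0)                 # rewind to the start of the data
    first_again = archive.readline()

print(lines)
print(repr(after_eof), repr(first_again))
```

In the OP's code the simpler fix is to print the accumulated Village_Parsed, not to read from the file a second time.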

2009-11-04 10:05:50
