Problems parsing files in a directory: Python 2.7 vs. 3.2

I'm trying to do some basic file parsing in a directory with Python 3. This code works perfectly in Python 2.7, but I can't figure out what goes wrong in Python 3.2.

import sys, os, re

filelist = os.listdir('/Users/sbrown/Desktop/Test') 
os.chdir('/Users/sbrown/Desktop/Test') 
for file in filelist: 
    infile = open(file, mode='r') 
    filestring = infile.read() 
    infile.close() 
    pattern = re.compile('exit') 
    filestring = pattern.sub('so long', filestring) 
    outfile = open(file, mode='w') 
    outfile.write(filestring) 
    outfile.close()
exit

This is the traceback:

Traceback (most recent call last):
  File "/Users/bunsen/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)

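The files I'm parsing are all text files. I tried specifying utf-8 as the encoding in the method arguments, but that didn't work. Any ideas? Thanks in advance!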

If I specify the encoding as UTF-8, this is the error that is thrown:

Traceback (most recent call last):
  File "/Users/sbrown/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)

2011-05-19 drbunsen

You didn't specify an encoding when opening the file. You need to do that in Python 3, because in Python 3 files opened in text mode return decoded Unicode strings.
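
To illustrate the difference between the two modes, here is a minimal sketch. The temporary file and its contents are made up for the demonstration, and cp1252 is only an assumed encoding. Binary mode hands back raw bytes, while text mode decodes them into a string using the encoding you pass, or normally a locale-dependent default if you pass none.

```python
import locale
import os
import tempfile

# Hypothetical sample data: the text "price: €5" encoded as cp1252 bytes.
raw = b"price: \x805"

# Write the raw bytes to a throwaway file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Binary mode: no decoding happens, you get bytes back.
with open(path, "rb") as f:
    as_bytes = f.read()

# Text mode: the bytes are decoded into a str using the given encoding.
with open(path, "r", encoding="cp1252") as f:
    as_text = f.read()

os.remove(path)

# The locale's preferred encoding, which open() normally uses
# when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)

print(type(as_bytes).__name__, type(as_text).__name__, default_encoding)
```

If the last value printed is an ASCII variant, any byte above 0x7F in a file opened without an explicit encoding will fail to decode.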

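Now, you tried it with UTF-8 and that didn't work, so clearly that is not the encoding used. Only you know what encoding it is, but my guess is cp1252, since 0x80 is that code page's character for €, so failing on 0x80 is common when you have European Windows users. :-)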
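
A quick way to check that reasoning is to decode the offending byte by hand. This is just a sketch; `try_decode` is a helper name made up for the illustration.

```python
# The byte reported in the traceback.
data = b"\x80"


def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Try the three encodings discussed in this thread.
results = {enc: try_decode(data, enc) for enc in ("ascii", "utf-8", "cp1252")}
print(results)
```

Only cp1252 yields a character for this byte (the euro sign). A lone 0x80 is outside the ASCII range and is not a valid start byte in UTF-8.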

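For compatibility with Python 2.7 and 3.1, I would recommend using the io library to open the file. It is the one used by default in Python 3, and it is also available in Python 2.6 and later: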

import io 
infile = io.open(filelist[0], mode='rt', encoding='cp1252')
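
Applied to the whole loop from the question, the idea might look like the sketch below. It assumes the files really are cp1252 (only a guess), and the function name `replace_in_directory`, the `newline=""` argument used to keep line endings untouched, and the demo file are illustrative additions rather than part of the original code.

```python
import io
import os
import re
import shutil
import tempfile


def replace_in_directory(directory, pattern, replacement, encoding="cp1252"):
    """Rewrite every regular file in `directory`, replacing regex matches of `pattern`."""
    regex = re.compile(pattern)
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        # newline="" disables newline translation so line endings survive the round trip.
        with io.open(path, mode="rt", encoding=encoding, newline="") as infile:
            text = infile.read()
        with io.open(path, mode="wt", encoding=encoding, newline="") as outfile:
            outfile.write(regex.sub(replacement, text))


# Demo on a throwaway directory containing one made-up cp1252 file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "sample.txt")
with open(demo_path, "wb") as f:
    f.write(b"exit costs \x805\r\n")

replace_in_directory(demo_dir, "exit", "so long")

with open(demo_path, "rb") as f:
    result = f.read()
shutil.rmtree(demo_dir)
print(result)
```

Building full paths with `os.path.join` avoids the `os.chdir` call, and the `with` blocks close each file even if decoding fails partway through.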

2011-05-20 06:43:45

Try running

import os

filelist = os.listdir('/Users/sbrown/Desktop/Test')
infile = open(filelist[0], mode='r')
print(infile.encoding)

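to make sure that you are reading your files as utf-8. If not, check that you haven't been doing any codecs-related mischief. Could you also post the traceback from your test with utf-8 forced?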

2011-05-19 23:13:43 Evpok

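Thanks for your help, Evpok. The default encoding is US-ASCII. I've also added to my question the error message I get when I force utf-8 encoding. Curse you, Python 3! – drbunsen 2011-05-19 23:25:16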

Wow, the same traceback, how strange! Which encoding does print(infile.encoding) return? – Evpok 2011-05-20 06:43:15

Thanks for your help! I got it working with Lennart's help. – drbunsen 2011-05-20 10:05:36

Does this not work?

import codecs 
infile = codecs.open(filelist[0], encoding='UTF-8') 
infile.read()

2011-05-19 23:49:04 linuts
