Why do I have to do `sys.stdin = codecs.getreader(sys.stdin.encoding)(sys.stdin)`?

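I'm writing a Python program that upper-cases all of its input (a replacement for the non-working tr '[:lower:]' '[:upper:]'). The locale is ru_RU.UTF-8, and I use PYTHONIOENCODING=UTF-8 to set the STDIN/STDOUT encodings. This correctly sets sys.stdin.encoding. So if sys.stdin already knows the encoding, why do I still need to create a decoding wrapper explicitly? If I don't create the wrapping reader, the .upper() function doesn't work correctly (it does nothing to non-ASCII characters).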

import sys, codecs 
sys.stdin = codecs.getreader(sys.stdin.encoding)(sys.stdin) #Why do I need this? 
for line in sys.stdin: 
    sys.stdout.write(line.upper())

Why does stdin have an .encoding attribute if it doesn't use it?


2013-04-03 Ark-kun

What Python version? Try using 'line.decode(your_encoding).upper()'. – JBernardo

Short answer: because you're using an outdated version of Python that carries historical baggage. – phihag

@JBernardo The Python version is 2.7.3 (on FreeBSD 9). 'line.decode(sys.stdin.encoding).upper()' will of course work, but my question is why we need all this. –

To answer the "why", we need to understand Python 2.x's built-in file type, file.encoding, and how the two relate.

The built-in file object deals in raw bytes: it always reads and writes raw bytes.

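The encoding attribute describes the encoding of the raw bytes in the stream. This attribute may or may not be present, and it may not even be reliable (for example, for the standard streams, when PYTHONIOENCODING has been set incorrectly).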
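Because the attribute can be missing or wrong, code that relies on it needs a fallback. Below is a minimal Python 3 sketch. It assumes UTF-8 is an acceptable default, and the helper name stream_encoding is made up for illustration. It only catches names Python does not recognise. A valid but wrong encoding still shows up later, as a decoding error or garbled text.

```python
import codecs
import io
import types


def stream_encoding(stream, default="utf-8"):
    """Return a validated codec name for `stream` (hypothetical helper).

    Falls back to `default` when the stream has no usable `encoding`
    attribute, or when it names a codec Python does not know about
    (e.g. a mistyped PYTHONIOENCODING).
    """
    name = getattr(stream, "encoding", None)
    if name:
        try:
            # lookup() normalizes aliases and raises LookupError for unknown names.
            return codecs.lookup(name).name
        except LookupError:
            pass
    return codecs.lookup(default).name


print(stream_encoding(io.BytesIO()))                               # utf-8 (no attribute)
print(stream_encoding(types.SimpleNamespace(encoding="KOI8-R")))   # koi8-r
print(stream_encoding(types.SimpleNamespace(encoding="no-such")))  # utf-8 (unknown codec)
```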

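The only time a file object performs any automatic conversion is when a unicode object is written to the stream. In that case, it uses file.encoding (if available) to do the conversion.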
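In Python 3, writing a str to a byte stream such as io.BytesIO raises TypeError. The codecs module still offers the same write-side conversion as an explicit wrapper. This sketch uses io.BytesIO in place of a byte-oriented file, and encode_on_write is a hypothetical helper name:

```python
import codecs
import io


def encode_on_write(text, encoding):
    """Write `text` through a codecs StreamWriter and return the raw bytes (hypothetical helper)."""
    raw = io.BytesIO()                          # byte-oriented "file"
    writer = codecs.getwriter(encoding)(raw)    # encodes str -> bytes on every write()
    writer.write(text)
    return raw.getvalue()


print(encode_on_write(u"щ", "utf-8"))    # b'\xd1\x89'
print(encode_on_write(u"щ", "koi8-r"))   # a different byte sequence for the same text
```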

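When reading data, the file object does no conversion at all, because it returns raw bytes. In that case, the encoding attribute is a hint for the user to perform the conversion manually.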
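The symptom in the question follows from this: .upper() is being called on bytes, not on text. The short Python 3 illustration below shows it, since there bytes.upper() only changes ASCII letters. Python 2's str.upper() also works byte by byte, so it cannot upper-case multi-byte UTF-8 sequences either.

```python
data = u"привет".encode("utf-8")   # what a byte stream hands back: raw UTF-8 bytes

# bytes.upper() only knows ASCII letters, so the Cyrillic bytes pass through unchanged.
print(data.upper() == data)         # True

# Decoding with the stream's encoding first yields text whose upper() knows Cyrillic.
text = data.decode("utf-8")
print(text.upper())                 # ПРИВЕТ
```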

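In your case, file.encoding is set because you set the PYTHONIOENCODING variable, and sys.stdin's encoding attribute was set accordingly. To get a text stream, we have to wrap it manually, as you did in your example code.

Think about it another way: suppose we did not have a separate text type (like Python 2.x's unicode or Python 3's str). We could still work with text as raw bytes, as long as we kept track of the encoding used. That is what file.encoding is for: tracking the encoding. The reader wrapper we create does that tracking and the conversion for us automatically.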
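Here is what the wrapper from the question does, sketched in Python 3. io.BytesIO takes the place of Python 2's byte-oriented sys.stdin. codecs.getreader() returns a StreamReader class, and an instance of it decodes lines as they are read.

```python
import codecs
import io

# Raw bytes, as Python 2's sys.stdin would return them.
raw = io.BytesIO(u"привет\nмир\n".encode("utf-8"))

# getreader(encoding) gives a StreamReader class; wrapping the byte stream
# with it makes iteration yield decoded text lines instead of bytes.
reader = codecs.getreader("utf-8")(raw)
lines = [line.upper() for line in reader]
print(lines)   # ['ПРИВЕТ\n', 'МИР\n']
```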

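Of course, it would be nicer if sys.stdin were wrapped automatically (which is what Python 3.x does), but changing the default behavior of sys.stdin in Python 2.x would break backward compatibility.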
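Python 3's sys.stdin and sys.stdout are already text streams, so the question's script needs no codecs wrapper there. Below is a sketch of the same loop as a function; the name upper_copy is illustrative. io.StringIO takes the place of the real streams so the example does not block waiting on stdin.

```python
import io


def upper_copy(text_in, text_out):
    """Upper-case every line of a text stream (illustrative name)."""
    for line in text_in:
        # Lines are already str; for sys.stdin they were decoded using sys.stdin.encoding.
        text_out.write(line.upper())


# Real use would be upper_copy(sys.stdin, sys.stdout).
out = io.StringIO()
upper_copy(io.StringIO(u"привет\nмир\n"), out)
print(out.getvalue(), end="")
```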

Here is a comparison of sys.stdin in Python 2.x and 3.x:

# Python 2.7.4 
>>> import sys 
>>> type(sys.stdin) 
<type 'file'> 
>>> sys.stdin.encoding 
'UTF-8' 
>>> w = sys.stdin.readline() 
## ... type stuff - enter 
>>> type(w) 
<type 'str'>   # In Python 2.x str is just raw bytes 
>>> import locale 
>>> locale.getdefaultlocale() 
('en_US', 'UTF-8')

The io.TextIOWrapper class has been part of the standard library since Python 2.6. It has an encoding attribute, which is used to convert raw bytes to Unicode.

# Python 3.3.1 
>>> import sys 
>>> type(sys.stdin) 
<class '_io.TextIOWrapper'> 
>>> sys.stdin.encoding 
'UTF-8' 
>>> w = sys.stdin.readline() 
## ... type stuff - enter 
>>> type(w) 
<class 'str'>  # In Python 3.x str is Unicode 
>>> import locale 
>>> locale.getdefaultlocale() 
('en_US', 'UTF-8')
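io.TextIOWrapper can also be used directly. In this sketch, io.BytesIO holds KOI8-R bytes (an arbitrary choice, to show a non-UTF-8 case), and the wrapper uses its encoding to return decoded str lines:

```python
import io

raw = io.BytesIO(u"привет\n".encode("koi8-r"))    # raw bytes in KOI8-R
text = io.TextIOWrapper(raw, encoding="koi8-r")   # decodes with this encoding on read

line = text.readline()
print(text.encoding)           # koi8-r
print(type(line).__name__)     # str
print(line.upper(), end="")    # ПРИВЕТ
```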

The buffer attribute gives access to the raw byte stream backing stdin; this is usually a BufferedReader. Note below that it does not have an encoding attribute.

# Python 3.3.1 again 
>>> type(sys.stdin.buffer) 
<class '_io.BufferedReader'> 
>>> w = sys.stdin.buffer.readline() 
## ... type stuff - enter 
>>> type(w) 
<class 'bytes'>  # bytes is (kind of) equivalent to Python 2 str 
>>> sys.stdin.buffer.encoding 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
AttributeError: '_io.BufferedReader' object has no attribute 'encoding'

In Python 3, whether the encoding attribute is present is consistent with the type of stream being used.
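In Python 3 the text layer and the byte layer are separate objects. Correcting a wrong stdin encoding therefore means putting a new TextIOWrapper on top of the same buffer. The sketch below does this; the helper name rewrap is illustrative. detach() returns the underlying buffer and leaves the old wrapper unusable, so the two wrappers do not share or close the same buffer.

```python
import io


def rewrap(text_stream, encoding):
    """Re-decode a text stream's underlying bytes with `encoding` (illustrative helper)."""
    # detach() returns the buffer (sys.stdin.buffer for stdin); the old wrapper becomes unusable.
    return io.TextIOWrapper(text_stream.detach(), encoding=encoding)


# Simulate a stdin whose declared encoding (latin-1) does not match its UTF-8 bytes.
wrong = io.TextIOWrapper(io.BytesIO(u"привет\n".encode("utf-8")), encoding="latin-1")
fixed = rewrap(wrong, "utf-8")
line = fixed.readline()
print(line.upper(), end="")   # ПРИВЕТ
```

For the real stream this would be sys.stdin = rewrap(sys.stdin, "utf-8"). On Python 3.7+, sys.stdin.reconfigure(encoding="utf-8") changes the encoding in place instead. Either way, do it before anything has been read from the stream.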


2013-07-06 17:21:38 finiteint

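Thanks for the answer. It confirms my belief that Python 2 was badly designed. Objects (unless they are simple, data-only structures) should not contain data they don't use. The 'file' class should either use its '.encoding' attribute or drop it. This design flaw was fixed in Python 3. .NET handles this the same way: there are byte-based 'Stream's (you can only write bytes to them) and encoding-aware wrapper classes derived from 'TextReader'/'TextWriter'. You can use StreamReader.BaseStream to access the underlying bytes. 'Console.In' is a 'TextReader' (encoding-aware). –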

Good to know that Python 3 also seems to have gotten rid of non-Unicode strings (although there are still things like BufferedReader.readline()). –
