TypeError: must be unicode, not str in Python

I'm new to Python and I'm trying to run this code:

import io

# preprocess(), infile and outfile are defined elsewhere in the asker's script
with io.open(outfile, 'w') as processed_text, io.open(infile, 'r') as fin:
    for line in fin:
        processed_text.write(preprocess(line.rstrip()) + '\n')

but I get TypeError: must be unicode, not str

How can I fix this? I searched for similar questions here and found a suggestion to try something like this:

with io.open(outfile, 'w', encoding="utf-8") as processed_text, io.open(infile, 'r') as fin:

but it didn't work.
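Passing encoding="utf-8" cannot fix this by itself: the encoding argument only controls how text is turned into bytes on disk, while a text-mode file object still accepts only text in write(). A minimal reproduction, assuming a hypothetical preprocess that returns bytes (the real one is not shown here). It uses b'\n' so that the concatenation itself is valid on both Python 2 and Python 3; on Python 3 the error message is worded differently, but it is the same TypeError.

```python
import io
import os
import shutil
import tempfile


def preprocess(s):
    # Hypothetical stand-in for the asker's preprocess(): it returns bytes
    # (a plain str on Python 2) instead of text.
    return s.encode('utf-8')


tmpdir = tempfile.mkdtemp()
infile = os.path.join(tmpdir, 'in.txt')
outfile = os.path.join(tmpdir, 'out.txt')

# Create a small UTF-8 input file to read from.
with io.open(infile, 'w', encoding='utf-8') as f:
    f.write(u'first line\nsecond line\n')

error = None
try:
    # encoding= is given, as in the suggestion the asker tried...
    with io.open(outfile, 'w', encoding='utf-8') as processed_text, \
            io.open(infile, 'r', encoding='utf-8') as fin:
        for line in fin:
            # ...but write() still receives bytes, which a text-mode file rejects.
            processed_text.write(preprocess(line.rstrip()) + b'\n')
except TypeError as exc:
    error = exc
finally:
    shutil.rmtree(tmpdir)

print('TypeError raised: %s' % error)
```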


2016-12-02 user1

Please edit your question to include examples of your files (outfile and infile). – Giordano

Looks like your 'preprocess' function returns a 'str' instead of 'unicode'. – sirfz

I've edited the post to include preprocess. – user1

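Note (from the Python 2 io module documentation):

Since this module has been designed primarily for Python 3.x, you have to be aware that all uses of "bytes" in this document refer to the str type (of which bytes is an alias), and all uses of "text" refer to the unicode type. Furthermore, those two types are not interchangeable in the io APIs.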

In [1]: import io

In [2]: def preprocess(s):
    ...:     return bytes(s)
    ...:

In [3]: with io.open('tst1.out', 'w') as processed_text, io.open('tst1', 'r') as fin:
    ...:     for line in fin:
    ...:         try:
    ...:             out_line = unicode(preprocess(line.rstrip() + '\n'), 'utf-8')
    ...:         except TypeError:
    ...:             out_line = preprocess(line.rstrip() + '\n')
    ...:         processed_text.write(out_line)
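Because bytes and text are not interchangeable in the io API, there is also a different way out: if preprocess is meant to produce bytes, open the output file in binary mode ('wb') and write bytes to it. A sketch, assuming a hypothetical preprocess that returns UTF-8 bytes; the temporary file names are placeholders.

```python
import io
import os
import shutil
import tempfile


def preprocess(s):
    # Hypothetical stand-in that returns UTF-8 bytes, like the asker's function.
    return s.encode('utf-8')


tmpdir = tempfile.mkdtemp()
infile = os.path.join(tmpdir, 'in.txt')
outfile = os.path.join(tmpdir, 'out.txt')

with io.open(infile, 'w', encoding='utf-8') as f:
    f.write(u'caf\u00e9\nna\u00efve\n')

# Text mode with an explicit encoding yields text lines on input;
# binary mode ('wb') accepts bytes on output, so no TypeError is raised.
with io.open(infile, 'r', encoding='utf-8') as fin, \
        io.open(outfile, 'wb') as processed_bytes:
    for line in fin:
        processed_bytes.write(preprocess(line.rstrip()) + b'\n')

# Read the result back as text to confirm the UTF-8 bytes round-trip.
with io.open(outfile, 'r', encoding='utf-8') as f:
    result = f.read()

shutil.rmtree(tmpdir)
print(repr(result))
```

The trade-off is that every value written must then be bytes, so this only fits when preprocess consistently returns bytes.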


2016-12-02 23:20:43

Thanks for your reply. I tried your suggestion, but the problem persists. Do you mean that this error would be solved if I upgraded my Python version? – user1

I meant that 'str' is 'bytes' in Python 2, while in Python 3 it is 'unicode'. Apparently your 'preprocess()' does something to the input that turns it into the 'bytes' type, hence the error. –

I've edited the post to include preprocess. – user1

Try writing a u in front of the processed string, e.g. [u'blah']


2016-12-02 23:09:19 WYCE

Thanks for your reply. Where should I write it? In the file I want to process? – user1

Try putting this at the top of your file:

from __future__ import unicode_literals

Python 3.x uses Unicode for string literals by default. This import makes string literals in Python 2.x behave the same way.

If you still have problems, you can manually convert the offending string, like so:

uni_string = unicode(my_string)
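Both the u prefix and unicode_literals change only string literals; neither changes the type of a value a function computes, which may be why they did not help here. A small check, assuming a hypothetical preprocess that encodes its result; u'blah' is written out explicitly because that is what the literal 'blah' becomes under unicode_literals on Python 2. Note also that on Python 2, unicode(my_string) decodes with the default encoding (normally ASCII), so for non-ASCII bytes my_string.decode('utf-8') is the safer conversion.

```python
def hypothetical_preprocess(s):
    # Hypothetical stand-in for the asker's preprocess(): it encodes its result,
    # so it returns bytes however its input literal was written.
    return s.upper().encode('utf-8')


text_type = type(u'')  # unicode on Python 2, str on Python 3

literal = u'blah'  # what the literal 'blah' becomes under unicode_literals on Python 2
result = hypothetical_preprocess(literal)

print('literal is text: %s' % isinstance(literal, text_type))
print('result is text: %s' % isinstance(result, text_type))
```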


2016-12-02 23:24:52 tvt173

Thanks, but it still doesn't work. – user1

Where does it fail? Find the input string that causes the problem and convert it to unicode like this: unicode(mystring) – tvt173

Yes, that's right. – user1
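Following the suggestion to find the input that causes the failure, a small helper can report which lines make preprocess return something other than text. Both find_non_text_results and hypothetical_preprocess are illustrative names, not part of the question; the stand-in returns bytes only for lines containing a digit, just to give the helper something to find.

```python
def find_non_text_results(lines, func):
    """Return (line_number, type_name) pairs for lines where func(line) is not text."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    offenders = []
    for number, line in enumerate(lines, 1):
        result = func(line.rstrip())
        if not isinstance(result, text_type):
            offenders.append((number, type(result).__name__))
    return offenders


def hypothetical_preprocess(s):
    # Hypothetical: returns bytes only for lines that contain a digit.
    if any(c.isdigit() for c in s):
        return s.encode('utf-8')
    return s


sample_lines = [u'abc\n', u'line 2\n', u'xyz\n']
print(find_non_text_results(sample_lines, hypothetical_preprocess))
```

With a real file, the same helper can be given the open file object from io.open(infile, 'r', encoding='utf-8') and the real preprocess.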

Make sure you write unicode strings when you open files with io.open. Something like this should do the trick:

import io

with io.open(outfile, 'w') as processed_text, io.open(infile, 'r') as fin:
    for line in fin:
        s = preprocess(line.rstrip())
        if isinstance(s, str):  # on Python 2, str is a byte string
            s = s.decode('utf8')  # decode the bytes into unicode text
        processed_text.write(s + u'\n')

Or modify preprocess to make sure it returns a unicode string.
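For the second option, making preprocess always return text, one approach is to wrap it in a decorator that decodes bytes results. This is a sketch that assumes the bytes are UTF-8; returns_text and the preprocess body are hypothetical placeholders, not the asker's code. Because bytes is an alias of str on Python 2, the same isinstance check works on Python 2 and 3.

```python
import functools


def returns_text(func):
    """Wrap func so that a bytes result is decoded to text (UTF-8 assumed)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if isinstance(result, bytes):  # on Python 2, bytes is str
            result = result.decode('utf-8')
        return result
    return wrapper


@returns_text
def preprocess(line):
    # Hypothetical body standing in for whatever the real preprocess does;
    # it deliberately returns bytes to show the decorator converting them.
    return line.strip().lower().encode('utf-8')


print(repr(preprocess(u'  Hello World  ')))
```

With preprocess decorated this way, the loop from the question can write its result to the text-mode file directly.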


2016-12-02 23:40:53 sirfz

Thanks, it worked. – user1
