How to write Chinese text to a file in Python

I'm having a problem extracting Chinese text and writing it to a file. How do I write Chinese text to a file in Python?

text = "全球緊張致富豪財富縮水 貝索斯丁磊分列跌幅前兩位"
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(text)

The code above works fine. However, when the code below writes to the file, the text comes out garbled.

import requests
from bs4 import BeautifulSoup

f = open('data.txt', 'w', encoding='utf-8')

def techSinaCrawler():
    url = "http://tech.sina.com.cn/"
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    # Follow every link inside the "yaowenlist-1" list items
    for li in soup.findAll('li', {'data-sudaclick': 'yaowenlist-1'}):
        for link in li.findAll('a'):
            href = link.get('href')
            techSinaInsideLinkCrawler(href)

def techSinaInsideLinkCrawler(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    # Write each article headline (<h1 id="main_title">) on its own line
    for data in soup.findAll('h1', {'id': 'main_title'}):
        line = 'main_title' + ':' + data.string
        f.write(line)
        f.write('\n')

techSinaCrawler()
f.close()

Thanks for your help.


2017-08-11 Zain Danish

What character set are you using? – Jay

The website uses the UTF-8 character set. –

[This](https://stackoverflow.com/questions/20205455/how-to-correctly-parse-utf-8-encoded-html-to-unicode-strings-with-beautifulsoup) and [this](https://stackoverflow.com/questions/7219361/python-and-beautifulsoup-encoding-issues) may help with BeautifulSoup encoding issues. – Ramon

Solved.

I just changed .text to .content, i.e. changed

plain_text = source_code.text

to

plain_text = source_code.content

and now the output is Chinese text, which is the result I wanted.
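A likely explanation, offered as an assumption rather than something confirmed in this thread: response.text decodes the body with the encoding requests infers from the HTTP headers, and when a text/* Content-Type header carries no charset, requests falls back to ISO-8859-1. response.content is the raw bytes, so BeautifulSoup can detect the encoding itself (for example from the page's meta charset). The stdlib-only sketch below reproduces that kind of mismatch offline; the sample headline and the helper name decode_as are hypothetical illustrations, not part of the original code.

```python
# Offline sketch of the suspected mojibake: UTF-8 bytes decoded with the wrong codec.
# Assumption: the server sent UTF-8, but the client decoded it as ISO-8859-1.

def decode_as(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode raw response bytes with the given encoding."""
    return raw.decode(encoding)

headline = "全球緊張致富豪財富縮水"
raw = headline.encode("utf-8")          # the bytes that actually travel over the wire

garbled = decode_as(raw, "iso-8859-1")  # wrong guess: each byte becomes one Latin-1 char
correct = decode_as(raw, "utf-8")       # right guess: the original Chinese text

# ISO-8859-1 maps every byte to exactly one character, so the damage is reversible:
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled == headline)   # False
print(correct == headline)   # True
print(repaired == headline)  # True
```

Staying with requests, setting response.encoding (for example to response.apparent_encoding) before reading response.text is another way to get the page decoded correctly; passing response.content to BeautifulSoup, as in the answer, leaves the detection to BeautifulSoup instead.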


2017-08-11 19:13:06

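In Python 2, it's a good idea to use codecs.open() if you're dealing with non-ASCII encodings. That way you don't need to manually encode everything you write. Also, you should pass a Unicode string to os.walk() if you expect non-ASCII characters in file names: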

import codecs
import os

with codecs.open("c:/Users/me/filename.txt", "a", encoding="utf-8") as d:
    # A unicode root path makes os.walk() return unicode file names
    for dirpath, subdirs, files in os.walk(u"c:/temp"):
        for f in files:
            fname = os.path.join(dirpath, f)
            print(fname)
            d.write(fname + "\n")

There's no need to call d.close(); the with block already takes care of that.
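In Python 3 the built-in open() accepts an encoding argument directly, so codecs.open() is no longer needed. A minimal sketch of the same idea, assuming Python 3 and a UTF-8 output file; the function name list_files_to and its parameters are hypothetical placeholders, not part of the answer above:

```python
import os

def list_files_to(out_path: str, root: str) -> None:
    """Append every file path under root to out_path, one per line, encoded as UTF-8."""
    # encoding="utf-8" makes the file object encode each str on write, so
    # non-ASCII names (e.g. Chinese) are written correctly regardless of the
    # platform's default encoding.
    with open(out_path, "a", encoding="utf-8") as out:
        # A str root makes os.walk() yield str names (Python 3's counterpart of u"...").
        for dirpath, _subdirs, files in os.walk(root):
            for name in files:
                out.write(os.path.join(dirpath, name) + "\n")
    # No explicit close() needed: leaving the with block closes the file.
```

For example, list_files_to("c:/Users/me/filename.txt", "c:/temp") mirrors the paths used in the answer. Append mode ("a") is kept from the original, so running it twice lists each file twice.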


2017-08-11 17:24:40
