I'm having trouble extracting Chinese text and writing it to a file. How do I write Chinese text to a file in Python?
# Renamed from `str` so the built-in type is not shadowed.
text = "全球緊張致富豪財富縮水 貝索斯丁磊分列跌幅前兩位"
with open('test.txt', 'w') as f:
    f.write(text)
The code above works fine. The code below, however, writes garbled text to the file.
import requests
from bs4 import BeautifulSoup

f = open('data.txt', 'w')

def techSinaCrawler():
    # Collect the headline links from the front page and visit each one.
    url = "http://tech.sina.com.cn/"
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for li in soup.findAll('li', {'data-sudaclick': 'yaowenlist-1'}):
        for link in li.findAll('a'):
            href = link.get('href')
            techSinaInsideLinkCrawler(href)

def techSinaInsideLinkCrawler(url):
    # Write the main title of a single article page to data.txt.
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for data in soup.findAll('h1', {'id': 'main_title'}):
        line = 'main_title' + ':' + data.string
        f.write(line)
        f.write('\n')

techSinaCrawler()
f.close()
Thanks for your help.
Which character set are you using? – Jay
The website uses the UTF-8 character set. –
[This](https://stackoverflow.com/questions/20205455/how-to-correctly-parse-utf-8-encoded-html-to-unicode-strings-with-beautifulsoup) and [this](https://stackoverflow.com/questions/7219361/python-and-beautifulsoup-encoding-issues) may be helpful for dealing with BeautifulSoup encoding issues. – Ramon
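One likely cause, going by the comments above, is that the page bytes are UTF-8 but get decoded with a different charset before they reach the file. The question does not confirm this, so treat the following as a minimal sketch under that assumption, written for Python 3. It uses only the standard library so it runs without network access: `decode_body` and `write_lines` are hypothetical helper names, and the headline from the question stands in for the scraped `<h1>` text.

```python
import os
import tempfile

# Sample headline standing in for the scraped <h1> text (taken from the question).
HEADLINE = "全球緊張致富豪財富縮水 貝索斯丁磊分列跌幅前兩位"


def decode_body(raw_bytes, encoding="utf-8"):
    """Decode raw response bytes with an explicitly chosen encoding (hypothetical helper)."""
    return raw_bytes.decode(encoding)


def write_lines(path, lines):
    """Write text lines to `path` as UTF-8, independent of the platform default."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Bytes as a UTF-8 site would send them over the wire.
raw = HEADLINE.encode("utf-8")

# Decoding with the wrong charset does not raise; it silently yields garbled text.
garbled = decode_body(raw, "iso-8859-1")

# Decoding with the charset the site actually uses restores the original text.
correct = decode_body(raw, "utf-8")

# Write to a temporary directory so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "data.txt")
write_lines(out_path, ["main_title:" + correct])

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, encoding="utf-8") as f:
    round_trip = f.read()
```

If the same thing is happening in the crawler, the equivalent change would probably be to set `source_code.encoding = 'utf-8'` before reading `source_code.text` (or to pass the raw `source_code.content` bytes to `BeautifulSoup` and let it detect the charset), and to open `data.txt` with `encoding='utf-8'`.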