I want to use Python to scrape the content of a website, for example text like this:
Apple’s stock continued to dominate the news over the weekend, with Barron’s placing it on the top of its favorite 2013 stock list.
But the printed result is wrong:
Apple âs stock continued to dominate the news over the weekend, with Barronâs placing it on the top of its favorite 2013 stock list.
The ’ character is not displayed. Here is my code:
#-*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import urllib
from lxml import *
import urllib
import lxml.html as HTML
url = "http://www.forbes.com/sites/panosmourdoukoutas/2012/12/09/apple-tops-barrons-10-favorite-stocks-for-2013/?partner=yahootix"
sock = urllib.urlopen(url)
htmlSource = sock.read()
sock.close()
root = HTML.document_fromstring(htmlSource)
contents = ' '.join([x.strip() for x in root.xpath("//div[@class='body']/descendant::text()")])
print contents
f = open('C:/Users/yinyao/Desktop/Python Code/data.txt','w')
f.write(contents)
f.close()
However, after setting the default encoding, the printf function stops working. Why? What should I do? I am on Windows, where the default encoding is gbk.
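Can you post the code that does the scraping?
How are you printing this text? Please post the exact command you run to print it. There is no printf function in Python, is there? – stackoverflowery
Try [Beautiful Soup](http://www.crummy.com/software/BeautifulSoup/)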
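One possible explanation, offered as a sketch rather than a confirmed diagnosis: the page is sent as UTF-8, but the bytes are interpreted as Latin-1 somewhere before the text is extracted. The three UTF-8 bytes of ’ (E2 80 99) then become â followed by two invisible control characters, which matches the output in the question. The snippet below reproduces this with a hard-coded byte string standing in for `sock.read()` (an assumption, so no network access is needed), then decodes explicitly and writes the file with an explicit encoding. The temp-file location is only a placeholder for the path in the question.

```python
import io
import os
import tempfile

# Stand-in for sock.read(): raw page bytes, assumed to be UTF-8.
raw = u"Apple\u2019s stock".encode("utf-8")

# Interpreting those bytes as Latin-1 reproduces the symptom:
# U+2019 is E2 80 99 in UTF-8, which Latin-1 reads as three characters.
garbled = raw.decode("latin-1")

# Decoding explicitly as UTF-8 keeps the apostrophe intact.
text = raw.decode("utf-8")

# Text that is already garbled can be repaired by undoing the wrong decode.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == text)

# Write with an explicit encoding instead of relying on the platform default.
path = os.path.join(tempfile.gettempdir(), "data.txt")  # placeholder path
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)
```

In the code from the question, the equivalent change would presumably be to pass `htmlSource.decode('utf-8')` to `HTML.document_fromstring`, assuming the page really is UTF-8. With that in place the `reload(sys)` / `sys.setdefaultencoding('utf-8')` lines should not be needed; in some environments (IDLE, for one) `reload(sys)` also resets `sys.stdout`, which may be why printing appears to stop working.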