lxml etree.parse.xpath() returns items consisting only of tabs and newlines
For a typical eBay search results page, such as this, I use lxml to extract the price of each result like this:
import urllib2
from lxml import etree
url = "http://www.ebay.com/sch/i.html?rt=nc&LH_Complete=1&_nkw=Mizuno+Pants+Baseball&LH_Sold=1&_sacat=0&LH_BIN=1&_from=R40&_sop=3&LH_ItemCondition=1000"
response = urllib2.urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
# Select the text children of every <span class="bold bidsold"> (the price spans)
xpathselector="//span[@class ='bold bidsold']/text()"
tree.xpath(xpathselector)
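A likely explanation, offered as an assumption because eBay's actual markup is not shown here: `/text()` returns every text-node child of each matching span, not one string per span. If a price span contains a child element (for example a currency tag), the whitespace before that child and the price after it become separate text nodes. The sketch below reproduces this with hypothetical markup; the `HTML` string and its `<b>EUR</b>` child are invented for illustration.

```python
from lxml import etree

# Hypothetical markup (not eBay's real HTML): each price span has a child
# element, so its text is split into a whitespace node and a price node.
HTML = """<html><body>
<span class="bold bidsold">
\t\t\t\t\t<b>EUR</b> 965.31</span>
<span class="bold bidsold">
\t\t\t\t\t<b>EUR</b> 883.56</span>
</body></html>"""

root = etree.HTML(HTML)

# Same XPath as in the question: one result per text node, not per span,
# so whitespace-only nodes appear alongside the prices.
nodes = root.xpath("//span[@class='bold bidsold']/text()")
print(nodes)
```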
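Although there are search results (and hence prices), tree.xpath(xpathselector) returns a list that does contain all the prices (they differ from the prices shown on the web page because of my geographic location), but many of its items consist only of tabs and newlines. Why is this?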
['\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
u' 1\xc2\xa0049.27',
'\n\t\t\t\t\t',
' 965.31',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
' 883.56',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
' 827.21',
'\n\t\t\t\t\t',
' 827.21',
'\n\t\t\t\t\t',
' 827.21',
'\n\t\t\t\t\t',
' 827.21',
'\n\t\t\t\t\t',
' 800.97',
'\n\t\t\t\t\t',
' 799.59',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
' 716.73',
'\n\t\t\t\t\t',
' 716.73',
'\n\t\t\t\t\t',
' 716.73',
'\n\t\t\t\t\t',
' 690.22',
'\n\t\t\t\t\t',
' 662.60',
'\n\t\t\t\t\t',
' 662.60',
'\n\t\t\t\t\t',
' 635.25',
'\n\t\t\t\t\t',
' 606.25',
'\n\t\t\t\t\t',
' 606.25',
'\n\t\t\t\t\t',
' 552.39',
'\n\t\t\t\t\t',
' 552.39',
'\n\t\t\t\t\t',
' 552.39',
'\n\t\t\t\t\t',
' 552.39',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
' 551.01',
'\n\t\t\t\t\t',
' 551.01',
'\n\t\t\t\t\t',
' 517.59',
'\n\t\t\t\t\t',
' 497.16',
'\n\t\t\t\t\t',
' 496.88',
'\n\t\t\t\t\t',
' 496.88',
'\n\t\t\t\t\t',
' 496.60',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
' 469.26',
'\n\t\t\t\t\t',
'\n\t\t\t\t\t\t\t\t',
'\n\t\t\t\t\t',
' 468.15',
'\n\t\t\t\t\t',
' 414.30',
'\n\t\t\t\t\t',
' 414.02',
'\n\t\t\t\t\t',
' 414.02',
'\n\t\t\t\t\t',
' 414.02',
'\n\t\t\t\t\t',
' 414.02',
'\n\t\t\t\t\t',
' 386.68']
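Whatever the exact markup, one workaround is to discard the whitespace-only items in Python and clean up the rest. The sketch below runs on a few values copied from the output above. The helper name `clean_prices` is hypothetical, and removing every character except digits and `.` assumes prices always use `.` as the decimal separator, as they do in this output. That also drops the `\xc2\xa0` in `' 1\xc2\xa0049.27'`, which looks like the UTF-8 bytes of a non-breaking space (used as a thousands separator) decoded with the wrong encoding; that diagnosis is an assumption.

```python
import re


def clean_prices(nodes):
    """Drop whitespace-only text nodes and convert the rest to floats.

    Hypothetical helper: assumes '.' is the decimal separator and that any
    other character (spaces, thousands separators) can be removed.
    """
    prices = []
    for node in nodes:
        if not node.strip():
            continue  # skip the tab/newline-only items
        digits = re.sub(r"[^\d.]", "", node)  # keep only digits and '.'
        prices.append(float(digits))
    return prices


# A few items copied from the output in the question
sample = ['\n\t\t\t\t\t', '\n\t\t\t\t\t\t\t\t', '\n\t\t\t\t\t',
          u' 1\xc2\xa0049.27', '\n\t\t\t\t\t', ' 965.31']
print(clean_prices(sample))  # [1049.27, 965.31]
```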
I wasn't aware of the normalize-space() function. Thanks. – Pyderman
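The `normalize-space()` function can also do the filtering inside the XPath expression itself. lxml evaluates XPath 1.0, so the XPath 2.0 form `//span/normalize-space()` should not be accepted; two XPath 1.0 alternatives are sketched below. Used as a predicate on `text()`, `normalize-space()` keeps only text nodes that are not blank; evaluated per span as `normalize-space(.)`, it returns the span's whole text (including text inside child elements) with whitespace collapsed. The `HTML` string is a hypothetical stand-in for eBay's markup, not the real page.

```python
from lxml import etree

# Hypothetical stand-in for the eBay markup: price spans with a child element
HTML = """<html><body>
<span class="bold bidsold">
\t\t\t\t\t<b>EUR</b> 965.31</span>
<span class="bold bidsold">
\t\t\t\t\t<b>EUR</b> 883.56</span>
</body></html>"""

root = etree.HTML(HTML)

# Option 1: keep only text nodes whose normalized value is non-empty
non_blank = root.xpath(
    "//span[@class='bold bidsold']/text()[normalize-space()]")
print([t.strip() for t in non_blank])  # ['965.31', '883.56']

# Option 2: one string per span, with all of its descendant text joined
# and whitespace collapsed
per_span = [span.xpath("normalize-space(.)")
            for span in root.xpath("//span[@class='bold bidsold']")]
print(per_span)  # ['EUR 965.31', 'EUR 883.56']
```

Option 1 keeps the original one-result-per-text-node shape; option 2 is closer to "one price per result" but also picks up any text from child elements, such as the hypothetical currency tag here.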