lxml etree.parse().xpath() returns items consisting only of tabs and newlines

For a typical eBay search results page, such as this one, I use lxml to extract the price of each result like this:

import urllib2  # Python 2; on Python 3 the equivalent module is urllib.request
from lxml import etree

url = "http://www.ebay.com/sch/i.html?rt=nc&LH_Complete=1&_nkw=Mizuno+Pants+Baseball&LH_Sold=1&_sacat=0&LH_BIN=1&_from=R40&_sop=3&LH_ItemCondition=1000"
response = urllib2.urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
# Select every text node that is a direct child of a price span
xpathselector="//span[@class ='bold bidsold']/text()"
tree.xpath(xpathselector)

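Although the search returns results (and therefore prices), tree.xpath(xpathselector) returns a list that contains all the prices but also many items consisting only of tabs and newlines, as shown below. (The results differ from those on the web page because of my geographic location.) Why is this?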

['\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
u' 1\xc2\xa0049.27', 
'\n\t\t\t\t\t', 
' 965.31', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
' 883.56', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
' 827.21', 
'\n\t\t\t\t\t', 
' 827.21', 
'\n\t\t\t\t\t', 
' 827.21', 
'\n\t\t\t\t\t', 
' 827.21', 
'\n\t\t\t\t\t', 
' 800.97', 
'\n\t\t\t\t\t', 
' 799.59', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
' 716.73', 
'\n\t\t\t\t\t', 
' 716.73', 
'\n\t\t\t\t\t', 
' 716.73', 
'\n\t\t\t\t\t', 
' 690.22', 
'\n\t\t\t\t\t', 
' 662.60', 
'\n\t\t\t\t\t', 
' 662.60', 
'\n\t\t\t\t\t', 
' 635.25', 
'\n\t\t\t\t\t', 
' 606.25', 
'\n\t\t\t\t\t', 
' 606.25', 
'\n\t\t\t\t\t', 
' 552.39', 
'\n\t\t\t\t\t', 
' 552.39', 
'\n\t\t\t\t\t', 
' 552.39', 
'\n\t\t\t\t\t', 
' 552.39', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
' 551.01', 
'\n\t\t\t\t\t', 
' 551.01', 
'\n\t\t\t\t\t', 
' 517.59', 
'\n\t\t\t\t\t', 
' 497.16', 
'\n\t\t\t\t\t', 
' 496.88', 
'\n\t\t\t\t\t', 
' 496.88', 
'\n\t\t\t\t\t', 
' 496.60', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
' 469.26', 
'\n\t\t\t\t\t', 
'\n\t\t\t\t\t\t\t\t', 
'\n\t\t\t\t\t', 
' 468.15', 
'\n\t\t\t\t\t', 
' 414.30', 
'\n\t\t\t\t\t', 
' 414.02', 
'\n\t\t\t\t\t', 
' 414.02', 
'\n\t\t\t\t\t', 
' 414.02', 
'\n\t\t\t\t\t', 
' 414.02', 
'\n\t\t\t\t\t', 
' 386.68']

Source

2015-10-09 Pyderman

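The newlines and other whitespace located directly inside the target span are text nodes too, so they are selected by your XPath span[...]/text() selector. You can filter out the whitespace-only text nodes by using the XPath normalize-space() function as a predicate: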

xpathselector="//span[@class ='bold bidsold']/text()[normalize-space()]"

Output:

['506,533.33', '506,000.00', '466,000.00', '399,333.33', '399,333.33', '399,333.33', '399,333.33', '399,333.33', '386,666.67', '386,000.00', '346,000.00', '346,000.00', '346,000.00', '333,200.00', '333,200.00', '333,066.67', '319,866.67', '319,866.67', '306,666.67', '293,066.67', '292,666.67', '292,666.67', '266,666.67', '266,666.67', '266,666.67', '266,666.67', '266,533.33', '266,533.33', '266,533.33', '266,000.00', '266,000.00', '253,200.00', '249,866.67', '240,000.00', '239,866.67', '239,866.67', '239,733.33', '226,533.33']
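
To see the effect in isolation, here is a minimal sketch that runs both selectors against a small made-up HTML fragment instead of the live eBay page. The markup is an assumption: the nested <b>EUR</b> label and the indentation are hypothetical, chosen only to reproduce whitespace-only text nodes inside the spans, and the real page structure may differ.

```python
from lxml import etree

# Hypothetical fragment (not the real eBay markup): each span holds layout
# whitespace around a nested element, and only some spans also hold a price.
html = """
<div>
  <span class="bold bidsold">
    <b>EUR</b>
  </span>
  <span class="bold bidsold">
    <b>EUR</b> 965.31</span>
  <span class="bold bidsold">
    <b>EUR</b> 883.56</span>
</div>
"""

tree = etree.fromstring(html, etree.HTMLParser())

# Every direct text-node child of the spans, whitespace-only ones included
all_text = tree.xpath("//span[@class='bold bidsold']/text()")

# Same selection, keeping only nodes that are non-empty after normalization
non_blank = tree.xpath("//span[@class='bold bidsold']/text()[normalize-space()]")

# The predicate filters nodes but does not trim them, so strip in Python
prices = [t.strip() for t in non_blank]

print(all_text)
print(prices)
```

Note that normalize-space() used as a predicate only decides which text nodes are kept; the surviving strings still carry their leading space, hence the strip() call afterwards.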

Source

2015-10-10 07:25:01 har07

I didn't know about the normalize-space() function. Thanks. – Pyderman
