2016-11-08 194 views

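XPath does not return a value (lxml, Python)

I'm working on a project in which I'm trying to get lxml to extract stock data from separate tables on separate web pages. When I run my program to print the values I want to pull, I get empty square brackets: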

('Cash_and_short_term_investments:', []) 
('EPSNextYear:', []) 

Here is how I'm calling it:

import requests
from lxml import html

# The URL at this point is http://finviz.com/quote.ashx?t=RAIL (confirmed with a print statement)
url = driver.current_url
page2 = requests.get(url)
tree2 = html.fromstring(page2.content)
EPSNextYear = tree2.xpath('/html/body/table[3]/tr[1]/td/table/tr[7]/td/table/tr[2]/td[6]/b')
# Original XPath: /html/body/table[3]/tbody/tr[1]/td/table/tbody/tr[7]/td/table/tbody/tr[2]/td[6]/b
print('EPSNextYear:', EPSNextYear)

and:

#the url at this point is https://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA I've confirmed this with a print 
url = driver.current_url 
page3 = requests.get(url) 
tree3 = html.fromstring(page3.content) 
Cash_and_Short_Term_Investments = tree3.xpath('//*[@id="fs-table"]/tr[3]/td[2]/text()') 
print('Cash_and_short_term_investments:', Cash_and_Short_Term_Investments) 

I have already removed tbody from the XPath, as some similar questions suggested. Any help or suggestions would be greatly appreciated, thanks!

Answers


When asking a question like this, you need to provide a short but complete example that demonstrates the problem.

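Looking at your second example, the XPath expression you are using is clearly incorrect: it is missing the tbody element. (You might prefer to select the right table row by looking for the actual string you are searching for.)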
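
A minimal sketch of why the path has to mirror the markup that lxml actually parsed. lxml's HTML parser keeps a tbody that is present in the source and does not add one that is absent, so whether the step belongs in the path depends on the page. The inline HTML, the ids `with-tbody` and `without-tbody`, and the value 1.23 are made up for illustration; they are not taken from the live pages.

```python
from lxml import html

# Hypothetical markup: one table written with an explicit <tbody>, one without.
SAMPLE = """
<html><body>
<table id="with-tbody"><tbody>
<tr><td>Cash and Short Term Investments</td><td>39.78</td></tr>
</tbody></table>
<table id="without-tbody">
<tr><td>Some other row</td><td>1.23</td></tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# The source contains <tbody>, so a path that skips it matches nothing.
missing_tbody = tree.xpath('//*[@id="with-tbody"]/tr/td[2]/text()')
# Including the <tbody> step matches the cell.
with_tbody = tree.xpath('//*[@id="with-tbody"]/tbody/tr/td[2]/text()')
# '//' matches at any depth, so it works whether or not <tbody> is present.
any_depth_a = tree.xpath('//*[@id="with-tbody"]//tr/td[2]/text()')
any_depth_b = tree.xpath('//*[@id="without-tbody"]//tr/td[2]/text()')

print(missing_tbody, with_tbody, any_depth_a, any_depth_b)
```

Using `//tr` instead of `/tbody/tr` is one way to avoid guessing; the trade-off is that it also matches rows of any nested tables.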

Consider the following code:

# Python 2 code: urllib.urlopen and the print statement do not exist in Python 3.
from lxml import etree
import urllib

url = "http://www.google.com/finance?q=NASDAQ%3ARAIL&fstype=ii&ei=hGwhWNHVPOW7iwLMiIfIDA"
parser = etree.HTMLParser()
tree = etree.parse(urllib.urlopen(url), parser)
# Select the rows whose first cell reads "Cash and Short Term Investments".
result = tree.xpath('//*[@id="fs-table"]/tbody/tr[normalize-space(td) = "Cash and Short Term Investments"]')
for x in result:
    print etree.tostring(x)

Run it like this:

> python test.py 

You will get the following output:

<tr> 
<td class="lft lm">Cash and Short Term Investments 
</td> 
<td class="r">39.78</td> 
<td class="r">78.45</td> 
<td class="r">91.21</td> 
<td class="r">110.02</td> 
<td class="r rm">125.01</td> 
</tr> 

<tr> 
<td class="lft lm">Cash and Short Term Investments 
</td> 
<td class="r">110.02</td> 
<td class="r">161.49</td> 
<td class="r">184.49</td> 
<td class="r rm">140.49</td> 
</tr> 
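
Once a row has been matched, its cells can be read with a second XPath evaluated relative to that row. The sketch below assumes the row markup shown in the output above, wrapped in a stand-alone table so it runs without a network request; the variable names are illustrative only.

```python
from lxml import etree

# Row markup copied from the output above, wrapped in a table so it parses on its own.
SAMPLE = """
<table id="fs-table"><tbody>
<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">39.78</td>
<td class="r">78.45</td>
<td class="r">91.21</td>
<td class="r">110.02</td>
<td class="r rm">125.01</td>
</tr>
</tbody></table>
"""

tree = etree.fromstring(SAMPLE, etree.HTMLParser())

# normalize-space(td) uses the first <td> of each row and trims the trailing newline.
rows = tree.xpath(
    '//*[@id="fs-table"]/tbody/tr'
    '[normalize-space(td) = "Cash and Short Term Investments"]'
)

# Relative path: every cell after the label cell, converted to float.
values = [float(text) for text in rows[0].xpath('td[position() > 1]/text()')]
print(values)  # [39.78, 78.45, 91.21, 110.02, 125.01]
```

The `float` conversion assumes plain decimal text in every cell, as in this sample; cells containing thousands separators or placeholders such as a dash would need extra handling.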

I'm sure you can figure out what is wrong with your first example once you turn it into a stand-alone reproducer of the problem.


This is a good solution for getting the string; I then used a regular expression to isolate the numbers. – Marc
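
The regular-expression step mentioned in the comment could look roughly like the sketch below. The pattern is an assumption, since the comment does not show the one that was actually used; the input is the first row from the answer's output.

```python
import re

# Serialized row, as printed by etree.tostring() in the answer.
row_html = """<tr>
<td class="lft lm">Cash and Short Term Investments
</td>
<td class="r">39.78</td>
<td class="r">78.45</td>
<td class="r">91.21</td>
<td class="r">110.02</td>
<td class="r rm">125.01</td>
</tr>"""

# Capture a decimal number that is the entire content between '>' and '<',
# so digits inside attributes or label text are not picked up.
pattern = re.compile(r'>\s*(-?\d+(?:\.\d+)?)\s*<')
numbers = [float(match) for match in pattern.findall(row_html)]
print(numbers)  # [39.78, 78.45, 91.21, 110.02, 125.01]
```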