Extracting data from an HTML table using Python

I want to parse data from a website. For example, the part of the page source I am trying to extract data from looks like this.

<table summary="Customer Pending and Vendor Pending Table"> 
    <tr> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=Level&Escalationorder=0#Escalation" class="headlink"> 
    <img src="/images/rat/up_selected.png" width="11" height="9" border="0" alt="up">Risk   </a> </th> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=AvgLastUpd&Escalationorder=1#Escalation" class="headlink"> 
    Avg Last Updated   </a> </th> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=AvgDaysOpen&Escalationorder=1#Escalation" class="headlink"> 
    Avg Days Open   </a> </th> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=Srs&Escalationorder=1#Escalation" class="headlink"> 
    # of Cases   </a> </th> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort_pct=1&Escalationorder=1#Escalation" class="headlink">% of Total Cases</a> </th> 
    </tr> 
     <tr > 
    <td><a href="/snapshot.php?statusrisk=2&wrkgrp=Somedata&function=statusrisk&statuses=CustomerPending"><img src="/images/rat/severity_2.gif" alt="Very High Risk" title="Very High Risk" border="0"></a></td> 
    <td> 8.0</td> 
    <td> 69.0</td> 
    <td>1</td> 
    <td> 3.1</td> 
    </tr>

I need to extract the values 8.0, 69.0 and 3.1 from the table above. My Python code looks like this.

from lxml import html 
import requests 

page = requests.get('http://rat-sucker.abc.com/team.php?wrkgrp=somedata') 
tree = html.fromstring(page.text) 
Stats = tree.xpath('//*[@id="leftrat"]/table[1]/tbody/tr[2]/td[2]') 

print('Stats: ', Stats)

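I have checked my XPath using several methods and emulators, and it is correct (it may not work if you run it against the partial source above), but when I run my Python script it produces no output.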

[root@testbed testhost]# python scrapper.py
Stats:

[root@testbed testhost]#
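
One possible explanation for the empty result, offered as an assumption rather than a confirmed diagnosis: browser developer tools usually display a `tbody` element that the browser adds while building the DOM, whereas the raw HTML returned to `requests` (like the excerpt above) contains none, so a path with `/tbody/` in it matches nothing in lxml. A minimal sketch on a cut-down copy of the table follows; the `sample` string and both XPath expressions are illustrative, and the `id="leftrat"` container is left out because it does not appear in the excerpt.

```python
from lxml import html

# Cut-down, illustrative copy of the table from the question (not the full page).
sample = """
<table summary="Customer Pending and Vendor Pending Table">
  <tr><th>Risk</th><th>Avg Last Updated</th><th>Avg Days Open</th>
      <th># of Cases</th><th>% of Total Cases</th></tr>
  <tr><td></td><td> 8.0</td><td> 69.0</td><td>1</td><td> 3.1</td></tr>
</table>
"""

tree = html.fromstring(sample)

# Path copied from a browser inspector: it includes tbody, which is absent here.
with_tbody = tree.xpath('//table/tbody/tr[2]/td[2]')

# Same path without the tbody step.
without_tbody = tree.xpath('//table/tr[2]/td[2]')
second_cell = [td.text.strip() for td in without_tbody]

print(with_tbody)
print(second_cell)
```

If the first list is empty and the second is not when run against the real page source, dropping `tbody` from the original XPath is worth trying.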

2015-02-10 user1659329

'http://rat-sucker.abc.com/team.php?wrkgrp=somedata' does not lead anywhere. Can you add the actual URL? – 2015-02-10 14:38:08

You can use the BeautifulSoup parser.

>>> from bs4 import BeautifulSoup
>>> s = '''<table summary="Customer Pending and Vendor Pending Table"> 
    <tr> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=Level&Escalationorder=0#Escalation" class="headlink"> 
    <img src="/images/rat/up_selected.png" width="11" height="9" border="0" alt="up">Risk   </a> </th> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=AvgLastUpd&Escalationorder=1#Escalation" class="headlink"> 
    Avg Last Updated   </a> </th> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=AvgDaysOpen&Escalationorder=1#Escalation" class="headlink"> 
    Avg Days Open   </a> </th> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=Srs&Escalationorder=1#Escalation" class="headlink"> 
    # of Cases   </a> </th> 
     <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort_pct=1&Escalationorder=1#Escalation" class="headlink">% of Total Cases</a> </th> 
    </tr> 
     <tr > 
    <td><a href="/snapshot.php?statusrisk=2&wrkgrp=Somedata&function=statusrisk&statuses=CustomerPending"><img src="/images/rat/severity_2.gif" alt="Very High Risk" title="Very High Risk" border="0"></a></td> 
    <td> 8.0</td> 
    <td> 69.0</td> 
    <td>1</td> 
    <td> 3.1</td> 
    </tr>''' 
>>> soup = BeautifulSoup(s, 'html.parser')  # name the parser explicitly
>>> [i.text.strip() for i in soup.find_all('td', string=True)]  # only cells holding a single text node
['8.0', '69.0', '1', '3.1']
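
The list above also contains the `# of Cases` value, while the question asks only for 8.0, 69.0 and 3.1. One way to pick specific columns is to pair each header with the cell beneath it. This is a sketch under assumptions: `sample` is a simplified stand-in for the real table, and `record` and `wanted` are names introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Simplified, illustrative stand-in for the table in the question.
sample = """
<table summary="Customer Pending and Vendor Pending Table">
  <tr>
    <th><a href="#" class="headlink">Risk</a></th>
    <th><a href="#" class="headlink"> Avg Last Updated </a></th>
    <th><a href="#" class="headlink"> Avg Days Open </a></th>
    <th><a href="#" class="headlink"> # of Cases </a></th>
    <th><a href="#" class="headlink">% of Total Cases</a></th>
  </tr>
  <tr>
    <td><a href="#"><img src="/images/rat/severity_2.gif" alt="Very High Risk"></a></td>
    <td> 8.0</td>
    <td> 69.0</td>
    <td>1</td>
    <td> 3.1</td>
  </tr>
</table>
"""

soup = BeautifulSoup(sample, "html.parser")
rows = soup.find_all("tr")

# The first row holds the column headers, the second row the values.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
values = [td.get_text(strip=True) for td in rows[1].find_all("td")]

# Pair each header with its value, then keep only the columns of interest.
record = dict(zip(headers, values))
wanted = [record[name] for name in
          ("Avg Last Updated", "Avg Days Open", "% of Total Cases")]

print(wanted)
```

Looking values up by header name rather than by position keeps the selection readable and is less likely to break if the site reorders its columns.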

2015-02-10 14:36:36

http://www.crummy.com/software/BeautifulSoup/ – cdvv7788 2015-02-10 14:42:46
