Python: scraping the <a> value from a table doesn't work

I have this HTML from a website:

<tr class="BgWhite"> 
    <td headers="th0" valign="top"> 
    3 
    </td> 
    <td headers="th1" style="width: 125px;" valign="top"> 
    <a href="https://www.dibbs.bsm.dla.mil/RFQ/RFQNsn.aspx?value=8340015511310&amp;category=issue&amp;Scope=" title="go to NSN view">8340-01-551-1310</a> 
    </td>

I want to extract the number "8340-01-551-1310", so I used this code:

test = container1.find_all("td", {"headers": "th1"}) 
test1 = test.find_all("a", {"title":"go to NSN view"})

But it shows this message:

"ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key 
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

What exactly am I doing wrong, and how can I fix it?

Asked 2017-09-07 by e.iluf

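It's telling you that `test` is a `ResultSet` with multiple items in it. Did you print `test` and take a look at it?

Your first call to find_all returns a list. If you iterate over that list you can search each of its members, but you can't call `find_all` on the list itself, because that is not a method of the list. – RobertB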
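To make the comment above concrete, here is a minimal sketch that reuses the question's variable names (`container1`, `test`). The HTML is a shortened stand-in for the real page (the `href` is replaced with `#`), and `html.parser` is used only so that no extra parser needs to be installed; both are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page in the question (assumption: href trimmed to "#").
html = """<table><tr class="BgWhite">
<td headers="th0" valign="top">3</td>
<td headers="th1" valign="top">
<a href="#" title="go to NSN view">8340-01-551-1310</a>
</td></tr></table>"""

container1 = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which behaves like a list of Tag objects.
test = container1.find_all("td", {"headers": "th1"})
print(type(test).__name__, isinstance(test, list))

# The list has no find_all(); each Tag inside it does, so search per element.
links = [a for td in test for a in td.find_all("a", {"title": "go to NSN view"})]
nsn_values = [a.get_text(strip=True) for a in links]
print(nsn_values)
```

Searching each element keeps the original two-step filter (first the `td`, then the `a` by title) while avoiding the `AttributeError`.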

Here is one way to do it:

from bs4 import BeautifulSoup 

data = """<tr class="BgWhite"> 
    <td headers="th0" valign="top"> 
    3 
    </td> 
    <td headers="th1" style="width: 125px;" valign="top"> 
    <a href="https://www.dibbs.bsm.dla.mil/RFQ/RFQNsn.aspx?value=8340015511310&amp;category=issue&amp;Scope=" title="go to NSN view">8340-01-551-1310</a> 
    </td>""" 

soup = BeautifulSoup(data, "lxml") 

for td in soup.find_all('td', {"headers": "th1"}): 
    for a in td.find_all('a'): 
        print(a.text)

Output:

8340-01-551-1310

However, if you are sure there will be only one td with headers "th1" and only one <a> inside it, or you only want the first of each, you can try:

print(soup.find('td', {"headers": "th1"}).find('a').text)

It returns the same output.

Edit: I just noticed that it can be simplified to:

print(soup.find('td', {"headers": "th1"}).a.text)
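One caveat with the chained one-liners: `find()` returns `None` when nothing matches, so the chain raises an `AttributeError` if the cell or the link is missing. A minimal sketch of a guarded version follows; the helper name `first_link_text` and the shortened HTML are hypothetical, introduced only for illustration, and `html.parser` is used to avoid needing lxml.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page in the question (assumption: href trimmed to "#").
html = """<table><tr class="BgWhite">
<td headers="th0" valign="top">3</td>
<td headers="th1" valign="top">
<a href="#" title="go to NSN view">8340-01-551-1310</a>
</td></tr></table>"""

soup = BeautifulSoup(html, "html.parser")


def first_link_text(soup, header_id):
    """Hypothetical helper: text of the first <a> in the first matching <td>, else None."""
    td = soup.find("td", {"headers": header_id})
    if td is None:
        return None  # no cell with that headers value
    a = td.find("a")
    if a is None:
        return None  # the cell exists but contains no link
    return a.get_text(strip=True)


print(first_link_text(soup, "th1"))  # 8340-01-551-1310
print(first_link_text(soup, "th9"))  # None
```

Returning `None` instead of raising lets the caller decide how to handle rows that lack the link.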

Answered 2017-09-07 21:48:37 by RobertB
