How to scrape an <a href="url"> link that has no class or ID using BeautifulSoup4 (Python 2.7)

I have been trying hard to grab a tag that has no class or ID. It is just an href followed by the link.

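The HTML is below. There is more of it, but this is just a small part. I am trying to grab the <a href="url is here"> link, but I can't simply grab every "a" tag, because that would return every link on the page.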

<table> 
<tbody> 
<tr class=""> 
<td class="col1 align"> 
<a href="url is here"> 
1 
</a> 
</td> 
<td class="col2"> 
<a href="www.example.com"> 
<img class="avatar" src="www.example.com" alt="le me"> 
le me 
<img class="test" alt="test" title="test" src="test-icon.png"> 
</a> 
</td> 
<td class="col3 align"> 
<a href="www.example.com"> 
2,715 
</a> 
</td> 
<td class="col4 align"> 
<a href="www.example.com"> 
5,400,000,000 
</a> 
</td> 
</tr>

My code:

source_code = requests.get(url) 
plain_text = source_code.text 
soup = BeautifulSoup(plain_text) 
for link in soup.findAll(): 
    username = link.get() 
    print(username)

I left the findAll() and get() calls empty because nothing I tried worked. I'm not sure what else to do.

2016-11-24 CadenDEV

You can select all the a tags and use the has_attr function to check whether each one has a class or id attribute:

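for link in soup.findAll('a'):
    # Skip any anchor that carries a class or id attribute
    if link.has_attr('class') or link.has_attr('id'):
        continue
    # What remains is a plain <a href="..."> tag
    username = link.get('href')
    print(username)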
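
To try the loop above without fetching a page, here is a minimal self-contained sketch. The sample markup, the `profile` class, the `score` id, and the helper name `plain_links` are all made up for illustration; they are not from the original page. It also passes "html.parser" explicitly, which is an assumption about which parser you want.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: only the first link has neither class nor id.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="col1 align"><a href="url is here">1</a></td>
    <td class="col2"><a class="profile" href="www.example.com/user">le me</a></td>
    <td class="col3 align"><a id="score" href="www.example.com/score">2,715</a></td>
  </tr>
</table>
"""


def plain_links(html):
    """Return the href of every <a> tag that has no class and no id (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    # href=True keeps only anchors that actually have an href attribute
    for link in soup.find_all("a", href=True):
        if link.has_attr("class") or link.has_attr("id"):
            continue
        hrefs.append(link["href"])
    return hrefs


print(plain_links(SAMPLE_HTML))
```

With this sample, only the href of the first link should survive the filter. Note that in the HTML from the question none of the links has a class or id, so this filter alone would keep all four of them.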

2016-11-24 00:57:39 Dekel
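
As a complement to the answer above, and not part of it: since the wanted link sits inside a cell with class col1 in the question's HTML, another option may be to select by the parent cell instead of by the link itself. This sketch assumes that the col1 class reliably marks the target cell on the real page. The helper name `first_column_links` is hypothetical, and the sample markup is a shortened copy of the question's table.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (assumed representative).
SAMPLE_HTML = """
<table>
  <tbody>
    <tr class="">
      <td class="col1 align"><a href="url is here">1</a></td>
      <td class="col2"><a href="www.example.com">le me</a></td>
      <td class="col3 align"><a href="www.example.com">2,715</a></td>
    </tr>
  </tbody>
</table>
"""


def first_column_links(html):
    """Return hrefs of anchors nested in <td class="col1 ..."> cells (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # "td.col1 a[href]" matches <a> tags with an href inside any td whose class list contains col1
    return [a["href"] for a in soup.select("td.col1 a[href]")]


print(first_column_links(SAMPLE_HTML))
```

Selecting through the parent cell narrows the result to the first column only, which the class/id filter cannot do when every link on the row is a plain anchor.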
