BeautifulSoup: Scraping different data sets that share the same set of attributes in the source code

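I am using the BeautifulSoup module to scrape the total follower count and the total tweet count from a Twitter account. However, when I inspected the elements for each field on the web page, I found that both fields are enclosed in the same set of HTML attributes: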

Followers

<a class="ProfileNav-stat ProfileNav-stat--link u-borderUserColor u-textCenter js-tooltip js-nav u-textUserColor" data-nav="followers" href="/IAmJericho/followers" data-original-title="2,469,681 Followers"> 
      <span class="ProfileNav-label">Followers</span> 
      <span class="ProfileNav-value" data-is-compact="true">2.47M</span> 
</a>

Tweet count

<a class="ProfileNav-stat ProfileNav-stat--link u-borderUserColor u-textCenter js-tooltip js-nav" data-nav="tweets" tabindex="0" data-original-title="21,769 Tweets"> 
       <span class="ProfileNav-label">Tweets</span> 
       <span class="ProfileNav-value" data-is-compact="true">21.8K</span> 
</a>

import urllib2
from bs4 import BeautifulSoup

link = "https://twitter.com/iamjericho"
r = urllib2.urlopen(link)
src = r.read()
res = BeautifulSoup(src)
followers = ''
# followers is reassigned on every match, so only the last match is kept
for e in res.findAll('span', {'data-is-compact':'true'}):
    followers = e.text

print followers

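However, since both values, the total tweet count and the total follower count, are enclosed in the same set of HTML attributes, i.e. within a span tag with class = "ProfileNav-value" and data-is-compact = "true", running the script above only returns the total follower count.

How can I extract both pieces of information with BeautifulSoup when they share the same HTML attributes?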

Source

2015-07-04 Manas Chaturvedi

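As a side note, scraping websites such as Twitter is usually against the terms of service. You would probably be better off using the API. – Craicerjack

@Craicerjack Well, it is a general question, to be honest. What would one do in a similar situation when scraping information from a website? –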

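In this case, one way to achieve it is to check that data-is-compact="true" appears only twice, once for each piece of data you want to extract. You also know that tweets comes first and followers second, so you can keep a list of those titles in the same order and use zip to pair each title with its value in a tuple and print both at the same time, like: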

import urllib2
from bs4 import BeautifulSoup

# Labels in the order the values appear on the page: tweets first, followers second
profile = ['Tweets', 'Followers']

link = "https://twitter.com/iamjericho"
r = urllib2.urlopen(link)
src = r.read()
res = BeautifulSoup(src)
# Pair each label with the matching span and print them together
for p, d in zip(profile, res.find_all('span', {'data-is-compact': "true"})):
    print p, d.text

It yields:

Tweets 21,8K
Followers 2,47M
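
Relying on the order of the spans can break if the page layout changes. As a possible alternative, the anchors quoted in the question each carry a data-nav attribute ("tweets", "followers"), so the values could be keyed by that attribute instead. The following is only a sketch under some assumptions: it is written for Python 3, it parses a trimmed copy of the markup from the question rather than the live page, and the names SAMPLE_HTML and extract_stats are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the two anchors quoted in the question (class lists shortened).
SAMPLE_HTML = """
<a class="ProfileNav-stat" data-nav="tweets" data-original-title="21,769 Tweets">
  <span class="ProfileNav-label">Tweets</span>
  <span class="ProfileNav-value" data-is-compact="true">21.8K</span>
</a>
<a class="ProfileNav-stat" data-nav="followers" data-original-title="2,469,681 Followers">
  <span class="ProfileNav-label">Followers</span>
  <span class="ProfileNav-value" data-is-compact="true">2.47M</span>
</a>
"""


def extract_stats(html):
    """Map each anchor's data-nav value to the text of its compact span.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    stats = {}
    # attrs={"data-nav": True} matches any <a> that has a data-nav attribute.
    for anchor in soup.find_all("a", attrs={"data-nav": True}):
        value = anchor.find("span", attrs={"data-is-compact": "true"})
        if value is not None:
            stats[anchor["data-nav"]] = value.get_text(strip=True)
    return stats


stats = extract_stats(SAMPLE_HTML)
print(stats["tweets"], stats["followers"])
```

Because each value is looked up through its parent anchor, the result does not depend on which span comes first in the document.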

Source

2015-07-04 23:32:59 Birei
