Python BeautifulSoup: is there a way to count the number of crawled results?

Is there a way to count the number of results crawled with BeautifulSoup?

Here is the code.

import requests
from bs4 import BeautifulSoup

def crawl_first_url(max_page):
    page = 1

    while page <= max_page:
        url = 'http://www.hdwallpapers.in/page/' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')

        # Each thumbnail link on the listing page points to a detail page.
        for div in soup.select('.thumb a'):
            href = 'http://www.hdwallpapers.in' + div.get('href')
            crawl_second_url(href)
        page += 1

def crawl_second_url(second_href):
    # Need to count the number of results here.
    # I tried len(second_href), but it doesn't work well.
    pass

crawl_first_url(1)

I want the second function to count the number of crawled results. For example, if 19 URLs have been crawled, I want to get that number.


2015-12-22 Lindow

What does 'crawl_second_url' do? Does it only count the results? – dstudeba

@dstudeba Yes, it should only count the number of results, but I don't know how to do that... – Lindow

Since you only need to count the results, I don't see a reason for a separate function; just add a counter.

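import requests
from bs4 import BeautifulSoup

max_page = 1  # assumed: the same value the question passes to crawl_first_url
page = 1
numResults = 0

while page <= max_page:
    url = 'http://www.hdwallpapers.in/page/' + str(page)
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')

    for div in soup.select('.thumb a'):
        href = 'http://www.hdwallpapers.in' + div.get('href')
        numResults += 1  # one more crawled link
    page += 1

# numResults is an int, so convert it before concatenating with strings.
print("There are " + str(numResults) + " results.")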

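This only counts the sub-pages. If you also want to count the top-level pages, just add another increment line after the soup line. You may also want to add a try: except: block to avoid crashes.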
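As a rough sketch of those two suggestions combined, something like the following might work. The names `fetch_page`, `count_results`, `pages_fetched` and `links_found` are made up for illustration, as are the 10-second timeout and the choice to skip a failed page rather than stop. It also uses `len()` on the list returned by `select()` instead of incrementing inside a loop, which should give the same number.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://www.hdwallpapers.in'


def fetch_page(url):
    """Download one page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()
    return response.text


def count_results(max_page, fetch=fetch_page):
    """Return (pages_fetched, links_found) for listing pages 1..max_page."""
    pages_fetched = 0
    links_found = 0

    for page in range(1, max_page + 1):
        url = BASE_URL + '/page/' + str(page)
        try:
            html = fetch(url)
        except requests.RequestException as error:
            # Skip pages that fail to download instead of crashing.
            print('Skipping ' + url + ': ' + str(error))
            continue

        pages_fetched += 1  # counts the top-level page itself
        soup = BeautifulSoup(html, 'html.parser')
        # select() returns a list-like ResultSet, so len() is the link count.
        links_found += len(soup.select('.thumb a'))

    return pages_fetched, links_found
```

Calling `count_results(1)` would then return the two counts as a tuple, which you can print with `str()` the same way as above. The `fetch` parameter is only there so the download step can be swapped out, for example when trying the function on saved HTML.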


2015-12-22 16:54:33 dstudeba
