Parsing a Wikipedia page with Python 3

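I need to get the first 10 links from the "Computer science" Wikipedia page. Then, for each of those links, I need to get the first 10 links on the page it points to. So I should end up with 10 * 10 = 100 links.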

So far I have written this code:

import urllib.request as urllib2 
html = urllib2.urlopen('https://en.wikipedia.org/wiki/Computer_science').read() 
from bs4 import BeautifulSoup 
soup = BeautifulSoup(html, "lxml") 


for link in soup.find_all('a', limit=10): 
    rez=link.get('href') 
    for i in rez.find_all('a', limit=10): 
     print(i)

When I run it, I get this error:

'NoneType' object has no attribute 'find_all'

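Thanks, that helps a lot. Next, I need to return 10 links from each of those links, i.e. 10 links from Programming_language_theory, 10 links from Computational_complexity_theory, and so on. I tried to do that part like this: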

for link in soup.find_all('a', href=True, title=True, limit=10): 
     print(link['href']) 
     for link2 in link['href'].find_all('a', href=True, title=True, limit=10): 
      print(link2['href'])

But I get an error: 'str' object has no attribute 'find_all'

2015-12-21 Lila

The immediate problem I see is that the first three items that come back when I run this snippet:

for link in soup.find_all('a', limit=10): 
    rez=link.get('href') 
    print(rez)

are:

None
#mw-head
#p-search

That is why, when you call rez.find_all(), Python tells you 'NoneType' object has no attribute 'find_all': the first anchor has no href, so link.get('href') returns None.
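To see the None value in isolation, here is a minimal sketch that needs no network access. The HTML fragment is made up for illustration (it is not the real Wikipedia markup), and it uses the built-in "html.parser" backend instead of "lxml" so that nothing beyond bs4 has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the top of the page:
# the first anchor has no href attribute at all.
html = """
<a id="top"></a>
<a href="#mw-head">Jump to navigation</a>
<a href="#p-search">Jump to search</a>
<a href="/wiki/Algorithm" title="Algorithm">Algorithm</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Without a filter, anchors lacking href are included and .get() gives None.
all_hrefs = [a.get("href") for a in soup.find_all("a")]
print(all_hrefs)

# href=True keeps only anchors that actually carry an href attribute.
with_href = [a["href"] for a in soup.find_all("a", href=True)]
print(with_href)
```

The first list starts with None, while the second contains only strings, so filtering with href=True is enough to avoid the NoneType error.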

Edit #2:
One possible solution that eliminates the None results and prints both the article's links and their sub-links is:

for link in soup.find_all('a', href=True, title=True, limit=10):
    print(link['href'])
    sub_html = urllib2.urlopen('https://en.wikipedia.org' + link['href'])
    sub_soup = BeautifulSoup(sub_html, "lxml")
    for sub_link in sub_soup.find_all('a', href=True, title=True, limit=10):
        print(sub_link['href'])

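The reason for your new problem is that you need to create a new soup object for each new link; link['href'] is just a string, and strings have no find_all method.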
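One caveat when building the URL for each sub-page: prepending 'https://en.wikipedia.org' only gives a valid address when the href is root-relative, such as /wiki/Algorithm. Fragment links like #mw-head and absolute external URLs would produce a broken address. A possible way around this, sketched below, is to keep only hrefs that start with /wiki/ and resolve them with urllib.parse.urljoin. The helper name resolve_article_links and the sample list are made up for illustration, and the /wiki/ prefix test is an assumption about which links count as articles.

```python
from urllib.parse import urljoin

BASE = "https://en.wikipedia.org/wiki/Computer_science"


def resolve_article_links(hrefs, base=BASE, limit=10):
    """Return up to `limit` absolute URLs for hrefs that start with /wiki/."""
    urls = []
    for href in hrefs:
        if not href.startswith("/wiki/"):
            continue  # skip fragments such as "#mw-head" and external links
        urls.append(urljoin(base, href))
        if len(urls) == limit:
            break
    return urls


# Hypothetical hrefs, as they might come out of link['href'].
sample = ["#mw-head", "/wiki/Algorithm", "https://example.org/", "/wiki/Computation"]
resolved = resolve_article_links(sample)
print(resolved)
```

Each URL in the returned list can then be passed to urlopen and parsed into its own soup object, as in the loop above.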

2015-12-21 21:00:46 Adam

I want the first 10 links; their content doesn't matter. That is why I wrote find_all('a'). Isn't that correct? – Lila

Edited for your revised question. – Adam
