2016-10-02 85 views

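So I had been looking for my favorite software. Then I came across web scraping and found it really amazing, so, drawing on my Python experience, I got some practice with BeautifulSoup and requests and wrote the code below. Why is my web scraping not working properly?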

import html5lib
import requests
from bs4 import BeautifulSoup as BS


def makeSoup(urls):
    url = requests.get(urls).text
    return BS(url, "html5lib")


# Print the text of every external link, then the text of its following siblings
def something(soup):
    for anchor in soup.findAll("a", {"data-type": "externalLink"}):
        print(anchor.string)
        next_sibling = anchor.nextSibling
        water = str(next_sibling.string)
        water = water[0:5]
        while water != "(202)":
            next_sibling = next_sibling.nextSibling
            if next_sibling == None:
                continue
            if next_sibling.string != None:
                print(next_sibling.string)
                water = str(next_sibling.string)
                water = water[0:5]


soup = makeSoup("http://dc.about.com/od/communities/a/EmbassyGuide.htm")
something(soup)

soup = makeSoup("http://dc.about.com/od/communities/a/EmbassyGuide_2.htm")
something(soup)

soup = makeSoup("http://dc.about.com/od/communities/a/EmbassyGuide_3.htm")
something(soup)
 

Unfortunately, I got every programmer's nightmare: an error.

Traceback (most recent call last):
  File "C:\Users\Raj\Desktop\kunal projects\Python\listing_out_all_embassies.py", line 26, in <module>
    something(soup)
  File "C:\Users\Raj\Desktop\kunal projects\Python\listing_out_all_embassies.py", line 17, in something
    next_sibling = next_sibling.nextSibling
AttributeError: 'NoneType' object has no attribute 'nextSibling'

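What am I doing wrong? I am a newbie to programming as well as to web scraping. Also, what good practices am I not following? Anyway, thanks for reading to the end.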


That 'continue' doesn't look right. – user2357112

Answer


You have to check whether next_sibling == None before you use next_sibling.nextSibling (and break when it is None):

def something(soup):
    for anchor in soup.findAll("a", {"data-type": "externalLink"}):
        print(anchor.string)
        next_sibling = anchor.nextSibling
        water = str(next_sibling.string)
        water = water[0:5]
        while water != "(202)":
            if next_sibling == None:
                break
            next_sibling = next_sibling.nextSibling
            if next_sibling == None:
                break
            if next_sibling.string != None:
                print(next_sibling.string)
                water = str(next_sibling.string)
                water = water[0:5]
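To see why the original 'continue' crashes while 'break' does not, here is a minimal reproduction that needs no network access and no BeautifulSoup. It is a sketch, not the answerer's code: the Node class is a hypothetical stand-in that models only the .string and .nextSibling attributes the loop reads, walk is a hypothetical helper that replays the question's loop, and the sample addresses and phone number are made up.

```python
class Node:
    """Hypothetical stand-in for a BeautifulSoup element.

    Only models the two attributes the scraping loop reads.
    """

    def __init__(self, string, next_sibling=None):
        self.string = string
        self.nextSibling = next_sibling


def walk(anchor, on_none):
    """Replay the question's sibling loop for one anchor.

    on_none is "continue" (the original code) or "break" (the fix).
    Returns the sibling strings the loop would have printed.
    """
    printed = []
    next_sibling = anchor.nextSibling
    water = str(next_sibling.string)[:5]
    while water != "(202)":
        # With "continue", next_sibling is still None on the following
        # pass, so this line raises AttributeError.
        next_sibling = next_sibling.nextSibling
        if next_sibling is None:
            if on_none == "continue":
                continue  # water is unchanged, so the loop runs again
            break  # end of the sibling chain: stop cleanly
        if next_sibling.string is not None:
            printed.append(next_sibling.string)
            water = str(next_sibling.string)[:5]
    return printed


# Hypothetical entry whose siblings never reach a "(202)" phone line
entry = Node("Embassy of A", Node("1 Main St", Node("2nd floor")))
print(walk(entry, "break"))  # ['2nd floor']
try:
    walk(entry, "continue")
except AttributeError as err:
    print("continue failed:", err)
```

The crash only appears for entries whose sibling chain ends before a line starting with "(202)"; once next_sibling is None and water never changes, 'continue' just sends the loop back to None.nextSibling.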

But it can be written more concisely:

def something(soup):
    for anchor in soup.findAll("a", {"data-type": "externalLink"}):
        water = None  # create variable to use it first time in "while"
        while anchor and water != "(202)":
            if anchor.string:
                print(anchor.string)
                water = anchor.string[:5]
            anchor = anchor.nextSibling
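Going one step further (this is not from the original answer), BeautifulSoup 4 elements also provide a .next_siblings generator, which simply stops after the last sibling. Iterating over it with a for loop removes the manual nextSibling/None bookkeeping entirely, which also touches on the "good practices" part of the question. This is a minimal sketch, assuming the same "(202)" end-of-entry marker the question uses; the function name embassy_lines and its stop_prefix parameter are hypothetical.

```python
def embassy_lines(anchor, stop_prefix="(202)"):
    """Yield the link text and the following sibling strings of one entry.

    Sketch only: assumes each embassy entry ends at the first sibling
    whose text starts with stop_prefix (the DC area code, as in the
    question). Relies on BeautifulSoup 4's .next_siblings generator,
    so no explicit None checks on the sibling chain are needed.
    """
    if anchor.string:
        yield anchor.string
    for sibling in anchor.next_siblings:
        text = sibling.string  # None when the element has no single string
        if not text:
            continue
        yield text
        if text.startswith(stop_prefix):
            return  # reached the phone line: this entry is done
```

With a soup built by makeSoup, usage would look like: for anchor in soup.find_all("a", {"data-type": "externalLink"}), loop over embassy_lines(anchor) and print each line. Keeping the walking logic in a generator also separates "find the lines" from "print them", so the same function can feed a file or a list instead.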