BeautifulSoup does not work properly inside tags

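This code does not print the list of companies as required. Execution never gets inside the first for loop over the tags: if I put "print 'text'" inside that first loop, nothing is printed. BeautifulSoup works for me on other websites with this kind of code. Any suggestions as to why it is not working here?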

# Python 2 (print statement, urllib.urlopen)
from bs4 import BeautifulSoup
import urllib

request = urllib.urlopen('http://www.stockmarketsreview.com/companies_sp500/')
html = request.read()
request.close()
soup = BeautifulSoup(html)
# Walk every cell of every row inside <div class="mainContent">
for tags in soup.find_all('div', {'class':'mainContent'}):
    for row in tags.find_all('tr'):
        for column in row.find_all('td'):
            print column.text

2014-02-18 Kundan

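This code works for me. – Totem

Maybe check the indentation in your actual code. What do you get when you run this? – Totem

Are you using 'lxml' as the parser? Some versions of lxml, combined with some versions of the underlying libxml, have trouble parsing certain HTML. –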
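
To make the parser question above concrete, here is a minimal sketch that runs the same lookup under several parsers and reports how many matching divs each one finds. It assumes Python 3 with the bs4 package installed (the thread itself uses Python 2); `SAMPLE_HTML` and `count_matches` are hypothetical names made up for this illustration, and the sample markup only stands in for the real page. Parsers that are not installed are reported as None rather than raising.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup standing in for the real page (an assumption,
# not the actual content of the site discussed in this thread).
SAMPLE_HTML = """
<html><body>
<div class="mainContent">
  <table>
    <tr><th>Company</th><th>Ticker</th></tr>
    <tr><td>Example Corp</td><td>EXM</td></tr>
  </table>
</div>
</body></html>
"""


def count_matches(html, parsers=("html.parser", "lxml", "html5lib")):
    """Map each parser name to the number of div.mainContent elements it finds.

    A parser that is not installed maps to None.
    """
    results = {}
    for name in parsers:
        try:
            # Naming the parser explicitly avoids depending on whichever
            # one BeautifulSoup would pick by default on this machine.
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            results[name] = None
            continue
        results[name] = len(soup.find_all("div", {"class": "mainContent"}))
    return results


for name, count in count_matches(SAMPLE_HTML).items():
    print(name, count)
```

If the counts differ between parsers when this is fed the real page's HTML, that would point at the parser rather than at the loop in the question.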

I have BeautifulSoup 3, and this seems to work fine:

import BeautifulSoup as BS  # BeautifulSoup 3, Python 2
import urllib

request = urllib.urlopen('http://www.stockmarketsreview.com/companies_sp500/')
html = request.read()
request.close()
soup = BS.BeautifulSoup(html)

try:
    tags = soup.findAll('div', attrs={'class':'mainContent'})
    print '# tags = ' + str(len(tags))
    for tag in tags:
        try:
            tables = tag.findAll('table')
            print '# tables = ' + str(len(tables))
            for table in tables:
                try:
                    # Search within the current table, not the whole div
                    rows = table.findAll('tr')
                    for row in rows:
                        try:
                            columns = row.findAll('td')
                            for column in columns:
                                print column.text
                        except Exception:
                            # This is okay since some rows have <th>, not <td>
                            # print 'Caught error getting td tag under ' + str(row)
                            pass
                except Exception:
                    print 'Caught error getting tr tag under ' + str(table)
        except Exception:
            print 'Caught error getting table tag under ' + str(tag)
except Exception:
    print 'Caught error getting div tag'

With BeautifulSoup 4, I believe you need to replace 'findAll' with 'find_all'.

The output looks like this: [image]

2014-02-18 13:47:54 bornruffians

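I used your code, but it prints '# tags = 0'. I think something this code needs may be missing from my system, although I have used this type of code for other websites. Any suggestions as to what I need to install to run this code? – Kundan

Maybe BS4 differs from 3.2.1 more than I thought... I have added a picture of the script's output when I run it. I see '# tags = 1', followed by the list of companies you want to parse. – bornruffians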
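
One way to narrow down why the two machines report '# tags = 0' and '# tags = 1' is to check whether the downloaded HTML contains the div at all, without involving BeautifulSoup. The sketch below does that with the standard library only. It assumes Python 3 (the thread uses Python 2); `DivClassCounter`, `count_divs_with_class`, and both sample strings are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count <div> start tags whose class attribute includes a given name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs; value is None for a bare attribute
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_divs_with_class(html, class_name):
    """Return how many <div class="..."> start tags carry class_name."""
    counter = DivClassCounter(class_name)
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical inputs: one page that has the div, one that does not.
good = '<div class="mainContent"><table><tr><td>Example Corp</td></tr></table></div>'
other = '<html><body><p>Access denied</p></body></html>'
print(count_divs_with_class(good, "mainContent"))   # 1
print(count_divs_with_class(other, "mainContent"))  # 0
```

If this count is 0 on the HTML you actually downloaded, the server probably returned a different page than expected; if it is 1 while BeautifulSoup still finds nothing, the parser BeautifulSoup is using becomes the more likely suspect.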
