Scraping several tables on the same page with BS4

So I want to scrape this site, which has several tables on one page, with BS4: http://www.baseball-reference.com/players/a/alberma01.shtml

import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.baseball-reference.com/players/a/alberma01.shtml'
r = urllib.request.urlopen(url).read()
soup = BeautifulSoup(r, 'lxml')

I have already tried

div = soup.find('div', id='all_br-salaries')

and

div = soup.find('div', attrs={'id': 'all_br-salaries'})

Both are aimed at one specific HTML table: the last one on the page, titled "Salaries".

When I print div I can see the table's data, but when I try something like this:

div.find('thead') 
div.find('tbody')

I get nothing back. My question is: how do I select the table correctly so that I can iterate over the tr tags and extract the data?


2017-05-13 e1v1s

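The reason? The table's HTML is (don't ask me why) inside an HTML comment, so the parser never turns it into tags. Dig the HTML out of the comment, turn that into soup, and then mine the soup in the usual way.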

>>> import requests 
>>> page = requests.get('http://www.baseball-reference.com/players/a/alberma01.shtml').text 
>>> from bs4 import BeautifulSoup 
>>> table_code = page[page.find('<table class="sortable stats_table" id="br-salaries"'):] 
>>> soup = BeautifulSoup(table_code, 'lxml') 
>>> rows = soup.find_all('tr')
>>> len(rows) 
14 
>>> for row in rows[1:]: 
...  row.text 
...  
'200825Baltimore\xa0Orioles$395,000? ' 
'200926Baltimore\xa0Orioles$410,000? ' 
'201027Baltimore\xa0Orioles$680,0002.141 ' 
'201128Boston\xa0Red\xa0Sox$875,0003.141 ' 
'201229Boston\xa0Red\xa0Sox$1,075,0004.141contracts ' 
'201330Cleveland\xa0Indians$1,750,0005.141contracts ' 
'201431Houston\xa0Astros$2,250,0006.141contracts ' 
'201532Chicago\xa0White\xa0Sox$1,500,0007.141contracts ' 
'201532Houston\xa0Astros$200,000Buyout of contract option' 
'201633Chicago\xa0White\xa0Sox$2,000,0008.141 ' 
'201734Chicago\xa0White\xa0Sox$250,000Buyout of contract option' 
'2017 StatusSigned thru 2017, Earliest Free Agent: 2018' 
'Career to date (may be incomplete)$11,385,000'

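Edit: I found out that this is a comment by opening the page's HTML in Chrome and reading down through it until I reached the table I wanted. Note the opening <!-- just before the table.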
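
An alternative to slicing the raw page text is to ask BeautifulSoup for the comment nodes and parse their contents. The following is only a minimal sketch: `SAMPLE_HTML` and the helper `extract_commented_table` are hypothetical, modeled on the structure described above (a div with id `all_br-salaries` whose table sits inside `<!-- ... -->`), not copied from the real page. It uses the built-in `html.parser` so that no extra parser install is assumed.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the real page: the table is wrapped in an HTML comment.
SAMPLE_HTML = """
<div id="all_br-salaries">
<!--
<table class="sortable stats_table" id="br-salaries">
<thead><tr><th>Year</th><th>Team</th><th>Salary</th></tr></thead>
<tbody>
<tr><th>2008</th><td>Baltimore Orioles</td><td>$395,000</td></tr>
<tr><th>2009</th><td>Baltimore Orioles</td><td>$410,000</td></tr>
</tbody>
</table>
-->
</div>
"""


def extract_commented_table(html, div_id, table_id):
    """Return the rows (lists of cell strings) of a table hidden in a comment, or []."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", id=div_id)
    if div is None:
        return []
    # Comment nodes are strings, not tags, so div.find('tbody') sees nothing.
    for comment in div.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(comment, "html.parser")  # parse the comment body as HTML
        table = inner.find("table", id=table_id)
        if table is not None:
            return [
                [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
                for tr in table.find_all("tr")
            ]
    return []


rows = extract_commented_table(SAMPLE_HTML, "all_br-salaries", "br-salaries")
for row in rows:
    print(row)
```

Collecting the cells one by one, rather than reading `row.text`, keeps the year, team and salary as separate values instead of one run-together string.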


2017-05-13 20:45:40

What indicates that it is inside a comment? I know 'table_code = page[page.find(

e1v1s

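I have edited my answer to address your first query. As for the second, it is not a list. That notation selects part of a string (a slice). First I find the position in the comment where the table begins, then I take the part of the text that starts at the table's opening tag.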
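
To make the slicing in that comment concrete, here is a small sketch on a made-up string; the `page` text and the shortened `marker` are hypothetical, only the `str.find` plus slice mechanics mirror the answer. It also adds a guard the answer does not have, because `find` returns -1 when the marker is missing.

```python
# Hypothetical miniature of the page source; only the slicing mechanics matter here.
page = 'header <!-- <table id="br-salaries"><tr><td>x</td></tr></table> --> footer'

marker = '<table id="br-salaries"'
start = page.find(marker)  # index of the first match, or -1 if the marker is absent

if start == -1:
    # Guard: without it, page[-1:] would silently return just the last character.
    table_code = ""
else:
    # A slice with no end index runs from `start` to the end of the string.
    table_code = page[start:]

print(start)
print(table_code)
```

The slice still carries the trailing `-->` and everything after it, which the parser in the answer simply tolerates.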
