Associating multiple headers/values with multiple teams:

http://www.nhl.com/ice/teamstats.htm

Right now, this is the code I have. It only prints the column titles at the top of the table:

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen("http://www.nhl.com/ice/teamstats.htm")

content = url.read()

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(content, "html.parser")

results = {}

for table in soup.find_all('table', class_='data stats'):
    for row in table.find_all('tr'):
        name = None
        # Only header cells (th) are visited here, so only titles get printed
        for cell in row.find_all('th'):
            link = cell.find('a')
            if link:
                name = cell.a.string
                print(name)

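Admittedly, this one is more complicated. With a lot of help, and after relearning some forgotten Python lessons, I managed to associate teams with their scores on this site: http://sports.yahoo.com/nhl/scoreboard?d=2013-04-01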

However, the former page (the first one) has multiple headers, each with its own values.

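I'm only asking for pointers on part of this, so that I can finish the rest without trouble (or with a few problems, who knows). Roughly, this is what I'm hoping to achieve: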

Team X: GP: 30. W: 16. L: 4, etc.

Thanks!


2013-08-18 Nathaniel Elder

Your code only handles th. It should also handle td.

Try the following:

from bs4 import BeautifulSoup
from urllib.request import urlopen

u = urlopen("http://www.nhl.com/ice/teamstats.htm")
soup = BeautifulSoup(u, "html.parser")
u.close()

for table in soup.find_all('table', class_='data stats'):
    # The first row holds the column titles; skip its first cell
    row = table.find('tr')
    header = []
    for cell in row.find_all('th')[1:]:
        # get_text() also works when the cell wraps its text in nested tags
        name = cell.get_text(strip=True)
        header.append(name)
    # Every later row is data; pair each title with the matching td value
    for row in table.find_all('tr')[1:]:
        for name, cell in zip(header, row.find_all('td')[1:]):
            value = cell.get_text(strip=True)
            print('{}: {}'.format(name, value), end=', ')
        print()
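
If you want the "Team X: GP: 30, ..." shape from the question instead of a flat print, one option is to collect each row into a dict keyed by team. This is only a rough sketch on made-up data: it assumes the cell texts have already been pulled out into plain lists and that the first item of each row is the team name. The helper names rows_to_team_stats and format_team are hypothetical, not part of bs4.

```python
def rows_to_team_stats(header, rows):
    """Map each team name to a {column title: value} dict."""
    stats = {}
    for team, *values in rows:
        # zip pairs each column title with the value in the same position
        stats[team] = dict(zip(header, values))
    return stats


def format_team(team, team_stats):
    """Render one team as 'Team: GP: 30, W: 16, ...'."""
    pairs = ', '.join('{}: {}'.format(k, v) for k, v in team_stats.items())
    return '{}: {}'.format(team, pairs)


# Made-up sample data standing in for the scraped cell texts
header = ['GP', 'W', 'L']
rows = [
    ['Team X', '30', '16', '4'],
    ['Team Y', '30', '12', '10'],
]

stats = rows_to_team_stats(header, rows)
for team, team_stats in stats.items():
    print(format_team(team, team_stats))
```

Keeping the values in a dict rather than printing them straight away means you can look up a single team later, e.g. stats['Team X']['W'].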


2013-08-18 05:10:09 falsetru

Great answer. Out of curiosity, what does zip do? And the [1:]? The rest is clear. @falsetru –

@NathanielElder, 'xs[1:]' returns a copy of 'xs' without the first element. – falsetru

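@NathanielElder, 'zip(a, b)' pairs up the corresponding elements of 'a' and 'b'. For example, 'zip([1, 2, 3], [4, 5, 6])' yields '[(1, 4), (2, 5), (3, 6)]'. In Python 3.x, it produces an iterator rather than a list. – falsetru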
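
For anyone who wants to try the two idioms from these comments in isolation, here is a small sketch. The list contents are made up for illustration; only built-in Python 3 behaviour is used.

```python
# Slicing: xs[1:] builds a new list without the first element
header = ['Rank', 'GP', 'W', 'L']
trimmed = header[1:]
print(trimmed)   # the original list is left untouched
print(header)

# zip: pairs items by position; in Python 3 it returns a lazy iterator
pairs = zip(trimmed, ['30', '16', '4'])
first_pass = list(pairs)    # materialise the pairs to inspect them
second_pass = list(pairs)   # the iterator is now exhausted, so this is empty
print(first_pass)
print(second_pass)

# The example from the comment above
example = list(zip([1, 2, 3], [4, 5, 6]))
print(example)
```

Because the iterator can only be consumed once, wrap it in list() if you need to loop over the pairs more than one time.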
