2017-06-08

How to automatically parse a table that spans multiple pages with Python

I want to parse a table (or several tables) that spans multiple pages. My approach below works, but it is too manual; I would like it to automatically parse the tables from the different pages and merge them into one. The number of pages may not always be the same.

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
import pandas as pd 

one = "https://rittresultater.no/nb/sb_tid/923?page=0&pv2=11027&pv1=U" 
two = "https://rittresultater.no/nb/sb_tid/923?page=1&pv2=11027&pv1=U" 
three = "https://rittresultater.no/nb/sb_tid/923?page=2&pv2=11027&pv1=U" 

#parse the first page 
html = urlopen(one) 
soup = BeautifulSoup(html, "lxml") 
table = soup.find_all(class_="table-condensed") 
one = pd.read_html(str(table))[0] 

#parse the second page 
html = urlopen(two) 
soup = BeautifulSoup(html, "lxml") 
table = soup.find_all(class_="table-condensed") 
two = pd.read_html(str(table))[0] 

#parse the third page 
html = urlopen(three) 
soup = BeautifulSoup(html, "lxml") 
table = soup.find_all(class_="table-condensed") 
three = pd.read_html(str(table))[0] 

df = pd.concat([one,two,three], axis = 0) 
df 

Note that the URLs differ only in "page=X". Also, the pages themselves contain links, e.g. to the next page.
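Since only the page number varies, the per-page URL can be generated rather than hard-coded. A minimal sketch, assuming the fixed query parameters from the three URLs above (the helper name `page_url` is my own, not from the question):

```python
from urllib.parse import urlencode

BASE = "https://rittresultater.no/nb/sb_tid/923"

def page_url(page_num):
    # only "page" varies between the pages; pv2 and pv1 stay fixed
    query = {"page": page_num, "pv2": 11027, "pv1": "U"}
    return f"{BASE}?{urlencode(query)}"
```

Each of the three hard-coded URLs in the question is then just `page_url(0)`, `page_url(1)`, and `page_url(2)`.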

Answers

results = {} 
for page_num in range(1, 10): #change depending on max page 
    address = 'https://rittresultater.no/nb/sb_tid/923?page=' + \ 
       str(page_num) + '&pv2=11027&pv1=U' 

    html = urlopen(address) 
    soup = BeautifulSoup(html, 'lxml') 
    table = soup.find_all(class_='table-condensed') 
    output = pd.read_html(str(table))[0] 
    results[page_num] = output 

Then do whatever you want with the output using a list comprehension, as in the last line of your code, but scaled up:

df = pd.concat([v for v in results.values()], axis = 0) 
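Since the page count may vary, the hard-coded `range(1, 10)` can be replaced by a loop that stops as soon as a page yields no table. A minimal sketch of that stopping pattern, where `fetch_tables` is a hypothetical stand-in for the real urlopen/BeautifulSoup/`pd.read_html` steps (both function names are my own, not from the answer):

```python
# Open-ended paging: request page 0, 1, 2, ... and stop at the
# first page that contains no table.
def fetch_tables(page_num, pages):
    # stand-in for fetching page=page_num and parsing its tables;
    # here we simply look them up in a dict keyed by page number
    return pages.get(page_num, [])

def scrape_all(pages):
    results = []
    page_num = 0
    while True:
        tables = fetch_tables(page_num, pages)
        if not tables:   # no table on this page -> past the last page
            break
        results.extend(tables)
        page_num += 1
    return results
```

In the real scraper, `fetch_tables` would fetch `page_url(page_num)` and return `pd.read_html(...)`, and the collected frames would go through `pd.concat` as above.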

Perfect! Thanks for the clean, nice answer – NRVA
