Python 3.4 - Looping through n URLs where n is not fixed

What is the simplest way to loop through a series of URLs until no more results are returned?

If the number of URLs were fixed, e.g. 9, code like the following would work:

for i in range(1, 10):
    print('http://www.trademe.co.nz/browse/categorylistings.aspx?v=list&rptpath=4-380-50-7145-&mcatpath=sports%2fcycling%2fmountain-bikes%2ffull-suspension&page=' + str(i) + '&sort_order=default')
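Instead of concatenating one long string, the page URL can be built from its query parameters. A minimal sketch using the standard library's `urllib.parse.urlencode`; the parameter names and values are copied from the URL above, and the helper name `page_url` is hypothetical. Note that `urlencode` writes `%2F` in uppercase, which is equivalent to the `%2f` in the original URL.

```python
from urllib.parse import urlencode

BASE = 'http://www.trademe.co.nz/browse/categorylistings.aspx'


def page_url(page):
    """Return the listing URL for one page (hypothetical helper)."""
    # A list of pairs keeps the parameters in the same order as the original URL
    params = [
        ('v', 'list'),
        ('rptpath', '4-380-50-7145-'),
        ('mcatpath', 'sports/cycling/mountain-bikes/full-suspension'),
        ('page', page),
        ('sort_order', 'default'),
    ]
    # urlencode percent-encodes each value, so '/' becomes %2F
    return BASE + '?' + urlencode(params)


for i in range(1, 10):
    print(page_url(i))
```

This also avoids stray characters (such as a trailing space) creeping into the hand-written URL.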

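However, the number of URLs is dynamic, and when I overshoot I get a page saying "Sorry, there are currently no listings in this category." Example below: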

http://www.trademe.co.nz/browse/categorylistings.aspx?v=list&rptpath=4-380-50-7145-&mcatpath=sports%2fcycling%2fmountain-bikes%2ffull-suspension&page=10&sort_order=default

What is the simplest way to return only the pages that have results?

Cheers, Steve


2016-02-23 Steve

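How about 'if err in response: break', where 'err' is the error message you mentioned above? Using the trademe API would most likely be cleaner, though. – Geotob

I'd suggest using their API like a good Internet citizen rather than scraping their data: http://developer.trademe.co.nz/api-terms/terms-and-conditions/ – IanAuld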
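The "break when the error text shows up" idea from this comment can be sketched as an open-ended page counter. A minimal sketch, assuming a hypothetical `fetch(page)` callable that returns the page's HTML as a string (in practice it would wrap an HTTP request); `max_pages` is a safety cap so a missed end condition cannot loop forever.

```python
from itertools import count

NO_RESULTS = 'Sorry, there are currently no'


def pages_with_results(fetch, max_pages=100):
    """Yield (page_number, html) until the 'no listings' page appears.

    fetch is a hypothetical callable: fetch(page) -> HTML string.
    """
    for page in count(1):
        if page > max_pages:
            break  # safety cap in case the end marker never appears
        html = fetch(page)
        if NO_RESULTS in html:
            break  # overshot: this page has no listings
        yield page, html


# Usage with a fake fetcher standing in for real HTTP requests:
# pages 1-9 have listings, page 10 onwards shows the "sorry" message
def fake_fetch(page):
    if page <= 9:
        return '<p>listings</p>'
    return '<p>Sorry, there are currently no listings</p>'


print([page for page, _ in pages_with_results(fake_fetch)])
```

Because it is a generator, the caller can process each page as soon as it is fetched, and no request is made for pages after the first empty one.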

# itertools.count is an iterator that just keeps counting:
# from itertools import count
# but I'm not going to use it, because you want a reasonable upper limit;
# otherwise you'll loop endlessly if your end condition fails.

# requests is third-party, but generally nicer to use than the standard library
import requests

base_url = 'http://www.trademe.co.nz/browse/categorylistings.aspx?v=list&rptpath=4-380-50-7145-&mcatpath=sports%2fcycling%2fmountain-bikes%2ffull-suspension&page={}&sort_order=default'

for i in range(1, 30):
    # timeout stops a stalled request from hanging the loop
    result = requests.get(base_url.format(i), timeout=10)
    if result.status_code != 200:
        break
    content = result.content.decode('utf-8')
    # Note: this check is actually quite fragile.
    # For example, the page has 2 spaces between 'no' and 'listings',
    # so searching for 'no listings' would fail.
    # For a more robust solution, be more clever.
    if 'Sorry, there are currently no' in content:
        break

    # do stuff with your content here
    print(i)
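The comment in the code notes that a plain substring test is fragile because the page puts two spaces between "no" and "listings". A minimal sketch of a somewhat more tolerant check, assuming the message text stays the same apart from whitespace and letter case: collapse runs of whitespace with `re.sub` before comparing. The helper name `has_no_listings` is hypothetical, and the full message text is taken from the question; it would still miss the message if HTML tags were inserted between the words.

```python
import re

NO_LISTINGS = 'Sorry, there are currently no listings in this category'


def has_no_listings(html):
    """Return True if the page shows the 'no listings' message."""
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space
    normalized = re.sub(r'\s+', ' ', html)
    # Compare case-insensitively so minor capitalisation changes don't matter
    return NO_LISTINGS.lower() in normalized.lower()


print(has_no_listings('<p>Sorry, there are currently no  listings\n in this category.</p>'))
print(has_no_listings('<p>Giant Trance 2015, $2,500</p>'))
```

In the loop above, `if 'Sorry, there are currently no' in content:` could then become `if has_no_listings(content):`.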


2016-02-23 03:24:22
