Python pagination loop

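I'm doing some simple web scraping and need a better way to loop through the pagination on the target site. The only way I have managed to make it work is by writing 10 "for loops". Basically, I look for the "Next" icon on the page at the URL. If it exists, I need to grab the parent link of the icon image and append it to the URL, go to the new, updated URL, search for the same icon, and repeat until I reach the last page (where the icon disappears). How can I do this without hard-coding a bunch of for loops?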

    import requests
    from bs4 import BeautifulSoup

    url = "http://www.somewebsite.com/"
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "lxml")

    # Look for the "Next" icon and build the next page's URL from its parent link
    for img in soup.findAll("img"):
        if "/Next_Icon" in img.get("src", ""):
            link = img.find_parent("a", href=True)
            extLink = link["href"]
            url = "http://www.somewebsite.com/" + extLink


2017-03-26 M4cJunk13

Use recursion or a stack/queue; there are plenty of examples of both on SO. – AChampion
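As a rough illustration of the recursive option this comment mentions, here is a minimal sketch. The names follow_pages, get_next and fake_site are hypothetical and do not come from the thread; get_next stands in for whatever code fetches a page and returns the URL behind the "Next" icon, or None on the last page.

```python
def follow_pages(url, get_next, seen=None):
    """Return the page URLs reached by following "next" links recursively."""
    seen = [] if seen is None else seen
    # Stop on the last page (no next link) or if the links loop back on themselves.
    if url is None or url in seen:
        return seen
    seen.append(url)
    return follow_pages(get_next(url), get_next, seen)


# Stand-in for a real site (hypothetical data): each page maps to its "next" page.
fake_site = {"p1": "p2", "p2": "p3", "p3": None}
pages = follow_pages("p1", fake_site.get)
print(pages)
```

CPython's default recursion limit is about 1000 frames, so for a site with a very long run of pages a loop is probably the safer choice.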

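    import requests
    from bs4 import BeautifulSoup

    # Stack of pages still to visit, seeded with the first page
    url_stack = ["http://www.somewebsite.com/"]

    while url_stack:
        wurl = url_stack.pop()
        r = requests.get(wurl)
        soup = BeautifulSoup(r.text, "lxml")

        for img in soup.findAll("img"):
            # A "Next" icon means there is another page: push its link
            if "/Next_Icon" in img.get("src", ""):
                link = img.find_parent("a", href=True)
                extLink = link["href"]
                url = "http://www.somewebsite.com/" + extLink
                url_stack.append(url)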

You should use a list to store all the URLs.
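One possible way to keep every visited URL in a list is sketched below. It is not the answerer's code: find_next_url, collect_pages and the fetch parameter are hypothetical names, and urljoin is swapped in for the string concatenation so that relative and absolute href values both resolve. The page download is passed in as a function so the loop can be tried without a network connection.

```python
from urllib.parse import urljoin


def find_next_url(html, page_url):
    """Return the absolute URL behind the "Next" icon, or None if there is none."""
    # Third-party import kept local so collect_pages itself needs only the stdlib.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "lxml")
    for img in soup.find_all("img"):
        if "/Next_Icon" in img.get("src", ""):
            link = img.find_parent("a", href=True)
            if link is not None:
                return urljoin(page_url, link["href"])
    return None


def collect_pages(start_url, fetch, find_next=find_next_url):
    """Follow "Next" links from start_url; return every URL visited, in order."""
    visited = []
    url = start_url
    # Stop when no "Next" link is found or when a page links back to one already seen.
    while url is not None and url not in visited:
        visited.append(url)
        url = find_next(fetch(url), url)
    return visited
```

Assuming requests is installed, a call might look like collect_pages("http://www.somewebsite.com/", lambda u: requests.get(u).text). The returned list can then be printed or written out as JSON.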


2017-03-26 05:01:24

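Ha-ha, that's funny. You wrote "wurl" just like I do. Yes, I will actually use JSON to store everything, but I just needed to get past this problem, so I have been printing it for now. Thank you very much, this looks great! It will work. – M4cJunk13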
