2017-06-02 68 views

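How to scrape paginated pages with Python BeautifulSoup

I'm new to programming and have run into a problem scraping all pages with Python and BeautifulSoup. I figured out how to scrape the first page, but I'm lost on how to handle the remaining pages.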

Here is the code: 
#!/usr/bin/python 
# -*- encoding: utf-8 -*- 
from urllib2 import urlopen 
import json 
from BeautifulSoup import BeautifulSoup 

defaultPage = 1 
items = [] 
url = "https://www.nepremicnine.net/oglasi-prodaja/ljubljana-mesto/stanovanje/%d/" 

def getWebsiteContent(page=defaultPage): 
    return urlopen(url % (page)).read() 

def writeToFile(content): 
    file = open("nepremicnine1.json", "w+") 
    json.dump(content, file) 
    # file.write(content) 
    file.close() 

def main(): 

    content = getWebsiteContent(page=defaultPage) 
    soup = BeautifulSoup(content) 
    posesti = soup.findAll("div", {"itemprop": "itemListElement"}) 

    for stanovanja in posesti:
        item = {}
        item["Naslov"] = stanovanja.find("span", attrs={"class": "title"}).string
        item["Velikost"] = stanovanja.find("span", attrs={"class": "velikost"}).string
        item["Cena"] = stanovanja.find("span", attrs={"class": "cena"}).string
        item["Slika"] = stanovanja.find("img", src = True)["src"]

        items.append(item)

        writeToFile(items)

main() 

So I'd like to loop so that the %d in the URL increases by 1 each time, since the following pages are numbered 2, 3, and so on.

Any help is highly appreciated.

Answers


You are not incrementing your defaultPage variable.

Your approach is correct. You just need to increment defaultPage each time you finish scraping a page:

def main():
    global defaultPage  # needed because defaultPage is reassigned inside this function
    while defaultPage <= numPages:  # Loop through all pages. You also need to define the value of numPages.
        content = getWebsiteContent(page=defaultPage)
        soup = BeautifulSoup(content)
        posesti = soup.findAll("div", {"itemprop": "itemListElement"})

        for stanovanja in posesti:
            item = {}
            item["Naslov"] = stanovanja.find("span", attrs={"class": "title"}).string
            item["Velikost"] = stanovanja.find("span", attrs={"class": "velikost"}).string
            item["Cena"] = stanovanja.find("span", attrs={"class": "cena"}).string
            item["Slika"] = stanovanja.find("img", src = True)["src"]

            items.append(item)

        defaultPage += 1  # move on to the next page

    writeToFile(items)  # write once, after all pages have been collected

I think this should work.
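If the value of numPages is not known in advance, one possible alternative is to keep requesting pages until one comes back with no listings. The sketch below only illustrates that loop. The names scrape_all_pages, fetch, parse and max_pages are hypothetical and not part of the original code, and it assumes (unverified for this site) that a page past the last one yields zero matching elements. The demo uses offline stand-in functions instead of real requests.

```python
def scrape_all_pages(fetch, parse, first_page=1, max_pages=1000):
    """Collect items from consecutive pages until a page yields none.

    fetch(page)    -> raw content of that page (hypothetical callable)
    parse(content) -> list of items found in the content (hypothetical callable)
    max_pages is a safety cap so the loop always terminates.
    """
    collected = []
    for page in range(first_page, first_page + max_pages):
        page_items = parse(fetch(page))
        if not page_items:
            # Assumption: an empty page means we are past the last page.
            break
        collected.extend(page_items)
    return collected


if __name__ == "__main__":
    # Offline stand-ins: pretend the site has three pages of two listings each.
    def fake_fetch(page):
        return page  # the "content" is simply the page number

    def fake_parse(content):
        if content > 3:
            return []
        return [{"Naslov": "listing %d-%d" % (content, i)} for i in range(2)]

    print(len(scrape_all_pages(fake_fetch, fake_parse)))  # prints 6
```

With the code from the question, getWebsiteContent could be passed as fetch, and the BeautifulSoup loop that builds each item could be wrapped in a function and passed as parse. The collected list would then be handed to writeToFile once at the end.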


Perfect, thanks for your help :) You saved me a lot of nerves :) – Jerry


I'm glad I could help =) –