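I'm new to programming and I'm having trouble scraping all the pages of a site with Python and BeautifulSoup. I figured out how to scrape the first page, but I'm lost on how to do it for every page. How do I scrape paginated pages with Python and BeautifulSoup?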
Here is the code:
#!/usr/bin/python
# -*- encoding: utf-8 -*-
from urllib2 import urlopen
import json
from BeautifulSoup import BeautifulSoup

defaultPage = 1
items = []
url = "https://www.nepremicnine.net/oglasi-prodaja/ljubljana-mesto/stanovanje/%d/"


def getWebsiteContent(page=defaultPage):
    return urlopen(url % (page)).read()


def writeToFile(content):
    file = open("nepremicnine1.json", "w+")
    json.dump(content, file)
    # file.write(content)
    file.close()


def main():
    content = getWebsiteContent(page=defaultPage)
    soup = BeautifulSoup(content)
    posesti = soup.findAll("div", {"itemprop": "itemListElement"})
    for stanovanja in posesti:
        item = {}
        item["Naslov"] = stanovanja.find("span", attrs={"class": "title"}).string
        item["Velikost"] = stanovanja.find("span", attrs={"class": "velikost"}).string
        item["Cena"] = stanovanja.find("span", attrs={"class": "cena"}).string
        item["Slika"] = stanovanja.find("img", src = True)["src"]
        items.append(item)
    writeToFile(items)


main()
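So I want to loop over the pages, so that the %d in the URL increases by 1 each time, since the pages are numbered 2, 3, and so on.
All help is highly appreciated.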
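One possible way to structure that loop, as a minimal sketch rather than a definitive answer: keep requesting page 1, 2, 3, ... and stop when a page returns no listings. The names `scrape_all_pages`, `scrape_page`, and `max_pages` are hypothetical and not part of the code above, and the stop condition is an assumption: the site may behave differently past the last page (for example by redirecting or repeating the final page), so `max_pages` acts as a safety cap.

```python
def scrape_all_pages(scrape_page, first_page=1, max_pages=50):
    """Collect items from consecutive pages until one comes back empty.

    scrape_page is a hypothetical callable: it takes a page number and
    returns the list of item dicts found on that page.
    """
    all_items = []
    for page in range(first_page, first_page + max_pages):
        page_items = scrape_page(page)
        if not page_items:
            # Assumption: an empty result means we ran past the last page.
            break
        all_items.extend(page_items)
    return all_items


# Stand-in for real network calls, so the loop can be tried offline.
fake_site = {
    1: [{"Naslov": "A"}],
    2: [{"Naslov": "B"}, {"Naslov": "C"}],
}
demo_items = scrape_all_pages(lambda page: fake_site.get(page, []))
```

To use it with the code in the question, the body of `main()` could be moved into a function that takes `page`, calls `getWebsiteContent(page=page)`, and returns the list of items for that page instead of appending to the global `items`. The combined result from `scrape_all_pages` can then be passed to `writeToFile` once at the end.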
Perfect, thanks for your help :) You saved me a lot of nerves :) – Jerry
I'm glad I could help =) –