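Using an HTML parser such as bs4 makes this very easy. You can sort by price by appending ?sort=pricedsc
to the URL, so the first match is the highest price and the last match is the lowest (on that page):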
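import requests
from bs4 import BeautifulSoup

# listings sorted by price, highest first
r = requests.get("http://orlando.craigslist.org/search/apa?sort=pricedsc")
html = r.content
soup = BeautifulSoup(html, "html.parser")
# strip the leading "$" from each price tag and convert to int
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]
print("Highest price: ${}".format(prices[0]))
print("Lowest price: ${}".format(prices[-1]))
print("Average price: ${}".format(sum(prices, 0.0)/len(prices)))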
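Reading the highest and lowest price by index only works because the page is already sorted, and the average divides by zero if no prices were found. As a minimal sketch (the `summarize` helper and the sample numbers are my own illustration, not part of the scraped data), the same three figures can be computed without relying on the sort order:

```python
def summarize(prices):
    """Return (highest, lowest, average) for a list of numbers, or None if it is empty."""
    if not prices:
        # no prices were scraped, so there is nothing to average
        return None
    return max(prices), min(prices), sum(prices, 0.0) / len(prices)


sample = [1200, 950, 1875]  # made-up prices for illustration only
result = summarize(sample)
if result is not None:
    highest, lowest, average = result
    print("Highest price: ${}".format(highest))
    print("Lowest price: ${}".format(lowest))
    print("Average price: ${:.2f}".format(average))
```

In the snippet above you would pass the scraped `prices` list instead of `sample`.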
If you want the lowest prices first, you need ascending order:
import requests
from bs4 import BeautifulSoup

# listings sorted by price, lowest first
r = requests.get("http://orlando.craigslist.org/search/apa?sort=priceasc")
html = r.content
soup = BeautifulSoup(html, "html.parser")
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]
print("Highest price: ${}".format(prices[-1]))
print("Lowest price: ${}".format(prices[0]))
print("Average price: ${}".format(sum(prices, 0.0)/len(prices)))
Now the output is very different:
Highest price: $70
Lowest price: $1
Average price: $34.89
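One caveat with `int(pr.text.strip("$"))`: it raises a ValueError as soon as a price contains a thousands separator such as "$1,200", or when the tag holds no number at all. A rough sketch of a more forgiving conversion follows; `parse_price` is a hypothetical helper of mine, it assumes whole-dollar prices, and the strings in `raw` are invented examples:

```python
def parse_price(text):
    """Convert text such as "$1,200" to the int 1200; return None if it is not a whole number."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        # not a plain integer (empty, words, decimals), so skip it
        return None


raw = ["$70", "$1,200", " $950 ", "call"]  # made-up examples, not scraped data
prices = [p for p in (parse_price(t) for t in raw) if p is not None]
print(prices)
```

With the scraping code you would call `parse_price(pr.text)` for each `pr` in `soup.select("span.price")`.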
If you want the average over all listings, you need to add more logic. By default you only see 100 of 2500
results, but we can change that:
import requests
from bs4 import BeautifulSoup

r = requests.get("http://orlando.craigslist.org/search/apa")
html = r.content
soup = BeautifulSoup(html, "html.parser")
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]
# link to next 100 results (None if there is no next button)
nxt = soup.select_one("a.button.next")
nxt = nxt["href"] if nxt else None
# keep looping until we find a page with no next button
while nxt:
    url = "http://orlando.craigslist.org{}".format(nxt)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # extend our list with the prices from this page
    prices.extend([int(pr.text.strip("$")) for pr in soup.select("span.price")])
    nxt = soup.select_one("a.button.next")
    if nxt:
        nxt = nxt["href"]
That will give you every listing, from 1 to 2500.
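To try the paging logic without hitting the site, the loop can be separated from the fetching. This is only a sketch under my own assumptions: `collect_prices`, the `fetch_page` callable and the `fake_pages` data are all hypothetical, and the `seen` set is an extra guard so a page that links back to itself cannot loop forever.

```python
def collect_prices(fetch_page, start_path):
    """Follow "next" links starting at start_path and gather every price.

    fetch_page is a hypothetical callable: given a path it returns
    (prices_on_that_page, next_path), with next_path None on the last page.
    """
    all_prices = []
    seen = set()
    path = start_path
    # stop when there is no next link or when a path repeats
    while path and path not in seen:
        seen.add(path)
        prices, path = fetch_page(path)
        all_prices.extend(prices)
    return all_prices


# Fake three-page site standing in for the real search results (made-up data).
fake_pages = {
    "/search/apa": ([900, 1200], "/search/apa?s=100"),
    "/search/apa?s=100": ([750], "/search/apa?s=200"),
    "/search/apa?s=200": ([1500, 640], None),
}

all_prices = collect_prices(fake_pages.get, "/search/apa")
print("Listings: {}".format(len(all_prices)))
print("Average price: ${}".format(sum(all_prices, 0.0) / len(all_prices)))
```

Against the real site, `fetch_page` would wrap the `requests.get` call and the two `soup.select` lookups from the loop above and return the prices plus the next button's href.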
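Does every listing appear on that page if you open it in a browser? It doesn't for me. – usr2564301
Oh, so it will only show the highest price on the first page? Sorry, I didn't know that, my apologies. – PostMagne
A side note: you should use an HTML parser like Beautiful Soup rather than regular expressions. – Keatinge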