2016-04-17 19 views
1

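Web scraping Craigslist apartment prices in Python does not show the highest-priced apartment

My script reports that the highest apartment price is $4700, but I have seen listings priced at over a million. Why aren't those showing up? What am I doing wrong?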

import requests 
import re 


r = requests.get("http://orlando.craigslist.org/search/apa") 
r.raise_for_status() 

html = r.text 


matches = re.findall(r'<span class="price">\$(\d+)</span>', html) 
# Build a real list so max(), min(), sum() and len() can all reuse it
prices = list(map(int, matches))


print("Highest price: ${}".format(max(prices)))
print("Lowest price: ${}".format(min(prices)))
print("Average price: ${}".format(sum(prices)/len(prices)))
+0

Does every listing appear on that page when you open it in a browser? It doesn't for me. – usr2564301

+0

Oh, so it only shows the highest price on the front page? Sorry, I didn't know that, my apologies. – PostMagne

+1

A side note: you should use an HTML parser such as Beautiful Soup rather than a regular expression. – Keatinge

Answers

1

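Use an HTML parser; bs4 is very easy to use. You can also sort by price by adding ?sort=pricedsc to the URL, so that the first match is the maximum and the last match is the minimum (for that page):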

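import requests
from bs4 import BeautifulSoup

r = requests.get("http://orlando.craigslist.org/search/apa?sort=pricedsc")
html = r.content

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, "html.parser")
# Collect every listed price on this page as an int
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]
# Sorted descending, so the first price is the highest on this page
print("Highest price: ${}".format(prices[0]))
print("Lowest price: ${}".format(prices[-1]))
print("Average price: ${}".format(sum(prices, 0.0) / len(prices)))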
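The snippet above finds the highest and lowest price by position, which only works while the server-side sort order holds. As a minimal sketch that does not depend on ordering, the statistics can be computed directly from the list. The `summarize` helper and the sample values are hypothetical, introduced here only for illustration:

```python
def summarize(prices):
    """Return (highest, lowest, average) for a non-empty list of prices."""
    if not prices:
        raise ValueError("no prices to summarize")
    # float() keeps the average fractional on Python 2 as well as Python 3
    return max(prices), min(prices), sum(prices) / float(len(prices))


# Hypothetical sample values, not real Craigslist data
sample = [950, 4700, 1200, 700]
highest, lowest, average = summarize(sample)
print("Highest price: ${}".format(highest))
print("Lowest price: ${}".format(lowest))
print("Average price: ${:.2f}".format(average))
```

Either way, the numbers still describe only the prices that were actually scraped, so a single results page yields per-page statistics.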

If you want the lowest prices, you need to sort in ascending order instead:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://orlando.craigslist.org/search/apa?sort=priceasc")
html = r.content

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, "html.parser")
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]
# Sorted ascending, so the last price is the highest on this page
print("Highest price: ${}".format(prices[-1]))
print("Lowest price: ${}".format(prices[0]))
print("Average price: ${}".format(sum(prices, 0.0) / len(prices)))

Now the output is very different:

Highest price: $70 
Lowest price: $1 
Average price: $34.89 
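One caveat about the list comprehension used above: `int(pr.text.strip("$"))` raises `ValueError` if the price text ever contains anything other than digits, for instance a thousands separator. A more tolerant parser might look like the following sketch. `parse_price` is a hypothetical helper, and whether the site actually emits such strings is an assumption, not something confirmed here:

```python
import re


def parse_price(text):
    """Return the price in whole dollars, or None if the text has no digits."""
    # Grab the first run of digits, allowing commas inside it
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group(0).replace(",", ""))


print(parse_price("$4700"))
print(parse_price(" $1,200 "))
print(parse_price("call"))
```

Entries for which it returns `None` can then be filtered out before computing the maximum, minimum, and average.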

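If you want the average over all listings, you need to add more logic. By default you only see 100 of the 2500 results, but we can change that by following the link to the next page: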

import requests
from bs4 import BeautifulSoup

base = "http://orlando.craigslist.org"
r = requests.get(base + "/search/apa")
soup = BeautifulSoup(r.content, "html.parser")
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]

# link to the next 100 results (None when the page has no next button)
nxt = soup.select_one("a.button.next")

# keep looping until we find a page with no usable next link
while nxt is not None and nxt.get("href"):
    r = requests.get(base + nxt["href"])
    soup = BeautifulSoup(r.content, "html.parser")
    # add this page's prices to our list
    prices.extend(int(pr.text.strip("$")) for pr in soup.select("span.price"))
    nxt = soup.select_one("a.button.next")

This will give you the prices for all results, from 1 to 2500.
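The pagination loop can also be exercised without touching the network. The sketch below is written for Python 3 and swaps in the standard library's `html.parser` in place of bs4 so that it runs with no extra packages. `ListingPageParser`, `collect_prices`, the `max_pages` limit, and the `FAKE_PAGES` data (including its `?s=100` path) are all hypothetical names and values for illustration, assuming the same `span.price` and `a.button.next` markup that the code above relies on:

```python
from html.parser import HTMLParser


class ListingPageParser(HTMLParser):
    """Collect the prices and the 'next' link from one results page."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self.next_href = None
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "price" in classes:
            self._in_price = True
        elif tag == "a" and "button" in classes and "next" in classes:
            # Keep the first next link; an empty href counts as "no next page"
            self.next_href = self.next_href or attrs.get("href") or None

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(int(data.strip().lstrip("$")))


def collect_prices(fetch, start_path, max_pages=50):
    """Follow next links from start_path; fetch(path) must return HTML text."""
    prices, seen, path = [], set(), start_path
    # Stop on a missing link, a repeated page, or the page limit
    while path and path not in seen and len(seen) < max_pages:
        seen.add(path)
        parser = ListingPageParser()
        parser.feed(fetch(path))
        parser.close()
        prices.extend(parser.prices)
        path = parser.next_href
    return prices


# Hypothetical two-page site standing in for the real search results
FAKE_PAGES = {
    "/search/apa": '<span class="price">$900</span>'
                   '<span class="price">$1500</span>'
                   '<a class="button next" href="/search/apa?s=100">next</a>',
    "/search/apa?s=100": '<span class="price">$700</span>'
                         '<a class="button next" href="">next</a>',
}

prices = collect_prices(FAKE_PAGES.__getitem__, "/search/apa")
print(prices)
```

Passing the fetch function in as a parameter keeps the loop testable. For the live site, `fetch` could wrap `requests.get` and return the response text. The `seen` set and `max_pages` guard against a next link that loops back on itself.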