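Web scraping Craigslist apartment prices with Python doesn't show the highest-priced apartment

It reports the highest apartment price as $4700, even though I can see listings priced at over a million. Why don't those show up? What exactly am I doing wrong?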

import requests 
import re 


r = requests.get("http://orlando.craigslist.org/search/apa") 
r.raise_for_status() 

html = r.text 


matches = re.findall(r'<span class="price">\$(\d+)</span>', html) 
prices = map(int, matches) 


print "Highest price: ${}".format(max(prices)) 
print "Lowest price: ${}".format(min(prices)) 
print "Average price: ${}".format(sum(prices)/len(prices))

Source: 2016-04-17, PostMagne

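Does every listing appear on that page when you open it in a browser? It doesn't for me. – usr2564301

Oh, so it only shows the highest price on the first page? Sorry, I didn't know that, my apologies. – PostMagne

A side note: you should use an HTML parser such as Beautiful Soup rather than regular expressions. – Keatinge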
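To illustrate the side note above, here is a small offline sketch of why a literal regex can miss listings that a parser still finds. The sample markup is invented for the example (it is not copied from Craigslist), and the standard library's `html.parser` stands in for Beautiful Soup so the snippet runs without extra installs. The class name `PriceParser` is hypothetical. Written for Python 3.

```python
import re
from html.parser import HTMLParser

# Invented sample markup: same data, three slightly different spellings.
SAMPLE = (
    '<span class="price">$950</span>'
    '<span class="price featured">$1200</span>'
    "<span class='price'>$700</span>"
)


class PriceParser(HTMLParser):
    """Collect the text of every <span> whose class list contains 'price'."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price and data.strip():
            self.prices.append(int(data.strip().lstrip("$")))


# The regex from the question only matches the exact attribute spelling.
regex_prices = [
    int(m) for m in re.findall(r'<span class="price">\$(\d+)</span>', SAMPLE)
]

parser = PriceParser()
parser.feed(SAMPLE)
parser_prices = parser.prices

print(regex_prices)
print(parser_prices)
```

The regex only picks up the first span, because an extra class or a different quote style breaks the literal match, while the parser looks at the parsed attributes and finds all three.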

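An HTML parser such as bs4 is very easy to use, and you can order the results by price by adding ?sort=pricedsc to the URL, so that the first match is the highest price and the last match is the lowest (on that page):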

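import requests
from bs4 import BeautifulSoup

# sort=pricedsc orders the results by price, highest first
r = requests.get("http://orlando.craigslist.org/search/apa?sort=pricedsc")
html = r.content

soup = BeautifulSoup(html)
# each listing's price sits in a <span class="price"> element
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]
print "Highest price: ${}".format(prices[0])
print "Lowest price: ${}".format(prices[-1])
print "Average price: ${}".format(sum(prices, 0.0)/len(prices))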

If you want the lowest price, you need to sort in ascending order:

r = requests.get("http://orlando.craigslist.org/search/apa?sort=priceasc") 
from bs4 import BeautifulSoup 

html = r.content 

soup = BeautifulSoup(html) 
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")] 
print "Highest price: ${}".format(prices[-1]) 
print "Lowest price: ${}".format(prices[0]) 
print "Average price: ${}".format(sum(prices, 0.0)/len(prices))

Now the output is very different:

Highest price: $70 
Lowest price: $1 
Average price: $34.89
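One caveat with `int(pr.text.strip("$"))`: it assumes every price is plain digits after the dollar sign. Whether the site ever renders thousands separators or non-numeric text is not something the snippets above establish, so the following is only a defensive sketch under that assumption. The helper name `parse_price` and the sample strings are made up for illustration (Python 3).

```python
def parse_price(text):
    """Return the dollar amount in text as an int, or None if it is not numeric."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return int(cleaned) if cleaned.isdecimal() else None


# Hypothetical raw strings, as they might come out of pr.text
raw = ["$70", "$1,200", " $1 ", "call for price"]

# Drop anything that could not be parsed instead of crashing on it
prices = [p for p in map(parse_price, raw) if p is not None]
print(prices)
```

In the list comprehensions above, `parse_price(pr.text)` could replace `int(pr.text.strip("$"))`, with `None` results filtered out before calling `max`, `min`, or `sum`.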

If you want the average over all listings, you need to add more logic. By default you only see 100 of the 2500 results, but we can change that.

r = requests.get("http://orlando.craigslist.org/search/apa") 
from bs4 import BeautifulSoup 

html = r.content 

soup = BeautifulSoup(html) 
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")] 

# link to next 100 results 
nxt = soup.select_one("a.button.next")["href"] 

# keep looping until we find a page with no next button 
while nxt: 
    url = "http://orlando.craigslist.org{}".format(nxt) 
    r = requests.get(url) 
    soup = BeautifulSoup(r.content) 
    # extend prices to our list 
    prices.extend([int(pr.text.strip("$")) for pr in soup.select("span.price")]) 
    nxt = soup.select_one("a.button.next") 
    if nxt: 
        nxt = nxt["href"]

That will give you every listing, from 1 to 2500.
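The loop above can also be separated from the network code, which makes it easy to try out without hitting the site. This is only a sketch: `collect_prices`, `summarize`, `fetch_page`, `max_pages`, and the `FAKE_PAGES` data (including its `?s=100` style paths) are all hypothetical names and values invented for the example. In real use, `fetch_page` would wrap the `requests.get` and `soup.select` calls shown above and return the page's prices together with the next link.

```python
def collect_prices(fetch_page, start_path, max_pages=50):
    """Follow 'next' links from start_path and gather every price seen.

    fetch_page(path) must return (prices, next_path); next_path is None
    (or empty) on the last page. max_pages guards against an endless loop.
    """
    prices = []
    path = start_path
    pages = 0
    while path and pages < max_pages:
        page_prices, path = fetch_page(path)
        prices.extend(page_prices)
        pages += 1
    return prices


def summarize(prices):
    """Return (highest, lowest, average) for a non-empty list of prices."""
    return max(prices), min(prices), sum(prices) / float(len(prices))


# Stand-in for the real site: three fake result pages keyed by path.
FAKE_PAGES = {
    "/search/apa": ([900, 1200], "/search/apa?s=100"),
    "/search/apa?s=100": ([4700, 650], "/search/apa?s=200"),
    "/search/apa?s=200": ([1000000], None),
}

all_prices = collect_prices(FAKE_PAGES.get, "/search/apa")
highest, lowest, average = summarize(all_prices)
print("Highest price: ${}".format(highest))
print("Lowest price: ${}".format(lowest))
print("Average price: ${}".format(average))
```

With the fake data, the million-dollar listing only shows up because the third page is visited, which is the same reason the original script topped out at $4700 on the first page.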

Source: 2016-04-17 17:18:00
