Accessing dynamically generated website content with Python requests


I am trying to collect data from a few websites using Python (BeautifulSoup). However, sometimes it is hard to get the search results, for example:

import requests
from bs4 import BeautifulSoup

url1 = 'https://auto.ria.com/legkovie/city/vinnica/?page=1'
url2 = 'https://auto.ria.com/search/?top=11&category_id=1&state[0]=1'

def get_value(url):
    r = requests.get(url, headers={'Accept-Encoding': 'deflate'})
    print("Response Time: {}".format(r.elapsed.total_seconds()))

    # Parse the HTML and pull the <strong> inside the results counter.
    soup = BeautifulSoup(r.text, 'lxml')
    data = soup.find('span', attrs={'id': 'resultsCount'}).find('strong')
    print('{} \n'.format(data))

get_value(url1)
get_value(url2)

The output is:

Response Time: 5.4943 
<strong class="count">5 310</strong> 

Response Time: 0.174867 
<strong class="count">0</strong>

In the browser, however, url2 shows 338 results. I think the search results come from some JSON, but how can I access it with requests?


2016-10-11 trina24

"I think the search results come from some JSON": have you tried to find it?

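I suggest taking a closer look at the soup object to see what is actually in it. You could try using findAll instead of find and print the result. You could also drop the final .find call (the one for the strong tag) and print the result. Once you inspect the larger object, you will most likely see what is going on. It may be that url2 is marked up differently, in which case you will have to adjust your function to match.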
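As a minimal sketch of that kind of inspection, the snippet below parses a small inline HTML string instead of a live page. SAMPLE_HTML and inspect_counter are hypothetical names made up for illustration, and the markup is only an assumed stand-in for what r.text might contain; find_all is the current bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Assumed sample markup for illustration; a real run would use r.text instead.
SAMPLE_HTML = """
<div>
  <span id="resultsCount" class="hide">Found <strong class="count">0</strong> ads</span>
</div>
"""

def inspect_counter(html):
    """Return the whole counter element and every <strong> tag inside it."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", attrs={"id": "resultsCount"})
    if span is None:
        # The element is missing entirely, so there is nothing to inspect.
        return None, []
    return span, span.find_all("strong")

span, strongs = inspect_counter(SAMPLE_HTML)
print(span)                                        # the whole element, attributes included
print(span.get("class"))                           # CSS classes on the outer span
print([s.get_text(strip=True) for s in strongs])   # text of each <strong>
```

Printing the outer element rather than only the inner strong tag shows its attributes and surrounding text, which is often where a difference between two pages becomes visible.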


2016-10-11 15:56:27 Rachel

Your code works fine, and url2 returns the expected result. From the page source as viewed in Chrome:
<span id="resultsCount" class="hide">Найдено <strong class="count">0</strong> объявлений</span>

This is the tag you are looking for with Beautiful Soup. The number shown in Chrome and the output of your program are the same!

<strong class="count">0</strong>

Also, the search results are not returned as JSON. If you check the response headers:

Content-Type: text/html
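The same check can be done from Python. The helper below is only a sketch: looks_like_json is a hypothetical name, and it works on any mapping of header names to values, so it is shown here with plain dictionaries rather than a live response.

```python
def looks_like_json(headers):
    """Return True if the Content-Type header announces a JSON body."""
    content_type = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")

print(looks_like_json({"Content-Type": "text/html"}))                        # False
print(looks_like_json({"Content-Type": "application/json; charset=utf-8"}))  # True
```

With requests, the call would be looks_like_json(r.headers); r.headers is a case-insensitive dictionary, so the lookup works regardless of how the server capitalizes the header name.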

Perhaps you want the result to include the whole tag? If so, try:

data = soup.find('span', attrs = {'id' : 'resultsCount'})


2016-10-11 17:02:24
