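How to retrieve the values of dynamic HTML content using Python

I'm using Python 3 and trying to retrieve data from a website. However, this data is loaded dynamically, and the code I have right now doesn't work: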

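from urllib import request

# eveCentralBaseURL and mineral are defined elsewhere in the program
url = eveCentralBaseURL + str(mineral)
print("URL : %s" % url)

response = request.urlopen(url)
# decode the bytes rather than calling str() on them, which leaves escaped "\n" sequences
data = response.read(10000).decode("utf-8", errors="replace")
print(data)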

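When I try to find a specific value, I find a template placeholder instead, e.g. "{{formatPrice median}}" instead of "4.48".

How can I make it retrieve the value instead of the placeholder text?

Edit: This is the specific page I'm trying to extract information from. I'm trying to get the "median" value, which uses the template {{formatPrice median}}.

Edit 2: I've installed and set up my program to use Selenium and BeautifulSoup.

The code I have now is: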

from bs4 import BeautifulSoup
from selenium import webdriver

#...

driver = webdriver.Firefox()
driver.get(url)

html = driver.page_source
soup = BeautifulSoup(html, "html.parser")

print("Finding...")

for tag in soup.find_all('formatPrice median'):
    print(tag.text)

Here is a screenshot of the program as it executes. Unfortunately, it doesn't seem to find anything with "formatPrice median" specified.


2013-07-11 Tagc

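When you visit the URL in a browser, do you get the template tags? Edit: also, how are your templates rendered? If you're using a JavaScript template engine (e.g. Handlebars), that could mean you'll get the template tags in the response.

Re edit 2: that's really a new question... Anyway, I think you need to look at the documentation for find_all, because your find_all string isn't valid. I'll update below with something closer to what you need: http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#arg-name

Cheers! I tried using soup.find_all(True) to get all the tags, and the information I need is in there! Now it's just a matter of finding which tag I need to search for to get that information. – Tagc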

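Assuming you're trying to get values from a page that is rendered using JavaScript templates (e.g. something like Handlebars), then the placeholder text is what you will get with any of the standard solutions (i.e. BeautifulSoup or requests).

This is because the browser uses JavaScript to alter the content it received and create new DOM elements. urllib does the requesting part like a browser, but not the template rendering part. A good description of the issues can be found here. The article discusses three main solutions:

1. Parse the AJAX JSON directly
2. Use an offline JavaScript interpreter to process the request (SpiderMonkey, crowbar)
3. Use a browser automation tool (splinter)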
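
As a rough illustration of the first option, here is a minimal sketch that parses a JSON payload directly instead of scraping the rendered page. The JSON layout (a "sell" object with a "median" field) and the json_url parameter are assumptions made up for this example; the real endpoint and field names have to be read off the browser's network tab.

```python
import json
from urllib import request


def extract_median(payload):
    """Pull the median price out of a JSON document.

    The {"sell": {"median": ...}} layout is hypothetical; adjust the
    keys to match whatever the site's AJAX endpoint actually returns.
    """
    data = json.loads(payload)
    return float(data["sell"]["median"])


def fetch_median(json_url):
    """Download the JSON that the page's JavaScript would request and parse it.

    json_url is a placeholder for the endpoint found in the network tab.
    """
    with request.urlopen(json_url) as response:
        return extract_median(response.read().decode("utf-8"))


# Offline demonstration with a made-up payload, so no network access is needed.
sample = '{"sell": {"median": 4.48, "volume": 1000}}'
print(extract_median(sample))  # 4.48
```

If the site exposes such an endpoint, this avoids launching a browser at all, which is usually faster and less fragile than rendering the page.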

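This answer provides a few more suggestions for option 3, such as Selenium or Watir. I've used Selenium for automated web testing and it's pretty handy.

EDIT

From your comments it looks like it is a Handlebars-driven site. I'd recommend Selenium and Beautiful Soup. This answer gives a good code example which may be useful: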

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://eve-central.com/home/quicklook.html?typeid=34')

html = driver.page_source
driver.quit()  # close the browser once the rendered HTML has been captured
soup = BeautifulSoup(html, "html.parser")

# check out the docs for the kinds of things you can do with 'find_all'
# this (untested) snippet should find tags with a specific CSS class
# see: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class
# "my_class" is a placeholder; replace it with the class used on the page
for tag in soup.find_all("a", class_="my_class"):
    print(tag.text)

Basically, Selenium gets the rendered HTML from the browser, and then you can parse it with BeautifulSoup from the page_source property. Good luck :)
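
One thing that may trip this up: page_source can be read before the JavaScript templates have finished rendering. A possible way around that, sketched below, is to wait explicitly for the element you want and then check that its text is no longer a {{...}} placeholder. The css_selector argument and the helper names looks_unrendered and fetch_rendered_text are hypothetical; the right selector has to be found by inspecting the page.

```python
import re

# Matches Handlebars-style placeholders such as {{formatPrice median}}.
PLACEHOLDER = re.compile(r"\{\{[^{}]*\}\}")


def looks_unrendered(text):
    """Return True if the text still contains a {{...}} placeholder."""
    return PLACEHOLDER.search(text) is not None


def fetch_rendered_text(url, css_selector, timeout=15):
    """Load url in Firefox and return the text of the first element
    matching css_selector (a hypothetical selector you must supply)."""
    # Imported here so looks_unrendered works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until the element exists in the DOM, or raise TimeoutException.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        text = element.text
    finally:
        driver.quit()  # always close the browser, even on failure

    if looks_unrendered(text):
        raise ValueError("template not rendered yet: %r" % text)
    return text
```

A call might look like fetch_rendered_text(url, "td.median"), where "td.median" is only a stand-in for whatever selector matches the median cell on the real page.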


2013-07-11 17:35:44

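Thanks for your help. I have very little experience with web languages or web-based programming, but if it helps, I'll link the site I'm trying to parse data from. – Tagc

I'll start looking into requests and BeautifulSoup. – Tagc

I had a look at the site - it nearly broke my computer a few times while loading :) Yes, if you hit F12 in Chrome and go to the "Network" tab, you'll see 'Backbone', 'Underscore' and 'Handlebars' all being loaded. I think you'll have to go with the Selenium approach. I'll edit with some sample code.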
