Fill out a JavaScript form with Python?

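I want to parse an HTML page, but I need to filter the results before parsing the page.

For example, 'http://www.ksl.com/index.php?nid=443' is a classifieds listing of cars in Utah. Instead of parsing all the cars, I want to filter the listing first (i.e. find all the BMWs) and then parse only those pages. Is it possible to fill out a JavaScript form with Python?

Here is what I have so far: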

import urllib

# Download the full listings page and save a local copy
content = urllib.urlopen('http://www.ksl.com/index.php?nid=443').read()
f = open('/var/www/bmw.html', "w")
f.write(content)
f.close()

2012-05-03 Marissa Levy

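Are you trying to extract the page's JavaScript by parsing the HTML with Python? It's not very clear from your question. – MikeWyatt

I'm only interested in BMWs, so I want to filter my results before I try to parse the HTML –

I'd like to take this opportunity to link to [the most popular answer in history](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)! –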

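Here is the way to do it. First download the page, scrape it to find the models you are looking for, then collect the links to the new pages to scrape. No JavaScript is needed here. The example below and the BeautifulSoup documentation will help you get started.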

from BeautifulSoup import BeautifulSoup
import urllib2

base_url = 'http://www.ksl.com'
url = base_url + '/index.php?nid=443'
model = "Honda"  # this is the name of the model to look for

# Load the page and process with BeautifulSoup
handle = urllib2.urlopen(url)
html = handle.read()
soup = BeautifulSoup(html)

# Collect all the ad detail boxes from the page
divs = soup.findAll(attrs={"class": "detailBox"})

# For each ad, get the title
# if it contains the word "Honda", get the link
for div in divs:
    title = div.find(attrs={"class": "adTitle"}).text
    if model in title:
        link = div.find(attrs={"class": "listlink"})["href"]
        link = base_url + link
        # Now you have a link that you can download and scrape
        print title, link
    else:
        print "No match: ", title

At the time of answering, this snippet looked for Honda models and returned the following:

1995- Honda Prelude http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817797 
No match: 1994- Ford Escort 
No match: 2006- Land Rover Range Rover Sport 
No match: 2006- Nissan Maxima 
No match: 1957- Volvo 544 
No match: 1996- Subaru Legacy 
No match: 2005- Mazda Mazda6 
No match: 1995- Chevrolet Monte Carlo 
2002- Honda Accord http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817784 
No match: 2004- Chevrolet Suburban (Chevrolet) 
1998- Honda Civic http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817779 
No match: 2004- Nissan Titan 
2001- Honda Accord http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817770 
No match: 1999- GMC Yukon 
No match: 2007- Toyota Tacoma
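
The snippet above targets Python 2 and BeautifulSoup 3. For readers on Python 3, the same filter-then-follow idea can be sketched with the standard library alone. This is only an illustration: the class names (`detailBox`, `adTitle`, `listlink`) come from the answer's code, while the sample markup, the ad IDs, and the helper names `AdParser` and `matching_links` are invented here, and the live page may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Invented sample markup; only the class names come from the answer above.
SAMPLE_HTML = """
<div class="detailBox">
  <span class="adTitle">1995- Honda Prelude</span>
  <a class="listlink" href="/index.php?nid=443&amp;ad=1">details</a>
</div>
<div class="detailBox">
  <span class="adTitle">1994- Ford Escort</span>
  <a class="listlink" href="/index.php?nid=443&amp;ad=2">details</a>
</div>
<div class="detailBox">
  <span class="adTitle">2002- Honda Accord</span>
  <a class="listlink" href="/index.php?nid=443&amp;ad=3">details</a>
</div>
"""


class AdParser(HTMLParser):
    """Collect (title, href) pairs, one per detailBox div."""

    def __init__(self):
        super().__init__()
        self.ads = []            # finished (title, href) pairs
        self._div_depth = 0      # <div> nesting depth inside the current box
        self._title_tag = None   # tag name of the open adTitle element
        self._title = []         # text fragments of the current title
        self._href = None        # href of the current listlink element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1
            elif "detailBox" in classes:
                self._div_depth = 1
                self._title, self._href = [], None
        if not self._div_depth:
            return
        if "listlink" in classes and "href" in attrs:
            self._href = attrs["href"]
        if "adTitle" in classes and self._title_tag is None:
            self._title_tag = tag

    def handle_data(self, data):
        if self._title_tag:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == self._title_tag:
            self._title_tag = None
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
            if self._div_depth == 0:
                self.ads.append(("".join(self._title).strip(), self._href))


def matching_links(html, make, base_url):
    """Return (title, absolute link) for every ad whose title contains make."""
    parser = AdParser()
    parser.feed(html)
    return [(title, urljoin(base_url, href))
            for title, href in parser.ads
            if make in title and href]


matches = matching_links(SAMPLE_HTML, "Honda", "http://www.ksl.com")
for title, link in matches:
    print(title, link)
```

Against a real page, the HTML string would come from `urllib.request.urlopen(url).read().decode()` instead of `SAMPLE_HTML`. A dedicated parser such as Beautiful Soup is usually more forgiving of messy markup than this hand-rolled class.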

2012-05-03 20:34:09 gauden

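That's an option, but beyond this specific example, I'd like to know how to do a JavaScript query with Python, for my general understanding –

I guess it depends on what you're using it for. Many [Python to JavaScript libraries are covered here](http://stackoverflow.com/questions/683462/best-way-to-integrate-python-and-javascript), which may give you good leads. On another track entirely, you might be interested in [Selenium](http://seleniumhq.org/), which has a Python library for automated browsing / web testing, or the [mechanize module](http://wwwsearch.sourceforge.net/mechanize/) for filling in forms...? – gauden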
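
On the general question: many search forms, including ones wired up with JavaScript, end up sending an ordinary HTTP request whose query string or POST body carries the field values. When that is the case, the request can be built directly in Python without running any JavaScript. The following is a minimal Python 3 sketch under that assumption. The `make` field name and the helper `build_search_url` are hypothetical; the real field names would have to be read from the page's form markup or the browser's network inspector, and nothing here is taken from ksl.com's actual form.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request


def build_search_url(base_url, extra_params):
    """Return base_url with extra_params merged into its query string."""
    parts = urlsplit(base_url)
    params = dict(parse_qsl(parts.query))
    params.update(extra_params)
    return urlunsplit(parts._replace(query=urlencode(params)))


# "make" is a hypothetical field name used for illustration only.
filtered_url = build_search_url(
    "http://www.ksl.com/index.php?nid=443", {"make": "BMW"}
)
print(filtered_url)

# The same field sent as a POST body, as a form with method="post" would do.
post_request = Request(
    "http://www.ksl.com/index.php?nid=443",
    data=urlencode({"make": "BMW"}).encode("ascii"),
)

# urllib.request.urlopen(filtered_url) or urlopen(post_request) would send
# the request; the call is left out so the sketch runs without network access.
```

If the filtering really happens in client-side JavaScript rather than in the request, a browser-automation tool such as the Selenium library mentioned in the comment above is the usual alternative.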

-1

If you're using Python, Beautiful Soup is what you're looking for.

2012-05-03 19:45:45 aldux

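I've looked through the documentation, but I didn't see anything about JavaScript.... –

Indeed. This isn't about JavaScript but about Python, since you're using urllib in Python to fetch the data. – aldux

But I need to filter the results before I can parse them in Python w/ urllib –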
