2012-05-29

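Parsing a JavaScript href with Python

I've been having a lot of trouble with this. I'm new enough to Python that I may simply not know the right search terms to find the answer myself, so apologies if that's the case. I'm not even sure the JavaScript in the link is the cause, but that's my best guess.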

Here is the part of the HTML I'm parsing:

... 
<div class="promotion"> 
    <div class="address"> 
     <a href="javascript:PropDetail2('57795471:MRMLS')" title="View property detail for 5203 Alhama Drive">5203 Alhama Drive</a> 
    </div> 
</div> 
... 

...and the Python I'm using (this is the version that has come closest to working):

from bs4 import BeautifulSoup

with open("homeFinderHTML.html") as f:
    homeFinderSoup = BeautifulSoup(f, "html5lib")
addressClass = homeFinderSoup.find_all('div', 'address')
for row in addressClass:
    print(row.get('href'))

...which returns:

None 
None 
None 

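Without digging into the docs, it looks like your code loops over all the divs with class "address" and looks for an href attribute on them, which they don't have. You need to get the anchor (<a>) tags inside those divs and then read the href attribute of THOSE to get what you're looking for.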
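A minimal sketch of the point above, using the HTML fragment from the question. It parses with Python's built-in "html.parser" backend (an assumption; the question uses html5lib, and either should behave the same for this fragment). The <div> carries only a class attribute, so .get('href') returns None, while the nested <a> holds the href:

```python
from bs4 import BeautifulSoup

# The HTML fragment from the question.
HTML = """
<div class="promotion">
    <div class="address">
     <a href="javascript:PropDetail2('57795471:MRMLS')" title="View property detail for 5203 Alhama Drive">5203 Alhama Drive</a>
    </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
div = soup.find("div", class_="address")

# The <div> has only a class attribute, so there is no href to read.
print(div.attrs)
div_href = div.get("href")
print(div_href)

# The href lives on the <a> tag nested inside the <div>.
link_href = div.find("a").get("href")
print(link_href)
```

Note that the javascript: prefix plays no special role here; to BeautifulSoup the href is just a string attribute like any other.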


I was having trouble navigating the tree; the lists kept throwing me off. That points me in the right direction, thanks.

Answer

from bs4 import BeautifulSoup

# Create soup from the html. (Here I am assuming that you have already read the file into
# the variable "html" as a string).
soup = BeautifulSoup(html, "html.parser")
# Find all divs with class="address"
address_class = soup.find_all('div', {"class": "address"})
# Loop over the results
for row in address_class:
    # Each result has one <a> tag, and we need to get the href property from it.
    print(row.find('a').get('href'))
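For reference, a self-contained Python 3 sketch of the same approach. It assumes the question's HTML is held in a string rather than read from homeFinderHTML.html, skips any address div that has no link instead of crashing, and also shows an equivalent CSS selector via soup.select():

```python
from bs4 import BeautifulSoup

# Assumption: the question's HTML inlined as a string; in practice you would
# read it from homeFinderHTML.html instead.
HTML = """
<div class="promotion">
    <div class="address">
     <a href="javascript:PropDetail2('57795471:MRMLS')" title="View property detail for 5203 Alhama Drive">5203 Alhama Drive</a>
    </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Same loop as the answer, but tolerant of divs without an <a href="...">.
hrefs = []
for row in soup.find_all("div", class_="address"):
    link = row.find("a")  # first <a> inside this div, or None if there is none
    if link is not None and link.has_attr("href"):
        hrefs.append(link["href"])

# Equivalent one-liner: <a> elements with an href, inside div.address.
same = [a["href"] for a in soup.select("div.address a[href]")]

print(hrefs)
print(same)
```

The None check matters because find() returns None when nothing matches, and calling .get() on None raises AttributeError.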

This works great, thanks. I had been trying .find_all() before, and that didn't work.
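On why .find_all() didn't work in place of .find(): find_all() always returns a ResultSet, which is a list of tags even when only one tag matches, and a list has no .get() method. A small sketch (HTML again inlined from the question as an assumption) showing the failure and two ways around it, indexing or looping:

```python
from bs4 import BeautifulSoup

# Assumption: the question's HTML inlined as a string.
HTML = """
<div class="address">
 <a href="javascript:PropDetail2('57795471:MRMLS')" title="View property detail for 5203 Alhama Drive">5203 Alhama Drive</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
row = soup.find("div", class_="address")

# find_all() returns a ResultSet (a list subclass), even for a single match.
links = row.find_all("a")

try:
    links.get("href")  # a list of tags has no .get()
    failed = False
except AttributeError:
    failed = True

# Either index into the list...
first_href = links[0].get("href")
# ...or loop over it to collect every href.
all_hrefs = [a.get("href") for a in links]

print(failed, first_href, all_hrefs)
```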