Python - Getting data from a node

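I have been trying to get information from a website, and recently found that the data I want is stored in childNodes[0].data. I am very new to Python and have never tried scripting against a website before.

Someone told me I could create a tmp.xml file and extract the information from there, but since that only fetches the page source (which I don't think is useful to me), I'm not getting any results.

Current code: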

response = urllib2.urlopen(get_link) 
html = response.read() 
with open("tmp.xml", "w") as f: 
    f.write(html) 
dom = parse("tmp.xml") 
name = dom.getElementsByTagName("name[0].firstChild.nodeValue")

I also tried 'dom = parse(html)', with no better result.

Source

2013-11-09 user2974787

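getElementsByTagName() takes an element name, not an expression. It is very unlikely that the page contains a tag literally named <name[0].firstChild.nodeValue>. Pass only the tag name, then index the returned list and read firstChild.nodeValue from the element you get back.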
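To make the difference concrete, here is a minimal sketch using the standard-library minidom. The sample XML and the helper name first_name_value are made up for illustration; the structure of the real page is unknown, so treat this as an assumption rather than a drop-in fix.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; the real page's structure is unknown.
SAMPLE_XML = "<items><name>first</name><name>second</name></items>"


def first_name_value(xml_text):
    """Return the text of the first <name> element, or None if absent."""
    dom = parseString(xml_text)
    # Pass only the tag name; the result is a list-like NodeList.
    names = dom.getElementsByTagName("name")
    if not names or names[0].firstChild is None:
        return None
    # Index the result and walk to the text node afterwards.
    return names[0].firstChild.nodeValue


print(first_name_value(SAMPLE_XML))
```

The indexing and the firstChild.nodeValue lookup happen in Python, after the call returns, rather than inside the string passed to getElementsByTagName().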

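If you are loading HTML, use an HTML parser instead, such as BeautifulSoup. For XML, the ElementTree API is easier to use than the (archaic and very verbose) DOM API.

Neither approach requires you to save the source to disk first; both APIs can parse directly from the response object returned by urllib2.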

# HTML 
import urllib2 
from bs4 import BeautifulSoup 

response = urllib2.urlopen(get_link) 
soup = BeautifulSoup(response.read(), from_encoding=response.headers.getparam('charset')) 

print soup.find('title').text

or

# XML 
import urllib2 
from xml.etree import ElementTree as ET 

response = urllib2.urlopen(get_link) 
tree = ET.parse(response) 

print tree.find('elementname').text
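If you want to try the XML variant without a network connection, the sketch below feeds ElementTree an in-memory stream instead. The payload FAKE_BODY and the helper parse_from_stream are hypothetical stand-ins; io.BytesIO plays the role of the response object, on the assumption that all ET.parse() needs is a file-like object with a read() method.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical payload standing in for the bytes a server would return.
FAKE_BODY = b"<root><elementname>hello</elementname></root>"


def parse_from_stream(stream):
    """Parse XML from any file-like object that has a read() method."""
    return ET.parse(stream)


# io.BytesIO stands in for the urllib2 response object here.
tree = parse_from_stream(io.BytesIO(FAKE_BODY))
element = tree.find("elementname")
# find() returns None when nothing matches, so check before reading .text.
text = element.text if element is not None else None
print(text)
```

Checking for None before reading .text avoids an AttributeError when the element you are looking for is not in the document.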

Source

2013-11-09 20:27:11
