How do I display the values of all IDs on a web page using Python 2.7?

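I need to display the values of all IDs on a given website. Is there a function in urllib or urllib2 that lets me read the site and then print the values that follow "id="? Any help would be appreciated.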

2013-01-15 user1981656

XPath '//*[string-length(@id) > 0]', iterate over the result set, and print out the id attribute values. –

@MarcB: you need to tell him how to run the XPath query on the data *first*. –

Thanks, I was just about to ask that. Thanks for your comments so far. =) – user1981656
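To make the XPath suggestion concrete, here is a minimal sketch of running that query on page data. It assumes the third-party lxml library is installed; the function name `ids_via_xpath` and the sample markup are hypothetical.

```python
# Sketch: run the XPath '//*[string-length(@id) > 0]' with lxml (assumed installed).
from lxml import html


def ids_via_xpath(page_source):
    """Return the id values of all elements whose id attribute is non-empty."""
    tree = html.fromstring(page_source)
    # string-length(@id) > 0 keeps only elements with a non-empty id attribute
    return [el.get("id") for el in tree.xpath("//*[string-length(@id) > 0]")]


if __name__ == "__main__":
    # Hypothetical sample; the empty id on the last <p> is skipped by the query
    sample = "<html><body id='test'><p id='test2'>t</p><p id=''>x</p></body></html>"
    for value in ids_via_xpath(sample):
        print(value)
```

The page source itself could come from urllib2 or requests; lxml only needs the HTML text.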

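There is the obvious (but ugly) regex solution: fetch the page with urllib or urllib2, or more conveniently with the requests library, and then apply a regular expression. I would suggest the pyquery package instead. It's like jQuery, but for Python, and uses CSS selectors to get nodes.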

For your question:

from pyquery import PyQuery 

page = """ 
<html> 
    <body id='test'> 
    <p id='test2'>some text</p> 
    </body> 
</html> 
""" 

doc = PyQuery(page) 
for node in doc("*[id]").items(): 
    print(node.attr.id)

This prints:

test 
test2

And to download the page:

import requests 
page = requests.get("http://www.google.fr").text

pyquery can even open URLs itself, using urllib or requests.


2013-01-15 21:51:42 Scharron

I would do this using BeautifulSoup and requests. I put together a simple example and posted it on GitHub.

Note that the real work here happens in the return statement; most of the rest is boilerplate.

from bs4 import BeautifulSoup as BS 
import requests as r 

def get_ids_from_page(page): 
    response = r.get(page) 
    # Name the parser explicitly so bs4 does not have to guess (and warn about it)
    soup = BS(response.content, 'html.parser').body 

    return sorted([x.get('id') for x in soup.find_all() if x.get('id') is not None]) 

if __name__ == '__main__': 
    # In response to the question at the URL below - in short "How do I get the 
    # ids from all objects on a page in Python?" 
    ids = get_ids_from_page('http://stackoverflow.com/questions/14347086/') 

    for val in ids: 
        print(val)
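A variant of the same idea, assuming bs4 is installed: passing `id=True` to `find_all` asks BeautifulSoup for only the tags that carry an id attribute, so the separate `is not None` filter is no longer needed. This sketch takes an HTML string (for example `requests.get(url).text`) instead of a URL and searches the whole document rather than only `<body>`; the name `get_ids_from_html` is hypothetical.

```python
# Sketch assuming bs4 is installed; parses an HTML string instead of fetching a URL.
from bs4 import BeautifulSoup


def get_ids_from_html(html_text):
    """Return the sorted id values of every element that has an id attribute."""
    soup = BeautifulSoup(html_text, "html.parser")
    # id=True keeps only tags that have an id attribute (any value)
    return sorted(tag["id"] for tag in soup.find_all(id=True))


if __name__ == "__main__":
    for val in get_ids_from_html("<html><body id='b'><p id='a'>x</p><p>y</p></body></html>"):
        print(val)
```

Keeping the download separate from the parsing makes the function easy to try on a saved page or a string literal.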


2013-01-15 22:00:26

You can use a regular expression:

import re 

id_list = re.findall('id="(.*?)"', html_text)

Or something a bit more elaborate (to make sure you only extract it from inside HTML tags):

id_list = re.findall('<[^>]*? id="(.*?)"', html_text)

This approach also makes it easy to extract only particular kinds of IDs (those matching some special pattern).
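A minimal sketch of that idea, using a hypothetical helper named `ids_matching`: it extends the tag-anchored regex to accept either single or double quotes (and `ID=` in any case), then keeps only the ids that match a caller-supplied pattern. Like any regex approach it is not a real HTML parser, so comments, scripts, or unquoted attribute values can trip it up.

```python
import re

# Matches id="..." or id='...' inside a start tag. Group 1 is the opening quote,
# and the backreference \1 requires the same quote character to close the value.
ID_ATTR = re.compile(r"""<[^>]*?\sid\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def ids_matching(html_text, pattern):
    """Return the id values (group 2) in which the regex `pattern` is found."""
    wanted = re.compile(pattern)
    return [m.group(2) for m in ID_ATTR.finditer(html_text) if wanted.search(m.group(2))]


if __name__ == "__main__":
    sample = '<div id="item-1"><span id=\'item-2\'>a</span><p id="header">b</p></div>'
    print(ids_matching(sample, r"^item-\d+$"))
```

Passing an empty pattern (`""`) returns every id the regex finds.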


2013-01-15 22:32:35
