Python HTML parsing: getting tag names and their values

I am using BeautifulSoup in Python.
Is there a way to get each tag's name together with the value it contains, like this:

name=title value=This is title

name=link value=.../style.css

soup.html.head =

<meta content="all" name="audience"/> 
<meta content="2006-2013 webrazzi.com." name="copyright"/> 
<title> This is title</title> 
<link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>

Thanks in advance.

2014-02-25 ridvanzoro

Use the .text or .string attribute to get an element's text content.

Use .get('attrname') or ['attrname'] to get an attribute value.

html = ''' 
<head> 
    <meta content="all" name="audience"/> 
    <meta content="2006-2013 webrazzi.com." name="copyright"/> 
    <title> This is title</title> 
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/> 
</head> 
''' 

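from bs4 import BeautifulSoup
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
print('name={} value={}'.format('title', soup.title.text))   # text content of <title>
print('name={} value={}'.format('link', soup.link['href']))  # href attribute of <link>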

Output:

name=title value= This is title 
name=link value=.../style.css
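
As a small sketch of how the two attribute accessors and the two text accessors differ, the snippet below uses a shortened copy of the question's markup. The variable names are only illustrative, and 'html.parser' is an assumption about which parser is in use.

```python
from bs4 import BeautifulSoup

html = '<head><title> This is title</title><link href=".../style.css" rel="stylesheet"/></head>'
soup = BeautifulSoup(html, 'html.parser')

link = soup.link
href = link['href']                  # ['attrname'] raises KeyError if the attribute is absent
missing = link.get('media')          # .get('attrname') returns None instead of raising
fallback = link.get('media', 'all')  # .get() also accepts a default value

title_string = soup.title.string     # the single child string, whitespace preserved
title_text = soup.title.text.strip() # all text inside the tag, stripped here for display

print('name=link value={}'.format(href))
print('name=title value={}'.format(title_text))
```

.get() is usually the safer choice when a tag might not carry the attribute; ['attrname'] is handy when a missing attribute should be treated as an error.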

UPDATE based on the OP's comment:

def get_text(el): return el.text 
def get_href(el): return el['href'] 

# map tag names to functions (what to retrieve from the tag) 
what_todo = { 
    'title': get_text, 
    'link': get_href, 
} 
for el in soup.select('head *'): # To retrieve all children inside `head` 
    f = what_todo.get(el.name) 
    if not f: # skip non-title, non-link tags. 
        continue
    print('name={} value={}'.format(el.name, f(el)))

Output: same as above
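
If a fully generic loop is wanted, one possible variation (not part of the answer above) is to print the text for tags that have any and the whole attribute dictionary otherwise. This is only a sketch: describe is a hypothetical helper name, and the "text first, attributes second" rule is an assumption about what counts as a tag's value.

```python
from bs4 import BeautifulSoup

html = '''
<head>
    <meta content="all" name="audience"/>
    <meta content="2006-2013 webrazzi.com." name="copyright"/>
    <title> This is title</title>
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''
soup = BeautifulSoup(html, 'html.parser')

def describe(el):
    """Hypothetical helper: return the tag's text if it has any, else its attributes."""
    text = el.get_text(strip=True)
    if text:
        return text
    return dict(el.attrs)  # copy of the tag's attribute dictionary

# find_all(True) matches every tag nested inside <head>
rows = [(el.name, describe(el)) for el in soup.head.find_all(True)]
for name, value in rows:
    print('name={} value={}'.format(name, value))
```

Note that BeautifulSoup treats rel as a multi-valued attribute, so it shows up as a list inside the attribute dictionary rather than as a plain string.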

2014-02-25 08:01:50 falsetru

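Thanks for your answer. It works. But I'm looking for another way, one that uses a loop to get all the values. Something like 'while () { print property, value }' – ridvanzoro

@ridvanzoro, then you need to define which tags should have their text content retrieved and which attribute should be retrieved from each of the other tags. – falsetru

@ridvanzoro, I updated the answer. Check it out. – falsetru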
