Parsing HTML and finding multiple classes and tags - the most elegant way?

Currently, I have the following code:

# Try the class-based match first, then fall back to rel="author"
author_name = soup.find(True, {"class": ["author", "author-name"]})
if author_name is not None:
    print(author_name.text)
else:
    author_name = soup.find(rel="author")
    if author_name is not None:
        print(author_name.text)
    else:
        print("No Author Found")

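I'm trying to find the author of an article, so I look for elements such as class="author", class="author-name", and so on, or rel=author. If I keep doing it this way, I end up with a lot of nested if and else statements. I've only just started coding, but this doesn't seem very elegant to me. Can you help me find a more elegant way to do this?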


2014-10-10 eLudium

You can use CSS selectors; they let you specify multiple selection criteria in a single string:

soup.select('.author, .author-name, [rel="author"]')
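To show what the comma-separated selector group returns, here is a small self-contained sketch. The sample markup is made up for illustration, and it assumes beautifulsoup4 4.7 or later (which uses the soupsieve package for select()) with the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html = """
<div>
  <a rel="author" href="/u/1">Jane Roe</a>
  <span class="author-name">John Doe</span>
  <p class="author">Ann Smith</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One comma-separated selector group matches any of the three criteria.
matches = soup.select('.author, .author-name, [rel="author"]')

# Matches come back in document order, not in the order the selectors are listed.
names = [m.get_text(strip=True) for m in matches]
print(names)  # ['Jane Roe', 'John Doe', 'Ann Smith']
```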

This returns a list. You can loop over it to pick the candidate that suits you best, or just use the next() function to take the first one:

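for candidate in soup.select('.author, .author-name, [rel="author"]'):
    # Skip matches with no text; stop at the first one that has some
    if candidate.text:
        author = candidate.text
        break
else:
    # The for loop finished without break: nothing usable was found
    print("No author found")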
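The next() option mentioned above can replace the for/else loop. Because soup.select() returns a list rather than an iterator, the sketch feeds next() a generator expression and passes None as the default for the no-match case. The markup is made up, and unlike the loop above it strips whitespace, so an element containing only whitespace is skipped (an assumption about what counts as "no author").

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first match is empty, the second has text.
html = '<span class="author"></span><a rel="author">Jane Roe</a>'
soup = BeautifulSoup(html, "html.parser")

selector = '.author, .author-name, [rel="author"]'

# Lazily produce the stripped text of each match, in document order.
texts = (el.get_text(strip=True) for el in soup.select(selector))

# next() returns the first non-empty text, or the default None if there is none.
author = next((t for t in texts if t), None)
print(author if author is not None else "No author found")  # Jane Roe
```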

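The soup.select() call returns every matching element in document order, so the first qualifying element wins no matter which criterion it matched. For example, if a rel="author" element appears in the document before a .author-name element, you get the rel="author" one; .author-name is not preferred.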
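If you do want a preference order, one option is to try the selectors one at a time instead of as a single group. In this sketch the AUTHOR_SELECTORS list and the find_author helper are hypothetical names, and the priority order in the list is an assumption you would adjust to taste.

```python
from bs4 import BeautifulSoup

# Assumed preference order: earlier selectors win over later ones.
AUTHOR_SELECTORS = ['.author', '.author-name', '[rel="author"]']


def find_author(soup):
    """Return the first non-empty author text, trying selectors in priority order."""
    for selector in AUTHOR_SELECTORS:
        for el in soup.select(selector):
            text = el.get_text(strip=True)
            if text:
                return text
    return None  # nothing matched any selector


# rel="author" comes first in the document, but .author-name has higher priority.
html = '<a rel="author">Jane Roe</a><span class="author-name">John Doe</span>'
soup = BeautifulSoup(html, "html.parser")

print(find_author(soup))  # John Doe
# The combined selector, by contrast, simply returns the first match in document order.
print(soup.select_one('.author, .author-name, [rel="author"]').get_text(strip=True))  # Jane Roe
```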


2014-10-10 09:30:26

Awesome, works great :) – eLudium 2014-10-10 10:23:52

This is how I would do it:

results = [] 
results += soup.select('.author') 
results += soup.select('.author-name') 
results += soup.select('[rel=author]')
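Concatenating the per-selector results groups the matches by selector, in the order the selectors are listed, rather than in document order. One side effect: an element that matches more than one selector (for example class="author author-name") shows up more than once. This sketch, using made-up markup, drops the repeats by object identity while keeping the first occurrence.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the span matches both .author and .author-name.
html = """
<a rel="author">Jane Roe</a>
<span class="author author-name">John Doe</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Concatenate per-selector results: grouped by selector, in the order listed.
results = []
results += soup.select('.author')
results += soup.select('.author-name')
results += soup.select('[rel=author]')

# The span was returned twice; keep only the first occurrence of each element object.
seen = set()
unique = []
for el in results:
    if id(el) not in seen:
        seen.add(id(el))
        unique.append(el)

names = [el.get_text(strip=True) for el in unique]
print(names)  # ['John Doe', 'Jane Roe']
```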


2014-10-10 09:22:24
