Getting the favicon from feed content

I parse feed URLs and fetch their content. How can I get the favicon for a given feed and render it in a Django template?

I'm new to Django and Python, and I don't know how to do this.

I'm using feedparser to parse the feed URLs.

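I'm using the following code to get the list of image URLs from each article's content. Now, how should I get the site's favicon? In some content the icon is a .png file, and there are several .png links. How do I tell which one is the favicon?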

import feedparser
import lxml.html as lh
from urllib.request import urlopen

# Parse the feed
d = feedparser.parse("http://www.popgadget.net/atom.xml")

# Print the feed name
print(d['feed']['title'])

# Number of posts in the feed
posts = len(d['entries'])

# Visit each post URL and print the src of its images
for post in d['entries']:
    link = post['link']
    print('Parsing {0}'.format(link))
    doc = lh.parse(urlopen(link))
    imgs = doc.xpath('//img[@class="bpImage"]')
    for img in imgs:
        print(img.attrib['src'])


2012-04-11 Anshuma

Go to the site's index page, read and parse the HTML, and look for a link tag with a rel of "shortcut icon". If there isn't one, check for /favicon.ico on the server.
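The steps above can be sketched with only the standard library. This swaps in `html.parser` instead of lxml so it runs anywhere, and it assumes you already have the page HTML as a string (fetching it, e.g. with `urllib.request.urlopen`, is left out). `IconLinkFinder` and `find_favicon_url` are hypothetical names. The sketch matches the `rel` token `icon`, which covers both `rel="shortcut icon"` and the plain `rel="icon"`, and resolves relative `href` values against the page URL. This also answers the question of telling the favicon apart from other .png links: it is the `rel` attribute, not the file extension, that marks it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkFinder(HTMLParser):
    """Collect href values of <link> tags whose rel includes the token "icon"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        attrs = dict(attrs)
        # rel is a space-separated, case-insensitive token list, so both
        # rel="shortcut icon" and rel="icon" contain the token "icon".
        rel_tokens = (attrs.get('rel') or '').lower().split()
        href = attrs.get('href')
        if 'icon' in rel_tokens and href:
            self.hrefs.append(href)


def find_favicon_url(page_url, html):
    """Return the favicon URL declared in html, else the /favicon.ico fallback."""
    finder = IconLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.hrefs:
        # The href may be relative; resolve it against the page URL.
        return urljoin(page_url, finder.hrefs[0])
    # No <link rel="... icon"> found: use the conventional location at the site root.
    return urljoin(page_url, '/favicon.ico')
```

For example, `find_favicon_url("http://example.com/blog/", '<link rel="icon" href="fav.png">')` gives `http://example.com/blog/fav.png`. The fallback URL is only a candidate; whether the server actually serves a file there still has to be checked with a request.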


2012-04-11 05:54:56

Could you edit your answer with the code you are suggesting, @Ignacio? – Anshuma 2012-04-11 06:06:57

You can get the favicon from the HTML document, or fall back to /favicon.ico on the server. Here is the code:

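import lxml.html as lh
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen

link = 'http://www.popgadget.net/'
doc = lh.parse(urlopen(link))
# Look for <link rel="shortcut icon" href="..."> in the page
favicons = doc.xpath('//link[@rel="shortcut icon"]/@href')
if favicons:
    # The href may be relative; resolve it against the page URL
    favicon = urljoin(link, favicons[0])
else:
    # Fall back to the conventional location at the server root
    favicon = urljoin(link, '/favicon.ico')
try:
    urlopen(favicon).close()
except URLError:
    # HTTPError (e.g. a 404) is a subclass of URLError, so it lands here too
    favicon = None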


2012-04-11 07:05:05 Irfan

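Thanks for your reply and the code above. Your code works for sites such as www.techcrunch.com that declare the favicon in their HTML. But for sites like popgadget.net, whose pages don't declare a favicon, no favicon is retrieved. I came across an app that fetches site favicons, getfavicon (http://getfavicon.appspot.com/). It returns the favicon image even for sites like popgadget.net, and in special cases it returns a default icon. – Anshuma 2012-04-12 05:04:24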

Have you tried this code? Did you notice that it has a fallback if no favicon is found in the HTML? – Irfan 2012-04-12 14:05:28
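Since the question starts from a feed URL rather than a page URL, both the index page to parse and the /favicon.ico fallback can be derived from the feed URL itself. A minimal sketch, where `site_urls_for_feed` is a hypothetical helper name; `urljoin` with an absolute path keeps the scheme and host of the base URL and replaces its whole path, so it also works when the base URL has no trailing slash (plain string concatenation would not).

```python
from urllib.parse import urljoin


def site_urls_for_feed(feed_url):
    """Return (index_page_url, default_favicon_url) for the site hosting feed_url."""
    # Joining with an absolute path ("/...") keeps the scheme and host of
    # feed_url but replaces its whole path.
    index_page = urljoin(feed_url, '/')
    default_favicon = urljoin(feed_url, '/favicon.ico')
    return index_page, default_favicon
```

For `http://www.popgadget.net/atom.xml` this yields `http://www.popgadget.net/` and `http://www.popgadget.net/favicon.ico`. A feed may be hosted on a different domain from the site it describes, in which case an entry's `link` is a better base URL than the feed URL.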
