Python feedparser returns inconsistent items

I am running these lines:

import feedparser

# Fetch and parse the feed, then print the href of the first item's second link
url = 'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/feed.xml'
feed = feedparser.parse(url)
items = feed['items']
print(items[0]['links'][1]['href'])

This uses the feedparser module. Here is a sample block from the RSS feed in question:

<item> 
    <title>More Android Annotations</title> 
    <link>http://youtu.be/77pPceVicNI</link> 
    <description><![CDATA[Walkthrough that goes a little bit more indepth to show you the things that <a href="http://androidannotations.org">AndroidAnnotations</a> can do for you as an application developer. <br /><a href="https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4">Direct download link <i>(rightclick and choose save as)</i></a>]]></description> 
    <image> 
     <url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url> 
     <link>https://github.com/FoamyGuy/StackSites</link> 
     <title>More Android Annotations</title> 
    </image> 
    </item>

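I am trying to get the https://github.com/FoamyGuy/StackSites portion of that item. On my local PC (Win7, Python 2.6) this works correctly. But when I run the same code in a console on pythonanywhere.com, instead of my GitHub link I get https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4, which is the mp4 link included near the end of the CDATA in the description.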

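On both machines items[0]['links'] contains only 2 elements (index 0 and 1), but the value of the string at index 1 differs between the two machines. Why would feedparser give me a different value on one machine than on the other?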

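I have printed the entire items[0] on pythonanywhere, and my GitHub link is not included in it at all. Is there some parameter I can use to change the way the feed is parsed so that I can get the GitHub link correctly?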

Is there some other feed parsing module that would work better for me and hopefully be more consistent across machines?
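
For reference, the nested `<image><link>` value in the sample item can also be read without any feed library, using the standard library's xml.etree.ElementTree. The following is only a sketch under assumptions: the `<rss><channel>` wrapper around the item is assumed (the sample above shows just the `<item>`), the description element is left out for brevity, and the helper name `image_links` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample item; the <rss><channel> wrapper is an assumption.
SAMPLE = """<rss version="2.0"><channel>
<item>
    <title>More Android Annotations</title>
    <link>http://youtu.be/77pPceVicNI</link>
    <image>
     <url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url>
     <link>https://github.com/FoamyGuy/StackSites</link>
     <title>More Android Annotations</title>
    </image>
</item>
</channel></rss>"""


def image_links(xml_text):
    """Return the <image><link> text of every <item> (None where it is absent)."""
    root = ET.fromstring(xml_text)
    # findtext() returns None when an item has no image/link child.
    return [item.findtext('image/link') for item in root.iter('item')]


print(image_links(SAMPLE))
```

Because this reads the element path directly, the result does not depend on how a feed library chooses to map a per-item `<image>` element. For the real feed, the XML text would have to be downloaded first.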

2013-05-25 FoamyGuy

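Could it be some kind of geolocation thing? The PythonAnywhere servers are in the US; maybe you live somewhere else and the server returns different results based on IP? – hwjp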

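I live in the US (I thought pythonanywhere was based in the UK). Either way, it shouldn't be a geolocation issue, because the XML in question is under my control and shouldn't change by region. – FoamyGuy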

I have tried it with your feed, and it looks like each item has two entries in 'links', but they are consistently different: one has rel="alternate" and the other has rel="enclosure".

In [8]: items[0]['links'] 
Out[8]: 
[{'href': u'http://youtu.be/NL7szHeEiCs', 
    'rel': u'alternate', 
    'type': u'text/html'}, 
{u'href': u'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/ButtonExample.mp4', 
    'rel': u'enclosure'}] 

In [9]: items[1]['links'] 
Out[9]: 
[{'href': u'http://youtu.be/77pPceVicNI', 
    'rel': u'alternate', 
    'type': u'text/html'}, 
{u'href': u'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4', 
    'rel': u'enclosure'}]

So, can you use that to get the one you want?

def get_alternate_link(item):
    for link in item.links:
        if link.get('rel') == 'alternate':
            return link.get('href')
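
A small usage sketch of the same idea, using an entry shaped like the items[1] output above. `get_link_by_rel` is a hypothetical generalization, not part of feedparser, and a plain dict stands in for a parsed entry so the snippet runs without network access.

```python
def get_link_by_rel(item, rel):
    """Return the href of the first link whose rel matches, or None."""
    for link in item.get('links', []):
        if link.get('rel') == rel:
            return link.get('href')
    return None


# Stand-in for a parsed entry, copied from the items[1] output above.
item = {
    'links': [
        {'href': 'http://youtu.be/77pPceVicNI',
         'rel': 'alternate',
         'type': 'text/html'},
        {'href': 'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4',
         'rel': 'enclosure'},
    ]
}

print(get_link_by_rel(item, 'alternate'))
print(get_link_by_rel(item, 'enclosure'))
```

Selecting by rel rather than by list index avoids depending on the order in which the links happen to appear.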

2013-10-18 14:52:43 hwjp

I'll be able to test it later today. I'll let you know. – FoamyGuy
