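Trying to parse feeds for an RSS reader written in Python

I am still a Python beginner. As a practice project, I want to write my own RSS reader. I found a helpful tutorial here: learning python. I used the code provided in that tutorial: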

#!/usr/bin/env python
import urllib2
from xml.dom import minidom, Node

# Get the XML
url_info = urllib2.urlopen('http://rss.slashdot.org/Slashdot/slashdot')

if url_info:
    # We have the RSS XML, let's try to parse it
    xmldoc = minidom.parse(url_info)
    if xmldoc:
        # We have the document, get the root node
        rootNode = xmldoc.documentElement
        # Iterate over the child nodes
        for node in rootNode.childNodes:
            # We only care about "item" entries
            if node.nodeName == "item":
                # Now iterate through all of the <item>'s children
                for item_node in node.childNodes:
                    if item_node.nodeName == "title":
                        # Loop through the title's text nodes to get
                        # the actual title
                        title = ""
                        for text_node in item_node.childNodes:
                            if text_node.nodeType == Node.TEXT_NODE:
                                title += text_node.nodeValue
                        # Now print the title if we have one
                        if len(title) > 0:
                            print title

                    if item_node.nodeName == "description":
                        # Loop through the description's text nodes to get
                        # the actual description
                        description = ""
                        for text_node in item_node.childNodes:
                            if text_node.nodeType == Node.TEXT_NODE:
                                description += text_node.nodeValue
                        # Now print the description if we have one.
                        # The trailing "\n" adds a blank line so that it looks better
                        if len(description) > 0:
                            print description + "\n"
    else:
        print "Error getting XML document!"
else:
    print "Error! Getting URL"

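Everything worked as expected, and at first I thought I understood all of it. But when I use a different RSS feed (for example "http://www.spiegel.de/schlagzeilen/tops/index.rss"), the Eclipse IDE reports "terminated" for my application. That message does not help me, because I do not know exactly where and why the application terminates. The debugger is not much help either, because it ignores my breakpoints. Well, that is another problem.

Does anyone know what I am doing wrong?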

2011-12-11 jacib

Could you try a binary search (by commenting out code) to isolate the problem?

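I tried that. I have just realized that the problem is not a compiler error message but my own lack of knowledge. – jacib

Well, the "terminated" message is not an error. It only tells you that Python exited without an error.

You are not doing anything wrong. This RSS reader is just not very flexible, because it only knows one variant of RSS.

If you compare the XML documents of Slashdot and Spiegel Online, you will see the difference in document structure:

Slashdot: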

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" ...>
    <channel rdf:about="http://slashdot.org/">
        <title>Slashdot</title>
        <!-- more stuff (but no <item>-tags) -->
    </channel>
    <item rdf:about="blabla">
        <title>The Condescending UI</title>
        <!-- item data -->
    </item>
    <!-- more <item>-tags -->
</rdf:RDF>

Spiegel Online:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
    <channel>
        <title>SPIEGEL ONLINE - Schlagzeilen</title>
        <link>http://www.spiegel.de</link>
        <item>
            <title>Streit über EU-Veto: Vize Clegg meutert gegen britischen Premier Cameron</title>
        </item>
        <!-- more <item>-tags -->
    </channel>
</rss>

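In the Spiegel Online feed, all <item> elements are inside the <channel> tag, but in the Slashdot feed they sit directly in the root tag (<rdf:RDF>). Your Python code only looks for <item> tags that are direct children of the root tag, so it finds none in the Spiegel Online feed and exits without printing anything.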
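The difference can be reproduced without a network connection. The following is a minimal sketch, written for Python 3 (the question's code is Python 2), that runs the same "direct children of the root" check on two shortened, made-up documents shaped like the two feeds. The sample strings and the helper name `direct_item_count` are assumptions for illustration and are not part of the original code.

```python
from xml.dom import minidom

# Shortened, hypothetical samples shaped like the two feeds above.
RDF_STYLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel><title>Slashdot</title></channel>
  <item><title>The Condescending UI</title></item>
</rdf:RDF>"""

RSS2_STYLE = """<rss version="2.0">
  <channel>
    <title>SPIEGEL ONLINE - Schlagzeilen</title>
    <item><title>Some headline</title></item>
  </channel>
</rss>"""


def direct_item_count(xml_text):
    """Count <item> elements that are direct children of the root node."""
    root = minidom.parseString(xml_text).documentElement
    return sum(1 for node in root.childNodes if node.nodeName == "item")


print(direct_item_count(RDF_STYLE))   # prints 1: the item sits directly under <rdf:RDF>
print(direct_item_count(RSS2_STYLE))  # prints 0: the item is nested inside <channel>
```

With the second layout the loop body never runs, which matches the symptom of a program that ends normally without output.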

If you want your RSS reader to work for both feeds, you can, for example, change the following line:

for node in rootNode.childNodes:

to this:

for node in rootNode.getElementsByTagName('item'):

That way all <item> tags are enumerated, regardless of their position in the XML document.
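As a rough, self-contained illustration of that change, here is a Python 3 sketch (the question's code is Python 2) that collects item titles with `getElementsByTagName` from two shortened, made-up documents in both layouts. The sample strings and the function name `item_titles` are hypothetical and chosen only for this example.

```python
from xml.dom import minidom, Node

# Shortened, hypothetical samples: items under the root vs. inside <channel>.
ITEMS_UNDER_ROOT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel><title>Slashdot</title></channel>
  <item><title>The Condescending UI</title></item>
</rdf:RDF>"""

ITEMS_IN_CHANNEL = """<rss version="2.0">
  <channel>
    <title>SPIEGEL ONLINE - Schlagzeilen</title>
    <item><title>Some headline</title></item>
  </channel>
</rss>"""


def item_titles(xml_text):
    """Return the title text of every <item>, wherever it sits in the tree."""
    root = minidom.parseString(xml_text).documentElement
    titles = []
    # getElementsByTagName searches all descendants, not just direct children.
    for item in root.getElementsByTagName("item"):
        for child in item.childNodes:
            if child.nodeName == "title":
                text = "".join(
                    n.nodeValue
                    for n in child.childNodes
                    if n.nodeType == Node.TEXT_NODE
                )
                titles.append(text)
    return titles


print(item_titles(ITEMS_UNDER_ROOT))
print(item_titles(ITEMS_IN_CHANNEL))
```

The channel's own <title> is not picked up, because only the children of each <item> are inspected.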

2011-12-11 14:55:06 vstm

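Thanks for the tip, now it works. I must admit my XML knowledge is below par ;) – jacib

If nothing happens, maybe everything in your code is correct and you are just not matching the right elements :)

If you get an exception, try launching the script from the command line: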

python <yourfilename.py>

Or use try/except to catch the exception and print the error:

try:
    # your code
except Exception as e:  # "as" works in Python 2.6 and later
    # print it
    print 'My exception is', e
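For readers on Python 3, the same idea might look like the sketch below. It assumes the failure of interest is malformed XML, which `xml.dom.minidom` reports as `xml.parsers.expat.ExpatError`, and it prints the full traceback so the failing line is visible. The function name `parse_or_report` and the sample inputs are hypothetical.

```python
import traceback
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_or_report(xml_text):
    """Parse xml_text; on malformed XML, print the error and return None."""
    try:
        return minidom.parseString(xml_text)
    except ExpatError as error:
        print("Could not parse the feed:", error)
        # The traceback shows where in the program the failure occurred.
        traceback.print_exc()
        return None


# Hypothetical inputs: one well-formed document and one with a missing close tag.
good = parse_or_report("<rss><channel></channel></rss>")
bad = parse_or_report("<rss><channel></rss>")
print(good is not None, bad is None)
```

Catching a specific exception type rather than every `Exception` keeps unrelated bugs visible instead of silently swallowing them.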

2011-12-11 14:54:14 tito

You are right, the code is correct, but my logic is not... – jacib
