2014-10-02

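Saving everything in a dict with Python XML parsing

I want to store all of the content in my dictionary. I can capture almost everything except the last item.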

My Python code looks like this:

#!/usr/bin/python

import xml.sax

class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        self.item = {}
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            #if self.item:
            print self.item
            print "*****Movie*****"
            title = attributes["title"]
            print "Title:", title

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            self.item["type"] = self.type
            #print "Type:", self.type
        elif self.CurrentData == "format":
            self.item["format"] = self.format
            #print "Format:", self.format
        elif self.CurrentData == "year":
            self.item["year"] = self.year
            #print "Year:", self.year
        elif self.CurrentData == "rating":
            self.item["rating"] = self.rating
            #print "Rating:", self.rating
        elif self.CurrentData == "stars":
            self.item["stars"] = self.stars
            #print "Stars:", self.stars
        elif self.CurrentData == "description":
            self.item["description"] = self.description
            #print "Description:", self.description
        self.CurrentData = ""

    # Called when character data is read
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content

if (__name__ == "__main__"):

    # create an XMLReader
    parser = xml.sax.make_parser()
    # turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)

    # override the default ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)

    parser.parse("movies.xml")

My XML file looks like this:

<collection shelf="New Arrivals"> 
<movie title="Enemy Behind"> 
    <type>War, Thriller</type> 
    <format>DVD</format> 
    <year>2003</year>
    <rating>PG</rating> 
    <stars>10</stars> 
    <description>Talk about a US-Japan war</description> 
</movie> 
<movie title="Transformers"> 
    <type>Anime, Science Fiction</type> 
    <format>DVD</format> 
    <year>1989</year> 
    <rating>R</rating> 
    <stars>8</stars> 
    <description>A schientific fiction</description> 
</movie> 
    <movie title="Trigun"> 
    <type>Anime, Action</type> 
    <format>DVD</format> 
    <episodes>4</episodes> 
    <rating>PG</rating> 
    <stars>10</stars> 
    <description>Vash the Stampede!</description> 
    </movie> 
<movie title="Ishtar"> 
    <type><![CDATA[Neuilly-sur-Seine]]></type> 
    <format>VHS</format> 
    <rating>PG</rating> 
    <stars>2</stars> 
    <description>Viewable boredom</description> 
</movie> 
</collection> 

For the last tag, I currently get no information.

How can I solve this? Thanks in advance.


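Maybe you should start by saying what the expected output should look like? It's quite likely that you'd be better off with a different kind of parser (e.g. 'xml.etree.ElementTree') – mgilson 2014-10-02 16:19:33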

Answer


I think this problem gets simpler if you use ElementTree. For example:

import xml.etree.ElementTree as ET 
tree = ET.fromstring(s) # s is a string with the xml data. 
movies = tree.iter('movie') 
dct = {} 
for element in movies: 
    dct[element.attrib['title']] = element 
print dct # {'Transformers': <Element 'movie' at 0x7f8f40d6e750>, 'Ishtar': <Element 'movie' at 0x7f8f40d6eb50>, 'Enemy Behind': <Element 'movie' at 0x7f8f40d6e2d0>, 'Trigun': <Element 'movie' at 0x7f8f40d6e990>} 
print {element.tag: element.text for element in dct['Transformers']} # {'rating': 'R', 'description': 'A schientific fiction', 'format': 'DVD', 'stars': '8', 'year': '1989', 'type': 'Anime, Science Fiction'} 
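Building on the tag/text comprehension above, here is a rough sketch that turns every movie into a plain dict keyed by its title, so the result no longer holds Element objects. `movies_to_dict` and `SAMPLE_XML` are names made up for this illustration, and the sample is a shortened copy of the question's file. It calls `print()` as a function so that it also runs on Python 3.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the question's movies.xml
# (an assumption for illustration; the real file has more movies).
SAMPLE_XML = """<collection shelf="New Arrivals">
<movie title="Enemy Behind">
    <type>War, Thriller</type>
    <format>DVD</format>
    <year>2003</year>
</movie>
<movie title="Ishtar">
    <type><![CDATA[Neuilly-sur-Seine]]></type>
    <format>VHS</format>
    <stars>2</stars>
</movie>
</collection>"""


def movies_to_dict(xml_text):
    """Return {title: {child_tag: child_text}} for every <movie> element."""
    root = ET.fromstring(xml_text)
    return {
        movie.attrib["title"]: {child.tag: child.text for child in movie}
        for movie in root.iter("movie")
    }


movies = movies_to_dict(SAMPLE_XML)
print(movies["Ishtar"])
```

Because each movie gets its own inner dict, a tag that is missing from one movie (such as `year` for Ishtar) is simply absent from that movie's dict instead of being carried over from the previous one.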

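From here, hopefully it isn't too hard to modify this to suit your needs.

Note that SAX really shines when you have large files that you need to parse incrementally. If you want to store all of the data at once, ElementTree usually makes things simpler.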
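If the file is big enough that SAX is the better fit, the handler from the question can be adapted instead. In the question's code the dict is only printed when the next <movie> starts, so the last movie is never shown, and the same dict is reused for every movie. A minimal sketch, assuming it is acceptable to collect the results in a list, stores each movie when its closing tag arrives. `MovieDictHandler`, `movies`, `chunks`, and `SAMPLE_XML` are illustrative names, not part of the original code.

```python
import xml.sax


class MovieDictHandler(xml.sax.ContentHandler):
    """Collects one dict per <movie>; the class name is illustrative."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.movies = []   # finished movie dicts, in document order
        self.item = None   # dict for the movie currently being read
        self.chunks = []   # text pieces of the current child element

    def startElement(self, tag, attributes):
        if tag == "movie":
            # Start a fresh dict so fields do not leak between movies.
            self.item = {"title": attributes["title"]}
        # characters() may fire several times per element, so buffer text.
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

    def endElement(self, tag):
        if tag == "movie":
            # Store the movie when it closes, so the last one is kept too.
            self.movies.append(self.item)
            self.item = None
        elif self.item is not None:
            self.item[tag] = "".join(self.chunks).strip()
        self.chunks = []


# Shortened sample in the same shape as the question's file (an assumption).
SAMPLE_XML = b"""<collection shelf="New Arrivals">
<movie title="Trigun">
    <type>Anime, Action</type>
    <episodes>4</episodes>
</movie>
<movie title="Ishtar">
    <type><![CDATA[Neuilly-sur-Seine]]></type>
    <format>VHS</format>
</movie>
</collection>"""

handler = MovieDictHandler()
xml.sax.parseString(SAMPLE_XML, handler)
print(handler.movies[-1])
```

To parse a file instead of a string, the same handler can be passed to `parser.setContentHandler(...)` exactly as in the question, and `handler.movies` read after `parser.parse("movies.xml")` returns.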