2012-07-20 25 views
6

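Python - How do I nest file-reading loops?

I was introduced to Python (and to programming in general) for the first time two days ago. Today I got stuck. I have spent hours trying to find an answer to what I suspect is a problem so trivial that nobody else has ever been stuck on it :)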

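My boss wants me to manually clean up huge .xml files into something more human-readable. I'm trying to write a script to do it for me. Below is a sample of the .xml file and the output I need.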

Input (File.xml):

<IssueTracking> 
    <Issue> 
    <SequenceNum>123</SequenceNum> 
    <Subject>Subject of Ticket 123</Subject> 
    <Description>Line 1 in Description field of Ticket 123. 
Line 2 in Description field of Ticket 123. 
Line 3 in Description field of Ticket 123.</Description> 
    </Issue> 
    <Issue> 
    <SequenceNum>124</SequenceNum> 
    <Subject>Subject of Ticket 124</Subject> 
    <Description>Line 1 in Description field of Ticket 124. 
Line 2 in Description field of Ticket 124. 
Line 3 in Description field of Ticket 124.</Description> 
    </Issue> 
</IssueTracking> 

Desired output:

123 Subject of Ticket 123 
Line 1 in Description field of Ticket 123. 
Line 2 in Description field of Ticket 123. 
Line 3 in Description field of Ticket 123. 

124 Subject of Ticket 124 
Line 1 in Description field of Ticket 124. 
Line 2 in Description field of Ticket 124. 
Line 3 in Description field of Ticket 124. 

Here is what I have so far.

with open("File.xml", 'r') as SourceFile: # Opens the file
    while 1: # Keep going through the file to the end
        SourceFileLine = SourceFile.readline() # Reads the next line of the source file
        if not SourceFileLine: # An empty string means end of file
            break

        SourceFileLine = SourceFileLine.strip() # Strips the whitespace

        if "<SequenceNum>" in SourceFileLine:
            SequenceNum = SourceFileLine[13:-14] # Trims the tags, saves the field.
            continue

        if "<Subject>" in SourceFileLine:
            Subject = SourceFileLine[9:-10]
            continue

        #if "<Description>" in SourceFileLine:
        #    last_pos = SourceFile.tell()
        #    while "</Description>" not in SourceFileLine:
        #        SourceFile.seek(last_pos)
        #        ?????
        #
        #    Description = Description[22:]
        #    continue

        if "</Issue>" in SourceFileLine:
            print(SequenceNum, end = "\t")
            print(Subject)
            # print(Description)
            print("\n")

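Where I'm stuck is recognizing the three lines between the <Description> tags and joining them into a single string that I can print before continuing through the source file. Having looked at many other examples of line-reading loops, I suspect what I need is to mark the point where the target field starts and nest another read loop at that point in the file. But I haven't found another example that does this, so I think I'm either missing something basic or there is a better way. Thanks in advance for your help!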
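For reference, the nested-read idea described above can be sketched by sharing one line iterator between an outer and an inner loop, so no tell()/seek() is needed. This is only a sketch under assumptions: the helper name `extract_issues`, the constant `CLOSE`, and the `SAMPLE` text are made up for illustration; each tag is assumed to start its own line; nothing is assumed to follow `</Description>` on its line; and XML entities such as `&amp;` are not decoded.

```python
import io

# Hypothetical sample in the same shape as the File.xml excerpt.
SAMPLE = """<IssueTracking>
    <Issue>
    <SequenceNum>123</SequenceNum>
    <Subject>Subject of Ticket 123</Subject>
    <Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
    </Issue>
</IssueTracking>
"""

CLOSE = "</Description>"


def extract_issues(lines):
    """Yield (sequence_num, subject, description) for each <Issue> block."""
    lines = iter(lines)  # one shared iterator: the inner loop advances the outer one
    sequence_num = subject = description = None
    for line in lines:
        line = line.strip()
        if line.startswith("<SequenceNum>"):
            sequence_num = line[len("<SequenceNum>"):-len("</SequenceNum>")]
        elif line.startswith("<Subject>"):
            subject = line[len("<Subject>"):-len("</Subject>")]
        elif line.startswith("<Description>"):
            parts = [line[len("<Description>"):]]
            # Nested read: keep pulling lines until the closing tag appears.
            while CLOSE not in parts[-1]:
                following = next(lines, None)
                if following is None:
                    raise ValueError("file ended inside <Description>")
                parts.append(following.strip())
            parts[-1] = parts[-1].split(CLOSE)[0]  # drop the closing tag
            description = "\n".join(parts)
        elif line.startswith("</Issue>"):
            yield sequence_num, subject, description


for seq, subject, description in extract_issues(io.StringIO(SAMPLE)):
    print(seq, subject)
    print(description)
    print()
```

With a real file, the same function could be fed the open file object, e.g. `with open("File.xml") as f: ...extract_issues(f)`, since iterating a text file yields its lines.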

+1

Python has a built-in XML parser: http://docs.python.org/library/pyexpat.html – 2012-07-20 19:24:14

+3

+1 for including the input, the desired output, and what you tried. – 2012-07-20 19:58:12

+0

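You should probably output the data with a human-friendly serializer such as YAML once you have extracted it. You never know when you will need to process this data again. – 2012-07-20 20:05:14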

Answers

7

Here is an example with your data using lxml, which I strongly recommend. (Note: written for Py2.x, but easy to adapt to Py3.x.)

from lxml import etree 
xml = """<IssueTracking> 
    <Issue> 
    <SequenceNum>123</SequenceNum> 
    <Subject>Subject of Ticket 123</Subject> 
    <Description>Line 1 in Description field of Ticket 123. 
Line 2 in Description field of Ticket 123. 
Line 3 in Description field of Ticket 123.</Description> 
    </Issue> 
    <Issue> 
    <SequenceNum>124</SequenceNum> 
    <Subject>Subject of Ticket 124</Subject> 
    <Description>Line 1 in Description field of Ticket 124. 
Line 2 in Description field of Ticket 124. 
Line 3 in Description field of Ticket 124.</Description> 
    </Issue> 
</IssueTracking> 
""" 

root = etree.fromstring(xml) 
for issue in root.findall('Issue'): 
    as_list = [issue.find(n).text for n in ('SequenceNum', 'Subject', 'Description')] 
    as_list[2] = as_list[2].split('\n') 
    print(as_list)  # parenthesized form runs on both Python 2 and Python 3

Prints:

['123', 'Subject of Ticket 123', ['Line 1 in Description field of Ticket 123.', 'Line 2 in Description field of Ticket 123.', 'Line 3 in Description field of Ticket 123.']] 
['124', 'Subject of Ticket 124', ['Line 1 in Description field of Ticket 124.', 'Line 2 in Description field of Ticket 124.', 'Line 3 in Description field of Ticket 124.']] 
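
To get from parsed elements to the layout shown under "Desired output", a Python 3 sketch might look like the following. It swaps in the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without installing anything; lxml's `etree` offers a largely compatible API for the calls used here. The function name `format_issues` and the shortened `SAMPLE` text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample in the same shape as File.xml.
SAMPLE = """<IssueTracking>
    <Issue>
    <SequenceNum>123</SequenceNum>
    <Subject>Subject of Ticket 123</Subject>
    <Description>Line 1 of Ticket 123.
Line 2 of Ticket 123.</Description>
    </Issue>
    <Issue>
    <SequenceNum>124</SequenceNum>
    <Subject>Subject of Ticket 124</Subject>
    <Description>Line 1 of Ticket 124.
Line 2 of Ticket 124.</Description>
    </Issue>
</IssueTracking>
"""


def format_issues(xml_text):
    """Return one text block per <Issue>, separated by blank lines."""
    root = ET.fromstring(xml_text)
    blocks = []
    for issue in root.findall("Issue"):
        # findtext returns the default when the child element is missing.
        seq = issue.findtext("SequenceNum", default="")
        subject = issue.findtext("Subject", default="")
        description = issue.findtext("Description", default="")
        # Tidy stray whitespace around each line of the description.
        lines = [ln.strip() for ln in description.splitlines()]
        blocks.append("{} {}\n{}".format(seq, subject, "\n".join(lines)))
    return "\n\n".join(blocks)


print(format_issues(SAMPLE))
```

Returning a string rather than printing inside the function keeps the formatting step easy to redirect to a file later.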
6

Please don't read XML files like this. There are various libraries for Python that will help you read XML files.

Take a look at the Python library lxml. It provides a very simple way to read and parse XML files, and it will greatly improve your code.

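I would explain how to use the library itself, but their documentation is far better than anything I could squeeze into this text area: http://lxml.de/tutorial.html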
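
Since the question mentions huge files, one option worth knowing about is incremental parsing, which avoids loading the whole document at once. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse` rather than lxml, purely so it runs without extra installs. The generator name `iter_issues` and the in-memory `DATA` bytes are assumptions for illustration; a real run would pass the path to the .xml file instead.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for a large File.xml.
DATA = b"""<IssueTracking>
    <Issue>
    <SequenceNum>123</SequenceNum>
    <Subject>Subject of Ticket 123</Subject>
    <Description>Line 1 of Ticket 123.
Line 2 of Ticket 123.</Description>
    </Issue>
</IssueTracking>
"""


def iter_issues(source):
    """Yield (sequence_num, subject, description) one <Issue> at a time.

    `source` may be a filename or a binary file object.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Issue":
            yield (
                elem.findtext("SequenceNum"),
                elem.findtext("Subject"),
                elem.findtext("Description"),
            )
            # Discard this issue's children and text once they are copied out.
            elem.clear()


for seq, subject, description in iter_issues(io.BytesIO(DATA)):
    print(seq, subject)
    print(description)
```

The emptied `Issue` elements still stay attached to the root, so this reduces memory use a great deal but does not bring it to zero.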

+0

Thanks, I'll look into this and figure it out. I appreciate your help. – phlogiston 2012-07-20 19:48:54