2016-11-16 36 views

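How to insert text from a text file into a new XML tag

I have the following code, which tries to parse an XML file, read external text files (if found), insert their contents into newly introduced tags, and save the result as a new XML file.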

The code looks like this:

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
import os

# define our data file
data_file = 'test2_of_2016-09-19.xml'

tree = ET.ElementTree(file=data_file)
root = tree.getroot()

for element in root:
    if element.find('File_directory') is not None:
        directory = element.find('File_directory').text
    if element.find('Introduction') is not None:
        introduction = element.find('Introduction').text
    if element.find('Directions') is not None:
        directions = element.find('Directions').text

for element in root:
    if element.find('File_directory') is not None:
        if element.find('Introduction') is not None:
            intro_tree = directory+introduction
            with open(intro_tree, 'r') as f:
                intro_text = f.read()
            f.closed
            intro_body = ET.SubElement(element,'Introduction_Body')
            intro_body.text = intro_text
        if element.find('Directions') is not None:
            directions_tree = directory+directions
            with open(directions_tree, 'r') as f:
                directions_text = f.read()
            f.closed
            directions_body = ET.SubElement(element,'Directions_Body')
            directions_body.text = directions_text

tree.write('new_' + data_file)

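The problem is that the last File_directory, Introduction, and Directions values found seem to be kept and applied to every entry. That is not what I want, because each entry has its own individual record, so to speak.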

The source XML file looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
    <Row> 
     <Entry_No>1</Entry_No> 
     <Waterfall_Name>Bridalveil Fall</Waterfall_Name> 
     <File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory> 
     <Introduction>introduction-bridalveil-fall.html</Introduction> 
     <Directions>directions-bridalveil-fall.html</Directions> 
    </Row> 
    <Row> 
     <Entry_No>52</Entry_No> 
     <Waterfall_Name>Switzer Falls</Waterfall_Name> 
     <File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory> 
     <Introduction>introduction-switzer-falls.html</Introduction> 
     <Directions>directions-switzer-falls.html</Directions> 
    </Row> 
</Root> 

The desired output XML should look like this:

<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
    <Row> 
     <Entry_No>1</Entry_No> 
     <Waterfall_Name>Bridalveil Fall</Waterfall_Name> 
     <File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory> 
     <Introduction>introduction-bridalveil-fall.html</Introduction> 
     <Directions>directions-bridalveil-fall.html</Directions> 
     <Introduction_Body>Text from ./waterfall_writeups/1_Bridalveil_Fall/introduction-bridalveil-fall.html</Introduction_Body> 
     <Directions_Body>Text from ./waterfall_writeups/1_Bridalveil_Fall/directions-bridalveil-fall.html</Directions_Body> 
    </Row> 
    <Row> 
     <Entry_No>52</Entry_No> 
     <Waterfall_Name>Switzer Falls</Waterfall_Name> 
     <File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory> 
     <Introduction>introduction-switzer-falls.html</Introduction> 
     <Directions>directions-switzer-falls.html</Directions> 
     <Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body> 
     <Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body> 
    </Row> 
</Root> 

But what I end up with is:

<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
    <Row> 
     <Entry_No>1</Entry_No> 
     <Waterfall_Name>Bridalveil Fall</Waterfall_Name> 
     <File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory> 
     <Introduction>introduction-bridalveil-fall.html</Introduction> 
     <Directions>directions-bridalveil-fall.html</Directions> 
     <Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body> 
     <Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body> 
    </Row> 
    <Row> 
     <Entry_No>52</Entry_No> 
     <Waterfall_Name>Switzer Falls</Waterfall_Name> 
     <File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory> 
     <Introduction>introduction-switzer-falls.html</Introduction> 
     <Directions>directions-switzer-falls.html</Directions> 
     <Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body> 
     <Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body> 
    </Row> 
</Root> 

By the way, is there any way to bring in the contents of the body tags without having it all printed on one line (for readability)?

Answer


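Your first for loop iterates over the elements of your document and assigns new values to your directory, introduction, and directions variables on every iteration, so they end up holding the values from the last element encountered.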
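
The effect can be reproduced without any files. This is a minimal sketch, assuming an inline XML string with the same Row / File_directory layout as in the question; the shortened paths ./a/ and ./b/ are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the question's file; the paths are shortened placeholders.
XML = """<Root>
    <Row><File_directory>./a/</File_directory></Row>
    <Row><File_directory>./b/</File_directory></Row>
</Root>"""

root = ET.fromstring(XML)

# First pass: the variable is overwritten on every iteration.
for element in root:
    if element.find('File_directory') is not None:
        directory = element.find('File_directory').text

# Second pass: every row now sees the value left over from the last row.
seen = [directory for element in root]
print(seen)  # ['./b/', './b/']
```

Both rows report the second row's directory, which matches the output shown in the question.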

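What I would do is create a dictionary that maps tag names to their text content, and then use that mapping to add the new child elements on the fly. Example (without reading the referenced files):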

for row in root:
    # Rebuild the tag -> text mapping for each row
    elements = {}
    for node in row:
        elements[node.tag] = node.text

    directory = elements['File_directory']

    intro_tree = directory + elements['Introduction']
    intro_body = ET.SubElement(row, 'Introduction_Body')
    intro_body.text = 'Text from %s' % intro_tree

    directions_tree = directory + elements['Directions']
    directions_body = ET.SubElement(row, 'Directions_Body')
    directions_body.text = 'Text from %s' % directions_tree
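
To go from the placeholder text to the real file contents, the same per-row dictionary can be combined with the file reading from the question. The following is only a sketch under a few assumptions: add_bodies is a hypothetical helper name, rows without a File_directory are skipped, and a referenced file that does not exist is silently skipped (the question says the files are read "if found").

```python
import os
import xml.etree.ElementTree as ET


def add_bodies(root):
    """Append <Introduction_Body>/<Directions_Body> to each row, in place.

    Hypothetical helper; the tag names follow the question's XML.
    """
    for row in root:
        # Rebuilt for every row, so nothing leaks from one row to the next.
        elements = {node.tag: node.text for node in row}
        directory = elements.get('File_directory')
        if directory is None:
            continue
        for tag in ('Introduction', 'Directions'):
            filename = elements.get(tag)
            if filename is None:
                continue
            path = os.path.join(directory, filename)
            if not os.path.isfile(path):
                continue  # assumption: skip write-ups that are not found
            with open(path, 'r') as f:
                body = ET.SubElement(row, tag + '_Body')
                body.text = f.read()


# Usage with the question's file names:
#   tree = ET.parse('test2_of_2016-09-19.xml')
#   add_bodies(tree.getroot())
#   tree.write('new_test2_of_2016-09-19.xml')
```

Because the dictionary is created inside the loop, each row only ever sees its own directory and file names.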

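Thanks for the answer. It turns out I found what I was doing wrong and fixed it by rearranging the loops, but creating a dictionary as you suggest makes sense. – Johnny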
