2011-08-02 67 views

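Find and replace tags in XML using Python

I have asked a similar question before, but this one is slightly different. I want to find and replace XML tags using Python. I am using the XMLs to upload as metadata for some GIS shapefiles. In the metadata editor, I can choose the dates on which certain data was collected. The options are 'single date', 'multiple dates' and 'range of dates'. In the first XML, which contains the tags for a range of dates, you will see the tag "rngdates" with the sub-elements "begdate", "begtime", "enddate" and "endtime". I want to edit these tags so that the result looks like the second XML, which contains multiple single dates. The new tags are 'mdattim', 'sngdate' and 'caldate'. I hope this is clear, but please ask for more information if needed. XML is a strange beast and I don't fully understand it yet.

Thanks, Mike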

First XML:

<idinfo> 
    <citation> 
    <citeinfo> 
     <origin>My Company Name</origin> 
     <pubdate>05/04/2009</pubdate> 
     <title>Feature Class Name</title> 
     <edition>0</edition> 
     <geoform>vector digital data</geoform> 
     <onlink>.</onlink> 
    </citeinfo> 
    </citation> 
<descript> 
    <abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract> 
    <purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose> 
</descript> 
<timeperd> 
<timeinfo> 
    <rngdates> 
    <begdate>7/13/2010</begdate> 
    <begtime>unknown</begtime> 
    <enddate>7/15/2010</enddate> 
    <endtime>unknown</endtime> 
    </rngdates> 
</timeinfo> 
<current>ground condition</current> 
</timeperd> 

Second XML:

<idinfo> 
    <citation> 
    <citeinfo> 
     <origin>My Company Name</origin> 
     <pubdate>03/07/2011</pubdate> 
     <title>Feature Class Name</title> 
     <edition>0</edition> 
     <geoform>vector digital data</geoform> 
     <onlink>.</onlink> 
    </citeinfo> 
    </citation> 
<descript> 
    <abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract> 
    <purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose> 
</descript> 
<timeperd> 
<timeinfo> 
    <mdattim> 
    <sngdate> 
     <caldate>08-24-2009</caldate> 
     <time>unknown</time> 
    </sngdate> 
    <sngdate> 
     <caldate>08-26-2009</caldate> 
    </sngdate> 
    <sngdate> 
     <caldate>08-26-2009</caldate> 
    </sngdate> 
    <sngdate> 
     <caldate>07-07-2010</caldate> 
    </sngdate> 
    </mdattim> 
</timeinfo> 

Here is my Python code so far:

import glob
import os
from xml.etree.ElementTree import ElementTree

# Raw string so the backslashes in the Windows path stay literal
folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"

# glob already returns the full path of each matching file
for fullpath in glob.glob(os.path.join(folderPath, "*.xml")):

    if os.path.isfile(fullpath):
        # Parse the file found by glob
        root = ElementTree(file=fullpath)

        # Iterate over every element in the document
        for element in root.iter():
            print(element.tag)

            if element.tag == "begdate":
                # NOTE: str.replace() returns a new string and the result is
                # discarded here, so the tag in the tree is left unchanged
                element.tag.replace("begdate", "sngdate")
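
A note on the last line of the code above: Python strings are immutable, so calling `replace` on `element.tag` builds a new string and leaves the element untouched. A minimal sketch of renaming a tag by assigning to `element.tag` instead, using a small hypothetical snippet modelled on the first XML rather than a real metadata file:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet modelled on the first XML above
timeinfo = ET.fromstring(
    "<timeinfo><rngdates><begdate>7/13/2010</begdate></rngdates></timeinfo>"
)

for element in timeinfo.iter():
    if element.tag == "begdate":
        # Assigning to .tag renames the element in place
        element.tag = "sngdate"

renamed = ET.tostring(timeinfo, encoding="unicode")
print(renamed)
```

Renaming alone does not produce the nesting of the second XML (`mdattim` > `sngdate` > `caldate`), so a structural rebuild of `timeinfo` is still needed for the full conversion.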

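Why not use XSLT? – GaretJax

Also, show us the rules for transforming one into the other, i.e. show an input and the expected output generated from that input. –

The first XML is the input. I have some template XMLs with keywords embedded between specific tags. The second is the output, which I edited by hand. I want to edit the first XML so that everything between the timeinfo tags in the first XML is replaced with everything between the same tags in the second XML. I am using Python because this is for an ArcGIS function, and Python is the preferred language there. I will use this script together with their Python tools. My script will batch-process XMLs to be used as metadata for a large number of GIS shapefiles... – Mike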
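
The replacement described in the last comment (swap everything inside `timeinfo` in the first XML for the contents of `timeinfo` in the second) can be sketched with the standard-library ElementTree. This is only an illustration under assumptions: `replace_timeinfo` is a hypothetical helper name, and the two inline documents are trimmed stand-ins for the real files.

```python
import copy
import xml.etree.ElementTree as ET


def replace_timeinfo(target_root, template_root):
    """Replace the children of <timeinfo> in target_root with copies from template_root."""
    target = target_root.find(".//timeinfo")
    source = template_root.find(".//timeinfo")
    if target is None or source is None:
        return False
    # Drop the existing children (e.g. <rngdates>) but keep <timeinfo> itself
    for child in list(target):
        target.remove(child)
    # Deep-copy so the template tree is left untouched
    for child in source:
        target.append(copy.deepcopy(child))
    return True


# Trimmed stand-ins for the first (input) and second (template) XML
first = ET.fromstring(
    "<idinfo><timeperd><timeinfo><rngdates>"
    "<begdate>7/13/2010</begdate><enddate>7/15/2010</enddate>"
    "</rngdates></timeinfo></timeperd></idinfo>"
)
second = ET.fromstring(
    "<idinfo><timeperd><timeinfo><mdattim><sngdate>"
    "<caldate>08-24-2009</caldate><time>unknown</time>"
    "</sngdate></mdattim></timeinfo></timeperd></idinfo>"
)

replace_timeinfo(first, second)
result = ET.tostring(first, encoding="unicode")
print(result)
```

Copying the children rather than moving them means one template can be reused across every file in a batch.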

Answers

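I believe I have managed to get the code working. It lets you edit certain tags when you need to change them in an existing XML file. I needed this to create metadata for some GIS shapefiles in a batch script, changing certain date values depending on whether they are a single date, multiple dates, or a range of dates.

This web page helped a lot: http://lxml.de/tutorial.html

I have some more work to do, but this is the answer I was looking for to my original question :) I'm sure it can be used in many other applications.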

import glob
import os
import xml.etree.ElementTree as ET

# Set workspace location for XML files (raw string keeps the backslashes literal)
folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"
# Loop through each file with an .xml extension; glob returns full paths
for fullpath in glob.glob(os.path.join(folderPath, "*.xml")):

    if os.path.isfile(fullpath):
        # Parse the XML file
        root = ET.parse(fullpath)

        # Find the "timeinfo" tag anywhere below the root element
        timeinfo = root.find(".//timeinfo")
        if timeinfo is not None:
            # Clear all tags below the "timeinfo" tag
            timeinfo.clear()
            # Append the new parent element
            mdattim = ET.SubElement(timeinfo, "mdattim")
            # Create SubElements under the parent tag
            child1 = ET.SubElement(mdattim, "sngdate")
            child2 = ET.SubElement(child1, "caldate")
            child3 = ET.SubElement(child1, "time")
            # Set text values for tags
            child2.text = "08-24-2009"
            child3.text = "unknown"
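
The code above adds a single `sngdate`, while the second XML in the question holds several. A minimal sketch of the same idea extended to a list of dates follows. It rests on assumptions: `set_single_dates` is a hypothetical helper, the date list is taken from the second XML, and the inline document stands in for a real metadata file.

```python
import xml.etree.ElementTree as ET


def set_single_dates(timeinfo, dates, time_text=None):
    """Rebuild <timeinfo> as <mdattim> holding one <sngdate> per date."""
    timeinfo.clear()
    mdattim = ET.SubElement(timeinfo, "mdattim")
    for date in dates:
        sngdate = ET.SubElement(mdattim, "sngdate")
        ET.SubElement(sngdate, "caldate").text = date
        # Only add <time> when a value was supplied
        if time_text is not None:
            ET.SubElement(sngdate, "time").text = time_text
    return mdattim


# Trimmed stand-in for the first XML in the question
tree = ET.ElementTree(ET.fromstring(
    "<idinfo><timeperd><timeinfo><rngdates>"
    "<begdate>7/13/2010</begdate><enddate>7/15/2010</enddate>"
    "</rngdates></timeinfo></timeperd></idinfo>"
))
timeinfo = tree.find(".//timeinfo")
set_single_dates(timeinfo, ["08-24-2009", "08-26-2009", "07-07-2010"])

# tree.write("metadata_out.xml") would save the result; the file name is hypothetical
caldates = [e.text for e in tree.iter("caldate")]
print(caldates)
```

In a batch run, the call to `set_single_dates` would sit inside the loop over files, followed by a `write` back to disk for each parsed tree.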