2015-12-30

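Scan a directory tree and generate XML. I have tried many things, but failed. Is there any Python code that scans a directory tree and generates XML?

For example, this XML file structure: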

<dir name="dir_A">
    <dir name="dir_AA">
        <file name="abc.doc"/>
    </dir>
    <dir name="dir_BA">
        <dir name="dir_BAA">
            <file name="abc.doc"/>
        </dir>
        <file name="abc.doc"/>
    </dir>
</dir>

Here is the code I tried, but it is not complete. I deleted some of it during development and no longer have it, sorry.

import xml.etree.ElementTree as ET 
import os 

class XMLOperations: 

    def list_files(self, startpath): 

     parent = None 
     prevLevel = None 
     xmlRoot = ET.Element("root") 

     xmlRoot.set('xml','http://www.google.com') 
     xmlRoot.set('xmlns','http://www.w3.org/1999/xlink') 

     directory = ET.Element("directory") 
     elementFile = ET.Element("file") 

     for root, dirs, files in os.walk(startpath): 

      level = root.replace(startpath, '').count(os.sep) 
      current = os.path.basename(root) 

      try: 
       dir_name = root.split(startpath+"/")[1] 
      except: 
       continue 

      depth = dir_name.count(os.sep) 
      fList = dir_name.split(os.sep) 

      if level == 0: 
       ET.SubElement(xmlRoot, directory, name = current) 

      else: 
       for tags in fList: 
        ET.SubElement(xmlRoot, directory) 
      if depth > 3: 
       break 

     #with open("output.xml",'w') as file: 
     # file.write(xmlRoot) 

     ET.dump(xmlRoot) 

Thanks.

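Use the lxml library. Can you add the code you tried? –

I think this question is poorly worded, but its content is valid. To prevent further downvoting, rename it to "Scan a directory tree and generate XML" – Pynchia

@VivekSable I have some code, which I have shared here. I am not using lxml. –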

Answers

1
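  1. Use os.walk to get, for each step, the root (the current directory), the list of subdirectories in the current directory, and the list of files in the current directory.
  2. Create the XML root element with lxml (imported as PARSER in the code).
  3. Iterate over the directory structure with os.walk.
  4. This is the important part: get the element for the current directory via XPath, so build the XPath from the current location path. For example, xpath = "/dir[@name='dir_9']" or xpath = "/dir[@name='dir_9']/dir[@name='dir_apache']".
  5. Append one dir element for each entry in the directory list.
  6. Append one file element for each entry in the file list.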
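
Step 4 can be sketched as a small helper. The name build_xpath is hypothetical (it does not appear in the original answer), and the sketch assumes "/" as the path separator and directory names without single quotes, since a quote would break the XPath string literal.

```python
def build_xpath(root, base_location):
    """Turn a directory path into the XPath of its <dir> element.

    Hypothetical helper: assumes '/' separators and names without
    single quotes.
    """
    relative = root[len(base_location):]  # e.g. "9/apache"
    xpath = ""
    for part in relative.split("/"):
        xpath += "/dir[@name='dir_%s']" % part
    return xpath


print(build_xpath("/home/vivek/Desktop/9/apache", "/home/vivek/Desktop/"))
# prints /dir[@name='dir_9']/dir[@name='dir_apache']
```

Each path component becomes one dir[@name=...] step, so the depth of the XPath matches the depth of the directory relative to the base location.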

Input

>>> p = '/home/vivek/Desktop/9'
>>> import os
>>> for root, dires, files in os.walk(p):
...     print(root)
...     print(dires)
...     print(files)
...     print("=" * 10)
...
/home/vivek/Desktop/9 
['apache', 'i18n', 'templates', 'common'] 
['manage.py', 'urls.py', 'settings.py', '__init__.py'] 
========== 
/home/vivek/Desktop/9/apache 
[] 
['readeradmin.wsgi'] 
========== 
/home/vivek/Desktop/9/i18n 
[] 
['__init__.py', 'models.py'] 
========== 
/home/vivek/Desktop/9/templates 
['admin', 'registration'] 
[] 
========== 
/home/vivek/Desktop/9/templates/admin 
[] 
['base_site.html'] 
========== 
/home/vivek/Desktop/9/templates/registration 
[] 
['logged_out.html'] 
========== 
/home/vivek/Desktop/9/common 
[] 
['views.py', '__init__.py', 'idmaptoalpha.py', 'tests.py', 'models.py'] 
========== 
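
The session shows os.walk reporting a directory before its subdirectories (top-down is the documented default). The XPath lookup in this answer relies on that order, because a parent element must already exist when its children are visited. A minimal sketch to check the ordering on a throwaway tree; the function name parents_seen_first and the directory names are illustrative only.

```python
import os
import tempfile


def parents_seen_first(top):
    """Return True if os.walk visits every directory after its parent."""
    seen = set()
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        if root != top and os.path.dirname(root) not in seen:
            return False
        seen.add(root)
    return True


# Build a tiny temporary tree and check the ordering on it.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "templates", "admin"))
    os.makedirs(os.path.join(top, "apache"))
    print(parents_seen_first(top))  # prints True
```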

Code:

import os
import lxml.etree as PARSER

# Directory to scan; same path as in the interactive session above.
p = '/home/vivek/Desktop/9'
xml_root = PARSER.Element("dir", {"name": "dir_" + os.path.basename(p)})

# Everything before the scanned directory's own name, with a trailing slash.
base_location = os.path.dirname(p) + "/"
for root, dires, files in os.walk(p):
    # Build the XPath of the element for the current directory.
    xpath_tmp = root.split(base_location)[1]
    xpath_p = ""
    for i in xpath_tmp.split("/"):
        xpath_p = "%s/dir[@name='dir_%s']" % (xpath_p, i)

    parent = xml_root.xpath(xpath_p)[0]
    # Append subdirectories to the current element.
    for i in dires:
        parent.append(PARSER.Element("dir", {"name": "dir_" + i}))
    # Append files to the current element.
    for i in files:
        parent.append(PARSER.Element("file", {"name": i}))

# encoding="unicode" makes tostring return text rather than bytes.
print(PARSER.tostring(xml_root, pretty_print=True, encoding="unicode"))
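
Since the asker mentioned not using lxml, here is a rough alternative sketch that uses only the standard library. It is not the method from this answer: instead of XPath lookups it keeps a dictionary from directory path to element. The function name tree_to_xml and the temporary test tree are assumptions made for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def tree_to_xml(top):
    """Build nested <dir>/<file> elements for the directory tree under top."""
    top = os.path.abspath(top)
    xml_root = ET.Element("dir", {"name": "dir_" + os.path.basename(top)})
    elements = {top: xml_root}  # directory path -> its <dir> element
    for root, dirs, files in os.walk(top):
        parent = elements[root]  # already created while visiting its parent
        for name in sorted(dirs):
            elements[os.path.join(root, name)] = ET.SubElement(
                parent, "dir", {"name": "dir_" + name})
        for name in sorted(files):
            ET.SubElement(parent, "file", {"name": name})
    return xml_root


# Try it on a small temporary tree (the names here are illustrative).
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "apache"))
    open(os.path.join(top, "apache", "readeradmin.wsgi"), "w").close()
    print(ET.tostring(tree_to_xml(top), encoding="unicode"))
```

The dictionary lookup sidesteps quoting problems in XPath strings. To save the result instead of printing it, ET.ElementTree(xml_root).write("output.xml", encoding="utf-8", xml_declaration=True) should work.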

Output

<dir name="dir_9">
  <dir name="dir_apache">
    <file name="readeradmin.wsgi"/>
  </dir>
  <dir name="dir_i18n">
    <file name="__init__.py"/>
    <file name="models.py"/>
  </dir>
  <dir name="dir_templates">
    <dir name="dir_admin">
      <file name="base_site.html"/>
    </dir>
    <dir name="dir_registration">
      <file name="logged_out.html"/>
    </dir>
  </dir>
  <dir name="dir_common">
    <file name="views.py"/>
    <file name="__init__.py"/>
    <file name="idmaptoalpha.py"/>
    <file name="tests.py"/>
    <file name="models.py"/>
  </dir>
  <file name="manage.py"/>
  <file name="urls.py"/>
  <file name="settings.py"/>
  <file name="__init__.py"/>
</dir>

Great, XPath :) – Joseph