2014-09-02 102 views

Parsing XML using Python's ElementTree

I have a long XML document with a structure similar to this:

<carrierData> 
    <inspections> 
     <inspection inspection_date="2013-01-16" report_state="TX" report_number="TX130G0ELJ05" level="1" time_weight="1"> 
      <drivers> 
       <driver driver_type="Primary Driver" first_name="JOHN" last_name="SMITH" date_of_birth="1962-11-20" license_state="TX" License_number="12345678"/> 
       <driver driver_type="CoDriver"/> 
      </drivers> 
      <vehicles> 
       <vehicle unit="1" vehicle_id_number="2HSCAAXN02C039269" unit_type="Truck Tractor" license_state="TX" license_number="1B13577"/> 
       <vehicle unit="2" vehicle_id_number="1GRAA76228S702393" unit_type="Semi-Trailer" license_state="TX" license_number="X99757"/> 
      </vehicles> 
      <violations> 
       <violation code="393.11" description="No/defective lighting devices/reflective devices/projected" oos="N" time_severity_weight="3" BASIC="Vehicle Maint."/> 
       <violation code="393.53(b)" description="Automatic brake adjuster CMV manufactured on or after 10/20/1994 - air brake" oos="N" time_severity_weight="4" BASIC="Vehicle Maint."/> 
       <violation code="393.47(e)" description="Clamp/Roto-Chamber type brake(s) out of adjustment" oos="N" time_severity_weight="4" BASIC="Vehicle Maint."/> 
       <violation code="396.3(a)(1)" description="Inspection/repair and maintenance parts and accessories" oos="N" time_severity_weight="2" BASIC="Vehicle Maint."/> 
      </violations> 
    </inspection> 

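I need to iterate through a list of inspection report numbers and print the first and last name of every driver associated with each number in the list. I'm parsing the XML with Python's ElementTree, and although I get no errors with the code below, it doesn't give me any results either: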

import xml.etree.ElementTree as ET 

codes = ['TX3YZ8HQE1X1', 'TX3YAEHQE15W', 'KS00YQ008857', 'TX43D99DAN33', 'NM3267100378', 
     'COPF31000853', 'TX3ZYF0MUQ6F', 'TX3ZFC0MHXLU', 'TX3Z760MGU0H', 'TX3YGG0MUQ1R', 
     'TX3YBD0MUI0A', 'TX3XPF0MKQYG', 'TX3X8F0MHXA7', 'AZ0160001581', 'TX3WC40ADYGZ', 
     'ID6300005350', 'TX3VV50ADUOI', 'TX137S0ELO02', 'UTCE03208119', 'UTCE03208119', 
     'TX3UTG0MJKDL', 'TX3UD60MIJU5', 'TX13690EBI05', 'TX3U4E0AFA94', 'TX3U4E0AFA94', 
     'TX3T5F0MIJMH', 'TX13550BKL02', 'TX3SLE0MIJGZ', 'TX3SLE0MIJGZ', 'TX3S8D0AFH3D', 
     'UTCE03207947', 'TX133Q0ENG01', 'TX133Q0ENG01', 'TX133Q0ENG01', 'TX3REM0MHEK3', 
     'ID0000169042', 'COPF05000200', 'TX13280EPV0B', 'TX131S9DAB02', 'CO1E19000017', 
     'TX3PD60WAA4L', 'TX1317W1NW07', 'CO2D02000044', 'LALAEQ001266', 'TX130H0EBT06', 
     'TX3NW10ABLMK', 'NV7233010192', 'NV4045000998', 'CO3301000406', 'CO5C01000218', 
     'TX12949DBU03', 'FL1619000314', 'TX12929DIE02', 'TX128X0AAP01', 'TX128A9DHA07', 
     'CO2B01000061', 'TX1274W1DV01', 'TX126Z9DCM01', 'TX127U9DBV01', 'TX127U9DBV01', 
     'TX127R9DIZ02', 'TX127K9DCQ06', 'AZ0YDG000141', 'NV7196001031', 'TX126B0FJZ01', 
     'TX126I9DAN01', 'LALACV003777', 'CO2B12000014', 'TX12650HTB01', 'ID0000220955'] 

tree = ET.parse("C:\All_BASICs_07-25-2014.xml") 
root = tree.getroot() 

for x in codes:
    for node in tree.iter('inspection'):
        if ['report_id'] == [x]:
            name = node.attrib.get('first_name','last_name')
            print name

I'm new to programming, so I may be missing something obvious here, but without any errors to go on, I'm having a hard time tracking down the problem.

Answers


What were you trying to do with this line?

if ['report_id'] == [x]: 

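With this code you are testing ['report_id'] == ['TX3YZ8HQE1X1'], ['report_id'] == ['TX3YAEHQE15W'], and so on, none of which will ever be true: each side is a one-element list holding a different string literal, and the node is never consulted. That is why your code exits without printing anything or raising an error.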
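To see how this comparison behaves, here is a minimal sketch. The one-record XML string is trimmed from the question's document for illustration, and the variable names literal_test and attribute_test are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-record document, trimmed from the XML in the question.
SAMPLE = '<inspections><inspection report_number="TX130G0ELJ05"/></inspections>'

node = ET.fromstring(SAMPLE).find('inspection')
x = 'TX130G0ELJ05'

# Two one-element lists holding different strings are never equal,
# and the node plays no part in this comparison.
literal_test = (['report_id'] == [x])

# Reading the attribute from the node is what actually compares the data.
attribute_test = (node.get('report_number') == x)

print(literal_test)
print(attribute_test)
```

The first test is False for every code in the list, while the second is True whenever the element's report_number matches.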

There is nothing named report_id in the XML you posted. Did you mean report_number?
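One way to check which attribute names an element really carries is to look at its attrib dictionary. The sketch below assumes a single inspection element copied from the question (children omitted); the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# One <inspection> element copied from the XML in the question (children omitted).
SAMPLE = ('<inspection inspection_date="2013-01-16" report_state="TX" '
          'report_number="TX130G0ELJ05" level="1" time_weight="1"/>')

node = ET.fromstring(SAMPLE)

# Attribute names actually present on the element.
names = sorted(node.attrib)
print(names)

# .get() returns None for a missing attribute instead of raising,
# so a mistyped attribute name can pass silently.
missing = node.get('report_id')
present = node.get('report_number')
print(missing, present)
```

Indexing with node.attrib['report_id'] would raise a KeyError instead, which makes this kind of mistake easier to spot.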

If you want to grab the primary driver's name for every report_number in the codes list, try something like this:

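for x in codes:
    for node in tree.iter('inspection'):
        # Compare the element's own attribute with the current code.
        if node.attrib['report_number'] == x:
            # Keep only the <driver> elements marked as the primary driver;
            # .get() avoids a KeyError on a <driver> without driver_type.
            primary_driver = [d for d in node.iter('driver') if d.get('driver_type') == "Primary Driver"]
            primary_driver = primary_driver[0]
            first_name = primary_driver.attrib['first_name']
            last_name = primary_driver.attrib['last_name']
            # A single formatted string prints the same on Python 2 and 3.
            print('%s %s' % (first_name, last_name))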

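However, there is a performance problem with this code. For every code in codes, you iterate over the entire XML document. That has complexity O(number_of_codes * number_of_records), which is roughly O(N**2) when the two are of similar size. You can do this in O(N) instead by looping over the document once and using a set to determine whether each record should be included.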
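The single-pass idea could look something like the sketch below. It is not the answerer's code: the function name primary_driver_names is invented here, and the sample XML is a trimmed version of the question's document plus a second, made-up record ("ZZ0000000000", JANE DOE) that should be filtered out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first record is trimmed from the question,
# the second one is invented to show a non-matching report number.
SAMPLE = """
<carrierData>
  <inspections>
    <inspection report_number="TX130G0ELJ05">
      <drivers>
        <driver driver_type="Primary Driver" first_name="JOHN" last_name="SMITH"/>
        <driver driver_type="CoDriver"/>
      </drivers>
    </inspection>
    <inspection report_number="ZZ0000000000">
      <drivers>
        <driver driver_type="Primary Driver" first_name="JANE" last_name="DOE"/>
      </drivers>
    </inspection>
  </inspections>
</carrierData>
"""


def primary_driver_names(root, codes):
    """Return (first_name, last_name) for each inspection whose report_number is in codes."""
    wanted = set(codes)  # average O(1) membership test; duplicate codes collapse
    names = []
    for inspection in root.iter('inspection'):  # one pass over the document
        if inspection.get('report_number') not in wanted:
            continue
        for driver in inspection.iter('driver'):
            if driver.get('driver_type') == 'Primary Driver':
                names.append((driver.get('first_name'), driver.get('last_name')))
                break  # take the first primary driver only
    return names


root = ET.fromstring(SAMPLE.strip())
codes = ['TX130G0ELJ05', 'TX130G0ELJ05', 'TX3YZ8HQE1X1']
found = primary_driver_names(root, codes)
for first_name, last_name in found:
    print('%s %s' % (first_name, last_name))
```

One difference from the nested-loop version: a report number that appears several times in codes is reported once per matching record here, not once per duplicate. Inspections with no primary driver are skipped without raising an IndexError.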


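Thank you, sir! That did it! I did have the correct 'report_number' attribute in my code, but when I first typed up the question I thought I had 'report_id' there, so sorry for the confusion. Otherwise, it gave me exactly what I needed, and seeing the answer here helped me understand more precisely what I was trying to do. The use of set() was also mentioned above, and I did implement that as well. Thanks again! – jerodestapa 2014-09-02 16:56:57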