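Python: I don't understand how XML iteration works
Thanks to the brilliant help on my XML parsing problem, I got further, but now I am lost on how to actually process the XML elements (using lxml).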
My data is Nmap scan output, made up of many records like the one below:
<?xml version="1.0"?>
<?xml-stylesheet href="file:///usr/share/nmap/nmap.xsl" type="text/xsl"?>
<nmaprun scanner="nmap" args="nmap -sV -p135,12345 -oX 10.232.0.0.16.xml 10.232.0.0/16" start="1340201347" startstr="Wed Jun 20 16:09:07 2012" version="5.21" xmloutputversion="1.03">
<host>
<status state="down" reason="no-response"/>
<address addr="10.232.0.1" addrtype="ipv4"/>
</host>
<host starttime="1340201455" endtime="1340201930">
<status state="up" reason="echo-reply"/>
<address addr="10.232.49.2" addrtype="ipv4"/>
<hostnames>
<hostname name="host1.example.com" type="PTR"/>
</hostnames>
<ports>
<port protocol="tcp" portid="135">
<state state="open" reason="syn-ack" reason_ttl="123"/>
<service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10"/>
</port>
<port protocol="tcp" portid="12345">
<state state="open" reason="syn-ack" reason_ttl="123"/>
<service name="http" product="Trend Micro OfficeScan Antivirus http config" method="probed" conf="10"/>
</port>
</ports>
<times srtt="890" rttvar="2835" to="100000"/>
</host>
</nmaprun>
I want to produce a line when
- port 12345 is open, or
- port 135 is open and port 12345 is not.
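The two conditions can be written as a small decision function, separate from the XML handling. This is only a sketch: `classify` is a hypothetical helper that is not in the original code, and reporting a missing product as "UNKNOWN" is an assumption borrowed from the fallback in the final script.

```python
def classify(open135, open12345, product=None):
    """Hypothetical helper: the label to report for one host, or None.

    Mirrors the two conditions listed above. A missing product is
    reported as "UNKNOWN" (assumption borrowed from the final script).
    """
    if open12345:
        # condition 1: port 12345 is open, report its service product
        return product if product is not None else "UNKNOWN"
    if open135:
        # condition 2: port 135 is open and port 12345 is not
        return "NO_OfficeScan"
    # neither condition holds: no line is produced for this host
    return None


print(classify(True, True, "Trend Micro OfficeScan Antivirus http config"))
print(classify(True, False))   # NO_OfficeScan
print(classify(False, False))  # None
```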
I use the code below, with comments describing how I understand things to work:
from lxml import etree
import time

scanTime = str(int(time.time()))

d = etree.parse("10.233.85.0.22.xml")
# find all hosts records
for el_host in d.findall("host"):
    # only process hosts UP
    if el_host.find("status").attrib["state"] == "up":
        # here comes a piece of code which sets the variable hostname
        # used later - that part works fine (removed for clarity)

        # get the status of port 135 and 12345
        Open12345 = Open135 = False
        for el_port in el_host.findall("ports/port"):
            # we are now looping through the <port> records for a given <host>
            if el_port.attrib["portid"] == "135":
                Open135 = el_host.find("ports/port/state").attrib["state"] == "open"
            if el_port.attrib["portid"] == "12345":
                Open12345 = el_host.find("ports/port/state").attrib["state"] == "open"
                # I want to get for port 12345 the description, so I search
                # for <service> within a given port - only 12345 in my case
                # I just search the first one as there is only one
                # this is the place I am not sure I get right
                el_service = el_host.find("ports/port/service")
                if el_service.get("product") is not None:
                    Type12345 = el_host.find("ports/port/service").attrib["product"]
        if Open12345:
            print("%s %s \"%s\"\n" % (scanTime, hostname, Type12345))
        if not Open12345 and Open135:
            print("%s %s \"%s\"\n" % (scanTime, hostname, "NO_OfficeScan"))
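The place I am unsure about is marked in the comments. With this code I always match "Microsoft Windows RPC", as if I were inside the record for port 135 (which comes before port 12345 in the XML file).
I believe the problem lies in my understanding of the find function. It probably matches anything, regardless of where I am. In other words, there is no recursion (as far as I can tell).
In that case, how do I express the notion of "get the service name while you are inside the record for port 12345"?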
Thanks.
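A minimal sketch of the behaviour described above, using a trimmed copy of the host record. It uses the standard-library `xml.etree.ElementTree` for portability (an assumption; `lxml.etree` elements accept the same `find()`/`findall()` paths) and compares a path searched from `<host>` with one searched from the current `<port>`:

```python
import xml.etree.ElementTree as ET  # lxml.etree offers the same find()/findall()

# Trimmed copy of the host record from the question
HOST_XML = """<host>
  <status state="up" reason="echo-reply"/>
  <ports>
    <port protocol="tcp" portid="135">
      <state state="open"/>
      <service name="msrpc" product="Microsoft Windows RPC"/>
    </port>
    <port protocol="tcp" portid="12345">
      <state state="open"/>
      <service name="http" product="Trend Micro OfficeScan Antivirus http config"/>
    </port>
  </ports>
</host>"""

el_host = ET.fromstring(HOST_XML)

for el_port in el_host.findall("ports/port"):
    # Path evaluated from el_host: find() returns the FIRST matching
    # <service> anywhere under ports/port, i.e. always the one for port 135.
    from_host = el_host.find("ports/port/service").get("product")
    # Path evaluated from el_port: only the children of this <port> are searched.
    from_port = el_port.find("service").get("product")
    print(el_port.get("portid"), "|", from_host, "|", from_port)

# 135 | Microsoft Windows RPC | Microsoft Windows RPC
# 12345 | Microsoft Windows RPC | Trend Micro OfficeScan Antivirus http config
```

The path is always resolved relative to the element `find()` is called on, so the loop variable only matters if the search starts from it.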
EDIT & SOLUTION:
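I found the problem in my code: el_host.find("ports/port/state") is evaluated from the <host> element, so it always returns the first matching <port> (port 135), whatever port the loop is currently on. Searching from the current port with el_port.find("state") and el_port.find("service") fixes it. I am reposting the whole script in case someone stumbles on this question one day (it parses Nmap output, so chunks of it may be worth reusing); the comments in the code explain it: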
#!/usr/bin/python
from lxml import etree
import time
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("file", help="XML file to parse")
args = parser.parse_args()

scanTime = str(int(time.time()))
d = etree.parse(args.file)
f = open("OfficeScanComplianceDSCampus." + scanTime, "w")
print("Parsing " + args.file)

# find all hosts records
for el_host in d.findall("host"):
    # only process hosts UP
    if el_host.find("status").attrib["state"] == "up":
        # get the first hostname if it exists, otherwise IP
        el_hostname = el_host.find("hostnames/hostname")
        if el_hostname is not None:
            hostname = el_hostname.attrib["name"]
        else:
            hostname = el_host.find("address").attrib["addr"]
        # get the status of port 135 and 12345
        Open12345 = Open135 = False
        for el_port in el_host.findall("ports/port"):
            # we are now looping through the <port> records for a given <host>
            if el_port.attrib["portid"] == "135":
                Open135 = el_port.find("state").attrib["state"] == "open"
            if el_port.attrib["portid"] == "12345":
                Open12345 = el_port.find("state").attrib["state"] == "open"
                # if port open get info about service
                if Open12345:
                    el_service = el_port.find("service")
                    if el_service is None:
                        Type12345 = "UNKNOWN"
                    elif el_service.get("method") == "probed":
                        Type12345 = el_service.get("product")
                    else:
                        Type12345 = "UNKNOWN"
        if Open12345:
            f.write("%s %s \"%s\"\n" % (scanTime, hostname, Type12345))
        if not Open12345 and Open135:
            f.write("%s %s \"%s\"\n" % (scanTime, hostname, "NO_OfficeScan"))
        if Open12345 and not Open135:
            f.write("%s %s \"%s\"\n" % (scanTime, hostname, "Non-Windows with 12345"))
f.close()
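For reuse or testing, the same logic could be wrapped in a function that returns the report lines instead of writing them to a file. This is a sketch only: `report_lines` and the trimmed `SAMPLE` document are hypothetical, it uses the standard-library `xml.etree.ElementTree` instead of lxml, and the lines are returned without the trailing newline:

```python
import xml.etree.ElementTree as ET  # the script itself uses lxml.etree


def report_lines(root, scan_time):
    """Hypothetical helper: the lines the script writes, without newlines."""
    lines = []
    for el_host in root.findall("host"):
        if el_host.find("status").get("state") != "up":
            continue  # only process hosts that are up
        # first hostname if it exists, otherwise the IP address
        el_hostname = el_host.find("hostnames/hostname")
        if el_hostname is not None:
            hostname = el_hostname.get("name")
        else:
            hostname = el_host.find("address").get("addr")
        open135 = open12345 = False
        type12345 = "UNKNOWN"
        for el_port in el_host.findall("ports/port"):
            # every lookup below starts from the current <port>, not from <host>
            is_open = el_port.find("state").get("state") == "open"
            if el_port.get("portid") == "135":
                open135 = is_open
            elif el_port.get("portid") == "12345":
                open12345 = is_open
                el_service = el_port.find("service")
                if el_service is not None and el_service.get("method") == "probed":
                    type12345 = el_service.get("product")
        if open12345:
            lines.append('%s %s "%s"' % (scan_time, hostname, type12345))
            if not open135:
                lines.append('%s %s "%s"' % (scan_time, hostname, "Non-Windows with 12345"))
        elif open135:
            lines.append('%s %s "%s"' % (scan_time, hostname, "NO_OfficeScan"))
    return lines


SAMPLE = """<nmaprun>
  <host><status state="down"/><address addr="10.232.0.1"/></host>
  <host>
    <status state="up"/>
    <address addr="10.232.49.2"/>
    <hostnames><hostname name="host1.example.com"/></hostnames>
    <ports>
      <port portid="135"><state state="open"/></port>
      <port portid="12345"><state state="open"/>
        <service product="Trend Micro OfficeScan Antivirus http config" method="probed"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

for line in report_lines(ET.fromstring(SAMPLE), "1340201347"):
    print(line)
```

On this trimmed sample it prints one line, `1340201347 host1.example.com "Trend Micro OfficeScan Antivirus http config"`, because the first host is down and the second has both ports open.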
I will also look into the XPath ideas suggested by Dikei and Ignacio Vazquez-Abrams.
Thanks, everyone!
Why not use an XPath expression to check whether the nodes you care about exist?
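One possible sketch of that suggestion uses path predicates such as `[@portid='12345']`, so that each lookup targets the right `<port>` without an explicit loop. It relies on the limited XPath subset that ElementTree's `find()` supports (lxml's `find()` accepts the same paths); with lxml you could also call `el_host.xpath(...)` for full XPath 1.0. The helpers `port_is_open` and `service_product` are hypothetical names:

```python
import xml.etree.ElementTree as ET  # lxml.etree's find() accepts the same paths

# Trimmed host record (assumed shape, taken from the question's sample)
HOST_XML = """<host>
  <status state="up"/>
  <ports>
    <port protocol="tcp" portid="135"><state state="open"/></port>
    <port protocol="tcp" portid="12345">
      <state state="open"/>
      <service product="Trend Micro OfficeScan Antivirus http config" method="probed"/>
    </port>
  </ports>
</host>"""


def port_is_open(el_host, portid):
    """True if <host> has a <port> with this portid whose <state> is open."""
    path = "ports/port[@portid='%s']/state[@state='open']" % portid
    return el_host.find(path) is not None


def service_product(el_host, portid):
    """The product attribute of that port's <service>, or None if absent."""
    el_service = el_host.find("ports/port[@portid='%s']/service" % portid)
    return None if el_service is None else el_service.get("product")


el_host = ET.fromstring(HOST_XML)
print(port_is_open(el_host, "135"))    # True
print(port_is_open(el_host, "12345"))  # True
print(port_is_open(el_host, "445"))    # False
print(service_product(el_host, "12345"))
```

The predicate does the "am I in port 12345?" check inside the path itself, so the result cannot come from a different `<port>`.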