Consider this approach:
from bs4 import BeautifulSoup

with open('test.xml') as raw_results:
    results = BeautifulSoup(raw_results, 'lxml')

# find every <tag> element, then every <stat> nested inside it
for element in results.find_all("tag"):
    for stat in element.find_all("stat"):
        print(stat['pass'])
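The problem with your solution is that pass is an attribute of stat, not of the tag element you were searching.
This solution searches for all tag elements and, within each of them, searches for stat elements. It then reads pass from those results.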
XML file
<tag>
<stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
<stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
<stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>
</tag>
Output of the script above
1
1
1
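Addition
Since some details still seem to be unclear (see the comments), consider this complete solution, which uses BeautifulSoup to get everything you need. Storing dictionaries as list elements may not be ideal if you run into performance problems. However, since you seem to be having some trouble with Python and Soup, I tried to keep this example as simple as possible by making all relevant information accessible by name rather than by index.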
from bs4 import BeautifulSoup

# Parses a string of the form 'TR=abc123 Sandbox=abc123' and stores it in a dictionary with the following
# structure: {'TR': 'abc123', 'Sandbox': 'abc123'}. Returns this dictionary.
def parseTestID(testid):
    parts = testid.split(" ")
    return {'TR': parts[0].split("=")[1], 'Sandbox': parts[1].split("=")[1]}

# Parses the XML content of 'rawdata' and stores pass value, TR-ID and Sandbox-ID in a dictionary of the
# following form: {'Pass': passvalue, 'TR': TR-ID, 'Sandbox': Sandbox-ID}. This dictionary is appended to
# a list that is returned.
def getTestState(rawdata):
    # initialize parser
    soup = BeautifulSoup(rawdata, 'lxml')
    parsedData = []
    # search for <tag> elements
    for tag in soup.find_all("tag"):
        # search each <tag> for <stat> elements
        for stat in tag.find_all("stat"):
            # parse the element text once and store everything in a dictionary
            ids = parseTestID(stat.string)
            entry = {'Pass': stat['pass'], 'TR': ids['TR'], 'Sandbox': ids['Sandbox']}
            # append dictionary to list
            parsedData.append(entry)
    # return list
    return parsedData
You can use the script above as follows to do whatever you want with the data (for example, just print it):
# open file
with open('test.xml') as raw_results:
    # get list of parsed data
    data = getTestState(raw_results)

# print parsed data
for element in data:
    print("TR = {0}\tSandbox = {1}\tPass = {2}".format(element['TR'], element['Sandbox'], element['Pass']))
The output looks like this:
TR = 111111 Sandbox = 3000613 Pass = 1
TR = 121212 Sandbox = 3000618 Pass = 1
TR = 222222 Sandbox = 3000612 Pass = 1
TR = 232323 Sandbox = 3000618 Pass = 1
TR = 333333 Sandbox = 3000605 Pass = 1
TR = 343434 Sandbox = ZZZZZZ Pass = 1
TR = 444444 Sandbox = 3000604 Pass = 1
TR = 454545 Sandbox = 3000608 Pass = 1
TR = 545454 Sandbox = XXXXXX Pass = 1
TR = 555555 Sandbox = 3000617 Pass = 1
TR = 565656 Sandbox = 3000615 Pass = 1
TR = 626262 Sandbox = 3000602 Pass = 1
TR = 666666 Sandbox = 3000616 Pass = 1
TR = 676767 Sandbox = 3000599 Pass = 1
TR = 737373 Sandbox = 3000603 Pass = 1
TR = 777777 Sandbox = 3000611 Pass = 1
TR = 787878 Sandbox = 3000614 Pass = 1
TR = 828282 Sandbox = 3000600 Pass = 1
TR = 888888 Sandbox = 3000610 Pass = 1
TR = 999999 Sandbox = 3000617 Pass = 1
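Let's summarize the core elements that were used:
Finding XML tags: To find XML tags, use soup.find("tag"), which returns the first matching tag, or soup.find_all("tag"), which finds all matching tags and stores them in a list. Individual tags are then easy to access by iterating over the list.
Finding nested tags: To find nested tags, apply find() or find_all() again to a result of the first find_all().
Accessing the content of a tag: To access the content of a tag, apply string to a single tag. For example, if tag = <tag>I love Soup!</tag>, then tag.string = "I love Soup!".
Finding attribute values: To get the value of an attribute, use subscript notation. For example, if tag = <tag color=red>I love Soup!</tag>, then tag['color'] = "red".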
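As a rough illustration of these four operations together, here is a minimal sketch. The sample document and the variable names are made up for this example, and it uses Python's built-in "html.parser" (an assumption, chosen so that no extra parser package is needed) instead of the 'lxml' parser used in the answer's code.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, modelled on the XML file shown in the answer.
xml = """
<tag color="red">
<stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
<stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
</tag>
"""

# Assumption: "html.parser" instead of 'lxml', to avoid an extra dependency.
soup = BeautifulSoup(xml, "html.parser")

# find() returns the first matching element (or None if there is none).
first_tag = soup.find("tag")

# find_all() returns a list-like collection of every matching element.
all_tags = soup.find_all("tag")

# Nested search: call find_all() again on a previous result.
stats = first_tag.find_all("stat")

# Attribute access via subscript notation.
color = first_tag["color"]
pass_values = [stat["pass"] for stat in stats]

# .string gives the text of an element that has a single text child.
first_text = stats[0].string

print(len(all_tags), color, first_text, pass_values)
```

Note that string is only useful on an element with exactly one child; on first_tag above it would be None, because that element contains several children.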
To parse strings of the form "TR=abc123 Sandbox=abc123" I used ordinary Python string splitting. You can read more about it here: How can I split and parse a string in Python?
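The splitting step can also be written more generally, so that it does not depend on the order of the key=value pairs. This is only a sketch; parse_test_id is a hypothetical name for illustration and is not the function used in the answer above.

```python
def parse_test_id(test_id):
    """Turn 'TR=abc Sandbox=xyz' into {'TR': 'abc', 'Sandbox': 'xyz'}."""
    # split() without arguments splits on any run of whitespace;
    # split("=", 1) cuts each piece at its first "=" only.
    return dict(part.split("=", 1) for part in test_id.split())


parsed = parse_test_id("TR=111111 Sandbox=3000613")
print(parsed["TR"], parsed["Sandbox"])
```

A piece without an "=" would make dict() raise a ValueError, so input that may be malformed would need an extra check.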
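I see, I get it now, it makes total sense! It works great now, thanks for that! If I may ask, I have one more question: since I only have one 'tag' element, is the for loop needed? If not, how can I go directly to that 'tag' element? Thanks! – Xour
Glad I could help! You can show that this answer met your needs by upvoting it and accepting it as the correct answer: http://stackoverflow.com/help/someone-answers – datell
Can't upvote, not enough rep :( – Xour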