2016-02-01 34 views

How do I load an .xml file from disk as an element tree using lxml?

I have a series of XML files on my drive, and I want to do the following:

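  • Load an XML file into an lxml element tree and parse it with XPath
  • Load another XML file as an element tree and use XPath to find the correct location to append information
  • Store the information parsed from the series of XML files in variables, so that I can run some logic on the results before appending them back to the main .xml file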

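I am having trouble with file types and with loading the XML files correctly as element trees so that they can be manipulated with lxml. I have tried several different approaches but keep running into various problems. The current error is: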

TypeError: Argument '_parent' has incorrect type (expected lxml.etree._Element, got list)

from lxml import etree 
from lxml import html 
import requests 

file = 'bgg.xml' 
# parse the xml file from disk as an element tree in lxml? 
treebgg = etree.parse(file) 

# create a list of IDs to iterate through from the bgg.xml file 
gameList = treebgg.xpath("//root/BGG/@ID") 

# iterate through the IDs 
for x in reversed(gameList): 
    url = 'https://somewhere.com/xmlapi/' + str(x) 
    page = requests.get(url) 
    # pull an xml file from a web url and turn it into an element tree in lxml 
    tree = html.fromstring(page.content) 
    # set my root variable so I can append children to this location 
    root = tree.xpath("//root/BGG[@ID=x]") 
    name = tree.xpath("//somewhere/name[@primary='true']")
    # append child info into bgg.xml 
    child = etree.SubElement(root, "Name") 
    child.text = name 

# write bgg.xml back to file 

Answer

Get the root of the bgg.xml tree:

rootbgg = treebgg.getroot() 

and append the child to it:

child = etree.SubElement(rootbgg, "Name") 
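For context, the TypeError in the question appears because xpath() returns a Python list, while etree.SubElement() expects a single element as its parent. Here is a minimal sketch of that difference. The SAMPLE document is a hypothetical stand-in for bgg.xml, whose real layout is not shown in the question, and the "Example" text is a placeholder.

```python
import io
from lxml import etree

# Hypothetical stand-in for bgg.xml (assumed layout: <root> holding <BGG ID="..."> elements).
SAMPLE = b'<root><BGG ID="13"/><BGG ID="42"/></root>'
treebgg = etree.parse(io.BytesIO(SAMPLE))

# xpath() returns a list of matches, even when only one element matches.
matches = treebgg.xpath("//root/BGG[@ID='42']")

try:
    # Passing the list itself reproduces the error from the question.
    etree.SubElement(matches, "Name")
except TypeError as exc:
    print("TypeError:", exc)

# Pick one element out of the list and use that as the parent instead.
target = matches[0]
child = etree.SubElement(target, "Name")
child.text = "Example"  # placeholder text

print(etree.tostring(treebgg))
```

The same reasoning applies to child.text: it needs a string, not the list that xpath() returns.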

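To append to a specific BGG element rather than to the root, you will need to rework the way you iterate, looping over the elements themselves instead of their ID attributes: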

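gameList = treebgg.xpath("//root/BGG")

# iterate through the BGG elements themselves (not just their IDs)
for game in reversed(gameList):
    # attribute names are case-sensitive; the question's XPath uses "ID"
    url = 'https://somewhere.com/xmlapi/' + game.attrib["ID"]
    page = requests.get(url)
    tree = html.fromstring(page.content)
    # get the name using the XPath from the question;
    # xpath() returns a list, so take the first match
    names = tree.xpath("//somewhere/name[@primary='true']")
    if not names:
        continue

    # append child info into bgg.xml
    child = etree.SubElement(game, "Name")
    child.text = names[0].text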
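The question's code stops at "write bgg.xml back to file", so here is a rough end-to-end sketch of that last step, offered as one possible approach rather than the definitive one. To keep it runnable offline, the web responses are replaced by a hypothetical FAKE_RESPONSES dictionary; a real script would use requests.get(url).content instead. The helper names primary_name and enrich, the sample XML layout, and the game names are all assumptions made for illustration.

```python
import os
import tempfile
from lxml import etree

# Hypothetical API responses keyed by game ID (assumed shape, not the real API).
FAKE_RESPONSES = {
    "13": b'<items><item><name primary="true">Game A</name></item></items>',
    "42": b'<items><item><name primary="true">Game B</name><name>Alt</name></item></items>',
}


def primary_name(xml_bytes):
    """Return the text of the first primary <name>, or None if there is none."""
    doc = etree.fromstring(xml_bytes)
    names = doc.xpath("//name[@primary='true']/text()")
    return names[0] if names else None


def enrich(path):
    """Load the file at path, add a <Name> child to each <BGG>, and write it back."""
    tree = etree.parse(path)
    for game in tree.xpath("//root/BGG"):
        name = primary_name(FAKE_RESPONSES[game.get("ID")])
        if name is not None:
            etree.SubElement(game, "Name").text = name
    # Overwrite the original file with the modified tree.
    tree.write(path, encoding="utf-8", xml_declaration=True, pretty_print=True)
    return tree


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bgg.xml")
    with open(path, "wb") as fh:
        fh.write(b'<root><BGG ID="13"/><BGG ID="42"/></root>')

    enrich(path)

    # Reload from disk to confirm the changes were saved.
    reloaded = etree.parse(path)
    names = reloaded.xpath("//root/BGG/Name/text()")
    # XPath variables ($gid) avoid building the expression by string concatenation.
    by_id = reloaded.xpath("//root/BGG[@ID=$gid]/Name/text()", gid="42")
    print(names, by_id)
```

The $gid variable is also a way around the [@ID=x] expression in the question, which compares the attribute against a child element named x rather than against the Python variable.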

I tried rootbgg = treebgg.getroot(), but I have a problem... how do I select the correct element? I don't want to append to the root of the XML file itself. Hello #append here – Aro

@Aro Updated with sample code. I hope I understood what you are trying to do. – alecxe

Thanks! This solved my problem. – Aro