Python: parsing XML with an unusual tag name (atom:link)

I am trying to parse the href out of the XML below. There are multiple workspace tags; I only show one here.

import requests

myUrl = 'https://www.my-geoserver.com/geoserver/rest/workspaces'
headers = {'Accept': 'text/xml'}
resp = requests.get(myUrl, auth=('admin', 'password'), headers=headers)

The response I get back looks like this:

<workspaces> 
    <workspace> 
    <name>practice</name> 
    <atom:link xmlns:atom="http://www.w3.org/2005/Atom" rel="alternate" href="https://www.my-geoserver.com/geoserver/rest/workspaces/practice.xml" type="application/xml"/> 
    </workspace> 
</workspaces>

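The above comes from the requests.get call, using the requests library. If I search for 'workspace', I get the matching elements back: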

lst = tree.findall('workspace') 
print(lst)

Which results in:

[<Element 'workspace' at 0x039E70F0>, <Element 'workspace' at 0x039E71B0>, <Element 'workspace' at 0x039E7240>]

So far so good, but how do I get the href text out of the link tag? I have tried:

lst = tree.findall('atom') 
lst = tree.findall('atom:link') 
lst = tree.findall('workspace/atom:link')

But none of them isolates the tag; in fact, the last one raises an error:

SyntaxError: prefix 'atom' not found in prefix map

How can I get every href from tags with this kind of name?


2017-06-20 Single Entity

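For anyone else who finds this question: the part before the colon (atom in this case) is a namespace prefix, and that is what causes the problem here. The solution is fairly simple: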

import requests
from bs4 import BeautifulSoup

myUrl = 'https://www.my-geoserver.com/geoserver/rest/workspaces'
headers = {'Accept': 'text/xml'}
resp = requests.get(myUrl, auth=('admin', 'my_password'), headers=headers)
stuff = resp.text
# The "xml" parser (backed by lxml) keeps the atom: prefix on tag names
to_parse = BeautifulSoup(stuff, "xml")

for item in to_parse.find_all("atom:link"):
    print(item)

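Thanks to Saket Mittal for pointing me toward the BeautifulSoup library. The key is to pass "xml" as the parser argument to BeautifulSoup. In my case, using "lxml" instead did not parse the namespaced tags correctly and simply ignored them.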
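For readers who would rather stay with the standard library, here is a minimal sketch of the same extraction using xml.etree.ElementTree and a prefix-to-URI map. This is not the approach taken in the answer above. It assumes the tree in the question was built with ElementTree, which the "prefix map" wording of the error suggests. The names XML_TEXT, NAMESPACES and extract_hrefs are made up for illustration, and the sample XML is copied from the question.

```python
import xml.etree.ElementTree as ET

# Sample response body copied from the question (only one workspace shown).
XML_TEXT = """<workspaces>
    <workspace>
    <name>practice</name>
    <atom:link xmlns:atom="http://www.w3.org/2005/Atom" rel="alternate" href="https://www.my-geoserver.com/geoserver/rest/workspaces/practice.xml" type="application/xml"/>
    </workspace>
</workspaces>"""

# Prefix-to-URI map; the URI must match the xmlns:atom declaration in the document.
NAMESPACES = {"atom": "http://www.w3.org/2005/Atom"}


def extract_hrefs(xml_text):
    """Return the href attribute of every atom:link inside a workspace element."""
    root = ET.fromstring(xml_text)  # root is the <workspaces> element
    links = root.findall("workspace/atom:link", NAMESPACES)
    # href itself carries no namespace, so a plain attribute lookup is enough
    return [link.get("href") for link in links]


print(extract_hrefs(XML_TEXT))
```

With a live response, the body returned by requests.get would take the place of XML_TEXT. Passing the map as the second argument to findall is what supplies the prefix that the SyntaxError in the question reports as missing.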


2017-06-20 19:14:18

A simple solution I found:

>>> y=BeautifulSoup(x) 
>>> y 
<workspaces> 
<workspace> 
<name>practice</name> 
<atom:link xmlns:atom="http://www.w3.org/2005/Atom" rel="alternate" href="https://www.my-geoserver.com/geoserver/rest/workspaces/practice.xml" type="application/xml"> 
</atom:link></workspace> 
</workspaces> 
>>> c = y.workspaces.workspace.findAll("atom:link") 
>>> c 
[<atom:link xmlns:atom="http://www.w3.org/2005/Atom" rel="alternate" href="https://www.my-geoserver.com/geoserver/rest/workspaces/practice.xml" type="application/xml"> 
</atom:link>] 
>>>


2017-06-20 18:20:29

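I just get [] as my output. It must have something to do with the format of resp.text, which is plain text as far as I know. If I use y.workspaces.findAll("workspace") it works, but that is not what I am after.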
