Python and XML processing

I used urllib to fetch the following data:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<videos xmlns:xs="http://www.w3.org/2001/XMLSchema" 
     xmlns:www="http://www.www.com""> 
    <video type="cl"> 
    <cd> 
     <src lang="music">http://www.google.com/ </src> 
    </cd> 
    </video> 
</videos>

I want to extract http://www.google.com/ from it. Here is my code:

import xml.etree.ElementTree as etree 
data='<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com""><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>' 
tree = etree.fromstring(data) 
geturl=tree.findtext('/video/cd/src').strip() 
print geturl

I get this error:

AttributeError: 'NoneType' object has no attribute 'strip'

Apparently findtext fails. I also tried findtext('src'), and that does not work either.

What is going wrong?

Source

2011-08-12 DocWiki

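What is the traceback of the error? Also, running this code does not produce the same error. –

Remove the leading forward slash from the path so that it reads video/cd/src: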

import xml.etree.ElementTree as etree 
data='''<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com"><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>''' 
tree = etree.fromstring(data) 
# The path is relative to the root <videos> element returned by fromstring
geturl=tree.findtext('video/cd/src').strip() 
print(geturl)

This produces

http://www.google.com/

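A leading forward slash denotes an absolute path, which is not allowed on elements. etree.fromstring returns an Element, so the path has to be relative to it.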
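
To illustrate the relative-path lookup, here is a minimal sketch that also guards against a missing element, so a failed match does not end in the AttributeError from the question. The helper name extract_src and the trimmed-down DATA string are assumptions made for this example and do not come from the original post.

```python
import xml.etree.ElementTree as etree

# Trimmed-down copy of the document from the question (namespaces omitted).
DATA = ('<videos><video type="cl"><cd>'
        '<src lang="music">http://www.google.com/ </src>'
        '</cd></video></videos>')


def extract_src(xml_text, path='video/cd/src'):
    """Return the stripped text found at `path`, or None if nothing matches."""
    root = etree.fromstring(xml_text)
    # findtext returns the `default` value when no element matches the path.
    text = root.findtext(path, default=None)
    return text.strip() if text is not None else None


print(extract_src(DATA))
print(extract_src(DATA, 'video/cd/missing'))
```

A path such as './/src' should also work here, because it searches all descendants of the root instead of spelling out each level.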

PS. The data you posted also contains a syntax error: xmlns:www="http://www.www.com"" ends with two double quotes...
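
As a rough sketch of what that stray quote does, the snippet below feeds a shortened version of the data to ElementTree and catches the resulting parse error. The variable names bad and parsed are made up for this example, and the exact error message depends on the underlying expat parser.

```python
import xml.etree.ElementTree as etree

# Shortened version of the posted data, keeping the doubled closing quote.
bad = '<videos xmlns:www="http://www.www.com""><video/></videos>'

try:
    etree.fromstring(bad)
    parsed = True
except etree.ParseError as err:
    # ElementTree refuses input that is not well-formed XML.
    parsed = False
    print('not well-formed:', err)
```

Checking for a parse error first makes it easier to tell a malformed document apart from a path that simply matches nothing.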

Source

2011-08-12 00:21:42 unutbu

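Thanks for your reply. I was not using the leading forward slash in my code. That was just a typo. I solved my problem with Beautifulsoup. The most likely cause is malformed XML that ElementTree cannot handle. – DocWiki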
