2016-05-11 28 views

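How to search for certain strings in an XML response

I am using the urllib2 library to access an S3 bucket that I own. The response is an XML structure. The problem is that I want to find the nodes in that structure whose key starts with "part-".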

I then want to extract those keys, save them in a list/array, loop over them, and read the contents of those files.

Part of the XML response:

<Contents> 
<Key>output/part-00000</Key> 
<LastModified>2016-05-11T17:01:19.000Z</LastModified> 
<ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag> 
<Size>0</Size> 
<StorageClass>STANDARD</StorageClass> 
</Contents> 
<Contents> 
<Key>output/part-00001</Key> 
<LastModified>2016-05-11T17:01:15.000Z</LastModified> 
<ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag> 
<Size>0</Size> 
<StorageClass>STANDARD</StorageClass> 
</Contents> 

This is what I am doing right now:

import urllib2
import xml.etree.ElementTree as ET

f = urllib2.urlopen("https://s3.amazonaws.com/*******")

tree = ET.parse(f)
root = tree.getroot()

# Print every direct child of the root element
for child in root:
    print child

Output:

<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Name' at 0x103a325d0> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Prefix' at 0x103a32610> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Marker' at 0x103a32690> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}MaxKeys' at 0x103a32710> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}IsTruncated' at 0x103a32750> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32790> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32950> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32b10> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32cd0> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32e90> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e090> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e250> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e410> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e5d0> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e790> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e950> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3eb10> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3ecd0> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3ee90> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47090> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47250> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47410> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a475d0> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47790> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47950> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47b10> 
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47cd0> 

I have tried various solutions using minidom and xml.etree.ElementTree, but I can't quite figure it out.

So what I want is to loop through these XML nodes, find all references to part-*****, and save them in an array.

Any help or pointers are welcome.

Post the code you tried and what went wrong, and we'll fix it.

@AlexHall Hey there, you can check what I tried above, along with the console output. Thanks.

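That's a start, you have all the nodes. "Problem is I want to find the nodes in that structure that their key starts with 'part-'". Where have you tried to filter out those nodes?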

Answers

My solution:

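import urllib2
import xml.etree.ElementTree as ET

f = urllib2.urlopen("https://s3.amazonaws.com/******")

tree = ET.parse(f)
root = tree.getroot()

# Tags are namespace-qualified, so the namespace must be part of each search path
for child in root.findall('{http://s3.amazonaws.com/doc/2006-03-01/}Contents'):
    for key in child.findall("{http://s3.amazonaws.com/doc/2006-03-01/}Key"):
        print key.text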
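
The loop above prints every key. The question also asks to keep only the keys that start with "part-" and to collect them in a list. The following is a minimal sketch of that filtering step, not a definitive implementation. It assumes Python 3 (the code above is Python 2), it parses a short hypothetical listing held in `SAMPLE_LISTING` instead of calling S3, and the names `S3_NS` and `part_keys` are made up for illustration. The namespace URI is the one shown in the question's output.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the element tags printed in the question's output.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Hypothetical, shortened stand-in for the real bucket listing (assumption).
SAMPLE_LISTING = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>output/_SUCCESS</Key><Size>0</Size></Contents>
  <Contents><Key>output/part-00000</Key><Size>0</Size></Contents>
  <Contents><Key>output/part-00001</Key><Size>0</Size></Contents>
</ListBucketResult>
"""


def part_keys(xml_text, marker="part-"):
    """Return every <Key> whose final path segment starts with `marker`."""
    root = ET.fromstring(xml_text)
    keys = []
    # The "s3:" prefix is resolved through S3_NS, which avoids repeating
    # the full "{namespace}Tag" form in every search path.
    for key in root.findall("s3:Contents/s3:Key", S3_NS):
        text = key.text or ""
        # Keys look like "output/part-00000", so test the name after the last "/".
        if text.rsplit("/", 1)[-1].startswith(marker):
            keys.append(text)
    return keys


if __name__ == "__main__":
    print(part_keys(SAMPLE_LISTING))
```

With a real response, the text read from the `urlopen` call would be passed in place of `SAMPLE_LISTING`. Each returned key could then presumably be appended to the bucket URL and fetched the same way to read the file contents, assuming the objects are readable with the same access as the listing.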