Organizing XML data into a dictionary

I'm trying to organize data from an XML file into a dictionary. It will be used to run a Monte Carlo simulation.

Here is an example of what a few entries of the XML look like:

<retirement> 
    <item> 
     <low>-0.34</low> 
     <high>-0.32</high> 
     <freq>0.0294117647058824</freq> 
     <variable>stock</variable> 
     <type>historic</type> 
    </item> 
    <item> 
     <low>-0.32</low> 
     <high>-0.29</high> 
     <freq>0</freq> 
     <variable>stock</variable> 
     <type>historic</type> 
    </item> 
</retirement>

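My current dataset has only two variables, and the type can be one of three or possibly four discrete types. Hard-coding two variables isn't a problem, but I'd like to start working with data that has many more variables, and to automate the process. My goal is to import this XML data into a dictionary automatically so I can process it further later, without having to hard-code array headings and variable names.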
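Since the goal is to avoid hard-coding field names, one option is to read each <item>'s child tags generically instead of calling find('low'), find('high') and so on by name. This is a minimal sketch, not the poster's code: the helper names convert and items_to_dicts and the embedded SAMPLE_XML string are illustrative assumptions. With a real file you would pass ET.parse('xmlfile').getroot() instead of ET.fromstring(SAMPLE_XML).

```python
import xml.etree.ElementTree as ET

# The two sample <item> entries from the question, embedded as a string
SAMPLE_XML = """<retirement>
    <item>
        <low>-0.34</low>
        <high>-0.32</high>
        <freq>0.0294117647058824</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
    <item>
        <low>-0.32</low>
        <high>-0.29</high>
        <freq>0</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
</retirement>"""


def convert(text):
    """Return text as a float when it parses as a number, otherwise unchanged."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return text


def items_to_dicts(root):
    """Turn each <item> into a dict keyed by its child tag names (no hard-coded fields)."""
    return [{child.tag: convert(child.text) for child in item}
            for item in root.findall('item')]


records = items_to_dicts(ET.fromstring(SAMPLE_XML))
print(records[0]['low'])  # -0.34
```

Any new child tag added to the XML later simply shows up as another key in each record, so the field list never has to be edited in the code.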

Here is what I have:

# Import XML Parser 
import xml.etree.ElementTree as ET 

# Parse XML directly from the file path 
tree = ET.parse('xmlfile') 

# Create iterable item list 
Items = tree.findall('item') 

# Create Master Dictionary 
masterDictionary = {} 

# Assign variables to dictionary 
for Item in Items: 
    thisKey = Item.find('variable').text 
    if thisKey in masterDictionary == False: 
     masterDictionary[thisKey] = [] 
    else: 
     pass 

thisList = masterDictionary[thisKey] 
newDataPoint = DataPoint(float(Item.find('low').text), float(Item.find('high').text), float(Item.find('freq').text)) 
thisSublist.append(newDataPoint)

I get a KeyError exception at thisList = masterDictionary[thisKey]

I also tried creating a class to handle some of the other elements:

# Define a class for each data point that contains low, hi and freq attributes 
class DataPoint: 
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

Would I then be able to check a value from the XML with something like:

masterDictionary['stock'][0].freq

Any and all help is appreciated.

UPDATE

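Thanks for the help, John. The indentation issue was low-hanging fruit on my part: this is my first time posting on Stack Overflow, and I didn't copy/paste correctly. The part after the else is actually indented as part of the for loop, by four spaces, in my code; it was just a bad paste here. I'll keep the capitalization convention in mind. Your suggestion did work, and now the commands: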

print masterDictionary.keys() 
print masterDictionary['stock'][0].low

yield:

['inflation', 'stock'] 
-0.34

These are indeed my two variables, and the value matches the XML listed at the top.

UPDATE 2

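Well, I thought I had figured this one out, but I was careless again, and it turns out I haven't fully fixed the problem. The previous solution ended up writing all of the data into both of my dictionary keys, so two identical lists were assigned to two different keys. The idea is to assign each dictionary key its own matching set of data from the XML. Here is the current code: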
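What this update describes is list aliasing: in the code below, masterDictionary[thisKey] = thisList binds every key to the single list object created before the loop, not to a copy, so each append through thisList grows the one list that all keys share (and the following "not in" check never fires, because the key was just assigned). A minimal illustration, where shared and d are stand-ins for thisList and masterDictionary:

```python
# One list object, created once before the loop (like thisList in Update 2)
shared = []
d = {}
for key in ['stock', 'inflation', 'stock']:
    d[key] = shared        # binds the key to that same object; no copy is made
    shared.append(key)     # so every key "sees" this append

print(d['stock'] is d['inflation'])  # True
print(d['inflation'])                # ['stock', 'inflation', 'stock']
```

Both keys report the same length for the same reason: they are two names for one list.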

# Import XML Parser 
import xml.etree.ElementTree as ET 

# Parse XML directly from the file path 
tree = ET.parse('xmlfile')

# Create iterable item list 
items = tree.findall('item') 

# Create class for historic variables 
class DataPoint: 
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables 
masterDictionary = {} 
thisList = [] 

# Loop to assign variables as dictionary keys and associate their values with them 
for item in items: 
    thisKey = item.find('variable').text 
    masterDictionary[thisKey] = thisList 
    if thisKey not in masterDictionary: 
     masterDictionary[thisKey] = [] 
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text)) 
    thisList.append(newDataPoint)

When I enter:

print masterDictionary['stock'][5].low 
print masterDictionary['inflation'][5].low 
print len(masterDictionary['stock']) 
print len(masterDictionary['inflation'])

the results are the same for both keys ('stock' and 'inflation'):

-.22 
-.22 
56 
56

The XML file has 27 items tagged stock and 29 tagged inflation. How can I make each list assigned to a dictionary key pull in only its own data during the loop?

UPDATE 3

This seems to work with two loops, but I don't understand how or why it won't work in a single loop. I managed this by accident:

# Import XML Parser 
import xml.etree.ElementTree as ET 

# Parse XML directly from the file path 
tree = ET.parse('xmlfile')

# Create iterable item list 
items = tree.findall('item') 

# Create class for historic variables 
class DataPoint: 
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables 
masterDictionary = {} 

# Loop to assign variables as dictionary keys and associate their values with them 
for item in items: 
    thisKey = item.find('variable').text 
    thisList = [] 
    masterDictionary[thisKey] = thisList 

for item in items: 
    thisKey = item.find('variable').text 
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text)) 
    masterDictionary[thisKey].append(newDataPoint)

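I've tried many permutations to get it working in a single loop, with no luck. Either all of the data ends up under both keys (the same array of all the data, which isn't very useful), or the data is correctly split into two different arrays but each holds only the last data entry (the loop overwrites itself each time, leaving just one entry in the array).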
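Both failed single-loop attempts come down to when the list is created: once before the loop (one list shared by every key) or on every iteration (each new list replaces the previous one). A single loop works if a list is created only the first time a key is seen. The sketch below swaps in collections.defaultdict(list), which does exactly that, instead of an explicit membership test. The embedded XML holds the question's two stock items plus one HYPOTHETICAL inflation item with made-up values, for illustration only; build_master is an illustrative name. With a file you would call build_master(ET.parse('xmlfile').getroot()).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two 'stock' items from the question plus one HYPOTHETICAL 'inflation'
# item (made-up values, for illustration only)
XML = """<retirement>
  <item><low>-0.34</low><high>-0.32</high><freq>0.0294117647058824</freq>
        <variable>stock</variable><type>historic</type></item>
  <item><low>-0.32</low><high>-0.29</high><freq>0</freq>
        <variable>stock</variable><type>historic</type></item>
  <item><low>-0.02</low><high>0.0</high><freq>0.1</freq>
        <variable>inflation</variable><type>historic</type></item>
</retirement>"""


class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq


def build_master(root):
    # defaultdict(list) creates a fresh, separate list the first time a key
    # is looked up, and returns that same list on every later lookup
    master = defaultdict(list)
    for item in root.findall('item'):
        key = item.find('variable').text
        master[key].append(DataPoint(float(item.find('low').text),
                                     float(item.find('high').text),
                                     float(item.find('freq').text)))
    return dict(master)  # plain dict again, so unknown keys raise KeyError


masterDictionary = build_master(ET.fromstring(XML))
print(sorted(masterDictionary))            # ['inflation', 'stock']
print(len(masterDictionary['stock']))      # 2
print(len(masterDictionary['inflation']))  # 1
```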


2011-07-21 A.Krueger

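You have serious indentation problems after the (unnecessary) else: pass. Fix that and try again. Does the problem occur with your sample input data? With other data? The first time around the loop? What value of thisKey causes the problem [hint: it's reported in the KeyError message]? What are the contents of masterDictionary just before the error occurs [hint: put a few print statements in your code]?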
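The last two hints can be tried directly: a KeyError carries the missing key as its first argument, and printing the dictionary just before the failing lookup shows what, if anything, the loop has stored. find_missing_key is a hypothetical helper, written only to illustrate this:

```python
def find_missing_key(d, key):
    """Look up d[key]; if that fails, report and return the missing key."""
    try:
        d[key]
    except KeyError as err:
        # The exception carries the missing key as its first argument
        print('missing key:', repr(err.args[0]))
        print('dictionary contents before the error:', d)
        return err.args[0]
    return None  # the key was present
```

Calling find_missing_key({}, 'stock') prints missing key: 'stock' followed by dictionary contents before the error: {}, which is the state an if test that never adds a key would leave behind.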

Other remarks unrelated to your problem:

Instead of if thisKey in masterDictionary == False:, consider using if thisKey not in masterDictionary: ... Comparing against True or False is almost always redundant and/or a bit of a "code smell".

Python convention is to reserve an initial capital letter for class names, so a variable such as Item should be lowercase.

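Using only one space per indentation level makes code almost illegible and is strongly discouraged. Always use 4 (unless you have a very good reason, and I've never heard of one).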

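UPDATE: I was wrong. thisKey in masterDictionary == False is worse than I thought. Because in is a comparison operator, chained evaluation applies (as in a <= b < c), so you get (thisKey in masterDictionary) and (masterDictionary == False), which always evaluates to False, so the dictionary is never updated. The fix is as I suggested: use if thisKey not in masterDictionary: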
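The chaining described above is easy to check in a few lines; d and key are stand-ins for masterDictionary and thisKey:

```python
d = {'stock': []}
key = 'stock'

# `in` and `==` are both comparison operators, so this expression chains
chained = key in d == False
explicit = (key in d) and (d == False)
print(chained, explicit)       # False False

# A missing key also gives False, so the "add the key" branch never runs
print('inflation' in d == False)  # False

# The intended test, written idiomatically
print('inflation' not in d)    # True
```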

It looks like thisList (initialized but never used) should be thisSublist (used but never initialized), or vice versa.
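Putting these points together (a not in membership test, the list-handling lines indented inside the for loop, and one list name used consistently), a corrected version of the original loop might look like the sketch below. It parses the question's two sample items with ET.fromstring so it runs on its own; with a file you would use ET.parse('xmlfile') and tree.findall('item') as in the question.

```python
import xml.etree.ElementTree as ET

# The two sample items from the question
XML = """<retirement>
  <item><low>-0.34</low><high>-0.32</high><freq>0.0294117647058824</freq>
        <variable>stock</variable><type>historic</type></item>
  <item><low>-0.32</low><high>-0.29</high><freq>0</freq>
        <variable>stock</variable><type>historic</type></item>
</retirement>"""


class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq


items = ET.fromstring(XML).findall('item')
masterDictionary = {}

for item in items:
    thisKey = item.find('variable').text
    # Create a new, separate list only the first time this key appears
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []
    # Everything below stays inside the for loop
    thisList = masterDictionary[thisKey]
    newDataPoint = DataPoint(float(item.find('low').text),
                             float(item.find('high').text),
                             float(item.find('freq').text))
    thisList.append(newDataPoint)

print(len(masterDictionary['stock']))    # 2
print(masterDictionary['stock'][0].low)  # -0.34
```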


2011-07-21 03:57:23

Thanks again for your help –

@Yuri Hyuga: If you consider your question answered and want to say thanks, click the big "tick" outline to the left of the answer to "accept" it. –

Thanks again for your help, John. It turns out I only thought I had solved the problem. Any ideas??? –

Change:

if thisKey in masterDictionary == False:

to

if thisKey not in masterDictionary:

That would seem to explain why you're getting this error. Also, you need to assign something to thisSublist before you try to append to it. Try:

thisSublist = [] 
thisSublist.append(newDataPoint)


2011-07-21 05:43:38 Sinzor

Thanks for the help –


You have an error in the if statement inside your for loop. Instead of

if thisKey in masterDictionary == False:

write

if (thisKey in masterDictionary) == False:

With the rest of your original code unchanged, you would then be able to access the data like this:

masterDictionary['stock'][0].freq
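This lookup works once each key maps to a list of DataPoint objects: the key selects the list, the index selects the entry, and .freq reads the attribute. A hand-built sketch, where the dictionary literal (holding the question's two sample entries) stands in for what the parsing loop would produce:

```python
class DataPoint:
    """One bin of the historic data: low/high bounds and its frequency."""
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq


# Stand-in for the populated dictionary: the two sample 'stock' entries
masterDictionary = {
    'stock': [DataPoint(-0.34, -0.32, 0.0294117647058824),
              DataPoint(-0.32, -0.29, 0.0)],
}

# Index the dict by key, the list by position, then read the attribute
print(masterDictionary['stock'][0].freq)
print(masterDictionary['stock'][1].high)  # -0.29
```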

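John Machin makes some valid points about style and code smells (you should consider his suggested changes), but those things come with time and experience.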


2011-07-21 06:20:13 adparker

-1 for "if (thisKey in masterDictionary) == False" ... aarrgghh!! –
