2015-08-19 98 views
5

我有以下形式的蟒蛇的字符串:將字符串轉換成一個樹狀結構在python

line a 
line b 
    line ba 
    line bb 
    line bba 
    line bc 
line c 
    line ca 
    line caa 
line d 

你可以得到的想法。它實際上與Python代碼本身有相似的形式,因爲有一條線,在該線的下面,縮進表示塊的一部分,由最近的較小縮進行開頭。

我需要做的是將此代碼解析爲樹結構,以便每個根級別的行是字典的關鍵字,其值是一個表示所有子行的字典。所以上面會:

{ 
'line a' => {}, 
'line b' => { 
    'line ba' => {}, 
    'line bb' => { 
    'line bba' => {} 
    }, 
    'line bc' => {} 
    }, 
'line c' => { 
    'line ca' => { 
    'line caa' => {} 
    }, 
    }, 
'line d' => {} 
} 

這裏是我得到了什麼:

def parse_message_to_tree(message): 
    buf = StringIO(message) 
    return parse_message_to_tree_helper(buf, 0) 

def parse_message_to_tree_helper(buf, prev): 
    ret = {} 
    for line in buf: 
     line = line.rstrip() 
     index = len(line) - len(line.lstrip()) 
     print (line + " => " + str(index)) 
     if index > prev: 
      ret[line.strip()] = parse_message_to_tree_helper(buf, index) 
     else: 
      ret[line.strip()] = {} 

    return ret 

打印顯示的是左剝離和0指數沒想到lstrip()是一個突變線,但無論如何該指數應該仍然是準確的。

任何建議是有幫助的。

編輯:不知道之前出了什麼問題,但我再次嘗試,它更接近工作,但仍不完全正確。這是我現在有:

{'line a': {}, 
'line b': {}, 
'line ba': {'line bb': {}, 
      'line bba': {'line bc': {}, 
          'line c': {}, 
          'line ca': {}, 
          'line caa': {}, 
          'line d': {}}}} 
+0

你見過autovivifyingating樹hack了嗎?可以爲你節省一些擊鍵:'tree = lambda:defaultdict(tree); t = tree(); t ['a'] ['b'] ['c'] =「bla」' –

回答

3

就像之前已經提到的str.lstrip()不是一個增變器,索引在我的系統中也很準確。

但問題在於,當你意識到該行的索引增加時,line實際上指向增加的索引行,例如,在第一種情況下,我們注意到行的索引增加爲line ba,所以lineline ba,然後在你的if情況下,你做 -

ret[line.strip()] = parse_message_to_tree_helper(buf, index) 

這是錯誤的,因爲你會被設置無論是通過parse_message_to_tree_helper()返回line ba,而不是其實際的父。另外,一旦你在函數內部遞進,除非文件已被完全讀取,否則你不會出來,但是在字典中存儲某一行的級別取決於當縮進減少時遞歸出來的級別。

我不知道,如果有,我能想出(基於大量的代碼)的任何內置庫,這將幫助你做到這一點,但一碼 -

def parse_message_to_tree(message): 
    buf = StringIO(message) 
    return parse_message_to_tree_helper(buf, 0, None)[0] 

def parse_message_to_tree_helper(buf, prev, prevline): 
    ret = {} 
    index = -1 
    for line in buf: 
     line = line.rstrip() 
     index = len(line) - len(line.lstrip()) 
     print (line + " => " + str(index)) 
     if index > prev: 
      ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line) 
      if index < prev: 
       return ret,prevline,index 
      continue 
     elif not prevline: 
      ret[line.strip()] = {} 
     else: 
      ret[prevline.strip()] = {} 
     if index < prev: 
      return ret,line,index 
     prevline = line 
    if index == -1: 
     ret[prevline.strip()] = {} 
     return ret,None,index 
    if prev == index: 
     ret[prevline.strip()] = {} 
    return ret,None,0 

示例/演示 -

>>> print(s) 
line a 
line b 
    line ba 
    line bb 
    line bba 
    line bc 
line c 
    line ca 
    line caa 
>>> def parse_message_to_tree(message): 
...  buf = StringIO(message) 
...  return parse_message_to_tree_helper(buf, 0, None)[0] 
... 
>>> def parse_message_to_tree_helper(buf, prev, prevline): 
...  ret = {} 
...  index = -1 
...  for line in buf: 
...   line = line.rstrip() 
...   index = len(line) - len(line.lstrip()) 
...   print (line + " => " + str(index)) 
...   if index > prev: 
...    ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line) 
...    if index < prev: 
...     return ret,prevline,index 
...    continue 
...   elif not prevline: 
...    ret[line.strip()] = {} 
...   else: 
...    ret[prevline.strip()] = {} 
...   if index < prev: 
...    return ret,line,index 
...   prevline = line 
...  if index == -1: 
...   ret[prevline.strip()] = {} 
...   return ret,None,index 
...  if prev == index: 
...   ret[prevline.strip()] = {} 
...  return ret,None,0 
... 
>>> pprint.pprint(parse_message_to_tree(s)) 
line a => 0 
line b => 0 
    line ba => 2 
    line bb => 2 
    line bba => 4 
    line bc => 2 
line c => 0 
    line ca => 2 
    line caa => 4 
{'line a': {}, 
'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}}, 
'line c': {'line ca': {'line caa': {}}}} 
>>> s = """line a 
... line b 
... line ba 
... line bb 
...  line bba 
... line bc 
... line c 
... line ca 
...  line caa 
... line d""" 
>>> pprint.pprint(parse_message_to_tree(s)) 
line a => 0 
line b => 0 
    line ba => 2 
    line bb => 2 
    line bba => 4 
    line bc => 2 
line c => 0 
    line ca => 2 
    line caa => 4 
line d => 0 
{'line a': {}, 
'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}}, 
'line c': {'line ca': {'line caa': {}}}, 
'line d': {}} 

您將需要測試的代碼,任何更多的錯誤或遺漏的一些情況。

+0

謝謝。我想通了,我需要在參數中包含前一行,但我還沒有得到彈出的問題。 – ewok

1

lstrip()不是突變,看documentation

string.lstrip(S [,字符])

返回字符串的一個副本主角被刪除。如果省略字符或無,則刪除空白字符。如果給出 而不是無,則字符必須是字符串;字符串 中的字符將從字符串的起始處被剝離,該方法的調用方法爲 。

而你的代碼似乎與我的機器上的示例文本一起工作。

+0

單數。它之前沒有工作,但現在似乎更接近於工作。我現在遇到的問題是它不會在「行c」和「行d」上「彈出」。所以'line c'在'line bba'的地圖 – ewok

+0

實際上它的結構比它更糟糕:'{'line a':{},'line b':{},'line ba':{'line bb ':{'line bb':{'line bc':{},'line ca':{},'line c':{},'line d':{},'line caa':{} }}}' – ewok

1

另一個答案,使用堆棧而不是遞歸。經過幾次迭代纔得到這個版本,它似乎處理了幾種可能的輸入場景,但不能保證完全沒有錯誤!這確實是一個棘手的問題。希望我的評論能夠說明一條正確的思路。感謝分享這個問題。

text = '''line a 
line b 
    line ba 
    line bb 
    line bba 
    line bc 
line c 
    line ca 
    line caa 
line d''' 

root_tree = {} 
stack = [] 
prev_indent, prev_tree = -1, root_tree 

for line in text.splitlines(): 

    # compute current line's indent and strip the line 
    origlen = len(line) 
    line = line.lstrip() 
    indent = origlen - len(line) 
    print indent, line 

    # no matter what, every line has its own tree, so let's create it. 
    tree = {} 

    # where to attach this new tree is dependent on indent, prev_indent 
    # assume: stack[-1] was the right attach point for the previous line 
    # then: let's adjust the stack to make that true for the current line 

    if indent < prev_indent: 
     while stack[-1][0] >= indent: 
      stack.pop() 
    elif indent > prev_indent: 
     stack.append((prev_indent, prev_tree)) 

    # at this point: stack[-1] is the right attach point for the current line 
    parent_indent, parent_tree = stack[-1] 
    assert parent_indent < indent 

    # attach the current tree 
    parent_tree[line] = tree 

    # update state 
    prev_indent, prev_tree = indent, tree 

print len(stack) 
print stack 
print root_tree