2013-06-27

Parse parts of a string into a dictionary?

I'm not familiar with string-parsing libraries, and I'd like to go from:

'foo=5 z v xz er bar=" hel o" c z a == "hi" b = "who"' 

to this parsed dictionary:

{'foo': 5, 'bar': ' hel o', 'a': 'hi', 'b': 'who'}

But I don't know where to start. Can you give me some advice on handling this kind of conversion?

So things like 'v', 'xz', 'er', which don't have an equals sign, just fall on the floor? Also, a good place to start in Python might be [shlex](http://docs.python.org/2/library/shlex.html). – zwol

Anything without an '=' or '==' should not end up in the dictionary. (I'm handling those separately.) –

Answers

You could use regular expressions. See Python's documentation on regex and tutorialspoint's tutorial.

Something like this could work:

import re

regex = re.compile(r"(\w+ ?=+ ?\d+|\w+ ?=+ ?\"(?: *\w*)*\")")

# your example string:
s = 'foo=5 z v xz er bar=" hel o" c z a == "hi" b = "who"'

matches = regex.findall(s)

dict1 = {}
for m in matches:
    elems = m.split("=")
    # elems[0] is the key; elems[-1] is the value (taking the last element
    # handles '==', which splits into an extra empty string)
    key = elems[0].strip()
    value = elems[-1].strip()

    try:
        # see if the value is a number
        dict1[key] = int(value)
    except ValueError:
        # if the conversion fails, store it as a string without the surrounding quotes
        dict1[key] = value.strip('"')

print(dict1)

Here's a breakdown of the regex:

(\w+ ?=+ ?\d+|\w+ ?=+ ?\"(?: *\w*)*\") 

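\w+ means one or more word characters (letters, digits, or underscore).

\d+ means one or more digits.

(?:regex)* means match zero or more repetitions of regex without creating a numbered group.

(regex1|regex2) means find a string that matches either regex1 or regex2. The outer parentheses are also the pattern's only capturing group, so findall returns each full match as a string.

\" is an escaped double quote; the backslash is needed because the pattern is written inside a double-quoted raw string r"...".

=+ means match one or more "=" signs.

The " ?" pieces (a literal space followed by ?) mean match 0 or 1 space.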
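A possible variant of the same idea (my assumption, not the answer's exact pattern): give the key and the value their own capturing groups so the split("=") and quote-stripping steps disappear, and write the pattern with re.VERBOSE so each piece carries its explanation. parse_assignments is a hypothetical name. The quoted branch uses [^"]* instead of (?: *\w*)*, which also accepts punctuation inside the quotes and avoids nesting two quantifiers that can both match the empty string.

```python
import re

# Variant pattern (an assumption, not the answer's exact regex):
# separate groups for the key, an integer value, and a quoted value.
ASSIGNMENT = re.compile(
    r"""
    (\w+)          # key: one or more word characters
    \s*=+\s*       # one or more '=' signs, optionally surrounded by whitespace
    (?:
        (\d+)      # value: an integer ...
      | "([^"]*)"  # ... or anything between double quotes
    )
    """,
    re.VERBOSE,
)


def parse_assignments(s):
    """Hypothetical helper: return {key: int or str} for every key=value pair in s."""
    result = {}
    for m in ASSIGNMENT.finditer(s):
        key, number, text = m.groups()
        # Exactly one of number/text is set; text may legitimately be ''
        result[key] = int(number) if number is not None else text
    return result


s = 'foo=5 z v xz er bar=" hel o" c z a == "hi" b = "who"'
print(parse_assignments(s))
# On Python 3.7+: {'foo': 5, 'bar': ' hel o', 'a': 'hi', 'b': 'who'}
```

Checking "number is not None" rather than truthiness matters because an empty quoted value ("") captures the empty string, not None.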

Pyparsing is a parsing library that lets you build up your matching expression a piece at a time.

from pyparsing import Word, alphas, alphanums, nums, oneOf, quotedString, removeQuotes 

identifier = Word(alphas, alphanums) 
integer = Word(nums).setParseAction(lambda t: int(t[0])) 
# copy() so the parse action is not attached to pyparsing's shared quotedString
value = integer | quotedString.copy().setParseAction(removeQuotes)

# equals could be '==' or '=' 
# (suppress it so it does not get included in the resulting tokens) 
EQ = oneOf("= ==").suppress() 

# define the expression for an assignment 
assign = identifier + EQ + value 

Here is the code that applies this parser:

# search sample string for matching assignments 
s = 'foo=5 z v xz er bar=" hel o" c z a == "hi" b = "who"' 
assignments = assign.searchString(s) 
dd = {} 
for k,v in assignments: 
    dd[k] = v 

# or more simply 
#dd = dict(assignments.asList()) 

print(dd)

gives:

{'a': 'hi', 'b': 'who', 'foo': 5, 'bar': ' hel o'}