2014-12-04 72 views
0

How to parse a string in Python: how can I parse strings made up of N parameters, in arbitrary order, such as:

{ UserID : 36875; tabName : QuickAndEasy} 
{ RecipeID : 1150; UserID : 36716} 
{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup} 
{ UserID : 36716; tabName : QuickAndEasy} 

Ultimately I want to output the parameters in separate columns of a table.

How far have you got? What problem did you run into? – khelwood 2014-12-04 07:53:19

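This should be trivial with a regex, if you can provide more specific rules for the regex to implement. For example, what kinds of characters are allowed in keys and values? Can values contain spaces? If so, will the value be quoted? If so, can such values contain escaped quotes? And so on. – 2014-12-04 07:54:13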

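Thanks for the reply. Not far: I can only extract one selected parameter, to the exclusion of the others. Keys and values are strings of any characters, up to 15 characters long. There are no other rules. – mmarboeuf 2014-12-04 08:09:25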

Answers

1

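The regex ([^{}\s:]+)\s*:\s*([^{}\s;]+) works on your examples. Be aware, though, that all matches will be strings, so if you want to store 36875 as a number, you will need some additional processing.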

import re 
regex = re.compile(
    r"""(  # Match and capture in group 1: 
    [^{}\s:]+ # One or more characters except braces, whitespace or : 
    )   # End of group 1 
    \s*:\s*  # Match a colon, optionally surrounded by whitespace 
    (   # Match and capture in group 2: 
    [^{}\s;]+ # One or more characters except braces, whitespace or ; 
    )   # End of group 2""", 
    re.VERBOSE) 

Then you can do

>>> dict(regex.findall("{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}")) 
{'UserID': '36716', 'isFromLabel': '0', 'searchWord': 'soup', 'type': 'recipe'} 

Test it live on regex101.com
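The "additional processing" mentioned above could look like the following minimal sketch. It assumes that any value made up only of decimal digits should become an `int` and that everything else stays a string; the names `PAIR_RE` and `parse_record` are hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer, written on one line.
PAIR_RE = re.compile(r"([^{}\s:]+)\s*:\s*([^{}\s;]+)")


def parse_record(text):
    """Parse one '{ key : value; ... }' record into a dict.

    Values consisting only of decimal digits are converted to int
    (an assumption about the desired typing); all others stay strings.
    """
    record = {}
    for key, value in PAIR_RE.findall(text):
        record[key] = int(value) if value.isdecimal() else value
    return record


print(parse_record("{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}"))
```

`str.isdecimal()` is used rather than a bare `int()` call with exception handling so that inputs such as `1_000`, which `int()` accepts in recent Python versions, are left as strings.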

Thanks! I really need to get up and running with regular expressions. Working through your code was a great exercise. – mmarboeuf 2014-12-10 03:36:28

0
lines = "{ UserID : 36875; tabName : QuickAndEasy } ", \ 
     "{ RecipeID : 1150; UserID : 36716}", \ 
     "{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}" , \ 
     "{ UserID : 36716; tabName : QuickAndEasy}" 

counter = 0 

mappedLines = {} 

for line in lines: 
    counter = counter + 1 
    lineDict = {} 
    line = line.replace("{","") 
    line = line.replace("}","") 
    line = line.strip() 
    fieldPairs = line.split(";") 

    for pair in fieldPairs:
        fields = pair.split(":")
        key = fields[0].strip()
        value = fields[1].strip()
        lineDict[key] = value

    mappedLines[counter] = lineDict 

def printField(key, lineSets, comma_desired = True):
    if key in lineSets:
        print(lineSets[key], end="")
    if comma_desired:
        print(",", end="")
    else:
        print()

for key in range(1,len(mappedLines) + 1): 
    lineSets = mappedLines[key] 
    printField("UserID",lineSets) 
    printField("tabName",lineSets) 
    printField("RecipeID",lineSets) 
    printField("type",lineSets) 
    printField("searchWord",lineSets) 
    printField("isFromLabel",lineSets,False) 

CSV output:

36875,QuickAndEasy,,,,
36716,,1150,,,
36716,,,recipe,soup,0
36716,QuickAndEasy,,,,

The code above is for Python 3.4. With 2.7 you can get similar output by replacing the function and the last for loop with:

def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line = output_line + lineSets[key] + ","
        else:
            output_line += ","
    print output_line[0:len(output_line) - 1]

fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"] 

for key in range(1,len(mappedLines) + 1): 
    lineSets = mappedLines[key] 
    printFields(fields,lineSets) 
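As an alternative to hand-built comma handling, the same split-based parsing can be combined with the standard library's `csv.DictWriter`, which fills in missing columns itself. This is only a sketch for Python 3: the helper names `parse_line` and `to_csv` are hypothetical, and the column order in `FIELDS` is assumed to match the one used in this answer.

```python
import csv
import io

# Column order assumed to match the answer above.
FIELDS = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]

SAMPLE = [
    "{ UserID : 36875; tabName : QuickAndEasy } ",
    "{ RecipeID : 1150; UserID : 36716}",
    "{ isFromLabel : 0; UserID : 36716; type : recipe; searchWord : soup}",
    "{ UserID : 36716; tabName : QuickAndEasy}",
]


def parse_line(line):
    """Turn '{ key : value; key : value }' into a dict of stripped strings."""
    body = line.strip().strip("{}")
    record = {}
    for pair in body.split(";"):
        if not pair.strip():
            continue  # tolerate a trailing ';' or an empty record
        # partition splits on the first ':' only, so values may contain ':'
        key, _, value = pair.partition(":")
        record[key.strip()] = value.strip()
    return record


def to_csv(lines, fields=FIELDS):
    """Return CSV text with one row per input line; missing keys become empty cells."""
    buffer = io.StringIO()
    # restval="" fills absent columns. A key that is not listed in `fields`
    # raises ValueError, which is DictWriter's default behaviour.
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="", lineterminator="\n")
    for line in lines:
        writer.writerow(parse_line(line))
    return buffer.getvalue()


print(to_csv(SAMPLE), end="")
```

Using `csv.DictWriter` also takes care of quoting, should a value ever contain a comma.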

Hi, thank you so much for your help. I couldn't get my head around this and was getting desperate. – mmarboeuf 2014-12-18 07:22:04

Does the code not work for you? If not, what error or incorrect output do you get? – Scooter 2014-12-18 13:22:27