How can I extract multiple dictionaries from a string?

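I want to extract multiple Python dictionaries from a string. I'm currently using a regular expression, but it fails because it also matches the data between the dictionaries. I also tried a non-greedy regex ({.+?}), but it breaks up nested dictionaries and treats the pieces as separate occurrences. How can I extract each dictionary from the string?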

Example string:

mystring = '(2017-05-29, { "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'

Code:

>>> import re
>>> match_data = re.compile('({.+})')
>>> match_data.findall(mystring.strip())
['{ "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}']

Expected output:

['{ "mydict": [{ "hello": "world"}, {"hello2":"world2"}]}', '{"dict2":{"world":"hello"}}']


2017-05-29 Rahul

I think you'll need to write a parser for Python dictionaries. – 0605002
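If the embedded dicts happen to be valid JSON (as they are in the example), one way to avoid writing a parser from scratch is to reuse the standard library's JSON parser. The sketch below tries json.JSONDecoder.raw_decode at every "{" and keeps whatever decodes as a complete object; raw_decode stops at the end of the first JSON value, so the trailing ";;/url/string" text is never consumed. The function name extract_json_dicts is hypothetical, and the approach assumes JSON syntax (double-quoted strings, true/false/null); Python-only literals such as single-quoted strings or None will not decode.

```python
import json


def extract_json_dicts(text):
    """Return the substrings of `text` that decode as complete JSON objects.

    Assumes the embedded dicts are valid JSON (double-quoted strings,
    true/false/null rather than True/False/None).
    """
    decoder = json.JSONDecoder()
    found = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value from the start of its argument and
            # returns (value, index where that value ended in the argument).
            _obj, length = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            # Not the start of a valid object; try the next "{"
            pos = start + 1
            continue
        found.append(text[start:start + length])
        pos = start + length
    return found


mystring = '(2017-05-29, { "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'
print(extract_json_dicts(mystring))
# ['{ "mydict": [{ "hello": "world"}, {"hello2":"world2"}]}', '{"dict2":{"world":"hello"}}']
```

Each returned substring can be turned into a dict with json.loads. Note that if an outer "{" does not decode, the scan moves on one character, so valid inner objects may still be picked up on their own.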

Try this: re.findall(r'{.+?}', mystring). It won't give you exactly what you want, but you can then parse the data easily. – Arun
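To see why the non-greedy pattern needs further processing, here is what it returns for the example string. Each match stops at the first "}" after a "{", so the nested "mydict" value is cut short and its inner dict becomes a separate match:

```python
import re

mystring = '(2017-05-29, { "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'

# Non-greedy: each match ends at the FIRST "}" following a "{",
# so nested dicts are truncated and inner dicts appear as their own matches.
pieces = re.findall(r'{.+?}', mystring)
for piece in pieces:
    print(piece)
# { "mydict": [{ "hello": "world"}
# {"hello2":"world2"}
# {"dict2":{"world":"hello"}
```

Python's built-in re module cannot match arbitrarily nested braces, which is why the answers below count braces instead.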

Is the ";/url/string" data always in the same place, i.e. between the two dicts? – DexJ

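A regular expression is probably too simple a tool for this problem. One possible solution, however, is to count matching braces, ignoring any that appear inside quoted strings: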

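s = '{ "mydict": [{ "hello": "wo}}rld"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'

depth = 0          # current nesting depth of braces
start_index = -1   # index of the "{" that opened the current top-level dict
quote_char = None  # quote character of the string we are inside, or None

for i, c in enumerate(s):
    if quote_char:
        # Inside a string literal: ignore braces until the matching quote
        if c == quote_char:
            quote_char = None
        continue
    if depth > 0 and c in ("'", '"'):
        # Only track quotes inside a dict, so apostrophes in the junk text don't matter
        quote_char = c
    elif c == "{":
        if depth == 0:
            start_index = i
        depth += 1
    elif c == "}" and depth > 0:
        depth -= 1
        if depth == 0:
            print(s[start_index:i + 1])
            start_index = -1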

Output:

{ "mydict": [{ "hello": "wo}}rld"}, {"hello2":"world2"}]} 
{"dict2":{"world":"hello"}}
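The same brace-counting idea can be wrapped in a reusable function that returns a list, matching the expected output in the question. This is a sketch rather than the answerer's code: the name extract_dict_strings is hypothetical, and the backslash-escape handling inside quoted strings is an addition beyond the answer above. ast.literal_eval then turns each chunk into a real dict, assuming the chunks are valid Python literals (JSON's true/false/null would not be).

```python
import ast


def extract_dict_strings(text):
    """Return the substrings of `text` that form balanced top-level {...} blocks.

    Braces inside single- or double-quoted strings are ignored, and a backslash
    inside a quoted string escapes the next character.
    """
    results = []
    depth = 0
    start = -1
    quote = None      # active quote character, or None
    escaped = False   # True if the previous char was a backslash inside a string
    for i, c in enumerate(text):
        if quote:
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == quote:
                quote = None
            continue
        if depth > 0 and c in ("'", '"'):
            quote = c
        elif c == "{":
            if depth == 0:
                start = i
            depth += 1
        elif c == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                results.append(text[start:i + 1])
    return results


mystring = '(2017-05-29, { "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'
chunks = extract_dict_strings(mystring)
print(chunks)
# ['{ "mydict": [{ "hello": "world"}, {"hello2":"world2"}]}', '{"dict2":{"world":"hello"}}']

# Convert each chunk into an actual dict (assumes Python-literal syntax)
dicts = [ast.literal_eval(chunk) for chunk in chunks]
print(dicts)
# [{'mydict': [{'hello': 'world'}, {'hello2': 'world2'}]}, {'dict2': {'world': 'hello'}}]
```

Returning a list instead of printing makes it easy to post-process the chunks, e.g. with ast.literal_eval as shown or json.loads for JSON data.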


2017-05-29 05:00:18 Darkstarone
