2017-01-06

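This is a follow-up to the question "What is the most efficient way to extract info from complex JSON files?": how do I extract text from the deepest level of nested dictionary files?

I have a ton of dictionary files whose structure can be arbitrary. I want to capture every string stored under a "text" key when there is no further nesting, as well as every string stored under an "htext" key.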

d = {
    "section": {
        "heading": {"lvl": "A1", "text": "today"},
        "htext": [
            {"color": "green", "text": "yesterday", "htext": ["a", "b", "c"]},
            {"color": "purple", "text": "tomorrow"},
        ],
    }
}

For the example above, I want my result to be ["today", "yesterday", "a", "b", "c", "tomorrow"].

The solution provided in the previous question was:

def extract_text(obj, acc):
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, (dict, list)):
                extract_text(v, acc)
            elif k == "text":
                acc.append(v)
    elif isinstance(obj, list):
        for item in obj:
            extract_text(item, acc)

I tried to modify this function by adding an elif branch for k == 'htext', but without success. I am new to Python. Any help is greatly appreciated!

Answers

1

Try this:

d = {
    "section": {
        "heading": {"lvl": "A1", "text": "today"},
        "htext": [
            {"color": "green", "text": "yesterday", "htext": ["a", "b", "c"]},
            {"color": "purple", "text": "tomorrow"},
        ],
    }
}

acc = []

def extract_text(obj, acc):
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, dict):
                extract_text(v, acc)
            elif k == "text":
                acc.append(v)
            # An "htext" list made up only of strings is collected as-is.
            elif k == "htext" and isinstance(v, list) and all(isinstance(item, str) for item in v):
                acc.extend(v)
            # Any other list (for example a list of dicts) is searched recursively.
            elif isinstance(v, list):
                extract_text(v, acc)
    elif isinstance(obj, list):
        for item in obj:
            extract_text(item, acc)


extract_text(d, acc)
print(acc)
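
The order of the branches is what makes the version above work. As a rough illustration, here is a sketch of how an attempt like the one described in the question might fail. The asker's modified code was not posted, so `extract_text_naive` and the placement of its "htext" branch are assumptions made for this example only.

```python
def extract_text_naive(obj, acc):
    """Hypothetical failed attempt: the "htext" branch is added last."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, (dict, list)):
                # Every list is caught here first, including ["a", "b", "c"].
                extract_text_naive(v, acc)
            elif k == "text":
                acc.append(v)
            elif k == "htext":
                # Not reached when the value is a list.
                acc.append(v)
    elif isinstance(obj, list):
        for item in obj:
            # Plain strings are neither dict nor list, so the call does nothing.
            extract_text_naive(item, acc)


sample = {"text": "yesterday", "htext": ["a", "b", "c"]}
naive_result = []
extract_text_naive(sample, naive_result)
print(naive_result)  # ['yesterday']
```

The strings "a", "b" and "c" are lost because the list is handed to the recursive call before the key is ever compared with "htext". Testing the key before the generic list check, as the answer above does, avoids this.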
1

You can check that the key is "htext" and that the value is a non-nested list:

def extract_text(obj, acc):
    if isinstance(obj, dict):
        for k, v in obj.items():
            # Only the first element is inspected; "v and" guards against an empty list.
            if k == "htext" and isinstance(v, list) and v and not isinstance(v[0], (dict, list)):
                for x in v:
                    acc.append(x)
            elif isinstance(v, (dict, list)):
                extract_text(v, acc)
            elif k == "text":
                acc.append(v)

    elif isinstance(obj, list):
        for item in obj:
            extract_text(item, acc)

#=> ['yesterday', 'a', 'b', 'c', 'tomorrow', 'today']
# The element order follows dict iteration order; on Python 3.7+ 'today' comes first.
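
Both answers decide how to treat an "htext" list as a whole, so a list that mixes strings and dicts is not fully covered. A possible variation, offered only as a sketch, is to look at each element of the list separately and to return the results from a generator instead of filling an accumulator. The name `iter_text` and the mixed-list sample are made up for this example; it also assumes Python 3.7+ so that dicts keep their insertion order.

```python
def iter_text(obj):
    """Yield strings stored under "text" keys and inside "htext" lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == "text" and isinstance(v, str):
                yield v
            elif k == "htext" and isinstance(v, list):
                for item in v:
                    if isinstance(item, str):
                        # A string directly inside an "htext" list is collected.
                        yield item
                    else:
                        # Anything else in the list is searched recursively.
                        yield from iter_text(item)
            else:
                # Other values are searched; non-containers yield nothing.
                yield from iter_text(v)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_text(item)


d = {
    "section": {
        "heading": {"lvl": "A1", "text": "today"},
        "htext": [
            {"color": "green", "text": "yesterday", "htext": ["a", "b", "c"]},
            {"color": "purple", "text": "tomorrow"},
        ],
    }
}

result = list(iter_text(d))
print(result)  # ['today', 'yesterday', 'a', 'b', 'c', 'tomorrow']

mixed = list(iter_text({"htext": ["a", {"text": "b"}]}))
print(mixed)  # ['a', 'b']
```

Unlike the functions above, this sketch only collects values that are strings, so a number stored under "text" would be skipped.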