How to parse a custom string and create a dictionary from it?

I have two types of strings, similar to the ones below:

string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar' 
string2 = 'transcript_id "g3.t1"; gene_id "g3";'

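I am trying to write a function that takes either of the strings above as input and returns a dictionary, depending on the string's format.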

For string1, the dictionary should look like this:

attributes = { 
    'ID': 'mRNA42',
    'Parent': 'gene19', 
    'integrity': '0.95', 
    'foo': 'bar', 
}

and for string2:

attributes = { 
    'transcript_id': 'g3.t1', 
    'gene_id': 'g3', 
}

What I tried:

def parse_single_feature_line(attributestring):

    attributes = dict()
    for keyvaluepair in attributestring.split(';'):
        for key, value in keyvaluepair.split('='):
            attributes[key] = value
    return attributes

I need help building this function.


2017-07-31 Arijit

Check my answer for a simplified version... I used your existing function together with a regular expression –

Try this:

string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
string2 = 'transcript_id "g3.t1"; gene_id "g3";'

def str2dict(s):
    result = {}
    for i in s.split(";"):
        ele = i.strip()
        if not ele:
            continue  # skip the empty piece left by a trailing ";"
        if "=" in ele:
            key, val = ele.split("=", 1)   # key=value format
        else:
            key, val = ele.split(None, 1)  # key "value" format
        result[key] = val.strip('"')
    return result

print(str2dict(string1))
print(str2dict(string2))


2017-07-31 07:06:07

I get "**ValueError: not enough values to unpack (expected 2, got 0)**" when running the script. Could you please check it? – Arijit

That comes from the trailing ";" at the end of string2, which produces an extra empty string –

I've updated the code above, please check. –
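To make the cause discussed in these comments concrete, here is a small sketch (Python 3 assumed) showing the empty last piece that split(";") produces for string2, the error it triggers when unpacked, and a filter that avoids it:

```python
string2 = 'transcript_id "g3.t1"; gene_id "g3";'

parts = string2.split(";")
print(parts)  # ['transcript_id "g3.t1"', ' gene_id "g3"', '']

# Splitting the empty last piece yields [], which cannot be unpacked into two names.
message = ""
try:
    key, val = parts[-1].split()
except ValueError as exc:
    message = str(exc)
print(message)  # not enough values to unpack (expected 2, got 0)

# Skipping empty (or whitespace-only) pieces avoids the error.
items = [p.strip() for p in parts if p.strip()]
print(items)  # ['transcript_id "g3.t1"', 'gene_id "g3"']
```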

You can use a dict comprehension!

>>> string1 
'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar' 
>>> string2 
'transcript_id "g3.t1"; gene_id "g3";' 
>>> {each.split('=')[0]:each.split('=')[1] for each in string1.split(';') if each} 
{'foo': 'bar', 'integrity': '0.95', 'ID': 'mRNA42', 'Parent': 'gene19'} 
>>> {each.split(' ')[0]:each.split(' ')[1] for each in string2.split(';') if each} 
{'': 'gene_id', 'transcript_id': '"g3.t1"'}
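The second result above maps '' to 'gene_id' because the piece ' gene_id "g3"' starts with a space, so split(' ') yields an empty first element. A sketch of the same comprehension that strips each piece first, splits only once, and removes the quotes (assuming Python 3.7+, so the printed dict keeps insertion order):

```python
string2 = 'transcript_id "g3.t1"; gene_id "g3";'

# Strip each piece so ' gene_id "g3"' does not split on its leading space,
# skip the empty piece after the trailing ';', split only once,
# and drop the surrounding quotes from the value.
result = {
    k: v.strip('"')
    for k, v in (each.strip().split(' ', 1) for each in string2.split(';') if each.strip())
}
print(result)  # {'transcript_id': 'g3.t1', 'gene_id': 'g3'}
```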

To fix the problem you are facing in your own function:

def parse_single_feature_line(attributestring):
    attributes = dict()
    for keyvaluepair in attributestring.split(';'):
        # split() returns one flat list such as ['ID', 'mRNA42'], so unpack it
        # directly. `for key, value in ...` only works on a list of pairs,
        # e.g. [["this", "these"], ["that", "those"]].
        key, value = keyvaluepair.split('=')
        attributes[key] = value
    return attributes

print(parse_single_feature_line(string1))
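A minimal sketch of the difference described in the code comment above: iterating over the flat list from split('=') tries to unpack each string on its own, while unpacking the list itself gives the key and value.

```python
# Reproduce the failure of the nested loop from the question.
pair = 'ID=mRNA42'.split('=')  # one flat list: ['ID', 'mRNA42']

unpacked = []
error = None
try:
    # Iterates over the two strings and tries to unpack each string
    # into two names, one character per name.
    for key, value in pair:
        unpacked.append((key, value))
except ValueError as exc:
    error = exc

print(unpacked)              # [('I', 'D')]
print(type(error).__name__)  # ValueError

# What was intended: unpack the list itself.
key, value = pair
print(key, value)  # ID mRNA42
```

The two-character key 'ID' happens to unpack as ('I', 'D'), which is why the error only shows up on the longer 'mRNA42'.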


2017-07-31 07:06:20

You can even simply do this: dict(i.split('=') for i in string1.split(';')). For more details see my answer below; I simplified it –
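A sketch of the one-liner from this comment: dict() accepts any iterable of two-item sequences, so the lists produced by split('=') can be passed straight in. The "if i" filter in the second call is an addition here, for input that ends with ";" the way string2 does:

```python
string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'

# Each split('=') yields a two-item list, which dict() treats as (key, value).
attributes = dict(i.split('=') for i in string1.split(';'))
print(attributes)
# {'ID': 'mRNA42', 'Parent': 'gene19', 'integrity': '0.95', 'foo': 'bar'}

# With a trailing ';' the last piece is '', so it must be filtered out first.
attributes_trailing = dict(i.split('=') for i in (string1 + ';').split(';') if i)
print(attributes_trailing == attributes)  # True
```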

The two formats are different, so they need to be handled differently.

def return_dict(string): 
    if "=" in string: 
     return dict(i.strip().split("=") for i in string.split(";")) 
    else: 
     return dict([i.strip().split(" ") for i in string.split(";") if len(i.strip().split(" ")) > 1]) 

return_dict(string1) 
return_dict(string2)

Output:

{'ID': 'mRNA42', 'Parent': 'gene19', 'foo': 'bar', 'integrity': '0.95'} 
{'gene_id': '"g3"', 'transcript_id': '"g3.t1"'}


2017-07-31 07:07:55

What if the separator is a different character? We can use a regular expression, e.g. re.split('[=]', string)... see my answer, I hope it makes sense –

@MohideenibnMohammed you got an excellent version :) –

You can use a regular expression for a general solution:

import re

string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
string2 = 'transcript_id "g3.t1"; gene_id "g3";'

# Define the regular expression (a raw string, so the backslashes reach re unchanged)
reg_exp = r'([\.\-\w_]+)=([\.\-\w_]+);?|([\.\-\w_]+) "([\.\-\w_]+)"'

# Get results and filter empty elements in tuples
match = [tuple(filter(None, x)) for x in re.findall(reg_exp, string1 + "\n" + string2)]

# Convert to dict
result = {key: value for key, value in match}

This regular expression consists of two main groups:

Group A ([\.\-\w_]+)=([\.\-\w_]+);? and group B ([\.\-\w_]+) "([\.\-\w_]+)"

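Each group contains two further groups, which match the name and the value of a pair. Note that you may need to adjust these groups to cover the names and values you expect, or use (.*?) instead.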
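As a sketch of the (.*?) suggestion, here is a hypothetical helper, parse_attributes, that uses the same two-alternative idea with non-greedy values and takes the pair from whichever alternative matched. Checking the key group instead of using filter(None, ...) also keeps empty values such as 'a=' intact:

```python
import re

# Alternative 1: key=value, ended by ';' or end of string.
# Alternative 2: key "value".
PATTERN = re.compile(r'(\w+)=(.*?)(?:;|$)|(\w+) "(.*?)"')

def parse_attributes(text):
    result = {}
    for k1, v1, k2, v2 in PATTERN.findall(text):
        # Exactly one alternative matched; its key group is non-empty
        # because \w+ needs at least one character.
        if k1:
            result[k1] = v1
        else:
            result[k2] = v2
    return result

print(parse_attributes('ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'))
print(parse_attributes('transcript_id "g3.t1"; gene_id "g3";'))
```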


2017-07-31 07:15:00 VMRuiz

Solution one: split on the space and strip the quotes from the second half of the result:

>>> key, val = 'transcript_id "g3.t1"'.split(" ", maxsplit=1) 
>>> val = val.strip('"') 
>>> key 
'transcript_id' 
>>> val 
'g3.t1'
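The maxsplit=1 argument matters when a value itself contains spaces. A small illustration with a hypothetical attribute named note:

```python
item = 'note "two words"'

# Without maxsplit, the quoted value is broken apart.
print(item.split(" "))  # ['note', '"two', 'words"']

# With maxsplit=1, everything after the first space stays together.
key, val = item.split(" ", maxsplit=1)
val = val.strip('"')
print(key, val)  # note two words
```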

Solution two (more general): use a regular expression to capture the parts:

>>> import re 
>>> match = re.search(r'([a-z_]+) "(.+?)"', 'transcript_id "g3.t1"') 
>>> key, val = match.groups() 
>>> key 
'transcript_id' 
>>> val 
'g3.t1'
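To apply the same pattern to a whole line rather than a single item, re.findall can be used instead of re.search; it returns one (key, value) tuple per match, which dict() accepts directly. A sketch for string2:

```python
import re

string2 = 'transcript_id "g3.t1"; gene_id "g3";'

# One (key, value) tuple per 'key "value"' occurrence in the line.
pairs = re.findall(r'([a-z_]+) "(.+?)"', string2)
print(pairs)        # [('transcript_id', 'g3.t1'), ('gene_id', 'g3')]
print(dict(pairs))  # {'transcript_id': 'g3.t1', 'gene_id': 'g3'}
```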

If you know in advance which of your two formats a given string or file uses, you can pass in a callback that does the per-item parsing, i.e.:

def parse_line(attributestring, itemparse):
    attributes = dict()
    for keyvaluepair in attributestring.split(';'):
        keyvaluepair = keyvaluepair.strip()
        if not keyvaluepair:
            # empty string due to a trailing ";"
            continue
        # itemparse returns a single (key, value) pair, so unpack it directly
        key, value = itemparse(keyvaluepair)
        attributes[key] = value
    return attributes


def parse_eq(kvstring):
    return kvstring.split("=", maxsplit=1)

def parse_space(kvstring):
    key, val = kvstring.split(" ", maxsplit=1)
    return key, val.strip('"')

d1 = parse_line(string1, parse_eq)
d2 = parse_line(string2, parse_space)


2017-07-31 07:19:34

A simplified version: you can add more separators to the regular expression so the string is split on more characters:

import re

string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
string2 = 'transcript_id "g3.t1"; gene_id "g3";'

def parse_single_feature_line(string):
    attributes = dict(re.split('[ =]', i.strip()) for i in string.split(';') if i)
    return attributes

print(parse_single_feature_line(string1))
print(parse_single_feature_line(string2))
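Note that re.split keeps the quotes around the string2 values (for example '"g3.t1"'). A sketch of a variant that also strips them and splits each item only once; the function name is reused from above purely for illustration:

```python
import re

string1 = 'ID=mRNA42;Parent=gene19;integrity=0.95;foo=bar'
string2 = 'transcript_id "g3.t1"; gene_id "g3";'

def parse_single_feature_line(string):
    attributes = {}
    for item in string.split(';'):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after a trailing ';'
        # Split on the first space or '=' only, so the value stays whole.
        key, value = re.split('[ =]', item, maxsplit=1)
        attributes[key] = value.strip('"')  # drop surrounding quotes, if any
    return attributes

print(parse_single_feature_line(string1))
print(parse_single_feature_line(string2))  # {'transcript_id': 'g3.t1', 'gene_id': 'g3'}
```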


2017-07-31 13:19:09
