Extract values using Python

I want to extract the values enclosed in curly braces from a text file. The lines in the file look like this:

0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}

etc.

I am trying to extract the values like this:

AAA is 0044xx aaa, bbb 

BBB is 01/01/0017:53 

CCC is 3.01 

DDD is 00001 

EEE is xxx yyy 

FFF is (4.0-10.5) 

HHH is 7.2

I cannot extract the values inside the curly braces, that is, CCC through HHH.

My script looks like this:

import sys
import re
import csv

def separateCodes(code):
    values = re.compile('.*?\{(.*?)\}.*?')
    field = values.findall(code)

    for i in range(len(field)):
        print field[i]
    print "-------------------------"

def handleError(self, record):
    raise

with open('/xxx.TXT') as ABCfp:
    PP = ABCfp.read()

PPwithNOrn = PP.replace('*\r', '').replace('\n', '')
PPByMsg = PPwithNOrn.split('<~>')
print len(PPByMsg)

for i in range(len(PPByMsg)):

    AAA = ""
    BBB = ""
    CCC = ""
    DDD = ""
    EEE = ""
    FFF = ""
    GGG = ""
    HHH = ""

    print i, "=>", PPByMsg[i]
    if PPByMsg[i].find("<L>") != -1:
        print "-----------------------"
        # AAA found
        AAA = PPByMsg[i].split('<L> <+>')[0]
        # BBB found
        BBB = PPByMsg[i].split('<L> <+>')[1].split('<&>')[0]
        # REST found
        rest = separateCodes(PPByMsg[i].split('<L> <+>')[1].split('<&>')[1])

As I am new to Python, I am unable to get any further. Please suggest a way to extract these values.


2013-12-20 Bullu

Welcome to Stack Overflow. Please [format your code](http://stackoverflow.com/editing-help) so that everyone can read it. – SuperSaiyan

Is "EEE" correct in the way you want to extract the values? – Jerry

How about this instead:

a,b,c = re.split('<[+&]>', i) 
bits = re.split('{(.*?)}', c)[1:-1]

bits will hold the tokens from the last part of your string:

>>> bits 
[' 3.01', '', '00001 ', '', 'xxx yyy DIFF', '', '(4.0-10.5)', '', '7.2'] 
>>> a 
'0044xx aaa, bbb ' 
>>> b 
' 01/01/0017:53 '
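The empty strings in bits are the (empty) text between adjacent braces. Because re.split with a capturing group alternates between the text outside the match and the captured group, taking every second element keeps only the captured values. A minimal Python 3 sketch of that idea, assuming each line contains exactly one <+> and one <&> marker (split_record is a hypothetical helper name, not part of the answer above):

```python
import re

def split_record(line):
    # Split on the <+> and <&> markers; raises ValueError if the line
    # does not contain exactly these two markers.
    head, stamp, tail = re.split(r'<[+&]>', line)
    # With a capturing group, re.split alternates between the text outside
    # the braces and the captured value, so odd indices are the values.
    values = re.split(r'\{(.*?)\}', tail)[1::2]
    return head.strip(), stamp.strip(), [v.strip() for v in values]

line = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}'
print(split_record(line))
```

Unlike filtering out empty strings, the [1::2] slice keeps a genuinely empty {} field as an empty string, so the positions of the remaining fields do not shift.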


2013-12-20 07:31:53

I used it like this: rest = separateCodes(PatientETLByMsg[i].split('<+>')[1].split('<&>')[1]), then ORDER = rest.split('{(.*?)}', c)[1:-1], then print ORDER – Bullu

ORDER = rest.split('{(.*?)}', c)[1:-1] AttributeError: 'NoneType' object has no attribute 'split' – Bullu

When I use it as MRN, DATETIME, rest = re.split('<[+&]>', i) and bits = re.split('{(.*?)}', str(rest))[1:-1], I get the error: return _compile(pattern, 0).split(string, maxsplit) TypeError: expected string or buffer – Bullu
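Both errors in these comments can probably be traced back to the script in the question. separateCodes only prints what it finds and returns nothing, so rest is None, which explains the AttributeError. In the question's loop, i is an integer index from range(...), so passing i to re.split most likely causes the "expected string or buffer" TypeError; the string itself has to be passed. A hedged Python 3 sketch of both fixes (separate_codes and the sample messages list are illustrative, not the asker's real data):

```python
import re

def separate_codes(code):
    # Return the brace-enclosed values instead of only printing them.
    return re.findall(r'\{(.*?)\}', code)

# Stand-in for PPByMsg from the question: a list of message strings.
messages = [
    '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}',
    'a line without the expected markers',
]

parsed = []
for msg in messages:  # iterate over the strings, not over range(len(...))
    parts = re.split(r'<[+&]>', msg)
    if len(parts) != 3:
        continue  # skip lines that do not have both markers
    mrn, datetime_text, rest = parts
    parsed.append((mrn.strip(), datetime_text.strip(), separate_codes(rest)))

print(parsed)
```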

You can do the whole operation with a single regular expression:

>>> t = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}' 
>>> re.search(r'(.*?)\s<\+>\s(.*?)\s<&>\s{(.*?)\}\{(.*?)\}\{(.*?) DIFF\}\{(.*?)\}\{(.*?)\}', t).groups() 
('0044xx aaa, bbb', '01/01/0017:53', ' 3.01', '00001 ', 'xxx yyy', '(4.0-10.5)', '7.2')

You can extend the regex by using (?P<name>.*?) instead of (.*?) to get named results:

>>> re.search(r'(?P<a>.*?)\s<\+>\s(?P<b>.*?)\s<&>\s{(?P<c>.*?)\}\{(?P<d>.*?)\}\{(?P<e>.*?) DIFF\}\{(?P<f>.*?)\}\{(?P<g>.*?)\}', t).groupdict() 
{'a': '0044xx aaa, bbb', 'c': ' 3.01', 'b': '01/01/0017:53', 'e': 'xxx yyy', 'd': '00001 ', 'g': '7.2', 'f': '(4.0-10.5)'}
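For a whole file it may be more convenient to compile the pattern once and to handle lines that do not match, since re.search returns None for those and calling .groupdict() on None fails. A small Python 3 sketch under those assumptions (RECORD and parse_record are hypothetical names; the pattern is the one above with \s* in place of \s and every brace escaped):

```python
import re

# Same structure as the regex above, split over two lines for readability.
RECORD = re.compile(
    r'(?P<a>.*?)\s*<\+>\s*(?P<b>.*?)\s*<&>\s*'
    r'\{(?P<c>.*?)\}\{(?P<d>.*?)\}\{(?P<e>.*?) DIFF\}\{(?P<f>.*?)\}\{(?P<g>.*?)\}'
)

def parse_record(text):
    match = RECORD.search(text)
    if match is None:
        return None  # the line does not have the expected layout
    # Strip the padding that some fields carry inside the braces.
    return {name: value.strip() for name, value in match.groupdict().items()}

t = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}'
print(parse_record(t))
```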

Alternatively, use zip or tuple assignment, for example:

>>> results = re.search(...).groups() 
>>> resultdict = dict(zip('abcdefg', results))
>>> a, b, c, d, e, f, g = results


2013-12-20 07:33:25 aquavitae

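I get the error TypeError: expected string or buffer – Bullu

It works for me (Python 2.7). If you give more information than just "I get an error", I may be able to help. Where do you get the error, and which version of Python are you running? Is the error in the code I posted, or do you get it when using the regex in your own code? – aquavitae

I am using Python 2.6.6. I used it as results = re.search(rest).groups() and got TypeError: search() takes at least 2 arguments (1 given) – Bullu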
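That last TypeError is what re.search raises when it is given only one argument: it needs the pattern first and the string to search second. A minimal Python 3 illustration, where rest and pattern are made-up stand-ins for the asker's values:

```python
import re

rest = ' { 3.01}{00001 }'          # stand-in for the text after <&>
pattern = r'\{(.*?)\}\{(.*?)\}'    # shortened version of the regex above

match = re.search(pattern, rest)   # pattern first, then the string
values = match.groups() if match else ()
print(values)
```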

I have met my requirement with the following:

rest=separateCodes(PatientETLByMsg[i].split('<L> <+>')[1].split('<&>')[1]) 

CCC=PPByMsg[i].split('{')[1].split('}')[0] 
DDD=PPByMsg[i].split('}{')[1] 
EEE=PPByMsg[i].split('}{')[2] 
FFF=PPByMsg[i].split('}{')[3] 
GGG=PPByMsg[i].split('}{')[4] 
HHH=PPByMsg[i].split('}{')[5] 
KKK=PPByMsg[i].split('}{')[6].split('}')[0]
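Indexing the result of split('}{') raises IndexError as soon as a message has fewer brace groups than expected. One possible alternative, sketched in Python 3, is to collect all brace values with re.findall and pair them with field names. FIELD_NAMES simply reuses the variable names from the snippet above, and brace_fields is a hypothetical helper:

```python
import re

# Names taken from the variables used in the snippet above.
FIELD_NAMES = ['CCC', 'DDD', 'EEE', 'FFF', 'GGG', 'HHH', 'KKK']

def brace_fields(message):
    # Collect every {...} value, trimming the padding inside the braces.
    values = [v.strip() for v in re.findall(r'\{(.*?)\}', message)]
    # zip stops at the shorter sequence, so missing fields are simply absent.
    return dict(zip(FIELD_NAMES, values))

sample = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}'
print(brace_fields(sample))
```

With the sample line from the question, which has five brace groups, only CCC through GGG are filled in and no exception is raised.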


2013-12-31 09:44:45 Bullu
