2013-04-11 57 views
1

Problem writing a regular expression

I have a data file in the following format:

1 AA/BB     0C89JG 
    2 ABANO/ANA VICTORIA  F12LFJ 
    3 ABBOUDLASTNAME/ABBOUDF DWPTHC 
    4 ABDALLAH/SIJAM   H0ZDM9 
    5 ABDEL MESSIH/DINA  T0SF8N 
    6 ABHISHEK/PRAMANIK  7SLKXV 
    7 ABHYANKAR/DHANANJAY 7SM0BV 
    8 ABOUSALAMA/FEMKE  LTTRQC 
    9 ABRAMOVA/NATALIA  77LCPZ 
    10 ABRANTES/JOAO   KXZC7Q 
    11 ABRATH/LUC    D5J99J 
    12 ABREO/HECTOR   CXDH4G 
    13 ABREU/ANDREA   242GRC 
    14 ABREU/MARCELO   2436R7 
    15 ABREU/VANDA   3HDNQQ 
    16 ABTS/NATHALIE   DSK9TN 
    17 ABTS/NATHALIE   FZ0LN4 

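I want to extract the last 6 characters of each line; for example, from line 17 I want to get FZ0LN4. The regular expression I came up with is: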

([0-9]{1,5})([A-Z /]) ([0-9A-Z]{6}) 

But it does not work. Can anyone point out what the problem is?

Answers

2

There are a couple of problems:

  • You are not matching some of the whitespace.
  • [A-Z /] is missing a repetition operator.
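
To make these two points concrete, here is a minimal sketch that runs the pattern from the question next to a minimally repaired one. The names ORIGINAL and REPAIRED are made up for illustration, and the repaired pattern is only one possible fix under the assumption that fields are separated by runs of whitespace; it is not necessarily the rewrite this answer uses.

```python
import re

# Pattern from the question: exactly one space before the code, and the
# character class [A-Z /] matches exactly one character.
ORIGINAL = re.compile(r'([0-9]{1,5})([A-Z /]) ([0-9A-Z]{6})')

# Hypothetical minimal repair: allow runs of whitespace between the fields
# and let the character class repeat (lazily, so it stops before the code).
REPAIRED = re.compile(r'\s*([0-9]{1,5})\s+([A-Z /]+?)\s+([0-9A-Z]{6})\s*$')

line = '    17 ABTS/NATHALIE   FZ0LN4'

original_match = ORIGINAL.search(line)
repaired_match = REPAIRED.search(line)

print(original_match)            # None
print(repaired_match.groups())   # ('17', 'ABTS/NATHALIE', 'FZ0LN4')
```

The original pattern can only match a one-character name followed by a single space, which is why it finds nothing on this line.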

I have rewritten the regular expression like this:

In [8]: re.match(r'\s*(\d+)\s*([A-Z /]+?)\s*(\w+)$', ' 15 ABREU/VANDA   3HDNQQ').groups() 
Out[8]: ('15', 'ABREU/VANDA', '3HDNQQ') 

If you only need the last six characters, there is no need for a regular expression:

In [15]: s = ' 15 ABREU/VANDA   3HDNQQ' 

In [16]: s[-6:] 
Out[16]: '3HDNQQ' 
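
Plain slicing assumes the code is the very last thing in the string. If the lines come from a file they may end in a newline or stray spaces, in which case s[-6:] would include that whitespace. A small sketch of a more tolerant variant follows; last_six is a hypothetical helper name, not something from the question.

```python
def last_six(line):
    """Return the last six characters of line, ignoring trailing whitespace."""
    # rstrip() drops trailing spaces, tabs and newlines before slicing.
    return line.rstrip()[-6:]


print(last_six(' 15 ABREU/VANDA   3HDNQQ'))       # 3HDNQQ
print(last_six(' 15 ABREU/VANDA   3HDNQQ  \n'))   # 3HDNQQ
```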
+1

Much better than mine :) But it fails on the second record :( – RAB 2013-04-11 16:30:39

+0

@RaheelAliBaloch: Good point, I overlooked the space. Fixed now. – NPE 2013-04-11 16:56:33

0

Use the $ character to anchor the match at the end of the line, and \S to match non-whitespace characters:

>>> import re
>>> s = ''' 1 AA/BB     0C89JG
    2 ABANO/ANA VICTORIA  F12LFJ 
    3 ABBOUDLASTNAME/ABBOUDF DWPTHC 
    4 ABDALLAH/SIJAM   H0ZDM9 
    5 ABDEL MESSIH/DINA  T0SF8N 
    6 ABHISHEK/PRAMANIK  7SLKXV 
    7 ABHYANKAR/DHANANJAY 7SM0BV 
    8 ABOUSALAMA/FEMKE  LTTRQC 
    9 ABRAMOVA/NATALIA  77LCPZ 
    10 ABRANTES/JOAO   KXZC7Q 
    11 ABRATH/LUC    D5J99J 
    12 ABREO/HECTOR   CXDH4G 
    13 ABREU/ANDREA   242GRC 
    14 ABREU/MARCELO   2436R7 
    15 ABREU/VANDA   3HDNQQ 
    16 ABTS/NATHALIE   DSK9TN 
    17 ABTS/NATHALIE   FZ0LN4''' 

>>> re.findall('\\S{6}$', s, re.MULTILINE) 
['0C89JG', 'F12LFJ', 'DWPTHC', 'H0ZDM9', 'T0SF8N', '7SLKXV', '7SM0BV', 'LTTRQC', '77LCPZ', 'KXZC7Q', 'D5J99J', 'CXDH4G', '242GRC', '2436R7', '3HDNQQ', 'DSK9TN', 'FZ0LN4'] 
2

If you only need the string at the end of the line, you can use a simpler regular expression, such as: \b\w{6}\b$
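
As a rough illustration of how that pattern could be applied to several lines at once, the sketch below assumes the data is held in one multi-line string (here a shortened, hypothetical sample) and uses re.MULTILINE so that $ matches at the end of every line rather than only at the end of the string.

```python
import re

# Shortened sample of the data from the question, without trailing whitespace.
sample = '\n'.join([
    '    15 ABREU/VANDA   3HDNQQ',
    '    16 ABTS/NATHALIE   DSK9TN',
    '    17 ABTS/NATHALIE   FZ0LN4',
])

# \b...\b requires the six word characters to form a whole word,
# and $ (with re.MULTILINE) pins that word to the end of a line.
codes = re.findall(r'\b\w{6}\b$', sample, re.MULTILINE)
print(codes)  # ['3HDNQQ', 'DSK9TN', 'FZ0LN4']
```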

1

Are you only looking for the last line (17)? If so, run re.search over the whole string:

import re 
myString=""" 
    1 AA/BB     0C89JG 
    2 ABANO/ANA VICTORIA  F12LFJ 
    3 ABBOUDLASTNAME/ABBOUDF DWPTHC 
    4 ABDALLAH/SIJAM   H0ZDM9 
    5 ABDEL MESSIH/DINA  T0SF8N 
    6 ABHISHEK/PRAMANIK  7SLKXV 
    7 ABHYANKAR/DHANANJAY 7SM0BV 
    8 ABOUSALAMA/FEMKE  LTTRQC 
    9 ABRAMOVA/NATALIA  77LCPZ 
    10 ABRANTES/JOAO   KXZC7Q 
    11 ABRATH/LUC    D5J99J 
    12 ABREO/HECTOR   CXDH4G 
    13 ABREU/ANDREA   242GRC 
    14 ABREU/MARCELO   2436R7 
    15 ABREU/VANDA   3HDNQQ 
    16 ABTS/NATHALIE   DSK9TN 
    17 ABTS/NATHALIE   FZ0LN4 
""" 

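# Without re.MULTILINE, $ only matches at the end of the whole string
# (just before its final newline), so this finds the code on the last line.
m = re.search(r"(\S{6})$", myString)
if m:
    print(m.group(1))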

If you need to find a specific line, you should iterate over the individual lines:

for line in myString.split("\n"):
    # \s+ after the number keeps "17" from also matching lines such as "170".
    m = re.search(r"^\s*17\s+.*(\S{6})$", line)
    if m:
        print(m.group(1))
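
The same loop could be generalized to any record number. The following is only a sketch under the assumption that each line starts with its record number and ends with the six-character code; code_for_record and the shortened data string are hypothetical and not part of the answer above.

```python
import re


def code_for_record(text, number):
    """Return the six-character code on the line numbered `number`, or None."""
    # \s+ after the number prevents 1 from matching lines 10, 11, ...;
    # the trailing \s* tolerates whitespace after the code.
    pattern = re.compile(r'^\s*%d\s+.*\s(\S{6})\s*$' % number)
    for line in text.splitlines():
        m = pattern.search(line)
        if m:
            return m.group(1)
    return None


data = """
    1 AA/BB     0C89JG
    10 ABRANTES/JOAO   KXZC7Q
    17 ABTS/NATHALIE   FZ0LN4
"""

print(code_for_record(data, 1))    # 0C89JG
print(code_for_record(data, 17))   # FZ0LN4
print(code_for_record(data, 99))   # None
```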
+0

+1 Same as mine – User 2013-04-11 16:34:51

1

This is easy to do without a regular expression:

st='''\ 
    1 AA/BB     0C89JG 
    2 ABANO/ANA VICTORIA  F12LFJ 
    3 ABBOUDLASTNAME/ABBOUDF DWPTHC 
    4 ABDALLAH/SIJAM   H0ZDM9 
    5 ABDEL MESSIH/DINA  T0SF8N 
    6 ABHISHEK/PRAMANIK  7SLKXV 
    7 ABHYANKAR/DHANANJAY 7SM0BV 
    8 ABOUSALAMA/FEMKE  LTTRQC 
    9 ABRAMOVA/NATALIA  77LCPZ 
    10 ABRANTES/JOAO   KXZC7Q 
    11 ABRATH/LUC    D5J99J 
    12 ABREO/HECTOR   CXDH4G 
    13 ABREU/ANDREA   242GRC 
    14 ABREU/MARCELO   2436R7 
    15 ABREU/VANDA   3HDNQQ 
    16 ABTS/NATHALIE   DSK9TN 
    17 ABTS/NATHALIE   FZ0LN4''' 

for line in st.splitlines():
    # split() with no arguments splits on any run of whitespace,
    # so the last field is the code.
    print(line.split()[-1])

Prints:

0C89JG 
F12LFJ 
DWPTHC 
H0ZDM9 
T0SF8N 
7SLKXV 
7SM0BV 
LTTRQC 
77LCPZ 
KXZC7Q 
D5J99J 
CXDH4G 
242GRC 
2436R7 
3HDNQQ 
DSK9TN 
FZ0LN4 

Or, if you just want the 'nth' one, something like this:

>>> li=[line.split()[-1] for line in st.splitlines()] 
>>> li[-1] 
'FZ0LN4' 
>>> li[-2] 
'DSK9TN' # etc etc 

Or, if you really want a regular expression:

>>> re.findall(r'\s(\S{6})$',st,re.MULTILINE) 
['0C89JG', 'F12LFJ', 'DWPTHC', 'H0ZDM9', 'T0SF8N', '7SLKXV', '7SM0BV', 'LTTRQC', '77LCPZ', 'KXZC7Q', 'D5J99J', 'CXDH4G', '242GRC', '2436R7', '3HDNQQ', 'DSK9TN', 'FZ0LN4'] 
>>> re.findall(r'\s(\S{6})$',st,re.MULTILINE)[-1] 
'FZ0LN4'