2015-07-19 38 views
1

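Trouble matching a pattern at the start or end of a string in Python

I'm having trouble with a Python regular expression. I want to find any of N, S, E, W, NB, SB, EB, WB, including at the beginning or end of the string. My regex easily finds them in the middle, but fails at the start or the end.

Can anyone suggest what I'm doing wrong with dirPattern in my code sample below?

Note: I know I have some other issues to deal with (e.g. 'W of'), but I think I know how to modify the regex for those.

Thanks in advance.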

import re 

nameList = ['Boulder Highway and US 95 NB', 'Boulder Hwy and US 95 SB', 
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15', 
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean', 
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W', 
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran', 
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East', 
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)'] 

dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'} 

dirPattern = re.compile(r'[ ^]([NSEW])B?[ $]') 

print('name\tmatch\tdirSting\tdirection') 
for name in nameList: 
    match = dirPattern.search(name) 
    direction = None 
    dirString = None 
    if match:
        dirString = match.group(1)
        if dirString in dirMap:
            direction = dirMap[dirString]
    print('%s\t%s\t%s\t%s'%(name, match, dirString, direction)) 

Some sample expected output:

name match dirSting direction

Boulder Highway and US 95 NB <_sre.SRE_Match object at 0x7f68af836648> N North

Boulder Hwy and US 95 SB <_sre.SRE_Match object at 0x7f68ae836648> S South

Buffalo and Summerlin N <_sre.SRE_Match object at 0x7f68af826648> N North

Charleston and I-215 W <_sre.SRE_Match object at 0x7f68cf836648> W West

Flamingo and NB I-15 <_sre.SRE_Match object at 0x7f68af8365d0> N North

S Buffalo and Summerlin <_sre.SRE_Match object at 0x7f68aff36648> S South

Gibson and I-215 EB <_sre.SRE_Match object at 0x7f68afa36648> E East

However, the examples with the direction at the start or end give:

Boulder Highway and US 95 NB None None None

+2

You know that '^' and '$' *inside square brackets* don't still mean the start/end of the string, right? – jonrsharpe

+0

Jon, thanks, I didn't know that, although I was beginning to suspect it. –

+1

What are you trying to do? Also, you could use 'direction = dirMap.get(dirString)', if the dictionary –

Answers

0

The regex modification in this code does the trick. It includes handling things like 'W of', as well as 'at E' and similar:

import re 

nameList = ['Boulder Highway and US 95 NB', 'Boulder Hwy and US 95 SB', 
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15', 
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean', 
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W', 
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran', 
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East', 
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)'] 

dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'} 

dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B?(?! of)(?: |$)')

print('name\tdirSting\tdirection') 
for name in nameList: 
    match = dirPattern.search(name) 
    direction = None 
    dirString = None 
    if match:
        dirString = match.group(1)
        direction = dirMap.get(dirString)
    print('> %s\t\t%s\t%s'%(name, dirString, direction)) 

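The regex can be read as follows:

(?:^| ) starts with either the beginning of the string or a space

(?<! at ) not preceded by 'at'

(?<! of ) not preceded by 'of'

([NSEW]) any one of 'N', 'S', 'E', 'W' (this will be in match.group(1))

B? optionally followed by 'B' (as in "bound")

(?! of) not followed by 'of'

(?: |$) ends with either a space or the end of the string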

The final output is:

Boulder Highway and US 95 NB N North

Boulder Hwy and US 95 SB S South

Buffalo and Summerlin N N North

Charleston and I-215 W W West

Eastern and I-215 S S South

Flamingo and NB I-15 N North

S Buffalo and Summerlin S South

Flamingo and SB I-15 S South

Gibson and I-215 EB E East

I-15 at 3.5 miles N of Jean None None

I-15 NB S I-215 (dual) N North

I-15 SB 4.3 mile N of Primm S South

I-15 SB S of Russell S South

I-515 SB at Eastern W S South

I-580 at I-80 N E N North

I-580 at I-80 S W S South

I-80 at E 4TH St Kietzke Ln None None

I-80 East of W McCarran None None

LV Blvd at I-215 S S South

S Buffalo and I-215 W S South

S Decatur and I-215 WB S South

Sahara and I-15 East None None

Sands and Wynn South Gate None None

Silverado Ranch and I-15 (west side) None None
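
For readers who find the one-line pattern hard to scan, here is a minimal sketch of the same breakdown written with re.VERBOSE. It assumes the lookbehinds include their surrounding spaces (' at ' and ' of '), which is what the output listed above implies. The names dir_pattern_verbose and find_direction are hypothetical and do not appear in the answer.

```python
import re

# Hypothetical verbose rewrite of the answer's pattern. [ ] stands for a
# literal space, because re.VERBOSE ignores unescaped whitespace that is
# outside a character class.
dir_pattern_verbose = re.compile(r"""
    (?:^|[ ])        # start of the string, or a space
    (?<![ ]at[ ])    # not preceded by ' at '
    (?<![ ]of[ ])    # not preceded by ' of '
    ([NSEW])         # the direction letter, captured as group(1)
    B?               # optional 'B', as in 'NB'
    (?![ ]of)        # not followed by ' of'
    (?:[ ]|$)        # a space, or the end of the string
""", re.VERBOSE)


def find_direction(name):
    """Return the captured direction letter, or None if nothing matches."""
    match = dir_pattern_verbose.search(name)
    return match.group(1) if match else None


for sample in ['Gibson and I-215 EB',
               'I-80 at E 4TH St Kietzke Ln',
               'I-80 East of W McCarran']:
    print(sample, '->', find_direction(sample))
```

On these three samples the sketch agrees with the listing above: 'E' for the first, None for the other two.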

Side note: I decided I didn't want the end-of-string case. For that, the regex is:

dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of)')
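
A quick check of what dropping the end-of-string case changes, as a sketch only. It assumes the lookbehinds carry their surrounding spaces (' at ' and ' of '); no_end_pattern and direction_letter are hypothetical names used for illustration.

```python
import re

# Assumed form of the side-note pattern: a space must follow the direction
# token, so a direction at the very end of the string no longer matches.
no_end_pattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of)')


def direction_letter(name):
    """Return the captured direction letter, or None if nothing matches."""
    match = no_end_pattern.search(name)
    return match.group(1) if match else None


print(direction_letter('Flamingo and NB I-15'))          # direction mid-string
print(direction_letter('Boulder Highway and US 95 NB'))  # direction at the end
print(direction_letter('I-15 at 3.5 miles N of Jean'))   # 'N of' is excluded
```

Under that assumption, only the first name yields a direction ('N'). The other two give None.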

1

You need to use lookarounds.

dirPattern = re.compile(r'(?<!\S)([NSEW])B?(?!\S)') 

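[ ^] would match a space or a carrot symbol. (?<!\S) is a negative lookbehind which asserts that the match is not preceded by a non-space character. (?!\S) asserts that the match is not followed by a non-space character.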
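
A small sketch of the difference, using the two patterns from this thread. The variable names bracket_pattern, lookaround_pattern and results are illustrative only.

```python
import re

# Inside [...], '^' (when it is not the first character) and '$' are
# ordinary literal characters, not anchors.
bracket_pattern = re.compile(r'[ ^]([NSEW])B?[ $]')
print(bracket_pattern.search('^N$'))       # matches the literal '^' and '$'
print(bracket_pattern.search('US 95 NB'))  # None: no character follows 'NB'

# The lookarounds consume nothing, so they also succeed at the string edges.
lookaround_pattern = re.compile(r'(?<!\S)([NSEW])B?(?!\S)')
results = {}
for name in ['Boulder Highway and US 95 NB',
             'S Buffalo and Summerlin',
             'Sahara and I-15 East']:
    match = lookaround_pattern.search(name)
    results[name] = match.group(1) if match else None
    print(name, '->', results[name])
```

Note that this pattern only enforces whitespace boundaries. It does not by itself exclude cases such as 'at E' or 'N of'.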

The reason I used negative lookarounds rather than positive ones is that Python's default re module won't support (?<=^| ).
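
The limitation can be seen directly: the standard re module requires a lookbehind to have a fixed width, and the alternation of '^' (width 0) with a space (width 1) does not. The sketch below also shows one possible positive-lookaround workaround, which is an assumption on my part and not something the answer proposes. The names lookbehind_rejected, positive_pattern and first_direction are hypothetical.

```python
import re

# '^| ' can be zero or one character wide, so re rejects it in a lookbehind.
try:
    re.compile(r'(?<=^| )([NSEW])B?')
    lookbehind_rejected = False
except re.error as exc:
    lookbehind_rejected = True
    print('re.compile failed:', exc)

# Possible workaround (an assumption, not from the answer): keep '^' outside
# the lookbehind so that each lookbehind has a fixed width.
positive_pattern = re.compile(r'(?:^|(?<= ))([NSEW])B?(?= |$)')


def first_direction(name):
    """Return the captured direction letter, or None if nothing matches."""
    match = positive_pattern.search(name)
    return match.group(1) if match else None


print(first_direction('Charleston and I-215 W'))
print(first_direction('S Buffalo and Summerlin'))
print(first_direction('Sahara and I-15 East'))
```

The negative form (?<!\S) is shorter and avoids the issue entirely, which is presumably why the answer prefers it.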

+0

*「carrot symbol」* - [caret](https://en.wikipedia.org/wiki/Caret)? – jonrsharpe

+0

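Avinash, thanks for the tip. I had started using lookarounds to handle cases like 'E 2nd St' or 'I-15 W' (both of which should be excluded). What I want is N, NB, etc., but only when it stands by itself: at the start followed by a space, at the end preceded by a space, or in the middle with a space before and after. Your answer might get me there, but right now I don't see how. –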