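Python regex won't work as I want it to

I want to filter street names and extract the parts I need. The names come in several formats. Here are some examples, with what I want to get from each.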

Car Cycle 5 B Ap 1233  < what I have 
Car Cycle 5 B    < what I want 

Potato street 13 1 AB  < what I have 
Potato street 13   < what I want 

Chrome Safari 41 Ap 765  < what I have 
Chrome Safari 41   < what I want 

Highstreet 53 Ap 2632/BH < what I have 
Highstreet 53    < what I want 

Something street 91/Daniel < what I have 
Something street 91   < what I want

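In general, what I want is the street name (1-4 words), followed by the street number if there is one, and then the street letter (1 letter) if there is one. I just can't get it to work right.

Here is my code (I know, it sucks):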

import re 

def address_regex(address): 
    regex1 = re.compile("(\w+){1,4}(\d{1,4}){1}(\w{1})") 
    regex2 = re.compile("(\w+){1,4}(\d{1,4}){1}") 
    regex3 = re.compile("(\w+){1,4}(\d){1,4}") 
    regex4 = re.compile("(\w+){1,4}(\w+)") 

    s1 = regex1.search(address)
    s2 = regex2.search(address)
    s3 = regex3.search(address)
    s4 = regex4.search(address)

    regex_address = "" 

    if s1 is not None:
        regex_address = s1.group()
    elif s2 is not None:
        regex_address = s2.group()
    elif s3 is not None:
        regex_address = s3.group()
    elif s4 is not None:
        regex_address = s4.group()
    else:
        regex_address = address

    return regex_address

I am using Python 3.4.

2015-08-18 ZeZe

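Just use a tool like Kodos for regular expressions. – bgusach

Don't you want the number 91 in the last example? – Falko

What's the logic of the last example? 'street 91/Daniel': why isn't it taking the 91? – PYPL

I'm going to go out on a limb here and assume that in the last example you actually do want to capture the number 91, because leaving it out makes no sense.

Here is a solution that captures all of your examples (including the last one, but with the 91):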

^([\p{L} ]+ \d{1,4}(?: ?[A-Za-z])?\b)

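^ starts the match at the beginning of the string
[\p{L} ]+ is a character class matching a space or any Unicode character in the "letter" category, 1 to unlimited times
\d{1,4} is a digit, 1-4 times
(?: ?[A-Za-z])? is a non-capturing group of an optional space and a single letter, 0-1 times

Capture group 1 is the whole address. I don't quite understand the logic behind your grouping, but feel free to group it however you like.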
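Since the question is about Python, note that the standard `re` module does not understand `\p{L}` (the third-party `regex` package does). Below is a minimal sketch of the same pattern for Python 3, assuming `[^\W\d_]` is an acceptable stand-in for `\p{L}`. The names `ADDRESS_RE` and `extract_address` are made up for illustration, and returning the input unchanged when nothing matches is also an assumption.

```python
import re

# Assumption: [^\W\d_] ("a word character that is neither a digit nor an
# underscore") stands in for \p{L}, which the standard re module lacks.
# The trailing \b keeps the optional letter from matching the "A" of "Ap".
ADDRESS_RE = re.compile(r"^((?:[^\W\d_]| )+ \d{1,4}(?: ?[A-Za-z])?\b)")


def extract_address(address):
    """Return street name, number and optional letter, or the input if no match."""
    match = ADDRESS_RE.match(address)
    return match.group(1) if match else address


samples = [
    "Car Cycle 5 B Ap 1233",
    "Potato street 13 1 AB",
    "Chrome Safari 41 Ap 765",
    "Highstreet 53 Ap 2632/BH",
    "Something street 91/Daniel",
]
for sample in samples:
    print(extract_address(sample))
```

With the five samples from the question this should print the name, number and letter parts only, keeping the 91 in the last one.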

See demo

2015-08-18 11:42:31 ohaal

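Thanks, this works better, but sometimes there is no number in the street name (I forgot to mention that), and then the regex doesn't match anything. An example is "Potato street", which returns nothing. What should I do then? – ZeZe

The more optional things you have that depend on each other, the more complexity you add. You can try this improved version, which I won't explain further other than to say it is more robust: '^(\p{L}[\p{L} -]*\p{L}(?: \d{1,4}(?: ?[A-Za-z])?)?\b)' [See demo](http://rubular.com/r/MMpglOziNu) - if you want to understand it better, try using online regex resources such as [RegexStorm](http://regexstorm.net/reference). – ohaal

Yes, it works, thank you! I'll look into RegexStorm too. – ZeZe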
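For Python readers, here is a rough sketch of that improved pattern using the standard `re` module. It assumes the pattern from the comment reads as `^(\p{L}[\p{L} -]*\p{L}(?: \d{1,4}(?: ?[A-Za-z])?)?\b)` and again swaps `\p{L}` for `[^\W\d_]`, since `re` has no `\p{L}`. `IMPROVED_RE` and `extract_street` are hypothetical names.

```python
import re

# Assumption: [^\W\d_] replaces \p{L}. The name must start and end with a
# letter and may contain spaces or hyphens; the number part is now optional.
IMPROVED_RE = re.compile(
    r"^([^\W\d_](?:[^\W\d_]|[ -])*[^\W\d_](?: \d{1,4}(?: ?[A-Za-z])?)?\b)"
)


def extract_street(address):
    """Return the matched street part, or the input unchanged if nothing matches."""
    match = IMPROVED_RE.match(address)
    return match.group(1) if match else address


for sample in ["Potato street", "Car Cycle 5 B Ap 1233", "Highstreet 53 Ap 2632/BH"]:
    print(extract_street(sample))
```

The difference from the first pattern is that a street without a number, such as "Potato street", now yields a match instead of nothing.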

This works for you:

^([a-z]+\s+)*(\d*(?=\s))?(\s+[a-z])*\b

Set multiline mode (for the 5 samples) and case-insensitive mode. If your regex flavor supports inline flags, that is (?im).
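In Python the two modes can be passed as flags rather than inline. The following is only a sketch of how this pattern might be applied to the five samples joined into one multi-line string; `PATTERN`, `text` and `results` are illustrative names. Note that for the last sample the match stops before `91`, because the lookahead `(?=\s)` requires whitespace after the digits.

```python
import re

# The pattern from this answer; (?im) is expressed as explicit flags.
PATTERN = re.compile(
    r"^([a-z]+\s+)*(\d*(?=\s))?(\s+[a-z])*\b",
    re.IGNORECASE | re.MULTILINE,
)

text = "\n".join([
    "Car Cycle 5 B Ap 1233",
    "Potato street 13 1 AB",
    "Chrome Safari 41 Ap 765",
    "Highstreet 53 Ap 2632/BH",
    "Something street 91/Daniel",
])

# With MULTILINE, ^ matches at the start of every line, so finditer yields
# one match per sample. strip() removes the trailing space that is left
# when no number is matched.
results = [m.group(0).strip() for m in PATTERN.finditer(text)]
print(results)
```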

2015-08-18 11:30:26 buckley

Maybe you would prefer a more readable Python version (without regex):

import string 

names = [ 
    "Car Cycle 5 B Ap 1233", 
    "Potato street 13 1 AB", 
    "Chrome Safari 41 Ap 765", 
    "Highstreet 53 Ap 2632/BH", 
    "Something street 91/Daniel", 
    ] 

for name in names:
    result = []
    words = name.split()
    # Leading words made up only of letters form the street name.
    while words and all(c in string.ascii_letters for c in words[0]):
        result.append(words.pop(0))
    # An optional all-digit word is the street number.
    if words and all(c in string.digits for c in words[0]):
        result.append(words.pop(0))
    # An optional single uppercase letter is the street letter.
    if words and len(words[0]) == 1 and words[0] in string.ascii_uppercase:
        result.append(words.pop(0))
    print(" ".join(result))

Output:

Car Cycle 5 B 
Potato street 13 
Chrome Safari 41 
Highstreet 53 
Something street

2015-08-18 11:32:45 Falko
