Capturing multiple possibilities with a regular expression


"Title: /u/foo, /u/bar" 
"Title - /u/foo and /u/bar" 
"title-/u/foo, /u/bar and /u/foobar" 
"Title /u/barfoo (/u/foo and /u/bar)"

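I'm searching strings like the ones above, and I'm having trouble matching an arbitrary number of names, anywhere from 1 to maybe 100.

Edit: I don't think I made myself clear. The example strings I gave are small snippets of the actual text I'm searching. I'm examining the bodies of posts in /r/KarmaCourt, like these: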

http://www.reddit.com/r/KarmaCourt/comments/1ifz0u/ http://www.reddit.com/r/KarmaCourt/comments/28hv73/

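The question is about building a loop into a regular expression. I'm not asking how to search the sample strings I gave for names.

I know that r'title.*/u/(\w{3,20})' will match the last name on the line, and that r'title.*?/u/(\w{3,20})' will match the first one on the line. I could manually append some number of r'.*?/?u?/?(\w{3,20})?' to the end of the expression to match multiple names, but I can't help thinking that's bad practice.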

Would it be better to take the string matched by r'title.*?(?=/u/\w{3,20})(.*)' and pull every r'/u/(\w{3,20})' group out of it? Or is there a way to do it all in one step that I'm missing?

Note: this project is being done in Python, but this is more of a general question.
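For reference, the two-step idea described in the question can be sketched as follows. This is only an illustration, assuming the quantifier is meant to be {3,20} (Python's re treats {3:20} as literal text); the names TITLE_TAIL, NAME, and names_after_title are made up for the example.

```python
import re

# Step 1: find "title" and capture everything from the first /u/ name onward.
TITLE_TAIL = re.compile(r'title.*?(?=/u/\w{3,20})(.*)', re.IGNORECASE)
# Step 2: pull every username out of the captured tail.
NAME = re.compile(r'/u/(\w{3,20})')

def names_after_title(line):
    """Return all /u/ names that follow 'title' on the line, else an empty list."""
    m = TITLE_TAIL.search(line)
    if m is None:
        return []
    return NAME.findall(m.group(1))

print(names_after_title("title-/u/foo, /u/bar and /u/foobar"))
# ['foo', 'bar', 'foobar']
```

The title is checked once and the names are collected in a second pass, so this is the "two steps" the question is trying to avoid, but it works with the standard library alone.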


2015-01-08 Humus

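If Python supports it, you can use the \G construct.
\G means: start searching at the position where the last match ended.

This basically lets you qualify the start of a new search (Title, in this case)
without actually having to check for it every time.

Then just do a global search. After each match, the name is in group 1.
I set the multi-line modifier. If you test one line at a time, you may not need it.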

# (?mi)(?:(?!\A)\G|^Title).*?/u/(\w{3,20}) 

(?xmi-)      # Inline modifier = 
           # expanded, multiline, case insensitive 
(?: 
     (?! \A)      # Not beginning of string 
     \G       # If matched before, start at end of last match 
    |        # or, 
    ^Title      # BOL then 'title' 
) 
.*?       # non-greedy any char's 
/u/       # until '/u/' 
(\w{3,20})     # (1), then 3 to 20 word characters

Addendum
Here is some output, to give an idea of how it works.

Output

** Grp 0 - (pos 0 , len 13) 
Title: /u/foo 
** Grp 1 - (pos 10 , len 3) 
foo 

------------ 

** Grp 0 - (pos 13 , len 8) 
, /u/bar 
** Grp 1 - (pos 18 , len 3) 
bar 

------------ 

** Grp 0 - (pos 24 , len 14) 
Title - /u/foo 
** Grp 1 - (pos 35 , len 3) 
foo 

------------ 

** Grp 0 - (pos 38 , len 11) 
and /u/bar 
** Grp 1 - (pos 46 , len 3) 
bar 

------------ 

** Grp 0 - (pos 52 , len 12) 
title-/u/foo 
** Grp 1 - (pos 61 , len 3) 
foo 

------------ 

** Grp 0 - (pos 64 , len 8) 
, /u/bar 
** Grp 1 - (pos 69 , len 3) 
bar 

------------ 

** Grp 0 - (pos 72 , len 14) 
and /u/foobar 
** Grp 1 - (pos 80 , len 6) 
foobar 

------------ 

** Grp 0 - (pos 89 , len 15) 
Title /u/barfoo 
** Grp 1 - (pos 98 , len 6) 
barfoo 

------------ 

** Grp 0 - (pos 104 , len 8) 
(/u/foo 
** Grp 1 - (pos 109 , len 3) 
foo 

------------ 

** Grp 0 - (pos 112 , len 11) 
and /u/bar 
** Grp 1 - (pos 120 , len 3) 
bar


2015-01-08 23:03:18 sln

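If I used this with my non-greedy example r'title.*?/u/(\w{3,20})', wouldn't the next match look for another "title" before the next name? That would still be two steps, similar to the solution I proposed above, right? – Humus

@Humus - It's one step. Added some output. Just do a find-all and it will get all of these values in one pass. – sln

Thanks! I think I understand it now. Unfortunately, Python doesn't have the \G construct, but apparently there is a way to work around that: http://stackoverflow.com/questions/529830/do-python-regexes-support-something-like-perls-g – Humus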
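For what it's worth, one standard-library way to approximate \G is the pos argument of a compiled pattern's match() method, which only matches starting exactly at that offset. A minimal sketch, assuming each title sits at the start of its own line; START, CONT, and names_for_title are hypothetical names for this illustration, not anything from the answer above.

```python
import re

# START is tried once per line; CONT is retried from where the last match ended.
START = re.compile(r'title.*?/u/(\w{3,20})', re.IGNORECASE)
CONT = re.compile(r'.*?/u/(\w{3,20})')

def names_for_title(line):
    """Collect every /u/ name on a line that begins with 'title'."""
    names = []
    m = START.match(line)              # plays the role of ^Title
    while m:
        names.append(m.group(1))
        # match(string, pos) anchors at pos, which stands in for \G here.
        m = CONT.match(line, m.end())
    return names

print(names_for_title("Title: /u/foo, /u/bar"))
# ['foo', 'bar']
```

A line that does not begin with the title yields an empty list, because the continuation pattern is only ever tried after the title has matched once.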

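Here is how to do it in Python. findall returns a list of the captured words it matched in the string. Once you have that, you can iterate over it to get the usernames.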

import re 

s = ["Title: /u/foo, /u/bar", 
    "Title - /u/foo and /u/bar", 
    "title-/u/foo, /u/bar and /u/foobar", 
    "Title /u/barfoo (/u/foo and /u/bar)"] 

for t in s: 
    # findall returns every captured username on the line 
    matches = re.findall(r'/u/(\w+)', t) 
    print(matches)


2015-01-08 22:53:54

You don't really need a regex here; you can use str.split() and str.rstrip():

>>> l=["Title: /u/foo, /u/bar", 
... "Title - /u/foo and /u/bar", 
... "title-/u/foo, /u/bar and /u/foobar", 
... "Title /u/barfoo (/u/foo and /u/bar)"] 
>>> s=[i.split() for i in l] 
>>> [[j.split('/u/')[1].rstrip('),') for j in i if '/u/' in j]for i in s] 
[['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar', 'foobar'], ['barfoo', 'foo', 'bar']]

If you want to use a regex, you can just use a positive look-behind:

>>> import re 
>>> s=[re.findall(r'(?<=/u/)\w+',i) for i in l] 
>>> s 
[['foo', 'bar'], ['foo', 'bar'], ['foo', 'bar', 'foobar'], ['barfoo', 'foo', 'bar']]


2015-01-08 23:01:37 Kasramvd

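Unfortunately, that assumes the examples I gave are the entire text. I'm pulling this text from posts in /r/KarmaCourt, and it looks more like a full court docket. I'm looking for an answer that uses regex, because I really do need to go through content like this: http://www.reddit.com/r/KarmaCourt/comments/1ifz0u/the_people_of_reddit_vs_uvolumezero_for_blatant/ – Humus

@Humus OK, check the edit. I think you can use a positive look-behind. – Kasramvd

Yes, a positive look-behind would give me all the names listed in the post, but I specifically want to categorize them by their "title", which is why I bother searching for the title in the first place. What I'm looking for is a way to pull an arbitrary number of matches out of a regex that also contains other elements which only need to be matched once. – Humus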
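As a rough sketch of that categorizing goal using only Python's re module: match each title once at the start of a line, then collect every name on the rest of that line. The function name names_by_title, the sample post, and the titles "Judge" and "Prosecution" are all hypothetical, chosen only to make the example runnable.

```python
import re

NAME = re.compile(r'/u/(\w{3,20})')

def names_by_title(text, titles):
    """Map each title to the /u/ names on lines that start with that title."""
    result = {}
    for title in titles:
        # Match the title once at the start of a line and keep the rest of the line.
        line_re = re.compile(r'^' + re.escape(title) + r'\b(.*)$',
                             re.IGNORECASE | re.MULTILINE)
        names = []
        for m in line_re.finditer(text):
            names.extend(NAME.findall(m.group(1)))
        result[title] = names
    return result

# Made-up post body for illustration only.
post = "Judge: /u/foo\nProsecution - /u/bar and /u/foobar\nSome other text /u/baz\n"
print(names_by_title(post, ["Judge", "Prosecution"]))
# {'Judge': ['foo'], 'Prosecution': ['bar', 'foobar']}
```

Names on lines that do not start with one of the given titles (such as /u/baz above) are ignored, which is the difference from a plain findall over the whole post.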
