Using a regular expression to get substrings in Python

Suppose I have a string like this:

x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'

From it, I want to get:

:heavy_black_heart: 
:smiling_face:

To do this, I tried the following:

import re 
result = re.search(':(.*?):', x) 
result.group()

It only gives me ':heavy_black_heart:'. How can I make it work? If possible, I would like to store the matches in a dictionary once I have found them all.

2017-09-14 zwlayer

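Maybe 'set(re.findall(r':[^:]+:', x))' will do? Not sure what may appear between the ':'s, maybe 'r':\w+:'' would be better. –

@WiktorStribiżew It works for the example, but I don't understand why you are unsure – zwlayer

See my answer and some explanation. Actually, you did not provide all the requirements, just two examples, which is why I said I was not sure. –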

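You seem to want to match smilies, which are some symbols in between 2 :s. The .*? can match 0 symbols, so your regex can match ::, which I think is not what you want to get. Besides, re.search only returns one match (the first); to get multiple matches, you usually use re.findall or re.finditer.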
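A minimal sketch of the two points above. The string sample is a shortened, made-up variant of the one in the question, and the variable names are only for illustration. It shows that re.search stops at the first match, that re.findall and re.finditer collect all of them, and that .*? will accept nothing at all between two colons.

```python
import re

# Hypothetical sample string, shortened from the question for illustration.
sample = 'hi :heavy_black_heart::heavy_black_heart: there :smiling_face:'

# re.search returns only the first match.
first = re.search(r':(.*?):', sample).group()

# re.findall returns every non-overlapping match as a list of strings.
# (No capturing group here, so the whole matches are returned.)
all_matches = re.findall(r':.*?:', sample)

# re.finditer yields match objects, which also carry the match position.
spans = [(m.group(), m.start()) for m in re.finditer(r':.*?:', sample)]

# .*? may match zero characters, so an empty '::' counts as a match.
empty = re.findall(r':.*?:', 'a::b')

print(first)
print(all_matches)
print(spans)
print(empty)
```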

I think you need

set(re.findall(r':[^:]+:', x))

or, if you only need to match word characters inside :...::

set(re.findall(r':\w+:', x))

or, if you want to match any non-whitespace characters between two :s:

set(re.findall(r':[^\s:]+:', x))

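re.findall will find all non-overlapping occurrences, and set will remove the duplicates.

The pattern matches :, then 1 or more characters other than : ([^:]+) (or, in the \w+ variant, 1 or more letters, digits and _), and then : again.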
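To see where the three patterns above differ, here is a small sketch on a made-up string (sample and the variable names are assumptions, not from the question). It contains one plain name, one name with a space and one with a hyphen.

```python
import re

# Hypothetical test string: a plain name, a name with a space, a name with a hyphen.
sample = ':ok: :two words: :a-b:'

# Anything except ':' between the colons, so spaces and hyphens are accepted.
anything_but_colon = re.findall(r':[^:]+:', sample)

# Only letters, digits and '_' between the colons.
word_chars_only = re.findall(r':\w+:', sample)

# Anything except whitespace and ':' between the colons.
no_whitespace = re.findall(r':[^\s:]+:', sample)

print(anything_but_colon)  # [':ok:', ':two words:', ':a-b:']
print(word_chars_only)     # [':ok:']
print(no_whitespace)       # [':ok:', ':a-b:']
```

Which one fits depends on what the names may contain, which the question does not fully specify.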

>>> import re 
>>> x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:' 
>>> print(set(re.findall(r':[^:]+:', x))) 
{':smiling_face:', ':heavy_black_heart:'} 
>>>
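The question also mentions storing the results in a dictionary, without saying what the values should be. Assuming the goal is a count per smiley (that is a guess, not something the question states), collections.Counter from the standard library is one way to sketch it:

```python
import re
from collections import Counter

x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:'

# Counter is a dict subclass mapping each distinct match to its number of occurrences.
counts = Counter(re.findall(r':[^\s:]+:', x))

print(dict(counts))  # {':heavy_black_heart:': 2, ':smiling_face:': 1}
```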

2017-09-14 12:21:11

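print(re.findall(':.*?:', x)) does the job.

Output:
[':heavy_black_heart:', ':heavy_black_heart:', ':smiling_face:']

But if you want to remove the duplicates, use:

res = re.findall(':.*?:', x)
# A set comprehension keeps one copy of each match (the order is not guaranteed).
dictt = {x for x in res}
print(list(dictt))

Output:
[':heavy_black_heart:', ':smiling_face:']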

2017-09-14 12:15:38

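're.MULTILINE' does nothing to the pattern, since there is no '^' or '$' whose behavior it would modify. 're.match' only searches for a match at the start of the string. –

Now you do not have the ':' in the matches. –

Check now @WiktorStribiżew –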

Try this regex:

:([a-z0-9:A-Z_]+):

2017-09-14 12:15:41

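When I try it, it yields ':heavy_black_heart::heavy_black_heart:', which is not what I want – zwlayer

@zwlayer It returns that match because ':' is in the character class and '+' is a *greedy* quantifier, so all the characters defined in the character class are matched first, as many as possible, up to the last ':' that follows the letters, digits and '_'. –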
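The greedy behavior described in the comment above can be reproduced with a short sketch. The string sample is a shortened, made-up variant of the question's string, and the second pattern (the same class with ':' removed) is a suggested variation rather than something posted in the thread.

```python
import re

# Hypothetical sample string, shortened from the question for illustration.
sample = 'x :heavy_black_heart::heavy_black_heart: y :smiling_face:'

# With ':' inside the class, the greedy '+' runs across the '::' boundary
# and only gives back the final ':' needed to close the match.
with_colon = re.findall(r':([a-z0-9:A-Z_]+):', sample)

# Without ':' in the class, each name stops at its own closing colon.
without_colon = re.findall(r':([a-zA-Z0-9_]+):', sample)

print(with_colon)     # ['heavy_black_heart::heavy_black_heart', 'smiling_face']
print(without_colon)  # ['heavy_black_heart', 'heavy_black_heart', 'smiling_face']
```

Because both patterns contain a capturing group, re.findall returns the group contents, so the surrounding colons are not part of the results.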

import re 
x = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:' 
print(set(re.findall(':.*?:', x)))

Output:

{':heavy_black_heart:', ':smiling_face:'}

2017-09-14 12:19:57 Arun

Just for fun, here is a simple solution without a regex. It splits around ':' and keeps the elements with an odd index:

>>> text = 'Wish she could have told me herself. @NicoleScherzy #nicolescherzinger #OneLove #myfav #MyQueen :heavy_black_heart::heavy_black_heart: some string too :smiling_face:' 
>>> text.split(':')[1::2] 
['heavy_black_heart', 'heavy_black_heart', 'smiling_face'] 
>>> set(text.split(':')[1::2]) 
set(['heavy_black_heart', 'smiling_face'])
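One caveat, sketched under the assumption that the text might contain a colon that is not part of a smiley (the question's string does not): the odd-index trick relies on the colons coming in pairs. The helper names_by_split and both sample strings are made up for illustration.

```python
import re

def names_by_split(text):
    # Odd-indexed pieces lie between colons only if the colons come in pairs.
    return text.split(':')[1::2]

balanced = 'a :x::y: b'
stray = 'note: see :smiling_face: here'

print(names_by_split(balanced))            # ['x', 'y']
print(names_by_split(stray))               # [' see ', ' here']
print(re.findall(r':([^\s:]+):', stray))   # ['smiling_face']
```

With the stray colon after "note", the split version returns the wrong pieces, while the regex that forbids whitespace between the colons still finds the name.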

2017-09-14 12:56:21
