Replace part of a string with a word from the same string

Here is the output of my code:

Tue Dec 17 04:34:03 +0000 2013,Email me for tickets email me at [email protected],1708824644 
Tue Dec 17 04:33:58 +0000 2013,@musclepotential ok man. you can email [email protected],25016561

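I want to find the email address inside ,<text>, (the text between the commas) and then reprint the line with just the email in that position.

Example: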

Tue Dec 17 04:34:03 +0000 2013, [email protected],1708824644 
Tue Dec 17 04:33:58 +0000 2013, [email protected],25016561

I know I can use a regular expression like the one below to get just the email, but then I lose the other data.

import re

string = str(messages)
regex = r"\w+@\w+\.com"
match = re.findall(regex, string)

2013-12-17 user2748540

What does the input look like? – inspectorG4dget

I'm pretty sure '\w+' isn't good enough. What about 'joe.smith@gmail.com'? – mgilson
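
To make mgilson's point concrete, here is a small check using the address from the comment. The wider pattern is only one possible loosening, offered as an assumption rather than a full email grammar:

```python
import re

text = "you can email joe.smith@gmail.com"

# \w does not match '.', so the local part is cut off at the dot.
narrow = re.findall(r"\w+@\w+\.com", text)

# Assumed alternative: also allow dots, plus signs and hyphens in the
# local part, and any dotted domain instead of only ".com".
wider = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)

print(narrow)  # ['smith@gmail.com']
print(wider)   # ['joe.smith@gmail.com']
```

The narrow pattern still returns a match, just a truncated one, which is easy to miss when skimming the output.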

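The other answers depend on your text looking very much like your example. This code is a bit more flexible and matches any number of emails in the text. I haven't documented it thoroughly, but...

harvest_emails takes a string of newline-separated lines, each comma-separated as in your example (date, message_string, identifier), and returns a generator that yields 3-tuples of (date, comma-sep-emails, identifier). It extracts any number of emails from the text and matches any email of the form x@x.com (or .org, .net), where x is a non-empty run of non-whitespace characters.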

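import re

def harvest_emails(target):
    """Take a string, split it into lines, and yield each line as a tuple:
    (datecode, comma-separated emails, identifier)
    """
    for line in target.splitlines():
        t = line.split(",")
        # Everything between the first and last comma is the message text.
        # Joining with a space keeps adjacent addresses from running together.
        message = " ".join(t[1:-1]).strip()
        emails = re.findall(r"\S+@\S+\.(?:com|org|net)", message, re.I)
        yield (t[0].strip(), ",".join(emails), t[-1].strip())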

>>> messages = """04:34:03 +0000 2013,Email me for tickets email me at [email protected],1708824644
... Tue Dec 17 04:33:58 +0000 2013,@musclepotential ok, man. you can email [email protected],25016561
... Tue Dec 17 04:34:03 +0000 2013, [email protected], [email protected],1708824644
... Tue Dec 17 04:33:58 +0000 2013, [email protected],25016561"""
>>> data = list()
>>> for line in harvest_emails(messages):
...     d = dict()
...     d["date"], d["emails"], d["id"] = line[0], line[1].split(','), line[2]
...     data.append(d)
...
>>> for value in data:
...     print(value)
...
{'emails': ['[email protected]'], 'date': '04:34:03 +0000 2013', 'id': '1708824644'}
{'emails': ['[email protected]'], 'date': 'Tue Dec 17 04:33:58 +0000 2013', 'id': '25016561'}
{'emails': ['[email protected]', '[email protected]'], 'date': 'Tue Dec 17 04:34:03 +0000 2013', 'id': '1708824644'}
{'emails': ['[email protected]'], 'date': 'Tue Dec 17 04:33:58 +0000 2013', 'id': '25016561'}

2013-12-17 07:44:09

Thank you very much. – user2748540

After your current code, try this:

new_string = string.split(',') 
new_string[1] = match[0] 
output_string = ', '.join(new_string)
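
The snippet above treats `string` as a single line. A minimal sketch of the same idea applied line by line follows. The sample addresses and the helper name `keep_only_email` are invented for illustration, and it assumes the date and the id never contain a comma:

```python
import re

# Hypothetical sample lines; the addresses are made up for this sketch.
messages = (
    "Tue Dec 17 04:34:03 +0000 2013,Email me for tickets email me at alice@example.com,1708824644\n"
    "Tue Dec 17 04:33:58 +0000 2013,@musclepotential ok man. you can email bob@example.com,25016561"
)

def keep_only_email(line):
    """Replace the middle field of 'date,text,id' with the first email found in it."""
    date, rest = line.split(",", 1)     # assumed: the date contains no comma
    text, ident = rest.rsplit(",", 1)   # assumed: the id is the last field
    found = re.findall(r"\S+@\S+\.com", text)
    # Leave the text as it is when no address is found.
    middle = found[0] if found else text
    return ", ".join([date.strip(), middle.strip(), ident.strip()])

output_lines = [keep_only_email(line) for line in messages.splitlines()]
print("\n".join(output_lines))
```

Splitting once from the left and once from the right means commas inside the message text do not shift the fields.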

2013-12-17 04:53:13 DasSnipez

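Based on your example,
use this pattern: ,.*?(\S+),
This solution does not depend on the email format, since the format of an email address is hard to pin down and can vary widely, e.g. [email protected]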

2013-12-17 04:56:04

Note that this only works when the email address sits between the commas, and it will capture the last word of ANYTHING between the commas. –

This might work...
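
A short sketch of that caveat, using the pattern from the answer with `re.sub`. The two sample lines and the address are invented for illustration:

```python
import re

pattern = re.compile(r",.*?(\S+),")

# Hypothetical lines; the address is made up for this sketch.
with_email = "Tue Dec 17 04:34:03 +0000 2013,Email me at alice@example.com,1708824644"
without_email = "Tue Dec 17 04:33:58 +0000 2013,no address in this one,25016561"

# Keep the commas and put back only the captured last word.
kept = pattern.sub(r", \1,", with_email)
wrong = pattern.sub(r", \1,", without_email)

print(kept)   # Tue Dec 17 04:34:03 +0000 2013, alice@example.com,1708824644
print(wrong)  # Tue Dec 17 04:33:58 +0000 2013, one,25016561
```

The second line shows the problem: with no address present, the last word of the message is kept as if it were one.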

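import re

string = str(messages)
# re.M lets $ match at the end of every line, not only at the end of the string.
regex = r"(?<=,).*?(?=\S+,\d+$)"
output_str = re.sub(regex, "", string, flags=re.M)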

2013-12-17 05:18:03 Wasi
