Replace URLs with anchor tags using a Python regular expression

I have an HTML string and want to replace the URLs in it with anchor tags using a Python regular expression. Input:

I was surfing http://www.google.com, where I found my tweet, 
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a> 
<span>http://www.google.com</span>

Desired output (URLs that are already inside an anchor tag are left alone):

I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet, 
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a> 
<span><a href="http://www.google.com">http://www.google.com</a></span>

I tried this demo.

My Python code:

import re

# Left branch matches a whole <a>...</a> element; right branch (group 1) matches a bare URL.
p = re.compile(r'<a\b[^>]*>.*?</a>|((ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?)', re.MULTILINE)
test_str = "I was surfing http://www.google.com, where I found my tweet, check it out <a href=\"http://tinyurl.com/blah\">http://tinyurl.com/blah</a>"

for item in p.finditer(test_str):
    # group(0) is the whole match, so existing anchors are printed too.
    print(item.group(0))

Output:

>>> http://www.google.com, 
>>> <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>

Source

2015-10-27 i'm PosSible

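So what are you missing? You have found the URLs; now you only need to check whether each one is already wrapped in an anchor and replace it if not, right? – mikus

@mikus I updated my question. When I use the pattern in my Python code, it also returns the anchor tag.

So the desired output is just ">>> http://www.google.com,"?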

You could make the regular expression more complex, but as mikus suggested, it seems easier to do the following:

for item in re.finditer(p, test_str):
    result = item.group(0)
    # Skip matches that are whole anchor elements.
    if "<a " not in result.lower():
        print(result)
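
A regex-only variation on the filter above, as a minimal sketch: put the anchor element first in an alternation so that it consumes existing links, and wrap a match only when the URL capture group took part. The name linkify and the simplified URL sub-pattern (which also leaves trailing punctuation such as the comma outside the link) are assumptions made for illustration, not taken from the question.

```python
import re

# Alternation trick: the left branch consumes whole <a>...</a> elements,
# so only bare URLs ever reach capture group 1.
# The URL sub-pattern is a simplified assumption, not the asker's original.
LINK_OR_URL = re.compile(
    r'<a\b[^>]*>.*?</a>'                         # existing anchor: leave alone
    r'|(https?://[^\s<>"\']*[^\s<>"\',.;:!?])',  # bare URL: captured in group 1
    re.IGNORECASE | re.DOTALL,
)


def linkify(html):
    """Wrap bare URLs in anchor tags, skipping URLs already inside <a>...</a>."""
    def repl(match):
        url = match.group(1)
        if url is None:
            # The anchor branch matched: return the text unchanged.
            return match.group(0)
        return '<a href="{0}">{0}</a>'.format(url)
    return LINK_OR_URL.sub(repl, html)
```

Called on the sample string from the question, this should wrap both bare occurrences of http://www.google.com, including the one inside the span, and leave the tinyurl anchor untouched.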

Source

2015-10-27 14:00:16

That is not the right way; it should be done with the regular expression itself. Thanks!

I hope this helps you.

Code:

import re

# Guard characters on both sides keep URLs inside tags or attributes from matching.
p = re.compile(r'''[^<">]((ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?)[^< ,"'>]''', re.MULTILINE)
test_str = "I was surfing http://www.google.com, where I found my tweet, check it out <a href=\"http://tinyurl.com/blah\">http://tinyurl.com/blah</a>"

end_result = test_str
for item in re.finditer(p, test_str):
    # The match includes the leading guard character (a space here), so drop it.
    result = item.group(0).replace(' ', '')
    print(result)
    # Accumulate on end_result so that earlier replacements are not lost.
    end_result = end_result.replace(result, '<a href="' + result + '">' + result + '</a>')

print(end_result)

Output:

http://www.google.com 
I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet, check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>

Source

2015-10-27 16:49:09 Tedezed

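It works, but if the URL is inside a span or some other tag, it is ignored as well. I only want to ignore anchor tags, so please help me solve this. Thanks!!

I changed the string in the question. Thanks!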
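
The limitation raised in the comment above can be reproduced with a short check. This is only a sketch: the pattern is copied from the answer as a Python 3 raw string, while the names guarded and find_urls are hypothetical helpers introduced here. Because the leading guard class rejects ">", a URL that directly follows any opening tag is skipped, not just one inside an anchor.

```python
import re

# Pattern copied from the answer above, split over two raw-string pieces.
guarded = re.compile(
    r'''[^<">]((ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?'''
    r'''(\/|\/([\w#!:.?+=&%@!\-\/]))?)[^< ,"'>]''')


def find_urls(text):
    """Return the matches of the guarded pattern with surrounding whitespace stripped."""
    return [m.group(0).strip() for m in guarded.finditer(text)]


# A URL in running text is preceded by a space, which the guard class accepts.
in_text = find_urls('I was surfing http://www.google.com, where')
# A URL right after <span> is preceded by '>', which the guard class rejects.
in_span = find_urls('<span>http://www.google.com</span>')

print(in_text)
print(in_span)
```

The second list should come back empty, which is why the span case from the updated question is not linked by this pattern.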

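OK, I think I finally found what you are looking for. The basic idea is to try to match <a href together with a URL. If the <a href is present, do nothing; if it is not, add the link. Here is the code: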

import re

test_str = """I was surfing http://www.google.com, where I found my tweet,
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span>http://www.google.com</span>
"""


def repl_func(matchObj):
    href_tag, url = matchObj.groups()
    if href_tag:
        # Since it has an href tag, this isn't what we want to change,
        # so return the whole match.
        return matchObj.group(0)
    else:
        return '<a href="%s">%s</a>' % (url, url)


pattern = re.compile(
    r'((?:<a href[^>]+>)|(?:<a href="))?'
    r'((?:https?):(?:(?://)|(?:\\\\))+'
    r"(?:[\w\d:#@%/;$()~_?\+\-=\\\.&](?:#!)?)*)",
    flags=re.IGNORECASE)
result = re.sub(pattern, repl_func, test_str)
print(result)

Output:

I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet, 
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a> 
<span><a href="http://www.google.com">http://www.google.com</a></span>

The main idea comes from https://stackoverflow.com/a/3580700/5100564. I also borrowed from https://stackoverflow.com/a/6718696/5100564.

Source

2015-10-28 18:41:28
