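Using a regular expression to find URLs that do not contain specific information

I'm using Python 3.5 and the re module to build a scraper/web crawler, and one of its functions needs to retrieve the URL of a YouTube channel. I use the following piece of code, which includes the regular expression match, to do this: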

href = re.compile("(/user/|/channel/)(.+)")

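What it should return is something like /user/username or /channel/channelname. For the most part it does this successfully, but every now and then it grabs a type of URL that includes more information, such as /user/username/videos?view=60 or whatever else follows the username/ part.

In an attempt to address this issue, I rewrote that bit of code as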

href = re.compile("(/user/|/channel/)(?!(videos?view=60)(.+)")

along with other variations, without success. How can I rewrite my code so that it fetches URLs that do not include videos?view=60 anywhere in the URL?

Source

2016-11-20 erik7970

Use the following approach with a more specific regular expression:

import re

user_url = '/user/username/videos?view=60'
channel_url = '/channel/channelname/videos?view=60'

# [^/]+ matches the name only up to the next slash, so anything after it is left out
pattern = re.compile(r'(/user/|/channel/)([^/]+)')

m = re.match(pattern, user_url)
print(m.group())  # /user/username

m = re.match(pattern, channel_url)
print(m.group())  # /channel/channelname

Source
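
In a scraper the hrefs will not always start with /user/ or /channel/, and some may carry a query string right after the name. A minimal sketch of how the same idea could be applied to a list of links, assuming absolute URLs also show up: the helper name extract_channel_path and the sample hrefs are made up for illustration, and the character class is widened to [^/?#]+ so that it also stops at ? and #.

```python
import re

# Hypothetical helper and sample data, not part of the original answer.
# [^/?#]+ stops at the next slash, query string, or fragment.
CHANNEL_PATH = re.compile(r'/(?:user|channel)/[^/?#]+')

def extract_channel_path(href):
    """Return '/user/<name>' or '/channel/<name>', or None if href has neither."""
    # search() rather than match(), so absolute URLs are handled too
    m = CHANNEL_PATH.search(href)
    return m.group() if m else None

hrefs = [
    '/user/username',
    '/user/username/videos?view=60',
    '/channel/channelname?view=60',
    'https://www.youtube.com/channel/channelname/about',
    '/watch?v=abc123',
]

paths = [extract_channel_path(h) for h in hrefs]
print(paths)
```

Links that are not user or channel pages come back as None, so they can be filtered out before further processing.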

2016-11-20 21:42:56 RomanPerekhrest

...'(?=/|$)' seems unnecessary here... –

@l'l l, agreed, removed that. – RomanPerekhrest

@RomanPerekhrest Thanks! This works. – erik7970

I used this approach, and it seems to do what you want.

import re

user = '/user/username/videos?view=60'
channel = '/channel/channelname/videos?view=60'

# \w+ matches the user or channel name; the match includes the trailing '/'
pattern = re.compile(r"(/user/|/channel/)\w+/")

user_match = re.search(pattern, user)

if user_match:
    print(user_match.group())  # /user/username/
else:
    print("Invalid pattern")

channel_match = re.search(pattern, channel)

if channel_match:
    print(channel_match.group())  # /channel/channelname/
else:
    print("Invalid pattern")
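
If the question is read literally, i.e. you want to skip any URL that contains videos?view=60 instead of trimming it, a negative lookahead can do that. This is only a sketch under that assumption; the helper name channel_path_unless_unwanted and the sample URLs are made up for illustration. Note that the attempt in the question does not compile at all, because the (?! group is never closed, and the ? in videos?view=60 would need escaping in any case.

```python
import re

# The attempt from the question raises re.error: the "(?!" group is never closed.
try:
    re.compile("(/user/|/channel/)(?!(videos?view=60)(.+)")
except re.error as exc:
    print("attempt from the question does not compile:", exc)

# re.escape makes the '?' literal instead of a quantifier.
UNWANTED = re.escape('videos?view=60')

# (?!.*UNWANTED) placed at the start rejects the whole string if the
# unwanted text occurs anywhere in it (used with match(), so it is checked from position 0).
pattern = re.compile(r'(?!.*' + UNWANTED + r')(/user/|/channel/)([^/]+)')

def channel_path_unless_unwanted(href):
    """Hypothetical helper: channel path, or None if href contains the unwanted text."""
    m = pattern.match(href)
    return m.group() if m else None

urls = [
    '/user/username',
    '/user/username/videos?view=60',
    '/channel/channelname/about',
]
results = [channel_path_unless_unwanted(u) for u in urls]
print(results)
```

Whether you want to drop such URLs or just cut them down to the channel part depends on what the crawler should do with them.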

Hope this helps!

Source

2016-11-20 22:32:59
