Matching a URL against a certificate Common Name - Python

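I am trying to match a domain name against a certificate's Common Name. When I look at the certificate, I see that the Common Name is "*.example.com". The possible domains could be: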

www.example.com     # A match for the leftmost label of *.example.com
example.com         # A match for the leftmost label of *.example.com
hello.example.com   # A match for the leftmost label of *.example.com
foo.bar.example.com # Not a match for the leftmost label of *.example.com
*.*.*               # Not a match for the leftmost label of *.example.com
www.*.com           # Not a match for the leftmost label of *.example.com

I tried to create the following regular expression:

import re
common_name = "*.example.com"
# Naive conversion: only the "*" is rewritten, the dots are left unescaped
regex = common_name.replace('*', '.*') + '$'
url = "foo.bar.example.com"
if re.match(regex, url):
    print("yes")
else:
    print("no")

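What is wrong with my regular expression?

Source

2016-03-06 cybertextron

You also need to escape the string first ('re.escape()'), otherwise every other '.' will be treated as a regex wildcard – Felk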
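To illustrate the escaping point, here is a small sketch comparing the naive pattern from the question with an escaped one. The host name "wwwXexampleYcom" and the helper name matches are made up for this example; the escaped variant is only one possible way to apply re.escape, not necessarily what the commenter had in mind.

```python
import re

common_name = "*.example.com"

# Naive conversion from the question: every "." stays a regex wildcard.
naive = common_name.replace("*", ".*") + "$"

# Escape first, then turn the escaped "*" into "one label without dots".
escaped = re.escape(common_name).replace(r"\*", r"[^.]+") + "$"


def matches(pattern, host):
    """Hypothetical helper: True if the pattern matches from the start of host."""
    return re.match(pattern, host) is not None


# "wwwXexampleYcom" is an invented string that contains no dots at all.
print(matches(naive, "wwwXexampleYcom"))        # True: "." matched "X" and "Y"
print(matches(escaped, "wwwXexampleYcom"))      # False
print(matches(escaped, "www.example.com"))      # True
print(matches(escaped, "foo.bar.example.com"))  # False
```

Note that this escaped variant requires a label in front of the base domain, so it does not accept the bare "example.com".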

Try this regular expression:

(?:^|\s)(\w+\.)?example\.com(?:$|\s)

It should match

www.example.com
hello.example.com
example.com

based on your test strings.

Full solution:

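import re

common_name = "*.example.com"
# Escape the dots, then drop the leading "*\." (3 characters) to keep only the base domain
base = common_name.replace('.', r'\.')[3:]
# Optional single label (\w+\.) in front of the base domain
rxString = r'(?:^|\s)(\w+\.)?' + base + r'(?:$|\s)'

regex = re.compile(rxString)
url = "foo.bar.example.com"

if regex.match(url):
    print("yes")
else:
    print("no")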

Input:

url
-------------------
www.example.com
example.com
hello.example.com
foo.bar.example.com
*.*.*
www.*.com

Output:

url                 | result
------------------- | ------
www.example.com     | yes
example.com         | yes
hello.example.com   | yes
foo.bar.example.com | no
*.*.*               | no
www.*.com           | no

Source

2016-03-06 01:33:30 Saleem

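It does not match 'foo.bar.example.com' – Felk

@Felk: It should not match 'foo.bar.example.com'. A wildcard certificate can only match a single level of subdomain. – mhawke

Yes, you are right. Please look at what the OP said: "foo.bar.example.com # Not a match for the leftmost label" – Saleem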
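The single-level rule from the comments above can also be checked without a regex, by comparing the names label by label. This is only a sketch: the function name wildcard_matches is made up, and it deliberately takes the strict reading in which the bare domain "example.com" does not match "*.example.com", which differs from what the question asks for.

```python
def wildcard_matches(common_name, host):
    """Hypothetical helper: compare a wildcard common name with a host, label by label."""
    cn_labels = common_name.lower().split(".")
    host_labels = host.lower().split(".")

    # A "*" stands for exactly one label, so both names need the same label count.
    if len(cn_labels) != len(host_labels):
        return False

    # The leftmost label must be non-empty and either equal or covered by "*".
    if not host_labels[0]:
        return False
    if cn_labels[0] != "*" and cn_labels[0] != host_labels[0]:
        return False

    # Every remaining label must match literally.
    return cn_labels[1:] == host_labels[1:]


for host in ("www.example.com", "example.com", "hello.example.com",
             "foo.bar.example.com", "*.*.*", "www.*.com"):
    print("{}: {}".format(host, wildcard_matches("*.example.com", host)))
```

With the strict reading, only "www.example.com" and "hello.example.com" are accepted from this list.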

Use re.search with the regex pattern '^[^.]*\.?example\.com$':

>>> import re
>>> def check_match(url):
...     # Print the URL only when it matches the pattern
...     if re.search(r'^[^.]*\.?example\.com$', url):
...         print(url)
...
>>>
>>> check_match('www.example.com')
www.example.com
>>> check_match('example.com')
example.com
>>> check_match('hello.example.com')
hello.example.com
>>> check_match('foo.bar.example.com')
>>> check_match('*.*.*')
>>> check_match('www.*.com')
>>>
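One thing to watch with this pattern: the dot in front of "example" is optional on its own, so a host such as "badexample.com" (an invented name) also matches. A possible tighter variant, offered only as a sketch, makes the whole "label plus dot" group optional instead:

```python
import re

# Pattern from the answer: label and dot are optional independently.
loose = re.compile(r'^[^.]*\.?example\.com$')

# Sketch of a tighter variant: the label and its dot are optional together.
tight = re.compile(r'^(?:[^.]+\.)?example\.com$')

for host in ('www.example.com', 'example.com', 'badexample.com'):
    print(host, bool(loose.search(host)), bool(tight.search(host)))
# www.example.com True True
# example.com True True
# badexample.com True False
```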

Source

2016-03-06 01:36:36 heemayl

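To exclude the . character from your regex while allowing any other character, and to also allow an https:// prefix to match, change the line:

regex = common_name.replace('*','.*') + '$'

to

regex = r'(https?://)?' + common_name.replace('*.', r'([^\.]*\.)?') + '$'

r'(https?://)?' - allows https:// and http:// to match at the start of the URL

r'([^\.]*\.)?' - lets the domain start with the part that replaces *., without allowing repeated . characters (the domain foo.bar.example.com will be treated as invalid)

In general, all the cases given in the question will be matched correctly.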

Source

2016-03-06 01:36:47

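@philippe: Where does https:// come from? It is not specified in your question. If it is a requirement, you should add it to your question. Also, you need to match the domain name; the scheme is not relevant. – mhawke

@mhawke I will update the question. The 'http[s]' is removed by urlparse, so the domain is just 'example.com' or 'www.example.com' – cybertextron

@Andriy, the Common Name *.example.com should match. For example, "example.com" is valid, while "foo.bar.example.com" is not – cybertextron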
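The urlparse step mentioned above can be sketched as follows. In Python 3 the function lives in urllib.parse (the standalone urlparse module is the Python 2 name); the helper name host_from_url and the sample URLs are made up for this example.

```python
from urllib.parse import urlsplit


def host_from_url(url):
    """Hypothetical helper: return only the host part of a URL."""
    # hostname drops the scheme, port, path and query, and is lower-cased.
    # It is None when the URL has no "scheme://" or leading "//".
    return urlsplit(url).hostname


print(host_from_url("https://www.example.com/index.html"))  # www.example.com
print(host_from_url("http://example.com:8080/"))            # example.com
```

Once the host is isolated like this, the pattern only has to deal with the domain name, so the scheme does not need to appear in the regex.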

How about this (note that it expects the * at the beginning and does not work without it):

import re 
common_name = "*.example.com" 
# escaping the string to not contain any valid regex 
common_name = re.escape(common_name) 
# Replace any occurrences of the (regex-escaped) "*." with an optional single-label group
regex = "^" + common_name.replace(r"\*\.", r"(\w*\.)?") + "$" 
# yields the regex: ^(\w*\.)?example\.com$ 
url = "foo.bar.example.com" 
if re.match(regex, url): 
    print("yes") 
else: 
    print("no")

This matches your examples as expected.

Source

2016-03-06 01:50:53 Felk

It raises an error: raise error, v # invalid expression sre_constants.error: nothing to repeat – cybertextron

Works for me in Python 2.7 and 3.5 – Felk
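For context, "nothing to repeat" is what the re module reports when a quantifier such as * has nothing in front of it. One plausible cause (an assumption, since the thread does not show the failing code) is that the unmodified common name reached the regex compiler. The helper name try_compile is made up for this sketch.

```python
import re


def try_compile(pattern):
    """Hypothetical helper: report whether a pattern compiles."""
    try:
        re.compile(pattern)
        return "ok"
    except re.error as exc:
        return "error: {}".format(exc)


# The raw common name starts with "*", which has nothing to repeat.
print(try_compile("*.example.com"))

# After re.escape the "*" is a literal character, so the pattern compiles.
print(try_compile(re.escape("*.example.com")))
```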

This regular expression will handle most cases:

r'([^\.]+\.)?example\.com'

Putting that into code:

import re

common_name = '*.example.com'
# Escape the name first so its literal dots are not regex wildcards,
# then turn the escaped "*." into an optional single label
pattern = re.compile(re.escape(common_name).replace(r'\*\.', r'([^\.]+\.)?', 1))

for domain in 'www.example.com', 'example.com', 'hello.example.com', 'foo.bar.example.com', '*.*.*', 'www.*.com':
    print('{}: {}'.format(domain, pattern.match(domain) is not None))

Output:

www.example.com: True 
example.com: True 
hello.example.com: True 
foo.bar.example.com: False 
*.*.*: False 
www.*.com: False

It is debatable whether example.com should be accepted, but the regex above will accept it.
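If the decision about the bare domain needs to be configurable, one option is to build the pattern from a flag. This is only a sketch: the function name build_pattern and the allow_bare parameter are made up, and it uses fullmatch so that nothing may follow the domain.

```python
import re


def build_pattern(common_name, allow_bare=True):
    """Hypothetical helper: compile a pattern for a "*.domain" common name."""
    escaped = re.escape(common_name)
    label = r'[^.]+\.'  # exactly one label followed by a dot
    # Make the label optional only when the bare domain should be accepted.
    wildcard = '(?:{})?'.format(label) if allow_bare else label
    return re.compile(escaped.replace(r'\*\.', wildcard, 1))


lenient = build_pattern('*.example.com', allow_bare=True)
strict = build_pattern('*.example.com', allow_bare=False)

for domain in ('www.example.com', 'example.com', 'foo.bar.example.com'):
    print('{}: lenient={} strict={}'.format(
        domain,
        lenient.fullmatch(domain) is not None,
        strict.fullmatch(domain) is not None))
# www.example.com: lenient=True strict=True
# example.com: lenient=True strict=False
# foo.bar.example.com: lenient=False strict=False
```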

Source

2016-03-06 01:59:30 mhawke

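mhawke, I get this from the logs: Common Name: '*.example.com' url: 'example.com' – cybertextron

@philippe: If that is what you need, then that's fine. You can look [here](https://en.wikipedia.org/wiki/Wildcard_certificate) when deciding whether you should accept the "naked" domain. – mhawke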
