Extract the second-level domain from a domain name? - Python

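You need a list, even if you access the Internet at runtime. The decision to sell third-level or second-level domains to end users is made by each ccTLD authority. I think some even keep a few reserved second-level domains and sell third-level domains under them, while selling second-level domains elsewhere. Of course, you would also need to *maintain* the list, because these things do change (and that is before you get on to new ccTLDs being created) – Quentin 2011-02-06 23:32:06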
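To illustrate why a list is needed, here is a minimal Python 3 sketch of the naive approach that just takes the label before the last dot. `naive_second_level` is a hypothetical helper written for this illustration only.

```python
def naive_second_level(hostname):
    """Return the label just left of the last dot (assumes a one-label TLD)."""
    labels = hostname.split('.')
    return labels[-2] if len(labels) >= 2 else None


print(naive_second_level('site.com'))    # site
print(naive_second_level('site.co.uk'))  # co, not the registrable name
```

Without knowing that registrations under .uk happen below co.uk, the code has no way to tell that 'co' is part of the suffix.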

Following @kohlehydrat's suggestion:

import urllib2

class TldMatcher(object):
    # use class vars for lazy loading
    MASTERURL = "http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1"
    TLDS = None

    @classmethod
    def loadTlds(cls, url=None):
        url = url or cls.MASTERURL

        # grab master list
        lines = urllib2.urlopen(url).readlines()

        # strip comments and blank lines
        lines = [ln for ln in (ln.strip() for ln in lines) if len(ln) and ln[:2] != '//']

        cls.TLDS = set(lines)

    def __init__(self):
        if TldMatcher.TLDS is None:
            TldMatcher.loadTlds()

    def getTld(self, url):
        best_match = None
        chunks = url.split('.')

        for start in range(len(chunks) - 1, -1, -1):
            test = '.'.join(chunks[start:])
            startest = '.'.join(['*'] + chunks[start + 1:])

            if test in TldMatcher.TLDS or startest in TldMatcher.TLDS:
                best_match = test

        return best_match

    def get2ld(self, url):
        urls = url.split('.')
        tlds = self.getTld(url).split('.')
        return urls[-1 - len(tlds)]


def test_TldMatcher():
    matcher = TldMatcher()

    test_urls = [
        'site.co.uk',
        'site.com',
        'site.me.uk',
        'site.jpn.com',
        'site.org.uk',
        'site.it'
    ]

    errors = 0
    for u in test_urls:
        res = matcher.get2ld(u)
        if res != 'site':
            print "Error: found '{0}', should be 'site'".format(res)
            errors += 1

    if errors == 0:
        print "Passed!"
    return (errors == 0)

Source

2011-02-07 16:11:33

@Hugh Bothwell

Your example does not handle special domains such as parliament.uk. They are marked in the file with a "!" (e.g. !parliament.uk).
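A minimal Python 3 sketch of how the "!" exception rules could be honoured alongside the "*" wildcards. The rule set is a small hand-picked sample written in the format of the Mozilla file, and `public_suffix` and `second_level` are hypothetical names used for illustration; this is not the code from the repository linked below.

```python
# Hand-picked sample rules in the format of the Mozilla list (illustrative
# only; the real file is much longer and changes over time).
SAMPLE_RULES = {'com', 'uk', '*.uk', '!parliament.uk', 'bg', '1.bg'}


def public_suffix(hostname, rules=SAMPLE_RULES):
    """Return the longest matching suffix, honouring '*' and '!' rules."""
    labels = hostname.lower().split('.')
    best = None
    # Walk from the shortest candidate suffix to the longest.
    for start in range(len(labels) - 1, -1, -1):
        candidate = '.'.join(labels[start:])
        wildcard = '.'.join(['*'] + labels[start + 1:])
        if '!' + candidate in rules:
            # Exception rule: the suffix is the rule minus its leftmost label.
            return '.'.join(labels[start + 1:])
        if candidate in rules or wildcard in rules:
            best = candidate
    return best


def second_level(hostname, rules=SAMPLE_RULES):
    """Return the label immediately left of the public suffix, or None."""
    suffix = public_suffix(hostname, rules)
    if suffix is None:
        return None
    labels = hostname.lower().split('.')
    index = len(labels) - len(suffix.split('.')) - 1
    return labels[index] if index >= 0 else None


print(second_level('parliament.uk'))  # parliament
print(second_level('site.co.uk'))     # site
```

Checking for the exception before the wildcard matters here: without it, "*.uk" would swallow "parliament" as part of the suffix and leave nothing to return.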

I made some changes to your code and also made it look more like the PHP function I used before.

I also added the option to load the data from a local file.
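Loading from a local file might look roughly like this in Python 3. This is only a sketch: `load_rules` and the file name in the usage comment are assumptions, and the filtering simply mirrors the comment and blank-line stripping in the answer above.

```python
def load_rules(path):
    """Read suffix rules from a local copy of the list, skipping comments."""
    rules = set()
    with open(path, encoding='utf-8') as handle:
        for line in handle:
            line = line.strip()
            # Keep everything except blank lines and '//' comment lines.
            if line and not line.startswith('//'):
                rules.add(line)
    return rules


# Usage, assuming a copy of the list was saved next to the script:
# rules = load_rules('effective_tld_names.dat')
```

Keeping a local copy avoids a network request on every start, at the cost of having to refresh the file from time to time.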

I also tested it with some domains:

niki.bg，niki.1.bg
parliament.uk
niki.at，niki.co.at
niki.us, niki.ny.us
niki.museum，niki.national.museum
www.niki.uk - reported as OK because of the "*" in the Mozilla file.

Feel free to contact me on GitHub so I can add you as a co-author.

The GitHub repository is here:

https://github.com/nmmmnu/TLDExtractor/blob/master/TLDExtractor.py

Source

2013-04-15 19:31:18 Nick

Use the Python tld package:

https://pypi.python.org/pypi/tld

$ pip install tld

from tld import get_tld 
print get_tld("http://www.google.co.uk") 
'google.co.uk'

Source

2013-05-18 11:31:35
