2010-08-18 71 views
8

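What is the correct way to validate an XMPP JID? The syntax is described here, but I don't really understand it. It also looks fairly complicated, so using a library to do it seems like a good idea. How can I validate an XMPP JID with Python?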

I'm currently using xmpppy, but I can't seem to find out how to validate a JID with it. Any help is appreciated!

Answers

20

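First, the current best reference for JIDs is RFC 6122.

I was about to just give you a regular expression here, but I got a little carried away and implemented all of the spec: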

import re 
import sys 
import socket 
import encodings.idna 
import stringprep 

# These characters aren't allowed in domain names that are used 
# in XMPP 
BAD_DOMAIN_ASCII = "".join([chr(c) for c in range(0,0x2d) + 
        [0x2e, 0x2f] + 
        range(0x3a,0x41) + 
        range(0x5b,0x61) + 
        range(0x7b, 0x80)]) 

# check bi-directional character validity
def bidi(chars):
    RandAL = map(stringprep.in_table_d1, chars)
    for c in RandAL:
        if c:
            # There is a RandAL char in the string. Must perform further
            # tests:
            # 1) The characters in section 5.8 MUST be prohibited.
            # This is table C.8, which was already checked
            # 2) If a string contains any RandALCat character, the string
            # MUST NOT contain any LCat character.
            if filter(stringprep.in_table_d2, chars):
                raise UnicodeError("Violation of BIDI requirement 2")

            # 3) If a string contains any RandALCat character, a
            # RandALCat character MUST be the first character of the
            # string, and a RandALCat character MUST be the last
            # character of the string.
            if not RandAL[0] or not RandAL[-1]:
                raise UnicodeError("Violation of BIDI requirement 3")

def nodeprep(u):
    chars = list(unicode(u))
    i = 0
    while i < len(chars):
        c = chars[i]
        # map to nothing
        if stringprep.in_table_b1(c):
            del chars[i]
        else:
            # case fold
            chars[i] = stringprep.map_table_b2(c)
            i += 1
    # NFKC (stringprep.unicodedata is the Unicode 3.2.0 database,
    # which is the version stringprep is defined against)
    chars = stringprep.unicodedata.normalize("NFKC", "".join(chars))
    for c in chars:
        if (stringprep.in_table_c11(c) or
                stringprep.in_table_c12(c) or
                stringprep.in_table_c21(c) or
                stringprep.in_table_c22(c) or
                stringprep.in_table_c3(c) or
                stringprep.in_table_c4(c) or
                stringprep.in_table_c5(c) or
                stringprep.in_table_c6(c) or
                stringprep.in_table_c7(c) or
                stringprep.in_table_c8(c) or
                stringprep.in_table_c9(c) or
                c in "\"&'/:<>@"):
            raise UnicodeError("Invalid node character")

    bidi(chars)

    return chars

def resourceprep(res):
    chars = list(unicode(res))
    i = 0
    while i < len(chars):
        c = chars[i]
        # map to nothing
        if stringprep.in_table_b1(c):
            del chars[i]
        else:
            i += 1
    # NFKC
    chars = stringprep.unicodedata.normalize("NFKC", "".join(chars))
    for c in chars:
        if (stringprep.in_table_c12(c) or
                stringprep.in_table_c21(c) or
                stringprep.in_table_c22(c) or
                stringprep.in_table_c3(c) or
                stringprep.in_table_c4(c) or
                stringprep.in_table_c5(c) or
                stringprep.in_table_c6(c) or
                stringprep.in_table_c7(c) or
                stringprep.in_table_c8(c) or
                stringprep.in_table_c9(c)):
            raise UnicodeError("Invalid resource character")

    bidi(chars)

    return chars

def parse_jid(jid):
    # first pass: split into node, domain and resource
    m = re.match("^(?:([^\"&'/:<>@]{1,1023})@)?([^/@]{1,1023})(?:/(.{1,1023}))?$", jid)
    if not m:
        return False

    (node, domain, resource) = m.groups()
    try:
        # ipv4 address?
        socket.inet_pton(socket.AF_INET, domain)
    except socket.error:
        # ipv6 address?
        try:
            socket.inet_pton(socket.AF_INET6, domain)
        except socket.error:
            # domain name
            dom = []
            for label in domain.split("."):
                try:
                    label = encodings.idna.nameprep(unicode(label))
                    encodings.idna.ToASCII(label)
                except UnicodeError:
                    return False

                # UseSTD3ASCIIRules is set, but Python's nameprep doesn't enforce it.
                # a) Verify the absence of non-LDH ASCII code points; that is, the
                #    absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
                for c in label:
                    if c in BAD_DOMAIN_ASCII:
                        return False
                # b) Verify the absence of leading and trailing hyphen-minus
                if label[0] == "-" or label[-1] == "-":
                    return False
                dom.append(label)
            domain = ".".join(dom)
    try:
        if node is not None:
            node = nodeprep(node)
        if resource is not None:
            resource = resourceprep(resource)
    except UnicodeError:
        return False

    return node, domain, resource

if __name__ == "__main__":
    results = parse_jid(sys.argv[1])
    if not results:
        print "FAIL"
    else:
        print results

Yes, this is a lot of work. There are good reasons for all of it, but we hope to simplify things in the future if the précis working group bears fruit.
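To see the structure of a JID without the full preparation steps, the first pass of `parse_jid` can be tried in isolation. The following is only a minimal Python 3 sketch of that one step; `split_jid` is a hypothetical helper name, not part of the code above, and a string that passes it may still be rejected by the later nodeprep, resourceprep and domain checks.

```python
import re

# Hypothetical helper (not in the answer's code): structural split only,
# using the same pattern as the first pass of parse_jid.
_JID_RE = re.compile(
    r"(?:([^\"&'/:<>@]{1,1023})@)?"  # optional node followed by "@"
    r"([^/@]{1,1023})"               # domain
    r"(?:/(.{1,1023}))?"             # optional "/" followed by resource
)


def split_jid(jid):
    """Return (node, domain, resource), or None if the shape is wrong."""
    m = _JID_RE.fullmatch(jid)
    return m.groups() if m else None


print(split_jid("juliet@example.com/balcony"))
print(split_jid("example.com"))
print(split_jid("a@b@c"))
```

Missing parts come back as `None`, which is why `parse_jid` tests `node is not None` before calling `nodeprep`.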


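Sorry for the late request; I was going to implement it your way, but I'm wondering whether iterating over code points is really correct for stringprep. In the [stringprep RFC](https://tools.ietf.org/html/rfc3454) they talk about characters, which are not necessarily equivalent to code points (consider combining diacritics). Or am I missing something about Unicode terminology? – 2014-06-04 13:34:47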


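The stringprep RFC was written before the IETF had the nuanced view of Unicode that this problem requires. Where the RFC says "character", in most places it means "codepoint". We're trying to fix this in the [précis](http://tools.ietf.org/wg/precis/charters) working group. – 2014-06-04 14:17:05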
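The character versus code point distinction discussed here can be seen directly in Python. This is a small Python 3 illustration, not part of the answer's code: a letter plus a combining accent is two code points, NFKC composes them into one, and the `stringprep` table functions are called on one code point at a time.

```python
import stringprep
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT:
# one visible character, but two code points.
decomposed = "e\u0301"
print(len(decomposed))  # len() counts code points

# NFKC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFKC", decomposed)
print(len(composed), composed == "\u00e9")

# The table lookups work on a single code point.
# U+00AD SOFT HYPHEN is in table B.1 ("commonly mapped to nothing").
print(stringprep.in_table_b1("\u00ad"))
print(stringprep.in_table_b1("a"))
```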


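To help others (like me!) who are trying to use this code in Python 3, two changes are needed: the `range()` calls need to be handed to [`itertools.chain()`](http://stackoverflow.com/a/14099894) instead of being concatenated with + (and the one list needs to become a `range()` as well), and the `unicode()` calls need to be removed. – Kromey 2014-12-03 19:57:55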
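The `range()` change might look like the sketch below. It is a Python 3 rewrite of `BAD_DOMAIN_ASCII` only; `passes_std3_rules` is a hypothetical helper added here to exercise the table, and it covers just the ASCII and hyphen checks, not nameprep or `ToASCII`.

```python
from itertools import chain

# Python 3 form of BAD_DOMAIN_ASCII: range objects cannot be joined
# with "+", so they are chained; the two-element list becomes a range.
BAD_DOMAIN_ASCII = "".join(
    chr(c)
    for c in chain(
        range(0, 0x2D),     # everything below "-"
        range(0x2E, 0x30),  # "." and "/"
        range(0x3A, 0x41),  # ":" through "@"
        range(0x5B, 0x61),  # "[" through "`"
        range(0x7B, 0x80),  # "{" through DEL
    )
)


def passes_std3_rules(label):
    """Hypothetical helper: no forbidden ASCII, no leading/trailing hyphen."""
    if not label:
        return False
    if any(c in BAD_DOMAIN_ASCII for c in label):
        return False
    return label[0] != "-" and label[-1] != "-"


print(passes_std3_rules("example"))
print(passes_std3_rules("under_score"))
print(passes_std3_rules("-leading"))
```

Of the 128 ASCII code points, this leaves only letters, digits and the hyphen allowed.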