使用python驗證XMPP jid？

什麼是驗證xmpp jid的正確方法？語法描述爲here:，但我不太瞭解它。此外，它看起來很複雜，所以使用庫來做它似乎是一個好主意。使用python驗證XMPP jid？

我目前使用xmpppy，但我似乎無法找到如何驗證與它的JID。任何幫助感謝！

2010-08-18 static_rtti

首先，JID的當前最佳參考是RFC 6122。

我正要給你在這裏的正則表達式，但有一個有點忘乎所以，並實現所有規格的：

import re 
import sys 
import socket 
import encodings.idna 
import stringprep 

# These characters aren't allowed in domain names that are used 
# in XMPP 
BAD_DOMAIN_ASCII = "".join([chr(c) for c in range(0,0x2d) + 
        [0x2e, 0x2f] + 
        range(0x3a,0x41) + 
        range(0x5b,0x61) + 
        range(0x7b, 0x80)]) 

# check bi-directional character validity 
def bidi(chars): 
    RandAL = map(stringprep.in_table_d1, chars) 
    for c in RandAL: 
     if c: 
      # There is a RandAL char in the string. Must perform further 
      # tests: 
      # 1) The characters in section 5.8 MUST be prohibited. 
      # This is table C.8, which was already checked 
      # 2) If a string contains any RandALCat character, the string 
      # MUST NOT contain any LCat character. 
      if filter(stringprep.in_table_d2, chars): 
       raise UnicodeError("Violation of BIDI requirement 2") 

      # 3) If a string contains any RandALCat character, a 
      # RandALCat character MUST be the first character of the 
      # string, and a RandALCat character MUST be the last 
      # character of the string. 
      if not RandAL[0] or not RandAL[-1]: 
       raise UnicodeError("Violation of BIDI requirement 3") 

def nodeprep(u): 
    chars = list(unicode(u)) 
    i = 0 
    while i < len(chars): 
     c = chars[i] 
     # map to nothing 
     if stringprep.in_table_b1(c): 
      del chars[i] 
     else: 
      # case fold 
      chars[i] = stringprep.map_table_b2(c) 
      i += 1 
    # NFKC 
    chars = stringprep.unicodedata.normalize("NFKC", "".join(chars)) 
    for c in chars: 
     if (stringprep.in_table_c11(c) or 
      stringprep.in_table_c12(c) or 
      stringprep.in_table_c21(c) or 
      stringprep.in_table_c22(c) or 
      stringprep.in_table_c3(c) or 
      stringprep.in_table_c4(c) or 
      stringprep.in_table_c5(c) or 
      stringprep.in_table_c6(c) or 
      stringprep.in_table_c7(c) or 
      stringprep.in_table_c8(c) or 
      stringprep.in_table_c9(c) or 
      c in "\"&'/:<>@"): 
      raise UnicodeError("Invalid node character") 

    bidi(chars) 

    return chars 

def resourceprep(res): 
    chars = list(unicode(res)) 
    i = 0 
    while i < len(chars): 
     c = chars[i] 
     # map to nothing 
     if stringprep.in_table_b1(c): 
      del chars[i] 
     else: 
      i += 1 
    # NFKC 
    chars = stringprep.unicodedata.normalize("NFKC", "".join(chars)) 
    for c in chars: 
     if (stringprep.in_table_c12(c) or 
      stringprep.in_table_c21(c) or 
      stringprep.in_table_c22(c) or 
      stringprep.in_table_c3(c) or 
      stringprep.in_table_c4(c) or 
      stringprep.in_table_c5(c) or 
      stringprep.in_table_c6(c) or 
      stringprep.in_table_c7(c) or 
      stringprep.in_table_c8(c) or 
      stringprep.in_table_c9(c)): 
      raise UnicodeError("Invalid node character") 

    bidi(chars) 

    return chars 

def parse_jid(jid): 
    # first pass 
    m = re.match("^(?:([^\"&'/:<>@]{1,1023})@)?([^/@]{1,1023})(?:/(.{1,1023}))?$", jid) 
    if not m: 
     return False 

    (node, domain, resource) = m.groups() 
    try: 
     # ipv4 address? 
     socket.inet_pton(socket.AF_INET, domain) 
    except socket.error: 
     # ipv6 address? 
     try: 
      socket.inet_pton(socket.AF_INET6, domain) 
     except socket.error: 
      # domain name 
      dom = [] 
      for label in domain.split("."): 
       try: 
        label = encodings.idna.nameprep(unicode(label)) 
        encodings.idna.ToASCII(label) 
       except UnicodeError: 
        return False 

       # UseSTD3ASCIIRules is set, but Python's nameprep doesn't enforce it. 
       # a) Verify the absence of non-LDH ASCII code points; that is, the 
       for c in label: 
        if c in BAD_DOMAIN_ASCII: 
         return False 
       # Verify the absence of leading and trailing hyphen-minus 
       if label[0] == '-' or label[-1] == "-": 
        return False 
       dom.append(label) 
      domain = ".".join(dom) 
    try: 
     if node is not None: 
      node = nodeprep(node) 
     if resource is not None: 
      resource = resourceprep(resource) 
    except UnicodeError: 
     return False 

    return node, domain, resource 

if __name__ == "__main__": 
    results = parse_jid(sys.argv[1]) 
    if not results: 
     print "FAIL" 
    else: 
     print results

是的，這是很多的工作。所有這一切都有充分的理由，但是如果précis工作組取得成果，我們希望在未來有所簡化。

來源

2010-08-19 07:42:24

對不起延遲請求;我打算按照你的方式來實現它，但是我想知道對codeprep的迭代對於stringprep是否真的是正確的。在[stringprep RFC]（https://tools.ietf.org/html/rfc3454）中，他們討論的是字符，它不一定等同於代碼點（考慮組合變音符號）。或者我錯過了關於unicode術語的東西？ – 2014-06-04 13:34:47

stringprep RFC是在IETF爲解決該問題所需要的細緻入微的Unicode視圖之前編寫的。當RFC說「字符」在大多數地方意味着「codepoint」。我們正試圖在[précis]（http://tools.ietf.org/wg/precis/charters）工作組中解決這個問題。 – 2014-06-04 14:17:05

爲了幫助其他人（如我！）試圖在Python 3中使用這段代碼，需要做兩處改變：range（）需要交給['itertools.chain（）']（ http://stackoverflow.com/a/14099894）而不是與+連接（並且一個列表也需要作爲'range（）'），並且'unicode（）'調用需要被移除。 – Kromey 2014-12-03 19:57:55

使用python驗證XMPP jid？

回答

相關問題