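Validating an XMPP JID with Python?
What is the correct way to validate an XMPP JID? The syntax is described here, but I don't really understand it. It also looks quite complicated, so using a library to do it seems like a good idea.
I currently use xmpppy, but I can't find out how to validate a JID with it. Any help is appreciated!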
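First, the current best reference for JIDs is RFC 6122.
I was going to just give you a regular expression here, but I got a little carried away and implemented all of the spec: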
import re
import sys
import socket
import encodings.idna
import stringprep

# These characters aren't allowed in domain names that are used
# in XMPP
BAD_DOMAIN_ASCII = "".join([chr(c) for c in range(0,0x2d) +
                            [0x2e, 0x2f] +
                            range(0x3a,0x41) +
                            range(0x5b,0x61) +
                            range(0x7b, 0x80)])

# check bi-directional character validity
def bidi(chars):
    RandAL = map(stringprep.in_table_d1, chars)
    for c in RandAL:
        if c:
            # There is a RandAL char in the string. Must perform further
            # tests:
            # 1) The characters in section 5.8 MUST be prohibited.
            # This is table C.8, which was already checked
            # 2) If a string contains any RandALCat character, the string
            # MUST NOT contain any LCat character.
            if filter(stringprep.in_table_d2, chars):
                raise UnicodeError("Violation of BIDI requirement 2")
            # 3) If a string contains any RandALCat character, a
            # RandALCat character MUST be the first character of the
            # string, and a RandALCat character MUST be the last
            # character of the string.
            if not RandAL[0] or not RandAL[-1]:
                raise UnicodeError("Violation of BIDI requirement 3")

def nodeprep(u):
    chars = list(unicode(u))
    i = 0
    while i < len(chars):
        c = chars[i]
        # map to nothing
        if stringprep.in_table_b1(c):
            del chars[i]
        else:
            # case fold
            chars[i] = stringprep.map_table_b2(c)
            i += 1
    # NFKC
    chars = stringprep.unicodedata.normalize("NFKC", "".join(chars))
    for c in chars:
        if (stringprep.in_table_c11(c) or
            stringprep.in_table_c12(c) or
            stringprep.in_table_c21(c) or
            stringprep.in_table_c22(c) or
            stringprep.in_table_c3(c) or
            stringprep.in_table_c4(c) or
            stringprep.in_table_c5(c) or
            stringprep.in_table_c6(c) or
            stringprep.in_table_c7(c) or
            stringprep.in_table_c8(c) or
            stringprep.in_table_c9(c) or
            c in "\"&'/:<>@"):
            raise UnicodeError("Invalid node character")
    bidi(chars)
    return chars

def resourceprep(res):
    chars = list(unicode(res))
    i = 0
    while i < len(chars):
        c = chars[i]
        # map to nothing
        if stringprep.in_table_b1(c):
            del chars[i]
        else:
            i += 1
    # NFKC
    chars = stringprep.unicodedata.normalize("NFKC", "".join(chars))
    for c in chars:
        if (stringprep.in_table_c12(c) or
            stringprep.in_table_c21(c) or
            stringprep.in_table_c22(c) or
            stringprep.in_table_c3(c) or
            stringprep.in_table_c4(c) or
            stringprep.in_table_c5(c) or
            stringprep.in_table_c6(c) or
            stringprep.in_table_c7(c) or
            stringprep.in_table_c8(c) or
            stringprep.in_table_c9(c)):
            raise UnicodeError("Invalid resource character")
    bidi(chars)
    return chars

def parse_jid(jid):
    # first pass
    m = re.match("^(?:([^\"&'/:<>@]{1,1023})@)?([^/@]{1,1023})(?:/(.{1,1023}))?$", jid)
    if not m:
        return False

    (node, domain, resource) = m.groups()
    try:
        # ipv4 address?
        socket.inet_pton(socket.AF_INET, domain)
    except socket.error:
        # ipv6 address?
        try:
            socket.inet_pton(socket.AF_INET6, domain)
        except socket.error:
            # domain name
            dom = []
            for label in domain.split("."):
                try:
                    label = encodings.idna.nameprep(unicode(label))
                    encodings.idna.ToASCII(label)
                except UnicodeError:
                    return False

                # UseSTD3ASCIIRules is set, but Python's nameprep doesn't enforce it.
                # a) Verify the absence of non-LDH ASCII code points
                for c in label:
                    if c in BAD_DOMAIN_ASCII:
                        return False
                # b) Verify the absence of leading and trailing hyphen-minus
                if label[0] == '-' or label[-1] == "-":
                    return False
                dom.append(label)
            domain = ".".join(dom)
    try:
        if node is not None:
            node = nodeprep(node)
        if resource is not None:
            resource = resourceprep(resource)
    except UnicodeError:
        return False

    return node, domain, resource

if __name__ == "__main__":
    results = parse_jid(sys.argv[1])
    if not results:
        print "FAIL"
    else:
        print results
Yes, this is a lot of work. There are good reasons for all of it, but we hope to simplify things in the future if the précis working group bears fruit.
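For a quick look at what the first pass in `parse_jid` does on its own, here is a minimal Python 3 sketch. The name `split_jid` is made up for illustration. It only splits a JID into node, domain and resource with the same regular expression, and does none of the stringprep, IDNA or bidi checks, so it should not be mistaken for validation.

```python
import re

# Same pattern as the first pass of parse_jid above.
JID_FIRST_PASS = re.compile(
    "^(?:([^\"&'/:<>@]{1,1023})@)?([^/@]{1,1023})(?:/(.{1,1023}))?$"
)


def split_jid(jid):
    """Hypothetical helper: return (node, domain, resource) or None.

    Missing parts come back as None. This is only a syntactic split,
    not a full RFC 6122 check.
    """
    m = JID_FIRST_PASS.match(jid)
    return m.groups() if m else None


if __name__ == "__main__":
    for candidate in ("romeo@example.net/orchard", "example.net", "a@b@c"):
        print(candidate, "->", split_jid(candidate))
```

Anything that comes back as `None` here can be rejected early; anything else would still need the per-part checks from the full code.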
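Sorry for the late question; I intend to implement it your way, but I wonder whether iterating over code points is really correct for stringprep. In the [stringprep RFC](https://tools.ietf.org/html/rfc3454) they talk about characters, which are not necessarily the same thing as code points (consider combining diacritics). Or am I missing something about Unicode terminology? – 2014-06-04 13:34:47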
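The stringprep RFC was written before the IETF had the nuanced view of Unicode that is needed to solve this problem. When the RFC says "character", in most places it means "codepoint". We are trying to fix this in the [précis](http://tools.ietf.org/wg/precis/charters) working group. – 2014-06-04 14:17:05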
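To make the character versus code point distinction from these two comments concrete, here is a small Python 3 sketch that uses only the standard `unicodedata` module. The variable names are just for illustration.

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE: one code point
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT: two code points

# Both display as a single user-perceived character, but iterating over a
# Python str walks code points, so the lengths differ.
print(len(composed), len(decomposed))

# NFKC normalization (the step used in nodeprep and resourceprep) composes
# the pair into the single precomposed code point.
normalized = unicodedata.normalize("NFKC", decomposed)
print(normalized == composed)
```

So a loop such as `for c in chars` in the answer's code visits code points, which matches how the RFC uses "character" according to the reply above.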
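To help others (like me!) trying to use this code in Python 3, two changes are needed: the `range()` calls need to be handed to [`itertools.chain()`](http://stackoverflow.com/a/14099894) instead of being concatenated with + (and the one list needs to become a `range()` as well), and the `unicode()` calls need to be removed. – Kromey 2014-12-03 19:57:55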
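A rough Python 3 sketch of the changes described in this comment might look like the following. It is an adaptation, not the answer author's code. Two extra points are worth flagging as additions: `itertools.chain()` accepts any iterable, so the literal list can stay a list, and since `map()` and `filter()` return lazy iterators in Python 3, `bidi` is rewritten here with a list and `any()`. The `print` statements in the `__main__` block would also need to become `print()` calls.

```python
import stringprep
from itertools import chain

# range objects cannot be concatenated with + in Python 3, so chain them.
BAD_DOMAIN_ASCII = "".join(
    chr(c)
    for c in chain(
        range(0x00, 0x2D),
        [0x2E, 0x2F],
        range(0x3A, 0x41),
        range(0x5B, 0x61),
        range(0x7B, 0x80),
    )
)


def bidi(chars):
    """Python 3 variant of the bidi check (a sketch, see the note above)."""
    # Build a real list so that indexing works; map() would be lazy here.
    rand_al = [stringprep.in_table_d1(c) for c in chars]
    if any(rand_al):
        # Requirement 2: no LCat character alongside a RandALCat character.
        if any(stringprep.in_table_d2(c) for c in chars):
            raise UnicodeError("Violation of BIDI requirement 2")
        # Requirement 3: first and last characters must be RandALCat.
        if not rand_al[0] or not rand_al[-1]:
            raise UnicodeError("Violation of BIDI requirement 3")


# Python 3 str is already Unicode, so list(unicode(u)) becomes list(u).
chars = list("Example")
```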