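How to encode text in AL32UTF8 using Python
We are trying to use Python to match a hash produced by Oracle's MD5 hashing routine. According to their forums, everything is encoded to AL32UTF8 before hashing: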
-- Prior to encryption, hashing or keyed hashing, CLOB datatype is
-- converted to AL32UTF8. This allows cryptographic data to be
-- transferred and understood between databases with different
-- character sets, across character set changes and between
-- separate processes (for example, Java programs).
--
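At first I assumed UTF-8 would be good enough, but when I use it my hashes still don't match. After some more digging I found this article, which quotes Oracle's Database Companion CD Installation Guide:
AL32UTF8 is the Oracle Database character set that is appropriate for XMLType data. It is equivalent to the IANA registered standard UTF-8 encoding, which supports all valid XML characters.
Do not confuse Oracle Database database character set UTF8 (no hyphen) with database character set AL32UTF8 or with character encoding UTF-8. Database character set UTF8 has been superseded by AL32UTF8. Do not use UTF8 for XML data. UTF8 supports only Unicode version 3.1 and earlier; it does not support all valid XML characters. AL32UTF8 has no such restriction.
So it seems I can't use UTF-8, and I can't figure out how to make Python's codecs module distinguish between utf-8 and utf8. If I try AL32UTF8, it raises an error. Has anyone else encoded text as AL32UTF8 in Python?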
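For what it's worth, the codec names can be checked directly. This is a minimal sketch, assuming Python 3; the variable names and the sample string (taken from the last line of the file below) are only for illustration. It suggests that Python treats "utf8" and "utf-8" as aliases for one codec, and that "AL32UTF8" is simply not a name Python registers:

```python
import codecs

# Both spellings resolve to the same built-in codec.
utf8_name = codecs.lookup("utf8").name
utf_8_name = codecs.lookup("utf-8").name
print(utf8_name, utf_8_name)

# "AL32UTF8" is Oracle's name for the character set; Python has no codec
# registered under it, so the lookup fails with LookupError.
try:
    codecs.lookup("AL32UTF8")
    al32_known = True
except LookupError:
    al32_known = False
print(al32_known)

# Encoding with either spelling yields identical bytes.
sample = "WC999|1000000000|1000000000|4000000000|"
same_bytes = sample.encode("utf8") == sample.encode("utf-8")
print(same_bytes)
```

Oracle's UTF8 character set (no hyphen) is an Oracle-side distinction; Python's "utf8" alias does not refer to it.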
My conversion code looks like this:
import codecs
sourceFmt = "ascii"
targetFmt = "utf8"
utfFile = "kesa_utf8.dat"
# `old` holds the path of the original ASCII file (set elsewhere)
with codecs.open(old, "rU", sourceFmt) as sourceFile:
    with codecs.open(utfFile, "w", targetFmt) as targetFile:
        targetFile.write(sourceFile.read())
The file itself looks like this:
WC000|IC |KESA |KESA | | | |2012-07-31-15.12.36 |0090| | |\c\n
WC001|100534 |W.47212-0100534 |2012-07-31-15.12.36 | 00000000001270.00|USD|\c\n
WC002|100534 |W.47212-0100534 |Sally |H |Klass |1235 14th St. W. || |Palma Sola ||FL |USA |34209 | | | | | | | | |9412587545 | | |O | | ||20800426|645858741 |SSN | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | || | | | || | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |KESAPC | | | | | |N| | | || | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |\c\n
WC999|1000000000|1000000000|4000000000|
The hash should be 86D993FA7121E3B9EE1657A23345FE21
Anyway, I hash it with hashlib:
import hashlib
with open(path) as f:
    data = f.read()
mdhash = hashlib.md5(data)
mdhash = mdhash.hexdigest()
print mdhash
The result is 8421877dd9cdf7235eec47765821998c
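One way to rule out accidental differences on the Python side is to hash the file's raw bytes and compare the digests case-insensitively, since the Oracle value above is upper case while hashlib returns lower case. The following is only a sketch, assuming Python 3; the function names are made up for illustration, and it does not by itself explain the mismatch:

```python
import hashlib


def md5_hex_of_bytes(data):
    """Return the MD5 hex digest (lower case) of a bytes object."""
    return hashlib.md5(data).hexdigest()


def md5_hex_of_file(path):
    """Hash the file exactly as stored on disk.

    Binary mode avoids any newline translation or decoding, so the
    digest covers every byte, including line terminators.
    """
    with open(path, "rb") as f:
        return md5_hex_of_bytes(f.read())


def digests_match(actual_hex, expected_hex):
    """Compare two hex digests, ignoring upper/lower case."""
    return actual_hex.strip().lower() == expected_hex.strip().lower()
```

With helpers like these, `digests_match(md5_hex_of_file(path), "86D993FA7121E3B9EE1657A23345FE21")` would report whether the on-disk bytes produce the expected value.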
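Can you use the ['convert'](http://www.techonthenet.com/oracle/functions/convert.php) function in Oracle to convert the AL32UTF8 characters to UTF-8? – Ben 2012-08-01 13:46:32
I think you misread the article. UTF8 is a bit odd, but it is irrelevant to your problem. Oracle's AL32UTF8 is what is called UTF-8 everywhere else. Please show your code and an example. – Codo 2012-08-01 14:01:45
There is no difference between the same characters in ASCII and in UTF-8, i.e. they are byte-for-byte identical. – 2012-08-01 15:04:42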
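The point in the last comment can be checked quickly. A small sketch, assuming Python 3, with a line adapted from the sample file and variable names chosen only for illustration: pure-ASCII text encodes to the same bytes in ASCII and UTF-8, so its MD5 cannot depend on which of the two is used. Only non-ASCII characters encode differently.

```python
import hashlib

# A pure-ASCII line adapted from the sample file in the question.
line = "WC001|100534 |W.47212-0100534 |2012-07-31-15.12.36 | 00000000001270.00|USD|"

ascii_bytes = line.encode("ascii")
utf8_bytes = line.encode("utf-8")

# Same bytes, therefore the same MD5 digest.
identical = ascii_bytes == utf8_bytes
same_digest = (
    hashlib.md5(ascii_bytes).hexdigest() == hashlib.md5(utf8_bytes).hexdigest()
)
print(identical, same_digest)

# For contrast: a non-ASCII character takes one byte in Latin-1
# but two bytes in UTF-8, so the choice of encoding matters there.
latin1_e_acute = "\u00e9".encode("latin-1")
utf8_e_acute = "\u00e9".encode("utf-8")
print(latin1_e_acute, utf8_e_acute)
```

If the file really contains only ASCII, re-encoding it as UTF-8 leaves it unchanged, so the cause of the mismatch would have to lie elsewhere.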