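I ran into a situation where SQL Server can store 'ｓｏｆｉａ' and 'sofia' as two different strings, but when they are compared in T-SQL they are treated as equal no matter which collation is used, even a binary one. Why does T-SQL consider "ｓｏｆｉａ" the same as "sofia"? What string encoding is this?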
CREATE TABLE #R (NAME NvarchAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS)
INSERT INTO #R VALUES (N'ｓｏｆｉａ')
INSERT INTO #r VALUES (N'sofia')
SELECT * FROM #r WHERE NAME = N'sofia'
ｓｏｆｉａ
sofia
(2 row(s) affected)
IF 'ｓｏｆｉａ' = 'sofia' COLLATE SQL_Latin1_General_CP1_CI_AS
SELECT 'Values are the same'
ELSE
SELECT 'Values are different'
-------------------
Values are the same
(1 row(s) affected)
IF 'ｓｏｆｉａ' = 'sofia' COLLATE SQL_Latin1_General_CP437_BIN
SELECT 'Values are the same'
ELSE
SELECT 'Values are different'
-------------------
Values are the same
(1 row(s) affected)
I tried to determine the encoding of "ｓｏｆｉａ" using:
http://stackoverflow.com/questions/1025332/determine-a-strings-encoding-in-c-sharp
It said:
// If all else fails, the encoding is probably (though certainly not
// definitely) the user's local codepage! One might present to the user a
// list of alternative encodings as shown here: http://stackoverflow.com/questions/8509339/what-is-the-most-common-encoding-of-each-language
// A full list can be found using Encoding.GetEncodings();
I iterated through all the encodings returned by Encoding.GetEncodings(), but none of them matched.
Looking at the bytes, I found an interesting fact: "ｓｏｆｉａ" is itself encoded as UTF-16, but it can be generated from the UTF-16 bytes of "SOFIA" by setting the extra byte next to each ASCII code to 255 (all 1 bits) instead of 0 (for example, for 'S': 83 255 versus 83 0). Despite that, it is displayed as lower case. In C#:
"ｓｏｆｉａ"
[0] 83 byte
[1] 255 byte
[2] 79 byte
[3] 255 byte
[4] 70 byte
[5] 255 byte
[6] 73 byte
[7] 255 byte
[8] 65 byte
[9] 255 byte
"SOFIA"
[0] 83 byte
[1] 0 byte
[2] 79 byte
[3] 0 byte
[4] 70 byte
[5] 0 byte
[6] 73 byte
[7] 0 byte
[8] 65 byte
[9] 0 byte
"sofia"
[0] 115 byte
[1] 0 byte
[2] 111 byte
[3] 0 byte
[4] 102 byte
[5] 0 byte
[6] 105 byte
[7] 0 byte
[8] 97 byte
[9] 0 byte
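The byte listings above can be reproduced outside C#. The following is a minimal Python sketch, assuming the first string is made of Unicode fullwidth letters (U+FF41 to U+FF5A), which is what the 83 255 byte pairs imply when read as little-endian UTF-16. The helper name `utf16le_bytes` is hypothetical and only used for illustration.

```python
import unicodedata

def utf16le_bytes(text):
    """Return the UTF-16 little-endian bytes of text as a list of ints."""
    return list(text.encode("utf-16-le"))

# Assumption: the "odd" string is built from fullwidth Latin small letters.
fullwidth = "\uff53\uff4f\uff46\uff49\uff41"
upper = "SOFIA"
lower = "sofia"

for label, s in (("fullwidth", fullwidth), ("upper", upper), ("lower", lower)):
    print(label, utf16le_bytes(s))

# Setting the high byte of each uppercase ASCII code unit to 0xFF
# yields the fullwidth lowercase letters, as observed in the dump.
rebuilt = "".join(chr(0xFF00 | ord(c)) for c in upper)
print(rebuilt == fullwidth)

# The Unicode character name of the first code point.
print(unicodedata.name(fullwidth[0]))
```

If the assumption holds, the first listing shows the 83 255 pattern and the last line prints the Unicode name of the character stored in the table.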
One can create two different directories or files with these names: C:\ｓｏｆｉａ\ and C:\sofia\, or ｓｏｆｉａ.txt and sofia.txt.
Why does the SQL engine treat them as equal even though it stores each one with its original bytes?
To get exactly the row I want, I had to convert to binary first:
SELECT * FROM #r WHERE CONVERT(VARBINARY(100), Name) = CONVERT(VARBINARY(100), N'ｓｏｆｉａ')
ｓｏｆｉａ
(1 row(s) affected)
SELECT * FROM #r WHERE CONVERT(VARBINARY(100), Name) = CONVERT(VARBINARY(100), N'sofia')
sofia
(1 row(s) affected)
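The contrast between the two kinds of comparison can be illustrated outside SQL Server. This is a rough Python analogy, not SQL Server's actual collation algorithm: Unicode compatibility normalization (NFKC) folds fullwidth letters to their ASCII counterparts, loosely like a comparison that ignores character width, while comparing the encoded bytes behaves like the VARBINARY conversion. Both function names are hypothetical, and NFKC also folds other compatibility characters, so it is broader than width alone.

```python
import unicodedata

def width_insensitive_equal(a, b):
    """Compare after NFKC folding (an analogy for ignoring width)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

def binary_equal(a, b):
    """Compare the raw UTF-16LE bytes, like comparing VARBINARY values."""
    return a.encode("utf-16-le") == b.encode("utf-16-le")

# Assumption: the stored value uses fullwidth Latin small letters.
fullwidth = "\uff53\uff4f\uff46\uff49\uff41"
plain = "sofia"

print(width_insensitive_equal(fullwidth, plain))  # True
print(binary_equal(fullwidth, plain))             # False
```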
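But this has many side effects, for example on culture and case handling. How can I teach the T-SQL engine that they are different without too much cost?
Is there an official name for this kind of string encoding?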
I'm curious whether my answer helped you solve your problem.