2010-07-10 68 views
8

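JavaScript sort to match SQL Server sort

Can anyone point me to a sort algorithm in JavaScript that sorts the same way SQL Server does (for nvarchar/Unicode columns)?

For reference, my previous question about this behavior can be found here: SQL Server 2008 - different sort orders on VARCHAR vs NVARCHAR values

Rather than trying to change the sort behavior on the server side, is there a way I can match it on the client side? My previous question dealt specifically with dashes in the sort order, but I assume there is more to it than simply ignoring dashes.

I have added some additional use cases here to better demonstrate the problem.

Sample data as sorted by SQL Server (2008):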

?test 
^&$Grails Found 
bags of Garbage 
Brochures distributed 
Calls Received 
exhibit visitors 
Exhibit Visitors 
-Exhibit Visitors 
--Exhibit Visitors 
Ëxhibit Visitors 
Grails Found 

How can I get JavaScript to sort the same values in the same way?

Please let me know if I can clarify further.

+0

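So, judging from that question, you now want JavaScript to sort Unicode 'A' before '-A'? – 2010-07-11 00:42:50

+0

@Bock - Correct, although more specifically, I'd like a JavaScript sort algorithm that matches the server side (I imagine there is more to consider than just the "-" character) – DanP 2010-07-11 01:16:00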

Answers

6

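First, what is your database collation? I'll assume it is SQL_Latin1_General_CP1_CS_AS or SQL_Latin1_General_CP1_CI_AS. If so, the following should work (it has not been fully tested).

It looks like writing a true Unicode sorter is a major undertaking. I've seen tax codes that were more straightforward than the specs. ;-) It always seems to involve lookup tables and at least a 3-level sort, with modifying characters and contractions to account for.

I have limited the following to the Latin 1, Latin Extended-A, and Latin Extended-B tables/collations. The algorithm should work fairly well on those sets, but I have not fully tested it, nor have I properly accounted for modifying characters (to save on speed and complexity).

See it in action at jsbin.com

The functions: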

function bIgnoreForPrimarySort (iCharCode) 
{ 
    /*--- A bunch of characters get ignored for the primary sort weight. 
     The most important ones are the hyphen and apostrophe characters. 
     A bunch of control characters and a couple of odds and ends, make up 
     the rest. 
    */ 
    if (iCharCode < 9)             return true; 

    if (iCharCode >= 14 && iCharCode <= 31)       return true; 

    if (iCharCode >= 127 && iCharCode <= 159)       return true; 

    if (iCharCode == 39 || iCharCode == 45 || iCharCode == 173) return true; 

    return false; 
} 


function SortByRoughSQL_Latin1_General_CP1_CS_AS (sA, sB) 
{ 
    /*--- This Sorts Latin1 and extended Latin1 unicode with an approximation 
     of SQL's SQL_Latin1_General_CP1_CS_AS collation. 
     Certain modifying characters or contractions may be off (not tested); we trade off
     perfect accuracy for speed and relative simplicity. 

     True unicode sorting is devilishly complex and we're not getting paid enough to 
     fully implement it in Javascript. ;-) 

     It looks like a definitive sort would require painstaking exegesis of documents
     such as: http://unicode.org/reports/tr10/ 
    */ 
    //--- This is the master lookup table for Latin1 code-points. Here through the extended set \u02AF 
    //--- Make this static? 
    var aSortOrder = [ 
        -1, 151, 152, 153, 154, 155, 156, 157, 158, 2, 3, 4, 5, 6, 159, 160, 161, 162, 163, 164, 
        165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 0, 7, 8, 9, 10, 11, 12, 210, 
        13, 14, 15, 41, 16, 211, 17, 18, 65, 69, 71, 74, 76, 77, 80, 81, 82, 83, 19, 20, 
        42, 43, 44, 21, 22, 214, 257, 266, 284, 308, 347, 352, 376, 387, 419, 427, 438, 459, 466, 486, 
        529, 534, 538, 559, 576, 595, 636, 641, 647, 650, 661, 23, 24, 25, 26, 27, 28, 213, 255, 265, 
        283, 307, 346, 350, 374, 385, 418, 426, 436, 458, 464, 485, 528, 533, 536, 558, 575, 594, 635, 640, 
        646, 648, 660, 29, 30, 31, 32, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 
        190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 
         1, 33, 53, 54, 55, 56, 34, 57, 35, 58, 215, 46, 59, 212, 60, 36, 61, 45, 72, 75, 
        37, 62, 63, 64, 38, 70, 487, 47, 66, 67, 68, 39, 219, 217, 221, 231, 223, 233, 250, 276, 
        312, 310, 316, 318, 392, 390, 395, 397, 295, 472, 491, 489, 493, 503, 495, 48, 511, 599, 597, 601, 
        603, 652, 590, 573, 218, 216, 220, 230, 222, 232, 249, 275, 311, 309, 315, 317, 391, 389, 394, 396, 
        294, 471, 490, 488, 492, 502, 494, 49, 510, 598, 596, 600, 602, 651, 589, 655, 229, 228, 227, 226, 
        235, 234, 268, 267, 272, 271, 270, 269, 274, 273, 286, 285, 290, 287, 324, 323, 322, 321, 314, 313, 
        326, 325, 320, 319, 358, 357, 362, 361, 356, 355, 364, 363, 378, 377, 380, 379, 405, 404, 403, 402, 
        401, 400, 407, 406, 393, 388, 417, 416, 421, 420, 432, 431, 428, 440, 439, 447, 446, 444, 443, 442, 
        441, 450, 449, 468, 467, 474, 473, 470, 469, 477, 484, 483, 501, 500, 499, 498, 507, 506, 527, 526, 
        540, 539, 544, 543, 542, 541, 561, 560, 563, 562, 567, 566, 565, 564, 580, 579, 578, 577, 593, 592, 
        611, 610, 609, 608, 607, 606, 613, 612, 617, 616, 615, 614, 643, 642, 654, 653, 656, 663, 662, 665, 
        664, 667, 666, 574, 258, 260, 262, 261, 264, 263, 281, 278, 277, 304, 292, 289, 288, 297, 335, 337, 
        332, 348, 349, 369, 371, 382, 415, 409, 434, 433, 448, 451, 462, 476, 479, 509, 521, 520, 524, 523, 
        531, 530, 552, 572, 571, 569, 570, 583, 582, 581, 585, 632, 631, 634, 638, 658, 657, 669, 668, 673, 
        677, 676, 678, 73, 79, 78, 680, 644, 50, 51, 52, 40, 303, 302, 301, 457, 456, 455, 482, 481, 
        480, 225, 224, 399, 398, 497, 496, 605, 604, 626, 625, 620, 619, 624, 623, 622, 621, 334, 241, 240, 
        237, 236, 254, 253, 366, 365, 360, 359, 430, 429, 505, 504, 515, 514, 675, 674, 422, 300, 299, 298, 
        354, 353, 84, 85, 86, 87, 239, 238, 252, 251, 513, 512, 243, 242, 245, 244, 328, 327, 330, 329, 
        411, 410, 413, 412, 517, 516, 519, 518, 547, 546, 549, 548, 628, 627, 630, 629, 88, 89, 90, 91, 
        92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 
        112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 
        132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 246, 247, 248, 259, 279, 280, 293, 291, 
        339, 336, 338, 331, 340, 341, 342, 423, 367, 373, 351, 370, 372, 383, 381, 384, 408, 414, 386, 445, 
        453, 452, 454, 461, 463, 460, 475, 478, 465, 508, 522, 525, 532, 550, 553, 554, 555, 545, 556, 557, 
        537, 551, 568, 333, 424, 343, 344, 586, 584, 618, 633, 637, 639, 645, 659, 649, 670, 671, 672, 679, 
        681, 682, 683, 282, 686, 256, 345, 368, 375, 425, 435, 437, 535, 684, 685, 305, 296, 306, 591, 587, 
        588, 144, 145, 146, 147, 148, 149, 150 
        ]; 

    var iLenA   = sA.length, iLenB   = sB.length; 
    var jA    = 0,   jB    = 0; 
    var sIgnoreBuff_A = [],   sIgnoreBuff_B = []; 


    function iSortIgnoreBuff() 
    { 
     var iIgLenA = sIgnoreBuff_A.length, iIgLenB = sIgnoreBuff_B.length; 
     var kA  = 0,     kB  = 0; 

     while (kA < iIgLenA && kB < iIgLenB) 
     { 
      var igA = sIgnoreBuff_A [kA++], igB = sIgnoreBuff_B [kB++]; 

      if (aSortOrder[igA] > aSortOrder[igB]) return 1; 
      if (aSortOrder[igA] < aSortOrder[igB]) return -1; 
     } 
     //--- All else equal, longest string loses 
     if (iIgLenA > iIgLenB)  return 1; 
     if (iIgLenA < iIgLenB)  return -1; 

     return 0; 
    } 


    while (jA < iLenA && jB < iLenB) 
    { 
     var cA = sA.charCodeAt (jA++); 
     var cB = sB.charCodeAt (jB++); 

     if (cA == cB) 
     { 
      continue; 
     } 

     while (bIgnoreForPrimarySort (cA)) 
     { 
      sIgnoreBuff_A.push (cA); 
      if (jA < iLenA) 
       cA = sA.charCodeAt (jA++); 
      else 
       break; 
     } 
     while (bIgnoreForPrimarySort (cB)) 
     { 
      sIgnoreBuff_B.push (cB); 
      if (jB < iLenB) 
       cB = sB.charCodeAt (jB++); 
      else 
       break; 
     } 

     /*--- Have we reached the end of one or both strings, ending on an ignore char? 
      The strings were equal, up to that point. 
      If one of the strings is NOT an ignore char, while the other is, it wins. 
     */ 
     if (bIgnoreForPrimarySort (cA)) 
     { 
      if (! bIgnoreForPrimarySort (cB)) return -1; 
     } 
     else if (bIgnoreForPrimarySort (cB)) 
     { 
      return 1; 
     } 
     else 
     { 
      if (aSortOrder[cA] > aSortOrder[cB]) 
       return 1; 

      if (aSortOrder[cA] < aSortOrder[cB]) 
       return -1; 

      //--- We are equal, so far, on the main chars. Were there ignore chars?
      var iBuffSort = iSortIgnoreBuff(); 
      if (iBuffSort) return iBuffSort; 

      //--- Still here? Reset the ignore arrays. 
      sIgnoreBuff_A = []; 
      sIgnoreBuff_B = []; 
     } 

    } //-- while (jA < iLenA && jB < iLenB) 

    /*--- We have gone through all of at least one string and they are still both 
     equal barring ignore chars or unequal lengths. 
    */ 
    var iBuffSort = iSortIgnoreBuff(); 
    if (iBuffSort) return iBuffSort; 

    //--- All else equal, longest string loses 
    if (iLenA > iLenB)  return 1; 
    if (iLenA < iLenB)  return -1; 

    return 0; 

} //-- function SortByRoughSQL_Latin1_General_CP1_CS_AS 

Test:

var aPhrases = [ 
        'Grails Found', 
        '--Exhibit Visitors', 
        '-Exhibit Visitors', 
        'Exhibit Visitors', 
        'Calls Received', 
        'Ëxhibit Visitors', 
        'Brochures distributed', 
        'exhibit visitors', 
        'bags of Garbage', 
        '^&$Grails Found', 
        '?test' 
       ]; 

aPhrases.sort (SortByRoughSQL_Latin1_General_CP1_CS_AS); 

console.log (aPhrases.join ('\n')); 

Results:

?test 
^&$Grails Found 
bags of Garbage 
Brochures distributed 
Calls Received 
exhibit visitors 
Exhibit Visitors 
-Exhibit Visitors 
--Exhibit Visitors 
Ëxhibit Visitors 
Grails Found 
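
One caveat worth checking before relying on the comparator above: by its own comment, the lookup table covers code points only through \u02AF. For any character beyond that, `aSortOrder[c]` is `undefined`, and both the `>` and `<` comparisons against `undefined` are false, so such characters are never ranked. A minimal sketch of a pre-check follows; the helper names `isCoveredByTable` and `findUncovered` are hypothetical and not part of the answer.

```javascript
// Highest code point covered by the lookup table, per the comment in the answer.
const MAX_TABLE_CODE_UNIT = 0x02AF;

// Hypothetical helper: true when every UTF-16 code unit of `s` has a table entry.
function isCoveredByTable(s) {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > MAX_TABLE_CODE_UNIT) return false;
  }
  return true;
}

// Hypothetical helper: list the strings the table cannot fully weigh.
function findUncovered(strings) {
  return strings.filter((s) => !isCoveredByTable(s));
}

const uncovered = findUncovered(['Ëxhibit Visitors', 'Grails Found', '日本語', 'Ωmega']);
console.log(uncovered); // [ '日本語', 'Ωmega' ]
```

If `findUncovered` returns anything, the client-side order for those rows should not be assumed to match the server.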
+0

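I have verified that the server collation is set to SQL_Latin1_General_CP1_CI_AS, and I will look into your approach to see how it pans out. By the way, I think my bounty was a bit cheap... if this works, I'll let it expire before accepting your answer so that I can award you a higher one (does that seem fair/reasonable?) – DanP 2010-07-16 16:18:29

+0

@DanP: Don't worry about the bounty (unless you don't get a satisfactory answer). I like the points, but I also do these things to help and for the challenge, in place of, say, Sudoku or crossword puzzles. – 2010-07-16 22:37:48

+0

This worked great for me! – Patricia 2010-07-19 20:33:34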

2

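Sorry, JavaScript has no collation features. The only string comparison is directly on the UTF-16 code units in the String, as returned by charCodeAt().

For characters in the Basic Multilingual Plane, this is the same as a binary collation, so if you need JS and SQL Server to agree (ignoring the astral planes, anyway), I think that is the only way you are going to do it. (Short of building a string collator in JS and meticulously copying SQL Server's collation rules, anyway. Not much fun there.)

(What is the use case; why do they need to match?)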
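
To illustrate the code-unit comparison described above, here is a small sketch that sorts the sample data from the question with a plain code-unit comparator. The name `compareCodeUnits` is made up for this example; it makes explicit what JavaScript's `<` and `>` do on strings.

```javascript
const sample = [
  'Grails Found', '--Exhibit Visitors', '-Exhibit Visitors', 'Exhibit Visitors',
  'Calls Received', 'Ëxhibit Visitors', 'Brochures distributed', 'exhibit visitors',
  'bags of Garbage', '^&$Grails Found', '?test'
];

// Compares strings by UTF-16 code units, which is what < and > do in JavaScript.
function compareCodeUnits(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Copy first so the original array is left untouched.
const byCodeUnit = [...sample].sort(compareCodeUnits);
console.log(byCodeUnit.join('\n'));
// --Exhibit Visitors
// -Exhibit Visitors
// ?test
// Brochures distributed
// Calls Received
// Exhibit Visitors
// Grails Found
// ^&$Grails Found
// bags of Garbage
// exhibit visitors
// Ëxhibit Visitors
```

This differs from the SQL Server listing in the question: here the hyphenated entries come first and all uppercase letters sort before lowercase ones, which is why a binary collation would be needed on the server for the two sides to agree this way.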

+1

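Thanks for the insight; the use case is simple - I send sorted data back from SQL Server and have client-side sorting in a table. When the two disagree, I run into problems with paging and so on. – DanP 2010-07-11 01:14:41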

2

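@BrockAdams' answer is great, but I had a few edge cases with hyphens in strings that did not match SQL Server, and I couldn't quite figure out where it was going wrong. So I wrote a more functional version that simply filters out the ignored characters and then compares arrays based on the Latin code points.

It is probably less performant, but there is less code to understand, and it works for the SQL test cases I have added below.

I am using a SQL Server database with Latin1_General_100_CI_AS, so it is case-insensitive, but I have kept the code here case-sensitive. It is easy to switch to a case-insensitive check by creating a wrapper function that applies toLowerCase to the variables.

There was no difference in ordering between the two collations for my test cases.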
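
The wrapper mentioned above might look roughly like the following sketch. The name `caseInsensitive` is hypothetical, and `compareCodeUnits` is only a stand-in so the snippet runs on its own; in practice you would pass `latinSqlSort` instead.

```javascript
// Hypothetical helper: wraps any (a, b) comparator so it sees lower-cased input.
function caseInsensitive(compareFn) {
  return function (a, b) {
    return compareFn(a.toLowerCase(), b.toLowerCase());
  };
}

// Stand-in comparator for this demo only; pass latinSqlSort here in real use.
function compareCodeUnits(a, b) {
  return a === b ? 0 : a > b ? 1 : -1;
}

const compareIgnoringCase = caseInsensitive(compareCodeUnits);

console.log(compareCodeUnits('Exhibit Visitors', 'exhibit visitors'));    // -1
console.log(compareIgnoringCase('Exhibit Visitors', 'exhibit visitors')); // 0
```

With the wrapper, strings that differ only by case compare as equal, so their relative order is left to the sort implementation.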

/**
 * This is a modified version of sortByRoughSQL_Latin1_General_CP1_CS_AS
 * This has a more functional approach, it is more basic
 * It simply does a character filter and then sort
 * @link https://stackoverflow.com/a/3266430/327074
 *
 * @param {String} a
 * @param {String} b
 * @returns {Number} -1,0,1
 */
function latinSqlSort(a, b) {
    'use strict';
    //--- This is the master lookup table for Latin1 code-points.
    // Here through the extended set \u02AF
    var latinLookup = [
        -1,151,152,153,154,155,156,157,158, 2, 3, 4, 5, 6,159,160,161,162,163,164,
        165,166,167,168,169,170,171,172,173,174,175,176, 0, 7, 8, 9, 10, 11, 12,210,
        13, 14, 15, 41, 16,211, 17, 18, 65, 69, 71, 74, 76, 77, 80, 81, 82, 83, 19, 20,
        42, 43, 44, 21, 22,214,257,266,284,308,347,352,376,387,419,427,438,459,466,486,
        529,534,538,559,576,595,636,641,647,650,661, 23, 24, 25, 26, 27, 28,213,255,265,
        283,307,346,350,374,385,418,426,436,458,464,485,528,533,536,558,575,594,635,640,
        646,648,660, 29, 30, 31, 32,177,178,179,180,181,182,183,184,185,186,187,188,189,
        190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,
        1, 33, 53, 54, 55, 56, 34, 57, 35, 58,215, 46, 59,212, 60, 36, 61, 45, 72, 75,
        37, 62, 63, 64, 38, 70,487, 47, 66, 67, 68, 39,219,217,221,231,223,233,250,276,
        312,310,316,318,392,390,395,397,295,472,491,489,493,503,495, 48,511,599,597,601,
        603,652,590,573,218,216,220,230,222,232,249,275,311,309,315,317,391,389,394,396,
        294,471,490,488,492,502,494, 49,510,598,596,600,602,651,589,655,229,228,227,226,
        235,234,268,267,272,271,270,269,274,273,286,285,290,287,324,323,322,321,314,313,
        326,325,320,319,358,357,362,361,356,355,364,363,378,377,380,379,405,404,403,402,
        401,400,407,406,393,388,417,416,421,420,432,431,428,440,439,447,446,444,443,442,
        441,450,449,468,467,474,473,470,469,477,484,483,501,500,499,498,507,506,527,526,
        540,539,544,543,542,541,561,560,563,562,567,566,565,564,580,579,578,577,593,592,
        611,610,609,608,607,606,613,612,617,616,615,614,643,642,654,653,656,663,662,665,
        664,667,666,574,258,260,262,261,264,263,281,278,277,304,292,289,288,297,335,337,
        332,348,349,369,371,382,415,409,434,433,448,451,462,476,479,509,521,520,524,523,
        531,530,552,572,571,569,570,583,582,581,585,632,631,634,638,658,657,669,668,673,
        677,676,678, 73, 79, 78,680,644, 50, 51, 52, 40,303,302,301,457,456,455,482,481,
        480,225,224,399,398,497,496,605,604,626,625,620,619,624,623,622,621,334,241,240,
        237,236,254,253,366,365,360,359,430,429,505,504,515,514,675,674,422,300,299,298,
        354,353, 84, 85, 86, 87,239,238,252,251,513,512,243,242,245,244,328,327,330,329,
        411,410,413,412,517,516,519,518,547,546,549,548,628,627,630,629, 88, 89, 90, 91,
        92, 93, 94, 95, 96, 97, 98, 99,100,101,102,103,104,105,106,107,108,109,110,111,
        112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,
        132,133,134,135,136,137,138,139,140,141,142,143,246,247,248,259,279,280,293,291,
        339,336,338,331,340,341,342,423,367,373,351,370,372,383,381,384,408,414,386,445,
        453,452,454,461,463,460,475,478,465,508,522,525,532,550,553,554,555,545,556,557,
        537,551,568,333,424,343,344,586,584,618,633,637,639,645,659,649,670,671,672,679,
        681,682,683,282,686,256,345,368,375,425,435,437,535,684,685,305,296,306,591,587,
        588,144,145,146,147,148,149,150
    ];

    /**
     * A bunch of characters get ignored for the primary sort weight.
     * The most important ones are the hyphen and apostrophe characters.
     * A bunch of control characters and a couple of odds and ends, make up
     * the rest.
     *
     * @param {Number}
     * @returns {Boolean}
     * @link https://stackoverflow.com/a/3266430/327074
     */
    function ignoreForPrimarySort(iCharCode) {
        if (iCharCode < 9) {
            return true;
        }

        if (iCharCode >= 14 && iCharCode <= 31) {
            return true;
        }

        if (iCharCode >= 127 && iCharCode <= 159) {
            return true;
        }

        if (iCharCode == 39 || iCharCode == 45 || iCharCode == 173) {
            return true;
        }

        return false;
    }

    // normal sort
    function compare(a, b) {
        return a === b ? 0 : a > b ? 1 : -1;
    }

    // compare two arrays return first compare difference
    function arrayCompare(a, b) {
        return a.reduce(function (acc, x, i) {
            return acc === 0 && i < b.length ? compare(x, b[i]) : acc;
        }, 0);
    }

    /**
     * convert a string to array of latin code point ordering
     * @param {String} x
     * @returns {Array} integer array
     */
    function toLatinOrder(x) {
        return x.split('')
            // convert to char codes
            .map(function(x){return x.charCodeAt(0);})
            // filter out ignored characters
            .filter(function(x){return !ignoreForPrimarySort(x);})
            // convert to latin order
            .map(function(x){return latinLookup[x];});
    }

    // convert inputs
    var charA = toLatinOrder(a),
        charB = toLatinOrder(b);

    // compare the arrays
    var charsCompare = arrayCompare(charA, charB);
    if (charsCompare !== 0) {
        return charsCompare;
    }

    // fallback to the filtered array length
    var charsLenCompare = compare(charA.length, charB.length);
    if (charsLenCompare !== 0) {
        return charsLenCompare;
    }

    // Final fallback to a basic length comparison
    return compare(a.length, b.length);
}

var tests = [
    'Grails Found',
    '--Exhibit Visitors',
    '-Exhibit Visitors',
    'Exhibit Visitors',
    'Calls Received',
    'Ëxhibit Visitors',
    'Brochures distributed',
    'exhibit visitors',
    'bags of Garbage',
    '^&$Grails Found',
    '?test',
    '612C-520',
    '612-C-122',
    '612C-122 I',
    '612-C-126 L',
    '612C-301 B',
    '612C-304 B',
    '612C-306',
    '612-C-306',
    '612-C-306 2',
    '612-C-403 H',
    '612C403 O',
    '612-C-403(V)',
    '612E-306A/B I',
    '612E-306A/B O',
    '612C-121 O',
    '612C-111 B',
    '- -612C-111 B'
].sort(latinSqlSort).join('<br>');

document.write(tests);
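
Since this answer notes that the functional version may be less performant, one common way to cut repeated work is to compute each string's sort key once and then sort on the cached keys. The sketch below only illustrates that idea under stated assumptions: `sortByCachedKey`, `toDemoKey`, and `compareDemoKeys` are hypothetical names, and the demo key (strip hyphens and apostrophes, then compare code units) is a simplified stand-in, not the Latin lookup used above.

```javascript
// Hypothetical helper: sort `items` using keys computed once per item.
function sortByCachedKey(items, toKey, compareKeys) {
  return items
    .map((item) => ({ item, key: toKey(item) }))
    .sort((x, y) => compareKeys(x.key, y.key))
    .map((entry) => entry.item);
}

// Demo-only key (an assumption): drop hyphens and apostrophes, remember the length.
let keyCalls = 0;
function toDemoKey(s) {
  keyCalls++;
  return { stripped: s.replace(/[-']/g, ''), length: s.length };
}

// Demo-only comparator: stripped text first, then the shorter original string wins.
function compareDemoKeys(a, b) {
  if (a.stripped !== b.stripped) return a.stripped < b.stripped ? -1 : 1;
  return a.length - b.length;
}

const result = sortByCachedKey(['--ab', 'b', '-ab', 'ab'], toDemoKey, compareDemoKeys);
console.log(result);   // [ 'ab', '-ab', '--ab', 'b' ]
console.log(keyCalls); // 4
```

To apply this to the answer's code, its inner `toLatinOrder` and the comparison fallbacks would have to be lifted out of `latinSqlSort` so they can serve as the key and comparator functions; whether that is worth it depends on how many rows are being sorted.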

+0

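Not sure whether the '- -612C-111 B' value is sorted correctly, but overall this answer seems fine (not looking to revisit this question right now). – 2018-02-03 18:35:06

+1

@BrockAdams That is actually one of the cases that dragged me down this rabbit hole. I checked the sort against SQL Server - here is a [SQL Fiddle](http://sqlfiddle.com/#!18/3195a/2). – icc97 2018-02-03 18:54:24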