2016-09-10 101 views

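JavaScript: How to convert a multi-byte string array to a 32-bit int array?

I have a string that contains UTF-32 code points (though the upper 16 bits will probably always be 0). Each token is one of the four bytes of the code point of each character in the long string. Note that the bytes were interpreted as signed ints before being converted to a string, and I have no control over that.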

// Provided:
intEncodedBytesString = "0,0,0,-31,0,0,0,-15,0,0,0,-31"; // 3 chars: áñá

// Wanted:
actualCodePoints = [225, 241, 225];

I need to convert intEncodedBytesString into the actualCodePoints array. So far I've come up with this:

var intEncodedBytesStringArray = intEncodedBytesString.toString().split(',');
var i, str = '';
var charAmount = intEncodedBytesStringArray.length / 4;

for (i = 0; i < charAmount; i++) {
    var codePoint = 0;

    for (var j = 0; j < 4; j++) {
        var num = parseInt(intEncodedBytesStringArray[i * 4 + j], 10);
        if (num != 0) {
            if (num < 0) {
                num = (1 << (8 * (4 - j))) + num;
            }

            codePoint += (num << (8 * (3 - j)));
        }
    }

    str += String.fromCodePoint(codePoint);
}

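Is there a better, simpler, and/or more efficient way of doing this?

I've seen dozens of answers and code snippets that deal with similar things, but none that address my problem: the input bytes arrive as a string of signed ints :S

Edit: this code won't work with the highest code points, since 1 << 32 is 1 and not 2^32.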
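
The shift behavior mentioned in the edit can be checked directly. This is a minimal sketch; the variable names are illustrative and are not part of the code above.

```javascript
// JavaScript shift operators work on 32-bit integers, and only the low
// five bits of the shift count are used, so a shift by 32 is a shift by 0.
var wrapped = 1 << 32;              // 1, not 2^32
var signedTop = 0xFF << 24;         // -16777216: the result is a signed 32-bit value
var fullRange = Math.pow(2, 32);    // 4294967296, computed without bit operators
var unsignedTop = (0xFF << 24) >>> 0; // 4278190080: >>> 0 reinterprets as unsigned

console.log(wrapped, signedTop, fullRange, unsignedTop);
```

So adding 1 << (8 * (4 - j)) only gives the intended 256 when j is 3.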

@T.J.Crowder Indeed, UTF-32. Edited to add that. – TigerShark

Answer

Since it's plain UTF-32, yes, there's a simpler way: just work in four-byte blocks. Also, a simple way to handle possibly negative values is (value + 256) % 256.

So:

var intEncodedBytesString = "0,0,0,-31,0,0,0,-15,0,0,0,-31"; // 3 chars
var actualCodePoints = [];
var bytes = intEncodedBytesString.split(",").map(Number);
for (var i = 0; i < bytes.length; i += 4) {
    actualCodePoints.push(
        (((bytes[i]     + 256) % 256) << 24) +
        (((bytes[i + 1] + 256) % 256) << 16) +
        (((bytes[i + 2] + 256) % 256) << 8) +
        (bytes[i + 3] + 256) % 256
    );
}
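
The (value + 256) % 256 step can be checked in isolation. The sketch below uses two hypothetical helper names (toUnsignedByte and toUnsignedByteMask, neither from the code above) and compares it with a bit mask, which should give the same result for any signed byte from -128 to 127.

```javascript
// Hypothetical helpers for illustration only.
function toUnsignedByte(value) {
    // Shifts negative values up by 256; values 0..127 pass through unchanged.
    return (value + 256) % 256;
}

function toUnsignedByteMask(value) {
    // Keeps only the low 8 bits of the two's-complement representation.
    return value & 0xFF;
}

console.log(toUnsignedByte(-31), toUnsignedByteMask(-31)); // 225 225
console.log(toUnsignedByte(-15), toUnsignedByteMask(-15)); // 241 241
console.log(toUnsignedByte(0), toUnsignedByteMask(0));     // 0 0
```

Either form works here; the mask avoids the division, while the modulo form makes the arithmetic more explicit.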

An example with a detailed explanation in the comments:

// Starting point
var intEncodedBytesString = "0,0,0,-31,0,0,0,-15,0,0,0,-31"; // 3 chars

// Target array
var actualCodePoints = [];

// Get the bytes as numbers by splitting on commas and running the array
// through Number to convert each entry to a number.
var bytes = intEncodedBytesString.split(",").map(Number);

// Loop through the bytes building code points
var i, cp;
for (i = 0; i < bytes.length; i += 4) {
    // (x + 256) % 256 will handle turning (for instance) -31 into 225.
    // We shift the value for the first byte left 24 bits, the next byte 16 bits,
    // the next 8 bits, and don't shift the last one at all. Adding them all
    // together gives us the code point, which we push into the array.
    cp = (((bytes[i]     + 256) % 256) << 24) +
         (((bytes[i + 1] + 256) % 256) << 16) +
         (((bytes[i + 2] + 256) % 256) << 8) +
         (bytes[i + 3] + 256) % 256;
    actualCodePoints.push(cp);
}

// Show the result
console.log(actualCodePoints);

// If the JavaScript engine supports it, show the string
if (String.fromCodePoint) { // ES2015+
    var str = String.fromCodePoint.apply(String, actualCodePoints);
    // The above could be
    // `let str = String.fromCodePoint(...actualCodePoints);`
    // on an ES2015+ engine
    console.log(str);
} else {
    console.log("(Your browser doesn't support String.fromCodePoint)");
}
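
On engines with typed arrays, a possible alternative is to let Int8Array and DataView do the byte handling. This is a sketch rather than a drop-in replacement: the function name decodeUtf32BE is hypothetical, and it assumes the input is big-endian, as in the sample data. Unlike the shift-and-add version, which yields a negative number when the first byte of a group is 128 or more (something that does not occur for valid code points), getUint32 always returns an unsigned value.

```javascript
// Hypothetical helper; assumes big-endian byte order as in the sample input.
function decodeUtf32BE(intEncodedBytesString) {
    // Int8Array stores each entry as a signed byte, matching the input format.
    var signedBytes = Int8Array.from(intEncodedBytesString.split(","), Number);
    var view = new DataView(signedBytes.buffer);
    var codePoints = [];
    // Read complete 4-byte groups only; trailing partial groups are ignored.
    for (var offset = 0; offset + 4 <= view.byteLength; offset += 4) {
        // Second argument false = big-endian; the result is unsigned.
        codePoints.push(view.getUint32(offset, false));
    }
    return codePoints;
}

console.log(decodeUtf32BE("0,0,0,-31,0,0,0,-15,0,0,0,-31")); // [225, 241, 225]
```

Whether this is faster than the manual loop depends on the engine and the input size, so it is worth measuring before switching.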
