2012-02-07 44 views
1

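Fast conversion of a byte[] containing an ASCII string to int/double/date etc. without creating a new String

I have a FIX message string (ASCII) as a ByteBuffer. I parse the tag-value pairs and store the values as primitive objects in a TreeMap, with the tag as the key. So I need to convert each byte[] value to int/double/date etc., depending on its type.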

The simplest way is to create a new String and pass it to the standard converter functions. For example:

int convertToInt(byte[] buffer, int offset, int length) 
{ 
    String valueStr = new String(buffer, offset, length); 
    return Integer.parseInt(valueStr); 
} 

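As far as I know, creating new objects in Java is very cheap. Still, is there any way to convert this ASCII byte[] directly to a primitive type? I tried hand-written functions for this, but found them time-consuming to write, and they did not give better performance.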

Is there any third-party library that does this, and most importantly, is it worth doing?

+2

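Measuring performance, i.e. microbenchmarking, is hard and almost always done wrong. If you need overall performance, stringifying is a bad idea. You should use 'ByteBuffer.putInt' instead. Apart from that, hand-written 'ByteBuffer' parsing will do; if you use a 'ByteBuffer', don't convert it to byte[], since that defeats the purpose of the ByteBuffer itself. – bestsss 2012-02-07 07:24:53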

+0

Thanks bestsss, but it is an ASCII ByteBuffer, not a binary one, so getInt and putInt can't be used. – Mahendra 2012-02-07 07:51:23

+0

What is it that you call an ASCII ByteBuffer? (There is no such class in the standard JDK.) – bestsss 2012-02-07 07:57:46

Answers

2

Most importantly, is it worth doing?

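Almost certainly not - and you should measure to check that this really is a performance bottleneck before going to significant effort to alleviate it.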

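What is your performance like now? What does it need to be? ("As fast as possible" isn't a good goal, or you will never stop - work out when you can say you are "done".)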

Profile the code - is the problem really in string creation? Check how often you are garbage collecting (again, with a profiler).
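As a rough first check before reaching for a full profiler, the JVM's own collection counters can be read around a conversion loop. The sketch below is only an illustration: the class name `GcCountProbe`, the sample field `"123400"` and the iteration count are assumptions, and the JIT may remove some of the allocation, so treat the number as a hint rather than a measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;

public class GcCountProbe {
    /** Sums the collection counts of all collectors; collectors reporting -1 (undefined) are skipped. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** The String-based conversion from the question, with the charset made explicit. */
    public static int convertToInt(byte[] buffer, int offset, int length) {
        return Integer.parseInt(new String(buffer, offset, length, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] field = "123400".getBytes(StandardCharsets.US_ASCII);
        long before = totalCollections();
        long sum = 0; // accumulate the results so the loop is not trivially dead code
        for (int i = 0; i < 5_000_000; i++) {
            sum += convertToInt(field, 0, field.length);
        }
        long after = totalCollections();
        System.out.println("sum=" + sum + ", collections during loop=" + (after - before));
    }
}
```

Running the same loop against an allocation-free parser and comparing the two counts gives a first idea of whether string creation matters in your setup.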

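Each parsing type is likely to have different characteristics. For example, when parsing integers, if you find that a significant proportion of the time you have a single digit, you might want to special-case that: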

if (length == 1) 
{ 
    // Explicit cast: Java does not implicitly convert byte to char
    char c = (char) buffer[index]; 
    if (c >= '0' && c <= '9') 
    { 
     return c - '0'; 
    } 
    // Invalid - throw an exception or whatever 
} 

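...but check how often this occurs before you go down this path. Applying lots of checks for particular optimizations that never actually occur is counter-productive.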
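To show where such a special case would sit, here is a minimal sketch of a complete allocation-free parser with the single-digit fast path in front of the general loop. The class and method names (`AsciiInts`, `parseUnsignedInt`) are hypothetical, and the sketch assumes non-negative decimal fields with no padding.

```java
public class AsciiInts {
    /**
     * Parses a non-negative decimal int from ASCII bytes without creating a String.
     * Hypothetical helper: no sign or whitespace handling.
     */
    public static int parseUnsignedInt(byte[] buffer, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty field");
        }
        if (length == 1) {
            // Fast path for single-digit fields
            int d = buffer[offset] - '0';
            if (d < 0 || d > 9) {
                throw new NumberFormatException("bad digit at " + offset);
            }
            return d;
        }
        // A long accumulator makes the overflow check a simple comparison
        long value = 0;
        for (int i = offset, end = offset + length; i < end; i++) {
            int d = buffer[i] - '0';
            if (d < 0 || d > 9) {
                throw new NumberFormatException("bad digit at " + i);
            }
            value = value * 10 + d;
            if (value > Integer.MAX_VALUE) {
                throw new NumberFormatException("value does not fit in an int");
            }
        }
        return (int) value;
    }
}
```

Whether the fast path pays for its extra branch depends on how many single-digit fields your messages really contain, which is exactly what should be measured first.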

+0

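I agree with you. I realize that getting a single-digit-microsecond improvement would take too much effort. The profiler says that new String causes a lot of minor collections. But I think it makes more sense to profile the parsing library in the context of the application to get a clearer picture. – Mahendra 2012-02-07 07:55:54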

+0

Currently it takes about 20 microseconds to build the TreeMap from 40+ tag-value pairs. – Mahendra 2012-02-07 08:02:28
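For context, the tag-value scan being timed here could look roughly like the sketch below. It is not the asker's code: the class name `FixFieldScan` and the choice to store `{valueStart, valueLength}` pairs are assumptions made for illustration. It relies only on the FIX convention that fields are written as `tag=value` and terminated by SOH (0x01). Repeating groups, where a tag occurs more than once, would overwrite earlier entries in this simple map.

```java
import java.nio.ByteBuffer;
import java.util.TreeMap;

public class FixFieldScan {
    private static final byte SOH = 0x01; // FIX field delimiter

    /** Maps each tag to {valueStart, valueLength}, using absolute indices into msg. */
    public static TreeMap<Integer, int[]> scan(ByteBuffer msg) {
        TreeMap<Integer, int[]> fields = new TreeMap<>();
        int i = msg.position();
        int end = msg.limit();
        while (i < end) {
            // Parse the numeric tag up to '='
            int tag = 0;
            while (i < end && msg.get(i) != '=') {
                int d = msg.get(i) - '0';
                if (d < 0 || d > 9) {
                    throw new IllegalArgumentException("bad tag byte at " + i);
                }
                tag = tag * 10 + d;
                i++;
            }
            if (i >= end) {
                throw new IllegalArgumentException("missing '=' in last field");
            }
            int valueStart = ++i; // skip '='
            while (i < end && msg.get(i) != SOH) {
                i++;
            }
            fields.put(tag, new int[] {valueStart, i - valueStart});
            i++; // skip SOH
        }
        return fields;
    }
}
```

Keeping offsets instead of copying each value lets the type-specific conversion read straight from the buffer later on.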

1

Take a look at ByteBuffer. It has functions for doing this, including handling of byte order (endianness).

+0

I don't think ByteBuffer has anything for parsing *text* data, does it? – 2012-02-07 07:38:29

+0

@JonSkeet - No, but the OP said: "I need to convert the byte[] value to int/double/date etc." – 2012-02-07 07:44:53

+2

Thanks Ted! The question says the byte[] contains an ASCII string. – Mahendra 2012-02-07 07:51:35
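The distinction discussed in these comments can be made concrete with a small sketch. `ByteBuffer.getInt()` interprets four bytes as a binary integer, so applied to the ASCII text "1234" it returns the character codes packed together rather than the number. The class and method names below are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BinaryVsText {
    /** Reads the first four bytes as a big-endian binary int (what getInt does by default). */
    public static int readAsBinary(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** Reads the remaining bytes as non-negative decimal ASCII digits (no overflow check in this sketch). */
    public static int readAsText(ByteBuffer buf) {
        int value = 0;
        for (int i = buf.position(); i < buf.limit(); i++) {
            int d = buf.get(i) - '0'; // absolute get: does not move the position
            if (d < 0 || d > 9) {
                throw new NumberFormatException("bad digit at " + i);
            }
            value = value * 10 + d;
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] ascii = "1234".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAsBinary(ascii));                 // 825373492, i.e. 0x31323334
        System.out.println(readAsText(ByteBuffer.wrap(ascii)));  // 1234
    }
}
```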

1

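Generally I don't like pasting code like this, but anyway, here is how it is done in about 100 lines (production code). Use it or not, but having some reference code is (usually) nice.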

package t1; 

import java.io.UnsupportedEncodingException; 
import java.nio.ByteBuffer; 

public class IntParser { 
    final static byte[] digits = { 
     '0' , '1' , '2' , '3' , '4' , '5' , 
     '6' , '7' , '8' , '9' , 'a' , 'b' , 
     'c' , 'd' , 'e' , 'f' , 'g' , 'h' , 
     'i' , 'j' , 'k' , 'l' , 'm' , 'n' , 
     'o' , 'p' , 'q' , 'r' , 's' , 't' , 
     'u' , 'v' , 'w' , 'x' , 'y' , 'z' 
    }; 

    static boolean isDigit(byte b) { 
    return b>='0' && b<='9'; 
    } 

    static int digit(byte b){ 
     //negative = error 

     int result = b-'0'; 
     if (result>9) 
      result = -1; 
     return result; 
    } 

    static NumberFormatException forInputString(ByteBuffer b){ 
     // Read from a duplicate so that building the message does not advance b's position
     ByteBuffer view = b.duplicate(); 
     byte[] bytes=new byte[view.remaining()]; 
     view.get(bytes); 
     try { 
      return new NumberFormatException("bad integer: "+new String(bytes, "8859_1")); 
     } catch (UnsupportedEncodingException e) { 
      throw new RuntimeException(e); 
     } 
    } 
    public static int parseInt(ByteBuffer b){ 
     return parseInt(b, 10, b.position(), b.limit()); 
    } 
    public static int parseInt(ByteBuffer b, int radix, int i, int max) throws NumberFormatException{ 
     int result = 0; 
     boolean negative = false; 


     int limit; 
     int multmin; 
     int digit;  

     if (max > i) { 
      if (b.get(i) == '-') { 
       negative = true; 
       limit = Integer.MIN_VALUE; 
       i++; 
      } else { 
       limit = -Integer.MAX_VALUE; 
      } 
      multmin = limit/radix; 
      if (i < max) { 
       digit = digit(b.get(i++)); 
       if (digit < 0) { 
        throw forInputString(b); 
       } else { 
        result = -digit; 
       } 
      } 
      while (i < max) { 
       // Accumulating negatively avoids surprises near MAX_VALUE 
       digit = digit(b.get(i++)); 
       if (digit < 0) { 
        throw forInputString(b); 
       } 
       if (result < multmin) { 
        throw forInputString(b); 
       } 
       result *= radix; 
       if (result < limit + digit) { 
        throw forInputString(b); 
       } 
       result -= digit; 
      } 
     } else { 
      throw forInputString(b); 
     } 
     if (negative) { 
      // The start index need not equal b.position(), so check the last consumed byte instead
      if (b.get(i - 1) != '-') { 
       return result; 
      } else { /* Only got "-" */ 
       throw forInputString(b); 
      } 
     } else { 
      return -result; 
     } 
    } 

} 
2

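I agree with Jon; however, when processing a lot of FIX messages this quickly adds up. The method below allows for space-padded numbers. If you need to handle decimals, the code would be slightly different. The speed difference between the two methods is a factor of 11. ConvertToLong causes no additional GCs (the count stays at 2 across all runs below). The code below is in C#: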

///<summary> 
///Converts a byte[] of characters that represent a number into a .net long type. Numbers can be padded from left 
/// with spaces. 
///</summary> 
///<param name="buffer">The buffer containing the number as characters</param> 
///<param name="startIndex">The startIndex of the number component</param> 
///<param name="endIndex">The EndIndex of the number component</param> 
///<returns>The price will be returned as a long from the ASCII characters</returns> 
public static long ConvertToLong(this byte[] buffer, int startIndex, int endIndex) 
{ 
    long result = 0; 
    for (int i = startIndex; i <= endIndex; i++) 
    { 
     if (buffer[i] != 0x20) 
     { 
      // 48 is the decimal value of the '0' character. So to convert the char value 
      // of an int to a number we subtract 48. e.g '1' = 49 -48 = 1 
      result = result * 10 + (buffer[i] - 48); 
     } 
    } 
    return result; 
} 

/// <summary> 
/// Same as above but converting to string then to long 
/// </summary> 
public static long ConvertToLong2(this byte[] buffer, int startIndex, int endIndex) 
{ 
    for (int i = startIndex; i <= endIndex; i++) 
    { 
     if (buffer[i] != 0x20) // 0x20 is the ASCII space character 
     { 
      return long.Parse(System.Text.Encoding.UTF8.GetString(buffer, i, (endIndex - i) + 1)); 
     } 
    } 
    return 0; 
} 

[Test] 
public void TestPerformance(){ 
    const int iterations = 200 * 1000; 
    const int testRuns = 10; 
    const int warmUp = 10000; 
    const string number = " 123400"; 
    byte[] buffer = System.Text.Encoding.UTF8.GetBytes(number); 

    double result = 0; 
    for (int i = 0; i < warmUp; i++){ 
     result = buffer.ConvertToLong(0, buffer.Length - 1); 
    } 
    for (int testRun = 0; testRun < testRuns; testRun++){ 
     Stopwatch sw = new Stopwatch(); 
     sw.Start(); 
     for (int i = 0; i < iterations; i++){ 
      result = buffer.ConvertToLong(0, buffer.Length - 1); 
     } 
     sw.Stop(); 
     Console.WriteLine("Test {4}: {0} ticks, {1}ms, 1 conversion takes = {2}μs or {3}ns. GCs: {5}", sw.ElapsedTicks, 
      sw.ElapsedMilliseconds, (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000, 
      (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000*1000, testRun, 
      GC.CollectionCount(0) + GC.CollectionCount(1) + GC.CollectionCount(2)); 
    } 
} 
RESULTS 
ConvertToLong: 
Test 0: 9243 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 1: 8339 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 2: 8425 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 3: 8333 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 4: 8332 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 5: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 6: 8409 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 7: 8334 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 8: 8335 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
Test 9: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2 
ConvertToLong2: 
Test 0: 109067 ticks, 55ms, 1 conversion takes = 0.275000μs or 275.000000ns. GCs: 4 
Test 1: 109861 ticks, 56ms, 1 conversion takes = 0.28000μs or 280.00000ns. GCs: 8 
Test 2: 102888 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 9 
Test 3: 105164 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 10 
Test 4: 104083 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 11 
Test 5: 102756 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 13 
Test 6: 102219 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 14 
Test 7: 102086 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 15 
Test 8: 102672 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 17 
Test 9: 102025 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 18
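
The answer above mentions that decimals need slightly different code. Since the question is about Java, here is one possible sketch in Java, under the assumption that prices are kept as a long scaled by a fixed power of ten. The names `AsciiDecimals` and `parseScaled` and the `scale` parameter are hypothetical. Left padding with spaces is accepted as in ConvertToLong, but unlike it, invalid bytes are rejected.

```java
public class AsciiDecimals {
    /**
     * Parses [spaces][-]digits[.digits] into a long scaled by 10^scale, without creating a String.
     * Throws NumberFormatException on bad input and ArithmeticException on long overflow.
     */
    public static long parseScaled(byte[] buffer, int offset, int length, int scale) {
        int i = offset;
        int end = offset + length;
        while (i < end && buffer[i] == ' ') {
            i++; // skip left padding
        }
        boolean negative = i < end && buffer[i] == '-';
        if (negative) {
            i++;
        }
        long value = 0;
        int fractionDigits = -1; // -1 until the decimal point has been seen
        int digitsSeen = 0;
        for (; i < end; i++) {
            byte b = buffer[i];
            if (b == '.' && fractionDigits < 0) {
                fractionDigits = 0;
                continue;
            }
            int d = b - '0';
            if (d < 0 || d > 9) {
                throw new NumberFormatException("bad byte at " + i);
            }
            if (fractionDigits >= 0 && ++fractionDigits > scale) {
                throw new NumberFormatException("more than " + scale + " fraction digits");
            }
            value = Math.addExact(Math.multiplyExact(value, 10L), d);
            digitsSeen++;
        }
        if (digitsSeen == 0) {
            throw new NumberFormatException("no digits");
        }
        // Pad with zeros so that the result always carries exactly 'scale' fraction digits
        for (int f = Math.max(fractionDigits, 0); f < scale; f++) {
            value = Math.multiplyExact(value, 10L);
        }
        return negative ? -value : value;
    }
}
```

With a scale of 4, the field "123.45" would come back as 1234500. Keeping the result as a scaled long avoids both the String and the rounding behaviour of double.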