2013-11-28 23 views
3

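How can I optimize the KASUMI cipher S-boxes?

I want to optimize an implementation of the KASUMI cipher written in C. It uses S-boxes to encrypt data, which I represent as large arrays: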

int S7[128] = { 
    54, 50, 62, 56, 22, 34, 94, 96, 38, 6, 63, 93, 2, 18,123, 33, 
    55,113, 39,114, 21, 67, 65, 12, 47, 73, 46, 27, 25,111,124, 81, 
    53, 9,121, 79, 52, 60, 58, 48,101,127, 40,120,104, 70, 71, 43, 
    20,122, 72, 61, 23,109, 13,100, 77, 1, 16, 7, 82, 10,105, 98, 
    117,116, 76, 11, 89,106, 0,125,118, 99, 86, 69, 30, 57,126, 87, 
    112, 51, 17, 5, 95, 14, 90, 84, 91, 8, 35,103, 32, 97, 28, 66, 
    102, 31, 26, 45, 75, 4, 85, 92, 37, 74, 80, 49, 68, 29,115, 44, 
    64,107,108, 24,110, 83, 36, 78, 42, 19, 15, 41, 88,119, 59, 3 
}; 

int S9[512] = { 
    167,239,161,379,391,334, 9,338, 38,226, 48,358,452,385, 90,397, 
    183,253,147,331,415,340, 51,362,306,500,262, 82,216,159,356,177, 
    175,241,489, 37,206, 17, 0,333, 44,254,378, 58,143,220, 81,400, 
    95, 3,315,245, 54,235,218,405,472,264,172,494,371,290,399, 76, 
    165,197,395,121,257,480,423,212,240, 28,462,176,406,507,288,223, 
    501,407,249,265, 89,186,221,428,164, 74,440,196,458,421,350,163, 
    232,158,134,354, 13,250,491,142,191, 69,193,425,152,227,366,135, 
    344,300,276,242,437,320,113,278, 11,243, 87,317, 36, 93,496, 27, 
    487,446,482, 41, 68,156,457,131,326,403,339, 20, 39,115,442,124, 
    475,384,508, 53,112,170,479,151,126,169, 73,268,279,321,168,364, 
    363,292, 46,499,393,327,324, 24,456,267,157,460,488,426,309,229, 
    439,506,208,271,349,401,434,236, 16,209,359, 52, 56,120,199,277, 
    465,416,252,287,246, 6, 83,305,420,345,153,502, 65, 61,244,282, 
    173,222,418, 67,386,368,261,101,476,291,195,430, 49, 79,166,330, 
    280,383,373,128,382,408,155,495,367,388,274,107,459,417, 62,454, 
    132,225,203,316,234, 14,301, 91,503,286,424,211,347,307,140,374, 
    35,103,125,427, 19,214,453,146,498,314,444,230,256,329,198,285, 
    50,116, 78,410, 10,205,510,171,231, 45,139,467, 29, 86,505, 32, 
    72, 26,342,150,313,490,431,238,411,325,149,473, 40,119,174,355, 
    185,233,389, 71,448,273,372, 55,110,178,322, 12,469,392,369,190, 
    1,109,375,137,181, 88, 75,308,260,484, 98,272,370,275,412,111, 
    336,318, 4,504,492,259,304, 77,337,435, 21,357,303,332,483, 18, 
    47, 85, 25,497,474,289,100,269,296,478,270,106, 31,104,433, 84, 
    414,486,394, 96, 99,154,511,148,413,361,409,255,162,215,302,201, 
    266,351,343,144,441,365,108,298,251, 34,182,509,138,210,335,133, 
    311,352,328,141,396,346,123,319,450,281,429,228,443,481, 92,404, 
    485,422,248,297, 23,213,130,466, 22,217,283, 70,294,360,419,127, 
    312,377, 7,468,194, 2,117,295,463,258,224,447,247,187, 80,398, 
    284,353,105,390,299,471,470,184, 57,200,348, 63,204,188, 33,451, 
    97, 30,310,219, 94,160,129,493, 64,179,263,102,189,207,114,402, 
    438,477,387,122,192, 42,381, 5,145,118,180,449,293,323,136,380, 
    43, 66, 60,455,341,445,202,432, 8,237, 15,376,436,464, 59,461 
}; 

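During encryption we access these arrays very frequently. I have already made one optimization: I moved the arrays from the header file into the local function, so that some cache misses no longer occur.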

Any suggestions for optimizing this further, perhaps by changing the arrays to some other data structure?

+4

Why would header versus function affect cache performance?

+3

But FYI, if you are on a typical 32-bit platform, "int" takes twice the space you really need. You can halve the cache usage by switching to "int16_t".

+0

Putting S7 and S9 in the same contiguous array to get spatial locality of reference might be a micro-optimization – UmNyobe

Answers

2

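The arrays are not huge. A typical L1 cache holds at least tens of kB (that is the total memory of an Apple II). And moving the arrays from a header into a function does not change where they sit in the cache. Storing them in a suitably sized type (as suggested in the comments) might make sense: each value needs no more than 2 bytes. The tables fit in the L1 cache either way, but if you have other data, possibly used by another thread, smaller tables have a better chance of staying there. That said, I don't know whether narrower elements introduce an extra cost compared with native-size integers.

If this is really critical, you should look at the generated code and optimize that.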
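To put rough numbers on the table sizes discussed above, here is a minimal sketch. The entry counts come from the question; the function names sbox_bytes_as_int and sbox_bytes_narrow are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Entry counts taken from the question: S7 has 128 entries, S9 has 512. */
enum { S7_ENTRIES = 128, S9_ENTRIES = 512 };

/* Bytes occupied when every entry is a plain int, as in the question. */
size_t sbox_bytes_as_int(void)
{
    return (S7_ENTRIES + S9_ENTRIES) * sizeof(int);
}

/* Bytes occupied with uint8_t entries for S7 and uint16_t entries for S9. */
size_t sbox_bytes_narrow(void)
{
    return S7_ENTRIES * sizeof(uint8_t) + S9_ENTRIES * sizeof(uint16_t);
}
```

On a platform where int is 4 bytes, the first function returns 2560 and the second 1152. Both are well below an L1 cache of tens of kB, so the narrower types mainly leave more room for other data.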

2

First, make sure to declare these arrays as const, so that the compiler knows they never change.

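Second, as Oli Charlesworth suggested in the comments, you don't need a full int to store each value. The elements of the S7 and S9 arrays are 7-bit and 9-bit unsigned integers, so either int8_t or uint8_t is enough for S7, and either int16_t or uint16_t for S9. (You may want to benchmark whether signed or unsigned types make a difference, although I wouldn't really expect any.)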

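Finally, if you really want to get rid of the arrays entirely, it is also possible to implement the KASUMI S-boxes directly with bitwise operations (specifically, AND and XOR) and no lookup tables at all. See page 13 of the KASUMI specification for details. However, I strongly suspect this will not be useful for a software implementation, unless you are using bit-slicing to encrypt multiple blocks in parallel.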
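As a minimal sketch of the first two suggestions (const plus a narrower element type), S7 from the question could be declared as below. The helper s7_lookup and its masking of the input are assumptions added for illustration, not part of the original code. S9 would be declared in the same way as static const uint16_t S9[512].

```c
#include <stdint.h>

/* S7 from the question, stored as const bytes: every value fits in 7 bits. */
static const uint8_t S7[128] = {
     54, 50, 62, 56, 22, 34, 94, 96, 38,  6, 63, 93,  2, 18,123, 33,
     55,113, 39,114, 21, 67, 65, 12, 47, 73, 46, 27, 25,111,124, 81,
     53,  9,121, 79, 52, 60, 58, 48,101,127, 40,120,104, 70, 71, 43,
     20,122, 72, 61, 23,109, 13,100, 77,  1, 16,  7, 82, 10,105, 98,
    117,116, 76, 11, 89,106,  0,125,118, 99, 86, 69, 30, 57,126, 87,
    112, 51, 17,  5, 95, 14, 90, 84, 91,  8, 35,103, 32, 97, 28, 66,
    102, 31, 26, 45, 75,  4, 85, 92, 37, 74, 80, 49, 68, 29,115, 44,
     64,107,108, 24,110, 83, 36, 78, 42, 19, 15, 41, 88,119, 59,  3
};

/* Hypothetical helper: look up the low 7 bits of x in S7.
 * The mask keeps the index inside the table for any input. */
static inline uint8_t s7_lookup(uint16_t x)
{
    return S7[x & 0x7F];
}
```

The table now occupies 128 bytes instead of 128 * sizeof(int), and because it is const the compiler is free to place it in read-only storage. Whether this is measurably faster than the int version is something to benchmark on the target platform.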