Improving the speed of cross-mapping arrays

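I'm looking for a little help converting from Perl to PHP. I use hashes to map values as the keys of two arrays read in from two files. The files aren't very large: one is about 150,000 lines and the other about 50,000. In Perl the process takes about 10 seconds. In PHP, even after cutting the larger input file down from 150,000 lines to about 20,000, it takes nearly 3 minutes. I'd like to know whether this is a limitation of the language or whether my design itself is flawed.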

The two existing arrays, $ao_hash and $string_hash, are built as follows:

// Load file contents 
$file_contents = str_replace("\t","|",file_get_contents($_FILES['file']['tmp_name'])); 
$file_array = explode("\n",$file_contents); 

// Pass client dictionary into an array of arrays 
foreach ($file_array as $line) { 
    $line_array = explode("|",$line); 
    if (stripos($line_array[0], 'mnemonic') !== false) { 
     continue; 
    } 

    if (!isset($line_array[1])) { 
     continue; 
    } 

    if (stripos($line_array[1], 'n') !== false) { 
     continue; 
    } 

    if (!isset($line_array[10])) { 
     continue; 
    } 

    $ao_hash[$line_array[10]] = $line; 
}

Both hashes are built this way, and both work well (expected results, fast execution). They look like this:

$array1[NDC] = some|delimited|file|output 
$array2[NDC] = another|file|with|delimited|output

I use the NDC as the primary key for cross-mapping the two arrays:
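A minimal sketch of the same cross-mapping idea, assuming both arrays map an NDC string to a pipe-delimited row as described above. The helper name cross_map() and its return layout are hypothetical, and the syntax needs PHP 7.1 or later. Like the question's loop, it looks each client key up by its first 11 characters, while keeping the full key (including any 12th character) in the result:

```php
<?php
// Sketch only: cross_map() is a hypothetical helper. Both inputs are assumed
// to map an NDC to a pipe-delimited row, as $ao_hash and $string_hash do.
function cross_map(array $clientRows, array $cutdownRows): array
{
    $matched = [];   // full client NDC => matching cut-down row
    $unmatched = []; // full client NDC => client row, kept for a later look-up
    foreach ($clientRows as $ndc => $row) {
        // PHP turns numeric-string keys into integers, so cast back to string
        // before taking the 11-character prefix.
        $prefix = substr((string) $ndc, 0, 11);
        if (isset($cutdownRows[$prefix])) {
            $matched[$ndc] = $cutdownRows[$prefix];
        } else {
            $unmatched[$ndc] = $row;
        }
    }
    return [$matched, $unmatched];
}

// Usage with made-up rows: the 12th character (A) stays in the result key.
[$matched, $unmatched] = cross_map(
    ['00002143380A' => 'MNEM1|client|row', '11111111111' => 'MNEM2|client|row'],
    ['00002143380' => 'cut|down|row']
);
print_r($matched);
```

Each lookup is a single hash probe with isset(), so one pass over the client rows stays linear in their number; no nested search over the cut-down file is needed.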

// Compare the client's drug report against the cut-down file 
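while (list ($key, $value) = each ($ao_hash)) { 

    // Clear the previous row's match so it cannot leak into this row
    $string_selector = null;

    // Use the first 11 characters of the NDC to match across the arrays
    if (isset($string_hash[substr($key,0,11)])) { 
     $string_selector = $string_hash[substr($key,0,11)]; 
    } 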

    // Check if the client NDC entry exists in cut-down file 
    if (!isset($string_selector)) { 

     // No direct NDC match, reserve for an FSV look-up 
     $ao_array = explode("|", $value); 
     if (isset($ao_array[2]) && isset($ao_array[16])) { 
      $no_matches[$ao_array[2].'|'.$ao_array[16]]['NDC'] = $ao_array[10]; 
      $no_matches[$ao_array[2].'|'.$ao_array[16]]['MNEMONIC'] = $ao_array[0]; 
     } 
    } else { 

     // Direct match found 
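     $ao_array = explode("|", $value); 

     // Split the matching cut-down row (not the client row) and
     // replace empty columns with "0"; start a fresh array for each row
     $cutdown_array = explode("|", $string_selector); 
     $cutdown_verified = array();
     foreach ($cutdown_array as $cutdown_col) { 
      if ($cutdown_col == "") { 
       $cutdown_col = "0"; 
      } 
      $cutdown_verified[] = $cutdown_col; 
     } 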

     // Drop the last column 
     array_pop($cutdown_verified); 

     // Merge into a single string 
     $final_string = implode("|", $cutdown_verified); 

     // Prepare data for FSV match 
     if (isset($ao_array[2]) && isset($ao_array[16])) { 
      $yes_matches[$ao_array[2].'|'.$ao_array[16]]['DRUG_STRING'] = $final_string; 
     } 

     // Add the mnemonic to the end 
     $final_string .= '|'.$ao_array[0]; 
     $drug_map[$ao_array[0]] = $final_string; 
    } 
}

Any help would be awesome; I just want this to run faster.


2016-02-08 Ryan

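I haven't done any testing, but a few things stand out to me: some micro-optimizations and one general question. The uploaded file appears to be a CSV or tab-delimited list. Have you tried using 'fgetcsv' or 'str_getcsv'? Next, you only match on the first 11 characters of the key. Instead of storing the whole key, you could store just those first 11 characters, which would save two substr calls (not much). Also, why store strings in the map rather than arrays? Storing arrays would reduce the number of explode calls.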
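A hedged sketch of the commenter's suggestion: parse each line once with str_getcsv() and keep the resulting array, so later code indexes columns instead of calling explode() again. It assumes, as in the question's loading code, that the NDC sits in column 10 and that header rows contain "mnemonic" in column 0. The name load_rows() is hypothetical, and the question's other filter (skipping rows whose column 1 contains "n") is left out:

```php
<?php
// Sketch only: load_rows() is a hypothetical helper. Assumes the NDC is in
// column $keyCol (10 in the question) and header rows mention "mnemonic".
function load_rows(string $contents, int $keyCol = 10): array
{
    $rows = [];
    foreach (preg_split('/\r\n|\n|\r/', $contents) as $line) {
        if ($line === '') {
            continue; // skip blank lines, e.g. a trailing newline
        }
        // Accept tab-delimited Excel exports as well as pipe-delimited files.
        $delimiter = strpos($line, "\t") !== false ? "\t" : '|';
        // Pass enclosure and escape explicitly so newer PHP versions do not
        // warn about relying on the default escape character.
        $cols = str_getcsv($line, $delimiter, '"', '\\');
        if (stripos($cols[0], 'mnemonic') !== false || !isset($cols[$keyCol])) {
            continue; // header row or short line
        }
        $rows[$cols[$keyCol]] = $cols; // store the parsed array, not the raw line
    }
    return $rows;
}
```

In the question's setting this would be called as load_rows(file_get_contents($_FILES['file']['tmp_name'])). One difference from explode(): str_getcsv() treats double quotes as field enclosures, so quoted fields lose their quotes.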

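It's a pipe-delimited text file, but I want to handle tab-delimited files too (users exporting from Excel don't always know to switch to pipes). I can't substr the key when storing it, because an NDC may have a 12th character (such as A or B) that I need to distinguish later. I'll see whether I can cut down on the micro-optimization points and the number of explode calls. In Perl, split/join calls are easy to overuse because they're relatively fast. – Ryan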

Redditor https://www.reddit.com/user/the_alias_of_andrea solved the problem:

Instead of using:

while (list($key, $value) = each($ao_hash))

it is more efficient to use:

foreach ($ao_hash as $key => $value)

Now a 13 MB file executes instantly, and I get the expected results.
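For reference, the suggested loop shape is shown below with made-up sample data standing in for $ao_hash. It may also be worth knowing that each() was deprecated in PHP 7.2 and removed in PHP 8.0, so the foreach form is also the one that keeps working on current PHP versions:

```php
<?php
// Hypothetical sample data keyed by NDC, standing in for $ao_hash.
$ao_hash = [
    '00002143380A' => 'MNEM1|first|row',
    '00002143380B' => 'MNEM2|second|row',
];

// Previously: while (list($key, $value) = each($ao_hash)) { ... }
// foreach walks the array directly instead of advancing its internal
// pointer with a call to each() on every pass.
$mnemonics = [];
foreach ($ao_hash as $key => $value) {
    // Column 0 (the mnemonic) is the text before the first pipe.
    $mnemonics[$key] = strstr($value, '|', true);
}

print_r($mnemonics);
```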


2016-02-08 22:36:52 Ryan
