2011-08-27 83 views
2

我想要循環超過200,000個用戶數據集來過濾30,000個產品,我如何優化這個嵌套的大循環以獲得最佳性能?重構循環?

//settings , 5 max per user, can up to 200,000 
    $settings = array(...); 

    //all prods, up to 30,000 
    $prods = array(...); 

    //all prods category relation map, up to 2 * 30,000 
    $prods_cate_ref_all = array(...); 

    //msgs filtered by settings saved yesterday , more then 100 * 200,000 
    $msg_all = array(...); 

    //filter counter 
    $j = 0; 

    //filter result 
    $res = array(); 

    foreach($settings as $set){ 

     foreach($prods as $k=>$p){ 

      //filter prods by site_id 
      if ($set['site_id'] != $p['site_id']) continue; 

       //filter prods by city_id , city_id == 0 is all over the country 
      if ($set['city_id'] != $p['city_id'] && $p['city_id'] > 0) continue; 

      //muti settings of a user may get same prods 
       if (prod_in($p['id'], $set['uuid'], $res)) continue; 

      //prods filtered by settings saved to msg table yesterday 
      if (msg_in($p['id'], $set['uuid'], $msg_all)) continue; 

       //filter prods by category id 
      if (!prod_cate_in($p['id'], $set['cate_id'], $prods_cate_ref_all)) continue; 

      //filter prods by tags of set not in prod title, website ... 
       $arr = array($p['title'], $p['website'], $p['detail'], $p['shop'], $p['tags']); 
      if (!tags_in($set['tags'], $arr)) continue; 

       $res[$j]['name'] = $v['name']; 
      $res[$j]['prod_id'] = $p['id']; 
       $res[$j]['uuid'] = $v['uuid']; 
       $res[$j]['msg'] = '...'; 
       $j++; 
     } 

    } 

    save_to_msg($res); 

function prod_in($prod_id, $uuid, $prod_all){ 
    foreach($prod_all as $v){ 
    if ($v['prod_id'] == $prod_id && $v['uuid'] == $uuid) 
     return true; 
    } 
    return false; 
} 

function prod_cate_in($prod_id, $cate_id, $prod_cate_all){ 
    foreach($prod_cate_all as $v){ 
    if ($v['prod_id'] == $prod_id && $v['cate_id'] == $cate_id) 
     return true; 
    } 
    return false; 
} 

function tags_in($tags, $arr){ 
    $tag_arr = explode(',', str_replace(',', ',', $tags)); 
    foreach($tag_arr as $v){ 
    foreach($arr as $a){ 
     if(strpos($a, strtolower($v)) !== false){ 
     return true; 
     } 
    } 
    } 
    return false; 
} 

function msg_in($prod_id, $uuid, $msg_all){ 
    foreach($msg_all as $v){ 
    if ($v['prod_id'] == $prod_id && $v['uuid'] == $uuid) 
     return true; 
    } 
    return false; 
} 

更新: 非常感謝。 是的,數據是在MySQL中,以下是主要結構:

-- user settings to filter prods, 5 max per user 
CREATE TABLE setting(
    id INT NOT NULL AUTO_INCREMENT, 
    uuid VARCHAR(100) NOT NULL DEFAULT '', 
    tags VARCHAR(100) NOT NULL DEFAULT '', 
    site_id SMALLINT UNSIGNED NOT NULL DEFAULT 0, 
    city_id MEDIUMINT UNSIGNED NOT NULL DEFAULT 0, 
    cate_id MEDIUMINT UNSIGNED NOT NULL DEFAULT 0, 
    addtime INT UNSIGNED NOT NULL DEFAULT 0, 
    PRIMARY KEY (`id`), 
    KEY `idx_setting_uuid` (`uuid`), 
    KEY `idx_setting_tags` (`tags`), 
    KEY `idx_setting_city_id` (`city_id`), 
    KEY `idx_setting_cate_id` (`cate_id`) 
) DEFAULT CHARSET=utf8; 


CREATE TABLE users(
    id INT NOT NULL AUTO_INCREMENT, 
    uuid VARCHAR(100) NOT NULL DEFAULT '', 
    PRIMARY KEY (`id`), 
    UNIQUE KEY `idx_unique_uuid` (`uuid`) 
) DEFAULT CHARSET=utf8; 


-- filtered prods 
CREATE TABLE msg_list(
    id INT NOT NULL AUTO_INCREMENT, 
    uuid VARCHAR(100) NOT NULL DEFAULT '', 
    prod_id INT UNSIGNED NOT NULL DEFAULT 0, 
    msg TEXT NOT NULL DEFAULT '', 
    PRIMARY KEY (`id`), 
    KEY `idx_ml_uuid` (`uuid`) 
) DEFAULT CHARSET=utf8; 



-- prods and prod_cate_ref table in another database, so can not join it 


CREATE TABLE prod(
    id INT NOT NULL AUTO_INCREMENT, 
    website VARCHAR(100) NOT NULL DEFAULT '' COMMENT ' site name ', 
    site_id MEDIUMINT UNSIGNED NOT NULL DEFAULT 0, 
    city_id MEDIUMINT UNSIGNED NOT NULL DEFAULT 0, 
    title VARCHAR(50) NOT NULL DEFAULT '', 
    tags VARCHAR(50) NOT NULL DEFAULT '', 
    detail VARCHAR(500) NOT NULL DEFAULT '', 
    shop VARCHAR(300) NOT NULL DEFAULT '', 
    PRIMARY KEY (`id`), 
    KEY `idx_prod_tags` (`tags`), 
    KEY `idx_prod_site_id` (`site_id`), 
    KEY `idx_prod_city_id` (`city_id`), 
    KEY `idx_prod_mix` (`site_id`,`city_id`,`tags`) 
) DEFAULT CHARSET=utf8; 

CREATE TABLE prod_cate_ref(
    id MEDIUMINT NOT NULL AUTO_INCREMENT, 
    prod_id INT NOT NULL NULL DEFAULT 0, 
    cate_id MEDIUMINT NOT NULL NULL DEFAULT 0, 
    PRIMARY KEY (`id`), 
    KEY `idx_pcr_mix` (`prod_id`,`cate_id`) 
) DEFAULT CHARSET=utf8; 


-- ENGINE all is myisam 

我不知道如何只使用一個SQL來獲取所有。

+3

我假設你從數據庫中獲取這個數據。我還假設你正在使用SQL數據庫。那爲什麼不使用'JOINs'和'WHERE'子句呢? – NullUserException

+0

你從哪裏得到設置?數據庫? – svick

+0

這是一個很重要的數據存儲在PHP數組中。這是從數據庫或其他東西出來的嗎? –

回答

2

謝謝大家對我的啓發,我終於得到了它,它的確是一個很簡單的方法,但一個巨大的進步!

我重組在$ prods_cate_ref_all數據和$ msg_all(使用在最後兩個功能), 也結果數組$ RES, 然後使用strpos和in_array而不是三個迭代函數(prod_in msg_in prod_cate_in)

我得到了驚人的50倍加速!隨着數據變大,效果變得更加有效。

//settings , 5 max per user, can up to 200,000 
    $settings = array(...); 

    //all prods, up to 30,000 
    $prods = array(...); 

    //all prods category relation map, up to 2 * 30,000 
    $prods_cate_ref_all = get_cate_ref_all(); 

    //msgs filtered by settings saved yesterday , more then 100 * 200,000 
    $msg_all = get_msg_all(); 

    //filter counter 
    $j = 0; 

    //filter result 
    $res = array(); 


    foreach($settings as $set){ 

     foreach($prods as $p){ 

     $res_uuid_setted = false; 

     $uuid = $set['uuid']; 

     if (isset($res[$uuid])){ 
      $res_uuid_setted = true; 
     } 

     //filter prods by site_id 
     if ($set['site_id'] != $p['site_id']) 
       continue; 

     //filter prods by city_id , city_id == 0 is all over the country 
     if ($set['city_id'] != $p['city_id'] && $p['city_id'] > 0) 
       continue; 


     //muti settings of a user may get same prods 
     if ($res_uuid_setted) 
      //in_array faster than strpos if item < 1000 
      if (in_array($p['id'], $res[$uuid]['prod_ids'])) 
      continue; 

     //prods filtered by settings saved to msg table yesterday 
     if (isset($msg_all[$uuid])) 
      //strpos faster than in_array in large data 
      if (false !== strpos($msg_all[$uuid], ' ' . $p['id'] . ' ')) 
      continue; 

     //filter prods by category id 
     if (false === strpos($prods_cate_ref_all[$p['id']], ' ' . $set['cate_id'] . ' ')) 
      continue; 

     $arr = array($p['title'], $p['website'], $p['detail'], $p['shop'], $p['tags']); 
     if (!tags_in($set['tags'], $arr)) 
      continue; 


     $res[$uuid]['prod_ids'][] = $p['id']; 

     $res[$uuid][] = array(
     'name' => $set['name'], 
     'prod_id' => $p['id'], 
     'msg' => '', 
     ); 

     } 

    } 


function get_msg_all(){ 

    $temp = array(); 
    $msg_all = array(
     array('uuid' => 312, 'prod_id' => 211), 
     array('uuid' => 1227, 'prod_id' => 31), 
     array('uuid' => 1, 'prod_id' => 72), 
     array('uuid' => 993, 'prod_id' => 332), 
     ... 
    ); 

    foreach($msg_all as $k=>$v){ 
    if (!isset($temp[$v['uuid']])) 
     $temp[$v['uuid']] = ' '; 

    $temp[$v['uuid']] .= $v['prod_id'] . ' '; 
    } 

    $msg_all = $temp; 
    unset($temp); 

    return $msg_all; 
} 


function get_cate_ref_all(){ 

    $temp = array(); 
    $cate_ref = array(
     array('prod_id' => 3, 'cate_id' => 21), 
     array('prod_id' => 27, 'cate_id' => 1), 
     array('prod_id' => 1, 'cate_id' => 232), 
     array('prod_id' => 3, 'cate_id' => 232), 
     ... 
    ); 

    foreach($cate_ref as $k=>$v){ 
    if (!isset($temp[$v['prod_id']])) 
     $temp[$v['prod_id']] = ' '; 

    $temp[$v['prod_id']] .= $v['cate_id'] . ' '; 
    } 
    $cate_ref = $temp; 
    unset($temp); 

    return $cate_ref; 
} 
0

正如你已經有了較大的集外循環,很難說在那裏你可以優化這一點。你可以內聯函數代碼來省去函數調用,或者部分從你的許多foreach中展開。

例如,該功能

function tags_in($tags, $arr){ 
    $tag_arr = explode(',', str_replace(',', ',', $tags)); 
    foreach($tag_arr as $v){ 
    foreach($arr as $a){ 
     if(strpos($a, strtolower($v)) !== false){ 
     return true; 
     } 
    } 
    } 
    return false; 
} 

內你基本上使用字符串訪問的陣列。做弦更直接的(注:完整的標籤匹配,你做了部分匹配):

function tags_in($tags, $arr) 
{ 
    $tags = ', '.strtolower($tags).', '; 

    foreach($arr as $tag) 
    { 
     if (false !== strpos($tags, ', '.$tag.', ') 
      return true; 
    } 
    return false; 
} 

但是當你有一個大的數據量,事情就只是需要很長時間。

  • 只有優化你的發佈版本。
  • 只做小而體面的變化。
  • 每次更改後運行您的測試。
  • 簡介針對代表真實世界測試數據的每個變化。

因此,下至小碼走勢,你就可能尋找尺度的變化。如果你之前可以解決問題,你可以做一個地圖和減少策略。也許你已經在使用基於文檔的數據庫,爲此提供了一個接口。

+0

您在tags_in中的重構非常好。 – hamlet