2013-03-17 65 views

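How to get all possible directory combinations from a URL

I'm having trouble coming up with an algorithm to detect repeated directory patterns in a list of URLs. Can anyone suggest an approach? I'm fairly sure it will need a recursive call, but I can't decide how to keep a record for each possible pattern.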

Note: this is in PHP.

Let's say you have some URLs:

1. http://www.goodfood.com/recipes/special_occasion/desserts/pie/chocolate-pie.html 
2. http://www.goodfood.com/recipes/special_occasion/desserts/pie/cherry-pie.html 
3. http://www.goodfood.com/recipes/special_occasion/apps/chex-mix.html 
4. http://www.goodfood.com/recipes/special_occasion/soup/tomato.html 
5. http://www.goodfood.com/special/special_occasion/soup/beef-stew.html 
6. http://www.goodfood.com/special/special_occasion/soup/vegetable.html 

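I want to find a way to identify every possible directory pattern that appears in more than one URL. The result would look something like this: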

'recipes/special_occasion' is found in urls 1, 2, 3 and 4. 
'recipes/special_occasion/desserts' is found in urls 1 and 2. 
'recipes/special_occasion/desserts/pie' is found in urls 1 and 2. 
'special_occasion/desserts' is found in urls 1 and 2. 
'special_occasion/desserts/pie' is found in urls 1 and 2. 
'desserts/pie' is found in urls 1 and 2. 
'special/special_occasion' is found in urls 5 and 6. 
'special/special_occasion/soup' is found in urls 5 and 6. 
'special_occasion/soup' is found in urls 4, 5 and 6. 

My idea is to go through each URL, pull out every possible new pattern, and store it in an array. So far I have:

 $commonDomains = array();

 foreach ($query as $row) {
     // Path portion only, e.g. /recipes/special_occasion/soup/tomato.html
     $urlPath = parse_url($row['href'], PHP_URL_PATH);
     echo "$urlPath<br/>";

     $urlChunks = explode('/', $urlPath);
     //var_dump($urlChunks);

     // Records each single path segment (file name included) against the row id.
     foreach ($urlChunks as $domain) {
         if (strlen($domain) > 0) {
             $thisDomain = $domain . '/';
             $commonDomains[$thisDomain][] = $row['id'];
         }
     }
     var_dump($commonDomains);
 }

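Has anyone run into this before? It screams "pattern" to me, but I can't find an answer online. Everything I come up with is very complicated. Please help, thanks.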


Here is an example of what I'm working with: http://phpfiddle.org/main/code/kn4-zyh

Here are my results so far:

/recipes/special_occasion/desserts/pie/grandmas-chocolate-pie.html 
array(5) { [0]=> string(7) "recipes" [1]=> string(16) "special_occasion" [2]=> string(8) "desserts" [3]=> string(3) "pie" [4]=> string(27) "grandmas-chocolate-pie.html" } 

0 : 4 : recipes/special_occasion/desserts/pie/grandmas-chocolate-pie.html 
0 : 3 : recipes/special_occasion/desserts/pie 
0 : 2 : recipes/special_occasion/desserts 
0 : 1 : recipes/special_occasion 

1 : 4 : special_occasion/desserts/pie/grandmas-chocolate-pie.html 
2 : 4 : desserts/pie/grandmas-chocolate-pie.html 
3 : 4 : pie/grandmas-chocolate-pie.html 

0 : 4 : recipes/special_occasion/desserts/pie/grandmas-chocolate-pie.html 
1 : 3 : special_occasion/desserts/pie 


**I'm missing: 
2 : 3 : special_occasion/desserts 
1 : 2 : recipes/special_occasion 

**
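
For reference, the missing combinations come from varying both the start and the end of the slice. Below is a minimal sketch of that idea, not a definitive implementation. The helper names `directoryRuns` and `sharedRuns` are hypothetical, and the code assumes that the last path segment is always a file name and that only runs of two or more directories are of interest.

```php
<?php
// Hypothetical helper: every contiguous run of at least $minLength
// directories in an already-split path.
function directoryRuns(array $segments, int $minLength = 2): array
{
    $runs = array();
    $count = count($segments);
    for ($start = 0; $start < $count; $start++) {
        // Grow the run from the minimum length up to whatever still fits.
        for ($length = $minLength; $start + $length <= $count; $length++) {
            $runs[] = implode('/', array_slice($segments, $start, $length));
        }
    }
    return $runs;
}

// Hypothetical helper: map each run to the 1-based numbers of the URLs that
// contain it, keeping only runs shared by more than one URL.
function sharedRuns(array $urls, int $minLength = 2): array
{
    $patterns = array();
    foreach ($urls as $index => $url) {
        $path = (string) parse_url($url, PHP_URL_PATH);
        // Drop empty pieces, then the trailing file name (an assumption).
        $segments = array_values(array_filter(explode('/', $path), 'strlen'));
        array_pop($segments);
        // array_unique avoids listing a URL twice for a run it repeats.
        foreach (array_unique(directoryRuns($segments, $minLength)) as $run) {
            $patterns[$run][] = $index + 1;
        }
    }
    return array_filter($patterns, function ($ids) {
        return count($ids) > 1;
    });
}

$urls = array(
    'http://www.goodfood.com/recipes/special_occasion/desserts/pie/chocolate-pie.html',
    'http://www.goodfood.com/recipes/special_occasion/desserts/pie/cherry-pie.html',
    'http://www.goodfood.com/recipes/special_occasion/apps/chex-mix.html',
    'http://www.goodfood.com/recipes/special_occasion/soup/tomato.html',
    'http://www.goodfood.com/special/special_occasion/soup/beef-stew.html',
    'http://www.goodfood.com/special/special_occasion/soup/vegetable.html',
);

foreach (sharedRuns($urls) as $run => $ids) {
    echo "'$run' is found in urls ", implode(', ', $ids), "\n";
}
```

Keying the array by the run itself is what keeps a record per pattern, so no recursion is needed: two nested loops over start and length cover every combination.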

I admit this is a tough task :) – 2013-03-17 01:58:03

Answer

Here is an example that searches for a single directory:

$links = array(
    'http://www.goodfood.com/recipes/special_occasion/desserts/pie/chocolate-pie.html', 
    'http://www.goodfood.com/recipes/special_occasion/desserts/pie/cherry-pie.html', 
    'http://www.goodfood.com/recipes/special_occasion/apps/chex-mix.html', 
    'http://www.goodfood.com/recipes/special_occasion/soup/tomato.html', 
    'http://www.goodfood.com/special/special_occasion/soup/beef-stew.html', 
    'http://www.goodfood.com/special/special_occasion/soup/vegetable.html', 
); 

// Split each URL path into its directories, skipping the .html file name.
$dirs = array(); 
foreach ($links as $key => $link) { 
    $urlPath = parse_url($link, PHP_URL_PATH); 
    $arrayUrlPath = explode('/', $urlPath); 
    $dirs[$key] = array(); 
    foreach ($arrayUrlPath as $dir) { 
        // Compare with '' rather than empty() so a directory named "0" is kept.
        if ($dir === '' || substr($dir, -5) === '.html') { 
            continue; 
        } 
        $dirs[$key][] = $dir; 
    } 
} 

// For every directory of every URL, list the other URLs that contain it.
foreach ($dirs as $key => $dir) { 
    foreach ($dir as $name) { 
        echo 'dir: ' . $name . ', found in: ' . search($name, $key, $dirs) . "\n"; 
    } 
} 

// Returns the 1-based numbers of all URLs except $excludeKey that contain $name.
function search($name, $excludeKey, $dirs) 
{ 
    $return = array(); 
    foreach ($dirs as $key => $dir) { 
        if ($key === $excludeKey) { 
            continue; 
        } 
        if (in_array($name, $dir)) { 
            $return[] = (int)$key + 1; 
        } 
    } 
    return implode(', ', $return); 
} 

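If you want to search for longer strings, rework the search function: explode $name and compare its parts against consecutive indexes of each directory list. For example, if the directory path is aaa/bbb/ccc, check that index 0 is "aaa", index 1 is "bbb" and index 2 is "ccc"; if they don't match, move the pointer to index+1 and check again.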
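
The multi-directory check described above might look roughly like the sketch below. `containsRun` and `searchPath` are hypothetical names, not part of the code above, and `$dirs` is assumed to hold one zero-indexed list of directory names per URL, as built in the example.

```php
<?php
// Hypothetical helper: true when $pattern occurs in $dirs as a contiguous run.
function containsRun(array $dirs, array $pattern): bool
{
    $needed = count($pattern);
    if ($needed === 0) {
        return false;
    }
    // Try each starting index; on a mismatch, move the pointer on by one.
    for ($start = 0; $start + $needed <= count($dirs); $start++) {
        if (array_slice($dirs, $start, $needed) === $pattern) {
            return true;
        }
    }
    return false;
}

// Variant of search() that accepts a path such as 'special_occasion/soup'.
// Pass null as $excludeKey to search every URL.
function searchPath(string $name, $excludeKey, array $dirs): string
{
    $pattern = explode('/', trim($name, '/'));
    $found = array();
    foreach ($dirs as $key => $dir) {
        if ($key !== $excludeKey && containsRun($dir, $pattern)) {
            $found[] = (int)$key + 1;
        }
    }
    return implode(', ', $found);
}

// Directory lists for the six example URLs, file names already removed.
$dirs = array(
    array('recipes', 'special_occasion', 'desserts', 'pie'),
    array('recipes', 'special_occasion', 'desserts', 'pie'),
    array('recipes', 'special_occasion', 'apps'),
    array('recipes', 'special_occasion', 'soup'),
    array('special', 'special_occasion', 'soup'),
    array('special', 'special_occasion', 'soup'),
);

// Excludes URL 5 (key 4) and prints "4, 6".
echo searchPath('special_occasion/soup', 4, $dirs), "\n";
```

Comparing slices with `===` requires the same names in the same order, so 'aaa/ccc' does not match a path containing aaa/bbb/ccc.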

I hope this helps.
