preg_match_all to find links: how do I remove duplicate results?


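I'm having a problem with the match results. Here is my script. I can't figure out how to add the links from the scraped content while avoiding duplicate results. I only need the links that start with http://www.autogidas.lt/ ...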

<?
// Read the request parameters and escape them before they go into the SQL string.
$id    = (int) $_GET['id'];
$user  = mysql_real_escape_string($_GET['user']);
$login = $_COOKIE['login'];

$query = mysql_query("SELECT pavadinimas,nuoroda,kuras,data,data_new FROM autogidas WHERE vartotojas='$user' AND id='$id'");
$rezultatas = mysql_fetch_row($query);

// Column 1 (nuoroda) holds the URL of the page to scrape.
$url = $rezultatas[1];

$info = file_get_contents($url);

// Return the text between the first occurrence of $start and the following $end.
function scrape_between($data, $start, $end){
    $data = stristr($data, $start);
    $data = substr($data, strlen($start));
    $stop = stripos($data, $end);
    $data = substr($data, 0, $stop);
    return str_replace(' ', ' ', $data);
}
$contents = scrape_between($info, "<table border=\"0\" cellspacing=\"0\">", "</table>");

// Titles, cities, and the alternating year/price spans from the listing table.
preg_match_all('/<span class="ttitle2".*?>(.*?)<\/span>/', $contents, $pavadinimas);
preg_match_all('/<span class="ttitle3".*?>(.*?)<\/span>/', $contents, $miestas);
preg_match_all('/<span class="ttitle1".*?>(.*?)<\/span>/', $contents, $metai_kaina);

// The ttitle1 spans alternate: even index = year, odd index = price.
$metai = [];
$kaina = [];
foreach ($metai_kaina[0] as $key => $metai_kaina_val) {
    if ($key % 2 == 0)
        $metai[] = strip_tags($metai_kaina_val);
    else
        $kaina[] = strip_tags($metai_kaina_val);
}

preg_match_all('/<img .*?(?=src)src=\"([^\"]+)\"/si', $contents, $img_link);
preg_match_all('/<a href="http:\/\/www.autogidas.lt(.*?)"/s', $contents, $matches);

for ($i = 0; $i < count($pavadinimas[0]); $i++) {
    echo '<tr>
     <td><a href="HERE I NEED LINKS"><img src="'.$img_link[1][$i].'"></a></td>
     <td>'.$pavadinimas[0][$i].'</td>
     <td>'.$miestas[0][$i].'</td>
     <td>'.$metai[$i].'</td>
     <td><center>'.$kaina[$i].'</center></td>
    </tr>';
}

echo "</table>";
?>

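I've tried the suggestions, but I don't know how to update the script. This is the last thing I need and I can't figure out how to do it... I'm not a pro, I only do PHP for fun. Thanks for your help! Sorry for my bad English...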


2015-11-08 dagamo

Put your 'http:\/\/www.adress.com' prefix inside the capture group. – mario

The problem is that I don't know how to edit this regular expression. – dagamo

Can you help me? – dagamo
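A minimal sketch of what mario's suggestion changes, run against a one-line hypothetical snippet (the path /skelbimas/123 is made up for illustration, not taken from the real site). With the domain outside the parentheses, group 1 holds only the path; with the domain inside, group 1 holds the complete URL that the href needs. The dots in the domain are also escaped here so they only match a literal dot.

```php
<?php
// Hypothetical sample markup standing in for the scraped $contents.
$contents = '<a href="http://www.autogidas.lt/skelbimas/123"><img src="a.jpg"></a>';

// Prefix OUTSIDE the capture group: group 1 holds only the path.
preg_match_all('/<a href="http:\/\/www\.autogidas\.lt(.*?)"/', $contents, $m);
$pathOnly = $m[1][0];

// Prefix INSIDE the capture group: group 1 holds the complete URL.
preg_match_all('/<a href="(http:\/\/www\.autogidas\.lt.*?)"/', $contents, $m);
$fullUrl = $m[1][0];

echo $pathOnly, "\n"; // /skelbimas/123
echo $fullUrl, "\n";  // http://www.autogidas.lt/skelbimas/123
```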


You could use this code (the edit further down was written after the original question was changed):

// RegEx to only match with http://www.address.com/* kind of URLs in anchors 
$regexp = "<a\s[^>]*href=(\"??)(http\:\/\/www\.adress\.com\/[^\" >]*?)\\1[^>]*>(.*)<\/a>"; 
if (preg_match_all("/$regexp/siU", $svetaines_turinys, $matches, PREG_SET_ORDER)) { 
    // collect results in array 
    $arr = []; 
    foreach($matches as $match) { 
     $arr[] = $match[2]; 
    } 
    // remove duplicates from it 
    $arr = array_unique($arr); 
    // send to client 
    foreach($arr as $match) { 
     echo "$match <BR/>"; 
    } 
}
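If you deduplicate with array_unique() and then read the result by position, as the question's for loop does with $i, keep in mind that array_unique() preserves the keys of the elements it keeps, so the result can have gaps. A small sketch with hypothetical URLs, reindexing with array_values():

```php
<?php
// Hypothetical list of scraped URLs; each advert link appears twice.
$links = [
    'http://www.autogidas.lt/a1',
    'http://www.autogidas.lt/a1',
    'http://www.autogidas.lt/a2',
    'http://www.autogidas.lt/a2',
];

// array_unique() keeps the first occurrence of each value together with
// its original key, leaving gaps: [0 => a1, 2 => a2].
$unique = array_unique($links);

// array_values() renumbers the keys 0, 1, 2, ... so $unique[$i] works in a for loop.
$unique = array_values($unique);

for ($i = 0; $i < count($unique); $i++) {
    echo $i, ': ', $unique[$i], "\n";
}
```

Without the array_values() call, $unique[1] would be undefined even though the array has two elements.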

Edit:

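You are getting duplicate hyperlinks because the same hyperlink is used twice on the page you are scraping. However, the two occurrences are not written in exactly the same way: only one of them is followed by an img tag. So you can change the regular expression that fills $matches as follows: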

preg_match_all('/<a href="(http:\/\/www\.autogidas\.lt[^"]*)"\s*>\s*<img/s',
    $contents, $matches);

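Note that in the regular expression above I also moved the opening parenthesis so that the group captures the whole URL, which is what you need in the code below.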
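To see why requiring the img tag removes the duplicates, here is a small check against hypothetical markup. It assumes, as described above, that each advert links twice, once around its thumbnail and once around its title; the real autogidas.lt markup has not been verified here.

```php
<?php
// Hypothetical markup mimicking the scraped table: every advert links twice,
// once around its thumbnail and once around its title.
$contents = '
<a href="http://www.autogidas.lt/skelbimas/1"> <img src="1.jpg"></a>
<a href="http://www.autogidas.lt/skelbimas/1">Audi A4</a>
<a href="http://www.autogidas.lt/skelbimas/2"><img src="2.jpg"></a>
<a href="http://www.autogidas.lt/skelbimas/2">BMW 320</a>
';

// Any link to the site: each advert is found twice.
preg_match_all('/<a href="(http:\/\/www\.autogidas\.lt[^"]*)"/', $contents, $all);

// Only links whose tag is directly followed by <img> (whitespace allowed): one per advert.
preg_match_all('/<a href="(http:\/\/www\.autogidas\.lt[^"]*)"\s*>\s*<img/s', $contents, $matches);

echo count($all[1]), "\n"; // 4
print_r($matches[1]);
```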

Then, inside the quoted string in your loop, you can output the hyperlink with this piece:

<a href="'.$matches[1][$i].'">
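Putting it together, a sketch of the row loop with the link filled in. The arrays are hypothetical stand-ins for the preg_match_all() results, and the approach assumes that $matches[1], $img_link[1] and $pavadinimas[0] list the adverts in the same order and with the same length, since they are paired only by index. The htmlspecialchars() calls are an added precaution for attribute values, not part of the answer above.

```php
<?php
// Hypothetical results standing in for the preg_match_all() output in the question.
$matches     = [1 => ['http://www.autogidas.lt/skelbimas/1', 'http://www.autogidas.lt/skelbimas/2']];
$img_link    = [1 => ['1.jpg', '2.jpg']];
$pavadinimas = [0 => ['Audi A4', 'BMW 320']];

$rows = '';
for ($i = 0; $i < count($pavadinimas[0]); $i++) {
    // Escape the URL and image path before placing them inside attributes.
    $href = htmlspecialchars($matches[1][$i], ENT_QUOTES);
    $src  = htmlspecialchars($img_link[1][$i], ENT_QUOTES);
    $rows .= '<tr>'
           . '<td><a href="' . $href . '"><img src="' . $src . '"></a></td>'
           . '<td>' . $pavadinimas[0][$i] . '</td>'
           . "</tr>\n";
}
echo $rows;
```

If the counts ever differ (for example, an advert without a thumbnail), the pairing by $i drifts out of step, so comparing count($matches[1]) with count($pavadinimas[0]) before the loop may be worthwhile.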

Note: you should start your code with <?php, not just <?.


2015-11-08 16:50:13 trincot

Could you check my code? I've updated it... – dagamo

Can you help me? – dagamo

I've added to my answer what you need to change in your code to avoid duplicate hyperlinks, and how to output them. – trincot
