How to get all URLs from a page (PHP)

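I have a web page on which other websites are listed together with descriptions (like a bookmark/site list). How can I use PHP to get all the URLs from that page and write them to a txt file (one per line, URL only, no description)?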

The page looks like this:

And I would like the script's TXT output to look like this:

2009-07-15 Phil

One way:

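$url = "http://wwww.somewhere.com";
$data = file_get_contents($url);
// Keep only the <a> tags, then split the text at every closing </a>.
$data = strip_tags($data, "<a>");
$d = preg_split("/<\/a>/", $data);
foreach ($d as $k => $u) {
    // Only matches anchors written exactly as <a href="...">.
    if (strpos($u, "<a href=") !== FALSE) {
        // Drop everything up to and including the opening quote of href.
        $u = preg_replace("/.*<a\s+href=\"/sm", "", $u);
        // Drop the closing quote and everything after it (the s flag also covers newlines).
        $u = preg_replace("/\".*/s", "", $u);
        print $u . "\n";
    }
}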


2009-07-15 00:34:32 ghostdog74

If my link looks like this: the code above does not find the link. – 2017-09-22 05:45:33
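
The link the commenter had in mind is not shown, so the following is only a guess at the kind of case involved. The snippet above checks for the literal text `<a href=`, so it skips any anchor where another attribute comes before `href`, and its second step assumes double quotes. A minimal sketch of a more tolerant pattern, using a hypothetical helper name `extract_hrefs` and made-up sample markup:

```php
<?php
// Hypothetical helper (not part of the original answer): collect href values
// from <a> tags regardless of attribute order or quote style.
function extract_hrefs(string $html): array
{
    // "<a", whitespace, optionally other attributes ending in whitespace,
    // then href = "value" or href = 'value' (the same quote must close it).
    $pattern = '/<a\s+(?:[^>]*?\s)?href\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

// Made-up sample markup, for illustration only.
$sample = '<a class="x" href="http://example.com/a">A</a> '
        . "<a href='http://example.com/b'>B</a>";
print implode("\n", extract_hrefs($sample)) . "\n";
```

Unquoted `href` values are still not handled; a real HTML parser copes with those better than a regular expression.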

Another way:

$url = "http://wwww.somewhere.com"; 

$html = file_get_contents($url); 

$doc = new DOMDocument(); 
$doc->loadHTML($html); //helps if html is well formed and has proper use of html entities! 

$xpath = new DOMXpath($doc); 

$nodes = $xpath->query('//a'); 

foreach($nodes as $node) { 
    var_dump($node->getAttribute('href')); 
}
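
The question also asks for the URLs to be written to a txt file, one per line, which the snippet above does not do. A possible sketch building on the same DOMDocument approach follows. The helper names `hrefs_from_html` and `write_urls` and the output file name `urls.txt` are assumptions for illustration, and removing duplicate URLs is a choice the question does not ask for.

```php
<?php
// Hypothetical helper: return every non-empty href found in $html.
function hrefs_from_html(string $html): array
{
    if ($html === '') {
        return array(); // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parse warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('a') as $node) {
        $href = trim($node->getAttribute('href'));
        if ($href !== '') {
            $urls[] = $href;
        }
    }
    // Assumption: duplicates are not wanted in the output file.
    return array_values(array_unique($urls));
}

// Hypothetical helper: write one URL per line; returns false on failure.
function write_urls(array $urls, string $path): bool
{
    $body = $urls ? implode("\n", $urls) . "\n" : '';
    return file_put_contents($path, $body) !== false;
}

// Usage, with the placeholder URL from above and an assumed file name:
// $html = file_get_contents("http://wwww.somewhere.com");
// if ($html !== false) { write_urls(hrefs_from_html($html), "urls.txt"); }
```

The `href` values are written exactly as they appear in the page, so relative links stay relative.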


2013-03-14 16:17:29 user2066719

You can use this to get all the links on a given web page.

<?php 

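    // Placeholder URL, as in the answers above; the original snippet never defined $url.
    $url = "http://wwww.somewhere.com";
    $var = fread_url($url);

    // Pass $matches directly: call-time pass-by-reference (&$matches) is a fatal error since PHP 5.4.
    preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+".
        "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
        $var, $matches);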

    $matches = $matches[1]; 
    $list = array(); 

    foreach($matches as $var) 
    {  
     print($var."<br>"); 
    } 

    function fread_url($url,$ref="") 
    { 
     $html = ""; // initialise so the fopen() fallback can append safely
     if(function_exists("curl_init")){ 
      $ch = curl_init(); 
      $user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; ". 
          "Windows NT 5.0)"; 
      curl_setopt($ch, CURLOPT_USERAGENT, $user_agent); 
      curl_setopt($ch, CURLOPT_HTTPGET, 1); 
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
      curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
      curl_setopt($ch, CURLOPT_URL, $url); 
      curl_setopt($ch, CURLOPT_REFERER, $ref); 
      curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookie.txt'); 
      $html = curl_exec($ch); 
      curl_close($ch); 
     } 
     else{ 
      $hfile = fopen($url,"r"); 
      if($hfile){ 
       while(!feof($hfile)){ 
        $html.=fgets($hfile,1024); 
       } 
      } 
     } 
     return $html; 
    } 

    ?>


2016-04-30 13:47:16
