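PHP: regular expression to search a file for a pattern and extract it

I'm really confused by regular expressions in PHP.

Anyway, I can't read through a whole tutorial right now, because I have a bunch of HTML files and need to find the links in them as quickly as possible. I came up with the idea of automating this with PHP, since it's a language I know.

So I think I can use this script: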

$address = "file.txt"; 
$input = @file_get_contents($address) or die("Could not access file: $address"); 
$regexp = "??????????"; 
if(preg_match_all("/$regexp/siU", $input, $matches)) { 
    // $matches[2] = array of link addresses 
    // $matches[3] = array of link text - including HTML code 
}

My problem is $regexp.

The pattern I need looks like this:

href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF

I want to search for and extract /content/r807215r37l86637/fulltext.pdf from the many files I have.

Any help?

==================

Edit

The title attribute is important to me: all the links I want are titled

title="Download PDF"

2011-02-11 Alireza

Once again: regular expressions are bad for parsing HTML.

Save your sanity and use the built-in DOM library.

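// $html is assumed to hold the page source, e.g. from file_get_contents().
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ silences warnings from malformed HTML
$x = new DOMXPath($dom);
$data = array();
// Select only the <a> elements whose title is exactly "Download PDF".
foreach($x->query("//a[@title='Download PDF']") as $node)
{
    $data[] = $node->getAttribute("href");
}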

Edit: updated the code based on ircmaxell's comment.

2011-02-11 20:25:11

Uh. Why XPath if you're only doing a node-name search? Why not just `$dom->getElementsByTagName('a');`? I could understand XPath if you did `$x->query('//a[contains(@title, "Download PDF")]');`, which would return exactly the matches... ;-) – ircmaxell 2011-02-11 20:31:40

@ircmaxell, you're absolutely right. `getElementsByTagName()` would probably be a more efficient approach. – 2011-02-11 20:35:26
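For readers who want to see the `getElementsByTagName()` route spelled out, here is a minimal sketch. The sample markup (including the `/about.html` link) is made up for illustration, and the title check is done in plain PHP rather than in an XPath predicate.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = '<html><body>'
      . '<a href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF">PDF</a>'
      . '<a href="/about.html" title="About">About</a>'
      . '</body></html>';

$dom = new DOMDocument();
// Suppress the warnings that real-world, malformed HTML tends to trigger.
@$dom->loadHTML($html);

$links = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    // Filter on the title attribute in PHP instead of in XPath.
    if ($node->getAttribute('title') === 'Download PDF') {
        $links[] = $node->getAttribute('href');
    }
}

print_r($links);
```

Whether this is actually faster than XPath on a given set of files is something to measure rather than assume.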

@safaali In the query, change `@title='Download PDF'` to `@class='nameOfClass'`, or use `contains(@title, 'Download PDF')`. contains will catch them even if they have extra stuff. – 2011-02-11 20:46:30
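The difference between the exact test and `contains()` is easiest to see side by side. This is a rough sketch only: the three links and the helper name `hrefsFor` are invented for the example.

```php
<?php
// Hypothetical sample: the second title carries extra text.
$html = '<html><body>'
      . '<a href="/a/fulltext.pdf" title="Download PDF">one</a>'
      . '<a href="/b/fulltext.pdf" title="Download PDF (1.2 MB)">two</a>'
      . '<a href="/c/page.html" title="Read online">three</a>'
      . '</body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: run an XPath query and collect the href of each hit.
function hrefsFor(DOMXPath $xpath, $query)
{
    $out = array();
    foreach ($xpath->query($query) as $node) {
        $out[] = $node->getAttribute('href');
    }
    return $out;
}

// Exact comparison: the title must be "Download PDF" and nothing else.
$exact = hrefsFor($xpath, '//a[@title="Download PDF"]');

// Substring test: also matches titles with extra text around the phrase.
$contains = hrefsFor($xpath, '//a[contains(@title, "Download PDF")]');

print_r($exact);
print_r($contains);
```

Note that XPath 1.0 `contains()` is case-sensitive, so a title such as "download pdf" would be matched by neither query.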

href="([^"]+)" will give you all links of that form.

2011-02-11 20:22:10 Blindy

Thanks, but there are many hrefs in the file, and I only want the links titled "Download PDF". – Alireza 2011-02-11 20:24:28

Try something like this. If it doesn't work, please show some examples of the links you want to parse.

<?php 
$address = "file.txt"; 
$input = @file_get_contents($address) or die("Could not access file: $address"); 
$regexp = '#<a[^>]*href="([^"]*)"[^>]*title="Download PDF"#'; 

if(preg_match_all($regexp, $input, $matches, PREG_SET_ORDER)) { 
    foreach ($matches as $match) {
        printf("Url: %s<br/>", $match[1]);
    }
}

Edit: updated so that it searches for "Download PDF" entries only.

2011-02-11 20:25:43

This is simple with phpQuery or QueryPath:

foreach (qp($html)->find("a") as $a) {
    // The question states that the wanted links are titled "Download PDF".
    if ($a->attr("title") == "Download PDF") {
        print $a->attr("href");
        print $a->innerHTML();
    }
}

Otherwise, a regex will do, as long as the source is reasonably consistent:

preg_match_all('#<a[^>]+href="([^>"]+)"[^>]+title="Download PDF"[^>]*>(.*?)</a>#sim', $input, $m);

Looking for a fixed title="..." attribute is doable, but more difficult, since it depends on its position before the closing bracket.
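One way to soften the position dependence is a lookahead that checks for the title anywhere inside the tag before capturing the href. The sketch below is an illustration, not part of the original answer. It assumes double-quoted attribute values and no `>` inside them, and the second and third sample links are made up.

```php
<?php
// Hypothetical sample: same title, but the attributes come in different orders.
$input = '<a href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF">PDF</a>' . "\n"
       . '<a title="Download PDF" class="x" href="/content/other/fulltext.pdf">PDF</a>' . "\n"
       . '<a href="/help.html" title="Help">Help</a>';

// (?=[^>]*\stitle="Download PDF") looks ahead, without consuming anything,
// for the title somewhere before the tag's closing ">".
// [^>]*\shref="([^"]*)" then captures the href of that same tag.
$regexp = '#<a\b(?=[^>]*\stitle="Download PDF")[^>]*\shref="([^"]*)"#i';

preg_match_all($regexp, $input, $m);

// $m[1] holds the captured href values in document order.
print_r($m[1]);
```

This still breaks on single-quoted or unquoted attributes, which is the usual argument for the DOM-based answers in this thread.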

2011-02-11 20:26:37 mario

The best approach is to use DomXPath to do the search in one step:

$dom = new DomDocument(); 
$dom->loadHTML($html); 
$xpath = new DomXPath($dom); 

$links = array(); 
foreach($xpath->query('//a[contains(@title, "Download PDF")]') as $node) { 
    $links[] = $node->getAttribute("href"); 
}

Or even:

$links = array(); 
$query = '//a[contains(@title, "Download PDF")]/@href'; 
foreach($xpath->evaluate($query) as $attr) { 
    $links[] = $attr->value; 
}
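
Since the question mentions many files, the query above could be wrapped in a function and run over each file in turn. This is only a sketch under assumptions: the function name `extractPdfLinks` and the `pages/*.html` location are hypothetical, so adjust them to wherever the files actually live.

```php
<?php
// Hypothetical helper: return the href of every <a> whose title
// contains "Download PDF" in the given HTML string.
function extractPdfLinks($html)
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ silences warnings from malformed HTML
    $xpath = new DOMXPath($dom);

    $links = array();
    foreach ($xpath->query('//a[contains(@title, "Download PDF")]/@href') as $attr) {
        $links[] = $attr->value;
    }
    return $links;
}

// Assumed layout: the saved pages sit in a "pages" directory next to this script.
$files = glob(__DIR__ . '/pages/*.html');
$all = array();
foreach ($files ? $files : array() as $file) {
    $html = file_get_contents($file);
    if ($html === false || $html === '') {
        continue; // skip unreadable or empty files
    }
    $all[$file] = extractPdfLinks($html);
}

print_r($all);
```

The result is keyed by file name, so it stays clear which link came from which page.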

2011-02-11 20:37:06 ircmaxell
