2014-11-02 23 views
0

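How can I scrape the data between a series of identical strings?

I have a function that scrapes data from a web page. I choose the tags that the data should be scraped between, and I get the result. function.php looks like this: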

<meta http-equiv="Content-Type" content="text/HTML; charset=utf-8" /> 

<?php 

function LoadCURLPage($url, $agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4 Gecko/20030624 Netscape/7.1 (ax)", 
$cookie = '', $referer = '', $post_fields = '', $return_transfer = 1, 
$follow_location = 1, $ssl = '', $curlopt_header = 0) 
{ 
$ch = curl_init(); 

curl_setopt($ch, CURLOPT_URL, $url); 

if($ssl) 
{ 
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); 
} 

curl_setopt ($ch, CURLOPT_HEADER, $curlopt_header); 

if($agent) 
{ 
curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
} 

if($post_fields) 
{ 
curl_setopt($ch, CURLOPT_POST, 1); 
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields); 
} 

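curl_setopt($ch, CURLOPT_RETURNTRANSFER, $return_transfer); // honor the parameter instead of hard-coding 1 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $follow_location); // this parameter was previously accepted but never used 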


if($referer) 
{ 
curl_setopt($ch, CURLOPT_REFERER, $referer); 
} 

if($cookie) 
{ 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); 
} 

$result = curl_exec ($ch); 

curl_close ($ch); 

return $result; 
} 

function extract_unit($string, $start, $end) 
{ 
$pos = stripos($string, $start); 

$str = substr($string, $pos); 

$str_two = substr($str, strlen($start)); 

$second_pos = stripos($str_two, $end); 

$str_three = substr($str_two, 0, $second_pos); 

$unit = trim($str_three); // remove whitespaces 

return $unit; 
} 

?> 

And process.php looks like this:

<?php 

error_reporting (E_ALL^E_NOTICE); 

include 'function.php'; 

// Connect to this url using CURL 

$url1 = 'http://www.remixon.com.tr/remixon.xml'; 


// Let's use cURL to connect to the URL 

$data1 = LoadCURLPage($url1); 


// Extract information between STRING 1 & STRING 2 

$string_one1 = '<SatisFiyati>'; 
$string_two1 = '</SatisFiyati>'; 

$info1 = extract_unit($data1, $string_one1, $string_two1); 

$info1 = duzenL($info1); 

echo $info1; 

?> 

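This process.php echoes only the data scraped from the first tag. But the URL contains 30 of these identical tags, and I need to scrape all of them.

How can I retrieve the data between every pair of identical <SatisFiyati> and </SatisFiyati> tags in a single URL?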

+1

Try using DOM parsing. – 2014-11-02 14:10:39

Answers

1

Instead of processing the raw text, use DOMDocument to load the XML from the remote site. You can then extract all elements by tag name, as in this example:

<?php 
include 'function.php'; 

// Connect to this url using CURL 

$url1 = 'http://www.remixon.com.tr/remixon.xml'; 
$data1 = LoadCURLPage($url1); 

$dom = new DOMDocument; 
$dom->loadXML($data1); 
$items = $dom->getElementsByTagName('SatisFiyati'); 
foreach ($items as $item) { 
    // do something with the data here 
    echo $item->nodeValue, PHP_EOL; 
} 
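
If it helps to try the idea without hitting the remote site, here is a minimal sketch that runs against an inline XML string and collects the values into an array instead of echoing them. The sample XML (including the Urunler and Urun element names) and the helper name extract_all_by_tag() are assumptions made up for illustration; the real feed may be structured differently.

```php
<?php
// Hypothetical sample standing in for the remote feed; the real structure may differ.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<Urunler>
  <Urun><SatisFiyati>10.50</SatisFiyati></Urun>
  <Urun><SatisFiyati>24.00</SatisFiyati></Urun>
  <Urun><SatisFiyati> 7.25 </SatisFiyati></Urun>
</Urunler>
XML;

// Hypothetical helper: returns the trimmed text of every <$tag> element.
function extract_all_by_tag(string $xml, string $tag): array
{
    if ($xml === '') {
        return []; // loadXML() rejects an empty string
    }

    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return []; // malformed XML
    }

    $values = [];
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $values[] = trim($node->nodeValue);
    }
    return $values;
}

$prices = extract_all_by_tag($xml, 'SatisFiyati');
print_r($prices);
```

With the code from the question, the inline string would be replaced by the result of LoadCURLPage($url1).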
+0

Thank you very much, it works perfectly. – 2014-11-02 14:46:41

0

You can use preg_match_all() to return every match of a regular expression.

http://php.net/manual/en/function.preg-match-all.php

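In your case, your extract_unit() function would look like this:

function extract_unit($string, $start, $end) 
{ 
    // Escape both markers: '</SatisFiyati>' contains '/', which would otherwise end the pattern early. 
    $pattern = "/" . preg_quote($start, "/") . "([^<]*)" . preg_quote($end, "/") . "/"; 
    preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER); 
    return $matches[1]; 
} 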

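$matches[0] contains an array of the strings that matched the whole pattern, and $matches[1] contains an array of the strings enclosed by the tags. So what you actually need is $matches[1].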
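
To make the difference between the two arrays concrete, here is a small sketch run against a made-up input string (the sample data is an assumption, not taken from the real feed). It passes both markers through preg_quote() with "/" as the delimiter argument, since the closing tag itself contains a "/", and adds the i flag to roughly mirror the case-insensitive stripos() in the original function.

```php
<?php
// Hypothetical sample input; not taken from the real feed.
$data = '<Urun><SatisFiyati>10.50</SatisFiyati></Urun>'
      . '<Urun><SatisFiyati>24.00</SatisFiyati></Urun>';

$start = '<SatisFiyati>';
$end   = '</SatisFiyati>';

// preg_quote() escapes regex metacharacters and, via its second argument,
// the '/' delimiter that appears inside the closing tag.
$pattern = '/' . preg_quote($start, '/') . '([^<]*)' . preg_quote($end, '/') . '/i';

// PREG_PATTERN_ORDER groups results by capture group rather than by match.
$count = preg_match_all($pattern, $data, $matches, PREG_PATTERN_ORDER);

print_r($matches[0]); // whole matches, tags included
print_r($matches[1]); // only the text captured between the tags
```

Note that ([^<]*) stops at the first "<", so this only suits elements that contain plain text and no nested tags or CDATA sections.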
