2014-05-07 35 views
0

How to get images from a URL as JSON

I am trying to get all the images from a URL (www.xxxxx.co.uk/bar.html) and put them into JSON, like this:

{"images":http://www.xxxxx.co.uk/foo.jpg} 

This is what I have tried:

<?php 
$html = file_get_contents('www.xxxxx.co.uk/bar.html'); 

function linkExtractor($html){ 
$linkArray = array(); 
if(preg_match_all('/<img\s+.*?src=[\"\']?([^\"\' >]*)[\"\']?[^>]*>/i',$html,$matches,PREG_SET_ORDER)){ 
foreach($matches as $match){ 
$arr = array('images' => $match); 
} 
} 
echo json_encode($arr); 
} 

echo json_encode($arr); 
?> 

Edit:

So I tried this:

$page = file_get_contents('www.xxxxx.co.uk/bar.html'); 
$doc = new DOMDocument(); 
$doc->loadHTML($page); 
$images = $doc->getElementsByTagName('img'); 
foreach($images as $image) { 
    $src = $image->getAttribute('src'); 
    $arr = array('images' => $src); 
    echo json_encode($arr); 
} 

and I get these errors:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3188 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3207 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3226 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3245 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : iframe in Entity, line: 3287 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : iframe in Entity, line: 3330 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3351 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3370 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3389 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3389 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3408 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 3408 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3427 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3446 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3466 in /home/content/57/9770557/html/untitled folder/json.php on line 5 

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3485 in /home/content/57/9770557/html/untitled folder/json.php on line 5 
{"images":"loader.gif"}{"images":"logo.png"}{"images":"facebook.png"}{"images":"Yotube.png"}{"images":"twitter.png"}{"images":"Soundcloud.png"}{"images":"1355334348_br_down.png"}{"images":"video images\/ONYX sofa.jpg"}{"images":"video images\/aaron duran.jpg"}{"images":"video images\/littledragon.jpg"}{"images":"video images\/cantalivering house.jpg"}{"images":"video images\/Chef.jpg"}{"images":"video images\/monument valley.jpg"}{"images":"video images\/set a drift t shirts.jpg"}{"images":"video images\/Leica camera.jpg"}{"images":"video images\/Bubbledogs restuarant.jpg"}{"images":"video images\/Architectural density.jpg"}{"images":"video images\/Seven Automatic Landscapes.jpg"}{"images":"video images\/alphabet.jpg"}{"images":"video images\/offices in the forest.jpg"}{"images":"video images\/Environmental Street Art by ROA.jpg"}{"images":"video images\/ Camille Seaman.jpg"}{"images":"video images\/Klaus Pitchler.jpg"}{"images":"video images\/Lowdi.jpg"}{"images":"video images\/Mary OMalley.jpg"}{"images":"video images\/Patricia Piccinini.jpg"}{"images":"video images\/Santa Cruz.jpg"}{"images":"video images\/Sonia Rentsch.jpg"}{"images":"video images\/Studio Natural.jpg"}{"images":"video images\/The Tea Calender.jpg"}{"images":"video images\/Watch Dogs.jpg"}{"images":"video images\/wes21.jpg"}{"images":"video images\/Act Romegialli Architects.jpg"}{"images":"video images\/Romain Jacquet-Lagreze.jpg"}{"images":"video images\/Nicholas Hance McElroy.jpg"}{"images":"video images\/Insa.gif"}{"images":"video images\/Tsatsas bag.jpg"}{"images":"video images\/st pancras.jpg"}{"images":"video images\/anthillfilms.jpg"}{"images":"video images\/mt wolf.jpg"}{"images":"video images\/die.jpg"}{"images":"video images\/jazz that nobody asked for.jpg"}{"images":"video images\/oscilate.jpg"}{"images":"video images\/ghostpoet.jpg"}{"images":"video images\/oak hanger.jpg"}{"images":"video images\/iceball.jpg"}{"images":"video images\/fabian oefner.jpg"}{"images":"video images\/yago 
portal.jpg"}{"images":"video images\/illustrations on bike wheels.jpg"}{"images":"video images\/symmetrees.jpg"}{"images":"video images\/undercity.jpg"}{"images":"video images\/IFHY.jpg"}{"images":"video images\/the abc of architects.jpg"}{"images":"video images\/chum.jpg"}{"images":"video images\/crankworx.jpg"}{"images":"video images\/romare.jpg"}{"images":"video images\/White noise.jpg"}{"images":"video images\/silvestre architects.jpg"}{"images":"video images\/airport.jpg"}{"images":"video images\/feather.jpg"}{"images":"video images\/Nico Van Der Meulen.jpg"}{"images":"video images\/51m trampoline.jpg"}{"images":"video images\/lets talk about soil.jpg"}{"images":"video images\/alberto seveso.jpg"}{"images":"video images\/ibike.jpg"}{"images":"video images\/robs wood grain bike.jpg"}{"images":"video images\/smokehouse.jpg"}{"images":"video images\/laurent chehere.jpg"}{"images":"video images\/SOHN.jpg"}{"images":"video images\/the employment.jpg"}{"images":"video images\/little printer.jpg"}{"images":"video images\/procrastination.jpg"}{"images":"video images\/touchwood commercial.jpg"}{"images":"video images\/fusefones.jpg"}{"images":"video images\/allandale house.jpg"}{"images":"video images\/Spherikal.jpg"}{"images":"video images\/power.jpg"}{"images":"video images\/reykjavik house.jpg"}{"images":"video images\/click&grow.jpg"}{"images":"video images\/sfelt table.jpg"}{"images":"video images\/gopro.jpg"} 
  1. Why do the links contain \/ instead of just /?
  2. Why does it produce multiple objects like {"images":"video images/gopro.jpg"}?
  3. What is the Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 3107 in /home/content/57/9770557/html/untitled folder/json.php on line 5 error?
+0

Possible duplicate of [Screen scraping of image links in PHP](http://stackoverflow.com/questions/3261820/screen-scraping-of-image-links-in-php) –

+0

Are there any error messages, or does it just not work? We need more information! – Chancho

+0

See my answer, it works. – Chancho

Answers

1

You are almost there. The URL is missing its scheme (http://) at the front, and your function should return its value instead of echoing it. Try this:

$html = file_get_contents('http://www.setours.com'); 

function linkExtractor($html){ 
    $imageArr = array(); 
    $doc = new DOMDocument(); 
    @$doc->loadHTML($html); 
    $images = $doc->getElementsByTagName('img'); 
    foreach($images as $image) { 
     array_push($imageArr, $image->getAttribute('src')); 
    } 
    return $imageArr; 
} 

echo json_encode(array("images" => linkExtractor($html))); 

The @ in front of the loadHTML call suppresses the warnings about unknown or malformed HTML.
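The warnings in the question come from libxml's HTML parser, which complains about things like a bare `&` or a stray closing tag but still builds the document. As an alternative to `@`, here is a minimal sketch that uses `libxml_use_internal_errors()` to collect those parse errors internally instead of printing them. The sample markup and the function name `extractImageSources` are made up for illustration and are not part of the answer's code.

```php
<?php
// Hypothetical sample markup: a bare "&" and a stray closing tag, similar to
// what triggered the warnings in the question.
$html = '<html><body><p>Fish & chips</p></iframe>'
      . '<img src="logo.png"><img src="video images/gopro.jpg"></body></html>';

// Hypothetical helper: returns the src attribute of every <img> in $html.
function extractImageSources($html)
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// One json_encode call on the whole array yields a single JSON object.
echo json_encode(array('images' => extractImageSources($html)));
```

Unlike `@`, this only silences libxml's parser messages, so unrelated PHP errors on the same line would still be reported.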

+0

Doesn't this leave out the "images": key in front of the links? – maxisme

+0

I updated my answer. Is this what you meant? – Chancho

+0

Thank you very much @Chancho – maxisme

0

Use a DOM parser to extract information from an HTML document:

function extractImages($url) { 

    // Prepare result 
    $result = array('images' => array()); 

    // Create a document object out of the HTML 
    $doc = new DOMDocument(); 
    if(!@$doc->loadHTMLFile($url)) { 
     throw new Exception('Bad HTML'); 
    } 

    // Iterate through '<img>' elements and store urls 
    foreach($doc->getElementsByTagName('img') as $img) { 
     $result['images'][]= $img->getAttribute('src'); 
    } 

    return json_encode($result); 
} 

I use the silence operator @ when parsing the HTML because HTML from an untrusted source can generate warnings if it is invalid. The @ suppresses them.
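The function above returns a single JSON object because `json_encode` is called once on the whole array, not once per image as in the question's loop. As for the `\/` in the output: `json_encode` escapes forward slashes by default, and the result is still valid JSON. A short sketch, assuming PHP 5.4 or later for the `JSON_UNESCAPED_SLASHES` flag and using a made-up list of sources:

```php
<?php
// Hypothetical list of sources, standing in for values collected from <img> tags.
$result = array('images' => array('logo.png', 'video images/gopro.jpg'));

// Default behaviour: "/" is escaped as "\/", which is valid JSON.
$escaped = json_encode($result);

// With the flag, slashes are left as they are.
$plain = json_encode($result, JSON_UNESCAPED_SLASHES);

echo $escaped, "\n";
echo $plain, "\n";

// Both strings decode to the same data, so the escaping is only cosmetic.
var_dump(json_decode($escaped, true) === json_decode($plain, true));
```

Any JSON parser turns `\/` back into `/`, so the flag only matters if the raw output needs to be readable.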

+0

I get this fatal error: in /home/content/57/9770557/html/untitled folder/json.php:8 Stack trace: #0 {main} thrown in /home/content/57/9770557/html/untitled folder/json.php on line 8' – maxisme

+0

Sorry, it should be `@$doc->loadHTMLFile()` – hek2mgl

+0

Now I get `Fatal error: Call to undefined function json_econde() in /home/content/57/9770557/html/untitled folder/json.php on line 20` – maxisme