Converting an HTML scraper into a URL scraper

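A very helpful person on Stack Overflow got me this far, but I need to convert his code so that it scrapes a URL instead of reading an HTML string. I have tried over and over and keep getting errors. Any ideas?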

function getElementByIdAsString($html, $id, $pretty = true) {
    $doc = new DOMDocument();

    // loadHTML() returns false on failure; @ hides warnings caused by malformed markup
    if (!@$doc->loadHTML($html)) {
        throw new Exception("Failed to parse the HTML");
    }

    $element = $doc->getElementById($id);
    if (!$element) {
        throw new Exception("An element with id $id was not found");
    }

    // get all object tags inside the element (returns a node list)
    $objects = $element->getElementsByTagName('object');
    if ($objects->length === 0) {
        throw new Exception("No object tag was found inside element $id");
    }

    // take the value of the data attribute from the first object tag
    $data = $objects->item(0)->getAttribute('data');

    // cut away everything up to and including the first '=' and return the rest
    return substr($data, strpos($data, '=') + 1);
}

// call it:
$finalcontent = getElementByIdAsString($html, 'mainclass');

print_r($finalcontent);

Source

2015-11-19 Jamie

The errors you mentioned... what are they? – camelCase

It's just blank. Is there a better way for me to see the errors? All of this is new to me. – Jamie

I simply tried to put in a URL to scrape, instead of the $html example the person on Stack Overflow wrote. – Jamie
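On the blank page mentioned in the comments above: one common way to see what went wrong is to turn on error display while developing. This is only a sketch, and it assumes you can edit the script and are not on a production site, where errors are better written to a log.

```php
<?php
// Development only: show every error and warning instead of a blank page.
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
```

Put these lines at the very top of the script, before any other code runs. Parse errors in the same file will still not be shown this way, because the file never starts executing.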

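Remember to wrap the call in try/catch when you use the function, because it may well throw an Exception, and an uncaught exception will result in a 500 server error.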

$finalcontent = getElementByIdAsString($html, 'mainclass');

should become

try { 
    $finalcontent = getElementByIdAsString($html, 'mainclass'); 
}catch(Exception $e){ 
    echo $e->getMessage(); 
}

Source

2015-11-19 17:26:38 Elijah

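Thank you so much, that got rid of the error! Now for the main problem. I need to scrape this data from a URL. How can I convert this code to read a URL instead of the $html it currently uses? – Jamie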

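Depending on your hosting setup, you should be able to call '$html = file_get_contents($url);'. This takes the URL you supply and tries to fetch that document's HTML. If that does not work, you will probably have to look into cURL, which can also fetch the page's HTML. – Elijah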
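The suggestion above could be sketched roughly as follows. The helper name fetchHtml and the 15-second timeout are assumptions made for illustration and are not part of the original code. The function tries file_get_contents first and only falls back to cURL if that fails and the cURL extension is available.

```php
<?php
// Hypothetical helper: return the HTML found at $url, or throw an Exception.
function fetchHtml(string $url): string
{
    // Works for URLs only when allow_url_fopen is enabled on the host.
    $html = @file_get_contents($url);
    if ($html !== false) {
        return $html;
    }

    // Fall back to cURL if the extension is installed.
    if (!function_exists('curl_init')) {
        throw new Exception("Failed to load $url");
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // assumed value; adjust as needed
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new Exception("Failed to load $url: " . curl_error($ch));
    }

    return $html;
}
```

The result can then be handed to the existing function in place of the hard-coded string, for example getElementByIdAsString(fetchHtml($url), 'mainclass'), ideally inside the same try/catch shown in the answer.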

Since it now just shows a white screen, I'm assuming this won't work with WordPress on a custom Linode? – Jamie
