Simple-html-dom skips attributes

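I want to parse a Google Play HTML page and get some information about an app. Simple HTML DOM works perfectly, but if the page contains markup without spaces between attributes, it completely ignores the attributes. For example, I have this HTML: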

<div class="doc-banner-icon"><img itemprop="image"src="https://lh5.ggpht.com/iRd4LyD13y5hdAkpGRSb0PWwFrfU8qfswGNY2wWYw9z9hcyYfhU9uVbmhJ1uqU7vbfw=w124"/></div>

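As you can see, there is no space between image and src, so Simple HTML DOM ignores the src attribute and returns only <img itemprop="image">. If I add the space, it works perfectly. To get this attribute I use the following code: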

foreach ($html->find('div.doc-banner-icon') as $e) {
    foreach ($e->find('img') as $i) {
        $bannerIcon = $i->src;
    }
}

My question is: how can I change this beautiful library so that it returns the full inner text of this div?


2013-06-20 Nolesh

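You can use [PHP's DOMDocument](http://php.net/manual/en/class.domdocument.php) instead of Simple HTML DOM Parser. Otherwise, see the code snippet at http://codepad.org/HdUQKx3l: simply loading the HTML through DOMDocument and saving it again adds the spaces that Simple HTML DOM Parser needs.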
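The DOMDocument route suggested in this comment can be sketched roughly as follows. This is a minimal sketch, assuming PHP's `dom` extension is enabled; it uses the sample markup from the question, and the variable names and the XPath expression are illustrative choices, not taken from the codepad snippet.

```php
<?php
// Sample markup from the question: no space between itemprop="image" and src.
$html = '<div class="doc-banner-icon"><img itemprop="image"src="https://lh5.ggpht.com/iRd4LyD13y5hdAkpGRSb0PWwFrfU8qfswGNY2wWYw9z9hcyYfhU9uVbmhJ1uqU7vbfw=w124"/></div>';

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Option 1: skip Simple HTML DOM and read the attribute with XPath.
$xpath = new DOMXPath($doc);
$bannerIcon = null;
foreach ($xpath->query('//div[@class="doc-banner-icon"]//img') as $img) {
    $bannerIcon = $img->getAttribute('src');
}

// Option 2: re-serialize the markup; the serializer writes the attributes
// separated by spaces, so the result can be parsed by Simple HTML DOM.
$normalized = $doc->saveHTML();
```

With option 2, `$normalized` could then be passed to Simple HTML DOM's `str_get_html()` in place of the raw page source. Note that `saveHTML()` also wraps the fragment in a doctype and `html`/`body` elements, which does not affect a `find('div.doc-banner-icon')` lookup.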

I just wrote a function that adds the necessary spaces to the content:

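function placeNeccessarySpaces($contents) {
    $quotes = 0;     // number of double quotes seen so far
    $flag = false;   // true while we are inside a quoted value
    $newContents = '';
    $length = strlen($contents);
    for ($i = 0; $i < $length; $i++) {
        $newContents .= $contents[$i];
        if ($contents[$i] == '"') $quotes++;
        if ($quotes % 2 == 0) {
            // A quoted value has just closed: add a space unless one already
            // follows. The bounds check avoids reading past the end of the string.
            if ($flag && $i + 1 < $length && $contents[$i + 1] !== ' ') {
                $newContents .= ' ';
            }
            $flag = false;
        } else {
            $flag = true;
        }
    }
    return $newContents;
}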

Then call it on the result of file_get_contents. So:

$contents = file_get_contents($url, $use_include_path, $context, $offset); 
$contents = placeNeccessarySpaces($contents);
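Because the function above counts every double quote in the page, it also inserts spaces after quoted text outside of tags. A narrower variant that only looks inside tags could look like the sketch below. `addSpacesBetweenAttributes` is a hypothetical helper, not part of Simple HTML DOM, and it assumes double-quoted attribute values that contain no `<` or `>` characters.

```php
<?php
// Hypothetical helper: inserts a space after a double-quoted attribute value
// when the next attribute follows it directly.
function addSpacesBetweenAttributes(string $html): string
{
    // Visit each opening tag separately so quotes in text content are untouched.
    return preg_replace_callback(
        '/<[a-zA-Z][^<>]*>/',
        function (array $tag): string {
            // Match every "..." value together with the character right after it.
            return preg_replace_callback(
                '/("[^"]*")(.)/s',
                function (array $m): string {
                    // Whitespace, "/" or ">" after the value needs no fix.
                    $ok = strpbrk($m[2], " \t\r\n/>") !== false;
                    return $ok ? $m[0] : $m[1] . ' ' . $m[2];
                },
                $tag[0]
            ) ?? $tag[0];
        },
        $html
    ) ?? $html;
}

$sample = '<div class="doc-banner-icon"><img itemprop="image"src="icon.png"/></div>';
$fixed = addSpacesBetweenAttributes($sample);
```

It would be used the same way as the function above, on the string returned by `file_get_contents`, before the markup is handed to Simple HTML DOM.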

Hope it helps someone.


2013-06-20 14:20:00 Nolesh
