2011-10-20

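Keeping line breaks when using PHP's DOMDocument appendChild

I'm trying to use DOMDocument in PHP to add to and parse an HTML document. From what I have read, setting formatOutput to true and preserveWhiteSpace to false should keep the tabs and newlines in order, but that does not seem to happen for newly created or appended nodes.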

Here's the code:

$dom = new \DOMDocument; 
$dom->formatOutput = true; 
$dom->preserveWhiteSpace = false; 
$dom->loadHTMLFile($htmlsource); 
$tables = $dom->getElementsByTagName('table'); 
foreach($tables as $table) 
{ 
    $table->setAttribute('class', 'tborder'); 
    $div = $dom->createElement('div'); 
    $div->setAttribute('class', 'm2x'); 
    $table->parentNode->insertBefore($div, $table); 
    $div->appendChild($table); 
} 
$dom->saveHTMLFile($html);

Here's what the HTML looks like:

<table>
    <tr>
        <td></td>
    </tr>
</table>

Here's what I want:

<div class="m2x">
    <table class="tborder">
        <tr>
            <td></td>
        </tr>
    </table>
</div>

Here's what I get:

<div class="m2x"><table class="tborder"><tr> 
<td></td> 
     </tr></table></div> 

Is there something I'm doing wrong? I've tried googling this in as many different ways as I could think of, with no luck.
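A self-contained way to reproduce this is sketched below. It makes a few assumptions that are not in the code above: the HTML comes from an inline string via loadHTML() rather than from a file, the wrapping logic sits in a hypothetical helper named `wrapTables`, and the live node list is copied to an array before nodes are moved.

```php
<?php
// Hypothetical helper: wraps every <table> in <div class="m2x"> and
// returns the number of tables that were wrapped.
function wrapTables(DOMDocument $dom): int
{
    // Snapshot the live node list before moving nodes around.
    $tables = iterator_to_array($dom->getElementsByTagName('table'));

    foreach ($tables as $table) {
        $table->setAttribute('class', 'tborder');
        $div = $dom->createElement('div');
        $div->setAttribute('class', 'm2x');
        // Put the new <div> where the table was, then move the table into it.
        $table->parentNode->insertBefore($div, $table);
        $div->appendChild($table);
    }

    return count($tables);
}

// Assumption: inline sample markup instead of loadHTMLFile($htmlsource).
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // must be set before loading
$dom->formatOutput = true;
$dom->loadHTML("<table>\n    <tr>\n        <td></td>\n    </tr>\n</table>");

$wrapped = wrapTables($dom);
$html = $dom->saveHTML(); // serialised markup, ready to echo or write to a file
```

Echoing `$html` makes it easy to compare the serialised markup with the layout you expect.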

Answers


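Unfortunately, you may need to write a function that indents the output the way you want it. I wrote a small function that you might find helpful.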

function indentContent($content, $tab = "\t")
{
    // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
    $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);

    // now indent the tags
    $token = strtok($content, "\n");
    $result = '';       // holds formatted version as it is built
    $pad = 0;           // current indent level
    $indent = 0;        // change applied to $pad after the current line
    $matches = array(); // filled by preg_match()

    // scan each line and adjust indent based on opening/closing tags
    while ($token !== false)
    {
        $token = trim($token);
        // test for the various tag states

        // 1. open and closing tags on same line - no change
        if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent = 0;
        // 2. closing tag - outdent now
        elseif (preg_match('/^<\/\w/', $token, $matches))
        {
            $pad--;
            if ($indent > 0) $indent = 0;
        }
        // 3. opening tag - don't pad this one, only subsequent tags
        //    ((?<!\/) skips self-closing tags and still accepts one-letter tags such as <p>)
        elseif (preg_match('/^<\w[^>]*(?<!\/)>.*$/', $token, $matches)) $indent = 1;
        // 4. no indentation needed
        else $indent = 0;

        // prefix the line with one $tab per indent level
        $line = str_repeat($tab, max(0, $pad)) . $token;
        $result .= $line . "\n"; // add to the cumulative result, with linefeed
        $token = strtok("\n");   // get the next token
        $pad += $indent;         // update the pad size for subsequent lines
    }

    return $result;
}

indentContent($dom->saveHTML()) will return:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> 
<html>
    <body>
        <div class="m2x">
            <table class="tborder">
                <tr>
                    <td>
                    </td>
                </tr>
            </table>
        </div>
    </body>
</html>

I created this function starting from this one.
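The first step of indentContent can be tried on its own. The sketch below shows how the preg_replace call splits markup that was serialised on one line into one tag per line, before any indentation is applied. The sample string and the variable names are illustrative assumptions.

```php
<?php
// Illustrative input: markup serialised on a single line.
$html = '<div class="m2x"><table class="tborder"><tr><td></td></tr></table></div>';

// Insert a linefeed at every ">" / "<" boundary, as indentContent() does first.
$withBreaks = preg_replace('/(>)(<)(\/*)/', '$1' . "\n" . '$2$3', $html);

// One tag per line; the indenter then walks these lines in order.
$lines = explode("\n", $withBreaks);
```

Text between tags is left alone, because the pattern only matches a `>` that is immediately followed by `<`.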


Great function! Unfortunately, it indents too far when void elements are used. – Stan


I modified the great function that ghbarratt wrote so that it does not indent after void elements:

function indentContent($content, $tab = "\t")
{
    // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
    $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);

    // now indent the tags
    $token = strtok($content, "\n");
    $result = '';       // holds formatted version as it is built
    $pad = 0;           // current indent level
    $indent = 0;        // change applied to $pad after the current line
    $matches = array(); // filled by preg_match()

    // scan each line and adjust indent based on opening/closing tags
    while ($token !== false)
    {
        $token = trim($token);
        // test for the various tag states

        // 1. open and closing tags on same line - no change
        if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent = 0;
        // 2. closing tag - outdent now
        elseif (preg_match('/^<\/\w/', $token, $matches))
        {
            $pad--;
            if ($indent > 0) $indent = 0;
        }
        // 3. opening tag - don't pad this one, only subsequent tags (only if it isn't a void tag)
        elseif (preg_match('/^<\w[^>]*(?<!\/)>.*$/', $token, $matches))
        {
            // Void elements according to http://www.htmlandcsswebdesign.com/articles/voidel.php
            // (\b keeps <colgroup> from being mistaken for <col>)
            $voidTag = (bool) preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)\b/i', $token);

            // a void tag has no closing tag, so it must not deepen the indent
            $indent = $voidTag ? 0 : 1;
        }
        // 4. no indentation needed
        else $indent = 0;

        // prefix the line with one $tab per indent level
        $line = str_repeat($tab, max(0, $pad)) . $token;
        $result .= $line . "\n"; // add to the cumulative result, with linefeed
        $token = strtok("\n");   // get the next token
        $pad += $indent;         // update the pad size for subsequent lines
    }

    return $result;
}

All credit goes to ghbarratt.
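The void-element test can also live in its own helper, which makes it easy to check against sample tags. This is only a sketch: `isVoidTag` is a hypothetical name, and the trailing `\b` is an added assumption so that `<colgroup>` is not mistaken for `<col>`.

```php
<?php
// Hypothetical helper: true when $token starts with a void element's opening tag.
function isVoidTag(string $token): bool
{
    $void = 'area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr';

    // \b stops "col" from matching "<colgroup>"; the i flag also accepts "<BR>".
    return preg_match('/^<(' . $void . ')\b/i', $token) === 1;
}

// Illustrative sample tokens, one tag each.
$samples = array('<br>', '<img src="a.png">', '<colgroup>', '<table class="tborder">');
$flags = array_map('isVoidTag', $samples);
```

Keeping the element list in one place also makes it simple to adjust if your markup uses other tags that have no closing tag.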


+1 for a way to make it better. I hardly deserve the credit, since I took most of it from http://recursive-design.com/blog/2007/04/05/format-xml-with-php/ – ghbarratt