2013-03-11 74 views
4

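Parsing XML with PHP and XMLReader

I have been trying to parse a very large XML file with PHP and XMLReader, but I can't seem to get the results I'm looking for. Basically, I'm searching through a large amount of information, and if a group contains a certain postal code, I'd like to return that bit of XML; otherwise I keep searching until the postal code is found. In essence, I'll be breaking this big file down into a few small chunks, so instead of having to look through thousands or millions of groups of information, it might be 10 or 20.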

Here is a bit of the XML, annotated with what I want to do:

//search through xml 
<lineups country="USA"> 
//cache TX02217 as a variable 
<headend headendId="TX02217"> 
//cache Grande Gables at The Terrace as a variable 
    <name>Grande Gables at The Terrace</name> 
//cache Grande Communications as a variable 
    <mso msoId="17541">Grande Communications</mso> 
    <marketIds> 
    <marketId type="DMA">635</marketId> 
    </marketIds> 
//check to see if any of the postal codes are equal to $pc variable that will be set in the php 
    <postalCodes> 
    <postalCode>11111</postalCode> 
    <postalCode>22222</postalCode> 
    <postalCode>33333</postalCode> 
    <postalCode>78746</postalCode> 
    </postalCodes> 
//cache Austin to a variable 
    <location>Austin</location> 
    <lineup> 
//cache all prgSvcID's to an array i.e. 20014, 10722 
    <station prgSvcId="20014"> 
//cache all channels to an array i.e. 002, 003 
    <chan effDate="2006-01-16" tier="1">002</chan> 
    </station> 
    <station prgSvcId="10722"> 
    <chan effDate="2006-01-16" tier="1">003</chan> 
    </station> 
    </lineup> 
    <areasServed> 
    <area> 
//cache community to a variable $community 
    <community>Thorndale</community> 
    <county code="45331" size="D">Milam</county> 
//cache state to a variable i.e. TX 
    <state>TX</state> 
    </area> 
    <area> 
    <community>Thrall</community> 
    <county code="45491" size="B">Williamson</county> 
    <state>TX</state> 
    </area> 
    </areasServed> 
</headend> 

//if any of the postal codes matched $pc 
//echo back the xml from <headend> to </headend> 

//if none of the postal codes matched $pc 
//clear variables and move to next <headend> 

<headend> 
etc 
etc 
etc 
</headend> 
<headend> 
etc 
etc 
etc 
</headend> 
<headend> 
etc 
etc 
etc 
</headend> 
</lineups> 

PHP:

<?php 
$pc = "78746"; 
$xmlfile="myFile.xml"; 
$reader = new XMLReader(); 
$reader->open($xmlfile); 

while ($reader->read()) { 
//search to see if groups contain $pc and echo info 
} 

I know I'm making this harder than it should be, but I'm a little overwhelmed trying to manipulate such a large file. Any help is appreciated.

+0

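What are you actually looking for in that chunk of XML? XPath is your friend. Are you just trying to see whether it contains a predetermined value? – mkaatman 2013-03-11 18:15:28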

+0

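Sort of. If I search this large file and a chunk contains the predetermined postal code, then I basically want to return that chunk. It would reduce the size of this massive file to about 2%. I would still be returning XML, but the amount I would have to reference would be greatly reduced. – user1129107 2013-03-11 18:21:14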

Answers

0

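Edit: Oh, you want to return the parent block? One moment.

Here is an example that uses XPath to pull the matching postalCode elements out into a node list.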

http://codepad.org/kHss4MdV

<?php 

$string='<lineups country="USA"> 
<headend headendId="TX02217"> 
    <name>Grande Gables at The Terrace</name> 
    <mso msoId="17541">Grande Communications</mso> 
    <marketIds> 
    <marketId type="DMA">635</marketId> 
    </marketIds> 
    <postalCodes> 
    <postalCode>11111</postalCode> 
    <postalCode>22222</postalCode> 
    <postalCode>33333</postalCode> 
    <postalCode>78746</postalCode> 
    </postalCodes> 
    <location>Austin</location> 
    <lineup> 
    <station prgSvcId="20014"> 
    <chan effDate="2006-01-16" tier="1">002</chan> 
    </station> 
    <station prgSvcId="10722"> 
    <chan effDate="2006-01-16" tier="1">003</chan> 
    </station> 
    </lineup> 
    <areasServed> 
    <area> 
    <community>Thorndale</community> 
    <county code="45331" size="D">Milam</county> 
    <state>TX</state> 
    </area> 
    <area> 
    <community>Thrall</community> 
    <county code="45491" size="B">Williamson</county> 
    <state>TX</state> 
    </area> 
    </areasServed> 
</headend></lineups>'; 

$dom = new DOMDocument(); 
$dom->loadXML($string); 

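$xpath = new DOMXPath($dom); 
// Compare as a string so postal codes with leading zeros are not coerced to numbers. 
$elements = $xpath->query("//lineups/headend/postalCodes/*[text()='78746']"); 

// DOMXPath::query() returns false (not null) when the expression is invalid. 
if ($elements !== false) { 
    foreach ($elements as $element) { 
        echo "<br/>[" . $element->nodeName . "]"; 

        // Print the text content of each matching element. 
        foreach ($element->childNodes as $node) { 
            echo $node->nodeValue . "\n"; 
        } 
    } 
} 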

Output:

<br/>[postalCode]78746 
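
Since the question asks for the whole parent block rather than the postal code itself, the same DOM approach can be pointed at the `<headend>` element instead. The following is only a sketch under assumptions: the document is small enough to load into memory (which the asker says is not true of the real file), and the second headend (`XX00000`) is a made-up entry added purely to show that non-matching blocks are skipped.

```php
<?php
// Small stand-in document; the second <headend> is hypothetical test data.
$xml = <<<XML
<lineups country="USA">
  <headend headendId="TX02217">
    <name>Grande Gables at The Terrace</name>
    <postalCodes>
      <postalCode>11111</postalCode>
      <postalCode>78746</postalCode>
    </postalCodes>
  </headend>
  <headend headendId="XX00000">
    <name>Hypothetical Other Headend</name>
    <postalCodes>
      <postalCode>99999</postalCode>
    </postalCodes>
  </headend>
</lineups>
XML;

$pc = '78746';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select every <headend> that has at least one matching <postalCode>.
$matches = $xpath->query("/lineups/headend[postalCodes/postalCode = '$pc']");

$blocks = [];
foreach ($matches as $headend) {
    // saveXML($node) serializes only that node and its descendants.
    $blocks[] = $dom->saveXML($headend);
}

echo implode("\n", $blocks), "\n";
```

The predicate on `headend` selects the parent directly, so no walking back up from the `postalCode` node is needed.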
+0

Would it be something like `if (count($nodes)) { echo $string; }` instead of the foreach, or is there more to it? – mkaatman 2013-03-11 18:37:41

+0

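Because the file is so large (possibly a gigabyte or more), I think the best way to tackle it is to go node by node with XMLReader. I can't preload the file because it is too big. I'm not so much interested in printing the postal code as in the other information contained in the block. I want to check whether a block contains a certain postal code and, if it does, echo back the whole block. – user1129107 2013-03-11 18:46:50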

6

To get more flexibility out of XMLReader, I usually create my own iterators that are able to work on the XMLReader object and provide the steps I need.

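These range from a simple iteration over all nodes up to an iteration over elements (optionally with a specific name). Let's call the last one XMLElementIterator; it takes the reader and the element name as parameters.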
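
To make the idea of an element iterator concrete, here is a minimal sketch written as a generator. It is not the implementation from the linked gist: `iterateElements()` is a hypothetical helper name, and the tiny inline document is made up for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of the linked gist): yields the reader
 * each time it is positioned on an element, optionally filtered by name.
 */
function iterateElements(XMLReader $reader, ?string $name = null): Generator
{
    while ($reader->read()) {
        // Skip text, whitespace, comments, end tags and so on.
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        if ($name === null || $reader->name === $name) {
            yield $reader;
        }
    }
}

$reader = new XMLReader();
$reader->XML('<root><a>1</a><b>2</b><a>3</a></root>');

$values = [];
foreach (iterateElements($reader, 'a') as $r) {
    // readString() returns the text content of the current element.
    $values[] = $r->readString();
}
$reader->close();

echo implode(', ', $values), "\n";
```

The generator keeps the `read()` loop in one place, so the calling code only sees the elements it asked for.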

In your case I would then create an iterator that returns a SimpleXMLElement for the current element and only considers the <headend> elements:

require('xmlreader-iterators.php'); // https://gist.github.com/hakre/5147685 

class HeadendIterator extends XMLElementIterator { 
    const ELEMENT_NAME = 'headend'; 

    public function __construct(XMLReader $reader) { 
     parent::__construct($reader, self::ELEMENT_NAME); 
    } 

    /** 
    * @return SimpleXMLElement 
    */ 
    public function current() { 
     return simplexml_load_string($this->reader->readOuterXml()); 
    } 
} 

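Equipped with this iterator, the rest of the job is mostly a piece of cake. First open the 10 gigabyte file (XMLReader streams it rather than loading it into memory):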

$pc  = "78746"; 

$xmlfile = '../data/lineups.xml'; 
$reader = new XMLReader(); 
$reader->open($xmlfile); 

Then check whether each <headend> element contains the postal code and, if so, display the data / XML:

foreach (new HeadendIterator($reader) as $headend) { 
    /* @var $headend SimpleXMLElement */ 
    if (!$headend->xpath("/*/postalCodes/postalCode[. = '$pc']")) { 
     continue; 
    } 

    echo 'Found, name: ', $headend->name, "\n"; 
    echo "==========================================\n"; 
    $headend->asXML('php://stdout'); 
} 

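This does literally what you are trying to achieve: iterate over the large document (which is memory-friendly) until you find the element(s) of interest. You then process the concrete element, which is just XML; XMLReader::readOuterXml() is a good tool here.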

Example output:

Found, name: Grande Gables at The Terrace 
========================================== 
<?xml version="1.0"?> 
<headend headendId="TX02217"> 
     <name>Grande Gables at The Terrace</name> 
     <mso msoId="17541">Grande Communications</mso> 
     <marketIds> 
      <marketId type="DMA">635</marketId> 
     </marketIds> 
     <postalCodes> 
      <postalCode>11111</postalCode> 
      <postalCode>22222</postalCode> 
      <postalCode>33333</postalCode> 
      <postalCode>78746</postalCode> 
     </postalCodes> 
     <location>Austin</location> 
     <lineup> 
      <station prgSvcId="20014"> 
       <chan effDate="2006-01-16" tier="1">002</chan> 
      </station> 
      <station prgSvcId="10722"> 
       <chan effDate="2006-01-16" tier="1">003</chan> 
      </station> 
     </lineup> 
     <areasServed> 
      <area> 
       <community>Thorndale</community> 
       <county code="45331" size="D">Milam</county> 
       <state>TX</state> 
      </area> 
      <area> 
       <community>Thrall</community> 
       <county code="45491" size="B">Williamson</county> 
       <state>TX</state> 
      </area> 
     </areasServed> 
    </headend> 
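
For readers who would rather not pull in the iterator library, the same streaming idea can be sketched with plain XMLReader calls. This is a simplified stand-in, not the answer's code: the function name `findHeadendsByPostalCode()` and the second headend (`XX00000`) are hypothetical, and the sketch assumes all `<headend>` elements are siblings, as in the question's sample. For a real file, `$reader->open($xmlfile)` would replace the inline `XML()` call.

```php
<?php
/**
 * Hypothetical helper: streams through the document and returns the XML
 * of every <headend> that contains the given postal code.
 */
function findHeadendsByPostalCode(XMLReader $reader, string $pc): array
{
    $found = [];

    // Advance to the first <headend> element.
    $positioned = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'headend') {
            $positioned = true;
            break;
        }
    }

    while ($positioned) {
        // Only the current <headend> block is parsed into memory.
        $headend = simplexml_load_string($reader->readOuterXml());
        // Assumes $pc contains no quote characters.
        if ($headend->xpath("postalCodes/postalCode[. = '$pc']")) {
            $found[] = $headend->asXML();
        }
        // Jump to the next sibling <headend>, skipping the current subtree.
        $positioned = $reader->next('headend');
    }

    return $found;
}

$xml = '<lineups country="USA">'
     . '<headend headendId="TX02217"><name>Grande Gables at The Terrace</name>'
     . '<postalCodes><postalCode>78746</postalCode></postalCodes></headend>'
     . '<headend headendId="XX00000"><name>Hypothetical Other Headend</name>'
     . '<postalCodes><postalCode>99999</postalCode></postalCodes></headend>'
     . '</lineups>';

$reader = new XMLReader();
$reader->XML($xml);
$found = findHeadendsByPostalCode($reader, '78746');
$reader->close();

echo implode("\n", $found), "\n";
```

Using `next('headend')` instead of `read()` inside the loop avoids stepping through every child node of a block that has already been checked.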
+0

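I think you nailed it. This is exactly what I was trying to do. However, I'm not that familiar with PHP and can't quite follow your example. Could you simplify it a bit? If you don't have time, I'll keep trying to understand it. Thanks for your reply! – user1129107 2013-03-12 03:08:30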

+0

I copied your example. In my main PHP file I have include('iterator.php'); however, I get the following error: Fatal error: Class 'XMLElementIterator' not found in iterator.php – user1129107 2013-03-12 18:44:17

+0

How can I use the parent 'XMLElementIterator' class without creating a new class? – 2013-08-26 17:03:45