2012-10-01 31 views
0

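How can I remove duplicates from a SimpleXML multidimensional array?

I have a SimpleXML object built by merging several XML results from PubMed (snippet below), but the merge contains duplicates. How can I compare all the first-level child arrays (array[][0], array[][1], and so on) and discard any duplicates? Serialization might be the answer, but afaik you can't serialize SimpleXML objects.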

I don't know where to start.

Array 
(
    [0] => Array 
    (
     [title] => SimpleXMLElement Object 
      (
       [0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems. 
      ) 

     [link] => SimpleXMLElement Object 
      (
       [@attributes] => Array 
        (
         [Version] => 1 
        ) 

       [0] => 23010931 
      ) 

     [author] => Aylett, CH., Löwe, J. 
     [journal] => SimpleXMLElement Object 
      (
       [0] => Proc. Natl. Acad. Sci. U.S.A. 
      ) 

     [pubdate] => 2012-9-27 
     [day] => SimpleXMLElement Object 
      (
       [0] => 25 
      ) 

     [month] => SimpleXMLElement Object 
      (
       [0] => Sep 
      ) 

     [year] => SimpleXMLElement Object 
      (
       [0] => 2012 
      ) 

    ) 
    [1] => Array 
    (
     [title] => SimpleXMLElement Object 
      (
       [0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems. 
      ) 

     [link] => SimpleXMLElement Object 
      (
       [@attributes] => Array 
        (
         [Version] => 1 
        ) 

       [0] => 23010931 
      ) 

     [author] => Aylett, CH., Löwe, J. 
     [journal] => SimpleXMLElement Object 
      (
       [0] => Proc. Natl. Acad. Sci. U.S.A. 
      ) 

     [pubdate] => 2012-9-27 
     [day] => SimpleXMLElement Object 
      (
       [0] => 25 
      ) 

     [month] => SimpleXMLElement Object 
      (
       [0] => Sep 
      ) 

     [year] => SimpleXMLElement Object 
      (
       [0] => 2012 
      ) 

    ) 

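Alternatively, this could be done at the initial XML merge stage. I'm currently using the code below; can anyone suggest how to modify it to remove duplicates?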

function simplexml_merge (SimpleXMLElement &$xml1, SimpleXMLElement $xml2) { 
    $dom1 = new DomDocument(); 
    $dom2 = new DomDocument(); 

    $dom1->loadXML($xml1->asXML()); 
    $dom2->loadXML($xml2->asXML()); 

    $xpath = new domXPath($dom2); 
    $xpathQuery = $xpath->query('/*/*'); 
    for ($i = 0; $i < $xpathQuery->length; $i++) { 
     $dom1->documentElement->appendChild(
     $dom1->importNode($xpathQuery->item($i), true)); 
    } 
    $xml1 = simplexml_import_dom($dom1); 
} 

$xml1 = new SimpleXMLElement($search1); 
$xml2 = new SimpleXMLElement($search2); 

simplexml_merge($xml1, $xml2); 

Thanks.

... ...

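For clarity, here is the layout of the XML source I'm importing into SimpleXML. Each PubmedArticle is an "element" that I want to compare, making sure there are no duplicates: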

<xml...> 
    <Document> 
     <PubmedArticle> 
      <MedlineCitation> 
       <PMID version="1">xxx</PMID> 
       ... 
      </MedlineCitation> 
      ... 
     </PubmedArticle> 
     <PubmedArticle> 
      <MedlineCitation> 
       <PMID version="1">xxx</PMID> 
       ... 
      </MedlineCitation> 
      ... 
     </PubmedArticle> 
     etc 
    </Document> 
    </xml> 

The PMID node is unique, so it can be used to check for duplicates.

... ...

Using the link from @Gordon, I'm now using:

//Get my source XML 
$xml1 = new SimpleXMLElement($search1); 
$xml2 = new SimpleXMLElement($search2); 

//Run through $xml1 and build a query based on its PMIDs 
$query = array(); 
foreach ($xml1->PubmedArticle as $paper) { 
    $query[] = sprintf('(PMID != %s)',$paper->MedlineCitation->PMID); 
} 
$query = implode(' and ', $query); 

//Run through $xml2 and get node which don't have PMID matching $xml1 
foreach ($xml2->xpath(sprintf('PubmedArticle/MedlineCitation[%s]', $query)) as $paper) { 
    echo $paper->asXml(); 
} 

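However, I still have one problem: merging the output. The output from $xml2 is missing the <PubmedArticle> node that should wrap each match. I then thought I could use the same merge code (above) to do the merge. Can you point me in the right direction?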
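A note on the missing wrapper: the XPath in the snippet above ends at MedlineCitation, so that is the node each match returns. Below is a minimal sketch, using made-up PMIDs in place of the real search results, of one way to get the whole article back: keep the same predicate and step up to the parent with `/..`.

```php
<?php
// Hypothetical sample data standing in for $search2; the PMIDs are made up.
$xml2 = new SimpleXMLElement(
    '<Document>'
    . '<PubmedArticle><MedlineCitation><PMID Version="1">111</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID Version="1">222</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>'
);

// Same kind of predicate as in the question, but "/.." steps back up to the
// parent, so each match is a whole <PubmedArticle>, not its <MedlineCitation>.
$matches = $xml2->xpath('PubmedArticle/MedlineCitation[(PMID != 111)]/..');

foreach ($matches as $paper) {
    echo $paper->getName(), ': ', $paper->MedlineCitation->PMID, "\n";
}
```

Moving the predicate onto PubmedArticle itself, with the path to PMID written inside the brackets, should select the same nodes.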

+1

Have a look at http://stackoverflow.com/questions/6640255/show-the-differences-between-2-xml-files-with-php/6641021#6641021 for an answer to your question – Gordon

+0

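@Gordon - great, thanks. That really helped. I have one more question: can you give me a pointer on how to merge it? I think that instead of the `echo` in the second `foreach`, I need to do something similar to my existing merge code, but with `addChild` creating the wrapper node before the results are added? – phil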

Answers

0

I decided to follow @Gordon's route, since it preserves the XML. In the end I got it all working with this:

//function to check 2 xml inputs for duplicate nodes 
    function dedupeXML($xml1, $xml2) { 
     $query = array(); 
     foreach ($xml1->PubmedArticle as $paper) { 
      $query[] = sprintf('(MedlineCitation/PMID != %s)',$paper->MedlineCitation->PMID); 
     } 
     $query = implode(' and ', $query); 

     $xmlClean = '<Document>'; 
     foreach ($xml2->xpath(sprintf('PubmedArticle[%s]', $query)) as $paper) { 
      $xmlClean .= $paper->asXML(); 
     } 
     $xmlClean .= '</Document>'; 
     $xmlClean = new SimpleXMLElement($xmlClean); 
     return $xmlClean; 
    } 
//function to merge 2 xml inputs 
    function mergeXML (SimpleXMLElement &$xml1, SimpleXMLElement $xml2) { 
     // convert SimpleXML objects into DOM ones 
     $dom1 = new DomDocument(); 
     $dom2 = new DomDocument(); 
     $dom1->loadXML($xml1->asXML()); 
     $dom2->loadXML($xml2->asXML()); 
     // pull all child elements of second XML 
     $xpath = new domXPath($dom2); 
     $xpathQuery = $xpath->query('/*/*'); 
     for ($i = 0; $i < $xpathQuery->length; $i++) { 
      // and pump them into first one 
      $dom1->documentElement->appendChild(
      $dom1->importNode($xpathQuery->item($i), true)); 
     } 
     $xml = simplexml_import_dom($dom1); 
     return $xml; 
    } 

    $xml1 = new SimpleXMLElement($search1); 
    $xml2 = new SimpleXMLElement($search2); 
    $xml3 = new SimpleXMLElement($search3); 
    //dedupe and merge inputs 
    //input 1 & 2 
    $xml2Clean = dedupeXML($xml1, $xml2); 
    $xml12 = mergeXML($xml1, $xml2Clean); 
    //input 1+2 & 3 
    $xml3Clean = dedupeXML($xml12, $xml3); 
    $xml123 = mergeXML($xml12, $xml3Clean); 

This should be easy to adapt to other data sources: just modify the dedupeXML function to match the structure of your XML.
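As a rough sketch of that adaptation (not the code from the answer above), the dedupe and merge steps could be folded into one helper that takes the key location as a parameter. The function name `mergeUniqueByKey`, its parameters, and the sample PMIDs are all made up for illustration. It tracks seen keys in a PHP array instead of building a long XPath predicate.

```php
<?php
/**
 * Return a copy of $target with every child of $source's root appended,
 * skipping children whose key is already present.
 * $keyPath is an XPath, relative to each child, that yields the unique key.
 * (Hypothetical helper; names are illustrative only.)
 */
function mergeUniqueByKey(SimpleXMLElement $target, SimpleXMLElement $source, string $keyPath): SimpleXMLElement
{
    // Work on a copy so the caller's $target is left untouched.
    $merged = new SimpleXMLElement($target->asXML());

    // Record the keys that are already present in the target.
    $seen = [];
    foreach ($merged->children() as $child) {
        $key = $child->xpath($keyPath);
        if ($key) {
            $seen[(string) $key[0]] = true;
        }
    }

    $targetDom = dom_import_simplexml($merged);
    foreach ($source->children() as $child) {
        $key = $child->xpath($keyPath);
        $id = $key ? (string) $key[0] : null;
        if ($id !== null && isset($seen[$id])) {
            continue; // duplicate key, skip this child
        }
        if ($id !== null) {
            $seen[$id] = true; // also catches repeats inside $source
        }
        // Deep-copy the node into the target document and append it.
        $targetDom->appendChild(
            $targetDom->ownerDocument->importNode(dom_import_simplexml($child), true)
        );
    }
    return $merged;
}

// Made-up inputs: 111 appears in both, 222 appears twice in the second.
$a = new SimpleXMLElement(
    '<Document><PubmedArticle><MedlineCitation><PMID>111</PMID></MedlineCitation></PubmedArticle></Document>'
);
$b = new SimpleXMLElement(
    '<Document>'
    . '<PubmedArticle><MedlineCitation><PMID>111</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID>222</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID>222</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>'
);

$merged = mergeUniqueByKey($a, $b, 'MedlineCitation/PMID');
echo count($merged->PubmedArticle), " articles after merging\n";
```

For a different feed, only the `$keyPath` argument would need to change, assuming each record is a direct child of the root element.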

1

Convert it to an array (I'm not going to write that for you; just iterate and add), then use array_diff().
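A minimal sketch of what that could look like, assuming the PubmedArticle/MedlineCitation/PMID layout shown in the question. The helper name `articlesByPmid` and the PMIDs are made up. It also swaps array_diff() for array_diff_key(), since keying each article by its PMID lets the comparison run on keys alone.

```php
<?php
// Hypothetical helper: map each article's PMID to that article's XML string.
function articlesByPmid(SimpleXMLElement $doc): array
{
    $rows = [];
    foreach ($doc->PubmedArticle as $paper) {
        $rows[(string) $paper->MedlineCitation->PMID] = $paper->asXML();
    }
    return $rows;
}

// Made-up inputs: PMID 111 appears in both documents.
$first = articlesByPmid(new SimpleXMLElement(
    '<Document><PubmedArticle><MedlineCitation><PMID>111</PMID></MedlineCitation></PubmedArticle></Document>'
));
$second = articlesByPmid(new SimpleXMLElement(
    '<Document>'
    . '<PubmedArticle><MedlineCitation><PMID>111</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID>222</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>'
));

// Keep only the entries of $second whose PMID is not a key of $first.
$onlyInSecond = array_diff_key($second, $first);

// The + operator keeps keys, so this is the de-duplicated union.
$all = $first + $onlyInSecond;

print_r(array_keys($all));
```

Because the values are plain strings, the result can be written back into a wrapper element or parsed again with SimpleXMLElement as needed.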

+0

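Before merging, you mean? I tried converting each XML source into an array and then using array_diff, but afaik that doesn't work on multidimensional arrays. Am I missing something obvious? – phil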

+1

Yes, you'd have to write something recursive or use this: array merge recursive distinct. http://us.php.net/manual/en/function.array-merge-recursive.php#92195 – wesside
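A different route from the recursive function linked above, sketched under the assumption that each row is one level deep, as in the dump in the question: cast every SimpleXMLElement field to a string, then serialize whole rows so array_unique() can compare them. The sample values are made up.

```php
<?php
// Rows shaped like the dump in the question, with made-up values.
$rows = [
    [
        'title'   => new SimpleXMLElement('<title>Some paper</title>'),
        'link'    => new SimpleXMLElement('<PMID Version="1">111</PMID>'),
        'pubdate' => '2012-9-27',
    ],
    [
        'title'   => new SimpleXMLElement('<title>Some paper</title>'),
        'link'    => new SimpleXMLElement('<PMID Version="1">111</PMID>'),
        'pubdate' => '2012-9-27',
    ],
    [
        'title'   => new SimpleXMLElement('<title>Other paper</title>'),
        'link'    => new SimpleXMLElement('<PMID Version="1">222</PMID>'),
        'pubdate' => '2012-9-28',
    ],
];

// Cast every field to a string so the rows hold only scalars
// (SimpleXMLElement objects themselves cannot be serialized).
$plain = array_map(
    function (array $row): array {
        return array_map(
            function ($value): string {
                return (string) $value;
            },
            $row
        );
    },
    $rows
);

// Serialize each row, drop repeated strings, then restore and reindex the rows.
$unique = array_values(
    array_map('unserialize', array_unique(array_map('serialize', $plain)))
);

print_r($unique);
```

Casting to strings drops attributes such as Version, so this only suits cases where the text content is all that matters.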