2013-07-16

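Importing ONIX XML into a DataSet while ignoring HTML tags

A while ago, I wrote a process for importing ONIX files into a retail database system. (ONIX is an XML standard that publishers use to distribute their catalogue information.) The process imports the XML file directly into a DataSet and works well for most of the files we receive, but there are occasional exceptions.

In this particular case, the file I am trying to import contains HTML tags in the product description field, which confuses the standard DataSet.ReadXml() method because it tries to interpret the HTML tags as XML. Some ONIX files include CDATA tags that avoid this problem, but in this case the publisher has chosen to use a tag attribute to specify that the field is in HTML format, for example: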

<othertext> 
     <d102>03</d102> 
     <d104 textformat="05"> 
      <p>Enter a world where bloody battles, and heroic deeds combine in the historic struggle to unite Britain in the face of a common enemy.</p> 
      <p>The third instalment in Bernard Cornwell’s King Alfred series, follows on from the outstanding previous novels The Last Kingdom and The Pale Horseman.</p> 
      <p>The year is 878 and the Vikings have been thrown out of Wessex. Uhtred, fresh from fighting for Alfred in the battle to free Wessex, travels north to seek revenge for his father's death, killed in a bloody raid by Uhtred's old enemy, renegade Danish lord, Kjartan.</p> 
      <p>While Kjartan lurks in his formidable stronghold of Dunholm, the north is overrun by chaos, rebellion and fear. Together with a small band of warriors, Uhtred plans his attack on his enemy, revenge fuelling his anger, resolute on bloody retribution. But, he finds himself betrayed and ends up on a desperate slave voyage to Iceland. Rescued by a remarkable alliance of old friends and enemies, he and his allies, together with Alfred the Great, are free to fight once more in a battle for power, glory and honour.</p> 
      <p>‘The Lords of the North’ is a tale of England's making, a powerful story of betrayal, struggle and romance, set in an England torn apart by turmoil and upheaval.</p> 
     </d104> 
    </othertext> 

The textformat="05" attribute indicates HTML.

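Without writing custom code to interpret the HTML, is it still possible to import this with ReadXml(), or do I need to insert CDATA tags programmatically first to work around it?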

Note: I don't want to strip the HTML tags, because the data will be displayed on a website.

1 Answer

Here is a LINQPad program that should find the textformat="05" nodes and wrap their contents in a CDATA section. See also this Stack Overflow post.

void Main() 
{ 
    string xml = @"<othertext> 
      <d102>03</d102> 
      <d104 textformat=""05""> 
       <p>Enter a world where bloody battles, and heroic deeds combine in the historic struggle to unite Britain in the face of a common enemy.</p> 
       <p>The third instalment in Bernard Cornwell’s King Alfred series, follows on from the outstanding previous novels The Last Kingdom and The Pale Horseman.</p> 
       <p>The year is 878 and the Vikings have been thrown out of Wessex. Uhtred, fresh from fighting for Alfred in the battle to free Wessex, travels north to seek revenge for his father's death, killed in a bloody raid by Uhtred's old enemy, renegade Danish lord, Kjartan.</p> 
       <p>While Kjartan lurks in his formidable stronghold of Dunholm, the north is overrun by chaos, rebellion and fear. Together with a small band of warriors, Uhtred plans his attack on his enemy, revenge fuelling his anger, resolute on bloody retribution. But, he finds himself betrayed and ends up on a desperate slave voyage to Iceland. Rescued by a remarkable alliance of old friends and enemies, he and his allies, together with Alfred the Great, are free to fight once more in a battle for power, glory and honour.</p> 
       <p>‘The Lords of the North’ is a tale of England's making, a powerful story of betrayal, struggle and romance, set in an England torn apart by turmoil and upheaval.</p> 
      </d104> 
     </othertext>"; 

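    // Parse the XML, then select every child of <othertext> flagged as HTML (textformat="05").
    XmlDocument xmlDoc = new XmlDocument(); 
    xmlDoc.LoadXml(xml); 
    var nodes = xmlDoc.SelectNodes("//othertext/*[@textformat='05']"); 
    foreach(XmlNode node in nodes) 
    { 
        // Capture the inner markup as a string and wrap it in a CDATA section.
        var cdata = xmlDoc.CreateCDataSection(node.InnerXml); 
        // Clear the parsed child elements, then attach the CDATA in their place.
        node.InnerText = string.Empty; 
        node.AppendChild(cdata); 
        node.InnerXml.Dump(); // LINQPad helper: prints the resulting inner XML
    } 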
} 
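
To connect this back to the original question, the wrapped document can then be handed to DataSet.ReadXml(). The following is a minimal sketch under stated assumptions, not a tested import routine: the names OnixCData, WrapHtmlInCData and LoadDataSet are made up for illustration, and it matches any element carrying textformat="05" rather than only children of <othertext>. The nodes are copied into a list first so the document is not modified while the XPath result is still being enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Xml;

// Hypothetical helper class; the names are illustrative only.
public static class OnixCData
{
    // Returns the XML with the markup inside every textformat="05"
    // element wrapped in a CDATA section.
    public static string WrapHtmlInCData(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Snapshot the matches before changing the document.
        List<XmlNode> targets = doc.SelectNodes("//*[@textformat='05']")
                                   .Cast<XmlNode>()
                                   .ToList();

        foreach (XmlNode node in targets)
        {
            // Same technique as the answer above: keep the inner markup
            // as a string, clear the children, attach the CDATA section.
            XmlCDataSection cdata = doc.CreateCDataSection(node.InnerXml);
            node.InnerText = string.Empty;
            node.AppendChild(cdata);
        }

        return doc.OuterXml;
    }

    // Loads the pre-processed XML into a DataSet using ReadXml's
    // default schema inference.
    public static DataSet LoadDataSet(string xml)
    {
        var ds = new DataSet();
        using (var reader = new StringReader(WrapHtmlInCData(xml)))
        {
            ds.ReadXml(reader);
        }
        return ds;
    }
}
```

How ReadXml lays out the inferred tables and columns depends on the whole file, so it is worth inspecting ds.Tables after loading to see where the description text ends up.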

Thanks - that did the trick! The only adjustment I had to make was in the following line (note the double slash): `var nodes = xmlDoc.SelectNodes("//othertext/*[@textformat='05']");` – Billious


I have updated the answer with your correction. Thanks –
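
The double slash mentioned in the comments matters once <othertext> is nested inside a larger document. In XPath, a path starting with a single slash is anchored at the document root, while a leading double slash searches all descendants. The sketch below compares the two; the wrapper elements (ONIXmessage, product) and the names XPathSlashDemo and CountMatches are assumptions for illustration, and the single-slash expression is shown only for comparison, since the thread does not quote the earlier version of the line.

```csharp
using System;
using System.Xml;

// Hypothetical demo class; the names are illustrative only.
public static class XPathSlashDemo
{
    // Counts how many nodes an XPath expression selects in the given XML.
    public static int CountMatches(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Assumed nesting: <othertext> sits inside a product record.
        string xml =
            "<ONIXmessage><product><othertext>" +
            "<d104 textformat=\"05\"><p>x</p></d104>" +
            "</othertext></product></ONIXmessage>";

        // Anchored at the root: matches only if <othertext> is the document element.
        Console.WriteLine(CountMatches(xml, "/othertext/*[@textformat='05']"));  // prints 0

        // Descendant search: finds <othertext> at any depth.
        Console.WriteLine(CountMatches(xml, "//othertext/*[@textformat='05']")); // prints 1
    }
}
```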