Converting XML to plain text

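My goal is to build an engine that takes the latest HL7 CDA 3.0 documents and makes them backward compatible with HL7 2.5, which is a completely different beast.

A CDA document is an XML file which, when paired with its matching XSL file, renders as an HTML document suitable for display to the end user.

For HL7 2.5, I need to get the rendered text, free of any markup, and fold it into a text stream (or similar) that I can write out in 80-character lines to populate the HL7 2.5 message.

So far, my approach uses XslCompiledTransform to transform my XML document with the XSLT and produce the resulting HTML document.

My next step is to take that document (or perhaps work from a step before it) and render the HTML as text. I have searched for a while but can't figure out how to accomplish this. I'm hoping it's something simple that I'm overlooking, or that I just can't find the magic search terms. Can anyone offer some help?

FWIW, I've read the 5 or 10 other questions on SO that either embrace or caution against using regular expressions for this, and I don't think I want to go down that road. I need the rendered text.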

using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using System.Xml.XPath;

public class TransformXML
{
    public static void Main(string[] args)
    {
        try
        {
            string sourceDoc = "C:\\CDA_Doc.xml";
            string resultDoc = "C:\\Result.html";
            string xsltDoc = "C:\\CDA.xsl";

            // Load the CDA document and compile the stylesheet.
            XPathDocument myXPathDocument = new XPathDocument(sourceDoc);
            XslCompiledTransform myXslTransform = new XslCompiledTransform();
            myXslTransform.Load(xsltDoc);

            // Write the transformed HTML to disk; 'using' closes the writer.
            using (XmlTextWriter writer = new XmlTextWriter(resultDoc, null))
            {
                myXslTransform.Transform(myXPathDocument, null, writer);
            }

            // Reopen the HTML result; rendering it as plain text is the open step.
            using (StreamReader stream = new StreamReader(resultDoc))
            {
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("Exception: {0}", e.ToString());
        }
    }
}


2009-06-26 David Walker

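Since you have the XML source, consider writing an XSL that gives you the output you want directly, without the intermediate HTML step. That will be more reliable than trying to convert the HTML.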
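As a rough illustration of skipping the HTML step, the sketch below runs a stylesheet whose output method is text and captures the result in a string. The element names `title` and `paragraph`, the class name `DirectTextTransform`, and the inline stylesheet are all placeholders invented for this example; a real CDA stylesheet would need to match the CDA namespace and element names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class DirectTextTransform
{
    // Hypothetical stylesheet: "title" and "paragraph" are placeholder
    // element names, not taken from the real CDA schema.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:strip-space elements='*'/>
  <xsl:template match='title'>
    <xsl:value-of select='normalize-space(.)'/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>
  <xsl:template match='paragraph'>
    <xsl:value-of select='normalize-space(.)'/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>";

    public static string ToPlainText(string xml)
    {
        // Compile the stylesheet from the inline string.
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        // Transform straight into a TextWriter; no HTML file is produced.
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Calling `DirectTextTransform.ToPlainText("<doc><title>Note</title><paragraph>First line.</paragraph></doc>")` should return the title, a blank line, and then the paragraph text, with no markup to strip afterwards.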


2009-06-26 21:53:03

This will leave you with just the text:

using System;
using System.IO;
using System.Text;
using System.Xml;

class Program
{
    static string sourceDoc = "<html><body><p>this is a paragraph</p><p>another paragraph</p></body></html>";

    static void Main(string[] args)
    {
        var result = new StringBuilder();

        // Element nodes have an empty Value, so for this input only the
        // text content is appended.
        using (var blah = new StringReader(sourceDoc))
        using (var reader = XmlReader.Create(blah))
        {
            while (reader.Read())
            {
                result.Append(reader.Value);
            }
        }
        Console.WriteLine(result);
    }
}


2009-06-26 19:25:26

Or you could use a regular expression:

// Requires: using System.Text.RegularExpressions;
public static string StripHtml(string htmlText)
{
    // replace all tags with spaces...
    htmlText = Regex.Replace(htmlText, @"<(.|\n)*?>", " ");

    // ...then collapse runs of spaces into a single space
    while (htmlText.Contains("  "))
    {
        htmlText = htmlText.Replace("  ", " ");
    }

    // decode non-breaking spaces and the & character entity
    htmlText = htmlText.Replace("&nbsp;", " ");
    htmlText = htmlText.Replace("&amp;", "&");

    return htmlText;
}


2009-06-26 20:09:15 ProKiner

Could you use something like this, which uses Lynx and Perl to render the HTML and then convert it to plain text?


2009-06-26 20:12:46 yrral

See this answer to a similar question on SO:

How can I Convert HTML to Text in C#


2009-06-26 20:16:51

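This is a great use case for XSL:FO and FOP. FOP isn't just for PDF output; one of the other major outputs it supports is text. You should be able to construct a simple xslt + fo stylesheet that has the specifications (i.e. line width) that you want.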

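This solution is a bit heavier-weight than simply using xml -> xslt -> text as ScottSEA suggested, but if you have more complex formatting requirements (e.g. indentation), they will be much easier to express in fo than to mock up in xslt.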

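I would avoid regexes for extracting the text. That approach is too low-level and is guaranteed to be brittle. If you just want text and 80-character lines, the default XSLT templates will print only the element text. Once you have only the text, you can apply whatever text processing is necessary.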
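To make the last point concrete, here is a minimal sketch, assuming a stylesheet that declares no templates at all, so that only XSLT's built-in rules apply and the text nodes are copied to the output. The `Wrap` helper then folds that text into lines of at most a given width. The class name `PlainTextWrap` and both method names are made up for this example, and the greedy wrapping is only one possible choice of text processing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class PlainTextWrap
{
    // No templates are declared, so XSLT's built-in rules apply:
    // elements are descended into and text nodes are copied out.
    private const string TextOnlyStylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
</xsl:stylesheet>";

    public static string ExtractText(string xml)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(TextOnlyStylesheet)))
        {
            transform.Load(xsltReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }

    // Greedy word wrap: split on whitespace and start a new line whenever
    // the next word would push the current line past 'width'.
    // A single word longer than 'width' is kept whole on its own line.
    public static List<string> Wrap(string text, int width)
    {
        var lines = new List<string>();
        var current = new StringBuilder();

        foreach (string word in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (current.Length > 0 && current.Length + 1 + word.Length > width)
            {
                lines.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0)
            {
                current.Append(' ');
            }
            current.Append(word);
        }

        if (current.Length > 0)
        {
            lines.Add(current.ToString());
        }
        return lines;
    }
}
```

A caller would use something like `PlainTextWrap.Wrap(PlainTextWrap.ExtractText(xml), 80)`. Note that the built-in rules add no separator between adjacent elements, so text from neighbouring elements runs together unless the stylesheet inserts spaces or newlines itself.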

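By the way, I work for a company that produces CDAs as part of our product (speech recognition). I would look into an XSLT that transforms 3.0 directly to 2.5. Depending on the fidelity you want to maintain between the two versions, the full XSLT route is probably your easiest bet if what you really want is a conversion between the formats. That is what XSLT was built to do.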


2009-06-29 17:09:44
