Encoding ASCII as HTML

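I'm experimenting with the DownloadData method of WebClient. My current problem is that I can't figure out how to convert the ASCII result returned by Encoding.ASCII.GetString(myDataBuffer); (&lt; to <, \n, &gt; to >) into the actual markup shown in the page source.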

Page source: http://iforce.co.nz/i/z4f2wggp.evi.png

    /// <summary>
    /// Curl data from the PMID
    /// </summary>
    private void ClientPMID(int pmid)
    {
        // Generate the URL for the client
        StringBuilder pmid_url_string = new StringBuilder();
        pmid_url_string.Append("http://www.ncbi.nlm.nih.gov/pubmed/").Append(pmid.ToString()).Append("?report=xml");
        Uri PMIDUri = new Uri(pmid_url_string.ToString());
        // Declare and initialize the client
        WebClient client = new WebClient();
        // Download the Web resource and save it into a data buffer.
        byte[] myDataBuffer = client.DownloadData(PMIDUri);
        this.DownloadCompleted(myDataBuffer);
    }

    /// <summary>
    /// Crawl over the binary from myDataBuffer
    /// </summary>
    /// <param name="myDataBuffer">Binary Buffer</param>
    private void DownloadCompleted(byte[] myDataBuffer)
    {
        string download = Encoding.ASCII.GetString(myDataBuffer);
        PMIDCrawler pmc = new PMIDCrawler(download, "/pre/PubmedArticle/MedlineCitation/Article");
        // Iterate over each node in the file
        foreach (XmlNode xmlNode in pmc.crawl)
        {
            string AbstractTitle = xmlNode["ArticleTitle"].InnerText;
            string AbstractText = xmlNode["Abstract"]["AbstractText"].InnerText;
        }
    }

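The code for PMIDCrawler is available in my other SO question about DownloadStringCompletedEventHandler. However, after receiving the content from Encoding.ASCII.GetString, string html = HttpUtility.HtmlDecode(nHtml); outputs invalid HTML (or XML), because the server does not respond with XML HTTP headers.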

2013-03-13 Killrawr

Here is how to do it in JavaScript, for example: http://stackoverflow.com/questions/5796718/html-entity-decode – Hogan 2013-03-13 02:48:28

Unfortunately, this server does not respond correctly to Accept: text/xml or Accept: application/xml, so you have to do this the hard way (HttpUtility):

string download = HttpUtility.HtmlDecode(Encoding.ASCII.GetString(myDataBuffer));

(HttpUtility in the .NET Framework, or WebUtility.HtmlDecode in 4.5+)

or

string download = Encoding.ASCII.GetString(myDataBuffer); 
if (download != null) { // this won't get all HTML escaped characters... 
    download = download.Replace("&lt;", "<").Replace("&gt;", ">"); 
}

See also this question for more information.
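As a rough illustration of the first option, the sketch below decodes an entity-escaped payload with WebUtility.HtmlDecode and loads the result into an XmlDocument. The class name EntityDecodeSketch, the method DecodeToXml, and the sample string are hypothetical stand-ins; the sample only imitates the shape of the PubMed page (a <pre> element wrapping escaped XML) and is not real server output.

```csharp
using System;
using System.Net;
using System.Xml;

public static class EntityDecodeSketch
{
    // Hypothetical helper: turns "&lt;tag&gt;" style text back into markup
    // and parses it. Assumes the decoded text is well-formed XML.
    public static XmlDocument DecodeToXml(string escaped)
    {
        string decoded = WebUtility.HtmlDecode(escaped);
        var doc = new XmlDocument();
        doc.LoadXml(decoded);
        return doc;
    }

    public static void Main()
    {
        // Made-up sample imitating the page: a real <pre> element
        // whose content is HTML-entity-escaped XML.
        string sample =
            "<pre>&lt;PubmedArticle&gt;&lt;MedlineCitation&gt;&lt;Article&gt;" +
            "&lt;ArticleTitle&gt;Example title&lt;/ArticleTitle&gt;" +
            "&lt;/Article&gt;&lt;/MedlineCitation&gt;&lt;/PubmedArticle&gt;</pre>";

        XmlDocument doc = DecodeToXml(sample);

        // Same XPath as in the question.
        XmlNode article = doc.SelectSingleNode("/pre/PubmedArticle/MedlineCitation/Article");
        Console.WriteLine(article["ArticleTitle"].InnerText);
    }
}
```

Unlike the chained Replace calls, HtmlDecode also handles other entities such as &amp;quot; and &amp;amp;, which is why it is usually the safer of the two options.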

2013-03-13 03:13:21 cfeduke

+1 for a good suggestion, but is there any way to deal with the fact that every 'attribute' is being escaped? E.g. <?xml version=\"1.0\" encoding=\"utf-8\"?> (http://pastebin.com/hjCwhEhL) – Killrawr 2013-03-13 03:16:16

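Make sure the '\"' and '\n' you are seeing aren't just artifacts of the Visual Studio debugger, if you are inspecting the string at a breakpoint (it has always displayed strings that way). You can verify with Console.WriteLine, if I remember my C#/.NET correctly. – cfeduke 2013-03-13 03:19:07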

Are you certain? 'curl --header "Accept: text/html" http://www.ncbi.nlm.nih.gov/pubmed/22918716\?report\=xml' shows me HTML-entity-escaped "XML", but no "\n" or "\"" marks. – cfeduke 2013-03-13 03:22:23
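The debugger point raised in the comments can be checked with a small sketch like the one below. It builds a string containing real quote and newline characters (a made-up sample, not the actual download) and tests whether any literal backslash is present; the class and method names are hypothetical.

```csharp
using System;

public static class EscapeCheckSketch
{
    // Hypothetical helper: true only if the string really contains a backslash character.
    public static bool HasLiteralBackslash(string s)
    {
        return s.IndexOf('\\') >= 0;
    }

    public static void Main()
    {
        // In source code \" and \n are escape sequences; the string itself
        // holds a quote character and a newline, not backslashes.
        string decoded = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<pre/>";

        // Console output shows the raw characters, without debugger-style escaping.
        Console.WriteLine(decoded);
        Console.WriteLine(HasLiteralBackslash(decoded)); // prints False
    }
}
```

If the same check on the downloaded string also reports no backslashes, the escapes seen at the breakpoint are only how the debugger renders the value.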
