Parsing HTML content with a C# parser

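I have the following HTML file. I want to get each H2 (Standard (Flexible Rate) ... and Executive (Flexible Rate) ...) together with the "Only Bed" and "Breakfast Included" options listed under it.

Then I want to put "Only Bed" and "Breakfast Included", with their two prices, into one object per H2: Standard has "Only Bed" and "Breakfast Included" with two prices, and the same goes for Executive.

I tried HtmlAgilityPack with Fizzler but did not get the right result. Could you suggest an approach, or a good parser for this case? Thanks.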

<div id="accordionResizer" style="padding:5px; height:300px; border-radius:6px;" class="ui-widget-content regestancias"> 
    <div id="accordion" class="dias"> 
    <h2> 
     <a href="#"> 
     Standard (Flexible Rate) from 139 € 
     </a> 
    </h2> 
    <div class="estancias_precios estancias_precios_new"> 
     <table style="width: 285px;"> 
     <tr class="" title=""> 
      <cont> 
      <td style="width: 25px;"> 
       <input type="radio" name="estancias" id="tarifa602385" elem="tarifa" idelem="602" idreg="385" precio="139" reg="Only%20Bed" nombre="Standard%20%28Flexible%20Rate%29" /> 
      </td> 
      <td style="width: 155px;"> 
       <label class="descrip" for="tarifa602385" precio="139.00" reg="Only%20Bed" nombre="Standard%20%28Flexible%20Rate%29"> 
       Only Bed 
       </label> 
      </td> 
      <td style="width: 55px;"></td> 
      <td style="width: 55px;"> 
       <strong class="precios_mos">139.00 €</strong> 
      </td> 
      </cont> 
     </tr> 
     <tr class="" title=""> 
      <cont> 
      <td style="width: 25px;"> 
       <input type="radio" name="estancias" id="tarifa602386" elem="tarifa" idelem="602" idreg="386" precio="156.9" reg="Breakfast%20Included" nombre="Standard%20%28Flexible%20Rate%29" /> 
      </td> 
      <td style="width: 155px;"> 
       <label class="descrip" for="tarifa602386" precio="156.90" reg="Breakfast%20Included" nombre="Standard%20%28Flexible%20Rate%29"> 
       Breakfast Included 
       </label> 
      </td> 
      <td style="width: 55px;"></td> 
      <td style="width: 55px;"> 
       <strong class="precios_mos">156.90 €</strong> 
      </td> 
      </cont> 
     </tr> 
     </table> 
    </div> 
    <h2> 
     <a href="#"> 
     Executive (Flexible Rate) from 169 € 
     </a> 
    </h2> 
    <div class="estancias_precios estancias_precios_new"> 
     <table style="width: 285px;"> 
     <tr class="" title=""> 
      <cont> 
      <td style="width: 25px;"> 
       <input type="radio" name="estancias" id="tarifa666385" elem="tarifa" idelem="666" idreg="385" precio="169" reg="Only%20Bed" nombre="Executive%20%28Flexible%20Rate%29" /> 
      </td> 
      <td style="width: 155px;"> 
       <label class="descrip" for="tarifa666385" precio="169.00" reg="Only%20Bed" nombre="Executive%20%28Flexible%20Rate%29"> 
       Only Bed 
       </label> 
      </td> 
      <td style="width: 55px;"></td> 
      <td style="width: 55px;"> 
       <strong class="precios_mos">169.00 €</strong> 
      </td> 
      </cont> 
     </tr> 
     <tr class="" title=""> 
      <cont> 
      <td style="width: 25px;"> 
       <input type="radio" name="estancias" id="tarifa666386" elem="tarifa" idelem="666" idreg="386" precio="186.9" reg="Breakfast%20Included" nombre="Executive%20%28Flexible%20Rate%29" /> 
      </td> 
      <td style="width: 155px;"> 
       <label class="descrip" for="tarifa666386" precio="186.90" reg="Breakfast%20Included" nombre="Executive%20%28Flexible%20Rate%29"> 
       Breakfast Included 
       </label> 
      </td> 
      <td style="width: 55px;"></td> 
      <td style="width: 55px;"> 
       <strong class="precios_mos">186.90 €</strong> 
      </td> 
      </cont> 
     </tr> 
     </table> 
    </div> 
    </div> 
</div>


2014-04-04 bluewonder

Here you go, a quick and dirty approach:

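using System;
using System.Collections.Generic;
using System.Globalization;
using HtmlAgilityPack;

class RoomInfo
{
    public String Name { get; set; }
    public Dictionary<String, Double> Prices { get; set; }
}

private static void HtmlFile()
{
    List<RoomInfo> rooms = new List<RoomInfo>();

    HtmlDocument document = new HtmlDocument();
    document.Load("file.txt");

    // Each <h2> holds the name of one rate (Standard, Executive, ...).
    var h2Nodes = document.DocumentNode.SelectNodes("//h2");
    foreach (var h2Node in h2Nodes)
    {
        RoomInfo roomInfo = new RoomInfo
        {
            Name = h2Node.InnerText.Trim(),
            Prices = new Dictionary<string, double>()
        };

        // The first sibling is a whitespace text node; the second is the <div> holding the price table.
        var labels = h2Node.NextSibling.NextSibling.SelectNodes(".//label");
        foreach (var label in labels)
        {
            // Key: label text (e.g. "Only Bed"); value: the "precio" attribute, parsed culture-independently.
            roomInfo.Prices.Add(label.InnerText.Trim(), Convert.ToDouble(label.Attributes["precio"].Value, CultureInfo.InvariantCulture));
        }
        rooms.Add(roomInfo);
    }
}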

The rest is up to you! ;-)
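
For pages where the markup is less tidy than the sample, a slightly more defensive variant of the same idea might look like the sketch below. It is not part of the original answer: the names RateInfo and RateParser are made up for illustration, and it assumes the price <div> is always the first <div> sibling after each <h2>. It guards against SelectNodes returning null when nothing matches, and it uses the following-sibling axis instead of counting whitespace nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using HtmlAgilityPack;

// Hypothetical container: one rate heading plus its plan-name-to-price map.
public sealed class RateInfo
{
    public string Name { get; set; }
    public Dictionary<string, decimal> Prices { get; } = new Dictionary<string, decimal>();
}

// Hypothetical helper, not part of HtmlAgilityPack.
public static class RateParser
{
    public static List<RateInfo> Parse(string html)
    {
        var rates = new List<RateInfo>();
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var headings = document.DocumentNode.SelectNodes("//h2");
        if (headings == null)
        {
            return rates;
        }

        foreach (var heading in headings)
        {
            var rate = new RateInfo { Name = heading.InnerText.Trim() };

            // Assumption: the first <div> following the heading holds the price table.
            var priceBlock = heading.SelectSingleNode("following-sibling::div[1]");
            var labels = priceBlock?.SelectNodes(".//label[@precio]");
            if (labels != null)
            {
                foreach (var label in labels)
                {
                    var plan = label.InnerText.Trim();
                    var raw = label.GetAttributeValue("precio", "");

                    // Skip labels whose "precio" attribute is not a plain invariant-culture number.
                    if (decimal.TryParse(raw, NumberStyles.Number, CultureInfo.InvariantCulture, out var price))
                    {
                        rate.Prices[plan] = price;
                    }
                }
            }

            rates.Add(rate);
        }

        return rates;
    }
}
```

Calling RateParser.Parse(System.IO.File.ReadAllText("file.txt")) on the markup from the question should give two entries, one per <h2>, each with an "Only Bed" and a "Breakfast Included" price. decimal is used here instead of double only because these values are money; that is a design preference, not something the original answer requires.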


2014-04-04 13:12:48 TheCutter

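Thanks for the great approach, it works well on my example. However, I have to process the entire raw content of the HTML page, so the XPath may wrongly match other tags. Could you guide me on how to isolate just the HTML above within the raw HTML file, or on what to change in the code so that it works with the whole file? The HTML file is here: http://notepad.cc/share/dlEOcHwncJ Thanks @TheCutter – bluewonder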

Just change the line var h2Nodes = document.DocumentNode.SelectNodes("//h2"); to var h2Nodes = document.DocumentNode.SelectNodes("//div[@class='dias']/h2"); – TheCutter
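
One caveat with the scoped expression: @class='dias' compares the whole attribute string, so it would stop matching if the full page gave that <div> additional classes. A token-based test is a common workaround, sketched below. The names RateHeadings and ScopedXPath are invented for this example. Since the sample markup also carries id="accordion", //div[@id='accordion']/h2 would be another option, assuming that id is unique in the full page.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class RateHeadings
{
    // Matches <h2> children of any <div> whose class list contains the token "dias",
    // even when other classes appear in the same attribute.
    public const string ScopedXPath =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' dias ')]/h2";

    public static HtmlNodeCollection Select(HtmlDocument document)
    {
        // May return null when the page contains no matching headings.
        return document.DocumentNode.SelectNodes(ScopedXPath);
    }
}
```

The result can be iterated in place of h2Nodes in the answer's loop, after a null check.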

Take a look at XPath: http://de.wikipedia.org/wiki/XPath – TheCutter
