2012-12-20 27 views
3

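Context: How can I parse the InnerText of <option> tags using HtmlAgilityPack?

I'm trying to parse the "cities" from this page. I have already managed to simulate the data request for this combo box, which is an Ajax call.

Fiddler request: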

POST http://www.telelistas.net/AjaxHandler.ashx HTTP/1.1 
Host: www.telelistas.net 
Connection: keep-alive 
Content-Length: 106 
Origin: http://www.telelistas.net 
X-Requested-With: XMLHttpRequest 
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko)  Chrome/23.0.1271.97 Safari/537.11 
Content-Type: application/x-www-form-urlencoded; charset=UTF-8 
Accept: */* 
Referer: http://www.telelistas.net/ 
Accept-Encoding: gzip,deflate,sdch 
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4 
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3 
Cookie: cert_Origin=directo; [email protected]; auto=automatico=0; searchparameters=bottom=0&btnsite=0&email=&uf=rj&origem=0&nome=&pagina=1&codlogradouro=&predio=213&tiquete=0&localidadeendmap=&codbairro=0&pcount=25&estacionamento=0&letra=&top=&entrega=0&pchave=&info=&logradouro=rua+da+lapa&codtitulo=-1&chave=&zoom=&comercial=0&ddd=0&comib=0&btnemail=0&pgresultado=&localidade=&telefone=&manobrista=0&codlocalidade=21000&site=&cartoes=0&atividade=&bairro=&reserva=0&residencial=0; perfil=logged=1&iduser=2563063&[email protected]&usertype=2&specialsearch=3&siteusernome=BigDataCorp&siteuserdatanasc=15/01/1988&siteusersexo=M&siteuserlocalidade=21000&siteuseruf=RJ&siteuserddd=21&siteusertelefone=94118439&siteuserprofissao=4&siteuserrenda=5000&siteuserformacao=4&siteusernovidades=0&siteusernovidadesrevista=&siteusernovidadesparceiros=0&siteusercpf=10541308769&siteuseracesso=brasil&siteusercep=22631000&siteuseridade=24&siteuserparceiro=telelistas&siteuserconhecimento=2&siteuseroperadora=oi&siteuserurlorigem=http://www.telelistas.net/&siteuserdatacadastro=13/12/2012 11:45:00; __utma=70879631.392027796.1355939587.1356014801.1356021821.5; __utmb=70879631.1.10.1356021821; __utmc=70879631; __utmz=70879631.1355939587.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none) 

PostData : state=rj&style=busca_interna&selectedCity=21000&clientId=pch_localidade_select&method=GetSearchCitiesNamed 

Problem:

Below is a fragment of the string returned by that request:

<select name='pch_localidade_select' class='busca_interna' id='pch_localidade_select' tabindex="4"><option value="">Selecione</option><option selected value="21000">Rio de Janeiro</option><option value="21380">Abraão</option><option value="21001">Afonso Arinos</option><option value="21002">Agência Luterback</option><option value="21847">Agriões de Dentro</option> 

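What I'm trying to do is read the InnerText of the option tags ("Rio de Janeiro", "Abraão", ...), but for some strange reason InnerText is always empty for every option node found.

Here is the code snippet that fails: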

    // Iterating over nodes to build the dictionary
    foreach (HtmlNode city in citiesNodes)
    {
        string key = city.InnerText;
        string value = city.Attributes["value"].Value;

        citiesHash.AddCity(key, value);
    }

Tools in use:

I'm using HtmlAgilityPack with XPath syntax for node selection, C# code, and Fiddler2 for web debugging.

Thanks in advance.

Answers

1

For some strange reason, HtmlAgilityPack does not handle these tags correctly, so this is what managed to solve my problem.

    // Iterating over nodes to build the dictionary
    foreach (HtmlNode city in citiesNodes)
    {
        if (city.NextSibling != null)
        {
            string key = city.NextSibling.InnerText;
            string value = city.Attributes["value"].Value;

            citiesHash.AddCity(key, value);
        }
    }

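Instead of reading the text from the option node itself, I got each option's text through that node's NextSibling reference. HtmlAgilityPack apparently parses each <option> as an empty element, so its text ends up in the sibling node that follows it.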
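
The NextSibling workaround can be made a little more defensive. The following is only a sketch, assuming the HtmlAgilityPack package is referenced; `OptionTextFallback` and `GetOptionText` are hypothetical names, not part of the library. It prefers the option's own InnerText and falls back to the next sibling only when that sibling is a text node, so it should not pick up the text of a neighbouring element if the parser does nest the text inside the option.

```csharp
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class OptionTextFallback
{
    public static string GetOptionText(HtmlNode option)
    {
        // If the parser kept the text inside the <option>, use it directly.
        string own = option.InnerText.Trim();
        if (own.Length > 0)
        {
            return own;
        }

        // Otherwise the text may have been parsed as the following sibling.
        // Only accept it when that sibling really is a text node.
        HtmlNode next = option.NextSibling;
        if (next != null && next.NodeType == HtmlNodeType.Text)
        {
            return next.InnerText.Trim();
        }

        return string.Empty;
    }
}
```

Inside the loop above, `string key = OptionTextFallback.GetOptionText(city);` would replace the direct `city.NextSibling.InnerText` access.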

14

Call ElementsFlags.Remove("option") before loading the HTML:

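using System.Linq; // needed for Skip, Select and ToList

// Remove the special parsing flag for <option>; this must run before the HTML is loaded.
HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("option");

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Skip(1) drops the first <option>, the "Selecione" placeholder.
var options = doc.DocumentNode.Descendants("option").Skip(1)
    .Select(n => new
    {
        Value = n.Attributes["value"].Value,
        Text = n.InnerText
    })
    .ToList();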
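
To tie this back to the dictionary the question was building, here is a minimal sketch, assuming the HtmlAgilityPack package is referenced. `CityOptionParser` and `ParseCities` are hypothetical names introduced for illustration, and a plain `Dictionary<string, string>` keyed by the option value stands in for the asker's `citiesHash`, whose type is not shown.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class CityOptionParser
{
    public static Dictionary<string, string> ParseCities(string html)
    {
        // Must run before LoadHtml. Remove simply returns false
        // if the "option" entry is not present.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var cities = new Dictionary<string, string>();
        foreach (HtmlNode option in doc.DocumentNode.Descendants("option"))
        {
            // GetAttributeValue avoids a null reference when "value" is missing.
            string value = option.GetAttributeValue("value", "");
            if (value.Length == 0)
            {
                continue; // skip the "Selecione" placeholder
            }

            // The indexer overwrites duplicates instead of throwing.
            cities[value] = option.InnerText.Trim();
        }

        return cities;
    }
}
```

Called with the response fragment from the question, `ParseCities(html)["21000"]` should give the text of the Rio de Janeiro option. Note that `ElementsFlags` is a static collection, so removing the entry affects every document parsed afterwards in the same process.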
+0

This also worked for me, just by calling HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("option"); beforehand. Should I delete my answer, or leave both options for the community's future use?

+0

I would leave it, but sometimes people downvote answers unnecessarily. So it's your choice.
