2015-11-07

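I'm new to parsing, and I need to get a CSRF token from a website in order to check whether a username is available. I know the CSRF token is stored in the website's HTML source, within roughly the first 20 lines. How can I parse this data with HtmlAgilityPack in VB?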

<head> 
<title>Website</title> 
<link href="https://fd8c6a1c31abbcfc87c6-9d6bfcdc55882636852ba868a15bca98.ssl.cf5.rackcdn.com/assets/application-afcd9b96896e2ce19d68b2974eb4eb13.css" media="screen" rel="stylesheet"> 
<meta charset="utf-8"> 
<meta content="IE=edge" http-equiv="X-UA-Compatible"> 
<meta content="name check, username, domain, check username" name="keywords"> 
<meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" name="viewport"> 
<meta content="yes" name="mobile-web-app-capable"> 
<meta content="yes" name="apple-mobile-web-app-capable"> 
<meta content="black" name="apple-mobile-web-app-status-bar-style"> 
<meta content="Namechk | Username &amp;amp; Domain Availability Search" property="og:title"> 
<meta content="https://namechk.com/" property="og:url"> 
<meta content="website" property="og:type"> 
<meta content="Use Namechk to search for an available username or domain and secure your brand across the internet." property="og:description"> 
<meta content="https://fd8c6a1c31abbcfc87c6-9d6bfcdc55882636852ba868a15bca98.ssl.cf5.rackcdn.com/assets/logo-full-61eada359058051842c4209ccb16acba.png" property="og:image"> 
<meta content="en_US" property="og:locale"> 
<meta content="authenticity_token" name="csrf-param"> 
<meta content="hVv1hnUD4epiXiojaU2ZjZeRlZfYmoY8Dm6d/h0X3fI=" name="csrf-token"> 
<link href="https://use.fonticons.com/kits/4e70153b/4e70153b.css" media="all" rel="stylesheet"> 
<link href="https://use.fonticons.com/kits/48e45036/48e45036.css" media="all" rel="stylesheet"> 
<script type="text/javascript" src="https://wd-edge.sharethis.com/button/getAllAppDefault.esi?cb=stLight.allDefault&amp;app=all&amp;publisher=8e46a0ce-9473-4683-b2db-c97461495d29&amp;domain=namechk.com"></script> 
<style> 
    .adsbygoogle, 
    .top-ad { 
     display: none !important; 
    } 
</style> 
<link rel="stylesheet" type="text/css" href="//sd.sharethis.com/disc/css/hoverbuttons.6eab8de2ee93b309873157b6d3f977fe.css"> 
<script type="text/javascript" src="//sd.sharethis.com/disc/js/hoverbuttons.035267d71d894482eb413e5bea488ff5.js"></script> 
<link rel="stylesheet" type="text/css" href="https://ws.sharethis.com/button/css/buttons-secure.css"> 
<script type="text/javascript" src="https://ssl.google-analytics.com/ga.js"></script> 

What I need to parse out is the CSRF token, which in the snippet above is "hVv1hnUD4epiXiojaU2ZjZeRlZfYmoY8Dm6d/h0X3fI=". I would like to use the HtmlAgilityPack library to do this.

Answer


Let's assume the HTML file is stored on your drive. First, load the HTML file.

Dim doc = New HtmlDocument() 
doc.Load("HTMLPage1.htm") ' assume it's in the executable folder 

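You can then query the document with LINQ; HtmlAgilityPack exposes an API similar to LINQ to XML. Descendants("meta") returns every node named meta. For each one, check whether it has a name attribute; if it does, check whether its value is csrf-token. FirstOrDefault returns the first match, or Nothing if there is none.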

Dim node = doc.DocumentNode.
    Descendants("meta").
    FirstOrDefault(Function(x)
                       Return x.Attributes.Contains("name") AndAlso
                              x.Attributes("name").Value = "csrf-token"
                   End Function)

You can then read the value of that node's content attribute. I'm using a console application, so I simply print it to the screen.

If node IsNot Nothing Then
    Console.WriteLine(node.Attributes("content").Value)
Else
    Console.WriteLine("Not found!")
End If
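
As an alternative to the LINQ query above, the same lookup can be sketched with an XPath expression. This is only an illustration: the module name, the ExtractCsrfToken helper and the sample string are made up for the example, and it parses the HTML from a string with LoadHtml rather than from a file.

```vbnet
Imports System
Imports HtmlAgilityPack

Module CsrfXPathSketch

    ' Hypothetical helper: returns the content of <meta name="csrf-token">.
    ' Returns Nothing when the meta tag is absent, and an empty string
    ' when the tag exists but has no content attribute.
    Function ExtractCsrfToken(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html) ' parse from a string instead of a file

        ' XPath: any <meta> element whose name attribute equals "csrf-token"
        Dim node = doc.DocumentNode.SelectSingleNode("//meta[@name='csrf-token']")
        If node Is Nothing Then Return Nothing

        ' GetAttributeValue falls back to the supplied default if the attribute is missing
        Return node.GetAttributeValue("content", String.Empty)
    End Function

    Sub Main()
        ' Sample input trimmed from the snippet in the question
        Dim sample = "<head><meta content=""authenticity_token"" name=""csrf-param"">" &
                     "<meta content=""hVv1hnUD4epiXiojaU2ZjZeRlZfYmoY8Dm6d/h0X3fI="" name=""csrf-token""></head>"

        Dim token = ExtractCsrfToken(sample)
        Console.WriteLine(If(String.IsNullOrEmpty(token), "Not found!", token))
    End Sub

End Module
```

SelectSingleNode returns Nothing when nothing matches, so the null check plays the same role as the FirstOrDefault check in the LINQ version.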

The complete source code:

Imports System
Imports System.Linq
Imports HtmlAgilityPack 

Module Module1 

    Sub Main() 

     ' load the html 
     Dim doc = New HtmlDocument() 
     doc.Load("HTMLPage1.htm") 

     ' query the html 
     Dim node = doc.DocumentNode.
         Descendants("meta").
         FirstOrDefault(Function(x)
                            Return x.Attributes.Contains("name") AndAlso
                                   x.Attributes("name").Value = "csrf-token"
                        End Function)

     ' print result 
     If node IsNot Nothing Then
         Console.WriteLine(node.Attributes("content").Value)
     Else
         Console.WriteLine("Not found!")
     End If

     Console.ReadKey(True) 

    End Sub 

End Module 

If the HTML file is online, instantiate the HtmlWeb class and use it to load the HTML from the server.

Dim web = New HtmlWeb() 
doc = web.Load("http://www.somewebsite.com/somefile.html") ' the URL must be absolute, including the scheme
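
Putting the online case together, a minimal sketch might look like the following. It assumes the page to fetch is https://namechk.com/ (taken from the og:url meta tag in the question) and that the page still carries a csrf-token meta tag; the module name is made up for the example, and network errors are only handled in the crudest way.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module CsrfOnlineSketch

    Sub Main()
        Try
            ' HtmlWeb.Load expects an absolute URL, including the scheme
            Dim web As New HtmlWeb()
            Dim doc = web.Load("https://namechk.com/") ' assumed target URL

            ' Same query as in the file-based example
            Dim node = doc.DocumentNode.
                Descendants("meta").
                FirstOrDefault(Function(x)
                                   Return x.Attributes.Contains("name") AndAlso
                                          x.Attributes("name").Value = "csrf-token"
                               End Function)

            If node IsNot Nothing Then
                Console.WriteLine(node.GetAttributeValue("content", String.Empty))
            Else
                Console.WriteLine("Not found!")
            End If
        Catch ex As Exception
            ' Network or parsing failure; a real program would handle this more carefully
            Console.WriteLine("Request failed: " & ex.Message)
        End Try
    End Sub

End Module
```

Keep in mind that a CSRF token is normally tied to the session cookie served with the page, so it is usually only accepted when sent back together with those cookies.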