2017-06-19 18 views

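I want to get the top search result for a LinkedIn query, using Html Agility Pack to fetch the search results from LinkedIn.

In this fiddle: https://dotnetfiddle.net/Vtwi7g

I pass this link to the 'html' variable: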
https://www.linkedin.com/search/results/index/?keywords=firstname%3Ajohn%20AND%20lastname%3Adoe%20AND%20company%3Amicrosoft%20AND%20title%3Aceo&origin=GLOBAL_SEARCH_HEADER

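I want the first result: https://www.linkedin.com/in/john-doe-63803769/

  • I guess the program first needs to log in to LinkedIn with some credentials. How do I pass them?

  • I tried Inspect Element to see where the result is located. How do I traverse the DOM to get the first result?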

Answer

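Searching on LinkedIn is more complicated than that. Their search is closed to unauthenticated users.

First, you need to log in with your browser, then take your session cookies li_at and _lipt.

LinkedIn does not render the result list directly into the HTML markup. It writes a large JSON object into a <code> element and then uses JS to render it.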
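As a minimal sketch of that extraction step, the snippet below pulls the HTML-encoded JSON out of the last <code> element whose id starts with bpr-guid. The sample markup is made up for illustration and is not actual LinkedIn output, and the helper name ExtractLastPayload is hypothetical. A regular expression stands in for Html Agility Pack here only so the snippet runs without NuGet packages.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class CodeElementDemo
{
    // Hypothetical helper: returns the decoded text of the last
    // <code id="bpr-guid..."> element, or null if there is none.
    public static string ExtractLastPayload(string html)
    {
        var matches = Regex.Matches(
            html,
            "<code[^>]*\\bid=\"bpr-guid[^\"]*\"[^>]*>(.*?)</code>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        if (matches.Count == 0)
        {
            return null;
        }
        var raw = matches[matches.Count - 1].Groups[1].Value;
        // The JSON is HTML-encoded inside the element (&quot; instead of ").
        return WebUtility.HtmlDecode(raw).Trim();
    }

    public static void Main()
    {
        // Made-up sample markup; a real page is far larger.
        var html = "<html><body>" +
                   "<code id=\"bpr-guid-1\">{&quot;included&quot;:[]}</code>" +
                   "<code id=\"bpr-guid-2\">{&quot;included&quot;:[{&quot;firstName&quot;:&quot;John&quot;}]}</code>" +
                   "</body></html>";
        Console.WriteLine(ExtractLastPayload(html));
        // prints {"included":[{"firstName":"John"}]}
    }
}
```

For real pages an HTML parser is the more robust choice, since a regular expression can break on attribute order or nested markup.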

Your console application should look like this:

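using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack;   // NuGet: HtmlAgilityPack
using Newtonsoft.Json;   // NuGet: Newtonsoft.Json

class Program
{
    static void Main(string[] args)
    {
        var html = @"https://www.linkedin.com/search/results/index/?keywords=firstname%3Ajohn%20AND%20lastname%3Adoe%20AND%20company%3Amicrosoft%20AND%20title%3Aceo&origin=GLOBAL_SEARCH_HEADER";

        HtmlWeb web = new HtmlWeb();
        // Attach the session cookies to the request before it is sent.
        web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest2);
        var htmlDoc = web.Load(html);

        // The search data is HTML-encoded JSON inside <code id="bpr-guid..."> elements.
        var codeElement = htmlDoc.DocumentNode.SelectNodes("//code[starts-with(@id,'bpr-guid')][last()]");
        if (codeElement == null)
        {
            // SelectNodes returns null when nothing matches; the cookies may be missing or expired.
            Console.WriteLine("No matching <code> element found.");
            return;
        }
        var json = WebUtility.HtmlDecode(codeElement.Last().InnerText);
        var obj = JsonConvert.DeserializeObject<Rootobject>(json);
        // Keep only the entries that have a firstName (the profiles).
        var profiles = obj.included.Where(i => i.firstName != null);
        foreach (var profile in profiles)
        {
            Console.WriteLine("Profile Name: " + profile.firstName + ";" + profile.lastName + ";" + profile.occupation + ";https://www.linkedin.com/in/" + profile.publicIdentifier);
        }
        Console.ReadKey();
    }

    public static bool OnPreRequest2(HttpWebRequest request)
    {
        // Paste the cookie values copied from your logged-in browser session.
        var cookies = "li_at={YOURCOOKIEHERE};" +
                      "_lipt={YOURCOOKIEHERE}";
        request.Headers.Add("Cookie", cookies);
        return true; // true lets the request proceed
    }
}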


public class Rootobject 
{ 
    public Included[] included { get; set; } 
} 


public class Included 
{ 
    public string firstName { get; set; } 
    public string lastName { get; set; } 
    public string occupation { get; set; } 
    public string objectUrn { get; set; } 
    public string publicIdentifier { get; set; } 
} 

At the end it will print:

Profile Name: John;Doe;ceo at Microsoft;https://www.linkedin.com/in/john-doe-8102b868 
Profile Name: John;Doe;Ceo at Microsoft;https://www.linkedin.com/in/john-doe-63803769 
Profile Name: John;Doe;CEO at Microsoft;https://www.linkedin.com/in/john-doe-2151b69b 
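Since the question asks for only the first result, here is a minimal sketch of picking a single profile from the deserialized entries. It assumes the order of the included array matches the ranking shown on the page, which nothing here confirms: in the output above, the profile the question wanted appears second. The helper name FirstProfileUrl and the sample data are hypothetical.

```csharp
using System;
using System.Linq;

// Trimmed copy of the Included class, so the sketch compiles on its own.
public class Included
{
    public string firstName { get; set; }
    public string publicIdentifier { get; set; }
}

public static class FirstResultDemo
{
    // Hypothetical helper: URL of the first entry that has a firstName,
    // or null when there is no such entry.
    public static string FirstProfileUrl(Included[] included)
    {
        if (included == null)
        {
            return null;
        }
        var first = included.FirstOrDefault(i => i.firstName != null);
        return first == null
            ? null
            : "https://www.linkedin.com/in/" + first.publicIdentifier;
    }

    public static void Main()
    {
        // Made-up sample data; the first entry stands for a non-profile object.
        var included = new[]
        {
            new Included { firstName = null, publicIdentifier = null },
            new Included { firstName = "John", publicIdentifier = "john-doe-8102b868" },
            new Included { firstName = "John", publicIdentifier = "john-doe-63803769" }
        };
        Console.WriteLine(FirstProfileUrl(included));
        // prints https://www.linkedin.com/in/john-doe-8102b868
    }
}
```

If the array order turns out not to match the page ranking, the selection would have to be based on some other field of the JSON, which this sketch does not attempt.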
I installed Html Agility Pack but I get build errors: JsonConvert, HttpWebRequest, and WebUtility do not exist, and no overload for OnPreRequest2 matches ..PreRequestHandler. Which other namespaces should I include?

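nvm, I had to install Newtonsoft.Json version 9, because 10 does not work with VS2012. By the way, I have a simple requirement: read the html search variable from a file/table and dump the output to a file/table. Would you be interested in coding it for a $25 Amazon gift card?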