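Searching on LinkedIn is more complicated, because their search is closed to unauthenticated users.
First, you need to log in with your browser and then copy your session cookies li_at and _lipt.
LinkedIn does not render the result list directly as HTML markup. It renders a large JSON object into a <code> element and then uses JS to render the results from it.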
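To illustrate the decoding step in isolation, here is a minimal sketch. It assumes the text inside the <code> element is HTML-entity-encoded JSON, which is what the HtmlDecode call in the answer implies. The sample payload and the names DecodeSketch and DecodeEmbeddedJson are made up for illustration; the real payload is far larger.

```csharp
using System;
using System.Net;

public static class DecodeSketch
{
    // Hypothetical helper: turns the entity-encoded text of a <code> element
    // back into plain JSON that a JSON parser can read.
    public static string DecodeEmbeddedJson(string innerText)
    {
        return WebUtility.HtmlDecode(innerText);
    }

    public static void Main()
    {
        // Invented sample, shaped like the InnerText of a <code id="bpr-guid-..."> element.
        var innerText = "{&quot;included&quot;:[{&quot;firstName&quot;:&quot;John&quot;,&quot;lastName&quot;:&quot;Doe&quot;}]}";

        Console.WriteLine(DecodeEmbeddedJson(innerText));
        // prints {"included":[{"firstName":"John","lastName":"Doe"}]}
    }
}
```

The decoded string is what gets handed to the JSON deserializer. Skipping the decode step would leave &quot; entities in the text, and parsing would fail.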
The complete console application should look like this:
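using System;
using System.Linq;
using System.Net;        // HttpWebRequest, WebUtility
using HtmlAgilityPack;   // NuGet package: HtmlAgilityPack
using Newtonsoft.Json;   // NuGet package: Newtonsoft.Json

class Program
{
    static void Main(string[] args)
    {
        var url = @"https://www.linkedin.com/search/results/index/?keywords=firstname%3Ajohn%20AND%20lastname%3Adoe%20AND%20company%3Amicrosoft%20AND%20title%3Aceo&origin=GLOBAL_SEARCH_HEADER";
        HtmlWeb web = new HtmlWeb();
        // Attach the session cookies to the request before it is sent.
        web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest2);
        var htmlDoc = web.Load(url);
        // The result data is embedded as JSON in <code id="bpr-guid..."> elements.
        var codeElements = htmlDoc.DocumentNode.SelectNodes("//code[starts-with(@id,'bpr-guid')][last()]");
        if (codeElements == null)
        {
            // SelectNodes returns null when nothing matches, e.g. when the cookies are invalid.
            Console.WriteLine("No result data found; check that your cookies are still valid.");
            return;
        }
        var json = WebUtility.HtmlDecode(codeElements.Last().InnerText);
        var obj = JsonConvert.DeserializeObject<Rootobject>(json);
        // Only profile entries have a firstName; other included objects are skipped.
        var profiles = obj.included.Where(i => i.firstName != null);
        foreach (var profile in profiles)
        {
            Console.WriteLine("Profile Name: " + profile.firstName + ";" + profile.lastName + ";" + profile.occupation + ";https://www.linkedin.com/in/" + profile.publicIdentifier);
        }
        Console.ReadKey();
    }

    public static bool OnPreRequest2(HttpWebRequest request)
    {
        var cookies = "li_at={YOURCOOKIEHERE};" +
                      "_lipt={YOURCOOKIEHERE}";
        request.Headers.Add(@"cookie:" + cookies);
        return true; // true lets the request proceed
    }
}

public class Rootobject
{
    public Included[] included { get; set; }
}

public class Included
{
    public string firstName { get; set; }
    public string lastName { get; set; }
    public string occupation { get; set; }
    public string objectUrn { get; set; }
    public string publicIdentifier { get; set; }
}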
In the end, it will print:
Profile Name: John;Doe;ceo at Microsoft;https://www.linkedin.com/in/john-doe-8102b868
Profile Name: John;Doe;Ceo at Microsoft;https://www.linkedin.com/in/john-doe-63803769
Profile Name: John;Doe;CEO at Microsoft;https://www.linkedin.com/in/john-doe-2151b69b
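The keywords parameter in the search URL above is a URL-encoded query of the form firstname:john AND lastname:doe AND company:microsoft AND title:ceo. As a rough sketch, the URL could be built from variables like this. SearchUrl and Build are hypothetical names introduced here, and the sketch assumes the URL layout used in the answer still applies.

```csharp
using System;

public static class SearchUrl
{
    // Hypothetical helper (not part of the original answer): builds the search URL
    // from the four fields used in the example query.
    public static string Build(string firstName, string lastName, string company, string title)
    {
        var keywords = "firstname:" + firstName +
                       " AND lastname:" + lastName +
                       " AND company:" + company +
                       " AND title:" + title;

        // EscapeDataString percent-encodes ':' as %3A and spaces as %20.
        return "https://www.linkedin.com/search/results/index/?keywords=" +
               Uri.EscapeDataString(keywords) +
               "&origin=GLOBAL_SEARCH_HEADER";
    }

    public static void Main()
    {
        // Reproduces the URL used in the answer's Main method.
        Console.WriteLine(Build("john", "doe", "microsoft", "ceo"));
    }
}
```

Encoding the whole keywords string in one call keeps names with spaces or punctuation from breaking the query string.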
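I installed HTML Agility Pack but I get build errors - JsonConvert, HttpWebRequest and WebUtility do not exist, and no overload for OnPreRequest2 matches ..PreRequestHandler - which other namespaces should I include?
nvm, I had to install NewtonSoft.Json version 9, because 10 does not work with VS2012 .. By the way, I have a simple requirement - read the html search variables from a file/table and dump the output to a file/table. Would you be interested in coding it for a $25 Amazon gift card?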