Recursive link crawling in C#

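I've been struggling with this all day and can't seem to figure it out. I have a function that gives me a list of all the links on a specific URL. That works fine. But I want to make this function recursive, so that it searches the links found by the first search, adds them to the list, and keeps going until it has gone through all the pages on the website. How can I make this recursive?

My code: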

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

class Program
{
    public static List<LinkItem> urls;
    private static List<LinkItem> newUrls = new List<LinkItem>();

    static void Main(string[] args)
    {
        WebClient w = new WebClient();
        int count = 0;
        urls = new List<LinkItem>();
        newUrls = new List<LinkItem>();
        urls.Add(new LinkItem { Href = "http://www.smartphoto.be", Text = "" });

        // Each pass downloads every URL in the current batch and collects
        // the links found on those pages as the next batch.
        // Note: already-visited URLs are not filtered out here.
        while (urls.Count > 0)
        {
            foreach (var url in urls)
            {
                if (RemoteFileExists(url.Href))
                {
                    string s = w.DownloadString(url.Href);
                    newUrls.AddRange(LinkFinder.Find(s));
                }
            }
            urls = newUrls.Select(x => new LinkItem { Href = x.Href, Text = "" }).ToList();
            count += newUrls.Count;
            newUrls.Clear();
            ReturnLinks();
        }

        Console.WriteLine();
        Console.Write("Found: " + count + " links.");
        Console.ReadLine();
    }

    // Prints the href of every link in the current batch.
    private static void ReturnLinks()
    {
        foreach (LinkItem i in urls)
        {
            Console.WriteLine(i.Href);
            //ReturnLinks();
        }
    }

    // Sends a HEAD request and reports whether the server answers 200 OK.
    private static bool RemoteFileExists(string url)
    {
        try
        {
            HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
            request.Method = "HEAD";
            // Dispose the response so the connection is released.
            using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
            {
                // Returns true if the status code is 200.
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch
        {
            return false;
        }
    }
}

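The code behind LinkFinder.Find can be found here: http://www.dotnetperls.com/scraping-html

Does anyone know how I can make that function recursive, or how I can make the ReturnLinks function recursive? I'd prefer not to touch the LinkFinder.Find method, because it works for a single link; I should be able to call it as many times as needed to expand the final list of URLs.

Source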

2011-07-29 PitAttack76

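I assume you want to load each link, find the links on that page, and keep going until you run out of links?

Since the recursion depth could become very large, I would avoid recursion. This should work, I think.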

WebClient w = new WebClient();
int count = 0;
urls = new List<string>();       // urls now holds plain href strings
newUrls = new List<LinkItem>();
urls.Add("http://www.google.be");

while (urls.Count > 0)
{
    foreach (var url in urls)
    {
        string s = w.DownloadString(url);
        newUrls.AddRange(LinkFinder.Find(s));
    }
    // ToList() copies the hrefs into a fresh list, so clearing newUrls
    // below does not also empty urls.
    urls = newUrls.Select(x => x.Href).ToList();
    count += newUrls.Count;
    newUrls.Clear();
    ReturnLinks();
}

Console.WriteLine();
Console.Write("Found: " + count + " links.");
Console.ReadLine();
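
One thing the loop above does not guard against is pages that link back to each other, in which case the same URLs keep coming back. A minimal sketch of the same non-recursive idea with a queue and a set of already-seen URLs follows. The names `Crawler`, `Crawl` and `getLinks` are made up for this illustration; `getLinks` is assumed to wrap whatever downloads a page and extracts its hrefs (for example `DownloadString` plus `LinkFinder.Find`).

```csharp
using System;
using System.Collections.Generic;

public static class Crawler
{
    // Breadth-first crawl without recursion.
    // getLinks is a hypothetical delegate: given a URL, it returns the
    // hrefs found on that page.
    public static List<string> Crawl(string start, Func<string, IEnumerable<string>> getLinks)
    {
        var visited = new HashSet<string>();
        var pending = new Queue<string>();
        var result = new List<string>();

        visited.Add(start);
        pending.Enqueue(start);

        while (pending.Count > 0)
        {
            string url = pending.Dequeue();
            result.Add(url);

            foreach (string link in getLinks(url))
            {
                // HashSet.Add returns false for a URL that was already seen,
                // so every URL is queued at most once and the loop ends.
                if (visited.Add(link))
                {
                    pending.Enqueue(link);
                }
            }
        }

        return result;
    }
}
```

Because the page-fetching step is passed in, the loop can be tried against an in-memory dictionary of fake pages before pointing it at a real site.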

Source

2011-07-29 23:57:41

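Thanks for your quick reply! But it doesn't work :( When I clear newUrls nothing is displayed, which is correct, because urls = newUrls. When I remove newUrls.Clear, I get a modification error on the line urls = newUrls ... – PitAttack76

@Stieven76, yes, you're right. I edited it a bit to extract just the hrefs as a list of strings, which I think should work. Otherwise you could simply do it with a plain for loop and append to the list on the fly. –

I've modified it a bit and now it seems to work, but I've run into another problem. One of the found links throws a 404 error on this line in the loop: string s = w.DownloadString(url); which is correct, because that page isn't available. How can I skip a link that throws an error? Thanks a lot for your help so far! – PitAttack76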
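
On skipping a link that fails: `WebClient.DownloadString` reports HTTP errors such as a 404 by throwing a `WebException`, so one option is to catch it and move on to the next URL. The sketch below is only an illustration; `SafeDownloader`, `TryDownload` and the `download` parameter are hypothetical names, and the download step is passed in as a delegate so the helper does not depend on a live site.

```csharp
using System;
using System.Net;

public static class SafeDownloader
{
    // Returns the page body, or null when the download throws a WebException
    // (for example because the server answered 404).
    // download is a hypothetical delegate; in the loop above it could be
    // the WebClient's DownloadString method.
    public static string TryDownload(string url, Func<string, string> download)
    {
        try
        {
            return download(url);
        }
        catch (WebException)
        {
            return null;
        }
    }
}
```

Inside the foreach this could be used as `string s = SafeDownloader.TryDownload(url, w.DownloadString); if (s == null) continue;`, which leaves the rest of the loop unchanged.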

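static void Main()
{
    // visited keeps the recursion from looping when pages link to each other.
    List<LinkItem> allUrls = FindAll("http://www.google.be", new HashSet<string>());
}

// Downloads the page at address and recursively follows every link on it.
private static List<LinkItem> FindAll(string address, HashSet<string> visited)
{
    List<LinkItem> list = new List<LinkItem>();

    if (!visited.Add(address))
    {
        return list; // this address was already crawled
    }

    string html;
    try
    {
        using (WebClient w = new WebClient()) { html = w.DownloadString(address); }
    }
    catch (WebException)
    {
        return list; // skip pages that fail to download
    }

    foreach (LinkItem url in LinkFinder.Find(html))
    {
        list.Add(url);
        list.AddRange(FindAll(url.Href, visited)); // Href holds the link's address
    }

    return list;
}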
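
If the recursive form is preferred, the depth concern raised earlier in the thread can be handled by passing a depth budget down the calls. The following is a sketch under assumptions, not a tested crawler: `RecursiveCrawler`, `Collect`, `depthLeft` and `getLinks` are names invented here, and `getLinks` again stands in for the download-and-extract step.

```csharp
using System;
using System.Collections.Generic;

public static class RecursiveCrawler
{
    // Visits url and, while depthLeft allows it, the pages it links to.
    // visited collects every URL reached and stops repeated visits.
    // getLinks is a hypothetical delegate returning the hrefs on a page.
    public static void Collect(
        string url,
        int depthLeft,
        Func<string, IEnumerable<string>> getLinks,
        HashSet<string> visited)
    {
        // Stop when the depth budget is used up or the URL was seen before.
        if (depthLeft < 0 || !visited.Add(url))
        {
            return;
        }

        foreach (string link in getLinks(url))
        {
            Collect(link, depthLeft - 1, getLinks, visited);
        }
    }
}
```

A call such as `Collect(start, 3, getLinks, visited)` goes at most three links away from the start page. Since this is depth-first, a page first reached at the depth limit is not expanded even if a shorter path to it exists, so the queue-based loop is the safer choice when complete coverage matters.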

Source

2011-07-30 00:07:05
