Beautifulsoup returns the same result when called in a while loop

I'm new to Python and am trying to write a scraper that collects all the links on a page that has multiple pages of pagination. I call the following code in a while loop.

page = urllib2.urlopen(givenurl,"",10000)
soup = BeautifulSoup(page, "lxml")
linktags = soup.findAll('span',attrs={'class':'paginationLink pageNum'})
page.close()
BeautifulSoup.clear(soup)
return linktags

It always returns the results for the first URL I pass in. Am I doing something wrong?

Source

2012-11-21 vih

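Can you show how you are calling this in the loop? Are you sure the URL is different each time? – jdi

If there is a return inside the loop, it will not iterate more than once. –

@uncollected: I bet you just nailed it – jdi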

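@uncollected probably has the right answer for you in the comments, but I wanted to expand on it.

If you are calling that exact code, but nested in a while block, it will return immediately with the first result: a return statement exits the whole function, not just the current iteration. You can do two things here.

I'm not sure how you are using the while in your own context, so I'm using a for loop here.

Extend a results list, and return the whole list: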

import urllib2  # Python 2 standard library
from bs4 import BeautifulSoup  # assumed: BeautifulSoup 4, since a parser name ("lxml") is passed

def getLinks(urls):
    """ processes all urls, and then returns all links """
    links = []
    for givenurl in urls:
        page = urllib2.urlopen(givenurl,"",10000)
        soup = BeautifulSoup(page, "lxml")
        linktags = soup.findAll('span',attrs={'class':'paginationLink pageNum'})
        page.close()
        BeautifulSoup.clear(soup)
        links.extend(linktags)
        # don't return here or the loop is over

    return links
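
To make the difference concrete without touching the network, here is a minimal sketch written for Python 3. The `fetch_links` helper, the `FAKE_PAGES` data, and the page names are all hypothetical stand-ins for the urlopen + BeautifulSoup step; only the placement of `return` matters. It uses a while loop, as in the question.

```python
# Hypothetical stand-in for the urlopen + BeautifulSoup step, so the
# example runs offline. The page names and link values are made up.
FAKE_PAGES = {
    "page1": ["a", "b"],
    "page2": ["c"],
    "page3": ["d", "e"],
}


def fetch_links(url):
    """Pretend to download `url` and extract its pagination links."""
    return FAKE_PAGES[url]


def links_with_early_return(urls):
    """Buggy version: the return sits inside the loop body."""
    i = 0
    while i < len(urls):
        linktags = fetch_links(urls[i])
        i += 1
        return linktags  # exits the function during the first pass
    return []


def links_accumulated(urls):
    """Fixed version: collect everything, return after the loop."""
    links = []
    i = 0
    while i < len(urls):
        links.extend(fetch_links(urls[i]))
        i += 1
    return links  # reached only once the loop has finished


urls = ["page1", "page2", "page3"]
print(links_with_early_return(urls))  # ['a', 'b']
print(links_accumulated(urls))        # ['a', 'b', 'c', 'd', 'e']
```

The first function never looks at `page2` or `page3`, which matches the symptom of always getting the first URL's results.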

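Or, instead of returning the whole list, you can make it a generator, using the yield keyword. A generator will return each result and pause until the next iteration of the loop: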

def getLinks(urls):
    """ generator yields links from one url at a time """
    for givenurl in urls:
        page = urllib2.urlopen(givenurl,"",10000)
        soup = BeautifulSoup(page, "lxml")
        linktags = soup.findAll('span',attrs={'class':'paginationLink pageNum'})
        page.close()
        BeautifulSoup.clear(soup)
        # this will return the current results,
        # and pause the state, until the next
        # iteration is requested
        yield linktags
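
A generator like this does nothing until the caller iterates over it. The following sketch (Python 3, offline) shows one way it might be consumed. `fetch_links`, `FAKE_PAGES`, and the `fetched` log are hypothetical helpers added only to make the lazy behaviour visible; they are not part of the original code.

```python
# Hypothetical offline stand-in for the download-and-parse step.
FAKE_PAGES = {
    "page1": ["a", "b"],
    "page2": ["c"],
    "page3": ["d", "e"],
}

fetched = []  # records which pages have actually been requested


def fetch_links(url):
    fetched.append(url)
    return FAKE_PAGES[url]


def get_links(urls):
    """Generator: yields the links of one page per iteration."""
    for url in urls:
        yield fetch_links(url)


gen = get_links(["page1", "page2", "page3"])

# Creating the generator fetches nothing; next() runs it up to the first yield.
first = next(gen)
fetched_after_first = list(fetched)

# Iterating over the rest resumes the loop where it paused.
rest = [link for linktags in gen for link in linktags]

print(first)                # ['a', 'b']
print(fetched_after_first)  # ['page1']
print(rest)                 # ['c', 'd', 'e']
```

In everyday use a plain `for linktags in get_links(urls):` loop is usually enough; the explicit `next()` call is here only to show that each page is fetched on demand.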

Source

2012-11-21 01:38:13 jdi
