How can I extract URL links from the IGN website?

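I want to extract the URLs of the reviews on this page, http://uk.ign.com/games/reviews, and then open the first 5 in separate tabs.

So far I have tried several different selectors to pick out the right data, but none of them seems to return anything. I can't get past extracting the URL of each review in the list, let alone opening the first 5 in separate tabs.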

I'm using Python 3 with the Python IDE.

Here is my code:

import webbrowser, bs4, requests, re 

webPage = requests.get("http://uk.ign.com/games/reviews", headers={'User-Agent': 'Mozilla/5.0'})

webPage.raise_for_status() 

webPage = bs4.BeautifulSoup(webPage.text, "html.parser") 

#Me trying different selections to try extract the right part of the page 
webLinks = webPage.select(".item-title") 
webLinks2 = webPage.select("h3") 
webLinks3 = webPage.select("div item-title") 

print(type(webLinks)) 
print(type(webLinks2)) 
print(type(webLinks3)) 
#I think this is where I've gone wrong. These all returning empty lists. 
#What am I doing wrong? 


lenLinks = min(5, len(webLinks)) 
for i in range(lenLinks): 
    webbrowser.open('http://uk.ign.com/' + webLinks[i].get('href'))

Source

2017-05-13 SeyiA

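Any luck finding those links? – Nevermore

I can find all the links on the page, but I can't extract the ones I want. webLinks = webPage.find_all('a') gives me every link on the page. Now I'm trying to extract the links under the "item-title" class with "h3". I tried webItems = webPage.find_all('a', {'class': 'title'}) and webPage.find_all(class_="h3"). Neither of these worked. Maybe I should use a for loop of some kind? – SeyiA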

With bs4, once you have the soup object that BeautifulSoup returns (you have it as webPage), you can call:

webLinks = webPage.find_all('a') 
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

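find_all returns a list of elements matched by tag name (in your case, the HTML <a> elements). To get the links themselves you need to go one step further: you can access an HTML element's attributes (in your case, the href) the same way you would read from a dictionary: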

# webPage is the BeautifulSoup object from the question's code
for a in webPage.find_all('a', href=True):
    print("Found the URL:", a['href'])
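
To narrow this down to the review links only, a CSS selector can combine the class and the tag. The snippet below is a minimal sketch on made-up markup: SAMPLE_HTML and its class names are assumptions for illustration, and the real IGN page may be structured differently or may fill in the list with JavaScript, in which case requests would not see it. It also shows why "div item-title" finds nothing: without the leading dot, item-title is read as a tag name rather than a class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real IGN page may differ.
SAMPLE_HTML = """
<div class="item-title"><h3><a href="/games/alpha/review">Alpha</a></h3></div>
<div class="item-title"><h3><a href="/games/beta/review">Beta</a></h3></div>
<div class="sidebar"><a href="/about">About</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# ".item-title a[href]" matches every <a> with an href attribute that is
# nested inside an element whose class is "item-title".
review_links = [a["href"] for a in soup.select(".item-title a[href]")]

# "div item-title" looks for <item-title> tags inside a <div>, so it is empty.
wrong_selector = soup.select("div item-title")

print(review_links)
print(len(wrong_selector))
```

Printing len() of a result, rather than type() as in the question, shows right away whether a selector matched anything.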

For details, see "BeautifulSoup getting href". Or, of course, the docs.

PS: Python is usually written in snake_case rather than camelCase :)
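
Once the hrefs are in a list, opening the first five is mostly a matter of building full URLs. The question's code concatenates 'http://uk.ign.com/' with the href, which gives a double slash when the href already starts with "/" and a broken address when the href is already absolute. A possible sketch using urljoin from the standard library follows; the helper names and the sample hrefs are hypothetical, and open_in_tabs is defined but not called so that nothing opens by accident.

```python
import webbrowser
from urllib.parse import urljoin

BASE_URL = "http://uk.ign.com/"  # base address taken from the question


def first_absolute_urls(hrefs, base=BASE_URL, limit=5):
    """Return up to `limit` absolute URLs built from possibly relative hrefs."""
    return [urljoin(base, href) for href in hrefs[:limit]]


def open_in_tabs(urls):
    """Open each URL in a new browser tab (side effect, so not called here)."""
    for url in urls:
        webbrowser.open_new_tab(url)


# Hypothetical hrefs standing in for whatever the scraper returns.
sample_hrefs = ["/games/alpha/review", "http://uk.ign.com/games/beta/review"]
print(first_absolute_urls(sample_hrefs))
```

Calling open_in_tabs(first_absolute_urls(hrefs)) with the scraped hrefs would then open at most five tabs.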

Source

2017-05-13 18:46:25 Nevermore

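This works. I'm reading the find_all section of the Beautiful Soup docs and was wondering: if I want to target specific links on the page, do I need to use find_parents(), or should I use a for loop to pull the links I want out of the initial find_all('a') call, the way you used ['href']? – SeyiA

Hi! I'm glad it works. I'm not sure about your next question, but I think you're on the right track: find_parents/children will return an object on which you can call find_all again... Anyway, if this is the answer you were looking for, please mark it as accepted so that others can find it later :) – Nevermore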
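
One way to combine the two ideas from these comments is a loop over find_all('a') that keeps only the anchors sitting inside a particular parent tag. This is a rough sketch, assuming the wanted links are wrapped in <h3> headings; the sample markup is invented for illustration and is not taken from the IGN page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
SAMPLE_HTML = """
<h3><a href="/games/alpha/review">Alpha</a></h3>
<p><a href="/login">Log in</a></p>
<h3><a href="/games/beta/review">Beta</a></h3>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_parent("h3") returns the nearest <h3> ancestor, or None if there is none,
# so this keeps only the anchors that live inside a heading.
heading_links = [
    a["href"]
    for a in soup.find_all("a", href=True)
    if a.find_parent("h3") is not None
]
print(heading_links)
```

The same filter can be written the other way round, by finding the <h3> elements first and calling find_all('a') on each of them.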
