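I am driving Selenium WebDriver from a Python script. Here is my code. I need to navigate to every link on a web page, and then to the links on each of those sub-pages.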

from selenium import webdriver

d = webdriver.Chrome()
d.get("http://localhost:8080")
list_links = d.find_elements_by_tag_name('a')

for i in list_links:
    print(i.get_attribute('href'))

The program above correctly prints the links:

https://www.w3schools.com/ 
https://www.ubuntu.com/ 
None 

But when I run the following code:

d = webdriver.Chrome() 
d.get("http://localhost:8080") 
list_links = d.find_elements_by_tag_name('a') 

for i in list_links: 
    url=i.get_attribute('href') 
    print url 
    d.get(url) 

it navigates to the first link, https://www.w3schools.com/, successfully. Then it fails with:

Traceback (most recent call last): 
File "web_nav.py", line 20, in <module> 
url=i.get_attribute('href') 
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 141, in get_attribute 
resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name}) 
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 493, in _execute 
return self._parent.execute(command, params) 
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute 
self.error_handler.check_response(response) 
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response 
raise exception_class(message, screen, stacktrace) 
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document 
(Session info: chrome=59.0.3071.115) 
(Driver info: chromedriver=2.30.477691 (6ee44a7247c639c0703f291d320bdf05c1531b57),platform=Linux 4.4.0-31-generic x86_64)

I am using Ubuntu 14.04, Python, and Selenium WebDriver.

Answers

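Collect all the URLs first, then navigate to them. Calling d.get(url) inside the loop loads a new page, so the remaining elements in list_links no longer belong to the current document, and reading their attributes raises StaleElementReferenceException.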

d = webdriver.Chrome() 
d.get("http://localhost:8080") 
list_links = d.find_elements_by_tag_name('a') 
urls = []  
for i in list_links: 
    urls.append(i.get_attribute('href')) 
for url in urls:
    if url:  # anchors without an href return None, as in the output above
        d.get(url)

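You can simplify this with a function:

def get_link_urls(url, driver):
    # Load the page and read every href before navigating anywhere else.
    driver.get(url)
    urls = []
    for link in driver.find_elements_by_tag_name('a'):
        href = link.get_attribute('href')
        if href:  # skip anchors without an href (returned as None)
            urls.append(href)
    return urls

urls = get_link_urls("http://localhost:8080", d)
sub_urls = []
for url in urls:
    sub_urls.extend(get_link_urls(url, d))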
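
If the links need to be followed further than one level, the same idea can be generalized to a depth-limited, breadth-first walk. The following is only a sketch under some assumptions: crawl, get_links and fake_site are hypothetical names introduced here, and get_links stands for any callable that returns the hrefs of a page (with Selenium it could be something like lambda u: get_link_urls(u, d)). A small in-memory link graph replaces the browser so the example runs on its own.

```python
from collections import deque


def crawl(start_url, get_links, max_depth):
    """Breadth-first walk over links, recording each URL at most once.

    get_links is a hypothetical hook: any callable that takes a URL and
    returns the hrefs found on that page.
    """
    visited = []          # URLs reached, in breadth-first order
    seen = {start_url}    # guards against pages that link back to each other
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue      # pages at the depth limit are recorded, not expanded
        for href in get_links(url):
            # Anchors without an href come back as None; skip them.
            if href and href not in seen:
                seen.add(href)
                queue.append((href, depth + 1))
    return visited


# Stand-in link graph so the sketch runs without a browser.
fake_site = {
    "a": ["b", "c", None],
    "b": ["a", "d"],
    "c": [],
    "d": ["e"],
}
print(crawl("a", lambda u: fake_site.get(u, []), max_depth=2))
```

Keeping a seen set matters on real sites, where pages commonly link back to each other; without it the walk would revisit the same URLs over and over until the depth limit.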

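You saved me a lot of work, thank you. However, this only navigates to the links on the first page, doesn't it? Is there a way to follow the sub-page links down to a specific depth? – Kit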

For example: here I first navigate to https://www.w3schools.com/, and I then need to follow the links inside that page down to a given depth. – Kit

I need to extend this code so that it saves the dynamic HTML pages as it navigates. Please help me solve this. – Kit
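
One possible way to save each page while navigating is to write out the driver's page_source after every get call. The sketch below is an assumption-laden illustration, not a tested crawler: save_page and StubDriver are hypothetical names, the hashed file name is an arbitrary choice, and StubDriver only imitates the two WebDriver members used (get and page_source) so the example runs without a browser. Content that scripts load some time after the page opens may need an explicit wait before page_source is read.

```python
import hashlib
import io
import os
import tempfile


def save_page(driver, url, out_dir):
    """Open url and write the current page source to out_dir; return the path."""
    driver.get(url)
    html = driver.page_source  # source of the page currently loaded
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    # Hash the URL to get a filesystem-safe, repeatable file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(out_dir, name)
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path


class StubDriver(object):
    """Minimal stand-in exposing only get() and page_source."""
    page_source = u""

    def get(self, url):
        self.page_source = u"<html><body>%s</body></html>" % url


out_dir = tempfile.mkdtemp()
saved = save_page(StubDriver(), "http://localhost:8080", out_dir)
print(saved)
```

With a real browser, the StubDriver instance would be replaced by the webdriver.Chrome() object d from the code above.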