Generating a list of URLs with Selenium

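I am trying to generate a list of URLs with Selenium. I would like the user to browse in the instrumented browser and end up with a list of the URLs he visited.

I found that the `current_url` property can help with this, but I have not found a way to know when the user has clicked a link.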

In [117]: from selenium import webdriver 

In [118]: browser = webdriver.Chrome() 

In [119]: browser.get("http://stackoverflow.com") 

--> here, I click on the "Questions" link. 

In [120]: browser.current_url 

Out[120]: 'http://stackoverflow.com/questions' 

--> here, I click on the "Jobs" link. 

In [121]: browser.current_url 

Out[121]: 'http://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab'

Any hints appreciated!

Thanks

Source

2017-03-07 reike

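There isn't really an official way to monitor what the user does in Selenium. The only thing you can do is start the driver and then run a loop that keeps checking driver.current_url. However, I don't know what the best way to exit this loop would be, since I don't know what your usage is. Maybe you could try: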

import time

from selenium import webdriver

urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    if driver.current_url != current:
        current = driver.current_url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, instead of the line above, if you only want unique URLs:
        # if current not in urls:
        #     urls.append(current)

    time.sleep(0.5)  # pause between checks instead of spinning the CPU

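If you have no idea how to end this loop, I'd suggest having the user navigate to a URL that breaks the loop, such as http://www.endseleniumcheck.com, and adding it to the code like this: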

import time

from selenium import webdriver

urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # startswith tolerates a trailing slash or path added by the browser
    if driver.current_url.startswith('http://www.endseleniumcheck.com'):
        break

    if driver.current_url != current:
        current = driver.current_url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, instead of the line above, if you only want unique URLs:
        # if current not in urls:
        #     urls.append(current)

    time.sleep(0.5)  # pause between checks instead of spinning the CPU
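The polling loops above differ only in how they decide to stop, so the shared part can be pulled into one helper. What follows is a minimal sketch, not part of the original answer: the names `poll_urls`, `get_url`, `should_stop` and `track_with_selenium` are made up for illustration, and the `sleep` parameter exists only so the loop can be exercised without a real browser.

```python
import time


def poll_urls(get_url, should_stop, unique=False, interval=0.5, sleep=time.sleep):
    """Poll get_url() until should_stop(url) is true and return the URLs seen.

    get_url     -- callable returning the current URL (hypothetical hook)
    should_stop -- callable taking the URL just read; True ends the loop
    unique      -- if True, record each URL only once
    """
    urls = []
    current = None
    while True:
        url = get_url()
        if should_stop(url):
            break
        if url != current:
            # the URL changed since the last check
            current = url
            if not unique or url not in urls:
                urls.append(url)
        sleep(interval)  # avoid a busy loop
    return urls


def track_with_selenium():
    """Example wiring; not called here. Needs selenium and a Firefox driver."""
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get('http://www.google.com')
    return poll_urls(
        lambda: driver.current_url,
        lambda url: url.startswith('http://www.endseleniumcheck.com'),
    )
```

Passing `unique=True` gives the "unique URLs only" behaviour; the default keeps every change, including returns to a page visited earlier.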

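Or, if you want to get crafty, you can terminate the loop when the user exits the browser. You can do this by monitoring the process ID with the psutil library (pip install psutil):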

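import time

import psutil
from selenium import webdriver

urls = []

driver = webdriver.Firefox()
# PID of the browser process; driver.binary is how the Firefox driver exposed
# it when this was written and may differ in other Selenium versions
pid = driver.binary.process.pid

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # stop once the browser process has gone away
    if not psutil.pid_exists(pid):
        break

    if driver.current_url != current:
        current = driver.current_url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, instead of the line above, if you only want unique URLs:
        # if current not in urls:
        #     urls.append(current)

    time.sleep(0.5)  # pause between checks instead of spinning the CPU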

Source

2017-03-07 19:49:33 crookedleaf

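Thank you very much! That will do. Personally, I ended up using a try/catch structure to handle the browser exit (which throws an exception). It isn't clean, but it is enough for what I need to do. – reike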
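The comment above does not show its code, so the following is only a guess at the shape of that approach: keep polling and treat an exception from the URL read as "the browser was closed". The names `record_until_error`, `closed_errors` and `track_until_closed` are hypothetical. Which exception is actually raised depends on the driver and Selenium version; `WebDriverException` is used in the wiring as an assumption.

```python
import time


def record_until_error(get_url, closed_errors=(Exception,), interval=0.5, sleep=time.sleep):
    """Record URL changes until get_url() raises one of closed_errors."""
    urls = []
    current = None
    while True:
        try:
            url = get_url()
        except closed_errors:
            break  # reading the URL failed: assume the browser is gone
        if url != current:
            current = url
            urls.append(url)
        sleep(interval)  # avoid a busy loop
    return urls


def track_until_closed():
    """Example wiring; not called here. Needs selenium and a Firefox driver."""
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException

    driver = webdriver.Firefox()
    driver.get('http://www.google.com')
    # Assumption: closing the browser makes current_url raise WebDriverException
    return record_until_error(lambda: driver.current_url, (WebDriverException,))
```

Catching a narrow exception type keeps unrelated bugs visible; the broad default is there only so the helper works before you know what your driver raises.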
