2017-07-26 68 views
1

How do I write a Selenium loop in Python?

I want to scrape data from many different websites that contain JavaScript (which is why I am using Selenium to get the information). Everything works great, but when I try to load the next URL I get a long error message:

> Traceback (most recent call last): 
    File "C:/Python27/air17.py", line 46, in <module> 
    scrape(urls) 
    File "C:/Python27/air17.py", line 28, in scrape 
    browser.get(url) 
    File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 268, in get 
    self.execute(Command.GET, {'url': url}) 
    File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 254, in execute 
    response = self.command_executor.execute(driver_command, params) 
    File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 464, in execute 
    return self._request(command_info[0], url, body=data) 
    File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 487, in _request 
    self._conn.request(method, parsed_url.path, body, headers) 
    File "C:\Python27\lib\httplib.py", line 1042, in request 
    self._send_request(method, url, body, headers) 
    File "C:\Python27\lib\httplib.py", line 1082, in _send_request 
    self.endheaders(body) 
    File "C:\Python27\lib\httplib.py", line 1038, in endheaders 
    self._send_output(message_body) 
    File "C:\Python27\lib\httplib.py", line 882, in _send_output 
    self.send(msg) 
    File "C:\Python27\lib\httplib.py", line 844, in send 
    self.connect() 
    File "C:\Python27\lib\httplib.py", line 821, in connect 
    self.timeout, self.source_address) 
    File "C:\Python27\lib\socket.py", line 575, in create_connection 
    raise err 
error: [Errno 10061] 

The data from the first website ends up in the CSV file, but when the code tries to open the next website it freezes and I get this error message. What am I doing wrong?

from bs4 import BeautifulSoup 
from selenium import webdriver 
import time 
import urllib2 
import unicodecsv as csv 
import os 
import sys 
import io 
import time 
import datetime 
import pandas as pd 
from bs4 import BeautifulSoup 
import MySQLdb 
import re 
import contextlib 
import selenium.webdriver.support.ui as ui 

filename=r'output.csv' 

resultcsv=open(filename,"wb") 
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1') 
output.writerow(['TIME','FLIGHT','FROM','AIRLANE','AIRCRAFT','IHAVETODELETETHIS','STATUS']) 


def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup = BeautifulSoup(html, "html.parser")
        table = soup.find('table', {"class": "table table-condensed table-hover data-table m-n-t-15"})
        datatable = []
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            datatable.append(temp_data)

        output.writerows(datatable)

        resultcsv.close()
        time.sleep(10)
        browser.quit()

urls = ["https://www.flightradar24.com/data/airports/bud/arrivals", "https://www.flightradar24.com/data/airports/fco/arrivals"] 
scrape(urls) 
+1

These two are one indent level too deep, inside the loop: resultcsv.close() and browser.quit() – CrazyElf

+0

That's the solution! Thanks, it's working! :) – tardos93

Answer

4

Not sure that having browser.quit() at the end of the loop body is a good idea. Per the Selenium doc:

quit()

Quits the driver and closes every associated window.

I think a browser.close() (as documented here) would be enough inside the loop. Keep browser.quit() outside the loop.
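The corrected control flow can be sketched as follows. To keep the sketch runnable without Selenium installed, a hypothetical FakeBrowser class stands in for webdriver.Firefox() (only the get()/page_source/quit() calls the asker uses are modelled); the point is solely where quit() sits relative to the loop.

```python
# Minimal sketch of the corrected loop structure, assuming a driver object
# with the same get()/page_source/quit() interface the asker's code uses.
class FakeBrowser(object):
    """Hypothetical stand-in for webdriver.Firefox()."""
    def __init__(self):
        self.alive = True
        self.page_source = ""

    def get(self, url):
        # Reproduces the asker's failure mode: once quit() has run,
        # any further get() is refused (cf. the [Errno 10061] traceback).
        if not self.alive:
            raise IOError("connection refused: driver already quit")
        self.page_source = "<html>%s</html>" % url

    def quit(self):
        self.alive = False


def scrape(urls, browser):
    pages = []
    for url in urls:
        browser.get(url)            # per-page work stays inside the loop
        pages.append(browser.page_source)
    browser.quit()                  # cleanup runs once, after the loop
    return pages
```

In the original code, browser.quit() (and resultcsv.close()) ran during the first iteration, so the second browser.get() talked to a driver that was no longer listening, which is exactly the connection-refused traceback shown above.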

+0

I don't think browser.close() is needed even inside the loop – CrazyElf

+2

Indeed, quit was killing the webdriver –

+1

@CrazyElf Closing the current page is cleaner; it frees memory. –