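How to write a Selenium loop in Python?
I want to scrape data from many different websites that contain JavaScript code (which is why I use Selenium to get the information). Everything works fine, but when I try to load the next URL, I get a very long error message: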
> Traceback (most recent call last):
File "C:/Python27/air17.py", line 46, in <module>
scrape(urls)
File "C:/Python27/air17.py", line 28, in scrape
browser.get(url)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 268, in get
self.execute(Command.GET, {'url': url})
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 254, in execute
response = self.command_executor.execute(driver_command, params)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 464, in execute
return self._request(command_info[0], url, body=data)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 487, in _request
self._conn.request(method, parsed_url.path, body, headers)
File "C:\Python27\lib\httplib.py", line 1042, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 1082, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 1038, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 882, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 844, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 821, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 575, in create_connection
raise err
error: [Errno 10061]
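The data from the first website is written to the CSV file, but when the code tries to open the next website it freezes and I get this error message. What am I doing wrong?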
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import urllib2
import unicodecsv as csv
import os
import sys
import io
import datetime
import pandas as pd
import MySQLdb
import re
import contextlib
import selenium.webdriver.support.ui as ui
filename = r'output.csv'
resultcsv = open(filename, "wb")
output = csv.writer(resultcsv, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
output.writerow(['TIME', 'FLIGHT', 'FROM', 'AIRLANE', 'AIRCRAFT', 'IHAVETODELETETHIS', 'STATUS'])

def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup = BeautifulSoup(html, "html.parser")
        table = soup.find('table', {"class": "table table-condensed table-hover data-table m-n-t-15"})
        datatable = []
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            datatable.append(temp_data)
        output.writerows(datatable)
        resultcsv.close()
        time.sleep(10)
        browser.quit()

urls = ["https://www.flightradar24.com/data/airports/bud/arrivals", "https://www.flightradar24.com/data/airports/fco/arrivals"]
scrape(urls)
These need to be outside the loop (one tab less): resultcsv.close() browser.quit() – CrazyElf
That is the solution! Thank you, it is working! :) – tardos93
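For later readers, here is a minimal sketch of the fix described in the comment. The likely cause is that `browser.quit()` ran at the end of the first pass through the loop, so the next `browser.get(url)` had no driver left to connect to (Errno 10061 is the Windows "connection refused" code), and `resultcsv.close()` would have broken the second write for the same reason. The sketch makes some assumptions that are not in the original post: it targets Python 3 and the standard `csv` module instead of `unicodecsv`, it drops the `time.sleep(10)` pause, and the `extract_rows` helper and the `main` wrapper are hypothetical names introduced only for illustration.

```python
import csv


def scrape(urls, browser, writer, extract_rows):
    """Visit every URL with one shared browser and write the rows found.

    `browser` only needs .get(url) and .page_source (a Selenium WebDriver
    fits). `extract_rows` is a hypothetical callable that turns HTML into a
    list of rows. Nothing is closed in here, so every iteration can reuse
    the same browser and the same CSV writer.
    """
    for url in urls:
        browser.get(url)
        writer.writerows(extract_rows(browser.page_source))


def main():
    # Imported here so scrape() can be exercised without Selenium installed.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    def extract_rows(html):
        # Same selectors as in the question; returns [] if the table is missing.
        soup = BeautifulSoup(html, "html.parser")
        table = soup.find(
            "table",
            {"class": "table table-condensed table-hover data-table m-n-t-15"},
        )
        if table is None:
            return []
        return [
            [td.text for td in tr.find_all("td")]
            for tr in table.find_all("tr", class_="hidden-xs hidden-sm ng-scope")
        ]

    urls = [
        "https://www.flightradar24.com/data/airports/bud/arrivals",
        "https://www.flightradar24.com/data/airports/fco/arrivals",
    ]

    browser = webdriver.Firefox()
    try:
        # The file is opened once and closed once, after the last URL.
        # errors="replace" is an assumption to avoid failing on characters
        # that latin-1 cannot represent.
        with open("output.csv", "w", newline="", encoding="latin-1",
                  errors="replace") as resultcsv:
            writer = csv.writer(resultcsv, delimiter=";", quotechar='"',
                                quoting=csv.QUOTE_NONNUMERIC)
            writer.writerow(["TIME", "FLIGHT", "FROM", "AIRLANE", "AIRCRAFT",
                             "IHAVETODELETETHIS", "STATUS"])
            scrape(urls, browser, writer, extract_rows)
    finally:
        # Quit the browser once, even if one of the pages raised an error.
        browser.quit()
```

Calling `main()` runs the real scrape. The `try`/`finally` and the `with` block are just one way to make sure the cleanup happens exactly once, after the loop. Moving the two calls one indentation level out, as the comment suggests, has the same effect on the happy path.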