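I can print the table, but I can't get it into a .csv file. I'm new to this and am scraping and learning every day. I'm trying to scrape the table data to .csv.
How can I screen-scrape this data into a CSV file?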
# Standard library modules
import os
import sys
# The wget module
import wget
# The BeautifulSoup module
from bs4 import BeautifulSoup
# The selenium module
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome() # to use Firefox instead, replace Chrome() with Firefox()
driver.get("https://www.arcountydata.com/county.asp?county=Benton") # load the web page
# click the link inside the "Assessor" block to reach the search form
driver.find_element(By.XPATH, "//*[@id='Assessor']/div/div[2]/a/i").click()
elem = driver.find_element(By.ID, "OwnerName") # find the owner-name input field of the search form
elem.send_keys("Roth Family Inc") # type the owner name to search for
driver.find_element(By.XPATH, "//*[@id='Search']").click() # submit the search
src = driver.page_source # gets the html source of the page
parser = BeautifulSoup(src,"lxml") # initialize the parser and parse the source "src"
table = parser.find("table", attrs={"class" : "table table-striped-yellow table-hover table-bordered table-condensed table-responsive"}) # find the results table by its class attribute
f = open('/output.csv', 'w')
parcel=""
owner=""
ptype=""
site_address=""
location=""
acres=""
summons =[]
#print table
list_of_rows = []
for row in table.findAll('tr')[1:]: # skip the header row
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace(" ", "")
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
print(list_of_rows)
driver.close() # closes the driver
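For what it's worth, one way to get list_of_rows into a file is the standard-library csv module. This is only a minimal sketch, assuming Python 3; the helper name write_rows_to_csv, the placeholder rows, and the temp-directory output path are all made up for illustration and stand in for whatever the scraping loop builds:

```python
import csv
import os
import tempfile

def write_rows_to_csv(rows, path):
    """Write a list of rows (each a list of cell strings) to a CSV file."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(rows)  # one CSV line per inner list

# Hypothetical stand-in for list_of_rows built by the scraping loop
sample_rows = [
    ["parcel-1", "owner-1", "1.50"],
    ["parcel-2", "owner-2", "0.75"],
]

# Hypothetical output location; any writable path would do
out_path = os.path.join(tempfile.gettempdir(), "output.csv")
write_rows_to_csv(sample_rows, out_path)
```

In the script above, the same call would go after the loop, with list_of_rows in place of sample_rows.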
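Thanks for the breakdown. It helped, as I am now able to parse the data to csv. However, the whole data set is parsed into 1 row.. what should I do so that it is parsed row by row, as in the html table... – user3691781
This should work; don't forget to import csv. I would also consider adding Selenium. –
Thank you very much sir.. it helped me learn something new today.. I'll try to clean up the code further... for some of the data, a single column gets parsed into multiple rows.. – user3691781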
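Both follow-up symptoms have a plausible cause, though this is a guess since the code that wrote the file isn't shown: everything lands on one line when all cells go through a single writerow() call, and one column spills over several lines when a cell's text contains newlines. A sketch of both points, assuming Python 3; clean_cell and the sample values are hypothetical:

```python
import csv
import io

def clean_cell(text):
    """Collapse any run of whitespace (spaces, tabs, newlines,
    non-breaking spaces) into a single space."""
    return " ".join(text.split())

# Hypothetical raw cell text, as cell.text might return it for a two-row table
raw_rows = [
    ["parcel-1", "first line\n   second line"],
    ["parcel-2", "single\xa0line"],
]
rows = [[clean_cell(cell) for cell in row] for row in raw_rows]

# Pitfall: flattening all cells and calling writerow() once gives ONE csv line.
flat = io.StringIO()
csv.writer(flat).writerow([cell for row in rows for cell in row])

# writerows() (or writerow() inside the row loop) gives one line per table row.
per_row = io.StringIO()
csv.writer(per_row).writerows(rows)

print(len(flat.getvalue().splitlines()))     # 1
print(len(per_row.getvalue().splitlines()))  # 2
```

Unlike cell.text.replace(" ", ""), this keeps the single spaces between words, so a value such as an owner name is not run together.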