2016-07-27 192 views
import urllib2  
from selenium import webdriver  
from selenium.webdriver.common.by import By  
from selenium.webdriver.common.keys import Keys 

url = ("http://www.justdial.com/Mumbai/CA")  
driver = webdriver.Firefox()  
driver.get(url) 

driver 

elements = driver.find_elements_by_xpath('//div[@class="col-md-12 col-xs-12 padding0"]') 

for e in elements:  
    print e.text  
url = driver.current_url  
company_name = driver.find_element_by_xpath('//span[@class="jcn"]').text 

contact_number = driver.find_element_by_xpath('//p[@class="contact_info"]').text  
address = driver.find_element_by_xpath('//p[@class="adress_info"]').text  
address_info = driver.find_element_by_xpath('//p[@class="address-info adinfoex"]').text 

estd = driver.find_element_by_xpath('//li[@class="fr"]').text  
ratings = driver.find_element_by_xpath('//li[@class="last"]').text 

tf = 'textfile.csv'  
f2 = open(tf, 'a+') 

f2.write(', '.join([data.encode('utf-8') for data in [company_name]]) + ',')  
f2.write(', '.join([data.encode('utf-8') for data in [contact_number]]) + ',')  
f2.write(', '.join([data.encode('utf-8') for data in [address]]) + ',')  
f2.write(', '.join([data.encode('utf-8') for data in [address_info]]) + ',')  
# estd_ratings was never defined; write the estd and ratings values fetched above
f2.write(', '.join([data.encode('utf-8') for data in [estd, ratings]]) + '\n') 

f2.close() 

Answer


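The following should get you started. It is important to make sure that your XPath entries select exactly what you need. Python's csv module can be used to convert a list of entries into comma-separated values automatically, without you needing to add the commas yourself: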

import csv 
import urllib2  
from selenium import webdriver  
from selenium.webdriver.common.by import By  
from selenium.webdriver.common.keys import Keys 

def get_elements_by_xpath(driver, xpath): 
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)] 


url = ("http://www.justdial.com/Mumbai/CA")  
driver = webdriver.Firefox()  
driver.get(url) 

search_entries = [ 
    ("CompanyName",   "//span[@class='jcn']"), 
    ("ContactNumber", "//p[@class='contact-info']/span/a"), 
    ("Address",       "//span[@class='desk-add jaddt']"), 
    ("AddressInfo",   "//p[@class='address-info adinfoex']"), 
    ("Estd",          "//li[@class='fr']"), 
    ("Ratings",       "//li[@class='last']/a/span[@class='rt_count']")] 

with open('textfile.csv', 'wb') as f_output: 
    csv_output = csv.writer(f_output) 

    # Write header 
    csv_output.writerow([name for name, xpath in search_entries]) 
    entries = [] 

    for name, xpath in search_entries: 
        entries.append(get_elements_by_xpath(driver, xpath)) 

    csv_output.writerows(zip(*entries)) 

This will give you a CSV file that looks like:

CompanyName,ContactNumber,Address,AddressInfo,Estd,Ratings 
Bansal Investment & Consult...,+(91)-22-38578062,Manpada-thane West.. | more..,"CA, Tax Consultants, more...",Estd.in 2003,27 Ratings 
G.Kedia & Associates,+(91)-22-38555914,"Station Road, Thane We.. | more..","CA, Company Registration Consultants, more...",Estd.in 2010,17 Ratings 
Tarun Shah & Associates,+(91)-22-38552775,"Mogra Lane, Andheri Ea.. | more..","CA, Income Tax Consultants, more...",Estd.in 2000,12 Ratings 
Hemant Shah And Associates LLP,+(91)-22-38588696,"Azad Road, Andheri Eas.. | more..","CA, Company Secretary, more...",Estd.in 1988,65 Ratings 

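The for loop requests each of the XPath searches and creates an array entry for each one. Each search returns an array of matches, so you end up with an array of arrays of entries.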
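
To see the shape of that array of arrays without opening a browser, here is a minimal sketch. FakeDriver and FakeElement are hypothetical stand-ins invented for this illustration (they are not part of Selenium), the canned values are taken from the sample output above, and the sketch is written for Python 3 rather than the Python 2 used in the answer:

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium element; only exposes .text."""
    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Hypothetical stand-in for a driver: maps an XPath to canned matches."""
    def __init__(self, matches_by_xpath):
        self._matches_by_xpath = matches_by_xpath

    def find_elements_by_xpath(self, xpath):
        return [FakeElement(text) for text in self._matches_by_xpath.get(xpath, [])]


def get_elements_by_xpath(driver, xpath):
    # Same helper as in the answer: the text of every element matching xpath
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)]


search_entries = [
    ("CompanyName", "//span[@class='jcn']"),
    ("Estd", "//li[@class='fr']"),
]

driver = FakeDriver({
    "//span[@class='jcn']": ["G.Kedia & Associates", "Tarun Shah & Associates"],
    "//li[@class='fr']": ["Estd.in 2010", "Estd.in 2000"],
})

# One inner list per search, i.e. one list per CSV column
entries = [get_elements_by_xpath(driver, xpath) for name, xpath in search_entries]
print(entries)
```

Each inner list holds every match for one XPath, which is why entries ends up in column order.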

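This array of arrays then needs to be written to the CSV file. entries is in column order, and it needs to be written to the CSV file in row order. To do this, zip(*entries) is used to convert it to row order. As the whole array is now in the correct order, a single call to writerows can be used to write the whole file in one go.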
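
The column-to-row conversion can be tried on its own. The sketch below uses a few values from the sample output and deliberately leaves one column a match short, on the assumption that a listing on the page might lack one of the fields. zip stops at the shortest list, so a short column silently drops rows; itertools.zip_longest (Python 3 name) is shown as one possible way to keep them:

```python
from itertools import zip_longest

# Column order: one list per XPath search
entries = [
    ["G.Kedia & Associates", "Tarun Shah & Associates"],  # CompanyName
    ["+(91)-22-38555914", "+(91)-22-38552775"],           # ContactNumber
    ["Estd.in 2010"],                                     # Estd, one match missing
]

# zip(*entries) transposes to row order but stops at the shortest column
rows_truncated = list(zip(*entries))

# zip_longest pads short columns instead of dropping rows
rows_padded = list(zip_longest(*entries, fillvalue=""))

print(rows_truncated)
print(rows_padded)
```

Padding only keeps the row count intact. It cannot tell which listing the missing value belonged to, so if fields can be absent it may be safer to locate each listing's container element first and search for the fields inside it.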

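An added benefit of using Python's csv library is that if any field contains a comma, it will automatically add quotes around that field so that Excel does not interpret it as another column. Note that you may need to change the default cell type format, as Excel will try to guess it.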
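
The quoting behaviour is easy to check in isolation. This is a small sketch for Python 3 that writes two rows from the sample output into an in-memory buffer instead of textfile.csv; the buffer and the reduced three-column header are choices made for this illustration only:

```python
import csv
import io

header = ["CompanyName", "Address", "Estd"]
rows = [
    ("G.Kedia & Associates", "Station Road, Thane We.. | more..", "Estd.in 2010"),
    ("Bansal Investment & Consult...", "Manpada-thane West.. | more..", "Estd.in 2003"),
]

# In-memory text buffer standing in for the output file
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original field, comma included
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
```

Only the address that contains a comma is wrapped in quotes; the other fields are written unchanged. When writing to a real file in Python 3, the file would be opened with open('textfile.csv', 'w', newline='') rather than the 'wb' mode that the Python 2 code uses.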