2017-01-15 32 views

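Looping over scraped items and saving to an .xlsx file: only the last value is saved?

I am very new to Python and am trying to learn as much as I can by working on projects, to keep my interest up.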

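In the code below, I am trying to scrape information from a website and store all the company names, addresses, and so on in an Excel file. I think I need to define how an Excel row and column are assigned for each iteration/company. I am just wondering how to do that.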

import requests, os 
from bs4 import BeautifulSoup 
from openpyxl import Workbook 
from openpyxl import load_workbook 


url = "https://dir.indiamart.com/search.mp?ss=Power+Distribution+Transformers" 
r = requests.get(url) 

soup = BeautifulSoup(r.content) 

links = soup.find_all("a") 

for link in links: 
    print("<a href='%s'>%s</a>" % (link.get("href"), link.text)) 


g_data = soup.find_all("div", {"class": "nes"}) 

c = [] 
d = [] 
for item in g_data:
    c.append(item.contents[3].text)
    d.append(item.contents[1].text)
    wb = load_workbook("Trial.xlsx")
    ws1 = wb.get_sheet_by_name("Sheet1")
    for i in c:
        ws1["A2"] = i
        wb.save("Trial.xlsx")
        for x in d:
            ws1["B2"] = x
            wb.save("Trial.xlsx")

You keep overwriting the same cells (A2 and B2) and saving the file, so only the last value survives.
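To illustrate the comment above, here is a minimal sketch of writing one row per company with openpyxl instead of reusing A2 and B2. The helper names `pair_rows` and `save_rows` and the header labels are made up for this example, and it creates a fresh workbook rather than loading an existing Trial.xlsx.

```python
def pair_rows(names, addresses):
    """Pair each company name with its address, one tuple per spreadsheet row."""
    return [(n.strip(), a.strip()) for n, a in zip(names, addresses)]


def save_rows(path, rows, header=("Company", "Address")):
    """Write the header, then one row per company, and save the file once."""
    # Third-party import kept local so pair_rows can be used without openpyxl.
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    ws.append(list(header))
    for row in rows:
        # append() writes below the last used row, so nothing is overwritten.
        ws.append(list(row))
    wb.save(path)
```

With the lists from the question, this would be called once after the scraping loop has finished, for example `save_rows("Trial.xlsx", pair_rows(c, d))`. Saving once at the end also avoids rewriting the file on every iteration.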

Answer

import csv
import re

import bs4
import requests

url = 'https://dir.indiamart.com/search.mp?ss=Power+Distribution+Transformers'
r = requests.get(url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
# Each search result sits in its own <div class="lst"> block.
blocks = soup.find_all('div', class_='lst')

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for b in blocks:
        name = b.find(class_='cnm').get_text(strip=True)  # company name
        addr = b.find(class_='clg').get_text(strip=True)  # address
        # First text node containing digits inside the phone element.
        call = b.find(class_='ls_co phn').find(text=re.compile(r'\d+')).strip()
        # One row per company, so no earlier row is overwritten.
        writer.writerow([name, addr, call])

Output:

"Padmavahini Transformers Private Limited, Coimbatore","Saravanampatti, CoimbatoreS. F. No. 353/1, Door No. 7/140, Ruby Matriculation School Road Keeranatham, Saravanampatti,Coimbatore-641035,Tamil Nadu",8071681548 
Guru Teg Bahadur Metal Works,"Shimlapuri, LudhianaNo. 1621, Street No. 4, Kwality Road, Near Kwality Chowk Shimlapuri,Ludhiana-141003,Punjab",8079452881 
Servokon Systems Ltd.,"Servokon House, New DelhiServokon House, C-13, Radhu Palace Road Opposite Scope Minar,New Delhi-110092,Delhi",8048077499 
Muskaan Power Infrastructure Ltd,"Dhandari Kalan, LudhianaSua Road, Industrial Area - C, Dhandari Kalan,Ludhiana-141014,Punjab",8079465606 
Tamilnadu Electricals,"Ambattur Industrial Estate, ChennaiNo. 95 - H, (SP) Ambattur Industrial Estate,Chennai-600058,Tamil Nadu",8046073728 
L. D. Power Transformers Pvt. Ltd.,"Sector 3, NoidaA-9, Sector- 59, Phase- 3,Noida-201301,Uttar Pradesh",8048111124 
Western Electricals (pvt.) Ltd.,"Kaman, PalgharS. No. 6, H. No. 1, (Part), Behind Shanti Metal, Near Sai Service, Vasai - Kaman Road Sativali Village, Taluka Vasai (E),Palghar-401208,Maharashtra",8071683491 

You can store the data in a CSV file and then open it in Excel. The csv module is easy to use.
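As a small illustration of why some fields in the output above are wrapped in quotes, the sketch below writes a row to an in-memory buffer and reads it back. The helper name `rows_to_csv_text` and the sample row are invented for this example.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize rows with csv.writer into a string instead of a file."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Invented sample data: two of the three fields contain a comma.
sample = [["Acme Transformers, Pune", "Plot 1, Main Road", "0000000000"]]
text = rows_to_csv_text(sample)

# Reading the text back recovers the original fields, commas included.
round_trip = list(csv.reader(io.StringIO(text)))
```

By default the writer quotes only the fields that contain the delimiter, which is why the phone numbers in the output appear unquoted while most addresses are quoted.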


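Thank you, this is perfect. Now I need to understand how it works. :) I just noticed something: the page has a "Show more results" option, but the URL does not change when it is used. I had assumed I could simply write a function that changes the URL automatically, so I am confused about this. – Sid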


@Sid The csv part or the find() part?


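The find() part. It uses the classes shown in the page's inspect-element view, then uses get_text (from the requests library?) with strip to remove the tags? After this I would like to go into each company's page and get the details from its Contact Us section, because the numbers on the search page are more or less useless. I really appreciate the help. – Sid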
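On the find() question: `find()` and `get_text()` are BeautifulSoup methods, while requests only downloads the page. The sketch below runs the same calls as the answer on a small invented HTML fragment. The class names are copied from the answer, but the markup itself is an assumption and not the real page structure; it uses the built-in `html.parser` so that lxml is not required, and `string=`, which is the newer spelling of `text=`.

```python
import re

from bs4 import BeautifulSoup

# Invented markup that only mimics the class names used in the answer.
html = """
<div class="lst">
  <span class="cnm"> <a href="#">Acme Transformers</a> </span>
  <p class="clg"><span>Pune</span> <span>Plot 1, Main Road</span></p>
  <span class="ls_co phn"><i></i> 0000000000 </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
block = soup.find("div", class_="lst")

# get_text() collects the text inside the element and drops the tags;
# strip=True trims whitespace around each piece of text.
name = block.find(class_="cnm").get_text(strip=True)

# Without a separator the pieces are glued together directly.
addr_joined = block.find(class_="clg").get_text(strip=True)
addr_spaced = block.find(class_="clg").get_text(separator=" ", strip=True)

# find(string=...) returns the first text node matching the regex.
phone = block.find(class_="ls_co phn").find(string=re.compile(r"\d+")).strip()
```

The difference between `addr_joined` and `addr_spaced` is the likely reason addresses in the output above run together, as in "Saravanampatti, CoimbatoreS. F. No. 353/1"; passing a separator keeps the pieces apart.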
