2015-12-22

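Python BeautifulSoup: scraping data and writing it to Excel raises "NotImplementedError"

I want to write a script that uses Python and BeautifulSoup to scrape a website and then write the data to an Excel worksheet.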

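It works up until the writing part, and then I get a NotImplementedError. I looked it up and wrapped the writing part of the code in try: and pass blocks. That got rid of the error in the Python interpreter console window, but my Excel sheet is blank.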

Here is what I have so far:

import requests, openpyxl 
from bs4 import BeautifulSoup 

wb = openpyxl.Workbook('RDWM_CRM.xls') 
wb.create_sheet('Phone') 
sheet = wb.get_sheet_by_name('Phone') 

# nav to webpage I want to scrape 
url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=New%20York%2C%20NY&page=2" 
r = requests.get(url) 
soup = BeautifulSoup(r.content) 

# for loop finds info then prints 
for div in soup.find_all("div", {"class": "info"}): 
    print (div.contents[0].text) 
    print (div.contents[1].text)    

# for loop finds info then writes to excel cells 
for div in soup.find_all("div", {"class": "info"}): 
    sheet['A1'] = div.contents[0].text 
    sheet['B1'] = div.contents[1].text 

wb.save('RDWM_CRM.xls') 
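The traceback points into openpyxl's write-only writer, which suggests one likely cause (an inference, not something confirmed in this thread): the first positional parameter of openpyxl.Workbook is write_only, not a file name. Workbook('RDWM_CRM.xls') therefore quietly creates a write-only workbook, and write-only sheets do not support cell assignment such as sheet['A1'] = .... Wrapping that in try/pass hides the error, but every write still fails, which fits the blank sheet. The sketch below reproduces this and shows the usual pattern of passing the file name to save() instead. openpyxl 2.3 raised NotImplementedError here; newer releases may raise TypeError, so the sketch catches both.

```python
from openpyxl import Workbook

# Workbook's first positional parameter is write_only, so passing a
# file name here switches the workbook into write-only mode.
wb = Workbook('RDWM_CRM.xls')
ws = wb.create_sheet('Phone')

try:
    ws['A1'] = 'Neptune Construction'  # random cell access
    cell_assignment_ok = True
except (NotImplementedError, TypeError):
    # Write-only sheets only accept whole rows via ws.append(...)
    cell_assignment_ok = False

print('cell assignment allowed:', cell_assignment_ok)

# A normal (read/write) workbook takes no file name; the name goes to save().
wb_ok = Workbook()
ws_ok = wb_ok.create_sheet('Phone')
ws_ok['A1'] = 'Neptune Construction'
print(ws_ok['A1'].value)
# wb_ok.save('RDWM_CRM.xlsx')
```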

As I said above, even when no error is raised I get a blank Excel sheet. This is the traceback I see in the console:

Neptune Construction 
Serving the New York Area.(866) 664-1759 
>>> # for loop finds info then writes to excel cells 
... for div in soup.find_all("div", {"class": "info"}): 
...  sheet['A1'] = div.contents[0].text 
...  sheet['B1'] = div.contents[1].text 
... 
Traceback (most recent call last): 
File "<stdin>", line 3, in <module> 
File "C:\Users\Josh\AppData\Local\Programs\Python\Python35\lib\site-packages\openpyxl\writer\write_only.py", line 223, in removed_method 
raise NotImplementedError 
NotImplementedError 
>>> wb.save('RDWM_CRM.xls') 

That is the last record of scraped data, followed by the error.
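Even with a normal workbook, the second loop in the question writes every listing to the same two cells, A1 and B1, so each iteration overwrites the previous one and only the last listing would remain. A minimal sketch that gives each listing its own row, assuming the scraped values have already been collected as (name, phone) pairs; the listings below are made-up placeholder data, not results from the site:

```python
from openpyxl import Workbook

# Hypothetical placeholder data standing in for the scraped (name, phone) pairs.
listings = [
    ('Example Roofing A', '555-0100'),
    ('Example Roofing B', '555-0101'),
]

wb = Workbook()          # no file name here; it goes to save()
sheet = wb.active        # the default sheet that Workbook() creates
sheet.title = 'Phone'

# enumerate(..., start=1) yields Excel's 1-based row numbers, so each
# listing lands on its own row instead of overwriting A1/B1.
for row_number, (name, phone) in enumerate(listings, start=1):
    sheet.cell(row=row_number, column=1, value=name)
    sheet.cell(row=row_number, column=2, value=phone)

# wb.save('RDWM_CRM.xlsx')  # openpyxl writes .xlsx files, not .xls
```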





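Thanks for your help! I'm still getting a blank Excel sheet... Here is the code I'm using now. There are no errors, just a blank Excel sheet. It creates the new sheet named Phone, but the sheet is empty...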

import requests 
from bs4 import BeautifulSoup 
from openpyxl import Workbook 
url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=Seattle%2C%20WA&page=4" # nav to webpage I want to scrape 
r = requests.get(url) 
soup = BeautifulSoup(r.content) 

# create a dummy list of texts to write to excel file 
divs = [] 

wb = Workbook() # open new workbook, use load_workbook if existing 
ws = wb.create_sheet('Phone') 
for div in divs: 
    row = [div.contents[0].text, div.contents[1].text] # construct a row: shown only for example purposes 
    ws.append(row)   # could use ws.append(div) since each div is a list 

wb.save('RDWM_CRM.xlsx')  # save workbook, will overwrite if exists 
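In this version the loop iterates over divs = [], an empty list, so its body never runs and the Phone sheet stays empty. The answer's dummy list has to be replaced by the scraped divs. A sketch, assuming the listings use the same "info" div structure as the first script; the HTML string is invented sample markup so the example runs without a network request, and it names the standard-library html.parser backend explicitly:

```python
from bs4 import BeautifulSoup
from openpyxl import Workbook

# Invented markup standing in for r.content; the real page structure may differ.
html = (
    '<div class="info"><h3>Example Roofing A</h3><div>555-0100</div></div>'
    '<div class="info"><h3>Example Roofing B</h3><div>555-0101</div></div>'
)
soup = BeautifulSoup(html, 'html.parser')

# Use the scraped divs instead of an empty placeholder list.
divs = soup.find_all('div', {'class': 'info'})

wb = Workbook()
ws = wb.create_sheet('Phone')
for div in divs:
    row = [div.contents[0].get_text(), div.contents[1].get_text()]
    ws.append(row)  # each append writes the next empty row

# wb.save('RDWM_CRM.xlsx')
```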

Any help is appreciated!


Please include the traceback. Does the error happen in wb.save? – memoselyk


In the second for loop; it should have been printed. – user3429394


Traceback (most recent call last): File "<stdin>", line 3, in <module> File "C:\Users\Josh\AppData\Local\Programs\Python\Python35\lib\site-packages\openpyxl\writer\write_only.py", line 223, in removed_method raise NotImplementedError NotImplementedError >>> wb.save('RDWM_CRM.xls') – user3429394

Answer


Apologies in advance if I haven't fully understood your question, but there seem to be some problems with how you are using openpyxl.

Here is an example of writing a worksheet with openpyxl that may help:

from openpyxl import Workbook 

# create a dummy list of texts to write to excel file 
divs = [[chr(i)*8, chr(i+1)*8] for i in range(65, 75, 1)] 

wb = Workbook()    # open new workbook, use load_workbook if existing 
ws = wb.create_sheet(title="Example") 
for div in divs: 
    row = [div[0], div[1]] # construct a row: shown only for example purposes 
    ws.append(row)   # could use ws.append(div) since each div is a list 
wb.save('example.xlsx')  # save workbook, will overwrite if exists 

The dummy list divs looks like this:

[['AAAAAAAA', 'BBBBBBBB'], 
['BBBBBBBB', 'CCCCCCCC'], 
['CCCCCCCC', 'DDDDDDDD'], 
['DDDDDDDD', 'EEEEEEEE'], 
['EEEEEEEE', 'FFFFFFFF'], 
['FFFFFFFF', 'GGGGGGGG'], 
['GGGGGGGG', 'HHHHHHHH'], 
['HHHHHHHH', 'IIIIIIII'], 
['IIIIIIII', 'JJJJJJJJ'], 
['JJJJJJJJ', 'KKKKKKKK']] 

and the Excel file "example.xlsx" contains this worksheet, "Example":

    A         B
1   AAAAAAAA  BBBBBBBB
2   BBBBBBBB  CCCCCCCC
3   CCCCCCCC  DDDDDDDD
4   DDDDDDDD  EEEEEEEE
5   EEEEEEEE  FFFFFFFF
6   FFFFFFFF  GGGGGGGG
7   GGGGGGGG  HHHHHHHH
8   HHHHHHHH  IIIIIIII
9   IIIIIIII  JJJJJJJJ
10  JJJJJJJJ  KKKKKKKK
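One detail worth knowing when checking the output: Workbook() already contains one empty worksheet titled "Sheet", and create_sheet appends "Example" after it, so the saved file typically opens on the empty default sheet unless you switch tabs. A small sketch showing the sheet list, plus the alternative of renaming the default (active) sheet instead of creating a second one:

```python
from openpyxl import Workbook

wb = Workbook()
print(wb.sheetnames)        # ['Sheet']: Workbook() starts with one default sheet

ws = wb.create_sheet(title="Example")
print(wb.sheetnames)        # ['Sheet', 'Example']: 'Example' is appended after it

# Alternative: reuse the default (active) sheet so there is no empty tab.
wb2 = Workbook()
ws2 = wb2.active
ws2.title = "Example"
ws2.append(['AAAAAAAA', 'BBBBBBBB'])
print(wb2.sheetnames)       # ['Example']
```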

You would build a row like this:

row = [div.contents[0].text, div.contents[1].text] 

This assumes div.contents holds what you expect. Hope this helps. PS: I'm using openpyxl version 2.3.0.
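Whether div.contents[0] and div.contents[1] really are the name and the phone number depends on the page's markup: .contents also includes the whitespace text between tags, so on indented HTML those indices can point at newlines. A sketch comparing .contents with find_all(recursive=False), which returns only direct child tags; the HTML is invented sample markup, not the real Yellow Pages structure:

```python
from bs4 import BeautifulSoup

# Invented, indented markup: the newlines become text nodes in .contents.
html = """
<div class="info">
  <h3>Example Roofing A</h3>
  <div>555-0100</div>
</div>
"""
div = BeautifulSoup(html, 'html.parser').find('div', {'class': 'info'})

print(repr(div.contents[0]))   # a whitespace string, not the <h3> tag

# Only direct child *tags*, skipping the whitespace strings.
children = div.find_all(recursive=False)
row = [children[0].get_text(strip=True), children[1].get_text(strip=True)]
print(row)
```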


Thanks for your help! – user3429394


I'm still getting a blank Excel sheet; here is my modified code: – user3429394


Can you copy the code I posted, run it on your system, and check the Excel file example.xlsx it produces? –
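To check what actually got written, without depending on which tab Excel opens first, the saved file can be read back with openpyxl.load_workbook. A sketch that saves a tiny workbook into a temporary directory and prints the rows of every sheet; the file name and data are only examples, and iter_rows(values_only=True) assumes a reasonably recent openpyxl:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.xlsx')

    wb = Workbook()
    ws = wb.create_sheet(title="Example")
    ws.append(['AAAAAAAA', 'BBBBBBBB'])
    wb.save(path)

    # Read the file back and dump every sheet, including the default one.
    contents = {}
    loaded = load_workbook(path)
    for sheet in loaded.worksheets:
        contents[sheet.title] = [list(r) for r in sheet.iter_rows(values_only=True)]
        print(sheet.title, contents[sheet.title])
```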