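I want to scrape some stock-related data from the web for my project, and I have run into a couple of problems.
Problem 1:
I tried to grab the table from this site: http://sharesansar.com/c/today-share-price.html
It works and grabs the table, but the columns are not grabbed in order. For example, the column 'Company Name' ends up with the values of 'Open price'. How can I solve this?
Problem 2:
I also tried to grab company-specific data from http://merolagani.com/CompanyDetail.aspx?symbol=ADBL, under the 'Price History' tab.
This time I got an error while grabbing the table data. The error I got is shown below: Error while scraping table data from the website
Code: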
import logging

import requests
from bs4 import BeautifulSoup
import pandas

module_logger = logging.getLogger('mainApp.dataGrabber')


class DataGrabberTable:
    ''' Grabs the table data from a certain url. '''

    def __init__(self, url, csvfilename, columnName=[], tableclass=None):
        module_logger.info("Inside 'DataGrabberTable' constructor.")
        self.pgurl = url
        self.tableclass = tableclass
        self.csvfile = csvfilename
        self.columnName = columnName
        self.tableattrs = {'class': tableclass}  # to be passed in find()
        module_logger.info("Done.")

    def run(self):
        '''Call this to run the datagrabber. Returns 1 if error occurs.'''
        module_logger.info("Inside 'DataGrabberTable.run()'.")
        try:
            self.rawpgdata = (requests.get(self.pgurl, timeout=5)).text
        except Exception as e:
            module_logger.warning('Error occurred: {0}'.format(e))
            return 1
        #module_logger.info('Headers from the server:\n {0}'.format(self.rawpgdata.headers))
        soup = BeautifulSoup(self.rawpgdata, 'lxml')
        module_logger.info('Connected and parsed the data.')
        table = soup.find('table', attrs=self.tableattrs)
        rows = table.find_all('tr')[1:]  # skip the header row

        # initializing a dict in the format below
        # data = {'col1' : [...], 'col2' : [...], }
        # col1 and col2 are from columnName list
        self.data = {}
        self.data = dict(zip(self.columnName, [list() for i in range(len(self.columnName))]))
        module_logger.info('Inside for loop.')
        for row in rows:
            cols = row.find_all('td')
            index = 0
            for key in self.data:
                if index > len(cols): break
                self.data[key].append(cols[index].get_text())
                index += 1
        module_logger.info('Completed the for loop.')

        self.dataframe = pandas.DataFrame(self.data)  # make pandas dataframe
        module_logger.info('writing to file {0}'.format(self.csvfile))
        self.dataframe.to_csv(self.csvfile)
        module_logger.info('written to file {0}'.format(self.csvfile))
        module_logger.info("Done.")
        return 0

    def getData(self):
        """Returns 'data' dictionary."""
        return self.data

# Usage example
def main():
    url = "http://sharesansar.com/c/today-share-price.html"
    classname = "table"
    fname = "data/sharesansardata.csv"
    cols = [str(i) for i in range(18)]  # make a list of columns
    '''cols = [
        'S.No', 'Company Name', 'Symbol', 'Open price', 'Max price',
        'Min price','Closing price', 'Volume', 'Previous closing',
        'Turnover','Difference',
        'Diff percent', 'Range', 'Range percent', '90 days', '180 days',
        '360 days', '52 weeks high', '52 weeks low']'''
    d = DataGrabberTable(url, fname, cols, classname)
    if d.run() == 1:
        print('Data grabbing failed!')
    else:
        print('Data grabbing done.')


if __name__ == '__main__':
    main()
Any suggestions would help. Thank you!
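One possible cause of the column mismatch in Problem 1 is that the code fills the columns by iterating over a plain dict (`for key in self.data`). Plain dicts are only guaranteed to keep insertion order from Python 3.7 onward, so on an older interpreter the keys may not come back in the order of `columnName`. The following is a minimal sketch of one way around that, assuming this is indeed the cause: it iterates over the list of names instead of the dict. The helper name `build_columns` and the sample cell values are made up for illustration and are not taken from the site.

```python
from collections import OrderedDict


def build_columns(column_names, rows):
    """Map each column name to its list of cell values, in header order.

    `rows` is a list of rows, each row being a list of cell strings
    (for example, [td.get_text() for td in row.find_all('td')]).
    """
    # OrderedDict keeps the keys in the order of column_names on any Python 3.
    data = OrderedDict((name, []) for name in column_names)
    for cells in rows:
        # Pair the i-th name with the i-th cell; zip stops at the shorter one.
        for name, cell in zip(column_names, cells):
            data[name].append(cell)
    return data


# Made-up sample rows, only to show the pairing of names and cells.
sample_rows = [["1", "ADBL", "430"], ["2", "NABIL", "1500"]]
columns = build_columns(["S.No", "Symbol", "Open price"], sample_rows)
print(list(columns.keys()))
```

When building the frame, passing the names explicitly, as in `pandas.DataFrame(columns, columns=column_names)`, should also keep the CSV columns in the intended order regardless of how the dict is ordered.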
I'm still getting the column mismatch (Problem 1). – Kishor
@Kishor See my edit. –
It works! Thank you very much. What about Problem 2? Did you find any errors? – Kishor
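The actual error for Problem 2 is not reproduced in the text, so the following is only a guess at two places where the posted code can fail. First, `soup.find()` returns `None` when no table matches, which makes `table.find_all('tr')` raise an `AttributeError`. Second, the check `if index > len(cols)` still lets `index == len(cols)` through, so a row with fewer cells than column names raises an `IndexError`. A minimal sketch of guarding against both, with hypothetical helper names `require_table` and `row_to_record`:

```python
def require_table(table, url):
    """Fail with a clear message if soup.find() found no matching table."""
    if table is None:
        raise ValueError("no matching <table> found at {0}".format(url))
    return table


def row_to_record(cells, column_names, fill=""):
    """Return exactly one value per column name.

    Extra cells are dropped and short rows are padded with `fill`,
    so indexing never runs past the end of the row.
    """
    values = list(cells[:len(column_names)])
    values += [fill] * (len(column_names) - len(values))
    return values


# Made-up cells: a short row is padded, a long row is truncated.
short_row = row_to_record(["1", "ADBL"], ["S.No", "Symbol", "Open price"])
long_row = row_to_record(["1", "ADBL", "430", "extra"], ["S.No", "Symbol"])
print(short_row, long_row)
```

In the posted `run()` method, `require_table` would wrap the result of `soup.find('table', attrs=self.tableattrs)`, and `row_to_record` would replace the manual `index` bookkeeping. Whether either of these matches the error in the screenshot cannot be confirmed from the post alone.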