2016-06-13 413 views
0
import urllib.request
import re
import csv
import pandas as pd
from bs4 import BeautifulSoup

columns = []
data = []
f = open('companylist.csv')
csv_f = csv.reader(f)

for row in csv_f:
    stocklist = row
    print(stocklist)

    for s in stocklist:
        print('http://finance.yahoo.com/q?s=' + s)
        optionsUrl = urllib.request.urlopen('http://finance.yahoo.com/q?s=' + s).read()
        soup = BeautifulSoup(optionsUrl, "html.parser")
        stocksymbol = ['Symbol:', s]
        optionsTable = [stocksymbol] + [
            [x.text for x in y.parent.contents]
            for y in soup.findAll('td', attrs={'class': 'yfnc_tabledata1', 'rtq_table': ''})
        ]
        if not columns:
            columns = [o[0] for o in optionsTable]  # list(my_df.loc[0])
        # Append a list (not a generator) so each stock becomes one row
        data.append([o[1] for o in optionsTable])

# create DataFrame from data
df = pd.DataFrame(data, columns=columns)
df.to_csv('test.csv', index=False)

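The script works fine when I have about 200 to 300 stocks, but my company list has about 6,000 symbols, so I would like to pause the URL request downloads.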

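  1. Is there a way to download the data in chunks, say 200 stocks at a time, pause for a while, and then resume downloading?
  2. The export writes one stock at a time; how can I write 200 at a time and append each following batch to the initial batch (CSV)?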

Answers

0

To pause after every 200 downloads, you can do the following (this also applies when you use pandas_datareader):

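import time

for i, s in enumerate(stocklist):
    # Pause after every 200 symbols (but not before the very first one)
    if i and i % 200 == 0:
        time.sleep(5)  # in seconds
    # ... download the data for symbol s here ...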
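
The same idea can also be written with explicit batches. This is only a minimal sketch: `batches` and `process_in_batches` are hypothetical helper names (not part of the original answer), and `handle` stands in for whatever download function you use per symbol. The `sleep` parameter is there so the pause can be swapped out while testing.

```python
import time


def batches(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(symbols, handle, size=200, pause=5.0, sleep=time.sleep):
    """Call `handle(symbol)` for every symbol, pausing between batches.

    `handle` is a placeholder for the per-symbol download step.
    """
    results = []
    for n, batch in enumerate(batches(symbols, size)):
        if n:  # no pause before the very first batch
            sleep(pause)
        results.extend(handle(s) for s in batch)
    return results
```

For example, `process_in_batches(stocklist, download_one)` would call a (hypothetical) `download_one` for each symbol and sleep 5 seconds after every 200 symbols.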

To save all the data into a single file (IIUC):

stocks = pd.DataFrame() # to collect all results 

In each iteration:

stocks = pd.concat([stocks, pd.DataFrame(data, columns=columns)])

Finally:

stocks.to_csv(path, index=False) 
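
If you would rather append each batch to the CSV as you go instead of collecting everything in memory first, something along these lines might work. It is a sketch using only the standard `csv` module; `append_rows` is a hypothetical helper name, and it assumes every batch has the same columns. The header is written only when the file is new or empty.

```python
import csv
import os


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV file; write the header only for a new or empty file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

With pandas, `DataFrame.to_csv(path, mode='a', header=False, index=False)` appends to an existing file in a similar way.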
1

Use pandas_datareader for this:

In [1]: import pandas_datareader.data as web 

In [2]: import datetime 

In [3]: start = datetime.datetime(2010, 1, 1) 

In [4]: end = datetime.datetime(2013, 1, 27) 

In [5]: f = web.DataReader("F", 'yahoo', start, end) 

In [6]: f.loc['2010-01-04']
Out[6]:
Open               10.170000
High               10.280000
Low                10.050000
Close              10.280000
Volume       60855800.000000
Adj Close           9.151094
Name: 2010-01-04 00:00:00, dtype: float64
+0

Thanks for your answer, but it doesn't have all the headers I need, such as div, market cap... – showri

+1

Explore pandas_datareader; it may have just the data you need. – Merlin

+1

@showri, don't underestimate pandas! Search for the word 'MarketCap' in the file 'pandas/io/tests/test_data.py'. – MaxU

5

As @Merlin suggested, take a closer look at the pandas_datareader module; you can do a lot with this tool. Here is a small example:

import csv
import pandas_datareader.data as data
from pandas_datareader.yahoo.quotes import _yahoo_codes

stocklist = ['aapl', 'goog', 'fb', 'amzn', 'COP']

# Field codes are taken from:
# http://www.jarloo.com/yahoo_finance/
# https://greenido.wordpress.com/2009/12/22/yahoo-finance-hidden-api/
_yahoo_codes.update({
    'Market Cap': 'j1',
    'Div Yield': 'y',
    'Bid': 'b',
    'Ask': 'a',
    'Prev Close': 'p',
    'Open': 'o',
    '1 yr Target Price': 't8',
    'Earnings/Share': 'e',
    "Day’s Range": 'm',
    '52-week Range': 'w',
    'Volume': 'v',
    'Avg Daily Volume': 'a2',
    'EPS Est Current Year': 'e7',
    'EPS Est Next Quarter': 'e9',
})

data.get_quote_yahoo(stocklist).to_csv('test.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)

Output: I deliberately transposed the result set because there are too many columns to show here.

In [2]: data.get_quote_yahoo(stocklist).transpose() 
Out[2]: 
                                aapl             goog                 fb                 amzn                COP
1 yr Target Price             124.93           924.83             142.87               800.92              51.23
52-week Range         89.47 - 132.97  515.18 - 789.87   72.000 - 121.080  422.6400 - 731.5000  31.0500 - 64.1300
Ask                            97.61           718.75             114.58               716.73              44.04
Avg Daily Volume         3.81601e+07      1.75567e+06        2.56467e+07          3.94018e+06        8.94779e+06
Bid                             97.6           718.57             114.57               716.65              44.03
Day’s Range            97.10 - 99.12  716.51 - 725.44  113.310 - 115.480  711.1600 - 721.9900  43.8000 - 44.9600
Div Yield                       2.31              N/A                N/A                  N/A               4.45
EPS Est Current Year            8.28             33.6               3.55                 5.39              -2.26
EPS Est Next Quarter            1.66             8.38               0.87                 0.96              -0.48
Earnings/Share                  8.98            24.58              1.635                2.426             -4.979
Market Cap                   534.65B          493.46B            327.71B              338.17B             54.53B
Open                            98.6           716.51                115               713.37              43.96
PE                             10.87            29.25             70.074              295.437                N/A
Prev Close                     98.83           719.41             116.62               717.91              44.51
Volume                   3.07086e+07           868366        2.70182e+07          2.42218e+06        5.20412e+06
change_pct                    -1.23%           -0.09%            -1.757%             -0.1644%           -1.0782%
last                           97.61           718.75            114.571               716.73            44.0301
short_ratio                     1.18             1.41               0.81                 1.29               1.88
time                          3:15pm           3:15pm             3:15pm               3:15pm             3:15pm

If you need more Yahoo fields (Yahoo Finance API codes), you may want to check the following links:

http://www.jarloo.com/yahoo_finance/

https://greenido.wordpress.com/2009/12/22/yahoo-finance-hidden-api/

+0

You can get historical split data and/or earnings dates. The dividends are buried in the historical dividend data on the website. – Merlin

+0

@Merlin, yes, I think you can get historical data. Take a look at '....../site-packages/pandas_datareader/base.py', class '_BaseReader'. I don't understand what "earnings dates" means... – MaxU

+1

I think get_quote_yahoo is currently broken. – Ahmed