2017-05-28

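Scraping historical data from Yahoo Finance with Python

As some of you probably know by now, Yahoo! Finance appears to have discontinued its API for stock market data. I am aware that the fix-yahoo-finance workaround exists, but I am trying to build a more stable solution by scraping the historical data directly from Yahoo.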

So here is what I have at the moment:

import requests
from bs4 import BeautifulSoup

# Fetch the AAPL history page; period1 and period2 are Unix timestamps in seconds.
page = requests.get("https://finance.yahoo.com/quote/AAPL/history?period1=345423600&period2=1495922400&interval=1d&filter=history&frequency=1d")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
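
The period1 and period2 values in the URL above are Unix timestamps in seconds (345423600 falls on 1980-12-11 23:00 UTC). As a minimal sketch, the same kind of URL could be built from calendar dates as shown here. The helper names `to_unix_seconds` and `history_url` are made up for illustration, and the sketch assumes midnight UTC is an acceptable boundary for the date range.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_unix_seconds(year, month, day):
    """Return midnight UTC of the given date as whole Unix seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def history_url(symbol, start, end):
    """Build a history-page URL; start and end are (year, month, day) tuples."""
    query = urlencode({
        "period1": to_unix_seconds(*start),
        "period2": to_unix_seconds(*end),
        "interval": "1d",
        "filter": "history",
        "frequency": "1d",
    })
    return f"https://finance.yahoo.com/quote/{symbol}/history?{query}"


# Only builds and prints the URL; no request is sent.
print(history_url("AAPL", (1980, 12, 12), (2017, 5, 28)))
```

Using an explicit UTC timezone keeps the result independent of the machine's local time zone.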

To get the data from the Yahoo table I can do this:

# Collect every <tbody> element on the page; the price rows live inside one of them.
c = soup.find_all('tbody')
print(c)

My question is: how do I turn `c` into a proper DataFrame? Thanks!


You do know that pandas can import Yahoo Finance data (watch the pandas version!) and gives you a nice DataFrame? – Stein


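I am fully aware of that, but my work depends on this code. Yahoo is within its rights to block pandas from downloading the data at any time, just as it discontinued the API, which is why I am looking for a safer solution. – WhelanG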

Answer


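I wrote this to get YF historical data directly from the CSV download link. It needs to make two requests: one to get the cookie and the crumb, and another to get the data. It returns a pandas DataFrame.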

import re 
from io import StringIO 
from datetime import datetime, timedelta 

import requests 
import pandas as pd 


class YahooFinanceHistory:
    timeout = 2
    crumb_link = 'https://finance.yahoo.com/quote/{0}/history?p={0}'
    crumble_regex = r'CrumbStore":{"crumb":"(.*?)"}'
    quote_link = 'https://query1.finance.yahoo.com/v7/finance/download/{quote}?period1={dfrom}&period2={dto}&interval=1d&events=history&crumb={crumb}'

    def __init__(self, symbol, days_back=7):
        self.symbol = symbol
        self.session = requests.Session()
        self.dt = timedelta(days=days_back)

    def get_crumb(self):
        # First request: load the history page to pick up the session cookie and the crumb.
        response = self.session.get(self.crumb_link.format(self.symbol), timeout=self.timeout)
        response.raise_for_status()
        match = re.search(self.crumble_regex, response.text)
        if not match:
            raise ValueError('Could not get crumb from Yahoo Finance')
        self.crumb = match.group(1)

    def get_quote(self):
        if not hasattr(self, 'crumb') or len(self.session.cookies) == 0:
            self.get_crumb()
        # Use local time here: timestamp() treats a naive datetime as local time,
        # so a naive utcnow() would be shifted by the local UTC offset.
        now = datetime.now()
        dateto = int(now.timestamp())
        datefrom = int((now - self.dt).timestamp())
        url = self.quote_link.format(quote=self.symbol, dfrom=datefrom, dto=dateto, crumb=self.crumb)
        # Second request: download the CSV with the same session (cookie) and the crumb.
        response = self.session.get(url, timeout=self.timeout)
        response.raise_for_status()
        return pd.read_csv(StringIO(response.text), parse_dates=['Date'])
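
The crumb lookup can be tried on its own, without any network access. The following is only a sketch: `sample_page` is a made-up fragment shaped like the text the regex expects, `extract_crumb` is a hypothetical helper, and the JSON decoding step assumes the crumb is embedded as a JSON string that may contain escapes such as `\u002F`.

```python
import json
import re

CRUMB_REGEX = r'CrumbStore":{"crumb":"(.*?)"}'

# Hypothetical page fragment, not real Yahoo output.
sample_page = 'window.App = {"CrumbStore":{"crumb":"abc\\u002Fdef"},"other":1};'


def extract_crumb(page_text):
    """Return the crumb embedded in page_text, decoding JSON escapes."""
    match = re.search(CRUMB_REGEX, page_text)
    if match is None:
        raise ValueError("Could not find a crumb in the page text")
    # Assumption: the captured text is the body of a JSON string literal,
    # so the JSON decoder can turn escapes back into plain characters.
    return json.loads('"' + match.group(1) + '"')


print(extract_crumb(sample_page))
```

If the page text contains no crumb, the helper raises `ValueError`, mirroring the behaviour of `get_crumb`.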

You can use the class like this:

df = YahooFinanceHistory('AAPL', days_back=30).get_quote()
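
For the scraping route described in the question, the rows of a `tbody` can also be turned into a DataFrame by hand. This is a minimal sketch, not a tested scraper for the live page: `sample_html` is invented markup with invented values, the column names and the date format are assumptions, and `tbody_to_frame` is a hypothetical helper.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Invented markup and values standing in for the history page.
sample_html = """
<table>
  <thead><tr><th>Date</th><th>Open</th><th>Close</th><th>Volume</th></tr></thead>
  <tbody>
    <tr><td>Jan 04, 2017</td><td>100.00</td><td>101.50</td><td>1,000</td></tr>
    <tr><td>Jan 03, 2017</td><td>99.00</td><td>100.25</td><td>2,000</td></tr>
    <tr><td>Jan 03, 2017</td><td colspan="3">note row without prices</td></tr>
  </tbody>
</table>
"""


def tbody_to_frame(soup):
    """Convert the first table in soup into a typed DataFrame."""
    columns = [th.get_text(strip=True) for th in soup.find("thead").find_all("th")]
    rows = []
    for tr in soup.find("tbody").find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == len(columns):  # skip rows that do not match the header
            rows.append(cells)
    frame = pd.DataFrame(rows, columns=columns)
    frame["Date"] = pd.to_datetime(frame["Date"], format="%b %d, %Y")
    for name in columns[1:]:
        # Strip thousands separators before converting text to numbers.
        frame[name] = pd.to_numeric(frame[name].str.replace(",", ""))
    return frame


table_df = tbody_to_frame(BeautifulSoup(sample_html, "html.parser"))
print(table_df)
```

Rows whose cell count differs from the header are dropped, on the assumption that such rows carry notes rather than prices.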