I am basically using urllib to open the page needed for each stock in the parameter list and read the entire contents of that page's HTML. Then I slice the HTML to find the quote I am looking for.
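For context, a minimal sketch of that urllib-plus-slicing approach might look like the following; the marker string and offsets are illustrative only and not taken from the actual Google Finance markup:

from urllib.request import urlopen

def get_quote_urllib(stock):
    # Download the full HTML of the quote page for one ticker.
    url = 'https://finance.google.com/finance?q={}'.format(stock)
    html = urlopen(url).read().decode('utf-8')
    # Slice the quote out of the raw HTML around a known marker.
    # The marker below is hypothetical; the real offsets depend on the page markup.
    marker = '<span class="pr">'
    start = html.index(marker) + len(marker)
    end = html.index('<', start)
    return float(html[start:end].strip().replace(',', ''))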
Below is an implementation using Beautiful Soup and requests:
import requests
from bs4 import BeautifulSoup

def get_quotes(*stocks):
    quotelist = {}
    base = 'https://finance.google.com/finance?q={}'
    for stock in stocks:
        url = base.format(stock)
        # Fetch the page and pull the quote out of the <span class="pr"> element.
        soup = BeautifulSoup(requests.get(url).text, 'html.parser')
        quote = soup.find('span', attrs={'class': 'pr'}).get_text().strip()
        quotelist[stock] = float(quote)
    return quotelist

print(get_quotes('AAPL', 'GE', 'C'))
# {'AAPL': 160.86, 'GE': 23.91, 'C': 68.79}
# 1 loop, best of 3: 1.31 s per loop
As mentioned in the comments, you may want to look into multithreading or grequests. Using grequests to make the HTTP requests asynchronously:
import grequests
from bs4 import BeautifulSoup

def get_quotes(*stocks):
    quotelist = {}
    base = 'https://finance.google.com/finance?q={}'
    # Build the requests lazily, then send them concurrently with grequests.map.
    rs = (grequests.get(u) for u in [base.format(stock) for stock in stocks])
    rs = grequests.map(rs)
    for r, stock in zip(rs, stocks):
        soup = BeautifulSoup(r.text, 'html.parser')
        quote = soup.find('span', attrs={'class': 'pr'}).get_text().strip()
        quotelist[stock] = float(quote)
    return quotelist

%%timeit
get_quotes('AAPL', 'BAC', 'MMM', 'ATVI',
           'PPG', 'MS', 'GOOGL', 'RRC')
# 1 loop, best of 3: 2.81 s per loop
Update: here is a modified version that uses the built-in threading module, adapted from Dusty Phillips' Python 3 Object-Oriented Programming.
from threading import Thread

from bs4 import BeautifulSoup
import numpy as np
import requests

class QuoteGetter(Thread):
    def __init__(self, ticker):
        super().__init__()
        self.ticker = ticker

    def run(self):
        base = 'https://finance.google.com/finance?q={}'
        response = requests.get(base.format(self.ticker))
        soup = BeautifulSoup(response.text, 'html.parser')
        try:
            self.quote = float(soup.find('span', attrs={'class': 'pr'})
                                   .get_text()
                                   .strip()
                                   .replace(',', ''))
        except AttributeError:
            # The quote element was not found; record a missing value.
            self.quote = np.nan

def get_quotes(tickers):
    # Start one thread per ticker, wait for all of them, then collect the results.
    threads = [QuoteGetter(t) for t in tickers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    quotes = dict(zip(tickers, [thread.quote for thread in threads]))
    return quotes

tickers = [
    'A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI',
    'ADM', 'ADP', 'ADS', 'ADSK', 'AEE', 'AEP', 'AES', 'AET', 'AFL', 'AGN',
    'AIG', 'AIV', 'AIZ', 'AJG', 'AKAM', 'ALB', 'ALGN', 'ALK', 'ALL', 'ALLE',
]

%time get_quotes(tickers)
# Wall time: 1.53 s
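Not part of the original answer, but for comparison: the same multithreaded fetch can also be sketched with the standard library's concurrent.futures.ThreadPoolExecutor, reusing the parsing logic above (this assumes the Google Finance URL and the 'pr' span still behave as in the examples):

from concurrent.futures import ThreadPoolExecutor

from bs4 import BeautifulSoup
import numpy as np
import requests

BASE = 'https://finance.google.com/finance?q={}'

def fetch_quote(ticker):
    # Download and parse one quote; fall back to NaN if the element is missing.
    soup = BeautifulSoup(requests.get(BASE.format(ticker)).text, 'html.parser')
    try:
        return float(soup.find('span', attrs={'class': 'pr'})
                         .get_text().strip().replace(',', ''))
    except AttributeError:
        return np.nan

def get_quotes_pool(tickers, max_workers=20):
    # executor.map preserves input order, so zip pairs each ticker with its quote.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return dict(zip(tickers, executor.map(fetch_quote, tickers)))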
Check out Beautiful Soup (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) – Mako212
I would work with the 'requests' package rather than 'urllib' directly. I would think the code above already runs pretty fast, doesn't it? Once you have many requests, you could look at multithreading; that should speed things up nicely depending on the code. – Andras
Oh yes, and check out Beautiful Soup or lxml, as mentioned above. – Andras