2017-07-03
import requests 
from bs4 import BeautifulSoup 
import pandas as pd 
import matplotlib.pyplot as plt 

plt.style.use('ggplot') 


url = "https://www.google.com/finance/historical?cid=207437&startdate=Jan%201%2C%201971&enddate=Jul%201%2C%202017&start={0}&num=30" 
how_many_pages=3 
start=0 

for i in range(how_many_pages): 
    new_url = url.format(start) 
    page = requests.get(new_url) 
    soup = BeautifulSoup(page.content, "lxml") 
    table = soup.find_all('table', class_='gf-table historical_price')[0] 

    columns_header = [th.getText() for th in table.findAll('tr')[0].findAll('th')] 
    data_rows=table.findAll('tr')[1:] 
    data=[[td.getText() for td in data_rows[i].findAll(['td'])] for i in range(len(data_rows))] 

    if start == 0: 
        final_df = pd.DataFrame(data, columns=columns_header) 
    else: 
        df = pd.DataFrame(data, columns=columns_header) 
        final_df = pd.concat([final_df, df],axis=0) 
    start += 30 
    final_df.to_csv('nse_data.csv', sep='\t', encoding='utf-8') 


final_df.columns = ['Date'] 
final_df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', utc=True) 


df.plot(x='Date', y='Close') 


plt.savefig('foo.png') 

The downloaded data has the following format:

"Date 
" "Open 
" "High 
" "Low 
" "Close 
" "Volume 
" 
0 "Jun 30, 2017 
" "9,478.50 
" "9,535.80 
" "9,448.75 
" "9,520.90 
" "- 
" 
1 "Jun 29, 2017 
" "9,522.95 
" "9,575.80 
" "9,493.80 
" "9,504.10 
" "- 

For now I just want to plot Date (on the X axis) against Close (on the Y axis).

But I get the error:

ValueError: Length mismatch: Expected axis has 6 elements, new values have 1 elements
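The error can be reproduced without scraping anything. The sketch below builds a one-row, six-column DataFrame from the first data row shown above and then assigns a one-element list to its columns, as the question's code does with final_df. The helper name `reproduce_length_mismatch` is hypothetical, used only for illustration.

```python
import pandas as pd


def reproduce_length_mismatch():
    """Return the ValueError message raised when 6 columns get only 1 label.

    The row is sample data copied from the table shown in the question.
    """
    df = pd.DataFrame(
        [["Jun 30, 2017", "9,478.50", "9,535.80", "9,448.75", "9,520.90", "-"]],
        columns=["Date", "Open", "High", "Low", "Close", "Volume"],
    )
    try:
        # Same operation as final_df.columns = ['Date'] in the question:
        # the frame has six columns, but only one new label is supplied.
        df.columns = ["Date"]
    except ValueError as exc:
        return str(exc)
    return None


print(reproduce_length_mismatch())
```

pandas checks that the number of new labels equals the number of existing columns before relabelling, which is why the message compares 6 elements against 1.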

Answer

  • Your headers and data contain newline characters. print(final_df.columns) returns:

    Index(['Date\n', 'Open\n', 'High\n', 'Low\n', 'Close\n', 'Volume\n'], dtype='object') 
    

    Use rstrip to get rid of them:

    columns_header = [th.getText().rstrip() for th in table.findAll('tr')[0].findAll('th')] 
    

    data = [[td.getText().rstrip() for td in data_rows[i].findAll(['td'])] for i in range(len(data_rows))] 
    
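  • final_df.columns = ['Date'] is what raises the error. A DataFrame needs exactly as many column labels as it has columns, so in your case a list of 6 elements is expected. I'm not sure what you are trying to do here; I think you can simply delete this line.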

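  • The format you specified for date parsing does not match your data ['Apr 4, 2017', 'Apr 5, 2017', 'Apr 6, 2017', ...]. See the Python documentation for the strftime()/strptime() format codes. Use this instead: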

    final_df['Date'] = pd.to_datetime(final_df['Date'], format='%b %d, %Y') 
    
  • Convert your data to numeric values so that you can plot it:

    final_df['Close'] = [float(val.replace(',', '')) for val in final_df['Close']] 
    
  • Finally, you can call:

    final_df.plot(x='Date', y='Close') 
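The steps in this answer can be combined into one network-free sketch. It assumes the scraped cells look like the dump in the question, with every header and value ending in a newline. `RAW_HEADER`, `RAW_ROWS`, `clean_prices` and the output file `close.png` are hypothetical names. Instead of the list comprehensions above, the sketch uses `str.strip()` together with pandas' vectorized `str.replace` and `pd.to_numeric`, which is an alternative rather than the answer's exact code.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cells as returned by getText(): each one still ends with "\n",
# like the header and rows shown in the question.
RAW_HEADER = ["Date\n", "Open\n", "High\n", "Low\n", "Close\n", "Volume\n"]
RAW_ROWS = [
    ["Jun 30, 2017\n", "9,478.50\n", "9,535.80\n", "9,448.75\n", "9,520.90\n", "-\n"],
    ["Jun 29, 2017\n", "9,522.95\n", "9,575.80\n", "9,493.80\n", "9,504.10\n", "-\n"],
]


def clean_prices(header, rows):
    """Strip whitespace from every cell, parse dates and make Close numeric."""
    df = pd.DataFrame(
        [[cell.strip() for cell in row] for row in rows],
        columns=[h.strip() for h in header],  # six labels for six columns
    )
    # Dates look like "Jun 30, 2017": abbreviated month, day, comma, year.
    df["Date"] = pd.to_datetime(df["Date"], format="%b %d, %Y")
    # "9,520.90" -> 9520.9: drop the thousands separators before converting.
    df["Close"] = pd.to_numeric(df["Close"].str.replace(",", "", regex=False))
    # Oldest first, with a fresh 0..n-1 index.
    return df.sort_values("Date").reset_index(drop=True)


prices = clean_prices(RAW_HEADER, RAW_ROWS)
prices.plot(x="Date", y="Close")
plt.savefig("close.png")
```

With the real scraper, the rows from all pages could be passed the same way. The `reset_index(drop=True)` call also removes the duplicate index labels (0 to 29 on every page) that repeated `pd.concat` calls produce.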
    