2016-12-30

pandas read_html() is missing columns

I'm using the following read_html() call to read a table (it sits behind a paywall):

import pandas as pd
df = pd.read_html('http://markets.ft.com/data/equities/tearsheet/' +
                  'financials?s=BAG:LSE&subView=BalanceSheet&periodType=a')[0]

It parses fine, except that the last two columns are missing. I'm using the latest version of Anaconda (Python 3.5, pandas 0.18.1, html5lib, BeautifulSoup4).

The start of the output looks like this:

   Fiscal data as of Jan 30 2016 2016 2015 2014 
             ASSETS NaN  NaN  NaN 
      Cash And Short Term Investments 6.80  25  13 
         Total Receivables, Net 50  49  45 
          Total Inventory 16  17  16 

(Too large to show in full.)

The start of the HTML looks like this:

<table class="mod-ui-table"> 
      <thead> 
       <tr> 
        <th class="mod-ui-table__header--text">Fiscal data as of Jan 30 2016</th> 
        <th>2016</th> 
        <th class="mod-ui-hide-xsmall">2015</th> 
        <th class="mod-ui-hide-xsmall">2014</th> 
        <th class="mod-ui-hide-xsmall">2013</th> 
        <th class="mod-ui-hide-xsmall">2012</th> 
       </tr> 
      </thead> 
      <tr class="mod-ui-table__row--section-header"> 
       <th colspan="6">ASSETS</th> 
      </tr> 
      <tr class="mod-ui-table__row--striped"> 
       <th class="mod-ui-table__header--row-label">Cash And Short Term Investments</th> 
       <td>6.80</td> 
       <td class="mod-ui-hide-xsmall">25</td> 
       <td class="mod-ui-hide-xsmall">13</td> 
       <td class="mod-ui-hide-xsmall">0.91</td> 
       <td class="mod-ui-hide-xsmall">8.29</td> 
      </tr> 
      <tr> 
       <th class="mod-ui-table__header--row-label">Total Receivables, Net</th> 
       <td>50</td> 
       <td class="mod-ui-hide-xsmall">49</td> 
       <td class="mod-ui-hide-xsmall">45</td> 
       <td class="mod-ui-hide-xsmall">42</td> 
       <td class="mod-ui-hide-xsmall">37</td> 
      </tr> 

The end of the HTML looks like this:

<tr class="mod-ui-table__row--highlight"> 
        <th class="mod-ui-table__header--row-label">Total liabilities &amp; shareholders&#39; equity</th> 
        <td>269</td> 
        <td class="mod-ui-hide-xsmall">255</td> 
        <td class="mod-ui-hide-xsmall">227</td> 
        <td class="mod-ui-hide-xsmall">215</td> 
        <td class="mod-ui-hide-xsmall">196</td> 
       </tr> 
       <tr class="mod-ui-table__row--striped"> 
        <th class="mod-ui-table__header--row-label">Total common shares outstanding</th> 
        <td>117</td> 
        <td class="mod-ui-hide-xsmall">117</td> 
        <td class="mod-ui-hide-xsmall">117</td> 
        <td class="mod-ui-hide-xsmall">117</td> 
        <td class="mod-ui-hide-xsmall">117</td> 
       </tr> 
       <tr> 
        <th class="mod-ui-table__header--row-label">Treasury shares - common primary issue</th> 
        <td>0</td> 
        <td class="mod-ui-hide-xsmall">0</td> 
        <td class="mod-ui-hide-xsmall">0</td> 
        <td class="mod-ui-hide-xsmall">0</td> 
        <td class="mod-ui-hide-xsmall">--</td> 
       </tr> 
      </table> 

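If it isn't obvious what might be going wrong, I'd appreciate some pointers on how to start stepping through the read_html() code to find the root of the problem. I'm new to Python and pdb.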


It turns out that if you are not logged in to the FT website, you only get three years of data. – langbourne

Answer


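It turns out that if you are not logged in to the Financial Times website, you only get three years of data. The HTML quoted in the question came from a logged-in browser session, whereas read_html() fetched the page anonymously and received a table with fewer columns.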
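One way to confirm this kind of mismatch without stepping through read_html() is to count the cells in each row of the HTML your script actually receives and compare that with what the browser shows. The following is only a minimal sketch using the standard library; the names `RowCellCounter` and `cells_per_row` are made up for illustration, and the sample header is a trimmed copy of the one in the question. It counts cells literally and does not expand `colspan`.

```python
from html.parser import HTMLParser


class RowCellCounter(HTMLParser):
    """Count the <th>/<td> cells in each <tr> of the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one cell count per <tr>, in document order
        self._in_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._in_row = True
            self.rows.append(0)
        elif tag in ("th", "td") and self._in_row:
            self.rows[-1] += 1

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False


def cells_per_row(html):
    """Return a list with the number of cells found in each table row."""
    counter = RowCellCounter()
    counter.feed(html)
    counter.close()
    return counter.rows


# Header row as seen in the logged-in browser (trimmed from the question).
logged_in_header = """
<table><thead><tr>
<th>Fiscal data as of Jan 30 2016</th><th>2016</th><th>2015</th>
<th>2014</th><th>2013</th><th>2012</th>
</tr></thead></table>
"""
print(cells_per_row(logged_in_header))  # [6]
```

If the same function reports four cells in the header row of the text your script downloaded, the columns were never sent, and the parser is not at fault.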

So I'm now looking into how to log in to the FT website from a script (perhaps using Twill).
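As a rough alternative to Twill, the standard library can carry login cookies between requests. The sketch below is not the FT's actual login flow: `LOGIN_URL` and the field names in `LOGIN_FIELDS` are placeholders that would have to be read from the real login form, and a site that relies on CSRF tokens or JavaScript would need more than this.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical values: the real login URL and form field names are not
# given in this thread and must be taken from the site's login form.
LOGIN_URL = "https://example.com/login"
LOGIN_FIELDS = {"username": "me@example.com", "password": "secret"}


def make_session():
    """Return (opener, jar); the opener stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login(opener, url=LOGIN_URL, fields=LOGIN_FIELDS):
    """POST the form fields; any session cookie ends up in the jar."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(url, data=data) as response:
        return response.status


def fetch_html(opener, url):
    """GET a page through the same opener so the cookies are sent."""
    with opener.open(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

After a successful `login(opener)`, the string returned by `fetch_html(opener, url)` can be passed to `pd.read_html()` in place of the URL, so the table is parsed from the logged-in response.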

There is a related question here.