2016-11-14 40 views

Python: scraping data from a table that keeps growing, using lxml

I want to get the quarterly "Profit Attb. to SH ('000)" figures from http://klse.i3investor.com/servlets/stk/fin/8982.jsp

<b>Quarter Result:</b><br/>
<table cellpadding="0" cellspacing="0" border="0" class="nc" width="100%">
  <tr>
    <th class="left">F.Y.</th>
    <th class="left">Quarter</th>
    <th class="right">Revenue ('000)</th>
    <th class="right">Profit before Tax ('000)</th>
    <th class="right">Profit ('000)</th>
    <th class="right">Profit Attb. to SH ('000)</th>
    <th class="right">EPS (Cent)</th>
    <th class="right">DPS (Cent)</th>
    <th class="right">NAPS</th>
    <th class="center" width="33"></th>
  </tr>
  <tr>
    <td class="left" valign="top" nowrap="nowrap"> 2016-12-31 </td>
    <td class="left" valign="top" nowrap="nowrap"> 2016-09-30 </td>
    <td class="right" valign="top" nowrap="nowrap"> 79,082 </td>
    <td class="right" valign="top" nowrap="nowrap"> 14,376 </td>
    <td class="right" valign="top" nowrap="nowrap"> 10,692 </td>
    <td class="right" valign="top" nowrap="nowrap"> 10,398 </td>
    <td class="right" valign="top" nowrap="nowrap"> 3.37 </td>
    <td class="right" valign="top" nowrap="nowrap"> 0.00 </td>
    <td class="right" valign="top" nowrap="nowrap"> 1.5100 </td>
    <td class="center" valign="top" nowrap="nowrap">
      <a href="" onclick="viewFinancialSource('62459');return false;" title="View Source">
        <img class="sp view16" src="http://cdn1.i3investor.com/cm/icon/trans16.gif" width="16px;" alt="View Source"/>
      </a>
      <span class="hide" id="financialSourceTitle62459"> Quarter: 2016-09-30 </span>
      <span class="hide" id="financialSourceDetail62459">
        <p>
          <a target="_blank" href ="/servlets/staticfile/290836.jsp">
            <img src="http://cdn1.i3investor.com/cm/icon/file-download-small.png" width="16px" height="16px" alt="3rd Q2016_CGB.PDF"/>
            3rd Q2016_CGB.PDF
          </a>
        </p>
      </span>
    </td>
  </tr>
  <tr>
    <td class="left" valign="top" nowrap="nowrap"> 2016-12-31 </td>
    <td class="left" valign="top" nowrap="nowrap"> 2016-06-30 </td>
    <td class="right" valign="top" nowrap="nowrap"> 51,277 </td>
    <td class="right" valign="top" nowrap="nowrap"> 7,050 </td>
    <td class="right" valign="top" nowrap="nowrap"> 5,364 </td>
    <td class="right" valign="top" nowrap="nowrap"> 5,068 </td>
    <td class="right" valign="top" nowrap="nowrap"> 1.64 </td>
    <td class="right" valign="top" nowrap="nowrap"> 0.00 </td>
    <td class="right" valign="top" nowrap="nowrap"> 1.4800 </td>
    <td class="center" valign="top" nowrap="nowrap">
      <a href="" onclick="viewFinancialSource('56288');return false;" title="View Source">
        <img class="sp view16" src="http://cdn1.i3investor.com/cm/icon/trans16.gif" width="16px;" alt="View Source"/>
      </a>

For example, the page has an annual results table and a quarterly results table; I only want the "Profit Attb. to SH ('000)" figures for the last two quarters, which are 10,398 and 5,068.

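However, the table grows every quarter. I want a robust way to retrieve the data using lxml, XPath, or cssselect, so that my code still works when next quarter's data arrives.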

from lxml import html 
import requests 

page = requests.get('http://klse.i3investor.com/servlets/stk/fin/8982.jsp') 
tree = html.fromstring(page.content) 
output = tree.xpath('//table[contains(@class,"nc")]/text()') 

But it returns only blanks: ['', '', '', '']

What have you tried so far? Please post your code. – James

@James, I've added the code; it fails to return the desired result. – vindex

Answer

Try selecting rows by index with position():

import requests 
from lxml import html 

response = requests.get("http://klse.i3investor.com/servlets/stk/fin/8982.jsp") 
tree = html.fromstring(response.content) 

profit_attb_to_sh = [x.strip() for x in tree.xpath('//table[2]/tr[position()>1 and position()<=3]/td[@class="right"][4]/text()')] 

print('\n'.join(profit_attb_to_sh))

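table[2] means: you get the second table. Strictly, //table[2] matches any table that is the second table among its siblings, not the second table in the whole document; on this page that is the quarterly results table.

tr[position()>1 and position()<=3] means: you get every tr in that index range (rows 2 and 3). The first row is excluded because it contains the headers.

td[@class="right"][4] means: from each tr matching the rule above, you get the fourth td whose class is "right".

Remember that XPath indexes start at 1 (not 0).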

Result:

10,398 
5,068 
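
The three parts of the XPath explained above can be checked offline against a small inline table. This is only a sketch: `SAMPLE` is a hypothetical, trimmed-down stand-in for the page layout (a first table followed by the quarterly table), its third data row is made-up filler, and the column set is shortened. It also shows why the query in the question comes back blank: `table/text()` only looks at text sitting directly under `<table>`, while the figures live inside the `<td>` cells.

```python
from lxml import html

# Hypothetical stand-in for the page: a first table, then the quarterly table.
# Rows 1 and 2 use the figures quoted in the question; the last row is filler.
SAMPLE = """
<div>
<table class="nc"><tr><th>Annual</th></tr></table>
<table class="nc">
<tr><th>F.Y.</th><th>Quarter</th><th>Revenue</th><th>PBT</th><th>Profit</th><th>Profit Attb.</th></tr>
<tr><td class="left"> 2016-12-31 </td><td class="left"> 2016-09-30 </td><td class="right"> 79,082 </td><td class="right"> 14,376 </td><td class="right"> 10,692 </td><td class="right"> 10,398 </td></tr>
<tr><td class="left"> 2016-12-31 </td><td class="left"> 2016-06-30 </td><td class="right"> 51,277 </td><td class="right"> 7,050 </td><td class="right"> 5,364 </td><td class="right"> 5,068 </td></tr>
<tr><td class="left"> filler </td><td class="left"> filler </td><td class="right"> 0 </td><td class="right"> 0 </td><td class="right"> 0 </td><td class="right"> 0 </td></tr>
</table>
</div>
"""

tree = html.fromstring(SAMPLE)

# The question's query: text nodes directly under <table>, which are at most
# whitespace between the rows, never the cell contents.
blank = tree.xpath('//table[contains(@class,"nc")]/text()')

# The answer's query: second table, rows 2-3, fourth "right" cell in each row.
values = [x.strip() for x in tree.xpath(
    '//table[2]/tr[position()>1 and position()<=3]/td[@class="right"][4]/text()')]

print(values)  # ['10,398', '5,068']
```

The filler row is skipped because `position()<=3` stops after the second data row, so rows added below the ones you want do not change the result.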

PS:

You can change:

profit_attb_to_sh = [x.strip() for x in tree.xpath('//table[2]/tr[position()>1 and position()<=3]/td[@class="right"][4]/text()')]

to (on Python 3, map() returns an iterator, so wrap it in list() if you need a list):

profit_attb_to_sh = map(lambda x: x.strip(), tree.xpath('//table[2]/tr[position()>1 and position()<=3]/td[@class="right"][4]/text()'))
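
Since the question asks for something that survives layout changes, one possible variation (not part of the answer above) is to locate the table by its "Quarter Result:" label and the column by its header text instead of by fixed positions. The sketch below assumes the table is the first `<table>` sibling after `<b>Quarter Result:</b>`, as in the HTML quoted in the question. `quarterly_column` and `SAMPLE` are hypothetical names, and the "Annual Result:" label and filler rows in the sample are made up for illustration.

```python
from lxml import html

def quarterly_column(tree, header_prefix, n_rows=2):
    """Return the first n_rows values of the column whose header starts with header_prefix."""
    # Assumption: the quarterly table directly follows the "Quarter Result:" label.
    table = tree.xpath(
        '//b[normalize-space()="Quarter Result:"]/following-sibling::table[1]')[0]
    headers = [th.text_content().strip() for th in table.xpath('./tr[1]/th')]
    # start=1 because XPath positions are 1-based.
    col = next(i for i, h in enumerate(headers, start=1)
               if h.startswith(header_prefix))
    # Skip the header row, then keep the first n_rows data rows.
    cells = table.xpath('./tr[position()>1 and position()<=%d]/td[%d]'
                        % (n_rows + 1, col))
    return [c.text_content().strip() for c in cells]

# Hypothetical, shortened stand-in for the page; filler rows are made up.
SAMPLE = """
<div>
<b>Annual Result:</b><br/>
<table class="nc"><tr><th>F.Y.</th><th>Profit Attb. to SH ('000)</th></tr>
<tr><td> filler </td><td> 0 </td></tr></table>
<b>Quarter Result:</b><br/>
<table class="nc">
<tr><th>Quarter</th><th>Profit ('000)</th><th>Profit Attb. to SH ('000)</th></tr>
<tr><td> 2016-09-30 </td><td> 10,692 </td><td> 10,398 </td></tr>
<tr><td> 2016-06-30 </td><td> 5,364 </td><td> 5,068 </td></tr>
<tr><td> filler </td><td> 0 </td><td> 0 </td></tr>
</table>
</div>
"""

result = quarterly_column(html.fromstring(SAMPLE), "Profit Attb")
print(result)  # ['10,398', '5,068']
```

Against the live page you would pass the tree built from `requests.get(...).content` instead of `SAMPLE`. This still breaks if the site renames the label or the header, but it does not depend on how many tables or columns come before the one you want.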