2013-12-14 62 views

How do I get JavaScript code from a textarea using lxml?

I am trying to get the JavaScript code out of a textarea. Here is my code:

def getCode(self,request): 
    #print "Extracting URL: " + request 
    opener = self.login(self.username,self.password) 
    html = etree.HTML(opener.open(request).read()) 

    textarea = html.xpath('//*[@id="codeText"]/text()')   
    for code in textarea: 
     return code 

This is the HTML I am trying to extract it from:

<textarea onclick="javascript: this.select();" id="codeText" style="height: 300px;width:500px;">   <!-- Clickon Affiliate code start here --> 
    <object type="application/x-shockwave-flash" data="http://banners.clickon.co.il/LOVELY2_banners/swf/JWFLZxzNxjclWGP.swf?url=http://track.clickon.co.il/click/Q8uTE8BXZz1pskj/JWFLZxzNxjclWGP/TsQ8uTE8BXZz1pskjtS" width="728" height="90"> 
    <param name="movie" value="http://banners.clickon.co.il/LOVELY2_banners/swf/JWFLZxzNxjclWGP.swf?url=http://track.clickon.co.il/click/Q8uTE8BXZz1pskj/JWFLZxzNxjclWGP/TsQ8uTE8BXZz1pskjtS" /> 
    <param name="scale" value="exactfit" /> 
    <param name="wmode" value="transparent" /> 
    </object> 
    <img alt="" style="visibility: hidden;" src="http://track.clickon.co.il/imp/Q8uTE8BXZz1pskj/JWFLZxzNxjclWGP/TsQ8uTE8BXZz1pskjtS" /> 

</textarea> 

If the textarea contains only a link or plain text, my getCode function works fine, but if it contains JavaScript code, I cannot extract it. Can you help me?

Thanks,

Yaniv。


Do you want to extract 'javascript: this.select();'? – falsetru


No, I want to extract everything inside the textarea tag. – TheGoodGuy


Answer


In your code, the for loop returns too early: it returns only the first text node.
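To make the early return concrete, here is a minimal sketch. It assumes a hypothetical `<div>` as stand-in markup and a hypothetical helper `first_only` that mirrors the loop in the question. A `<div>` is used because how markup inside a `<textarea>` gets parsed may vary with the underlying parser version.

```python
from lxml import etree

# Hypothetical stand-in markup: text on both sides of a child element.
doc = etree.HTML('<div id="codeText">before <b>bold</b> after</div>')

# text() selects only the text nodes that are direct children of the element,
# so the <b> element itself is not part of the result.
nodes = doc.xpath('//*[@id="codeText"]/text()')


def first_only(items):
    # Mirrors the loop in the question: return fires on the first iteration.
    for item in items:
        return item


print(nodes)              # ['before ', ' after']
print(first_only(nodes))  # only the first text node is returned
```

Even joining all of `nodes` would still drop the `<b>` element, which is why the child elements need to be serialized as well.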

If you want all of the tags and text, try the following.

import lxml.etree as etree 

htmlchunk = ''' 
<textarea onclick="javascript: this.select();" id="codeText" style="height: 300px;width:500px;">   <!-- Clickon Affiliate code start here --> 
     <object type="application/x-shockwave-flash" data="http://banners.clickon.co.il/LOVELY2_banners/swf/JWFLZxzNxjclWGP.swf?url=http://track.clickon.co.il/click/Q8uTE8BXZz1pskj/JWFLZxzNxjclWGP/TsQ8uTE8BXZz1pskjtS" width="728" height="90"> 
      <param name="movie" value="http://banners.clickon.co.il/LOVELY2_banners/swf/JWFLZxzNxjclWGP.swf?url=http://track.clickon.co.il/click/Q8uTE8BXZz1pskj/JWFLZxzNxjclWGP/TsQ8uTE8BXZz1pskjtS" /> 
      <param name="scale" value="exactfit" /> 
      <param name="wmode" value="transparent" /> 
     </object> 
     <img alt="" style="visibility: hidden;" src="http://track.clickon.co.il/imp/Q8uTE8BXZz1pskj/JWFLZxzNxjclWGP/TsQ8uTE8BXZz1pskjtS" /> 

    </textarea> 
''' 

html = etree.HTML(htmlchunk) 
textarea, = html.xpath('//*[@id="codeText"]') 
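# textarea.text is the text before the first child (it may be None, hence "or ''").
# tostring() serializes each child together with its trailing text;
# encoding='unicode' makes it return str rather than bytes, which join() needs on Python 3.
# textarea.tail is left out: it is the text after </textarea>, not part of its content.
print((textarea.text or '') + ''.join(etree.tostring(code, encoding='unicode') for code in textarea))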

Output:

  <!-- Clickon Affiliate code start here --> 
     <object type="application/x-shockwave-flash" data="http://banners.clickon.co.il/LOVELY2_banners/swf/JWFLZxzNxjclWGP.swf?url=http://track.clickon.co.il/click/Q8uTE8BXZz1pskj/JWFLZxzNxjclWGP/TsQ8uTE8BXZz1pskjtS" width="728" height="90"> 
      <param name="movie" value="http://banners.clickon.co.il/LOVELY2_banners/swf/JWFLZxzNxjclWGP.swf?url=http://track.clickon.co.il/click/Q8uTE8BXZz1pskj/JWFLZxzNxjclWGP/TsQ8uTE8BXZz1pskjtS"/> 
      <param name="scale" value="exactfit"/> 
      <param name="wmode" value="transparent"/> 
     </object> 
     <img alt="" style="visibility: hidden;" src="http://track.clickon.co.il/imp/Q8uTE8BXZz1pskj/JWFLZxzNxjclWGP/TsQ8uTE8BXZz1pskjtS"/> 