2016-11-08 41 views

How can I get data out of JavaScript content using Scrapy in Python? The JavaScript looks like this:

<script type="text/javascript"> 
    var ad_reply_url = "http://www2.mudah.my/ar/send/0?ca=3_s&id=49825097&l=0"; 
    var mcvl = ""; 
    var images = [ 
    'http://img.rnudah.com/images/13/133608119523265.jpg', 
    'http://img.rnudah.com/images/13/135608116569903.jpg', 
    'http://img.rnudah.com/images/13/137608113616541.jpg', 
    'http://img.rnudah.com/images/13/139608119186498.jpg' 
    ]; 
var thumbnails = [ 
    'http://img.rnudah.com/thumbs/13/133608119523265.jpg', 
    'http://img.rnudah.com/thumbs/13/135608116569903.jpg', 
    'http://img.rnudah.com/thumbs/13/137608113616541.jpg', 
    'http://img.rnudah.com/thumbs/13/139608119186498.jpg' 
];</script> 

What I want is to extract the data from the `images` variable and print it like this:

['http://img.rnudah.com/images/13/133608119523265.jpg','http://img.rnudah.com/images/13/135608116569903.jpg', 'http://img.rnudah.com/images/13/137608113616541.jpg','http://img.rnudah.com/images/13/139608119186498.jpg' ]; 

Can anyone help me? Thanks.

Answers

I didn't use Scrapy here, just plain Python. It's quite simple, though:

Code example:

import ast 
import re 

page_source = ''' 
<script type="text/javascript"> 
    var ad_reply_url = "http://www2.mudah.my/ar/send/0?ca=3_s&id=49825097&l=0"; 
    var mcvl = ""; 
    var images = [ 
    'http://img.rnudah.com/images/13/133608119523265.jpg', 
    'http://img.rnudah.com/images/13/135608116569903.jpg', 
    'http://img.rnudah.com/images/13/137608113616541.jpg', 
    'http://img.rnudah.com/images/13/139608119186498.jpg' 
    ]; 
var thumbnails = [ 
    'http://img.rnudah.com/thumbs/13/133608119523265.jpg', 
    'http://img.rnudah.com/thumbs/13/135608116569903.jpg', 
    'http://img.rnudah.com/thumbs/13/137608113616541.jpg', 
    'http://img.rnudah.com/thumbs/13/139608119186498.jpg' 
];</script> 
''' 

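# Find every `var name = value;` statement.
# Group 1 is the variable name, group 2 is the raw value text.
variables = re.findall(r'(?s)\bvar\s+(\w+)\s*=\s*(.*?);', page_source)

var_collection = {}
for var_key, var_value in variables:
    # The values here are plain string and list literals, which are also
    # valid Python literals, so ast.literal_eval can parse them safely.
    var_collection[var_key] = ast.literal_eval(var_value.strip())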

print(var_collection['images']) 

Output:

['http://img.rnudah.com/images/13/133608119523265.jpg', 'http://img.rnudah.com/images/13/135608116569903.jpg', 'http://img.rnudah.com/images/13/137608113616541.jpg', 'http://img.rnudah.com/images/13/139608119186498.jpg'] 

Related: https://stackoverflow.com/a/18108644/295246
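If only one variable is needed, a narrower pattern avoids evaluating every `var` on the page. Below is a minimal sketch using only the standard library. The helper name `extract_js_array` and the shortened `sample` string are made up for illustration, and the approach assumes the array contains only quoted strings or numbers (JavaScript values such as `true` or `null` would not parse). In a Scrapy callback, `response.text` would presumably take the place of `sample`.

```python
import ast
import re


def extract_js_array(source, name):
    """Return the array literal assigned to `var <name>` as a Python list.

    Hypothetical helper: assumes the array holds only quoted strings or
    numbers, so the JavaScript literal is also a valid Python literal.
    Returns None when no matching array assignment is found.
    """
    pattern = r'\bvar\s+' + re.escape(name) + r'\s*=\s*(\[.*?\])\s*;'
    match = re.search(pattern, source, re.DOTALL)
    if match is None:
        return None
    return ast.literal_eval(match.group(1))


# Shortened stand-in for the page source shown in the question.
sample = """
<script type="text/javascript">
    var mcvl = "";
    var images = [
    'http://img.rnudah.com/images/13/133608119523265.jpg',
    'http://img.rnudah.com/images/13/135608116569903.jpg'
    ];
var thumbnails = [
    'http://img.rnudah.com/thumbs/13/133608119523265.jpg'
];</script>
"""

# Inside a Scrapy callback, response.text would replace `sample` here.
images = extract_js_array(sample, "images")
print(images)
```

Matching the variable by name means unrelated statements on the page cannot break the parse, which is the main weakness of evaluating every `var` at once.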

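OK, thanks for the hint. I just tried adapting your code, and now I've got what I wanted. Thank you! :) – shahril

@shahril Glad to help. Feel free to upvote or accept this answer as your solution, at your discretion. Thanks!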