2016-12-29 62 views

How to get JavaScript-loaded dynamic content from a website

I'm trying to get dynamic content from a website.

I tried to get the content with Scrapy, but the content is loaded by a JS file, so the text isn't in the response.

Then I installed Selenium for this, but now I get a "no such session" error.

For example, this is the page I'm trying to get content from:

http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-p-SPORKONKSBFOX0081?magaza=Finspor

This is what I tried for this site:

def parse(self, response):
    item = ProductItem()
    name = response.css('h1.product-name::text').extract_first()
    price = response.css('span[id=offering-price] > span::text').extract_first()
    xpath = response.xpath('/html/head/script[17]')
    data = xpath.re(r" = (\{.+\})")
    print(data)

This is the content I want to get:

var utagData = {"merchant_names":["Finspor"],"new_site":"new","order_store":"Finspor","order_currency":"TRY","page_domain":"www.hepsiburada.com","page_language":"tr-TR","page_site_name":"Hepsiburada","page_site_region":"tr","site_type":"desktop","page_type":"pdp","page_name":"Product Detail","category_path":"/product/spor-outdoor/spor-fitness/fitness-kondisyon/kosu-bantlari/sporkonksbfox008/","page_title":"Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Fiyatı","page_url":"http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-p-SPORKONKSBFOX0081?magaza=Finspor","page_referring_url":"http://www.hepsiburada.com/gunun-firsati-teklifi?element=1","page_query_string":["magaza=Finspor"],"is_canonical":"1","canonical_url":"http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-pm-sporkonksbfox008","product_prices":["999.00"],"product_unit_prices":["999.00"],"product_brands":["Fox Fitness"],"product_brand":"Fox Fitness","product_skus":["SPORKONKSBFOX0081"],"product_ids":["sporkonksbfox008"],"product_top_5":["sporkonksbfox008"],"product_names":["Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Bandı (Hediye Seçenekleriyle)"],"product_category_ids":["19249"],"product_categories":["kosu-bantlari"],"shipping_type":["super-hizli"],"product_quantities":["1"],"product_barcodes":["8691128100776"],"product_barcode":"8691128100776","product_name_array":"Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Bandı (Hediye Seçenekleriyle)","merchant_ids":["95df0e3483104fc1a16cca6e38bc45cc"],"order_subtotal":["999.00"],"category_id_hierarchy":"60001546 > 2147483635 > 353045 > 19249","category_name_hierarchy":"Spor Outdoor > Spor/Fitness > Fitness - Kondisyon > Koşu Bantları","product_status":"InStock"}; 
    var utagObject = utagData; 
    var utag_data = {"merchant_names":["Finspor"],"new_site":"new","order_store":"Finspor","order_currency":"TRY","page_domain":"www.hepsiburada.com","page_language":"tr-TR","page_site_name":"Hepsiburada","page_site_region":"tr","site_type":"desktop","page_type":"pdp","page_name":"Product Detail","category_path":"/product/spor-outdoor/spor-fitness/fitness-kondisyon/kosu-bantlari/sporkonksbfox008/","page_title":"Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Fiyatı","page_url":"http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-p-SPORKONKSBFOX0081?magaza=Finspor","page_referring_url":"http://www.hepsiburada.com/gunun-firsati-teklifi?element=1","page_query_string":["magaza=Finspor"],"is_canonical":"1","canonical_url":"http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-pm-sporkonksbfox008","product_prices":["999.00"],"product_unit_prices":["999.00"],"product_brands":["Fox Fitness"],"product_brand":"Fox Fitness","product_skus":["SPORKONKSBFOX0081"],"product_ids":["sporkonksbfox008"],"product_top_5":["sporkonksbfox008"],"product_names":["Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Bandı (Hediye Seçenekleriyle)"],"product_category_ids":["19249"],"product_categories":["kosu-bantlari"],"shipping_type":["super-hizli"],"product_quantities":["1"],"product_barcodes":["8691128100776"],"product_barcode":"8691128100776","product_name_array":"Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Bandı (Hediye Seçenekleriyle)","merchant_ids":["95df0e3483104fc1a16cca6e38bc45cc"],"order_subtotal":["999.00"],"category_id_hierarchy":"60001546 > 2147483635 > 353045 > 19249","category_name_hierarchy":"Spor Outdoor > Spor/Fitness > Fitness - Kondisyon > Koşu Bantları","product_status":"InStock"}; 

You haven't shown your Selenium code (which is where you should be getting the response from). – eLRuLL

Answers


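There is no need to execute any JavaScript here. If you right-click the page and choose "View page source" (or similar), you can see that the data is right there in JSON format: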

# assuming we're crawling: 
# 'http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-p-SPORKONKSBFOX0081?magaza=Finspor' 

import json 

def parse(self, response): 
    # get the JavaScript text inside the <script> node that defines utagData
    node = response.xpath("//script[contains(text(),'var utagData = ')]/text()") 
    # extract the JSON object literal from the script text with a regex
    data = node.re(r'= (\{.+\})')[0] 
    # convert the JSON string to a Python dictionary
    data = json.loads(data) 
    print(data) 
    print(data['merchant_names']) 
    # gives ['Finspor'] 
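The same idea can be tried outside Scrapy with only the standard library. This is a minimal sketch, assuming the page source has already been downloaded into a string (`SAMPLE_HTML` below is a shortened, hypothetical stand-in for the real page) and that the object literal is valid JSON, as it is in the question. Instead of relying on a greedy regex to find the closing brace, it uses `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and stops there, so the trailing `;` and any later code are ignored. The helper name `extract_js_object` is hypothetical.

```python
import json
import re

# Hypothetical, shortened stand-in for the downloaded page source.
SAMPLE_HTML = """
<html><head>
<script>
    var utagData = {"merchant_names":["Finspor"],"product_prices":["999.00"],"product_brand":"Fox Fitness"};
    var utagObject = utagData;
</script>
</head><body></body></html>
"""


def extract_js_object(html, var_name):
    """Return the JSON object assigned to `var <var_name> = {...};` as a dict.

    Returns None if no such assignment is found.
    """
    # Locate "var <name> = " and remember where the object literal starts.
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses exactly one JSON value starting at match.end()
    # and returns (value, end_index); text after the value is ignored.
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


data = extract_js_object(SAMPLE_HTML, "utagData")
print(data["merchant_names"])  # ['Finspor']
print(data["product_prices"])  # ['999.00']
```

Inside a Scrapy callback the same helper could be called on the page text, for example `extract_js_object(response.text, "utagData")`.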

In the past I've used this library to scrape websites and get the content I needed: https://github.com/lapwinglabs/x-ray

It has a nice API for finding exactly the data you need:

const Xray = require('x-ray')
const xray = Xray()

// get the page title
xray('http://google.com', 'title')(function(err, title) { 
    console.log(title); 
}) 

Or select an element with a CSS selector:

xray('http://reddit.com', '.content')(function(err, innerHTML) { 
    console.log(innerHTML); 
}) 

To get a specific attribute value:

xray('http://techcrunch.com', 'img.logo@src')(function(err, value) { 
    console.log(value); 
}) 
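x-ray's `selector@attribute` form selects an element and returns one of its attributes rather than its text. For readers staying in Python, a rough standard-library analogue (not part of x-ray) is sketched below. It assumes a hypothetical `<img class="logo">` element and supports only a single tag name plus one class, not full CSS selectors; the class name `AttributeFinder` is hypothetical.

```python
from html.parser import HTMLParser


class AttributeFinder(HTMLParser):
    """Collect `attr` from every `tag` whose class list contains `css_class`.

    A rough, hypothetical analogue of a selector like "tag.class@attr";
    only one tag name and one class are supported, not full CSS.
    """

    def __init__(self, tag, css_class, attr):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == self.tag and self.css_class in classes and self.attr in attributes:
            self.values.append(attributes[self.attr])


# Hypothetical page snippet: only the first <img> has the "logo" class.
html = '<div><img class="logo big" src="/logo.png"><img src="/other.png"></div>'
finder = AttributeFinder("img", "logo", "src")
finder.feed(html)
print(finder.values)  # ['/logo.png']
```

Like x-ray without a browser driver, this only sees the HTML the server returns; it does not run any JavaScript.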

So take a look at this library; it may help you get the result you need.