2016-01-22 27 views

I want to scrape the ads on a website: how do I scrape a widget's output using Python/Scrapy?

This site, for example:

http://www.bestyling.com/15-of-the-most-expensive-shoes-ever-and-you-wont-believe-whats-1/?utm_source=Ourbrain&utm_medium=cpc&utm_campaign=15%20Shoes%20-%20Desktop%20USA

I'm trying to get the ads from this element:

/html/body[@class='single single-post postid-171 single-format-standard custom-background hasGoogleVoiceExt']/div[@id='site']/div[@id='site-out']/div[@id='site-fixed']/div[@id='content-out']/div[@id='content-in']/div[@id='main-content-wrap']/div[@id='main-content-contain']/div[@id='content-wrap']/div[@class='sec-marg-out4 relative']/div[@class='sec-marg-in4']/article[@id='post-area'][@class='post-171 post-type-post status-publish format-standard hentry category-uncategorized']/div[@class='post-body-out']/div[@class='post-body-in']/div[@id='content-area']/div[@class='content-area-cont left relative']/div[@class='sec-marg-out relative']/div[@class='sec-marg-in']/div[@class='content-area-out']/div[@class='content-area-in']/div[@class='content-main left relative']/div[@id='article-ad']/div[1]/div[@id='ac_110238']/div[@class='ac_adbox']/div[@class='ac_adbox_inner']

that is, from 'ac_container' or 'ac_adbox'.

When I open the page in a browser I can see the ads, but when I use Scrapy to fetch the HTML, all I get in their place is this script:

<div id="contentad110238"></div>
<script type="text/javascript">
(function(d) {
    var params = {
        id: "d12cd6f3-b896-443b-9140-07e35e66e222",
        d: "YmVzdHlsaW5nLmNvbQ==",
        wid: "110238",
        cb: (new Date()).getTime()
    };

    var qs = [];
    for (var key in params) qs.push(key + '=' + encodeURIComponent(params[key]));
    var s = d.createElement('script'); s.type = 'text/javascript'; s.async = true;
    var p = 'https:' == document.location.protocol ? 'https' : 'http';
    s.src = p + "://api.content.ad/Scripts/widget2.aspx?" + qs.join('&');
    d.getElementById("contentad110238").appendChild(s);
})(document);
</script>
</div>

How can I scrape this? Any help would be appreciated. I'm guessing I have to use a JS renderer with Python or Scrapy. Any suggestions?

Answers


These ads are loaded by JavaScript, so you won't see them when you download the raw HTML (which is what Scrapy does).
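
To make that concrete: the inline script in the question does nothing except build a URL and append a new script tag pointing at it, and a browser then runs whatever comes back. Below is a rough Python equivalent of just the URL-building step, using the parameter values from the question. The helper names `encode_uri_component` and `build_widget_url` are made up for illustration. What the endpoint actually returns is not shown in the question, so this is only a sketch of what the browser requests, not a confirmed way to get the ad markup.

```python
import time
from urllib.parse import quote


def encode_uri_component(value):
    # Approximates JavaScript's encodeURIComponent, which leaves
    # letters, digits and - _ . ! ~ * ' ( ) unescaped.
    return quote(str(value), safe="!*'()")


def build_widget_url(params, scheme="http"):
    # Same steps as the inline script: key=value pairs joined with '&'.
    pairs = [key + "=" + encode_uri_component(value) for key, value in params.items()]
    return scheme + "://api.content.ad/Scripts/widget2.aspx?" + "&".join(pairs)


params = {
    "id": "d12cd6f3-b896-443b-9140-07e35e66e222",
    "d": "YmVzdHlsaW5nLmNvbQ==",
    "wid": "110238",
    "cb": int(time.time() * 1000),  # cache buster, like (new Date()).getTime()
}
print(build_widget_url(params))
```

The response is most likely more JavaScript that writes the ad into the page, which is why a renderer is still the more dependable route.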

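However, you can take a look at Splash, a JavaScript rendering service, together with its Scrapy integration (formerly ScrapyJS, now scrapy-splash), which lets you plug a JavaScript-capable browser into your spider seamlessly. It comes straight from the Scrapy developers.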

Everything is written in Python, except for the browser rendering, which is done with Qt.
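
As a minimal sketch of the idea, here is how you could ask a Splash instance for the rendered page through its HTTP `render.html` endpoint, using only the standard library rather than the Scrapy integration. It assumes Splash is already running locally on port 8050; the `wait` value of 2 seconds is a guess at how long the ad script needs, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is reachable at this address.
SPLASH_URL = "http://localhost:8050"


def splash_render_url(page_url, wait=2.0, splash_url=SPLASH_URL):
    # render.html returns the page's HTML after JavaScript has run;
    # 'wait' is how many seconds Splash waits after the page loads.
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_url}/render.html?{query}"


def fetch_rendered_html(page_url, wait=2.0, splash_url=SPLASH_URL):
    # Performs the request; needs a running Splash instance.
    with urlopen(splash_render_url(page_url, wait, splash_url), timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (not run here because it needs network access and Splash):
#   html = fetch_rendered_html("<the page URL from the question>")
#   print("ac_adbox" in html)
```

If the ad loads in time, the rendered HTML should contain the `ac_adbox` elements from the XPath in the question. Inside a Scrapy project, scrapy-splash talks to the same service, so you can keep your existing spider and selectors.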


So was my assumption correct? I have to render the JavaScript, i.e. that's why the ad isn't showing up? – user3707960