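I want to scrape ads from a website... How do I scrape the output of a widget on a website using python/scrapy?
This website, for example
I'm trying to get the ads from this: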
/html/body[@class='single single-post postid-171 single-format-standard custom-background hasGoogleVoiceExt']/div[@id='site']/div[@id='site-out']/div[@id='site-fixed']/div[@id='content-out']/div[@id='content-in']/div[@id='main-content-wrap']/div[@id='main-content-contain']/div[@id='content-wrap']/div[@class='sec-marg-out4 relative']/div[@class='sec-marg-in4']/article[@class='post-171 post-type-post status-publish format-standard hentry category-uncategorized']/div[@id='post-area']/div[@class='post-body-out']/div[@class='post-body-in']/div[@id='content-area']/div[@class='content-area-cont left relative']/div[@class='sec-marg-out relative']/div[@class='sec-marg-in']/div[@class='content-area-out']/div[@class='content-area-in']/div[@class='content-main left relative']/div[@id='article-ad']/div[1]/div[@id='ac_110238']/div[@class='ac_adbox']/div[@class='ac_adbox_inner']
or 'ac_container' or 'ac_adbox'
When I open the page in a browser I see the ads, but when I use scrapy to fetch the HTML,
all I get is its script:
<div id="contentad110238"></div>
<script type="text/javascript">
(function(d) {
var params =
{
id: "d12cd6f3-b896-443b-9140-07e35e66e222",
d: "YmVzdHlsaW5nLmNvbQ==",
wid: "110238",
cb: (new Date()).getTime()
};
var qs=[];
for(var key in params) qs.push(key+'='+encodeURIComponent(params[key]));
var s = d.createElement('script');s.type='text/javascript';s.async=true;
var p = 'https:' == document.location.protocol ? 'https' : 'http';
s.src = p + "://api.content.ad/Scripts/widget2.aspx?" + qs.join('&');
d.getElementById("contentad110238").appendChild(s);
})(document);
</script> </div>
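How do I scrape this? Any help would be appreciated... I'm guessing I have to use a JS renderer in python or scrapy.... Suggestions?
Am I correct in my assumption? That is, I will have to render the page, and that is why the ads are not showing? – user3707960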
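One thing that may be worth trying before reaching for a full JS renderer: the inline script above does nothing more than build a URL from `params` and load it as a second script. The sketch below rebuilds that URL in Python so it can be requested directly (for example by yielding a `scrapy.Request` for it from the spider). The helper names `extract_widget_params` and `build_widget_url` are made up for illustration, and I have not verified what the endpoint actually returns. If the response is itself JavaScript that has to be executed, a rendering tool would still be needed.

```python
import re
import time
from urllib.parse import quote, urlencode

# Host and path copied from the inline script in the question.
WIDGET_ENDPOINT = "api.content.ad/Scripts/widget2.aspx"


def extract_widget_params(html):
    """Pull the quoted id, d and wid values out of the inline `params` object."""
    params = {}
    for key in ("id", "d", "wid"):
        # Matches e.g.  wid: "110238"  (the key followed by a colon and a quoted value).
        match = re.search(r'\b%s:\s*"([^"]*)"' % key, html)
        if match:
            params[key] = match.group(1)
    return params


def build_widget_url(params, scheme="https", cache_buster=None):
    """Mirror the script: percent-encode each value and join the pairs with '&'."""
    query = dict(params)
    # The script sends cb = (new Date()).getTime(), i.e. milliseconds since the epoch.
    query["cb"] = int(time.time() * 1000) if cache_buster is None else cache_buster
    # quote() is used instead of the default quote_plus() to stay close to encodeURIComponent.
    return "%s://%s?%s" % (scheme, WIDGET_ENDPOINT, urlencode(query, quote_via=quote))


if __name__ == "__main__":
    sample = '''
    id: "d12cd6f3-b896-443b-9140-07e35e66e222",
    d: "YmVzdHlsaW5nLmNvbQ==",
    wid: "110238",
    '''
    print(build_widget_url(extract_widget_params(sample)))
```

In a spider, the HTML passed to `extract_widget_params` would be `response.text` from the article page, and the resulting URL would go into a follow-up request whose callback inspects the widget response.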