Does Scrapy support JavaScript for web crawling?

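I've heard that Scrapy doesn't support JavaScript. I'd like to know whether I can use Scrapy to crawl the links on our intranet site. The site uses JavaScript, which I believe generates the links when you click on them, but I'm not 100% sure.

However, View Source shows XML with a stylesheet, and when I inspect the page with Firebug it shows the same data as HTML. I also can't scrape the site using the HTML tags; I have to use the XML tags. I'm confused about why the page has both HTML and XML carrying the same data, and why I can't just scrape the HTML instead of the XML.

I know I can scrape the first page using the XML tags, but can I then follow a link and keep crawling?

Is it still possible to use Scrapy's CrawlSpider to crawl every link, or not? If not, can you suggest another tool I could use, one that supports JavaScript and POST-authenticated login (HTTPS)?

Thanks!

Here is the HTML data I see when I use Firebug (the same data as the XML):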

<tr> 
<td class="crt">1</td> 
<td class="listCell" align="center"> 
<td class="listCell" align="center"> 
<a href="/dis/packages.jsp?view=list&show=perdevice&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&subscrbid=6505550000&mdn=6505550000&maxlength=100">probe0</a> 
</td> 
<td class="listCell" align="center"> 
<a href="/dis/packages.jsp?view=list&show=perdevice&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&subscrbid=6505550000&mdn=6505550000&maxlength=100">6505550000</a> 
</td> 
<td class="listCell" align="center"> 
<a href="/dis/packages.jsp?view=timeline&show=perdevice&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&subscrbid=6505550000&mdn=6505550000&maxlength=100&date=20130716T141624949">2013-07-16 14:16:24.949</a> 
</td> 
<td class="cell" align="center">2013-07-16 14:16:24.949</td> 
<td class="cell" align="left">1 - SMS_PullRequest_CS</td> 
<td class="listCell" align="right"> 
<a href="/dis/profile_download?profileId=4294967295">4294967295</a> 
</td> 
<td class="listCell" align="center"> 
<a href="/dis/sessions.jsp?view=list&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&mdn=6505550000&subscrbid=6505550000&maxlength=100">view sessions</a> 
</td> 
<td class="listCell" align="center"> 
<a href="/dis/errors_agg.jsp?view=list&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&mdn=6505550000&subscrbid=6505550000&maxlength=100">view errors</a> 
</td> 
</tr>

Here is the data I see when I use View Source, which is XML with a stylesheet (the same data as the HTML):

<row> 
<cell type="href" href="/dis/packages.jsp?view=list&show=perdevice&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&mdn=6505550000&subscrbid=6505550000&maxlength=100">6505550000</cell> 
<cell type="href" href="/dis/packages.jsp?view=list&show=perdevice&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&subscrbid=6505550000&mdn=6505550000&maxlength=100">probe0</cell> 
<cell type="href" href="/dis/packages.jsp?view=list&show=perdevice&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&subscrbid=6505550000&mdn=6505550000&maxlength=100">6505550000</cell> 
<cell type="href" href="/dis/packages.jsp?view=timeline&show=perdevice&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&subscrbid=6505550000&mdn=6505550000&maxlength=100&date=20130716T143636194">2013-07-16 14:36:36.194</cell> 
<cell type="plain">2013-07-16 14:36:36.194</cell> 
<cell type="plain">1 - SMS_PullRequest_CS</cell> 
<cell type="href" href="/dis/profile_download?profileId=4294967295">4294967295</cell> 
<cell type="href" href="/dis/sessions.jsp?view=list&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&mdn=6505550000&subscrbid=6505550000&maxlength=100">view sessions</cell> 
<cell type="href" href="/dis/errors_agg.jsp?view=list&device_gid=3651746C4173775343535452414567746D75643855673D3D53564A6151624D41716D534C68395A6337634E2F62413D3D&hwdid=probe0&mdn=6505550000&subscrbid=6505550000&maxlength=100">view errors</cell> 
</row>


2013-07-16 Gio


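Same here, I'm wrestling with JS too... high five.

A quick way I know to check whether the response Scrapy gets from a particular site depends on JS is to inspect it from the Scrapy shell: http://doc.scrapy.org/en/latest/topics/shell.html. Calling view(response) there opens the page as Scrapy received it for your requested URL, so you can see what Scrapy actually sees.

For example, view(response) doesn't show the product reviews from Best Buy, but it works fine for eBay product reviews.

A related search, http://snipplr.com/all/tags/scrapy/, may also help.

It would also help if you posted your spider here.

Good luck! Message me if you solve it!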
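Applied to the question: View Source already shows the cell elements with their href attributes, which suggests the links are in the raw response and do not need JavaScript to be read. Below is a rough sketch of pulling them out with only the Python standard library. The rows wrapper element, the intranet.example host, and the shortened query strings are assumptions made for illustration; the real document may use a different root element or XML namespaces. Inside a Scrapy spider the same selection could be written as response.xpath('//cell[@type="href"]/@href'), with each result passed to response.follow to keep crawling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Shortened, hypothetical sample shaped like the <row> in the question.
# In the raw XML the ampersands in URLs are expected to be escaped as &amp;.
SAMPLE_XML = """<rows>
<row>
<cell type="href" href="/dis/packages.jsp?view=list&amp;hwdid=probe0">probe0</cell>
<cell type="plain">1 - SMS_PullRequest_CS</cell>
<cell type="href" href="/dis/profile_download?profileId=4294967295">4294967295</cell>
</row>
</rows>"""


def extract_links(xml_text, base_url):
    """Return (label, absolute_url) pairs for every <cell type="href">."""
    root = ET.fromstring(xml_text)
    links = []
    for cell in root.iter("cell"):
        href = cell.get("href")
        # Plain cells carry no link, so only keep the href-typed ones.
        if cell.get("type") == "href" and href:
            links.append((cell.text or "", urljoin(base_url, href)))
    return links


if __name__ == "__main__":
    # The base URL is a placeholder for the intranet host.
    for label, url in extract_links(SAMPLE_XML, "https://intranet.example/dis/"):
        print(label, url)
```

If this returns the links you expect for a saved copy of the page, the crawl itself probably does not need a JavaScript engine, only selectors that target the XML tags rather than the rendered HTML.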


2013-07-23 17:53:15 pforyogurt

For JS you usually use a headless browser that will execute the JavaScript for you. Splash (with the scrapy-splash middleware) and Selenium are two popular options.
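For the Splash route, the wiring happens mostly in the project's settings.py. The fragment below is a sketch of the settings as I recall them from the scrapy-splash README; the Splash address is an assumption (a Splash instance listening locally on port 8050), and the middleware names and priorities should be verified against the version you install.

```python
# settings.py fragment for scrapy-splash (sketch; verify against the README
# of the scrapy-splash version you install).

# Assumed address of a running Splash instance.
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter that takes Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With that in place, a spider would yield scrapy_splash.SplashRequest(url, callback, args={"wait": 0.5}) instead of a plain scrapy.Request, so the callback receives the page after Splash has rendered it.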


2017-08-20 01:52:30 prisoneroffreedom
