Web scraping an HTML table with Cheerio

I've run into a problem with a web scraping project. Below is a sample of the page I need to scrape:

<table style="position..."> 
    <thead>..</thead> 
    <tbody id="leaderboard_body"> 
     <tr bgcolor="#155555">..</tr> 
     <tr bgcolor="#155555">..</tr> 
     <tr bgcolor="#155555">..</tr> 
       ... 
    </tbody> 
</table>

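For more details, here is the page: World Leaderboards

I want to access the information inside the tr tags, but I can't reach it. Even a simple piece of code can't find the tbody tag, and I don't know why: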

var cheerio = require("cheerio");
var http = require("http");

var url = "http://www.dota2.com/leaderboards/?l=french#europe";

// Utility function that downloads a URL and invokes
// callback(err, data) with the response body.
function download(url, callback) {
    http.get(url, function (res) {
        var data = "";
        res.on("data", function (chunk) {
            data += chunk;
        });
        res.on("end", function () {
            callback(null, data);
        });
    }).on("error", function (err) {
        // Pass the error along so the caller can log it.
        callback(err);
    });
}

download(url, function (err, data) {
    if (err) {
        console.log(err);
        return;
    }
    var $ = cheerio.load(data);
    var content = $("tbody").text();
    console.log(content);
});


2016-03-02 thor

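That's because the table doesn't exist in the HTML. It is inserted with JavaScript after the page has loaded, so it can't be scraped the traditional way.

Always look at the page source, not just the live view in the console.

Even a minimal amount of research shows that the table is populated by a request to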

http://www.dota2.com/webapi/ILeaderboard/GetDivisionLeaderboard/v0001?division=europe

There you have all the data you need, already formatted as JSON, without having to scrape any HTML.
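
As a rough sketch of that approach, the question's download function could be adapted to request the JSON endpoint directly and parse the body instead of loading it into Cheerio. The helper names parseJsonBody and fetchJson are hypothetical, and the sketch assumes the endpoint still answers over plain HTTP without redirecting, as it did when this answer was written. The shape of the response is not described here, so the code makes no assumptions about its fields.

```javascript
var http = require("http");

// Endpoint quoted in the answer; whether it still responds is an assumption.
var apiUrl = "http://www.dota2.com/webapi/ILeaderboard/GetDivisionLeaderboard/v0001?division=europe";

// Hypothetical helper: parse a response body as JSON and report a parse
// failure through callback(err) instead of throwing.
function parseJsonBody(body, callback) {
    var parsed;
    try {
        parsed = JSON.parse(body);
    } catch (err) {
        return callback(err);
    }
    callback(null, parsed);
}

// Hypothetical helper: download a URL and pass the parsed JSON to
// callback(err, data). Redirects are not followed by http.get.
function fetchJson(url, callback) {
    http.get(url, function (res) {
        var body = "";
        res.setEncoding("utf8");
        res.on("data", function (chunk) {
            body += chunk;
        });
        res.on("end", function () {
            parseJsonBody(body, callback);
        });
    }).on("error", callback);
}
```

Calling fetchJson(apiUrl, function (err, data) { ... }) and logging Object.keys(data) is a reasonable first step for finding out which fields the response actually contains before relying on any of them.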


2016-03-02 02:18:06 adeneo

I feel stupid now... at least, ty, it was easy. I was sure something like this was the cause of my trouble. – thor
