2015-04-23

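How do I get the HTML code using luacurl/libcurl/curl and Lua?

What am I missing in my code to get the HTML source of a website (credit to @Michal Kottman)? I want the same thing you get when you right-click and choose "View page source" in Chrome.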

local curl = require "luacurl" 
local c = curl.new() 

function GET(url) 
    c:setopt(curl.OPT_URL, url) 
    c:setopt(curl.OPT_PROXY, "http://myproxy.bla.com:8080") 
    c:setopt(curl.OPT_HTTPHEADER, "Connection: Keep-Alive", "Accept-Language: en-us") 
    c:setopt(curl.OPT_CONNECTTIMEOUT, 30) 
    local t = {} -- this will collect resulting chunks 
    c:setopt(curl.OPT_WRITEFUNCTION, function (param, buf) 
     table.insert(t, buf) -- store a chunk of data received 
     return #buf 
    end) 
    c:setopt(curl.OPT_PROGRESSFUNCTION, function(param, dltotal, dlnow) 
     print('%', url, dltotal, dlnow) -- do your fancy reporting here 
    end) 
    c:setopt(curl.OPT_NOPROGRESS, false) -- use this to activate progress 
    assert(c:perform()) 
    return table.concat(t) -- return the whole data as a string 
end 

--local s = GET 'http://www.lua.org/' 
local s = GET 'https://www.youtube.com/watch?v=dT_fkwX4fRM' 
print(s) 
file = io.open("text.html", "wb") 
file:write(s) 
file:close() 

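Unfortunately it has to be Lua, using the luacurl binding to libcurl, because luasocket does not work with the proxy I have to use (at least not for me). The file I download is empty. From CMD I get the page source without any problem:

curl http://mypage.com

It works for lua.org, but not for the YouTube link. What am I missing?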

Answers

local curl = require "luacurl" 
local c = curl.new() 

function GET(url) 
    c:setopt(curl.OPT_URL, url) 
    c:setopt(curl.OPT_PROXY, "http://myproxy.com:8080") 
    c:setopt(curl.OPT_HTTPHEADER, "Connection: Keep-Alive", "Accept-Language: en-us") 
    c:setopt(curl.OPT_CONNECTTIMEOUT, 30) 
    c:setopt(curl.OPT_FOLLOWLOCATION, true) -- follow HTTP redirects; without this the request fails
    c:setopt(curl.OPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36")
    c:setopt(curl.OPT_SSL_VERIFYPEER, false) -- skips certificate verification; without it nothing comes back over HTTPS (insecure)
    c:setopt(curl.OPT_ENCODING, "") -- sets Accept-Encoding (compression), not the charset; "" requests every encoding libcurl supports
    local t = {} -- this will collect resulting chunks 
    c:setopt(curl.OPT_WRITEFUNCTION, function (param, buf) 
     table.insert(t, buf) -- store a chunk of data received 
     return #buf 
    end) 
    c:setopt(curl.OPT_PROGRESSFUNCTION, function(param, dltotal, dlnow) 
     print('%', url, dltotal, dlnow) -- do your fancy reporting here 
    end) 
    c:setopt(curl.OPT_NOPROGRESS, false) -- use this to activate progress 
    assert(c:perform()) 
    return table.concat(t) -- return the whole data as a string 
end 
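
The question ended up with an empty text.html and no error message. As a rough sketch, the returned string could be checked before it is written to disk, so that an empty download is reported instead of silently saved. The helper name save_page is hypothetical and not part of luacurl; it only uses the standard io library.

```lua
-- Hypothetical helper (not part of luacurl): write a fetched page to disk,
-- refusing to create an empty file so a silent failure is easy to spot.
local function save_page(path, html)
    if type(html) ~= "string" or #html == 0 then
        return nil, "empty response body"
    end
    local file, err = io.open(path, "wb")
    if not file then
        return nil, err -- e.g. the directory is missing or not writable
    end
    file:write(html)
    file:close()
    return #html -- number of bytes written
end

-- Intended use with the GET function above (needs luacurl and network access):
-- local ok, err = save_page("text.html", GET "https://www.youtube.com/watch?v=dT_fkwX4fRM")
-- if not ok then print("download failed: " .. err) end
```

If save_page reports an empty body, the request itself is the place to look (redirects, HTTPS verification, proxy), not the file handling.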