2014-02-17

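C++ cURL: how do I save a complete web page to a file?

I am trying to save an entire web page as a .txt file using C++ (Visual Studio 2013) and cURL. Everything works, but the site I want to save uses a lot of JavaScript to generate the page, so the .txt file saved with cURL has only about 170 lines. When I save the same page as an .htm file in Google Chrome (Ctrl+S), the .htm file has more than 2000 lines. Is there a way to save the fully loaded web page to a file? This is the code I am using: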

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <curl/curl.h>

struct MemoryStruct {
    char *memory; 
    size_t size; 
}; 

static size_t 
WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp) 
{ 
    size_t realsize = size * nmemb; 
    struct MemoryStruct *mem = (struct MemoryStruct *)userp; 

    /* use a temporary pointer so the old buffer is not leaked if realloc fails */
    char *ptr = (char*)realloc(mem->memory, mem->size + realsize + 1);
    if (ptr == NULL) {
     /* out of memory! */
     fprintf(stderr, "not enough memory (realloc returned NULL)\n");
     return 0;
    }
    mem->memory = ptr;

    memcpy(&(mem->memory[mem->size]), contents, realsize); 
    mem->size += realsize; 
    mem->memory[mem->size] = 0; 

    return realsize; 
} 


int main(void) 
{ 
    CURL *curl_handle; 
    CURLcode res; 

    struct MemoryStruct chunk; 

    chunk.memory = (char*)malloc(1); /* will be grown as needed by the realloc above */ 
    chunk.size = 0; /* no data at this point */ 

    curl_global_init(CURL_GLOBAL_ALL); 

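    /* init the curl session */
    curl_handle = curl_easy_init();
    if (curl_handle == NULL) {
     fprintf(stderr, "curl_easy_init() failed\n");
     free(chunk.memory);
     curl_global_cleanup();
     return 1;
    }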

    /* specify URL to get */ 
    curl_easy_setopt(curl_handle, CURLOPT_URL, "http://www.example.com/"); 

    /* send all data to this function */ 
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback); 

    /* we pass our 'chunk' struct to the callback function */ 
    curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&chunk); 

    /* some servers don't like requests that are made without a user-agent 
    field, so we provide one */ 
    curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, "libcurl-agent/1.0"); 

    /* get it! */ 
    res = curl_easy_perform(curl_handle); 

    /* check for errors */
    if (res != CURLE_OK) {
     fprintf(stderr, "curl_easy_perform() failed: %s\n",
      curl_easy_strerror(res));
    }
    else {
     /*
      * Now, our chunk.memory points to a memory block that is chunk.size
      * bytes big and contains the remote file.
      */
     printf("%lu bytes retrieved\n", (unsigned long)chunk.size);

     /* write exactly chunk.size bytes, and only when the transfer succeeded */
     std::ofstream oplik("test.txt", std::ios::binary);
     oplik.write(chunk.memory, (std::streamsize)chunk.size);
    }

    /* cleanup curl stuff */ 
    curl_easy_cleanup(curl_handle); 

    if (chunk.memory) 
     free(chunk.memory); 

    /* we're done with libcurl, so clean it up */ 
    curl_global_cleanup(); 

    return 0; 
} 

Thanks for your help, and sorry for my poor English.

Answers


cURL can only save what the web server delivers.

If you want to save anything beyond that, you must include a JavaScript interpreter to build the web page, just as any web browser does.
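One way to get a JavaScript engine without embedding it yourself is to let a real browser render the page and hand back the resulting DOM. The sketch below is not from the answer above; it assumes a Chromium-based browser that supports the `--headless --dump-dom` flags is installed and reachable on the PATH under a name without spaces. The helper names `build_dump_dom_command` and `run_and_capture` are hypothetical, chosen only for illustration.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#define POPEN _popen
#define PCLOSE _pclose
#else
#define POPEN popen
#define PCLOSE pclose
#endif

// Build the command line that asks a headless browser to print the rendered DOM.
// Assumption: `browser` is an executable name on the PATH (no spaces), for
// example "chrome" or "chromium"; the actual name depends on the installation.
std::string build_dump_dom_command(const std::string& browser, const std::string& url)
{
    // Refuse characters that could break out of the double quotes used below.
    if (url.find_first_of("\"'`$\\ \t\r\n") != std::string::npos)
        throw std::invalid_argument("unsafe character in URL");
    return browser + " --headless --dump-dom \"" + url + "\"";
}

// Run a command through the system shell and return everything it wrote to stdout.
std::string run_and_capture(const std::string& command)
{
    FILE* pipe = POPEN(command.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("could not start: " + command);

    std::string output;
    char buffer[4096];
    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        output.append(buffer, n);

    PCLOSE(pipe);
    return output;
}
```

Calling `run_and_capture(build_dump_dom_command("chrome", "http://www.example.com/"))` would then return the HTML after scripts have run, which can be written to a file with `std::ofstream` in the same way as the cURL buffer. Whether the dump matches what Ctrl+S saves depends on the page, since content loaded late by scripts may not be present when the browser prints the DOM.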


I don't know how to do that. Isn't there an easier way to open the web page the way Internet Explorer does and then get the generated data? – Mona


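I don't know either, since I'm not familiar with Windows or IE. But I can imagine there is some component that allows this. Otherwise, you could look at [embed V8](https://developers.google.com/v8/embed) or http://stackoverflow.com/q/93692/1741542 –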
