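C++ cURL - How do I save a complete web page to a file?

I'm trying to save an entire web page as a .txt file using C++ (Visual Studio 2013) and cURL. Everything works, but the site I'm trying to save uses a lot of JavaScript to generate the page, so the .txt file that cURL produces has only about 170 lines. When I save the same page from Google Chrome (Ctrl+S) as an .htm file, that file has more than 2000 lines. Is there a way to save the fully loaded page to a file? This is the code I'm using: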
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <curl/curl.h>

struct MemoryStruct {
    char *memory;
    size_t size;
};

/* libcurl write callback: appends each received block to the growing buffer */
static size_t
WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
    size_t realsize = size * nmemb;
    struct MemoryStruct *mem = (struct MemoryStruct *)userp;
    /* realloc into a temporary so the old block is not leaked on failure */
    char *grown = (char *)realloc(mem->memory, mem->size + realsize + 1);
    if (grown == NULL) {
        /* out of memory! */
        printf("not enough memory (realloc returned NULL)\n");
        return 0;
    }
    mem->memory = grown;
    memcpy(&(mem->memory[mem->size]), contents, realsize);
    mem->size += realsize;
    mem->memory[mem->size] = 0;
    return realsize;
}

int main(void)
{
    CURL *curl_handle;
    CURLcode res;
    struct MemoryStruct chunk;
    chunk.memory = (char *)malloc(1); /* will be grown as needed by the realloc above */
    chunk.size = 0; /* no data at this point */
    curl_global_init(CURL_GLOBAL_ALL);
    /* init the curl session */
    curl_handle = curl_easy_init();
    /* specify URL to get */
    curl_easy_setopt(curl_handle, CURLOPT_URL, "http://www.example.com/");
    /* send all data to this function */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
    /* we pass our 'chunk' struct to the callback function */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&chunk);
    /* some servers don't like requests that are made without a user-agent
       field, so we provide one */
    curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, "libcurl-agent/1.0");
    /* get it! */
    res = curl_easy_perform(curl_handle);
    /* check for errors */
    if (res != CURLE_OK) {
        fprintf(stderr, "curl_easy_perform() failed: %s\n",
                curl_easy_strerror(res));
    }
    else {
        /*
         * Now, our chunk.memory points to a memory block that is chunk.size
         * bytes big and contains the remote file.
         *
         * Do something nice with it!
         */
        printf("%lu bytes retrieved\n", (unsigned long)chunk.size);
        /* write exactly chunk.size bytes, in binary mode, only on success */
        std::ofstream oplik("test.txt", std::ios::binary);
        oplik.write(chunk.memory, static_cast<std::streamsize>(chunk.size));
        oplik.close();
    }
    /* cleanup curl stuff */
    curl_easy_cleanup(curl_handle);
    if (chunk.memory)
        free(chunk.memory);
    /* we're done with libcurl, so clean it up */
    curl_global_cleanup();
    return 0;
}
Thanks for your help, and sorry for my bad English.
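One rough way to check that the missing lines really come from script-generated content is to count the `<script` tags in the raw HTML that cURL returned. cURL only downloads the document the server sends and does not execute JavaScript, so a page with many scripts and little markup will look much smaller than what the browser saves. The sketch below is only a diagnostic under that assumption; `CountScriptTags` is a hypothetical helper name, not part of libcurl.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a libcurl function): counts case-insensitive
// occurrences of "<script" in the raw HTML downloaded by cURL.
std::size_t CountScriptTags(const std::string &html)
{
    static const std::string needle = "<script";
    std::size_t count = 0;
    for (std::size_t i = 0; i + needle.size() <= html.size(); ++i) {
        std::size_t j = 0;
        // Compare one character at a time, lowercasing the input side.
        while (j < needle.size() &&
               std::tolower(static_cast<unsigned char>(html[i + j])) == needle[j])
            ++j;
        if (j == needle.size()) {
            ++count;
            i += needle.size() - 1; // skip past the tag name just matched
        }
    }
    return count;
}
```

After `curl_easy_perform` succeeds, it could be called as `CountScriptTags(std::string(chunk.memory, chunk.size))`. A high count alongside a short file suggests the rest of the page is built in the browser, which cURL alone cannot reproduce.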
I don't know how to do that. Isn't there a simpler way to open the web page the way Internet Explorer does and then get the generated data? – Mona
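I don't know either, since I'm not familiar with Windows or IE. But I can imagine there is some component that allows this. Otherwise, you could look at [embed V8](https://developers.google.com/v8/embed) or http://stackoverflow.com/q/93692/1741542 –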