A few experiments:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <Windows.h>

void generateFiles(int n) {
    char fileName[32];
    char fileStr[1032];
    for (int i = 0; i < n; i++) {
        sprintf(fileName, "c:\\t\\%i.txt", i);
        FILE *f = fopen(fileName, "wb");   /* binary mode: we write CR LF ourselves */
        for (int j = 0; j < 256; j++) {
            int lineLen = rand() % 1024;
            memset(fileStr, 'X', lineLen);
            fileStr[lineLen] = 0x0D;
            fileStr[lineLen + 1] = 0x0A;
            fwrite(fileStr, 1, lineLen + 2, f);
        }
        fclose(f);
    }
}

void readFiles(int n) {
    char fileName[32];
    for (int i = 0; i < n; i++) {
        sprintf(fileName, "c:\\t\\%i.txt", i);
        FILE *f = fopen(fileName, "rb");   /* binary mode so ftell matches what fread reads */
        fseek(f, 0L, SEEK_END);
        long size = ftell(f);
        fseek(f, 0L, SEEK_SET);
        char *data = (char *)malloc(size);
        fread(data, size, 1, f);
        free(data);
        fclose(f);
    }
}

DWORD WINAPI readInThread(LPVOID lpParam)
{
    int *number = (int *)lpParam;
    char fileName[32];
    sprintf(fileName, "c:\\t\\%i.txt", *number);
    FILE *f = fopen(fileName, "rb");
    fseek(f, 0L, SEEK_END);
    long size = ftell(f);
    fseek(f, 0L, SEEK_SET);
    char *data = (char *)malloc(size);
    fread(data, size, 1, f);
    free(data);
    fclose(f);
    return 0;
}

int main(int argc, char **argv) {
    DWORD t1 = GetTickCount();
    generateFiles(256);
    printf("Write: %lu ms\n", GetTickCount() - t1);

    t1 = GetTickCount();
    readFiles(256);
    printf("Read: %lu ms\n", GetTickCount() - t1);

    t1 = GetTickCount();
    const int MAX_THREADS = 256;
    int pDataArray[MAX_THREADS];
    DWORD dwThreadIdArray[MAX_THREADS];
    HANDLE hThreadArray[MAX_THREADS];
    for (int i = 0; i < MAX_THREADS; i++)
    {
        pDataArray[i] = i;   /* a plain array slot is enough; no heap allocation needed */
        hThreadArray[i] = CreateThread(
            NULL,              /* default security attributes */
            0,                 /* default stack size */
            readInThread,
            &pDataArray[i],
            0,                 /* run immediately */
            &dwThreadIdArray[i]);
    }
    /* WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64)
       handles per call, so wait in batches. */
    for (int i = 0; i < MAX_THREADS; i += MAXIMUM_WAIT_OBJECTS) {
        DWORD count = (MAX_THREADS - i < MAXIMUM_WAIT_OBJECTS)
                          ? (DWORD)(MAX_THREADS - i)
                          : MAXIMUM_WAIT_OBJECTS;
        WaitForMultipleObjects(count, &hThreadArray[i], TRUE, INFINITE);
    }
    for (int i = 0; i < MAX_THREADS; i++)
        CloseHandle(hThreadArray[i]);
    printf("Read (threaded): %lu ms\n", GetTickCount() - t1);
    return 0;
}
The first function is just an ugly thing to generate the test data set (I know it could be done better, but honestly I didn't have the time).
Experiment 1: read everything sequentially. Experiment 2: read everything in parallel.
Results:
256 files:
Write: 250 ms
Read: 140 ms
Read (threaded): 78 ms
1024 files:
Write: 1250 ms
Read: 547 ms
Read (threaded): 843 ms
I think the second attempt clearly shows that in the long run creating "dumb" threads just makes things worse. Of course it could be improved with pre-allocated workers, some kind of thread pool and so on, but I think that for an operation as fast as reading 100-200 KB from disk there is no real benefit in moving it into a thread. I didn't have time to write a "smarter" solution, but I doubt it would be much faster, because you would have to add system calls, mutexes and so on...
Taken to the extreme, you could also think about pre-allocating memory pools, but as mentioned before, the real problem is the bug in the code you posted... we are talking about a few milliseconds here, certainly not seconds.
800 files (20 characters per line, 256 lines each):
Write: 250 ms
Read: 63 ms
Read (threaded): 500 ms
Conclusion:
The answer is: your reading code is wrong. You are reading the files so slowly that the speed increase only becomes noticeable once you run the tasks in parallel. With the reading code above, a single read is actually cheaper than the cost of spawning a thread.
'while (!datain.eof())' Argggggghhhhhhhhhh –
What platform, file system, hardware, etc. do you care about? – Useless
Well, you will quickly hit the disk-access limit: your hard drive is basically a sequential mechanism. That is why database managers such as SQL use optimized storage subsystems – lucasg