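Performance issue while searching for a string in memory

I have developed a simple DLL under Win32 that scans the host process's virtual memory for a substring. For some reason it is very slow compared to Cheat Engine, ArtMoney, or even OllyDbg, which scans with a single thread. Below is the code of the function that scans a single memory section found with VirtualQuery(). The host (an .exe application) commits about 300-400 MiB of memory, and I have to scan about 170 memory sections ranging in size from 4 KiB to 32 MiB. I scan only MEM_PRIVATE, MEM_COMMIT regions, do not scan PAGE_GUARD, PAGE_NOACCESS, or PAGE_READONLY pages, and skip the DLL's own memory.
The performance is very poor: it takes 10-12 seconds to find a single string, whereas OllyDbg, for example, finds the string in about 2-3 seconds.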
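For readers who want to reproduce the setup, the region filter described above could be sketched roughly as follows. This is not the asker's code: `ShouldScanRegion` is a hypothetical helper, and its three parameters are assumed to be the `State`, `Type`, and `Protect` fields that VirtualQuery() reports in MEMORY_BASIC_INFORMATION. Skipping the DLL's own memory is not covered here.

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Fallback values matching the Win32 headers, so the sketch compiles elsewhere. */
#define MEM_COMMIT     0x1000
#define MEM_PRIVATE    0x20000
#define PAGE_NOACCESS  0x01
#define PAGE_READONLY  0x02
#define PAGE_GUARD     0x100
#endif

/* Hypothetical helper (not from the original post): returns 1 if a region
   with the given State, Type and Protect values should be scanned. */
static int ShouldScanRegion(unsigned long state, unsigned long type,
                            unsigned long protect)
{
    if (state != MEM_COMMIT)
        return 0;   /* reserved or free address space */
    if (type != MEM_PRIVATE)
        return 0;   /* mapped files and loaded images */
    if (protect & (PAGE_GUARD | PAGE_NOACCESS | PAGE_READONLY))
        return 0;   /* protections excluded in the question */
    return 1;
}
```

A caller would walk the address space with VirtualQuery() and pass each region that satisfies this predicate to the scanning function.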
UINT __stdcall ScanAndReplace(UCHAR* pStartAddress, UCHAR* pEndAddress, const char* csSearchFor, const char* csReplaceTo, UINT iLength)
{
    // This function runs inside a single memory section and looks for a specific substring
    // pStartAddress: UCHAR* - The beginning of the memory section
    // pEndAddress: UCHAR* - The end of the memory section
    // csSearchFor: const char* - The pointer to the substring to search for
    // csReplaceTo: const char* - The pointer to the substring to replace with
    // iLength: UINT - max length of csSearchFor substring
    // Total number of matches
    UINT iHits = 0;
    // Scan from pStartAddress to (pEndAddress - iLength) inclusive, so a match at the
    // very end is not missed and the memory section is not overrun
    for (; pStartAddress <= (pEndAddress - iLength); ++pStartAddress)
    {
        UINT iIterator = 0;
        // Compare the string that begins at the current address (pStartAddress) until a byte differs
        for (; (iIterator < iLength) && (pStartAddress[iIterator] == csSearchFor[iIterator]); ++iIterator);
        // String matches if iIterator == iLength
        if (iIterator == iLength)
        {
            // Found, do something (edit/replace, etc), increment counter...
            ++iHits;
        }
        /*
        // Even if you search for a single byte it's very slow
        if (*pStartAddress == 'A')
            ++iHits;
        */
    }
    return iHits;
}
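As a point of comparison for the loop above, here is a minimal sketch of a common alternative: let memchr() jump to the next byte that equals the first byte of the pattern, then confirm the candidate with memcmp(). `CountMatches` and its parameter names are hypothetical, `end` is assumed to point one past the last byte of the section, and nothing here is claimed to be what Cheat Engine, ArtMoney, or OllyDbg actually do. Whether it is faster depends on the C runtime and on the data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alternative to the byte-by-byte loop: counts occurrences of
   needle[0..needleLen) in [start, end). Overlapping matches are counted,
   as in the original loop. */
static size_t CountMatches(const unsigned char *start, const unsigned char *end,
                           const unsigned char *needle, size_t needleLen)
{
    size_t hits = 0;
    const unsigned char *p = start;
    const unsigned char *last;

    /* Nothing to find if the pattern is empty or longer than the section. */
    if (needleLen == 0 || (size_t)(end - start) < needleLen)
        return 0;

    last = end - needleLen;   /* last address where a full match can begin */
    while (p <= last) {
        /* Jump to the next candidate: the next byte equal to needle[0]. */
        p = (const unsigned char *)memchr(p, needle[0], (size_t)(last - p) + 1);
        if (p == NULL)
            break;
        if (memcmp(p, needle, needleLen) == 0)
            ++hits;
        ++p;                  /* advance by one byte so overlaps are found */
    }
    return hits;
}
```

The explicit length check also avoids computing `end - needleLen` when the section is shorter than the pattern, a case in which the pointer arithmetic would be invalid.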
I use MSVS 2010.
Compiler command line:
/nologo /W3 /WX- /O2 /Os /Oy- /GL /D "WIN32" /D "NDEBUG" /D "_WINDOWS"
/D "_USRDLL" /D "MYDLL_EXPORTS" /D "_WINDLL" /GF /Gm- /MD /GS- /Gy
/fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\MyDll.pch" /FAcs
/Fa"Release\" /Fo"Release\" /Fd"Release\vc100.pdb" /Gd /TC /analyze-
/errorReport:queue
Linker command line:
/OUT:"D:\MyDll\Release\MyDll.dll" /INCREMENTAL:NO /NOLOGO /DLL "Dbghelp.lib"
"msvcrt.lib" "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib"
"comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib"
"uuid.lib" "odbc32.lib" "odbccp32.lib" /NODEFAULTLIB /MANIFEST:NO
/ManifestFile:"Release\MyDll.dll.intermediate.manifest" /ALLOWISOLATION
/MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG
/PDB:"D:\MyDll\Release\MyDll.pdb" /SUBSYSTEM:WINDOWS /OPT:REF /OPT:ICF
/PGD:"D:\MyDll\Release\MyDll.pgd" /LTCG /TLBID:1 /ENTRY:"DllMain"
/DYNAMICBASE /NXCOMPAT /MACHINE:X86 /ERRORREPORT:QUEUE
What am I doing wrong? Is my algorithm bad, or is there some kind of "magic" that other memory scanners use?
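How long does it take just to enumerate all the sections? I mean, if you reduce the ScanAndReplace() function to an empty body, does it still take a considerable amount of time? Maybe you are looking for the problem in the wrong place?
If I remove all the code from this function (or just the part where the actual read happens, the inner FOR loop), it passes over the entire virtual memory almost instantly. I think I have found the root of the problem: it is not related to the algorithm; rather, the host application somehow affects the memory read speed. I need more time to figure out what is going on, and why the same DLL, loaded into a "test" .exe application with the same amount of committed memory, runs as fast as expected but is very slow in the "real" application. It might be try/catch {}.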
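The experiment discussed in these comments (an empty scan body versus one that really reads memory) can be made repeatable with a small timing wrapper. The sketch below is only an illustration: `ScanFn`, `ScanNothing`, `ScanCountA`, and `TimeScan` are hypothetical names, and the portable clock() is used for brevity; on Windows, QueryPerformanceCounter() would give finer resolution.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: scans [start, end) and returns a hit count. */
typedef size_t (*ScanFn)(const unsigned char *start, const unsigned char *end);

/* No-op scan: touches no memory, so timing it measures only the overhead
   of walking the sections and calling the callback. */
static size_t ScanNothing(const unsigned char *start, const unsigned char *end)
{
    (void)start;
    (void)end;
    return 0;
}

/* Read-only scan: counts bytes equal to 'A', like the commented-out
   single-byte test in the question. */
static size_t ScanCountA(const unsigned char *start, const unsigned char *end)
{
    size_t hits = 0;
    for (; start < end; ++start)
        if (*start == 'A')
            ++hits;
    return hits;
}

/* Runs scan over [start, end), stores its result in *hits, and returns the
   elapsed time in seconds as reported by clock(). */
static double TimeScan(ScanFn scan, const unsigned char *start,
                       const unsigned char *end, size_t *hits)
{
    clock_t t0 = clock();
    *hits = scan(start, end);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Timing both callbacks over the same sections, first in the "test" host and then in the "real" one, would show whether the extra seconds come from the enumeration or from the reads themselves.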