Need help improving the performance of a delimited text parsing script in PowerShell

I need to parse a large pipe-delimited file and count the records whose 5th column meets, and does not meet, my criteria.

PS C:\temp> gc .\items.txt -readcount 1000 |
    ? { $_ -notlike "HEAD*" } |
    % { foreach ($s in $_) { $s.split("|")[4] } } |
    group -property {$_ -ge 256} -noelement |
    ft -autosize

This command does what I want, returning output like this:

 
  Count Name
  ----- ----
1129339 True
2013703 False

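However, for a 500 MB test file, this command takes about 5.5 minutes to run, as measured by Measure-Command. A typical file is over 2 GB, and waiting 20 minutes or more is undesirably long.

Do you see a way to improve the performance of this command?

For example, is there a way to determine an optimal value for Get-Content's ReadCount? Without it, the same file takes 8.8 minutes to complete.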
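
One rough way to approach the ReadCount question is to time a few candidate values with Measure-Command on a sample file. The sketch below builds a small throwaway pipe-delimited file (its contents and the candidate values are made up for illustration, and the `[int]` cast assumes the fifth column is numeric) and reports the elapsed time for each value. Timings depend heavily on the machine and the file, so this is a measuring harness under those assumptions, not a recommendation for a particular value.

```powershell
# Build a small throwaway sample file (hypothetical data, not the real items.txt).
$sample = [System.IO.Path]::GetTempFileName()
$lines = foreach ($i in 1..5000) { "a|b|c|d|$($i % 512)|f" }
Set-Content -Path $sample -Value $lines

# Time the same pipeline for several candidate ReadCount values.
$timings = foreach ($rc in 1, 100, 1000, 10000) {
    $elapsed = Measure-Command {
        Get-Content $sample -ReadCount $rc |
            ForEach-Object { foreach ($s in $_) { $s.Split('|')[4] } } |
            Group-Object -Property { [int]$_ -ge 256 } -NoElement |
            Out-Null
    }
    [pscustomobject]@{ ReadCount = $rc; Seconds = $elapsed.TotalSeconds }
}

# Fastest candidate first.
$timings | Sort-Object Seconds | Format-Table ReadCount, Seconds -AutoSize
Remove-Item $sample
```

Running the same loop against a representative slice of the real file would give more meaningful numbers than the tiny sample used here.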

2012-01-17 neontapir

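Have you tried StreamReader? I think Get-Content loads the whole file into memory before it does anything with it. – Gisli 2012-01-17 21:52:25

You mean by importing System.IO? – neontapir 2012-01-17 21:59:23

Yes, use the .NET Framework if you can. I used to read large log files generated by SQL Server this way, with good results. I don't know of any other way in PowerShell to read large files efficiently, but I'm not an expert. – Gisli 2012-01-17 22:08:59

Have you tried StreamReader? I think Get-Content loads the whole file into memory before it does anything with it.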

StreamReader class

2012-01-17 22:52:58 Gisli

Using @Gisli's tip, here is the script I ended up with:

param($file = $(Read-Host -Prompt "File"))
# Resolve to a full path: .NET does not follow PowerShell's current location.
$fullName = (Get-Item $file).FullName
$sr = New-Object System.IO.StreamReader($fullName)
$trueCount = 0
$falseCount = 0
while ($null -ne ($line = $sr.ReadLine())) {
    if ($line -like 'HEAD|*') { continue }    # skip header records
    if ($line.Split("|")[4] -ge 256) {
        $trueCount++
    }
    else {
        $falseCount++
    }
}
$sr.Dispose()
write "True count: $trueCount"
write "False count: $falseCount"

It produces the same results in about a minute, which meets my performance requirements.
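
One detail worth checking in scripts like this: Split returns strings, and when the left operand of -ge is a string, PowerShell converts the right operand to a string and compares text rather than numbers. If column 5 is meant to be numeric (an assumption; the question does not say), casting before comparing gives numeric semantics. A minimal sketch with a made-up record:

```powershell
# Made-up sample record; the fifth field (index 4) is "1000".
$line  = 'a|b|c|d|1000|f'
$field = $line.Split('|')[4]

# String on the left: 256 is converted to "256" and the comparison is textual.
$asText = $field -ge 256

# Cast to [int] first to compare numerically.
$asNumber = [int]$field -ge 256

"Text comparison:    $asText"
"Numeric comparison: $asNumber"
```

Here the text comparison yields False, because "1" sorts before "2", while the numeric comparison yields True. Whether this matters depends on the values that actually appear in the column.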

2012-01-17 23:11:26 neontapir

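Just adding another example that uses StreamReader to read through a very large IIS log file and output all the unique client IP addresses, along with some perf metrics.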

$path = 'A_245MB_IIS_Log_File.txt'
$r = [IO.File]::OpenText($path)

$clients = @{}

while ($r.Peek() -ge 0) {
    $line = $r.ReadLine()

    # String processing here...
    if (-not $line.StartsWith('#')) {
        $split = $line.Split()
        $client = $split[-5]
        if (-not $clients.ContainsKey($client)) {
            $clients.Add($client, $null)
        }
    }
}

$r.Dispose()
$clients.Keys | Sort-Object

A small performance comparison against Get-Content:

StreamReader: completed in 5.5 seconds. PowerShell.exe: 35,328 KB RAM.

Get-Content: completed in 23.6 seconds. PowerShell.exe: 1,110,524 KB RAM.
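
As a variation on the IIS example above, a HashSet[string] can collect the unique values instead of a hashtable with $null values; its Add method returns $false for duplicates, so no ContainsKey check is needed. This is a sketch on a tiny made-up log (the field layout and sample addresses are assumptions), not a measured improvement. On PowerShell 2.0 the System.Core assembly may need to be loaded first with Add-Type.

```powershell
# Write a tiny made-up log so the sketch is self-contained.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value @(
    '#Fields: date time c-ip status'
    '2012-01-17 21:00:00 10.0.0.1 200'
    '2012-01-17 21:00:01 10.0.0.2 200'
    '2012-01-17 21:00:02 10.0.0.1 404'
)

$clients = New-Object 'System.Collections.Generic.HashSet[string]'
$reader  = [System.IO.File]::OpenText($path)
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line.StartsWith('#')) { continue }    # skip IIS directive lines
        $fields = $line.Split(' ')
        # Index 2 is the client IP in this made-up layout only.
        [void]$clients.Add($fields[2])
    }
}
finally {
    $reader.Dispose()    # release the file handle even if parsing throws
}

$clients | Sort-Object
Remove-Item $path
```

The try/finally block is a design choice for safety: the reader is disposed even when a malformed line causes an error partway through the file.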

2012-01-18 00:16:38
