PowerShell: counting instances of strings in a file using a list

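I'm trying to find an efficient way to count how many times each string in "file1" (the strings vary from 40 to 400+ characters) occurs in "file2". file1 has about 2k lines and file2 has about 130k lines. I currently have a Unix solution that takes about 2 minutes in a VM and about 5 in Cygwin, but I'm trying to do this in PowerShell, for use with automation (AutoIT).
I have a solution, but it takes far too long (in roughly the time Cygwin needed to finish all 2k lines, PowerShell had only processed 40-50 lines!). Although I haven't prepared one yet, I'm also open to a Python solution if it can be fast and accurate.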

Here is the Unix code:

while read SEARCH_STRING; 
do printf "%s$" "${SEARCH_STRING}"; 
grep -Fc "${SEARCH_STRING}" file2.csv; 
done < file1.csv | tee -a output.txt;

Here is the PowerShell code I currently have:

$Target = Get-Content .\file1.csv 
Foreach ($line in $Target){ 
    # Just to keep strings small, since I found that not all
    # strings were being compared correctly if they were 250+ chars
    $line = $line.Substring(0,180) 
    $Coll = Get-Content .\file2.csv | Select-string -pattern "$line" 
    $cnt = $Coll | measure 
    $cnt.count 
}

Any suggestions or ideas would help.

Thanks.

Edit

I tried the modified solution suggested by C.B.:

del .\output.txt 
$Target = Get-Content .\file1.csv 
$file= [System.IO.File]::ReadAllText("C:\temp\file2.csv") 
Foreach ($line in $Target){ 
    $line = [string]$line.Substring(0, $line.length/2) 
    $cnt = [regex]::matches([string]$file, $line).count >> ".\output.txt" 
}

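However, because the strings in file1 vary in length, I kept getting out-of-bounds exceptions from the Substring function, so I halved each input string (/2) to try to get a match. When I halve them and the cut leaves an open parenthesis, it tells me: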

Exception calling "Matches" with "2" argument(s): "parsing "CVE-2013-0796,04/02/2013,MFSA2013-35 SeaMonkey: WebGL 
crash with Mesa graphics driver on Linux (C" - Not enough)'s." 
At C:\temp\script_test.ps1:6 char:5 
+  $cnt = [regex]::matches([string]$file, $line).count >> ".\output.txt ... 
+  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
+ CategoryInfo   : NotSpecified: (:) [], MethodInvocationException 
+ FullyQualifiedErrorId : ArgumentException

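I don't know whether there is a way to raise the input limit in PowerShell (my longest string is currently 406 characters, but it could be longer in the future), or whether I should give up and try a Python solution.

Thoughts?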
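The "Not enough )'s" error above comes from the search string being parsed as a regular expression, so a cut that leaves an unclosed "(" produces an invalid pattern; it is not a length limit. As a minimal sketch in Python (which the question names as an acceptable alternative), the same failure and the usual remedy of escaping the search string look roughly like this. The sample strings and function names are made up for illustration.

```python
import re

# Hypothetical sample data; the real strings come from file1.csv and file2.csv.
needle = "crash with Mesa graphics driver on Linux (C"
haystack = "WebGL crash with Mesa graphics driver on Linux (CVE-2013-0796)\n" * 3

def count_unescaped(pattern, text):
    """Count matches treating the search string as a regex; may raise re.error."""
    return len(re.findall(pattern, text))

def count_escaped(literal, text):
    """Count matches treating the search string as literal text."""
    return len(re.findall(re.escape(literal), text))

try:
    count_unescaped(needle, haystack)
    unescaped_failed = False
except re.error:
    # The unclosed "(" makes the pattern invalid, much like the .NET ArgumentException.
    unescaped_failed = True

literal_count = count_escaped(needle, haystack)
print(unescaped_failed, literal_count)
```

With the pattern escaped, the search string can keep its full length, so no truncation is needed.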

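Edit

Thanks to @C.B. I got the right answer, and it matches the output of the Bash script perfectly. Here is the full code, which writes the results to a text file: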

$Target = Get-Content .\file1.csv 
$file= [System.IO.File]::ReadAllText("C:\temp\file2.csv") 
Foreach ($line in $Target){ 
    $cnt = [regex]::matches($file, [regex]::escape($line)).count >> ".\output.txt"  
}
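Since the question also asks about Python, here is a rough equivalent of the script above, offered as a sketch rather than a tested replacement. The function name `count_occurrences` and the UTF-8 encoding are assumptions. It relies on `str.count`, which counts literal, non-overlapping occurrences, so neither truncation nor escaping is involved.

```python
from pathlib import Path

def count_occurrences(needles_path, haystack_path, encoding="utf-8"):
    """Return (search string, count) pairs, one per non-empty line of needles_path."""
    # Read the large file once and keep it in memory as a single string.
    haystack = Path(haystack_path).read_text(encoding=encoding)
    results = []
    for line in Path(needles_path).read_text(encoding=encoding).splitlines():
        if not line:
            continue  # an empty search string would match everywhere
        # str.count does a literal, non-overlapping substring count,
        # so no truncation or regex escaping is needed.
        results.append((line, haystack.count(line)))
    return results
```

Calling `count_occurrences("file1.csv", "file2.csv")` and writing each count to output.txt would mirror the PowerShell loop. Note that this counts occurrences, whereas `grep -Fc` counts matching lines; the two agree only when no search string appears more than once on a line.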

Source

2013-06-26 lennin59

Try this:

$Target = Get-Content .\file1.csv 
$file= [System.IO.File]::ReadAllText("c:\test\file2.csv") 
Foreach ($line in $Target){ 
    $line = $line.Substring(0,180)  
    $cnt = [regex]::matches($file, [regex]::escape($line)).count  
}

Source

2013-06-26 15:04:00

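It works great and it's fast (about one minute for all results!), though now I have another problem... The Substring function throws an out-of-range exception (understandable, since some strings are shorter than 180 characters), and in many cases it doesn't give me the correct result, sometimes outputting 0 matches or skipping the string entirely. Is there any kind of fix for the character limit in PowerShell? – lennin59

You can try: '$line = $line.Substring(0, $line.Length-1)' –

That did it, but I have varying lengths in file1. The strings also contain parentheses, and for some reason, if I cut a string right after an open parenthesis, it throws an argument exception because of "Not enough )'s". I added an if/else check, but it didn't help, and some fields still found no matches. I'm thinking of using Python with regex... – lennin59

One problem with your script is that you read file2.csv over and over, once for every line of file1.csv. Reading the file just once and storing its contents in a variable should speed things up significantly. Try this: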

$f2 = Get-Content .\file2.csv 

foreach ($line in (gc .\file1.csv)) { 
    $line = $line.Substring(0,180) 
    @($f2 | ? { $_ -match $line }).Count 
}
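The same read-once idea can be sketched in Python, again only as an illustration with made-up data and a hypothetical function name. This version counts lines that contain each search string, which is what the `Where-Object` filter above and `grep -Fc` both do. It uses a plain substring test instead of `-match`, because `-match` interprets the search string as a regular expression.

```python
def count_matching_lines(needles, haystack_lines):
    """For each search string, count the lines that contain it literally."""
    counts = []
    for needle in needles:
        # "in" is a plain substring test, so "(" and other regex
        # metacharacters in the search string need no escaping.
        counts.append(sum(1 for line in haystack_lines if needle in line))
    return counts

# Hypothetical in-memory stand-ins for file1.csv and file2.csv.
needles = ["Linux (C", "missing"]
haystack_lines = [
    "crash on Linux (CVE-1) and Linux (CVE-2)",
    "another Linux (CVE-3) entry",
    "unrelated row",
]
line_counts = count_matching_lines(needles, haystack_lines)
print(line_counts)
```

The first sample line contains the search string twice but is counted once, which is the per-line behaviour of `grep -c`.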

Source

2013-06-26 15:32:08

You're right, I was reading the file over and over, but I tried your approach and it didn't speed things up. Thanks for the tip, though! – lennin59
