How can I efficiently search and replace in a large text file?

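I have a relatively large CSV/text data file (33 MB) in which I need to do a global search and replace of the delimiter character. (The reason is that there doesn't seem to be any way to get SQL Server to escape/handle double quotes in the data during a table export, but that's another story...)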

I successfully did the search and replace in TextMate on a smaller file, but it chokes on this larger one.

It seems that command-line grep may be the answer, but I can't quite grasp the syntax, à la:

grep -rl OLDSTRING . | xargs perl -pi~ -e 's/OLDSTRING/NEWSTRING/'

So in my case I'm searching for the '^' (caret) character and replacing it with '"' (double quote).

grep -rl " grep_test.txt | xargs perl -pi~ -e 's/"/^'

This doesn't work. I assume it has something to do with escaping the double quote, but I'm pretty lost. Can anyone help?

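(I suppose that if anyone knows how to get SQL Server 2005 to handle double quotes in a text column during output to CSV, that would really solve the core issue.)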

Source

2010-08-23 Robert Travis Pierce

Your Perl substitution appears to be wrong. Try:

grep -rl \" . | xargs perl -pi~ -e 's/\^/"/g'

Explanation:

grep : command to find matches
-r : search recursively
-l : print only the names of files in which a match is found
\" : the " must be escaped because it is a shell metacharacter
. : search in the current working directory
perl : used here to do the in-place replacement
-i~ : edit in place and create a backup file with the extension ~
-p : print each line after the replacement
-e : one-line program
\^ : the caret must be escaped because it is a regex metacharacter meaning the start-of-line anchor

Source

2010-08-23 15:51:41 codaddict

This both works and helps explain things clearly. Thank you very much! – 2010-08-23 17:07:40

Oh, OK, I don't have enough 'points' to do that. Thanks. – 2010-08-23 18:04:15

sed -i.bak 's/\^/"/g' mylargefile.csv
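
In this command, -i.bak edits the file in place and keeps the original under the extra .bak suffix, \^ matches a literal caret, and the g flag replaces every occurrence on a line rather than only the first. A rough sketch of checking the result on a scratch copy before touching the real file; the name demo.csv and the grep -c check are assumptions added for illustration:

```shell
#!/bin/sh
# Scratch file for illustration only (demo.csv is a hypothetical name).
tmpdir=$(mktemp -d)
demo="$tmpdir/demo.csv"
printf '%s\n' '^a^,^b^' '^c^,^d^' > "$demo"

# Replace every caret with a double quote; the original is kept as demo.csv.bak.
sed -i.bak 's/\^/"/g' "$demo"

# Count the lines that still contain a caret, before and after.
# grep -c exits non-zero when the count is 0, hence the "|| true".
before=$(grep -c '\^' "$demo.bak" || true)
after=$(grep -c '\^' "$demo" || true)
echo "lines with a caret: before=$before after=$after"
```

If the counts look right, the same command can be run on the real file, and the .bak copy removed once the result has been checked.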

Update: you can also use Perl, as reinierpost suggested:

perl -i.bak -pe 's/\^/"/g' mylargefile.csv

But on large files, sed may run a bit faster than Perl, as my results on a 6-million-line file show:

$ tail -4 file 
this is a line with^
this is a line with^
this is a line with^

$ wc -l<file 
6136650 

$ time sed 's/\^/"/g' file >/dev/null 

real 0m14.210s 
user 0m12.986s 
sys  0m0.323s 
$ time perl -pe 's/\^/"/g' file >/dev/null 

real 0m23.993s 
user 0m22.608s 
sys  0m0.630s 
$ time sed 's/\^/"/g' file >/dev/null 

real 0m13.598s 
user 0m12.680s 
sys  0m0.362s 

$ time perl -pe 's/\^/"/g' file >/dev/null 

real 0m23.690s 
user 0m22.502s 
sys  0m0.393s
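
To repeat this kind of comparison on another machine, something like the following sketch may help. The generated file, its line count (far smaller than the 6-million-line file above, chosen only to keep the run short) and the cmp check are assumptions, not part of the measurements shown; absolute timings will vary with hardware and tool versions.

```shell
#!/usr/bin/env bash
# bash is used because "time" is a shell keyword there.
tmpdir=$(mktemp -d)
file="$tmpdir/file"

# Generate a test file of identical lines ending in a caret.
# 100000 lines is an arbitrary size, not the size used in the answer.
yes 'this is a line with^' | head -n 100000 > "$file"
wc -l < "$file"

# Time each tool; output is discarded so only the substitution is measured.
time sed 's/\^/"/g' "$file" > /dev/null
time perl -pe 's/\^/"/g' "$file" > /dev/null

# Sanity check: both tools should produce byte-identical output.
sed 's/\^/"/g' "$file" > "$tmpdir/sed.out"
perl -pe 's/\^/"/g' "$file" > "$tmpdir/perl.out"
cmp "$tmpdir/sed.out" "$tmpdir/perl.out" && echo "outputs match"
```

Running each command more than once, as in the transcript above, helps separate the effect of file caching from the tools themselves.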

Source

2010-08-24 00:59:52 ghostdog74

Thanks for your help. I've never used sed, but if it's that concise, it must be worth a look. :) – 2010-08-24 01:51:57

perl -i.bak -pe 's/\^/"/g' mylargefile.csv isn't that much longer... – reinierpost 2010-08-24 08:35:02
