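On Cygwin, the most important thing is to avoid fork()/exec() as much as possible. By design, Windows is not built to handle many processes the way Linux is. It has no fork(), and copy-on-write (COW) is broken. So when writing a script, try to do as much work as possible from a single process.
In this case we want awk, and only one awk. Avoid xargs at all costs. Another thing: if you have to scan many files, the disk cache in Windows is just a joke. Instead of visiting every file, a better approach is to let grep find only the files that match the given requirement, so you would run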
grep -rl "some-pattern-perhaps-HHH.Web-or-so" "/dir/to/where/you/have/millions/of/files/" |awk -f ~/scripts/parse.awk
and in the file "~/scripts/parse.awk" you have to open and close() the files from within awk to speed things up. Avoid system() as much as possible.
#!/bin/awk -f
BEGIN{
# PROCINFO is gawk-specific; the pid is used below to build a unique output file name.
id=PROCINFO["pid"];
}
# int staticlibs_codesize_grep(option, regexp, filepath, returnArray, returnArray_linenum )
# small code size
# Code size is chosen over speed. Search may be slow on large files
# "-n" option supported
function staticlibs_codesize_grep(o, re, p, B, C, this, r, v, c){
if(c=o~"-n")C[0]=0;B[0]=0;while((getline r<p)>0){if(o~"-o"){while(match(r,re)){
B[B[0]+=1]=substr(r,RSTART,RLENGTH);r=substr(r,RSTART+RLENGTH);if(c)C[C[0]+=1]=c;}
}else{if(!match(r,re)!=!(o~"-v")){B[B[0]+=1]=r;if(c)C[C[0]+=1]=c;}}c++}return B[0]}
# Total: 293 byte , Codesize: > 276 byte, Depend: 0 byte
{
file = $0;
outfile = $0"."id; # Whatever.
# If you have multiple replacements, or multiline replacements,
# be careful about the order in which you replace. Writing a k-map for efficient condition branching is a must.
# Also, try to unroll the loop.
# The unrolling can be anything; it trades code size for speed.
# Here is an example of an unrolled loop:
# instead of having while((getline r<file)>0){if(file~/\.htm$/){print "foo";}else{print "bar";};};
# we have moved the condition outside of the while() loop.
if(file~/\.htm$/){
while((getline r<file)>0){
# Try to perform the minimum replacement required for the given file.
# Try to avoid branching with if(){}else{} when you are inside a loop.
# Keep it minimal and small.
print "foo" > outfile;
}
}else{
while((getline r<file)>0){
# Here, as an example, we unrolled the loop into two: one for htm files, one for other files.
print "bar" > outfile;
# if a condition is required, match() is better
if(match(r,"some-pattern-you-want-to-match")){
# do whatever complex replacement you want. We reuse the RSTART,RLENGTH from match()
before_match = substr(r,1,RSTART-1);
matched_data = substr(r,RSTART,RLENGTH);
after_match = substr(r,RSTART+RLENGTH);
# if you want further matches, like grep -o, extracting only the match
a=r;
while(match(a,"some-pattern-you-want-to-match")){
B[B[0]+=1]=substr(a,RSTART,RLENGTH);
a=substr(a,RSTART+RLENGTH);
}
# The above stores multiple matches from a single line in B
}
# If you want to perform even more complex matches, try the grep() option.
# staticlibs_codesize_grep() handles the -o, -n and -v options. It should satisfy most daily needs.
# for a grep-like output, use printf("%4s\t\b:%s\n", returnArray_linenum[index] , returnArray[index]);
# Example of multiple matches, against data that may or may not have been replaced by the previous condition.
if(match(r,"another-pattern-you-want-to-match")){
# whatever
# if you decide that replacing is not a good idea, you can abort
if(for_whatever_reason_we_want_to_abort){
break;
}
}
# notice that we always need to output a line.
print r > outfile;
}
}
# If we forget to close file, we will run out of FD
close(file);
close(outfile);
# now we could move the file, however I would not do it here.
# The reasons are: system() is a very heavy operation, and our replacement may be incomplete because of human error.
# system("mv \""outfile"\" \""file"\" ")
# I would advise writing the command to another file, for a later move by bash or any other shell.
# NOTE[*1]
print "mv \""outfile"\" \""file"\" " > "files.to.update.list";
}
END{
# Assuming we are all good, we should have a log file that records what has been modified
close("files.to.update.list");
}
# Now when all is ready, meaning you have checked the result and it is what you desire, perform
# source "files.to.update.list"
# inside a terminal , or
# cat "files.to.update.list" |bash
# and you are done
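# NOTE[*1] The \x22 quoting above is incomplete. Inside \x22 quotes the shell still interprets
# \x22, $, backtick and backslash, so a file name containing any of them breaks the mv command.
# (\x27 is harmless inside \x22 quotes.)
# Always check "files.to.update.list" for such names before running it. Perhaps
# grep -v -- '[$`\\]' "files.to.update.list" > "files.to.update.list.safe"
# then
# grep -- '[$`\\]' "files.to.update.list" > "files.to.update.list.unsafe"
# may be a good idea. A name containing \x22 shows up as a line with more than four \x22 characters.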
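The match()/RSTART/RLENGTH split used inside the script is easy to get off by one, so it can help to try it in isolation first. The following is only a minimal sketch: the helper name `split_on_match`, the sample line and the `|` separator are made up for illustration and are not part of the script above. It runs a single awk process and splits one line into the text before the first match, the match itself, and the text after it.

```shell
# Hypothetical helper (not part of parse.awk): split a line around the
# first regex match, using one awk process and no system() calls.
split_on_match() {
    # $1 = line of text, $2 = regular expression (both illustrative)
    printf '%s\n' "$1" | awk -v re="$2" '
    {
        if (match($0, re)) {
            before  = substr($0, 1, RSTART - 1)     # text before the match
            matched = substr($0, RSTART, RLENGTH)   # the match itself
            after   = substr($0, RSTART + RLENGTH)  # text after the match
            print before "|" matched "|" after
        } else {
            print $0                                # no match: pass the line through
        }
    }'
}

split_on_match "foo=123;bar" "[0-9]+"   # prints: foo=|123|;bar
```

Note that `substr(s, 1, RSTART - 1)` takes a length as its third argument, while `substr(s, RSTART + RLENGTH)` with no third argument runs to the end of the string.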
Please don't [cross-post](http://superuser.com/questions/246725/optimize-shell-and-awk-script). – 2011-02-17 01:40:50