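Here is my solution in awk, which I think is more flexible than sed. This program leaves LaTeX commands alone (a word that starts with "\") and preserves a leading capital letter. The arguments of LaTeX commands (and normal text) are substituted according to the dictionary file. When the third argument [rev] is given, it performs the reverse substitution with the same dictionary file. Any non-alphabetic character acts as a word delimiter (this is necessary in LaTeX source files). The program writes its output to the screen (stdout), so you need to redirect it to a file (> output_f). (I assume the inputencoding of your LaTeX source is 1 byte per character.)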
> cat dic.sh
#!/bin/bash
(($#<2))&& { echo "Usage $0 dictionary_file latex_file [rev]"; exit 1; }
((d= $#==3 ? 0:1)) # d=1: forward substitution, d=0: reverse (a third argument was given)
awk -v d="$d" '
BEGIN {cm=fx=0; fn="";}
fn!=FILENAME {fx++; fn=FILENAME;} #count input files: 1 = dictionary, 2 = LaTeX source
fx==1 {if(!NF)next; if(d)a[$1]=$2; else a[$2]=$1; next;} #read dict or rev dict file into an associative array
fx==2 { for(i=1; i<=length($0); i++)
{c=substr($0,i,1); #read characters from a given line of LaTeX source
if(cm){printf("%s",c); if(c~/[^A-Za-z0-9\\]/)cm=0;} #inside a LaTeX command: copy it unchanged until a delimiter
else if(c~/[A-Za-z]/)w=w c; else{pr(); printf("%s",c); if(c=="\\")cm=1;} #collect letters, or flush the word at a delimiter
}
pr(); cm=0; printf("\n"); #handle collected last word in the line; a command does not continue on the next line
}
function pr( s){ # print collected word or its substitution by dictionary and recreates first letter case
if(!length(w))return;
s=tolower(w); # dictionary keys are looked up in lower case
if(!(s in a))printf("%s",w);
else printf("%s", s==w ? a[s] : toupper(substr(a[s],1,1)) substr(a[s],2));
w="";}
' "$1" "$2"
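The case handling in pr() is easy to try on its own. The following is only a sketch under stated assumptions: restore_case is a made-up helper name, and the one-entry dictionary is hard-coded instead of being read from a file, but the lookup logic mirrors the function above.

```shell
# Hypothetical helper (not part of dic.sh): apply the pr() lookup to one word.
restore_case() {
  printf '%s\n' "$1" | awk '
    BEGIN { a["apple"] = "lemon" }   # assumed one-entry dictionary
    {
      s = tolower($0)                # keys are matched in lower case
      if (!(s in a))     print $0    # unknown word: leave it alone
      else if (s == $0)  print a[s]  # all lower case: plain substitution
      else print toupper(substr(a[s], 1, 1)) substr(a[s], 2)  # otherwise capitalize the first letter
    }'
}

restore_case apple   # lemon
restore_case Apple   # Lemon
restore_case APPLE   # Lemon
restore_case plum    # plum
```

Note the third call: only the first letter is re-capitalized, so a word written entirely in capitals would not survive a forward and reverse pass unchanged.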
Dictionary file:
> cat dictionary
apple lemon
raspberry cherry
pear banana
LaTeX input source:
> cat src.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].
Raspberry12Apple,pear.
Results of running it:
> ./dic.sh
Usage ./dic.sh dictionary_file latex_file [rev]
> ./dic.sh dictionary src.txt >out1.txt; cat out1.txt
Lemon123banana,lemon "banana".
\Apple123pear{cherry}{banana}[lemon].
Cherry12Lemon,banana.
> ./dic.sh dictionary out1.txt >out2.txt rev; cat out2.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].
Raspberry12Apple,pear.
> diff src.txt out2.txt # they are identical
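The round trip comes out identical here because every right-hand word in the sample dictionary is unique. In rev mode the script keys its array on the second column, so a word that appears there twice would keep only its last pairing. A quick check along these lines might be worth running first; dup_targets is a name invented for this sketch, not something the script provides.

```shell
# Hypothetical helper (not part of dic.sh): print every right-hand dictionary
# word that occurs more than once, since those cannot be reversed uniquely.
dup_targets() {
  awk 'NF { n[$2]++ } END { for (w in n) if (n[w] > 1) print w }' "$1"
}

# Example with a throwaway dictionary in which "lemon" is used twice.
tmp=$(mktemp)
printf '%s\n' 'apple lemon' 'lime lemon' 'pear banana' > "$tmp"
dup_targets "$tmp"
rm -f "$tmp"
```

An empty result means the dictionary can be applied in both directions without entries overwriting one another.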
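Wouldn't "replace" do the job for you?
The script is trivial. The data, however... Can you provide a suitable list of substitutions?
Well, I could use 'sed' or 'awk' to replace each case separately. I was hoping someone had already prepared a loop or script for the common cases. In fact, finding a general list of substitutions is another challenge. If I have to do it myself, I'll put it on GitHub so it can be updated whenever new cases turn up.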