2012-09-13 61 views
2

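How can I delete all lines containing a particular string, but only when the following character is a CJK character?

I need to delete every line in a file that contains a match for read (symbol), where (symbol) is any CJK character. However, if read (symbol) is immediately preceded by A-Z or a-z, that line should not be deleted. For example, here are some sample lines and the expected results: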

Do you like to read books? (not deleted) 
Can you read 書? (deleted) 
.read 書. (deleted) 
This is some thread 線. (not deleted) 

How can I delete only those lines that match (not A-Z or a-z)read (CJK symbol)?


Out of curiosity, do you get the same error message when you run grep -vP "[^A-Za-z]read [\x{9DBB}]+" file.txt? – Steve


Yes, that gives the error too. The other solution seems to work well, though. – Village

Answers

1

I don't know exactly how to match CJK characters, but if you match non-ASCII characters instead, you can get the result you're looking for:

grep -vP "[^A-Za-z]read [\x80-\xFF]" file.txt 

In theory, you should be able to do:

grep -vP "[^A-Za-z]read [\x{2E80}-\x{9FBB}]+" file.txt 

However, in my testing, I get this error:

grep: character value in \x{...} sequence is too large 

http://en.wikipedia.org/wiki/List_of_Unicode_characters#CJK_unified_ideographs
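
One likely explanation for that error, offered as an assumption rather than something confirmed in this thread: GNU grep only puts PCRE into UTF-8 mode when the current locale is UTF-8, and outside that mode \x{...} values above FF are rejected. The sketch below assumes GNU grep built with PCRE support and an installed C.UTF-8 locale (substitute en_US.UTF-8 or similar if yours differs). The function name drop_read_cjk is hypothetical, and \p{Han} is used in place of an explicit code point range. The negative lookbehind (?<![A-Za-z]) stands in for [^A-Za-z] so that a line beginning with "read" is handled too.

```shell
# Hypothetical helper: print every input line EXCEPT those where "read "
# is followed by a Han ideograph and is not preceded by an ASCII letter.
# Assumes GNU grep with -P support and an available C.UTF-8 locale.
drop_read_cjk() {
  LC_ALL=C.UTF-8 grep -vP '(?<![A-Za-z])read \p{Han}' "$@"
}

# Try it on the sample lines from the question.
printf '%s\n' \
  'Do you like to read books?' \
  'Can you read 書?' \
  '.read 書.' \
  'This is some thread 線.' |
  drop_read_cjk
```

Note that \p{Han} covers ideographs only. If kana or Hangul should also count as CJK, the class could be widened, for example to [\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}].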

Edit:

LC_ALL="POSIX" sed -r '/[^A-Za-z]read [\o200-\o377]+/d' file.txt 

Result:

Do you like to read books? (not deleted) 
This is some thread 線. (not deleted) 
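
The byte-range idea in the edit above matches any non-ASCII byte, so a line such as "I read émigré novels." would be deleted as well. A narrower variant, sketched here under stated assumptions, stays in the C locale but only accepts three-byte UTF-8 sequences whose lead byte is E3 through E9. Those sequences encode exactly U+3000 through U+9FFF, which takes in the CJK Unified Ideographs along with kana and CJK punctuation, and leaves out the radicals below U+3000, Hangul syllables, and the supplementary ideograph planes. The sketch assumes UTF-8 input and GNU grep with -P; the function name drop_read_cjk_bytes is hypothetical.

```shell
# Hypothetical helper: byte-level filter that works without a UTF-8 locale.
# [\xE3-\xE9][\x80-\xBF]{2} matches one UTF-8 encoded character in the
# range U+3000..U+9FFF (an approximation of "CJK", see the text above).
drop_read_cjk_bytes() {
  LC_ALL=C grep -vP '(?<![A-Za-z])read [\xE3-\xE9][\x80-\xBF]{2}' "$@"
}

# The accented Latin line is kept because its lead byte (C3) is out of range.
printf '%s\n' \
  'Can you read 書?' \
  'I read émigré novels.' |
  drop_read_cjk_bytes
```

The trade-off is precision against portability: this version does not depend on which locales are installed, but it treats a block of code points as CJK rather than consulting Unicode script properties.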

See also:

How to delete all CJK text appearing immediately after a particular symbol?

1
awk '$0~/ read [a-zA-Z]+/' your_file 