2010-07-10 136 views
2

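Shell script to find, search and replace an array of strings in a file

This is related to another question/code golf I asked: Code golf: "Color highlighting" of repeated text.

I have a file, sample1.txt, with the following contents: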

LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook. 

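I have a script that generates the following array of strings (only a few are shown here for illustration), all of which occur in the file: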

LoremIpsum 
LoremIpsu 
dummytext 
oremIpsum 
LoremIps 
dummytex 
industry 
oremIpsu 
remIpsum 
ummytext 
LoremIp 
dummyte 
emIpsum 
industr 
mmytext 

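I need to (starting from the top) check whether 'LoremIpsum' occurs in the file sample1.txt. If it does, I want to replace every occurrence of LoremIpsum with <T1>LoremIpsum</T1>. Now, when the program moves on to the next word, 'LoremIpsu', it should not match the text in sample1.txt, because that text is already tagged as <T1>LoremIpsum</T1>. It should repeat this for every element of the 'array' above. The next 'valid' one would be 'dummytext', and it should be tagged as <T2>dummytext</T2>.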

I think it should be possible to create a bash shell script solution for this, rather than relying on Perl/Python/Ruby programs.

+0

This sounds like a job for sed, but I don't understand the problem. – Marco 2010-07-10 07:18:52

+0

Hi Marco - does the T2 example help? – RubiCon10 2010-07-10 07:20:23

+1

Why do you want to use a shell script? Why not use the tool best suited to the job? Perl is made for text processing with little programming time. – Borealid 2010-07-10 07:27:02

Answers

0

Pure Bash (no externals)

At the Bash command line:

$ sample="LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook." 
$ # or: sample=$(<sample1.txt) 
$ array=(
LoremIpsum 
LoremIpsu 
dummytext 
... 
) 
$ tag=0; for entry in "${array[@]}"; do test="<[^>/]*>[^>]*$entry[^<]*</"; if [[ ! $sample =~ $test ]]; then ((tag++)); sample=${sample//${entry}/<T$tag>$entry</T$tag>}; fi; done; echo "Output:"; echo "$sample"
Output: 
<T1>LoremIpsum</T1>issimply<T2>dummytext</T2>oftheprintingandtypesetting<T3>industry</T3>.<T1>LoremIpsum</T1>hasbeenthe<T3>industry</T3>'sstandard<T2>dummytext</T2>eversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook. 
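
For readers who prefer a script to a one-liner, the same idea can be sketched as a function. This is only an illustration under assumptions, not the answerer's code: `tag_words` is a hypothetical name, candidates are assumed to contain no regex or glob metacharacters, and a candidate that does not occur in the text at all is skipped so that it does not use up a tag number.

```shell
#!/usr/bin/env bash
# tag_words TEXT CANDIDATE... (hypothetical helper, illustration only)
# Wraps every occurrence of each candidate in <Tn>...</Tn>, in the order given,
# and skips a candidate once it is found inside an already tagged span.
tag_words() {
  local text=$1; shift
  local tag=0 entry inside
  for entry in "$@"; do
    # Assumption: a candidate that is absent from the text gets no tag number.
    [[ $text == *"$entry"* ]] || continue
    # Skip candidates that sit inside an existing <Tn>...</Tn> span.
    inside="<T[0-9]+>[^<]*${entry}[^<]*</T[0-9]+>"
    [[ $text =~ $inside ]] && continue
    tag=$((tag + 1))
    # Quoting "$entry" makes the pattern match literally rather than as a glob.
    text=${text//"$entry"/<T$tag>$entry</T$tag>}
  done
  printf '%s\n' "$text"
}

tag_words "LoremIpsumissimplydummytext.LoremIpsum" LoremIpsum LoremIpsu dummytext ummytext
# prints: <T1>LoremIpsum</T1>issimply<T2>dummytext</T2>.<T1>LoremIpsum</T1>
```

With the file and array from the question, the call would look like `tag_words "$(<sample1.txt)" "${array[@]}"`.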
+0

Awesome, btw! – RubiCon10 2010-07-13 10:18:42

0

Straightforward:

#! /usr/bin/perl 

use warnings; 
use strict; 

my @words = qw/ 
    LoremIpsum 
    LoremIpsu 
    dummytext 
    oremIpsum 
    LoremIps 
    dummytex 
    industry 
    oremIpsu 
    remIpsum 
    ummytext 
    LoremIp 
    dummyte 
    emIpsum 
    industr 
    mmytext 
/; 

my $to_replace = qr/@{[ join "|" => 
         sort { length $b <=> length $a } 
         @words 
        ]}/; 

my $i = 0; 
while (<>) { 
    s|($to_replace)|++$i; "<T$i>$1</T$i>"|eg; 
    print; 
} 

Sample run (wrapped to prevent horizontal scrolling):

$ ./tag-words sample.txt 
<T1>LoremIpsum</T1>issimply<T2>dummytext</T2>oftheprintingandtypesetting<T3>indus 
try</T3>.<T4>LoremIpsum</T4>hasbeenthe<T5>industry</T5>'sstandard<T6>dummytext</T 
6>eversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatyp 
especimenbook.
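
The `sort { length $b <=> length $a }` step matters because a regex alternation tries its alternatives from left to right, so listing longer words first keeps LoremIpsu from matching where LoremIpsum would. As a rough illustration of that ordering step outside Perl, here is a shell sketch. `sort_longest_first` is a hypothetical helper name, and it relies on the external `sort` and `cut` commands, so it is not pure Bash.

```shell
#!/usr/bin/env bash
# sort_longest_first WORD... (hypothetical helper, illustration only)
# Prints the given words one per line, longest first.
sort_longest_first() {
  local w
  for w in "$@"; do
    printf '%d %s\n' "${#w}" "$w"        # decorate each word as "<length> <word>"
  done | sort -k1,1nr | cut -d' ' -f2-   # sort by length descending, then drop the length
}

sort_longest_first industr LoremIpsum dummytex
# prints LoremIpsum, dummytex, industr (one per line)
```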

You might object that all the qr// and @{[ ... ]} business is on the baroque side. You can get the same effect with the /o regex switch:

# plain scalar rather than a compiled pattern 
my $to_replace = join "|" => 
       sort { length $b <=> length $a } 
       @words; 

my $i = 0; 
while (<>) { 
    # o at the end for "compile (o)nce" 
    s|($to_replace)|++$i; "<T$i>$1</T$i>"|ego; 
    print; 
} 
+0

Hi gbacon - hmm - the second replacement should be "T2", the third "T3"... just FYI - I know it's the code – RubiCon10 2010-07-12 05:09:53

+0

@RubiCon10 Ack! Thanks, and fixed! – 2010-07-12 12:22:26