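How to grep for strings that do not start or end with specific substrings?

Unfortunately, I'm not a regex expert, so I need a little help.

I'm looking for a way to grep an array of strings to obtain two lists of strings that do not start (1) or end (2) with a specific substring.

Suppose we have an array of strings that follow this naming rule:

[speakerId]-[phrase]-[id].txt

i.e.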

10-phraseone-10.txt
11-phraseone-3.txt
1-phraseone-2.txt
2-phraseone-1.txt
3-phraseone-1.txt
4-phraseone-1.txt
5-phraseone-3.txt
6-phraseone-2.txt
7-phraseone-2.txt
8-phraseone-10.txt
9-phraseone-2.txt
10-phrasetwo-1.txt
11-phrasetwo-1.txt
1-phrasetwo-1.txt
2-phrasetwo-1.txt
3-phrasetwo-1.txt
4-phrasetwo-1.txt
5-phrasetwo-1.txt
6-phrasetwo-3.txt
7-phrasetwo-10.txt
8-phrasetwo-1.txt
9-phrasetwo-1.txt
10-phrasethree-10.txt
11-phrasethree-3.txt
1-phrasethree-1.txt
2-phrasethree-11.txt
3-phrasethree-1.txt
4-phrasethree-3.txt
5-phrasethree-1.txt
6-phrasethree-3.txt
7-phrasethree-1.txt
8-phrasethree-1.txt
9-phrasethree-1.txt

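Let's introduce some variables:

$speakerId
$phrase
$id1, $id2

I would like to grep the list and obtain two arrays:

1. one with the elements that contain a specific $phrase, excluding those that both start with a specific $speakerId AND end with one of the specified ids (for instance $id1 or $id2)
2. one with the elements that have a specific $speakerId and $phrase but do NOT end with one of the specified ids (warning: remember not to exclude 10 or 11 when $id=1, and so on)

Maybe someone could write a solution using the following code: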

@AllEntries = readdir(INPUTDIR); 

@Result1 = grep(/blablablahere/, @AllEntries); 

@Result2 = grep(/anotherblablabla/, @AllEntries); 

closedir(INPUTDIR);

2012-11-19 venedie

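I believe you are looking for negative lookahead ('(?!...)') and negative lookbehind ('(?<!...)') assertions, which exclude specific components from a match. –

It would help to know how your rule set (contains a specific phrase, starts with a specific ID, ends with a certain ID) is defined. For example, are these rules fixed, or do they need to be read from a file? – memowe

Guys, thanks for your replies! I tried to use negative lookahead, but without success. I have forgotten how to use regular expressions, so any example would be useful. :) To specify the use case: I want to run a test application on only some of the txt files. For example, on the files that have the first phrase and are spoken by the first speaker but do not have id 1, 2 or 3 (ids can be in [1..100]) – venedie

I like the pure-regex approach with negative lookahead and lookbehind. It is a bit hard to read, though. Maybe code like this is clearer. It uses standard Perl idioms that, in some cases, read almost like English: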

my @all_entries  = readdir(...); 
my @matching_entries =(); 

foreach my $entry (@all_entries) { 

    # split file name 
    next unless $entry =~ /^(\d+)-(.*?)-(\d+)\.txt$/; 
    my ($sid, $phrase, $id) = ($1, $2, $3); 

    # filter 
    next unless $sid eq "foo"; 
    next unless $id == 42 or $phrase eq "bar"; 
    # more readable filter rules 

    # match 
    push @matching_entries, $entry; 
} 

# do something with @matching_entries

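If you really want to express something complicated in a single grep list transformation, you can write code like this:

my @matching_entries = grep { 

    # $1 = speaker id, $2 = phrase, $3 = id 
    /^(\d+)-(.*?)-(\d+)\.txt$/ 
    and $1 eq "foo" 
    and ($3 == 42 or $2 eq "bar") 
    # and so on 

} readdir(...);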
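
Applied to the two lists from the question, the same split-then-filter idea might look like the sketch below. It is only an illustration under assumptions: the helper name two_lists, its argument order and the sample file names are hypothetical, and it reads the question literally (list 1 drops only the entries that have both the given speaker and an excluded id). Looking the id up in a hash compares whole ids, so excluding 1 never drops 10 or 11.

```perl
use strict;
use warnings;

# Hypothetical helper: returns two array refs built from @entries.
#   list 1: entries with $phrase, minus those with $speaker AND an id in @$ids
#   list 2: entries with $speaker and $phrase whose id is NOT in @$ids
sub two_lists {
    my ($speaker, $phrase, $ids, @entries) = @_;
    my %excluded = map { $_ => 1 } @$ids;    # whole-id lookup table
    my (@first, @second);

    for my $entry (@entries) {
        # split the file name; skip anything that does not fit the naming rule
        my ($sid, $p, $id) = $entry =~ /^(\d+)-(.*?)-(\d+)\.txt$/ or next;
        next unless $p eq $phrase;

        my $same_speaker = ($sid == $speaker);
        push @first,  $entry unless ($same_speaker && $excluded{$id});
        push @second, $entry if     ($same_speaker && !$excluded{$id});
    }
    return (\@first, \@second);
}

# Hypothetical sample data, not a real directory listing.
my @entries = qw(1-phraseone-1.txt 1-phraseone-10.txt 2-phraseone-1.txt 1-phrasetwo-1.txt);
my ($first, $second) = two_lists(1, 'phraseone', [1, 2], @entries);

print "first:  @$first\n";
print "second: @$second\n";
```

With a real directory, the list returned by readdir could be passed in place of the sample array.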

2012-11-19 14:53:43 memowe

Thanks a lot, I took a bit of your first solution and adapted it to my needs. – venedie

Assuming a basic pattern to match your examples:

(?:^|\b)(\d+)-(\w+)-(?!1|2)(\d+)\.txt(?:\b|$)

which breaks down as:

(?:^|\b) # start of line or a word boundary 
(\d+)-   # speaker id and a hyphen 
(\w+)-   # phrase and a hyphen 
(?!1|2)  # the id must not begin with 1 or 2 (this also rejects 10, 11, 20, ...) 
(\d+)    # id 
\.txt    # file extension 
(?:\b|$) # word boundary or end of line
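
In Perl, a commented breakdown like this can be used almost verbatim, because the /x modifier ignores whitespace and allows comments inside the pattern. A minimal sketch, with the (?!1|2) lookahead left out so that every id matches; the variable names and the sample file names are assumptions for illustration:

```perl
use strict;
use warnings;

# The pieces of the basic pattern, compiled with /x so the comments stay inside it.
my $entry_re = qr{
    (?:^|\b)    # start of line or a word boundary
    (\d+)-      # speaker id and a hyphen
    (\w+)-      # phrase and a hyphen
    (\d+)       # id
    \.txt       # file extension
    (?:\b|$)    # word boundary or end of line
}x;

# Hypothetical sample names, not a real directory listing.
my @samples = ('10-phraseone-10.txt', '3-phrasetwo-1.txt', 'notes.txt');

my @parsed;
for my $name (@samples) {
    # In list context a successful match returns the captured groups.
    if (my ($speaker, $phrase, $id) = $name =~ $entry_re) {
        push @parsed, "$speaker/$phrase/$id";
    }
}

print "$_\n" for @parsed;
```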

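You can assert exclusions with a negative lookahead. For example, to keep every match whose phrase is not phrasetwo, modify the expression above to use a negative lookahead on the phrase: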

(?:^|\b)(\d+)-(?!phrasetwo)(\w+)-(\d+)\.txt(?:\b|$)

Note how I included (?!phrasetwo). Alternatively, use a lookbehind instead of a lookahead to find all phrasethree entries whose id ends in an even digit:

(?:^|\b)(\d+)-phrasethree-(\d+)(?<![13579])\.txt(?:\b|$)

(?<![13579]) simply ensures that the last digit of the id is not odd, i.e. that the id is even.
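
The question warns that excluding id 1 must not exclude 10 or 11. A lookahead that only inspects the first digit, such as (?!1|2), does exactly that. One way around it, sketched here under assumptions, is to make the lookahead cover the whole id by requiring the extension right after it. The helper name build_filter and the sample file names are hypothetical:

```perl
use strict;
use warnings;

# Build a pattern for "<speaker>-<phrase>-<id>.txt" where <id> is none of @exclude.
# The lookahead requires ".txt" directly after the excluded id, so it tests the
# whole id: excluding 1 does not also exclude 10 or 11.
sub build_filter {
    my ($speaker, $phrase, @exclude) = @_;
    my $ids = join '|', map { quotemeta($_) } @exclude;
    return qr/^\Q$speaker\E-\Q$phrase\E-(?!(?:$ids)\.txt$)\d+\.txt$/;
}

# Hypothetical sample names, not a real directory listing.
my @entries = qw(
    1-phraseone-1.txt 1-phraseone-2.txt 1-phraseone-10.txt
    1-phraseone-11.txt 2-phraseone-10.txt 1-phrasetwo-3.txt
);

my $filter = build_filter(1, 'phraseone', 1, 2);
my @kept   = grep { $_ =~ $filter } @entries;

print "$_\n" for @kept;
```

Here only the entries for speaker 1 and phraseone with ids 10 and 11 are kept; ids 1 and 2 are rejected by the lookahead, and the other two names fail on speaker or phrase.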

2012-11-19 14:41:50

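Also, for reference and testing, you can use [this site](http://regexr.com?32rtg) to try out your patterns before implementing them. –

Cool! First, this is exactly the kind of site I was looking for! Second, I now know how those lookaheads and other funny little mechanisms work! I also took a bit of your solution! – venedie

It sounds a bit like you are describing a query function.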

#!/usr/bin/perl -Tw 

use strict; 
use warnings; 
use Data::Dumper; 

my ($set_a, $set_b) = query(2, 'phrasethree', [ 1, 3 ]); 

print Dumper({ a => $set_a, b => $set_b }); 

# a) fetch elements which 
# 1. match $phrase 
# 2. exclude $speakerId 
# 3. match @ids 
# b) fetch elements which 
# 1. match $phrase 
# 2. match $speakerId 
# 3. exclude @ids 
sub query { 
    my ($speakerId, $passPhrase, $id_ra) = @_; 

    my %has_id = map { ($_ => 0) } @{$id_ra}; 

    my (@a, @b); 

    while (my $filename = glob '*.txt') {

        if ($filename =~ m{\A (\d+)-(.+?)-(\d+) [.] txt \z}xms) {

            my ($_speakerId, $_passPhrase, $_id) = ($1, $2, $3);

            if ($_passPhrase eq $passPhrase) {

                if ($_speakerId ne $speakerId
                    && exists $has_id{$_id})
                {
                    push @a, $filename;
                }

                if ($_speakerId eq $speakerId
                    && !exists $has_id{$_id})
                {
                    push @b, $filename;
                }
            }
        }
    }

    return (\@a, \@b); 
}

2012-11-20 06:24:38 ddoxey