2013-03-01

Perl regex to insert/replace a string at specific positions

Given a URL, the following regex inserts/replaces a word at certain points in the URL.

Code:

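#!/usr/bin/perl

use strict;
use warnings;
#use diagnostics;

my @insert_words = qw/HELLO GOODBYE/;

while (<DATA>) {
    chomp;
    foreach my $word (@insert_words) {
        my $repeat = 1;
        # Match $repeat path segments (a "/" not followed by another "/",
        # plus the segment text); \K drops everything matched so far from
        # the replaced text, so $word lands right after the last segment.
        # (?<![/]) keeps the match from starting on the second slash of "//".
        while ((my $match = $_) =~ s|(?<![/])(?:[/](?![/])[^/]*){$repeat}[^/]*\K|$word|) {
            print "$match\n";
            $repeat++;
        }

        print "\n";
    }
}

__DATA__
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
10.15.16.17/dog/cat/rabbit/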

Output (for the word HELLO and the first example URL in __DATA__):

http://www.stackoverflow.com/dogHELLO/cat/rabbit/ 
http://www.stackoverflow.com/dog/catHELLO/rabbit/ 
http://www.stackoverflow.com/dog/cat/rabbitHELLO/ 
http://www.stackoverflow.com/dog/cat/rabbit/HELLO 

Where I'm stuck now:

I now want to change the regex so that the output looks like this:

http://www.stackoverflow.com/dogHELLO/cat/rabbit/ 
http://www.stackoverflow.com/dog/catHELLO/rabbit/ 
http://www.stackoverflow.com/dog/cat/rabbitHELLO/ 
http://www.stackoverflow.com/dog/cat/rabbit/HELLO 
# above is what it already does at the moment
# below is what I also want it to be able to do
http://www.stackoverflow.com/HELLOdog/cat/rabbit/ #<-puts the word at the start of the string 
http://www.stackoverflow.com/dog/HELLOcat/rabbit/ 
http://www.stackoverflow.com/dog/cat/HELLOrabbit/ 
http://www.stackoverflow.com/dog/cat/rabbit/HELLO 
http://www.stackoverflow.com/HELLO/cat/rabbit/ #<- now also replaces the string with the word 
http://www.stackoverflow.com/dog/HELLO/rabbit/ 
http://www.stackoverflow.com/dog/cat/HELLO/ 
http://www.stackoverflow.com/dog/cat/rabbit/HELLO 

But I can't get a single regex to do all of this automatically.

Any help with this would be greatly appreciated, many thanks.
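One way to get all three groups of output is to stop asking one regex to do everything. The sketch below is a minimal illustration, not code from the thread: it reuses the question's "a `/` not next to another `/`" test to find each path segment, records the segment offsets from `@-`/`@+`, and builds each variant with 4-argument `substr`. The helper name `segment_variants` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns $word appended to, prepended to, and
# replacing each path segment of $url, in that order (three groups).
sub segment_variants {
    my ($url, $word) = @_;

    # Record [start, end) offsets of every segment's text. A segment starts
    # at a "/" that is neither preceded nor followed by another "/", which
    # skips the "//" after the scheme (same idea as the question's regex).
    my @spans;
    while ($url =~ m{(?<!/)/(?!/)([^/]*)}g) {
        push @spans, [ $-[1], $+[1] ];
    }

    my (@append, @prepend, @replace);
    for my $span (@spans) {
        my ($start, $end) = @$span;

        my $s = $url;
        substr($s, $end, 0, $word);                 # after the segment text
        push @append, $s;

        $s = $url;
        substr($s, $start, 0, $word);               # before the segment text
        push @prepend, $s;

        $s = $url;
        substr($s, $start, $end - $start, $word);   # in place of the segment text
        push @replace, $s;
    }
    return (@append, @prepend, @replace);
}

print "$_\n" for segment_variants('http://www.stackoverflow.com/dog/cat/rabbit/', 'HELLO');
```

For the empty segment after a trailing `/`, all three operations produce the same `.../rabbit/HELLO` line, which matches the repeated line in the desired output.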

Did you mean to put `/dog/cat/rabbit/HELLO` in twice? – ikegami 2013-03-01 16:34:41

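@ikegami - Good question. I'd rather it not be repeated; I left it in the question so that others can more easily understand the kind of output I'm after, thanks – 2013-03-01 16:40:02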

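**This may not be a job for regexes, but for existing tools in your language of choice.** What language are you using? You probably don't want a regex here, but an existing module that has already been written, tested, and debugged. If you're using PHP, you want the [`parse_url`](http://php.net/manual/en/function.parse-url.php) function. If you're using Perl, you want the [`URI`](http://search.cpan.org/dist/URI/) module. If you're using Ruby, use the [`URI`](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/URI.html) module. – 2013-03-01 17:27:49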
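As a rough illustration of what this comment suggests: Perl's `URI` module (from CPAN, not core Perl) already knows how to split a path into segments through its `path_segments` method, so no hand-written slash counting is needed. A minimal sketch, assuming `URI` is installed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;   # CPAN module, not part of core Perl

my $url = URI->new('http://www.stackoverflow.com/dog/cat/rabbit/');

# path_segments splits the path on "/": the leading "" comes from the
# path's initial "/" and the trailing "" from its final "/".
my @segments = $url->path_segments;   # ("", "dog", "cat", "rabbit", "")

# Change one segment and write the list back; URI re-joins it with "/".
$segments[2] = 'HELLO';
$url->path_segments(@segments);
print "$url\n";
```

The host and scheme are never touched, which is exactly the part a bare regex has to work around.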

Answers

One solution:

use strict; 
use warnings; 

use URI qw(); 

my @insert_words = qw(HELLO); 

while (<DATA>) { 
    chomp; 
    my $url = URI->new($_); 
    my $path = $url->path(); 

    for (@insert_words) { 
     # Use package vars to communicate with /(?{})/ blocks. 
     local our $insert_word = $_; 
     local our @paths; 
     $path =~ m{ 
     ^(.*/)([^/]*)((?:/.*)?)\z 
     (?{ 
      push @paths, "$1$insert_word$2$3"; 
      if (length($2)) { 
       push @paths, "$1$insert_word$3"; 
       push @paths, "$1$2$insert_word$3"; 
      } 
     }) 
     (?!) 
     }x; 

     for (@paths) { 
     $url->path($_); 
     print "$url\n"; 
     } 
    } 
} 

__DATA__ 
http://www.stackoverflow.com/dog/cat/rabbit/ 
http://www.superuser.co.uk/dog/cat/rabbit/hamster/ 
http://10.15.16.17/dog/cat/rabbit/ 
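The solution above records results from inside the match with a `(?{ ... })` code block and then forces the match to fail with `(?!)`, so the regex engine backtracks into every way `^(.*/)` can split the path. A stripped-down sketch of just that idiom, using a hypothetical `splits` helper that only records each split:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list every (directory, segment, rest) split of a
# path by running a code block at each successful match, then failing.
sub splits {
    my ($path) = @_;
    local our @found;   # package var, visible inside the code block
    $path =~ m{
        ^(.*/)([^/]*)((?:/.*)?)\z
        (?{ push @found, "[$1|$2|$3]" })
        (?!)
    }x;
    return @found;
}

print "$_\n" for splits('/dog/cat/');
```

Because `.*` is greedy, the longest directory part is tried first, so the splits come out from the last segment back to the first: `[/dog/cat/||]`, `[/dog/|cat|/]`, `[/|dog|/cat/]`.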

Excellent solution, thanks – 2013-03-05 14:18:42

Without the crazy regex:

use strict; 
use warnings; 

use URI qw(); 

my @insert_words = qw(HELLO); 

while (<DATA>) { 
    chomp; 
    my $url = URI->new($_); 
    my $path = $url->path(); 

    for my $insert_word (@insert_words) { 
     my @parts = $path =~ m{/([^/]*)}g; 
     my @paths; 
     for my $part_idx (0..$#parts) { 
     my $orig_part = $parts[$part_idx]; 
     local $parts[$part_idx]; 
     { 
      $parts[$part_idx] = $insert_word . $orig_part; 
      push @paths, join '', map "/$_", @parts; 
     } 
     if (length($orig_part)) { 
      { 
       $parts[$part_idx] = $insert_word; 
       push @paths, join '', map "/$_", @parts; 
      } 
      { 
       $parts[$part_idx] = $orig_part . $insert_word; 
       push @paths, join '', map "/$_", @parts; 
      } 
     } 
     } 

     for (@paths) { 
     $url->path($_); 
     print "$url\n"; 
     } 
    } 
} 

__DATA__ 
http://www.stackoverflow.com/dog/cat/rabbit/ 
http://www.superuser.co.uk/dog/cat/rabbit/hamster/ 
http://10.15.16.17/dog/cat/rabbit/ 
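The solution above edits `@parts` in place and relies on `local $parts[$part_idx]` to restore the original element when each loop iteration ends, so the next iteration starts from the unmodified list. A minimal sketch of that behavior on its own (the values in `@parts` are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @parts = ('dog', 'cat', 'rabbit', '');
my @seen;

for my $i (0 .. $#parts) {
    my $orig = $parts[$i];
    local $parts[$i];                 # saved here, restored when this iteration ends
    $parts[$i] = "HELLO$orig";
    push @seen, join '', map "/$_", @parts;
}

print "$_\n" for @seen;
print join('', map "/$_", @parts), "\n";   # back to /dog/cat/rabbit/
```

`local` works here even though `@parts` is a `my` array, because it localizes a single element rather than the lexical variable itself.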

Good idea getting rid of the regex in this solution, thanks; it will make my life much easier in other parts of my program. – 2013-03-02 16:30:20

Not sure which is faster, if that matters. – ikegami 2013-03-02 16:33:19

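I know I need to change the regex to `my @parts = $path =~ m{[/=&]([^/=&]*)}g;` so that it also splits on the other characters I specified (`/`, `=`, `&`), not just slashes. But I don't know what to change next, because `map "/$_", @parts;` obviously always outputs a slash, even where the URL had a `=` or `&`. Any ideas? Many thanks for your help – 2013-03-03 23:34:46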
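One possible way to handle this, sketched under the assumption that `/`, `=` and `&` are the only delimiters: capture each delimiter together with the part after it, so the list alternates delimiter, part, delimiter, part, and re-join with a plain `join ''` instead of `map "/$_"`. The input string, `$word`, and `@tokens` are illustrative, not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $path = '/search/q=dog&page=2';   # illustrative input
my $word = 'HELLO';

# Capturing the delimiter too gives a flat list that alternates
# delimiter, part, delimiter, part, ... so nothing is lost on re-join.
my @tokens = $path =~ m{([/=&])([^/=&]*)}g;

my @variants;
for (my $i = 1; $i < @tokens; $i += 2) {   # odd indices hold the parts
    my @copy = @tokens;
    $copy[$i] = $word . $copy[$i];         # prepend; replace/append work the same way
    push @variants, join '', @copy;        # each part keeps its own delimiter
}

print "$_\n" for @variants;
```

As with the original regex, any text before the first delimiter is not captured, so this assumes the string starts with one of `/`, `=` or `&` (a URI path always starts with `/`).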

One more solution:

#!/usr/bin/perl 

use strict; 
use warnings; 

my @insert_words = qw/HELLO GOODBYE/; 

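while (<DATA>) {
    chomp;
    # Find the first path segment: a "/" neither preceded nor followed by
    # another "/", which skips the "//" after the scheme. Skip lines without one.
    next unless /(?<![\/])(?:[\/](?![\/])[^\/]*)/p;
    my $begin_part = ${^PREMATCH};
    my $tail = ${^MATCH} . ${^POSTMATCH};
    # A limit of -1 keeps trailing empty fields, so a trailing "/" is not lost.
    my @tail_chunks = split /\//, $tail, -1;

    foreach my $word (@insert_words) {
        for my $index (1..$#tail_chunks) {
            my @new_tail = @tail_chunks;

            # Put the word at the start of the segment.
            $new_tail[$index] = $word . $tail_chunks[$index];
            my $str = $begin_part . join "/", @new_tail;
            print $str, "\n";

            # Append the word to the segment; for the empty segment after a
            # trailing "/" this would repeat the line above, so skip it.
            next unless length $tail_chunks[$index];
            $new_tail[$index] = $tail_chunks[$index] . $word;
            $str = $begin_part . join "/", @new_tail;
            print $str, "\n";
        }

        print "\n";
    }
}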

__DATA__ 
http://www.stackoverflow.com/dog/cat/rabbit/ 
http://www.superuser.co.uk/dog/cat/rabbit/hamster/ 
10.15.16.17/dog/cat/rabbit/
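A note on the `split` in this solution: by default Perl's `split` drops trailing empty fields, so a tail such as `/dog/cat/rabbit/` loses its final slash when the chunks are joined back with `/`; passing a negative limit keeps them. A small demonstration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $tail = '/dog/cat/rabbit/';

my @default = split m{/}, $tail;       # trailing empty field dropped
my @kept    = split m{/}, $tail, -1;   # negative limit keeps it

print scalar(@default), ': ', join('/', @default), "\n";
print scalar(@kept),    ': ', join('/', @kept),    "\n";
```

This prints `4: /dog/cat/rabbit` and `5: /dog/cat/rabbit/`. The leading empty field is kept in both cases; only trailing empty fields are affected by the limit.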