2015-04-02 63 views
0

Perl: how to capture empty cells with a regular expression

Output:

id | status   | name             | cluster | ip          | mac               | roles | pending_roles   | online
---|----------|------------------|---------|-------------|-------------------|-------|-----------------|-------
11 | discover | Untitled (9a:3a) | 12      | 10.20.0.144 | c8:1f:66:ce:9a:3a |       | cinder          | True
12 | discover | Untitled (9f:8d) | 12      | 10.20.0.186 | c8:1f:66:ce:9f:8d |       | cinder, compute | True
10 | discover | Untitled (c7:f3) | None    | 10.20.0.214 | c8:1f:66:ce:c7:f3 |       |                 | True
13 | discover | Untitled (9f:3d) | None    | 10.20.0.233 | c8:1f:66:ce:9f:3d |       |                 | True
8  | discover | Untitled (74:8e) | 12      | 10.20.0.184 | c8:1f:66:ce:74:8e |       | controller      | True
14 | discover | Untitled (75:4b) | None    | 10.20.0.185 | c8:1f:66:ce:75:4b |       |                 | True
9  | discover | Untitled (76:23) | None    | 10.20.0.213 | c8:1f:66:ce:76:23 |       |                 | True

My regular expression:

(\d+)\s+\|\s+(\w+)\s+\|\s+\w+\s+\((\S+)\)\s+\|\s+(\d+)\s+\|\s+(\S+)\s+\|\s+(\S+)\s+\|(.*?)\|(.*?)\|\s+(\w+)

but it cannot match the empty cells! I have tried many approaches.

Example line:

13 | discover | Untitled (9f:3d) | None    | 10.20.0.233 | c8:1f:66:ce:9f:3d |       |                 | True
+1

When you have delimited data, it is easier to use ['split'](http://perldoc.perl.org/functions/split.html) (or ['Text::CSV'](https://metacpan.org/pod/Text::CSV) if fields can contain the delimiter). – ThisSuitIsBlackNot 2015-04-02 17:21:07

+0

Link to the text file: http://textuploader.com/xcda – 2015-04-02 17:22:20

+1

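Please don't host the input data off-site. If the link breaks, future visitors to this page will not be able to see the data, and the question will no longer make sense. – ThisSuitIsBlackNot 2015-04-02 17:26:35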

Answers

1

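Don't try to handle structured data as unstructured lines. You have pipe-delimited data, so parse it as pipe-delimited data, and then inspect the parsed fields.

Note that I do use a regex on the individual cells (/^\s*$/ to see whether a cell is all whitespace), but not on each whole line.

Here is an example: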

#!/usr/bin/perl 

use strict; 
use warnings; 

while (my $line = <DATA>) {
    chomp $line;
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    my @cells = split /\|/, $line, -1;
    my $ncells = scalar @cells;
    die "There should be 9 fields, but line $. has $ncells" unless $ncells == 9;
    for my $i (1 .. $ncells) {
        # A cell counts as empty if it holds nothing but whitespace.
        if ($cells[$i-1] =~ /^\s*$/) {
            print "Cell #$i on line $. is empty\n";
        }
    }
}

__DATA__ 
id | status   | name             | cluster | ip          | mac               | roles | pending_roles   | online
---|----------|------------------|---------|-------------|-------------------|-------|-----------------|-------
11 | discover | Untitled (9a:3a) | 12      | 10.20.0.144 | c8:1f:66:ce:9a:3a |       | cinder          | True
12 | discover | Untitled (9f:8d) | 12      | 10.20.0.186 | c8:1f:66:ce:9f:8d |       | cinder, compute | True
10 | discover | Untitled (c7:f3) | None    | 10.20.0.214 | c8:1f:66:ce:c7:f3 |       |                 | True
13 | discover | Untitled (9f:3d) | None    | 10.20.0.233 | c8:1f:66:ce:9f:3d |       |                 | True
8  | discover | Untitled (74:8e) | 12      | 10.20.0.184 | c8:1f:66:ce:74:8e |       | controller      | True
14 | discover | Untitled (75:4b) | None    | 10.20.0.185 | c8:1f:66:ce:75:4b |       |                 | True
9  | discover | Untitled (76:23) | None    | 10.20.0.213 | c8:1f:66:ce:76:23 |       |                 | True
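
If you would rather refer to cells by column name than by position, the same split-based idea can be extended as in the following minimal sketch. It assumes the first line is the header and the second is the dashed separator; parse_row is a hypothetical helper name, and the sample lines are a cut-down version of the data above, not the full table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one pipe-delimited line into cells with surrounding whitespace removed.
# parse_row is a hypothetical helper name used only for this sketch.
sub parse_row {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields.
    my @cells = split /\|/, $line, -1;
    s/^\s+|\s+$//g for @cells;
    return @cells;
}

# Cut-down sample input (assumption: header first, separator second).
my @lines = (
    'id | status   | pending_roles   | online',
    '---|----------|-----------------|-------',
    '11 | discover | cinder          | True',
    '10 | discover |                 | True',
);

my @names = parse_row(shift @lines);
shift @lines;    # drop the dashed separator row

for my $line (@lines) {
    my %row;
    @row{@names} = parse_row($line);
    # Collect the names of all columns whose trimmed value is empty.
    my @empty = grep { $row{$_} eq '' } @names;
    print "Row id=$row{id}: empty columns: @empty\n" if @empty;
}
```

Trimming each cell once up front means an empty cell is simply the empty string, so later checks need no regex at all.
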
+0

Thanks a lot @Andy Lester, your solution is by far the simplest and most correct way to get these cells. – 2015-04-05 05:17:39

+0

'split(/\|/, $line, -1)' is not much simpler than 'unpack('A2 x3 A8 x3 A16 ...', $line)', and it breaks if the data contains '|'. – ikegami 2015-04-05 06:11:37

4
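use strict;
use warnings;

# The first two input lines are the header row and the dashed separator row.
chomp(my $header = <>);
chomp(my $sep = <>);

# Build an unpack template from the separator: each run of dashes gives a
# column width, and the " | " between columns is skipped with x3.
my $pat =
    join ' x3 ',
    map "A".(length($_)-2),
    "-$sep-" =~ /(-+)/g;

my @headers = unpack($pat, $header);
while (my $line = <>) {
    my %row; @row{@headers} = unpack($pat, $line);

    # Do whatever here.
    print("Row id=$row{id} has no pending roles\n")
        if !length($row{pending_roles});
}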

Output:

Row id=10 has no pending roles 
Row id=13 has no pending roles 
Row id=14 has no pending roles 
Row id=9 has no pending roles 
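
To see what template the separator line produces, it can help to print it. The following is a minimal sketch that hard-codes the separator and one data row from the question instead of reading them from input; it assumes the columns are aligned with the separator and that the lines carry no trailing whitespace.

```perl
use strict;
use warnings;

# Separator line copied from the question.
my $sep = '---|----------|------------------|---------|-------------|-------------------|-------|-----------------|-------';

# Pad both ends with one dash so the first and last columns look like the
# inner ones, which have one character of padding on each side.
my @widths = map { length($_) - 2 } "-$sep-" =~ /(-+)/g;
my $pat    = join ' x3 ', map { "A$_" } @widths;
print "$pat\n";

# One aligned data row with empty roles and pending_roles cells.
my $line = '10 | discover | Untitled (c7:f3) | None    | 10.20.0.214 | c8:1f:66:ce:c7:f3 |       |                 | True';

# The A format strips trailing spaces, so blank cells come back as ''.
my @fields = unpack($pat, $line);
print scalar(@fields), " fields; pending_roles is '$fields[7]'\n";
```

The printed template begins with A2 x3 A8 x3 A16. Because the fields are cut by position rather than by delimiter, a '|' inside a cell does not shift the columns.
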
+1

Unlike @Andy Lester's solution, this one does not fail if the data contains '|'. – ikegami 2015-04-02 17:36:12

+1

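Excellent choice: using the header to determine the field widths makes the code indifferent to the possibility that the widths differ from one invocation to the next (which may well be the case, since the data is probably a dump of a DB query). – DavidO 2015-04-02 17:45:06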

+1

I like it. This solution also illustrates a quote I like. http://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/ – ryansstack 2015-04-02 17:47:52

0

If you must use a regex, just try to keep it as small as possible. This also assumes you don't have | in your data or anything..

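my $r = 0;
foreach my $row (@rows) {
    my $c = 0;
    print "Row $r\n";
    # The * goes inside the group so that $1 holds the whole cell.
    while ($row =~ /([^|]*)(\||$)/g) {
        my ($col, $delim) = ($1, $2);
        print " $c: $col\t";
        if ($col =~ /^\s+$/) { print "whitespace only!" }
        print "\n";
        $c++;
        # Stop after the last cell; otherwise the pattern can match once
        # more with an empty string at the end of the row.
        last if $delim eq '';
    }
    $r++;
}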
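
A pattern that ends in (\||$) can, under /g, match one more time at the very end of the string with empty text, which shows up as a phantom extra cell. One way around that is to anchor on the delimiter before each cell instead of after it. This is a variant pattern, not the one used in the answer above; the following is a minimal sketch in which cells_of is a hypothetical helper name and the rows are made-up samples standing in for @rows.

```perl
use strict;
use warnings;

# Return the cells of one pipe-delimited row.
# cells_of is a hypothetical helper name used only for this sketch.
sub cells_of {
    my ($row) = @_;
    # Each match is "start of string or a pipe", followed by one cell.
    # In list context, /g returns every captured cell.
    return $row =~ /(?:^|\|)([^|]*)/g;
}

# Hypothetical sample rows standing in for @rows.
my @rows = (
    '10 | discover |                 | True',
    '11 | discover | cinder          | True',
);

for my $r (0 .. $#rows) {
    my @cells = cells_of($rows[$r]);
    for my $c (0 .. $#cells) {
        # Blank means empty or whitespace only.
        print "Row $r, cell $c is blank\n" if $cells[$c] =~ /^\s*$/;
    }
}
```

Like the loop above, this still assumes that no cell contains a '|'.
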