Splitting an XML file using a Perl script

Hi, I am using a Perl script to split a big XML file into small chunks. I have already gone through this link: Split file by XML tag

My code looks like this:

if($line =~ /^</row>/) 
{ 
$count++; 
}

But I am getting this error:

works\filesplit.pl line 20. 
Bareword found where operator expected at E:\Work\perl works\filesplit.pl line 20, near "/^</row"
     (Missing operator before row?) 
syntax error at E:\Work\perl works\filesplit.pl line 20, near "/^</row" 
Search pattern not terminated at E:\Work\perl works\filesplit.pl line 20.

Can anyone help me?

Update

<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row> 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row> 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row>

Source

2013-11-28 Backtrack

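How do you want to split this file, and what do you want to do with the chunks? – Kenosis

@Kenosis ... "Five" ........ will be chunked into a single file – Backtrack

@Kenosis .. actually my file is too big, so I want it chunked, 5 .. in a single file ... like this – Backtrack

You need ^<\/row>, provided you are trying to match </row> at the beginning of a line. Here is my test code.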

#!/usr/bin/perl 
use strict; 
use warnings; 

my $line = "</row> something"; 
if ($line =~ /^<\/row>/) 
{ 
    print "found a match \n"; 
}

Output:

# perl test.pl 
found a match
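The original error comes from the unescaped / in </row>: Perl treats it as the end of the /.../ pattern, so the rest of the line no longer parses. Escaping the slash is one fix; another is to pick different delimiters for the match operator so that no escaping is needed. A minimal sketch of both forms follows (the variable names $escaped and $braces are only for illustration):

```perl
use strict;
use warnings;

my $line = "</row> something";

# Default /.../ delimiters: the slash inside the pattern must be escaped.
my $escaped = $line =~ /^<\/row>/ ? 1 : 0;

# Alternative delimiters: with m{...} the slash is an ordinary character.
my $braces = $line =~ m{^</row>} ? 1 : 0;

print "escaped=$escaped braces=$braces\n";
```

Both patterns are equivalent; the m{...} form tends to be easier to read when the pattern contains markup or paths.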

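Update

Posting this update after the OP provided sample data.

Use ^\s*<\/row> in your regex, because the closing tags do not all start at the beginning of the line; some of them have a space before them. The \s* matches zero or more whitespace characters at the start of the line before the actual tag. (A \s+ would require at least one whitespace character and would miss a </row> that starts in the first column.)

Code:

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <DATA>)
{
    if ($line =~ /^\s*<\/row>/)
    {
        print "found a match \n";
    }
}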

__DATA__ 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row> 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row> 
<row> 
    <date></date> 
    <ForeignpostingId /> 
    <country>11</country> 
    <domain>http://www.xxxx.com</domain> 
    <domainid>20813</domainid> 
</row>

Output:

# perl test.pl 
found a match 
found a match 
found a match
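Whether a closing tag is found depends on the quantifier used for the leading whitespace: \s+ needs at least one whitespace character, while \s* also accepts none. The following sketch compares the two on a few hand-written lines (the @lines data is made up for illustration and is not the OP's file):

```perl
use strict;
use warnings;

# Closing tags with zero, one and four leading spaces, plus an opening tag.
my @lines = ("</row>\n", " </row>\n", "    </row>\n", "<row>\n");

# grep in scalar context returns the number of matching elements.
my $plus = grep { /^\s+<\/row>/ } @lines;   # requires leading whitespace
my $star = grep { /^\s*<\/row>/ } @lines;   # leading whitespace optional

print "\\s+ matched $plus line(s), \\s* matched $star line(s)\n";
```

Here \s+ skips the tag that starts in the first column, so \s* is the safer choice when the indentation of the input is not known.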

Source

2013-11-28 05:19:44 slayedbylucifer

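But it does not work if we have multiple of them – Backtrack

Can you provide sample data to work with? – slayedbylucifer

+1. Yes, I have added it – Backtrack

Have you tried xml_split? It is a tool that comes with XML::Twig, designed specifically to split large XML files based on various criteria (tag name, level, size).

Source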

2013-11-28 06:07:24 mirod

Perhaps the following will be helpful:

use strict; 
use warnings; 

my $i = 1; 
local $/ = '<row>'; 

while (<>) { 
    chomp; 
    s!</row>!! or next; 

    open my $fh, '>', 'File_' . (sprintf '%05d', $i++) . '.xml' or die $!; 
    print $fh $_; 
}

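Usage: perl script.pl inFile.xml

This sets Perl's record separator $/ to <row> so that the XML file is read in <row>-delimited 'chunks'. It removes </row> from each chunk and then writes that chunk to a file named according to the "File_nnnnn.xml" scheme.

Source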
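The OP's comments ask for five rows per output file rather than one. The same record-separator idea can be adapted for that; what follows is only a sketch under some assumptions: split_rows is a hypothetical helper, the input is read from an in-memory string, and the files go to a temporary directory so the example can run anywhere. Each output file holds bare <row> elements with no root element, just like the sample data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: writes every $per_file <row> elements to one file
# in $dir and returns the list of file names it created.
sub split_rows {
    my ($in, $per_file, $dir) = @_;
    local $/ = '</row>';    # one record per closing tag
    my @files;
    my $out;
    my $count = 0;

    while (my $chunk = <$in>) {
        # Skip whatever follows the last </row> (e.g. a trailing newline).
        next unless $chunk =~ m{<row>.*</row>}s;
        $chunk =~ s/^\s+//;

        if ($count % $per_file == 0) {
            close $out if $out;
            my $name = sprintf '%s/File_%05d.xml', $dir, scalar(@files) + 1;
            open my $fh, '>', $name or die "Cannot write $name: $!";
            $out = $fh;
            push @files, $name;
        }
        print {$out} $chunk, "\n";
        $count++;
    }
    close $out if $out;
    return @files;
}

# Demo input: 12 made-up rows, so three files are expected (5 + 5 + 2).
my $xml = join '', map { "<row>\n    <domainid>$_</domainid>\n</row>\n" } 1 .. 12;
open my $in, '<', \$xml or die "Cannot open in-memory handle: $!";
my @files = split_rows($in, 5, tempdir(CLEANUP => 1));
print scalar(@files), " files written\n";
```

To use it on a real file, open that file instead of the in-memory string and pass a permanent output directory. Like the answer above, this treats the XML as text, so it relies on <row> elements not being nested.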

2013-11-28 07:03:42 Kenosis

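I get a black screen. Nothing happens – Backtrack

Check the directory for the generated files. – Kenosis

#!/bin/perl

## splitting xml files using perl script
## (copies the first <tag>...</tag> element to OutputFile_<tag>)

use strict;
use warnings;

print "Input File ? ";
chomp(my $XmlFile = <STDIN>);

open my $XmlFileHandle, '<', $XmlFile
    or die "Cannot open $XmlFile: $!";

print "\nSplit By which Tag ? ";
chomp(my $splitby = <STDIN>);

open my $OutputHandle, '>', 'OutputFile_' . $splitby
    or die "Cannot write OutputFile_$splitby: $!";

## skip ahead to the first opening tag, e.g. <user>
while (<$XmlFileHandle>) {
    if (/<\Q$splitby\E>/) {
        print $OutputHandle "<$splitby>\n";
        last;
    }
}

## copy lines until the matching closing tag, e.g. </user>
while (my $line = <$XmlFileHandle>) {
    if ($line =~ m/<\/\Q$splitby\E>/) {
        print $OutputHandle "</$splitby>";
        last;
    }
    print $OutputHandle $line;
}

close $OutputHandle or die "Cannot close OutputFile_$splitby: $!";
print "\nOutput File is : OutputFile_$splitby\n";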

Source

2013-11-28 07:09:53 prashant
