2012-03-23

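First day with Perl and already stuck :) How to insert only the new and/or updated lines into another file

Here is the situation: a file is updated in folder A but also exists in folders B, C & D and, to make things easier, it can be different in each of them, so I can't just do a diff. The new lines that are meant to be copied to the other files are identified by a flag, for example #I, at the end of the line.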

Before the update, the file looks like this:

First line
Second line
Fifth line

After the update it looks like this:

First line
Second line
Third line #I
Fourth line #I
Fifth line
Sixth line #I

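What I need to do is search the other files for "Second line", insert the lines flagged with #I (in the order in which they were inserted), and then search for "Fifth line" and insert "Sixth line #I".

In this example they are all consecutive, but in the files that need to be updated there may be several lines between the first update block and the second (and the third, and so on).

The files to be updated can be sh scripts, awk scripts, plain text files and so on, so the script should be generic. The script will take two input parameters: the updated file and the file to be updated.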
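
One way to read the first half of this task, as a minimal sketch rather than a tested solution: walk through the updated file once and file every flagged line under the most recent unflagged line, which then serves as the search key in the other files. The helper name `collect_insertions` and the returned hash layout are assumptions made for illustration; they do not appear in the code posted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: group flagged lines under the unflagged line that
# precedes them. Returns a hash ref: anchor line => array ref of new lines.
sub collect_insertions {
    my ($flag, @lines) = @_;
    my (%insert_after, $anchor);
    for my $line (@lines) {
        next unless $line =~ /\S/;          # ignore blank lines
        (my $text = $line) =~ s/\s+$//;     # drop newline and trailing blanks
        if ($text =~ /\Q$flag\E$/) {
            # Flagged lines seen before any anchor go under the empty string.
            my $key = defined $anchor ? $anchor : '';
            push @{ $insert_after{$key} }, $text;
        }
        else {
            $anchor = $text;                # latest unflagged line seen
        }
    }
    return \%insert_after;
}

# The example file from the question, after the update.
my @updated = (
    "First line\n", "Second line\n", "Third line #I\n",
    "Fourth line #I\n", "Fifth line\n", "Sixth line #I\n",
);
my $blocks = collect_insertions('#I', @updated);
for my $anchor (sort keys %$blocks) {
    print "after '$anchor': @{ $blocks->{$anchor} }\n";
}
```

Keeping the blocks keyed by anchor line means the second and third blocks can sit any distance apart in the file to be updated, which is the case described above.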

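Any hints on how to do this are welcome. If needed, I can provide the code I have so far; it is close but not working yet.

Thanks,

João

PS: Here is what I have so far: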

# Pass the content of the file $FileUpdate to the updateFile array 
@updateFile = <UPD>; 

# Pass the content of the file $FileOriginal to the originalFile array 
@originalFile = <ORG>; 

# Remove empty lines from the array contained on the updated file 
@updateFile = grep(/\S/, @updateFile); 

# Create an array that will contain the modifications and the line 
# prior to the first modification. 
@modifications =(); 

# Counter initialization 
$i = 0; 


# Loop the array to find out which lines are flagged as new and 
# which lines immediately precede those 
foreach $linha (@updateFile) { 

# Remove \n characters 
chomp($linha); 

# Find the new lines flagged with #I 
if ($linha =~ m/#I$/) { 

    # Verify that the previous line is not flagged as updated. 
    # If it is not, it means that the update starts here. 
    unless ($updateFile[$i-1] =~ m/#I$/) { 
     print "Line where the update starts $updateFile[$i-1]\n"; 

     # Add that line to the array modifications 
     push(@modifications, $updateFile[$i-1]); 

    } # END OF unless 

print "$updateFile[$i]\n"; 

# Add the lines tagged for insertion into the array 
push(@modifications, $updateFile[$i]); 

} # END OF if ($linha =~ m/#I$/) 

# Increment the counter 
$i = $i + 1; 

} # END OF foreach $linha (@updateFile) 


foreach $modif (@modifications) { 
    unless ($modif =~ m/#I$/) { 
     foreach $original (@originalFile) { 
      chomp($original); 
      if ($original ne $modif) { 
       push (@newOriginal, $originalFile[$n]); 
      } 
      elsif ($original eq $modif) { #&& $modif[$n+1] =~ m/#I$/) { 
       push (@newOriginal, $originalFile[$n]); 
       last; 
      } 
      $n = $n + 1; 
     } 
    } 
    if ($modif =~ m/#I$/) { 
     push (@newOriginal, $modifications[$m]); 
    } 
    $m = $m + 1; 
} 

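The result I get is almost the one I want, but not quite there yet.

So you are updating the target `A/file` from the sources `B/file`, `C/file` and `D/file`. New lines in the source are flagged, and they have to be inserted into the target after the line that is identical to the one preceding the flagged lines in the source. Is that right? Does this need to cater for lines to be deleted? And what happens if there are multiple identical lines in the source, so that it is impossible to determine where to insert the new records? – Borodin 2012-03-23 14:27:58

Hi TLP, I have added what I have so far. – 2012-03-23 14:31:31

Hi Borodin, the update flow is the other way around. A/file will update B/file, C/file and D/file. In principle there won't be multiple identical lines, but I hadn't really thought about it. Perhaps insert after the first one. – 2012-03-23 14:33:08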

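Answers

I was finally able to get back to this problem and it seems I have managed to solve it. It may not be the best or the "prettiest" solution, but it does what I need :).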

# Open the file 

# First parameter is the file containing the update 
my ($FileUpdate) = $ARGV[0]; 

# Second parameter is the file to be updated 
my ($FileOriginal) = $ARGV[1]; 


# \s whitespace characters 

# Open both files and give them handles to be referred to further ahead 
open(UPD, $FileUpdate) || die("Could not open file $FileUpdate!"); 
open(ORG, $FileOriginal) || die("Could not open file $FileOriginal!"); 

# ------------------------------------------------ # 
# ---------------- ARRAY CREATION ---------------- # 
# ------------------------------------------------ # 

# Pass the content of the file $FileUpdate to the updateFile array 
@updateFile = <UPD>; 

# Pass the content of the file $FileOriginal to the originalFile array 
@originalFile = <ORG>; 

# Remove empty lines from the array contained on the updated file 
@updateFile = grep(/\S/, @updateFile); 

# Create an array that will contain the modifications and the line 
# prior to the first modification. 
@modifications =(); 

# Counter initialization 
$i = 0; 


# ------------------------------------------------ # 
# ----- LOOP TO IDENTIFY LINES FOR INSERTION ----- # 
# ------------------------------------------------ # 

# Loop the array to find out which lines are flagged as new and 
# which lines immediately precede those 
foreach $linha (@updateFile) { 

# Remove \n characters 
chomp($linha); 

# Find the new lines flagged with #I 
if ($linha =~ m/#I$/) { 

    # Verify that the previous line is not flagged as updated. 
    # If it is not, it means that the update starts here. 
    unless ($updateFile[$i-1] =~ m/#I$/) { 

     # Add that line to the array modifications 
     push(@modifications, $updateFile[$i-1]); 

    } # END OF unless 

# Add the lines tagged for insertion into the array 
push(@modifications, $updateFile[$i]); 

} # END OF if ($linha =~ m/#I$/) 

# Increment the counter 
$i = $i + 1; 

} # END OF foreach $linha (@updateFile) 


# ------------------------------------------------ # 
# --------- ADD VALUES TO MODIFICATIONS --------- # 
# ------------------------------------------------ # 
foreach $valor (@modifications) { 
print "$valor\n"; 
} 

# ------------------------------------------------ # 
# -------------------- BACKUP -------------------- # 
# ------------------------------------------------ # 

# Make a backup copy from the original file 
# in case something goes wrong when updating it 

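# Obtain the current time as a suffix for the backup name
# (no "\n" in the format, or the newline ends up in the cp command)
use POSIX qw(strftime);
$tt = strftime "%Y%m%d-%H%M", localtime;

# The list form of system avoids shell quoting problems with odd file names
system("cp", $FileOriginal, "$FileOriginal.$tt");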

# ------------------------------------------------ # 
# ------------- INSERT THE NEW LINES ------------- # 
# ------------------------------------------------ # 

# Counter initialization 
$m = 0; 

# New file array 
@newOriginal =(); 

# Goes through the original file and for each line not present in modifs, writes it . 

foreach $original (@originalFile) { 
# Initialize counter 
$n = 0; 

# Remove the trailing newline
chomp ($original); 

# Check if the value already exists on the array 
# If it doesnt, adds it 
if (grep {$_ eq $original} @newOriginal) { 
} 
else { 
    push (@newOriginal, $originalFile[$m]); 
} 

# Iterate over the array containing the modifications 
# These new lines shall be added to the final file. 
foreach $modif (@modifications) { 
    # Remove the trailing newline
    chomp ($modif); 

    #print "Original: $original, Modif: $modif\n"; 

    # Initialize counter 
    $k = 0; 

    # Compare the current value from the original file with 
    # the elements that exist on the modifications array. 
    # If they are equal push that line in order to be added 
    # to the results file. 
    if ($original eq $modif) { 

     # Increment the counter 
     $k = $n+1; 

     # Iterate the array with the modifications 
     # in order to insert all lines that end with #I 
     # immediately after the common line between files. 
     foreach my $igual ($k..$#modifications) { 

      # $igual is an index into @modifications, so this chomp has no effect
      chomp($igual);

      # If the line ends with #I add it to the final file. 
      if ($modifications[$igual] =~ m/#I$/) { 

       foreach $newO (@newOriginal) { 
        # Remove the trailing newline
        chomp($newO); 
        if ($newO ne $modifications[$igual]) { 
         push (@newOriginal, $modifications[$igual]); 
         last; 
        } 
       } 
      } 
      else { 
       last; 
      } 
     } 
    } 

    # Increment the counter 
    $n = $n + 1; 
} 
# Increment the counter 
$m = $m + 1; 
} 

# ------------------------------------------------ # 
# ------------- RESULTS PRESENTATION ------------- # 
# ------------------------------------------------ # 
$v = 0; 
print "--------------------\n"; 
foreach $vl (@newOriginal) { 
print "newOriginal: $newOriginal[$v]\n"; 
$v = $v + 1; 
} 
print "--------------------\n"; 

# ------------------------------------------------ # 
# ------------- CREATE UPDATED FILE -------------- # 
# ------------------------------------------------ # 
$v = 0; 

# Create the new name for the file - only for testing purposes now, it will be the original name afterwards 
$NewFileToWriteTo = $FileOriginal; 
# Retrieve the extension of the file to be updated (empty if there is none)
my ($ext) = $FileOriginal =~ /(\.[^.]+)$/;
$ext = '' unless defined $ext;
# Remove the extension - just for testing purposes because I want to change the file name now
# (\Q...\E stops the dot acting as a regex wildcard; $ anchors the match to the end)
$NewFileToWriteTo =~ s/\Q$ext\E$//;
# Create the new file name by adding the suffix _tst and the correct extension to it.
$NewFileToWriteTo = $NewFileToWriteTo . '_tst' . $ext;


# Create the new file or die in case it is not possible to open it 
open DAT, ">$NewFileToWriteTo" or die("Could not open file!"); 


# Write to the new file. This will be the UPDATED version of the ORIGINAL file. 
foreach $vl (@newOriginal) { 
print DAT "$newOriginal[$v]\n"; 
$v = $v + 1; 
} 

# Close all files 
close(DAT); 
close(UPD); 
close(ORG); 
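
For comparison, the insertion step can be written more compactly. The following is only a sketch under stated assumptions, not a drop-in replacement for the script above: it assumes the lines are already chomped, that the new lines have been grouped in a hash keyed by the line they follow, and that a block goes after the first matching line only. The name `merge_insertions` and the hash layout are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy @$target, adding each block of new lines right
# after the first occurrence of its anchor line. Lines already present in
# the target are skipped, so running the merge twice adds nothing new.
sub merge_insertions {
    my ($target, $insert_after) = @_;
    my %present = map { $_ => 1 } @$target;
    my (%done, @merged);
    for my $line (@$target) {
        push @merged, $line;
        next if $done{$line}++;                 # only the first match counts
        for my $new (@{ $insert_after->{$line} || [] }) {
            push @merged, $new unless $present{$new}++;
        }
    }
    return @merged;
}

# A file to be updated that has a line of its own between the two blocks.
my @target = ('First line', 'Second line', 'Something local', 'Fifth line');
my %insert_after = (
    'Second line' => ['Third line #I', 'Fourth line #I'],
    'Fifth line'  => ['Sixth line #I'],
);
print "$_\n" for merge_insertions(\@target, \%insert_after);
```

The hash lookup replaces the nested loops over `@modifications`, and the `%present` check plays the same role as the `grep` over `@newOriginal`: a line that is already in the file is not added a second time.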

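OK, I think I understand what you need, and the program below implements a solution.

I am not entirely clear what the source (B, C, D) files look like, but I have assumed they are the same as the target (A) file in its after-update state in your question.

I came across another edge case: what if the first line of the source (B, C, D) file is flagged with #I? I have assumed it should be inserted at the beginning of the output.

I have also chosen to `die` if the preceding line from the source file isn't found in the target.

Let us know if this is right.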

use strict;
use warnings;

open my $fa, '<', 'A.txt' or die $!;
open my $fb, '<', 'B.txt' or die $!;

my $keyline;

while (<$fb>) {

    if (/#I$/) {

        if ($keyline) {                 # We have to search for a match

            while () {

                my $source = <$fa>;     # read from the target

                if (defined $source) {  # copy to output. stop reading if key is found
                    print $source;
                    last if $source eq $keyline;
                }
                else {                  # die if key nowhere in target
                    chomp $keyline;
                    die qq(Key Line "$keyline" not found);
                }
            }

            undef $keyline;             # don't have to search next time
        }

        print;                          # insert the new line
    }
    else {
        $keyline = $_;                  # remember the line to search for
    }
}

print while <$fa>;                      # copy whatever is left of the target
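
The test `/#I$/` above only matches when `#I` is the very last thing on the line before the newline. If the files can carry trailing blanks or Windows line endings (an assumption; nothing in this thread confirms it), a more tolerant check may be useful. The helper name `is_flagged` is hypothetical, and this is a sketch rather than part of the program above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line ends with the flag, ignoring any
# trailing spaces, tabs, CR or LF. \Q...\E quotes the flag so that it is
# matched literally even if it contains regex metacharacters.
sub is_flagged {
    my ($line, $flag) = @_;
    $flag = '#I' unless defined $flag;
    return $line =~ /\Q$flag\E\s*\z/ ? 1 : 0;
}

for my $line ("Third line #I\n", "Fourth line #I \r\n", "Fifth line\n", "echo #Include\n") {
    (my $shown = $line) =~ s/\s+\z//;       # tidy the line for display only
    printf "%-16s %s\n", $shown, is_flagged($line) ? 'flagged' : 'plain';
}
```

The key-line comparison (`$source eq $keyline`) is just as strict, so the same trimming would need to be applied to both sides before comparing.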

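Hi Borodin. Thanks for your reply. I have tried this, replacing A.txt and B.txt with OriginalFile.txt and UpdatedFile.txt. When I run it, it prints the contents of the original file without the new lines that were added to UpdatedFile.txt. UpdatedFile.txt will be the source for all the other files. Regarding the first-line question, from what I have seen the first line will not be changed, since all the files seem to have a header starting with #--------#. It could happen, but so far I have not seen a case where it would. – 2012-03-26 09:26:58

@JoaoVilla-Lobos: Please clarify which file is which. Your original question had files in folders A, B, C and D, with lines flagged '#I'; what do 'OriginalFile.txt' and 'UpdatedFile.txt' mean? (My code expects to update 'A.txt' using insertions from 'B.txt'.) – Borodin 2012-03-26 14:33:13

Sorry for not being clear. Although any of the folders can, at a given time, contain the file that will be used as the source for updating the others, let's say the file containing the lines ending in #I is in folder A. That is the file I named UpdatedFile.txt. The file to be updated is the poorly named OriginalFile.txt. – 2012-03-27 09:23:48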
