2012-04-14 60 views
-1

Multi-line regex replacement if more than two

I am having difficulty with the following.

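I have a file of questions and answers that I need to import into Moodle (an online quiz site) in a specific format. Everything is black except the correct answers, which are green. The starting format looks like this: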

1. Question example 

a. Wrong 

b. Wrong 

C. Wrong 

D. Right 

The output should become:

:Question example 

:Question example 

{ 

~ Wrong 

~ Wrong 

~ Wrong 

= Right 

} 

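I opened the file in Word and replaced all red paragraph marks with * (I can't do a group replace). After that I exported the .docx file as text, opened it on my Linux machine, and threw the following regular expressions at it.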

sed -i -e 's/^\r/\n/g' tmp # OS X blank-line replacement
sed -i -e 's/\r//g' tmp # remove remaining carriage returns
sed -i -e 's:^[a-z]\.:~:' tmp # replace leading answer letters with a tilde
sed -i -e 's/\(^[0-9]*\.\ \)\(.*\)/}\n::\2\n::\2\n{/' tmp # regenerate title
sed -i -n '${p;q};N;/\n\*/{s/"\?\n//p;b};P;D' tmp # next line starts with *: append it to the current line
sed -i -e 's:^~\(.*\)\(\*.*\)$:=\1:' tmp # move * from the back to = at the front
sed -i -e 's:^\*:=:' tmp # replace any remaining * with =
sed '/^$/d' tmp # delete any remaining blank lines

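This is not pretty, but it works well enough. The questions were written by hand and contain many errors, so I still have to go through them manually. The difficult part is when there are multiple correct answers. The output should then look like this: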

:Question example 

:Question example 

{ 

~%-100% Wrong 

~%-100% Wrong 

~%50% Right 

~%50% Right 

} 

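Ideally I would have a sed or Perl regex that counts the number of = signs between { and }, replaces them with ~%50%, and replaces every ~ with ~%-100%. I could then duplicate the code for 3 correct answers, where each correct answer becomes ~%33%.

Is this doable? I have over 1000 questions, so automating this would certainly help. Multi-line replacement with sed is already a bit tricky for two lines, so I guess four or more lines will require Perl? I have no experience with Perl.

Can someone help me with this? Please excuse my poor English; I am not a native speaker.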

+0

Look into sed's hold-space operations; it seems tricky, but I would guess it is possible. – 2012-04-14 21:52:50

+0

Are you handling the different newlines between Windows and Linux? And what about all the characters Word "corrects" for you, such as quotes? – stark 2012-04-14 22:52:50

+0

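It would help a lot if you showed some real examples. From your sample it is hard to tell which text is real and which is placeholder. Do "Wrong" and "Right" actually appear in the source file? If so, how do you tell which answer is right and which is wrong? If not, what is the point of them in the output file? – Borodin 2012-04-14 23:05:39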

Answers

1
use strict;
use warnings;

# Slurp the whole input, then split it into one chunk per numbered question.
# Splitting only at line starts avoids breaking "10." into "1" and "0.".
my $file = do { local $/; <> };
my @questions = split /^(?=[0-9]+\.)/m, $file;
for (@questions) {
    my @lines = split /^/m;

    my $title = shift(@lines);
    $title =~ s/^\S+\s*/:/;

    my $num_right = 0;
    my $num_wrong = 0;
    for (@lines) {
        if (/Right/) { ++$num_right; }
        elsif (/Wrong/) { ++$num_wrong; }
    }

    # Each correct answer gets an equal share of 100% (33 for three of them).
    my $right_pct = $num_right ? int(100 / $num_right) : 0;
    my $right_prefix = $num_right == 1 ? "=" : "~%$right_pct%";
    my $wrong_prefix = $num_right == 1 ? "~" : "~%-100%";

    # Swap the leading "a." / "B." label for the GIFT prefix.
    for (@lines) {
        if (/Right/) { s/^\S+/$right_prefix/; }
        elsif (/Wrong/) { s/^\S+/$wrong_prefix/; }
    }

    print(
        $title,
        "\n",
        $title,
        "\n{\n",
        @lines,
        "\n}\n",
    );
}

Replace /Right/ and /Wrong/ with something appropriate.
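As one possibility for that "something appropriate": the question describes marking correct answers with * before exporting from Word. Below is a minimal shell sketch of just the classification step. It assumes, without confirmation from the question, that the * ends up at the end of each correct answer line. The function name mark_answers is made up for illustration, and it uses sed rather than the Perl above.

```shell
# Hypothetical helper: turn lettered answer lines into GIFT-style prefixes.
# Assumption: a correct answer line ends with '*', e.g. "B. Paris*".
mark_answers() {
    # 1) strip trailing whitespace (including a stray carriage return)
    # 2) lettered line ending in '*'  -> "= text"
    # 3) any other lettered line      -> "~ text"
    sed -e 's/[[:space:]]*$//' \
        -e 's/^[A-Za-z]\. *\(.*\)\*$/= \1/' \
        -e 's/^[A-Za-z]\. */~ /'
}

# Numbered question lines do not start with a letter, so they pass through.
printf '1. Capital of France?\na. Rome\nB. Paris*\n' | mark_answers
```

If the marker lands somewhere else in the real export (for example at the start of the following line), the second expression would need adjusting.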

+0

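With your edit you dropped a `,` after the first newline in the `print` statement. Also, when determining the prefixes, the conditional operator should test `$num_right` rather than `$num_wrong`. – Borodin 2012-04-14 23:58:14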

+0

Fixed, thanks. – ikegami 2012-04-15 00:44:46

1

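The program below works according to my best guess at what you need. It reads all the information into an array and then formats it.

For now, the data is embedded in the source and read from the DATA filehandle. Changing the loop to while (<>) { ... } will let you specify the data file on the command line.

If my guess is wrong, you will have to correct me.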

use strict;
use warnings;

my @questions;

# Collect each question as [ text, answer, answer, ... ].
while (<DATA>) {
    next unless /\S/;
    s/\s+$//;
    if (/^\d+\.\s*(.+)/) {
        push @questions, [$1];
    }
    elsif (@questions and /^[A-Za-z]\.\s*(.+)/) {
        push @{$questions[-1]}, $1;
    }
}

for my $question (@questions) {

    my ($text, @answers) = @$question;

    print "::$text\n" for 1, 2;

    my $correct = grep /right/i, @answers;

    print "{\n";

    if ($correct <= 1) {
        printf "%s %s\n", /right/i ? '=' : '~', $_ for @answers;
    }
    else {
        # Several correct answers: each gets an equal share of 100%.
        my $percent = int(100 / $correct);
        printf "~%%%d%% %s\n", /right/i ? $percent : -100, $_ for @answers;
    }

    print "}\n";
}

__DATA__
1. Question one

a. Wrong

b. Wrong

c. Right

d. Wrong

2. Question two

a. Right

b. Wrong

c. Right

d. Wrong

3. Question three

a. Right

b. Right

c. Wrong

d. Right

Output

::Question one
::Question one
{
~ Wrong
~ Wrong
= Right
~ Wrong
}
::Question two
::Question two
{
~%50% Right
~%-100% Wrong
~%50% Right
~%-100% Wrong
}
::Question three
::Question three
{
~%33% Right
~%33% Right
~%-100% Wrong
~%33% Right
}
1

This might work for you:

cat <<\! >file.sed
# On encountering a digit in the first character position
/^[0-9]/{
# Create a label to cater for last line processing
:end
# Swap to hold space
x
# Check hold space for contents.
# If none delete it and begin a new cycle
# This is to cater for the first question line
/./!d
# Remove any carriage returns
s/\r//g
# Remove any blank lines
s/\n\n*/\n/g
# Double the question line, replacing the question number by a ':'
# Also append a { followed by a newline
s/^[0-9]*\.\([^\n]*\n\)/:\1:\1{\n/
# Coalesce lines beginning with a * and remove optional preceding "
s/"\?\n\*/*/g
# Replace the wrong answers a,b,c... with ~%-100%
s/\n[a-zA-Z]*\. \(Wrong\)/\n~%-100% \1/g
# Replace the right answers a,B,c... with ~%100%
s/\n[a-zA-Z]*\. \(Right\)/\n~%100% \1/g
# Assuming no more than 4 answers:
# Replace 4 correct answers prefix with ~%25%
s/\(~%100%\)\(.*\)\1\(.*\)\1\(.*\)\1/~%25%\2~%25%\3~%25%\4~%25%/
# Replace 3 correct answers prefix with ~%33%
s/\(~%100%\)\(.*\)\1\(.*\)\1/~%33%\2~%33%\3~%33%/
# Replace 2 correct answers prefix with ~%50%
s/\(~%100%\)\(.*\)\1/~%50%\2~%50%/
# Append a newline and a }
s/$/\n}/
# Break and so print newly formatted string
b
}
# Append pattern space to hold space
H
# On last line jump to end label
$b end
# Delete all lines from pattern space
d
!

Then run:

sed -f file.sed file 
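The last three substitutions in that script hard-code two, three and four correct answers. A counting pass would generalize this to any number. Here is a minimal sketch of such a pass, assuming the input is already in the one-answer-per-line form that the question's own pipeline produces: = for right answers and ~ for wrong ones, between lines starting with { and }. It swaps in awk for sed, and the function name gift_weights is made up for illustration.

```shell
# Hypothetical post-processing pass over single-answer GIFT output.
gift_weights() {
    awk '
    # Opening brace: start buffering the answers of one question.
    substr($0, 1, 1) == "{" { inblock = 1; n = 0; right = 0; print; next }

    # Closing brace: the number of "=" lines is now known, so emit the block.
    inblock && substr($0, 1, 1) == "}" {
        pct = (right > 1) ? int(100 / right) : 0
        for (i = 1; i <= n; i++) {
            line = buf[i]
            first = substr(line, 1, 1)
            if (right > 1 && first == "=") {
                line = "~%" pct "%" substr(line, 2)
            } else if (right > 1 && first == "~") {
                line = "~%-100%" substr(line, 2)
            }
            print line
        }
        inblock = 0
        print
        next
    }

    # Inside a block: buffer the line and count the correct answers.
    inblock { buf[++n] = $0; if (substr($0, 1, 1) == "=") right++; next }

    # Everything else (titles, lines outside blocks) passes through.
    { print }
    '
}

printf '::Q\n{\n~ Wrong\n= Right\n= Right\n}\n' | gift_weights
```

Blocks with exactly one = line are left untouched, so ordinary single-answer questions keep their = and ~ prefixes. The percentage is truncated to a whole number (33 for three correct answers), matching the figures in the question.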
0

Your example does not match the documentation: http://docs.moodle.org/22/en/GIFT. The question title and the question are delimited by two colons rather than one:

//Comment line 
::Question title 
:: Question { 
=A correct answer 
~Wrong answer1 
#A response to wrong answer1 
~Wrong answer2 
#A response to wrong answer2 
~Wrong answer3 
#A response to wrong answer3 
~Wrong answer4 
#A response to wrong answer4 
} 
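A minimal shell sketch of producing a header in that documented form from a numbered line such as "1. Question example". It reuses the question text as the title, as the question's sample output does. The function name gift_header is hypothetical, and GIFT control characters in the text are not escaped here.

```shell
# Hypothetical helper: "1. Some text" -> "::Some text::Some text {"
# Lines that do not start with a number and a dot are passed through unchanged.
gift_header() {
    sed 's/^[0-9][0-9]*\. *\(.*[^[:space:]]\)[[:space:]]*$/::\1::\1 {/'
}

printf '1. Question example \n' | gift_header
# prints: ::Question example::Question example {
```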

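Some people naively gave you answers based on your example rather than looking up the actual specification. Oops.

Your question is impossible to answer as posed, because your format does not reveal which answer is the correct one. That is to say: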

1. Question 

a. Is this right? 

b. Or this? 

c. Or this? 

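You say the correct answers are identified by color in the original Word document, and that you do some replacement to preserve that information; yet you do not show an example of that! Oops...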