2013-08-07

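Conditionally joining lines in awk

I am looking for awk code to join lines pasted from a PDF. The joining should follow this rule: if the last character of a line is not a period (.), a space character should be appended to that line and the next line should be joined onto it.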

Sample input text (in a file):

In a perfect school, students would treat each other with affection and 
respect. Differences would be tolerated, and even welcomed. Kids would 
become more popular by being kind and supportive. Students would go out 
of their way to make sure one another felt happy and comfortable. But most 
schools are not perfect. Instead of being places of respect and tolerance, 
they are places where the hateful act of bullying is widespread. 

Students have to deal with all kinds of problems in schools. There are 
the problems created by difficult classes, by too much homework, or by 
personality conflicts with teachers. There are problems with scheduling 
the classes you need and still getting some of the ones you want. There 
are problems with bad cafeteria food, grouchy principals, or overcrowded 
classrooms. But one of the most difficult problems of all has to do with a 
terrible situation that exists in most schools: bullying. 

Expected output:

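In a perfect school, students would treat each other with affection and respect. Differences would be tolerated, and even welcomed. Kids would become more popular by being kind and supportive. Students would go out of their way to make sure one another felt happy and comfortable. But most schools are not perfect. Instead of being places of respect and tolerance, they are places where the hateful act of bullying is widespread.

Students have to deal with all kinds of problems in schools. There are the problems created by difficult classes, by too much homework, or by personality conflicts with teachers. There are problems with scheduling the classes you need and still getting some of the ones you want. There are problems with bad cafeteria food, grouchy principals, or overcrowded classrooms. But one of the most difficult problems of all has to do with a terrible situation that exists in most schools: bullying.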

(In the expected output, each paragraph is on a single line, and paragraphs are separated from each other by a blank line.)

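It is an unfortunate side effect of Markdown formatting that there is almost no visible difference between the input and the output. Presumably the output should say 'grouchy principals', as the input does.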

Answers

This might be enough:

awk -v ORS= '!NF{$NF="\n"} NF{ $NF = $NF ($NF~/\.$/?"\n":" ")} 1' input 
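As a rough illustration of how this one-liner behaves, the sketch below runs it on a small made-up sample. The file name sample.txt and its contents are assumptions for the demonstration, not part of the question.

```shell
# Hypothetical sample: two short paragraphs separated by an empty line.
printf '%s\n' 'first line' 'second line.' '' 'third line' 'fourth line.' > sample.txt

# ORS is empty, so awk adds nothing after each record by itself.
# The program appends "\n" to the last field when it ends in a period,
# and " " otherwise; an empty line (NF == 0) is turned into a lone "\n".
out=$(awk -v ORS= '!NF{$NF="\n"} NF{ $NF = $NF ($NF~/\.$/?"\n":" ")} 1' sample.txt)
printf '%s\n' "$out"
```

Because assigning to $NF makes awk rebuild the record from its fields, leading and trailing whitespace on each input line is dropped before the space or newline is appended.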
I put the following code in a file test.awk: ORS= '$NF~/\.$/{$NF=$NF"\n"} 1'. The text to be modified is in the file "pdfpaste.txt". Then I called: gawk -f test1.awk pdfpaste.txt > pdfpaste2.txt. But there is no output in the pdfpaste.txt file. Am I doing something wrong? – user1955215

Replaced test.awk with the following code: ORS= '!NF{$NF="\n"} NF{$NF~/\.$/ ? $NF=$NF"\n" : $NF=$NF" "} 1' (still not working) – user1955215

Run it like this: 'awk -v ORS= -f test.awk input' – perreal
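To spell that out: when the program lives in a file, the file holds only the awk rules, with no surrounding shell quotes and no ORS= prefix; the ORS assignment stays on the command line as -v ORS=. A minimal sketch, reusing the file names from the comments above (the two-line stand-in for pdfpaste.txt is made up):

```shell
# test.awk contains just the awk program (no quotes, no ORS=).
cat > test.awk <<'EOF'
!NF { $NF = "\n" }
NF  { $NF = $NF ($NF ~ /\.$/ ? "\n" : " ") }
1
EOF

# Made-up stand-in for the text pasted from the PDF.
printf '%s\n' 'one sentence split' 'over two lines.' > pdfpaste.txt

# ORS is set with -v on the command line; the program is read with -f.
awk -v ORS= -f test.awk pdfpaste.txt > pdfpaste2.txt
cat pdfpaste2.txt
```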

If the paragraphs in your input file really are separated by blank lines, then all you need is:

awk -v RS= -v ORS='\n\n' '{$1=$1}1' file
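A sketch of why this works, on a made-up sample (sample.txt and its contents are assumptions): an empty RS puts awk in paragraph mode, where records are separated by blank lines and a newline also acts as a field separator. Assigning $1=$1 forces awk to rebuild the record with single spaces between fields.

```shell
# Hypothetical sample: two short paragraphs separated by an empty line.
printf '%s\n' 'first line' 'second line.' '' 'third line' 'fourth line.' > sample.txt

# RS= selects paragraph mode; ORS='\n\n' puts a blank line after each paragraph.
out=$(awk -v RS= -v ORS='\n\n' '{$1=$1}1' sample.txt)
printf '%s\n' "$out"
```

Note that this joins every line within a paragraph, including a line that happens to end in a period, so it matches the expected output only when periods at line ends coincide with paragraph ends.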