2014-02-14 53 views
0

Reverse newline tokenization in one-token-per-line files - Unix

"How to separate tokens in line using Unix?" shows that a file can be tokenized using sed or xargs.

Is there a way to do the reverse?

[IN]:

some 
sentences 
are 
like 
this. 

some 
sentences 
foo 
bar 
that 

[OUT]:

some sentences are like this. 
some sentences foo bar that 

The only delimiter between sentences is \n\n. I could have done it in Python as follows, but is there a Unix way?

import codecs

def per_section(it):
    """Read lines and yield sections, using an empty line as the delimiter."""
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

# Replace the newlines inside each section with spaces.
print([s.replace("\n", " ") for s in per_section(codecs.open('outfile.txt', 'r', 'utf8'))])

[OUT]:

[u'some sentences are like this. ', u'some sentences foo bar that '] 
+1

Is it always 5 words? What pattern should decide where the new line goes, the dot '.'? – fedorqui

+1

No, it's not always 5 words; the 5 words are a coincidence. – alvas

Answers

3

awk makes this kind of task easier:

awk -v RS="" '{$1=$1}7' file 
If you want to keep multiple spaces within each line, you can use:

awk -v RS="" -F'\n' '{$1=$1}7' file 

With your example:

kent$ cat f 
some 
sentences 
are 
like 
this. 

some 
sentences 
foo 
bar 
that 

kent$ awk -v RS="" '{$1=$1}7' f 
some sentences are like this. 
some sentences foo bar that 
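
For readers who prefer Python, here is a rough sketch of what the two awk commands above do. It is not awk's implementation, only an approximation: the function name `join_paragraphs` and the `keep_inner_spaces` flag are made up for this illustration. Blocks are split on runs of empty lines (similar to RS=""), and the fields of each block are rejoined with a single space (similar to `$1=$1` with the default OFS).

```python
import re

def join_paragraphs(text, keep_inner_spaces=False):
    """Join each blank-line-separated block of lines into one line.

    Hypothetical helper that approximates awk's paragraph mode (RS="").
    """
    # Drop leading/trailing newlines, then split on runs of empty lines.
    blocks = [b for b in re.split(r"\n{2,}", text.strip("\n")) if b]
    joined = []
    for block in blocks:
        if keep_inner_spaces:
            # Roughly like -F'\n': only newlines separate fields,
            # so spaces inside a line are preserved.
            fields = block.split("\n")
        else:
            # Roughly like the default FS: any whitespace run separates
            # fields, so repeated spaces collapse to one.
            fields = block.split()
        joined.append(" ".join(fields))
    return joined

if __name__ == "__main__":
    sample = "some\nsentences\nare\nlike\nthis.\n\nsome\nsentences\nfoo\nbar\nthat\n"
    for line in join_paragraphs(sample):
        print(line)
```

The `keep_inner_spaces=True` path corresponds to the second command: a line such as "a  b" keeps its two spaces, while the default path collapses them.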
0

You can do this with awk as follows:

awk -v RS="\n\n" '{gsub("\n"," ",$0);print $0}' file.txt 

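Setting the record separator to \n\n means that each record is a group of lines delimited by a blank line. Each record is then printed after all of its \n characters have been replaced with a space. Note that POSIX leaves a multi-character RS unspecified; gawk and mawk accept it.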
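
The same two steps (split on the literal separator, then replace the remaining newlines) can be sketched in Python. This is only an illustration of the idea, not a claim about awk's exact output; the function name `join_on_double_newline` is invented here.

```python
def join_on_double_newline(text):
    # Step 1: split the input on the literal two-newline separator.
    records = text.split("\n\n")
    # Step 2: replace every remaining newline with a space, skipping empty records.
    return [record.replace("\n", " ") for record in records if record]

if __name__ == "__main__":
    sample = "some\nsentences\n\nfoo\nbar\n"
    print(join_on_double_newline(sample))
```

In this sketch the newline that ends the file becomes a trailing space on the last record, so strip it first if that matters.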

0
sed -n --posix 'H;$ {x;s/\n\([^[:cntrl:]]\{1,\}\)/\1 /gp;}' YourFile 

This separates on blank lines, so each string can have a different length.