Reverse newline tokenization in one-token-per-line files - Unix
"How to separate tokens in line using Unix?" showed that a file can be tokenized using sed or xargs.
Is there a way to do the reverse?
[in]:
some
sentences
are
like
this.

some
sentences
foo
bar
that
[OUT]:
some sentences are like this.
some sentences foo bar that
The only delimiter between sentences is \n\n (an empty line).
I could do it in Python as follows, but is there a Unix way?
import codecs

def per_section(it):
    """ Read a file and yield sections using empty line as delimiter """
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

# each section is one string; turn its newlines into spaces
print(["".join(i).replace("\n", " ") for i in per_section(codecs.open('outfile.txt', 'r', 'utf8'))])
[out]:
[u'some sentences are like this. ', u'some sentences foo bar that ']
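For comparison, here is a more compact sketch of the same transformation. It assumes the whole text fits in memory and that sentences are separated by at least one empty line; `join_tokens` and `sample` are names made up for this illustration. Unlike the generator above, it does not leave a trailing space on each sentence.

```python
import re

def join_tokens(text):
    """Turn one-token-per-line text into a list of sentences (hypothetical helper)."""
    # Split on one or more empty lines, which mark the sentence boundaries.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Within each block, collapse the newlines between tokens into single spaces.
    return [" ".join(block.split()) for block in blocks]

# Sample input mirroring the [in] example: one token per line, empty line between sentences.
sample = "some\nsentences\nare\nlike\nthis.\n\nsome\nsentences\nfoo\nbar\nthat\n"

for sentence in join_tokens(sample):
    print(sentence)
```

Splitting on `\n\s*\n` rather than a literal `\n\n` is a choice made here so that separator lines containing stray whitespace are still treated as boundaries.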
Is it always 5 words? What pattern decides where the new line should go, the dot '.'? – fedorqui
No, it is not always 5 words; the 5 words are a coincidence. – alvas