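Here is my solution.
It's easy in principle (ŁukaszW.pl gave it), but not so easy to code if you want to take care of the special cases (which ŁukaszW.pl didn't).
The special cases are when the separator ROW_DEL is split between two read chunks (as I4V pointed out) and, more subtly, when there are two consecutive ROW_DEL and the second one is split between two chunks.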
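To make the first special case concrete, here is a deliberately naive chunked replacement that handles fixed-size chunks independently, with no look-ahead. It is a Python 3 sketch; the function name `naive_chunked_replace` and the sample bytes are hypothetical. With a chunk size of 14, a ROW_DEL that straddles a chunk boundary survives unchanged:

```python
def naive_chunked_replace(data: bytes, sep: bytes, new: bytes, size: int) -> bytes:
    """Replace sep in each fixed-size chunk independently (no look-ahead)."""
    out = []
    for start in range(0, len(data), size):
        out.append(data[start:start + size].replace(sep, new))
    return b"".join(out)

data = b"abcROW_DELdefROW_DELROW_DELghi"
naive = naive_chunked_replace(data, b"ROW_DEL", b"\n", 14)
correct = data.replace(b"ROW_DEL", b"\n")
print(naive)    # b'abc\ndefROW_DEL\nghi'  (the split ROW_DEL survives)
print(correct)  # b'abc\ndef\n\nghi'
```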
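Since ROW_DEL is longer than any possible newline ('\r', '\n', '\r\n'), it can be replaced, in the file itself, by the newline used by the OS. That's why I chose to rewrite the file in place.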
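For that I use mode 'r+', which doesn't create a new file.
Using binary mode 'b' is also absolutely necessary, because the code works with exact byte positions (seek() and tell()).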
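A small Python 3 check of the two mode properties relied on here, assuming a scratch file in a temporary directory (the name `demo.txt` is hypothetical): 'r+b' refuses to create a missing file and does not truncate an existing one, so bytes can be overwritten in place; in binary mode no newline translation happens, so what is written and the offsets passed to seek() are exact.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# 'r+b' will not create a missing file (unlike 'w', 'wb' or 'a')
try:
    open(path, "r+b").close()
    refused = False
except FileNotFoundError:
    refused = True

with open(path, "wb") as f:
    f.write(b"line1ROW_DELline2")

# 'r+b' opens an existing file for reading and writing without truncating it
with open(path, "r+b") as f:
    f.write(b"LINE1")  # overwrite the first 5 bytes in place
    f.seek(0)
    after = f.read()

print(refused, after)  # True b'LINE1ROW_DELline2'
```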
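The principle is to read a chunk (in real life its size would be, for example, 262144) plus x additional characters, where x is the length of the separator minus 1.
Then check whether the separator is present in the last x characters of the chunk plus those x additional characters; any separator found there straddles the end of the chunk.
Depending on whether it is present or not, the chunk is cut just before that separator or kept whole; then ROW_DEL is replaced in it and the result is rewritten in place.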
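The scan described above can be simulated on an in-memory bytes object. This is a Python 3 sketch, not the author's code: `lookahead_replace` is a hypothetical name, `pos` stands in for fR.tell(), and it assumes a separator of at least 2 bytes (with a 1-byte separator x is 0 and chunk[-x:] would return the whole chunk). For a separator that cannot overlap itself, such as ROW_DEL, the result should equal a plain bytes.replace() for every chunk length.

```python
def lookahead_replace(data: bytes, sep: bytes, new: bytes, chunk_length: int) -> bytes:
    """Replace sep by new, scanning data chunk by chunk with an x-byte look-ahead."""
    if len(sep) < 2 or chunk_length < len(sep):
        raise ValueError("need len(sep) >= 2 and chunk_length >= len(sep)")
    x = len(sep) - 1
    out = []
    pos = 0                                    # stands in for fR.tell()
    while True:
        chunk = data[pos:pos + chunk_length]
        pch = pos + len(chunk)                 # position just after the chunk
        # last x bytes of the chunk + next x bytes ("twelve" for ROW_DEL)
        twelve = chunk[-x:] + data[pch:pch + x]
        pt = twelve.find(sep)
        if pt != -1:
            # a sep found here always straddles the chunk end:
            # keep only the bytes before it; it is handled on the next pass
            out.append(chunk[:len(chunk) - x + pt].replace(sep, new))
        else:
            pt = x
            out.append(chunk.replace(sep, new))
        if pch >= len(data):
            return b"".join(out)
        pos = pch - x + pt                     # first byte not yet processed

print(lookahead_replace(b"abcROW_DELdefROW_DELROW_DELghi", b"ROW_DEL", b"\n", 14))
# b'abc\ndef\n\nghi'
```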
The bare code:
# -*- coding: utf-8 -*-
text = ('The hospital roommate of a man infected ROW_DEL'
        'with novel coronavirus (NCoV)ROW_DEL'
        '—a SARS-related virus first identified ROW_DELROW_DEL'
        'last year and already linked to 18 deaths—ROW_DEL'
        'has contracted the illness himself, ROW_DEL'
        'intensifying concerns about the ROW_DEL'
        "virus's ability to spread ROW_DEL"
        'from person to person.')
with open('eessaa.txt','w') as f:
    f.write(text)
with open('eessaa.txt','rb') as f:
    ch = f.read()
print ch.replace('ROW_DEL','ROW_DEL\n')
print '\nlength of the text : %d chars\n' % len(text)
#==========================================
from os.path import getsize
from os import fsync,linesep

def rewrite(whichfile,sep,chunk_length,OSeol=linesep):
    if chunk_length<len(sep):
        print 'Length of second argument, %d , is '\
              'the minimum value for the third argument'\
              % len(sep)
        return
    x = len(sep)-1
    x2 = 2*x
    file_length = getsize(whichfile)
    # fR reads ahead, fW rewrites the same file behind it
    with open(whichfile,'rb+') as fR,\
         open(whichfile,'rb+') as fW:
        while True:
            chunk = fR.read(chunk_length)
            pch = fR.tell()
            # last x chars of the chunk + the next x chars:
            # a sep found here straddles the end of the chunk
            twelve = chunk[-x:] + fR.read(x)
            ptw = fR.tell()
            if sep in twelve:
                pt = twelve.find(sep)
                # keep only the part of the chunk before that sep
                y = chunk[0:-x+pt].replace(sep,OSeol)
            else:
                pt = x
                y = chunk.replace(sep,OSeol)
            fW.write(y)
            fW.flush()
            fsync(fW.fileno())
            if pch<file_length:
                # back to the first char not yet rewritten (pch-x+pt);
                # the last term is 0 unless fR.read(x) hit the end of file
                fR.seek(-x2+pt+(pch+x-ptw),1)
            else:
                fW.truncate()
                break

rewrite('eessaa.txt','ROW_DEL',14)
with open('eessaa.txt','rb') as f:
    ch = f.read()
print '\n'.join(repr(line)[1:-1] for line in ch.splitlines(1))
print '\nlength of the text : %d chars\n' % len(ch)
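The code above is Python 2 (print statements, str used as bytes). A possible Python 3 adaptation is sketched below, assuming the separator is given as bytes, is at least 2 bytes long, and is not shorter than the replacement; `rewrite_py3` and its `eol` parameter are hypothetical names. It seeks to the absolute position pch - x + pt (the first byte not yet rewritten), so a look-ahead cut short by the end of the file does not drop bytes.

```python
import os
import tempfile

def rewrite_py3(path: str, sep: bytes, chunk_length: int,
                eol: bytes = os.linesep.encode("ascii")) -> None:
    """Replace every sep in the file at path by eol, in place, chunk by chunk."""
    if len(sep) < 2 or chunk_length < len(sep):
        raise ValueError("need len(sep) >= 2 and chunk_length >= len(sep)")
    if len(eol) > len(sep):
        raise ValueError("eol longer than sep: the writer could overtake the reader")
    x = len(sep) - 1
    file_length = os.path.getsize(path)
    # f_r reads ahead; f_w rewrites the same file behind it
    with open(path, "r+b") as f_r, open(path, "r+b") as f_w:
        while True:
            chunk = f_r.read(chunk_length)
            pch = f_r.tell()
            twelve = chunk[-x:] + f_r.read(x)
            pt = twelve.find(sep)
            if pt != -1:
                # sep straddles the chunk end: write only what precedes it
                y = chunk[:len(chunk) - x + pt].replace(sep, eol)
            else:
                pt = x
                y = chunk.replace(sep, eol)
            f_w.write(y)
            f_w.flush()
            if pch < file_length:
                f_r.seek(pch - x + pt)  # first byte not yet rewritten
            else:
                f_w.truncate()          # drop the leftover tail of the old text
                return

if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "eessaa.txt")
    with open(demo, "wb") as f:
        f.write(b"infected ROW_DELwith novel coronavirus (NCoV)ROW_DELROW_DELlast year")
    rewrite_py3(demo, b"ROW_DEL", 14, eol=b"\n")
    with open(demo, "rb") as f:
        print(f.read())
```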
To follow the execution, here is another version of the code that prints messages all along the way:
# -*- coding: utf-8 -*-
text = ('The hospital roommate of a man infected ROW_DEL'
        'with novel coronavirus (NCoV)ROW_DEL'
        '—a SARS-related virus first identified ROW_DELROW_DEL'
        'last year and already linked to 18 deaths—ROW_DEL'
        'has contracted the illness himself, ROW_DEL'
        'intensifying concerns about the ROW_DEL'
        "virus's ability to spread ROW_DEL"
        'from person to person.')
with open('eessaa.txt','w') as f:
    f.write(text)
with open('eessaa.txt','rb') as f:
    ch = f.read()
print ch.replace('ROW_DEL','ROW_DEL\n')
print '\nlength of the text : %d chars\n' % len(text)
#==========================================
from os.path import getsize
from os import fsync,linesep

def rewrite(whichfile,sep,chunk_length,OSeol=linesep):
    if chunk_length<len(sep):
        print 'Length of second argument, %d , is '\
              'the minimum value for the third argument'\
              % len(sep)
        return
    x = len(sep)-1
    x2 = 2*x
    file_length = getsize(whichfile)
    with open(whichfile,'rb+') as fR,\
         open(whichfile,'rb+') as fW:
        while True:
            chunk = fR.read(chunk_length)
            pch = fR.tell()
            twelve = chunk[-x:] + fR.read(x)
            ptw = fR.tell()
            if sep in twelve:
                pt = twelve.find(sep)
                m = ("\n !! %r is "
                     "at position %d in twelve !!" % (sep,pt))
                y = chunk[0:-x+pt].replace(sep,OSeol)
            else:
                pt = x
                m = ''
                y = chunk.replace(sep,OSeol)
            print ('chunk == %r %d chars\n'
                   ' -> fR now at position %d\n'
                   'twelve == %r %d chars %s\n'
                   ' -> fR now at position %d'
                   % (chunk ,len(chunk), pch,
                      twelve,len(twelve),m, ptw))
            pos = fW.tell()
            fW.write(y)
            fW.flush()
            fsync(fW.fileno())
            print (' %r %d long\n'
                   ' has been written from position %d\n'
                   ' => fW now at position %d'
                   % (y,len(y),pos,fW.tell()))
            if pch<file_length:
                # same as fR.seek(-x2+pt,1) unless fR.read(x) hit the end of file
                fR.seek(-x2+pt+(pch+x-ptw),1)
                print ' -> fR moved %d characters back to position %d'\
                      % (ptw-fR.tell(),fR.tell())
            else:
                print (" => fR is at position %d == file's size\n"
                       ' File has thoroughly been read'
                       % fR.tell())
                fW.truncate()
                break
            raw_input('\npress any key to continue')

rewrite('eessaa.txt','ROW_DEL',14)
with open('eessaa.txt','rb') as f:
    ch = f.read()
print '\n'.join(repr(line)[1:-1] for line in ch.splitlines(1))
print '\nlength of the text : %d chars\n' % len(ch)
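There are some subtleties in the treatment of the chunk ends, needed to detect whether a ROW_DEL straddles two chunks and whether there are two consecutive ROW_DEL. That's why I took so long to post my solution: I was finally forced to write fR.seek(-x2+pt,1), and not simply fR.seek(-2*x,1) or fR.seek(-x,1) depending on whether sep straddles or not (2*x is x2 in the code; for ROW_DEL, x and x2 are 6 and 12). Anybody interested can check this by changing the code in the branches that depend on whether 'ROW_DEL' is in twelve.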
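The seek arithmetic can be checked with a few numbers. The helper below is hypothetical and assumes fR.read(x) returned x full bytes, so fR sits at ptw = pch + x; fR.seek(-x2+pt,1) then lands on pch - x + pt, the first byte not yet rewritten. By the same arithmetic, a fixed fR.seek(-2*x,1) would always return to pch - x, so bytes already written would be processed again, and a fixed fR.seek(-x,1) would always resume at pch, skipping the beginning of a straddling ROW_DEL that was never written.

```python
def restart_position(pch: int, sep_len: int, pt: int) -> int:
    """Where fR.seek(-x2+pt, 1) lands, assuming fR.read(x) returned x full bytes."""
    x = sep_len - 1
    x2 = 2 * x
    ptw = pch + x               # fR position after reading chunk + look-ahead
    return ptw + (-x2 + pt)     # algebraically pch - x + pt

# ROW_DEL: x = 6, x2 = 12; suppose the chunk ends at pch = 14
for pt in (6, 5, 0):            # pt == x means "sep not in twelve"
    print(pt, restart_position(14, len("ROW_DEL"), pt))
# 6 14  -> resume right after the chunk
# 5 13  -> resume at a ROW_DEL starting 1 byte before the chunk end
# 0 8   -> resume at a ROW_DEL starting x bytes before the chunk end
```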
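Can't you use 'sed' or some other scripting tool? – harsh
Why do you call ROW_DEL a virtual ending? Is ROW_DEL a sequence of characters present in the file? I suppose your problem is easy to solve, but this point puzzles me. – eyquem
You could try reading the file in fixed-size chunks; see the documentation of `read` for StreamReader (http://docs.python.org/release/2.4/lib/stream-reader-objects.html) –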
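The last comment's suggestion, reading the file in fixed-size chunks, can also be used to stream the ROW_DEL-separated records without rewriting the file. This Python 3 sketch uses plain f.read(size) on a binary file rather than codecs.StreamReader, and the generator name `records` is hypothetical. It keeps the last, possibly incomplete piece of each split in a buffer, so a separator cut between two reads is reassembled before splitting.

```python
import io
from typing import BinaryIO, Iterator

def records(f: BinaryIO, sep: bytes = b"ROW_DEL", size: int = 262144) -> Iterator[bytes]:
    """Yield the pieces of a binary file between separators, reading fixed-size blocks."""
    pending = b""
    while True:
        block = f.read(size)
        if not block:
            break
        pending += block
        parts = pending.split(sep)
        # the last piece may be incomplete or end with part of a split sep
        pending = parts.pop()
        yield from parts
    yield pending

print(list(records(io.BytesIO(b"aROW_DELbROW_DEL"), size=4)))
# [b'a', b'b', b'']
```

The output matches bytes.split(), including the empty piece after a trailing separator.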