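The code below creates a function Which_Line_for_Position(pos) that gives the number of the line containing position pos, that is, the number of the line on which the character at position pos in the file lies.
The function can be called with any position as its argument, independently of where the file pointer is before the call and of how that pointer has moved before.
So with this function one is not limited to finding the current line number during an uninterrupted iteration over the lines, as is the case with Greg Hewgill's solution.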
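with open(filepath, 'rb') as f:
    GIVE_NO_FOR_END = {}          # end position of each line -> 0-based line number
    end = 0
    for i, line in enumerate(f):
        end += len(line)          # offset just past the last byte of line i
        GIVE_NO_FOR_END[end] = i
    # after the loop: if the file ends with a newline, the EOF position
    # is the start of an extra, empty last line
    if line.endswith(b'\n'):
        GIVE_NO_FOR_END[end + 1] = i + 1
end_positions = sorted(GIVE_NO_FOR_END)

def Which_Line_for_Position(pos,
                            dic=GIVE_NO_FOR_END,
                            keys=end_positions,
                            kmax=end_positions[-1]):
    # the line containing pos is the first one whose end position is > pos
    return dic[next(k for k in keys if pos < k)] if pos < kmax else None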
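The generator expression in Which_Line_for_Position() walks through end_positions from the beginning on every call, so one lookup costs time proportional to the number of lines. Because end_positions is sorted, the same lookup can be done by binary search with the standard bisect module. This is a minimal sketch of that variant, not the code above: the names build_line_index and line_for_position are hypothetical, and it assumes the lines come from a binary stream (bytes), as with a file opened in 'rb' mode. Since the list index is the line number, no dictionary is needed.

```python
import bisect
import io


def build_line_index(stream):
    """Return the sorted end offsets of the lines of a binary stream (hypothetical helper).

    ends[i] is the offset just past the last byte of line i (0-based).
    """
    ends = []
    end = 0
    line = b''
    for line in stream:
        end += len(line)
        ends.append(end)
    # Same rule as the answer: a trailing newline means the EOF position
    # is treated as the start of an extra, empty last line.
    if line.endswith(b'\n'):
        ends.append(end + 1)
    return ends


def line_for_position(pos, ends):
    """Return the 0-based line containing offset pos, or None at or past the last end."""
    # bisect_right returns the index of the first end offset strictly greater than pos,
    # and that index is the line number.
    i = bisect.bisect_right(ends, pos)
    return i if i < len(ends) else None


ends = build_line_index(io.BytesIO(b'abc\ndefg\nhi'))
print(ends)                                                     # [4, 9, 11]
print([line_for_position(p, ends) for p in (0, 3, 4, 10, 11)])  # [0, 0, 1, 2, None]
```

The behaviour matches the dictionary version, including None for pos at or beyond the last end position, and an empty stream simply yields None instead of failing on end_positions[-1].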
The same solution can be written with the help of the fileinput module:
import fileinput

GIVE_NO_FOR_END = {}
end = 0
# mode must be passed by keyword: the 2nd positional parameter of input() is `inplace`
for line in fileinput.input(filepath, mode='rb'):
    end += len(line)
    lineno = fileinput.filelineno()   # counts from 1, unlike enumerate() above
    GIVE_NO_FOR_END[end] = lineno
if line.endswith(b'\n'):
    GIVE_NO_FOR_END[end + 1] = lineno + 1
fileinput.close()
end_positions = sorted(GIVE_NO_FOR_END)

def Which_Line_for_Position(pos,
                            dic=GIVE_NO_FOR_END,
                            keys=end_positions,
                            kmax=end_positions[-1]):
    return dic[next(k for k in keys if pos < k)] if pos < kmax else None
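But this solution has some drawbacks:
- It requires importing the fileinput module.
- Written as fileinput.input(filepath,'rb'), it deletes the entire contents of the file! I thought there must be a mistake in my code, but I did not know fileinput well enough to find it. This is in fact the documented behavior for that call: the second positional parameter of fileinput.input() is inplace, not the mode, so 'rb' is a true value that switches on in-place filtering. Standard output is then redirected into the file and, since the loop prints nothing, the file ends up empty. The mode has to be given as a keyword argument: fileinput.input(filepath, mode='rb').
- It seems that the file is first read completely before any iteration starts. If so, for a very large file its size could exceed the available RAM. I am not sure about this: I tried to test it with a 1.5 GB file, but it took very long and I gave up for the time being. If this point is correct, it is an argument for using the other solution, with enumerate().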
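The deleted-file pitfall can be reproduced safely on throwaway files. A minimal sketch, assuming Python 3 and temporary files created with the standard tempfile module (the helper names make_temp_file and read_bytes and the file content are made up for the test): with mode='rb' passed by keyword the file is only read, while the call written with 'rb' in second position empties it.

```python
import fileinput
import os
import tempfile


def make_temp_file(data):
    """Write data to a new temporary file and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as tmp:
        tmp.write(data)
    return path


def read_bytes(path):
    with open(path, 'rb') as f:
        return f.read()


DATA = b'first\nsecond\nthird\n'

# 1) Mode passed by keyword: the file is only read.
path = make_temp_file(DATA)
GIVE_NO_FOR_END = {}
end = 0
with fileinput.input(path, mode='rb') as lines:
    for line in lines:
        end += len(line)
        GIVE_NO_FOR_END[end] = lines.filelineno()   # 1-based line number
safe_content = read_bytes(path)
os.remove(path)

# 2) 'rb' in second position lands in `inplace`: stdout is redirected
#    into the file during the loop and, as nothing is printed, it ends up empty.
path = make_temp_file(DATA)
with fileinput.input(path, 'rb') as lines:
    for line in lines:
        pass
clobbered_content = read_bytes(path)
os.remove(path)

print(GIVE_NO_FOR_END)     # {6: 1, 13: 2, 19: 3}
print(safe_content)        # b'first\nsecond\nthird\n'
print(clobbered_content)   # b''
```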
For example:
text = '''Harold Acton (1904–1994)
Gilbert Adair (born 1944)
Helen Adam (1909–1993)
Arthur Henry Adams (1872–1936)
Robert Adamson (1852–1902)
Fleur Adcock (born 1934)
Joseph Addison (1672–1719)
Mark Akenside (1721–1770)
James Alexander Allan (1889–1956)
Leslie Holdsworthy Allen (1879–1964)
William Allingham (1824/28-1889)
Kingsley Amis (1922–1995)
Ethel Anderson (1883–1958)
Bruce Andrews (born 1948)
Maya Angelou (born 1928)
Rae Armantrout (born 1947)
Simon Armitage (born 1963)
Matthew Arnold (1822–1888)
John Ashbery (born 1927)
Thomas Ashe (1836–1889)
Thea Astley (1925–2004)
Edwin Atherstone (1788–1872)'''
# with open('alao.txt', 'rb') as f:
f = text.splitlines(True)
# the argument True makes splitlines() keep the newlines
GIVE_NO_FOR_END = {}
end = 0
for i, line in enumerate(f):
    end += len(line)
    GIVE_NO_FOR_END[end] = i
if line.endswith('\n'):
    GIVE_NO_FOR_END[end + 1] = i + 1
end_positions = sorted(GIVE_NO_FOR_END)
print('\n'.join('line %-3s ending at position %s' % (GIVE_NO_FOR_END[end], end)
                for end in end_positions))

def Which_Line_for_Position(pos,
                            dic=GIVE_NO_FOR_END,
                            keys=end_positions,
                            kmax=end_positions[-1]):
    return dic[next(k for k in keys if pos < k)] if pos < kmax else None

print()
for x in (2, 450, 320, 104, 105, 599, 600):
    print('pos=%-6s line %s' % (x, Which_Line_for_Position(x)))
Result:
line 0   ending at position 25
line 1   ending at position 51
line 2   ending at position 74
line 3   ending at position 105
line 4   ending at position 132
line 5   ending at position 157
line 6   ending at position 184
line 7   ending at position 210
line 8   ending at position 244
line 9   ending at position 281
line 10  ending at position 314
line 11  ending at position 340
line 12  ending at position 367
line 13  ending at position 393
line 14  ending at position 418
line 15  ending at position 445
line 16  ending at position 472
line 17  ending at position 499
line 18  ending at position 524
line 19  ending at position 548
line 20  ending at position 572
line 21  ending at position 600

pos=2      line 0
pos=450    line 16
pos=320    line 11
pos=104    line 3
pos=105    line 4
pos=599    line 21
pos=600    line None
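Then, with the function Which_Line_for_Position() available, it is easy to get the current line number: just pass f.tell() as the argument to the function.
But a warning: when using f.tell() and moving the file pointer around in the file, it is absolutely necessary that the file be opened in binary mode: 'rb' or 'rb+' or 'ab' or ...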
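A minimal sketch of that usage, assuming Python 3 and a throwaway temporary file, with the answer's index building and lookup repackaged into a hypothetical helper make_which_line(). The file is opened in 'rb' mode, the pointer is moved freely with seek() and readline(), and f.tell() is passed to the lookup each time. In Python 3, tell() on a text-mode file returns an opaque number that is not, in general, a byte offset, and it raises OSError while the file is being iterated with for line in f, which is one more reason to use binary mode here.

```python
import os
import tempfile


def make_which_line(f):
    """Build the answer's end-position index from binary lines (hypothetical helper)."""
    GIVE_NO_FOR_END = {}
    end = 0
    i, line = -1, b''
    for i, line in enumerate(f):
        end += len(line)
        GIVE_NO_FOR_END[end] = i
    if line.endswith(b'\n'):
        GIVE_NO_FOR_END[end + 1] = i + 1
    end_positions = sorted(GIVE_NO_FOR_END)

    def Which_Line_for_Position(pos):
        if not end_positions or pos >= end_positions[-1]:
            return None
        return GIVE_NO_FOR_END[next(k for k in end_positions if pos < k)]

    return Which_Line_for_Position


fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as tmp:
    tmp.write(b'alpha\nbeta\ngamma\ndelta\n')

with open(path, 'rb') as f:            # binary mode: tell() returns byte offsets
    which_line = make_which_line(f)    # reads the whole file once to build the index
    f.seek(13)                         # jump into the middle of line 2 ("gamma")
    here = which_line(f.tell())
    f.readline()                       # finish line 2; pointer is now at the start of line 3
    after = which_line(f.tell())
    f.seek(0)                          # back to the beginning
    start = which_line(f.tell())
os.remove(path)

print(here, after, start)   # 2 3 0
```

The lookup does not care how the pointer got where it is, which is the property stated at the top of the answer.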
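+1, nice simple solution, since it only requires changing the 'open' call. You may want to provide wrappers for any other functions used (e.g. 'close'), but they should be fairly trivial pass-thru functions. – paxdiablo 2011-06-16 04:32:49
Oh, right, 'close' would be handy, I'll add it. – 2011-06-16 04:34:30
Both solutions are great, awesome! – 2011-06-16 04:36:50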