sample = (
    "1234567 12345 123456789",
    "1234567 12345 123456789",
    "1234567 12345 123456789",
    "1234567 12345 123456789",
)
def slices_at(sequence, offsets=((0, 7), (8, 13), (14, 25))):
    # Each (start, end) pair selects one field; the separator columns are skipped.
    for line in sequence:
        yield tuple(line[x:y] for (x, y) in offsets)

result = list(slices_at(sample))
This yields:
>>> result
[('1234567', '12345', '123456789'), ('1234567', '12345', '123456789'), ('1234567', '12345', '123456789'), ('1234567', '12345', '123456789')]
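Rereading your question, I realise that you want to keep the trailing whitespace on the first two fields.
Here is a new function that accepts a list of block lengths instead: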
def slices_by_block_length(sequence, block_lengths=(8, 6, 9)):
    # Convert the block lengths into cumulative (start, end) offsets.
    prev = 0
    offsets = []
    for length in block_lengths:
        offsets.append((prev, prev + length))
        prev += length
    for line in sequence:
        yield tuple(line[x:y] for (x, y) in offsets)
Calling list(slices_by_block_length(sample)) yields:
[('1234567 ', '12345 ', '123456789'), ('1234567 ', '12345 ', '123456789'), ('1234567 ', '12345 ', '123456789'), ('1234567 ', '12345 ', '123456789')]
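The offset bookkeeping above can also be written with `itertools.accumulate`. This is only a minimal sketch, not part of the original answer: the name `slices_by_widths` and the `strip` flag are hypothetical, introduced here for illustration.

```python
from itertools import accumulate

def slices_by_widths(sequence, widths=(8, 6, 9), strip=False):
    """Yield one tuple of fixed-width fields per line (hypothetical helper)."""
    # Running totals of the widths are the end offsets of each field;
    # shifting them right by one and prepending 0 gives the start offsets.
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    for line in sequence:
        fields = (line[x:y] for x, y in zip(starts, ends))
        # Optionally drop the trailing separator whitespace from each field.
        yield tuple(f.rstrip() if strip else f for f in fields)

lines = ["1234567 12345 123456789"]
padded = list(slices_by_widths(lines))
trimmed = list(slices_by_widths(lines, strip=True))
```

With `strip=False` the fields keep their trailing blanks, as in the second function above; with `strip=True` they match the output of the first function.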
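If you have specific block lengths, don't even think about using regular expressions. Slicing is the best solution. – nhahtdh
@nhahtdh: Why? Because it is cleaner, or because it is more efficient? – Caniko
It is more efficient than a regex, and you avoid the subtle pitfalls of regular expressions (i.e. are you sure your assumption is correct? Are you sure the syntax doesn't introduce some hidden assumption?). As for whether it is cleaner, I'll let others comment. – nhahtdh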
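To make the trade-off from the comments concrete, here is a rough sketch that puts a regex next to plain slicing. The pattern and both helper names are assumptions made for illustration; no timing result is claimed, so run it yourself to compare.

```python
import re
import timeit

line = "1234567 12345 123456789"

# Hypothetical pattern: three fixed-width fields of 8, 6 and 9 characters.
# Hidden assumption: "." does not match a newline unless re.DOTALL is set.
pattern = re.compile(r"(.{8})(.{6})(.{9})")

def by_regex(s):
    m = pattern.match(s)
    return m.groups() if m else None

def by_slices(s):
    return (s[0:8], s[8:14], s[14:23])

# On a well-formed line both approaches agree.
same = by_regex(line) == by_slices(line)

# On a line that is too short they behave differently:
# the regex does not match at all, while slicing returns a shorter last field.
short = "1234567 12345 1234"
regex_short = by_regex(short)
slice_short = by_slices(short)

# Timings depend on the machine and Python version, so they are only printed.
print("regex :", timeit.timeit(lambda: by_regex(line), number=10000))
print("slices:", timeit.timeit(lambda: by_slices(line), number=10000))
```

The short-line case shows the kind of hidden assumption mentioned above: the two approaches fail in different ways, and neither raises an error.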