If you want to group the lines into sections, you can use itertools.groupby.
Using empty lines as the delimiter to get the sections:
from itertools import groupby

with open("in.txt") as f:
    # key is True for lines with text, False for blank lines
    for k, sec in groupby(f, key=lambda x: bool(x.strip())):
        if k:
            print(list(sec))
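To try the same idea without a file on disk, here is a minimal sketch that runs on an in-memory string and collects the groups into a list. The helper name `split_on_blank_lines` and the sample text are made up for illustration; they are not part of the original code.

```python
from itertools import groupby

def split_on_blank_lines(lines):
    """Group consecutive non-blank lines; blank lines act as separators."""
    return [
        [line.rstrip("\n") for line in group]
        for has_text, group in groupby(lines, key=lambda x: bool(x.strip()))
        if has_text
    ]

# Hypothetical sample: a title, two blank lines, then two paragraphs.
sample = "TITLE\n\n\nfirst paragraph\n\nsecond paragraph\n"
sections = split_on_blank_lines(sample.splitlines(keepends=True))
print(sections)
# → [['TITLE'], ['first paragraph'], ['second paragraph']]
```

Note that the title comes back as its own group here, because nothing distinguishes it from a one-line paragraph when only blank lines are used as the separator.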
With a bit more itertools foo, we can use the uppercase titles as the delimiter to get the sections:
from itertools import groupby, takewhile

with open("in.txt") as f:
    grps = groupby(f, key=lambda x: x.isupper())
    for k, sec in grps:
        # if we hit a title line
        if k:
            # pull all paragraphs
            v = next(grps)[1]
            # skip two empty lines after title
            next(v, ""), next(v, "")
            # take all lines up to next empty line/second paragraph
            print(list(takewhile(lambda x: bool(x.strip()), v)))
Which would give you:
['There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.\n']
['What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.']
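Each section starts with an all-uppercase title, so once we hit one we know that two empty lines follow, then the first paragraph, and then the pattern repeats.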
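If you would rather get the results back than print them, the same title / two blank lines / first paragraph pattern can be wrapped in a function. This is only a sketch under that layout assumption: the name `first_paragraphs` and the sample text are hypothetical, and the default passed to `next` is an addition so that a title on the very last line does not raise `StopIteration`.

```python
from itertools import groupby, takewhile

def first_paragraphs(lines):
    """Map each uppercase title to the first paragraph that follows it."""
    result = {}
    grps = groupby(lines, key=lambda x: x.isupper())
    for is_title, sec in grps:
        if not is_title:
            continue
        title = next(sec).strip()
        # The next group holds everything up to the following title;
        # fall back to an empty body if the title is the last line.
        _, body = next(grps, (False, iter(())))
        # Skip the two empty lines assumed to follow the title.
        next(body, ""), next(body, "")
        # Take lines up to the next empty line, i.e. the first paragraph.
        para = takewhile(lambda x: bool(x.strip()), body)
        result[title] = " ".join(line.strip() for line in para)
    return result

# Hypothetical sample with the same layout as the text in the question.
text = (
    "FIRST TITLE\n\n\nalpha one\n\nalpha two\n\n\n"
    "SECOND TITLE\n\n\nbeta one\n"
)
result = first_paragraphs(text.splitlines(keepends=True))
print(result)
# → {'FIRST TITLE': 'alpha one', 'SECOND TITLE': 'beta one'}
```

To run it on a file, pass the open file object instead of the list of lines, e.g. `first_paragraphs(f)` inside a `with open("in.txt") as f:` block.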
To break it down using loops:
from itertools import groupby

def parse_sec(bk):
    with open(bk) as f:
        grps = groupby(f, key=lambda x: bool(x.isupper()))
        for k, sec in grps:
            # k is True when the group is a title line
            if k:
                print("First paragraph from section titled :{}".format(next(sec).rstrip()))
                # the next group is the body of this section
                v = next(grps)[1]
                # skip the two empty lines after the title
                next(v, ""), next(v, "")
                # print lines until the first empty line
                for line in v:
                    if not line.strip():
                        break
                    print(line)
For your text:
In [11]: cat -E in.txt
THE LAY OF THE LAND$
$
$
There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.$
$
Of all the kinds of interest attaching to the study of the world's wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.$
$
$
WILD ANIMAL TEMPERAMENT & INDIVIDUALITY$
$
$
What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.
The dollar signs mark the newlines. Output:
In [12]: parse_sec("in.txt")
First paragraph from section titled :THE LAY OF THE LAND
There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.
First paragraph from section titled :WILD ANIMAL TEMPERAMENT & INDIVIDUALITY
What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.
Can you add the actual input and the expected output?