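I want to modify the script below so that it builds paragraphs out of the sentences it generates. In other words, it should concatenate a random number of sentences (say 1-5) before adding a newline. How can I create paragraphs from Markov chain output?
The script works as is, but the output is short sentences separated by newlines. I would like to gather some of those sentences into paragraphs.
Any thoughts on best practice? Thanks.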
"""
from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
"""
import random
import sys
stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",)  # Cause a "new sentence" if found at the end of a word
sentencesep = "\n"  # String used to separate sentences
# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            # Store the word and its trailing punctuation as separate tokens
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)
# GENERATE SENTENCE OUTPUT
maxsentences = 20
w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])
    if newword == stopword: sys.exit()
    if newword in stopsentence:
        print ("%s%s%s" % (" ".join(sentence), newword, sentencesep))
        sentence = []
        sentencecount += 1
    else:
        sentence.append(newword)
    w1, w2 = w2, newword
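Edit 01:
OK, I've cobbled together a simple "paragraph wrapper" that does a good job of gathering sentences into paragraphs, but it messes up the output of the sentence generator. For example, I'm getting excessive repetition of first words, among other issues.
The premise is sound, though; I just need to figure out why the behavior of the sentence loop is affected by the addition of the paragraph loop. Please advise if you can see the problem: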
###
# usage: $ python markov_sentences.py < input.txt > output.txt
# from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
###
import random
import sys
stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",)  # Cause a "new sentence" if found at the end of a word
paragraphsep = "\n\n"  # String used to separate paragraphs
# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)
# GENERATE PARAGRAPH OUTPUT
maxparagraphs = 10
paragraphs = 0  # reset the outer 'while' loop counter to zero
while paragraphs < maxparagraphs:  # start outer loop, until maxparagraphs is reached
    w1 = stopword
    w2 = stopword
    stopsentence = (".", "!", "?",)
    sentence = []
    sentencecount = 0  # reset the inner 'while' loop counter to zero
    maxsentences = random.randrange(1,5)  # random sentences per paragraph
    while sentencecount < maxsentences:  # start inner loop, until maxsentences is reached
        newword = random.choice(table[(w1, w2)])  # random word from word table
        if newword == stopword: sys.exit()
        elif newword in stopsentence:
            print ("%s%s" % (" ".join(sentence), newword), end=" ")
            sentencecount += 1  # increment the sentence counter
        else:
            sentence.append(newword)
        w1, w2 = w2, newword
    print (paragraphsep)  # newline space
    paragraphs = paragraphs + 1  # increment the paragraph counter
# EOF
Edit 02:
Added sentence = [] to the elif clause, as per the answer below. To wit:
elif newword in stopsentence:
    print ("%s%s" % (" ".join(sentence), newword), end=" ")
    sentence = []  # This has to be here so the new sentence starts as an empty list!!!
    sentencecount += 1  # increment the sentence counter
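Edit 03:
This is the final iteration of this script. Thanks to grieve for the help in sorting this out. I hope others can have some fun with this; I know I will. ;)
FYI: there is one small artifact. Each paragraph ends with an extra trailing space, which you may want to clean up if you use this script. Other than that, it's a perfect implementation of Markov chain text generation.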
###
# usage: python markov_sentences.py < input.txt > output.txt
# from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
###
import random
import sys
stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",)  # Cause a "new sentence" if found at the end of a word
sentencesep = "\n"  # String used to separate sentences
# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)
# GENERATE SENTENCE OUTPUT
maxsentences = 20
w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
paragraphsep = "\n"
count = random.randrange(1,5)  # 1 to 4 sentences: randrange excludes the upper bound
while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])  # random word from word table
    if newword == stopword: sys.exit()
    if newword in stopsentence:
        print ("%s%s" % (" ".join(sentence), newword), end=" ")
        sentence = []
        sentencecount += 1  # increment the sentence counter
        count -= 1
        if count == 0:  # paragraph is full: pick the next paragraph length
            count = random.randrange(1,5)
            print (paragraphsep)  # newline space
    else:
        sentence.append(newword)
    w1, w2 = w2, newword
# EOF
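The trailing space at the end of each paragraph comes from printing every sentence with end=" ". One possible way around it, shown here only as a sketch, is to collect the finished sentences in a list first and join them afterwards. The helper name group_into_paragraphs and its min_len, max_len and rng parameters are made up for this illustration and are not part of the recipe; it uses randint, which includes the upper bound, so the default range really is 1-5.

```python
import random


def group_into_paragraphs(sentences, min_len=1, max_len=5, rng=random):
    """Split a list of sentence strings into paragraphs of random length.

    Hypothetical helper, not part of the original recipe. Each paragraph
    holds between min_len and max_len sentences (both inclusive), except
    possibly the last one, which takes whatever sentences are left.
    """
    paragraphs = []
    i = 0
    while i < len(sentences):
        size = rng.randint(min_len, max_len)  # randint includes max_len
        # " ".join puts spaces only between sentences, never after the last
        paragraphs.append(" ".join(sentences[i:i + size]))
        i += size
    return paragraphs


if __name__ == "__main__":
    demo = ["One.", "Two!", "Three?", "Four.", "Five.", "Six."]
    # A blank line between paragraphs, no stray space at paragraph ends
    print("\n\n".join(group_into_paragraphs(demo)))
```

To use this with the script above, the sentence loop would append each finished sentence to a list instead of printing it, and the list would be passed to the helper once generation stops.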
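Oops! Yes, I must have pulled that out at some point and forgotten to put it back. Thanks for the insight! That almost worked. It seems the sentence loop reuses the same starting word for each sentence. Any ideas on how to mix up the first word chosen for sentence generation? –
I added a standalone solution that doesn't need the outer loop. – grieve
I don't currently have Python 3 installed, so you may need to adjust the syntax of the second solution. – grieve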
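On the repeated first word: in the Edit 01 script, w1 and w2 are reset to stopword at the top of every paragraph, and the table entry for (stopword, stopword) only ever contains the first word of the input, so every paragraph begins the same way. The final script avoids this by never resetting the state. If a fresh start per paragraph is still wanted, one option is to jump to a random state whose second element is a sentence terminator. This is a sketch under that assumption, not something proposed in the thread; build_table and random_sentence_start are hypothetical names.

```python
import random

STOPWORD = "\n"
STOP_SENTENCE = (".", "!", "?")


def build_table(text):
    """Build the same (w1, w2) -> [next words] table as the script above."""
    w1 = w2 = STOPWORD
    table = {}
    for word in text.split():
        if word[-1] in STOP_SENTENCE:
            table.setdefault((w1, w2), []).append(word[:-1])
            w1, w2 = w2, word[:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
    table.setdefault((w1, w2), []).append(STOPWORD)  # mark the end of input
    return table


def random_sentence_start(table, rng=random):
    """Return a random state that sits right after a sentence terminator.

    States whose only successor is the end-of-input marker are skipped;
    if nothing qualifies, fall back to the initial (STOPWORD, STOPWORD).
    """
    starts = [
        key for key, successors in table.items()
        if key[1] in STOP_SENTENCE and any(w != STOPWORD for w in successors)
    ]
    return rng.choice(starts) if starts else (STOPWORD, STOPWORD)


if __name__ == "__main__":
    table = build_table("The cat sat. A dog ran! The cat ran. A dog sat?")
    w1, w2 = random_sentence_start(table)
    # The next word drawn from this state begins a new sentence
    print(random.choice([w for w in table[(w1, w2)] if w != STOPWORD]))
```

Assigning the returned pair to w1 and w2 at the start of each paragraph, instead of stopword, would let different paragraphs open with different words, as long as the input contains more than one sentence.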