How to create paragraphs from Markov chain output?

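I'd like to modify the script below so that it builds paragraphs out of a random number of the sentences it generates. In other words, join a random number (say, 1-5) of sentences together before adding a newline.

The script works as is, but the output is short sentences separated by newlines. I'd like to gather the sentences into paragraphs.

Any ideas on best practices? Thanks.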

"""
    from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
"""

import random
import sys

stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?")  # Cause a "new sentence" if found at the end of a word
sentencesep = "\n"  # String used to separate sentences


# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)

# GENERATE SENTENCE OUTPUT
maxsentences = 20

w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []

while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])
    if newword == stopword:
        sys.exit()
    if newword in stopsentence:
        print("%s%s%s" % (" ".join(sentence), newword, sentencesep))
        sentence = []
        sentencecount += 1
    else:
        sentence.append(newword)
    w1, w2 = w2, newword
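To make it easier to see what the table-building loop above actually stores, here is the same logic wrapped in a function. This is a sketch: the function name build_table and the string input are illustrative assumptions (the original reads from sys.stdin). Sentence-ending punctuation is split off into its own token, and each pair of preceding tokens maps to the list of tokens that followed it.

```python
STOPWORD = "\n"  # never produced by str.split(), so it is safe as a sentinel
STOPSENTENCE = (".", "!", "?")


def build_table(text):
    """Build the (w1, w2) -> [next tokens] table the same way the recipe does."""
    w1 = w2 = STOPWORD
    table = {}
    for word in text.split():
        if word[-1] in STOPSENTENCE:
            # Record the word without its punctuation first...
            table.setdefault((w1, w2), []).append(word[:-1])
            w1, w2 = w2, word[:-1]
            # ...then treat the punctuation mark as a token of its own.
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
    # Mark the end of the input.
    table.setdefault((w1, w2), []).append(STOPWORD)
    return table


table = build_table("the cat sat. the dog ran.")
print(table[(STOPWORD, STOPWORD)])  # ['the']
print(table[("cat", "sat")])        # ['.']
print(table[("ran", ".")])          # ['\n']
```

Note that the starting key (STOPWORD, STOPWORD) only ever maps to the first word of the input, so every walk that starts from that key begins with the same word.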

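Edit 01:

OK, I've cobbled together a simple "paragraph wrapper" that does a nice job of gathering sentences into paragraphs, but it messes up the output of the sentence generator: for example, the first word keeps repeating, among other problems.

But the premise is sound; I just need to figure out why the sentence loop's behavior is affected by the paragraph loop. Please let me know if you can see the problem: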

###
# usage: $ python markov_sentences.py <input.txt> output.txt
# from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
###

import random
import sys

stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?")  # Cause a "new sentence" if found at the end of a word
paragraphsep = "\n\n"  # String used to separate paragraphs


# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)

# GENERATE PARAGRAPH OUTPUT
maxparagraphs = 10
paragraphs = 0  # reset the outer 'while' loop counter to zero

while paragraphs < maxparagraphs:  # start outer loop, until maxparagraphs is reached
    w1 = stopword
    w2 = stopword
    stopsentence = (".", "!", "?")
    sentence = []
    sentencecount = 0  # reset the inner 'while' loop counter to zero
    maxsentences = random.randrange(1, 5)  # random sentences per paragraph

    while sentencecount < maxsentences:  # start inner loop, until maxsentences is reached
        newword = random.choice(table[(w1, w2)])  # random word from word table
        if newword == stopword:
            sys.exit()
        elif newword in stopsentence:
            print("%s%s" % (" ".join(sentence), newword), end=" ")
            sentencecount += 1  # increment the sentence counter
        else:
            sentence.append(newword)
        w1, w2 = w2, newword
    print(paragraphsep)  # newline space
    paragraphs = paragraphs + 1  # increment the paragraph counter


# EOF
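To see why a missing reset of the sentence list matters, here is a stripped-down sketch of the inner loop's sentence handling. The tokens are made up and the reset flag is a hypothetical parameter added only for the demonstration: if the word list is not emptied after each sentence is printed, every later sentence repeats all the words that came before it, which shows up as the same opening words over and over.

```python
def emit_sentences(tokens, reset):
    """Group tokens into sentences; optionally clear the word list after each one."""
    out = []
    sentence = []
    for tok in tokens:
        if tok in (".", "!", "?"):
            out.append(" ".join(sentence) + tok)
            if reset:
                sentence = []  # start the next sentence from an empty list
        else:
            sentence.append(tok)
    return out


tokens = ["a", "b", ".", "c", "d", "."]
print(emit_sentences(tokens, reset=False))  # ['a b.', 'a b c d.']
print(emit_sentences(tokens, reset=True))   # ['a b.', 'c d.']
```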

Edit 02:

Added sentence = [] to the elif statement, per the answer below. To wit:

elif newword in stopsentence:
    print("%s%s" % (" ".join(sentence), newword), end=" ")
    sentence = []  # reset here so each new sentence starts as an empty list
    sentencecount += 1  # increment the sentence counter

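Edit 03:

This is the final iteration of the script. Thanks to grieve for the help in sorting this out. I hope others can have some fun with it; I know I will. ;)

FYI: there is one small artifact, an extra space at the end of each paragraph, which you may want to clean up if you use this script. Other than that, it's a perfect implementation of Markov chain text generation.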

###
# usage: python markov_sentences.py <input.txt> output.txt
# from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
###

import random
import sys

stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?")  # Cause a "new sentence" if found at the end of a word
sentencesep = "\n"  # String used to separate sentences


# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)

# GENERATE SENTENCE OUTPUT
maxsentences = 20

w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
paragraphsep = "\n"
count = random.randrange(1, 5)  # sentences left in the current paragraph (1 to 4)

while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])  # random word from word table
    if newword == stopword:
        sys.exit()
    if newword in stopsentence:
        print("%s%s" % (" ".join(sentence), newword), end=" ")
        sentence = []
        sentencecount += 1  # increment the sentence counter
        count -= 1
        if count == 0:
            count = random.randrange(1, 5)
            print(paragraphsep)  # newline space
    else:
        sentence.append(newword)
    w1, w2 = w2, newword


# EOF
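One way to avoid the extra space at the end of each paragraph mentioned in the note above is to buffer a paragraph's sentences in a list and join them with " " only once the paragraph is complete, instead of printing each sentence with end=" ". The sketch below assumes the sentences have already been generated as strings (the sentences list is placeholder data, and format_paragraphs is a hypothetical helper name); it keeps the script's random.randrange(1, 5) paragraph length.

```python
import random


def format_paragraphs(sentences, rng, paragraphsep="\n\n"):
    """Join sentences into paragraphs without a trailing space on any of them."""
    paragraphs = []
    current = []                  # sentences of the paragraph being built
    count = rng.randrange(1, 5)   # 1 to 4 sentences, as in the script above
    for s in sentences:
        current.append(s)
        count -= 1
        if count == 0:
            paragraphs.append(" ".join(current))
            current = []
            count = rng.randrange(1, 5)
    if current:                   # flush a final, partially filled paragraph
        paragraphs.append(" ".join(current))
    return paragraphsep.join(paragraphs)


sentences = ["One.", "Two!", "Three?", "Four.", "Five.", "Six."]
result = format_paragraphs(sentences, random.Random(0))
print(result)
```

Because " ".join only puts spaces between sentences, nothing dangles at the end of a paragraph, and a last paragraph that is shorter than its randomly chosen length is still printed.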

2012-10-20 Spider M. Mann

You need to copy

sentence = []

back into the

elif newword in stopsentence:

clause.

Like so:

while paragraphs < maxparagraphs:  # start outer loop, until maxparagraphs is reached
    w1 = stopword
    w2 = stopword
    stopsentence = (".", "!", "?")
    sentence = []
    sentencecount = 0  # reset the inner 'while' loop counter to zero
    maxsentences = random.randrange(1, 5)  # random sentences per paragraph

    while sentencecount < maxsentences:  # start inner loop, until maxsentences is reached
        newword = random.choice(table[(w1, w2)])  # random word from word table
        if newword == stopword:
            sys.exit()
        elif newword in stopsentence:
            print("%s%s" % (" ".join(sentence), newword), end=" ")
            sentence = []  # reset here so each new sentence starts as an empty list
            sentencecount += 1  # increment the sentence counter
        else:
            sentence.append(newword)
        w1, w2 = w2, newword
    print(paragraphsep)  # newline space
    paragraphs = paragraphs + 1  # increment the paragraph counter

Edit

Here is a solution that doesn't use an outer loop.

"""
    from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
"""

import random
import sys

stopword = "\n"  # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?")  # Cause a "new sentence" if found at the end of a word
sentencesep = "\n"  # String used to separate sentences


# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault((w1, w2), []).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault((w1, w2), []).append(stopword)

# GENERATE SENTENCE OUTPUT
maxsentences = 20

w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
paragraphsep = "\n\n"  # String used to separate paragraphs
count = random.randrange(1, 5)  # sentences left in the current paragraph

while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])
    if newword == stopword:
        sys.exit()
    if newword in stopsentence:
        print("%s%s" % (" ".join(sentence), newword), end=" ")
        sentence = []
        sentencecount += 1
        count -= 1
        if count == 0:  # paragraph complete: pick a new length and break the line
            count = random.randrange(1, 5)
            print(paragraphsep)
    else:
        sentence.append(newword)
    w1, w2 = w2, newword
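A note on paragraph length: random.randrange(1, 5) excludes its upper bound, so the scripts in this thread produce 1 to 4 sentences per paragraph, not the 1 to 5 mentioned in the question. random.randint(1, 5) includes both ends. A quick check with a seeded generator:

```python
import random

rng = random.Random(42)  # seeded so the check is reproducible
lengths_randrange = {rng.randrange(1, 5) for _ in range(1000)}
lengths_randint = {rng.randint(1, 5) for _ in range(1000)}
print(sorted(lengths_randrange))  # [1, 2, 3, 4]
print(sorted(lengths_randint))    # [1, 2, 3, 4, 5]
```

So to get 1 to 5 sentences per paragraph, the random.randrange(1,5) calls would become random.randint(1, 5).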

2012-10-23 20:36:30 grieve

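Oops! Yes, I must have pulled that out at some point and forgotten to put it back. Thanks for the insight! That almost did it. It seems the sentence loop reuses the same starting word for every sentence. Any ideas on how to mix up the first word it picks for sentence generation?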
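The repeated opening word comes from resetting w1 and w2 to stopword at the top of every paragraph in the outer-loop version: the key (stopword, stopword) only maps to the very first word of the input file, so every paragraph starts the same way. One possible fix, not taken from the thread, is to start each paragraph from a randomly chosen state whose second word is sentence-ending punctuation, i.e. a point where the source text itself began a new sentence. The helper name random_sentence_start and the small hand-built table are illustrative assumptions; the table uses the same (w1, w2) -> [next words] format as the scripts above.

```python
import random

STOPWORD = "\n"
STOPSENTENCE = (".", "!", "?")


def random_sentence_start(table, rng=random):
    """Pick a (w1, w2) state at which the source text began a new sentence."""
    starts = [
        key for key, followers in table.items()
        # skip states whose only continuation is the end-of-file marker
        if key[1] in STOPSENTENCE and any(w != STOPWORD for w in followers)
    ]
    if not starts:
        # No usable sentence boundary: fall back to the start of the input.
        return (STOPWORD, STOPWORD)
    return rng.choice(starts)


table = {
    (STOPWORD, STOPWORD): ["The"],
    (STOPWORD, "The"): ["cat"],
    ("The", "cat"): ["."],
    ("cat", "."): ["A"],
    (".", "A"): ["dog"],
    ("A", "dog"): ["!"],
    ("dog", "!"): ["The"],
    ("!", "The"): ["end"],
    ("The", "end"): ["."],
    ("end", "."): [STOPWORD],
}
w1, w2 = random_sentence_start(table, random.Random(1))
print(w1, w2, table[(w1, w2)])
```

In the paragraph loop, w1 = stopword / w2 = stopword would then become w1, w2 = random_sentence_start(table). Alternatively, simply not resetting w1 and w2 between paragraphs (as the single-loop version does) lets each paragraph continue from wherever the chain left off.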

I added a standalone solution that doesn't need the outer loop. – grieve

I don't have Python 3 installed at the moment, so you may need to adjust the syntax of the second solution. – grieve

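Do you understand this code? I bet you can find the bit that prints the sentence and change it to print several sentences together without a newline. You can then add another while loop around the sentence bit to get multiple paragraphs.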

Syntax hint (Python 2):

print 'hello'
print 'there'
# output:
# hello
# there

print 'hello',
print 'there'
# output:
# hello there

print 'hello',
print
print 'there'
# output:
# hello
# there

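The point is that a comma at the end of a print statement suppresses the newline at the end of the line, and a bare print statement prints a newline.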
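The hint above uses Python 2 print statements, while the scripts in the question use Python 3 (print(..., end=" ")). In Python 3 the same effects come from the end keyword argument and from a bare print() call. The io.StringIO / contextlib.redirect_stdout capture is only there to show the exact characters written:

```python
import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print("hello", end=" ")  # no newline; roughly what Python 2's trailing comma does
    print("there")           # ends the line
    print("hello", end="")
    print()                  # a bare print() just writes a newline
    print("there")
output = buf.getvalue()
print(repr(output))  # 'hello there\nhello\nthere\n'
```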

2012-10-21 03:02:44 Thomas

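Yes, I follow. The trouble is that nothing I've tried with the 'print' statement helps gather the sentences into paragraphs (other than removing all the newlines, which makes one big paragraph). A 'while' loop is what I had in mind, but I'm not quite sure how to wrap the sentence part. Everything I've tried leads to various errors, so I thought I'd ask the experts. What's the best way to tell it to "generate x (e.g. 1-5) sentences, then insert a newline, then repeat until maxsentences is reached"?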
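One way to express "generate x sentences, insert a newline, repeat until maxsentences" is to turn the sentence-generating loop into a Python generator and pull a random-sized batch from it for each paragraph with itertools.islice. This is a sketch, not code from the thread: generate_sentences and make_paragraphs are hypothetical names, the small cyclic table is made-up test data in the same (w1, w2) -> [next words] format as the scripts above, and random.randint(1, 5) is used to get 1 to 5 sentences per paragraph.

```python
import itertools
import random

STOPWORD = "\n"
STOPSENTENCE = (".", "!", "?")


def generate_sentences(table, rng):
    """Yield sentences by walking the (w1, w2) -> [next words] Markov table."""
    w1 = w2 = STOPWORD
    sentence = []
    while True:
        newword = rng.choice(table[(w1, w2)])
        if newword == STOPWORD:
            return                      # reached the end-of-input marker
        if newword in STOPSENTENCE:
            yield " ".join(sentence) + newword
            sentence = []               # start the next sentence fresh
        else:
            sentence.append(newword)
        w1, w2 = w2, newword


def make_paragraphs(table, rng, maxsentences=20):
    """Group at most maxsentences sentences into paragraphs of 1 to 5 sentences."""
    sentences = itertools.islice(generate_sentences(table, rng), maxsentences)
    paragraphs = []
    while True:
        # Each islice call consumes the next batch from the shared iterator.
        batch = list(itertools.islice(sentences, rng.randint(1, 5)))
        if not batch:
            break
        paragraphs.append(" ".join(batch))
    return "\n\n".join(paragraphs)


table = {
    (STOPWORD, STOPWORD): ["The"],
    (STOPWORD, "The"): ["cat"],
    ("The", "cat"): ["sat", "ran"],
    ("cat", "sat"): ["."],
    ("cat", "ran"): ["!"],
    ("sat", "."): ["The"],
    ("ran", "!"): ["The"],
    (".", "The"): ["cat"],
    ("!", "The"): ["cat"],
}
text = make_paragraphs(table, random.Random(0), maxsentences=8)
print(text)
```

Keeping sentence generation separate from paragraph layout means the Markov walk never has to know about paragraphs, and joining each batch with " " avoids a trailing space at the end of a paragraph.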
