Finding syllables with Python

Possible duplicate:
Detecting syllables in a word

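Just for kicks (and to brush up on my Python), I'm trying to create an algorithm that randomly generates a haiku (a Japanese poem made up of three lines of 5, 7, and 5 syllables respectively).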
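To make the 5-7-5 goal concrete, here is a minimal sketch of the generation step. It assumes a dictionary mapping each word to its syllable count already exists (building that table is exactly what this question is about). The names `random_line` and `random_haiku` are hypothetical, and picking words uniformly at random is just one simple strategy; it makes no attempt at grammar or meaning.

```python
import random


def random_line(syllables, target, rng=random):
    """Pick random words until their syllable counts sum exactly to target.

    `syllables` is an assumed word -> syllable-count dictionary.
    """
    # Keep only words that could fit in the line at all
    words = [w for w, n in syllables.items() if 0 < n <= target]
    line, remaining = [], target
    while remaining > 0:
        # Only words that still fit in the remaining syllable budget
        candidates = [w for w in words if syllables[w] <= remaining]
        if not candidates:
            # Happens e.g. when the table has no one-syllable words
            raise ValueError("no word fits the remaining %d syllables" % remaining)
        word = rng.choice(candidates)
        line.append(word)
        remaining -= syllables[word]
    return " ".join(line)


def random_haiku(syllables, pattern=(5, 7, 5), rng=random):
    """Return one line per entry in pattern (5, 7, 5 for a haiku)."""
    return [random_line(syllables, n, rng) for n in pattern]
```

Passing a seeded `random.Random(...)` as `rng` makes the output reproducible, which is handy for testing.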

The problem I've run into is finding the number of syllables in a word (I'm using en-US.dic from Ubuntu).

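Currently, I have a script running that tries to grab the number reported by this web site, but it's slow and doesn't produce many hits. This one looks more promising, but I don't know how to use Python to insert words into its text box.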

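My question is twofold:

Is there an algorithmic way to determine the number of syllables in a word (and thus avoid making thousands of web requests)?
Can I use Python to inject words into WordCalc?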


2012-05-02 SomeKittens

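Download the Moby Hyphenated Word List. It has most English words and names hyphenated by syllable. The number of syllables will be the number of hyphenation markers + the number of spaces + the number of actual hyphens + 1.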
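A minimal sketch of that counting rule in Python. It assumes the list is a plain-text file with one entry per line and that the syllable marker is the `¥` character (byte 0xA5 when the file is decoded as Latin-1); check the marker against your copy of the file. The function names and the `mhyph.txt` filename are illustrative assumptions.

```python
# Assumed syllable marker: byte 0xA5 decoded as Latin-1 ("¥"); verify for your copy
MARKER = "\u00a5"


def count_syllables(entry, marker=MARKER):
    """Syllables = markers + spaces + real hyphens + 1, per the rule above."""
    entry = entry.strip()
    return entry.count(marker) + entry.count(" ") + entry.count("-") + 1


def load_syllable_table(lines, marker=MARKER):
    """Build a word -> syllable-count dict from Moby-style hyphenated lines."""
    table = {}
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        # Strip the markers to recover the plain word (spaces/hyphens are kept)
        word = entry.replace(marker, "").lower()
        table[word] = count_syllables(entry, marker)
    return table


# Example usage (hypothetical file name):
# with open("mhyph.txt", encoding="latin-1") as f:
#     table = load_syllable_table(f)
```

Building the dictionary once and doing lookups afterwards avoids any network requests; words missing from the list still need a fallback.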


2012-05-02 14:40:23

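For the second part, if you use Chrome, right-click the "Calculate Word Count" button and choose "Inspect Element". You'll see that it POSTs a form to /index.php with a few relevant pieces: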

name="text" 
name="optionSyllableCount" 
name="optionWordCount"

(The latter two are checkbox inputs, which usually need a value to POST.)

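# Python 3 version of the request (urllib.urlencode/urllib.urlopen are Python 2 only)
from urllib.parse import urlencode
from urllib.request import urlopen

url = 'http://www.wordcalc.com/index.php'
# The text box field holding the word(s) to analyse
post_data = urlencode({'text': 'virgina'})
# Append the two checkbox options as bare flags
post_data = '%s&optionSyllableCount&optionWordCount' % post_data

# In Python 3 the POST body must be bytes; the with-block closes the connection
with urlopen(url, post_data.encode('ascii')) as cnxn:
    response = cnxn.read().decode('utf-8', errors='replace')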

If you want to parse the response, you can:

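# Uses Beautiful Soup 4 (package "bs4"); the old "BeautifulSoup" module is BS3
from bs4 import BeautifulSoup

soup = BeautifulSoup(response, 'html.parser')

# Locate the single <h3>Statistics</h3> heading
h3_matches = [h3 for h3 in soup.find_all('h3') if h3.get_text(strip=True) == 'Statistics']
if len(h3_matches) != 1:
    raise ValueError('Wrong number of <h3>Statistics</h3>')
# The statistics table follows the heading
table = h3_matches[0].find_next_sibling('table')

# Find the "Syllable Count" label cell
td_matches = [td for td in table.find_all('td')
              if td.get_text(strip=True) == 'Syllable Count']
if len(td_matches) != 1:
    raise ValueError('Wrong number of <td>Syllable Count</td>')

# The value sits in the next <td> cell on the same row
td_value = td_matches[0].find_next_sibling('td')
syllable_count = int(td_value.get_text(strip=True))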


2012-05-02 14:20:42 bossylobster

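Great response, and timely. I accepted the other one because it's simpler (and doesn't require an internet connection). However, I may end up implementing this one too, so I learn how to do it for next time. – SomeKittens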
