2017-04-10 18 views

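How can I build a text-to-speech system given an HMM model for each phone?

I am trying to create a text-to-speech system for a language (Kannada) using HTK (the Hidden Markov Model Toolkit). I am following the tutorial at Voxforge.org to generate an HMM for each phoneme of Kannada. I trained on my data and, after 9 successive re-estimations, ended up with a set of HMMs. The first 100 lines of this file (hmm9/hmmdefs) are shown below: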

~o 
<STREAMINFO> 1 25 
<VECSIZE> 25<NULLD><MFCC_D_N_Z_0><DIAGC> 
~s "silst" 
<MEAN> 25 
-8.981787e+00 9.576690e+00 -5.363592e+00 7.546162e+00 -2.035893e+00 1.368924e+01 -1.560227e+00 1.069209e+01 -1.187764e+00 7.615524e+00 -2.514401e+00 1.025364e+01 2.944104e+00 2.461911e-01 1.115181e+00 -3.759977e-02 7.252287e-01 1.149914e-01 8.399552e-01 3.023236e-01 2.565392e-01 -1.392404e-01 6.415843e-02 -1.413524e-03 -2.068892e+00 
<VARIANCE> 25 
7.971278e+01 2.110036e+01 1.939896e+01 1.365947e+01 2.379106e+01 2.825374e+01 2.110220e+01 2.366302e+01 1.793456e+01 1.325843e+01 1.291668e+01 1.298042e+01 9.060905e+00 2.023319e+00 2.965916e+00 2.247055e+00 3.701807e+00 5.488801e+00 2.863564e+00 7.988983e+00 3.203042e+00 3.015911e+00 1.897855e+00 2.752292e+00 4.656968e+01 
<GCONST> 1.009808e+02 
~h "sp" 
<BEGINHMM> 
<NUMSTATES> 3 
<STATE> 2 
~s "silst" 
<TRANSP> 3 
0.000000e+00 6.046680e-01 3.953321e-01 
0.000000e+00 7.563286e-01 2.436714e-01 
0.000000e+00 0.000000e+00 0.000000e+00 
<ENDHMM> 
~h "\340\262\202" 
<BEGINHMM> 
<NUMSTATES> 5 
<STATE> 2 
<MEAN> 25 
6.015578e+00 -1.092545e+00 5.592138e+00 2.388550e+00 -5.738769e+00 -1.192297e+01 3.629476e-01 -9.733001e+00 -9.800635e+00 -3.803736e+00 -1.282426e+00 -4.283876e+00 6.507627e-01 9.431242e-01 7.103178e-01 3.613995e-01 -6.547348e-01 3.122466e-01 4.086074e-01 1.523578e-01 -1.648303e+00 6.070523e-01 -8.467742e-01 7.640757e-01 -9.809423e-01 
<VARIANCE> 25 
7.796819e+00 1.998214e+01 2.138531e+01 2.325875e+01 3.568517e+01 4.934142e+01 4.020288e+01 4.395386e+01 5.728282e+01 4.333259e+01 4.898003e+01 5.033849e+01 4.539665e-01 1.352478e+00 2.180626e+00 2.384371e+00 2.703094e+00 2.922996e+00 3.646270e+00 4.031106e+00 6.244075e+00 3.810220e+00 4.399445e+00 4.742214e+00 4.219064e-01 
<GCONST> 9.904237e+01 
<STATE> 3 
<MEAN> 25 
8.230138e+00 3.391990e+00 7.918058e+00 7.893150e-01 -4.886593e+00 -7.784183e+00 -1.977627e+00 -5.161526e+00 -1.425691e+01 2.598373e-01 -6.913644e+00 -4.903273e-01 -5.491136e-01 1.191998e+00 5.495291e-01 -5.040157e-01 1.796028e+00 1.397739e+00 -3.372138e-01 1.118834e+00 8.360423e-01 1.942233e-01 -4.129717e-01 4.928542e-01 -1.360264e+00 
<VARIANCE> 25 
7.438219e+00 2.445082e+01 2.108792e+01 2.080677e+01 4.452454e+01 4.637808e+01 4.500817e+01 5.165137e+01 5.889799e+01 4.406474e+01 5.085579e+01 4.713411e+01 1.122957e+00 1.781516e+00 1.638479e+00 2.044909e+00 4.016158e+00 4.268800e+00 4.197068e+00 4.704672e+00 5.295690e+00 4.290681e+00 4.821241e+00 4.186268e+00 1.247262e+00 
<GCONST> 1.023382e+02 
<STATE> 4 
<MEAN> 25 
1.485957e+00 6.803011e+00 9.217879e+00 7.743183e-01 2.929435e+00 -2.796502e+00 -4.663482e-01 -2.663082e+00 -5.646381e+00 -1.620271e+00 -4.276723e+00 9.053932e-01 -3.141532e+00 -1.006466e+00 -4.909436e-01 -7.754492e-01 2.238380e+00 5.363849e-01 8.221800e-01 -3.755744e-01 4.192139e+00 -8.950262e-01 1.567237e+00 -1.683249e-02 8.296179e-01 
<VARIANCE> 25 
3.641056e+01 3.190582e+01 2.878690e+01 2.753709e+01 4.602863e+01 4.876221e+01 4.515038e+01 5.164213e+01 6.707198e+01 4.130317e+01 4.181741e+01 4.647356e+01 2.847422e+00 5.915011e+00 4.902977e+00 5.104756e+00 6.492997e+00 6.636117e+00 6.856898e+00 6.270217e+00 6.925057e+00 5.473263e+00 5.736817e+00 5.553099e+00 9.164440e+00 
<GCONST> 1.135293e+02 
<TRANSP> 5 
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 
0.000000e+00 8.058720e-01 1.941280e-01 0.000000e+00 0.000000e+00 
0.000000e+00 0.000000e+00 6.787453e-01 3.212548e-01 0.000000e+00 
0.000000e+00 0.000000e+00 0.000000e+00 6.043503e-01 3.956497e-01 
0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 
<ENDHMM> 
~h "\340\262\203" 
<BEGINHMM> 
<NUMSTATES> 5 
<STATE> 2 
<MEAN> 25 
5.805896e-01 -1.075882e+01 -2.814176e+00 -5.880020e+00 9.052986e+00 -5.357531e+00 5.210671e-01 -3.864521e+00 -3.081197e+00 -3.748333e+00 -4.464721e+00 -9.172723e+00 -1.705454e-01 -2.218179e-01 -1.914008e-01 -1.184491e-01 8.222175e-02 7.313553e-02 3.581692e-01 2.845684e-01 1.727469e-01 5.520494e-01 2.320103e-01 -2.316576e-02 -2.695204e-01 
<VARIANCE> 25 
2.977049e+01 1.933836e+01 7.313203e+01 2.030999e+01 6.759410e+01 3.599974e+01 3.177183e+01 4.166393e+01 4.396541e+01 5.064403e+01 3.757855e+01 3.437418e+01 2.450529e+00 2.951347e+00 5.625500e+00 1.981921e+00 7.028037e+00 4.438006e+00 2.735678e+00 5.700938e+00 6.298313e+00 4.537711e+00 3.482989e+00 2.986932e+00 7.007732e-01 
<GCONST> 1.053795e+02 
<STATE> 3 
<MEAN> 25 
3.327909e-01 -7.221850e+00 -7.173618e+00 -5.016214e+00 5.876769e+00 -3.370543e+00 1.606741e+00 -1.376615e+00 1.960738e+00 1.589902e+00 1.996540e+00 -5.060460e+00 3.495924e-01 1.866842e+00 5.888807e-01 9.426065e-01 -9.187853e-01 8.452010e-01 -7.540780e-02 2.513928e-02 -3.550608e-01 4.580146e-01 -3.656684e-01 1.935040e+00 -2.721846e+00 
<VARIANCE> 25 
2.727732e+01 2.661234e+01 4.245024e+01 2.600391e+01 4.045695e+01 6.376625e+01 1.882898e+01 6.454602e+01 4.445516e+01 5.115188e+01 2.588016e+01 6.268848e+01 1.138753e+00 3.187240e+00 6.774692e+00 3.562576e+00 2.859416e+00 2.514062e+00 2.626426e+00 4.330146e+00 2.819590e+00 3.446536e+00 4.056371e+00 3.272135e+00 1.644712e+00 
<GCONST> 1.038539e+02 
<STATE> 4 
<MEAN> 25 
-1.948545e+00 5.389034e+00 7.606988e-01 2.563978e+00 3.025390e-01 -2.773341e+00 1.604488e+00 3.484166e-02 -1.042359e+00 -9.634234e-01 -2.470519e-01 9.371792e-01 -2.833774e+00 1.188266e+00 8.226773e-01 6.984392e-01 5.602998e-01 1.270509e+00 2.334537e-01 5.804406e-01 6.506713e-01 -2.372813e-01 1.170728e+00 8.938893e-01 -2.155946e+00 
<VARIANCE> 25 
4.566143e+01 5.729873e+01 3.032546e+01 4.105791e+01 3.591120e+01 1.047685e+02 3.953066e+01 5.429741e+01 4.366869e+01 3.534993e+01 2.732262e+01 5.425205e+01 5.152478e+00 1.152582e+01 3.550126e+00 5.687152e+00 1.209841e+01 1.279304e+01 4.226732e+00 9.372541e+00 4.770387e+00 7.745139e+00 5.311399e+00 6.597965e+00 2.797117e+01 
<GCONST> 1.177988e+02 
<TRANSP> 5 
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 
0.000000e+00 9.086774e-01 9.132260e-02 0.000000e+00 0.000000e+00 
0.000000e+00 0.000000e+00 7.875688e-01 2.124312e-01 0.000000e+00 
0.000000e+00 0.000000e+00 0.000000e+00 7.366875e-01 2.633125e-01 
0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 
<ENDHMM> 
~h "\340\262\205" 
<BEGINHMM> 
<NUMSTATES> 5 
<STATE> 2 
<MEAN> 25 
-1.364586e-01 -1.345779e+01 -1.243845e+01 -6.815464e+00 9.149407e+00 -5.414480e+00 8.044611e+00 -4.851705e+00 3.946866e+00 -4.012804e+00 -3.772848e+00 -7.529723e+00 1.129064e-02 -4.110667e-01 7.924367e-01 -6.968765e-01 -1.525430e+00 -9.059939e-01 -1.230243e+00 7.858636e-01 4.188629e-01 -1.171953e+00 1.233250e-01 -1.297955e+00 1.128136e+00 
<VARIANCE> 25 
1.083044e+01 1.837658e+01 2.447788e+01 2.060703e+01 3.049649e+01 5.438379e+01 3.610032e+01 5.379135e+01 3.230940e+01 3.523213e+01 3.347173e+01 3.463765e+01 1.293990e+00 4.274662e+00 2.051608e+00 2.855648e+00 2.757517e+00 5.047538e+00 4.031400e+00 6.269907e+00 3.726271e+00 4.057119e+00 3.291831e+00 3.767739e+00 5.445541e+00 
<GCONST> 1.028119e+02 
<STATE> 3 
<MEAN> 25 
1.217768e+00 -6.389030e+00 -2.646137e+00 -3.905793e+00 7.397957e-01 -1.201395e+01 2.211602e-01 -5.126017e+00 2.621910e+00 -6.144597e+00 -2.084508e+00 -8.465837e+00 5.732061e-01 2.775116e+00 2.996845e+00 1.764418e+00 -8.277674e-01 -4.310024e-01 -1.470319e+00 -7.550560e-01 -1.513498e+00 3.458426e-01 -6.254972e-02 9.949619e-01 -2.641093e+00 
<VARIANCE> 25 
9.996929e+00 4.527835e+01 4.282214e+01 3.972993e+01 3.381993e+01 6.505142e+01 3.322316e+01 5.223600e+01 4.259438e+01 3.215968e+01 3.654654e+01 3.350212e+01 9.168934e-01 1.936100e+00 1.460505e+00 2.197761e+00 3.082174e+00 5.383616e+00 3.715859e+00 6.141081e+00 4.185381e+00 3.373948e+00 3.559795e+00 2.988963e+00 2.619537e+00 
<GCONST> 1.026411e+02 
<STATE> 4 
<MEAN> 25 
7.228215e-01 3.678457e+00 5.183236e+00 1.126518e+00 1.464845e+00 -7.542987e+00 -1.289414e+00 -3.014408e+00 -2.044827e+00 -2.407705e+00 -4.620550e+00 -2.489383e+00 -1.007321e+00 1.561653e+00 2.128999e+00 4.428890e-01 3.319249e-01 1.499440e+00 -8.010308e-02 1.836691e-01 -5.105380e-01 6.720528e-01 -4.222759e-02 1.465378e+00 -1.665329e+00 
<VARIANCE> 25 
3.001524e+01 5.585024e+01 3.712964e+01 3.331260e+01 4.078825e+01 9.574130e+01 3.707666e+01 6.811467e+01 6.065076e+01 4.812584e+01 3.788456e+01 4.958126e+01 3.807814e+00 3.643170e+00 2.988422e+00 3.972459e+00 4.200939e+00 4.923275e+00 4.783486e+00 5.365061e+00 6.556737e+00 4.130150e+00 5.004657e+00 3.739977e+00 3.390516e+00 
<GCONST> 1.109406e+02 
<TRANSP> 5 
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 
0.000000e+00 8.390214e-01 1.609786e-01 0.000000e+00 0.000000e+00 
0.000000e+00 0.000000e+00 7.104794e-01 2.895207e-01 0.000000e+00 
0.000000e+00 0.000000e+00 0.000000e+00 4.845260e-01 5.154740e-01 
0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 
<ENDHMM> 

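Given this file, how can I convert arbitrary input Kannada text (whose phones appear octal-encoded in the file above) to speech? As far as I know, Julius works the other way around, i.e. speech to text. However, I want to convert text to speech. Any suggestions are appreciated.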
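
For reference, the quoted model names such as "\340\262\202" look like UTF-8 bytes written as three-digit octal escapes. The following is a minimal sketch of decoding them back to Kannada characters, assuming that reading is correct; `decode_htk_name` is a hypothetical helper name, not part of HTK.

```python
import re

def decode_htk_name(name: str) -> str:
    """Decode backslash-octal escapes (e.g. \\340\\262\\202) into UTF-8 text.

    Assumption: each \\NNN group is one byte of a UTF-8 sequence.
    Names without escapes (such as "sp") are returned unchanged.
    """
    raw = re.sub(
        rb"\\([0-7]{3})",
        lambda m: bytes([int(m.group(1), 8)]),  # octal digits -> single byte
        name.encode("latin-1"),
    )
    return raw.decode("utf-8")

# The three phone names that appear in the excerpt above
for escaped in (r"\340\262\202", r"\340\262\203", r"\340\262\205"):
    char = decode_htk_name(escaped)
    print(escaped, "->", char, f"(U+{ord(char):04X})")
```

Under that assumption the three names map to code points in the Kannada Unicode block, so the same mapping could be used in reverse to turn input text into the phone labels used in the file.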

Answer


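This model is intended for recognition. To convert text to speech you need a richer model: in addition to the mel-cepstral coefficients, it must also cover excitation (fundamental frequency) and duration, and its parameters are modeled over a wider context (phonetic, linguistic, prosodic). You can start with MaryTTS: https://github.com/marytts/marytts/wiki/HMMVoiceCreation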
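
To illustrate the duration point: in a recognition HMM like the ones in the question, duration is only implicit in the self-transition probabilities of the `<TRANSP>` matrix. A state with self-loop probability a is occupied for a geometrically distributed number of frames with mean 1 / (1 - a), which is a crude duration model compared with what synthesis needs. A minimal sketch of reading that off the first 5-state model in the question (`expected_state_durations` is an illustrative name, not an HTK function):

```python
def expected_state_durations(transp):
    """Mean number of frames spent in each emitting state.

    Uses the geometric-duration property of a plain HMM:
    a self-loop probability a gives a mean stay of 1 / (1 - a) frames.
    """
    durations = {}
    # The first and last states are HTK's non-emitting entry and exit states.
    for i in range(1, len(transp) - 1):
        a_ii = transp[i][i]
        durations[i + 1] = 1.0 / (1.0 - a_ii)  # HTK numbers states from 1
    return durations

# <TRANSP> 5 block of the first 5-state model quoted in the question
transp = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 8.058720e-01, 1.941280e-01, 0.0, 0.0],
    [0.0, 0.0, 6.787453e-01, 3.212548e-01, 0.0],
    [0.0, 0.0, 0.0, 6.043503e-01, 3.956497e-01],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

for state, frames in expected_state_durations(transp).items():
    print(f"state {state}: about {frames:.1f} frames on average")
```

The result is in frames; converting it to time depends on the frame shift used during feature extraction, which the question does not state.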
