Loading
柚木鉉の空間
0%
INITIALIZING
首页 文章 标签 归档 友链 推广 搜索
文档编号 // 43ECFD 在线

Unit-selection Speech Synthesis

2019-09-30
更新: 2026-04-25
2403 字符
这篇文章写于 2019,已经超过 7 年了。内容可能已经过时。

1. About Unit-selection Speech Synthesis

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

And for Unit selection synthesis is using large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, diphones, half-phones, syllables, morphemes, words, phrases, and sentences.

2. Unit-selection Speech Synthesis

In Unit-selection Speech Synthesis, we use a large speech database which provides all syllable in. Like Japanese, Use those pronunciation:

きゃ しゃ ちゃ にゃ ひゃ みゃ りゃ ぎゃ じゃ びゃ ぴゃ
きゅ しゅ ちゅ にゅ ひゅ みゅ りゅ ぎゅ じゅ びゅ ぴゅ
きょ しょ ちょ にょ ひょ みょ りょ ぎょ じょ びょ ぴょ
きぇ しぇ ちぇ にぇ ひぇ みぇ りぇ ぎぇ じぇ びぇ ぴぇ
ふぁ ふぃ ふぇ ふぉ
いぁ うぃ うぇ うぉ
つぁ つぃ つぇ つぉ
すぃ てぃ てゅ とぅ
ずぃ でぃ でゅ どぅ
ゔぁ ゔぃ ゔぇ ゔぉ

We need collect all pronounce of those words and record it. After that, we need add label for the $Initials$ and $finals$,than, do $prosodic\ parameter\ extraction$ and $spectral\ parameter\ extraction$. After that load into the Unit-selection Synthesis model which waiting for synthesis.

The Unit-selection Speech Synthesis Algorithm

$$ \begin{aligned} C^{c}\left(u_{i-1}, u_{i}\right) &=\sum_{k=1}^{q} w_{k}^{c} C_{k}^{c}\left(u_{i-1}, u_{i}\right) \\ C^{\prime}\left(t_{i}, u_{i}\right) &=\sum_{j=1}^{p} w_{j}^{\prime} C_{j}^{t}\left(t_{i}, u_{i}\right) \end{aligned} $$

$$ \hat{u}_{1}^{n}=\arg \min _{u^{i}}\left\{C\left(t_{1}^{n}, u_{1}^{n}\right)\right\} $$

$$ C\left(t_{1}^{n}, u_{1}^{n}\right)=\sum_{i=1}^{n} C^{\prime}\left(t_{i}, u_{i}\right)+\sum_{i=1}^{n} C^{c}\left(u_{i-1}, u_{i}\right) $$

$$ C^{c}\left(u_{i-1}, u_{i}\right) : \text { Concatenation cost } $$

$$ C^{t}\left(t_{i}, u_{i}\right) : \text { Target cost } $$

REFERENCES

  1. https://en.wikipedia.org/wiki/Speech_synthesis
  2. Expressive Prosody for Unit-selection Speech Synthesis
WeChat Pay 微信
Alipay 支付宝
导航 // 相关文章
目录