An Attribute Interpolation Method in Speech Synthesis by Model Merging
Audio Samples
4.2. Speaker Generation
Female-Female
| speaker combination (A-B) | base speaker A | interpolation A-B | base speaker B |
|---|---|---|---|
| p225-p229 | | | |
| p225-p231 | | | |
| p228-p231 | | | |
Male-Male
| speaker combination (A-B) | base speaker A | interpolation A-B | base speaker B |
|---|---|---|---|
| p226-p237 | | | |
| p226-p241 | | | |
| p232-p237 | | | |
Male-Female
| speaker combination (A-B) | base speaker A | interpolation A-B | base speaker B |
|---|---|---|---|
| p227-p228 | | | |
| p237-p225 | | | |
| p226-p228 (Failed) | | | |
4.3. Emotion Intensity Control
| emotion style | alpha=0 (Neutral) | alpha=0.25 | alpha=0.5 | alpha=0.75 | alpha=1.0 (Emotional) |
|---|---|---|---|---|---|
| Angry | | | | | |
| Happy | | | | | |
| Sad | | | | | |
| Surprise | | | | | |