An Attribute Interpolation Method in Speech Synthesis by Model Merging
Audio Samples
4.2. Speaker Generation
Female-Female
speaker combination (A-B) | base speaker A | interpolation A-B | base speaker B |
---|---|---|---|
p225-p229 | | | |
p225-p231 | | | |
p228-p231 | | | |
Male-Male
speaker combination (A-B) | base speaker A | interpolation A-B | base speaker B |
---|---|---|---|
p226-p237 | | | |
p226-p241 | | | |
p232-p237 | | | |
Male-Female
speaker combination (A-B) | base speaker A | interpolation A-B | base speaker B |
---|---|---|---|
p227-p228 | | | |
p237-p225 | | | |
p226-p228 (Failed) | | | |
4.3. Emotion Intensity Control
emotion style | alpha=0 (Neutral) | alpha=0.25 | alpha=0.5 | alpha=0.75 | alpha=1.0 (Emotional) |
---|---|---|---|---|---|
Angry | | | | | |
Happy | | | | | |
Sad | | | | | |
Surprise | | | | | |