This project aims to transfer the voice style of a famous person, learned from training data, onto a given input voice.
Follow the links for the final report and the brief presentation.
[Paper link]
Takuhiro Kaneko and Hirokazu Kameoka
Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
[Implementation link]
Lei Mao's work
Voice Converter Using CycleGAN and Non-Parallel Data
[Project]
Our Work
MLSP Term Project: Voice Conversion
Source
Google's text-to-speech voices are used to generate 13 audio clips per speaker (each approximately 40 seconds long), for a total of at least 8 minutes per speaker.
Female Speaker: WaveNet Turkish Female voice G
Male Speaker: WaveNet Turkish Male voice E
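As a quick sanity check on the figures above (an illustrative calculation, not part of the project code): 13 clips of roughly 40 seconds each do meet the 8-minute-per-speaker target.

```python
# Assumed figures from the dataset description: 13 clips, ~40 s each.
clips = 13
secs_per_clip = 40

# Total duration in minutes; 13 * 40 s = 520 s ≈ 8.67 min >= 8 min.
total_min = clips * secs_per_clip / 60
print(round(total_min, 2))
```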
Target
Similarly, 13 audio clips of Turkish news presenter Ece Uner's speech, totaling 8.8 minutes, are chosen.
Training Samples
The speech samples below are fed into the model during the training phase. There is no parallelism between the source and target utterances.
| Source (Female) | Source (Male) | Target (Female) |
|---|---|---|
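Training without parallel data is what the paper's cycle-consistency losses enable. The following is a minimal NumPy sketch of those losses only; the stand-in linear maps `G_xy` and `G_yx` are illustrative assumptions, whereas CycleGAN-VC actually uses gated 1D CNN generators over MCEP features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "generators" (assumptions for illustration): in CycleGAN-VC
# these are gated 1D CNNs mapping source features to target features.
def G_xy(x):  # source -> target
    return x * 1.1

def G_yx(y):  # target -> source
    return y / 1.1

# Fake, unaligned feature matrices standing in for MCEP frames.
x = rng.standard_normal((24, 128))  # source speaker frames
y = rng.standard_normal((24, 128))  # target speaker frames

# Cycle-consistency loss: x -> G_xy -> G_yx should reconstruct x (and
# symmetrically for y). This constraint replaces frame-aligned parallel
# data, since reconstruction is checked against the same speaker's input.
cycle_loss = (np.abs(G_yx(G_xy(x)) - x).mean()
              + np.abs(G_xy(G_yx(y)) - y).mean())

# Identity-mapping loss, used in CycleGAN-VC to help preserve
# linguistic content: feeding a target sample to G_xy should change it little.
identity_loss = (np.abs(G_xy(y) - y).mean()
                 + np.abs(G_yx(x) - x).mean())

print(cycle_loss, identity_loss)
```

Because the toy generators here are exact inverses, the cycle loss is (numerically) zero, while the identity loss is not; in real training both terms are minimized jointly with the adversarial losses.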
Validation Samples
The first row of the table contains the original recording of the source speech.
The remaining rows are the outputs of models trained for the given number of epochs on that input.
| | CycleGAN-VC (Female-to-Female) | CycleGAN-VC (Male-to-Female) |
|---|---|---|
| Input | | |
| 500 Epochs | | |
| 1500 Epochs | | |
| 5000 Epochs | | |
Input: "Merhaba, bu ses CycleGAN ile üretildi." ("Hi, this voice was generated by CycleGAN.")
*Of course, the input speech is not generated by this network; however, the outputs are.
There are 3 more examples in this folder.