This project aims to transfer the voice style of a famous person, learned from training data, onto a given input voice.
Follow the links for the final report and the brief presentation.
[Paper link]
Takuhiro Kaneko and Hirokazu Kameoka
Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
[Implementation link]
Lei Mao's work
Voice Converter Using CycleGAN and Non-Parallel Data
[Project]
Our Work
MLSP Term Project: Voice Conversion
Source
Google's text-to-speech voices are used to generate 13 audio clips per speaker (each approximately 40 seconds long), for a total of at least 8 minutes per speaker.
Female Speaker: WaveNet Turkish Female voice G
Male Speaker: WaveNet Turkish Male voice E
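As a quick sanity check on the figures above (an illustrative calculation, not part of the project code): 13 clips of roughly 40 seconds each do meet the 8-minute-per-speaker target.

```python
# Assumed figures from the dataset description: 13 clips, ~40 s each.
clips = 13
secs_per_clip = 40

# Total duration in minutes; 13 * 40 s = 520 s ≈ 8.67 min >= 8 min.
total_min = clips * secs_per_clip / 60
print(round(total_min, 2))
```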
Target
Similarly, 13 audio clips of Turkish news presenter Ece Uner's speech, totaling 8.8 minutes, are chosen.
Training Samples
The speech samples below are fed into the model during the training phase. There is no parallelism between the source and target utterances.
| Source (Female) | Source (Male) | Target (Female) |
|---|---|---|
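Training without parallel data is what the paper's cycle-consistency losses enable. The following is a minimal NumPy sketch of those losses only; the stand-in linear maps `G_xy` and `G_yx` are illustrative assumptions, whereas CycleGAN-VC actually uses gated 1D CNN generators over MCEP features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "generators" (assumptions for illustration): in CycleGAN-VC
# these are gated 1D CNNs mapping source features to target features.
def G_xy(x):  # source -> target
    return x * 1.1

def G_yx(y):  # target -> source
    return y / 1.1

# Fake, unaligned feature matrices standing in for MCEP frames.
x = rng.standard_normal((24, 128))  # source speaker frames
y = rng.standard_normal((24, 128))  # target speaker frames

# Cycle-consistency loss: x -> G_xy -> G_yx should reconstruct x (and
# symmetrically for y). This constraint replaces frame-aligned parallel
# data, since reconstruction is checked against the same speaker's input.
cycle_loss = (np.abs(G_yx(G_xy(x)) - x).mean()
              + np.abs(G_xy(G_yx(y)) - y).mean())

# Identity-mapping loss, used in CycleGAN-VC to help preserve
# linguistic content: feeding a target sample to G_xy should change it little.
identity_loss = (np.abs(G_xy(y) - y).mean()
                 + np.abs(G_yx(x) - x).mean())

print(cycle_loss, identity_loss)
```

Because the toy generators here are exact inverses, the cycle loss is (numerically) zero, while the identity loss is not; in real training both terms are minimized jointly with the adversarial losses.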
Validation Samples
The first row of the table contains the original recording of the source speech.
The remaining rows are the outputs of models trained for the given number of epochs on that input.
| | CycleGAN-VC (Female-to-Female) | CycleGAN-VC (Male-to-Female) |
|---|---|---|
| Input | | |
| 500 Epochs | | |
| 1500 Epochs | | |
| 5000 Epochs | | |
Input: "Merhaba, bu ses CycleGAN ile üretildi." ("Hi, this voice was generated by CycleGAN.")
*Of course, the input speech is not generated by this network; however, the outputs are.
There are 3 more examples in this folder.