AGAIN-VC Demo

This is the demo page for AGAIN-VC, a one-shot voice conversion system using Activation Guidance and Adaptive Instance Normalization.

Abstract

Recently, voice conversion (VC) has been widely studied. Many VC systems use disentanglement-based learning techniques to separate the speaker information from the linguistic content information in a speech signal. They then convert the voice by replacing the speaker information with that of the target speaker. To prevent speaker information from leaking into the content embeddings, previous works either reduce the dimension of the content embedding or quantize it, which acts as a strong information bottleneck. These mechanisms hurt the synthesis quality to some degree. In this work, we propose AGAIN-VC, an innovative VC system using Activation Guidance and Adaptive Instance Normalization. AGAIN-VC is an auto-encoder-based model comprising a single encoder and a decoder. With a proper activation as an information bottleneck on the content embeddings, the trade-off between the synthesis quality and the speaker similarity of the converted speech is improved drastically. This one-shot VC system obtains the best performance in both subjective and objective evaluations.
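
To make the two ideas in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that features are arrays of shape (channels, frames), that per-channel mean and standard deviation over time stand in for the speaker information, and that a sigmoid is the bottleneck activation (the abstract only says "a proper activation"). The function names and the random toy inputs are hypothetical.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of x (channels, frames) over the time axis.

    Returns the normalized features plus the per-channel mean and std
    that were removed (treated here as the speaker statistics).
    """
    mean = x.mean(axis=1, keepdims=True)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / std, mean, std

def sigmoid(x):
    """Squash values into (0, 1); assumed bottleneck activation."""
    return 1.0 / (1.0 + np.exp(-x))

def adain(content, mean, std):
    """Adaptive instance normalization: re-normalize the content features,
    then scale and shift them with the target's per-channel statistics."""
    normalized, _, _ = instance_norm(content)
    return normalized * std + mean

# Toy stand-ins for encoder features of a source and a target utterance.
rng = np.random.default_rng(0)
source = rng.normal(loc=2.0, scale=3.0, size=(4, 100))
target = rng.normal(loc=-1.0, scale=0.5, size=(4, 120))

# Content path: strip the source statistics, then apply the activation.
source_normalized, _, _ = instance_norm(source)
content = sigmoid(source_normalized)

# Speaker path: keep only the target's per-channel statistics.
_, target_mean, target_std = instance_norm(target)

# Conversion: source content carrying the target's statistics.
converted = adain(content, target_mean, target_std)
print(converted.shape)
```

In this sketch the converted features keep the source's frame count while their per-channel mean and standard deviation match those of the target. A real system would apply these operations inside learned encoder and decoder layers rather than to raw arrays.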

Paper

The paper is here.

Code

The code is available here.

One-shot Voice Conversion

Source     Target     Conversion
p334_007   p343_004   Proposed   AdaIN-VC   VQVC+   AutoVC
p334_007   p360_018   Proposed   AdaIN-VC   VQVC+   AutoVC
p334_007   p362_010   Proposed   AdaIN-VC   VQVC+   AutoVC
p343_004   p334_007   Proposed   AdaIN-VC   VQVC+   AutoVC
p343_004   p360_018   Proposed   AdaIN-VC   VQVC+   AutoVC
p343_004   p362_010   Proposed   AdaIN-VC   VQVC+   AutoVC
p360_018   p334_007   Proposed   AdaIN-VC   VQVC+   AutoVC
p360_018   p343_004   Proposed   AdaIN-VC   VQVC+   AutoVC
p360_018   p362_010   Proposed   AdaIN-VC   VQVC+   AutoVC
p362_010   p334_007   Proposed   AdaIN-VC   VQVC+   AutoVC
p362_010   p343_004   Proposed   AdaIN-VC   VQVC+   AutoVC
p362_010   p360_018   Proposed   AdaIN-VC   VQVC+   AutoVC

For fun

Source        Target     Conversion
phd_ch1       p225_002   Proposed
phd_ch1       p227_002   Proposed
phd_en3       p225_002   Proposed
phd_en3       p227_002   Proposed
welcome_ch3   p225_002   Proposed
welcome_ch3   p227_002   Proposed
welcome_en3   p225_002   Proposed
welcome_en3   p227_002   Proposed