Voice conversion (VC) must preserve the content of the source speech while conveying the characteristics of the target speaker.
Existing methods do not satisfy both requirements simultaneously, and their outputs suffer from a trade-off between preserving the source content and capturing the target speaker's characteristics.
In this study, we propose Triple Adaptive Attention Normalization VC (TriAAN-VC), comprising an encoder-decoder and an attention-based adaptive normalization block, that can be applied to non-parallel any-to-any VC.
The proposed adaptive normalization block extracts target speaker representations and performs conversion, while a siamese loss is used to minimize the loss of source content.
We evaluated TriAAN-VC on the VCTK dataset in terms of the maintenance of the source content and target speaker similarity.
Experimental results for one-shot VC suggest that TriAAN-VC achieves state-of-the-art performance while mitigating the trade-off problem encountered in the existing VC methods.
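The abstract does not spell out the internals of the normalization block. As a rough illustration of the general idea behind adaptive normalization in VC, the sketch below shows plain adaptive instance normalization, not the attention-based block proposed here. Each source feature channel is normalized over time to remove its own statistics, then rescaled with the statistics of the target features. The function names and the toy feature values are hypothetical and chosen only for this example.

```python
import math


def channel_stats(feature, eps=1e-5):
    """Return per-channel (mean, std) over time for a [channels][frames] feature."""
    stats = []
    for channel in feature:
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        # eps keeps the division stable for near-constant channels
        stats.append((mean, math.sqrt(var + eps)))
    return stats


def adaptive_instance_norm(source, target, eps=1e-5):
    """Normalize each source channel, then rescale it with the target statistics."""
    src_stats = channel_stats(source, eps)
    tgt_stats = channel_stats(target, eps)
    converted = []
    for channel, (s_mean, s_std), (t_mean, t_std) in zip(source, src_stats, tgt_stats):
        converted.append([(x - s_mean) / s_std * t_std + t_mean for x in channel])
    return converted


# Toy features: 2 channels; source has 4 frames, target has 3 frames.
source = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.5, 11.0, 11.5]]
target = [[5.0, 7.0, 9.0], [-1.0, 0.0, 1.0]]
converted = adaptive_instance_norm(source, target)
```

In this sketch the converted features keep the frame count and temporal shape of the source, while their per-channel mean and scale follow the target. This is the simplest form of the content versus speaker trade-off that the abstract refers to.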
Voice Conversion Results
Below, we provide voice conversion examples grouped by scenario and speaker pair.
Seen to Seen Scenarios (S2S):
Source and target speakers are seen during training
Male to Male
Male to Female
Female to Female
Female to Male
Scenario 1. S2S Male to Male:
Source | Target | Conversion
p279 | p232 | AdaIN-VC, AGAIN-VC, VQMIVC, VQVC+, S2VC, TriAAN-VC
p298 | p281 | AdaIN-VC, AGAIN-VC, VQMIVC, VQVC+, S2VC, TriAAN-VC
Scenario 2. S2S Male to Female:
Source | Target | Conversion
p263 | p248 | AdaIN-VC, AGAIN-VC, VQMIVC, VQVC+, S2VC, TriAAN-VC
p363 | p318 | AdaIN-VC, AGAIN-VC, VQMIVC, VQVC+, S2VC, TriAAN-VC
Scenario 3. S2S Female to Female:
Source | Target | Conversion
p265 | p333 | AdaIN-VC, AGAIN-VC, VQMIVC, VQVC+, S2VC, TriAAN-VC
p307 | p234 | AdaIN-VC, AGAIN-VC, VQMIVC, VQVC+, S2VC, TriAAN-VC
Scenario 4. S2S Female to Male:
Source | Target | Conversion
p288 | p278 | AdaIN-VC, AGAIN-VC, VQMIVC, VQVC+, S2VC, TriAAN-VC
p310 | p298 | AdaIN-VC, AGAIN-VC, VQMIVC, VQVC+, S2VC, TriAAN-VC
Unseen to Unseen Scenarios (U2U):
Source and target speakers are unseen during training