Reproducing the CycleGAN-VC2 Voice (Timbre) Conversion Algorithm

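GAN

First, a word on GANs. A generative adversarial network is built around two components, a generator and a discriminator. The generator keeps producing data and the discriminator gives feedback on it; the two alternate until the discriminator can no longer tell whether the generator's output is real or fake.

The process is like starting from a blank sheet of paper and ending up with a counterfeit banknote that the "police" (the discriminator) can no longer tell apart from real currency. In effect, the model has learned to map one set of features to another.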
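The generator/discriminator feedback loop described above can be sketched as two loss terms. This is a minimal illustration rather than the training code of this post: `toy_generator`, `toy_discriminator`, the layer sizes, and the use of binary cross-entropy are all assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy models: 8-dim noise -> 4-dim "data", and a real/fake scorer
toy_generator = nn.Linear(8, 4)
toy_discriminator = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
bce = nn.BCELoss()

real = torch.randn(16, 4)      # a batch of "real" samples
noise = torch.randn(16, 8)
fake = toy_generator(noise)    # generator output

# Discriminator step: push D(real) toward 1 and D(fake) toward 0
d_real = toy_discriminator(real)
d_fake = toy_discriminator(fake.detach())  # detach: no generator update here
d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

# Generator step: push D(fake) toward 1, i.e. try to fool the discriminator
g_out = toy_discriminator(fake)
g_loss = bce(g_out, torch.ones_like(g_out))
```

In a real training loop each loss would be followed by its own optimizer step, alternating between the two networks.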

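CycleGAN-VC2

CycleGAN-VC2 is a voice conversion model that adapts and extends the CycleGAN idea. Its main characteristics and principles are:

  1. Voice conversion: CycleGAN-VC2 converts one set of speech features into another, for example changing the speaker's voice characteristics while keeping the linguistic content unchanged.

  2. Training without paired data: like CycleGAN for image translation, CycleGAN-VC2 does not need paired speech data for training. It can learn the mapping between two speech domains even when no directly corresponding utterance pairs exist.

  3. Cycle-consistency loss: to make sure the converted speech keeps the content of the original, CycleGAN-VC2 uses a cycle-consistency loss. If speech is converted from domain A to domain B and then back to domain A, the result should be close to the original speech.

  4. Generator and discriminator

    • Generator: converts speech features from one style to another.
    • Discriminator: distinguishes generated speech features from real ones.
  5. Loss functions

    • Adversarial loss: trains the generator so that the generated speech features can fool the discriminator.
    • Cycle-consistency loss: ensures that speech stays consistent after a round-trip conversion.
  6. Applications: CycleGAN-VC2 is used in areas such as voice style conversion, voice cloning, and speech enhancement.

Through these mechanisms, CycleGAN-VC2 can achieve high-quality voice conversion without requiring large amounts of paired data.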
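The cycle-consistency idea from the list above can be written down in a few lines. The sketch below is illustrative only: `G_A2B` and `G_B2A` are hypothetical single-layer stand-ins for the two generators, the tensor shapes are assumed, and only the cycle term (an L1 distance, as in CycleGAN) is shown, not the adversarial term.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two generators; each maps
# [batch, num_features, time] -> [batch, num_features, time]
G_A2B = nn.Conv1d(24, 24, kernel_size=3, padding=1)
G_B2A = nn.Conv1d(24, 24, kernel_size=3, padding=1)

real_A = torch.randn(4, 24, 128)  # features from speaker A
real_B = torch.randn(4, 24, 128)  # features from speaker B (not paired with A)

# Round trips: A -> B -> A and B -> A -> B
cycle_A = G_B2A(G_A2B(real_A))
cycle_B = G_A2B(G_B2A(real_B))

# L1 distance between each input and its reconstruction
l1 = nn.L1Loss()
cycle_loss = l1(cycle_A, real_A) + l1(cycle_B, real_B)
```

Because the loss only compares each utterance with its own reconstruction, no aligned A/B pairs are needed.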

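## The ResNet Residual Idea

ResNet (Residual Network) is a deep neural network architecture whose core idea is "residual learning". It was introduced to make very deep networks trainable: it targets the degradation that appears as depth grows and also eases vanishing- and exploding-gradient problems.

Residual Block

  1. Basic structure
    • A residual block consists of two or three convolutional layers, usually with Batch Normalization and a ReLU activation.
  2. Identity mapping
    • The block outputs F(x) + x rather than F(x) alone, where x is the input.
    • The input is added directly to the output through a skip (shortcut) connection, which carries an identity mapping.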
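A residual block with this structure might look like the following sketch. The class name `BasicResidualBlock` and the channel count are chosen for illustration; this is the classic image-style block with BatchNorm and ReLU, not the gated 1D block used later in this post.

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    """Two 3x3 conv layers with BatchNorm and ReLU, plus a shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # F(x) + x: the shortcut adds the input to the residual branch
        return self.relu(self.body(x) + x)


block = BasicResidualBlock(channels=16)
x = torch.randn(2, 16, 8, 8)
y = block(x)  # same shape as x, so blocks can be stacked
```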

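Advantages of the Residual Idea

  1. Addresses the degradation problem

    • The "degradation problem" is the observation that, as layers are added, a deeper network can perform worse than a shallower one.
    • By learning a residual instead of the full mapping directly, a residual network is easier to optimize.
  2. Smoother gradient flow

    • The identity shortcut lets gradients propagate backward directly through the skip connection, which mitigates vanishing and exploding gradients.
  3. Easier optimization

    • Because the block learns a residual, it can approach an identity mapping simply by driving the residual toward zero, which helps it converge faster.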
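The last two points can be checked numerically. In the sketch below, a hypothetical residual branch (a single linear layer, chosen only for brevity) has all parameters set to zero; the block then behaves exactly as the identity, and the gradient still reaches the input through the skip connection.

```python
import torch
import torch.nn as nn

# Residual branch F: one linear layer with every parameter set to zero
residual_branch = nn.Linear(4, 4)
nn.init.zeros_(residual_branch.weight)
nn.init.zeros_(residual_branch.bias)

x = torch.randn(3, 4, requires_grad=True)
y = residual_branch(x) + x  # F(x) + x with F == 0

# With a zero residual the block is exactly the identity mapping
is_identity = torch.equal(y, x)

# d(sum(y))/dx is 1 everywhere: the gradient arrives via the skip
# connection even though the residual branch contributes nothing
y.sum().backward()
grad_is_one = torch.equal(x.grad, torch.ones_like(x))
```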

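Practical Use

  • Deep network design: ResNet supports very deep networks (such as 50, 101, or 152 layers) and performs well on tasks such as image classification and object detection.
  • Modular design: residual blocks can be stacked to build ResNets of different depths.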

Core Code

Generator

class downSample_Generator(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(downSample_Generator, self).__init__()

        self.convLayer = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                 out_channels=out_channels,
                                                 kernel_size=kernel_size,
                                                 stride=stride,
                                                 padding=padding),
                                       nn.InstanceNorm2d(num_features=out_channels,
                                                         affine=True))
        self.convLayer_gates = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                       out_channels=out_channels,
                                                       kernel_size=kernel_size,
                                                       stride=stride,
                                                       padding=padding),
                                             nn.InstanceNorm2d(num_features=out_channels,
                                                               affine=True))

    def forward(self, input):
        # GLU-style gating: the linear branch is modulated by a sigmoid gate
        a = self.convLayer(input)
        b = self.convLayer_gates(input)
        return a * torch.sigmoid(b)


class upSample_Generator(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(upSample_Generator, self).__init__()

        self.convLayer = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                 out_channels=out_channels,
                                                 kernel_size=kernel_size,
                                                 stride=stride,
                                                 padding=padding),
                                       #PixelShuffle(upscale_factor=2),
                                       up_2Dsample(upscale_factor=2),
                                       nn.InstanceNorm2d(num_features=out_channels,
                                                         affine=True))
        self.convLayer_gates = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                       out_channels=out_channels,
                                                       kernel_size=kernel_size,
                                                       stride=stride,
                                                       padding=padding),
                                             #PixelShuffle(upscale_factor=2),
                                             up_2Dsample(upscale_factor=2),
                                             nn.InstanceNorm2d(num_features=out_channels,
                                                               affine=True))

   

    def forward(self, input):        
        return self.convLayer(input) * torch.sigmoid(self.convLayer_gates(input))


class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=1,
                               out_channels=128,
                               kernel_size=[5,15],
                               stride=1,
                               padding=[2,7])

        self.conv1_gates = nn.Conv2d(in_channels=1,
                               out_channels=128,
                               kernel_size=[5,15],
                               stride=1,
                               padding=[2,7])

        # Downsample Layer
        self.downSample1 = downSample_Generator(in_channels=128,
                                                out_channels=256,
                                                kernel_size=5,
                                                stride=2,
                                                padding=2)

        self.downSample2 = downSample_Generator(in_channels=256,
                                                out_channels=512,
                                                kernel_size=5,
                                                stride=2,
                                                padding=2)
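        # 2D -> 1D: the forward pass flattens channels x feature bins
        # (512 * 6 = 3072 for 24 input features) before this 1x1 Conv1d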
        self.conv2 = nn.Conv1d(in_channels=3072,
                               out_channels=512,
                               kernel_size=1,
                               stride=1)

        # Residual Blocks
        self.residualLayer1 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer2 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer3 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer4 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer5 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer6 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
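        # 1D -> 2D: expand back to 3072 channels so the forward pass can
        # restore the 4D layout produced by downSample2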
        self.conv3 = nn.Conv1d(in_channels=512,
                               out_channels=3072,
                               kernel_size=1,
                               stride=1)


        # UpSample Layer
        self.upSample1 = upSample_Generator(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=5,
                                            stride=1,
                                            padding=2)
        
        self.upSample2 = upSample_Generator(in_channels=1024,
                                            out_channels=512,
                                            kernel_size=5,
                                            stride=1,
                                            padding=2)

        self.lastConvLayer = nn.Conv2d(in_channels=512,
                                       out_channels=1,
                                       kernel_size=[5,15],
                                       stride=1,
                                       padding=[2,7])

    def forward(self, input):
        # input: [batch_size, num_features, time] -> add a channel axis
        input = input.unsqueeze(1)

        # GLU
        conv1 = self.conv1(input) * torch.sigmoid(self.conv1_gates(input))

        downsample1 = self.downSample1(conv1)
        
        downsample2 = self.downSample2(downsample1)
        
        downsample3 = downsample2.view([downsample2.shape[0],-1,downsample2.shape[3]])
        
        downsample3 = self.conv2(downsample3)
        
        residual_layer_1 = self.residualLayer1(downsample3)
        
        residual_layer_2 = self.residualLayer2(residual_layer_1)
        
        residual_layer_3 = self.residualLayer3(residual_layer_2)
        
        residual_layer_4 = self.residualLayer4(residual_layer_3)
        
        residual_layer_5 = self.residualLayer5(residual_layer_4)
        
        residual_layer_6 = self.residualLayer6(residual_layer_5)
        
        residual_layer_6 = self.conv3(residual_layer_6)
        
        residual_layer_6 = residual_layer_6.view([downsample2.shape[0],downsample2.shape[1],downsample2.shape[2],downsample2.shape[3]])
        
        upSample_layer_1 = self.upSample1(residual_layer_6)
        
        upSample_layer_2 = self.upSample2(upSample_layer_1)
        
        output = self.lastConvLayer(upSample_layer_2)
        
        output = output.view([output.shape[0],-1,output.shape[3]])
        return output
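The 3072 in `conv2` and `conv3` follows from the shape bookkeeping in the forward pass. The sketch below reproduces only that arithmetic on a dummy tensor, assuming the 24-feature, 128-frame input used in the test later in this post; `down` is a helper introduced here for illustration.

```python
import torch

batch, features, frames = 15, 24, 128


def down(n):
    # Output length of a conv with kernel 5, stride 2, padding 2
    return (n + 2 * 2 - 5) // 2 + 1


# Two stride-2 downsampling layers quarter both the feature and time axes
h, w = down(down(features)), down(down(frames))  # 6, 32
x2d = torch.zeros(batch, 512, h, w)              # shape after downSample2

# 2D -> 1D: fold channels and feature bins into one axis (512 * 6 = 3072)
x1d = x2d.view(batch, -1, w)

# 1D -> 2D: restore the 4D layout before upsampling
back = x1d.view(batch, 512, h, w)
```

With a different number of input features the flattened channel count changes, so the hard-coded 3072 would need to be adjusted accordingly.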

Discriminator

class DownSample_Discriminator(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(DownSample_Discriminator, self).__init__()

        self.convLayer = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                 out_channels=out_channels,
                                                 kernel_size=kernel_size,
                                                 stride=stride,
                                                 padding=padding),
                                       nn.InstanceNorm2d(num_features=out_channels,
                                                         affine=True))
        self.convLayerGates = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                      out_channels=out_channels,
                                                      kernel_size=kernel_size,
                                                      stride=stride,
                                                      padding=padding),
                                            nn.InstanceNorm2d(num_features=out_channels,
                                                              affine=True))

    def forward(self, input):
        # GLU
        return self.convLayer(input) * torch.sigmoid(self.convLayerGates(input))


class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        self.convLayer1 = nn.Conv2d(in_channels=1,
                                    out_channels=128,
                                    kernel_size=[3, 3],
                                    stride=[1, 1])
        self.convLayer1_gates = nn.Conv2d(in_channels=1,
                                          out_channels=128,
                                          kernel_size=[3, 3],
                                          stride=[1, 1])

        # Note: kernel sizes were changed from the paper in this PyTorch
        # implementation to retain dimensionality after each layer. Unlike
        # TensorFlow, PyTorch had no general padding='same' option
        # (newer versions accept it only for stride-1 convolutions).

        # DownSample Layer
        self.downSample1 = DownSample_Discriminator(in_channels=128,
                                                    out_channels=256,
                                                    kernel_size=[3, 3],
                                                    stride=[2, 2],
                                                    padding=0)

        self.downSample2 = DownSample_Discriminator(in_channels=256,
                                                    out_channels=512,
                                                    kernel_size=[3, 3],
                                                    stride=[2, 2],
                                                    padding=0)

        self.downSample3 = DownSample_Discriminator(in_channels=512,
                                                    out_channels=1024,
                                                    kernel_size=[3, 3],
                                                    stride=[2, 2],
                                                    padding=0)

        self.downSample4 = DownSample_Discriminator(in_channels=1024,
                                                    out_channels=1024,
                                                    kernel_size=[1, 5],
                                                    stride=[1, 1],
                                                    padding=[0, 2])

        # Fully Connected Layer
        self.fc = nn.Linear(in_features=1024,
                            out_features=1)

        # output Layer
        self.output_layer = nn.Conv2d(in_channels=1024,
                                      out_channels=1,
                                      kernel_size=[1, 3],
                                      stride=[1, 1],
                                      padding=[0, 1])

    def forward(self, input):
        # input has shape [batch_size, num_features, time]
        # discriminator requires shape [batch_size, 1, num_features, time]
        input = input.unsqueeze(1)
        # GLU
        pad_input = nn.ZeroPad2d((1, 1, 1, 1))
        layer1 = self.convLayer1(
            pad_input(input)) * torch.sigmoid(self.convLayer1_gates(pad_input(input)))

        pad_input = nn.ZeroPad2d((1, 0, 1, 0))
        downSample1 = self.downSample1(pad_input(layer1))

        pad_input = nn.ZeroPad2d((1, 0, 1, 0))
        downSample2 = self.downSample2(pad_input(downSample1))

        pad_input = nn.ZeroPad2d((1, 0, 1, 0))
        downSample3 = self.downSample3(pad_input(downSample2))

        downSample4 = self.downSample4(downSample3)
        downSample4 = self.output_layer(downSample4)

        downSample4 = downSample4.contiguous().permute(0, 2, 3, 1).contiguous()
        # The fully connected layer (self.fc) is left unused; a sigmoid is
        # applied directly to the patch-wise output of output_layer instead.
        # fc = self.fc(downSample4)
        fc = torch.sigmoid(downSample4)
        return fc
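The discriminator returns a small grid of patch scores rather than a single number. The sketch below traces the spatial size through the layers using the usual convolution output formula; `conv_out` and `discriminator_output_hw` are helper names introduced here, and the padding values are read off the `ZeroPad2d` calls in the forward pass above.

```python
def conv_out(n, kernel, stride=1, pad_before=0, pad_after=0):
    """Output length of a convolution along one axis."""
    return (n + pad_before + pad_after - kernel) // stride + 1


def discriminator_output_hw(features, frames):
    sizes = []
    for n in (features, frames):
        # convLayer1: 3x3 kernel, stride 1, one zero on each side
        n = conv_out(n, kernel=3, stride=1, pad_before=1, pad_after=1)
        # downSample1..3: 3x3 kernel, stride 2, one zero before only
        for _ in range(3):
            n = conv_out(n, kernel=3, stride=2, pad_before=1, pad_after=0)
        sizes.append(n)
    # downSample4 and output_layer keep both sizes (kernel 1 along features;
    # kernels 5 and 3 with padding 2 and 1 along time)
    return tuple(sizes)


print(discriminator_output_hw(24, 128))  # prints (3, 16)
```

For a [15, 24, 128] input this corresponds to a discriminator output of shape [15, 3, 16, 1] after the final permute.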

Residual Unit (Res-Block)

class ResidualLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(ResidualLayer, self).__init__()

        self.conv1d_layer = nn.Sequential(nn.Conv1d(in_channels=in_channels,
                                                    out_channels=out_channels,
                                                    kernel_size=kernel_size,
                                                    stride=1,
                                                    padding=padding),
                                          nn.InstanceNorm1d(num_features=out_channels,
                                                            affine=True))

        self.conv_layer_gates = nn.Sequential(nn.Conv1d(in_channels=in_channels,
                                                        out_channels=out_channels,
                                                        kernel_size=kernel_size,
                                                        stride=1,
                                                        padding=padding),
                                              nn.InstanceNorm1d(num_features=out_channels,
                                                                affine=True))

        self.conv1d_out_layer = nn.Sequential(nn.Conv1d(in_channels=out_channels,
                                                        out_channels=in_channels,
                                                        kernel_size=kernel_size,
                                                        stride=1,
                                                        padding=padding),
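                                              nn.InstanceNorm1d(num_features=in_channels,
                                                                affine=True))

    def forward(self, input):
        h1_norm = self.conv1d_layer(input)
        h1_gates_norm = self.conv_layer_gates(input)

        # GLU
        h1_glu = h1_norm * torch.sigmoid(h1_gates_norm)

        h2_norm = self.conv1d_out_layer(h1_glu)
        return input + h2_norm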
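The gating pattern `a * torch.sigmoid(b)` that appears throughout the code above, with two parallel convolution branches, is a gated linear unit (GLU). As a quick sanity check, the sketch below compares it with PyTorch's built-in `F.glu`, which expects both halves stacked along one dimension; the tensors `a` and `b` are random placeholders for the two branch outputs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
a = torch.randn(2, 8, 16)  # output of the linear branch
b = torch.randn(2, 8, 16)  # output of the gate branch

# Form used in this post: two branches, output keeps all 8 channels
gated = a * torch.sigmoid(b)

# Built-in GLU: splits a 16-channel tensor into halves and gates the first
builtin = F.glu(torch.cat([a, b], dim=1), dim=1)

same = torch.allclose(gated, builtin)
```

Using two separate branches keeps the output channel count equal to each branch's `out_channels`, whereas `F.glu` on a single tensor halves the channel dimension.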

Input/Output Test

import numpy as np
import torch

# Requires Generator and Discriminator from the complete code below.
# Generator dimensionality test
np.random.seed(0)
input = np.random.randn(15, 24, 128)  # (N, num_features, time)
input = torch.from_numpy(input).float()
generator = Generator()

output = generator(input)
print("Output shape Generator", output.shape)

# Discriminator Dimensionality Testing
# input = torch.randn(32, 1, 24, 128)  # (N, C_in, height, width) For Conv2d
discriminator = Discriminator()
#pdb.set_trace()
output = discriminator(output)
print("Output shape Discriminator", output.shape)

Complete Code

import torch.nn as nn
import torch.nn.functional as F
import torch
import numpy as np
import pdb


class GLU(nn.Module):
    def __init__(self):
        super(GLU, self).__init__()
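        # Custom implementation because the CycleGAN voice conversion
        # paper assumes GLU does not halve the channel dimension.
        # Note: this gates the input with itself (x * sigmoid(x)); the
        # layers below use a separate gate branch instead.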

    def forward(self, input):
        return input * torch.sigmoid(input)


class up_2Dsample(nn.Module):
    def __init__(self, upscale_factor=2):
        super(up_2Dsample, self).__init__()
        self.scale_factor = upscale_factor

    def forward(self, input):
        h = input.shape[2]
        w = input.shape[3]
        new_size = [h * self.scale_factor, w * self.scale_factor]
        return F.interpolate(input,new_size)
       

class PixelShuffle(nn.Module):
    def __init__(self, upscale_factor=2):
        super(PixelShuffle, self).__init__()
        # Custom implementation because PyTorch's PixelShuffle requires
        # a 4D input, whereas in this case we have a 3D tensor
        self.upscale_factor = upscale_factor

    def forward(self, input):
        n = input.shape[0]
        c_out = input.shape[1] // self.upscale_factor
        w_new = input.shape[2] * self.upscale_factor
        return input.view(n, c_out, w_new)


class ResidualLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(ResidualLayer, self).__init__()

        # self.residualLayer = nn.Sequential(nn.Conv1d(in_channels=in_channels,
        #                                              out_channels=out_channels,
        #                                              kernel_size=kernel_size,
        #                                              stride=1,
        #                                              padding=padding),
        #                                    nn.InstanceNorm1d(
        #                                        num_features=out_channels,
        #                                        affine=True),
        #                                    GLU(),
        #                                    nn.Conv1d(in_channels=out_channels,
        #                                              out_channels=in_channels,
        #                                              kernel_size=kernel_size,
        #                                              stride=1,
        #                                              padding=padding),
        #                                    nn.InstanceNorm1d(
        #                                        num_features=in_channels,
        #                                        affine=True)
        #                                    )

        self.conv1d_layer = nn.Sequential(nn.Conv1d(in_channels=in_channels,
                                                    out_channels=out_channels,
                                                    kernel_size=kernel_size,
                                                    stride=1,
                                                    padding=padding),
                                          nn.InstanceNorm1d(num_features=out_channels,
                                                            affine=True))

        self.conv_layer_gates = nn.Sequential(nn.Conv1d(in_channels=in_channels,
                                                        out_channels=out_channels,
                                                        kernel_size=kernel_size,
                                                        stride=1,
                                                        padding=padding),
                                              nn.InstanceNorm1d(num_features=out_channels,
                                                                affine=True))

        self.conv1d_out_layer = nn.Sequential(nn.Conv1d(in_channels=out_channels,
                                                        out_channels=in_channels,
                                                        kernel_size=kernel_size,
                                                        stride=1,
                                                        padding=padding),
                                              nn.InstanceNorm1d(num_features=in_channels,
                                                                affine=True))

    def forward(self, input):
        h1_norm = self.conv1d_layer(input)
        h1_gates_norm = self.conv_layer_gates(input)

        # GLU
        h1_glu = h1_norm * torch.sigmoid(h1_gates_norm)

        h2_norm = self.conv1d_out_layer(h1_glu)
        return input + h2_norm


class downSample_Generator(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(downSample_Generator, self).__init__()

        self.convLayer = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                 out_channels=out_channels,
                                                 kernel_size=kernel_size,
                                                 stride=stride,
                                                 padding=padding),
                                       nn.InstanceNorm2d(num_features=out_channels,
                                                         affine=True))
        self.convLayer_gates = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                       out_channels=out_channels,
                                                       kernel_size=kernel_size,
                                                       stride=stride,
                                                       padding=padding),
                                             nn.InstanceNorm2d(num_features=out_channels,
                                                               affine=True))

    def forward(self, input):
        # GLU-style gating: the linear branch is modulated by a sigmoid gate
        a = self.convLayer(input)
        b = self.convLayer_gates(input)
        return a * torch.sigmoid(b)


class upSample_Generator(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(upSample_Generator, self).__init__()

        self.convLayer = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                 out_channels=out_channels,
                                                 kernel_size=kernel_size,
                                                 stride=stride,
                                                 padding=padding),
                                       #PixelShuffle(upscale_factor=2),
                                       up_2Dsample(upscale_factor=2),
                                       nn.InstanceNorm2d(num_features=out_channels,
                                                         affine=True))
        self.convLayer_gates = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                       out_channels=out_channels,
                                                       kernel_size=kernel_size,
                                                       stride=stride,
                                                       padding=padding),
                                             #PixelShuffle(upscale_factor=2),
                                             up_2Dsample(upscale_factor=2),
                                             nn.InstanceNorm2d(num_features=out_channels,
                                                               affine=True))

   

    def forward(self, input):        
        return self.convLayer(input) * torch.sigmoid(self.convLayer_gates(input))


class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=1,
                               out_channels=128,
                               kernel_size=[5,15],
                               stride=1,
                               padding=[2,7])

        self.conv1_gates = nn.Conv2d(in_channels=1,
                               out_channels=128,
                               kernel_size=[5,15],
                               stride=1,
                               padding=[2,7])

        # Downsample Layer
        self.downSample1 = downSample_Generator(in_channels=128,
                                                out_channels=256,
                                                kernel_size=5,
                                                stride=2,
                                                padding=2)

        self.downSample2 = downSample_Generator(in_channels=256,
                                                out_channels=512,
                                                kernel_size=5,
                                                stride=2,
                                                padding=2)
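        # 2D -> 1D: the forward pass flattens channels x feature bins
        # (512 * 6 = 3072 for 24 input features) before this 1x1 Conv1d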
        self.conv2 = nn.Conv1d(in_channels=3072,
                               out_channels=512,
                               kernel_size=1,
                               stride=1)

        # Residual Blocks
        self.residualLayer1 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer2 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer3 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer4 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer5 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
        self.residualLayer6 = ResidualLayer(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=3,
                                            stride=1,
                                            padding=1)
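        # 1D -> 2D: expand back to 3072 channels so the forward pass can
        # restore the 4D layout produced by downSample2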
        self.conv3 = nn.Conv1d(in_channels=512,
                               out_channels=3072,
                               kernel_size=1,
                               stride=1)


        # UpSample Layer
        self.upSample1 = upSample_Generator(in_channels=512,
                                            out_channels=1024,
                                            kernel_size=5,
                                            stride=1,
                                            padding=2)
        
        self.upSample2 = upSample_Generator(in_channels=1024,
                                            out_channels=512,
                                            kernel_size=5,
                                            stride=1,
                                            padding=2)

        self.lastConvLayer = nn.Conv2d(in_channels=512,
                                       out_channels=1,
                                       kernel_size=[5,15],
                                       stride=1,
                                       padding=[2,7])

    def forward(self, input):
        # input: [batch_size, num_features, time] -> add a channel axis
        input = input.unsqueeze(1)

        # GLU
        conv1 = self.conv1(input) * torch.sigmoid(self.conv1_gates(input))

        downsample1 = self.downSample1(conv1)
        
        downsample2 = self.downSample2(downsample1)
        
        downsample3 = downsample2.view([downsample2.shape[0],-1,downsample2.shape[3]])
        
        downsample3 = self.conv2(downsample3)
        
        residual_layer_1 = self.residualLayer1(downsample3)
        
        residual_layer_2 = self.residualLayer2(residual_layer_1)
        
        residual_layer_3 = self.residualLayer3(residual_layer_2)
        
        residual_layer_4 = self.residualLayer4(residual_layer_3)
        
        residual_layer_5 = self.residualLayer5(residual_layer_4)
        
        residual_layer_6 = self.residualLayer6(residual_layer_5)
        
        residual_layer_6 = self.conv3(residual_layer_6)
        
        residual_layer_6 = residual_layer_6.view([downsample2.shape[0],downsample2.shape[1],downsample2.shape[2],downsample2.shape[3]])
        
        upSample_layer_1 = self.upSample1(residual_layer_6)
        
        upSample_layer_2 = self.upSample2(upSample_layer_1)
        
        output = self.lastConvLayer(upSample_layer_2)
        
        output = output.view([output.shape[0],-1,output.shape[3]])
        return output


class DownSample_Discriminator(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(DownSample_Discriminator, self).__init__()

        self.convLayer = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                 out_channels=out_channels,
                                                 kernel_size=kernel_size,
                                                 stride=stride,
                                                 padding=padding),
                                       nn.InstanceNorm2d(num_features=out_channels,
                                                         affine=True))
        self.convLayerGates = nn.Sequential(nn.Conv2d(in_channels=in_channels,
                                                      out_channels=out_channels,
                                                      kernel_size=kernel_size,
                                                      stride=stride,
                                                      padding=padding),
                                            nn.InstanceNorm2d(num_features=out_channels,
                                                              affine=True))

    def forward(self, input):
        # GLU
        return self.convLayer(input) * torch.sigmoid(self.convLayerGates(input))


class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        self.convLayer1 = nn.Conv2d(in_channels=1,
                                    out_channels=128,
                                    kernel_size=[3, 3],
                                    stride=[1, 1])
        self.convLayer1_gates = nn.Conv2d(in_channels=1,
                                          out_channels=128,
                                          kernel_size=[3, 3],
                                          stride=[1, 1])

        # Note: the kernel sizes differ from those in the paper so that the
        # feature-map dimensions are retained. Unlike TensorFlow, PyTorch does
        # not support padding='same' for strided convolutions, so the kernel
        # sizes and explicit padding were adjusted to keep the expected
        # dimensions after each layer.

        # DownSample Layer
        self.downSample1 = DownSample_Discriminator(in_channels=128,
                                                    out_channels=256,
                                                    kernel_size=[3, 3],
                                                    stride=[2, 2],
                                                    padding=0)

        self.downSample2 = DownSample_Discriminator(in_channels=256,
                                                    out_channels=512,
                                                    kernel_size=[3, 3],
                                                    stride=[2, 2],
                                                    padding=0)

        self.downSample3 = DownSample_Discriminator(in_channels=512,
                                                    out_channels=1024,
                                                    kernel_size=[3, 3],
                                                    stride=[2, 2],
                                                    padding=0)

        self.downSample4 = DownSample_Discriminator(in_channels=1024,
                                                    out_channels=1024,
                                                    kernel_size=[1, 5],
                                                    stride=[1, 1],
                                                    padding=[0, 2])

        # Fully Connected Layer (currently unused: the call in forward() is commented out)
        self.fc = nn.Linear(in_features=1024,
                            out_features=1)

        # output Layer
        self.output_layer = nn.Conv2d(in_channels=1024,
                                      out_channels=1,
                                      kernel_size=[1, 3],
                                      stride=[1, 1],
                                      padding=[0, 1])

    # def downSample(self, in_channels, out_channels, kernel_size, stride, padding):
    #     convLayer = nn.Sequential(nn.Conv2d(in_channels=in_channels,
    #                                         out_channels=out_channels,
    #                                         kernel_size=kernel_size,
    #                                         stride=stride,
    #                                         padding=padding),
    #                               nn.InstanceNorm2d(num_features=out_channels,
    #                                                 affine=True),
    #                               GLU())
    #     return convLayer

    def forward(self, input):
        # input has shape [batch_size, num_features, time]
        # discriminator requires shape [batchSize, 1, num_features, time]
        input = input.unsqueeze(1)
        # GLU
        pad_input = nn.ZeroPad2d((1, 1, 1, 1))
        layer1 = self.convLayer1(
            pad_input(input)) * torch.sigmoid(self.convLayer1_gates(pad_input(input)))

        pad_input = nn.ZeroPad2d((1, 0, 1, 0))
        downSample1 = self.downSample1(pad_input(layer1))

        pad_input = nn.ZeroPad2d((1, 0, 1, 0))
        downSample2 = self.downSample2(pad_input(downSample1))

        pad_input = nn.ZeroPad2d((1, 0, 1, 0))
        downSample3 = self.downSample3(pad_input(downSample2))

        downSample4 = self.downSample4(downSample3)
        downSample4 = self.output_layer(downSample4)

        # Move the channel axis last: [batch, height, width, 1]
        downSample4 = downSample4.permute(0, 2, 3, 1).contiguous()
        # The fully connected head (self.fc) is disabled. The convolutional
        # output layer above yields one score per spatial position, and the
        # sigmoid maps each score into (0, 1).
        fc = torch.sigmoid(downSample4)
        return fc


if __name__ == '__main__':
    # Generator dimensionality test
    np.random.seed(0)
    # Input shape: [batch_size, num_features, time]
    features = torch.from_numpy(np.random.randn(15, 24, 128)).float()
    generator = Generator()

    output = generator(features)
    print("Output shape Generator", output.shape)

    # Discriminator dimensionality test: it takes the generator output,
    # i.e. a tensor of shape [batch_size, num_features, time]
    discriminator = Discriminator()
    output = discriminator(output)
    print("Output shape Discriminator", output.shape)
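
The padding choices in the Discriminator are easier to follow once the shapes are worked out by hand. The sketch below is plain Python (no PyTorch needed) and relies only on the standard convolution output-size formula, out = floor((in + total_padding - kernel) / stride) + 1, together with the layer settings listed in the Discriminator above. The helper names `conv_out` and `discriminator_shape` are made up for illustration, and the 24 x 128 input is an assumption taken from the size used in the dimensionality test.

```python
def conv_out(size, kernel, stride, pad_total):
    """Output length of a convolution along one axis.

    pad_total is the padding summed over both sides of that axis.
    """
    return (size + pad_total - kernel) // stride + 1


def discriminator_shape(features, frames):
    """Trace (height, width) through the Discriminator layers shown above."""
    # convLayer1: ZeroPad2d((1, 1, 1, 1)), 3x3 kernel, stride 1 -> size kept
    h = conv_out(features, 3, 1, 2)
    w = conv_out(frames, 3, 1, 2)
    # downSample1..3: ZeroPad2d((1, 0, 1, 0)), 3x3 kernel, stride 2 -> halved
    for _ in range(3):
        h = conv_out(h, 3, 2, 1)
        w = conv_out(w, 3, 2, 1)
    # downSample4: kernel [1, 5], padding [0, 2], stride 1 -> size kept
    h = conv_out(h, 1, 1, 0)
    w = conv_out(w, 5, 1, 4)
    # output_layer: kernel [1, 3], padding [0, 1], stride 1 -> size kept
    h = conv_out(h, 1, 1, 0)
    w = conv_out(w, 3, 1, 2)
    return h, w


print(discriminator_shape(24, 128))  # (3, 16)
```

Under these assumptions, each stride-2 block halves both axes (24 x 128 becomes 12 x 64, then 6 x 32, then 3 x 16), so after the final permute the discriminator would return one score per position in a 3 x 16 grid for every sample in the batch.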

Reproducing the CycleGAN-V2 Voice (Timbre) Conversion Algorithm
https://blog.kimbleex.top/posts/2024-05-12-cycleganv2-voice-trans/音色转换模型cycleganv2的复现/
Author
Kimbleex
Published
2024-05-12
License
CC BY-NC-SA 4.0