U-Net from Denoising Diffusion Probabilistic Models
- UNet: U-Net for predicting noise in images
- SinusoidalPositionEmbeddings: Transformer position encoding
- ResBlock: 3x3 basic resblocks with group norm, dropout and timestep embeddings
- DownSample: Downsample blocks
- UpSample: Upsample blocks
- Self Attention with groupnorm
- class dmme.models.ddpm.UNet(in_channels=3, pos_dim=128, emb_dim=512, num_groups=32, dropout=0.1, channels_per_depth=(128, 256, 256, 256), num_blocks=2, attention_depths=(2,))
U-Net for predicting noise in images
- Parameters:
in_channels (int) – input channels of image
pos_dim (int) – dimension of position embedding
emb_dim (int) – dimension of timestep embedding
num_groups (int) – number of groups in nn.GroupNorm
dropout (float) – dropout rate in nn.Dropout2d
channels_per_depth (Tuple[int, ...]) – channels per depth
num_blocks (int) – number of resblocks to use in each depth
attention_depths (Tuple[int, ...]) – depths at which to use attention blocks
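To see how `channels_per_depth`, `num_blocks` and `attention_depths` interact, the following minimal sketch lists the encoder resblocks the default arguments would imply. It does not call `dmme`; the helper `encoder_plan` and its `image_size` argument are hypothetical. It assumes that a stem convolution already maps the image to `channels_per_depth[0]` channels, that resolution halves between consecutive depths, and that depths are 0-indexed. The documentation above does not confirm any of these.

```python
from typing import Dict, List, Tuple, Union

BlockSpec = Dict[str, Union[int, bool]]


def encoder_plan(
    image_size: int = 32,
    channels_per_depth: Tuple[int, ...] = (128, 256, 256, 256),
    num_blocks: int = 2,
    attention_depths: Tuple[int, ...] = (2,),
) -> List[BlockSpec]:
    """Hypothetical helper: list the resblocks of the downsampling path."""
    plan: List[BlockSpec] = []
    # Assumption: a stem convolution has already produced this many channels.
    c_in = channels_per_depth[0]
    resolution = image_size
    for depth, c_out in enumerate(channels_per_depth):
        for _ in range(num_blocks):
            plan.append(
                {
                    "depth": depth,
                    "resolution": resolution,
                    "c_in": c_in,
                    "c_out": c_out,
                    # Assumption: attention_depths uses 0-based depth indices.
                    "with_attention": depth in attention_depths,
                }
            )
            c_in = c_out
        # Assumption: downsample between depths, but not after the last one.
        if depth < len(channels_per_depth) - 1:
            resolution //= 2
    return plan


if __name__ == "__main__":
    for block in encoder_plan():
        print(block)
```

Under these assumptions the defaults give eight encoder resblocks, two of which (at depth 2) carry attention.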
- class dmme.models.ddpm.SinusoidalPositionEmbeddings(dim)
Transformer position encoding
- Parameters:
dim (int) – number of dimensions of the position embedding, \(d_\text{emb}\)
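As an illustration of what a Transformer-style position encoding of size \(d_\text{emb}\) computes for a single timestep, here is a minimal pure-Python sketch. The function name `sinusoidal_embedding`, the `max_period` constant of 10000 and the layout (all sines followed by all cosines) are assumptions made for this example, not the library's confirmed implementation.

```python
import math
from typing import List


def sinusoidal_embedding(t: float, dim: int, max_period: float = 10000.0) -> List[float]:
    """Hypothetical helper: sinusoidal embedding of one timestep t."""
    if dim < 2 or dim % 2 != 0:
        raise ValueError("dim must be a positive even integer")
    half = dim // 2
    # Geometric progression of frequencies from 1 down towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    angles = [t * f for f in freqs]
    # Assumed layout: first half sines, second half cosines.
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


if __name__ == "__main__":
    emb = sinusoidal_embedding(t=10, dim=128)
    print(len(emb))
```

Each timestep maps to a fixed vector of length `dim`, so the embedding needs no learned parameters.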
- class dmme.models.ddpm.ResBlock(c_in, c_out, with_attention=False, emb_dim=512, num_groups=32, p=0.1)
3x3 basic resblocks with group norm, dropout and timestep embeddings
- Parameters:
c_in (int) – number of input channels
c_out (int) – number of output channels
with_attention (bool) – whether to add attention block
emb_dim (int) – input timestep embedding dimension
num_groups (int) – number of groups in nn.GroupNorm
p (float) – dropout rate in nn.Dropout2d
- dmme.models.ddpm.DownSample(c_in, c_out)
Downsample blocks
- Parameters:
c_in (int) – number of input channels
c_out (int) – number of output channels
- Returns:
down sampling layer using 2d convolutions
- Return type:
(nn.Conv2d)
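The documentation states only that the downsampling layer is an `nn.Conv2d`. The sketch below shows the spatial-size arithmetic for such a layer, assuming a 3x3 kernel, stride 2 and padding 1 (a common choice for halving resolution, not confirmed here). The helper `conv2d_output_size` is hypothetical and implements the standard convolution output-size formula with dilation 1.

```python
def conv2d_output_size(size: int, kernel_size: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Hypothetical helper: output height/width of a 2D convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


if __name__ == "__main__":
    size = 32
    # Three assumed downsampling steps between four depths.
    for _ in range(3):
        size = conv2d_output_size(size)
        print(size)
```

With these assumed settings, each call halves an even input size.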
- class dmme.models.ddpm.UpSample(c_in, c_out)
Upsample blocks
- Parameters:
c_in (int) – number of input channels
c_out (int) – number of output channels