U-Net from Denoising Diffusion Probabilistic Models

`UNet`	U-Net for predicting noise in images
`SinusoidalPositionEmbeddings`	Transformer position encoding
`ResBlock`	3x3 basic resblocks with group norm, dropout and timestep embeddings
`DownSample`	Downsample blocks
`UpSample`	Upsample blocks
`Attention`	Self-attention with group norm

class dmme.models.ddpm.UNet(in_channels=3, pos_dim=128, emb_dim=512, num_groups=32, dropout=0.1, channels_per_depth=(128, 256, 256, 256), num_blocks=2, attention_depths=(2,))

U-Net for predicting noise in images

Parameters:

in_channels (int) – input channels of image
pos_dim (int) – dimension of position embedding
emb_dim (int) – dimension of timestep embedding
num_groups (int) – number of groups in nn.GroupNorm
dropout (float) – dropout rate in nn.Dropout2d
channels_per_depth (Tuple[int, ...]) – channels per depth
num_blocks (int) – number of resblocks to use in each depth
attention_depths (Tuple[int, ...]) – depths to use attention blocks

forward(x, c)

Predicts noise from x

Parameters:

x (torch.Tensor) – image of shape \((N, C, H, W)\)
c (torch.Tensor) – timestep of shape \((N,)\)

Returns:

estimated noise in input image x

Return type:

(torch.Tensor)
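
The sketch below illustrates how `channels_per_depth` and `num_groups` might constrain the input size. It is not part of the library: `plan_unet_shapes` is a hypothetical helper, and it assumes that the resolution is halved once between consecutive depths, which the source does not confirm. The divisibility check reflects the `nn.GroupNorm` requirement that the channel count be a multiple of the number of groups.

```python
from typing import List, Sequence, Tuple


def plan_unet_shapes(
    image_size: int,
    channels_per_depth: Sequence[int] = (128, 256, 256, 256),
    num_groups: int = 32,
) -> List[Tuple[int, int]]:
    """Return (channels, resolution) per depth for a square input.

    Assumption (not confirmed by the source): resolution is halved once
    between consecutive depths, so there are len(channels_per_depth) - 1
    downsampling steps in total.
    """
    num_downsamples = len(channels_per_depth) - 1
    if image_size % (2 ** num_downsamples) != 0:
        raise ValueError(
            f"image_size={image_size} is not divisible by {2 ** num_downsamples}"
        )
    plan = []
    resolution = image_size
    for depth, channels in enumerate(channels_per_depth):
        # nn.GroupNorm requires the channel count to be divisible by num_groups.
        if channels % num_groups != 0:
            raise ValueError(
                f"depth {depth}: {channels} channels not divisible by {num_groups} groups"
            )
        plan.append((channels, resolution))
        if depth < num_downsamples:
            resolution //= 2
    return plan


print(plan_unet_shapes(32))  # [(128, 32), (256, 16), (256, 8), (256, 4)]
```

Under these assumptions, a 32x32 input with the default settings reaches a 4x4 resolution at the deepest level, while an input size such as 30 would be rejected before any tensor is allocated.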

class dmme.models.ddpm.SinusoidalPositionEmbeddings(dim)

Transformer position encoding

Parameters: dim (int) – number of dimensions of the position embedding, \(d_\text{emb}\)

forward(t)

Parameters: t (torch.Tensor) – timestep of shape \((N,)\)
Returns: positional embedding of shape \((N, d_\text{emb})\)
Return type: (torch.Tensor)
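
To make the mapping from a timestep batch of shape \((N,)\) to embeddings of shape \((N, d_\text{emb})\) concrete, here is a minimal dependency-free sketch of a Transformer-style sinusoidal encoding. The function names, the `max_period` constant of 10000, and the layout (all sines followed by all cosines) are assumptions for illustration; the library operates on `torch.Tensor` and may interleave sines and cosines instead.

```python
import math
from typing import List, Sequence


def sinusoidal_embedding(t: float, dim: int, max_period: float = 10000.0) -> List[float]:
    """Return a `dim`-dimensional sinusoidal embedding of a scalar timestep."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Frequencies decay geometrically, starting at 1 and approaching 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    angles = [t * f for f in freqs]
    # Assumed layout: sine components first, then cosine components.
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


def embed_batch(timesteps: Sequence[float], dim: int) -> List[List[float]]:
    """Embed a batch of N timesteps into an N x dim nested list."""
    return [sinusoidal_embedding(t, dim) for t in timesteps]


batch = embed_batch([0, 1, 10], dim=8)
print(len(batch), len(batch[0]))  # 3 8
```

At timestep 0 every sine component is 0 and every cosine component is 1, which is a quick sanity check for any implementation of this encoding.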

class dmme.models.ddpm.ResBlock(c_in, c_out, with_attention=False, emb_dim=512, num_groups=32, p=0.1)

3x3 basic resblocks with group norm, dropout and timestep embeddings

Parameters:

c_in (int) – number of input channels
c_out (int) – number of output channels
with_attention (bool) – whether to add attention block
emb_dim (int) – input timestep embedding dimension
num_groups (int) – number of groups in nn.GroupNorm
p (float) – dropout rate in nn.Dropout2d

forward(x, c)

Parameters:

x (torch.Tensor) – feature map of shape \((N, C_\text{in}, H, W)\)
c (torch.Tensor) – timestep embedding of shape \((N, d_\text{emb})\)

Returns:

feature map of shape \((N, C_\text{out}, H, W)\)

Return type:

(torch.Tensor)

dmme.models.ddpm.DownSample(c_in, c_out)

Downsample blocks

Parameters:

c_in (int) – number of input channels
c_out (int) – number of output channels

Returns:

downsampling layer implemented as a 2D convolution

Return type:

(nn.Conv2d)
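
The source does not state the kernel size, stride, or padding of the returned convolution. As a rough guide, the sketch below applies the output-size formula documented for `nn.Conv2d` (with dilation 1) to a 3x3 kernel with stride 2 and padding 1, a configuration commonly used for downsampling in DDPM-style U-Nets and assumed here purely for illustration. `conv2d_out_size` is a hypothetical helper, not a library function.

```python
def conv2d_out_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Spatial output size of a 2D convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Assumed configuration: kernel_size=3, stride=2, padding=1.
print(conv2d_out_size(32, kernel_size=3, stride=2, padding=1))  # 16
print(conv2d_out_size(33, kernel_size=3, stride=2, padding=1))  # 17
```

With this configuration an even input size is halved exactly, while an odd size is rounded up, which is one reason to keep input sizes divisible by a power of two.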

class dmme.models.ddpm.UpSample(c_in, c_out)

Upsample blocks

Parameters:

c_in (int) – number of input channels
c_out (int) – number of output channels

forward(x)

Parameters: x (torch.Tensor) – feature map of shape \((N, C_\text{in}, H, W)\)
Returns: upsampled feature map of shape \((N, C_\text{out}, 2H, 2W)\)
Return type: (torch.Tensor)
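
The source does not describe how the upsampling is performed. Assuming the block doubles the spatial size, as upsampling blocks in DDPM-style U-Nets usually do, one common choice is nearest-neighbour interpolation followed by a convolution. The sketch below shows only the interpolation step on a single-channel grid, using a hypothetical helper `upsample_nearest_2x` written in plain Python.

```python
from typing import List, Sequence


def upsample_nearest_2x(grid: Sequence[Sequence[float]]) -> List[List[float]]:
    """Double the height and width of a 2D grid by repeating each value."""
    out: List[List[float]] = []
    for row in grid:
        # Repeat every value twice along the width.
        wide = [value for value in row for _ in range(2)]
        # Repeat the widened row twice along the height.
        out.append(wide)
        out.append(list(wide))
    return out


for row in upsample_nearest_2x([[1, 2], [3, 4]]):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```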

class dmme.models.ddpm.Attention(dim, num_groups)

Self-attention with group norm

Parameters:

dim (int) – number of channels of the input feature map, equivalent to \(d_\text{model}\)
num_groups (int) – number of groups in nn.GroupNorm

forward(x)

Parameters: x (torch.Tensor) – feature map of shape \((N, C_\text{in}, H, W)\)
Returns: feature map of shape \((N, C_\text{in}, H, W)\)
Return type: (torch.Tensor)
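
For reference, standard single-head scaled dot-product self-attention over the spatial positions can be written as follows. This is the textbook Transformer formulation and is assumed here, not confirmed by the source; the projection matrices \(W_Q, W_K, W_V\) and the flattened input \(X\) are notation introduced for illustration. The normalized feature map is flattened so that each of the \(HW\) spatial positions becomes one token of width \(d_\text{model}\):

\[
X \in \mathbb{R}^{HW \times d_\text{model}}, \qquad Q = X W_Q, \quad K = X W_K, \quad V = X W_V
\]

\[
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_\text{model}}}\right) V
\]

The result has shape \(HW \times d_\text{model}\) and is reshaped back to \((N, C_\text{in}, H, W)\), which matches the documented output shape.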