Memory-Efficient Filter-Guided Diffusion with Domain Transform Filtering

(left) Leonardo da Vinci's Mona Lisa, provided as the guidance image. (right) A 512 × 512 image generated by our technique from the Mona Lisa image and the text prompt "A painting of a dog in a wig", using parameters seed = 10, $\sigma_s = 3$, $\sigma_r = 0.3$, $t_{\mathrm{end}} = 15$, detail = 1.0, and Stable Diffusion 1.4.
Gustavo L. Tamiosso
gltamiosso@inf.ufrgs.br
Caetano B. Müller
cbmuller@inf.ufrgs.br
Lucas S. Bombana
lsbombana@inf.ufrgs.br
Manuel M. Oliveira
oliveira@inf.ufrgs.br

 


Computers & Graphics.
Volume 132 (2025), Article 104389, pp. 1–10. DOI: 10.1016/j.cag.2025.104389


Abstract

Diffusion models are powerful tools for image synthesis and editing, yet preserving structural content from a guidance image remains challenging. Filter-Guided Diffusion (FGD) tackles this by applying edge-preserving filtering at each denoising step. However, the original FGD relies on joint bilateral filtering, which incurs high VRAM and computational costs, limiting its scalability to high-resolution images. We propose Domain Transform Filter-Guided Diffusion (DT-FGD), a lightweight variant that replaces bilateral filtering with the efficient domain transform filter and introduces a normalization strategy for the guidance image's latent representation. DT-FGD achieves significantly lower VRAM usage and faster inference while improving structural consistency. Our method produces images that better align with the text prompt and vary smoothly under filter parameter changes, leading to more predictable outcomes. Experiments show that DT-FGD can reduce VRAM consumption by over 50%, accelerate inference, and scale to high resolutions on a single GPU, unlike prior approaches. We further present a variant that offers even greater memory savings at the cost of additional inference time. DT-FGD enables structure-preserving diffusion on resource-constrained hardware and opens new directions for high-resolution, controllable image synthesis.
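To make the central substitution more concrete, the sketch below implements the recursive-filter (RF) form of the domain transform (Gastal and Oliveira, SIGGRAPH 2011) as a joint filter in NumPy: a signal `src` (for example, a latent) is smoothed while edges are taken from a separate `guide` image. This is not the DT-FGD code. The function names, the choice of the RF variant, `num_iterations=3`, and the (H, W, C) array layout are assumptions made for illustration, and the sketch omits the guidance-latent normalization and the diffusion loop entirely. Its cost is a few sequential passes over the pixels, linear in image size and independent of `sigma_s`, with no per-pixel kernel window to store.

```python
import numpy as np


def _recursive_filter_rows(img, dist, sigma):
    """Causal + anti-causal first-order recursive filter along axis 1.

    img:   (H, W, C) array; a filtered copy is returned.
    dist:  (H, W) domain-transform distances; dist[:, x] is the distance
           between columns x-1 and x (dist[:, 0] is unused).
    sigma: std. dev. of the 1D filter for this iteration (hypothetical name).
    """
    a = np.exp(-np.sqrt(2.0) / sigma)
    v = a ** dist  # large distance across an edge -> coefficient near 0
    out = img.copy()
    w = out.shape[1]
    for x in range(1, w):  # left-to-right pass
        out[:, x] += v[:, x, None] * (out[:, x - 1] - out[:, x])
    for x in range(w - 2, -1, -1):  # right-to-left pass
        out[:, x] += v[:, x + 1, None] * (out[:, x + 1] - out[:, x])
    return out


def domain_transform_rf(src, guide, sigma_s, sigma_r, num_iterations=3):
    """Joint edge-aware smoothing of `src` using edges of `guide` (sketch)."""
    src = np.asarray(src, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    if src.ndim == 2:
        src = src[..., None]
    if guide.ndim == 2:
        guide = guide[..., None]
    h, w, _ = guide.shape

    # L1 differences of the guide between neighbors, summed over channels.
    didx = np.zeros((h, w))
    didy = np.zeros((h, w))
    didx[:, 1:] = np.abs(np.diff(guide, axis=1)).sum(axis=2)
    didy[1:, :] = np.abs(np.diff(guide, axis=0)).sum(axis=2)

    # Derivatives of the domain transform: 1 + (sigma_s / sigma_r) * |dI|.
    dhdx = 1.0 + (sigma_s / sigma_r) * didx
    dvdy = 1.0 + (sigma_s / sigma_r) * didy

    out = src
    n = num_iterations
    for i in range(n):
        # Per-iteration sigma so that the n passes combine to sigma_s overall.
        sigma_i = sigma_s * np.sqrt(3.0) * 2.0 ** (n - (i + 1)) / np.sqrt(4.0 ** n - 1.0)
        out = _recursive_filter_rows(out, dhdx, sigma_i)
        # Vertical pass: transpose so image columns become rows.
        out = _recursive_filter_rows(out.transpose(1, 0, 2), dvdy.T, sigma_i)
        out = out.transpose(1, 0, 2)
    return out
```

With a small `sigma_r`, a step edge present in the guide survives almost untouched; with a very large `sigma_r`, the filter degenerates to edge-unaware smoothing and the step blurs. The `sigma_s` and `sigma_r` arguments presumably correspond to the $\sigma_s$ and $\sigma_r$ values listed in the figure captions, although the scale at which DT-FGD applies them in latent space is not stated on this page.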

Keywords

Diffusion Models; Structure Guidance; Domain Transform Filter; Edge-preserving Filtering; Image Synthesis.

Examples

A portrait of a bird.

(left) Portrait of Carl Friedrich Gauss by Christian Jensen, provided as guidance. (right) $1024 × 1024$ image generated by DT-FGD from the Gauss portrait and the text prompt "a portrait of a bird", using seed = 9, method seed = 12, $\sigma_s = 1.5$, $\sigma_r = 0.3$, $t_{\mathrm{end}} = 15$, detail = 0.5, and Stable Diffusion 2.1.

A cat in a red hat.

(left) Vermeer's painting Girl with a Red Hat, used as the guidance image. (right) $512 × 512$ image generated with DT-FGD from the image on the left and the text prompt "a cat in a red hat". Parameters: seed = 10, $\sigma_s = 3$, $\sigma_r = 0.3$, $t_{\mathrm{end}} = 15$, detail = 1.2, and Stable Diffusion 1.4.

A photo of a steak.

(left) Guidance image of a loaf of bread (from https://filterguideddiffusion.github.io). (center) and (right) Images generated with DT-FGD at two resolutions ($512 × 512$ and $1192 × 1192$, respectively) from the image on the left and the text prompt "a photo of a steak", using Stable Diffusion 1.4. Parameters: seed = 1, method seed = 10, $\sigma_s = 3$, $\sigma_r = 0.3$, $t_{\mathrm{end}} = 15$, detail = 1.6.

Girl with a Pearl Earring: comparison of DT-FGD and FGD.

With its normalized guidance latent ($x_g$), DT-FGD generates smoother image transitions (top) than FGD* (bottom) as $\sigma_s$ varies, leading to more predictable results. The $1024 × 1024$ images were generated from Johannes Vermeer's painting Girl with a Pearl Earring (shown on the left) and the text prompt "a portrait of a dog", with parameters seed = 1, method seed = 10, $t_{\mathrm{end}} = 15$, detail = 1.0, and Stable Diffusion 2.1.

* Gu Z, Yang E, Davis A. Filter-guided diffusion for controllable image generation. In: ACM SIGGRAPH 2024 conference papers. 2024, p. 1–10.

Downloads

Paper


Paper (pre-print, low resolution)

The final publication (high resolution, incorporating the feedback from the reviewing process) is available at ScienceDirect (Elsevier).


Reference

Citation

Gustavo L. Tamiosso, Caetano B. Müller, Lucas S. Bombana, Manuel M. Oliveira. Memory-Efficient Filter-Guided Diffusion with Domain Transform Filtering, Computers & Graphics, Volume 132 (2025) 104389.

BibTeX

@article{TamiossoEtAl2025DT-FGD,
    author  = {Gustavo L. Tamiosso and Caetano B. Müller and Lucas S. Bombana and Manuel M. Oliveira},
    title   = {Memory-Efficient Filter-Guided Diffusion with Domain Transform Filtering},
    journal = {Computers \& Graphics},
    volume  = {132},
    number  = {104389},
    DOI     = {10.1016/j.cag.2025.104389},
    ISSN    = {0097-8493},
    pages   = {1--10},
    year    = {2025}
}
  

Acknowledgments

This work was sponsored by:

CNPq-Brazil fellowship 305474/2022-7.
CAPES, Brazil Finance Code 001.