What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

1Boston University, 2Runway

Abstract

Domain Generalization aims to develop models that generalize to novel and unseen data distributions. In this work, we study how model architectures and pre-training objectives impact feature richness and propose a method to leverage them effectively for domain generalization. Specifically, given a pre-trained feature space, we first discover latent domain structures, referred to as pseudo-domains, that capture domain-specific variations in an unsupervised manner. Next, we augment existing classifiers with these complementary pseudo-domain representations, making them more amenable to diverse unseen test domains. We analyze how different pre-training feature spaces differ in the domain-specific variances they capture. Our empirical studies reveal that features from diffusion models excel at separating domains in the absence of explicit domain labels and capture nuanced domain-specific information. Across 5 datasets, our very simple framework improves generalization to unseen domains, with test-accuracy gains of up to 4% over the standard Empirical Risk Minimization (ERM) baseline. Crucially, our method outperforms most algorithms that access domain labels during training.

Teaser image

t-SNE visualization of the latent space from different pre-training objectives: CLIP, DiT, MAE, ResNet-50 on the VLCS dataset. Note how the diffusion features from DiT separate the 4 domains (Caltech101, LabelMe, SUN09, VOC2007) effectively, suggesting that latent domain structures can be captured without explicit supervision.

Method

Pseudo-Domain Discovery

Our method builds on the insight that domain-specific structure can be inferred from pre-trained features without requiring domain labels. We begin by identifying pseudo-domains via clustering in the latent space.
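As a concrete illustration, this discovery step can be sketched as k-means clustering over pre-trained features. The function name and the choice of k-means here are illustrative assumptions; the paper's exact clustering setup may differ:

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_pseudo_domains(features: np.ndarray, k: int, seed: int = 0):
    """Cluster pre-trained features (N, D) into k pseudo-domains.

    Returns per-sample pseudo-domain assignments and the cluster
    centroids, which serve as pseudo-domain representations.
    (Illustrative sketch; k-means is an assumed choice of clusterer.)
    """
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
    assignments = kmeans.fit_predict(features)  # shape (N,)
    centroids = kmeans.cluster_centers_         # shape (k, D)
    return assignments, centroids
```

No domain labels enter this step; k is a hyperparameter (set to the number of ground-truth domains only when evaluating separation, as described below).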

Pseudo-domain discovery
DiT feature space on the PACS dataset. Note how the feature space captures domain-related variations (e.g., light sketches are grouped together and separated from darker ones; cartoons are separated from sketches, photos, and paintings, etc.).
Clustering visualization
Pseudo-domains captured in the diffusion latent space of DiT on PACS. The clusters group images based on nuanced style-specific variances rather than class-specific variances.

Quantifying Domain Separation

We quantify domain separation using the Normalized Mutual Information (NMI) score between cluster assignments (with K = number of ground-truth domains) and the corresponding domain labels. For example, for the VLCS dataset shown below, we cluster with K = 4 and compute the NMI score between the cluster assignments and the domain labels. In addition to Domain NMI scores, we compute Class NMI scores in the same fashion, which quantifies how much class-specific information the clusters capture.
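Both scores reduce to a standard NMI computation, e.g. via scikit-learn (a minimal sketch; the helper name is ours):

```python
from sklearn.metrics import normalized_mutual_info_score

def domain_separation_scores(cluster_assignments, domain_labels, class_labels):
    """NMI of cluster assignments against domain and class labels.

    A feature space well suited for pseudo-domain discovery should
    score high on domain NMI and comparatively low on class NMI.
    """
    domain_nmi = normalized_mutual_info_score(domain_labels, cluster_assignments)
    class_nmi = normalized_mutual_info_score(class_labels, cluster_assignments)
    return domain_nmi, class_nmi
```

NMI is symmetric and lies in [0, 1]: 1 means the clusters reproduce the labels exactly, 0 means they are independent of them.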

Domain separation comparison

Domain vs Class NMI

To obtain domain-specific centroids from the clusters, we ideally want a feature space that captures domain-specific information while remaining invariant to class-specific information. The NMI scores described above quantify this: the ideal feature space has a high domain NMI and a relatively low class NMI.

Normalized Mutual Information (NMI) – Domain Labels

Domain NMI scores

Normalized Mutual Information (NMI) – Class Labels

Class NMI scores
VLCS domain vs class nmi
Comparison of domain and class NMI scores for different feature spaces on the VLCS dataset. DiT exhibits high domain NMI scores while having a low class NMI score.

GUIDE: Generalization using Inferred Domains from Latent Embeddings

In the standard classification pipeline, we append these pseudo-domain representations to the features from the ResNet-50 backbone before passing the concatenated vector to the classifier.
Training pipeline
Training Pipeline. The green-shaded region represents the clustering and transformation step. Green solid arrows indicate gradient flow, while red arrows represent non-gradient operations. The feature extractor $\mathbf{\Psi}$ first clusters samples to compute the pseudo-domain centroids. The transformation function $\mathcal{T}$ then transforms these centroids to the latent space of $\mathbf{\Phi}$, producing transformed pseudo-domain centroids, which are concatenated with the features from $\mathbf{\Phi}$ and sent to the classifier.
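The classification head can be sketched in PyTorch as follows. This is a hedged sketch, not the paper's exact implementation: modeling $\mathcal{T}$ as a single linear layer and stopping gradients on the centroids (the red non-gradient arrows) are our assumptions:

```python
import torch
import torch.nn as nn

class GuideHead(nn.Module):
    """Sketch of the GUIDE classification head.

    Backbone features from Phi are concatenated with a transformed
    pseudo-domain centroid before classification. T is assumed to be
    a single linear layer mapping centroid space into Phi's space.
    """

    def __init__(self, feat_dim: int, centroid_dim: int, num_classes: int):
        super().__init__()
        self.transform = nn.Linear(centroid_dim, feat_dim)       # T
        self.classifier = nn.Linear(feat_dim * 2, num_classes)

    def forward(self, phi_feats: torch.Tensor, centroids: torch.Tensor):
        # centroids: pseudo-domain centroid assigned to each sample.
        # detach() keeps this a non-gradient path, as in the diagram.
        t = self.transform(centroids.detach())
        return self.classifier(torch.cat([phi_feats, t], dim=-1))
```

At test time the same head is used; each sample is simply assigned the nearest pseudo-domain centroid in $\mathbf{\Psi}$'s space.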

DomainBed Results

GUIDE on different feature spaces

Results table: GUIDE with different feature spaces
  • Pseudo-domain features from diffusion models offered the best utility.
  • DiT excelled on high-level domain shifts (e.g., PACS, VLCS).
  • Stable Diffusion 2.1 performed best on environmental and spatial shifts (e.g., TerraIncognita).

GUIDE against other approaches

Results table: GUIDE against other approaches
  • GUIDE outperforms both the baseline and other methods that rely on explicit ground truth domain labels during training. Methods in cyan correspond to domain-adaptive classifiers (described in Sec. 3.3).

GUIDE + Enhanced Training Strategies

Results table: GUIDE combined with enhanced training strategies

BibTeX

@misc{thomas2025whatslatentleveragingdiffusion,
  title={What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization},
  author={Xavier Thomas and Deepti Ghadiyaram},
  year={2025},
  eprint={2503.06698},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.06698},
}