Domain Generalization aims to develop models that can generalize to novel and unseen data distributions. In this work, we study how model architectures and pre-training objectives impact feature richness and propose a method to effectively leverage them for domain generalization. Specifically, given a pre-trained feature space, we first discover latent domain structures, referred to as pseudo-domains, that capture domain-specific variations in an unsupervised manner. Next, we augment existing classifiers with these complementary pseudo-domain representations, making them more amenable to diverse unseen test domains. We analyze how different pre-training feature spaces differ in the domain-specific variations they capture. Our empirical studies reveal that features from diffusion models excel at separating domains in the absence of explicit domain labels and capture nuanced domain-specific information. Across 5 datasets, we show that our very simple framework improves generalization to unseen domains, with test accuracy gains of up to over 4% compared to the standard baseline, Empirical Risk Minimization (ERM). Crucially, our method outperforms most algorithms that access domain labels during training.
Our method builds on the insight that domain-specific structure can be inferred from pre-trained features without requiring domain labels. We begin by identifying pseudo-domains via clustering in the latent space.
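The clustering step can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: `features` is a toy stand-in for pre-extracted embeddings, and the variable names (`pseudo_domains`, `centroids`) are our own.

```python
# Minimal sketch of pseudo-domain discovery: K-means clustering in a
# pre-trained feature space. `features` is a hypothetical (N, D) array
# standing in for pre-extracted embeddings.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy stand-in for pre-trained features: two well-separated blobs,
# mimicking two latent domains.
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 16)),
    rng.normal(loc=5.0, scale=0.5, size=(100, 16)),
])

K = 2  # chosen number of pseudo-domains
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(features)

pseudo_domains = kmeans.labels_       # (N,) pseudo-domain assignment per sample
centroids = kmeans.cluster_centers_   # (K, D) pseudo-domain representations
print(centroids.shape)                # (2, 16)
```

The cluster centroids serve as compact pseudo-domain representations that can later be used to augment a classifier.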
We quantify domain separation using the Normalized Mutual Information (NMI) score between cluster assignments (with K = number of ground-truth domains) and the corresponding domain labels. For example, for the VLCS dataset shown below, we cluster with K = 4 and compute the NMI score between the cluster assignments and the domain labels.
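The Domain NMI computation can be sketched like this. The features here are synthetic placeholders (four well-separated blobs mimicking VLCS's four domains), not actual VLCS embeddings.

```python
# Sketch of the Domain NMI score: cluster with K equal to the number of
# ground-truth domains, then score cluster assignments against the
# domain labels. All data below is synthetic and illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
# Toy features from 4 synthetic "domains" (VLCS has 4 domains).
domain_labels = np.repeat(np.arange(4), 50)  # (200,)
features = rng.normal(size=(200, 8)) + 6.0 * domain_labels[:, None]

K = 4  # = number of ground-truth domains
clusters = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(features)

domain_nmi = normalized_mutual_info_score(domain_labels, clusters)
print(domain_nmi)  # high when clusters align with ground-truth domains
```

Class NMI is computed identically, simply substituting class labels for domain labels in the scoring call.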
In addition to Domain NMI scores, we also compute Class NMI scores in a similar fashion. This allows us to quantify the amount of class-specific information captured by the clusters.
To obtain domain-specific centroids from the clusters, we ideally need a feature space that captures domain-specific information while being invariant to class-specific information. We can quantify this using the NMI scores described above: the ideal feature space has a high Domain NMI and a relatively low Class NMI.
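This selection criterion can be expressed as a simple ranking. The feature-space names and NMI values below are purely illustrative (the source reports that diffusion features score well on this trade-off, but these specific numbers are made up).

```python
# Hypothetical ranking of candidate feature spaces by the gap between
# Domain NMI (want high) and Class NMI (want low). Numbers are
# illustrative placeholders, not reported results.
nmi_scores = {
    # feature_space: (domain_nmi, class_nmi)
    "diffusion": (0.85, 0.30),
    "clip": (0.60, 0.55),
    "supervised_resnet": (0.45, 0.70),
}

best = max(nmi_scores, key=lambda name: nmi_scores[name][0] - nmi_scores[name][1])
print(best)  # "diffusion"
```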
Figures: Domain NMI (higher is better) vs. Class NMI (lower is better) comparisons across feature spaces for PACS, VLCS, TerraIncognita, OfficeHome, and DomainNet.
GUIDE on different feature spaces
GUIDE against other approaches
GUIDE + Enhanced Training Strategies
Comparison using SWAD [Cha et al., 2021], MIRO [Cha et al., 2022], and ERM++ [Teterwak et al., 2024] on PACS and TerraIncognita (TI). GUIDE trained with ERM++ further improves performance.
@misc{thomas2025whatslatentleveragingdiffusion,
  title={What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization},
  author={Xavier Thomas and Deepti Ghadiyaram},
  year={2025},
  eprint={2503.06698},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.06698},
}