Symmetry as Intervention:
Causal Estimation with Data Augmentation


NeurIPS  (Spotlight)

TLDR: We show that data augmentation (DA) transformations that correspond to symmetries in data generation are equivalent to causal interventions. Such DA can hence be used to mitigate bias due to hidden confounding in observational $(X, Y)$ data, improving causal estimation and robust prediction.


Statistical vs. Causal Estimation

Empirical risk minimization: We are given $n$ samples $\mathcal{D} \coloneqq \{ (\mathbf{x}_i, \mathbf{y}_i) \}_{i=1}^{n}$ of outcome $Y$ and treatment $X$ from the following data generating process.

$$ Y = f(X) + \xi, \qquad \mathbb{E}[\xi] = 0. \tag{†} $$

We would like to estimate the function $f$. Under the standard assumption that the noise $\xi$ is uncorrelated with $X$, we can recover $f$ via empirical risk minimization (ERM) by finding a function $\widehat{h}_{\text{ERM}}$ that best predicts $Y$ values from unlabelled $X$ values. Regularization techniques like data augmentation (DA) are then used to reduce estimation variance by generating multiple random perturbations $(G\mathbf{x}_i, \mathbf{y}_i)$ of each sample $(\mathbf{x}_i, \mathbf{y}_i) \in \mathcal{D}$ using a random transformation $G$, as sketched below.
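As an illustration only, here is a minimal sketch of how such an augmented dataset can be built; the `augment` helper and the sign-flip transformation are hypothetical choices for this sketch, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(X, y, sample_g, apply_g, n_aug=4):
    """Replicate each sample (x_i, y_i) as (G x_i, y_i) for n_aug random draws of G."""
    X_aug, y_aug = [], []
    for x_i, y_i in zip(X, y):
        for _ in range(n_aug):
            g = sample_g()                  # draw a random transformation G
            X_aug.append(apply_g(g, x_i))   # perturb the input ...
            y_aug.append(y_i)               # ... but keep the label fixed
    return np.array(X_aug), np.array(y_aug)

# Hypothetical transformation: a random sign flip of the second coordinate,
# which leaves a label depending only on x[0] unchanged.
sample_sign = lambda: rng.choice([-1.0, 1.0])
apply_sign = lambda g, x: np.array([x[0], g * x[1]])

X = rng.normal(size=(100, 2))
y = X[:, 0] + 0.1 * rng.normal(size=100)    # label does not depend on x[1]
X_aug, y_aug = augment(X, y, sample_sign, apply_sign)
```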

Causal Estimation: However, $X$ is generally correlated with $\xi$, rendering ERM-based estimation biased. Known as confounding bias, this arises due to unobserved common causes $C$ of $X$ and $Y$, known as confounders. Removing confounding bias requires making $X$ and $\xi$ uncorrelated via an intervention, i.e., independently assigning the values of $X$ during data generation. Alas, interventions are often inaccessible compared to observational data.
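To see the bias concretely, here is a small hypothetical linear Gaussian simulation (my own toy example, not one of the paper's experiments) in which a hidden confounder $C$ drives both $X$ and $Y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 1.0                      # true causal effect of X on Y

C = rng.normal(size=n)                      # hidden confounder
X = C + rng.normal(size=n)                  # X is partly driven by C
Y = beta * X + C + rng.normal(size=n)       # C also enters Y, so the noise is correlated with X

# ERM / OLS estimate of beta from observational (X, Y) data:
beta_ols = np.cov(X, Y)[0, 1] / np.var(X)
print(beta_ols)                             # ~1.5 instead of 1.0: confounding bias

# Under an intervention, X is assigned independently of C:
X_int = rng.normal(size=n)
Y_int = beta * X_int + C + rng.normal(size=n)
print(np.cov(X_int, Y_int)[0, 1] / np.var(X_int))   # ~1.0
```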

A common workaround is to use auxiliary variables to correct for confounding. One approach is that of instrumental variable (IV) regression, where an instrument $Z$ satisfies certain conditional independences with respect to $(X, Y, C)$. Unfortunately, even IVs are unavailable in many application domains.
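For reference, a minimal two-stage least squares (2SLS) sketch in the same kind of hypothetical linear setup, where $Z$ is relevant for $X$, independent of $C$, and affects $Y$ only through $X$; in the scalar case 2SLS reduces to a ratio of covariances:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 100_000, 1.0

C = rng.normal(size=n)                   # hidden confounder
Z = rng.normal(size=n)                   # instrument: independent of C
X = Z + C + rng.normal(size=n)           # relevance: Z influences X
Y = beta * X + C + rng.normal(size=n)    # exclusion: Z enters Y only through X

beta_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]
print(beta_iv)                           # ~1.0: confounding bias removed

beta_ols = np.cov(X, Y)[0, 1] / np.var(X)
print(beta_ols)                          # ~1.33: still biased
```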

Causal Estimation with Data Augmentation

Figure 1: Graphs for the data generating process; (left) the original model $(\dagger)$ after applying DA; (right) the modified model $(\ddagger)$ after the soft-intervention.

Outcome Invariant DA: We consider DA transformations with respect to which $f$ is invariant. Specifically, $G$ takes values in $\mathcal{G}$ such that $f$ is $\mathcal{G}$-invariant:

$$ f(\mathbf{x}) = f(\mathbf{g}\mathbf{x}), \qquad \forall\; (\mathbf{x}, \mathbf{g}) \in \mathcal{X} \times \mathcal{G}. $$

Of course, constructing such DA requires knowledge of the symmetries of $f$. For example, when classifying images of cats vs. dogs, the true labeling function would certainly be invariant to image rotations. $G$ would then represent the random rotation angle, whereas $G\mathbf{x}$ would be the rotated image $\mathbf{x}$.
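A quick numerical sanity check of this invariance, using the illustrative rotation-invariant choice $f(\mathbf{x}) = \lVert \mathbf{x} \rVert$ with $\mathcal{G}$ the 2D rotations (a toy choice for this sketch, not an example from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

f = lambda x: np.linalg.norm(x)          # rotation-invariant labeling function

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

x = rng.normal(size=2)
for theta in rng.uniform(0, 2 * np.pi, size=5):
    g = rotation(theta)
    assert np.isclose(f(x), f(g @ x))    # f(x) == f(gx) for every g in G
```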

DA as soft-intervention: Our key insight is that such DA on observational $(X, Y)$ data is equivalent to changing the data generating process $(\dagger)$ itself to

$$ Y = f(GX) + \xi, \qquad \mathbb{E}[\xi] = 0. \tag{‡} $$

DA therefore constitutes a soft-intervention on $X$, and as such can mitigate hidden confounding bias, thereby improving causal estimation of $f$.
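A hedged end-to-end illustration in a hypothetical linear Gaussian setup (again a toy example of mine, not one of the paper's experiments): the confounder acts through a nuisance coordinate that $f$ ignores, so a sign-flip DA on that coordinate is outcome invariant, acts like a soft-intervention, and removes the spurious weight that plain ERM picks up:

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta = 100_000, 1.0

# Hypothetical DGP: f depends only on x0; the hidden confounder C drives the
# nuisance coordinate x1 and the outcome Y.
C = rng.normal(size=n)
x0 = rng.normal(size=n)
x1 = C + rng.normal(size=n)
X = np.stack([x0, x1], axis=1)
Y = beta * x0 + C + rng.normal(size=n)      # true f(x) = beta * x0, invariant to x1

def ols(X, Y):
    return np.linalg.lstsq(X, Y, rcond=None)[0]

print(ols(X, Y))          # ~[1.0, 0.5]: spurious weight on the nuisance coordinate

# Outcome-invariant DA: randomly flip the sign of x1 (f is unchanged by this),
# i.e. train on (G x_i, y_i) pairs with the labels untouched.
G = rng.choice([-1.0, 1.0], size=n)
X_aug = np.stack([x0, G * x1], axis=1)
print(ols(X_aug, Y))      # ~[1.0, 0.0]: the confounded direction is intervened away
```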

DA as relaxed IVs: Next, we frame DA transformations $G$ as IV-like (IVL) by observing that, by construction, they satisfy conditional independences similar to those of IVs. As such, we can use DA in composition with IV regression to further reduce confounding bias and improve causal estimation beyond simple DA.

Robust Prediction

Reducing confounding bias, even when $f$ itself may not be identifiable, is an upstream problem for robust prediction, i.e., learning predictors that generalize well to out-of-distribution (OOD) shifts in $X$. A predictor that fails on shifted distributions does so because it learned spurious correlations (i.e., confounding). We tackle this root cause directly by re-purposing DA, a common tool for IID generalization, to instead achieve the downstream goal of OOD generalization.

Experimental Results

Estimation error is captured in an interpretable way using the normalized causal excess risk (nCER), which is $0$ for the true solution $f$ and $1$ for pure confounding.
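The formal definition is in the paper; one normalization consistent with the two endpoints stated above (the exact form here is a paraphrase, not the paper's equation) would be

$$ \operatorname{nCER}(h) \;=\; \frac{R_{\mathrm{causal}}(h) - R_{\mathrm{causal}}(f)}{R_{\mathrm{causal}}(\widehat{h}_{\mathrm{conf}}) - R_{\mathrm{causal}}(f)}, $$

where $R_{\mathrm{causal}}$ denotes the interventional (causal) risk and $\widehat{h}_{\mathrm{conf}}$ the purely confounded solution, so that $\operatorname{nCER}(f) = 0$ and $\operatorname{nCER}(\widehat{h}_{\mathrm{conf}}) = 1$.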

Figure 2: Simulation experiment for a linear Gaussian data generation model. $\kappa$ and $\gamma$ control the amount of confounding and the strength of DA, respectively. $\alpha$ is the IVL regularization parameter. All three are set to $1$ by default. Each data point averages $\operatorname{nCER}$ over $32$ trials with a $95\%$ confidence interval.
Figure 3: Experimental results on common OOD generalization benchmarks; comparison against the ERM, DA+ERM, and DA+IV baselines, as well as DA+IVL.

Citation

@misc{akbar2025symmetryAsIntervention,
      title={An Analysis of Causal Effect Estimation using Outcome Invariant Data Augmentation}, 
      author={Uzair Akbar and Niki Kilbertus and Hao Shen and Krikamol Muandet and Bo Dai},
      year={2025},
      eprint={2510.25128},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.25128},
}