TLDR: We show that data augmentation (DA) transformations that correspond to symmetries in data generation are equivalent to causal interventions. Such DA can hence be used to mitigate bias due to hidden confounding in observational data, improving causal estimation and robust prediction.
Statistical vs. Causal Estimation
Empirical risk minimization: We are given samples of an outcome $Y$ and a treatment $X$ from the following data generating process:
$$Y = f(X) + \varepsilon.$$
We would like to estimate the function $f$. Under the standard assumption that the noise $\varepsilon$ is uncorrelated with $X$, we can recover $f$ via empirical risk minimization (ERM) by finding a function that best predicts $Y$ values for unlabelled $X$ values. Regularization techniques like data augmentation (DA) are then used to reduce estimation variance by generating multiple random perturbations $t(X, G)$ of each sample, using a random transformation $t$ with augmentation variable $G$.
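In symbols, and assuming squared loss for concreteness, plain ERM and its DA-augmented counterpart minimize, respectively,
$$\hat f_{\mathrm{ERM}} = \arg\min_{h}\; \frac{1}{n}\sum_{i=1}^{n} \big(y_i - h(x_i)\big)^2, \qquad \hat f_{\mathrm{DA}} = \arg\min_{h}\; \frac{1}{n}\sum_{i=1}^{n} \frac{1}{K}\sum_{k=1}^{K} \big(y_i - h\big(t(x_i, g_{ik})\big)\big)^2,$$
where $g_{i1}, \dots, g_{iK}$ are perturbations drawn independently at random for each sample.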
Causal Estimation: However, $\varepsilon$ is generally correlated with $X$, rendering ERM-based estimation biased. Known as confounding bias, this arises due to unobserved common causes of $X$ and $Y$, known as confounders. Removing confounding bias requires making $X$ and $\varepsilon$ uncorrelated via an intervention, in which we independently assign values of $X$ during data generation. Alas, interventions are often inaccessible compared to observational data.
A common workaround is to use auxiliary variables to correct for confounding. One approach is that of instrumental variable (IV) regression, where an instrument $Z$ satisfies certain conditional independences with respect to $X$, $Y$, and $\varepsilon$. Unfortunately, even IVs are unavailable in many application domains.
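To make the IV idea concrete, here is a minimal numpy sketch of two-stage least squares (2SLS), a standard IV estimator; the data generating process and variable names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hidden confounder u affects both the treatment x and the outcome y.
u = rng.normal(size=n)
z = rng.normal(size=n)            # instrument: correlated with x, independent of u
x = z + u + rng.normal(size=n)    # treatment
y = 2.0 * x + u                   # true causal effect of x on y is 2.0

# Naive OLS is biased by the confounder u.
ols = np.cov(x, y)[0, 1] / np.var(x)

# Two-stage least squares: regress x on z, then y on the fitted x.
x_hat = z * (np.cov(z, x)[0, 1] / np.var(z))
tsls = np.cov(x_hat, y)[0, 1] / np.var(x_hat)

print(f"OLS estimate : {ols:.2f}")   # biased away from 2.0
print(f"2SLS estimate: {tsls:.2f}")  # close to 2.0
```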
Causal Estimation with Data Augmentation
Outcome Invariant DA: We consider DA transformations with respect to which the outcome is invariant. Specifically, the augmentation variable $G$ takes values in a set $\mathcal{G}$ such that $t$ is $f$-invariant:
$$f(t(x, g)) = f(x) \quad \text{for all } x \text{ and all } g \in \mathcal{G}.$$
Of course, constructing such DA requires knowledge of the symmetries of $f$. For example, when classifying images of cats vs. dogs, the true labeling function $f$ would certainly be invariant to image rotations. $G$ would then represent the random rotation angle, whereas $t(X, G)$ would be the rotated image.
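As a concrete sketch of such an augmentation (using scipy's image rotation as one possible implementation; the function and variable names are ours):

```python
import numpy as np
from scipy.ndimage import rotate  # one possible rotation implementation

rng = np.random.default_rng(0)

def augment(image, rng):
    """Outcome-invariant DA for the cats-vs-dogs example: sample a rotation
    angle g (the augmentation variable G) and return t(image, g).
    The label is unchanged, so (rotated image, original label) remains a
    valid training pair."""
    g = rng.uniform(0.0, 360.0)
    return rotate(image, angle=g, reshape=False), g

image = rng.random((64, 64))                    # stand-in for a cat/dog image
augmented_image, angle = augment(image, rng)    # t(X, G) and G
```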
DA as soft-intervention: Our key insight is that such DA on observational data is equivalent to changing the data generating process itself to
$$Y = f(t(X, G)) + \varepsilon, \qquad G \perp (X, \varepsilon).$$
DA therefore constitutes a soft-intervention on $X$, and as such can mitigate hidden confounding bias, thereby improving causal estimation of $f$.
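A toy numpy simulation (ours, not from the paper) makes the effect visible: here $f(x) = x^2$ is invariant to sign flips, so augmenting with random sign flips leaves labels valid while decorrelating the augmented covariate from the confounded noise, which shrinks the spurious linear term that plain ERM picks up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hidden confounder u drives both the treatment x and the outcome noise eps.
u = rng.normal(size=n)
x = u + rng.normal(size=n)
eps = u                      # eps is correlated with x: confounding
y = x**2 + eps               # true f(x) = x**2 is invariant to sign flips

def fit(x, y):
    """OLS on features [1, x, x^2]; returns (linear coef, quadratic coef)."""
    A = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1], coef[2]

# Plain ERM: confounding leaks into a spurious linear term (roughly 0.5 here).
print("ERM           :", fit(x, y))

# Outcome-invariant DA: g ~ Uniform{-1, +1}, t(x, g) = g * x, labels unchanged.
# g*x is uncorrelated with eps, so the spurious linear term shrinks toward 0
# while the quadratic term stays near 1.
g = rng.choice([-1.0, 1.0], size=n)
print("ERM + sign DA :", fit(g * x, y))
```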
DA as relaxed IVs: Next, we frame DA transformations as IV-like (IVL) by observing that, by construction, they satisfy conditional independences similar to those of IVs. As such, we can compose DA with IV regression to further reduce confounding bias and improve causal estimation beyond simple DA.
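For intuition, the standard IV conditions instantiated for the augmentation variable $G$ read roughly as follows (our phrasing; the paper's exact conditional independences may differ):
$$G \perp \varepsilon \ \ (\text{exogeneity}), \qquad G \not\perp t(X, G) \ \ (\text{relevance}), \qquad Y \perp G \mid t(X, G), \varepsilon \ \ (\text{exclusion}),$$
where exogeneity holds because $G$ is sampled independently during augmentation, and exclusion follows from outcome invariance of $f$.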
Robust Prediction
Reducing confounding bias, even when $f$ itself may not be identifiable, is an upstream problem for robust prediction, that is, for learning predictors that generalize well to out-of-distribution (OOD) shifts. A predictor that fails on shifted distributions does so because it learned spurious correlations (i.e., confounding). We tackle this root cause directly by re-purposing DA, a common tool for IID generalization, to instead achieve the downstream goal of OOD generalization.
Experimental Results
Estimation error is captured in an interpretable way using the normalized causal excess risk (nCER): it is 0 for the true solution and 1 for pure confounding.
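The exact definition is not given here; one metric with these two properties (an illustrative sketch, writing $R_{\mathrm{do}}$ for risk under the interventional distribution, $f$ for the true solution, and $f_{\mathrm{conf}}$ for the purely confounded ERM solution) would be
$$\mathrm{nCER}(\hat f) = \frac{R_{\mathrm{do}}(\hat f) - R_{\mathrm{do}}(f)}{R_{\mathrm{do}}(f_{\mathrm{conf}}) - R_{\mathrm{do}}(f)},$$
so that $\mathrm{nCER}(f) = 0$ and $\mathrm{nCER}(f_{\mathrm{conf}}) = 1$.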
Citation