Causal Inference in Natural Language Processing:
Estimation, Prediction, Interpretation and Beyond

Amir Feder1,10∗, Katherine A. Keith2, Emaad Manzoor3, Reid Pryzant4,
Dhanya Sridhar5, Zach Wood-Doughty6, Jacob Eisenstein7, Justin Grimmer8,
Roi Reichart1, Margaret E. Roberts9, Brandon M. Stewart10,
Victor Veitch7,11, and Diyi Yang12

1Technion – Israel Institute of Technology, Israel

2Williams College, USA
3University of Wisconsin – Madison, USA
4Microsoft, USA
5Columbia University, USA
6Northwestern University, USA
7Google Research, USA
8Stanford University, USA
9University of California San Diego, USA
10Princeton University, USA
11University of Chicago, USA

12Georgia Tech, USA

Abstract

A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the challenges and opportunities in the application of causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects with text, encompassing settings where text is used as an outcome, treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.1

∗All authors equally contributed to this paper. Author names are organized alphabetically in two clusters: first students and post-docs, then faculty members. The email address of the corresponding (first) author is: feder@campus.technion.ac.il.

1 Introduction

The increasing effectiveness of NLP has created
exciting new opportunities for interdisciplinary
collaborations, bringing NLP techniques to a
wide range of external research disciplines (e.g.,
Roberts et al., 2014; Zhang et al., 2020; Ophir
et al., 2020) and incorporating new data and tasks
into mainstream NLP (e.g., Thomas et al., 2006;
Pryzant et al., 2018). In such interdisciplinary
collaborations, many of the most important re-
search questions relate to the inference of causal
relationships. For example, before recommending
a new drug therapy, clinicians want to know the
causal effect of the drug on disease progression.
Causal inference involves a question about a coun-
terfactual world created by taking an intervention:
What would a patient’s disease progression have
been if we had given them the drug? As we ex-
plain below, with observational data, the causal
effect is not equivalent to the correlation between
whether the drug is taken and the observed dis-
ease progression. There is now a vast literature
on techniques for making valid inferences using

1An online repository containing existing research on causal inference and language processing is available here: https://github.com/causaltext/causal-text-papers.


Transactions of the Association for Computational Linguistics, vol. 10, pp. 1138–1158, 2022. https://doi.org/10.1162/tacl_a_00511
Action Editor: Chris Brew. Submission batch: 4/2022; Revision batch: 7/2022; Published 10/2022.
© 2022 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

traditional (non-text) datasets (e.g., Morgan and
Winship, 2015), but the application of these
techniques to natural language data raises new
fundamental challenges.

Conversely, in many classical NLP applica-
tions, the main goal is to make accurate predic-
tions: Any statistical correlation is admissible,
regardless of the underlying causal relationship.
However, as NLP systems are increasingly de-
ployed in challenging and high-stakes scenarios,
we cannot rely on the usual assumption that
training and test data are identically distributed,
and we may not be satisfied with uninterpretable
black-box predictors. For both of these problems,
causality offers a promising path forward: Domain
knowledge of the causal structure of the data gen-
erating process can suggest inductive biases that
lead to more robust predictors, and a causal view
of the predictor itself can offer new insights on its
inner workings.

The core claim of this survey paper is that
deepening the connection between causality and
NLP has the potential to advance the goals of both
social science and NLP researchers. We divide
the intersection of causality and NLP into two
areas: Estimating causal effects from text, and
using causal formalisms to make NLP methods
more reliable. We next illustrate this distinction.

Example 1. An online forum has allowed its users
to indicate their preferred gender in their profiles
with a female or male icon. They notice that
users who label themselves with the female icon
tend to receive fewer ‘‘likes’’ on their posts. To
better evaluate their policy of allowing gender
information in profiles, they ask: Does using the
female icon cause a decrease in popularity for
a post?

Ex. 1 addresses the causal effect of signaling
female gender (treatment) on the likes a post
receives (outcome) (see discussion on signaling
at Keith et al., 2020). The counterfactual question
is: If we could manipulate the gender icon of
a post, how many likes would the post have
received?

The observed correlation between the gender
icons and the number of ‘‘likes’’ generally does
not coincide with the causal effect: It might in-
stead be a spurious correlation, induced by other
Variablen, known as confounders, which are cor-
related with both the treatment and the outcome
(see Gururangan et al., 2018, for an early discussion of spurious correlation in NLP). One possible
confounder is the topic of each post: Posts writ-
ten by users who have selected the female icon
may be about certain topics (e.g., child birth or
menstruation) more often, and those topics may
not receive as many likes from the audience of
the broader online platform. As we will see in
§ 2, due to confounding, estimating a causal effect
requires assumptions.
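To make this concrete, the following minimal simulation (a hypothetical illustration, not taken from any study) shows how a topic confounder can make the naive difference in means look negative even when, by construction, the icon has no effect on likes; adjusting for the observed confounder recovers the true (zero) effect.

```python
# Hypothetical simulation: the icon has no causal effect on likes,
# but the topic confounds the naive comparison.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

topic = rng.binomial(1, 0.5, size=n)             # 1 = topic that attracts fewer likes
p_female = np.where(topic == 1, 0.7, 0.3)        # such topics use the female icon more often
icon = rng.binomial(1, p_female)                 # treatment T: 1 = female icon
likes = rng.poisson(np.where(topic == 1, 5.0, 15.0))  # outcome Y: depends only on topic

naive = likes[icon == 1].mean() - likes[icon == 0].mean()
print(f"naive difference in means: {naive:+.2f}")       # about -4, a spurious 'effect'

# Stratifying on the observed confounder and averaging over its distribution
# recovers the true effect (about 0), as formalized by the adjustment in Section 2.
strata = [
    likes[(icon == 1) & (topic == t)].mean() - likes[(icon == 0) & (topic == t)].mean()
    for t in (0, 1)
]
weights = [np.mean(topic == t) for t in (0, 1)]
print(f"topic-adjusted estimate:   {np.dot(weights, strata):+.2f}")
```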

Example 1 highlights the setting where the text
encodes the relevant confounders of a causal ef-
fect. The text as a confounder setting is one of
many causal inferences we can make with text
data. The text data can also encode outcomes or
treatments of interest. For example, we may won-
der about how gender signal affects the sentiment
of the reply that a post receives (text as outcome),
or about how a writing style affects the ‘‘likes’’ a
post receives (text as treatment).

NLP Helps Causal Inference. Causal inference
with text data involves several challenges that are
distinct from typical causal inference settings:
Text is high-dimensional, needs sophisticated
modeling to measure semantically meaningful fac-
tors like topic, and demands careful thought to
formalize the intervention that a causal question
corresponds to. The developments in NLP around
modeling language,
from topic models (Blei
et al., 2003) to contextual embeddings (e.g.,
Devlin et al., 2019), offer promising ways to ex-
tract the information we need from text to estimate
causal effects. However, we need new assump-
tions to ensure that the use of NLP methods leads
to valid causal inferences. We discuss existing re-
search on estimating causal effects from text and
emphasize these challenges and opportunities in
§ 3.

Example 2. A medical research center wants to
build a classifier to detect clinical diagnoses from
the textual narratives of patient medical records.
The records are aggregated across multiple hos-
pital sites, which vary both in the frequency of the
target clinical condition and the writing style of
the narratives. When the classifier is applied to
records from sites that were not in the training
set, its accuracy decreases. Post-hoc analysis in-
dicates that it puts significant weight on seemingly
irrelevant features, such as formatting markers.

Like Ex. 1, Ex. 2 also involves a counter-
factual question: Does the classifier’s prediction

change if we intervene to change the hospital
site, while holding the true clinical status fixed?
We want the classifier to rely on phrases that
express clinical facts, and not writing style. However,
in the training data, the clinical condition
and the writing style are spuriously correlated,
due to the site acting as a confounding variable.
For example, a site might be more likely to en-
counter the target clinical condition due to its
location or speciality, and that site might also
employ distinctive textual features, such as boil-
erplate text at the beginning of each narrative. In
the training set, these features will be predictive
of the label, but they are unlikely to be useful in
deployment scenarios at new sites. In this example,
the hospital site acts like a confounder:
It creates a spurious correlation between some
features of the text and the prediction target.

Example 2 shows how the lack of robustness
can make NLP methods less trustworthy. A re-
lated problem is that NLP systems are often black
boxes, making it hard to understand how human-
interpretable features of the text lead to the ob-
served predictions. In this setting, we want to
know if some part of the text (e.g., some se-
quence of tokens) causes the output of an NLP
method (e.g., classification prediction).

Causal Models Can Help NLP. To address the
robustness and interpretability challenges posed
by NLP methods, we need new criteria to learn
models that go beyond exploiting correlations.
For example, we want predictors that are invari-
ant to certain changes that we make to text, such
as changing the format while holding fixed the
ground truth label. There is considerable promise
in using causality to develop new criteria in ser-
vice of building robust and interpretable NLP
methods. In contrast to the well-studied area of
causal inference with text, this area of causality
and NLP research is less well understood, though
well-motivated by recent empirical successes. In
§4, we cover the existing research and review
the challenges and opportunities around using
causality to improve NLP.

This position paper follows a small body of
surveys that review the role of text data within
causal inference (Egami et al., 2018; Keith et al.,
2020). We take a broader view, separating the in-
tersection of causality and NLP into two distinct
lines of research on estimating causal effects in
which text is at least one causal variable (§3) Und

using causal formalisms to improve robustness
and interpretability in NLP methods (§4). After
reading this paper, we envision that the reader
will have a broad understanding of: different types
of causal queries and the challenges they pre-
sent; the statistical and causal challenges that
are unique to working with text data and NLP
methods; and open problems in estimating effects
from text and applying causality to improve NLP
methods.

2 Background

Both focal problems of this survey (causal effect
estimation and causal formalisms for robust and
explainable prediction) involve causal inference.
The key ingredient to causal inference is defin-
ing counterfactuals based on an intervention of
interest. We will
illustrate this idea with the
motivating examples from §1.

Example 1 involves online forum posts and
the number of likes Y that they receive. We
use a binary variable T to indicate whether a
post uses a ‘‘female icon’’ (T = 1) or a ‘‘male
icon’’ (T = 0). We view the post icon T as the
‘‘treatment’’ in this example, but do not assume
that the treatment is randomly assigned (it may
be selected by the posts’ authors). The counterfac-
tual outcome Y (1) represents the number of
likes a post would have received had it used a
female icon. The counterfactual outcome Y (0) Ist
defined analogously.

The fundamental problem of causal inference
(Holland, 1986) is that we can never observe
Y (0) and Y (1) simultaneously for any unit
of analysis, the smallest unit about which one
wants to make counterfactual inquiries (e.g., a
post
in Ex. 1). This problem is what makes
causal inference harder than statistical inference
and impossible without identification assumptions
(see § 2.2).
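The toy snippet below (hypothetical numbers, not from the paper) makes this missing-data structure explicit: each post reveals only the potential outcome corresponding to the icon it actually used.

```python
# Toy illustration of the fundamental problem of causal inference:
# one of the two potential outcomes is always missing for each unit.
import numpy as np
import pandas as pd

posts = pd.DataFrame({
    "post_id": [1, 2, 3, 4],
    "T":       [1, 0, 1, 0],      # 1 = female icon, 0 = male icon
    "Y_obs":   [12, 30, 7, 18],   # likes actually received
})
posts["Y(1)"] = np.where(posts["T"] == 1, posts["Y_obs"], np.nan)  # observed only if T = 1
posts["Y(0)"] = np.where(posts["T"] == 0, posts["Y_obs"], np.nan)  # observed only if T = 0
print(posts)
# The unit-level effect Y(1) - Y(0) is never observable for any single post,
# which is why estimands are defined as population averages and identification
# assumptions (Section 2.2) are needed to estimate them from observed data.
```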

Example 2 involves a trained classifier f (X)
that takes a textual clinical narrative X as input
and outputs a diagnosis prediction. The text X
is written based on the physician’s diagnosis Y ,
and is also influenced by the writing style used
at the hospital Z. We want to intervene upon
the hospital Z while holding the label Y fixed.
The counterfactual narrative X(z) is the text we
would have observed had we set the hospital to
the value z while holding the diagnosis fixed. The
counterfactual prediction f (X(z)) is the output

the trained classifier would have produced had we
given the counterfactual narrative X(z) as input.

2.1 Causal Estimands

An analyst begins by specifying target causal
quantities of interest, called causal estimands,
which typically involve counterfactuals. In Example 1,
one possible causal estimand is the
average treatment effect (ATE) (Rubin, 1974),

ATE = E[Y (1) − Y (0)]    (1)

where the expectation is over the generative dis-
tribution of posts. The ATE can be interpreted as
the change in the number of likes a post would
have received, on average, had the post used a
female icon instead of a male icon.

Another possible causal effect of interest is
the conditional average treatment effect (CATE)
(Imbens and Rubin, 2015),

CATE = E[Y (1) − Y (0) | G]    (2)

where G is a predefined subgroup of the pop-
ulation. For example, G could be all posts on
political topics. In this case, the CATE can be
interpreted as the change in the number of likes
a post on a political topic would have received,
on average, had the post used a female icon instead
of a male icon. CATEs are used to quantify
the heterogeneity of causal effects in different
population subgroups.
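As a concrete sketch (ours, with hypothetical data), under a randomized assignment of icons both estimands can be estimated by differences in means, overall for the ATE and within the subgroup for the CATE; the validity of this simple estimator rests on the identification conditions discussed in § 2.2, which randomization ensures by design.

```python
# Difference-in-means estimates of the ATE and a CATE from a (hypothetical)
# randomized experiment; valid under the assumptions discussed in Section 2.2.
import pandas as pd

def ate(df: pd.DataFrame) -> float:
    """Estimate E[Y(1) - Y(0)] by a difference in means."""
    return df.loc[df["T"] == 1, "Y"].mean() - df.loc[df["T"] == 0, "Y"].mean()

def cate(df: pd.DataFrame, group_mask: pd.Series) -> float:
    """Estimate E[Y(1) - Y(0) | G] by a difference in means within subgroup G."""
    return ate(df[group_mask])

df = pd.DataFrame({
    "T": [1, 0, 1, 0, 1, 0],                      # 1 = female icon
    "Y": [10, 12, 3, 9, 14, 13],                  # likes received
    "political": [False, False, True, True, False, True],
})
print("ATE estimate:             ", ate(df))
print("CATE estimate (political):", cate(df, df["political"]))
```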

2.2 Identification Assumptions for
Causal Inference

We will focus on Example 1 and the ATE in
Equation (1) to explain the assumptions needed
for causal inference. Although we focus on the
ATE, related assumptions are needed in some
form for all causal estimands. Variables are the
same as those defined previously in this section.

Ignorability requires that the treatment assign-
ment be statistically independent of the counter-
factual outcomes,

T ⊥⊥ Y (a)  ∀a ∈ {0, 1}    (3)

Note that this assumption is not equivalent to
independence between the treatment assignment
and the observed outcome Y. For example, if
ignorability holds, Y ⊥⊥ T would additionally
imply that the treatment has no effect.

Randomized treatment assignment guarantees
ignorability by design. For example, we can guar-
antee ignorability in Example 1 by flipping a coin
to select the icon for each post, and disallowing
post authors from changing it.

Without randomized treatment assignment, ig-
norability could be violated by confounders, var-
iables that influence both the treatment status and
potential outcomes. In Example 1, suppose that:
(i) the default post icon is male, (ii) only experi-
enced users change the icon for their posts based
on their gender, (iii) experienced users write posts
that receive relatively more likes. In this scenario,
the experience of post authors is a confounder:
Posts having female icons are more likely to be
written by experienced users, and thus receive
more likes. In the presence of confounders, causal
inference is only possible if we assume condi-
tional ignorability,

T ⊥⊥ Y (a) | X  ∀a ∈ {0, 1}    (4)

where X is a set of observed variables, condition-
ing on which ensures independence between the
treatment assignment and the potential outcomes.
In other words, we can assume that all confounders
are observed.
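When conditional ignorability is judged plausible, one simple estimation strategy is outcome regression: fit a model of E[Y | X, T] and average the predicted treated-minus-control difference over the observed confounders. The sketch below (our illustration, using synthetic data and a linear model for simplicity) follows the adjustment formula derived later in this section; it also presupposes the positivity and consistency assumptions introduced next.

```python
# Outcome-regression (backdoor) adjustment under conditional ignorability:
# estimate E_X[ E[Y | X, T=1] - E[Y | X, T=0] ] with a fitted regression.
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_ate(X: np.ndarray, t: np.ndarray, y: np.ndarray) -> float:
    model = LinearRegression().fit(np.column_stack([X, t]), y)
    y1 = model.predict(np.column_stack([X, np.ones_like(t)]))   # predicted Y under T = 1
    y0 = model.predict(np.column_stack([X, np.zeros_like(t)]))  # predicted Y under T = 0
    return float(np.mean(y1 - y0))

# Synthetic check: the true effect is 2.0 and X confounds treatment and outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))            # treatment depends on X
y = 2.0 * t + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=2000)
print(f"adjusted ATE estimate: {adjusted_ate(X, t, y):.2f}")   # close to 2.0
```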

Positivity requires that the probability of receiv-
ing treatment is bounded away from 0 and 1 for
all values of the confounders X:

0 < Pr(T = 1 | X = x) < 1, ∀x (5) Intuitively, positivity requires that each unit under study has the possibility of being treated and has the possibility of being untreated. Randomized treatment assignment can also guarantee positivity by design. Consistency requires that the outcome observed for each unit under study at treatment level a ∈ {0, 1} is identical to the outcome we would have observed had that unit been assigned to treatment level a, T = a ⇔ Y (a) = Y ∀a ∈ {0, 1} (6) Consistency ensures that the potential outcomes for each unit under study take on a single value at each treatment level. Consistency will be vi- olated if different unobservable ‘‘versions’’ of the treatment lead to different potential outcomes. For example, if red and blue female icons had 1141 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 1 1 2 0 5 4 6 9 0 / / t l a c _ a _ 0 0 5 1 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 different effects on the number of likes received, but icon color was not recorded. Consistency will also be violated if the treatment assignment of one unit affects the potential outcomes of another; a phenomenon called interference (Rosenbaum, 2007). Randomized treatment assignment does not guarantee consistency by design. For example, if different icon colors affect the number of likes but are not considered by the model, then a ran- domized experiment will not solve the problem. As Hern´an (2016) discusses, consistency assump- tions are a ‘‘matter of expert agreement’’ and, while subjective, these assumptions are at least made more transparent by causal formalisms. These three assumptions enable identifying the ATE defined in Equation (1), as formalized in the following identification proof: E[Y (a)] (i) = EX [E[Y (a) | X]] (ii) = EX [E[Y (a) | X, T = a]] (iii) = EX [E[Y | X, T = a]], ∀a ∈ {0, 1} where equality (i) is due to iterated expectation, equality (ii) follows from conditional ignorabil- ity, and equality (iii) follows from consistency and positivity, which ensures that the conditional expectation E[Y | X, T = a] is well defined. The final expression can be computed from observ- able quantities alone. We refer to other background material to discuss how to identify and estimate causal effects with these assumptions in hand (Rubin, 2005; Pearl, 2009; Imbens and Rubin, 2015; Egami et al., 2018; Keith et al., 2020). 2.3 Causal Graphical Models Finding a set of variables X that ensure con- ditional ignorability is challenging, and requires making several carefully assessed assumptions about the causal relationships in the domain un- der study. Causal directed-acyclic graphs (DAGs) (Pearl, 2009) enable formally encoding these as- sumptions and deriving the set of variables X after conditioning on which ignorability is satisfied. In a causal DAG, an edge X → Y implies that X may or may not cause Y . The absence of an edge between X and Y implies that X does not cause Y . Bi-directed dotted arrows between vari- ables indicate that they are correlated potentially through some unobserved variable. Figure 1: Causal graphs for the motivating examples. (Left) In Example 1, the post icon (T ) is correlated with attributes of the post (X), and both variables af- fect the number of likes a post receives (Y ). (Right) In Example 2, the label (Y , i.e., diagnosis) and hospi- tal site (Z) are correlated, and both affect the clini- cal narrative (X). 
Predictions f (X) from the trained classifier depend on X. Figure 1 illustrates the causal DAGs we assume for Example 1 and Example 2. Given a causal DAG, causal dependencies between any pair of variables can be derived using the d-separation algorithm (Pearl, 1994). These dependencies can then be used to assess whether conditional ignor- ability holds for a given treatment, outcome, and set of conditioning variables X. For example, in the left DAG in Figure 1, the post icon T is not independent of the number of likes Y unless we condition on X. In the right DAG, the prediction f (X) is not independent of the hospital Z even after conditioning on the narrative X. 3 Estimating Causal Effects with Text In §2, we described assumptions for causal in- ference when the treatment, outcome, and con- founders were directly measured. In this section, we contribute a novel discussion about how causal assumptions are complicated when vari- ables necessary for a causal analysis are extracted automatically from text. Addressing these open challenges will require collaborations between the NLP and causal estimation communities to understand what are the requisite assumptions to draw valid causal conclusions. We highlight prior approaches and future challenges in settings where the text is a confounder, the outcome, or the treatment – but this discussion applies broadly to many text-based causal problems. To make these challenges clear, we will ex- pand upon Example 1 by supposing that a hy- pothetical online forum wants to understand and reduce harassment on its platform. Many such questions are causal: Do gendered icons influence 1142 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 1 1 2 0 5 4 6 9 0 / / t l a c _ a _ 0 0 5 1 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 the harassment users receive? Do longer suspen- sions make users less likely to harass others? How can a post be rewritten to avoid offending others? In each case, using NLP to measure aspects of language is integral to any causal analysis. loss on the treatment and counterfactual outcomes, they show that confounding properties could be found within text data. Roberts et al. (2020) com- bine these strategies with the topic model approach in a text matching framework. 3.1 Causal Effects with Textual Confounders Returning to Example 1, suppose the platform worries that users with female icons are more likely to receive harassment from other users. Such a finding might significantly influence plans for a new moderation strategy (Jhaver et al., 2018; Rubin et al., 2020). We may be unable or un- willing to randomize our treatment (the gender signal) of the author’s icon), so the causal effect of gender signal on harassment received might be confounded by other variables. The topic of the post may be an important confounder: some sub- ject areas may be discussed by a larger proportion of users with female icons, and more controversial subjects may attract more harassment. The text of the post provides evidence of the topic and thus acts as a confounder (Roberts et al., 2020). Previous Approaches. The main idea in this setting is to use NLP methods to extract con- founding aspects from text and then adjust for those aspects in an estimation approach such as propensity score matching. However, how and when these methods violate causal assumptions are still open questions. Keith et al. 
(2020) pro- vide a recent overview of several such methods and many potential threats to inference. One set of methods apply unsupervised di- mensionality reduction methods that reduce high- dimensional text data to a low-dimensional set of variables. Such methods include latent variable models such as topic models, embedding meth- ods, and auto-encoders. Roberts et al. (2020) and Sridhar and Getoor (2019) have applied topic models to extract confounding patterns from text data, and performed an adjustment for these inferred variables. Mozer et al. (2020) match texts using distance metrics on the bag-of-words representation. A second set of methods adjust for confounders from text with supervised NLP methods. Recently, Veitch et al. (2020) adapted pre-trained language models and supervised topic models with multi- ple classification heads for binary treatment and counterfactual outcomes. By learning a ‘‘suffi- cient’’ embedding that obtained low classification Challenges for Causal Assumptions with Text. In settings without randomized treatments, NLP methods that adjust for text confounding require a particularly strong statement of conditional ignor- ability (Equation 4): All aspects of confounding must be measured by the model. Because we can- not test this assumption, we should seek domain expertise to justify it or understand the theoretical and empirical consequences if it is violated. When the text is a confounder, its high- dimensionality makes positivity unlikely to hold (D’Amour et al., 2020). Even for approaches that extract a low-dimensional representation of the confounder from text, positivity is a concern. For example, in Example 1, posts might con- tain phrases that near-perfectly encode the chosen gender-icon of the author. If the learned represen- tation captures this information alongside other confounding aspects, it would be nearly impos- sible to imagine changing the gender icon while holding the gendered text fixed. 3.2 Causal Effects on Textual Outcomes Suppose platform moderators can choose to sus- pend users who violate community guidelines for either one day or one week, and we want to know which option has the greatest effect at decreasing the toxicity of the suspended user. If we could col- lect them for each user’s post, ground-truth human annotations of toxicity would be our ideal outcome variable. We would then use those outcomes to calculate the ATE, following the discussion in § 2. Our analysis of suspensions is complicated if, instead of ground-truth labels for our toxicity outcome, we rely on NLP methods to extract the outcome from the text. A core challenge is to distill the high-dimensional text into a low-dimensional measure of toxicity. Challenges for Causal Assumptions with Text. We saw in § 2 that randomizing the treat- ment assignment can ensure ignorability and pos- itivity; but even with randomization, we require more careful assessment to satisfy consistency. Suppose we randomly assign suspension lengths to users and then once those users return and 1143 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 1 1 2 0 5 4 6 9 0 / / t l a c _ a _ 0 0 5 1 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 continue to post, we use a clustering method to discover toxic and non-toxic groupings among the formerly suspended users. To estimate the causal effect of suspension length, we rely on the trained clustering model to infer our outcome variable. 
Assuming that the suspension policy does in truth have a causal effect on posting behavior, then be- cause our clustering model depends on all posts in its training data, it also depends on the treatment assignments that influenced each post. Thus, when we use the model to infer outcomes, each user’s outcome depends on all other users’ treatments. This violates the assumption of consistency—that potential outcomes do not depend on the treatment status of other units. This undermines the theoret- ical basis for our causal estimate, and, in practice, implies that different randomized treatment as- signments could lead to different treatment ef- fect estimates. These issues can be addressed by developing the measure on only a sample of the data and then estimating the effect on a separate, held-out data sample (Egami et al., 2018). 3.3 Causal Effects with Textual Treatments As a third example, suppose we want to understand what makes a post offensive. This might allow the platform to provide automated suggestions that encourage users to rephrase their post. Here, we are interested in the causal effect of the text itself on whether a reader reports it as offensive. Theoretically, the counterfactual Y (t) is defined for any t, but could be limited to an exploration of specific aspects of the text. For example, do second-person pronouns make a post more likely to be reported? Previous Approaches. One approach to study- ing the effects of text involves treatment discovery: producing interpretable features of the text—such as latent topics or lexical features like n-grams (Pryzant et al., 2018)—that can be causally linked to outcomes. For example, Fong and Grimmer (2016) discovered features of candidate biogra- phies that drove voter evaluations, Pryzant et al. (2017) discovered writing styles in marketing materials that are influential in increasing sales figures, and Zhang et al. (2020) discovered con- versational tendencies that lead to positive mental health counseling sessions. Another approach is to estimate the causal effects of specific latent properties that are in- tervened on during an experiment or extracted from text for observational studies (Pryzant et al., 2021; Wood-Doughty et al., 2018). For example, Gerber et al. (2008) studied the effect of appeal- ing to civic duty on voter turnout. In this setting, factors are latent properties of the text for which we need a measurement model. Challenges for Causal Assumptions with Text. Ensuring positivity and consistency remains a challenge in this setting, but assessing conditional ignorability is particularly tricky. Suppose the treatment is the use of second-person pronouns, but the relationship between this treatment and the outcome is confounded by other properties of the text (e.g., politeness). For conditional ignorability to hold, we would need to extract from the text and condition on all such confounders, which requires assuming that we can disentangle the treatment from many other aspects of the text (Pryzant et al., 2021). Such concerns could be avoided by randomly assigning texts to readers (Fong and Grimmer, 2016, 2021), but that may be impracti- cal. Even if we could randomize the assignment of texts, we still have to assume that there is no confounding due to latent properties of the reader, such as their political ideology or their tastes. 3.4 Future Work We next highlight key challenges and oppor- tunities for NLP researchers to facilitate causal inference from text. Heterogeneous Effects. 
Texts are read and in- terpreted differently by different people; NLP researchers have studied this problem in the con- text of heterogeneous perceptions of annotators (Paun et al., 2018; Pavlick and Kwiatkowski, 2019). In the field of causal inference, the idea that different subgroups experience different causal ef- fects is formalized by a heterogeneous treatment effect, and is studied using conditional average treatment effects (Equation (2)) for different sub- groups. It may also be of interest to discover subgroups where the treatment has a strong effect on an outcome of interest. For example, we may want to identify text features that characterize when a treatment such as a content moderation policy is effective. Wager and Athey (2018) pro- posed a flexible approach to estimating heteroge- neous effects based on random forests. However, 1144 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 1 1 2 0 5 4 6 9 0 / / t l a c _ a _ 0 0 5 1 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 such approaches, which are developed with tabu- lar data in mind, may be computationally infea- sible for high-dimensional text data. There is an opportunity to extend NLP methods to discover text features that capture subgroups where the causal effect varies. Representation Learning. Causal inference from text requires extracting low-dimensional features from text. Depending on the setting, the low-dimensional features are tasked with ex- tracting confounding information, outcomes, or treatments. The need to measure latent aspects from text connects to the field of text representa- tion learning (Le and Mikolov, 2014; Liu et al., 2015; Liu and Lapata, 2018). The usual objective of text representation learning approaches is to model language. Adapting representation learning for causal inference offers open challenges; for example, we might augment the objective func- tion to ensure that (i) positivity is satisfied, (ii) confounding information is not discarded, or (iii) noisily measured outcomes or treatments enable accurate causal effect estimates. Benchmarks. Benchmark datasets have pro- pelled machine learning forward by creating shared metrics by which predictive models can be evaluated. There are currently no real-world text-based causal estimation benchmarks due to the fundamental problem of causal inference that we can never obtain counterfactuals on an individ- ual and observe the true causal effects. However, as Keith et al. (2020) discuss, there has been some progress in evaluating text-based estimation methods on semi-synthetic datasets in which real covariates are used to generate treatment and out- comes (e.g., Veitch et al., 2020; Roberts et al., 2020; Pryzant et al., 2021; Feder et al., 2021; Weld et al., 2022). Wood-Doughty et al. (2021) employed large-scale language models for con- trolled synthetic generation of text on which causal methods can be evaluated. An open prob- lem is the degree to which methods that perform well on synthetic data generalize to real-world data. Controllable Text Generation. When running a randomized experiment or generating synthetic data, researchers make decisions using the em- pirical distribution of the data. If we are study- ing whether a drug prevents headaches, it would make sense to randomly assign a ‘reasonable’ dose—one that is large enough to plausibly be effective but not so large as to be toxic. 
But when the causal question involves natural language, do- main knowledge might not provide a small set of ‘reasonable’ texts. Instead, we might turn to controllable text generation to sample texts that fulfill some requirements (Kiddon et al., 2016). Such methods have a long history in NLP; for example, a conversational agent should be able to answer a user’s question while being perceived as polite (Niu and Bansal, 2018). In our text as treatment example where we want to understand which textual aspects make a text offensive, such methods could enable an experiment allowing us to randomly assign texts that differ on only a spe- cific latent aspect. For example, we could change the style of a text while holding its content fixed (Logeswaran et al., 2018). Recent work has ex- plored text generation from a causal perspective (Hu and Li, 2021), but future work could develop these methods for causal estimation. 4 Robust and Explainable Predictions from Causality Thus far we have focused on using NLP tools for estimating causal effects in the presence of text data. In this section, we consider using causal reasoning to help solve traditional NLP tasks such as understanding, manipulating, and generating natural language. At a first glance, NLP may appear to have lit- tle need for causal ideas. The field has achieved remarkable progress from the use of increasingly high-capacity neural architectures to extract cor- relations from large-scale datasets (Peters et al., 2018; Devlin et al., 2019; Liu et al., 2019). These architectures make no distinction between causes, effects, and confounders, and they make no at- tempt to identify causal relationships: A feature may be a powerful predictor even if it has no direct causal relationship with the desired output. Yet correlational predictive models can be un- trustworthy (Jacovi et al., 2021): They may latch onto spurious correlations (‘‘shortcuts’’), leading to errors in out-of-distribution (OOD) settings (e.g., McCoy et al., 2019); they may exhibit un- acceptable performance differences across groups of users (e.g., Zhao et al., 2017); and their be- havior may be too inscrutable to incorporate into high-stakes decisions (Guidotti et al., 2018). Each 1145 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 1 1 2 0 5 4 6 9 0 / / t l a c _ a _ 0 0 5 1 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 of these shortcomings can potentially be addressed by the causal perspective: Knowledge of the causal relationship between observations and labels can be used to formalize spurious correlations and mit- igate their impact (§ 4.1); causality also provides a language for specifying and reasoning about fairness conditions (§ 4.2); and the task of ex- plaining predictions may be naturally formulated in terms of counterfactuals (§ 4.3). The applica- tion of causality to these problems is still an active area of research, which we attempt to facilitate by highlighting previously implicit connections among a diverse body of prior work. 4.1 Learning Robust Predictors The NLP field has grown increasingly concerned with spurious correlations (Gururangan et al., 2018; McCoy et al., 2019, inter alia). From a causal perspective, spurious correlations arise when two conditions are met. First, there must be some factor(s) Z that are informative (in the training data) about both the features X and label Y . 
Second, Y and Z must be dependent in the training data in a way that is not guaranteed to hold in general. A predictor f : X → Y will learn to use parts of X that carry information about Z (because Z is informative about Y ), which can lead to errors if the relationship between Y and Z changes when the predictor is deployed.2 This issue is illustrated by Example 2, where the task is to predict a medical condition from the text of patient records. The training set is drawn from multiple hospitals which vary both in the frequency of the target clinical condition (Y ) and the writing style of the narratives (represented in X). A predictor trained on such data will use textual features that carry information about the hospital (Z), even when they are useless at pre- dicting the diagnosis within any individual hospi- tal. Spurious correlations also appear as artifacts in benchmarks for tasks such as natural language 2From the perspective of earlier work on domain adap- tation (Søgaard, 2013), spurious correlations can be viewed as a special case of a more general phenomenon in which feature-label relationships change across domains. For exam- ple, the lexical feature boring might have a stronger negative weight in reviews about books than about kitchen appliances, but this is not a spurious correlation because there is a di- rect causal relationship between this feature and the label. Spurious correlations are a particularly important form of distributional shift in practice because they can lead to in- consistent predictions on pairs of examples that humans view as identical. inference, where negation words are correlated with semantic contradictions in crowdsourced training data but not in text that is produced under more natural conditions (Gururangan et al., 2018; Poliak et al., 2018). Such observations have led to several proposals for novel evaluation methodologies (Naik et al., 2018; Ribeiro et al., 2020; Gardner et al., 2020) to ensure that predictors are not ‘‘right for the wrong reasons’’. These evaluations generally take two forms: invariance tests, which assess whether predictions are affected by perturbations that are causally unrelated to the label, and sensitivity tests, which apply perturbations that should in some sense be the minimal change necessary to flip the true label. Both types of test can be moti- vated by a causal perspective. The purpose of an invariance test is to determine whether the predic- tor behaves differently on counterfactual inputs X(Z = ˜z), where Z indicates a property that an analyst believes should be causally irrelevant to Y . A model whose predictions are invariant across such counterfactuals can in some cases be expected to perform better on test distributions with a different relationship between Y and Z (Veitch et al., 2021). Similarly, sensitivity tests can be viewed as evaluations of counterfactuals X(Y = ˜y), in which the label Y is changed but all other causal influences on X are held constant (Kaushik et al., 2020). Features that are spuriously correlated with Y will be identical in the factual X and the counterfactual X(Y = ˜y). A predictor that relies solely on such spurious correlations will be unable to correctly label both factual and counterfactual instances. A number of approaches have been proposed for learning predictors that pass tests of sensi- tivity and invariance. Many of these approaches are either explicitly or implicitly motivated by a causal perspective. 
They can be viewed as ways to incorporate knowledge of the causal structure of the data into the learning objective. 4.1.1 Data Augmentation To learn predictors that pass tests of invariance and sensitivity, a popular and straightforward ap- proach is data augmentation: Elicit or construct counterfactual instances, and incorporate them into the training data. When the counterfactu- als involve perturbations to confounding factors Z, it can help to add a term to the learning objective to explicitly penalize disagreements in 1146 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 1 1 2 0 5 4 6 9 0 / / t l a c _ a _ 0 0 5 1 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 the predictions for counterfactual pairs, for exam- ple, |f (X(Z = z)) − f (X(Z = ˜z))|, when f is the prediction function (Garg et al., 2019). When perturbations are applied to the label Y , training on label counterfactuals X(Y = ˜y) can improve OOD generalization and reduce noise sensitivity (Kaushik et al., 2019, 2020; Jha et al., 2020).3 Counterfactual examples can be generated in several ways: (1) manual post-editing (e.g., Kaushik et al., 2019; Gardner et al., 2020), (2) heuristic replacement of keywords (e.g., Shekhar et al., 2017; Garg et al., 2019; Feder et al., 2021), and (3) automated text rewriting (e.g., Zmigrod et al., 2019; Riley et al., 2020; Wu et al., 2021; Calderon et al., 2022). Manual editing is typi- cally fluent and accurate but relatively expensive. Keyword-based approaches are appropriate in some cases—for example, when counterfactuals can be obtained by making local substitutions of closed-class words like pronouns—but they can- not guarantee fluency or coverage of all labels and covariates of interest (Antoniak and Mimno, 2021), and are difficult to generalize across lan- guages. Fully generative approaches could po- tentially combine the fluency and coverage of manual editing with the ease of lexical heuristics. Counterfactual examples are a powerful re- source because they directly address the missing data issues that are inherent to causal inference, as described in § 2. However, in many cases it is difficult for even a fluent human to produce meaningful counterfactuals: Imagine the task of converting a book review into a restaurant re- view while somehow leaving ‘‘everything else’’ constant (as in Calderon et al., 2022). A related concern is lack of precision in specifying the de- sired impact of the counterfactual. To revise a text from, say, U.S. to U.K. English, it is unam- biguous that ‘‘colors’’ should be replaced with ‘‘colours’’, but should terms like ‘‘congress’’ be replaced with analogous concepts like ‘‘parlia- ment’’? This depends on whether we view the semantics of the text as a causal descendent of 3More broadly, there is a long history of methods that elicit or construct new examples and labels with the goal of improving generalization, e.g., self-training (McClosky et al., 2006; Reichart and Rappoport, 2007), co-training (Steedman et al., 2003), and adversarial perturbations (Ebrahimi et al., 2018). The connection of such methods to causal issues such as spurious correlations has not been explored until recently (Chen et al., 2020; Jin et al., 2021). the locale. If such decisions are left to the anno- tators’ intuitions, it is difficult to ascertain what robustness guarantees we can get from counter- factual data augmentation. 
Finally, there is the possibility that counterfactuals will introduce new spurious correlations. For example, when asked to rewrite NLI examples without using negation, an- notators (or automated text rewriters) may simply find another shortcut, introducing a new spuri- ous correlation. Keyword substitution approaches may also introduce new spurious correlations if the keyword lexicons are incomplete (Joshi and He, 2021). Automated methods for conditional text rewriting are generally not based on a formal coun- terfactual analysis of the data generating process (cf. Pearl, 2009), which would require model- ing the relationships between various causes and consequences of the text. The resulting counterfac- tual instances may therefore fail to fully account for spurious correlations and may introduce new spurious correlations. 4.1.2 Distributional Criteria An alternative to data augmentation is to design new learning algorithms that operate directly on the observed data. In the case of invariance tests, one strategy is to derive distributional properties of invariant predictors, and then ensure that these properties are satisfied by the trained model. Given observations of the potential confounder at training time, the counterfactually invariant pre- dictor will satisfy an independence criterion that can be derived from the causal structure of the data generating process (Veitch et al., 2021). Re- turning to Example 2, the desideratum is that the predicted diagnosis f (X) should not be affected by the aspects of the writing style that are associ- ated with the hospital Z. This can be formalized as counterfactual invariance to Z: The predic- tor f should satisfy f (X(z)) = f (X(z(cid:9))) for all z, z(cid:9). In this case, both Z and Y are causes of the text features X.4 Using this observation, it can be shown that any counterfactually invariant predictor will satisfy f (X) ⊥⊥ Z | Y , that is, the prediction f (X) is independent of the covariate Z conditioned on the true label Y . In other cases, such as content moderation, the label is an effect of the text, rather than a cause—for a detailed 4This is sometimes called the anticausal setting, because the predictor f : X → ˆY must reverse the causal direction of the data generating process (Sch¨olkopf et al., 2012). 1147 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 1 1 2 0 5 4 6 9 0 / / t l a c _ a _ 0 0 5 1 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 discussion of this distinction, see Jin et al. (2021). In such cases, it can be shown that a counterfac- tually invariant predictor will satisfy f (X) ⊥⊥ Z (without conditioning on Y ). In this fashion, knowledge of the true causal structure of the problem can be used to derive observed-data sig- natures of the counterfactual invariance. Such signatures can be incorporated as regulariza- tion terms in the training objective (e.g., using kernel-based measures of statistical dependence). These criteria do not guarantee counterfactual invariance—the implication works in the other in practice they increase coun- direction—but terfactual invariance and improve performance in out-of-distribution settings without requiring counterfactual examples. 
An alternative set of distributional criteria can be derived by viewing the training data as aris- ing from a finite set of environments, in which each environment is endowed a unique distri- bution over causes, but the causal relationship between X and Y is invariant across environ- ments. This view motivates a set of environmental invariance criteria: The predictor should include a representation function that is invariant across environments (Muandet et al., 2013; Peters et al., 2016); we should induce a representation such that the same predictor is optimal in every en- vironment (Arjovsky et al., 2019); the predictor should be equally well calibrated across envi- ronments (Wald et al., 2021). Multi-environment training is conceptually similar to domain adap- tation (Ben-David et al., 2010), but here the goal is not to learn a predictor for any specific target domain, but rather to learn a predictor that works well across a set of causally compatible domains, known as domain generalization (Ghifary et al., 2015; Gulrajani and Lopez-Paz, 2020). However, it may be necessary to observe data from a very large number of environments to disentangle the true causal structure (Rosenfeld et al., 2021). Both general approaches require richer training data than in typical supervised learning: Either ex- plicit labels Z for the factors to disentangle from the predictions or access to data gathered from multiple labeled environments. Obtaining such data may be rather challenging, even compared to creating counterfactual instances. Furthermore, the distributional approaches have thus far been applied only to classification problems, while data augmentation can easily be applied to struc- tured outputs such as machine translation. 4.2 Fairness and Bias NLP systems inherit and sometimes amplify un- desirable biases encoded in text training data (Barocas et al., 2019; Blodgett et al., 2020). Causality can provide a language for specifying desired fairness conditions across demographic attributes like race and gender. Indeed, fairness and bias in predictive models have close connec- tions to causality: Hardt et al. (2016) argue that a causal analysis is required to determine the fair- ness properties of an observed distribution of data and predictions; Kilbertus et al. (2017) show that fairness metrics can be motivated by causal inter- pretations of the data generating process; Kusner et al. (2017) study ‘‘counterfactually fair’’ predic- tors where, for each individual, predictions are the same for that individual and for a counterfactual version of them created by changing a protected attribute. However, there are important questions about the legitimacy of treating attributes like race as variables subject to intervention (e.g., Kohler-Hausmann, 2018; Hanna et al., 2020), and Kilbertus et al. (2017) propose to focus instead on invariance to observable proxies such as names. Fairness with Text. The fundamental connec- tions between causality and unfair bias have been relatively explored mainly in the context of low-dimensional tabular data rather than text. However, there are several applications of the counterfactual data augmentation strategies from § 4.1.1 in this setting: For example, Garg et al. (2019) construct counterfactuals by swapping lists of ‘‘identity terms’’, with the goal of reducing bias in text classification, and Zhao et al. (2018) swap gender markers such as pronouns and names for coreference resolution. 
Counterfactual data augmentation has also been applied to reduce bias in pre-trained models (e.g., Huang et al., 2019; Maudslay et al., 2019) but the extent to which biases in pre-trained models propa- gate to downstream applications remains unclear (Goldfarb-Tarrant et al., 2021). Fairness appli- cations of the distributional criteria discussed in § 4.1.2 are relatively rare, but Adragna et al. invariant risk minimization (2020) show that (Arjovsky et al., 2019) can reduce the use of spu- rious correlations with race for toxicity detection. 4.3 Causal Model Interpretations Explanations of model predictions can be cru- cial to help diagnose errors and establish trust 1148 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 1 1 2 0 5 4 6 9 0 / / t l a c _ a _ 0 0 5 1 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 with decision makers (Guidotti et al., 2018; Jacovi and Goldberg, 2020). One prominent approach to generate explanations is to exploit network ar- tifacts, such as attention weights (Bahdanau et al., 2014), which are computed on the path to gen- erating a prediction (e.g., Xu et al., 2015; Wang et al., 2016). Alternatively, there have been at- tempts to estimate simpler and more interpretable models by using perturbations of test examples or their hidden representations (Ribeiro et al., 2016; Lundberg and Lee, 2017; Kim et al., 2018). How- ever, both attention and perturbation-based meth- ods have important limitations. Attention-based explanations can be misleading (Jain and Wallace, 2019), and are generally possible only for indi- vidual tokens; they cannot explain predictions linguistic concepts. in terms of more abstract Existing perturbation-based methods often gen- erate implausible counterfactuals and also do not allow for estimating the effect of sentence- level concepts. Viewed as a causal inference problem, explana- tion can be performed by comparing predictions for each example and its generated counterfac- tual. While it is usually not possible to observe counterfactual predictions, here the causal system is the predictor itself. In those cases it may be possible to compute counterfactuals, for example, by manipulating the activations inside the network (Vig et al., 2020; Geiger et al., 2021). Treatment effects can then be computed by comparing the predictions under the factual and counterfactual conditions. Such a controlled setting is similar to the randomized experiment described in § 2, where it is possible to compute the difference between an actual text and what the text would have been had a specific concept not existed in it. Indeed, in cases where counterfactual texts can be gener- ated, we can often estimate causal effects on text- based models (Ribeiro et al., 2020; Gardner et al., 2020; Rosenberg et al., 2021; Ross et al., 2021; Meng et al., 2022; Zhang et al., 2022). However, generating such counterfactuals is challenging (see § 4.1.1). To overcome the counterfactual generation problem, another class of approaches proposes to manipulate the representation of the text and not the text itself (Feder et al., 2021; Elazar et al., 2021; Ravfogel et al., 2021). Feder et al. (2021) compute the counterfactual representation by pre- training an additional instance of the language representation model employed by the classifier, with an adversarial component designed to ‘‘for- get’’ the concept of choice, while controlling for confounding concepts. 
Ravfogel et al. (2020) offer a method for removing information from representations by iteratively training linear clas- sifiers and projecting the representations on their null-spaces, but do not account for confound- ing concepts. A complementary approach is to generate counterfactuals with minimal changes that ob- tain a different model prediction (Wachter et al., 2017; Mothilal et al., 2020). Such examples allow us to observe the changes required to change a model’s prediction. Causal modeling can facili- tate this by making it possible to reason about the causal relationships between observed features, thus identifying minimal actions which might have downstream effects on several features, ul- timately resulting in a new prediction (Karimi et al., 2021). Finally, a causal perspective on attention-based explanations is to view internal nodes as mediators of the causal effect from the input to the output (Vig et al., 2020; Finlayson et al., 2021). By querying models using manually crafted counter- factuals, we can observe how information flows, and identify where in the model it is encoded. 4.4 Future Work In general we cannot expect to have full causal models of text, so a critical question for future work is how to safely use partial causal mod- els, which omit some causal variables and do not completely specify the causal relationships within the text itself. A particular concern is unobserved confounding between the variables that are ex- plicitly specified in the causal model. Unobserved confounding is challenging for causal inference in general, but it is likely to be ubiquitous in language applications, in which the text arises from the author’s intention to express a structured arrangement of semantic concepts, and the label corresponds to a query, either directly on the in- tended semantics or on those understood by the reader. Partial causal models of text can be ‘‘top down’’, in the sense of representing causal rel- ationships between the text and high-level doc- ument metadata such as authorship, or ‘‘bottom up’’, in the sense of representing local linguistic invariance properties, such as the intuition that a 1149 l D o w n o a d e d f r o m h t t p : / / d i r e c t . m i t . e d u / t a c l / l a r t i c e - p d f / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 5 1 1 2 0 5 4 6 9 0 / / t l a c _ a _ 0 0 5 1 1 p d . f b y g u e s t t o n 0 7 S e p e m b e r 2 0 2 3 multiword expression like ‘San Francisco’ has a single cause. The methods described here are al- most exclusively based on top-down models, but approaches such as perturbing entity spans (e.g., Longpre et al., 2021) can be justified by implicit bottom-up causal models. Making these connec- tions more explicit may yield new insights. Future work may also explore hybrid models that con- nect high-level document metadata with medium- scale spans of text such as sentences or paragraphs. A related issue is when the true variable of interest is unobserved but we do receive some noisy or coarsened proxy variable. For example, we may wish to enforce invariance to dialect but have access only to geographical information, with which dialect is only approximately correlated. This is an emerging area within the statistical literature (Tchetgen et al., 2020), and despite the clear applicability to NLP, we are aware of no relevant prior work. Finally, applications of causality to NLP have focused primarily on classification, so it is natural to ask how these approaches might be extended to structured output prediction. 
5 Conclusion

Our main goal in this survey was to collect the various touchpoints of causality and NLP into one space, which we then subdivided into the problems of estimating the magnitude of causal effects and more traditional NLP tasks. These branches of scientific inquiry share common goals and intuitions, and are beginning to show methodological synergies. In § 3 we showed how recent advances in NLP modeling can help researchers draw causal conclusions from text data, and discussed the challenges of this process. In § 4, we showed how ideas from causal inference can be used to make NLP models more robust, trustworthy, and transparent. We also gathered approaches that are implicitly causal and explicitly showed their relationship to causal inference. Both of these spaces, especially the use of causal ideas for robust and explainable predictions, remain nascent, with a large number of open challenges that we have detailed throughout this paper.

A particular advantage of causal methodology is that it forces practitioners to explicate their assumptions. To improve scientific standards, we believe that the NLP community should be clearer about these assumptions and analyze their data using causal reasoning. This could lead to a better understanding of language and the models we build to process it.

References

Robert Adragna, Elliot Creager, David Madras, and Richard Zemel. 2020. Fairness and robustness in invariant learning: A case study in toxicity classification. arXiv preprint arXiv:2011.06485.
Maria Antoniak and David Mimno. 2021. Bad seeds: Evaluating lexical methods for bias measurement. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1889–1904, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.148
Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant risk minimization. arXiv preprint arXiv:1907.02893.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
Solon Barocas, Moritz Hardt, and Arvind Narayanan. 2019. Fairness and Machine Learning. fairmlbook.org. http://www.fairmlbook.org.
Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79(1):151–175. https://doi.org/10.1007/s10994-009-5152-4
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (technology) is power: A critical survey of "bias" in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476. https://doi.org/10.18653/v1/2020.acl-main.485
Nitay Calderon, Eyal Ben-David, Amir Feder, and Roi Reichart. 2022. DoCoGen: Domain counterfactual generation for low resource domain adaptation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.533
Yining Chen, Colin Wei, Ananya Kumar, and Tengyu Ma. 2020. Self-training avoids using spurious features under domain shift. Advances in Neural Information Processing Systems, 33:21061–21071.
Alexander D'Amour, Peng Ding, Avi Feller, Lihua Lei, and Jasjeet Sekhon. 2020. Overlap in observational studies with high-dimensional covariates. Journal of Econometrics.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186. Association for Computational Linguistics.
Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36. https://doi.org/10.18653/v1/P18-2006
Naoki Egami, Christian J. Fong, Justin Grimmer, Margaret E. Roberts, and Brandon M. Stewart. 2018. How to make causal inferences using texts. arXiv preprint arXiv:1802.02163.
Yanai Elazar, Shauli Ravfogel, Alon Jacovi, and Yoav Goldberg. 2021. Amnesic probing: Behavioral explanation with amnesic counterfactuals. Transactions of the Association for Computational Linguistics, 9:160–175. https://doi.org/10.1162/tacl_a_00359
Amir Feder, Nadav Oved, Uri Shalit, and Roi Reichart. 2021. CausaLM: Causal model explanation through counterfactual language models. Computational Linguistics, 47(2):333–386. https://doi.org/10.1162/coli_a_00404
Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, and Yonatan Belinkov. 2021. Causal analysis of syntactic agreement mechanisms in neural language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1828–1843, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.144
Christian Fong and Justin Grimmer. 2016. Discovery of treatments from text corpora. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1600–1609.
Christian Fong and Justin Grimmer. 2021. Causal inference with latent treatments. American Journal of Political Science. Forthcoming.
Matt Gardner, Yoav Artzi, Victoria Basmov, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hannaneh Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, and Ben Zhou. 2020. Evaluating models' local decision boundaries via contrast sets. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1307–1323, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.117
Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi, and Alex Beutel. 2019. Counterfactual fairness in text classification through robustness. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 219–226. https://doi.org/10.1145/3306618.3317950
Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. 2021. Causal abstractions of neural networks. Advances in Neural Information Processing Systems, 34.
Alan S. Gerber, Donald P. Green, and Christopher W. Larimer. 2008. Social pressure and voter turnout: Evidence from a large-scale field experiment. American Political Science Review, 102(1):33–48. https://doi.org/10.1017/S000305540808009X
Muhammad Ghifary, W. Bastiaan Kleijn, Mengjie Zhang, and David Balduzzi. 2015. Domain generalization for object recognition with multi-task autoencoders. In Proceedings of the IEEE International Conference on Computer Vision, pages 2551–2559. https://doi.org/10.1109/ICCV.2015.293
Seraphina Goldfarb-Tarrant, Rebecca Marchant, Ricardo Muñoz Sánchez, Mugdha Pandya, and Adam Lopez. 2021. Intrinsic bias metrics do not correlate with application bias. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1926–1940, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.150
Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):1–42. https://doi.org/10.1145/3236009
Ishaan Gulrajani and David Lopez-Paz. 2020. In search of lost domain generalization. arXiv preprint arXiv:2007.01434.
Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, and Noah A. Smith. 2018. Annotation artifacts in natural language inference data. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL). https://doi.org/10.18653/v1/N18-2017
Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud. 2020. Towards a critical race methodology in algorithmic fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 501–512. https://doi.org/10.1145/3351095.3372826
Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29:3315–3323.
Miguel A. Hernán. 2016. Does water kill? A call for less casual causal inferences. Annals of Epidemiology, 26(10):674–680. https://doi.org/10.1016/j.annepidem.2016.08.016, PubMed: 27641316
Paul W. Holland. 1986. Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960. https://doi.org/10.2307/2289069
Zhiting Hu and Li Erran Li. 2021. A causal lens for controllable text generation. Advances in Neural Information Processing Systems, 34.
Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, and Pushmeet Kohli. 2019. Reducing sentiment bias in language models via counterfactual evaluation. arXiv preprint arXiv:1911.03064. https://doi.org/10.18653/v1/2020.findings-emnlp.7
Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press. https://doi.org/10.1017/CBO9781139025751
Alon Jacovi and Yoav Goldberg. 2020. Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4198–4205. https://doi.org/10.18653/v1/2020.acl-main.386
Alon Jacovi, Ana Marasović, Tim Miller, and Yoav Goldberg. 2021. Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 624–635. https://doi.org/10.1145/3442188.3445923
Sarthak Jain and Byron C. Wallace. 2019. Attention is not explanation. arXiv preprint arXiv:1902.10186.
Rohan Jha, Charles Lovering, and Ellie Pavlick. 2020. Does data augmentation improve generalization in NLP? arXiv preprint arXiv:2004.15012.
Shagun Jhaver, Sucheta Ghoshal, Amy Bruckman, and Eric Gilbert. 2018. Online harassment and content moderation: The case of blocklists. ACM Transactions on Computer-Human Interaction (TOCHI), 25(2):1–33. https://doi.org/10.1145/3185593
Zhijing Jin, Julius von Kügelgen, Jingwei Ni, Tejas Vaidhya, Ayush Kaushal, Mrinmaya Sachan, and Bernhard Schoelkopf. 2021. Causal direction of data collection matters: Implications of causal and anticausal learning for NLP. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9499–9513. https://doi.org/10.18653/v1/2021.emnlp-main.748
Nitish Joshi and He He. 2021. An investigation of the (in)effectiveness of counterfactually augmented data. arXiv preprint arXiv:2107.00753. https://doi.org/10.18653/v1/2022.acl-long.256
Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. 2021. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 353–362. https://doi.org/10.1145/3442188.3445899
Divyansh Kaushik, Eduard Hovy, and Zachary C. Lipton. 2019. Learning the difference that makes a difference with counterfactually-augmented data. arXiv preprint arXiv:1909.12434.
Divyansh Kaushik, Amrith Setlur, Eduard Hovy, and Zachary C. Lipton. 2020. Explaining the efficacy of counterfactually-augmented data. arXiv preprint arXiv:2010.02114.
Katherine Keith, David Jensen, and Brendan O'Connor. 2020. Text and causal inference: A review of using text to remove confounding from causal estimates. In ACL. https://doi.org/10.18653/v1/2020.acl-main.474
Chloé Kiddon, Luke Zettlemoyer, and Yejin Choi. 2016. Globally coherent text generation with neural checklist models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 329–339. https://doi.org/10.18653/v1/D16-1032
Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. 2017. Avoiding discrimination through causal reasoning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 656–666.
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. 2018. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677.
Issa Kohler-Hausmann. 2018. Eddie Murphy and the dangers of counterfactual causal thinking about detecting racial discrimination. Nw. UL Rev., 113:1163. https://doi.org/10.2139/ssrn.3050650
Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066–4076.
Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International Conference on Machine Learning, pages 1188–1196. PMLR.
Xiaodong Liu, Jianfeng Gao, Xiaodong He, Li Deng, Kevin Duh, and Ye-Yi Wang. 2015. Representation learning using multi-task deep neural networks for semantic classification and information retrieval. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 912–921.
Yang Liu and Mirella Lapata. 2018. Learning structured text representations. Transactions of the Association for Computational Linguistics, 6:63–75. https://doi.org/10.1162/tacl_a_00005
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Lajanugen Logeswaran, Honglak Lee, and Samy Bengio. 2018. Content preserving text generation with attribute controls. Advances in Neural Information Processing Systems, 31.
Shayne Longpre, Kartik Perisetla, Anthony Chen, Nikhil Ramesh, Chris DuBois, and Sameer Singh. 2021. Entity-based knowledge conflicts in question answering. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7052–7063. https://doi.org/10.18653/v1/2021.emnlp-main.565
Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774.
Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, and Simone Teufel. 2019. It's all in the name: Mitigating gender bias with name-based counterfactual data substitution. arXiv preprint arXiv:1909.00871. https://doi.org/10.18653/v1/D19-1530
David McClosky, Eugene Charniak, and Mark Johnson. 2006. Effective self-training for parsing. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 152–159. Citeseer. https://doi.org/10.3115/1220835.1220855
R. Thomas McCoy, Ellie Pavlick, and Tal Linzen. 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. arXiv preprint arXiv:1902.01007. https://doi.org/10.18653/v1/P19-1334
Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. Locating and editing factual knowledge in GPT. arXiv preprint arXiv:2202.05262.
Stephen L. Morgan and Christopher Winship. 2015. Counterfactuals and Causal Inference. Cambridge University Press. https://doi.org/10.1017/CBO9781107587991
Ramaravind K. Mothilal, Amit Sharma, and Chenhao Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 607–617. https://doi.org/10.1145/3351095.3372850
Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, and L. Jason Anastasopoulos. 2020. Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality. Political Analysis, 28(4):445–468. https://doi.org/10.1017/pan.2020.1
Krikamol Muandet, David Balduzzi, and Bernhard Schölkopf. 2013. Domain generalization via invariant feature representation. In International Conference on Machine Learning, pages 10–18.
Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. 2018. Stress test evaluation for natural language inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Tong Niu and Mohit Bansal. 2018. Polite dialogue generation without parallel data. Transactions of the Association for Computational Linguistics, 6:373–389. https://doi.org/10.1162/tacl_a_00027
Yaakov Ophir, Refael Tikochinski, Christa S. C. Asterhan, Itay Sisso, and Roi Reichart. 2020. Deep neural networks detect suicide risk from textual Facebook posts. Scientific Reports, 10(1):1–10. https://doi.org/10.1038/s41598-020-73917-0, PubMed: 33028921
Silviu Paun, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, and Massimo Poesio. 2018. Comparing Bayesian models of annotation. Transactions of the Association for Computational Linguistics, 6:571–585. https://doi.org/10.1162/tacl_a_00040
Ellie Pavlick and Tom Kwiatkowski. 2019. Inherent disagreements in human textual inferences. Transactions of the Association for Computational Linguistics, 7:677–694. https://doi.org/10.1162/tacl_a_00293
Judea Pearl. 1994. A probabilistic calculus of actions. In Uncertainty Proceedings 1994, pages 454–462. Elsevier. https://doi.org/10.1016/B978-1-55860-332-5.50062-6
Judea Pearl. 2009. Causality. Cambridge University Press.
J. Peters, P. Bühlmann, and N. Meinshausen. 2016. Causal inference using invariant prediction: Identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5):947–1012. https://doi.org/10.1111/rssb.12167
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1–6, 2018, Volume 1 (Long Papers), pages 2227–2237. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202
Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018. Hypothesis only baselines in natural language inference. arXiv preprint arXiv:1805.01042. https://doi.org/10.18653/v1/S18-2023
Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, and Dhanya Sridhar. 2021. Causal effects of linguistic properties. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4095–4109. https://doi.org/10.18653/v1/2021.naacl-main.323
Reid Pryzant, Youngjoo Chung, and Dan Jurafsky. 2017. Predicting sales from the language of product descriptions. In eCOM@SIGIR.
Reid Pryzant, Kelly Shen, Dan Jurafsky, and Stefan Wagner. 2018. Deconfounded lexicon induction for interpretable social science. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1615–1625. https://doi.org/10.18653/v1/N18-1146
Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, and Yoav Goldberg. 2020. Null it out: Guarding protected attributes by iterative nullspace projection. arXiv preprint arXiv:2004.07667. https://doi.org/10.18653/v1/2020.acl-main.647
Shauli Ravfogel, Grusha Prasad, Tal Linzen, and Yoav Goldberg. 2021. Counterfactual interventions reveal the causal effect of relative clause representations on agreement prediction. arXiv preprint arXiv:2105.06965. https://doi.org/10.18653/v1/2021.conll-1.15
Roi Reichart and Ari Rappoport. 2007. Self-training for enhancement and domain adaptation of statistical parsers trained on small datasets. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 616–623.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM. https://doi.org/10.1145/2939672.2939778
Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, and Sameer Singh. 2020. Beyond accuracy: Behavioral testing of NLP models with CheckList. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4902–4912, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.442
Parker Riley, Noah Constant, Mandy Guo, Girish Kumar, David Uthus, and Zarana Parekh. 2020. TextSETTR: Label-free text style extraction and tunable targeted restyling. arXiv preprint arXiv:2010.03802. https://doi.org/10.18653/v1/2021.acl-long.293
Margaret E. Roberts, Brandon M. Stewart, and Richard A. Nielsen. 2020. Adjusting for confounding with text matching. American Journal of Political Science, 64(4):887–903. https://doi.org/10.1111/ajps.12526
Margaret E. Roberts, Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G. Rand. 2014. Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4):1064–1082. https://doi.org/10.1111/ajps.12103
Paul R. Rosenbaum. 2007. Interference between units in randomized experiments. Journal of the American Statistical Association, 102(477):191–200. https://doi.org/10.1198/016214506000001112
Daniel Rosenberg, Itai Gat, Amir Feder, and Roi Reichart. 2021. Are VQA systems RAD? Measuring robustness to augmented data with focused interventions. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 61–70. https://doi.org/10.18653/v1/2021.acl-short.10
Elan Rosenfeld, Pradeep Ravikumar, and Andrej Risteski. 2021. The risks of invariant risk minimization. In International Conference on Learning Representations, volume 9.
Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, and Matt Gardner. 2021. Tailor: Generating and perturbing text with semantic controls. arXiv preprint arXiv:2107.07150. https://doi.org/10.18653/v1/2022.acl-long.228
Donald B. Rubin. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688. https://doi.org/10.1037/h0037350
Donald B. Rubin. 2005. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331. https://doi.org/10.1198/016214504000001880
Jennifer D. Rubin, Lindsay Blackwell, and Terri D. Conley. 2020. Fragile masculinity: Men, gender, and online harassment. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–14. https://doi.org/10.1145/3313831.3376645
B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. 2012. On causal and anticausal learning. In 29th International Conference on Machine Learning (ICML 2012), pages 1255–1262. International Machine Learning Society.
Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurélie Herbelot, Moin Nabi, Enver Sangineto, and Raffaella Bernardi. 2017. FOIL it! Find one mismatch between image and language caption. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 255–265, Vancouver, Canada. Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-1024
Anders Søgaard. 2013. Semi-supervised learning and domain adaptation in natural language processing. Synthesis Lectures on Human Language Technologies, 6(2):1–103. https://doi.org/10.2200/S00497ED1V01Y201304HLT021
Dhanya Sridhar and Lise Getoor. 2019. Estimating causal effects of tone in online debates. In International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/259
Mark Steedman, Miles Osborne, Anoop Sarkar, Stephen Clark, Rebecca Hwa, Julia Hockenmaier, Paul Ruhlen, Steven Baker, and Jeremiah Crim. 2003. Bootstrapping statistical parsers from small datasets. In 10th Conference of the European Chapter of the Association for Computational Linguistics. https://doi.org/10.3115/1067807.1067851
Eric J. Tchetgen Tchetgen, Andrew Ying, Yifan Cui, Xu Shi, and Wang Miao. 2020. An introduction to proximal causal learning. arXiv preprint arXiv:2009.10982. https://doi.org/10.1101/2020.09.21.20198762
Matt Thomas, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 327–335, Sydney, Australia. Association for Computational Linguistics. https://doi.org/10.3115/1610075.1610122
Victor Veitch, Alexander D'Amour, Steve Yadlowsky, and Jacob Eisenstein. 2021. Counterfactual invariance to spurious correlations: Why and how to pass stress tests. arXiv preprint arXiv:2106.00545.
Victor Veitch, Dhanya Sridhar, and David M. Blei. 2020. Adapting text embeddings for causal inference. In UAI.
Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart M. Shieber. 2020. Investigating gender bias in language models using causal mediation analysis. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6–12, 2020, virtual.
Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31:841. https://doi.org/10.2139/ssrn.3063289
Stefan Wager and Susan Athey. 2018. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242. https://doi.org/10.1080/01621459.2017.1319839
Yoav Wald, Amir Feder, Daniel Greenfeld, and Uri Shalit. 2021. On calibration and out-of-domain generalization. arXiv preprint arXiv:2102.10395.
Yequan Wang, Minlie Huang, Xiaoyan Zhu, and Li Zhao. 2016. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 606–615, Austin, Texas. Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1058
Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, and Tim Althoff. 2022. Adjusting for confounders with text: Challenges and an empirical evaluation framework for causal inference. ICWSM.
Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. 2018. Challenges of using text classifiers for causal inference. In EMNLP. https://doi.org/10.18653/v1/D18-1488, PubMed: 31633125
Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. 2021. Generating synthetic text data to evaluate causal inference methods. arXiv preprint arXiv:2102.05638.
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, and Daniel S. Weld. 2021. Polyjuice: Automated, general-purpose counterfactual generation. arXiv preprint arXiv:2101.00288.
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057. PMLR.
Justine Zhang, Sendhil Mullainathan, and Cristian Danescu-Niculescu-Mizil. 2020. Quantifying the causal effects of conversational tendencies. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2):1–24. https://doi.org/10.1145/3415202
Yi-Fan Zhang, Hanlin Zhang, Zachary C. Lipton, Li Erran Li, and Eric P. Xing. 2022. Can transformers be strong treatment effect estimators? arXiv preprint arXiv:2202.01336.
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2017. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2979–2989, Copenhagen, Denmark. Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1323
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20, New Orleans, Louisiana. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-2003
Ran Zmigrod, Sabrina J. Mielke, Hanna Wallach, and Ryan Cotterell. 2019. Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1651–1661, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1161