A Neighborhood Framework for Resource-Lean Content Flagging
Sheikh Muhammad Sarwar2,5,∗ and Dimitrina Zlatkova1 and Momchil Hardalov1,6
and Yoan Dinkov1 and Isabelle Augenstein1,3 and Preslav Nakov1,4
1Checkstep, UK,
2University of Massachusetts, Amherst,
3University of Copenhagen, Denmark,
4Qatar Computing Research Institute, HBKU, Qatar
5Amazon.com, US
6Sofia University "St. Kliment Ohridski", Bulgaria
smsarwar@amazon.com,
{didi,momchil,yoan.dinkov,isabelle,preslav.nakov}@checkstep.com
Abstract
We propose a novel framework for cross-
lingual content flagging with limited target-
language data, which significantly outperforms
prior work in terms of predictive performance.
The framework is based on a nearest-neighbor
architecture. It is a modern instantiation of the
vanilla k-nearest neighbor model, as we use
Transformer representations in all its compo-
nents. Our framework can adapt to new source-
language instances, without the need to be
retrained from scratch. Unlike prior work on
neighborhood-based approaches, we encode
the neighborhood information based on query–
neighbor interactions. We propose two encod-
ing schemes and we show their effectiveness
using both qualitative and quantitative analy-
sis. Our evaluation results on eight languages
from two different datasets for abusive lan-
guage detection show sizable improvements
of up to 9.5 F1 points absolute (for Italian)
over strong baselines. On average, we achieve
3.6 absolute F1 points of improvement for
the three languages in the Jigsaw Multilingual
dataset and 2.14 points for the WUL dataset.
1 Introduction
Online content moderation is an increasingly
important problem—small-scale websites and
large-scale corporations alike strive to remove
harmful content from their platforms (Vidgen
et al., 2019; Pavlopoulos et al., 2017; Wulczyn
et al., 2017). This is partly in anticipation of pro-
posed legislation, such as the Digital Services Act
(European Commission, 2020) in the EU and the
Online Harms Bill (UK Government, 2020) in
the UK. Moreover, the lack of content moderation
can have significant impact on businesses (e.g.,
Parler was denied server space), on governments
(e.g., the U.S. Capitol Riots), and on individuals
(e.g., because hate speech is linked to self-harm
[Jürgens et al., 2019]).

∗Work done prior to joining Amazon.
A key challenge when developing content mod-
eration systems is the lack of resources for many
languages (other than English). With this in mind,
here we aim to create a content flagging model
for a target language with limited annotated data
by transferring knowledge from another dataset in
a different language, for which a large amount of
training data is available.
Various approaches have been proposed in the
literature to address the lack of enough training
data in the target language. A popular approach
is to fine-tune large-scale pre-trained multilingual
language models such as XLM (Conneau and
Lample, 2019), XLM-R (Conneau et al., 2020), or
mBERT (Devlin et al., 2019) on the target dataset
(Glavaš et al., 2020; Stappen et al., 2020). In
order to incorporate knowledge from the source
dataset, a sequential adaptation technique can be
used that first fine-tunes a multilingual language
model (LM) on the source dataset, and then on the
target dataset (Garg et al., 2020). There are also
existing approaches for mixing the source and the
target datasets (Shnarch et al., 2018) in different
proportions, followed by fine-tuning the multi-
lingual language model on the resulting dataset.
While sequential adaptation introduces the risk of
forgetting the knowledge from the source dataset,
such mixing methods are driven by heuristics that
are effective, but not systematic. Crucially, as we
argue in this paper, this is because they do not
model the relationship between the source and the
target datasets. Another problem arises if we con-
sider that examples with novel labels can be added
to the source dataset. This is a specifically pertinent
issue for content moderation, as efforts to create
new resources often lead to the introduction of
new label inventories or taxonomies (Banko et al.,
2020). In that case, model re-training becomes a
requirement in order to be able to map the new
label space to the output layer that is used for
fine-tuning.
We propose a Transformer-based k-nearest
neighbor (kNN+) framework,1 a one-stop solution
and a significant improvement over the vanilla
kNN model. Our framework addresses the above-
mentioned challenges, which are not easy to solve
via simple fine-tuning of pre-trained language
models. Moreover, to the best of our knowledge,
our framework is the first attempt to use kNN
for transfer learning for the task of abusive con-
tent detection.
Given a query, which is a training or an eval-
uation data point from the target dataset, kNN+
retrieves its nearest neighbors using a language-
agnostic sentence embedding model. Then, it con-
structs Transformer representations for the query
and for its neighbors. After that, it computes inter-
action features, which are based on the interactions
of the representations of the query with each of
its neighbors.2 At training time, the interaction
features are optimized using supervised training
signals computed from the label of the query and
the neighbor, so that the features indicate their
level of agreement.
For example, if the query and its neighbor are
both abusive, they agree on the labels. Thus, the
interactions help the model learn a semantic sim-
ilarity space in terms of labels. The framework
further uses a self-attention mechanism to aggre-
gate the interaction features from all the neigh-
bors, and it uses the aggregated representation
to classify the input query. This representation is
computed from the interaction features and indi-
cates the agreement of the query with the neigh-
borhood. As the predictions are made based on
aggregated interaction features only, kNN+ can
easily incorporate new examples with unseen la-
bels without requiring re-training. The conceptual
framework is shown in Figure 1; it is robust to
neighbors with incorrect labels, as it can learn to
disagree with them as part of its training process.
We instantiate two variants of our framework:
Cross-Encoder (CE) kNN+ and Bi-Encoder (BE)
1We use a ‘+’ superscript to indicate that our kNN+
framework is an improvement over the vanilla kNN model.
2We borrow the terminology from information retrieval,
as the interactions between a query and a document in deep
matching models are computed in a similar way (Guo et al.,
2016).
Figure 1: Conceptual diagram of our neighborhood
framework. The query is processed using run-time
compute, while the neighbor vector is pre-computed.
kNN+. The CE kNN+ concatenates the query
and a neighbor, and passes that sequence through
a Transformer to obtain interaction features. BE
kNN+ computes representations of the query and
of a neighbor by passing them individually through
a Transformer, and computes interaction features
from these representations. BE kNN+ is more
efficient than CE kNN+, but it does not yield the
same performance gains. Both models outperform
six strong baselines both in cross-lingual and in
multilingual settings. Our contributions can be
summarized as follows:
• We address cross-lingual transfer learning
for content flagging with limited labeled
data from the target language.
• We demonstrate that neighborhood meth-
ods, such as kNNs, are viable candidates for
approaching content flagging.
• We propose a novel framework, kNN+,
which, unlike a vanilla kNN, models the re-
lationship between a data point and each of
its neighbors to represent the neighborhood,
using language-agnostic Transformers.
• Our evaluation results on eight languages
from two different datasets for abusive lan-
guage detection show sizable improvements
of up to 9.5 F1 points absolute (for Ital-
ian) over strong baselines. On average, we
achieve improvements of 3.6 F1 points for
the three languages in the Jigsaw Multilin-
gual dataset, and of 2.14 F1 points on the
WUL dataset.
2 Related Work
Below, we review recent work on abusive lan-
guage detection and neighborhood approaches.
2.1 Abusive Content Detection
Most approaches for abusive language detec-
tion use text classification models, which have
been shown to be effective for related tasks
such as sentiment analysis. This includes SVMs
(MacAvaney et al., 2019), CNNs (Georgakopoulos
et al., 2018; Badjatiya et al., 2019; Agrawal and
Awekar, 2018), LSTMs (Arango et al., 2019;
Agrawal and Awekar, 2018), BiLSTMs, with
attention (Agrawal and Awekar, 2018), Capsule
networks (Srivastava et al., 2018), and fine-tuned
Transformers (Glavaš et al., 2020). All these
approaches focus on single data points, while
we also model their neighbourhoods. See Nakov
et al. (2021) for a recent survey of abusive lan-
guage detection.
Several papers studied the bias in hate speech
detection datasets and criticized the use of
within-dataset evaluations (Arango et al., 2019;
Davidson et al., 2019; Badjatiya et al., 2019), as
this is not a realistic setting, and findings about
generalizability based on such experimental set-
tings are questionable. A more realistic and robust
evaluation setting was investigated by Glavaš
et al. (2020), who showed the performance of
online abuse detectors in a zero-shot cross-lingual
setting. They fine-tuned several multilingual lan-
guage models (Devlin et al., 2019; Conneau and
Lample, 2019; Conneau et al., 2020; Sanh et al.,
2019; Wang et al., 2020) such as XLM-RoBERTa
and mBERT on English datasets and observed
how these models transfer to datasets in five
other languages. Other cross-lingual abuse detec-
tion efforts include using Twitter user features
for detecting hate speech in English, German,
and Portuguese (Fehn Unsvåg and Gambäck,
2018); cross-lingual embeddings (Ranasinghe and
Zampieri, 2020); and using a multilingual lexicon
with deep learning (Pamungkas and Patti, 2019).
Considerable relevant research was also done
as part of the OffensEval shared task at Sem-
Eval (Zampieri et al., 2019a,b, 2020; Rosenthal
et al., 2021).
While understanding the performance of zero-
shot cross-lingual models is interesting from a
natural language understanding point of view, in
reality, a platform willing to deploy an abusive lan-
guage detection system can almost always provide
some examples of malicious content for training.
Thus, a few-shot or a low-shot scenario is more
realistic, and we approach cross-lingual transfer
learning from that perspective. We hypothesize
that a nearest-neighbor model is a reasonable
choice in such a scenario, and we propose several
improvements over such a model.
2.2 Neighbourhood Models
kNN models have been used for a number of NLP
tasks such as part of speech tagging (Daelemans
et al., 1996) and morphological analysis (Bosch
et al., 2007), among many others. Their effective-
ness is rooted in the underlying similarity function,
and thus non-linear models such as neural net-
works can bring additional boost to their perfor-
mance. More recently, Kaiser et al. (2017) used a
similarly differentiable memory that is learned and
updated during training and is then applied to
one-shot learning tasks. Khandelwal et al. (2020)
introduced kNN retrieval for improving language
modeling, which Kassner and Schütze (2020)
extended to question answering (QA). Guu
et al. (2020) proposed a framework for retrieval-
augmented language modeling (REALM), show-
ing its effectiveness on three Open QA datasets.
Lewis et al. (2020) explored a retrieval-augmented
generation for a variety of tasks, including fact-
checking and QA, among others. Fan et al. (2021)
introduced a kNN framework for dialogue gener-
ation using pre-trained embeddings enhanced by
learning an alignment function for retrieval from
a set of external multi-modal evidence sources.
Finally, Wallace et al. (2018) proposed a deep
kNN approach for interpreting the predictions
from a neural network for the task of natural lan-
guage inference.
All the above approaches use neighbors as
additional information sources, but do not con-
sider the interactions between the neighbors as
we do. Moreover, there is no existing work on
using deep kNN models for cross-lingual abusive
content detection.
3 kNN+ Framework
We present our kNN+ framework below.
3.1 Problem Setting
Our goal is to learn a content flagging model from
source and target datasets in different languages
with different label spaces—see Figure 1 for an
illustration of our framework.
Formally, we assume access to a source dataset for content flagging, $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, where $x_i^s$ is a textual content and $y_i^s \in \mathcal{Y}$. Further, a target dataset is given, $D_t = \{(x_j^t, y_j^t)\}_{j=1}^{n_t}$, where $y_j^t \in \{flagged, neutral\}$. $D_s$ is resource-rich (i.e., $n_s \gg n_t$) and label-rich (i.e., $|\mathcal{Y}| > 2$). The label space, $\mathcal{Y} = \{hate, insult, \ldots, neutral\}$, of $D_s$ contains fine-grained labels for different levels of abusiveness along with the neutral label. We convert the label space of $D_s$ to align it with the label space of $D_t$ as follows: $\mathcal{Y}' = \{flagged \mid x \in \mathcal{Y}, x \neq neutral\}$. Note that this conversion is needed at training time to compute label agreement in our proposed neighborhood framework. However, at inference time, a conversion of the label space of $D_s$ is not needed, as the label of an item from $D_t$ is predicted using the latent representations of the neighbors, rather than their labels. This process is described in more detail in Section 3.3.
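For concreteness, the conversion can be sketched as follows (a minimal illustration; the label strings follow the datasets above, and the function name is ours):

```python
# Collapse any fine-grained abusive label (hate, insult, ...) to "flagged";
# only "neutral" is preserved, aligning the source labels with the target ones.
def to_binary_label(source_label: str) -> str:
    return "neutral" if source_label == "neutral" else "flagged"

assert to_binary_label("hate") == "flagged"
assert to_binary_label("neutral") == "neutral"
```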
3.2 Why a Neighbourhood Framework?
A vanilla kNN predicts a content label by aggre-
gating the labels of k similar training instances.
To this end, it uses the content as a query to retrieve
neighbors from the training instances. We hypoth-
esize that this retrieval step can be performed in
a cross-lingual transfer learning scenario. In our
setting, the queries are target dataset instances,
and we index the source dataset for retrieval.
Note that the target instances could also be con-
sidered as neighbors for retrieval, but we exclude
them, as the target dataset is small.
For a vanilla kNN model, the queries and the
documents are represented using lexical features,
and thus the model suffers from the curse of
dimensionality (Radovanović et al., 2009). More-
over, the prediction pipeline becomes inefficient
if the source dataset is considerably larger than
the target dataset, as is our case here (Lu et al.,
2012). Finally, for a vanilla kNN, there is no
straightforward way to map between different
languages for cross-lingual transfer.
We address these problems by using a Transformer-based multilingual representation space (Feng et al., 2020) that computes the similarity between two sentences expressed in different languages. We assume that efficiency
issues are less critical here for two main reasons:
(i) retrieval using dense vector sentence embed-
dings has become significantly faster with recent
advances (Johnson et al., 2021), and (ii) the
number of labeled source data examples is not
expected to go beyond millions, because obtain-
ing annotations for multilingual abusive content
detection is costly and the annotation process can
be very harmful for the human annotators as well
(Schmidt and Wiegand, 2017; Waseem, 2016;
Malmasi and Zampieri, 2018; Mathur et al., 2018).
Even though multilingual language models can
make the vanilla kNN model a viable solution for
our problem, it is hard to make predictions with
that model. Once a neighborhood is retrieved, a
vanilla kNN uses a majority voting scheme for
prediction, as the example in Figure 1 shows.
Given a flagged Turkish query, our framework
retrieves two neutral and one flagged English
neighbors. Here, the majority voting prediction
based on the neighborhood is incorrect. The
problem is this: A non-parametric vanilla kNN
cannot make a correct prediction with an incor-
rectly retrieved neighborhood. Thus, we propose
a learned voting strategy to alleviate this problem.
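The baseline voting scheme this replaces is simple enough to state in a few lines (a minimal sketch; the function name is ours):

```python
from collections import Counter

# Vanilla kNN voting, which our learned strategy replaces. For the Figure 1
# example, neighbors labeled [neutral, neutral, flagged] yield "neutral",
# even though the Turkish query is actually flagged.
def majority_vote(neighborhood):
    labels = [label for _, label in neighborhood]
    return Counter(labels).most_common(1)[0][0]

assert majority_vote([("c1", "neutral"), ("c2", "neutral"),
                      ("c3", "flagged")]) == "neutral"
```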
3.3 The Architecture of kNN+
We describe our kNN+ framework (shown in
Figure 2), including the training and the inference
procedures. The framework includes neighbor-
hood retrieval, interaction feature computation and
aggregation, and a multi-task learning objective
function for optimisation, which we describe in
detail below.
Neighbourhood Retrieval We construct a retrieval index $R$ from the given source dataset, $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$. For each example $x_i^s \in D_s$, we compute its dense vector representation, $\mathbf{x}_i^s = M_{retriever}(x_i^s)$. Here, $M_{retriever}$ is a multilingual sentence embedding model that we use for retrieval. There are several multilingual sentence embedding models that we could use as $M_{retriever}$ (Artetxe and Schwenk, 2019; Reimers and Gurevych, 2020; Chidambaram et al., 2019; Feng et al., 2020).
Figure 2: Two variants based on two encoding schemes used in our proposed kNN+, where $M_{feature}$ is the interaction feature computation model, $q$ is the query, and $c_i$ is a candidate neighbor. In the Bi-Encoder setup (Figure 2a), the query and each candidate are encoded separately using the same $M_{feature}$ model. Afterwards, in order to obtain a joint vector representation for each query–candidate tuple, the query's representation ($rep_q$) is concatenated with each candidate's representation ($rep_i$) along with the absolute element-wise difference between the two. In the Cross-Encoder setting (Figure 2b), the query and each candidate are passed through the $M_{feature}$ model, which produces the joint vector representation ($rep_i$) for the query–candidate tuple. Finally, we pass each joint representation through (i) a linear layer to predict the label agreement between the query and the candidate, and (ii) a self-attention layer followed by a linear projection layer to predict the label of the example.
In this work, we use LaBSE (Feng et al., 2020), a strong multilingual sentence matching model, which has been trained with parallel sentence pairs from 109 languages. The model is trained on 17 billion monolingual sentences and 6 billion bilingual sentence pairs, and it has achieved state-of-the-art performance on a parallel text retrieval task proposed by Zweigenbaum et al. (2017). We use $\mathbf{x}_i^s$ as a key, and we assign $(x_i^s, y_i^s)$ as its corresponding value. Our retrieval index $R$ stores all the key–value pairs computed from the source dataset.

Assume we have a training data point, $(x_j^t, y_j^t) \in D_t$, from the target dataset. We consider the content $x_j^t$ as our query $q$ (i.e., $q = x_j^t$). We compute a vector representation of the query, $\mathbf{q} = M_{retriever}(q)$. We use $\mathbf{q}$ to score each key $\mathbf{x}_i^s$ of $R$ using cosine similarity (i.e., $\cos(\mathbf{q}, \mathbf{x}_i^s)$). We sort the items in $R$ in descending order of the scores of the keys, and we take the values of the top-$k$ items to construct the neighborhood of $q$, $N_q = \{(c_1, l_1), (c_2, l_2), \ldots, (c_k, l_k)\}$. Thus, each neighbor is a tuple of a content and its label from the source dataset. We convert fine-grained neighbor labels to binary labels (flagged, neutral), as described in Section 3.1, to align the label space with that of the target dataset. Nevertheless, the original fine-grained labels of the neighbors can be used to obtain an explanation at inference time, as this is one of the core features of kNN-based models. However, our focus is on combining these models with Transformer-based ones, and we leave the investigation of the explainability characteristics of kNN+ for future work.
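To make the retrieval step concrete, the following sketch builds the index and retrieves a neighborhood with cosine similarity (a minimal illustration, assuming the sentence-transformers port of LaBSE and reusing the to_binary_label helper sketched in Section 3.1; the function names are ours):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: the sentence-transformers port of LaBSE serves as M_retriever.
retriever = SentenceTransformer("sentence-transformers/LaBSE")

def build_index(source_texts, source_labels):
    # Keys: L2-normalized embeddings; values: (content, binary label) tuples.
    keys = retriever.encode(source_texts, normalize_embeddings=True)
    values = [(x, to_binary_label(y)) for x, y in zip(source_texts, source_labels)]
    return keys, values

def retrieve_neighborhood(query, keys, values, k=10):
    # Cosine similarity reduces to a dot product for unit-length vectors.
    q = retriever.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(-(keys @ q))[:k]
    return [values[i] for i in top]  # N_q = [(c_1, l_1), ..., (c_k, l_k)]
```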
Interaction Feature Modeling As discussed in
Section 3.2, the neighborhood retrieval process
might lead to prediction errors. Thus, we propose
a learned voting strategy to mitigate this. Our
proposed strategy depends on how q relates to
its neighborhood Nq. To model this relationship,
we compute the interaction features between q
and the content of its j-th neighbor, cj ∈ Nq.
We obtain a set of k interaction features from k
neighbors, and we optimize them using query and
neighbor labels.
Similarly to Reimers and Gurevych (2019), we
apply two encoding schemes to compute the in-
teraction features: A Cross-Encoder (CE) and
Bi-Encoder (BE). Under our kNN+ framework,
we refer to the schemes as CE kNN+ for CE, and
BE kNN+ for BE. The BE kNN+ is computation-
ally inexpensive, while the CE kNN+ is more ef-
fective. We provide a justification for this as we
describe the schemes in the following paragraphs.
For the CE kNN+ implementation (see
Figure 2b), we first form a set of query–neighbor pairs $S_{ce} = \{(q, c_1), (q, c_2), \ldots, (q, c_k)\}$ by concatenating $q$ with the content of each of its neighbors. Then, we obtain the output representation $rep_j = M_{feature}(q, c_j)$ for each $(q, c_j) \in S_{ce}$ from a pre-trained multilingual language model $M_{feature}$. In this way, we create a set of interaction features, $I_{ce} = \{rep_1, rep_2, \ldots, rep_k\}$, from $q$ and its neighborhood. Throughout this paper, the [CLS] token representation of $M_{feature}$ is taken as its final output. We use a variety of implementations of $M_{feature}$ in our experiments.
Figure 2b shows how the interaction features are
computed and optimized with a CE kNN+.
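As an illustration, a minimal sketch of the cross-encoder feature computation, assuming a Hugging Face XLM-R checkpoint stands in for $M_{feature}$ (for XLM-R, position 0 holds the `<s>` token, which plays the role of [CLS]):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # assumed checkpoint
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def ce_interaction_features(query, neighbors):
    # Jointly encode each (query, neighbor) pair; the tokenizer inserts the
    # separator tokens between the two segments.
    batch = tokenizer([query] * len(neighbors), neighbors, padding=True,
                      truncation=True, max_length=256, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state
    return hidden[:, 0]  # (k, h): one interaction feature per neighbor, I_ce
```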
Note that the interaction feature model $M_{feature}$ is different from the neighborhood retrieval model $M_{retriever}$. We optimize the interaction features from $M_{feature}$, and we leave the optimization of the retrieval model for future work.
For the BE kNN+ scheme (see Figure 2a), we obtain the output representations of $q$ and of each of the neighbors individually from $M_{feature}$. Given the representation of the query, $rep_q = M_{feature}(q)$, and the representation of its $j$-th neighbor, $rep_j = M_{feature}(c_j)$, we model their interaction features by concatenating them along with their vector difference. The interaction features obtained for the $j$-th neighbor are $(rep_q, rep_j, |rep_q - rep_j|)$, and we construct a set of interaction features $I_{be}$ from all the neighbors of $q$. We use the vector difference $|rep_q - rep_j|$ along with the content vectors $rep_q$ and $rep_j$ following the work of Reimers and Gurevych (2019). They trained a sentence embedding model using a Siamese neural network architecture with Natural Language Inference (NLI) data. They tried the following combinations of features over the representations $u$ and $v$ of two sentences: $(u, v)$, $(|u - v|)$, $(u * v)$, $(|u - v|, u * v)$, $(u, v, u * v)$, $(u, v, |u - v|)$, and $(u, v, |u - v|, u * v)$. Their empirical analysis showed that $(u, v, |u - v|)$ works best for NLI data, and thus we apply this in our framework. We plan to explore other options in future work.
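Continuing the sketch above, the bi-encoder features can be computed as follows (same assumed encoder; the function name is ours):

```python
def be_interaction_features(query, neighbors):
    # Encode the query and each neighbor independently.
    q_batch = tokenizer([query], return_tensors="pt")
    rep_q = encoder(**q_batch).last_hidden_state[:, 0]            # (1, h)
    n_batch = tokenizer(neighbors, padding=True, truncation=True,
                        max_length=256, return_tensors="pt")
    rep_c = encoder(**n_batch).last_hidden_state[:, 0]            # (k, h)
    rep_q = rep_q.expand_as(rep_c)
    # (rep_q, rep_j, |rep_q - rep_j|) concatenated: I_be, shape (k, 3h)
    return torch.cat([rep_q, rep_c, (rep_q - rep_c).abs()], dim=-1)
```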
Both the cross-encoder and the bi-encoder architectures were shown to be effective in a wide variety of tasks, including Semantic Textual Similarity and Natural Language Inference. Reimers and Gurevych (2019) showed that a bi-encoder is much more efficient than a cross-encoder, and that bi-encoder representations can be stored as sentence vectors. Thus, once $M_{feature}$ is trained, the vector representation $M_{feature}(x_i^s)$ of each $x_i^s \in D_s$ can be saved along with the textual content and label. Then, at inference time, only the representation of the query needs to be computed, which reduces the computation cost from $k$ invocations of $M_{feature}$ to a constant. Moreover, the model can easily adapt to new neighbors without the need for retraining. However, from an effectiveness perspective, the cross-encoder is usually a better option, as it encodes the query and its neighbor jointly, thus enabling multi-head attention-based interactions among the tokens of the query and of the neighbor.
Choice of $M_{feature}$ We explore two $M_{feature}$ models for both the CE and the BE schemes: a pre-trained XLM-R model, which we refer to as $M^{\text{XLM-R}}_{feature}$, as well as an XLM-R model augmented with paraphrase knowledge, which we refer to as $M^{\text{P-XLM-R}}_{feature}$ (Reimers and Gurevych, 2020). Sentence representations from XLM-R are not aligned across languages (Ethayarajh, 2019), and $M^{\text{P-XLM-R}}_{feature}$ overcomes this problem. In particular, $M^{\text{P-XLM-R}}_{feature}$ is trained to learn sentence semantics with parallel data from 50 languages. Moreover, the training process includes knowledge distillation from a Sentence-BERT model (Reimers and Gurevych, 2019) trained on 50 million English paraphrases. As such, we expect $M^{\text{P-XLM-R}}_{feature}$ to outperform $M^{\text{XLM-R}}_{feature}$, as it more accurately captures the semantics of the query and its neighbor sentences. Note that there is work on producing better alignments of multilingual vector spaces (Zhao et al., 2021), which would allow us to consider a variety of pre-trained sentence representation models, but exploring this is outside the scope of this paper.
Interaction Features Optimization Given a query $q$ and its $j$-th neighbor, we obtain features $rep_j \in I_{ce}$ and $(rep_q, rep_j, |rep_q - rep_j|) \in I_{be}$ from $M_{feature}$ for the CE kNN+ and the BE kNN+ schemes, respectively. For both schemes, we optimize the interaction features to indicate whether a query and its neighbor have the same or different labels. We do this so that we can later aggregate the interaction features from all the neighbors of a query to model the overall agreement of the query with the retrieved neighborhood. Our hypothesis is that understanding individual neighbor-level agreement and aggregating it will also allow us to understand the neighborhood.

We apply a fully connected layer with two outputs over the interaction features in order to optimize them. The outputs indicate the label agreement between $q$ and its $j$-th neighbor, $(c_j, l_j) \in N_q$. There is label agreement if $q$ and the $j$-th neighbor are both flagged or both neutral, that is, $y_j^t = l_j$. We learn the label agreement using a binary cross-entropy loss $L_{lal}$, which is computed using the output of a softmax layer for each example in a batch of training data. We refer to $L_{lal}$ as the label-agreement loss. In our
implementation, a batch of data comprises a query
and its k neighbors. We provide more details about
the training procedure in Section 4.4.
Note that as our model predicts label agreement,
it also indirectly predicts the label of the query
and of the neighbor. In this way, it learns rep-
resentations that separate flagged from the non-
flagged examples.
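A minimal sketch of the agreement head described above (the variable names are ours; the feature dimensionality below is an assumption for BE features from a base-size encoder):

```python
import torch
import torch.nn as nn

h = 3 * 768  # BE feature size (rep_q, rep_j, |rep_q - rep_j|); assumption
agreement_head = nn.Linear(h, 2)  # two outputs: agree vs. disagree

def label_agreement_loss(features, query_label, neighbor_labels):
    # features: (k, h) interaction features; labels are 0/1 tensors
    # (neutral/flagged). Target is 1 where the binary labels match.
    targets = (neighbor_labels == query_label).long()
    return nn.functional.cross_entropy(agreement_head(features), targets)  # L_lal
```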
Interaction Features Aggregation The main reason to use interaction features for label agreement is to predict whether $q$ should be flagged or not. In a vanilla kNN setup, there is no mechanism
to back-propagate classification errors, as the only
parameter to tune there is the hyper-parameter k.
In our model, we propose to optimize the inter-
action features—using a self-attention module—
to minimize the classification error with a fixed
neighborhood size k. To this end, we propose to
aggregate the k interaction features: Ice for CE
kNN+ and Ibe for BE kNN+. The aggregated rep-
resentation captures global information, namely,
the agreement between the query and its neigh-
borhood, whereas the interaction features capture
them locally.
We use structured self-attention (Lin et al., 2017) to capture the neighborhood information. At first, we construct an interaction feature matrix, $H \in \mathbb{R}^{k \times h}$, from the set of $k$ interaction features ($I_{ce}$ or $I_{be}$), where $h$ is the dimensionality of the interaction feature space. Then, we compute structured self-attention as follows:

$$\mathbf{a} = \mathrm{softmax}\big(W_2 \tanh(W_1 H^{T})\big) \quad (1)$$

$$rep_{N_q} = \mathbf{a} H \quad (2)$$

Here, $W_1 \in \mathbb{R}^{h_r \times h}$ is a matrix that encodes interactions between the representations and projects the interaction features into a lower-dimensional space, $h_r < h$, thus making the representation matrix $h_r \times k$ dimensional. We multiply another matrix, $W_2 \in \mathbb{R}^{1 \times h_r}$, by the resulting representation, and we apply a softmax to obtain a probability distribution over the $k$ neighbors. Then, we use this probability distribution as an attention vector that linearly combines the interaction features to generate the neighborhood representation $rep_{N_q}$, which we eventually use for classification.
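Equations (1) and (2) can be written compactly with linear layers (an equivalent formulation up to the transpose conventions; a sketch, not our exact implementation):

```python
class NeighborhoodAttention(nn.Module):
    """Structured self-attention (Eqs. 1-2) over the k interaction features."""
    def __init__(self, h: int, h_r: int):
        super().__init__()
        self.W1 = nn.Linear(h, h_r, bias=False)  # project features to h_r < h dims
        self.W2 = nn.Linear(h_r, 1, bias=False)  # score each neighbor

    def forward(self, H: torch.Tensor) -> torch.Tensor:
        # H: (k, h). Scores -> softmax over neighbors -> weighted sum of rows.
        a = torch.softmax(self.W2(torch.tanh(self.W1(H))).squeeze(-1), dim=-1)
        return a @ H  # rep_{N_q}, shape (h,)
```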
Classification Loss Optimization The aggregated interaction features, $rep_{N_q}$, are used as an input to a softmax layer with two outputs (flagged or neutral), which we optimize using a binary cross-entropy loss, $L_{cll}$. We refer to $L_{cll}$ as the classification loss.
Optimizing this loss means that the classification decision for a query is made by computing its agreement or disagreement with the neighborhood as a whole. Our approach is a multi-task learning one, and the final loss is computed as follows:

$$L = (1 - \lambda) \times L_{lal} + \lambda \times L_{cll} \quad (3)$$
As both the classification and the label-agreement tasks aid each other, we adopt a multi-task learning approach. We balance the two losses using the hyper-parameter $\lambda$. The classification loss forces the model to predict a label for the query. As the model learns to predict a label for a query, it becomes easier for it to reduce the label-agreement loss $L_{lal}$. Moreover, as the model learns to predict label agreement, it learns to compute interaction features, which represent agreement or disagreement. This, in turn, helps to optimize $L_{cll}$.
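Putting the pieces together, Eq. (3) can be sketched as follows (reusing the helpers above; the classification head and the h_r value are illustrative choices, and Section 5.5 reports that a label-agreement weight of 0.7 works well, i.e., lam = 0.3 in this parameterization):

```python
clf_head = nn.Linear(h, 2)               # flagged vs. neutral
attn = NeighborhoodAttention(h, h_r=64)  # h_r = 64 is an illustrative choice

def multitask_loss(features, query_label, neighbor_labels, lam=0.3):
    # Eq. (3): L = (1 - lam) * L_lal + lam * L_cll
    l_lal = label_agreement_loss(features, query_label, neighbor_labels)
    rep_nq = attn(features)                     # aggregate the k features
    logits = clf_head(rep_nq.unsqueeze(0))      # (1, 2)
    l_cll = nn.functional.cross_entropy(logits, query_label.view(1))
    return (1 - lam) * l_lal + lam * l_cll
```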
Note that, at inference time, our framework
requires neither the labels of the neighbors for clas-
sification, nor a heuristic-based label-aggregation
scheme. The classification layer makes a pre-
diction based on the pooled representation from
the interaction features, thus removing the need
for any heuristic-based voting strategy based on
the labels of the neighbors. Each individual in-
teraction feature from the query and a neighbor
captures the agreement between them, as we optimize the features via the $L_{lal}$ loss. The opinion of the neighborhood is captured using an aggregation of the individual interaction features, which is different from a vanilla kNN, where the neighborhood opinion is captured using the individual neighbor labels. As our aggregation is performed using a
self-attention mechanism, we obtain a probability
distribution over the interaction features that we
can use to find the neighbor that influenced the
neighborhood opinion the most. We also know
both the original and the converted label of the
neighbor (see Section 3.1 for further details about
the label space conversion). The original label of the neighbor could help us better understand the prediction for the query. For example, if the
query is flagged and the original label of the most
influential neighbor is hate, we could infer that the
query is hate speech. However, we do not explore this direction in this paper, and we leave it for future work.
4 Experimental Setting
4.1 Datasets
We conducted experiments on two different mul-
tilingual datasets covering eight languages from
six language families: Slavic, Turkic, Romance,
Germanic, Albanian, and Finno-Ugric. We used
these datasets as our target datasets, and an En-
glish dataset as the source dataset, which contains
a large number of training examples with fine-
grained categorization. Both the source and target
datasets are from the same domain (Wikipedia),
as we do not study domain adaptation techniques
in the present work. We describe these three da-
tasets in the following paragraphs. The number
of examples per dataset and the corresponding
label distributions are shown in Table 1.
Jigsaw English (Jigsaw, 2018) is an English
dataset with over 159K manually reviewed com-
ments, annotated with multiple labels. We map
the labels (toxic, severe toxic, obscene, threat, in-
sult, and identity hate) into a flagged label; if at
least one of these six labels is present for some
example, we consider that example as flagged,
and as neutral otherwise. As Jigsaw English is a
resource-rich dataset, covering different aspects of
abusive language, we use it as the source dataset.
We use all its examples for training, as we validate
our models on the target datasets’ dev sets.
Jigsaw Multilingual (Jigsaw Multilingual, 2020) aims to improve toxicity detection by
addressing the shortcomings of the monolingual
setup. The dataset contains examples in Italian,
Turkish, and Spanish. It has binary labels (toxic
or non-toxic), and thus it aligns well with our
experimental setup. The label distribution is fairly
similar to that for Jigsaw English, as shown in
Table 1. This dataset is used for experimenting in
a resource-rich environment. As it does not have
standard training, testing, and development sets,
we split the examples in each language as follows:
1,500, 500, and 500 for Italian and Spanish, and
1,800, 600, and 600 for Turkish.
WUL (Glavaš et al., 2020) aims to create a fair evaluation setup for abusive language detection in multiple languages. Although originally in English, multilinguality is achieved by translating the comments as accurately as possible into five different languages: German (DE), Hungarian (HU), Albanian (SQ), Turkish (TR), and Russian (RU).

Dataset         Examples   Flagged %   Neutral %
Jigsaw En       159,571    10.2        89.8
Jigsaw Multi    8,000      15.0        85.0
WUL             600        50.3        49.7

Table 1: Statistics about the dataset sizes and the respective label distributions.
We use this dataset partially, by using the test
set originally generated by Wulczyn et al. (2017),
who focused on identifying personal attacks. In
contrast to Jigsaw Multilingual, this dataset is used
for experimenting in a low-resource environment.
For each language, we have 600 examples, which
are split as follows: 400 for training, 100 for
development, and 100 for testing. As abusive
content can be very culture-specific, there will be
cases, even within the same language, where some
utterances will be offensive in one culture, but not
in another. Thus, a translation-based dataset such
as WUL might not be an ideal choice, and we
acknowledge this limitation.
The results from experimenting with the above
datasets cannot be compared to those in the liter-
ature as we use the test set from these datasets to
create our train/dev/test splits. The datasets used
in previous work (Jigsaw Multilingual and WUL)
provide English-only training data and observe
the performance of different models in zero-shot
transfer learning settings. Our setup is different
as we assume that there is a limited number of
training examples in the target language. Thus, we
produce results only on a subset of the original
test set for both datasets. Therefore, our results
are not directly comparable to the results from
the literature, as both the training and the testing
datasets differ.
4.2 Baselines
We compare our proposed approach against three
families of strong baselines. The first one con-
siders training models only on the target dataset,
the second one is source adaptation, where we
use Jigsaw English as our source dataset, and
the third one consists of a traditional kNN classi-
fication method, but with dense vector retrieval
using LaBSE (Feng et al., 2020). We use cosine
similarity under a LaBSE representation space to
retrieve neighbors for the baselines and for our
proposed approaches.
Target Dataset Training This family of base-
lines uses only the target dataset for training:
Lexicon approach: After standard text token-
ization and normalization of the text, we count
the number of terms it contains that are also listed
in the abusive language lexicon HurtLex.3 Based
on the development set, we learn a threshold
for the minimum number of matches required to
flag the text. Then, we apply the lexicon and the
threshold to the test set.
fastText is a baseline that uses the mean of
the token vectors obtained from fastText (Joulin
et al., 2017) word embeddings to represent a tex-
tual example. These representations are then used
in a binary logistic regression classifier.
XLM-R Target is a pre-trained XLM-R model, which we fine-tune on the target dataset.

Source Adaptation This family of baselines includes variations of XLM-R:

XLM-R Mix-Adapt is a baseline model, which we train by mixing source and target data. This is possible because the label inventories of our source and target datasets are the same: $\mathcal{Y} = \{flagged, neutral\}$. The mixing is done by oversampling the target data to match the number of instances of the source dataset. As the number of instances in the target dataset is limited, this is preferable to undersampling.

XLM-R Seq-Adapt (Garg et al., 2020) is a Transformer pre-trained on the source and fine-tuned on the target data. Here, we fine-tune XLM-R on the Jigsaw English dataset, and then we do a second round of fine-tuning on the target dataset.

Nearest Neighbor We apply two nearest-neighbor baselines, using majority voting for label aggregation. We varied the number of neighbors from 3 to 20, and we found that using 10 neighbors works best (on the dev set).

LaBSE-kNN Here, the source dataset is indexed using representations obtained from LaBSE sentence embeddings (Feng et al., 2020), and the neighbors are retrieved using cosine similarity.

Weighted LaBSE-kNN is a baseline that uses the same retrieval step as LaBSE-kNN, but with a weighted voting strategy: Each label is scored by summing the cosine similarities of the retrieved flagged and neutral neighbors, respectively; then, the label with the highest score is returned.

3 https://github.com/valeriobasile/hurtlex

4.3 Evaluation Measures

Following prior work on abusive language detection, we use the F1 measure for evaluation. The F1 measure combines precision and recall (using a harmonic mean), which are both important to consider for automatic abusive language detection systems. In particular, online platforms strive to remove all content that violates their policies, and thus, if the system were to achieve 100% recall, the content could be further filtered by human moderators to weed out the benign examples. However, if the system's precision were very low, the moderators would have to read every piece of content on the platform.

4.4 Fine-Tuning and Hyper-Parameters

We train all the models for 10 epochs with XLM-R as the base Transformer representation, with a maximum sequence length of 256 tokens. However, we make an exception for SRC (see Section 5.1): We train it for a single epoch, as training a neighborhood-based model on a large dataset is resource-intensive. For all the approaches, we use Adam with β1 = 0.9, β2 = 0.999, and ε = 1e-08 as the optimizer setting. For the baseline models, we use a batch size of 64 and a learning rate of 4e-05. For the kNN+-based models, we create a training batch from a query and its 10 nearest neighbors. For stable updates, we accumulate gradients from 50 batches before back-propagation. We selected the values of all the aforementioned hyper-parameters on the validation set. For the kNN+-based models, the best learning rate is selected from {5e-05, 7e-05}.
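As an illustration, the optimizer setting and gradient accumulation can be sketched as follows (model and loader are placeholders for our networks and batch iterator):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5,
                             betas=(0.9, 0.999), eps=1e-8)
ACCUM_STEPS = 50  # accumulate gradients from 50 batches before back-propagation

optimizer.zero_grad()
for step, batch in enumerate(loader):   # one batch = a query and its 10 neighbors
    loss = model(batch) / ACCUM_STEPS   # scale so the accumulated gradient averages
    loss.backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```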
5 Experimental Results
5.1 Evaluation in a Cross-lingual Setting
Table 2 shows the performance of our model vari-
ants compared to the seven strong baselines we
described above (rows 1–7). The first two rows
represent non-contextual baselines and they per-
form worse compared to the baseline pre-trained
XLM-R models fine-tuned with labeled data (rows
3–5). Specifically, the lexicon baseline performs
the worst among all, which indicates the limited coverage of the hate speech lexicon and the loss in precision due to token mismatches and context obliviousness.
Table 2: Comparison of F1 scores for the baselines and for our model variants. BE kNN+ and CE
kNN+ indicate Bi-encoder and Cross-encoder schemes, respectively. SRC indicates that the model was
further pre-trained with source Jigsaw English, using data from it as both query and neighbors.
For example, the word monkey is
generally included in a hate speech lexicon, but the
appearance of the token in a textual content does
not necessarily mean that the content is abusive.
The remaining rows in Table 2 show different
variants of our framework, based on CE kNN+
and BE kNN+, that is, using cross-encoders vs.
bi-encoders. For each of the encoding schemes, we
instantiate three different models by using three
different pre-trained representations fine-tuned in
our neighborhood framework, namely: $M^{\text{XLM-R}}_{feature}$, which is a pre-trained XLM-RoBERTa model (XLM-R); $M^{\text{P-XLM-R}}_{feature}$, which is an XLM-R model fine-tuned under a knowledge distillation setting with 50 million paraphrases and parallel data in 50 languages (Reimers and Gurevych, 2020); and $M^{\text{P-XLM-R}}_{feature} \rightarrow$ SRC, which is an $M^{\text{P-XLM-R}}_{feature}$ model fine-tuned with source data (here, 159,571 instances from Jigsaw English) in our neighborhood framework.
In order to train with SRC, we use all the
training data in Jigsaw English, and we retrieve
neighbors from Jigsaw English using LaBSE sen-
tence embeddings.4 Then, we use this training
4Note that we only use LaBSE for retrieval, as it has a
large coverage of languages.
data to fine-tune $M^{\text{P-XLM-R}}_{feature}$ with our kNN+-based cross-encoder (CE kNN+ + $M^{\text{P-XLM-R}}_{feature} \rightarrow$ SRC) and bi-encoder (BE kNN+ + $M^{\text{P-XLM-R}}_{feature} \rightarrow$ SRC) experimental setups. This is analogous to applying sequential adaptation (Garg et al., 2020), but here we do it in our neighborhood framework.
The SRC approach addresses one of the weak-
nesses of our kNN framework. The training data
is created from instances in the target dataset and
their neighbors from the source dataset. Thus, the
neighborhood model cannot use all source training
data, as it pre-selects a subset of the source data
based on similarity. This is a disadvantage com-
pared to the sequential adaptation model, which
uses all source training instances for pre-training.
In order to overcome this, we use the neigh-
borhood approach to pre-train our models with
source data.
Table 2 shows the F1 scores for eight language-specific training and evaluation sets stemming from two different datasets: Jigsaw Multilingual and WUL. Jigsaw Multilingual is an imbalanced dataset with 15% abusive content, while WUL is balanced (see Table 1). Thus, it is hard to achieve a high F1 score on Jigsaw Multilingual, whereas for WUL the F1 scores are relatively higher. Our CE kNN+ variants achieve superior performance to all the baselines, and to our BE kNN+ variants as well, in the majority of the cases.
The performance of the best and of the second-best model for each language is highlighted by bold-facing and underlining, respectively. We attribute the higher scores achieved by the CE kNN+ variants compared to the BE kNN+ ones to the late-stage interaction of the query and its neighbors.
The CE kNN+ variants show a large perfor-
mance gain compared to baseline models on the
Italian and the Turkish test sets from Jigsaw Mul-
tilingual. Even though the additional SRC pre-
training is not always helpful for the CE kNN+
model, it is always helpful for the BE kNN+
model. However, both models struggle to out-
perform the baseline for the Spanish test set. We
analyzed the training data distribution for Spanish,
but we could not find any noticeable patterns.
Yet, it can be observed that the XLM-R tar-
get baseline for Spanish (2nd row, 1st column)
achieves a higher F1 score compared to the Seq-
Adapt baseline, which yields better performance
for Italian and Turkish. We believe that the in-
domain training examples are good enough to
achieve a reasonable performance for Spanish.
On the WUL dataset, BE kNN+ + $M^{\text{P-XLM-R}}_{feature}$ with SRC pre-training outperforms the CE kNN+
variants and all baselines for Albanian, Russian,
and Turkish. Both the BE kNN+ variants and the
CE kNN+ variants perform worse compared to
the XLM-R Mix-Adapt baseline for English. Seq-
Adapt is a recently published effective baseline
(Garg et al., 2020), but for the WUL dataset, it
does not perform well compared to the Mix-Adapt
baseline. Note that the test set for the WUL dataset
is relatively small (100 examples per language)
and the examples in the test set are human transla-
tions of the English test set. Yet, we chose this
dataset as it results in a larger coverage of lan-
guages. We acknowledge this limitation (that the
dataset is based on translations) in our experi-
ments and that is why we further use Jigsaw Multi-
lingual to demonstrate the generality of our results.
5.2 Impact of the Learned Voting Strategy
To demonstrate the effectiveness of our learned
voting strategy, we use our baselines (shown in
Table 2, rows 3–7) to retrieve neighbors, and then
we perform majority voting to predict the label of
Method                               ES     IT     TR

Fine-Tuned kNN Baselines
XLM-R Target-kNN                     32.3   23.8   48.5
XLM-R Mix-Adapt-kNN                  40.9   30.3   38.3
XLM-R Seq-Adapt-kNN                  29.7   34.9   44.7

Sentence Similarity kNN Baselines
LaBSE-kNN                            44.8   38.2   66.0
Weighted LaBSE-kNN                   48.5   32.2   52.1

Our Model
BE kNN+ + M^P-XLM-R_feature → SRC    59.1   59.5   81.6
CE kNN+ + M^P-XLM-R_feature → SRC    61.2   61.1   85.0
Table 3: Performance comparison in terms of F1
score for the baseline classification models and
the sentence similarity model LaBSE under the
majority voting kNN setup (experiments on Jig-
saw Multilingual).
a test instance. The results for all the approaches
are shown in Table 3. For comparison, we also add
the best bi-encoder and cross-encoder versions of
kNN+ (see Table 2, rows 10 and 13).
In particular, these baseline models are pre-
trained XLM-R models fine-tuned on different
combinations of source and target language data-
sets (see Fine-Tuned kNN Baselines, Table 3). For
each data case in the source dataset, we compute
its representation as the [CLS] token from the
classification model and we construct a list of vec-
tors. Given a test data case from the target dataset,
we also compute its representation based on the
[CLS] token representation from the classification
model. We then compute its cosine similarity with
each of the [CLS] vectors from the source dataset.
After that, we compute a ranked list of the top-10
neighbors based on similarity scores.
Next, we vary the number of neighbors from
three to ten—considering them in the order they
are ranked based on their similarity to the query—
to obtain a majority vote and to classify the test
example. We can see in Table 3 that the perfor-
mance is similar to that for the LaBSE-kNN and
for the Weighted LaBSE-kNN approaches in
which the neighbors are retrieved using a repre-
sentation space constructed from sentence similar-
ity data (see Sentence Similarity kNN Baselines,
Table 3). The results in Table 3 show that when
fine-tuned models are directly used in a nearest
neighbors framework without additional modifi-
cations, their performance is lower by between
25 and 60 F1 points absolute, compared to our
proposed kNN+ model.
These results suggest that the interactions be-
tween the query and the retrieved neighbors cap-
tured by our model are an important prerequisite
for achieving high performance.
5.3 Evaluation in a Multilingual Setting
In this subsection, we go beyond our cross-lingual
setting and we analyse the effectiveness of our
proposed model in a multilingual setting. A mul-
tilingual setting has been explored in recent work
on abusive language detection (Pamungkas and
Patti, 2019; Ousidhoum et al., 2019; Basile et al.,
2019; Ranasinghe and Zampieri, 2020; Corazza
et al., 2020; Glavaš et al., 2020; Leite et al., 2020), and it is desirable because online platforms are not limited to specific languages. An
effective multilingual model unifies the two-stage
process of language detection and prediction with
a language-specific classifier. Moreover, abusive
language is generally code-mixed (Saumya et al.,
2021), which makes language-agnostic represen-
tation spaces more desirable.
We investigate a multilingual scenario, where
all target languages in our cross-lingual setting
are observed both at training and at testing time.
To this end, we create new training, development,
and testing splits in a 5:1:2 ratio from the 8,000
available data cases in the Jigsaw Multilingual
dataset. Each split contains randomly sampled
data in Italian, Spanish, and Turkish.
We train and evaluate our BE kNN+ and CE
kNN+ using the aforementioned splits; the results
are shown in Table 4. Here, we must note that
our neighborhood retrieval model is language-
agnostic, and thus we can retrieve neighbors for
queries in any language.
We find that in a multilingual scenario, our
BE kNN+ model with SRC pre-training performs
better than the CE kNN+ model. Both the BE and the CE approaches outperform the best baseline model, Seq-Adapt. Compared to the cross-lingual
setting, there is more data in a mix of languages
available. We hypothesize that the success of
the bi-encoder model over the cross-encoder one
stems from the increase in data size.
5.4 Analysis of the BE Representation
Model       Representations                F1

Seq-Adapt   XLM-R                          64.4
CE-kNN      M^XLM-R_feature                64.2
CE-kNN      M^P-XLM-R_feature              62.8
CE-kNN      M^P-XLM-R_feature → SRC        65.1
BE-kNN      M^XLM-R_feature                65.5
BE-kNN      M^P-XLM-R_feature              63.7
BE-kNN      M^P-XLM-R_feature → SRC        67.6
Table 4: Effectiveness of our BE kNN+ and CE
kNN+ schemes in the multilingual setting that
we create from Jigsaw Multilingual.
Table 5: An example showing the effectiveness
of our bi-encoder representation space for com-
puting the similarity between the query (flagged)
and its neighbors. We masked the offensive tokens
in the examples for a better reading experience.
In order to understand the impact of the representations learned by BE kNN+ + $M^{\text{P-XLM-R}}_{feature} \rightarrow$ SRC, a model variant instantiated from our proposed kNN
framework, we computed the similarity between
the query and its neighbors in the representation
space. An example is shown in Table 5 (it is the
example from the Introduction). Given the Turk-
ish flagged query, we use LaBSE (Feng et al.,
2020) and our BE representation space to retrieve
ranked lists of its ten nearest neighbors. The table
shows the scores computed by both approaches,
and we can see that our representation can help
discriminate between flagged and neutral contents
better. When we compute the cosine similarity
between the query and the nearest neighbors, the
BE representation space assigns negative scores to
the neutral content. The LaBSE sentence embed-
dings are optimized for semantic similarity, and
thus using them does not allow us to discriminate
between flagged and neutral content.
Figure 3: Impact of re-ranking neighbors using LaBSE in the BE kNN+ representation space.
Figure 4: Multi-task loss parameter sensitivity with uncertainties from two learning rates: 5e-05, 7e-05.
We further study the impact of our represen-
tation by comparing a voting-based kNN on the
top-10 neighbors retrieved by LaBSE vs. a re-
ranking using our BE representation.
For both the LaBSE-based ranking and for our
re-ranking, at each ranking point, we apply the
majority voting kNN approach on the neighbor-
hood within that ranking point. Figure 3 shows
the results for the test part of the Jigsaw Multi-
lingual dataset (including the multilingual setup;
see Section 5.3). We can see that the re-ranking
step improves over LaBSE for all the different
numbers of neighbors.
5.5 Multi-Task Learning Parameter
Sensitivity
Our approach uses multi-task learning, where we
balance the weights of Lcll and Llal using a
hyper-parameter λ. Figure 4 shows the impact
of different values for this hyper-parameter. On
the horizontal axis, we increase the importance
of the Llal loss, and we show the performance
of all model variants on the development part of
the Jigsaw Multilingual dataset. We can see that the models perform well if the weight for the label-agreement loss is set to 0.7, and degrade if it is increased.
6 Conclusion and Future Work

We proposed kNN+, a novel framework for cross-lingual content flagging, which significantly outperforms strong baselines with limited training data in the target language. We further demonstrated the effectiveness of our framework in a multilingual scenario, where a test data point can be in Turkish, Italian, or Spanish.

Moreover, we provided a qualitative analysis of the representations learned by our proposed BE kNN+ framework, and we demonstrated that, in the learned representation space, flagged content stays close to flagged content, while non-flagged content stays close to non-flagged content.

Our framework computes a neighborhood representation for a query using an attention mechanism, thus indicating the influence of each individual neighbor. This and the kNN-based architecture offer an opportunity to obtain an explanation for the individual model predictions, and such explanations can be based not only on the textual content of the influential neighbors, but also on their original fine-grained labels.

In future work, we plan to study the viability of such explanations in a user study. We also plan to evaluate our framework on other content flagging tasks, e.g., for detecting harmful memes (Dimitrov et al., 2021; Pramanick et al., 2021a,b), as the framework is not limited to abusive content detection.

Acknowledgments

We would like to thank the entire Checkstep team for the useful discussions on the potential
implications of this research. We would especially
like to thank Jay Alammar, who further provided
feedback on the model and created the general
conceptual diagram that explains our proposed
neighborhood framework.
References
Sweta Agrawal and Amit Awekar. 2018. Deep learning for detecting cyberbullying across multiple social media platforms. In Proceedings of the 40th European Conference on IR Research (ECIR 2018), volume 10772 of Lecture Notes in Computer Science, pages 141–153. Springer. https://doi.org/10.1007/978-3-319-76941-7_11
Aymé Arango, Jorge Pérez, and Barbara Poblete. 2019. Hate speech detection is not as easy as you may think: A closer look at model validation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’19, pages 45–54, Paris, France. Association for Computing Machinery. https://doi.org/10.1145/3331184.3331262
Mikel Artetxe and Holger Schwenk. 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Transactions of the Association for Computational Linguistics, 7:597–610. https://doi.org/10.1162/tacl_a_00288
Pinkesh Badjatiya, Manish Gupta, and Vasudeva Varma. 2019. Stereotypical bias removal for hate speech detection task using knowledge-based generalizations. In Proceedings of the World Wide Web Conference, WWW ’19, pages 49–59, San Francisco, CA, USA. Association for Computing Machinery. https://doi.org/10.1145/3308558.3313504
Michele Banko, Brendon MacKeen, and Laurie Ray. 2020. A unified taxonomy of harmful content. In Proceedings of the Fourth Workshop on Online Abuse and Harms, pages 125–137, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.alw-1.16
Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Manuel Rangel Pardo, Paolo Rosso, and Manuela Sanguinetti. 2019. SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 54–63, Minneapolis, Minnesota, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/S19-2007
Antal van den Bosch, Bertjan Busser, Sander Canisius, and Walter Daelemans. 2007. An efficient memory-based morphosyntactic tagger and parser for Dutch. LOT Occasional Series, 7:191–206.
Muthu Chidambaram, Yinfei Yang, Daniel Cer, Steve Yuan, Yunhsuan Sung, Brian Strope, and Ray Kurzweil. 2019. Learning cross-lingual sentence representations via a multi-task dual-encoder model. In Proceedings of the 4th Workshop on Representation Learning for NLP, RepL4NLP ’19, pages 250–259, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-4330
European Commission. 2020. Shaping Europe’s digital future: The digital services act package.
Alexis Conneau, Kartikay Khandelwal, Naman
Goyal, Vishrav Chaudhary, Guillaume Wenzek,
Francisco Guzmán, Edouard Grave, Myle Ott,
Luke Zettlemoyer, and Veselin Stoyanov. 2020.
Unsupervised cross-lingual representation learn-
ing at scale. In Proceedings of the 58th Annual
Meeting of the Association for Computational
Linguistics, ACL ’20, pages 8440–8451, Online.
Association for Computational Linguistics.
Alexis Conneau and Guillaume Lample. 2019.
Cross-lingual language model pretraining. In
Advances in Neural Information Processing
Systems, volume 32, Vancouver, Canada. Curran
Associates, Inc.
Michele Corazza, Stefano Menini, Elena Cabrio, Sara Tonelli, and Serena Villata. 2020. A multilingual evaluation for online hate speech detection. ACM Transactions on Internet Technology, 20(2). https://doi.org/10.1145/3377323
Walter Daelemans, Jakub Zavrel, Peter Berck, and
Steven Gillis. 1996. MBT: A memory-based
part of speech tagger-generator. In Proceed-
ings of the Fourth Workshop on Very Large
Corpora, Herstmonceux Castle, Sussex, UK.
Association for Computational Linguistics.
Thomas Davidson, Debasmita Bhattacharya, and Ingmar Weber. 2019. Racial bias in hate speech and abusive language detection datasets. In Proceedings of the Third Workshop on Abusive Language Online, ALW ’19, pages 25–35, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-3504
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT ’19, pages 4171–4186, Minneapolis, Minnesota, USA.
Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov, and Giovanni Da San Martino. 2021. Detecting propaganda techniques in memes. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL ’21, pages 6603–6617, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.516
Kawin Ethayarajh. 2019. How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP ’19, pages 55–65, Hong Kong, China. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1006
Angela Fan, Claire Gardent, Chloé Braud, and Antoine Bordes. 2021. Augmenting transformers with KNN-based composite memory for dialog. Transactions of the Association for Computational Linguistics, 9:82–99. https://doi.org/10.1162/tacl_a_00356
Elise Fehn Unsvåg and Björn Gambäck. 2018. The effects of user features on Twitter hate speech detection. In Proceedings of the 2nd Workshop on Abusive Language Online, ALW ’18, pages 75–85, Brussels, Belgium. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5110
Fangxiaoyu Feng, Yinfei Yang, Daniel Cer,
Naveen Arivazhagan, and Wei Wang. 2020.
Language-agnostic BERT sentence embedding.
arXiv preprint arXiv:2007.01852.
Siddhant Garg, Thuy Vu, and Alessandro Moschitti. 2020. TANDA: Transfer and adapt pre-trained transformer models for answer sentence selection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7780–7788. https://doi.org/10.1609/aaai.v34i05.6282
Spiros V. Georgakopoulos, Sotiris K. Tasoulis, Aristidis G. Vrahatis, and Vassilis P. Plagianakos. 2018. Convolutional neural networks for toxic comment classification. In Proceedings of the 10th Hellenic Conference on Artificial Intelligence, SETN ’18, New York, NY, USA. Association for Computing Machinery. https://doi.org/10.1145/3200947.3208069
Goran Glavaš, Mladen Karan, and Ivan Vulić. 2020. XHate-999: Analyzing and detecting abusive language across domains and languages. In Proceedings of the 28th International Conference on Computational Linguistics, COLING ’20, pages 6350–6365, Barcelona, Spain (Online). International Committee on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.559
UK Government. 2020. Online harms white paper.
Jiafeng Guo, Yixing Fan, Qingyao Ai, and
W. Bruce Croft. 2016. A deep relevance match-
ing model for ad-hoc retrieval. In Proceedings
of the 25th ACM International on Conference
on Information and Knowledge Management,
CIKM’16, pages 55–64, New York, NY, USA.
Association for Computing Machinery.
Kelvin Guu, Kenton Lee, Zora Tung, Panupong
Pasupat, and Mingwei Chang. 2020. Retrieval
augmented language model pre-training. In
Proceedings of the 37th International Confer-
ence on Machine Learning, volume 119 of
Proceedings of Machine Learning Research,
pages 3929–3938, Online. PMLR.
Jigsaw. 2018. Toxic comment classification challenge. https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/. Online; accessed 28 February 2021.
Jigsaw Multilingual. 2020. Jigsaw multilingual toxic comment classification. https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification/. Online; accessed 28 February 2021.
Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2021. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547. https://doi.org/10.1109/TBDATA.2019.2921572
Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’17, pages 427–431, Valencia, Spain. Association for Computational Linguistics. https://doi.org/10.18653/v1/E17-2068
David Jürgens, Libby Hemphill, and Eshwar Chandrasekharan. 2019. A just and comprehensive strategy for using NLP to address online abuse. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, ACL ’19, pages 3658–3666, Florence, Italy. Association for Computational Linguistics.
Lukasz Kaiser, Ofir Nachum, Aurko Roy, and Samy Bengio. 2017. Learning to remember rare events. In Proceedings of the 5th International Conference on Learning Representations, ICLR ’17, Toulon, France. OpenReview.net.
Nora Kassner and Hinrich Schütze. 2020. BERT-kNN: Adding a kNN search component to pre-trained language models for better QA. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, EMNLP ’20, pages 3424–3430, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.307
Urvashi Khandelwal, Omer Levy, Dan Jurafsky,
Luke Zettlemoyer, and Mike Lewis. 2020.
Generalization through memorization: Nearest
neighbor language models. In Proceedings of
the 8th International Conference on Learn-
ing Representations, ICLR ’20, Addis Ababa,
Ethiopia. OpenReview.net.
João Augusto Leite, Diego Silva, Kalina Bontcheva, and Carolina Scarton. 2020. Toxic language detection in social media for Brazilian Portuguese: New dataset and multilingual analysis. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, AACL ’20, pages 914–924, Suzhou, China. Association for Computational Linguistics.
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, volume 33 of NeurIPS ’20, pages 9459–9474. Curran Associates, Inc.
Zhouhan Lin, Minwei Feng, Cicero Nogueira
dos Santos, Mo Yu, Bing Xiang, Bowen
Zhou, and Yoshua Bengio. 2017. A structured
self-attentive sentence embedding. In The 5th
International Conference on Learning Repre-
sentations, ICLR ’17. Toulon, France.
Wei Lu, Yanyan Shen, Su Chen, and Beng Chin Ooi. 2012. Efficient processing of k nearest neighbor joins using MapReduce. Proceedings of the VLDB Endowment, 5(10):1016–1027. https://doi.org/10.14778/2336664.2336674
Sean MacAvaney, Hao-Ren Yao, Eugene
Yang, Katina Russell, Nazli Goharian, and
Ophir Frieder. 2019. Hate speech detection:
Challenges and solutions. PloS One, 14(8).
https://doi.org/10.1371/journal.pone.0221152, PubMed: 31430308
Shervin Malmasi and Marcos Zampieri. 2018. Challenges in discriminating profanity from hate speech. Journal of Experimental & Theoretical Artificial Intelligence, 30(2):187–202. https://doi.org/10.1080/0952813X.2017.1409284
Puneet Mathur, Rajiv Ratn Shah, Ramit Sawhney, and Debanjan Mahata. 2018. Detecting offensive tweets in Hindi-English code-switched language. In SocialNLP@ACL, pages 18–26. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-3504
Preslav Nakov, Vibha Nayak, Kyle Dent, Ameya
Bhatawdekar, Sheikh Muhammad Sarwar,
Momchil Hardalov, Yoan Dinkov, Dimitrina
Zlatkova, Guillaume Bouchard, and Isabelle
Augenstein. 2021. Detecting abusive language
on online platforms: A critical analysis. arXiv
preprint arXiv:2103.00153.
Nedjma Ousidhoum, Zizheng Lin, Hongming Zhang, Yangqiu Song, and Dit-Yan Yeung. 2019. Multilingual and multi-aspect hate speech analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4675–4684, Hong Kong, China. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1474
Endang Wahyu Pamungkas and Viviana Patti. 2019. Cross-domain and cross-lingual abusive language detection: A hybrid approach with deep learning and a multilingual lexicon. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 363–370, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-2051
John Pavlopoulos, Prodromos Malakasiotis, and Ion Androutsopoulos. 2017. Deeper attention to abusive user content moderation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1125–1135, Copenhagen, Denmark. Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1117
Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, and Tanmoy Chakraborty. 2021a. Detecting harmful memes and their targets. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 2783–2796, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-acl.246
Shraman Pramanick, Shivam Sharma, Dimitar Dimitrov, Md. Shad Akhtar, Preslav Nakov, and Tanmoy Chakraborty. 2021b. MOMENTA: A multimodal framework for detecting harmful memes and their targets. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4439–4455, Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-emnlp.379
Miloš Radovanović, Alexandros Nanopoulos, and Mirjana Ivanović. 2009. Nearest neighbors in high-dimensional data: The emergence and influence of hubs. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 865–872, New York, NY, USA. Association for Computing Machinery. https://doi.org/10.1145/1553374.1553485
Tharindu Ranasinghe and Marcos Zampieri. 2020. Multilingual offensive language identification with cross-lingual embeddings. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5838–5844, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.470
Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP),
pages 3982–3992, Hong Kong, China. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410
Nils Reimers and Iryna Gurevych. 2020. Making monolingual sentence embeddings multilingual using knowledge distillation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4512–4525, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.365
Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, and Preslav Nakov. 2021. SOLID: A large-scale semi-supervised dataset for offensive language identification. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 915–928, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-acl.80
Victor Sanh, Lysandre Debut, Julien Chaumond,
and Thomas Wolf. 2019. DistilBERT, a distilled
version of BERT: Smaller, faster, cheaper and
lighter. arXiv preprint arXiv:1910.01108.
Sunil Saumya, Abhinav Kumar, and Jyoti Prakash Singh. 2021. Offensive language identification in Dravidian code mixed social media text. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 36–45, Kyiv, Ukraine. Association for Computational Linguistics.
Anna Schmidt and Michael Wiegand. 2017. A survey on hate speech detection using natural language processing. In SocialNLP@EACL, pages 1–10. Association for Computational Linguistics. https://doi.org/10.18653/v1/W17-1101
Eyal Shnarch, Carlos Alzate, Lena Dankin, Martin Gleize, Yufang Hou, Leshem Choshen, Ranit Aharonov, and Noam Slonim. 2018. Will it blend? Blending weak and strong labeled data in a neural network for argumentation mining. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL ’18, pages 599–605, Melbourne, Australia. Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-2095
Saurabh Srivastava, Prerna Khurana, and Vartika Tewari. 2018. Identifying aggression and toxicity in comments using capsule network. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying, TRAC ’18, pages 98–105, Santa Fe, New Mexico, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-3517
Lukas Stappen, Fabian Brunn, and Björn Schuller. 2020. Cross-lingual zero- and few-shot hate speech detection utilising frozen transformer language models and AXEL. arXiv preprint arXiv:2004.13850.
Bertie Vidgen, Alex Harris, Dong Nguyen, Rebekah Tromble, Scott Hale, and Helen Margetts. 2019. Challenges and frontiers in abusive content detection. In Proceedings of the Third Workshop on Abusive Language Online, ALW ’19, pages 80–93, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-3509
Eric Wallace, Shi Feng, and Jordan Boyd-Graber. 2018. Interpreting neural networks with nearest neighbors. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 136–144, Brussels, Belgium. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5416
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao,
Nan Yang, and Ming Zhou. 2020. MiniLM:
Deep self-attention distillation for task-agnostic
compression of pre-trained transformers. In
Advances in Neural Information Processing
Systems, volume 33, pages 5776–5788. Curran
Associates, Inc.
Zeerak Waseem. 2016. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, pages 138–142, Austin, Texas. Association for Computational Linguistics. https://doi.org/10.18653/v1/W16-5618
Ellery Wulczyn, Nithum Thain, and Lucas Dixon.
2017. Ex machina: Personal attacks seen at
scale. In Proceedings of the 26th International Conference on World Wide Web, WWW ’17, pages 1391–1399, Geneva, Switzerland. International World Wide Web Conferences Steering Committee. https://doi.org/10.1145/3038912.3052591
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019a. Predicting the type and target of offensive posts in social media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT ’19, pages 1415–1420, Minneapolis, Minnesota. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1144
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019b. SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval). In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 75–86, Minneapolis, Minnesota, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/S19-2010
Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, and Çağrı Çöltekin. 2020. SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020). In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1425–1447. International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.188
Wei Zhao, Steffen Eger, Johannes Bjerva, and Isabelle Augenstein. 2021. Inducing language-agnostic multilingual representations. In Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics, pages 229–240, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.starsem-1.22
Pierre Zweigenbaum, Serge Sharoff, and Reinhard Rapp. 2017. Overview of the second BUCC shared task: Spotting parallel sentences in comparable corpora. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, Vancouver, Canada. Association for Computational Linguistics. https://doi.org/10.18653/v1/W17-2512