Book Reviews
Embeddings in Natural Language Processing: Theory and Advances
in Vector Representations of Meaning
Mohammad Taher Pilehvar and Jose Camacho-Collados
(Tehran Institute for Advanced Studies & Cardiff University)
Morgan & Claypool (Synthesis Lectures on Human Language Technologies, edited by
Graeme Hirst, volume 47), 2021, xvii+157 pp; paperback, ISBN 978-1-63639-021-5;
ebook, ISBN 978-1-63639-022-2; hardcover, ISBN 978-1-63639-023-9;
doi:10.2200/S01057ED1V01Y202009HLT047
Reviewed by
Marcos Garcia
CiTIUS, University of Santiago de Compostela
Word vector representations have a long tradition in several research fields, such as
cognitive science and computational linguistics. They have been used to represent the
meaning of various units of natural languages, including, among others, words, phrases,
and sentences. Before the deep learning tsunami, count-based vector space models had
been successfully used in computational linguistics to represent the semantics of natural
languages. However, the rise of neural networks in NLP popularized the use of word
embeddings, which are now applied as pre-trained vectors in most machine learning
architectures.
This book, written by Mohammad Taher Pilehvar and Jose Camacho-Collados,
provides a comprehensive and easy-to-read review of the theory and advances in vector
models for NLP, focusing especially on semantic representations and their applications.
It is a great introduction to different types of embeddings and the background and mo-
tivations behind them. In this sense, the authors adequately present the most relevant
concepts and approaches that have been used to build vector representations. They also
keep track of the most recent advances of this vibrant and fast-evolving area of research,
discussing cross-lingual representations and current language models based on the
Transformer. Therefore, this is a useful book for researchers interested in computational
methods for semantic representations and artificial intelligence. Although some basic
knowledge of machine learning may be necessary to follow a few topics, the book
includes clear illustrations and explanations, which make it accessible to a wide range
of readers.
Apart from the preface and the conclusions, the book is organized into eight
chapters. In the first two, the authors introduce some of the core ideas of NLP and
artificial neural networks, respectively, discussing several concepts that will be useful
throughout the book. Then, Chapters 3 to 6 present different types of vector represen-
tations at the lexical level (word embeddings, graph embeddings, sense embeddings,
and contextualized embeddings), followed by a brief chapter (7) about sentence and
document embeddings. For each specific topic, the book includes methods and data sets
to assess the quality of the embeddings. Finally, Chapter 8 raises ethical issues involved
in data-driven models for artificial intelligence. Each chapter can be summarized as
follows.
Chapter 1 gives a brief introduction to some challenges of NLP, from both the
understanding and the generation perspectives, including different types of linguistic
ambiguity. The main part of the chapter introduces vector space models for semantic
representation, presenting the distributional hypothesis and the evolution of vector
space models.
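To make the distributional hypothesis concrete, here is a minimal sketch of the count-based vector space models the chapter traces; the toy corpus and window size are illustrative assumptions, not examples from the book.

```python
# Minimal sketch of a count-based vector space model (hypothetical toy
# corpus): each word is represented by its co-occurrence counts with the
# other words, following the distributional hypothesis.
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))

# Count co-occurrences within a symmetric window of two words.
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if i != j:
                M[idx[w], idx[words[j]]] += 1

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Words used in similar contexts ("cat" and "dog") get similar vectors.
print(cosine(M[idx["cat"]], M[idx["dog"]]))
```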
The second chapter starts by giving a quick introduction to some linguistic fun-
damentals for NLP (syntax, morphology, and semantics) and to statistical language
models. Then, it gives an overview of deep learning, presenting the fundamental differ-
ences between architectures and concepts that will be referred to throughout the book.
Finally, the authors present some of the most relevant knowledge resources to build
semantically richer vector representations.
Chapter 3 is an extensive review of word embeddings. It first presents differ-
ent count-based approaches and dimensionality reduction techniques and then dis-
cusses predictive models such as Word2vec and GloVe. In addition, it describes
character-based and knowledge-based embeddings, as well as supervised and unsuper-
vised approaches to cross-lingual vector representations.
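As an illustration of the predictive models the chapter covers, the following is a minimal sketch of training a skip-gram Word2vec model with the gensim library; the toy corpus and hyperparameters are assumptions for demonstration only.

```python
# Minimal sketch of training a predictive skip-gram model with gensim
# (toy corpus and hyperparameters are illustrative assumptions).
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["the", "cat", "chased", "the", "dog"],
]

# sg=1 selects skip-gram; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=1,
                 epochs=50)
print(model.wv.most_similar("cat", topn=2))
```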
Chapter 4 illustrates the principal methods to build node and relation embeddings
from graphs. First, it presents the key strategies to build node embeddings, from matrix
factorization and random walks to methods based on graph neural networks. Then, two
approaches to relation embeddings are presented: those built from knowledge
graphs, and unsupervised methods that exploit regularities in the vector space.
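The random-walk strategy mentioned above can be sketched in a few lines: sample walks over a graph and feed them to a skip-gram model as if they were sentences (the DeepWalk idea). The graph, walk length, and hyperparameters below are illustrative assumptions, and the sketch relies on the networkx and gensim libraries.

```python
# Minimal sketch of DeepWalk-style node embeddings: sample random walks
# over a graph and treat them as "sentences" for a skip-gram model.
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()  # small built-in social network

def random_walk(graph, start, length=10):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(list(graph.neighbors(walk[-1]))))
    return [str(node) for node in walk]

# Ten walks per node give the skip-gram model enough "text" to train on.
walks = [random_walk(G, n) for n in G.nodes() for _ in range(10)]
model = Word2Vec(walks, vector_size=16, window=4, min_count=1, sg=1)
print(model.wv.most_similar("0", topn=3))  # nodes with similar neighborhoods
```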
The next chapter (5) starts by presenting the Meaning Conflation Deficiency of static
word embeddings, which motivates research on sense representations. This chapter
discusses two main approaches to building sense embeddings: unsupervised methods
that induce senses from corpora, and knowledge-based approaches that take advantage
of lexical resources.
Chapter 6 addresses contextualized embeddings and describes the main proper-
ties of the Transformer architecture and the self-attention mechanism. It includes an
overview of these types of embeddings, from early methods that represent a word
by its context, to current language models for contextualized word representation.
In this regard, the authors present contextualized models based on recurrent neural
networks (e.g., ELMo) and on the Transformer (GPT, BERT, and some derivatives). The
potential impact of several parameters, such as subword tokenization or the training
objectives, is also explained, and the authors discuss various approaches to using these
models in downstream tasks, such as feature extraction and fine-tuning. Finally, they
also summarize some interesting insights regarding the exploration of the linguistic
properties encoded by neural language models.
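As a concrete illustration of the feature-extraction usage discussed in this chapter, the following sketch pulls contextualized vectors for an ambiguous word out of a pre-trained BERT model; it assumes the Hugging Face transformers library, PyTorch, and the bert-base-uncased checkpoint, which are not prescribed by the book itself.

```python
# Minimal sketch of feature extraction from a pre-trained contextualized
# model: the same word form ("bank") gets different vectors in different
# contexts, unlike in a static word embedding space.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    for sent in ["he sat by the river bank", "he deposited cash at the bank"]:
        inputs = tokenizer(sent, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
        # +1 skips the [CLS] token prepended by the tokenizer.
        bank_idx = tokenizer.tokenize(sent).index("bank") + 1
        print(sent, "->", hidden[0, bank_idx, :3])
```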
Chapter 7 comprises a brief sketch of vector representations of longer units, such as
sentences and documents. It presents the bag-of-words approach and its limitations,
as well as the concept of compositionality and its significance for the unsupervised
learning of sentence embeddings. Some supervised strategies (e.g., training on natural
language inference or machine translation data sets) are also discussed.
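The simplest compositional strategy the chapter starts from, averaging word vectors into a sentence vector, can be sketched as follows; the lookup table here is a hypothetical placeholder standing in for pre-trained embeddings.

```python
# Minimal sketch of a compositional sentence vector: the average of the
# word vectors. The random lookup table below is a hypothetical stand-in
# for pre-trained embeddings such as Word2vec or GloVe.
import numpy as np

rng = np.random.default_rng(0)
words = ["the", "cat", "dog", "sat", "on", "mat", "rug"]
wv = {w: rng.normal(size=8) for w in words}  # placeholder vectors

def sentence_embedding(tokens):
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0)

s1 = sentence_embedding("the cat sat on the mat".split())
s2 = sentence_embedding("the dog sat on the rug".split())
# With real embeddings, this cosine similarity would reflect relatedness.
print(s1 @ s2 / (np.linalg.norm(s1) * np.linalg.norm(s2)))
```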
Ethical aspects and biases of word representations are the focus of Chapter 8. Here,
the authors present some risks of data-driven models for artificial intelligence and use
examples of gender stereotypes to show biases present in word embeddings, followed
by several methods aimed at reducing those biases. Overall, the authors emphasize the
growing interest in the NLP community in critically analyzing the social impact of these
models.
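A minimal sketch of the kind of bias diagnostic described in this chapter compares how profession words relate to gendered pronouns in a pre-trained space; it assumes gensim's downloader (with network access) and the small GloVe vectors, and the specific word pairs are illustrative rather than the book's own examples.

```python
# Minimal sketch of a simple bias diagnostic: checking whether profession
# words sit closer to one gendered pronoun than the other in a
# pre-trained embedding space.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # small pre-trained GloVe vectors

for word in ["nurse", "engineer"]:
    print(word,
          "she:", round(float(wv.similarity(word, "she")), 3),
          " he:", round(float(wv.similarity(word, "he")), 3))
```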
The book concludes by highlighting some of the major achievements of current vec-
tor representations and calling for more rigorous evaluations to measure their progress,
especially in languages other than English, and with an eye on interpretability.
In summary, this book offers a high-level synthesis of different types of embed-
dings for NLP, focused on the general concepts and the most established techniques, and
includes useful pointers to delve deeper into specific topics. As the book also discusses
the most recent contextualized models (up to November 2020), it is an attractive
combination of the foundations of vector space models with current approaches based
on artificial neural networks. As suggested by the authors, because of the explosion and
rapid development of deep learning methods for NLP, perhaps “it is necessary to step
back and rethink in order to achieve true language understanding.”
Marcos Garcia is a postdoctoral researcher at CiTIUS, the Research Center in Intelligent Tech-
nologies of the University of Santiago de Compostela. He has worked on NLP topics such as
PoS-tagging, dependency parsing, and lexical semantics, and has developed resources and tools
for different languages in both industry and academia. His e-mail address is
marcos.garcia.gonzalez@usc.gal.