Unleashing the True Potential of Sequence-to-Sequence Models
for Sequence Tagging and Structure Parsing
Han He
Department of Computer Science
Emory University
Atlanta, GA 30322, USA
han.he@emory.edu
Jinho D. Choi
Department of Computer Science
Emory University
Atlanta, GA 30322, USA
jinho.choi@emory.edu
Abstract
Sequence-to-Sequence (S2S) models have
achieved remarkable success on various text
generation tasks. However, learning complex
structures with S2S models remains challeng-
ing as external neural modules and additional
lexicons are often supplemented to predict
non-textual outputs. We present a systematic
study of S2S modeling using constrained de-
coding on four core tasks: part-of-speech tag-
ging, named entity recognition, constituency,
and dependency parsing, to develop efficient
exploitation methods costing zero extra param-
eters. In particular, three lexically diverse lineari-
zation schemas and corresponding constrained
decoding methods are designed and evaluated.
Experiments show that although more lexical-
ized schemas yield longer output sequences
that require heavier training, their sequences
being closer to natural language makes them
easier to learn. Moreover, S2S models using
our constrained decoding outperform other
S2S approaches using external resources. Our
best models perform better than or comparably
to the state-of-the-art for all four tasks, lighting
a promise for S2S models to generate non-
sequential structures.
1 Introduction
Sequence-to-Sequence (S2S) models pretrained
for language modeling (PLM) and denoising ob-
jectives have been successful on a wide range of
NLP tasks where both inputs and outputs are se-
quences (Radford et al., 2019; Raffel et al., 2020;
Lewis et al., 2020; Brown et al., 2020). However,
for non-sequential outputs like trees and graphs, a
procedure called linearization is often required to
flatten them into ordinary sequences (Li et al.,
2018; Fernández-González and Gómez-Rodríguez,
2020; Yan et al., 2021; Bevilacqua et al., 2021; He
and Choi, 2021a), where labels in non-sequential
structures are mapped heuristically as individual
tokens in sequences, and numerical properties like
indices are either predicted using an external de-
coder such as Pointer Networks (Vinyals et al.,
2015a) or cast to additional tokens in the vocab-
ulary. While these methods are found to be effec-
tive, we hypothesize that S2S models can learn
complex structures without adapting such patches.
To challenge the limit of S2S modeling, BART
(Lewis et al., 2020) is finetuned on four tasks with-
out extra decoders: part-of-speech tagging (POS),
named entity recognition (NER), constituency pars-
ing (CON), and dependency parsing (DEP). Three
novel linearization schemas are introduced for
each task: label sequence (LS), label with text
(LT), and prompt (PT). LS to PT feature an
increasing number of lexicons and a decreasing
number of labels, which are not in the vocabulary
(Section 3). Every schema is equipped with a
constrained decoding algorithm searching over
valid sequences (Section 4).
Our experiments on three popular datasets de-
pict that S2S models can learn these linguistic
structures without external resources such as in-
dex tokens or Pointer Networks. Our best models
perform on par with or better than the other state-
of-the-art models for all four tasks (Section 5).
Finally, a detailed analysis is provided to compare
the distinctive natures of our proposed schemas
(Section 6).1
2 Related Work
S2S (Sutskever et al., 2014) architectures have
been effective on many sequential modeling tasks.
Conventionally, S2S is implemented as an en-
coder and decoder pair, where the encoder learns
input representations used to generate the output
1All our resources including source codes are publicly
available: https://github.com/emorynlp/seq2seq
-corenlp.
Transactions of the Association for Computational Linguistics, vol. 11, pp. 582–599, 2023. https://doi.org/10.1162/tacl_a_00557
Action Editor: Emily Pitler. Submission batch: 11/2022; Revision batch: 1/2023; Published 6/2023.
© 2023 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
sequence via the decoder. Since the input sequence
can be very long, attention mechanisms (Bahdanau
et al., 2015; Vaswani et al., 2017) focusing on par-
ticular positions are often augmented to the basic
architecture. With transfer-learning, S2S models
pretrained on large unlabeled corpora have risen
to a diversity of new approaches that convert lan-
guage problems into a text-to-text format (Akbik
et al., 2018; Lewis et al., 2020; Radford et al.,
2019; Raffel et al., 2020; Brown et al., 2020).
Among them, tasks most related to our work are
linguistic structure predictions using S2S, POS,
NER, DEP, and CON.
POS has been commonly tackled as a sequence
tagging task, where the input and output sequences
have equal lengths. S2S, on the other hand, does
not enjoy such constraints as the output sequence
can be arbitrarily long. Therefore, S2S is not as
popular as sequence tagging for POS. Prevailing
neural architectures for POS are often built on top
of a neural sequence tagger with rich embeddings
(Bohnet et al., 2018; Akbik et al., 2018) and Con-
ditional Random Fields (Lafferty et al., 2001).
NER has been cast to a neural sequence tagging
task using the IOB notation (Lample et al., 2016)
over the years, which benefits most from contex-
tual word embeddings (Devlin et al., 2019; Wang
et al., 2021). Early S2S-based works cast NER
to a text-to-IOB transduction problem (Chen and
Moschitti, 2018; Straková et al., 2019; Zhu et al.,
2020), which is included as a baseline schema in
Sección 3.2. Yan et al. (2021) augment Pointer Net-
works to generate numerical entity spans, cual
we refrain to use because the focus of this work
is purely on the S2S itself. More recently, Cui
et al. (2021) propose the first template prompting
to query all possible spans against a S2S lan-
guage model, which is highly simplified into a
one-pass generation in our PT schema. Instead of
directly prompting for the entity type, Chen et al.
(2022) propose to generate its concepts first then
its type later. Their two-step generation is tai-
lored for few-shot learning, orthogonal to our ap-
proach. Moreover, our prompt approach does not
rely on non-textual tokens as they do.
CON is a more established task for S2S models
since the bracketed constituency tree is naturally
a linearized sequence. Top-down tree lineariza-
tions based on brackets (Vinyals et al., 2015b)
or shift-reduce actions (Sagae and Lavie, 2005)
rely on a strong encoder over the sentence while
bottom-up ones (Zhu et al., 2013; Ma et al.,
2017) can utilize rich features from readily built
partial parses. Recently, the in-order traversal
has proved superior to bottom-up and top-down
in both transition (Liu and Zhang, 2017) and
S2S (Fernández-González and Gómez-Rodríguez,
2020) constituency parsing. More recently, a
Pointer Networks augmented approach (Yang and
Tu, 2022) is ranked top among S2S approaches.
Since we are interested in the potential of S2S
models without patches, a naive bottom-up baseline
and its novel upgrades are studied in Section 3.3.
DEP has been underexplored as S2S due to the
linearization complexity. The first S2S work maps
a sentence to a sequence of source sentence words
interleaved with the arc-standard, reduce-actions
in its parse (Wiseman and Rush, 2016), which is
adopted as our LT baseline in Section 3.4. Zhang
et al. (2017) introduce a stack-based multi-layer
attention mechanism to leverage structural lin-
guistics information from the decoding stack in
arc-standard parsing. Arc-standard is also used
in our LS baseline, sin embargo, we use no such ex-
tra layers. Apart from transition parsing, Li et al.
(2018) directly predict the relative head posi-
tion instead of the transition. This schema is
later extended to multilingual and multitasking by
Choudhary and O’riordan (2021). Their encoder
and decoder use different vocabularies, while in
our PT setting, we re-use the vocabulary in the
S2S language model.
S2S appears to be more prevailing for semantic
parsing due to two reasons. First, synchronous
context-free grammar bridges the gap between
natural text and meaning representation for S2S.
It has been employed to obtain silver annotations
(Jia and Liang, 2016), and to generate canonical
natural language paraphrases that are easier to
learn for S2S (Shin et al., 2021). This trend of in-
sights viewing semantic parsing as prompt-guided
generation (Hashimoto et al., 2018) and paraphras-
ing (Berant and Liang, 2014) has also inspired our
design of PT. Second, the flexible input/output
format of S2S facilitates joint learning of seman-
tic parsing and generation. Latent variable sharing
(Tseng et al., 2020) and unified pretraining (Bai
et al., 2022) are two representative joint model-
ing approaches, which could be augmented with
our idea of PT schema as a potentially more ef-
fective linearization.
Our finding that core NLP tasks can be solved
using LT overlaps with the Translation between
Augmented Natural Languages (Paolini et al.,
2021). However, we take one step further to
study the impacts of textual tokens in schema
design choices. Our constrained decoding is sim-
ilar to existing work (Hokamp and Liu, 2017;
Deutsch et al., 2019; Shin et al., 2021). We
craft constrained decoding algorithms for our pro-
posed schemas and provide a systematic ablation
study in Section 6.1.
3 Schemas
This section presents our output schemas for
POS, NER, CON, and DEP in Table 1. For each
task, three lexically diverse schemas are designed as
follows to explore the best practice for structure
learning. First, Label Sequence (LS) is defined as
a sequence of labels consisting of a finite set of
task-related labels, that are merged into the S2S
vocabulary, with zero text. Second, Label with
Text (LT) includes tokens from the input text on
top of the labels such that it has a medium num-
ber of labels and text. Third, PrompT (PT) gives
a list of sentences describing the linguistic struc-
ture in natural language with no label. We hy-
pothesize that the closer the output is to natural
language, the more advantage the S2S takes from
the PLM.
3.1 Part-of-Speech Tagging (POS)
LS LS defines the output as a sequence of POS
tags. Formally, given an input sentence of n to-
kens x = {x1, x2, · · · , xn}, its output is a tag se-
quence of the same length yLS = {y1, y2, · · · ,
yn}. Distinguished from sequence tagging, any
LS output sequence is terminated by the ‘‘end-
of-sequence’’ (EOS) token, which is omitted
from yLS for simplicity. Predicting POS tags
often depends on their neighbor contexts. We
challenge that the autoregressive decoder of a
S2S model can capture this dependency through
self-attention.
LT For LT, the token from the input is inserted
before its corresponding tag. Formally, the output
is defined yLT = {(x1, y1), (x2, y2), .., (xn, yn)}.
Both x and y are part of the output and the S2S
model is trained to generate each pair sequentially.
PT PT is a human-readable text describing the
POS sequence. Specifically, we use a phrase
yPT_i = ‘‘xi is y′_i’’ for the i-th token, where y′_i is
the definition of a POS tag yi, e.g., a noun. The
final prompt is then the semicolon concatenation
of all phrases: yPT = yPT_1 ; yPT_2 ; · · · ; yPT_n .
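For concreteness, the following Python sketch builds the three POS schemas from a tagged sentence; it is an illustration of the definitions above rather than the released implementation, and the tag-description table is an abbreviated assumption.

# Illustrative sketch of the three POS linearization schemas (LS, LT, PT).
# POS_DESC is an assumed, truncated mapping from tags to their descriptions.
POS_DESC = {"DT": "a determiner", "NN": "a singular noun", "VBD": "a past tense verb"}

def pos_ls(tags):
    # LS: the tag sequence itself.
    return " ".join(tags)

def pos_lt(tokens, tags):
    # LT: each input token followed by its tag.
    return " ".join(f"{x} {y}" for x, y in zip(tokens, tags))

def pos_pt(tokens, tags):
    # PT: one "is-a" phrase per token, joined by semicolons.
    return " ; ".join(f'"{x}" is {POS_DESC.get(y, y)}' for x, y in zip(tokens, tags))

tokens = ["The", "word", "ended"]
tags = ["DT", "NN", "VBD"]
print(pos_ls(tags))
print(pos_lt(tokens, tags))
print(pos_pt(tokens, tags))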
3.2 Named Entity Recognition (NER)
LS LS of an input sentence comprising n tokens
x = {x1, x2, · · · , xn} is defined as the BIEOS tag
sequence yLS = {y1, y2, · · · , yn}, which labels
each token as the Beginning, Inside, End, Outside,
or Single-token entity.
LT LT uses a pair of entity type labels to wrap
each entity: yLT = ..B-yj, xi, .., xi+k, E-yj, ..,
where yj is the type label of the j-th entity
consisting of k tokens.
PT PT is defined as a list of sentences describ-
ing each entity: yPT_i = ‘‘xi is y′_i’’, where y′_i is
the definition of a NER tag yi, e.g., a person.
Different from the prior prompt work (Cui et al.,
2021), our model generates all entities in one
pass which is more efficient than their brute-force
approach.
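A minimal sketch of the NER linearizations follows, assuming gold entity spans are given; the surface form of the opening/closing labels (angle-bracketed here) and the type descriptions are our own placeholders rather than the exact tokens used in the schemas.

def ner_lt(tokens, entities):
    # entities: list of (start, end_exclusive, type); wrap each span with B-/E- type labels.
    starts = {s: (e, t) for s, e, t in entities}
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            e, t = starts[i]
            out += [f"<B-{t}>"] + tokens[i:e] + [f"<E-{t}>"]
            i = e
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

def ner_pt(tokens, entities, desc):
    # PT: one "is-a" sentence per entity, generated in a single pass.
    return "; ".join(f'"{" ".join(tokens[s:e])}" is {desc[t]}' for s, e, t in entities)

tokens = "Large image of the Michael Jackson HIStory statue .".split()
entities = [(4, 6, "PER"), (6, 7, "WOA")]
desc = {"PER": "a person", "WOA": "an art work"}
print(ner_lt(tokens, entities))
print(ner_pt(tokens, entities, desc))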
3.3 Constituency Parsing (CON)
Schemas for CON are developed on constituency
trees pre-processed by removing the first level of
non-terminals (POS tags) and rewiring their chil-
dren (tokens) to parents, e.g., (NP (PRON My)
(NOUN friend)) → (NP My friend).
LS LS is based on a top-down shift-reduce sys-
tem consisting of a stack, a buffer, and a depth
record d. Initially, the stack contains only the root
constituent with label TOP and depth 0; the buffer
contains all tokens from the input sentence; d is
set to 1. A Node-X (N-X) transition creates a
new depth-d non-terminal labeled with X, pushes
it to the stack, and sets d ← d + 1. A Shift
(SH) transition removes the first token from the
buffer and pushes it to the stack as a new terminal
with depth d. A Reduce (RE) pops all elements
with the same depth d from the stack then makes
them the children of the top constituent of the
stack, and it sets d ← d − 1. The linearization of
a constituency tree using our LS schema can be
obtained by applying 3 string substitutions: re-
place each left bracket and the label X following
it with a Node-X, replace terminals with SH,
replace right brackets with RE.
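The LS linearization just described can be read as string substitutions over the bracketed tree; the sketch below illustrates this under our own function name, assuming the root TOP constituent is left implicit on the stack as described above.

import re

def con_ls(bracketed):
    # Linearize a bracketed constituency tree into the LS transition string:
    #   "(X"     -> "N-X" (Node-X)
    #   terminal -> "SH"  (Shift)
    #   ")"      -> "RE"  (Reduce)
    out = []
    for tok in re.findall(r"\(|\)|[^\s()]+", bracketed):
        if tok == "(":
            out.append("N-")        # the label is attached by the next token
        elif tok == ")":
            out.append("RE")
        elif out and out[-1] == "N-":
            out[-1] = f"N-{tok}"    # the token right after "(" is the constituent label
        else:
            out.append("SH")        # every other token is a terminal
    return " ".join(out)

print(con_ls("(S (NP My friend) (VP eats))"))
# N-S N-NP SH SH RE N-VP SH RE RE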
LT LT is derived by reverting all SH in LS back
to the corresponding tokens so that tokens in LT
effectively serve as SH in our transition system.
Table 1: Schemas for the sentence ‘‘My friend who lives in Orlando bought me a gift from Disney
World’’.
PT PT is also based on a top-down lineariza-
tion, although it describes a constituent using tem-
plates: ‘‘pi has {cj}'', where pi is a constituent
and cj-s are its children. To describe a constitu-
ent, the indefinite article ‘‘a’’ is used to denote a
new constituent (e.g., ‘‘. . . has a noun phrase’’).
The definite article ‘‘the’’ is used for referring
to an existing constituent mentioned before (e.g.,
‘‘the noun phrase has …''), or describing a con-
stituent whose children are all terminals (e.g.,
‘‘. . . has the noun phrase ‘My friend’’’). When
describing a constituent that directly follows its
mention, the determiner ‘‘which’’ is used instead
of repeating itself multiple times, e.g., ‘‘(. . . and
the subordinating clause, which has …''). Sen-
tences are joined with a semicolon ‘‘;’’ as the
final prompt.
3.4 Dependency Parsing (DEP)
LS LS uses three transitions from the arc-
standard system (Nivre, 2004): shift (SH), left
arc (<), and right arc (>).
LT LT for DEP is obtained by replacing each
SH in a LS with its corresponding token.
PT PT is derived from its LS sequence by re-
moving all SH. Then, for each left arc creating
an arc from xj to xi with dependency relation r
(e.g., a possessive modifier), a sentence is created
by applying the template ‘‘xi is r of xj’’. For
each right arc creating an arc from xi to xj with
the dependency relation r, a sentence is created
with another template ‘‘xi has r xj’’. The prompt
is finalized by joining all such sentences with a
semicolon.
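A sketch of the DEP-PT templates applied to labeled arcs is given below; the relation descriptions are placeholders, and ordering the sentences by the underlying arc-standard transition sequence is omitted for brevity.

def dep_pt(tokens, arcs, rel_desc):
    # arcs: (head, dep, relation) with 1-based indices into tokens; head 0 denotes the root.
    # left arc (dep before head) -> '"dep" is <rel> of "head"'
    # right arc (dep after head) -> '"head" has <rel> "dep"'
    sents = []
    for h, d, r in arcs:
        head = "sentence" if h == 0 else tokens[h - 1]
        desc = rel_desc.get(r, r)
        if d < h:
            sents.append(f'"{tokens[d - 1]}" is {desc} of "{head}"')
        else:
            sents.append(f'"{head}" has {desc} "{tokens[d - 1]}"')
    return "; ".join(sents)

tokens = ["It", "looks"]
arcs = [(2, 1, "nsubj"), (0, 2, "root")]
rel_desc = {"nsubj": "a nominal subject", "root": "a root"}
print(dep_pt(tokens, arcs, rel_desc))
# "It" is a nominal subject of "looks"; "sentence" has a root "looks"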
4 Decoding Strategies
To ensure well-formed output sequences that
match the schemas (Sección 3), a set of constrained
decoding strategies is designed per task except for
CON, which is already tackled as S2S model-
ing without constrained decoding (Vinyals et al.,
2015b; Fernández-González and Gómez-Rodríguez,
2020). Formally, given an input x and any par-
tial output y<i, the decoder is restricted to a
candidate set of valid next tokens.

4.1 Part-of-Speech Tagging

LT For LT, the candidate set depends on the
parity of i, as defined in Algorithm 1.

Algorithm 1: Constrained POS-LT
Function NextY(x, y<i):
    if i > 2n then
        return {EOS}
    else if i is even then
        return {x_{i/2}}
    else
        return D

Algorithm 2: Prefix Matching
Function PrefixMatch(T, p):
    node ← T
    while node and p do
        node ← node.children[p_1]
        p ← p_{>1}
    return node
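Algorithm 2 walks a trie of stored suffixes; a minimal Python equivalent could look as follows (the class and function names are ours, not the released implementation).

class Trie:
    def __init__(self):
        self.children = {}
        self.is_end = False

    def insert(self, tokens):
        node = self
        for t in tokens:
            node = node.children.setdefault(t, Trie())
        node.is_end = True

def prefix_match(trie, tokens):
    # Walk the trie along the given token sequence; return the reached node,
    # or None as soon as a token falls outside the stored suffixes (cf. Algorithm 2).
    node = trie
    for t in tokens:
        node = node.children.get(t)
        if node is None:
            return None
    return node

suffixes = Trie()
suffixes.insert(["is", "a", "noun", ";"])
suffixes.insert(["is", "a", "determiner", ";"])
node = prefix_match(suffixes, ["is", "a"])
print(sorted(node.children))   # ['determiner', 'noun'] -> allowed next tokens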
PT The PT generation can be divided into two
phases: token and ‘‘is-a-tag’’ statement genera-
ciones. A binary status u is used to indicate whether
yi is expected to be a token. To generate a token,
an integer k ≤ n is used to track the index of
the next token. To generate an ‘‘is-a-tag’’ state-
mento, an offset o is used to track the beginning
of an ‘‘is-a-tag’’ statement. Each description dj
of a POS tag tj is extended to a suffix sj =
‘‘is dj ; ’’.
Suffixes are stored in a trie tree T to facilitate
prefix matching between a partially generated
statement and all candidate suffixes, as shown
in Algorithm 2. The full decoding is depicted in
Algorithm 3.
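One way such a NextY-style candidate set can be hooked into beam search is through the prefix_allowed_tokens_fn callback of the transformers generate API; the sketch below is an illustrative integration with an unconstrained placeholder candidate function, not the paper's released decoder.

from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def next_candidates(generated_ids):
    # Placeholder for a schema-specific NextY: map the partial output to the
    # set of token ids allowed at the next step (here: no restriction).
    return list(range(len(tokenizer)))

def prefix_allowed_tokens_fn(batch_id, input_ids):
    # transformers calls this at every decoding step with the partial output.
    return next_candidates(input_ids.tolist())

inputs = tokenizer("My friend bought me a gift", return_tensors="pt")
out = model.generate(**inputs, num_beams=4, max_length=64,
                     prefix_allowed_tokens_fn=prefix_allowed_tokens_fn)
print(tokenizer.decode(out[0], skip_special_tokens=True))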
4.2 Named Entity Recognition
LS Similar to POS-LS, the NextY for NER
returns BIEOS tags if i ≤ n else EOS.
LT Opening tags (<>) in NER-LT are grouped
into a vocabulary O. The last generated output
token yi−1 (assuming y0 = BOS, a.k.a. beginning
of a sentence) is looked up in O to decide what
type of token will be generated next. To enforce
label consistency between a pair of tags, a variable
e is introduced to record the expected closing
tag. Reusing the definition of k in Algorithm 3,
decoding of NER-LT is described in Algorithm 4.
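A sketch of the candidate logic just described for NER-LT follows; it paraphrases the idea behind Algorithm 4 rather than reproducing its exact pseudocode, and the tag strings are placeholders.

def ner_lt_next(x, k, e, opening_tags):
    # x: input tokens; k: index of the next input token to copy;
    # e: expected closing tag while inside an entity, otherwise None.
    cands = []
    if k < len(x):
        cands.append(x[k])            # copying the next input token is always an option
        if e is None:
            cands.extend(opening_tags)  # outside an entity: may open a new one
    if e is not None:
        cands.append(e)               # inside an entity: may close it with the matching tag
    if k == len(x) and e is None:
        cands.append("<eos>")
    return cands

# The caller updates the state: emitting an opening tag such as "<B-PER>" sets e to
# its matching closing tag "<E-PER>"; emitting e resets it to None; emitting an
# input token advances k.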
Algorithm 3: Constrained POS-PT
u ← true, k ← 0, o ← 0
Function NextY(x, y<i):
    ...
    if node.children is empty then
        u ← true
        return NextY(x, y<i)
    ...
    if k > n then
        Y ← Y ∪ {EOS}
    else
        Y ← Y ∪ {x_k}
    return Y
PT For each entity type ei, its description di is
filled into the template ‘‘is di;’’ to create an ‘‘is-a’’
suffix si. Since the prompt is constructed using
text while the number of entities is variable, it is
not straightforward to tell whether a token belongs
to an entity or an ‘‘is-a’’ suffix. Therefore, a noisy
segmentation procedure is utilized to split a phrase
into two parts: entity and ‘‘is-a’’ suffix. Each si is
collected into a trie S to perform segmentation of
a partially generated phrase p (Algoritmo 5).
Once a segment is obtained, the decoder is
constrained to generate the entity or the suffix.
For the generation of an entity, string matching
is used to find every occurrence o of its partial
generation in x and add the following token xo+1
to the candidate set.
Algorithm 5: Segmentation
Function Segment(S, p):
    for i ← 1 to |p| do
        entity, suffix ← p≤i, p>i
        node ← PrefixMatch(S, p>i)
        if node then
            return entity, suffix, node
    return null
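The noisy segmentation can be sketched as follows, trying every split point of a generated phrase and keeping the first whose right part matches a stored ‘‘is-a’’ suffix; the suffix strings shown are assumptions for illustration.

def segment(phrase_tokens, suffixes):
    # suffixes: "is-a" suffix token sequences built from the entity-type descriptions.
    # Return the first split whose right part is a prefix of some suffix, else None.
    for i in range(1, len(phrase_tokens)):
        right = phrase_tokens[i:]
        if any(s[:len(right)] == right for s in suffixes):
            return phrase_tokens[:i], right
    return None

suffixes = [["is", "a", "person", ";"], ["is", "an", "art", "work", ";"]]
print(segment(["Michael", "Jackson", "is", "a", "person", ";"], suffixes))
# (['Michael', 'Jackson'], ['is', 'a', 'person', ';'])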
Algorithm 6: Constrained NER-PT
Function NextY(x, y<i):
    ...

Algorithm 8: Split
Function Split(T, x):
    ...
    if i > o then
        spans ← spans ∪ {(o, i, null)}
    spans ← spans ∪ {(i, j, v)}
    if o < |x| + 1 then
        spans ← spans ∪ {(o, |x| + 1, null)}
    return spans

Algorithm 9: Find Target Constituent
Function FindTarget(parent, label):
    while parent do
        foreach sibling of parent do
            if sibling.label is label and sibling has no children then
                return sibling
        parent ← parent.parent
    return null
Algorithm 10: Reverse CON-PT
Function Reverse(T, x):
    root ← parent ← new TOP-tree
    latest ← null
    foreach (i, j, v) ∈ Split(T, x) do
        if v then
            if x_{i:j} starts with ‘‘the’’ then
                target ← FindTarget(parent, v)
            else
                latest ← new v-tree
                add latest to parent.children
                latest.parent ← parent
        else
            if x_{i:j} starts with ‘‘has’’ or ‘‘which has’’ then
                parent ← latest
            add tokens in ‘‘’’ into latest
    return root
4.3 Constituency Parsing

The generated prompt is first split into two types
of sentences: sentences creating new constituents,
and sentences attaching new constituents to
existing ones. Splitting is done by longest-prefix-
matching (Algorithm 7) using a trie T built with
the definite and indefinite article versions of the
description of each constituent label, e.g., ‘‘the
noun phrase’’ and ‘‘a noun phrase’’ of NP.
Algorithm 8 describes the splitting procedure.
Once a prompt is split into two types of sen-
tences, a constituency tree is then built accord-
ingly. We use a variable parent to track the last
constituent that gets attachments, and another var-
iable latest to track the current new constit-
uent that gets created. Due to the top-down nature
of linearization, the target constituent that new
constituents are attached to is always among the
siblings of either parent or the ancestors of
parent. The search of the target constituent is
described in Algorithm 9. Algorithm 10 shows
the final reverse linearization.
4.4 Dependency Parsing
LS Arc-standard (Nivre, 2004) transitions are
added to a candidate set and only transitions per-
mitted by the current parsing state are allowed.
LT DEP-LT replaces all SH transitions with
input tokens in left-to-right order. Therefore, an
incremental offset is kept to generate the next
token in place of each SH in DEP-LT.
PT DEP-PT is more complicated than CON-PT
because each sentence contains one more token.
Its generation is therefore divided into four possible
states: first token (1st), relation (rel), sec-
ond token (2ed), and semicolon.
Algorithm 11: Recall Shift
Function RecallShift(system, i, xj):
    while system.s_i is not xj do
        system.apply(SH)
An arc-standard transition system is executed synchronously with
constrained decoding since PT is essentially a sim-
plified transition sequence with all SH removed.
Let b and s be the system buffer and stack, respec-
tively. Let c be a set of candidate tokens that will
be generated in y, which initially contains all input
tokens and an inserted token ‘‘sentence’’ that is
only used to represent the root in ‘‘the sentence
has a root . . .’’ A token is removed from c once it
gets popped out of s. Since DEP-PT generates no
SH, each input token xj in y effectively introduces
SH(s) till it is pushed onto s at index i (i ∈ {1, 2}),
as formally described in Algorithm 11.
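For reference, a minimal arc-standard system with the recall-shift behavior of Algorithm 11 might look as follows; this is a sketch under our own naming, not the authors' implementation.

class ArcStandard:
    # Minimal arc-standard system (Nivre, 2004): a stack, a buffer, and the arcs built so far.
    def __init__(self, tokens):
        self.stack, self.buffer, self.arcs = [], list(tokens), []

    def apply(self, action, relation=None):
        if action == "SH":                  # shift: move the next buffer token onto the stack
            self.stack.append(self.buffer.pop(0))
        elif action == "LA":                # left arc: top is head of second; pop second
            dep = self.stack.pop(-2)
            self.arcs.append((self.stack[-1], relation, dep))
        elif action == "RA":                # right arc: second is head of top; pop top
            dep = self.stack.pop(-1)
            self.arcs.append((self.stack[-1], relation, dep))

    def recall_shift(self, i, token):
        # Keep shifting until `token` sits at stack position i from the top (cf. Algorithm 11).
        while self.buffer and (len(self.stack) < i or self.stack[-i] != token):
            self.apply("SH")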
After the first token is generated, its offset o
in y is recorded such that the following relation
sequence yi>o can be located. To decide the next
token of yi>o, it is then prefix-matched with a
trie T built with the set of ‘‘has-’’ and ‘‘is-’’
dependency relations. The children of the prefix-
matched node are considered candidates if it
has any. Otherwise, the dependency relation is
marked as completed. Once a relation is gen-
erated, the second token will be generated in a
similar way. Finally, upon the completion of a
sentence, the transition it describes is applied to
the system and c is updated accordingly. The full
procedure is described in Algorithm 12. Since a
transition system has been synchronously main-
tained with constrained decoding, no extra re-
verse linearization is needed.
5 Experiments
For all tasks, BART-Large (Lewis et al., 2020)
is finetuned as our underlying S2S model. We
also tried T5 (Raffel et al., 2020), although its
performance was less satisfactory. Every model
is trained three times using different random seeds
and their average scores and standard deviations
on the test sets are reported. Our models are ex-
perimented on the OntoNotes 5 (Weischedel et al.,
2013) using the data split suggested by Pradhan
et al. (2013). In addition, two other popular data-
sets are used for fair comparisons to previous
works: the Wall Street Journal corpus from the
Algorithm 12: Constrained DEP-PT
(status, transition, t1, t2, o) ← (1st, null, null, null, 0)
c ← {sentence} ∪ x
Function NextY(x, y<i):
    ...
    if node.children then
        Y ← Y ∪ {node.children}
    else
        relation ← the relation in y>o
        if y>o starts with ‘‘is’’ then
            transition ← LA-relation
        else
            transition ← RA-relation
        status ← 2ed
    else if status is 2ed then
        Y ← Y ∪ c
        status ← semicolon
    else if status is semicolon then
        t2 ← y_{i−1}
        Y ← Y ∪ {;}
        RecallShift(system, 1, t2)
        RecallShift(system, 2, t1)
        if transition starts with LA then
            remove s1 from c
        else
            remove s2 from c
        system.apply(transition)
        if system is terminal then
            Y ← Y ∪ {EOS}
        status ← 1st
    return Y
Penn Treebank 3 (Marcus et al., 1993) for POS,
DEP, and CON, as well as the English portion
of the CoNLL’03 dataset (Tjong Kim Sang and
De Meulder, 2003) for NER.
Model                    PTB               OntoNotes
Bohnet et al. (2018)     97.96             –
He and Choi (2021b)      –                 98.32 ± 0.02
LS                       97.51 ± 0.11      98.21 ± 0.02
LT                       97.70 ± 0.02      98.40 ± 0.01
PT                       97.64 ± 0.01      98.37 ± 0.02

Table 2: Results for POS.
Each token is independently tokenized using
the subword tokenizer of BART and merged into
an input sequence. The boundary information for
each token is recorded to ensure full tokens are
generated in LT and PT without broken pieces.
To fit in the positional embeddings of BART, sen-
tences longer than 1,024 subwords are discarded,
which include 1 sentence from the Penn Tree-
bank 3 training set, and 24 sentences from the
OntoNotes 5 training set. Development sets and
test sets are not affected.
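One way to record such token boundaries is via the word_ids() mapping of the fast BART tokenizer in the transformers library; the sketch below illustrates the bookkeeping and is not necessarily the exact preprocessing used here.

from transformers import BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
words = ["My", "friend", "bought", "me", "a", "gift"]

# Tokenize pre-split words and record which subwords belong to which word, so that
# decoding can later be constrained to emit whole tokens rather than broken pieces.
enc = tokenizer(words, is_split_into_words=True, add_special_tokens=True)
boundaries = {}
for pos, wid in enumerate(enc.word_ids()):
    if wid is not None:
        boundaries.setdefault(wid, []).append(enc["input_ids"][pos])

for wid, piece_ids in boundaries.items():
    print(words[wid], tokenizer.convert_ids_to_tokens(piece_ids))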
5.1 Part-of-Speech Tagging
Token level accuracy is used as the metric for
POS. LT outperforms LS although LT is twice as
long as LS, suggesting that textual tokens posi-
tively impact the learning of the decoder (Table 2).
PT performs almost the same as LT, perhaps
due to the fact that POS is not a task requiring a
powerful decoder.
5.2 Named Entity Recognition
For CoNLL’03, the provided splits without merg-
ing the development and training sets are used.
For OntoNotes 5, the same splits as Chiu and
Nichols (2016); Li et al. (2017); Ghaddar and
Langlais (2018); He and Choi (2020, 2021b) are
used. Labeled span-level F1 score is used for
evaluation.
We acknowledge that the performance of NER
systems can be largely improved by rich embed-
dings (Wang et al., 2021), document context fea-
tures (Yu et al., 2020), dependency tree features
(Xu et al., 2021), and other external resources.
While our focus is the potential of S2S, we mainly
consider two strong baselines that also use BART
as the only external resource: the generative
BART-Pointer framework (Yan et al., 2021) and
the recent template-based BART NER (Cui et al.,
2021).
As shown in Table 3, LS performs the worst
on both datasets, possibly attributed to the fact
Model                       CoNLL'03          OntoNotes 5
Clark et al. (2018)         92.60             –
Peters et al. (2018)        92.22             –
Akbik et al. (2019)         93.18             –
Straková et al. (2019)      93.07             –
Yamada et al. (2020)        92.40             –
Yu et al. (2020)†           92.50             89.83
Yan et al. (2021)‡S         93.24             90.38
Cui et al. (2021)S          92.55             –
He and Choi (2021b)         –                 89.04 ± 0.14
Wang et al. (2021)          94.6              –
Zhu and Li (2022)           –                 91.74
Ye et al. (2022)            –                 91.9
LS                          70.29 ± 0.70      84.61 ± 1.18
LT                          92.75 ± 0.03      89.60 ± 0.06
PT                          93.18 ± 0.04      90.33 ± 0.04

Table 3: Results for NER. S denotes S2S.
that the autoregressive decoder overfits the high-
order left-to-right dependencies of BIEOS tags.
LT performs close to the BERT-Large biaffine
model (Yu et al., 2020). PT performs compa-
rably well with the Pointer Networks approach
(Yan et al., 2021) and it outperforms the template
prompting (Cui et al., 2021) by a large margin,
suggesting S2S has the potential to learn struc-
tures without using external modules.
5.3 Constituency Parsing
All POS tags are removed and not used in train-
ing or evaluation. Terminals belonging to the
same non-terminal are flattened into one con-
stituent before training and unflattened in post-
processing. The standard constituent-level F-score
produced by the EVALB3 is used as the evalua-
tion metric.
Table 4 shows the results on OntoNotes 5 and
PTB 3. Incorporating textual tokens into the out-
put sequence is important on OntoNotes 5, lead-
ing to a +0.9 F-score, while it is not the case on
PTB 3. It is possibly due to the fact that Onto-
Notes is more diverse in domains, requiring a
higher utilization of pre-trained S2S for domain
transfer. PT performs the best, and it has a com-
petitive performance to recent works, despite the
fact that it uses no extra decoders.
3https://nlp.cs.nyu.edu/evalb/.
Model                                             PTB 3             OntoNotes 5
Fernández-González and Gómez-Rodríguez (2020)S    91.6              –
Mrini et al. (2020)                               96.38             –
He and Choi (2021b)                               –                 94.43 ± 0.03
Yang and Tu (2022)S                               96.01             –
LS                                                95.23 ± 0.08      93.40 ± 0.31
LT                                                95.24 ± 0.04      94.32 ± 0.11
PT                                                95.34 ± 0.06      94.55 ± 0.03

Table 4: Results for CON. S denotes S2S.
Model                         UAS               LAS
Wiseman and Rush (2016)S      91.17             87.41
Zhang et al. (2017)S          93.71             91.60
Li et al. (2018)S             94.11             92.08
Mrini et al. (2020)           97.42             96.26
LS                            92.83 ± 0.43      90.50 ± 0.53
LT                            95.79 ± 0.07      93.17 ± 0.16
PT                            95.91 ± 0.06      94.31 ± 0.09

(a) PTB results for DEP.

Model                         UAS               LAS
He and Choi (2021b)           95.92 ± 0.02      94.24 ± 0.03
LS                            86.54 ± 0.12      83.84 ± 0.13
LT                            94.15 ± 0.14      91.27 ± 0.19
PT                            94.51 ± 0.22      92.81 ± 0.21

(b) OntoNotes results for DEP.

Table 5: Results for DEP. S denotes S2S.
Model       PTB               OntoNotes
LS          97.51 ± 0.11      98.21 ± 0.02
  w/o CD    97.51 ± 0.11      98.21 ± 0.02
LT          97.70 ± 0.02      98.40 ± 0.01
  w/o CD    97.67 ± 0.02      98.39 ± 0.01
PT          97.64 ± 0.01      98.37 ± 0.02
  w/o CD    97.55 ± 0.02      98.29 ± 0.05

(a) Accuracy of ablation tests for POS.

Model       CoNLL'03          OntoNotes 5
LS          70.29 ± 0.70      84.61 ± 1.18
  w/o CD    66.33 ± 0.73      84.57 ± 1.16
LT          92.75 ± 0.03      89.60 ± 0.06
  w/o CD    92.72 ± 0.02      89.50 ± 0.07
PT          93.18 ± 0.04      90.33 ± 0.04
  w/o CD    93.12 ± 0.06      90.23 ± 0.05

(b) F1 of ablation tests for NER.

Model       PTB               OntoNotes
LS          90.50 ± 0.53      83.84 ± 0.13
  w/o CD    90.45 ± 0.47      83.78 ± 0.13
LT          93.17 ± 0.16      91.27 ± 0.19
  w/o CD    93.12 ± 0.14      91.05 ± 0.20
PT          94.31 ± 0.09      92.81 ± 0.21
  w/o CD    81.50 ± 0.27      81.76 ± 0.36

(c) LAS of ablation tests for DEP.

Table 6: Ablation test results.
5.4 Dependency Parsing
The constituency trees from PTB and OntoNotes
are converted into the Stanford dependencies
v3.3.0 (De Marneffe and Manning, 2008) for DEP
experiments. Forty and one non-projective trees are
removed from the training and development sets
of PTB 3, respectively. For OntoNotes 5, these
numbers are 262 and 28. Test sets are not affected.
As shown in Table 5, textual tokens are cru-
cial in learning arc-standard transitions using
S2S, leading to +2.6 and +7.4 LAS improve-
ments, respectively. Although our PT method
underperforms recent state-of-the-art methods, it
has the strongest performance among all S2S
approaches. Interestingly, our S2S model man-
ages to learn a transition system without explic-
itly modeling the stack, the buffer, the partial
parse, or pointers.
We believe that the performance of DEP with
S2S can be further improved with a larger and
more recent pretrained S2S model and dynamic
oracle (Goldberg and Nivre, 2012).
6 Analysis
6.1 Ablation Study
We perform an ablation study to show the perfor-
mance gain of our proposed constrained decoding
algorithms on different tasks. Constrained decod-
ing algorithms (CD) are compared against free
generación (w/o CD) where a model freely gener-
ates an output sequence that is later post-processed
into task-specific structures using string-matching
normas. Invalid outputs are patched to the greatest
extent, p.ej., POS label sequences are padded or
truncated. As shown in Table 6, ablation of con-
strained decoding seldom impacts the performance
of LS on all tasks, suggesting that the decoder of
seq2seq can acclimatize to the newly added label
tokens. Interestingly, the less performant NER-LS
model degrades the most, promoting the neces-
sity of constrained decoding for weaker seq2seq
models. The performance of LT on all tasks is
marginally degraded when constrained decoding
is ablated, indicating the decoder begins to gen-
erate structurally invalid outputs when textual
tokens are freely generated. This type of problem
seems to be exacerbated when more tokens are
freely generated in the PT schemas, especially for
the DEP-PT.
Unlike POS and NER, DEP is more prone
to hallucinated textual tokens as early errors in
the transition sequence get accumulated in the
arc-standard system which shifts all later predic-
tions off the track. It is not yet a critical problem
as LS generates no textual tokens while a textual
token in LT still serves as a valid shift action
even if it is hallucinated. However, a hallucinated
textual token in PT is catastrophic as it could be
part of any arc-standard transitions. As no explicit
shift transition is designed, a hallucinated token
could lead to multiple instances of missing shifts
in Algorithm 12.
6.2 Case Study
To facilitate understanding and comparison of
different models, a concrete example of input
(I), gold annotation (G), and actual model pre-
diction per each schema is provided below for
each task. Wrong predictions and corresponding
ground truth are highlighted in red and teal,
respectively.
POS In the following example, only PT correctly
detects the past tense (VBD) of ‘‘put’’.
I: The word I put in boldface is extremely interesting.
G: DT NN PRP VBD IN NN VBZ RB JJ .
LS: DT NN PRP VBP IN NN VBZ RB RB JJ
LT: The/DT word/NN I/PRP put/VBP in/IN
boldface/NN is/VBZ extr./RB interesting/JJ./.
PT: ‘‘The’’ is a determiner; ‘‘word’’ is a singular noun;
‘‘I’’ is a personal pronoun; ‘‘put’’ is a past tense verb;
‘‘in’’ is a preposition or subordinating conjunction;
‘‘boldface’’ is a singular noun; ‘‘is’’ is a 3rd person
singular present verb; ‘‘extremely’’ is an adverb;
‘‘interesting’’ is an adjective; ‘‘.’’ is a period.
NER In the following example, LS and LT
could not correctly recognize ‘‘HIStory’’ as an
art work, possibly due to its leading uppercase
letters.
I: Large image of the Michael Jackson HIStory statue.
G: Large image of the [Michael Jackson]PERSON(PER) [HIStory]WOA statue.
LS: O O O O B-PER E-PER S-ORG O O O
LT: Large image of the … HIStory statue.
PT: ‘‘Michael Jackson’’ is a person;
‘‘HIStory’’ is an art work.
CON As highlighted with strikeout
text be-
low, LS and LT failed to parse ‘‘how much’’
as a wh-noun phrase and a wh-adverb phrase,
respectivamente.
I: It’s crazy how much he eats.
G: (S (NP (NP It)) (VP ’s (ADJP crazy) (SBAR
(WHNP (WHADJP how much)) (S (NP he)
(VP eats)))) .)
LS: N-S N-NP N-NP SH RE RE N-VP SH N-ADJP SH
RE N-SBAR N-WHNP N-WHADVP SH SH RE RE
N-S N-NP SH RE N-VP SH RE RE RE RE SH RE
LT: (S (NP (NP It)) (VP ’s (ADJP crazy) (SBAR
(WHNP (WHADJP how much)) (S (NP he)
(VP eats)))) .)
PT: a sentence has a simple clause, which has a noun
phrase and a verb phrase and ‘‘.’’; the noun phrase
has a noun phrase ‘‘It’’, the verb phrase has ‘‘’s’’
and an adjective phrase ‘‘crazy’’ and a subordinating
clause, which has a wh-noun phrase and a simple
clause; the wh-noun phrase has a wh-adjective
phrase ‘‘how much’’, the simple clause has a noun
phrase ‘‘he’’ and a verb phrase ‘‘eats’’.
DEP In the following example, LS incorrectly
attached ‘‘so out of’’ to ‘‘place’’, and LT wrongly
attached ‘‘so’’ to ‘‘looks’’.
I: It looks so out of place.
G:
LS: SH SH LA-nsubj SH SH SH SH LA-advmod
LA-advmod LA-advmod RA-acomp SH
RA-punct RA-root
LT: It looks LA-nsubj so out of place RA-pobj
RA-pcomp RA-prep RA-ccomp . RA-punct
RA-root
PT: ‘‘It’’ is a nominal subject of ‘‘looks’’; ‘‘so’’ is an
adverbial modifier of ‘‘out’’; ‘‘of’’ has an object of
a preposition ‘‘place’’; ‘‘out’’ has a prepositional
complement ‘‘of’’; ‘‘looks’’ has a prepositional
modifier ‘‘out’’; ‘‘looks’’ has a punctuation ‘‘.’’;
‘‘sentence’’ has a root ‘‘looks’’.
6.3 Design Choices
In the interest of experimentally comparing the
schema variants, we would like each design we
Model       PTB 3             OntoNotes 5
POS-PT      97.64 ± 0.01      98.37 ± 0.02
  dec.LEX   97.63 ± 0.02      98.35 ± 0.03
DEP-PT      94.31 ± 0.09      92.81 ± 0.21
  dec.LEX   93.89 ± 0.18      91.19 ± 0.86

Table 7: Study of lexicality on POS and DEP.
Model       CoNLL'03          OntoNotes 5
NER-PT      93.18 ± 0.04      90.33 ± 0.04
  inc.VRB   92.47 ± 0.03      89.63 ± 0.23

Model       PTB 3             OntoNotes 5
CON-PT      95.34 ± 0.06      94.55 ± 0.03
  inc.VRB   95.19 ± 0.06      94.02 ± 0.49

Table 8: Study of verbosity on NER and CON.
consider to be equivalent in some systematic way.
To that end, we fix other aspects and vary
two dimensions of the prompt design, lexicality,
and verbosity, to isolate the impact of individual
variables.
Lexicality We call the portion of textual tok-
ens in a sequence its lexicality. Thus, LS and PT
have zero and full lexicality, respectively, while
LT falls in the middle. To tease apart the impact
of lexicality, we substitute the lexical phrases
with corresponding tag abbreviations in PT on
POS and DEP, e.g., ‘‘friend’’ is a noun →
‘‘friend’’ is a NN, ‘‘friend’’ is a nominal sub-
ject of ‘‘bought’’ → ‘‘friend’’ is a nsubj of
‘‘bought’’. Tags are added to the BART vocabu-
lary and learned from scratch as LS and LT. As
shown in Table 7, decreasing the lexicality of PT
marginally degrades the performance of S2S on
POS. On DEP, the performance drop is rather sig-
nificant. Similar trends are observed comparing
LT and LS in Section 5, confirming that lexicons
play an important role in prompt design.
Verbosity Our PT schemas on NER and CON
are designed to be as concise as human narrative,
and as easy for S2S to generate. Another design
choice would be as verbose as some LS and LT
schemas. To explore this dimension, we increase
the verbosity of NER-PT and CON-PT by adding
‘‘isn’t an entity’’ for all non-entity tokens and
substituting each ‘‘which’’ to its actual referred
phrase, respectively. The results are presented in
Table 8. Though increased verbosity would elim-
inate any ambiguity, unfortunately, it hurts per-
formance. Emphasizing a token ‘‘isn’t an entity’’
might encounter the over-confidence issue as the
boundary annotation might be ambiguous in gold
NER data (Zhu and Li, 2022). CON-PT deviates
from human language style when reference is
forbidden, which eventually makes it lengthy and
hard to learn.
6.4 Stratified Analysis
Section 5 shows that our S2S approach performs
comparably to most ad-hoc models. To reveal its
pros and cons, we further partition the test data
using task-specific factors and run tests on them.
The stratified performance on OntoNotes 5 is com-
pared to the strong BERT baseline (He and Choi,
2021b), which is representative of non-S2S models
implementing many state-of-the-art decoders.
For POS, we consider the rate of Out-Of-
Vocabulary tokens (OOV, tokens unseen in the
training set) in a sentence as the most significant
factor. As illustrated in Figure 1a, the OOV rate
degrades the baseline performance rapidly, es-
pecially when over half tokens in a sentence are
OOV. However, all S2S approaches show strong
resistance to OOV, suggesting that our S2S mod-
els unleash greater potential through transfer
learning.
For NER, entities unseen during training often
confuse a model. This negative impact can be
observed on the baseline and LS in Figure 1b.
However, the other two schemas generating tex-
tual tokens, LT and PT, are less severely impacted
by unseen entities. It further supports the intuition
behind our approach and agrees with the finding
by Shin et al. (2021): With the output sequence
being closer to natural language, the S2S model
has less difficulty generating it even with unseen
entities.
Since the number of binary parses for a sen-
tence of n + 1 tokens is the nth Catalan Number
(Church and Patil, 1982), the length is a crucial
factor for CON. As shown in Figure 1c, all models,
especially LS, perform worse when the sentence
gets longer. Interestingly, by simply recalling all
the lexicons, LT easily regains the ability to parse
long sentences. Using an even more natural rep-
resentation, PT outperforms them with a perfor-
mance on par with the strong baseline. It again
Figure 1: Factors impacting each task: the rate of OOV tokens for POS, the rate of unseen entities for NER, the
sentence length for CON, and the head-dependent distance for DEP.
supports our intuition that natural language is
beneficial for pretrained S2S.
For DEP, the distance between each depen-
dent and its head is used to factorize the overall
performance. As shown in Figure 1d, the gap
between S2S models and the baseline increases
with head-dependent distance. The degeneration
of relatively longer arc-standard transition se-
quences could be attributed to the static oracle
used in finetuning.
Comparing the three schemas across all sub-
groups, LS uses the most special tokens but
performs the worst, while PT uses zero special
tokens and outperforms the other two. It suggests
that special tokens could harm the performance
of the pretrained S2S model as they introduce
a mismatch between pretraining and finetuning.
With zero special tokens, PT is most similar to
natural language, and it also introduces no ex-
tra parameters in finetuning, leading to better
performance.
7 Conclusion
We aim to unleash the true potential of S2S
models for sequence tagging and structure parsing.
To this end, we develop S2S methods that rival
state-of-the-art approaches more complicated than
ours, without substantial task-specific architecture
modifications. Our experiments with three novel
prompting schemas on four core NLP tasks dem-
onstrated the effectiveness of natural language
in S2S outputs. Our systematic analysis revealed
the pros and cons of S2S models, appealing for
more exploration of structure prediction with S2S.
Our proposed S2S approach reduces the need
for many heavily engineered task-specific archi-
tectures. It can be readily extended to multi-task
and few-shot learning. We have a vision of S2S
playing an integral role in more language under-
standing and generation systems. The limitation
of our approach is its relatively slow decoding
speed due to serial generation. This issue can be
mitigated with non-autoregressive generation and
model compression techniques in the future.
Acknowledgments
We would like to thank Emily Pitler, Cindy
Robinson, Ani Nenkova, and the anonymous
TACL reviewers for their insightful and thoughtful
feedback on the early drafts of this paper.
References

Alan Akbik, Tanja Bergmann, and Roland Vollgraf. 2019. Pooled contextualized embeddings for named entity recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 724–728, Minneapolis, Minnesota. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1078

Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual String Embeddings for Sequence Labeling. In Proceedings of the 27th International Conference on Computational Linguistics, COLING'18, pages 1638–1649.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings.

Xuefeng Bai, Yulong Chen, and Yue Zhang. 2022. Graph pre-training for AMR parsing
and generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6001–6015, Dublin, Ireland. Association for Computational Linguistics.

Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1415–1425, Baltimore, Maryland. Association for Computational Linguistics. https://doi.org/10.3115/v1/P14-1133

Michele Bevilacqua, Rexhina Blloshmi, and Roberto Navigli. 2021. One SPRING to rule them both: Symmetric AMR semantic parsing and generation without a complex pipeline. In Proceedings of AAAI. https://doi.org/10.1609/aaai.v35i14.17489

Bernd Bohnet, Ryan McDonald, Gonçalo Simões, Daniel Andor, Emily Pitler, and Joshua Maynez. 2018. Morphosyntactic tagging with a Meta-BiLSTM model over context sensitive token encodings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL'18, pages 2642–2652. https://doi.org/10.18653/v1/P18-1246

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.

Jiawei Chen, Qing Liu, Hongyu Lin, Xianpei Han, and Le Sun. 2022. Few-shot named entity recognition with self-describing networks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5711–5722, Dublin, Ireland. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.392

Lingzhen Chen and Alessandro Moschitti. 2018. Learning to progressively recognize new named entities with sequence to sequence models. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2181–2191, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Jason Chiu and Eric Nichols. 2016. Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4:357–370. https://doi.org/10.1162/tacl_a_00104

Chinmay Choudhary and Colm O'riordan. 2021. End-to-end mBERT based seq2seq enhanced dependency parser with linguistic typology knowledge. In Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021), pages 225–232, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.iwpt-1.24

Kenneth Church and Ramesh Patil. 1982. Coping with syntactic ambiguity or how to put the block in the box on the table. American Journal of Computational Linguistics, 8(3–4):139–149.

Kevin Clark, Minh-Thang Luong, Christopher D. Manning, and Quoc Le. 2018. Semi-supervised sequence modeling with cross-view training. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1914–1925, Brussels, Belgium. Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1217

Leyang Cui, Yu Wu, Jian Liu, Sen Yang, and Yue Zhang. 2021. Template-based named entity recognition using BART. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1835–1845, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-acl.161
Marie-Catherine De Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In COLING 2008: Proceedings of the Workshop on Cross-framework and Cross-domain Parser Evaluation, pages 1–8. https://doi.org/10.3115/1608858.1608859

Daniel Deutsch, Shyam Upadhyay, and Dan Roth. 2019. A general-purpose algorithm for constrained sequential inference. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 482–492, Hong Kong, China. Association for Computational Linguistics. https://doi.org/10.18653/v1/K19-1045

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Daniel Fernández-González and Carlos Gómez-Rodríguez. 2020. Enriched in-order linearization for faster sequence-to-sequence constituent parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4092–4099, Online. Association for Computational Linguistics.

Abbas Ghaddar and Phillippe Langlais. 2018. Robust lexical features for improved neural network named-entity recognition. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1896–1907, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Yoav Goldberg and Joakim Nivre. 2012. A dynamic oracle for arc-eager dependency parsing. In Proceedings of COLING 2012, pages 959–976, Mumbai, India. The COLING 2012 Organizing Committee.

Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S. Liang. 2018. A retrieve-and-edit framework for predicting structured outputs. Advances in Neural Information Processing Systems, 31.

Han He and Jinho Choi. 2020. Establishing strong baselines for the new decade: Sequence tagging, syntactic and semantic parsing with BERT. In The Thirty-Third International Flairs Conference.

Han He and Jinho D. Choi. 2021a. Levi graph AMR parser using heterogeneous attention. In Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021), pages 50–57, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.iwpt-1.5

Han He and Jinho D. Choi. 2021b. The stem cell hypothesis: Dilemma behind multi-task learning with transformer encoders. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5555–5577, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.451

Chris Hokamp and Qun Liu. 2017. Lexically constrained decoding for sequence generation using grid beam search. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, Vancouver, Canada. Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-1141

Robin Jia and Percy Liang. 2016. Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12–22, Berlin, Germany. Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-1002

John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, pages 282–289.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proceedings of the 2016
the North American Chap-
Conference of
the Association for Computational
ter of
Lingüística: Tecnologías del lenguaje humano,
pages 260–270, San Diego, California. Associ-
ation for Computational Linguistics. https://
doi.org/10.18653/v1/N16-1030
mike lewis, Yinhan Liu, Naman Goyal,
Marjan Ghazvininejad, Abdelrahman Mohamed,
Omer Levy, Veselin Stoyanov, y lucas
Zettlemoyer. 2020. BART: Denoising sequence-
to-sequence pre-training for natural language
generación, traducción, and comprehension. En
Actas de la 58ª Reunión Anual de
la Asociación de Lingüística Computacional,
pages 7871–7880, En línea. Asociación para
Ligüística computacional. https://doi.org
/10.18653/v1/2020.acl-main.703
Peng-Hsuan Li, Ruo-Ping Dong, Yu-Siang Wang,
Ju-Chieh Chou, and Wei-Yun Ma. 2017. Lev-
eraging linguistic structures for named en-
tity recognition with bidirectional recursive
neural networks. En Actas de la 2017
Jornada sobre Métodos Empíricos en Natu-
Procesamiento del lenguaje oral, pages 2664–2669,
Copenhague, Dinamarca. Asociación para Com-
Lingüística putacional.
Zuchao Li, Jiaxun Cai, Shexia He, and Hai Zhao.
2018. Seq2seq dependency parsing. En curso-
ings of the 27th International Conference on
Ligüística computacional, pages 3203–3214,
Santa Fe, New Mexico, EE.UU. Asociación para
Ligüística computacional.
Jiangming Liu and Yue Zhang. 2017. In-order
transition-based constituent parsing. Transac-
tions of
the Association for Computational
Lingüística, 5:413–424. https://doi.org
/10.1162/tacl_a_00070
Chunpeng Ma, Lemao Liu, Akihiro Tamura,
Tiejun Zhao, and Eiichiro Sumita. 2017. De-
terministic attention for sequence-to-sequence
constituent parsing. In Thirty-First AAAI Con-
ference on Artificial Intelligence. https://
doi.org/10.1609/aaai.v31i1.10967
Mitchell P. marco, Mary Ann Marcinkiewicz,
and Beatrice Santorini. 1993. Building a
Large Annotated Corpus of English: El
Penn Treebank. Ligüística computacional,
19(2):313–330. https://doi.org/10.21236
/ADA273556
Khalil Mrini, Franck Dernoncourt, Quan Hung
Tran, Trung Bui, Walter Chang, and Ndapa
Nakashole. 2020. Rethinking self-attention:
Towards interpretability in neural parsing. En
Hallazgos de la Asociación de Computación
Lingüística: EMNLP 2020, pages 731–742,
En línea. Association for Computational Linguis-
tics. https://doi.org/10.18653/v1
/2020.findings-emnlp.65
Joakim Nivré. 2004. Incrementality in deter-
ministic dependency parsing. En procedimientos
de
the Workshop on Incremental Parsing:
Bringing Engineering and Cognition Together,
pages 50–57, Barcelona, España. Asociación para
Ligüística computacional. https://doi
.org/10.3115/1613148.1613156
Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, and Stefano Soatto. 2021. Structured prediction as translation between augmented natural languages. In International Conference on Learning Representations.
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202
Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Björkelund, Olga Uryupina, Yuchen Zhang, and Zhi Zhong. 2013. Towards Robust Linguistic Analysis using OntoNotes. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 143–152, Sofia, Bulgaria. Association for Computational Linguistics.
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael
Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
Kenji Sagae and Alon Lavie. 2005. A classifier-based parser with linear run-time complexity. In Proceedings of the Ninth International Workshop on Parsing Technology, pages 125–132, Vancouver, British Columbia. Association for Computational Linguistics. https://doi.org/10.3115/1654494.1654507
Richard Shin, Christopher Lin, Sam Thomson, Charles Chen, Subhro Roy, Emmanouil Antonios Platanios, Adam Pauls, Dan Klein, Jason Eisner, and Benjamin Van Durme. 2021. Constrained language models yield few-shot semantic parsers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7699–7715, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.608
Jana Straková, Milan Straka, and Jan Hajic. 2019. Neural architectures for nested NER through linearization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5326–5331, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1527
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.
2014. Sequence to sequence learning with
neural networks. Advances in Neural Informa-
tion Processing Systems, 27.
Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 142–147. https://doi.org/10.3115/1119176.1119195
Bo-Hsiang Tseng, Jianpeng Cheng, Yimai Fang, and David Vandyke. 2020. A generative model for joint natural language understanding and generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1795–1807, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.163
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015a. Pointer networks. In Advances in Neural Information Processing Systems, volume 28.
Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015b. Grammar as a foreign language. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc.
Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, and Kewei Tu. 2021. Automated concatenation of embeddings for structured prediction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2643–2660, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.206
Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, and Ann Houston. 2013. OntoNotes Release 5.0 LDC2013T19. Linguistic Data Consortium, Philadelphia, PA.
Sam Wiseman and Alexander M. Rush. 2016. Sequence-to-sequence learning as beam-search optimization. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1296–1306, Austin, Texas. Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1137
Lu Xu, Zhanming Jie, Wei Lu, and Lidong Bing. 2021. Better feature integration for named entity recognition. In Proceedings of the 2021 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies, pages 3457–3469, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.271
Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yuji Matsumoto. 2020. LUKE: Deep contextualized entity representations with entity-aware self-attention. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6442–6454, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.523
Hang Yan, Tao Gui, Junqi Dai, Qipeng Guo, Zheng Zhang, and Xipeng Qiu. 2021. A unified generative framework for various NER subtasks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5808–5822, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.451
Songlin Yang and Kewei Tu. 2022. Bottom-up constituency parsing and nested named entity recognition with pointer networks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2403–2416, Dublin, Ireland. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.171
Deming Ye, Yankai Lin, Peng Li, and Maosong Sun. 2022. Packed levitated marker for entity and relation extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4904–4917, Dublin, Ireland. Association for Computational Linguistics.
Juntao Yu, Bernd Bohnet, and Massimo Poesio. 2020. Named entity recognition as dependency parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6470–6476, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.577
Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, and Enhong Chen. 2017. Stack-based multi-layer attention for transition-based dependency parsing. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1677–1682, Copenhagen, Denmark. Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1175
Enwei Zhu and Jinpeng Li. 2022. Boundary smoothing for named entity recognition. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7096–7108, Dublin, Ireland. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.490
Huiming Zhu, Chunhui He, Yang Fang, and Weidong Xiao. 2020. Fine grained named entity recognition via seq2seq framework. IEEE Access, 8:53953–53961. https://doi.org/10.1109/ACCESS.2020.2980431
Muhua Zhu, Yue Zhang, Wenliang Chen, Min Zhang, and Jingbo Zhu. 2013. Fast and accurate shift-reduce constituent parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 434–443, Sofia, Bulgaria. Association for Computational Linguistics.