Unleashing the True Potential of Sequence-to-Sequence Models
for Sequence Tagging and Structure Parsing

Han He
Department of Computer Science
Emory University
Atlanta, GA 30322 USA
han.he@emory.edu

Jinho D. Choi
Department of Computer Science
Emory University
Atlanta, GA 30322 USA
jinho.choi@emory.edu

Abstract

Sequence-to-Sequence (S2S) models have achieved remarkable success
on various text generation tasks. However, learning complex
structures with S2S models remains challenging as external neural
modules and additional lexicons are often supplemented to predict
non-textual outputs. We present a systematic study of S2S modeling
using constrained decoding on four core tasks: part-of-speech
tagging, named entity recognition, constituency parsing, and
dependency parsing, to develop efficient exploitation methods
costing zero extra parameters. In particular, 3 lexically diverse
linearization schemas and corresponding constrained decoding
methods are designed and evaluated. Experiments show that although
more lexicalized schemas yield longer output sequences that require
heavier training, their sequences being closer to natural language
makes them easier to learn. Moreover, S2S models using our
constrained decoding outperform other S2S approaches using external
resources. Our best models perform better than or comparably to the
state-of-the-art for all 4 tasks, showing the promise of S2S models
for generating non-sequential structures.

1 Introduction

Sequence-to-Sequence (S2S) models pretrained
for language modeling (PLM) and denoising ob-
jectives have been successful on a wide range of
NLP tasks where both inputs and outputs are se-
quences (Radford et al., 2019; Raffel et al., 2020;
Lewis et al., 2020; Brown et al., 2020). However,
for non-sequential outputs like trees and graphs, a
procedure called linearization is often required to
flatten them into ordinary sequences (Li et al.,
2018; Fernández-González and Gómez-Rodríguez,
2020; Yan et al., 2021; Bevilacqua et al., 2021; He
and Choi, 2021a), where labels in non-sequential
structures are mapped heuristically as individual
tokens in sequences, and numerical properties like
indices are either predicted using an external decoder
such as Pointer Networks (Vinyals et al.,
2015a) or cast to additional tokens in the vocabulary.
While these methods are found to be effective,
we hypothesize that S2S models can learn
complex structures without adopting such patches.
To challenge the limit of S2S modeling, BART
(Lewis et al., 2020) is finetuned on four tasks with-
out extra decoders: part-of-speech tagging (POS),
named entity recognition (NER), constituency pars-
ing (CON), and dependency parsing (DEP). Three
novel linearization schemas are introduced for
each task: label sequence (LS), label with text
(LT), and prompt (PT). From LS to PT, the schemas
feature an increasing number of lexical tokens and a
decreasing number of labels that are not in the vocabulary
(Section 3). Every schema is equipped with a
constrained decoding algorithm searching over
valid sequences (Section 4).

Our experiments on three popular datasets show
that S2S models can learn these linguistic
structures without external resources such as index
tokens or Pointer Networks. Our best models
perform on par with or better than the other
state-of-the-art models for all four tasks (Section 5).
Finally, a detailed analysis is provided to compare
the distinctive natures of our proposed schemas
(Section 6).¹

2 Related Work

S2S (Sutskever et al., 2014) architectures have
been effective on many sequential modeling tasks.
Conventionally, S2S is implemented as an encoder
and decoder pair, where the encoder learns
input representations used to generate the output
sequence via the decoder. Since the input sequence
can be very long, attention mechanisms (Bahdanau
et al., 2015; Vaswani et al., 2017) focusing on
particular positions are often added to the basic
architecture. With transfer learning, S2S models
pretrained on large unlabeled corpora have given rise
to a diversity of new approaches that convert
language problems into a text-to-text format (Akbik
et al., 2018; Lewis et al., 2020; Radford et al.,
2019; Raffel et al., 2020; Brown et al., 2020).
Among them, the tasks most related to our work are
linguistic structure predictions using S2S: POS,
NER, DEP, and CON.

¹ All our resources including source codes are publicly
available: https://github.com/emorynlp/seq2seq-corenlp.

Transactions of the Association for Computational Linguistics, vol. 11, pp. 582–599, 2023. https://doi.org/10.1162/tacl_a_00557
Action Editor: Emily Pitler. Submission batch: 11/2022; Revision batch: 1/2023; Published 6/2023.
© 2023 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

POS has been commonly tackled as a sequence
tagging task, where the input and output sequences
have equal lengths. S2S, on the other hand, does
not enjoy such constraints as the output sequence
can be arbitrarily long. Therefore, S2S is not as
popular as sequence tagging for POS. Prevailing
neural architectures for POS are often built on top
of a neural sequence tagger with rich embeddings
(Bohnet et al., 2018; Akbik et al., 2018) and Con-
ditional Random Fields (Lafferty et al., 2001).

NER has been cast to a neural sequence tagging
task using the IOB notation (Lample et al., 2016)
over the years, which benefits most from contex-
tual word embeddings (Devlin et al., 2019; Wang
et al., 2021). Early S2S-based works cast NER
to a text-to-IOB transduction problem (Chen and
Moschitti, 2018; Straková et al., 2019; Zhu et al.,
2020), which is included as a baseline schema in
Section 3.2. Yan et al. (2021) augment Pointer
Networks to generate numerical entity spans, which
we refrain from using because the focus of this work
is purely on the S2S itself. Most recently, Cui
et al. (2021) propose the first template prompting
to query all possible spans against an S2S language
model, which is highly simplified into a
one-pass generation in our PT schema. Instead of
directly prompting for the entity type, Chen et al.
(2022) propose to generate its concepts first and
its type later. Their two-step generation is tailored
for few-shot learning, orthogonal to our approach.
Moreover, our prompt approach does not
rely on non-textual tokens as they do.

CON is a more established task for S2S models
since the bracketed constituency tree is naturally
a linearized sequence. Top-down tree linearizations
based on brackets (Vinyals et al., 2015b)
or shift-reduce actions (Sagae and Lavie, 2005)
rely on a strong encoder over the sentence while
bottom-up ones (Zhu et al., 2013; Ma et al.,
2017) can utilize rich features from readily built
partial parses. Recently, the in-order traversal
has proved superior to bottom-up and top-down
in both transition (Liu and Zhang, 2017) and
S2S (Fernández-González and Gómez-Rodríguez,
2020) constituency parsing. Most recently, a
Pointer Networks augmented approach (Yang and
Tu, 2022) is ranked top among S2S approaches.
Since we are interested in the potential of S2S
models without patches, a naive bottom-up baseline
and its novel upgrades are studied in Section 3.3.

DEP has been underexplored as S2S due to the
linearization complexity. The first S2S work maps
a sentence to a sequence of source sentence words
interleaved with the arc-standard reduce actions
in its parse (Wiseman and Rush, 2016), which is
adopted as our LT baseline in Section 3.4. Zhang
et al. (2017) introduce a stack-based multi-layer
attention mechanism to leverage structural linguistic
information from the decoding stack in
arc-standard parsing. Arc-standard is also used
in our LS baseline; however, we use no such extra
layers. Apart from transition parsing, Li et al.
(2018) directly predict the relative head position
instead of the transition. This schema is
later extended to multilingual and multitask settings by
Choudhary and O'Riordan (2021). Their encoder
and decoder use different vocabularies, while in
our PT setting, we re-use the vocabulary in the
S2S language model.

S2S appears to be more prevalent for semantic
parsing for two reasons. First, synchronous
context-free grammar bridges the gap between
natural text and meaning representation for S2S.
It has been employed to obtain silver annotations
(Jia and Liang, 2016), and to generate canonical
natural language paraphrases that are easier to
learn for S2S (Shin et al., 2021). This trend of in-
sights viewing semantic parsing as prompt-guided
generation (Hashimoto et al., 2018) and paraphras-
ing (Berant and Liang, 2014) has also inspired our
design of PT. Second, the flexible input/output
format of S2S facilitates joint learning of seman-
tic parsing and generation. Latent variable sharing
(Tseng et al., 2020) and unified pretraining (Bai
et al., 2022) are two representative joint model-
ing approaches, which could be augmented with
our idea of PT schema as a potentially more ef-
fective linearization.

Our finding that core NLP tasks can be solved
using LT overlaps with the Translation between
Augmented Natural Languages (Paolini et al.,
2021). However, we take one step further to
study the impacts of textual tokens in schema
design choices. Our constrained decoding is similar
to existing work (Hokamp and Liu, 2017;
Deutsch et al., 2019; Shin et al., 2021). We
craft constrained decoding algorithms for our proposed
schemas and provide a systematic ablation
study in Section 6.1.

3 Schemas

This section presents our output schemas for
POS, NER, CON, and DEP in Table 1. For each
task, 3 lexically diverse schemas are designed as
follows to explore the best practice for structure
learning. First, Label Sequence (LS) is defined as
a sequence of labels drawn from a finite set of
task-related labels, which are merged into the S2S
vocabulary, with zero text. Second, Label with
Text (LT) includes tokens from the input text on
top of the labels such that it has a medium number
of labels and text. Third, PrompT (PT) gives
a list of sentences describing the linguistic structure
in natural language with no label. We hypothesize
that the closer the output is to natural
language, the more advantage the S2S takes from
the PLM.

3.1 Part-of-Speech Tagging (POS)

LS LS defines the output as a sequence of POS
tags. Formally, given an input sentence of n tokens
$x = \{x_1, x_2, \cdots, x_n\}$, its output is a tag
sequence of the same length
$y^{LS} = \{y_1, y_2, \cdots, y_n\}$. Distinguished from
sequence tagging, any LS output sequence is terminated
by the ‘‘end-of-sequence’’ (EOS) token, which is omitted
from $y^{LS}$ for simplicity. Predicting POS tags
often depends on their neighbor contexts. We
posit that the autoregressive decoder of an
S2S model can capture this dependency through
self-attention.

LT For LT, the token from the input is inserted
before its corresponding tag. Formally, the output
is defined as $y^{LT} = \{(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)\}$.
Both x and y are part of the output and the S2S
model is trained to generate each pair sequentially.

PT PT is a human-readable text describing the
POS sequence. Specifically, we use a phrase
$y^{PT}_i$ = ‘‘$x_i$ is $y'_i$’’ for the i-th token, where $y'_i$ is
the definition of a POS tag $y_i$, e.g., a noun. The
final prompt is then the semicolon concatenation
of all phrases: $y^{PT} = y^{PT}_1; y^{PT}_2; \cdots; y^{PT}_n$.
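
As a rough illustration of the three POS schemas above, the following sketch builds LS, LT, and PT outputs for a toy sentence. The function name `pos_schemas`, the tag inventory, and the wording of the tag descriptions are assumptions made for this example and are not taken from the authors' released code.

```python
def pos_schemas(tokens, tags, descriptions):
    """Build LS, LT, and PT outputs for one sentence (illustrative only)."""
    assert len(tokens) == len(tags)
    # LS: the tag sequence alone.
    ls = list(tags)
    # LT: each input token followed by its tag.
    lt = [piece for pair in zip(tokens, tags) for piece in pair]
    # PT: "x_i is y'_i" phrases joined by semicolons.
    pt = "; ".join(
        f"{tok} is {descriptions[tag]}" for tok, tag in zip(tokens, tags)
    )
    return ls, lt, pt


# Hypothetical tag descriptions; the exact wording used in the paper may differ.
DESCRIPTIONS = {"PRON": "a pronoun", "NOUN": "a noun"}

ls, lt, pt = pos_schemas(["My", "friend"], ["PRON", "NOUN"], DESCRIPTIONS)
print(ls)
print(lt)
print(pt)
```

The LT output is twice as long as the LS output, and the PT output is longer still, which matches the trade-off between sequence length and closeness to natural language discussed in this section.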

3.2 Named Entity Recognition (NER)

LS LS of an input sentence comprising n tokens
$x = \{x_1, x_2, \cdots, x_n\}$ is defined as the BIEOS tag
sequence $y^{LS} = \{y_1, y_2, \cdots, y_n\}$, which labels
each token as the Beginning, Inside, End, Outside,
or Single-token entity.

LT LT uses a pair of entity type labels to wrap
each entity: $y^{LT} = \ldots, \text{B-}y_j, x_i, \ldots, x_{i+k}, \text{E-}y_j, \ldots$,
where $y_j$ is the type label of the j-th entity
consisting of k tokens.

PT PT is defined as a list of sentences describing
each entity: $y^{PT}_i$ = ‘‘$x_i$ is $y'_i$’’, where $y'_i$ is
the definition of a NER tag $y_i$, e.g., a person.
Different from the prior prompt work (Cui et al.,
2021), our model generates all entities in one
pass, which is more efficient than their brute-force
approach.
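
The three NER schemas can be sketched in the same spirit. The code below is a minimal illustration, assuming entities are given as hypothetical `(start, end_exclusive, type)` spans and that type descriptions such as "an organization" are supplied by the user; neither the span format nor the descriptions come from the paper.

```python
def ner_schemas(tokens, entities, descriptions):
    """entities: list of (start, end_exclusive, type) spans over tokens."""
    # LS: one BIEOS tag per token.
    ls = ["O"] * len(tokens)
    for start, end, etype in entities:
        if end - start == 1:
            ls[start] = f"S-{etype}"
        else:
            ls[start] = f"B-{etype}"
            ls[end - 1] = f"E-{etype}"
            for i in range(start + 1, end - 1):
                ls[i] = f"I-{etype}"
    # LT: wrap each entity with a pair of type labels and keep all tokens.
    opening = {start: etype for start, _, etype in entities}
    closing = {end - 1: etype for _, end, etype in entities}
    lt = []
    for i, tok in enumerate(tokens):
        if i in opening:
            lt.append(f"B-{opening[i]}")
        lt.append(tok)
        if i in closing:
            lt.append(f"E-{closing[i]}")
    # PT: one "entity is description" sentence per entity.
    pt = "; ".join(
        f"{' '.join(tokens[start:end])} is {descriptions[etype]}"
        for start, end, etype in entities
    )
    return ls, lt, pt


tokens = ["a", "gift", "from", "Disney", "World"]
entities = [(3, 5, "ORG")]  # hypothetical gold span
ls, lt, pt = ner_schemas(tokens, entities, {"ORG": "an organization"})
print(ls)
print(lt)
print(pt)
```

Note that PT mentions only entity tokens, so its length depends on the number of entities rather than on the sentence length.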

3.3 Constituency Parsing (CON)

Schemas for CON are developed on constituency
trees pre-processed by removing the first level of
non-terminals (POS tags) and rewiring their children
(tokens) to parents, e.g., (NP (PRON My)
(NOUN friend)) → (NP My friend).

LS LS is based on a top-down shift-reduce sys-
tem consisting of a stack, a buffer, and a depth
record d. Initially, the stack contains only the root
constituent with label TOP and depth 0; the buffer
contains all tokens from the input sentence; d is
set to 1. A Node-X (N-X) transition creates a
new depth-d non-terminal labeled with X, pushes
it to the stack, and sets d ← d + 1. A Shift
(SH) transition removes the first token from the
buffer and pushes it to the stack as a new terminal
with depth d. A Reduce (RE) transition pops all elements
with the same depth d from the stack, makes
them the children of the top constituent of the
stack, and sets d ← d − 1. The linearization of
a constituency tree using our LS schema can be
obtained by applying 3 string substitutions: replace
each left bracket and the label X following
it with a Node-X, replace terminals with SH, and
replace right brackets with RE.

LT LT is derived by reverting all SH in LS back
to the corresponding tokens so that tokens in LT
effectively serve as SH in our transition system.
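
The three string substitutions for LS, and the reversion of SH to tokens for LT, can be sketched as a single pass over a bracketed tree. This is a minimal sketch, assuming the tree has already been pre-processed as described above (no POS level) and is given as a plain string; the function name `linearize` and the `N-X` spelling of Node-X labels are choices made for this example.

```python
import re


def linearize(tree):
    """Turn a bracketed tree string into LS and LT sequences (sketch)."""
    # Split into left brackets, right brackets, and everything else.
    pieces = re.findall(r"\(|\)|[^\s()]+", tree)
    ls, lt = [], []
    expect_label = False
    for piece in pieces:
        if piece == "(":
            expect_label = True          # the next piece is the label X
        elif piece == ")":
            ls.append("RE")              # right bracket -> Reduce
            lt.append("RE")
        elif expect_label:
            ls.append(f"N-{piece}")      # left bracket + label -> Node-X
            lt.append(f"N-{piece}")
            expect_label = False
        else:
            ls.append("SH")              # terminal -> Shift in LS
            lt.append(piece)             # terminal stays as a token in LT
    return ls, lt


ls, lt = linearize("(S (NP My friend) (VP bought (NP a gift)))")
print(" ".join(ls))
print(" ".join(lt))
```

Both sequences have the same length; they differ only in whether terminals appear as SH or as the tokens themselves.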

Table 1: Schemas for the sentence ‘‘My friend who lives in Orlando bought me a gift from Disney
World’’.

PT PT is also based on a top-down linearization,
although it describes a constituent using templates:
‘‘pi has {cj}’’, where pi is a constituent
and cj-s are its children. To describe a constituent,
the indefinite article ‘‘a’’ is used to denote a
new constituent (e.g., ‘‘. . . has a noun phrase’’).
The definite article ‘‘the’’ is used for referring
to an existing constituent mentioned before (e.g.,
‘‘the noun phrase has’’), or describing a constituent
whose children are all terminals (e.g.,
‘‘. . . has the noun phrase ‘My friend’’’). When
describing a constituent that directly follows its
mention, the determiner ‘‘which’’ is used instead
of repeating the constituent multiple times (e.g.,
‘‘. . . and the subordinating clause, which has’’).
Sentences are joined with a semicolon ‘‘;’’ as the
final prompt.

3.4 Dependency Parsing (DEP)

LS LS uses three transitions from the arc-
standard system (Nivre, 2004): shift (SH), left
arc (<), and right arc (>).

LT LT for DEP is obtained by replacing each
SH in an LS sequence with its corresponding token.

PT PT is derived from its LS sequence by re-
moving all SH. Then, for each left arc creating
an arc from xj to xi with dependency relation r
(e.g., a possessive modifier), a sentence is created
by applying the template ‘‘xi is r of xj’’. For
each right arc creating an arc from xi to xj with
the dependency relation r, a sentence is created
with another template ‘‘xi has r xj’’. The prompt
is finalized by joining all such sentences with a
semicolon.
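
To make the two DEP-PT templates concrete, the sketch below replays an arc-standard transition sequence and emits one sentence per arc. It assumes the usual arc-standard convention (a left arc makes the stack top the head of the second item; a right arc makes the second item the head of the top), a hypothetical `<rel` / `>rel` spelling of labeled arcs, and made-up relation descriptions. The root sentence (‘‘the sentence has a root . . .’’) is omitted for brevity.

```python
def dep_prompt(tokens, transitions, descriptions):
    """Derive a PT prompt from an arc-standard transition sequence (sketch)."""
    buffer = list(tokens)
    stack = []
    sentences = []
    for t in transitions:
        if t == "SH":
            # Shift moves the next input token onto the stack; PT emits nothing.
            stack.append(buffer.pop(0))
        elif t.startswith("<"):
            # Left arc: the stack top is the head of the second item.
            head = stack[-1]
            dependent = stack.pop(-2)
            sentences.append(f"{dependent} is {descriptions[t[1:]]} of {head}")
        elif t.startswith(">"):
            # Right arc: the second item is the head of the stack top.
            dependent = stack.pop()
            head = stack[-1]
            sentences.append(f"{head} has {descriptions[t[1:]]} {dependent}")
    return "; ".join(sentences)


# Hypothetical relation descriptions.
RELATIONS = {"nsubj": "a nominal subject", "dobj": "a direct object"}
prompt = dep_prompt(
    ["She", "bought", "gifts"],
    ["SH", "SH", "<nsubj", "SH", ">dobj"],
    RELATIONS,
)
print(prompt)
```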

4 Decoding Strategies

To ensure well-formed output sequences that
match the schemas (Section 3), a set of constrained
decoding strategies is designed per task except for
CON, which is already tackled as S2S modeling
without constrained decoding (Vinyals et al.,
2015b; Fernández-González and Gómez-Rodríguez,
2020). Formally, given an input x and any partial
y [...]

    [...] > 2n then
        return {EOS}
    else
        if i is even then
            return {x_{i/2}}
        else
            return D

Algorithm 2: Prefix Matching
Function PrefixMatch(T, p):
    node ← T
    while node and p do
        node ← node.children[p_1]
        p ← p_{>1}
    return node

depends on the parity of i, as defined in
Algorithm 1.
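
The parity-based constraint for POS-LT can be sketched as follows. This is a hedged illustration rather than the authors' Algorithm 1: positions are 0-indexed here (so even positions are tokens and odd positions are tags), `EOS` is a placeholder string, and `toy_score` is a stand-in for the decoder's scores. The greedy loop shows how a `NextY`-style function restricts the candidates at every step.

```python
EOS = "</s>"  # placeholder end-of-sequence symbol


def next_y_pos_lt(tokens, i, tag_set):
    """Allowed outputs at 0-indexed position i for POS-LT (sketch)."""
    n = len(tokens)
    if i >= 2 * n:
        return {EOS}               # all (token, tag) pairs are done
    if i % 2 == 0:
        return {tokens[i // 2]}    # copy the next input token
    return set(tag_set)            # any POS tag is allowed


def constrained_greedy(tokens, tag_set, score):
    """Greedy decoding restricted to the allowed set at every step."""
    output = []
    while True:
        allowed = next_y_pos_lt(tokens, len(output), tag_set)
        best = max(sorted(allowed), key=lambda cand: score(output, cand))
        if best == EOS:
            return output
        output.append(best)


def toy_score(prefix, candidate):
    # Stand-in for model scores: this toy "model" always prefers NOUN.
    return 1.0 if candidate == "NOUN" else 0.0


decoded = constrained_greedy(["My", "friend"], {"PRON", "NOUN"}, toy_score)
print(decoded)
```

Whatever the scores are, the output always has exactly 2n items alternating between input tokens and tags, which is the well-formedness property the constraint is meant to guarantee.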

PT The PT generation can be divided into two
phases: token and ‘‘is-a-tag’’ statement generations.
A binary status u is used to indicate whether
yi is expected to be a token. To generate a token,
an integer k ≤ n is used to track the index of
the next token. To generate an ‘‘is-a-tag’’ statement,
an offset o is used to track the beginning
of an ‘‘is-a-tag’’ statement. Each description dj
of a POS tag tj is extended to a suffix sj =
‘‘is dj ;’’.

Suffixes are stored in a trie T to facilitate
prefix matching between a partially generated
statement and all candidate suffixes, as shown
in Algorithm 2. The full decoding is depicted in
Algorithm 3.
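
A trie with prefix matching in the style of Algorithm 2 can be sketched as below. The example works on whitespace-separated words for readability; the actual system would operate on BART subword ids, and the class and function names here are assumptions for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps a token to the next TrieNode


def build_trie(sequences):
    """Insert every token sequence into a fresh trie and return its root."""
    root = TrieNode()
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.children.setdefault(tok, TrieNode())
    return root


def prefix_match(trie, prefix):
    """Follow prefix from the root; return the reached node or None."""
    node = trie
    for tok in prefix:
        node = node.children.get(tok)
        if node is None:
            return None
    return node


# Hypothetical "is-a-tag" suffixes.
suffixes = [s.split() for s in ["is a noun ;", "is a verb ;", "is an adjective ;"]]
trie = build_trie(suffixes)

node = prefix_match(trie, ["is", "a"])
candidates = sorted(node.children)  # tokens allowed after "is a"
print(candidates)
```

A node with no children marks a completed suffix, which is the condition Algorithm 3 uses to switch back to token generation.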

4.2 Named Entity Recognition

LS Similar to POS-LS, the NextY for NER
returns BIEOS tags if i ≤ n else EOS.

LT Opening tags (<>) in NER-LT are grouped
into a vocabulary O. The last generated output
token yi−1 (assuming y0 = BOS, a.k.a. beginning
of a sentence) is looked up in O to decide what
type of token will be generated next. To enforce
label consistency between a pair of tags, a variable
e is introduced to record the expected closing
tag. Reusing the definition of k in Algorithm 3,
decoding of NER-LT is described in Algorithm 4.

Algorithm 3: Constrained POS-PT
u ← true, k ← 0, o ← 0
Function NextY(x, y [...] o)
    if node.children is empty then
        u ← true
        return NextY(x, y [...]

    [...] n then
        Y ← Y ∪ {EOS}
    else
        Y ← Y ∪ {x_k}
    return Y

PT For each entity type ei, its description di is
filled into the template ‘‘is di;’’ to create an ‘‘is-a’’
suffix si. Since the prompt is constructed using
text while the number of entities is variable, it is
not straightforward to tell whether a token belongs
to an entity or an ‘‘is-a’’ suffix. Therefore, a noisy
segmentation procedure is utilized to split a phrase
into two parts: entity and ‘‘is-a’’ suffix. Each si is
collected into a trie S to perform segmentation of
a partially generated phrase p (Algorithm 5).

Once a segment is obtained, the decoder is
constrained to generate the entity or the suffix.
For the generation of an entity, string matching
is used to find every occurrence o of its partial
generation in x and add the following token xo+1

Algorithm 5: Segmentation
Function Segment(S, p):
    for i ← 1 to |p| do
        entity, suffix = p_{≤i}, p_{>i}
        node ← PrefixMatch(S, p_{>i})
        if node then
            return entity, suffix, node
    return null
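
The segmentation step can be sketched in Python as follows. This is a minimal illustration, assuming a trie stored as nested dictionaries and word-level tokens instead of subwords; the helper names and the example suffixes are hypothetical. One detail worth noting: an empty dictionary (a completed suffix) is falsy in Python, so the sketch tests `is not None` rather than truthiness.

```python
def build_trie(sequences):
    """Nested-dict trie: each key is a token, each value is a sub-trie."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root


def prefix_match(trie, prefix):
    node = trie
    for tok in prefix:
        if tok not in node:
            return None
        node = node[tok]
    return node


def segment(trie, phrase):
    """Split a partial phrase into (entity, suffix, trie node), as in Algorithm 5."""
    for i in range(1, len(phrase) + 1):
        entity, suffix = phrase[:i], phrase[i:]
        node = prefix_match(trie, suffix)
        if node is not None:
            return entity, suffix, node
    return None


# Hypothetical "is-a" suffixes for two entity types.
trie = build_trie([s.split() for s in ["is a person ;", "is an organization ;"]])
entity, suffix, node = segment(trie, "Disney World is an".split())
print(entity, suffix, sorted(node))
```

The procedure is noisy in the sense described above: the earliest split whose remainder matches a suffix prefix wins, so an entity that itself contains ‘‘is’’ could be cut too early.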

Algorithm 6: Constrained NER-PT
Function NextY(x, y [...]

    [...] > o then
            spans ← spans ∪ {(o, i, null)}
        spans ← spans ∪ {(i, j, v)}
    if o < |x| + 1 then
        spans ← spans ∪ {(o, |x| + 1, null)}
    return spans

    [...]
    while parent do
        foreach sibling of parent do
            if sibling.label is label and sibling has no children then
                return sibling
        parent ← parent.parent
    return null

Algorithm 10: Reverse CON-PT
Function Reverse(T, x):
    root ← parent ← new TOP-tree
    latest ← null
    foreach (i, j, v) ∈ Split(T, x) do
        if v then
            if x_{i:j} starts with ‘‘the’’ then
                target ← FindTarget(parent, v)
            else
                latest ← new v-tree
                add latest to parent.children
                latest.parent ← parent
        else
            if x_{i:j} starts with ‘‘has’’ or ‘‘which has’’ then
                parent ← latest
            add tokens in ‘‘’’ into latest
    return root

[...] creating new constituents, and sentences attaching
new constituents to existing ones. Splitting is
done by longest-prefix-matching (Algorithm 7)
using a trie T built with the definite and indefinite
article versions of the description of each
constituent label, e.g., ‘‘the noun phrase’’ and
‘‘a noun phrase’’ of NP. Algorithm 8 describes the
splitting procedure.

Once a prompt is split into two types of sentences,
a constituency tree is then built accordingly.
We use a variable parent to track the last
constituent that gets attachments, and another variable
latest to track the current new constituent
that gets created. Due to the top-down nature
of linearization, the target constituent that new
constituents are attached to is always among the
siblings of either parent or the ancestors of
parent. The search of the target constituent is
described in Algorithm 9. Algorithm 10 shows the
final reverse linearization.

4.4 Dependency Parsing

LS Arc-standard (Nivre, 2004) transitions are
added to a candidate set and only transitions permitted
by the current parsing state are allowed.

LT DEP-LT replaces all SH transitions of DEP-LS with
input tokens in left-to-right order. Therefore, an
incremental offset is kept to generate the next
token in place of each SH in DEP-LT.

PT DEP-PT is more complicated than CON-PT
because each sentence contains one more token.
Its generation is therefore divided into 4 possible
states: first token (1st), relation (rel), second
token (2ed), and semicolon. An arc-standard
transition system is executed synchronously with
constrained decoding since PT is essentially a simplified
transition sequence with all SH removed.
Let b and s be the system buffer and stack, respectively.
Let c be a set of candidate tokens that will
be generated in y, which initially contains all input
tokens and an inserted token ‘‘sentence’’ that is
only used to represent the root in ‘‘the sentence
has a root . . .’’ A token is removed from c once it
gets popped out of s. Since DEP-PT generates no
SH, each input token xj in y effectively introduces
SH(s) till it is pushed onto s at index i (i ∈ {1, 2}),
as formally described in Algorithm 11.

Algorithm 11: Recall Shift
Function RecallShift(system, i, xj):
    while system.s_i is not xj do
        system.apply(SH)

After the first token is generated, its offset o in y
is recorded such that the following relation sequence
yi>o can be located. To decide the next
token of yi>o, it is then prefix-matched with a
trie T built with the set of ‘‘has-’’ and ‘‘is-’’
dependency relations. The children of the
prefix-matched node are considered candidates if it
has any. Otherwise, the dependency relation is
marked as completed. Once a relation is generated,
the second token will be generated in a
similar way. Finally, upon the completion of a
sentence, the transition it describes is applied to
the system and c is updated accordingly. The full
procedure is described in Algorithm 12. Since a
transition system has been synchronously maintained
with constrained decoding, no extra reverse
linearization is needed.

5 Experiments

For all tasks, BART-Large (Lewis et al., 2020)
is finetuned as our underlying S2S model. We
also tried T5 (Raffel et al., 2020), although its
performance was less satisfactory. Every model
is trained three times using different random seeds
and their average scores and standard deviations
on the test sets are reported. Our models are
evaluated on OntoNotes 5 (Weischedel et al.,
2013) using the data split suggested by Pradhan
et al. (2013). In addition, two other popular datasets
are used for fair comparisons to previous
works: the Wall Street Journal corpus from the
Algorithm 12: Constrained DEP-PT
(status, transition, t1, t2, o) ← (1st, null, null, null, 0)
c ← {sentence} ∪ y
Function NextY(x, y [...] o)
        if node.children then
            Y ← Y ∪ {node.children}
        else
            relation ← the relation in y>o
            if y>o starts with ‘‘is’’ then
                transition ← LA-relation
            else
                transition ← RA-relation
            status ← 2ed
    else if status is 2ed then
        Y ← Y ∪ c
        status ← semicolon
    else if status is semicolon then
        t2 ← y_{i−1}
        Y ← Y ∪ {;}
        RecallShift(system, 1, t2)
        RecallShift(system, 2, t1)
        if transition starts with LA then
            remove s1 from c
        else
            remove s2 from c
        system.apply(transition)
        if system is terminal then
            Y ← Y ∪ {EOS}
        status ← 1st
    return Y


Penn Treebank 3 (Marcus et al., 1993) for POS,
DEP, and CON, as well as the English portion
of the CoNLL’03 dataset (Tjong Kim Sang and
De Meulder, 2003) for NER.

Model                   PTB             OntoNotes
Bohnet et al. (2018)    97.96           -
He and Choi (2021b)     -               98.32 ± 0.02
LS                      97.51 ± 0.11    98.21 ± 0.02
LT                      97.70 ± 0.02    98.40 ± 0.01
PT                      97.64 ± 0.01    98.37 ± 0.02

Table 2: Results for POS.

Each token is independently tokenized using
the subword tokenizer of BART and merged into
an input sequence. The boundary information for
each token is recorded to ensure full tokens are
generated in LT and PT without broken pieces.
To fit in the positional embeddings of BART, sen-
tences longer than 1,024 subwords are discarded,
which include 1 sentence from the Penn Treebank 3
training set and 24 sentences from the
OntoNotes 5 training set. Development sets and
test sets are not affected.

5.1 Part-of-Speech Tagging

Token-level accuracy is used as the metric for
POS. LT outperforms LS although LT is twice as
long as LS, suggesting that textual tokens positively
impact the learning of the decoder (Table 2).
PT performs almost the same as LT, perhaps
because POS is not a task requiring a
powerful decoder.

5.2 Named Entity Recognition

For CoNLL’03, the provided splits without merg-
ing the development and training sets are used.
For OntoNotes 5, the same splits as Chiu and
Nichols (2016), Li et al. (2017), Ghaddar and
Langlais (2018), and He and Choi (2020, 2021b) are
used. Labeled span-level F1 score is used for
evaluation.
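
For reference, labeled span-level F1 can be computed along the following lines. This is a generic micro-averaged sketch, not the authors' evaluation script; the `(start, end, type)` span format and the function name are assumptions. A predicted span counts as correct only if both its boundaries and its type match a gold span.

```python
def span_f1(gold, pred):
    """gold, pred: per-sentence sets of (start, end, type) spans."""
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One sentence: the ORG span is correct, the second span has the wrong type.
gold = [{(3, 5, "ORG"), (7, 8, "GPE")}]
pred = [{(3, 5, "ORG"), (7, 8, "PER")}]
f1 = span_f1(gold, pred)
print(f1)
```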

We acknowledge that the performance of NER
systems can be largely improved by rich embed-
dings (Wang et al., 2021), document context fea-
tures (Yu et al., 2020), dependency tree features
(Xu et al., 2021), and other external resources.
Since our focus is the potential of S2S, we mainly
consider two strong baselines that also use BART
as the only external resource: the generative
BART-Pointer framework (Yan et al., 2021) and
the recent template-based BART NER (Cui et al.,
2021).

As shown in Table 3, LS performs the worst
on both datasets, possibly attributed to the fact

Model                     CoNLL'03        OntoNotes 5
Clark et al. (2018)       92.60           -
Peters et al. (2018)      92.22           -
Akbik et al. (2019)       93.18           -
Straková et al. (2019)    93.07           -
Yamada et al. (2020)      92.40           -
Yu et al. (2020)          92.50           89.83
Yan et al. (2021) ‡S      93.24           90.38
Cui et al. (2021) S       92.55           -
He and Choi (2021b)       -               89.04 ± 0.14
Wang et al. (2021)        94.6            -
Zhu and Li (2022)         -               91.74
Ye et al. (2022)          -               91.9
LS                        70.29 ± 0.70    84.61 ± 1.18
LT                        92.75 ± 0.03    89.60 ± 0.06
PT                        93.18 ± 0.04    90.33 ± 0.04

Table 3: Results for NER. S denotes S2S.

that the autoregressive decoder overfits the
high-order left-to-right dependencies of BIEOS tags.
LT performs close to the BERT-Large biaffine
model (Yu et al., 2020). PT performs comparably
well with the Pointer Networks approach
(Yan et al., 2021) and it outperforms the template
prompting (Cui et al., 2021) by a large margin,
suggesting S2S has the potential to learn structures
without using external modules.

5.3 Constituency Parsing

All POS tags are removed and not used in train-
ing or evaluation. Terminals belonging to the
same non-terminal are flattened into one constituent
before training and unflattened in post-processing.
The standard constituent-level F-score
produced by EVALB³ is used as the evaluation
metric.

Table 4 shows the results on OntoNotes 5 and
PTB 3. Incorporating textual tokens into the output
sequence is important on OntoNotes 5, leading
to a +0.9 F-score improvement, while it is not the
case on PTB 3. This is possibly because OntoNotes
is more diverse in domains, requiring a
higher utilization of pre-trained S2S for domain
transfer. PT performs the best, and it is competitive
with recent works, despite the
fact that it uses no extra decoders.

³ https://nlp.cs.nyu.edu/evalb/.

Model                                              PTB 3           OntoNotes 5
Fernández-González and Gómez-Rodríguez (2020) S    91.6            -
Mrini et al. (2020)                                96.38           -
He and Choi (2021b)                                -               94.43 ± 0.03
Yang and Tu (2022) S                               96.01           -
LS                                                 95.23 ± 0.08    93.40 ± 0.31
LT                                                 95.24 ± 0.04    94.32 ± 0.11
PT                                                 95.34 ± 0.06    94.55 ± 0.03

Table 4: Results for CON. S denotes S2S.

Model                        UAS             LAS
Wiseman and Rush (2016) S    91.17           87.41
Zhang et al. (2017) S        93.71           91.60
Li et al. (2018) S           94.11           92.08
Mrini et al. (2020)          97.42           96.26
LS                           92.83 ± 0.43    90.50 ± 0.53
LT                           95.79 ± 0.07    93.17 ± 0.16
PT                           95.91 ± 0.06    94.31 ± 0.09

(a) PTB results for DEP.

Model                        UAS             LAS
He and Choi (2021b)          95.92 ± 0.02    94.24 ± 0.03
LS                           86.54 ± 0.12    83.84 ± 0.13
LT                           94.15 ± 0.14    91.27 ± 0.19
PT                           94.51 ± 0.22    92.81 ± 0.21

(b) OntoNotes results for DEP.

Table 5: Results for DEP. S denotes S2S.

Model       PTB            OntoNotes
LS          97.51 ± 0.11   98.21 ± 0.02
  w/o CD    97.51 ± 0.11   98.21 ± 0.02
LT          97.70 ± 0.02   98.40 ± 0.01
  w/o CD    97.67 ± 0.02   98.39 ± 0.01
PT          97.64 ± 0.01   98.37 ± 0.02
  w/o CD    97.55 ± 0.02   98.29 ± 0.05

(a) Accuracy of ablation tests for POS.

Model       CoNLL 03       OntoNotes 5
LS          70.29 ± 0.70   84.61 ± 1.18
  w/o CD    66.33 ± 0.73   84.57 ± 1.16
LT          92.75 ± 0.03   89.60 ± 0.06
  w/o CD    92.72 ± 0.02   89.50 ± 0.07
PT          93.18 ± 0.04   90.33 ± 0.04
  w/o CD    93.12 ± 0.06   90.23 ± 0.05

(b) F1 of ablation tests for NER.

Model       PTB            OntoNotes
LS          90.50 ± 0.53   83.84 ± 0.13
  w/o CD    90.45 ± 0.47   83.78 ± 0.13
LT          93.17 ± 0.16   91.27 ± 0.19
  w/o CD    93.12 ± 0.14   91.05 ± 0.20
PT          94.31 ± 0.09   92.81 ± 0.21
  w/o CD    81.50 ± 0.27   81.76 ± 0.36

(c) LAS of ablation tests for DEP.

Table 6: Ablation test results.

5.4 Dependency Parsing

The constituency trees from PTB and OntoNotes are converted into Stanford dependencies v3.3.0 (De Marneffe and Manning, 2008) for the DEP experiments. Forty and 1 non-projective trees are removed from the training and development sets of PTB 3, respectively. For OntoNotes 5, these numbers are 262 and 28. Test sets are not affected.

As shown in Table 5, textual tokens are crucial in learning arc-standard transitions using S2S, leading to +2.6 and +7.4 LAS improvements from LS to LT on PTB and OntoNotes, respectively. Although our PT method underperforms recent state-of-the-art methods, it has the strongest performance among all S2S approaches. Interestingly, our S2S model manages to learn a transition system without explicitly modeling the stack, the buffer, the partial parse, or pointers.

We believe that the performance of DEP with S2S can be further improved with a larger and more recent pretrained S2S model and a dynamic oracle (Goldberg and Nivre, 2012).
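The arc-standard transitions that the S2S models learn to emit can be replayed with a few lines of code. The sketch below is illustrative only: the function name `replay_arc_standard`, the `ROOT` placeholder used when the last remaining word is attached, and the use of word strings instead of token indices are assumptions made for readability, not the paper's implementation.

```python
from typing import List, Tuple

Arc = Tuple[str, str, str]  # (head, relation, dependent)


def replay_arc_standard(tokens: List[str], transitions: List[str]) -> List[Arc]:
    """Replay SH / LA-<rel> / RA-<rel> transitions and collect the arcs.

    Assumed conventions (hypothetical, for illustration):
      SH      pushes the next buffer word onto the stack.
      LA-rel  makes the second-top word a dependent of the top word.
      RA-rel  makes the top word a dependent of the second-top word,
              or of "ROOT" when it is the only word left on the stack.
    """
    stack: List[str] = []
    buffer = list(tokens)
    arcs: List[Arc] = []
    for transition in transitions:
        if transition == "SH":
            if not buffer:
                raise ValueError("SH requested but the buffer is empty")
            stack.append(buffer.pop(0))
        elif transition.startswith("LA-"):
            if len(stack) < 2:
                raise ValueError("LA needs two words on the stack")
            dependent = stack.pop(-2)
            arcs.append((stack[-1], transition[3:], dependent))
        elif transition.startswith("RA-"):
            if not stack:
                raise ValueError("RA needs at least one word on the stack")
            dependent = stack.pop()
            head = stack[-1] if stack else "ROOT"
            arcs.append((head, transition[3:], dependent))
        else:
            raise ValueError(f"unknown transition: {transition}")
    return arcs


if __name__ == "__main__":
    words = ["It", "looks", "so", "out", "of", "place", "."]
    sequence = ("SH SH LA-nsubj SH SH LA-advmod SH SH "
                "RA-pobj RA-pcomp RA-prep SH RA-punct RA-root").split()
    for arc in replay_arc_standard(words, sequence):
        print(arc)
```

The toy sequence encodes the arcs verbalized for the sentence ‘‘It looks so out of place.’’ in Section 6.2. Because every later transition depends on the stack left behind by earlier ones, a single wrong action shifts all subsequent attachments.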

6 Analysis

6.1 Ablation Study

We perform an ablation study to show the performance gain of our proposed constrained decoding algorithms on different tasks. Constrained decoding algorithms (CD) are compared against free generation (w/o CD), where a model freely generates an output sequence that is later post-processed into task-specific structures using string-matching rules. Invalid outputs are patched to the greatest extent possible, e.g., POS label sequences are padded or truncated. As shown in Table 6, ablating constrained decoding seldom impacts the performance of LS on any task, suggesting that the seq2seq decoder can acclimatize to the newly added label tokens. Interestingly, the less performant NER-LS model degrades the most, underscoring the necessity of constrained decoding for weaker seq2seq models. The performance of LT on all tasks is marginally degraded when constrained decoding is ablated, indicating that the decoder begins to generate structurally invalid outputs when textual tokens are freely generated. This type of problem seems to be exacerbated when more tokens are freely generated in the PT schemas, especially for DEP-PT.
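The difference between the two ablation conditions can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithms: `patch_pos_labels` mimics the pad-or-truncate post-processing of free generation, with a hypothetical fallback label `pad_label`, and `constrained_pick` mimics one constrained decoding step over a toy score table in which only tokens in an allowed set may be selected.

```python
from typing import Dict, List, Set


def patch_pos_labels(predicted: List[str], num_tokens: int,
                     pad_label: str = "NN") -> List[str]:
    """Free generation (w/o CD): force one label per input token.

    Extra labels are truncated; missing labels are filled with `pad_label`,
    which is an assumed fallback chosen only for this illustration.
    """
    patched = predicted[:num_tokens]
    patched += [pad_label] * (num_tokens - len(patched))
    return patched


def constrained_pick(scores: Dict[str, float], allowed: Set[str]) -> str:
    """Constrained decoding step: best-scoring token among the allowed ones."""
    candidates = {token: score for token, score in scores.items()
                  if token in allowed}
    if not candidates:
        raise ValueError("no allowed token has a score")
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Too many labels are truncated, too few are padded.
    print(patch_pos_labels(["DT", "NN", "VBZ", "JJ", "."], 3))
    print(patch_pos_labels(["DT"], 3))
    # The textual token "the" scores highest but is not a valid label here.
    print(constrained_pick({"the": 2.0, "DT": 1.5, "NN": 0.3}, {"DT", "NN"}))
```

Patching can only repair the shape of an invalid output after the fact, whereas the constrained step never lets an invalid token be produced in the first place.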

Unlike POS and NER, DEP is more prone to hallucinated textual tokens, as early errors in the transition sequence accumulate in the arc-standard system and throw all later predictions off track. This is not yet a critical problem for LS, which generates no textual tokens, or for LT, where a textual token still serves as a valid shift action even if it is hallucinated. However, a hallucinated textual token in PT is catastrophic, as it could be part of any arc-standard transition. As no explicit shift transition is designed, a hallucinated token could lead to multiple instances of missing shifts in Algorithm 12.

6.2 Case Study

To facilitate understanding and comparison of different models, a concrete example of input (I), gold annotation (G), and the actual model prediction for each schema is provided below for each task. Wrong predictions and the corresponding ground truth are highlighted in red and teal, respectively.

POS In the following example, only PT correctly detects the past tense (VBD) of ‘‘put’’.

I: The word I put in boldface is extremely interesting.
G: DT NN PRP VBD IN NN VBZ RB JJ .
LS: DT NN PRP VBP IN NN VBZ RB RB JJ
LT: The/DT word/NN I/PRP put/VBP in/IN boldface/NN is/VBZ extr./RB interesting/JJ ./.
PT: ‘‘The’’ is a determiner; ‘‘word’’ is a singular noun; ‘‘I’’ is a personal pronoun; ‘‘put’’ is a past tense verb; ‘‘in’’ is a preposition or subordinating conjunction; ‘‘boldface’’ is a singular noun; ‘‘is’’ is a 3rd person singular present verb; ‘‘extremely’’ is an adverb; ‘‘interesting’’ is an adjective; ‘‘.’’ is a period.

NER In the following example, LS and LT could not correctly recognize ‘‘HIStory’’ as an art work, possibly due to its leading uppercase letters.

I: Large image of the Michael Jackson HIStory statue.
G: Large image of the [Michael Jackson: PERSON(PER)] [HIStory: WOA] statue.
LS: O O O O B-PER E-PER S-ORG O O O
LT: Large image of the Michael Jackson HIStory statue.
PT: ‘‘Michael Jackson’’ is a person; ‘‘HIStory’’ is an art work.

CON As highlighted with strikeout text below, LS and LT failed to parse ‘‘how much’’ as a wh-noun phrase and a wh-adverb phrase, respectively.

I: It’s crazy how much he eats.
G: (S (NP (NP It)) (VP ’s (ADJP crazy) (SBAR (WHNP (WHADJP how much)) (S (NP he) (VP eats)))) .)
LS: N-S N-NP N-NP SH RE RE N-VP SH N-ADJP SH RE N-SBAR N-WHNP N-WHADVP SH SH RE RE N-S N-NP SH RE N-VP SH RE RE RE RE SH RE
LT: (S (NP (NP It)) (VP ’s (ADJP crazy) (SBAR (WHNP (WHADJP how much)) (S (NP he) (VP eats)))) .)
PT: a sentence has a simple clause, which has a noun phrase and a verb phrase and ‘‘.’’; the noun phrase has a noun phrase ‘‘It’’, the verb phrase has ‘‘’s’’ and an adjective phrase ‘‘crazy’’ and a subordinating clause, which has a wh-noun phrase and a simple clause; the wh-noun phrase has a wh-adjective phrase ‘‘how much’’, the simple clause has a noun phrase ‘‘he’’ and a verb phrase ‘‘eats’’.

DEP In the following example, LS incorrectly attached ‘‘so out of’’ to ‘‘place’’, and LT wrongly attached ‘‘so’’ to ‘‘looks’’.

I: It looks so out of place.
G: (dependency tree figure)
LS: SH SH LA-nsubj SH SH SH SH LA-advmod LA-advmod LA-advmod RA-acomp SH RA-punct RA-root
LT: It looks LA-nsubj so out of place RA-pobj RA-pcomp RA-prep RA-ccomp . RA-punct RA-root
PT: ‘‘It’’ is a nominal subject of ‘‘looks’’; ‘‘so’’ is an adverbial modifier of ‘‘out’’; ‘‘of’’ has an object of a preposition ‘‘place’’; ‘‘out’’ has a prepositional complement ‘‘of’’; ‘‘looks’’ has a prepositional modifier ‘‘out’’; ‘‘looks’’ has a punctuation ‘‘.’’; ‘‘sentence’’ has a root ‘‘looks’’.

6.3 Design Choices

In the interest of experimentally comparing the schema variants, we would like each design we consider to be equivalent in some systematic way. To this end, we fix other aspects and vary two dimensions of the prompt design, lexicality and verbosity, to isolate the impact of individual variables.

Model        PTB 3          OntoNotes 5
POS-PT       97.64 ± 0.01   98.37 ± 0.02
  dec.LEX    97.63 ± 0.02   98.35 ± 0.03
DEP-PT       94.31 ± 0.09   92.81 ± 0.21
  dec.LEX    93.89 ± 0.18   91.19 ± 0.86

Table 7: Study of lexicality on POS and DEP.

Model        CoNLL 03       OntoNotes 5
NER-PT       93.18 ± 0.04   90.33 ± 0.04
  inc.VRB    92.47 ± 0.03   89.63 ± 0.23

Model        PTB 3          OntoNotes 5
CON-PT       95.34 ± 0.06   94.55 ± 0.03
  inc.VRB    95.19 ± 0.06   94.02 ± 0.49

Table 8: Study of verbosity on NER and CON.

Lexicality We call the portion of textual tokens in a sequence its lexicality. Thus, LS and PT have zero and full lexicality, respectively, while LT falls in the middle. To tease apart the impact of lexicality, we substitute the lexical phrases with the corresponding tag abbreviations in PT on POS and DEP, e.g., ‘‘friend’’ is a noun → ‘‘friend’’ is a NN, and ‘‘friend’’ is a nominal subject of ‘‘bought’’ → ‘‘friend’’ is a nsubj of ‘‘bought’’. Tags are added to the BART vocabulary and learned from scratch, as in LS and LT. As shown in Table 7, decreasing the lexicality of PT marginally degrades the performance of S2S on POS. On DEP, the performance drop is rather significant (e.g., from 92.81 to 91.19 LAS on OntoNotes 5). Similar trends are observed when comparing LT and LS in Section 5, confirming that lexicons play an important role in prompt design.
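Lexicality as defined here is simply a ratio, which the following sketch computes for toy outputs of the three schemas. The function name `lexicality`, the whitespace tokenization, and the tiny tag set are assumptions for illustration; the paper does not specify how the portion is measured at the subword level.

```python
from typing import Iterable, Set


def lexicality(sequence: Iterable[str], special_tokens: Set[str]) -> float:
    """Portion of textual (non-special) tokens in an output sequence.

    An empty sequence is given lexicality 0.0 by convention (an assumption).
    """
    tokens = list(sequence)
    if not tokens:
        return 0.0
    textual = sum(1 for token in tokens if token not in special_tokens)
    return textual / len(tokens)


if __name__ == "__main__":
    tags = {"DT", "NN"}  # toy label vocabulary added as special tokens
    ls_output = ["DT", "NN"]
    lt_output = ["The", "DT", "word", "NN"]
    pt_output = "\"The\" is a determiner ; \"word\" is a singular noun".split()
    print(lexicality(ls_output, tags))  # zero lexicality
    print(lexicality(lt_output, tags))  # in the middle
    print(lexicality(pt_output, tags))  # full lexicality
```

Under this measure, the dec.LEX variant lowers the lexicality of PT because each substituted tag abbreviation counts as a special token rather than a textual one.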

Verbosity Our PT schemas on NER and CON are designed to be as concise as human narrative, and as easy for S2S to generate. Another design choice would be to make them as verbose as some LS and LT schemas. To explore this dimension, we increase the verbosity of NER-PT and CON-PT by adding ‘‘isn’t an entity’’ for all non-entity tokens and substituting each ‘‘which’’ with the phrase it actually refers to, respectively. The results are presented in Table 8. Though increased verbosity would eliminate any ambiguity, unfortunately, it hurts performance. Emphasizing that a token ‘‘isn’t an entity’’ might run into the over-confidence issue, as the boundary annotation might be ambiguous in gold NER data (Zhu and Li, 2022). CON-PT deviates from human language style when reference is forbidden, which eventually makes it lengthy and hard to learn.

6.4 Stratified Analysis

Section 5 shows that our S2S approach performs comparably to most ad-hoc models. To reveal its pros and cons, we further partition the test data using task-specific factors and run tests on the partitions. The stratified performance on OntoNotes 5 is compared to the strong BERT baseline (He and Choi, 2021b), which is representative of non-S2S models implementing many state-of-the-art decoders.

For POS, we consider the rate of Out-Of-Vocabulary tokens (OOV, tokens unseen in the training set) in a sentence as the most significant factor. As illustrated in Figure 1a, the OOV rate degrades the baseline performance rapidly, especially when over half of the tokens in a sentence are OOV. However, all S2S approaches show strong resistance to OOV, suggesting that our S2S models unleash greater potential through transfer learning.
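A stratification of this kind can be sketched as below: each test sentence is assigned to a bucket by its OOV rate, and tagging accuracy is computed per bucket. The function names, the number of buckets, and the `(tokens, gold, predicted)` example format are assumptions for illustration rather than the evaluation code used in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[Sequence[str], Sequence[str], Sequence[str]]  # tokens, gold, predicted


def oov_rate(tokens: Sequence[str], train_vocab: Set[str]) -> float:
    """Fraction of tokens in a sentence that were unseen in training."""
    if not tokens:
        return 0.0
    return sum(token not in train_vocab for token in tokens) / len(tokens)


def accuracy_by_oov_bucket(examples: List[Example], train_vocab: Set[str],
                           num_buckets: int = 4) -> Dict[int, float]:
    """Token-level accuracy grouped by equal-width OOV-rate buckets."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for tokens, gold, predicted in examples:
        if not gold:
            continue  # skip empty sentences to avoid empty buckets
        rate = oov_rate(tokens, train_vocab)
        # A rate of exactly 1.0 falls into the last bucket.
        bucket = min(int(rate * num_buckets), num_buckets - 1)
        correct[bucket] += sum(g == p for g, p in zip(gold, predicted))
        total[bucket] += len(gold)
    return {bucket: correct[bucket] / total[bucket] for bucket in sorted(total)}


if __name__ == "__main__":
    vocab = {"the", "cat"}
    data = [
        (["the", "cat"], ["DT", "NN"], ["DT", "NN"]),
        (["zyx", "qwv"], ["NN", "NN"], ["NN", "JJ"]),
    ]
    print(accuracy_by_oov_bucket(data, vocab))
```

Plotting the per-bucket accuracy of each model against the bucket index gives a curve of the kind described for Figure 1a.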

For NER, entities unseen during training often confuse a model. This negative impact can be observed on the baseline and LS in Figure 1b. However, the other two schemas generating textual tokens, LT and PT, are less severely impacted by unseen entities. This further supports the intuition behind our approach and agrees with the finding by Shin et al. (2021): with the output sequence being closer to natural language, the S2S model has less difficulty generating it even with unseen entities.

Figure 1: Factors impacting each task: the rate of OOV tokens for POS, the rate of unseen entities for NER, the sentence length for CON, and the head-dependent distance for DEP.

Since the number of binary parses for a sentence of n + 1 tokens is the nth Catalan number (Church and Patil, 1982), sentence length is a crucial factor for CON. As shown in Figure 1c, all models, especially LS, perform worse as sentences get longer. Interestingly, by simply recalling all the lexicons, LT easily regains the ability to parse long sentences. Using an even more natural representation, PT outperforms them with a performance on par with the strong baseline. It again supports our intuition that natural language is beneficial for pretrained S2S.
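To give a sense of how quickly the space of binary parses grows with sentence length, the Catalan numbers can be computed directly from the closed form C(n) = (2n choose n) / (n + 1). The helper names below are illustrative assumptions.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the nth Catalan number using the closed form C(2n, n) / (n + 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return comb(2 * n, n) // (n + 1)


def num_binary_parses(num_tokens: int) -> int:
    """Number of binary bracketings of a sentence with `num_tokens` tokens."""
    if num_tokens < 1:
        raise ValueError("a sentence needs at least one token")
    return catalan(num_tokens - 1)


if __name__ == "__main__":
    for length in (2, 4, 10, 20, 40):
        print(length, num_binary_parses(length))
```

A 4-token sentence has 5 binary parses and a 10-token sentence already has 4862, which illustrates why longer sentences are harder for every model.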

For DEP, the distance between each dependent and its head is used to stratify the overall performance. As shown in Figure 1d, the gap between S2S models and the baseline increases with head-dependent distance. The degradation on relatively longer arc-standard transition sequences could be attributed to the static oracle used in finetuning.
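The stratification by head-dependent distance can be sketched as follows. The data layout (one `(head_position, relation)` pair per token, 1-based positions, head 0 for the root) and the decision to leave root attachments out of the distance buckets are assumptions for illustration; the toy arcs are read off the PT verbalization of ‘‘It looks so out of place.’’ in Section 6.2 and stand in for gold annotations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Attachment = Tuple[int, str]  # (head position, relation); head 0 is the root


def las_by_distance(gold: List[Attachment],
                    predicted: List[Attachment]) -> Dict[int, float]:
    """Labeled attachment score grouped by gold head-dependent distance."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for position, (gold_arc, predicted_arc) in enumerate(zip(gold, predicted), start=1):
        gold_head, _ = gold_arc
        if gold_head == 0:
            continue  # root attachment has no meaningful distance (assumption)
        distance = abs(position - gold_head)
        total[distance] += 1
        if gold_arc == predicted_arc:
            correct[distance] += 1
    return {distance: correct[distance] / total[distance]
            for distance in sorted(total)}


if __name__ == "__main__":
    # Tokens: It(1) looks(2) so(3) out(4) of(5) place(6) .(7)
    gold_arcs = [(2, "nsubj"), (0, "root"), (4, "advmod"), (2, "prep"),
                 (4, "pcomp"), (5, "pobj"), (2, "punct")]
    # Same arcs, except that "so" is wrongly attached to "looks".
    predicted_arcs = list(gold_arcs)
    predicted_arcs[2] = (2, "advmod")
    print(las_by_distance(gold_arcs, predicted_arcs))
```

Aggregating these per-distance scores over a whole test set for each model yields curves of the kind compared in Figure 1d.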

Comparing the three schemas across all subgroups, LS uses the most special tokens but performs the worst, while PT uses zero special tokens and outperforms the other two. This suggests that special tokens could harm the performance of the pretrained S2S model, as they introduce a mismatch between pretraining and finetuning. With zero special tokens, PT is most similar to natural language, and it also introduces no extra parameters in finetuning, leading to better performance.

7 Conclusion

We aim to unleash the true potential of S2S models for sequence tagging and structure parsing. To this end, we develop S2S methods that rival state-of-the-art approaches more complicated than ours, without substantial task-specific architecture modifications. Our experiments with three novel prompting schemas on four core NLP tasks demonstrated the effectiveness of natural language in S2S outputs. Our systematic analysis revealed the pros and cons of S2S models, calling for more exploration of structure prediction with S2S.

Our proposed S2S approach reduces the need for many heavily engineered task-specific architectures. It can be readily extended to multi-task and few-shot learning. We have a vision of S2S playing an integral role in more language understanding and generation systems. The limitation of our approach is its relatively slow decoding speed due to serial generation. This issue can be mitigated with non-autoregressive generation and model compression techniques in the future.

Acknowledgments

We would like to thank Emily Pitler, Cindy
Robinson, Ani Nenkova, and the anonymous
TACL reviewers for their insightful and thoughtful
feedback on the early drafts of this paper.

References

Alan Akbik, Tanja Bergmann, and Roland Vollgraf. 2019. Pooled contextualized embeddings for named entity recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 724–728, Minneapolis, Minnesota. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1078

Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual String Embeddings for Sequence Labeling. In Proceedings of the 27th International Conference on Computational Linguistics, COLING’18, pages 1638–1649.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings.

Xuefeng Bai, Yulong Chen, and Yue Zhang. 2022. Graph pre-training for AMR parsing and generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6001–6015, Dublin, Ireland. Association for Computational Linguistics.

Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1415–1425, Baltimore, Maryland. Association for Computational Linguistics. https://doi.org/10.3115/v1/P14-1133

Michele Bevilacqua, Rexhina Blloshmi, and Roberto Navigli. 2021. One spring to rule them both: Symmetric AMR semantic parsing and generation without a complex pipeline. In Proceedings of AAAI. https://doi.org/10.1609/aaai.v35i14.17489

Bernd Bohnet, Ryan McDonald, Gonçalo Simões, Daniel Andor, Emily Pitler, and Joshua Maynez. 2018. Morphosyntactic tagging with a Meta-BiLSTM model over context sensitive token encodings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL’18, pages 2642–2652. https://doi.org/10.18653/v1/P18-1246

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.

Jiawei Chen, Qing Liu, Hongyu Lin, Xianpei Han, and Le Sun. 2022. Few-shot named entity recognition with self-describing networks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5711–5722, Dublin, Ireland. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.392

Lingzhen Chen and Alessandro Moschitti. 2018. Learning to progressively recognize new named entities with sequence to sequence models. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2181–2191, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Jason Chiu and Eric Nichols. 2016. Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4:357–370. https://doi.org/10.1162/tacl_a_00104

Chinmay Choudhary and Colm O’riordan. 2021. End-to-end mBERT based seq2seq enhanced dependency parser with linguistic typology knowledge. In Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021), pages 225–232, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.iwpt-1.24

Kenneth Church and Ramesh Patil. 1982. Coping with syntactic ambiguity or how to put the block in the box on the table. American Journal of Computational Linguistics, 8(3–4):139–149.

Kevin Clark, Minh-Thang Luong, Christopher D. Manning, and Quoc Le. 2018. Semi-supervised sequence modeling with cross-view training. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1914–1925, Brussels, Belgium. Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1217

Leyang Cui, Yu Wu, Jian Liu, Sen Yang, and Yue Zhang. 2021. Template-based named entity recognition using BART. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1835–1845, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-acl.161

Marie-Catherine De Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In COLING 2008: Proceedings of the Workshop on Cross-framework and Cross-domain Parser Evaluation, pages 1–8. https://doi.org/10.3115/1608858.1608859

Daniel Deutsch, Shyam Upadhyay, and Dan Roth. 2019. A general-purpose algorithm for constrained sequential inference. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 482–492, Hong Kong, China. Association for Computational Linguistics. https://doi.org/10.18653/v1/K19-1045

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Daniel Fernández-González and Carlos Gómez-Rodríguez. 2020. Enriched in-order linearization for faster sequence-to-sequence constituent parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4092–4099, Online. Association for Computational Linguistics.

Abbas Ghaddar and Phillippe Langlais. 2018. Robust lexical features for improved neural network named-entity recognition. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1896–1907, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Yoav Goldberg and Joakim Nivre. 2012. A dynamic oracle for arc-eager dependency parsing. In Proceedings of COLING 2012, pages 959–976, Mumbai, India. The COLING 2012 Organizing Committee.

Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S. Liang. 2018. A retrieve-and-edit framework for predicting structured outputs. Advances in Neural Information Processing Systems, 31.

Han He and Jinho Choi. 2020. Establishing strong baselines for the new decade: Sequence tagging, syntactic and semantic parsing with bert. In The Thirty-Third International Flairs Conference.

Han He and Jinho D. Choi. 2021a. Levi graph AMR parser using heterogeneous attention. In Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021), pages 50–57, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.iwpt-1.5

Han He and Jinho D. Choi. 2021b. The stem cell hypothesis: Dilemma behind multi-task learning with transformer encoders. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5555–5577, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.451

Chris Hokamp and Qun Liu. 2017. Lexically constrained decoding for sequence generation using grid beam search. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, Vancouver, Canada. Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-1141

Robin Jia and Percy Liang. 2016. Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12–22, Berlin, Germany. Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-1002

John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, pages 282–289.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 260–270, San Diego, California. Association for Computational Linguistics. https://doi.org/10.18653/v1/N16-1030

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.703

Peng-Hsuan Li, Ruo-Ping Dong, Yu-Siang Wang, Ju-Chieh Chou, and Wei-Yun Ma. 2017. Leveraging linguistic structures for named entity recognition with bidirectional recursive neural networks. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2664–2669, Copenhagen, Denmark. Association for Computational Linguistics.

Zuchao Li, Jiaxun Cai, Shexia He, and Hai Zhao. 2018. Seq2seq dependency parsing. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3203–3214, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Jiangming Liu and Yue Zhang. 2017. In-order transition-based constituent parsing. Transactions of the Association for Computational Linguistics, 5:413–424. https://doi.org/10.1162/tacl_a_00070

Chunpeng Ma, Lemao Liu, Akihiro Tamura, Tiejun Zhao, and Eiichiro Sumita. 2017. Deterministic attention for sequence-to-sequence constituent parsing. In Thirty-First AAAI Conference on Artificial Intelligence. https://doi.org/10.1609/aaai.v31i1.10967

Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330. https://doi.org/10.21236/ADA273556

Khalil Mrini, Franck Dernoncourt, Quan Hung Tran, Trung Bui, Walter Chang, and Ndapa Nakashole. 2020. Rethinking self-attention: Towards interpretability in neural parsing. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 731–742, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.65

Joakim Nivre. 2004. Incrementality in deterministic dependency parsing. In Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together, pages 50–57, Barcelona, Spain. Association for Computational Linguistics. https://doi.org/10.3115/1613148.1613156

Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, and Stefano Soatto. 2021. Structured prediction as translation between augmented natural languages. In International Conference on Learning Representations.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202

Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Björkelund, Olga Uryupina, Yuchen Zhang, and Zhi Zhong. 2013. Towards Robust Linguistic Analysis using OntoNotes. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 143–152, Sofia, Bulgaria. Association for Computational Linguistics.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.

Kenji Sagae and Alon Lavie. 2005. A classifier-based parser with linear run-time complexity. In Proceedings of the Ninth International Workshop on Parsing Technology, pages 125–132, Vancouver, British Columbia. Association for Computational Linguistics. https://doi.org/10.3115/1654494.1654507

Richard Shin, Christopher Lin, Sam Thomson, Charles Chen, Subhro Roy, Emmanouil Antonios Platanios, Adam Pauls, Dan Klein, Jason Eisner, and Benjamin Van Durme. 2021. Constrained language models yield few-shot semantic parsers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7699–7715, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.608

Jana Straková, Milan Straka, and Jan Hajic. 2019. Neural architectures for nested NER through linearization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5326–5331, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1527

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 27.

Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 142–147. https://doi.org/10.3115/1119176.1119195

Bo-Hsiang Tseng, Jianpeng Cheng, Yimai Fang, and David Vandyke. 2020. A generative model for joint natural language understanding and generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1795–1807, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.163

Ashish Vaswani, Noam Shazeer, Niki Parmar,
Jakob Uszkoreit, Llion Jones, Aidan N.
Gomez, Łukasz Kaiser, and Illia Polosukhin.
2017. Attention is all you need. In Advances
in Neural Information Processing Systems,
pages 5998–6008.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015a. Pointer networks. In Advances in Neural Information Processing Systems, volume 28.

Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015b. Grammar as a foreign language. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc.

Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, and Kewei Tu. 2021. Automated concatenation of embeddings for structured prediction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2643–2660, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.206

Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, and Ann Houston. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA.

Sam Wiseman and Alexander M. Rush. 2016. Sequence-to-sequence learning as beam-search optimization. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1296–1306, Austin, Texas. Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1137

Lu Xu, Zhanming Jie, Wei Lu, and Lidong Bing. 2021. Better feature integration for named entity recognition. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3457–3469, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.271

Ikuya Yamada, Akari Asai, Hiroyuki Shindo,
Hideaki Takeda, and Yuji Matsumoto. 2020.
LUKE: Deep contextualized entity repre-
sentations with entity-aware self-attention.
Negli Atti di
IL 2020 Conference on
Empirical Methods in Natural Language Pro-
cessazione (EMNLP), pages 6442–6454, Online.
Associazione per la Linguistica Computazionale.
https://doi.org/10.18653/v1/2020
.emnlp-main.523

Hang Yan, Tao Gui, Junqi Dai, Qipeng Guo,
Zheng Zhang, and Xipeng Qiu. 2021. A uni-
fied generative framework for various NER
subtasks. In Proceedings of the 59th Annual
Riunione dell'Associazione per il Computazionale
Linguistics and the 11th International Joint
Conferenza sull'elaborazione del linguaggio naturale
(Volume 1: Documenti lunghi), pages 5808–5822,
Online. Association for Computational Lin-
guistics. https://doi.org/10.18653/v1
/2021.acl-long.451

Songlin Yang and Kewei Tu. 2022. Bottom-up
constituency parsing and nested named entity
recognition with pointer networks. In
Proceedings of the 60th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 2403–2416,
Dublin, Ireland. Association for Computational
Linguistics.
https://doi.org/10.18653/v1/2022.acl-long.171

Deming Ye, Yankai Lin, Peng Li, and Maosong
Sun. 2022. Packed levitated marker for entity
and relation extraction. In Proceedings of the
60th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long
Papers), pages 4904–4917, Dublin, Ireland.
Association for Computational Linguistics.

Juntao Yu, Bernd Bohnet, and Massimo Poesio.
2020. Named entity recognition as dependency
parsing. In Proceedings of the 58th Annual
Meeting of the Association for Computational
Linguistics, pages 6470–6476, Online.
Association for Computational Linguistics.
https://doi.org/10.18653/v1/2020.acl-main.577

Zhirui Zhang, Shujie Liu, Mu Li, Ming
Zhou, and Enhong Chen. 2017. Stack-based
multi-layer attention for transition-based
dependency parsing. In Proceedings of the 2017
Conference on Empirical Methods in Natural
Language Processing, pages 1677–1682,
Copenhagen, Denmark. Association for
Computational Linguistics.
https://doi.org/10.18653/v1/D17-1175

Enwei Zhu and Jinpeng Li. 2022. Boundary
smoothing for named entity recognition. In
Proceedings of the 60th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 7096–7108,
Dublin, Ireland. Association for Computational
Linguistics.
https://doi.org/10.18653/v1/2022.acl-long.490

Huiming Zhu, Chunhui He, Yang Fang, and
Weidong Xiao. 2020. Fine grained named entity
recognition via seq2seq framework. IEEE
Access, 8:53953–53961.
https://doi.org/10.1109/ACCESS.2020.2980431

Muhua Zhu, Yue Zhang, Wenliang Chen, Min
Zhang, and Jingbo Zhu. 2013. Fast and accurate
shift-reduce constituent parsing. In Proceedings
of the 51st Annual Meeting of the Association
for Computational Linguistics (Volume 1:
Long Papers), pages 434–443, Sofia, Bulgaria.
Association for Computational Linguistics.
