Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner.

Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner.

Submitted 6/2013; Published 10/2013. c
(cid:13)

2013 Association for Computational Linguistics.

TrainingDeterministicParserswithNon-DeterministicOraclesYoavGoldbergBar-IlanUniversityDepartmentofComputerScienceRamat-Gan,Israelyoav.goldberg@gmail.comJoakimNivreUppsalaUniversityDepartmentofLinguisticsandPhilologyUppsala,Swedenjoakim.nivre@lingfil.uu.seAbstractGreedytransition-basedparsersareveryfastbuttendtosufferfromerrorpropagation.Thisproblemisaggravatedbythefactthattheyarenormallytrainedusingoraclesthataredeter-ministicandincompleteinthesensethattheyassumeauniquecanonicalpaththroughthetransitionsystemandareonlyvalidaslongastheparserdoesnotstrayfromthispath.Inthispaper,wegiveageneralcharacterizationoforaclesthatarenondeterministicandcom-plete,presentamethodforderivingsuchora-clesfortransitionsystemsthatsatisfyaprop-ertywecallarcdecomposition,andinstanti-atethismethodforthreewell-knowntransi-tionsystemsfromtheliterature.Wesaythattheseoraclesaredynamic,becausetheyallowustodynamicallyexplorealternativeandnon-optimalpathsduringtraining–incontrasttooraclesthatstaticallyassumeauniqueoptimalpath.Experimentalevaluationonawiderangeofdatasetsclearlyshowsthatusingdynamicoraclestotraingreedyparsersgivessubstan-tialimprovementsinaccuracy.Moreover,thisimprovementcomesatnocostintermsofefficiency,unlikeothertechniqueslikebeamsearch.1IntroductionGreedytransition-basedparsersareeasytoimple-mentandareveryefficient,buttheyaregenerallynotasaccurateasparsersthatarebasedonglobalsearch(McDonaldetal.,2005;KooandCollins,2010)orastransition-basedparsersthatusebeamsearch(ZhangandClark,2008)ordynamicpro-gramming(HuangandSagae,2010;Kuhlmannetal.,2011).Thisworkispartofalineofresearchtryingtopushtheboundariesofgreedyparsingandnarrowtheaccuracygapof2–3%betweensearch-basedandgreedyparsers,whilemaintainingtheef-ficiencyandincrementalnatureofgreedyparsers.Onereasonfortheloweraccuracyofgreedyparsersiserrorpropagation:oncetheparsermakesanerrorindecoding,moreerrorsarelikelytofol-low.Thisbehavioriscloselyrelatedtothewayinwhichgreedyparsersarenormallytrained.Givenatreebankoracle,agoldsequenceoftransitionsisderived,andapredictoristrainedtopredicttransi-tionsalongthisgoldsequence,withoutconsideringanyparserstateoutsidethissequence.Thus,oncetheparserstraysfromthegoldenpathattesttime,itventuresintounknownterritoryandisforcedtoreacttosituationsithasneverbeentrainedfor.Inrecentwork(GoldbergandNivre,2012),weintroducedtheconceptofadynamicoracle,whichisnon-deterministicandnotrestrictedtoasinglegoldenpath,butinsteadprovidesoptimalpredic-tionsforanypossiblestatetheparsermightbein.Dynamicoraclesarenon-deterministicinthesensethattheyreturnasetofvalidtransitionsforagivenparserstateandgoldtree.Moreover,theyarewell-definedandoptimalalsoforstatesfromwhichthegoldtreecannotbederived,inthesensethattheyreturnthesetoftransitionsleadingtothebesttreederivablefromeachstate.Weshowedexperimen-tallythat,usingadynamicoracleforthearc-eagertransitionsystem(Nivre,2003),agreedyparsercanbetrainedtoperformwellalsoafterincurringamis-take,thusalleviatingtheeffectoferrorpropagationandresultinginconsistentlybetterparsingaccuracy.

l

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

p

:
/
/

d
i
r
e
c
t
.

m

i
t
.

e
d
u

/
t

a
c
l
/

l

a
r
t
i
c
e

p
d

f
/

d
o

i
/

.

1
0
1
1
6
2

/
t

l

a
c
_
a
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
t

l

a
c
_
a
_
0
0
2
3
7
p
d

.

f

b
y
g
u
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

404

Inthispaper,weextendtheworkofGoldbergandNivre(2012)bygivingageneralcharacteri-zationofdynamicoraclesasoraclesthatarenon-deterministic,inthattheyreturnsetsoftransitions,andcomplete,inthattheyaredefinedforallpossiblestates.Wethendefineaformalpropertyoftransitionsystemswhichwecallarcdecomposition,andin-troduceaframeworkforderivingdynamicoraclesforarc-decomposablesystems.Usingthisframe-work,wederivenoveldynamicoraclesforthehy-brid(Kuhlmannetal.,2011)andeasy-first(Gold-bergandElhadad,2010)transitionsystems,whicharearc-decomposable(asisthearc-eagersystem).Wealsoshowthatthepopulararc-standardsystem(Nivre,2004)isnotarc-decomposable,andsoderiv-ingadynamicoracleforitremainsanopenresearchquestion.Finally,weperformasetofexperimentsontheCoNLL2007datasets,validatingthattheuseofdynamicoraclesforexploringstatesthatresultfromparsingmistakesduringtrainingisbeneficialacrosstransitionsystems.2Transition-BasedDependencyParsingWebeginwithaquickreviewoftransition-baseddependencyparsing,presentingthearc-eager,arc-standard,hybridandeasy-firsttransitionssystemsinacommonnotation.Thetransition-basedpars-ingframework(Nivre,2008)assumesatransitionsystem,anabstractmachinethatprocessessentencesandproducesparsetrees.Thetransitionsystemhasasetofconfigurationsandasetoftransitionswhichareappliedtoconfigurations.Whenparsingasen-tence,thesystemisinitializedtoaninitialconfigu-rationbasedontheinputsentence,andtransitionsarerepeatedlyappliedtothisconfiguration.Afterafinitenumberoftransitions,thesystemarrivesataterminalconfiguration,andaparsetreeisreadofftheterminalconfiguration.Inagreedyparser,aclas-sifierisusedtochoosethetransitiontotakeineachconfiguration,basedonfeaturesextractedfromtheconfigurationitself.Transitionsystemsdifferbythewaytheydefineconfigurations,andbytheparticularsetoftransitionsavailable.2.1DependencyTreesWedefineadependencytreeforasentenceW=w1,…,wntobealabeleddirectedtreeT=(V,A),whereV={w1,…,wn}isasetofnodesgivenbythetokensoftheinputsentence,andA⊆V×L×V(forsomedependencylabelsetL)isasetoflabeleddirectedarcsoftheform(h,lb,d),whereh∈Vissaidtobethehead,d∈Vthedependent,andlb∈Lthedependencylabel.Whendealingwithunlabeledparsing,orwhenthelabelidentityisirrelevant,wetakeA⊆V×Vtobeasetofordinarydirectedarcsoftheform(h,d).Notethat,sincethenodesofthetreearegivenbytheinputsentence,adependencytreeT=(V,A)forasentenceWisuniquelydefinedbythearcsetA.Forconvenience,wewillthereforeequatethetreewiththearcsetandandusethesymbolTforthelatter,reservingthesymbolAforarcsetsthatarenotnecessarilytrees.Inthecontextofthisworkitisassumedthatallthedependencytreesareprojective.Althoughthegeneraldefinitionofadependencytreedoesnotmakeanyassumptionsaboutwhichnodeistherootofthetree,itiscommonpracticeindependencyparsingtoaddadummynodeROOT,whichisprefixedorsuffixedtothesentenceandwhichalwaysactsastherootofthetree.Wewillfollowthispracticeinourdescriptionofdifferenttransitionsystemsbelow.2.2TransitionSystemsArc-EagerInthearc-eagersystem(Nivre,2003),aconfigurationc=(σ,β,A)consistsofastackσ,abufferβ,andasetAofdependencyarcs.1GivenasentenceW=w1,…,wn,thesystemisinitializedwithanemptystack,anemptyarcset,andβ=w1,…,wn,ROOT,whereROOTisthespecialrootnode.AnyconfigurationcwithanemptystackandabuffercontainingonlyROOTisterminal,andtheparsetreeisgivenbythearcsetAcofc.2Thesystemhas4transitions:RIGHTlb,LEFTlb,SHIFT,REDUCE,definedasfollows:SHIFT[(σ,b|β,A)]=(σ|b,β,A)RIGHTlb[(σ|s,b|β,A)]=(σ|s|b,β,A∪{(s,lb,b)})LEFTlb[(σ|s,b|β,A)]=(σ,b|β,A∪{(b,lb,s)})REDUCE[(σ|s,β,A)]=(σ,β,A)1Weuseσ|xtodenoteastackwithtopelementxandre-mainderσ,andx|βtodenoteabufferwithaheadxfollowedbytheelementsinβ.2ThisdefinitionofaterminalconfigurationdiffersfromthatinNivre(2003)butguaranteesthatthesetAcisadependencytreerootedinROOT.

l

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

p

:
/
/

d
i
r
e
c
t
.

m

i
t
.

e
d
u

/
t

a
c
l
/

l

a
r
t
i
c
e

p
d

f
/

d
o

i
/

.

1
0
1
1
6
2

/
t

l

a
c
_
a
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
t

l

a
c
_
a
_
0
0
2
3
7
p
d

.

f

b
y
g
u
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

405

ThereisapreconditionontheRIGHTandSHIFTtransitionstobelegalonlywhenb6=ROOT,andforLEFT,RIGHTandREDUCEtobelegalonlywhenthestackisnon-empty.Moreover,LEFTisonlyle-galwhensdoesnothaveaparentinA,andREDUCEwhensdoeshaveaparentinA.Ingeneral,weuseLEGAL(c)torefertothesetoftransitionsthatarele-galinaconfigurationc.Thearc-eagersystembuildstreeseagerlyinthesensethatarcsareaddedattheearliesttimepossible.Inaddition,eachwordwillcollectallofitsleftdependentsbeforecollectingitsrightdependents.Arc-StandardThearc-standardsystem(Nivre,2004)hasconfigurationsofthesameformc=(σ,β,A)asthearc-eagersystem.Theinitialcon-figurationforasentenceW=w1,…,wnhasanemptystackandarcsetandβ=ROOT,w1,…,wn.AconfigurationcisterminalifithasanemptybufferandastackcontainingthesinglenodeROOT;theparsetreeisgivenbyAc.Thesystemhas3transi-tions:RIGHTlb,LEFTlb,SHIFT,definedasfollows:SHIFT[(σ,b|β,A)]=(σ|b,β,A)RIGHTlb[(σ|s1|s0,β,A)]=(σ|s1,β,A∪{(s1,lb,s0)})LEFTlb[(σ|s1|s0,β,A)]=(σ|s0,β,A∪{(s0,lb,s1)})ThereisapreconditionontheLEFTtransitiontobelegalonlywhens16=ROOT,andforLEFTandRIGHTtobelegalonlywhenthestackhasatleasttwoelements.Thearc-standardsystembuildstreesinabottom-upfashion:eachwordmustcollectallitsdependentsbeforebeingattachedtoitshead.Thesystemdoesnotposeanyrestrictionwithregardtotheorderofcollectingleftandrightdependents.HybridThehybridsystem(Kuhlmannetal.,2011)hasthesameconfigurationsandthesameinitializationandterminationconditionsasthearc-standardsystem.Thesystemhas3transitions:RIGHTlb,LEFTlb,SHIFT,definedasfollows:SHIFT[(σ,b|β,A)]=(σ|b,β,A)RIGHTlb[(σ|s1|s0,β,A)]=(σ|s1,β,A∪{(s1,lb,s0)})LEFTlb[(σ|s,b|β,A)]=(σ,b|β,A∪{(b,lb,s)})ThereisapreconditiononRIGHTtobelegalonlywhenthestackhasatleasttwoelements,andonLEFTtobelegalonlywhenthestackisnon-emptyands6=ROOT.Thehybridsystemcanbeseenasacombinationofthearc-standardandarc-eagerAlgorithm1Greedytransition-basedparsing1:Input:sentenceW,parameter-vectorw2:c←INITIAL(W)3:whilenotTERMINAL(c)do4:tp←argmaxt∈LEGAL(c)w·φ(c,t)5:c←tp(c)6:returnAcsystems,usingtheLEFTactionofarc-eagerandtheRIGHTactionofarc-standard.Likearc-standard,itbuildstreesinabottom-upfashion.Butlikearc-eager,itrequiresawordtocollectallitsleftdepen-dentsbeforecollectinganyrightdependent.Easy-FirstIntheeasy-firstsystem(GoldbergandElhadad,2010),aconfigurationc=(λ,A)consistsofalistλandasetAofdependencyarcs.Weuselitodenotetheithmemberofλandwrite|λ|forthelengthofλ.GivenasentenceW=w1,…,wn,thesystemisinitializedwithanemptyarcsetandλ=ROOT,w1,…,wn.Aconfigurationcister-minalwithparsetreeAcifλ=ROOT.Thesetoftransitionsforagivenconfigurationc=(λ,A)is:{LEFTilb|1kandRAND()0withrespecttoatreeTifoneofthefollowingholds:12Noteagainthats0maybemissinginthecaseofSHIFTcaseands1inthecaseofSHIFTandLEFT.

l

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

p

:
/
/

d
i
r
e
c
t
.

m

i
t
.

e
d
u

/
t

a
c
l
/

l

a
r
t
i
c
e

p
d

f
/

d
o

i
/

.

1
0
1
1
6
2

/
t

l

a
c
_
a
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
t

l

a
c
_
a
_
0
0
2
3
7
p
d

.

f

b
y
g
u
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

412

0.3.6.9012345arabic81.6283.880.3.6.9012345basque75.1976.180.3.6.9012345catalan91.2492.100.3.6.9012345chinese84.8485.890.3.6.9012345czech79.0380.370.3.6.9012345english86.2988.690.3.6.9012345greek79.3681.190.3.6.9012345hungarian76.3577.580.3.6.9012345italian84.0184.510.3.6.9012345turkish77.0277.67Figure1:Effectofk(yaxis)andp(xaxis)valuesonparsingaccuraciesforthearc-eagersystemonthevariousCoNLL-2007shared-tasklanguages.EachpointisanaverageUASof5runswithdifferentseeds.Thegeneraltrendisthatsmallerkandhigherparebetter.(1)itaddsanarc(h,d)suchthat(h0,d)∈Tforsomeh0∈λ,h06=h;(2)itaddsanarc(h,d)suchthat(d,d0)∈Tforsomed0∈λ.Theexactcostcanbecalculatedbycountingthenumberofsucharcs.5ExperimentsandResultsSetup,dataandparametersThegoalofourex-perimentsistoevaluatetheutilityofthedynamicoraclesfortraining,bycomparingatrainingsce-nariowhichonlyseesconfigurationsthatcanleadtothegoldtree(followingastaticoraclefortheleft-to-rightsystemsandanon-deterministicbutin-completeoraclefortheeasy-firstsystem),againstatrainingscenariothatinvolvesexplorationofincor-rectstates,usingthedynamicoracles.Asourtrainingalgorithminvolvesarandomcom-ponent(weshufflethesentencespriortoeachitera-tion,andrandomlyselectwhethertofollowacor-rectorincorrectaction),weevaluateeachsetupfivetimesusingdifferentrandomseeds,andreporttheaveragedresults.Weperformalloftheexperimentsonthemulti-lingualCoNLL-2007datasets.Weuse15trainingiterationsfortheleft-to-rightparsers,and20trainingiterationsfortheeasy-firstparser.Weusethestan-dardperceptronupdateasourupdateruleintraining,andusetheaveragedweightvectorforpredictionintesttime.Thefeaturesetsdifferbytransitionsys-tembutarekeptthesameacrossdatasets.Theex-actfeature-setdefinitionsforthedifferentsystemsareavailableintheaccompanyingsoftware,whichisavailableonlineatthefirstauthor’shomepage.EffectofexplorationparametersInaninitialsetofexperiments,weinvestigatetheeffectoftheex-plorationparameterskandponthearc-eagersys-tem.TheresultsarepresentedinFigure1.Whiletheoptimalparametersvarybydataset,thereisacleartrendtowardlowervaluesofkandhighervaluesofp.ThisisconsistentwiththereportofGoldbergandNivre(2012)whousedafixedsmallvalueofkandlargevalueofpthroughouttheirexperiments.TrainingwithexplorationforthevarioussystemsForthesecondexperiment,inwhichwecomparedtrainingwithastaticoracletotrainingwithexplo-ration,wefixedtheexplorationparameterstok=1andp=0.9foralldatasetsandtransition-systemcombinations.Theresultsintermsoflabeledaccu-racies(fortheleft-to-rightsystems)andunlabeledaccuracies(forallsystems)arepresentedinTable1.Trainingwithexplorationusingthedynamicoraclesyieldsimprovedaccuracyforthevastmajorityofthesetups.Thenotableexceptionsarethearc-eagerandeasy-firstsystemsforunlabeledItalianandthearc-hybridsysteminCatalan,whereweobserveasmalldropinaccuracy.However,wecansafelyconcludethattrainingwithexplorationisbeneficialandnotethatwemaygetevenfurthergainsinthefutureusingbettermethodsfortuningtheexplorationparametersorbettertrainingmethods.

l

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

p

:
/
/

d
i
r
e
c
t
.

m

i
t
.

e
d
u

/
t

a
c
l
/

l

a
r
t
i
c
e

p
d

f
/

d
o

i
/

.

1
0
1
1
6
2

/
t

l

a
c
_
a
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
t

l

a
c
_
a
_
0
0
2
3
7
p
d

.

f

b
y
g
u
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

413

system/languagehungarianchinesegreekczechbasquecatalanenglishturkisharabicitalianUASeager:static76.4285.0179.5378.7075.1491.3086.1077.3881.5984.40eager:dynamic77.4885.8980.9880.2575.9792.0288.6977.3983.6284.30hybrid:static76.3984.9679.4079.7173.1891.3086.4375.9183.4383.43hybrid:dynamic77.5485.1080.4980.0773.7091.0687.6276.9084.0483.83easyfirst:static81.2787.0181.2882.0075.0192.5088.5778.9282.7385.31easyfirst:dynamic81.5287.4882.2582.3975.8792.8589.4179.2983.7085.11LASeager:static66.7281.2472.4471.0865.3486.0284.9366.5972.1080.17eager:dynamic68.4182.2373.8172.9966.6386.9387.6967.0573.9280.43hybrid:static66.5480.1770.9971.8862.8485.5784.9664.8073.1678.78hybrid:dynamic68.0580.5972.0772.1563.5285.4786.2866.1274.1079.25Table1:ResultsontheCoNLL2007dataset.UAS,includingpunctuation.Eachnumberisanaverageover5runswithdifferentrandomizationseeds.Allexperimentsusedthesameexplorationparametersofk=1,p=0.9.6RelatedWorkTheerrorpropagationproblemforgreedytransition-basedparsingwasdiagnosedbyMcDonaldandNivre(2007)andhasbeentackledwithavarietyoftechniquesincludingparserstacking(NivreandMc-Donald,2008;Martinsetal.,2008)andbeamsearchandstructuredprediction(ZhangandClark,2008;ZhangandNivre,2011).Thetechniquecalledboot-strappinginChoiandPalmer(2011)issimilarinspirittotrainingwithexplorationbutisappliediter-ativelyinbatchmodeandisonlyapproximateduetotheuseofstaticoracles.DynamicoracleswerefirstexploredbyGoldbergandNivre(2012).Inmachinelearningmoregenerally,ourapproachcanbeseenasaproblem-specificinstanceofimita-tionlearning(AbbeelandNg,2004;Vlachos,2012;Heetal.,2012;Daum´eIIIetal.,2009;Rossetal.,2011),wherethedynamicoracleisusedtoim-plementtheoptimalexpertneededintheimitationlearningsetup.Indeed,ourtrainingprocedureiscloselyrelatedtoDAgger(Rossetal.,2011),whichalsotrainsaclassifiertomatchanexpertonadis-tributionofpossiblysuboptimalstatesobtainedbyrunningthesystemitself.OurtrainingprocedurecanbeviewedasanonlineversionofDAgger(Heetal.,2012)withtwoextensions:First,ourlearn-ingalgorithminvolvesastochasticpolicyparame-terizedbyk,pforchoosingbetweentheoracleorthemodelprediction,whereasDAggeralwaysfol-lowsthesystem’sownprediction(essentiallyrun-ningwithk=0,p=1).TheheatmapsinFigure1showthatthisparameterizationisbeneficial.Sec-ond,whileDAggerassumesanexpertprovidingasinglelabelateachstate,ouroracleisnondetermin-isticandallowsmultiplecorrectlabels(transitions)whichourtrainingproceduretie-breaksaccordingtothemodel’scurrentprediction,atechniquethathasrecentlybeenproposedinanextensiontoDAggerbyHeetal.(2012).Otherrelatedapproachesinthemachinelearningliteratureincludestackedsequen-tiallearning(CohenandCarvalho,2005),LaSO(Daum´eIIIandMarcu,2005),Searn(Daum´eIIIetal.,2009)andSMILe(RossandBagnell,2010).7ConclusionInthispaper,wehaveextendedtheworkondynamicoraclespresentedinGoldbergandNivre(2012)inseveraldirectionsbygivingformalcharacterizationsofnon-deterministicandcompleteoracles,definingthearc-decompositionpropertyfortransitionsys-tems,andusingthispropertytoderivenovelcom-pletenon-deterministicoraclesforthehybridandeasy-firstsystems(aswellasacorrectedoracleforthearc-eagersystem).Wehavethenusedthecom-pletenessoftheseneworaclestoimprovethetrain-ingprocedureofgreedyparserstoincludeexplo-rationsofconfigurationswhichresultfromincor-recttransitions.Forallthreetransitionsystems,wegetsubstantialaccuracyimprovementsonmanylan-guages.Asthechangesalltakeplaceattrainingtime,theveryfastrunningtimeofthegreedyalgo-rithmattesttimeismaintained.

l

D
o
w
n
o
a
d
e
d

f
r
o
m
h

t
t

p

:
/
/

d
i
r
e
c
t
.

m

i
t
.

e
d
u

/
t

a
c
l
/

l

a
r
t
i
c
e

p
d

f
/

d
o

i
/

.

1
0
1
1
6
2

/
t

l

a
c
_
a
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
t

l

a
c
_
a
_
0
0
2
3
7
p
d

.

f

b
y
g
u
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

414

ReferencesPieterAbbeelandAndrewYNg.2004.Apprenticeshiplearningviainversereinforcementlearning.InPro-ceedingsofthe21stInternationalConferenceonMa-chineLearning(ICML),page1.JinhoD.ChoiandMarthaPalmer.2011.Gettingthemostoutoftransition-baseddependencyparsing.InProceedingsofthe49thAnnualMeetingoftheAsso-ciationforComputationalLinguistics:HumanLan-guageTechnologies,pages687–692.WilliamW.CohenandVitorR.Carvalho.2005.Stackedsequentiallearning.InProceedingsoftheInter-nationalJointConferenceonArtificialIntelligence,pages671–676.HalDaum´eIIIandDanielMarcu.2005.Learningassearchoptimization:Approximatelargemarginmeth-odsforstructuredprediction.InProceedingsofthe22ndInternationalConferenceonMachineLearning,pages169–176.HalDaum´eIII,JohnLangford,andDanielMarcu.2009.Search-basedstructuredprediction.MachineLearn-ing,75:297–325.YoavGoldbergandMichaelElhadad.2010.Aneffi-cientalgorithmforeasy-firstnon-directionaldepen-dencyparsing.InHumanLanguageTechnologies:The2010AnnualConferenceoftheNorthAmericanChapteroftheAssociationforComputationalLinguis-tics(NAACLHLT),pages742–750.YoavGoldbergandJoakimNivre.2012.Adynamicor-acleforarc-eagerdependencyparsing.InProceed-ingsofthe24thInternationalConferenceonCompu-tationalLinguistics(COLING),pages959–976.HeHe,HalDaum´eIII,andJasonEisner.2012.Imitationlearningbycoaching.InAdvancesinNeuralInforma-tionProcessingSystems25.LiangHuangandKenjiSagae.2010.Dynamicprogram-mingforlinear-timeincrementalparsing.InProceed-ingsofthe48thAnnualMeetingoftheAssociationforComputationalLinguistics(ACL),pages1077–1086.TerryKooandMichaelCollins.2010.Efficientthird-orderdependencyparsers.InProceedingsofthe48thAnnualMeetingoftheAssociationforComputationalLinguistics(ACL),pages1–11.MarcoKuhlmann,CarlosG´omez-Rodr´ıguez,andGior-gioSatta.2011.Dynamicprogrammingalgorithmsfortransition-baseddependencyparsers.InProceed-ingsofthe49thAnnualMeetingoftheAssociationforComputationalLinguistics(ACL),pages673–682.Andr´eFilipeMartins,DipanjanDas,NoahA.Smith,andEricP.Xing.2008.Stackingdependencyparsers.InProceedingsoftheConferenceonEmpiricalMeth-odsinNaturalLanguageProcessing(EMNLP),pages157–166.RyanMcDonaldandJoakimNivre.2007.Charac-terizingtheerrorsofdata-drivendependencyparsingmodels.InProceedingsofthe2007JointConferenceonEmpiricalMethodsinNaturalLanguageProcess-ingandComputationalNaturalLanguageLearning(EMNLP-CoNLL),pages122–131.RyanMcDonald,KobyCrammer,andFernandoPereira.2005.Onlinelarge-margintrainingofdependencyparsers.InProceedingsofthe43rdAnnualMeetingoftheAssociationforComputationalLinguistics(ACL),pages91–98.JoakimNivreandRyanMcDonald.2008.Integratinggraph-basedandtransition-baseddependencyparsers.InProceedingsofthe46thAnnualMeetingoftheAs-sociationforComputationalLinguistics(ACL),pages950–958.JoakimNivre.2003.Anefficientalgorithmforpro-jectivedependencyparsing.InProceedingsofthe8thInternationalWorkshoponParsingTechnologies(IWPT),pages149–160.JoakimNivre.2004.Incrementalityindeterministicde-pendencyparsing.InProceedingsoftheWorkshoponIncrementalParsing:BringingEngineeringandCog-nitionTogether(ACL),pages50–57.JoakimNivre.2008.Algorithmsfordeterministicincre-mentaldependencyparsing.ComputationalLinguis-tics,34:513–553.St´ephaneRossandJ.AndrewBagnell.2010.Efficientreductionsforimitationlearning.InProceedingsofthe13thInternationalConferenceonArtificialIntelli-genceandStatistics,pages661–668.St´ephaneRoss,GeoffreyJ.Gordon,andJ.AndrewBag-nell.2011.Areductionofimitationlearningandstructuredpredictiontono-regretonlinelearning.InProceedingsofthe14thInternationalConferenceonArtificialIntelligenceandStatistics,pages627–635.AndreasVlachos.2012.Aninvestigationofimitationlearningalgorithmsforstructuredprediction.InPro-ceedingsoftheEuropeanWorkshoponReinforcementLearning(EWRL),pages143–154.YueZhangandStephenClark.2008.Ataleoftwoparsers:Investigatingandcombininggraph-basedandtransition-baseddependencyparsing.InProceedingsoftheConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP),pages562–571.YueZhangandJoakimNivre.2011.Transition-baseddependencyparsingwithrichnon-localfeatures.InProceedingsofthe49thAnnualMeetingoftheAsso-ciationforComputationalLinguistics:HumanLan-guageTechnologies,pages188–193.Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. image

Download pdf