Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner.

Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner.

Submitted 6/2013; Published 10/2013. C
(cid:13)

2013 Verein für Computerlinguistik.

TrainingDeterministicParserswithNon-DeterministicOraclesYoavGoldbergBar-IlanUniversityDepartmentofComputerScienceRamat-Gan,Israelyoav.goldberg@gmail.comJoakimNivreUppsalaUniversityDepartmentofLinguisticsandPhilologyUppsala,Swedenjoakim.nivre@lingfil.uu.seAbstractGreedytransition-basedparsersareveryfastbuttendtosufferfromerrorpropagation.Thisproblemisaggravatedbythefactthattheyarenormallytrainedusingoraclesthataredeter-ministicandincompleteinthesensethattheyassumeauniquecanonicalpaththroughthetransitionsystemandareonlyvalidaslongastheparserdoesnotstrayfromthispath.Inthispaper,wegiveageneralcharacterizationoforaclesthatarenondeterministicandcom-plete,presentamethodforderivingsuchora-clesfortransitionsystemsthatsatisfyaprop-ertywecallarcdecomposition,andinstanti-atethismethodforthreewell-knowntransi-tionsystemsfromtheliterature.Wesaythattheseoraclesaredynamic,becausetheyallowustodynamicallyexplorealternativeandnon-optimalpathsduringtraining–incontrasttooraclesthatstaticallyassumeauniqueoptimalpath.Experimentalevaluationonawiderangeofdatasetsclearlyshowsthatusingdynamicoraclestotraingreedyparsersgivessubstan-tialimprovementsinaccuracy.Moreover,thisimprovementcomesatnocostintermsofefficiency,unlikeothertechniqueslikebeamsearch.1IntroductionGreedytransition-basedparsersareeasytoimple-mentandareveryefficient,buttheyaregenerallynotasaccurateasparsersthatarebasedonglobalsearch(McDonaldetal.,2005;KooandCollins,2010)orastransition-basedparsersthatusebeamsearch(ZhangandClark,2008)ordynamicpro-gramming(HuangandSagae,2010;Kuhlmannetal.,2011).Thisworkispartofalineofresearchtryingtopushtheboundariesofgreedyparsingandnarrowtheaccuracygapof2–3%betweensearch-basedandgreedyparsers,whilemaintainingtheef-ficiencyandincrementalnatureofgreedyparsers.Onereasonfortheloweraccuracyofgreedyparsersiserrorpropagation:oncetheparsermakesanerrorindecoding,moreerrorsarelikelytofol-low.Thisbehavioriscloselyrelatedtothewayinwhichgreedyparsersarenormallytrained.Givenatreebankoracle,agoldsequenceoftransitionsisderived,andapredictoristrainedtopredicttransi-tionsalongthisgoldsequence,withoutconsideringanyparserstateoutsidethissequence.Thus,oncetheparserstraysfromthegoldenpathattesttime,itventuresintounknownterritoryandisforcedtoreacttosituationsithasneverbeentrainedfor.Inrecentwork(GoldbergandNivre,2012),weintroducedtheconceptofadynamicoracle,whichisnon-deterministicandnotrestrictedtoasinglegoldenpath,butinsteadprovidesoptimalpredic-tionsforanypossiblestatetheparsermightbein.Dynamicoraclesarenon-deterministicinthesensethattheyreturnasetofvalidtransitionsforagivenparserstateandgoldtree.Moreover,theyarewell-definedandoptimalalsoforstatesfromwhichthegoldtreecannotbederived,inthesensethattheyreturnthesetoftransitionsleadingtothebesttreederivablefromeachstate.Weshowedexperimen-tallythat,usingadynamicoracleforthearc-eagertransitionsystem(Nivre,2003),agreedyparsercanbetrainedtoperformwellalsoafterincurringamis-take,thusalleviatingtheeffectoferrorpropagationandresultinginconsistentlybetterparsingaccuracy.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
T

l

A
C
_
A
_
0
0
2
3
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3

404

Inthispaper,weextendtheworkofGoldbergandNivre(2012)bygivingageneralcharacteri-zationofdynamicoraclesasoraclesthatarenon-deterministic,inthattheyreturnsetsoftransitions,andcomplete,inthattheyaredefinedforallpossiblestates.Wethendefineaformalpropertyoftransitionsystemswhichwecallarcdecomposition,andin-troduceaframeworkforderivingdynamicoraclesforarc-decomposablesystems.Usingthisframe-work,wederivenoveldynamicoraclesforthehy-brid(Kuhlmannetal.,2011)andeasy-first(Gold-bergandElhadad,2010)transitionsystems,whicharearc-decomposable(asisthearc-eagersystem).Wealsoshowthatthepopulararc-standardsystem(Nivre,2004)isnotarc-decomposable,andsoderiv-ingadynamicoracleforitremainsanopenresearchquestion.Finally,weperformasetofexperimentsontheCoNLL2007datasets,validatingthattheuseofdynamicoraclesforexploringstatesthatresultfromparsingmistakesduringtrainingisbeneficialacrosstransitionsystems.2Transition-BasedDependencyParsingWebeginwithaquickreviewoftransition-baseddependencyparsing,presentingthearc-eager,arc-standard,hybridandeasy-firsttransitionssystemsinacommonnotation.Thetransition-basedpars-ingframework(Nivre,2008)assumesatransitionsystem,anabstractmachinethatprocessessentencesandproducesparsetrees.Thetransitionsystemhasasetofconfigurationsandasetoftransitionswhichareappliedtoconfigurations.Whenparsingasen-tence,thesystemisinitializedtoaninitialconfigu-rationbasedontheinputsentence,andtransitionsarerepeatedlyappliedtothisconfiguration.Afterafinitenumberoftransitions,thesystemarrivesataterminalconfiguration,andaparsetreeisreadofftheterminalconfiguration.Inagreedyparser,aclas-sifierisusedtochoosethetransitiontotakeineachconfiguration,basedonfeaturesextractedfromtheconfigurationitself.Transitionsystemsdifferbythewaytheydefineconfigurations,andbytheparticularsetoftransitionsavailable.2.1DependencyTreesWedefineadependencytreeforasentenceW=w1,…,wntobealabeleddirectedtreeT=(V,A),whereV={w1,…,wn}isasetofnodesgivenbythetokensoftheinputsentence,andA⊆V×L×V(forsomedependencylabelsetL)isasetoflabeleddirectedarcsoftheform(H,lb,D),whereh∈Vissaidtobethehead,d∈Vthedependent,andlb∈Lthedependencylabel.Whendealingwithunlabeledparsing,orwhenthelabelidentityisirrelevant,wetakeA⊆V×Vtobeasetofordinarydirectedarcsoftheform(H,D).Notethat,sincethenodesofthetreearegivenbytheinputsentence,adependencytreeT=(V,A)forasentenceWisuniquelydefinedbythearcsetA.Forconvenience,wewillthereforeequatethetreewiththearcsetandandusethesymbolTforthelatter,reservingthesymbolAforarcsetsthatarenotnecessarilytrees.Inthecontextofthisworkitisassumedthatallthedependencytreesareprojective.Althoughthegeneraldefinitionofadependencytreedoesnotmakeanyassumptionsaboutwhichnodeistherootofthetree,itiscommonpracticeindependencyparsingtoaddadummynodeROOT,whichisprefixedorsuffixedtothesentenceandwhichalwaysactsastherootofthetree.Wewillfollowthispracticeinourdescriptionofdifferenttransitionsystemsbelow.2.2TransitionSystemsArc-EagerInthearc-eagersystem(Nivre,2003),aconfigurationc=(σ,β,A)consistsofastackσ,abufferβ,andasetAofdependencyarcs.1GivenasentenceW=w1,…,wn,thesystemisinitializedwithanemptystack,anemptyarcset,andβ=w1,…,wn,ROOT,whereROOTisthespecialrootnode.AnyconfigurationcwithanemptystackandabuffercontainingonlyROOTisterminal,andtheparsetreeisgivenbythearcsetAcofc.2Thesystemhas4transitions:RIGHTlb,LEFTlb,SHIFT,REDUCE,definedasfollows:SHIFT[(σ,B|β,A)]=(σ|B,β,A)RIGHTlb[(σ|S,B|β,A)]=(σ|S|B,β,A∪{(S,lb,B)})LEFTlb[(σ|S,B|β,A)]=(σ,B|β,A∪{(B,lb,S)})REDUCE[(σ|S,β,A)]=(σ,β,A)1Weuseσ|xtodenoteastackwithtopelementxandre-mainderσ,andx|βtodenoteabufferwithaheadxfollowedbytheelementsinβ.2ThisdefinitionofaterminalconfigurationdiffersfromthatinNivre(2003)butguaranteesthatthesetAcisadependencytreerootedinROOT.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
T

l

A
C
_
A
_
0
0
2
3
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3

405

ThereisapreconditionontheRIGHTandSHIFTtransitionstobelegalonlywhenb6=ROOT,andforLEFT,RIGHTandREDUCEtobelegalonlywhenthestackisnon-empty.Moreover,LEFTisonlyle-galwhensdoesnothaveaparentinA,andREDUCEwhensdoeshaveaparentinA.Ingeneral,weuseLEGAL(C)torefertothesetoftransitionsthatarele-galinaconfigurationc.Thearc-eagersystembuildstreeseagerlyinthesensethatarcsareaddedattheearliesttimepossible.Inaddition,eachwordwillcollectallofitsleftdependentsbeforecollectingitsrightdependents.Arc-StandardThearc-standardsystem(Nivre,2004)hasconfigurationsofthesameformc=(σ,β,A)asthearc-eagersystem.Theinitialcon-figurationforasentenceW=w1,…,wnhasanemptystackandarcsetandβ=ROOT,w1,…,wn.AconfigurationcisterminalifithasanemptybufferandastackcontainingthesinglenodeROOT;theparsetreeisgivenbyAc.Thesystemhas3transi-tions:RIGHTlb,LEFTlb,SHIFT,definedasfollows:SHIFT[(σ,B|β,A)]=(σ|B,β,A)RIGHTlb[(σ|s1|s0,β,A)]=(σ|s1,β,A∪{(s1,lb,s0)})LEFTlb[(σ|s1|s0,β,A)]=(σ|s0,β,A∪{(s0,lb,s1)})ThereisapreconditionontheLEFTtransitiontobelegalonlywhens16=ROOT,andforLEFTandRIGHTtobelegalonlywhenthestackhasatleasttwoelements.Thearc-standardsystembuildstreesinabottom-upfashion:eachwordmustcollectallitsdependentsbeforebeingattachedtoitshead.Thesystemdoesnotposeanyrestrictionwithregardtotheorderofcollectingleftandrightdependents.HybridThehybridsystem(Kuhlmannetal.,2011)hasthesameconfigurationsandthesameinitializationandterminationconditionsasthearc-standardsystem.Thesystemhas3transitions:RIGHTlb,LEFTlb,SHIFT,definedasfollows:SHIFT[(σ,B|β,A)]=(σ|B,β,A)RIGHTlb[(σ|s1|s0,β,A)]=(σ|s1,β,A∪{(s1,lb,s0)})LEFTlb[(σ|S,B|β,A)]=(σ,B|β,A∪{(B,lb,S)})ThereisapreconditiononRIGHTtobelegalonlywhenthestackhasatleasttwoelements,andonLEFTtobelegalonlywhenthestackisnon-emptyands6=ROOT.Thehybridsystemcanbeseenasacombinationofthearc-standardandarc-eagerAlgorithm1Greedytransition-basedparsing1:Input:sentenceW,parameter-vectorw2:c←INITIAL(W)3:whilenotTERMINAL(C)do4:tp←argmaxt∈LEGAL(C)w·φ(C,T)5:c←tp(C)6:returnAcsystems,usingtheLEFTactionofarc-eagerandtheRIGHTactionofarc-standard.Likearc-standard,itbuildstreesinabottom-upfashion.Butlikearc-eager,itrequiresawordtocollectallitsleftdepen-dentsbeforecollectinganyrightdependent.Easy-FirstIntheeasy-firstsystem(GoldbergandElhadad,2010),aconfigurationc=(λ,A)consistsofalistλandasetAofdependencyarcs.Weuselitodenotetheithmemberofλandwrite|λ|forthelengthofλ.GivenasentenceW=w1,…,wn,thesystemisinitializedwithanemptyarcsetandλ=ROOT,w1,…,wn.Aconfigurationcister-minalwithparsetreeAcifλ=ROOT.Thesetoftransitionsforagivenconfigurationc=(λ,A)Ist:{LEFTilb|1kandRAND()e d u / t a c l / l A R T ich C e - P D F / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 2 3 7 1 5 6 6 6 8 1 / / t l a c _ a _ 0 0 2 3 7 P D . F B j G u e S T T O N 0 7 S e P e M B e R 2 0 2 3 410 Easy-FirstFortheeasy-firstsystem,weonlyneedtopartitionarcsintoL={(H,D)|d6∈λ}andL={(H,D)|H,d∈λ}.TheformermustalreadybeinA,andforthelattertherecanbenoconflictbetweenarcsaslongaswerespectthebottom-upordering.Arc-StandardUnfortunately,arcdecompositiondoesnotholdforthearc-standardsystem.Toseewhy,consideraconfigurationwiththestackσ=a,B,c.Thearc(C,B)isreachableviaLEFT,thearc(B,A)isreachableviaRIGHT,LEFT,thearcsetA={(C,B),(B,A)}formsaprojectivetreeandisthustreeconsistent,butitiseasytoconvinceoneselfthatAisnotreachablefromthisconfiguration.Thereasonthattheaboveprooftechniquefailsforthearc-standardsystemisthatthearcsetcorrespond-ingtoBinthearc-eagersystemmayinvolvearcswherebothnodesarestillonthestack,andwecan-notguaranteethatallprojectivetreesconsistentwiththesearcscanbederived.Intheverysimilarhybridsystem,sucharcsexistaswellbuttheyarelimitedtoarcsoftheform(H,D)whereh0withrespecttoatreeTifoneofthefollowingholds:12Noteagainthats0maybemissinginthecaseofSHIFTcaseands1inthecaseofSHIFTandLEFT.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
T

l

A
C
_
A
_
0
0
2
3
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3

412

0.3.6.9012345arabic81.6283.880.3.6.9012345basque75.1976.180.3.6.9012345catalan91.2492.100.3.6.9012345chinese84.8485.890.3.6.9012345czech79.0380.370.3.6.9012345english86.2988.690.3.6.9012345greek79.3681.190.3.6.9012345hungarian76.3577.580.3.6.9012345italian84.0184.510.3.6.9012345turkish77.0277.67Figure1:Effectofk(yaxis)andp(xaxis)valuesonparsingaccuraciesforthearc-eagersystemonthevariousCoNLL-2007shared-tasklanguages.EachpointisanaverageUASof5runswithdifferentseeds.Thegeneraltrendisthatsmallerkandhigherparebetter.(1)itaddsanarc(H,D)suchthat(h0,d)∈Tforsomeh0∈λ,h06=h;(2)itaddsanarc(H,D)suchthat(D,d0)∈Tforsomed0∈λ.Theexactcostcanbecalculatedbycountingthenumberofsucharcs.5ExperimentsandResultsSetup,dataandparametersThegoalofourex-perimentsistoevaluatetheutilityofthedynamicoraclesfortraining,bycomparingatrainingsce-nariowhichonlyseesconfigurationsthatcanleadtothegoldtree(followingastaticoraclefortheleft-to-rightsystemsandanon-deterministicbutin-completeoraclefortheeasy-firstsystem),againstatrainingscenariothatinvolvesexplorationofincor-rectstates,usingthedynamicoracles.Asourtrainingalgorithminvolvesarandomcom-ponent(weshufflethesentencespriortoeachitera-tion,andrandomlyselectwhethertofollowacor-rectorincorrectaction),weevaluateeachsetupfivetimesusingdifferentrandomseeds,andreporttheaveragedresults.Weperformalloftheexperimentsonthemulti-lingualCoNLL-2007datasets.Weuse15trainingiterationsfortheleft-to-rightparsers,and20trainingiterationsfortheeasy-firstparser.Weusethestan-dardperceptronupdateasourupdateruleintraining,andusetheaveragedweightvectorforpredictionintesttime.Thefeaturesetsdifferbytransitionsys-tembutarekeptthesameacrossdatasets.Theex-actfeature-setdefinitionsforthedifferentsystemsareavailableintheaccompanyingsoftware,whichisavailableonlineatthefirstauthor’shomepage.EffectofexplorationparametersInaninitialsetofexperiments,weinvestigatetheeffectoftheex-plorationparameterskandponthearc-eagersys-tem.TheresultsarepresentedinFigure1.Whiletheoptimalparametersvarybydataset,thereisacleartrendtowardlowervaluesofkandhighervaluesofp.ThisisconsistentwiththereportofGoldbergandNivre(2012)whousedafixedsmallvalueofkandlargevalueofpthroughouttheirexperiments.TrainingwithexplorationforthevarioussystemsForthesecondexperiment,inwhichwecomparedtrainingwithastaticoracletotrainingwithexplo-ration,wefixedtheexplorationparameterstok=1andp=0.9foralldatasetsandtransition-systemcombinations.Theresultsintermsoflabeledaccu-racies(fortheleft-to-rightsystems)andunlabeledaccuracies(forallsystems)arepresentedinTable1.Trainingwithexplorationusingthedynamicoraclesyieldsimprovedaccuracyforthevastmajorityofthesetups.Thenotableexceptionsarethearc-eagerandeasy-firstsystemsforunlabeledItalianandthearc-hybridsysteminCatalan,whereweobserveasmalldropinaccuracy.However,wecansafelyconcludethattrainingwithexplorationisbeneficialandnotethatwemaygetevenfurthergainsinthefutureusingbettermethodsfortuningtheexplorationparametersorbettertrainingmethods.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
T

l

A
C
_
A
_
0
0
2
3
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3

413

system/languagehungarianchinesegreekczechbasquecatalanenglishturkisharabicitalianUASeager:static76.4285.0179.5378.7075.1491.3086.1077.3881.5984.40eager:dynamic77.4885.8980.9880.2575.9792.0288.6977.3983.6284.30hybrid:static76.3984.9679.4079.7173.1891.3086.4375.9183.4383.43hybrid:dynamic77.5485.1080.4980.0773.7091.0687.6276.9084.0483.83easyfirst:static81.2787.0181.2882.0075.0192.5088.5778.9282.7385.31easyfirst:dynamic81.5287.4882.2582.3975.8792.8589.4179.2983.7085.11LASeager:static66.7281.2472.4471.0865.3486.0284.9366.5972.1080.17eager:dynamic68.4182.2373.8172.9966.6386.9387.6967.0573.9280.43hybrid:static66.5480.1770.9971.8862.8485.5784.9664.8073.1678.78hybrid:dynamic68.0580.5972.0772.1563.5285.4786.2866.1274.1079.25Table1:ResultsontheCoNLL2007dataset.UAS,includingpunctuation.Eachnumberisanaverageover5runswithdifferentrandomizationseeds.Allexperimentsusedthesameexplorationparametersofk=1,p=0.9.6RelatedWorkTheerrorpropagationproblemforgreedytransition-basedparsingwasdiagnosedbyMcDonaldandNivre(2007)andhasbeentackledwithavarietyoftechniquesincludingparserstacking(NivreandMc-Donald,2008;Martinsetal.,2008)andbeamsearchandstructuredprediction(ZhangandClark,2008;ZhangandNivre,2011).Thetechniquecalledboot-strappinginChoiandPalmer(2011)issimilarinspirittotrainingwithexplorationbutisappliediter-ativelyinbatchmodeandisonlyapproximateduetotheuseofstaticoracles.DynamicoracleswerefirstexploredbyGoldbergandNivre(2012).Inmachinelearningmoregenerally,ourapproachcanbeseenasaproblem-specificinstanceofimita-tionlearning(AbbeelandNg,2004;Vlachos,2012;Heetal.,2012;Daum´eIIIetal.,2009;Rossetal.,2011),wherethedynamicoracleisusedtoim-plementtheoptimalexpertneededintheimitationlearningsetup.Indeed,ourtrainingprocedureiscloselyrelatedtoDAgger(Rossetal.,2011),whichalsotrainsaclassifiertomatchanexpertonadis-tributionofpossiblysuboptimalstatesobtainedbyrunningthesystemitself.OurtrainingprocedurecanbeviewedasanonlineversionofDAgger(Heetal.,2012)withtwoextensions:Erste,ourlearn-ingalgorithminvolvesastochasticpolicyparame-terizedbyk,pforchoosingbetweentheoracleorthemodelprediction,whereasDAggeralwaysfol-lowsthesystem’sownprediction(essentiallyrun-ningwithk=0,p=1).TheheatmapsinFigure1showthatthisparameterizationisbeneficial.Sec-ond,whileDAggerassumesanexpertprovidingasinglelabelateachstate,ouroracleisnondetermin-isticandallowsmultiplecorrectlabels(transitions)whichourtrainingproceduretie-breaksaccordingtothemodel’scurrentprediction,atechniquethathasrecentlybeenproposedinanextensiontoDAggerbyHeetal.(2012).Otherrelatedapproachesinthemachinelearningliteratureincludestackedsequen-tiallearning(CohenandCarvalho,2005),LaSO(Daum´eIIIandMarcu,2005),Searn(Daum´eIIIetal.,2009)andSMILe(RossandBagnell,2010).7ConclusionInthispaper,wehaveextendedtheworkondynamicoraclespresentedinGoldbergandNivre(2012)inseveraldirectionsbygivingformalcharacterizationsofnon-deterministicandcompleteoracles,definingthearc-decompositionpropertyfortransitionsys-tems,andusingthispropertytoderivenovelcom-pletenon-deterministicoraclesforthehybridandeasy-firstsystems(aswellasacorrectedoracleforthearc-eagersystem).Wehavethenusedthecom-pletenessoftheseneworaclestoimprovethetrain-ingprocedureofgreedyparserstoincludeexplo-rationsofconfigurationswhichresultfromincor-recttransitions.Forallthreetransitionsystems,wegetsubstantialaccuracyimprovementsonmanylan-guages.Asthechangesalltakeplaceattrainingtime,theveryfastrunningtimeofthegreedyalgo-rithmattesttimeismaintained.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
2
3
7
1
5
6
6
6
8
1

/

/
T

l

A
C
_
A
_
0
0
2
3
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
7
S
e
P
e
M
B
e
R
2
0
2
3

414

ReferencesPieterAbbeelandAndrewYNg.2004.Apprenticeshiplearningviainversereinforcementlearning.InPro-ceedingsofthe21stInternationalConferenceonMa-chineLearning(ICML),page1.JinhoD.ChoiandMarthaPalmer.2011.Gettingthemostoutoftransition-baseddependencyparsing.InProceedingsofthe49thAnnualMeetingoftheAsso-ciationforComputationalLinguistics:HumanLan-guageTechnologies,pages687–692.WilliamW.CohenandVitorR.Carvalho.2005.Stackedsequentiallearning.InProceedingsoftheInter-nationalJointConferenceonArtificialIntelligence,pages671–676.HalDaum´eIIIandDanielMarcu.2005.Learningassearchoptimization:Approximatelargemarginmeth-odsforstructuredprediction.InProceedingsofthe22ndInternationalConferenceonMachineLearning,pages169–176.HalDaum´eIII,JohnLangford,andDanielMarcu.2009.Search-basedstructuredprediction.MachineLearn-ing,75:297–325.YoavGoldbergandMichaelElhadad.2010.Aneffi-cientalgorithmforeasy-firstnon-directionaldepen-dencyparsing.InHumanLanguageTechnologies:The2010AnnualConferenceoftheNorthAmericanChapteroftheAssociationforComputationalLinguis-tics(NAACLHLT),pages742–750.YoavGoldbergandJoakimNivre.2012.Adynamicor-acleforarc-eagerdependencyparsing.InProceed-ingsofthe24thInternationalConferenceonCompu-tationalLinguistics(COLING),pages959–976.HeHe,HalDaum´eIII,andJasonEisner.2012.Imitationlearningbycoaching.InAdvancesinNeuralInforma-tionProcessingSystems25.LiangHuangandKenjiSagae.2010.Dynamicprogram-mingforlinear-timeincrementalparsing.InProceed-ingsofthe48thAnnualMeetingoftheAssociationforComputationalLinguistics(ACL),pages1077–1086.TerryKooandMichaelCollins.2010.Efficientthird-orderdependencyparsers.InProceedingsofthe48thAnnualMeetingoftheAssociationforComputationalLinguistics(ACL),pages1–11.MarcoKuhlmann,CarlosG´omez-Rodr´ıguez,andGior-gioSatta.2011.Dynamicprogrammingalgorithmsfortransition-baseddependencyparsers.InProceed-ingsofthe49thAnnualMeetingoftheAssociationforComputationalLinguistics(ACL),pages673–682.Andr´eFilipeMartins,DipanjanDas,NoahA.Smith,andEricP.Xing.2008.Stackingdependencyparsers.InProceedingsoftheConferenceonEmpiricalMeth-odsinNaturalLanguageProcessing(EMNLP),pages157–166.RyanMcDonaldandJoakimNivre.2007.Charac-terizingtheerrorsofdata-drivendependencyparsingmodels.InProceedingsofthe2007JointConferenceonEmpiricalMethodsinNaturalLanguageProcess-ingandComputationalNaturalLanguageLearning(EMNLP-CoNLL),pages122–131.RyanMcDonald,KobyCrammer,andFernandoPereira.2005.Onlinelarge-margintrainingofdependencyparsers.InProceedingsofthe43rdAnnualMeetingoftheAssociationforComputationalLinguistics(ACL),pages91–98.JoakimNivreandRyanMcDonald.2008.Integratinggraph-basedandtransition-baseddependencyparsers.InProceedingsofthe46thAnnualMeetingoftheAs-sociationforComputationalLinguistics(ACL),pages950–958.JoakimNivre.2003.Anefficientalgorithmforpro-jectivedependencyparsing.InProceedingsofthe8thInternationalWorkshoponParsingTechnologies(IWPT),pages149–160.JoakimNivre.2004.Incrementalityindeterministicde-pendencyparsing.InProceedingsoftheWorkshoponIncrementalParsing:BringingEngineeringandCog-nitionTogether(ACL),pages50–57.JoakimNivre.2008.Algorithmsfordeterministicincre-mentaldependencyparsing.ComputationalLinguis-tics,34:513–553.St´ephaneRossandJ.AndrewBagnell.2010.Efficientreductionsforimitationlearning.InProceedingsofthe13thInternationalConferenceonArtificialIntelli-genceandStatistics,pages661–668.St´ephaneRoss,GeoffreyJ.Gordon,andJ.AndrewBag-nell.2011.Areductionofimitationlearningandstructuredpredictiontono-regretonlinelearning.InProceedingsofthe14thInternationalConferenceonArtificialIntelligenceandStatistics,pages627–635.AndreasVlachos.2012.Aninvestigationofimitationlearningalgorithmsforstructuredprediction.InPro-ceedingsoftheEuropeanWorkshoponReinforcementLearning(EWRL),pages143–154.YueZhangandStephenClark.2008.Ataleoftwoparsers:Investigatingandcombininggraph-basedandtransition-baseddependencyparsing.InProceedingsoftheConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP),pages562–571.YueZhangandJoakimNivre.2011.Transition-baseddependencyparsingwithrichnon-localfeatures.InProceedingsofthe49thAnnualMeetingoftheAsso-ciationforComputationalLinguistics:HumanLan-guageTechnologies,pages188–193.Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild
Transactions of the Association for Computational Linguistics, 1 (2013) 403–414. Action Editor: Jason Eisner. Bild

PDF Herunterladen