Transactions of the Association for Computational Linguistics, 1 (2013) 255–266. Action Editor: Kristina Toutanova.

Submitted 11/2012; Published 5/2013. © 2013 Association for Computational Linguistics.

Minimally-Supervised Morphological Segmentation using Adaptor Grammars

Kairit Sirts, Institute of Cybernetics, Tallinn University of Technology, sirts@phon.ioc.ee
Sharon Goldwater, School of Informatics, The University of Edinburgh, sgwater@inf.ed.ac.uk

Abstract

This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.

1 Introduction

Research into unsupervised learning of morphology has a long history, starting with the work of Harris (1951). While early research was mostly motivated by linguistic interests, more recent work in NLP often aims to reduce data sparsity in morphologically rich languages for tasks such as automatic speech recognition, statistical machine translation, or automatic text generation. For these applications, however, completely unsupervised systems may not be ideal if even a small amount of segmented training data is available. In this paper, we explore the use of Adaptor Grammars (Johnson et al., 2007) for minimally supervised morphological segmentation.

Adaptor Grammars (AGs) are a nonparametric Bayesian modelling framework that can learn latent tree structures over an input corpus of strings. For example, they can be used to define a morphological grammar where each word consists of zero or more prefixes, a stem, and zero or more suffixes; the actual forms of these morphs (and the segmentation of words into morphs) are learned from the data. In this general approach AGs are similar to many other unsupervised morphological segmentation systems, such as Linguistica (Goldsmith, 2001) and the Morfessor family (Creutz and Lagus, 2007). A major difference, however, is that the morphological grammar is specified as an input to the program, rather than hard-coded, which allows different grammars to be explored easily. For the task of segmenting utterances into words, for example, Johnson and colleagues have experimented with grammars encoding different kinds of sub-word and super-word structure (e.g., syllables and collocations), showing that the best grammars far outperform other systems on the same corpora (Johnson, 2008a; Johnson and Goldwater, 2009; Johnson and Demuth, 2010).

These word segmentation papers demonstrated both the power of the AG approach and the synergistic behavior that occurs when learning multiple levels of structure simultaneously. However, the best-performing grammars were selected using the same corpus that was used for final testing, and each paper dealt with only one language. The ideal unsupervised learner would use a single grammar tuned on


one or more development languages and still perform well on other languages where development data is unavailable. Indeed, this is the basic principle behind Linguistica and Morfessor. However, we know that different languages can have very different morphological properties, so using a single grammar for all languages may not be the best approach if there is a principled way to choose between grammars. Though AGs make it easy to try many different possible grammars, the process of proposing and testing plausible options can still be time-consuming.

In this paper, we propose a novel method for automatically selecting good morphological grammars for different languages (English, Finnish, Turkish, German, and Estonian) using a small amount of gold-segmented data (1000 word types). We use the AG framework to specify a very general binary-branching grammar of depth four with which we learn a parse tree of each word that contains several possible segmentation splits for the word. Then, we use the gold-segmented data to learn, for each language, which of the proposed splits from the original grammar should actually be used in order to best segment that language.

We evaluate our approach on both a small development set and the full Morpho Challenge test set for each language—up to three million word types. In doing so, we demonstrate that using the posterior grammar of an AG model to decode unseen data is a feasible way to scale these models to large data sets. We compare to several baselines which use the annotated data to different degrees: parameter tuning, grammar tuning, supervised training, or no use of annotated data. In addition to existing approaches—unsupervised and semi-supervised Morfessor, unsupervised Morsel (Lignos, 2010), and unsupervised AGs—we also show how to use the annotated data to train semi-supervised AGs (using the data to accumulate rule statistics rather than for grammar selection). The grammar selection method yields comparable results to the best of these other approaches.

To summarize, our contributions in this paper are: 1) scaling AGs to large data sets by using the posterior grammar to define an inductive model; 2) demonstrating how to train semi-supervised AG models, and showing that this improves morphological segmentation over unsupervised training; and 3) introducing a novel grammar selection method for AG models whose segmentation results are competitive with the best existing systems.

Before providing details of our methods and results, we first briefly review Adaptor Grammars. For a formal definition, see Johnson et al. (2007).

2 Adaptor Grammars

Adaptor Grammars are a framework for specifying nonparametric Bayesian models that can be used to learn latent tree structures from a corpus of strings. There are two components to an AG model: the base distribution, which is just a PCFG, and the adaptor, which “adapts” the probabilities assigned to individual subtrees under the PCFG model, such that the probability of a subtree under the complete model may be considerably higher than the product of the probabilities of the PCFG rules required to construct it. Although in principle the adaptor can be any function that maps one distribution onto another, Johnson et al. (2007) use a Pitman-Yor Process (PYP) (Pitman and Yor, 1997) as the adaptor because it acts as a caching model. Under a PYP AG model, the posterior probability of a particular subtree will be roughly proportional to the number of times that subtree occurs in the current analysis of the data (with the probability of unseen subtrees being computed under the base PCFG distribution).

An AG model can be defined by specifying the CFG rules (the support for the base distribution) and indicating which nonterminals are “adapted”, i.e., can serve as the root of a cached subtree. Given this definition and an input corpus of strings, Markov chain Monte Carlo samplers can be used to infer the posterior distribution over trees (and all hyperparameters of the model, including PCFG probabilities in the base distribution and PYP hyperparameters). Any frequently recurring substring (e.g., a common prefix) will tend to be parsed consistently, as this permits the model to treat the subtree spanning that string as a cached subtree, assigning it higher probability than under the PCFG distribution.

Adaptor Grammars have been applied to a wide variety of tasks, including segmenting utterances into words (Johnson, 2008a; Johnson and Goldwater, 2009; Johnson and Demuth, 2010), classifying documents according to perspective (Hardisty et al., 2010), machine transliteration of names (Huang et
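The PYP “caching” behaviour described above can be sketched in a few lines. This is an illustrative single-restaurant approximation (it ignores the table-level bookkeeping of a full Pitman-Yor Chinese restaurant process); the class `PYPAdaptor`, its parameter names, and the string-keyed subtrees are invented scaffolding, not the paper's implementation:

```python
class PYPAdaptor:
    """Minimal Pitman-Yor 'cache' over subtrees (illustrative sketch).

    Each cached subtree is keyed by its string form; `base_prob` is the
    probability of generating that subtree from scratch under the PCFG.
    """

    def __init__(self, discount=0.5, concentration=1.0):
        self.d = discount          # PYP discount parameter
        self.a = concentration     # PYP concentration parameter
        self.counts = {}           # subtree -> number of cached uses
        self.total = 0             # total number of cached uses

    def prob(self, subtree, base_prob):
        # Seen subtrees get mass roughly proportional to their count;
        # unseen subtrees fall back to the base (PCFG) distribution.
        k = len(self.counts)
        cached = max(self.counts.get(subtree, 0) - self.d, 0.0)
        new = (self.a + self.d * k) * base_prob
        return (cached + new) / (self.a + self.total)

    def observe(self, subtree):
        self.counts[subtree] = self.counts.get(subtree, 0) + 1
        self.total += 1
```

Observing a subtree repeatedly raises its posterior probability far above its PCFG base probability, which is exactly the rich-get-richer behaviour that makes frequent substrings (like common suffixes) get parsed consistently.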


al., 2011), native language identification (Wong et al., 2012), and named entity clustering (Elsner et al., 2009). There have also been AG experiments with morphological segmentation, but more as a proof of concept than an attempt to achieve state-of-the-art results (Johnson et al., 2007; Johnson, 2008b).

3 Using AGs for Learning Morphology

Originally, the AG framework was designed for unsupervised learning. This section first describes how AGs can be used for unsupervised morphological segmentation, and then introduces two ways to use a small labelled data set to improve performance: semi-supervised learning and grammar selection.

3.1 Unsupervised Adaptor Grammars

We define three AG models to use as unsupervised baselines in our segmentation experiments. The first of these is very simple:

    Word → Morph+
    Morph → Char+                                  (1)

The underline notation indicates an adapted nonterminal, and + abbreviates a set of recursive rules, e.g., Word → Morph+ is short for

    Word → Morphs
    Morphs → Morph Morphs
    Morphs → Morph

Grammar 1 (MorphSeq) is just a unigram model over morphs: the Morph symbol is adapted, so the probability of each Morph will be roughly proportional to its (inferred) frequency in the corpus. The grammar specifies no further structural relationships between morphs or inside of morphs (other than a geometric distribution on their length in characters).

Experiments with AGs for unsupervised word segmentation suggest that adding further latent structure can help with learning. Here, we add another layer of structure below the morphs,[1] calling the resulting grammar SubMorphs:

    Word → Morph+
    Morph → SubMorph+
    SubMorph → Char+                               (2)

For capturing the rules of morphotactics, a grammar with linguistically motivated nonterminals can be created. There are many plausible options and the best-performing grammar may be somewhat language-dependent. Rather than experimenting extensively, we designed our third grammar to replicate as closely as possible the grammar that is implicitly implemented in the Morfessor system. This Compounding grammar distinguishes between prefixes, stems and suffixes, allows compounding, defines the order in which the morphs can occur and also allows the morphs to have inner latent structure:

    Word → Compound+
    Compound → Prefix∗ Stem Suffix∗
    Prefix → SubMorph+
    Stem → SubMorph+
    Suffix → SubMorph+
    SubMorph → Char+                               (3)

[1] Because the nonterminal labels are arbitrary, this grammar can also be interpreted as adding another layer on top of morphs, allowing the model to learn morph collocations that encode dependencies between morphs (which themselves have no substructure). However, preliminary experiments showed that the morph-submorph interpretation scored better than the collocation-morph interpretation, hence we chose the corresponding nonterminal names.

3.2 Semi-Supervised Adaptor Grammars

The first new use of AGs we introduce is the semi-supervised AG, where we use the labelled data to extract counts of the different rules and subtrees used in the gold-standard analyses. We then run the MCMC sampler as usual over both the unlabelled and labelled data, treating the counts from the labelled data as fixed.

We assume that the labelled data provides a consistent bracketing (no two spans in the bracketing can partially overlap) and the labels of the spans must be compatible with the grammar. However, the bracketing may not specify all levels of structure in the grammar. In our case, we have morpheme bracketings but not, e.g., submorphs. Thus, using the SubMorphs grammar in semi-supervised mode will constrain the sampler so that Morph spans in the labelled data will remain fixed, while the SubMorphs inside those Morphs will be resampled.
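The `+` abbreviation used in Grammars 1–3 can be expanded mechanically into explicit CFG rules. A minimal sketch, where the function name and the `(lhs, rhs)` rule representation are invented for illustration:

```python
def expand_plus(parent, child):
    """Expand the abbreviation `parent -> child+` into explicit CFG rules,
    following the paper's convention: a fresh plural nonterminal heads a
    right-recursive list of one or more children."""
    plural = child + "s"
    return [
        (parent, [plural]),          # Word -> Morphs
        (plural, [child, plural]),   # Morphs -> Morph Morphs
        (plural, [child]),           # Morphs -> Morph
    ]

# Grammar 2 (SubMorphs): Word -> Morph+, Morph -> SubMorph+, SubMorph -> Char+
submorphs_grammar = (
    expand_plus("Word", "Morph")
    + expand_plus("Morph", "SubMorph")
    + expand_plus("SubMorph", "Char")
)
```

The expansion makes explicit why a `Char+` rule induces a geometric distribution over morph lengths: each recursive step through the plural nonterminal applies the same continue/stop choice.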


The main change made to the AG inference process[2] for implementing the semi-supervised AG was to prune out from the sampling distribution any nonterminals that are inconsistent with the spans/labels in the given labelling.

3.3 AGSelect

Both the unsupervised and semi-supervised methods described above assume the definition of a grammar that adequately captures the phenomena being modelled. Although the AG framework makes it easy to experiment with different grammars, these experiments can be time-consuming and require some good guesses as to what a plausible grammar might be. These problems can be overcome by automating the grammar development process to systematically evaluate different grammars and find the best one.

We propose a minimally supervised model selection method AGSelect that uses the AG framework to automatically identify the best grammar for different languages and data sets. We first define a very general binary-branching CFG grammar for AG training that we call the metagrammar. The metagrammar learns a parse tree for each word where each branch contains a different structure in the word. The granularity of these structures is determined by the depth of this tree. For example, Grammar 4 generates binary trees of depth two and can learn segmentations of up to four segments.

    Word → M1
    Word → M1 M2
    M1 → M11
    M1 → M11 M12
    M2 → M21
    M2 → M21 M22
    M11 → Chars+
    M12 → Chars+
    M21 → Chars+
    M22 → Chars+                                   (4)

Next we introduce the notion of a morphological template, which is an ordered sequence of nonterminals whose concatenated yields constitute the word and which are used to parse out a specific segmentation of the word. For example, using Grammar 4 the parse tree of the word saltiness is shown in Figure 1. There are four possible templates with four different segmentations: M1 M2 (salt|iness), M11 M12 M2 (sal|t|iness), M1 M21 M22 (salt|i|ness), and M11 M12 M21 M22 (sal|t|i|ness).

[Figure 1: The parse tree generated by the metagrammar of depth 2 for the word saltiness, with M11 = sal, M12 = t, M21 = i, M22 = ness.]

The morphological template consisting only of nonterminals from the lowest cached level of the parse tree is expected to have high recall, whereas the template containing the nonterminals just below the Word is expected to have high precision. Our goal is to find the optimal template by using a small labelled data set. The grammar selection process iterates over the set of all templates. For each template, the segmentations of the words in the labelled data set are parsed out and the value of the desired evaluation metric is computed. The template that obtained the highest score is then chosen.

For each language we use a single template to segment all the words in that language. However, even using (say) a four-morph template such as M11 M12 M21 M22, some words may contain fewer morphs because the metagrammar permits either unary or binary branching rules, so some parses may not contain M12 or M2 (and thus M21 M22) spans. Thus, we can represent segmentations of different lengths (from 1 to 2^n, where n is the depth of the metagrammar) with a single template.[3]

For our experiments we use a metagrammar of depth four. This grammar allows words to consist of up to 16 segments, which we felt would be enough for any word in the training data. Also, iterating over all the templates of a grammar with bigger depth would not be feasible as the number of different templates increases very rapidly.[4]

[2] We started with Mark Johnson’s PYAG implementation, http://web.science.mq.edu.au/~mjohnson/code/py-cfg.tgz, which we also used for our unsupervised and grammar selection experiments.
[3] We also experimented with selecting different templates for words of different length but observed no improvements over the single template approach.
[4] The number of templates of each depth can be expressed recursively as N_i = (N_{i-1} + 1)^2, where N_{i-1} is the number of
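The template enumeration, and the count in footnote 4, can be sketched as follows. The recursion mirrors N_i = (N_{i-1} + 1)^2 with N_0 = 0: each of a node's two children is either kept whole as one template element or expanded recursively. The function name and tuple representation are invented for illustration:

```python
from itertools import product

def templates(prefix="M", depth=2):
    """Enumerate all morphological templates of a binary metagrammar.

    A template is an ordered tuple of nonterminal names. A node of
    remaining depth 0 cannot be expanded; otherwise each child is either
    kept whole or replaced by one of its own subtree's templates.
    """
    if depth == 0:
        return []
    left, right = prefix + "1", prefix + "2"
    left_opts = [(left,)] + templates(left, depth - 1)
    right_opts = [(right,)] + templates(right, depth - 1)
    return [l + r for l, r in product(left_opts, right_opts)]
```

For depth two this yields exactly the four templates listed above; for the depth-four metagrammar used in the experiments it yields (25 + 1)^2 = 676 templates, which is still feasible to iterate over, while depth five would already give 458,329.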


3.4 Inductive Learning

Previous work on AGs has used relatively small data sets and run the sampler on the entire input corpus (some or all of which is also used for evaluation)—a transductive learning scenario. However, our larger data sets contain millions of word types, where sampling over the whole set is not feasible. For example, 1000 training iterations on 50k word types took about a week on one 2.67GHz CPU. To solve this problem, we need an inductive learner that can be trained on a small set of data and then used to segment a different larger set. To create such a learner, we run the sampler on up to 50k word types, and then extract the posterior grammar as a PCFG.[5] This grammar contains all the initial CFG rules, plus rules to generate each of the cached subtrees inferred by the sampler. The sampler counts of all rules are normalized to obtain a PCFG, and we can then use a standard CKY parser to decode the remaining data using this PCFG.

4 Experiments

4.1 Data

We test on languages with a range of morphological complexity: English, Finnish, Turkish, German, and Estonian. For each language we use two small sets of gold-annotated data—a labelled set for semi-supervised training or model selection and a dev set for development results—and one larger gold-annotated data set for final tests. We also have a large unlabelled training set for each language. Table 1 gives statistics.

              Unlab.   Lab.   Dev    Test
    English   0.9M     1000   1212   16K
    Finnish   2.9M     1000   1494   225K
    Turkish   0.6M     1000   1531   64K
    German    2.3M     1000   785    62K
    Estonian  2.1M     1000   1500   74K

    Table 1: Number of word types in our data sets.

The data sets for English, Finnish, Turkish and German are from the Morpho Challenge 2010 competition[6] (MC2010). We use the MC2010 training set of 1000 annotated word types as our labelled data, and for our dev sets we collate together the development data from all years of the MC competition. Final evaluation is done on the official MC2010 test sets, which are not public, so we rely on the MC organizers to perform the evaluation. The words in each test set are an unknown subset of the words in the unlabelled corpus, so to evaluate we segmented the entire unlabelled corpus and sent the results to the MC team, who then computed scores on the test words.

The Estonian word list is gathered from the newspaper texts of a mixed corpus of Estonian.[7] Gold standard segmentations of some of these words are available from the Estonian morphologically disambiguated corpus;[8] we used these for the test set, with small subsets selected randomly for the labelled and dev sets.

For semi-supervised tests of the AG Compounding grammar we annotated the morphemes in the English, Finnish and Estonian labelled sets as prefixes, stems or suffixes. We could not do so for Turkish because none of the authors knows Turkish.

4.2 Evaluation

We evaluate our results with two measures: segment border F1-score (SBF1) and EMMA (Spiegler and Monson, 2010). SBF1 is one of the simplest and most popular evaluation metrics for morphological segmentations. It computes F1-score from the precision and recall of ambiguous segment boundaries—i.e., word edges are not counted. It is easy and quick to compute but has the drawback that it gives no credit for one-morpheme words that have been segmented correctly (i.e., are assigned no segment borders). Also it can only be used on systems and gold standards where the output is just a segmentation of the surface string (e.g., availabil+ity) rather than a morpheme analysis (e.g., available+ity). For this reason we cannot report SBF1 on our German data, whose annotations contain only analyses.

EMMA is a newer measure that addresses both

[5] This can be seen as a form of Structure Compilation (Liang et al., 2008), where the solution found by a more costly model is used to define a less costly model. However in Liang et al.'s case both models were already inductive.
[6] http://research.ics.aalto.fi/events/morphochallenge2010/datasets.shtml
[7] http://www.cl.ut.ee/korpused/segakorpus/epl
[8] http://www.cl.ut.ee/korpused/morfkorpus/
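The SBF1 computation described above can be sketched in a few lines. This is an illustrative version assuming micro-averaged boundary counts over word pairs; the paper does not spell out these details, so the function and its data layout (each word as a list of segments) are our own:

```python
def sbf1(gold_words, pred_words):
    """Segment border F1 sketch: precision/recall over *internal* boundary
    positions only, so word edges are never counted and an unsegmented
    one-morpheme word contributes no credit."""
    def boundaries(segments):
        pos, cuts = 0, set()
        for seg in segments[:-1]:      # skip the final segment: its right
            pos += len(seg)            # edge is the word edge
            cuts.add(pos)
        return cuts

    tp = fp = fn = 0
    for gold, pred in zip(gold_words, pred_words):
        gb, pb = boundaries(gold), boundaries(pred)
        tp += len(gb & pb)
        fp += len(pb - gb)
        fn += len(gb - pb)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, scoring the prediction salt|iness against the gold analysis salt|i|ness gives precision 1.0 but recall 0.5, hence SBF1 of 2/3.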


of these issues—correctly segmented one-morpheme words are reflected in the score, and it can evaluate both concatenative and non-concatenative morphology. EMMA works by finding the best one-to-one mapping between the hypothesized and true segments. The induced segments are then replaced with their mappings and based on that, F1-score on matching segments is calculated. Using EMMA we can evaluate the induced segmentations of German words against gold standard analyses. EMMA has a freely available implementation,[9] but is slow to compute because it uses Integer Linear Programming.

For our dev results, we computed both scores using the entire dev set, but for the large test sets, the evaluation is done on batches of 1000 word types selected randomly from the test set. This procedure is repeated 10 times and the average is reported, just as in the MC2010 competition (Kohonen et al., 2010a).

4.3 Baseline Models

We compare our AG models to several other morphology learning systems. We were able to obtain implementations of two of the best unsupervised systems from MC2010, Morfessor (Creutz and Lagus, 2007) and Morsel (Lignos, 2010), and we use these for comparisons on both the dev and test sets. We also report test results from MC2010 for the only semi-supervised system in the competition, semi-supervised Morfessor (Kohonen et al., 2010a; Kohonen et al., 2010b). No dev results are reported on this system since we were unable to obtain an implementation. This section briefly reviews the systems.

4.3.1 Morfessor Categories-MAP

Morfessor Categories-MAP (Morfessor) is a state-of-the-art unsupervised morphology learning system. Its implementation is freely available[10] so it is widely used both as a preprocessing step in tasks requiring morphological segmentations, and as a baseline for evaluating morphology learning systems.

Morfessor uses the Minimum Description Length (MDL) principle to choose the optimal segment lexicon and the corpus segmentation. Each morph in the segment lexicon is labelled as a stem, prefix, suffix or non-morph. The morphotactic rules are encoded as an HMM, which specifies the allowed morph sequences with respect to the labels (e.g., a suffix cannot directly follow a prefix).

The morphs in the segment lexicon can have a hierarchical structure, containing submorphs which themselves can consist of submorphs etc. We hypothesize that this hierarchical structure is one of the key reasons why Morfessor has been so successful, as the experiments in this paper with different grammars also show that the ability to learn latent structures is crucial for learning good segmentations.

One essential difference between Morfessor and the proposed AGSelect is that while we use the labelled data to choose which levels of the hierarchy are to be used as morphs, Morfessor makes this decision based on the labels of the segments, choosing the most fine-grained morph sequence that does not contain the non-morph label.

Morfessor includes a free parameter, the perplexity threshold, which we found can affect the SBF1 score considerably (7 points or more). The best value for this parameter depends on the size of the training set, characteristics of the language being learned, and also the evaluation metric being used, as in some cases the best SBF1 and EMMA scores are obtained with completely different values. Thus, we tuned the value of the perplexity threshold on the labelled set for each language and evaluation metric for different unlabelled training set sizes.

4.3.2 Semi-Supervised Morfessor

Recently, the Morfessor system has been adapted to allow semi-supervised training. Four versions of the system were evaluated in MC2010, using different degrees of supervision. Results reported here are from the Morfessor S+W system, which performed best of those that use the same kind of labelled data as we do.[11] This system uses the Morfessor Baseline model (not Cat-MAP), which incorporates a lexicon prior and data likelihood term. The semi-supervised version maintains separate likelihoods for the labelled and unlabelled data, and uses the development set to tune two parameters that weight these terms with respect to each other and the prior.

[9] http://www.cs.bris.ac.uk/Research/MachineLearning/Morphology/resources.jsp#eval
[10] http://www.cis.hut.fi/projects/morpho/morfessorcatmapdownloadform.shtml
[11] Morfessor S+W+L performs better, but uses training data with morpheme analyses rather than surface segmentations.


4.3.3 Morsel

Morsel (Lignos, 2010) is an unsupervised morphology learning system introduced in MC2010; we obtained the implementation from the author. Morsel learns morphological analyses rather than segmentations, so it can be evaluated only using EMMA. There are two options provided for running Morsel: aggressive and conservative. We used the development set to choose the best in each experimental case.

The MC data sets contain gold standard morphological analyses (as well as segmentations) so we could compute Morsel's EMMA scores using the analyses. However, we found that Morsel obtains higher EMMA scores when evaluated against gold standard segmentations and thus we used this option in all the experiments. (EMMA scores for other systems were also computed using the segmentations.)

4.4 Method

The experiments were conducted in two parts. First, we evaluated different aspects of the AG models and compared to all baseline models using the dev set data. Then we evaluated the most competitive models on the final test data.

For the development experiments, we compiled unlabelled training sets with sizes ranging from 10k to 50k word types (using the most frequent word types in each case). For the AG results, we report the average of five different runs made on the same training set. We let the sampler run for 1000 iterations. No annealing was used as it did not seem to help. The table label resampling option was turned on and the hyperparameter values were inferred.

We trained all AG and baseline models on each of these training sets. For AGSelect, the words from the labelled data set were added to the training set to allow for template selection.[12] To compute results in transductive mode, the words from the dev set were also added to the training data. In inductive mode, the dev set was instead parsed with the CKY parser.

Preliminary experiments showed that the performance of unsupervised AG and AGSelect improved with larger training sets, though the effect is small (see Figure 2 for results of AGSelect in transductive mode; the trend in inductive mode is similar). Based on these and similar results with other baseline systems, all results reported later for unsupervised models (AG and baseline) and AGSelect were obtained using training sets of 50k words.

In contrast to the above models, the semi-supervised AG does not always improve with more unlabelled data (see Figure 2) and in the limit, it will match the performance of the same grammar in the unsupervised setting. Other semi-supervised approaches often solve this problem by weighting the labelled data more heavily than the unlabelled data when estimating model parameters—effectively, assuming that each labelled item has actually been observed more than once. However, duplicating the labelled data does not make sense in the AG framework, because duplicate items will in most cases just be cached at the root (Word) node, providing no additional counts of Morphs (which are where the useful information is). It might be possible to come up with a different way to weight the labelled data more heavily when larger unlabelled sets are used, however for this paper we instead kept the labelled data the same and tuned the amount of unlabelled data. We used the dev set to choose the amount of unlabelled data (in the range from 10k to 50k types); results for semi-supervised AG are reported using the optimal amount of unlabelled data for each experiment.

For test set experiments with semi-supervised AG, we evaluated each language using whichever grammar performed best on that language's dev set. For test set experiments with AGSelect, we chose the templates with a two-pass procedure. First, we trained 5 samplers on the 50k training set with labelled set added, and used the labelled data to choose the best template for each inferred grammar. Then, we decoded the dev set using each of the 5 grammar/template pairs and based on these results, chose the best of these pairs to decode the test set.

[12] We also experimented with smaller sets of labelled data. In most cases, the template selected based on only 300 word types was the same as the one selected with 1000 word types.
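The posterior-grammar extraction used for inductive decoding (Section 3.4) amounts to normalizing the sampler's rule counts into a PCFG before handing it to a CKY parser. A minimal sketch, with the data structures invented for illustration (`rule_counts` maps `(lhs, rhs_tuple)` pairs, covering both base CFG rules and one rule per cached subtree, to sampler counts):

```python
from collections import defaultdict

def posterior_pcfg(rule_counts):
    """Normalize sampler rule counts into a PCFG (sketch).

    Probabilities are normalized per left-hand-side nonterminal, so the
    rules expanding each nonterminal sum to 1; the resulting grammar can
    then be used by a standard CKY parser to decode unseen words.
    """
    totals = defaultdict(float)
    for (lhs, _rhs), count in rule_counts.items():
        totals[lhs] += count
    return {rule: count / totals[rule[0]]
            for rule, count in rule_counts.items()}
```

Cached subtrees with high sampler counts thus carry most of the probability mass for their root nonterminal, which is how the inductive model preserves what the transductive sampler learned.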


[Figure 2: Effect of training data size on dev set SBF1 for AGSelect (left) and semi-supervised SubMorphs grammar (right) in transductive mode, for English, Estonian, Finnish and Turkish.]

4.5 Results

We present the dev set results in Table 2(a) for transductive and in Table 2(b) for inductive learning. In each table, unsupervised models are shown in the upper section and the semi-supervised models and AGSelect below. Morsel appears only in Table 2(a) since it only works transductively. Semi-supervised grammars cannot be trained on German, since we only have gold standard analyses, not segmentations.

The SubMorphs grammar performs the best of the unsupervised AG models, with the Compounding grammar being only slightly worse. We also tried the Compounding grammar without the submorph structures but the results were even worse than those of MorphSeq. This shows that the latent structures are important for learning good segmentations.

In all cases, the semi-supervised AGs perform better (often much better) than the corresponding unsupervised grammars. Even though their average scores are not as high as AGSelect's, they give the best dev set results in many cases. This shows that although for semi-supervised AG the grammar must be chosen manually, with a suitable choice of the grammar and only a small set of labelled data it can improve considerably over unsupervised AG.

In transductive mode, AGSelect performs the best in several cases. In both transductive and inductive mode, the results of AGSelect are close to the best results obtained and are consistently good across all languages—it achieves the best average scores of all models, suggesting that the model selection method is robust to different types of morphology and annotation schemes.

Table 3 presents the test set results. We include scores for unsupervised Morfessor in both transductive and inductive mode, where transductive mode trains on the entire unlabelled corpus and inductive mode trains on the 50k subset. The semi-supervised Morfessor scores are taken from the MC results page[13] after verifying that the evaluation methodology and labelled data used is the same as ours.[14]

There is a good deal of variation between development and test results, indicating that the dev sets may not be a representative sample. The most notable differences are in Turkish, where all models perform far worse on the test than dev set. However, AGSelect performs slightly better on the test set for the other languages. Thus its average SBF1 score actually improves on the test set and is not much worse than semi-supervised Morfessor. While its average performance drops somewhat on test set EMMA, it is still as good as any other model on that measure. Again, these results support the idea that AGSelect is robust to variations in language and data set.

We also note the surprisingly good performance of Morfessor in transductive mode on Estonian; this could possibly be due to the larger amount of training data used for the test set results, but it is not clear why this would improve performance so much on Estonian and not on the other languages.

5 Discussion

To give a sense of what the AGSelect model is learning, we provide some examples of both correctly and incorrectly induced segmentations in Table 4. These examples suggest that for example in English, M1 is used to model the stem, M21 is for the suffix or the second stem in the compound word, and the rest of the elements in the template are for the remaining suffixes (if any).

Table 5 presents examples of some of the most frequently used metagrammar rules and cached rules

[13] http://research.ics.aalto.fi/events/morphochallenge/
[14] Sami Virpioja, personal communication.


(un)TransductivemodeBorderF1-scoreEMMAEngEstFinTurAvgEngEstFinTurGerAvgAGMorphSeq61.554.056.959.558.074.774.163.753.559.465.1AGSubMorphs66.266.960.559.563.379.183.466.853.457.468.0AGCompounding63.064.860.960.962.475.481.665.553.762.267.7Morfessor69.555.765.069.364.981.375.367.862.262.769.9Morsel—–76.874.466.150.155.964.7AGssvMorphSeq64.457.363.068.963.474.475.965.659.6AGssvSubMorphs67.669.164.463.466.179.584.469.256.1AGssvCompounding70.067.571.8–79.582.874.0—AGSelect71.968.570.272.670.877.581.873.263.062.471.6(b)InductivemodeBorderF1-scoreEMMAEngEstFinTurAvgEngEstFinTurGerAvgAGMorphSeq57.654.055.459.856.772.073.862.653.758.964.2AGSubMorphs66.167.561.659.863.778.683.767.453.456.067.8AGCompounding62.064.857.461.161.373.581.161.953.261.066.2Morfessor68.951.163.568.262.980.972.068.160.660.868.5AGssvMorphSeq64.656.963.170.363.872.773.365.961.2AGssvSubMorphs70.169.766.367.968.480.483.770.559.0AGssvCompounding70.567.270.0–77.381.970.5—AGSelect69.868.867.570.169.177.381.971.159.762.670.5Table2:Devsetresultsforallmodelsin(un)transductiveand(b)inductivemode.UnsupervisedAGmodelsandbaselinesareshowninthetoppartofeachtable;semi-supervisedAGmodelsandgrammarselectionmethodarebelow.BorderF1-scoreEMMAEngEstFinTurAvg-EstEngEstFinTurGerAvg-Est/GerMorf.trans67.373.961.257.164.961.978.478.861.849.865.266.863.3Morf.ind65.757.760.860.161.162.276.570.559.647.064.163.561.0Morsel——81.977.263.347.859.065.864.3Morf.ssv77.8-71.768.9-72.880.6-62.149.9–64.2AGssvbest70.3?68.6†64.9?58.2∗65.564.575.9?80.3†61.3?46.1–61.1AGSelect74.471.770.061.469.468.681.381.064.047.563.867.564.3Table3:TestsetresultsforunsupervisedbaselinesMorfessorCatMAP(intransductiveandinductivemode)andMorsel;semi-supervisedMorfessor;andAGsemi-supervised(?markstheCompoundinggrammar,†denotesSubMorphsgrammar,and∗istheMorphSeqgrammar)andgrammarselectionmethods.Resultsareshownforeachlanguage,averagedoveralllanguages(whenpossible:Avg),andaveragedoverjustthelanguageswherescoresareavailableforallsystems(-Est,-Est/Ger).forEnglish,together
with their relative frequencies. It shows that at the Word level the binary rule is selected over three times more frequently than the unary rule. Also, most of the more frequently used grammar rules expand the first branch (rooted in M1) into more fine-grained structures. The second branch (M2) is mostly modelled with the unary rule. Among the frequently cached rules we see the common English prefixes and suffixes. One of the most frequent cached rules stores the single letter e at the end of a word, which often causes oversegmentation of words ending in e (as seen in the incorrect examples in Table 4). This problem is common in unsupervised morphological segmentation of English (Goldwater et al., 2006; Goldsmith, 2001). We also took a look at the most frequent cached rules learned by the semi-supervised AG with the SubMorphs grammar, and observed that Morphs

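The cached rules in Table 5 below are bracketed parse fragments whose leaves spell out the stored string (for example the suffixes s, ed, and ing). As a quick illustration, a small helper (our own sketch, not part of the paper's AG implementation) can read the stored string off a cached rule by collecting its leaves:

```python
import re

def cached_rule_leaves(rule):
    """Join the leaves of a bracketed cached rule into the stored string.

    Nonterminal labels (M1, M21, ..., Word) are skipped; every other
    token is a leaf character of the cached subtree.
    """
    tokens = re.findall(r"[()]|[^\s()]+", rule)
    leaves = [t for t in tokens
              if t not in "()" and not re.fullmatch(r"M\d+|Word", t)]
    return "".join(leaves)

# The cached rule for the past-tense suffix "ed" from Table 5:
print(cached_rule_leaves("(M2 (M21 (M211 (M2111 e)) (M212 (M2121 d))))"))  # prints: ed
```

Read this way, the most frequent cached rule in Table 5 stores the plural suffix s, and the rule with the possessive apostrophe stores 's split over two morphs.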
Correct Segmentations (Word: Segmentation)

English:
treatable:       [tr.ea.t]M1 [a.b.le]M211
disciplined:     [dis.cip.l.i.n]M1 [e.d]M211
monogamous:      [mon.o.g.a.m]M1 [o.u.s]M211
streakers:       [st.r.e.a.k]M1 [e.r]M21 [s]M22111
tollgate:        [t.o.l.l.]M1 [g.a.t.e]M21 [s]M22111
foxhunting:      [f.o.x]M1 [h.u.n.t]M21 [ing]M22111
muscovites:      [m.u.sc.o.v]M1 [i.t.e]M21 [s]M22111
standardizes:    [st.a.n.d.a.rd]M1 [i.z.e]M21 [s]M22111
slavers':        [sl.a.v]M1 [e.r]M21 [s]M2211 [']M2212
earthiness':     [e.ar.th]M1 [i]M2111 [ness]M2211 [']M2212

Estonian:
instinkt:        [in.st.in.kt]M1
rebis:           [re.b.i]M1 [s]M2
toitsid:         [to.it]M1 [s.id]M2
armuavaldus:     [a.rm.u]M11 [ava.ld.u.s]M12
määgivale:       [mää.g.i]M11 [v.a]M12 [l.e]M2

Finnish:
keskuskoulussa:  [kesk.us]M11 [koul.u]M12 [s.sa]M2
peruslähteille:  [per.u.s]M11 [l.ä.ht.e]M12 [i]M211 [ll.e]M212
perunakaupoista: [per.u.n.a]M11 [k.au.p.o]M12 [i]M211 [st.a]M212
yöpaikkaani:     [yö]M11 [p.ai.kk.a]M12 [a]M21 [n.i]M22
nimettäköön:     [ni.m.e]M11 [tt.ä]M12 [k.ö]M21 [ö.n]M22

Incorrect Segmentations (words whose induced segmentation differs from the correct one)

English:  disagrees, reduce, revalue, deride, accompanied, wary, indescribable, orates, algerians, disputes
Estonian: meisterlikkust, mina, teiste, kuritegudesse, liharoaga
Finnish:  poltettiin, kulttuuriseltakin, tuotepalkintoja, velipuolta, otattava

Table 4: Examples of segmented words in English (top), Estonian (middle) and Finnish (bottom). Correctly segmented words are in the left part of the table. The identified segments are in brackets indexed by the respective template nonterminal; dots separate the metagrammar-generated parse tree leaves. Examples of incorrectly segmented words together with the correct segmentation are on the right.

Freq (%)  Rule                 |  Freq (%)  Cached Rule
9.9       Word → M1 M2         |  1.2       (M2 (M21 (M211 (M2111 s))))
5.7       M1 → M11 M12         |  0.9       (M2 (M21 (M211 (M2111 e)) (M212 (M2121 d))))
3.1       Word → M1            |  0.7       (M2 (M21 (M211 (M2111 i)) (M212 (M2121 ng))))
2.5       M11 → M111           |  0.6       (M2 (M21 (M211 (M2111 e))))
1.8       M2 → M21             |  0.4       (M2 (M21 (M211 (M2111 '))) (M22 (M221 (M2211 s))))
1.4       M12 → M121 M122      |  0.3       (M1112 a)
1.4       M111 → M1111 M1112   |  0.3       (M2 (M21 (M211 (M2111 y))))
0.9       M12 → M121           |  0.3       (M2 (M21 (M211 (M2111 e))) (M212 (M2121 r)))
0.9       M11 → M111 M112      |  0.2       (M2 (M21 (M211 (M2111 a))))

Table 5: The most frequently used metagrammar rules and cached rules for English, together with their relative occurrence frequencies (in percentages).

tended to contain only a single SubMorph. This helps to explain why the SubMorphs grammar in the semi-supervised AG improved less over the unsupervised AG as compared to the MorphSeq grammar: the rules with only a single SubMorph under the Morph are essentially the same as they would be in the MorphSeq grammar.

Finally, we examined the consistency of the templates chosen for each of the 5 samplers during model selection for the test set (Section 4.4). We found that there was some variability in the templates, but in most experiments the same template was chosen for the majority of the samplers (see Table 6). While this majority template is not always the optimal one on the dev set, we observed that it does produce consistently good results. It is possible that using the majority template, rather than the optimal template for the dev set, would actually produce better results

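The majority templates reported in Table 6 below are simply the templates chosen most often across the five independent samplers. A minimal sketch of this vote (illustrative code with hypothetical sampler outputs, not the authors' implementation):

```python
from collections import Counter

def majority_template(per_sampler_templates):
    """Return the template selected by the largest number of samplers."""
    counts = Counter(tuple(t) for t in per_sampler_templates)
    winner, _ = counts.most_common(1)[0]
    return list(winner)

# Hypothetical template choices from 5 samplers for one language:
chosen = [
    ["M1", "M21", "M2211", "M2212", "M222"],
    ["M1", "M21", "M2211", "M2212", "M222"],
    ["M11", "M12", "M2"],
    ["M1", "M21", "M2211", "M2212", "M222"],
    ["M11", "M12", "M2"],
]
print(majority_template(chosen))  # prints: ['M1', 'M21', 'M2211', 'M2212', 'M222']
```

Ties would need an explicit tie-breaking rule; in the experiments described here a clear majority usually emerged.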
           Majority template
English    M1 M21 M2211 M2212 M222
Finnish    M11 M12 M211 M212 M22
Turkish    M11 M12 M211 M212 M22
German     M11 M121 M122 M21 M221 M222
Estonian   M11 M12 M2

Table 6: Majority templates for each language. Note that the Estonian gold standard contains less fine-grained segmentations than some of the other languages.

on the test set, especially if (as appears to be the case here, and may often be the case in real applications) the dev and test sets are from somewhat different distributions.

It must be noted that both AG Select and the semi-supervised AG are computationally more demanding than the comparison systems. Since we do inference over tree structures, the complexity is cubic in the input word length, while most segmentation systems are quadratic or linear. Even compared to the unsupervised AG, AG Select is more expensive, because of the larger grammar and number of cached symbols. Nevertheless, our systems can feasibly be run on the large Morpho Challenge datasets.

Other recent unsupervised systems have reported state-of-the-art results by incorporating additional information from surrounding words (Lee et al., 2011), multilingual alignments (Snyder and Barzilay, 2008), or overlapping context features in a log-linear model (Poon et al., 2009), but they have only been run on Semitic languages and English (and in the latter case, a very small corpus). Since they explicitly enumerate and sample from all possible segmentations of each word (often with some heuristic constraints), they could have trouble with the much longer words of the agglutinative languages tested here. In any case, the results are not directly comparable to ours.

6 Conclusion

In this paper we have introduced three new methods for Adaptor Grammars and demonstrated their usefulness for minimally supervised morphological segmentation. First, we showed that AG models can be scaled to large datasets by using the posterior grammar to define an inductive model that, on average, is as accurate as full transductive training.

Second, we implemented semi-supervised AG inference, which uses labelled data to constrain the sampler, and showed that in all cases it performs much better than the unsupervised AG with the same grammar. Semi-supervised AG could benefit from the labelled-data reweighting techniques frequently used in semi-supervised learning, and studying the proper ways of doing so within the AG framework would be a potential topic for future research.

Our final contribution is the AG Select method, where the initial model is trained using a very general grammar that oversegments the data, and the labelled data is used to select which granularity of segments to use. Unlike other morphological segmentation models, this method can adapt its grammar to languages with different structures, rather than having to use the same grammar for every language. Indeed, we found that AG Select performs well across a range of languages and also seems to be less sensitive to differences between datasets (here, dev vs. test). In addition, it can be trained on either morphological analyses or segmentations. Although we tuned all results to optimize the SBF1 metric, in principle the same method could be used to optimize other measures, including extrinsic measures on downstream applications such as machine translation or information retrieval. In future we hope to show that this method can be used to improve performance on such applications, and also to explore its use for related segmentation tasks such as stemming or syllabification. Also, the method itself could potentially be improved by designing a classifier to determine the best template for each word based on a set of features, rather than using a single template for all words in the language.

Acknowledgments

This work was supported by the Tiger University program of the Estonian Information Technology Foundation for the first author. We thank Constantine Lignos for releasing his Morsel code to us, Sami Virpioja for evaluating test set results, and Federico Sangati for providing useful scripts.

References

Mathias Creutz and Krista Lagus. 2007. Unsupervised models for morpheme segmentation and morphology

learning. ACM Transactions on Speech and Language Processing, 4(1):1–34, February.

Micha Elsner, Eugene Charniak, and Mark Johnson. 2009. Structured generative models for unsupervised named-entity clustering. In Proceedings of NAACL, pages 164–172. Association for Computational Linguistics.

John Goldsmith. 2001. Unsupervised learning of the morphology of a natural language. Computational Linguistics, 27(2):153–198, June.

Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. 2006. Interpolating between types and tokens by estimating power-law generators. In Advances in Neural Information Processing Systems 18, pages 459–466, Cambridge, MA. MIT Press.

Eric A. Hardisty, Jordan Boyd-Graber, and Philip Resnik. 2010. Modeling perspective using adaptor grammars. In Proceedings of EMNLP, pages 284–292. Association for Computational Linguistics.

Zellig Harris. 1951. Structural Linguistics. University of Chicago Press.

Yun Huang, Min Zhang, and Chew Lim Tan. 2011. Nonparametric Bayesian machine transliteration with synchronous adaptor grammars. In Proceedings of ACL: short papers - Volume 2, pages 534–539. Association for Computational Linguistics.

Mark Johnson and Katherine Demuth. 2010. Unsupervised phonemic Chinese word segmentation using adaptor grammars. In Proceedings of COLING, pages 528–536. Association for Computational Linguistics.

Mark Johnson and Sharon Goldwater. 2009. Improving nonparametric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars. In Proceedings of NAACL, pages 317–325. Association for Computational Linguistics.

Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007. Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 641–648. MIT Press, Cambridge, MA.

Mark Johnson. 2008a. Unsupervised word segmentation for Sesotho using adaptor grammars. In Proceedings of the ACL Special Interest Group on Computational Morphology and Phonology, pages 20–27. Association for Computational Linguistics.

Mark Johnson. 2008b. Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure. In Proceedings of ACL, pages 398–406. Association for Computational Linguistics.

Oskar Kohonen, Sami Virpioja, and Krista Lagus. 2010a. Semi-supervised learning of concatenative morphology. In Proceedings of the ACL Special Interest Group on Computational Morphology and Phonology, pages 78–86. Association for Computational Linguistics.

Oskar Kohonen, Sami Virpioja, Laura Leppänen, and Krista Lagus. 2010b. Semi-supervised extensions to Morfessor Baseline. In Proceedings of the Morpho Challenge 2010 Workshop, pages 30–34. Aalto University School of Science and Technology.

Yoong Keok Lee, Aria Haghighi, and Regina Barzilay. 2011. Modeling syntactic context improves morphological segmentation. In Proceedings of CoNLL, pages 1–9. Association for Computational Linguistics.

Percy Liang, Hal Daumé III, and Dan Klein. 2008. Structure compilation: trading structure for features. In Proceedings of ICML, pages 592–599. Association for Computing Machinery.

Constantine Lignos. 2010. Learning from unseen data. In Mikko Kurimo, Sami Virpioja, and Ville T. Turunen, editors, Proceedings of the Morpho Challenge 2010 Workshop, pages 35–38. Aalto University School of Science and Technology.

Jim Pitman and Marc Yor. 1997. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25(2):855–900.

Hoifung Poon, Colin Cherry, and Kristina Toutanova. 2009. Unsupervised morphological segmentation with log-linear models. In Proceedings of NAACL, pages 209–217. Association for Computational Linguistics.

Benjamin Snyder and Regina Barzilay. 2008. Unsupervised multilingual learning for morphological segmentation. In Proceedings of ACL, pages 737–745. Association for Computational Linguistics.

Sebastian Spiegler and Christian Monson. 2010. EMMA: A novel evaluation metric for morphological analysis. In Proceedings of COLING, pages 1029–1037. Association for Computational Linguistics.

Sze-Meng Jojo Wong, Mark Dras, and Mark Johnson. 2012. Exploring adaptor grammars for native language identification. In Proceedings of EMNLP, pages 699–709. Association for Computational Linguistics.