Transactions of the Association for Computational Linguistics, Bd. 5, S. 73–86, 2017. Action Editor: Eric Fosler-Lussier.

Transactions of the Association for Computational Linguistics, Bd. 5, S. 73–86, 2017. Action Editor: Eric Fosler-Lussier.
Submission batch: 8/2016; Revision batch: 11/2016; Published 2/2017.

2017 Verein für Computerlinguistik. Distributed under a CC-BY 4.0 Lizenz.

C
(cid:13)

AGenerativeModelofPhonotacticsRichardFutrellBrainandCognitiveSciencesMassachusettsInstituteofTechnologyfutrell@mit.eduAdamAlbrightDepartmentofLinguisticsMassachusettsInstituteofTechnologyalbright@mit.eduPeterGraffIntelCorporationgraffmail@gmail.comTimothyJ.O’DonnellDepartmentofLinguisticsMcGillUniversitytimothy.odonnell@mcgill.caAbstractWepresentaprobabilisticmodelofphono-tactics,thesetofwell-formedphonemese-quencesinalanguage.Unlikemostcompu-tationalmodelsofphonotactics(HayesandWilson,2008;GoldsmithandRiggle,2012),wetakeafullygenerativeapproach,model-ingaprocesswhereformsarebuiltupoutofsubpartsbyphonologically-informedstruc-turebuildingoperations.Welearnaninven-toryofsubpartsbyapplyingstochasticmemo-ization(Johnsonetal.,2007;Goodmanetal.,2008)toagenerativeprocessforphonemesstructuredasanand-orgraph,basedoncon-ceptsoffeaturehierarchyfromgenerativephonology(Clements,1985;Dresher,2009).Subpartsarecombinedinawaythatallowstier-basedfeatureinteractions.Weevaluateourmodels’abilitytocapturephonotacticdis-tributionsinthelexiconsof14languagesdrawnfromtheWOLEXcorpus(Graff,2012).Ourfullmodelrobustlyassignshigherproba-bilitiestoheld-outformsthanasophisticatedN-grammodelforalllanguages.Wealsopresentnovelanalysesthatprobemodelbe-haviorinmoredetail.1IntroductionPeoplehavesystematicintuitionsaboutwhichse-quencesofsoundswouldconstitutelikelyorun-likelywordsintheirlanguage:AlthoughblickisnotanEnglishword,itsoundslikeitcouldbe,whilebnickdoesnot(ChomskyandHalle,1965).Suchin-tuitionsrevealthatspeakersareawareoftherestric-tionsonsoundsequenceswhichcanmakeuppossi-blemorphemesintheirlanguage—thephonotacticsofthelanguage.Phonotacticrestrictionsmeanthateachlanguageusesonlyasubsetofthelogically,orevenarticulatorily,possiblestringsofphonemes.Admissiblephonemecombinations,ontheotherhand,typicallyrecurinmultiplemorphemes,lead-ingtoredundancy.Itiswidelyacceptedthatphonotacticjudgmentsmaybegradient:thenonsensewordblickisbetterasahypotheticalEnglishwordthanbwick,whichisbetterthanbnick(HayesandWilson,2008;Al-bright,2009;Dalandetal.,2011).Toaccountforsuchgradedjudgements,therehavebeenavari-etyofprobabilistic(oder,moregenerally,weighted)modelsproposedtohandlephonotacticlearningandgeneralizationoverthelasttwodecades(seeDa-landetal.(2011)andbelowforreview).How-ever,inspiredbyoptimality-theoreticapproachestophonology,themostlinguisticallyinformedandsuc-cessfulsuchmodelshavebeenconstraint-based—formulatingtheproblemofphonotacticgeneraliza-tionintermsofrestrictionsthatpenalizeillicitcom-binationsofsounds(e.g.,rulingout∗bn-).Inthispaper,bycontrast,weadoptagenerativeapproachtomodelingphonotacticstructure.Ourapproachharkensbacktoearlyworkonthesoundstructureoflexicalitemswhichmadeuseofmor-phemestructurerulesorconditions(Halle,1959;Stanley,1967;Booij,2011;RasinandKatzir,2014).Suchapproachesexplicitlyattemptedtomodelthe

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
0
4
7
1
5
6
7
4
3
6

/

/
T

l

A
C
_
A
_
0
0
0
4
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

74

redudancywithinthesetofallowablelexicalformsinalanguage.Weadoptaprobabilisticversionofthisidea,conceivingofthephonotacticsystemasthecomponentofthelinguisticsystemwhichgen-eratesthephonologicalformoflexicalitemssuchaswordsandmorphemes.1Oursystemlearnsin-ventoriesofreusablephonotacticallylicitstructuresfromexistinglexicalitems,andassemblesnewlex-icalitemsbycombiningtheselearnedphonotac-ticpatternsusingphonologicallyplausiblestructure-buildingoperations.Thus,insteadofmodelingphonotacticgeneralizationsintermsofconstraints,wetreattheproblemasaproblemoflearninglan-guagespecificinventoriesofphonologicalunitsandlanguagespecificbiasesonhowthesephonesarelikelytobecombined.Althoughtherehavebeenanumberofearliergen-erativemodelsofphonotacticstructure(seeSec-tion4)thesemodelshavemostlyusedrelativelysimplisticorphonologicallyimplausiblerepresenta-tionsofphonesandphonologicalstructure-building.Bycontrast,ourmodelisbuiltaroundthreerepre-sentationalassumptionsinspiredbythegenerativephonologyliterature.First,wecapturesparsityinthespaceoffeature-specificationsofphonemesbyusingfeaturedependencygraphs—anideainspiredbyworkonfeaturegeometriesandthecontrastivehierarchy(Clements,1985;Dresher,2009).Sec-ond,oursystemcanrepresentphonotacticgeneral-izationsnotonlyattheleveloffullyspecifiedseg-ments,butalsoallowsthestorageandreuseofsub-segments,inspiredbytheautosegmentsandclassnodesofautosegmentalphonology.Finally,alsoin-spiredbyautosegmentalphonology,wemakeuseofastructure-buildingoperationwhichissenstitivetotier-basedcontextualstructure.Tomodelphonotacticlearning,wemakeuseoftoolsfromBayesiannonparametricstatistics.Inpar-ticular,wemakeuseofthenotionoflexicalmem-oization(?;Goodmanetal.,2008;Woodetal.,2009;O’Donnell,2015)—theideathatlanguage-specificgeneralizationscanbecapturedbythestor-ageandreuseoffrequentpatternsfromalinguisti-1Ultimately,weconceiveofphonotacticsasthemoduleofphonologywhichgeneratestheunderlyingformsoflexicalitems,whicharethensubjecttophonologicaltransformations(i.e.,transductions).Inthiswork,Jedoch,wedonotattempttomodeltransformationsfromunderlyingtosurfaceforms.callyuniversalinventory.Inourcase,thisamountstotheideathataninventoryofsegmentsandsub-segmentscanbeacquiredbyalearnerthatstoresandreusescommonlyoccuringsegmentsinpartic-ular,phonologicallyrelevantcontexts.Inshort,weviewtheproblemoflearningthephonemeinven-toryasoneofconcentratingprobabilitymassonthesegmentswhichhavebeenobservedbefore,andtheproblemofphonotacticgeneralizationaslearningwhich(sub-)segmentsarelikelyinparticulartier-basedphonologicalcontexts.2ModelMotivationsInthissection,wegiveanoverviewofhowourmodelworksanddiscussthephenomenaandthe-oreticalideasthatmotivateit.2.1FeatureDependencyGraphsMostformalmodelsofphonologypositthatseg-mentsaregroupedintosets,knownasnaturalclasses,thatarecharacterizedbysharedarticulatoryandacousticproperties,orphonologicalfeatures(Trubetzkoy,1939;Jakobsonetal.,1952;ChomskyandHalle,1968).Forexample,thesegments/n/and/m/areclassifiedwithapositivevalueofanasal-ityfeature(i.e.,NASALITY:+).Ähnlich,/m/and/p/canbeclassifiedusingthelabialvalueofaPLACEfeature,PLACE:labial.Thesefeaturesal-lowcompactdescriptionofmanyphonotacticgen-eralizations.2Fromaprobabilisticstructure-buildingperspec-tive,weneedtospecifyagenerativeprocedurewhichassemblessegmentsoutofpartsdefinedintermsofthesefeatures.Inthissection,wewillbuildupsuchaprocedurestartingfromthesimplestpossi-bleprocedureandprogressingtowardsonewhichismorephonologicallyinformed.Wewillclarifythe2Forcompatibilitywiththedatasourcesusedinevaluation(Section5.2),thefeaturesystemweuseheredepartsinseveralwaysfromstandardfeaturesets:(1)Weusemultivalentratherthanbinary-valuedfeatures.(2)Werepresentmannerwithasinglefeature,whichhasvaluessuchasvocalic,stop,andfricative.Thisapproachallowsustorefertomannersmorecompactlythaninsystemsthatemploycombinationsoffeaturessuchassonorant,continuant,andconsonantal.Forexample,ratherthanreferringtovowelsas‘non-syllabic’,werefertothemusingfeaturevaluevocalicforthefeatureMANNER.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
0
4
7
1
5
6
7
4
3
6

/

/
T

l

A
C
_
A
_
0
0
0
4
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

75

generativeprocesshereusingananalogytoPCFGs,butthisanalogywillbreakdowninlatersections.Thesimplestprocedureforgeneratingaseg-mentfromfeaturesistospecifyeachfeatureindependently.Forexample,considerthesetoffeature-valuepairsfor/t/:{NASALITY:-,PLACE:alveolar,…}.Inanaivegenerativepro-cedure,onecouldgenerateaninstanceof/t/byinde-pendentlychoosingvaluesforeachfeatureintheset{NASALITY,PLACE,…}.Weexpressthisprocessusingtheand-orgraphnotationbelow.Box-shapednodes—calledor-nodes—representfeaturessuchasNASALITY,whilecircularnodesrepresentgroupsoffeatureswhosevaluesarechosenindependentlyandarecalledand-nodes.NASALITYPLACEThisgenerativeprocedureisequivalent(ignoringor-der)toaPCFGwithrules:SEGMENT→NASALITYPLACENASALITY→+NASALITY→-PLACE→bilabialPLACE→alveolarNotallcombinationsoffeature-valuepairscor-respondtopossiblephonemes.Forexample,while/l/isdistinguishedfromotherconsonantsbythefeatureLATERAL,itisincoherenttospecifyvow-elsasLATERAL.Inordertoconcentrateprobabil-itymassonrealsegments,ourprocessshouldopti-mallyassignzeroprobabilitymasstotheseincoher-entphonemes.WecanavoidspecifyingaLATERALfeatureforvowelsbystructuringthegenerativepro-cessasbelow,sothattheLATERALor-nodeisonlyreachedforconsonants:VOCALICALATERALBHEIGHTconsonantvowelBeyondgeneratingwell-formedphonemes,aba-sicrequirementofamodelofphonotacticsisthatitconcentratesmassonlyonthesegmentsinapar-ticularlanguage’ssegmentinventory.Forexam-ple,themodelofEnglishphonotacticsshouldputzeroornominalmassonanysequencecontainingthesegment/x/,althoughthisisalogicallypossi-blephoneme.Soourgenerativeprocedureforaphonememustbeabletolearntogenerateonlythelicitsegmentsofalanguage,givensomeprobabil-itydistributionsattheand-andor-nodes.Forthistask,independentlysamplingvaluesatand-nodesdoesnotgiveusawaytoruleoutparticularcom-binationsoffeaturessuchasthoseforming/x/.Ourapproachtothisproblemusestheideaofstochasticmemoization(oradaptation),inwhichtheresultsofcertaincomputationsarestoredandmaybeprobabilisticallyreused“aswholes,”ratherthanrecomputedfromscratch(Michie,1968;Goodmanetal.,2008).Thistechniquehasbeenappliedtotheproblemoflearninglexicalitemsatvariouslevelsoflinguisticstructure(deMarcken,1996;Johnsonetal.,2007;Goldwater,2006;O’Donnell,2015).Givenourmodelsofar,applyingstochasticmemo-izationisequivalenttospecifyinganadaptorgram-maroverthePCFGsdescribedsofar.Letfbeastochasticfunctionwhichsamplesfeaturevaluesusingtheand-orgraphrepresenta-tiondescribedabove.Weapplystochasticmemo-izationtoeachnode.FollowingJohnsonetal.(2007)andGoodmanetal.(2008),weuseadistri-butionforprobabilisticmemoizationknownastheDirichletProcess(DP)(Ferguson,1973;Sethura-man,1994).Letmem{F}beaDP-memoizedver-sionoff.ThebehaviorofaDP-memoizedfunctioncanbedescribedasfollows.Thefirsttimeweinvokemem{F},thefeaturespecificationofanewsegmentwillbesampledusingf.Onsubsequentinvocations,weeitherchooseavaluefromamongthesetofpre-vioussampledvalues(amemodraw),orwedrawanewvaluefromf(abasedraw).TheprobabilityofsamplingtheitholdvalueinamemodrawisniN+θ,whereNisthenumberoftokenssampledsofar,niisthenumberoftimesthatvalueihasbeenusedinthepast,andθ>0isaparameterofthemodel.AbasedrawhappenswithprobabilityθN+θ.Thispro-cessinducesabiastoreuseitemsfromfwhichhavebeenfrequentlygeneratedinthepast.Weapplymemrecursivelytothesamplingproce-dureforeachnodeinthefeaturedependencygraph.Themoretimesthatweusesomeparticularsetoffeaturesunderanodetogeneratewordsinalan-guage,themorelikelywearetoreusethatsetof

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
0
4
7
1
5
6
7
4
3
6

/

/
T

l

A
C
_
A
_
0
0
0
4
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

76

featuresinthefutureinamemodraw.Thisdynamicleadsourmodeltorapidlyconcentrateprobabilitymassonthesubsetofsegmentswhichoccurintheinventoryofalanguage.2.2ClassNodeStructureOuruseofand-orgraphsandlexicalmemoiza-tiontomodelinter-featuredependenciesisin-spiredbyworkinphonologyondistinctivenessandmarkednesshierarchies(Kean,1975;Berwick,1985;Dresher,2009).Inadditiontousingfeaturehierarchiestodelineatepossiblesegments,theliter-aturehasusedthesestructurestodesignatebundlesoffeaturesthathaveprivilegedstatusinphonolog-icaldescription,i.e.featuregeometries(Clements,1985;Halle,1995;McCarthy,1988).Forexample,manyanalysesgroupfeaturesconcerninglaryngealstates(e.g.,VOICE,ASPIRATION,usw.)underala-ryngealnode,whichisdistinctfromthenodecon-tainingoralplace-of-articulationfeatures(ClementsandHume,1995).Thesenodesareknownasclassnodes.Intheseanalyses,featuresgroupedtogetherunderthelaryngealclassnodemaycovarywhilebe-ingindependentoffeaturesgroupedundertheoralclassnode.Thelexicalmemoizationtechniquediscussedabovecapturesthisnotionofclassnodedirectly,be-causethemodellearnsaninventoryofsubsegmentsundereachnode.Considerthefeaturedependencygraphbelow.ABNASALITYVOICEVOCALICCBACKNESSHEIGHT……consonantvowelInthisgraph,theand-nodeAgeneratesfullyspec-ifiedsegments.And-nodeBcanbethoughtofasgeneratingthenon-oralpropertiesofasegment,in-cludingvoicingandnasality.And-nodeCisaclassnodebundlingtogethertheoralfeaturesofvowelsegments.ThefeaturesunderBareoutsideoftheVO-CALICnode,sothesefeaturesarespecifiedforbothconsonantandvowelsegments.Thisallowscombinationssuchasvoicednasalconsonants,andalsorarercombinationssuchasunvoicednasalvow-els.Becausealland-nodesarerecursivelymemo-ized,ourmodelisabletobindtogetherparticularnon-oralchoices(nodeB),learningforinstancethatthecombination{NASALITY:+,VOICED:+}com-monlyrecursforbothvowelsandconsonantsinalanguage.Thatis,{NASALITY:+,VOICED:+}be-comesahigh-probabilitymemodraw.Sincethemodellearnsaninventoryoffullyspec-ifiedsegmentsatnodeA,themodelcouldlearnone-offexceptionstothisgeneralizationaswell.Forexample,itcouldstoreatahighlevelasegmentwith{NASALITY:+,VOICED:-}alongwithsomeotherfeatures,whilemaintainingthegeneralizationthat{NASALITY:+,VOICED:+}ishighlyfrequentinbasedraws.Language-specificphonemeinvento-riesaboundwithsuchcombinationsofclass-node-basedgeneralizationsandidiosyncrasies.Byusinglexicalmemoizationatmultipledifferentlevels,ourmodelcancaptureboththebroadergeneralizationsdescribedinclassnodeterminologyandtheexcep-tionstothosegeneralizations.2.3SequentialStructureasMemoizationinContextInSection2.2,wefocusedontherolethatfeaturesplayindefiningalanguage’ssegmentinventory.Wegaveaphonologically-motivatedgenerativeprocess,equivalenttoanadaptorgrammar,forphonemesinisolation.However,featuresalsoplayanim-portantroleincharacterizinglicitsequences.Wemodelsequentialrestrictionsascontext-dependentsegmentinventories.Ourmodellearnsadistributionoversegmentsandsubsegmentsconditionaloneachprecedingsequenceof(sub)segments,usinglexi-calmemoization.Introducingcontext-dependencemeansthatthemodelcannolongerbeformulatedasanadaptorgrammar.2.4Tier-basedInteractionOnesalientpropertyofsequentialrestrictionsinphonotacticsisthatsegmentsareoftenrequiredtobearthesamefeaturevaluesasnearbysegments.Forexample,asequenceofanasalandafollow-ingstopmustagreeinplacefeaturesattheendofamorphemeinEnglish.Suchrestrictionsmayevenbenon-local.Forexample,manylanguagespre-fercombinationsofvowelsthatagreeinfeaturessuchasHEIGHT,BACKNESS,orROUNDING,sogar

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
0
4
7
1
5
6
7
4
3
6

/

/
T

l

A
C
_
A
_
0
0
0
4
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

77

Figure1:TiersdefinedbyclassnodesAandBforcontextsequence/ak/.Seetext.acrossarbitrarynumbersofinterveningconsonants(i.e.,vowelharmony).Onewaytodescribethesesequentialfeaturein-teractionsistoassumethatfeaturevaluesofonesegmentinaworddependonvaluesforthesameorcloselyrelatedfeaturesinothersegments.Thisisaccomplishedbydividingsegmentsintosubsets(suchasconsonantsandvowels),calledtiers,andthenmakingasegment’sfeaturevaluespreferen-tiallydependentonthevaluesofothersegmentsonthesametier.Suchphonologicaltiersareoftenidentifiedwithclassnodesinafeaturedependencygraph.Forexample,arequirementthatonevowelidenticallymatchthevowelintheprecedingsyllablewouldbestatedasarequirementthatthevowel’sHEIGHT,BACKNESS,andROUNDINGfeaturesmatchtheval-uesoftheprecedingvowel’sfeatures.Inthiscase,thevowelsthemselvesneednotbeadjacent—byas-sumingthatvowelqualityfeaturesarenotpresentinconsonants,itispossibletosaythattwovowelsareadjacentonatierdefinedbythenodesHEIGHT,BACKNESS,andROUNDING.Ourfullgenerativeprocessforasegmentfollow-ingothersegmentsisthefollowing.Wefollowtheexampleofthegenerationofaphonemeconditionalonaprecedingcontextof/ak/,shownwithsimpli-fiedfeaturalspecificationsandtiersinFigure1.Ateachnodeinthefeaturedependencygraph,wecaneithergenerateafully-specifiedsubsegmentforthatnode(memodraw),orassembleanovelsubseg-mentforthatnodeoutofpartsdefinedbythefea-turedependencygraph(basedraw).Startingattherootnodeofthefeaturedependencygraph,wede-cidewhethertodoamemodraworbasedrawcon-ditionalonthepreviousnsubsegmentsatthatnode.Soinordertogeneratethenextsegmentfollow-ing/ak/intheexample,westartatnodeAinthenextdrawfromthefeaturegeometry,withsomeprobabil-itywedoamemodrawconditionedon/ak/,definedbytheredtier.Ifwedecidetodoabasedrawin-stead,wethenrepeattheprocedureconditionalonthepreviousn−1segments,recursivelyuntilweareconditioningontheemptycontext.Thatis,wedoamemodrawconditionalon/k/,orconditionalontheemptycontext.Thisprocessofconditioningonsuccessivelysmallercontextsisastandardtech-niqueinBayesiannonparametriclanguagemodeling(Teh,2006;Goldwateretal.,2006).Attheemptycontext,ifwedecidetodoabasedraw,thenwegenerateanovelsegmentbyrepeat-ingthewholeprocessateachchildnode,togen-erateseveralsubsegments.Intheexample,wewouldassembleaphonemebyindependentlysam-plingsubsegmentsatthenasal/laryngealnodeBandtheMANNERnode,andthencombiningthem.Cru-cially,theconditioningcontextconsistsonlyofthevaluesatthecurrentnodeinthepreviousphonemes.SowhenwesampleasubsegmentfromnodeB,itisconditionalontheprevioustwovaluesatnodeB,{VOICE:+,NASAL:-}Und{VOICE:-,NASAL:-},definedbythebluetierinthefigure.Theprocesscontinuesdownthefeaturedependencygraphrecur-sively.Atthepointwherethemodeldecidesonvowelplacefeaturessuchasheightandbackness,thesewillbeconditionedonlyonthevowelplacesfeaturesofthepreceding/a/,with/k/skippeden-tirelyasitdoesnothavevaluesatvowelplacenodes.Thissectionhasprovidedmotivationsandawalk-throughofourproposedgenerativeprocedureforse-quencesofsegments.Inthenextsection,wegivetheformalizationofthemodel.3FormalizationoftheModelsHerewegiveafullformaldescriptionofourpro-posedmodelinthreesteps.First,inSection3.1,weformalizethegenerativeprocessforasegmentinisolation.Second,inSection3.2,wegivefor-mulationofBayesiannonparametricN-grammod-elswithbackoff.Third,inSection3.3,weshowhowtodropthegenerativeprocessforaphonemeintotheN-grammodelsuchthattier-basedinterac-tionsemergenaturally.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
0
4
7
1
5
6
7
4
3
6

/

/
T

l

A
C
_
A
_
0
0
0
4
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

78

3.1GenerativeProcessforaSegmentAfeaturedependencygraphGisafullyconnected,singlyrooted,gerichtet,acyclicgraphgivenbythetriplehV,A,T,riwhereVisasetofverticesornodes,Aisasetofdirectedarcs,tisatotalfunctiont(N):V7→{Und,oder},andrisadistinguishedrootnodeinV.Adirectedarcisapairhp,ciwheretheparentpandchildcarebothelementsinV.Thefunctiont(N)identifieswhethernisanand-oror-node.Definech(N)tobethefunctionthatreturnsallchildrenofnoden,thatis,alln0∈Nsuchthathn,n0i∈A.AsubgraphGsoffeaturedependencygraphGisthegraphobtainedbystartingfromnodesbyre-tainingonlynodesandarcsreachablebytraversingarcsstartingfroms.Asubsegmentpsisasubgraphrootedinnodesforwhicheachor-nodecontainsex-actlyoneoutgoingarc.Subsegmentsrepresentsam-pledphoneconstituents.Asegmentisasubsegmentrootedinr—thatis,afullyspecifiedphoneme.ThedistributionassociatedwithasubgraphGsisgivenbyGsbelow.Gsisadistributionoversub-segments;thedistributionforthefullgraphGrisadistributionoverfullyspecifiedsegments.Weoc-casionallyoverloadthenotationsuchthatGs(ps)willrefertotheprobabilitymassfunctionassociatedwithdistributionGsevaluatedatthesubsegmentps.Hs∼DP(θs,Gs)(1)Gs(ps)=Ys0∈ch(S)Hs0(ps0)T(S)=ANDXs0∈ch(S)ψss0Hs0(ps0)T(S)=ORThefirstcaseofthedefinitioncoversand-nodes.Weassumethattheleavesofourfeaturedependencygraph—whichrepresentatomicfeaturevaluessuchasthelaryngealvalueofaPLACEfeature—arechildlessand-nodes.Thesecondcaseofthedefinitioncoversor-nodesinthegraph,whereψss0istheprobabilityassociatedwithchoosingoutgoingarchs,s0ifromparentor-nodestochildnodes0.Thus,or-nodesdefinemix-turedistributionsoveroutgoingarcs.ThemixtureweightsaredrawnfromaDirichletprocess.Inpar-ticular,foror-nodenintheunderlyinggraphG,thevectorofprobabilitiesoveroutgoingedgesisdis-tributedasfollows.~ψs∼DP(θs,UNIFORM(|ch(S)|))NotethatinbothcasesthedistributionoverchildsubgraphsisdrawnfromaDirichletprocess,asbe-low,capturingthenotionofsubsegmentalstoragediscussedabove.3.2N-GramModelswithDP-BackoffLetTbeasetofdiscreteobjects(e.g.,atomicsym-bolsorstructuredsegmentsasdefinedinthepreced-ingsections).LetT∗bethesetofallfinite-lengthstringswhichcanbegeneratedbycombiningele-mentsofT,underconcatenation,·,includingtheemptystring(cid:15).Acontext,uisanyfinitestringbe-ginningwithaspecialdistinguishedstartsym-bolandendingwithsomesequenceinT∗,thatis,u∈{start·T∗}.Foranystringα,definehd(α)tobethefunctionthatreturnsthefirstsymbolinthestring,tl(α)tobethefunctionthatreturnssuffixofαminusthefirstsymbol,Und|α|tobethelengthofα,withhd((cid:15))=tl((cid:15))=(cid:15)Und|(cid:15)|=0.Writetheconcatenationoftwostringsαandα0asα·α0.LetHubeadistributiononnextsymbols—thatis,objectsinT∪{stop}—conditionedonagivencontextu.ForanN-grammodeloforderN,theprobabilityofastringβinT∗isgivenbyKNstart(β·stop),whereKNu(α)isdefinedas:KNu(α)=n1α=(cid:15)HfN(u)(hd(α))×KNu·hd(α)(tl(α))ansonsten,(2)wherefn(·)isacontext-managementfunctionwhichdetermineswhichpartsoftheleft-contextshouldbeusedtodeterminetheprobabilityofthecurrentsymbol.InthecaseoftheN-grammodelsusedinthispaper,fn(·)takesasequenceuandre-turnsonlytherightmostn−1elementsfromthesequence,ortheentiresequenceifithaslengthlessthann.NotetwoaspectsofthisformulationofN-grammodels.First,Huisafamilyofdistributionsovernextsymbolsormoregeneralobjects.Later,wewilldropinphonological-feature-basedgenerativepro-cessesforthesedistributions.Second,thefunctionfnisaparameteroftheabovedefinitions.Inwhatfollows,wewilluseavariantofthisfunctionwhichissensitivetotier-basedstructure,returningthepre-viousn−1onlyontheappropriatetier.MacKayandPeto(1994)introducedahierarchi-calDirichletprocess-basedbackoffschemeforN-

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
0
4
7
1
5
6
7
4
3
6

/

/
T

l

A
C
_
A
_
0
0
0
4
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

79

grammodels,withgeneralizationsinTeh(2006)andGoldwateretal.(2006).Inthissetup,thedistribu-tionovernextsymbolsgivenacontextuisdrawnhierarchicallyfromaDirichletprocesswhosebasemeasureisanotherDirichletprocessassociatedwithcontexttl(u),andsoon,withalldrawsultimatelybackingoffintosomeunconditioneddistributionoverallpossiblenextsymbols.Thatis,inahier-archicalDirichletprocessN-grammodel,Hfn(u)isgivenasfollows.Hfn(u)∼nDP(θfn(u),Hfn−1(u))n≥1DP(θfn(u),UNIFORM(T∪{stop}))n=03.3Tier-BasedInteractionsTomaketheN-grammodeldefinedinthelastsec-tioncapturetier-basedinteractions,wemaketwochanges.First,wegeneralizethegenerativepro-cessHsfromEquation1toHsu,whichgeneratessubsegmentsconditionalonasequenceu.Sec-ond,wedefineacontext-truncatingfunctionfsn(u)whichtakesacontextofsegmentsuandreturnstherightmostn−1non-emptysubsegmentswhoserootnodeiss.Thenwesubstitutethegenerativepro-cessHsfsn(u)(whichappliesthecontext-managementfunctionfsn(·)tothecontextu)forHfn(u)inEqua-tion2.Theresultingprobabilitydistributionis:KNu(α)=n1α=(cid:15)HrfrN(u)(hd(α))×KNu·hd(α)(tl(α))otherwise.KNu(α)isthedistributionovercontinuationsgivenacontextofsegments.ItsdefinitiondependsonHsfsn(u),whichisthegeneralizationofthegener-ativeprocessforsegmentsHstobeconditionalonsometier-basedN-gramcontextfsn(u).Hsfsn(u)Ist:Hsfsn(u)∼nDP(θsfsn(u),Hsfsn−1(u))n≥1DP(θsfsn(u),GsfsN(u))n=0Gsfsn(u)(ps)=(Qs0∈ch(S)Hs0fs0n(u)(ps0)T(S)=ANDPs0∈ch(S)ψss0Hs0fs0n(u)(ps0)T(S)=OR.Hsfsn(u)andGsfsn(u)abovearemutuallyrecursivefunctions.Hsfsn(u)implementsbackoffinthetier-basedcontextofprevioussubsegments;Gsfsn(u)im-plementsbackoffbygoingdownintotheprobabil-itydistributionsdefinedbythefeaturedependencygraph.NotethatthefunctionHsfsn(u)recursivelybacksofftotheemptycontext,butitsultimatebasedistri-butionisindexedbyfsN(u),usingtheglobalmaxi-mumN-gramorderN.Sowhensamplesaredrawnfromthefeaturedependencygraph,theyarecon-ditionedonnon-emptytier-basedcontexts.Inthisway,subsegmentsaregeneratedbasedontier-basedcontextandbasedonfeaturalbackoffinaninter-leavedfashion.3.4InferenceWeusetheChineseRestaurantProcessrepresen-tationforsampling.Inferenceinthemodelisoverseatingarrangementsforobservationsofsub-segmentsandoverthehyperparametersθforeachrestaurant.WeperformGibbssamplingonseatingarrangementsintheDirichletN-grammodelsbyre-movingandre-addingobservationsineachrestau-rant.TheseGibbssweepshadnegligibleimpactonmodelbehavior.Fortheconcentrationparame-terθ,wesetapriorGamma(10,.1).Wedrawpos-teriorsamplesusingtheslicesamplerdescribedinJohnsonandGoldwater(2009).Wedrawonepos-teriorsampleofthehyperparametersforeachGibbssweep.IncontrasttotheGibbssweeps,wefoundre-samplinghyperparameterstobecrucialforachiev-ingtheperformancedescribedbelow(Section5.3).4RelatedWorkPhonotacticshasprovenafruitfulproblemdomainforcomputationalmodels.Mostsuchworkhasadoptedaconstraint-basedapproach,attemptingtodesignascoringfunctionbasedonphonologicalfea-turestoseparateacceptableformsfromunaccept-ableones,typicallybyformulatingrestrictionsorconstraintstoruleoutless-goodstructures.Thisconcepthaslednaturallytotheuseofundi-rected(maximum-entropy,log-linear)models.Inthisclassofmodels,aformisscoredbyevaluationagainstanumberofpredicates,calledfactors3—forexample,whethertwoadjacentsegmentshavethephonologicalfeaturesVOICE:+VOICE:-.Eachfac-torisassociatedwithaweight,andthescoreforaformisthesumoftheweightsofthefactorswhicharetruefortheform.Thewell-knownmodelof3Factorsarealsocommonlycalled“features”—atermweavoidtopreventconfusionwithphonologicalfeatures.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
0
4
7
1
5
6
7
4
3
6

/

/
T

l

A
C
_
A
_
0
0
0
4
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

80

HayesandWilson(2008)adoptsthisframework,pairingitwithaheuristicprocedureforfindingex-planatoryfactorswhilepreventingoverfitting.Simi-larly,Albright(2009)assignsascoretoformsbasedonfactorsdefinedovernaturalclassesofadjacentsegments.Constraint-basedmodelshavetheadvan-tageofflexibility:itispossibletoscoreformsusingarbitrarilycomplexandoverlappingsetsoffactors.Forexample,onecanstateaconstraintagainstad-jacentphonemeshavingfeaturesVOICE:+andLAT-ERAL:+,oranycombinationoffeaturevalues.Incontrast,wehavepresentedamodelwhereformsarebuiltoutofpartsbystructure-buildingop-erations.Fromthisperspective,thegoalofamodelisnottoruleoutbadforms,butrathertodiscoverrepeatingstructuresingoodforms,suchthatnewformswiththosestructurescanbegenerated.Inthissettingthereislessflexibilityinhowphonologicalfeaturescanaffectwell-formedness.Forastructure-buildingmodeltoassign“scores”toarbitrarypairsofco-occurringfeatures,theremustbeapointinthegenerativeprocesswherethosefea-turesareconsideredinisolation.Comingupwithsuchaprocesshasbeenchallenging.Asaresultofthislimitation,structure-buildingmodelsofphono-tacticshavenotgenerallyincludedrichfeaturalin-teractions.Forexample,ColemanandPierrehum-bert(1997)giveaprobabilisticmodelforphonotac-ticswherewordsaregeneratedusinggrammaroverunitssuchassyllables,onsets,andrhymes.Thismodeldoesnotincorporatefine-grainedphonolog-icalfeaturessuchasvoicingandplace.Infact,ithasbeenarguedthataconstraint-basedapproachisrequiredinordertocapturerichfeature-basedinteractions.Forexample,GoldsmithandRiggle(2012)developatier-basedstructure-buildingmodelofFinnishphonotacticswhichcap-turesnonlocalvowelharmonyinteractions,butar-guethatthismodelisinadequatebecauseitdoesnotassignhigherprobabilitiestoformsthananN-grammodel,acommonbaselinemodelforphono-tactics(Dalandetal.,2011).Theyarguethatthisdeficiencyisbecausethemodelcannotsimulta-neouslymodelnonlocalvowel-vowelinteractionsandlocalconsonant-vowelinteractions.Becauseofourtier-basedconditioningmechanism(Sections2.4and3.3),ourmodelcansimultaneouslyproducelo-calandnonlocalinteractionsbetweenfeaturesus-ingstructure-buildingoperations,anddoesassignhigherprobabilitiestoheld-outformsthananN-grammodel(Section5.3).Fromthisperspective,ourmodelcanbeseenasaproofofconceptthatitispossibletohaverichfeature-basedconditioningwithoutadoptingaconstraint-basedapproach.Whileourmodelcancapturefeaturalinteractions,itislessflexiblethanaconstraint-basedmodelinthattheallowableinteractionsarespecifiedbythefeaturedependencygraph.Forexample,thereisnowaytoencodeadirectconstraintagainstadja-centphonemeshavingfeaturesVOICE:+andLAT-ERAL:+.Weconsiderthisastrengthoftheap-proach:Aparticularfeaturedependencygraphisaparameterofourmodel,andaspecificscientifichypothesisaboutthespaceoflikelyfeaturalinterac-tionsbetweenphonemes,similartofeaturegeome-triesfromclassicalgenerativephonology(Clements,1985;McCarthy,1988;Halle,1995).4Whileprobabilisticapproacheshavemostlytakenaconstraint-basedapproach,recentformallanguagetheoreticapproachestophonologyhaveinvestigatedwhatbasicpartsandstructurebuildingoperationsareneededtocapturerealisticfeature-basedinterac-tions(Heinzetal.,2011;JardineandHeinz,2015).Weseeprobabilisticstructure-buildingapproachessuchasthisworkasawaytounifytherecentfor-mallanguagetheoreticadvancesincomputationalphonologywithcomputationalphonotacticmodel-ing.OurmodeljoinsotherNLPworkattemptingtodosequencegenerationwhereeachsymbolisgen-eratedbasedonarichfeaturalrepresentationofprevioussymbols(BilmesandKirchhoff,2003;DuhandKirchhoff,2004),thoughwefocusmoreonphonology-specificrepresentations.Ourand-orgraphsaresimilartothoseusedincomputervisiontorepresentpossibleobjects(JinandGeman,2006).5ModelEvaluationandExperimentsHereweevaluatesomeofthedesigndecisionsofourmodelandcompareittoabaselineN-grammodelandtoawidely-usedconstraint-basedmodel,BLICK.Inordertoprobemodelbehavior,wealso4Wedohowevernotethatitmaybepossibletolearnfeaturehierarchiesonalanguage-by-languagebasisfromuniversalar-ticulatoryandacousticbiases,assuggestedbyDresher(2009).

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
0
4
7
1
5
6
7
4
3
6

/

/
T

l

A
C
_
A
_
0
0
0
4
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

81

presentevaluationsonartificialdata,andasamplingof“representativeforms”preferredbyonemodelascomparedtoanother.Ourmodelconsistsofstructure-buildingopera-tionsoveralearnedinventoryofsubsegments.Ifourmodelcanexploitmorerepeatedstructureinphono-logicalformsthantheN-grammodelorconstraint-basedmodels,thenitshouldassignhigherprobabil-itiestoforms.Thelogprobabilityofaformunderamodelcorrespondstothedescriptionlengthofthatformunderthemodel;ifamodelassignsahigherlogprobabilitytoaform,thatmeansthemodelisca-pableofcompressingtheformmorethanothermod-els.Therefore,wecomparemodelsontheirabilitytoassignhighprobabilitiestophonologicalforms,asinGoldsmithandRiggle(2012).5.1EvaluationofModelComponentsWeareinterestedindiscoveringtheextenttowhicheachmodelcomponentdescribedabove—featuredependencygraphs(Section2.1),classnodestruc-ture(Section2.2),andtier-basedconditioning(Sec-tion2.4)—contributestotheabilityofthemodeltoexplainwordforms.Toevaluatethecontributionoffeaturedepen-dencygraphs,wecompareourmodelswithabase-lineN-grammodel,whichrepresentsphonemesasatomicunits.ForthisN-grammodel,weuseaHier-archicalDirichletProcesswithn=3.Toevaluatefeaturedependencygraphswithandwithoutarticulatedclassnodestructure,wecom-paremodelsusingthegraphshowninFigure3(theminimalstructurerequiredtoproducewell-formedphonemes)tomodelswiththegraphshowninFigure2,whichincludesphonologicallymoti-vated“classnodes”.5Toevaluatetier-basedconditioning,wecomparemodelswiththeconditioningdescribedinSec-tions2.4and3.3tomodelswherealldecisionsareconditionedonthefullfeaturalspecificationofthepreviousn−1phonemes.Thisallowsustoisolateimprovementsduetotier-basedconditioningbeyondimprovementsfromthefeaturehierarchy.5ThesefeaturedependencygraphsdifferfromthoseintheexpositioninSection2inthattheydonotincludeaMANNERfeature;butrathertreatvowelasapossiblevalueofMANNER.durationlaryngealnasalmannersuprasegmentalbacknessheightroundingCplace2ndart.lateralotherwisevowelFigure2:Featuredependencygraphwithclassnodestructureusedinourexperiments.PlaintextnodesareOR-nodeswithnochilddistributions.Thearcmarkedotherwiserepresentsseveralarcs,eachlabelledwithaconsonantmannersuchasstop,fricative,etc.durationlaryngealnasalmannersuprasegmentalbacknessheightroundingCplace2ndart.lateralotherwisevowelFigure3:“Flat”featuredependencygraph.5.2LexiconDataTheWOLEXcorpusprovidestranscriptionsforwordsindictionariesof60diverselanguages,rep-resentedintermsofphonologicalfeatures(Graff,2012).Inadditiontowords,thedictionariesin-cludesomeshortsetphrases,suchasofcourse.WeusethefeaturalrepresentationofWOLEX,andde-signourfeaturedependencygraphstogenerateonlywell-formedphonemesaccordingtothisfeaturesys-tem.Forspacereasons,wepresenttheevaluationofourmodelon14oftheselanguages,chosenbasedonthequalityoftheirtranscribedlexicons,andtheauthors’knowledgeoftheirphonologicalsystems.5.3Held-OutEvaluationHerewetestwhetherthedifferentmodelconfigu-rationsdescribedaboveassignhighprobabilitytoheld-outforms.Thisteststhemodels’abilitytogeneralizebeyondtheirtrainingdata.Wetraineachmodelon2500randomlyselectedwordformsfromaWOLEXdictionary,andcomputeposteriorpredic-tiveprobabilitiesfortheremainingwordformsfromthefinalstateofthemodel.

l

D
Ö
w
N
Ö
A
D
e
D

F
R
Ö
M
H

T
T

P

:
/
/

D
ich
R
e
C
T
.

M

ich
T
.

e
D
u

/
T

A
C
l
/

l

A
R
T
ich
C
e

P
D

F
/

D
Ö

ich
/

.

1
0
1
1
6
2

/
T

l

A
C
_
A
_
0
0
0
4
7
1
5
6
7
4
3
6

/

/
T

l

A
C
_
A
_
0
0
0
4
7
P
D

.

F

B
j
G
u
e
S
T

T

Ö
N
0
9
S
e
P
e
M
B
e
R
2
0
2
3

82

Languagengramflatcl.nodeflat/notierscl.node/notiersEnglish-22.20-22.15-21.73∗∗-22.15-22.14French-18.30-18.28-17.93∗∗-18.29-18.28Georgian-20.21-20.17-19.64∗-20.18-20.18German-24.77-24.72-24.07∗∗-24.73-24.74Greek-22.48-22.45-21.65∗∗-22.45-22.45HaitianCreole-16.09-16.04-15.82∗∗-16.05-16.04Lithuanian-19.03-18.99-18.58∗-18.99-18.99Mandarin-13.95-13.83∗-13.78∗∗-13.82∗-13.82∗Mor.Arabic-16.15-16.10-16.00∗-16.13-16.12Polish-20.12-20.08-19.76∗∗-20.08-20.07Quechua-14.35-14.30-13.87∗-14.30-14.31Romanian-18.71-18.68-18.32∗∗-18.69-18.68Tatar-16.21-16.18-15.65∗∗-16.19-16.19Turkish-18.88-18.85-18.55∗∗-18.85-18.84Table1:Averagelogposteriorpredictiveprobabilityofaheld-outform.“ngram”istheDPBackoff3-grammodel.“flat”modelsusethefeaturedependencygraphinFig-ure3.“cl.node”modelsusethegraphinFigure2.Seetextformotivationsofthesegraphs.“notiers”modelsconditioneachdecisiononthepreviousphoneme,ratherthanontiersofpreviousfeatures.Asterisksindicatesta-tisticalsignificanceaccordingtoat-testcomparingwiththescoresundertheN-grammodel.*=p<.05;**=p<.001.Table1showstheaverageprobabilityofaheld-outwordunderourmodelsandundertheN-grammodelforonemodelrun.6Foralllanguages,wegetastatisticallysignificantincreaseinprobabili-tiesbyadoptingtheautosegmentalmodelwithclassnodesandtier-basedconditioning.Modelvariantswithouteithercomponentdonotsignificantlyout-performtheN-grammodelexceptinChinese.Thecombinationofclassnodesandtier-basedcondition-ingresultsinmodelimprovementsbeyondthecon-tributionsoftheindividualfeatures.5.4EvaluationonArtificialDataOurmodeloutperformstheN-grammodelinpre-dictingheld-outforms,butitremainstobeshownthatthisperformanceisduetocapturingthekindsoflinguisticintuitionsdiscussedinSection2.AnalternativepossibilityisthattheAutosegmentalN-grammodel,whichhasmanymoreparametersthanaplainN-grammodel,cansimplylearnamoreac-curatemodelofanysequence,evenifthatsequencehasnoneofthestructurediscussedabove.Toevalu-atethispossibility,wecomparetheperformanceofourmodelinpredictingheld-outlinguisticformstoitsperformanceinpredictingheld-outformsfromartificiallexiconswhichexpresslydonothavethe6Themeanstandarddeviationperformoflogprobabilitiesover50runsofthefullmodelrangedfrom.09forAmharicto.23forDutch.linguisticstructureweareinterestedin.IftheautosegmentalmodeloutperformstheN-grammodelevenonartificialdatawithnophono-logicalstructure,thenitsperformanceonthereallinguisticdatainSection5.3mightbeoverfitting.Ontheotherhand,iftheautosegmentalmodeldoesbetteronrealdatabutnotartificialdata,thenwecanconcludethatitispickinguponsomerealdistinc-tivestructureofthatdata.ForeachreallexiconLr,wegenerateanartificiallexiconLabytrainingaDP3-grammodelonLrandforward-sampling|Lr|forms.Additionally,theformsinLaareconstrainedtohavethesamedistri-butionoverlengthsastheformsinLr.Theresultinglexiconshavenotier-basedorfeaturalinteractionsexceptastheyappearbychancefromtheN-grammodeltrainedontheselexica.ForeachLawethentrainourmodelsonthefirst2500formsandscoretheprobabilitiesoftheheld-outforms,thesamepro-cedureasinSection5.3.WeranthisprocedureforallthelexiconsshowninTable1.Forallbutonelexicon,wefindthattheautosegmentalmodelsdonotsignificantlyoutper-formtheN-grammodelsonartificialdata.Theex-ceptionisMandarinChinese,wheretheaveragelogprobabilityofanartificialformis−13.81undertheN-grammodeland−13.71underthefullautoseg-mentalmodel.Theresultsuggeststhattheanoma-lousbehaviorofMandarinChineseinSection5.3maybeduetooverfitting.Whenexposedtodatathatexplicitlydoesnothaveautosegmentalstructure,themodelisnotmoreaccuratethanaplainsequencemodelforalmostalllanguages.Butwhenexposedtoreallinguisticdata,themodelismoreaccurate.Thisresultprovidesev-idencethatthegenerativemodeldevelopedinSec-tion2capturestruedistributionalpropertiesoflexi-consthatareabsentinN-gramdistributions,suchasfeaturalandtier-basedinteractions.5.5ComparisonwithaConstraint-BasedModelHereweprovideacomparisonwithHayesandWilson(2008)’sPhonotacticLearner,whichout-putsaphonotacticgrammarintheformofasetofweightedconstraintsonfeatureco-occurrences.Thisgrammarisoptimizedtomatchtheconstraintviolationprofileinatraininglexicon,andsocanbe l D o w n o a d e d f r o m h t t p : / / D ich R e C T . M ich T . e d u / t a c l / l A R T ich C e - P D F / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 4 7 1 5 6 7 4 3 6 / / t l a c _ a _ 0 0 0 4 7 P D . F B j G u e S T T O N 0 9 S e P e M B e R 2 0 2 3 83 seenasaprobabilisticmodelofthatlexicon.Theauthorshavedistributedonesuchgrammar,BLICK,asa“referencepointforphonotacticprobabilityinexperimentation”(Hayes,2012).HerewecompareourmodelagainstBLICKonitsabilitytoassignprobabilitiestoforms,asinSection5.3.Ideally,wewouldsimplycomputetheprobabil-ityofformslikewedidinourearliermodelcom-parisons.BLICKreturnsscoresforeachform.However,sincetheprobabilisticmodelunderlyingBLICKisundirected,thesescoresareinfactunnor-malizedlogprobabilities,sotheycannotbecom-pareddirectlytothenormalizedprobabilitiesas-signedbytheothermodels.Furthermore,becausetheprobabilisticmodelunderlyingBLICKdoesnotpenalizeformsforlength,thenormalizingconstantoverallformsisinfactinfinite,makingstraightfor-wardcomparisonofpredictiveprobabilitiesimpos-sible.Nevertheless,wecanturnBLICKscoresintoprobabilitiesbyconditioningonfurtherconstraints,suchasthelengthkoftheform.Weenumerateallpossibleformsoflengthktocomputethenormal-izingconstantforthedistributionoverformsofthatlength.Thesameprocedurecanalsobeusedtocom-putetheprobabilitiesofeachform,conditionedonthelengthoftheformk,undertheN-gramandAu-tosegmentalmodels.TocompareourmodelsagainstBLICK,wecal-culateconditionalprobabilitiesforformsoflength2through5fromtheEnglishlexicon.7TheformsarethoseintheWOLEXcorpus;weincludethemforthisevaluationiftheyareksymbolslongintheWOLEXrepresentation.ForourN-gramandAu-tosegmentalmodels,weusethesamemodelsasinSection5.3.Theaverageprobabilitiesofformsun-derthethreemodelsareshowninTable2.Forlength3-5,theautosegmentalmodelassignsthehighestprobabilities,followedbytheN-grammodelandBLICK.Forlength2,BLICKoutperformstheDPN-grammodelbutnottheautosegmentalmodel.OurmodelassignshigherprobabilitiestoshortformsthanBLICK.Thatis,ourmodelshaveiden-tifiedmoreredundantstructureintheformsthanBLICK,allowingthemtocompressthedatamore.However,thecomparisonisimperfectinseveral7Enumeratingandscoringthe22,164,361,129possibleformsoflength6wascomputationallyimpractical.LengthBLICKngramcl.node2-6.50-6.81-5.183-9.38-8.76-7.954-14.1-11.7-11.45-18.1-14.2-13.9Table2:AveragelogposteriorpredictiveprobabilityofanEnglishformoffixedlengthunderBLICKandourmodels.EnglishN-gramEnglishFullModelcollaborationistmistrustfulaposterioriinharmoniousnesssacristyabsentmindednessmatterofcourseblamelessnessearnestmoneyphlegmaticallyTable3:MostrepresentativeformsfortheN-grammodelandforourfullmodel(“cl.node”inTable1)inEn-glish.Formsarepresentedinnativeorthography,butwerescoredbasedontheirphoneticform.ways.First,BLICKandourmodelsweretrainedondifferentdata;itispossiblethatourtrainingdataaremorerepresentativeofourtestdatathanBLICK’strainingdatawere.Second,BLICKusesadifferentunderlyingfeaturaldecompositionthanourmodels;itispossiblethatourfeaturesystemismoreac-curate.Nevertheless,theseresultsshowthatourmodelconcentratesmoreprobabilitymasson(short)formsattestedinalanguage,whereasBLICKlikelyspreadsitsprobabilitymassmoreevenlyoverthespaceofallpossible(short)strings.5.6RepresentativeFormsInordertogetasenseofthedifferencesbetweenmodels,weinvestigatewhatphonologicalformsarepreferredbydifferentkindsofmodels.Theseformsmightbeinformativeaboutthephonotacticpatternsthatourmodeliscapturingwhicharenotwell-representedinsimplermodels.Wecalculatetherep-resentativenessofaformfwithrespecttomodelm1asopposedtom2asp(F|m1)/P(F|m2)(Good,1965;TenenbaumandGriffiths,2001).Theformsthataremost“representative”ofmodelm1arenottheformsthatm1assignsthehighestprobability,butrathertheformsthatm1rankshighestrelativetom2.Tables3and4showformsfromthelexiconthataremostrepresentativeofourfullmodelandoftheN-grammodelforEnglishandTurkish.Themost l D o w n o a d e d f r o m h t t p : / / D ich R e C T . M ich T . e d u / t a c l / l A R T ich C e - P D F / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 4 7 1 5 6 7 4 3 6 / / t l a c _ a _ 0 0 0 4 7 P D . F B j G u e S T T O N 0 9 S e P e M B e R 2 0 2 3 84 TurkishN-gramTurkishFullModelüstfamilyabüyükkarapınardekstrinkızılcapınarmnemoteknialtınpınarekskavatörsarımehmetlerfoksteryekaraellilerTable4:MostrepresentativeformsforN-gramandAu-tosegmentalmodelsinTurkish.uniquelyrepresentativeformsforourfullmodelaremorphologicallycomplexformsconsistingofmanyproductive,frequentlyreusedmorphemessuchasness.Ontheotherhand,therepresentativeformsfortheN-grammodelincludeforeignformssuchasaposteriori(forEnglish)andekskavatör(forTurk-ish),whicharenotbuiltoutofpartsthatfrequentlyrepeatinthoselanguages.Therepresentativeformssuggestthatthefullmodelplacesmoreprobabilitymassonwordswhicharebuiltoutofhighlyproduc-tive,phonotacticallywell-formedparts.6DiscussionWefindthatourmodelssucceedinassigninghighprobabilitiestounseenforms,thattheydosospecifi-callyforlinguisticformsandnotrandomsequences,thattheytendtofavorformswithmanyproductiveparts,andthattheyperformcomparablytoastate-of-the-artconstraint-basedmodelinassigningprob-abilitiestoshortforms.TheimprovementforourmodelsovertheN-grambaselineisconsistentbutnotlarge.Weattributethistothewayinwhichphonologicalgeneraliza-tionsareusedinthepresentmodel:inparticular,phonologicalgeneralizationsfunctionprimarilyasaformofbackoffforasequencemodel.Ourmod-elshavelexicalmemoizationateachnodeinafea-turedependencygraph;assuch,thetopnodeinthegraphendsuprepresentingtransitionprobabil-itiesforwholephonemesconditionedonpreviousphonemes,andtherestofthefeaturedependencygraphfunctionsasabackoffdistribution.Whenamodelhasbeenexposedtomanytrainingforms,itsbehaviorwillbelargelydominatedbytheN-gram-likebehaviorofthetopnode.Infutureworkitmightbeeffectivetolearnanoptimalbackoffprocedurewhichgivesmoreinfluencetothebasedistribution(DuhandKirchhoff,2004;WoodandTeh,2009).Whilethetier-basedconditioninginourmodelwouldseemtobecapableofmodelingnonlocalinteractionssuchasvowelharmony,wehavenotfoundthatthemodelsdowellatreproducingthesenonlocalinteractions.Webelievethisisbecausethemodel’sbehaviorisdominatedbynodeshighinthefeaturedependencygraph.Inanycase,asimpleMarkovmodeldefinedovertiers,aswehavepre-sentedhere,mightnotbeenoughtofullymodelvowelharmony.Rather,amodelofphonologicalprocesses,transducingunderlyingformstosurfaceforms,seemslikeamorenaturalwaytocapturethesephenomena.Westressthatthismodelisnottiedtoaparticularfeaturedependencygraph.Infact,webelieveourmodelprovidesanovelwayoftestingdifferenthy-pothesesaboutfeaturestructures,andcouldformthebasisforlearningtheoptimalfeaturehierarchyforagivendataset.Thechoiceoffeaturedependencygraphhasalargeeffectonwhatfeaturalinteractionsthemodelcanrepresentdirectly.Forexample,nei-therfeaturedependencygraphhassharedplacefea-turesforconsonantsandvowels,sothemodelhaslimitedabilitytorepresentplace-basedrestrictionsonconsonant-vowelsequencessuchasrequirementsforlabializedorpalatalizedconsonantsinthecon-textof/u/or/i/.Theseinteractionscanbetreatedinourframeworkifvowelsandconsonantsshareplacefeatures,asinPadgett(2011).7ConclusionWehavepresentedaprobabilisticgenerativemodelforsequencesofphonemesdefinedintermsofphonologicalfeatures,basedonrepresentationalideasfromgenerativephonologyandtoolsfromBayesiannonparametricmodeling.Weconsiderourmodelasaproofofconceptthatprobabilisticstructure-buildingmodelscanincluderichfeaturalinteractions.OurmodelrobustlyoutperformsanN-grammodelonsimplemetrics,andlearnstogener-ateformsconsistingofhighlyproductiveparts.Wealsoviewthisworkasatestofthescientifichy-pothesesthatphonologicalfeaturescanbeorganizedinahierarchyandthattheyinteractalongtiers:inourmodelevaluation,wefoundthatbothconceptswerenecessarytogetanimprovementoverabase-lineN-grammodel. l D O w N O A D e D F R O M H T T P : / / D ich R e C T . M ich T . e d u / t a c l / l A R T ich C e - P D F / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 4 7 1 5 6 7 4 3 6 / / t l a c _ a _ 0 0 0 4 7 P D . F B j G u e S T T O N 0 9 S e P e M B e R 2 0 2 3 85 AcknowledgmentsWewouldliketothankTalLinzen,LeonBergen,EdwardFlemming,EdwardGibson,BobBerwick,JimGlass,andtheaudiencesatMIT’sPhonologyCircle,SIGMORPHON,andtheLSA2016AnnualMeetingforhelpfulcomments.Thisworkwassup-portedinpartbyNSFDDRIGGrant#1551543toR.F.ReferencesAdamAlbright.2009.Feature-basedgeneralizationasasourceofgradientacceptability.Phonology,26:9–41.RobertC.Berwick.1985.Theacquisitionofsyntacticknowledge.MITPress,Cambridge,MA.JeffA.BilmesandKatrinKirchhoff.2003.Fac-toredlanguagemodelsandgeneralizedparallelback-off.InProceedingsofthe2003ConferenceoftheNorthAmericanChapteroftheAssociationforCom-putationalLinguisticsonHumanLanguageTechnol-ogy:CompanionVolumeoftheProceedingsofHLT-NAACL2003–ShortPapers–Volume2,pages4–6.As-sociationforComputationalLinguistics.GeertBooij.2011.Morphemestructureconstraints.InTheBlackwellCompaniontoPhonology.Blackwell.NoamChomskyandMorrisHalle.1965.Somecontro-versialquestionsinphonologicaltheory.JournalofLinguistics,1:97–138.NoamChomskyandMorrisHalle.1968.TheSoundPatternofEnglish.Harper&Row,NewYork,NY.GeorgeN.ClementsandElizabethV.Hume.1995.Theinternalorganizationofspeechsounds.InTheHand-bookofPhonologicalTheory,pages24–306.Black-well,Oxford.GeorgeN.Clements.1985.Thegeometryofphonologi-calfeatures.PhonologyYearbook,2:225–252.JohnColemanandJanetB.Pierrehumbert.1997.Stochasticphonologicalgrammarsandacceptability.InJohnColeman,editor,Proceedingsofthe3rdMeet-ingoftheACLSpecialInterestGroupinComputa-tionalPhonology,pages49–56,Somserset,NJ.Asso-ciationforComputationalLinguistics.RobertDaland,BruceHayes,JamesWhite,MarcGarellek,AndreasDavis,andIngridNormann.2011.Explainingsonorityprojectioneffects.Phonology,28:197–234.CarldeMarcken.1996.Linguisticstructureascom-positionandperturbation.InProceedingsofthe34thannualmeetingonAssociationforComputationalLin-guistics,pages335–341.AssociationforComputa-tionalLinguistics.B.ElanDresher.2009.TheContrastiveHierarchyinPhonology.CambridgeUniversityPress.KevinDuhandKatrinKirchhoff.2004.Automaticlearningoflanguagemodelstructure.InProceedingsofCOLING2004,Geneva,Switzerland.ThomasS.Ferguson.1973.Abayesiananalysisofsomenonparametricproblems.AnnalsofStatistics,1(2):209–230.JohnGoldsmithandJasonRiggle.2012.Informationtheoreticapproachestophonology:thecaseofFinnishvowelharmony.Naturallangaugeandlinguisticthe-ory,30(3):859–96.SharonGoldwater,ThomasL.Griffiths,andMarkJohn-son.2006.Interpolatingbetweentypesandtokensbyestimatingpower-lawgenerators.InAdvancesinNeu-ralInformationProcessingSystems,volume18,pages459–466,Cambridge,MA.MITPress.SharonGoldwater.2006.NonparametricBayesianMod-elsofLexicalAcquisition.Ph.D.thesis,BrownUni-versity.IrvingJohnGood.1965.TheEstimationofProbabili-ties.MITPress,Cambridge,MA.NoahD.Goodman,VikashK.Mansinghka,DanielRoy,KeithBonawitz,andJoshuaB.Tenenbaum.2008.Church:Alanguageforgenerativemodels.InUn-certaintyinArtificialIntelligence,Helsinki,Finland.AUAIPress.PeterGraff.2012.CommunicativeEfficiencyintheLexi-con.Ph.D.thesis,MassachusettsInstituteofTechnol-ogy.MorrisHalle.1959.TheSoundPatternofRussian:Alinguisticandacousticalinvestigation.Mouton,TheHague,TheNetherlands.MorrisHalle.1995.Featuregeometryandfeaturespreading.LinguisticInquiry,26(1):1–46.BruceHayesandColinWilson.2008.Amaximumen-tropymodelofphonotacticsandphonotacticlearning.LinguisticInquiry,39(3):379–440.BruceHayes.2012.Blick-aphonotacticprobabilitycalculator.JeffreyHeinz,ChetanRawal,andHerbertG.Tanner.2011.Tier-basedstrictlylocalconstraintsforphonol-ogy.InThe49thAnnualMeetingoftheAssociationforComputationalLinguistics.RomanJakobson,C.GunnarM.Fant,andMorrisHalle.1952.PreliminariestoSpeechAnalysis:TheDistinc-tiveFeaturesandtheirCorrelates.TheMITPress,Cambridge,MassachusettsandLondon,England.AdamJardineandJeffreyHeinz.2015.Aconcatenationoperationtoderiveautosegmentalgraphs.InProceed-ingsofthe14thMeetingontheMathematicsofLan-guage(MoL2015),pages139–151,Chicago,USA,Juli. l D O w N O A D e D F R O M H T T P : / / D ich R e C T . M ich T . e d u / t a c l / l A R T ich C e - P D F / d o i / . 1 0 1 1 6 2 / t l a c _ a _ 0 0 0 4 7 1 5 6 7 4 3 6 / / t l a c _ a _ 0 0 0 4 7 P D . F B j G u e S T T O N 0 9 S e P e M B e R 2 0 2 3 86 YaJinandStuartGeman.2006.Contextandhierar-chyinaprobabilisticimagemodel.InProceedingsofthe2006IEEEComputerSocietyConferenceonComputerVisionandPatternRecognition(CVRP’06),pages2145–2152.MarkJohnsonandSharonGoldwater.2009.Improvingnonparametericbayesianinference:experimentsonunsupervisedwordsegmentationwithadaptorgram-mars.InNAACL-HLT2009,pages317–325.Associa-tionforComputationalLinguistics.MarkJohnson,ThomasLGriffiths,andSharonGoldwa-ter.2007.Adaptorgrammars:Aframeworkforspeci-fyingcompositionalnonparametricBayesianmodels.InAdvancesinNeuralInformationProcessingSys-tems,pages641–648.Mary-LouiseKean.1975.TheTheoryofMarkednessinGenerativeGrammar.Ph.D.thesis,MassachusettsInstituteofTechnology.DavidJ.C.MacKayandLindaC.BaumanPeto.1994.AhierarchicalDirichletlanguagemodel.NaturalLan-guageEngineering,1:1–19.JohnMcCarthy.1988.Featuregeometryanddepen-dency:Areview.Phonetica,43:84–108.DonaldMichie.1968.“memo”functionsandmachinelearning.Nature,218:19–22.TimothyJ.O’Donnell.2015.ProductivityandReuseinLanguage:ATheoryofLinguisticComputationandStorage.TheMITPress,Cambridge,MassachusettsandLondon,England.JayePadgett.2011.Consonant-vowelplacefeaturein-teractions.InTheBlackwellCompaniontoPhonol-ogy,pages1761–1786.BlackwellPublishing,Malden,MA.EzerRasinandRoniKatzir.2014.Alearnabilityargu-mentforconstraintsonunderlyingrepresentations.InProceedingsofthe45thAnnualMeetingoftheNorthEastLinguisticSociety(NELS45),Cambridge,Mas-sachusetts.JayaramSethuraman.1994.AconstructivedefinitionofDirichletpriors.StatisticaSinica,pages639–650.RichardStanley.1967.Redundancyrulesinphonology.Language,43(2):393–436.YeeWhyeTeh.2006.ABayesianinterpretationofin-terpolatedKneser-Ney.TechnicalReportTRA2/06,SchoolofComputing,NationalUniversityofSinga-pore.JoshuaBTenenbaumandThomasLGriffiths.2001.Therationalbasisofrepresentativeness.InProceedingsofthe23rdAnnualConferenceoftheCognitiveScienceSociety,pages1036–1041.NikolaiS.Trubetzkoy.1939.GrundzügederPhonolo-gie.Number7inTravauxduCercleLinguistiquedePrague.Vandenhoeck&Ruprecht,Göttingen.FrankWoodandYeeWhyeTeh.2009.Ahierarchi-calnonparametricBayesianapproachtostatisticallan-guagemodeldomainadaptation.InArtificialIntelli-genceandStatistics,pages607–614.FrankWood,CédricArchambeau,JanGasthaus,JamesLancelot,andYeeWhyeTeh.2009.Astochasticmemoizerforsequencedata.InProceedingsofthe26thInternationalConferenceonMachineLearning,Montreal,Kanada.Transactions of the Association for Computational Linguistics, Bd. 5, S. 73–86, 2017. Action Editor: Eric Fosler-Lussier. Bild

PDF Herunterladen