Transactions of the Association for Computational Linguistics, 1 (2013) 75–88. Action Editor: Sharon Goldwater.
Submitted 11/2012; Überarbeitet 1/2013; Published 3/2013. C
(cid:13)
2013 Verein für Computerlinguistik.
AnHDPModelforInducingCombinatoryCategorialGrammarsYonatanBiskandJuliaHockenmaierDepartmentofComputerScienceTheUniversityofIllinoisatUrbana-Champaign201NGoodwinAveUrbana,IL61801{bisk1,juliahmr}@illinois.eduAbstractWeintroduceanovelnonparametricBayesianmodelfortheinductionofCombinatoryCat-egorialGrammarsfromPOS-taggedtext.Itachievesstateoftheartperformanceonanumberoflanguages,andinduceslinguisti-callyplausiblelexicons.1IntroductionWhatgrammaticalrepresentationisappropriateforunsupervisedgrammarinduction?Initialattemptswithcontext-freegrammars(CFGs)werenotverysuccessful(CarrollandCharniak,1992;Charniak,1993).OnereasonmaybethatCFGsrequirethespecificationofafiniteinventoryofnonterminalcat-egoriesandrewriterules,butunlessoneadoptslin-guisticprinciplessuchasX-bartheory(Jackendoff,1977),thesenonterminalsareessentiallyarbitrarylabelsthatcanbecombinedinarbitraryways.WhilefurtherCFG-basedapproacheshavebeenproposed(Clark,2001;KuriharaandSato,2004),mostre-centworkhasfollowedKleinandManning(2004)indevelopingmodelsfortheinductionofprojec-tivedependencygrammars.Ithasbeenshownthatmoresophisticatedprobabilitymodels(HeaddenIIIetal.,2009;Gillenwateretal.,2011;CohenandSmith,2010)andlearningregimes(Spitkovskyetal.,2010),aswellastheincorporationofpriorlin-guisticknowledge(CohenandSmith,2009;Berg-KirkpatrickandKlein,2010;Naseemetal.,2010)canleadtosignificantimprovementoverKleinandManning’sbaselinemodel.Theuseofdependencygrammarscircumventsthequestionofhowtoobtainanappropriateinventoryofcategories,sincedepen-dencyparsesaresimplydefinedbyunlabelededgesbetweenthelexicalitemsinthesentence.Butde-pendencygrammarsmakeitalsodifficulttocap-turenon-localstructures,andBlunsomandCohn(2010)showthatitmaybeadvantageoustorefor-mulatetheunderlyingdependencygrammarintermsofatree-substitutiongrammar(TSG)whichpairswordswithtreeletsthatspecifythenumberofleftandrightdependentstheyhave.Inthispaper,weexploreyetanotheroption:insteadofdependencygrammars,weuseCombinatoryCategorialGram-mar(CCG,Steedman(1996;2000)),alinguisticallyexpressiveformalismthatpairslexicalitemswithrichcategoriesthatcapturealllanguage-specificin-formation.Thismayseemapuzzlingchoice,sinceCCGrequiresasignificantlylargerinventoryofcat-egoriesthaniscommonlyassumedforCFGs.How-ever,unlikeCFGnonterminals,CCGcategoriesarenotarbitrarysymbols:theyencode,andaredeter-minedby,thebasicwordorderofthelanguageandthenumberofargumentseachwordtakes.CCGisverysimilartoTSGinthatitalsopairslexicalitemswithrichitemsthatcapturealllanguage-specificin-formation.LikeTSGandprojectivedependencygrammars,werestrictourselvestoaweaklycontext-freefragmentofCCG.ButwhileTSGdoesnotdis-tinguishbetweenargumentandmodifierdependen-cies,CCGmakesanexplicitdistinctionbetweenthetwo.AndwhiletheelementarytreesofBlunsomandCohn(2010)’sTSGandtheirinternalnodella-belshavenoobviouslinguisticinterpretation,thesyntacticbehaviorofanyCCGconstituentcanbedirectlyinferredfromitscategory.Toseewhether
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
76
thealgorithmhasidentifiedthebasicsyntacticprop-ertiesofthelanguage,itishencesufficienttoin-specttheinducedlexicon.Conversely,BoonkwanandSteedman(2011)showthatknowledgeofthesebasicsyntacticpropertiesmakesitveryeasytocre-atealanguage-specificlexiconforaccurateunsu-pervisedCCGparsing.WehaverecentlyproposedanalgorithmforinducingCCGs(BiskandHocken-maier,2012B)thathasbeenshowntobecompetitivewithotherapproachesevenwhenpairedwithaverysimpleprobabilitymodel(Gellingetal.,2012).Inthispaper,wepairthisinductionalgorithmwithanovelnonparametricBayesianmodelthatisbasedonadifferentfactorizationofCCGderivations,andshowthatitoutperformsouroriginalmodelandmanyotherapproachesonalargenumberoflan-guages.OurresultsindicatethattheuseofCCGyieldsgrammarsthataresignificantlymorerobustwhendealingwithlongersentencesthanmostde-pendencygrammar-basedapproaches.2CombinatoryCategorialGrammarCombinatoryCategorialGrammar(Steedman,2000)isalinguisticallyexpressive,lexicalizedgrammarformalismthatassociatesrichsyntactictypeswithwordsandconstituents.Forsimplicity,werestrictourselvestothestandardtwoatomictypesS(Sätze)andN(encompassingbothnounsandnounphrases)fromwhichwerecursivelybuildcategories.ComplexcategoriesareoftheformX/YorX\Y,andrepresentfunctionswhichreturnaresultoftypeXwhencombinedwithanargumentoftypeY.Thedirectionalityoftheslashindicateswhethertheargumentprecedesorfollowsthefunctor.WewriteX|Ywhenthedirectionoftheslashdoesnotmatter.TheCCGlexiconencodesalllanguage-specificinformation.Itpairseverywordwithasetofcate-goriesthatdefinebothitsspecificsyntacticbehavioraswellastheoverallwordorderofthelanguage:N:{Er,girl,lunch,…}N/N:{good,Die,eating,…}S\N:{sleeps,aß,eating,…}(S\N)/N:{sees,aß,…}S\S:{quickly,Heute…}(S\N)/(S\N):{good,Die,…}Todrawasimplecontrast,inSpanishwewouldexpectadjectivestotakethecategoryN\NbecauseSpanishwordorderingdictatesthattheadjectivefol-lowthenoun.Thelexicalcategoriesalsocaptureword-worddependencies:head-argumentrelationsarecapturedbythelexicalcategoryofthehead(z.B.(S\N)/N),whereashead-modifierrelationsarecap-turedbythelexicalcategoryofthemodifier,whichisoftheformX\XorX/X,andmaytakefurtherargumentsofitsown.Ourgoalwillbetoautomati-callylearnthesetypesoflexiconsforalanguage.InFigure3,wejuxtaposeseveralsuchlexiconswhichwereautomaticallydiscoveredbyoursystem.TherulesofCCGaredefinedbyasmallsetofofcombinatoryrules,whicharetraditionallywrit-tenasschemasthatdefinehowconstituentscanbecombinedinabottom-upfashion(althoughgenera-tiveprobabilitymodelsforCCGviewtheminatop-downmanner,akintoCFGrules).Thefirst,andmostobvious,oftheserulesisfunctionapplication:X/YY⇒X(B0>)YX\Y⇒X(B0<)HerethefunctorX/YorX\YisappliedtoanargumentYresultinginX.WhilestandardCCGhasanumberofadditionalcombinatoryrules(type-raising,generalizedvariantsofcompositionandsubstitution)thatincreaseitsgenerativecapacitybe-yondcontext-freegrammarsandallowaneleganttreatmentofnon-localdependenciesarisinginex-traction,coordinationandscrambling,wefollowBiskandHockenmaier(2012b)andusearestrictedfragment,withouttype-raising,thatallowsonlyba-siccompositionandiscontext-free:X/YY/Z⇒X/Z(B1>)X/YY\Z⇒X\Z(B1X>)Y\ZX\Y⇒X\Z(B1<)Y/ZX\Y⇒X/Z(B1X<)Thesuperscript1denotesthearityofthecompo-sitionwhichistoolowtorecovernon-projectivede-pendencies,andourgrammaristhusweaklyequiva-lenttothedependencygrammarrepresentationsthatarecommonlyusedforgrammarinduction.Themainroleofcompositioninourfragmentisthatitallowssententialandverbmodifierstobothtakecat-egoriesoftheformS\SandS/S.Compositionin-
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
t
l
a
c
_
a
_
0
0
2
1
1
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
77
troducesspuriousambiguities,whichweeliminatebyusingEisner(1996)’snormalform.1Coordinatingconjunctionshaveaspecialcate-goryconj,andwebinarizecoordinationasfollows(HockenmaierandSteedman,2007):XX[conj]⇒&1X(&1)conjX⇒&2X[conj](&2)3CategoryinductionUnlikedependencygrammars,CCGrequiresanin-ventoryoflexicalcategories.Givenasetoflexicalcategories,thecombinatoryrulesdefinethesetofparsesforeachsentence.WefollowthealgorithmproposedbyBiskandHockenmaier(2012b)toau-tomaticallyinducethesecategories.Thelexiconisinitializedbypairingallnominaltags(nouns,pro-nounsanddeterminers)withthecategoryN,allverbtagswiththecategoryS,andcoordinatingconjunc-tionswiththecategoryconj:CONJ→conjDET,NOUN,NUM,PRON→NVERB→SAlthoughourlexiconsaredefinedovercorpus-specificPOStags,weuseaslightlymodifiedversionofPetrovetal.(2012)’sUniversalPOStagsettocat-egorizethemintothesebroadclasses.Theprimarychangeswemaketotheirmappingsaretheadditionofadistinction(wherepossible)betweensubordi-natingandcoordinatingconjunctionsandbetweenmainandauxiliaryverbs2.Sincetheinitiallexiconconsistsonlyofatomiccategories,itcannotparseanycomplexsentences:ThemanatequicklyDTNNSVBDRB-NS-Complexlexicalcategoriesareinducedbycon-sideringthelocalcontextinwhichtokensappear.Givenaninputsentence,andacurrentlexiconwhichassignscategoriestoatleastsomeofthetokensinthesentence,weapplythefollowingtworulestoaddnewcategoriestothelexicon:TheargumentruleallowsanylexicaltokensthathavecategoriesotherthanNandconjtotakeimmediatelyadjacent1Thenormal-formofHockenmaierandBisk(2010)isnotrequiredforthisfragmentofCCG.2Thisdistinctionwassuggestedbytheauthors(p.c.)Nsasarguments.Themodifierruleallowsanyto-ken(otherthancoordinatingconjunctionsthatap-pearinthemiddleofsentences)tomodifyanimme-diateneighborthathasthecategorySorNorisamodifier(S|SorN|N)itself.ThemanatequicklyDTNNSVBDRBN/NN,S/SS,N\NS\SS\NTheserulescanbeappliediterativelytoformmorecomplexcategories.Werestrictlexicalcate-goriestoamaximalarityof2,anddisallowthecat-egory(S/N)\N,sinceitisequivalentto(S\N)/N.ThemanatequicklyDTNNSVBDRBN/N,N,S/SS,N\N,S\S,(S/S)/(S/S)(N\N)/(N\N)S\N(N\N)\(N\N)(N/N)\(N/N)(S/S)\(S/S)(S\S)/(S\S)Theresultant,overlygeneral,lexiconisthenusedtoparsethetrainingdata.EachcompleteparsehastobeofcategorySorN,withtheconstraintthatsentencesthatcontainamainverbcanonlyformparsesofcategoryS.4AnewprobabilitymodelforCCGGenerativemodelsdefinetheprobabilityofaparsetreeτastheproductofindividualruleprobabili-ties.Ourpreviouswork(BiskandHockenmaier,2012b)usesthemostbasicmodelofHockenmaierandSteedman(2002),whichfirstgeneratestheheaddirection(left,right,unary,orlexical),followedbytheheadcategory,andfinallythesistercategory.3ThisfactorizationdoesnottakeadvantageoftheuniquefunctionalnatureofCCG.Wethereforein-troduceanewfactorizationwecalltheArgumentModel.ItexploitsthefactthatCCGimposesstrongconstraintsonacategory’sleftandrightchildren,sincethesemustcombinetocreatetheparenttypeviaoneofthecombinators.InpracticethismeansthatgiventheparentX/Z,thechoiceofcombinator4candanargumentYwecanuniquelydeterminethecategoriesoftheleftandrightchildren:3Huangetal.(2012)presenta(deficient)variantandBayesianextensionoftheBiskandHockenmaier(2012b)modelwithoutk-bestsmoothingthatbothunderperformourpublishedresults.4IfXisanatomiccategory,onlyapplicationispossible.
l
D
o
w
n
o
a
d
e
d
f
r
o
m
h
t
t
p
:
/
/
d
i
r
e
c
t
.
m
i
t
.
e
d
u
/
t
a
c
l
/
l
a
r
t
i
c
e
-
p
d
f
/
d
o
i
/
.
1
0
1
1
6
2
/
t
l
a
c
_
a
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
t
l
a
c
_
a
_
0
0
2
1
1
p
d
.
f
b
y
g
u
e
s
t
t
o
n
0
8
S
e
p
e
m
b
e
r
2
0
2
3
78
Parentc⇒LeftRightX/ZB0>(X/Z)/YYB0
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
79
HDP-CCG1)DrawglobalparametersDefineMLErootparameterθTOPDrawtop-levelsymbolweightsβY∼GEM(αY)Drawtop-levellexicalweightsβL∼GEM(αL)Foreachgrammarsymbolz∈{1,2,…}:DefineMLEruletypeparametersθTzDrawargumentparametersφYz∼DP(αY,βY)DrawlexicalemissionparametersφLz∼DP(αL,βL)Foreachgrammarsymboly∈{1,2,…}:DefineMLEcombinatorparametersθCz,y2)Foreachparsetree:GeneraterootnodezTOP∼Binomial(θTOP)Foreachnodeiintheparsetree:Chooseruletypeti∼Multinomial(θTzi)Ifti==Lex:Emitterminalsymbolxi∼Multinomial(φLzi)Ifti==Left/Right/Unary:Generateargumentcategoryyi∼Multinomial(φYzi)Generatecombinatorci∼Multinomial(θCzi,yi)DeterministicallycreatezL(ich)(andzR(ich)ifbinary)ziyicizL(ich)zR(ich)xL(ich)xR(ich)z∞∞yφYθTθCφLβYβLBecauseweareworkingwithCCG,theparentzi,argumentyiandcombinatorciuniquelydefinethetwochildrencategories(zL(ich),zR(ich)).Thedashedarrowshererep-resentthedeterministicprocessusedtogeneratethesetwocategories.Figure1:TheHDP-CCGhastwobasedistributions,oneoverthespaceofcategoriesandtheotheroverwords(ortags).Foreverygrammarsymbol,anargumentdistributionandemissiondistributionisdrawnfromthecorrespondingDirichletProcesses.Inaddition,thereareseveralMLEdistributionstiedtoagivensymbolforgeneratingruletypes,combinatorsandlexicaltokens.categorieswhosestickweights(βY)correspondtothefrequencyofthecategoryinthecorpus.Nextwebuildthehierarchicalcomponentofourmodelbychoosinganargumentdistribution(φY),againoverthespaceofcategories,foreveryparentX/Z.ThisargumentdistributionisdrawnfromthepreviouslydefinedbaseDP,allowingforanimportantlevelofparametertyingacrossallargumentdistributions.WhilethebaseDPdoesdefinethemeanaroundwhichallargumentdistributionsaredrawn,wealsorequireanotionofvarianceorprecisionwhichde-termineshowsimilarindividualdrawswillbe.Thisprecisionisdeterminedbythemagnitudeofthehy-perparameterαY.ThishierarchyisparalleledforlexicalproductionswhicharedrawnfromaunigrambaseDPoverterminalsymbolscontrolledbyαL.ForsimplicityweusethesameschemeforsettingthevaluesforαLasαY.Wepresentexperimen-talresultsinwhichwevarythevalueofαYasafunctionofthenumberofoutcomesallowedbythegrammarforargumentcategoriesorthecorpusinthecaseofterminalsymbols.Specifically,wesetαY=npforconditioningcontextswithnout-comes.SinceLiangetal.(2009)foundthattheidealvalueforalphaappearstobesuperlinearbutsub-quadraticinn,wepresentresultswhereptakesthevalues0,1.0,1.5,and2.0toexploretherangefromuniformtoquadratic.Thissettingforαistheonlyfreeparameterinthemodel.Bycontrollingpreci-sionwecantellthemodeltowhatextentglobalcor-pusstatisticsshouldbetrusted.WebelievethishasasimilareffecttoBiskandHockenmaier(2012B)’stop-kupweightingandsmoothingscheme.Oneadvantageoftheargumentmodelisthatitonlyrequiresasingledistributionovercategoriesforeachbinarytree.IncontrasttosimilarproposalsforCFGs(Liangetal.,2007),whichimposenoformalrestrictionsonthenonterminalsX,Y,Zthatcanap-pearinarewriteruleX→YZ,thisgreatlysim-plifiesthemodelingproblem(yieldingeffectivelyamodelthatismoreakintononparametricHMMs),sinceitavoidstheneedtocapturecorrelationsbe-
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
80
tweendifferentbasedistributionsforYandZ.VariationalInferenceHDPsneedtobeestimatedwithapproximatetechniques.AsanalternativetoGibbssampling(Tehetal.,2006),whichisexact,buttypicallyveryslowandhasnoclearconvergencecriteria,variationalinferencealgorithms(Bishop,2006;BleiandJordan,2004)estimatetheparame-tersofatruncatedmodeltomaximizealowerboundofthelikelihoodoftheactualmodel.ThisallowsforfactorizationofthemodelandatrainingprocedureanalogoustotheInside-Outsidealgorithm(LariandYoung,1991),allowingtrainingtorunveryquicklyandinatriviallyparallelizablemanner.ToinitializethebaseDP’sstickweights,wefol-lowtheexampleofKuriharaetal.(2007)anduseanMLEmodelinitializedwithuniformdistributionstocomputeglobalcountsforthecategoriesinourgrammar.Whennormalizedtheseprovideabetterinitializationthanauniformsetofweights.Updatestothedistributionsarethenperformedinacoordi-natedescentmannerwhichincludesre-estimationofthebaseDPs.Invariationalinference,multinomialweightsWtaketheplaceofprobabilities.TheweightsforanoutcomeYwithconditioningvariableParecom-putedbysummingpseudocountswithascaledmeanvectorfromthebaseDP.ThecomputationinvolvesmovinginthedirectionofthegradientoftheDirich-letdistributionwhichresultsinthefollowingdiffer-enceofDigammas(Ψ):WP(Y)=Ψ(C(P,Y)+αPβY)−Ψ(C(P,∗)+αP)Wichtig,theDigammaandmultinomialweightscomprisearigh-get-richerscheme,biasingthemodelagainstrareoutcomes.Inaddition,sincevariationalinferenceisdonebycoordinatedescent,itistriviallyparallelizeable.Inpractice,trainingandtestingourmodelsonthecorporacontainingsen-tencesuptolength15usedinthispapertakesbe-tweenoneminutetoatmostthreehoursonasingle12-coremachinedependingontheirsize.6EvaluationAsisstandardforthistask,weevaluateoursystemsagainstanumberofdifferentdependencytreebanks,andmeasureperformanceintermsoftheaccuracyofdirecteddependencies(i.e.thepercentageofwordsinthetestcorpusthatarecorrectlyattached).WeusethedatafromthePASCALchallengeforgrammarinduction(Gellingetal.,2012),thedatafromtheCoNLL-Xsharedtask(BuchholzandMarsi,2006)andGoldberg(2011)’sHebrewcorpus.ConvertingCCGderivationsintodependenciesismostlystraightforward,sincetheCCGderivationidentifiestherootwordofeachsentence,andhead-argumentandhead-modifierdependenciesareeasilyreadoffofCCGderivations,sincethelexiconde-finesthemexplicitly.Unlikedependencygrammar,CCGisdesignedtorecovernon-localdependenciesthatariseincontrolandbindingconstructionsaswellasinwh-extractionandnon-standardcoordi-nation,butsincethisrequiresre-entrancies,orco-indexationofarguments(HockenmaierandSteed-man,2007),withinthelexicalcategoriesthattriggertheseconstructions,ourcurrentsystemreturnsonlylocaldependencies.Butsincedependencygram-marsalsocapturesonlylocaldependencies,thishasnonegativeinfluenceonourcurrentevaluation.However,adirectcomparisonbetweendepen-dencytreebanksanddependenciesproducedbyCCGismoredifficult(ClarkandCurran,2007),sincedependencygrammarsallowconsiderablefreedominhowtoanalyzespecificconstructionssuchasverbclusters(whichverbisthehead?)prepositionalphrasesandparticles(istheheadthenounorthepreposition/particle?),subordinatingconjunctions(istheconjunctionadependentoftheheadofthemainclauseandtheheadoftheembed-dedclauseadependentoftheconjunction,orviceversa?)andthisisreflectedinthefactthatthetree-banksweconsideroftenapplydifferentconventionsforthesecases.Althoughremedyingthisissueisbeyondthescopeofthiswork,thesediscrepanciesverymuchhintattheneedforabettermechanismtoevaluatelinguisticallyequivalentstructuresortree-bankstandardization.Themostproblematicconstructioniscoordina-tion.InstandardCCG-to-dependencyschemes,bothconjunctsareindependent,andtheconjunctionitselfisnotattachedtothedependencygraph,whereasde-pendencygrammarshavetostipulatethateitheroneoftheconjunctsortheconjunctionitselfisthehead,withmultiplepossibilitiesofwheretheremaining
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
81
constituentsattach.InadditiontothestandardCCGscheme,wehaveidentifiedfivemainstylesofcon-junctioninourdata(Figure2),althoughseveralcor-poradistinguishmultipletypesofcoordinatingcon-junctionswhichusedifferentstyles(notallshownhere).Sinceoursystemhasexplicitrulesforcoordi-nation,wetransformitsoutputintothedesiredtargetrepresentationthatisspecifictoeachlanguage.7ExperimentsWeevaluateoursystemon13differentlanguages.Ineachcase,wefollowthetestandtrainingregimesthatwereusedtoobtainpreviouslypublishedresultsinordertoallowadirectcomparison.Wecom-pareoursystemtotheresultspresentedatthePAS-CALChallengeonGrammarInduction(Gellingetal.,2012)6,aswellastoGillenwateretal.(2011)andNaseemetal.(2012).WeuseNivre(2006)’sPenn2MaltimplementationofCollins(2003)’sheadrulestotranslatetheWSJPennTreebank(Marcusetal.,1993)intodependencies.Finally,whentrain-ingtheMLEversionofourmodelweuseasimplesmoothingschemewhichdefinesasmallruleproba-bility(e−15)topreventanyruleusedduringtrainingfromgoingtozero.7.1PASCALChallengeonGrammarInductionInTable1,wecomparetheperformanceoftheba-sicArgumentmodel(MLE),ofourHDPmodelwithfourdifferentsettingsofthehyperparameters(asex-plainedabove)andofthesystemspresentedinthePASCALChallengeonGrammarInduction(Gellingetal.,2012).Thesystemsinthiscompetitionwereinstructedtotrainoverthefulldataset,includingtheunlabelledtestdata,andincludeBiskandHocken-maier(2012A)’sCCG-basedsystem(BH)toCohnetal.(2010)’sreimplementationofKleinandManning(2004)’sDMVmodelinatree-substitutiongram-marframework(BC),aswellasthreeotherde-pendencybasedsystemswhicheitherincorporateNaseemetal.(2010)’srulesinadeterministicfash-ion(Søgaard,2012),relyonextensivetuningon6Numbersarefrompersonalcorrespondencewiththeorga-nizers.Thepreviouslypublishednumbersarenotcomparabletoliteratureduetoanerrorintheevaluation.http://wiki.cs.ox.ac.uk/InducingLinguisticStructure/ResultsDepComparablethedevelopmentset(Tu,2012)orincorporatemil-lionsofadditionaltokensfromWikipediatoesti-matemodelparameters(MarecekandZabokrtsky,2012).Weignorepunctuationforallexperimentsreportedinthispaper,butsincethetrainingdata(butnottheevaluation)includespunctuationmarks,participantswerefreetochoosewhethertoincludepunctuationorignoreit.WhileBHistheonlyothersystemwithdirectlyinterpretablelinguisticoutput,wealsoincludeadi-rectcomparisonwithBC,whoseTSGrepresenta-tionisequallyexpressivetoours.Finallywepresentarowwiththemaximumperformanceamongtheotherthreemodels.Aswehavenoknowledgeofhowmuchdatawasusedinthetrainingofothersys-temswesimplypresentresultsforsystemstrainedonlength15(notincludingpunctuation)sentencesandthenevaluatedatlengths10and15.TheMLEversionofourmodelshowsrathervari-ableperformance:althoughitsresultsareparticu-larlybadonBasque(Eu),itoutperformsbothBHandBConsomeothersettings.Bycontrast,theHDPsystemisalwaysbetterthantheMLEmodel.Itoutperformsallothersystemsonhalfofthecor-pora.Onaverage,itoutperformsBHandBCby10.3%and9.3%onlength10,or9.7%and7.8%onlength15respectively.ThemainreasonwhyoursystemdoesnotoutperformBCbyanevenhighermarginistheveryobvious11.4%/11.5%deficitonSlovene.However,theSlovenedependencytree-bankseemstofollowasubstantiallydifferentanno-tationscheme.Inparticular,thegoldstandardan-notationofthe1,000sentencesintheSlovenede-velopmentsettreatsmanyofthemasconsistingofindependentsentences(oftenseparatedbypunctua-tionmarksthatoursystemhasnoaccessto),sothattheaveragenumberofrootspersentenceis2.7:>>“verjetibelievetiI,,hh”jeismehkosoftreklasaidWhenoursystemispresentedwiththeseshortcomponentsinisolation,itoftentimesanalyzesthemcorrectly,butsinceithastoreturnatreewithasin-gleroot,itsperformancedegradessubstantially.WebelievetheHDPperformssowellascom-paredtotheMLEmodelbecauseoftheinfluenceofthesharedbasedistribution,whichallowsthe
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
82
Ar,Eu,Cs,Nl,WSJ,Ch,HeDa,HeEs,Bg,Von,PtSv,SlJanounconjnounnounconjnounnounconjnounnounconjnounnounconjnounFigure2:Inthetreebanksusedforevaluationdifferentstandardsexistforannotatingcoordination.Whilenotexhaustive,thistabledemonstratesfiveofthemostcommonschemesusedintheliterature.SyntacticallytheseareidenticalandtraditionallyCCGdrawsarcsonlytotheargumentswithoutattachingtheconjunction.Forthepurposesofcomparisonwiththeliteraturewehaveimplementedthesefivetranslationschemes.ArabicDanishSloveneSwedishDutchBasquePortugueseWSJChildesCzech#Tokens5,47025,34154,03261,87778,73781,345158,648163,727290,604436,126#Tags202436303041423366962PASCALBC60.8/58.444.7/39.462.6/57.963.2/56.651.8/52.053.0/48.952.4/50.268.6/63.347.4/46.147.9/43.1Max67.2/66.860.1/56.065.6/61.872.8/63.451.1/47.653.7/47.867.0/61.871.2/64.856.0/54.558.3/54.4BH41.6/43.746.4/43.849.6/43.963.7/57.049.7/43.645.1/39.670.8/67.268.2/59.661.4/59.845.0/38.9ThisworkMLE41.6/42.943.4/39.246.1/41.170.1/59.752.2/47.229.6/26.562.2/59.759.5/52.453.3/51.950.5/45.8HDP0.048.0/50.063.9/58.544.8/39.867.6/62.145.0/33.941.6/39.171.0/66.059.8/52.956.3/55.254.0/49.0HDP1.045.6/47.145.7/42.353.9/46.974.5/66.958.5/54.450.1/44.665.1/60.664.3/56.571.5/70.355.8/50.7HDP1.549.6/50.458.7/54.453.2/48.274.3/67.157.4/54.550.6/45.070.0/64.765.5/57.269.6/68.655.6/50.3HDP2.066.4/65.156.5/49.554.2/46.471.6/64.151.7/48.349.4/43.376.3/70.570.7/62.974.1/73.354.4/48.5+/−-0.8/-1.7+3.8/+2.5-11.4/-15.4+1.7/+3.5+6.7/+2.4-3.1/-3.9+5.5/+3.3-0.5/-1.9+12.7/+13.5-2.5/-3.7Table1:AcomparisonofthebasicArgumentmodel(MLE)andfourhyper-parametersettingsoftheHDP-CCGagainsttwosyntacticformalismsthatparticipatedinthePASCALChallenge(Gellingetal.,2012),BH(BiskandHockenmaier,2012A)andBC(BlunsomandCohn,2010),inadditiontoamaxoverallotherparticipants.Wetrainedonlength15data(punctuationremoved),includingthetestdataasrecommendedbytheorganizers.Thelastrowindicatesthedifferencebetweenourbestsystemandthecompetition.globalcategorydistributiontoinfluenceeachofthemorespecificdistributions.Further,itprovidesaverysimpleknobinthechoiceofhyperparame-ters,whichhasasubstantialeffectonperformance.Asideeffectofthehyperparametersisthattheirstrengthalsodeterminestherateofconvergence.Thismaybeoneofthereasonsforthehighvari-anceseeninthefoursettingstested,althoughwenotethatsinceourinitializationisalwaysuniform,andnotrandom,consecutiverunsdonotintroducevarianceinthemodel’sperformance.7.2ComparisonwithsystemsthatcapturelinguisticconstraintsSinceourinductionalgorithmisbasedontheknowl-edgeofwhichPOStagsarenounsandverbs,wecompareinTable2oursystemtoNaseemetal.(2010),whopresentanonparametricdependencymodelthatincorporatesthirteenuniversallinguisticconstraints.Threeoftheseconstraintscorrespondtoourrulesthatverbsaretherootsofsentencesandmaytakenounsasdependents,buttheothertencon-straints(e.g.thatadjectivesmodifynouns,adverbsmodifyadjectivesorverbs,usw.)havenoequivalentinoursystem.Althoughoursystemhaslesspriorknowledge,itstillperformscompetitively.OntheWSJ,Naseemetal.demonstratetheim-portanceandeffectofthespecificchoiceofsyntacticrulesbycomparingtheperformanceoftheirsystemwithhandcrafteduniversalrules(71.9),withEn-glishspecificrules(73.8),andwithrulesproposedbyDrucketal.(2009)(64.9).TheperformanceofNaseemetal.’ssystemdropsverysignificantlyassentencelength(andpresumableparsecomplexity)
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
83
SlEsDaPtSv∼#Tokens3.8K4.2K9.5K15K24KN1050.967.251.971.563.3HDP56.662.151.574.769.8Table2:AcomparisonofoursystemwithNaseemetal.(2010),bothtrainedandtestedonthelength10trainingdatafromtheCoNLL-XSharedTask.increases,whereasoursystemshowssignificantlylessdecline,andoutperformstheiruniversalsystembyasignificantmargin.7≤10≤20NaseemUniversalRules71.950.4NaseemEnglishRules73.866.1HDP-CCG68.264.2HDP-CCG(train≤20)71.9IncontrasttoSpitkovskyetal.(2010),whoreportedthatperformanceoftheirdependencybasedsystemdegradeswhentrainedonlongersentences,ourper-formanceonlength10sentencesincreasesto71.9whenwetrainonsentencesuptolength20.AnothersystemthatisalsobasedonCCG,butcapturessignificantlymorelinguisticknowledgethanours,waspresentedbyBoonkwanandSteed-man(2011),whoachieveanaccuracyof74.5onWSJ10section23(trainedonsections02-22).Us-ingthesamesettings,oursystemachievesanaccu-racyof68.4.Unlikeourapproach,BoonkwanandSteedmandonotautomaticallyinduceanappropri-ateinventoryoflexicalcategory,butuseanexten-sivequestionnairethatdefinesprototypecategoriesforvarioussyntacticconstructions,andrequiressig-nificantmanualengineeringofwhichPOStagsaremappedtowhatcategoriestogeneratealanguage-specificlexicon.However,theirperformancede-gradessignificantlywhenonlyasubsetoftheques-tionsareconsidered.Usingonlythefirst14ques-tions,coveringfactsabouttheorderingofsubjects,verbsandobjects,Adjektive,Adverbien,auxiliaries,adpositions,possessivesandrelativemarkers,theyachieveanaccuracyof68.2,whichisalmostiden-7Ourearliergenerativemodelshowedsimilarbehavior,al-thoughtheresultsinBiskandHockenmaier(2012B)arenotdirectlycomparableduetodifferencesinthedata.SlEsDaPtSv#Tokens3,8574,2309,54915,01524,021G1051.262.447.254.348.6HDP57.965.449.373.573.2BgWSJNlJaDe#Tokens38,22042,44243,40543,50177,705G1059.864.447.560.247.4HDP66.170.356.264.168.4Table3:AcomparisonofoursystemwithGillenwa-teretal.(2010),bothtrainedonthelength10train-ingdata,andtestedonthelength10testdata,fromtheCoNLL-XSharedtask.ticaltoours,eventhoughweusesignificantlylessinitialknowledge.However,thelexiconswepresentbelowindicatethatweareinfactlearningmanyoftheveryexactdetailsthatintheirsystemarecon-structedbyhand.Theremaining14questionsinBoonkwanandSteedman’squestionnairecoverlessfrequentphenomenasuchastheorderofnegativemarkers,dativeshift,andpro-drop.Theobviousad-vantageofthisapproachisthatthisallowsthemtodefineamuchmorefine-grainedinventoryoflexicalcategoriesthanoursystemcanautomaticallyinduce.Wealsostipulatethatforcertainlanguagesknowl-edgeofpro-dropcouldplayasignificantroleinthesuccessoftheirapproach:ifcompletesentencesareallowedtobeoftheformS\NorS/N,thesamelex-icalcategorycanbeusedfortheverbregardlessofwhetherthesubjectispresentorhasbeendropped.7.3AdditionalLanguagesInordertoprovideresultsonadditionallanguages,wepresentinTable3acomparisontotheworkofGillenwateretal.(2010)(G10),usingtheConLL-XSharedTaskdata(BuchholzandMarsi,2006).Fol-lowingGillenwateretal.,wetrainonlyonsentencesoflength10fromthetrainingsetandevaluateonthetestset.Sincethisisadifferenttrainingregime,andthesecorporadifferformanylanguagesfromthatofthePASCALchallenge,numbersfromTable1can-notbecompareddirectlywiththoseinTable3.WehavealsoappliedourmodeltoGoldberg(2011)’sHebrewcorpus,whereitachievesanaccuracyof62.1(trainedandtestedonallsentenceslength10;7,253)and59.6(length15;21,422tokens).
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
84
Arabic%Swedish%WSJ%Childes%Japanese%Czech%VERB(S\N)/N56S45S\N52S/N44S84S26(S/N)/N29S\N20(S\N)/N19S37S\N25ADPN\N68(S\S)/N49(S\S)/N46(S\S)/N45(S/S)\N44(S\S)/N42N/N21(N\N)/N25(N\N)/N20N/N25N\N23(S/S)/N26NOUNN\N50N91N79N89N73N74N35N/N12ADJN\N82N/N50N/N70N/N46S/S64N/N55Figure3:Partiallexiconsdemonstratinglanguagespecificknowledgelearnedautomaticallyforfivelan-guages.Foreaseofcomparisonbetweenlanguages,weusetheuniversaltaglabel(Verb,Adposition,NounandAdjective).Shownarethemostcommoncategoriesandthefractionofoccurrencesofthetagthatareassignedthiscategory(accordingtotheViterbiparses).8TheInducedLexiconsSinceourapproachisbasedonalexicalizedfor-malismsuchasCCG,oursystemautomaticallyin-duceslexiconsthatpairwords(oder,inourcase,POS-tags)withlanguage-specificcategoriesthatcapturetheirsyntacticbehavior.Ifourapproachissuccess-ful,itshouldlearnthebasicsyntacticpropertiesofeachlanguage,whichwillbereflectedinthecorre-spondinglexicon.InFigure3oneseeshowverbssubcategorizedifferently,howwordorderingdiffersbylanguage,andhowtheattachmentstructuresofprepositionsareautomaticallydiscoveredanddifferacrosslanguages.InArabic,forexample,thesys-temlearnsthatwordorderisvariableandthereforetheverbmustallowforbothSVOandVOSstyleconstructions.Wegenerallylearnthatadpositions(prepositionsorpostpositions)takenounsasargu-ments.InCzech,PPscanappearbeforeandaftertheverb,leadingtotwodifferentcategories((S\S)/Nand(S/S)/N).Japanesehaspostpositionsthatap-pearinpreverbalposition((S/S)\N),butwhenthiscategoryisassignedtonominalparticlesthatcor-respondtocasemarkers,iteffectivelyabsorbsthenoun,leadingtoapreferenceforverbsthatdonottakeanyarguments(S),andtoamisanalysisofad-jectivesasverbmodifiers(S/S).Ourlexiconsalsoreflectdifferencesinstyle:whileChildesandtheWSJarebothEnglish,theyrepresentverydifferentregisters.Welearnthatsubjectsaremostlyabsentintheinformalspeechandchild-directedinstructionscontainedinChildes,whileeffectivelymandatoryintheWallStreetJournal.9ConclusionsThispaperhasintroducedanovelfactorizationforCCGmodelsandshowedhowwhencombinedwithnon-parametricBayesianstatisticsitcancompetewitheveryothergrammarinductionsystemcur-rentlyavailable,includingthosethatcaptureasig-nificantamountofpriorlinguisticknowledge.Theuseofapowerfulsyntacticformalismprovesben-eficialbothintermsofrequiringverylimiteduni-versalknowledgeandrobustnessatlongersentencelengths.Unlikestandardgrammarinductionsys-temsthatarebasedondependencygrammar,oursystemreturnslinguisticallyinterpretablelexiconsforeachlanguagethatdemonstrateithasdiscov-eredtheirbasicwordorder.Ofparticularnoteisthesimplicityofthemodelbothalgorithmicallyandintermsofimplementation.Bynotfalteringonlongersentencesorrequiringextensivetuning,thesystemcanbeeasilyandquicklydeployedonanewlan-guageandreturnstateoftheartperformanceandeasilyinterpretablelexicons.Inthispaper,wehaveappliedthismodelonlytoarestrictedfragmentofCCG,butfutureworkwilladdresstheimpactoflex-icalizationandtheinclusionofrichercombinators.10AcknowledgementsThisworkissupportedbyNSFCAREERaward1053856(BayesianModelsforLexicalizedGram-mars).
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
85
ReferencesTaylorBerg-KirkpatrickandDanKlein.2010.Phyloge-neticGrammarInduction.InProceedingsofthe48thAnnualMeetingoftheAssociationforComputationalLinguistics,pages1288–1297,Uppsala,Schweden,July.ChristopherBishop.2006.PatternRecognitionandMa-chineLearning.Springer-Verlag,August.YonatanBiskandJuliaHockenmaier.2012a.InductionofLinguisticStructurewithCombinatoryCategorialGrammars.InNAACLHLTWorkshoponInductionofLinguisticStructure,pages90–95,Montr´eal,Kanada,June.YonatanBiskandJuliaHockenmaier.2012b.SimpleRobustGrammarInductionwithCombinatoryCate-gorialGrammars.InProceedingsoftheTwenty-SixthConferenceonArtificialIntelligence(AAAI-12),pages1643–1649,Toronto,Kanada,July.DavidMBleiandMichaelIJordan.2004.VariationalMethodsfortheDirichletProcess.InProceedingsoftheTwenty-FirstInternationalConferenceonMachineLearning(ICML2004),Banff,Alberta,Kanada,July.PhilBlunsomandTrevorCohn.2010.UnsupervisedInductionofTreeSubstitutionGrammarsforDepen-dencyParsing.Proceedingsofthe2010ConferenceonEmpiricalMethodsofNaturalLanguageProcess-ing,pages1204–1213,October.PrachyaBoonkwanandMarkSteedman.2011.Gram-marInductionfromTextUsingSmallSyntacticProto-types.InProceedingsof5thInternationalJointCon-ferenceonNaturalLanguageProcessing,pages438–446,ChiangMai,Thailand,November.SabineBuchholzandErwinMarsi.2006.CoNLL-XSharedTaskonMultilingualDependencyParsing.InProceedingsofthe10thConferenceonComputationalNaturalLanguageLearning(CoNLL-X),pages149–164,NewYorkCity,June.GlennCarrollandEugeneCharniak.1992.TwoExper-imentsonLearningProbabilisticDependencyGram-marsfromCorpora.WorkingNotesoftheWorkshopStatistically-BasedNLPTechniques,pages1–13.EugeneCharniak.1993.StatisticalLanguageLearning.TheMITPress,Cambridge,Massachusetts.StephenClarkandJamesRCurran.2007.Formalism-IndependentParserEvaluationwithCCGandDep-Bank.InProceedingsofthe45thAnnualMeetingoftheAssociationofComputationalLinguistics,pages248–255,Prague,CzechRepublic,June.AlexClark.2001.UnsupervisedLanguageAcquisition:TheoryandPractice.Ph.D.thesis,UniversityofSus-sex,September.ShayBCohenandNoahASmith.2009.VariationalInferenceforGrammarInductionwithPriorKnowl-edge.ProceedingsoftheACL-IJCNLP2009Confer-enceShortPapers,pages1–4.ShayBCohenandNoahASmith.2010.CovarianceinUnsupervisedLearningofProbabilisticGrammars.TheJournalofMachineLearningResearch,pages3117–3151,November.TrevorCohn,PhilBlunsom,andSharonGoldwater.2010.InducingTree-SubstitutionGrammars.TheJournalofMachineLearningResearch,11:3053–3096,November.MichaelCollins.2003.Head-DrivenStatisticalMod-elsforNaturalLanguageParsing.ComputationalLin-guistics,29(4):589–637,December.GregoryDruck,GideonMann,andAndrewMcCal-lum.2009.Semi-supervisedLearningofDependencyParsersusingGeneralizedExpectationCriteria.InProceedingsoftheJointConferenceofthe47thAn-nualMeetingoftheACLandthe4thInternationalJointConferenceonNaturalLanguageProcessingoftheAFNLP,pages360–368,Suntec,Singapur,Au-gust.JasonEisner.1996.EfficientNormal-FormParsingforCombinatoryCategorialGrammar.InProceedingsofthe34thAnnualMeetingoftheAssociationforCom-putationalLinguistics,pages79–86,SantaCruz,Cali-fornia,USA,June.DouweGelling,TrevorCohn,PhilBlunsom,andJo˜aoVGraca.2012.ThePASCALChallengeonGrammarInduction.InNAACLHLTWorkshoponInductionofLinguisticStructure,pages64–80,Montr´eal,Kanada,June.JenniferGillenwater,KuzmanGanchev,Jo˜aoVGraca,FernandoPereira,andBenTaskar.2010.SparsityinDependencyGrammarInduction.InProceedingsofthe48thAnnualMeetingoftheAssociationforCom-putationalLinguistics,pages194–199,Uppsala,Swe-den,July.JenniferGillenwater,KuzmanGanchev,Jo˜aoVGraca,FernandoPereira,andBenTaskar.2011.PosteriorSparsityinUnsupervisedDependencyParsing.TheJournalofMachineLearningResearch,12:455–490,February.YoavGoldberg.2011.AutomaticSyntacticProcessingofModernHebrew.Ph.D.thesis,Ben-GurionUniversityoftheNegev,November.WilliamPHeaddenIII,MarkJohnson,andDavidMc-Closky.2009.ImprovingUnsupervisedDependencyParsingwithRicherContextsandSmoothing.InPro-ceedingsofHumanLanguageTechnologies:The2009AnnualConferenceoftheNorthAmericanChapteroftheAssociationforComputationalLinguistics,pages101–109,Boulder,Colorado,June.JuliaHockenmaierandYonatanBisk.2010.Normal-formparsingforCombinatoryCategorialGrammarswithgeneralizedcompositionandtype-raising.In
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
86
Proceedingsofthe23rdInternationalConferenceonComputationalLinguistics(Coling2010),pages465–473,Beijing,China,August.Coling2010OrganizingCommittee.JuliaHockenmaierandMarkSteedman.2002.Gener-ativeModelsforStatisticalParsingwithCombinatoryCategorialGrammar.InProceedingsof40thAnnualMeetingoftheAssociationforComputationalLin-guistics,pages335–342,Philadelphia,Pennsylvania,USA,July.JuliaHockenmaierandMarkSteedman.2007.CCG-bank:ACorpusofCCGDerivationsandDependencyStructuresExtractedfromthePennTreebank.Com-putationalLinguistics,33(3):355–396,September.YunHuang,MinZhang,andChewLimTan.2012.ImprovedCombinatoryCategorialGrammarInduc-tionwithBoundaryWordsandBayesianInference.InProceedingsofthe24rdInternationalConferenceonComputationalLinguistics(Coling2012),Mumbai,Indien,December.RayJackendoff.1977.X-BarSyntax:AStudyofPhraseStructure.TheMITPress.DanKleinandChristopherDManning.2004.Corpus-BasedInductionofSyntacticStructure:ModelsofDe-pendencyandConstituency.InProceedingsofthe42ndMeetingoftheAssociationforComputationalLinguistics(ACL’04),MainVolume,pages478–485,Barcelona,Spanien,July.KenichiKuriharaandTaisukeSato.2004.AnAppli-cationoftheVariationalBayesianApproachtoProb-abilisticContext-FreeGrammars.InternationalJointConferenceonNaturalLanguageLanguageProcess-ingWorkshopBeyondShallowAnalyses,March.KenichiKurihara,MaxWelling,andYee-WhyeTeh.2007.CollapsedVariationalDirichletProcessMix-tureModels.InProceedingsofthe20thInternationalJointConferenceonArtificialIntelligence(IJCAI07),pages2796–2801,Hyderabad,Indien,January.TomKwiatkowski,SharonGoldwater,LukeZettlemoyer,andMarkSteedman.2012.Aprobabilisticmodelofsyntacticandsemanticacquisitionfromchild-directedutterancesandtheirmeanings.InProceedingsofthe13thConferenceoftheEuropeanChapteroftheAs-sociationforComputationalLinguistics,pages234–244,Avignon,Frankreich,April.AssociationforCompu-tationalLinguistics.KarimLariandSteveJYoung.1991.Applicationsofstochasticcontext-freegrammarsusingtheInside-Outsidealgorithm.Computerspeech&Sprache,5(3):237–257,January.PercyLiang,SlavPetrov,MichaelIJordan,andDanKlein.2007.TheInfinitePCFGUsingHierarchicalDirichletProcesses.InProceedingsofthe2007JointConferenceonEmpiricalMethodsinNaturalLan-guageProcessingandComputationalNaturalLan-guageLearning(EMNLP-CoNLL),pages688–697,Prague,CzechRepublic.PercyLiang,MichaelIJordan,andDanKlein.2009.ProbabilisticGrammarsandHierarchicalDirichletProcesses.InTheOxfordHandbookofAppliedBayesianAnalysis.OxfordUniversityPress.MitchellPMarcus,BeatriceSantorini,andMaryAnnMarcinkiewicz.1993.BuildingaLargeAnnotatedCorpusofEnglish:ThePennTreebank.Computa-tionalLinguistics,19(2):313–330,June.DavidMarecekandZdenekZabokrtsky.2012.Unsu-pervisedDependencyParsingusingReducibilityandFertilityfeatures.InNAACLHLTWorkshoponInduc-tionofLinguisticStructure,pages84–89,Montr´eal,Kanada,June.TahiraNaseem,HarrChen,ReginaBarzilay,andMarkJohnson.2010.UsingUniversalLinguisticKnowl-edgetoGuideGrammarInduction.InProceedingsofthe2010ConferenceonEmpiricalMethodsinNaturalLanguageProcessing,pages1234–1244,Cambridge,MA,October.TahiraNaseem,ReginaBarzilay,andAmirGloberson.2012.SelectiveSharingforMultilingualDependencyParsing.InProceedingsofthe50thAnnualMeetingoftheAssociationforComputationalLinguistics(Vol-ume1:LongPapers),pages629–637,Jeju,RepublicofKorea,July.JoakimNivre.2006.InductiveDependencyParsing.Springer.SlavPetrov,DipanjanDas,andRyanMcDonald.2012.AUniversalPart-of-SpeechTagset.InProceedingsofthe8thInternationalConferenceonLanguageRe-sourcesandEvaluation(LREC-2012),pages2089–2096,Istanbul,Truthahn,May.AndersSøgaard.2012.Twobaselinesforunsuper-viseddependencyparsing.InNAACLHLTWork-shoponInductionofLinguisticStructure,pages81–83,Montr´eal,Kanada,June.ValentinISpitkovsky,HiyanAlshawi,andDanielJuraf-sky.2010.FromBabyStepstoLeapfrog:How“LessisMore”inUnsupervisedDependencyParsing.InHu-manLanguageTechnologies:The2010AnnualCon-ferenceoftheNorthAmericanChapteroftheAssoci-ationforComputationalLinguistics,pages751–759,LosAngeles,Kalifornien,June.MarkSteedman.1996.SurfaceStructureandInterpre-tation.TheMITPress,January.MarkSteedman.2000.TheSyntacticProcess.TheMITPress,September.Yee-WhyeTeh,MichaelIJordan,MatthewJBeal,andDavidMBlei.2006.HierarchicalDirichletPro-
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3
87
cesses.JournaloftheAmericanStatisticalAssocia-tion,101(476):1566–1581.Yee-WhyeTeh.2006.AHierarchicalBayesianLan-guageModelbasedonPitman-YorProcesses.InPro-ceedingsofthe21stInternationalConferenceonCom-putationalLinguisticsand44thAnnualMeetingoftheAssociationforComputationalLinguistics,pages985–992,Sydney,Australia,July.Yee-WhyeTeh.2010.DirichletProcess.InEncyclope-diaofMachineLearning,pages280–287.Springer.KeweiTu.2012.CombiningtheSparsityandUnambi-guityBiasesforGrammarInduction.InNAACLHLTWorkshoponInductionofLinguisticStructure,pages105–110,Montr´eal,Kanada,Juni.
l
D
Ö
w
N
Ö
A
D
e
D
F
R
Ö
M
H
T
T
P
:
/
/
D
ich
R
e
C
T
.
M
ich
T
.
e
D
u
/
T
A
C
l
/
l
A
R
T
ich
C
e
–
P
D
F
/
D
Ö
ich
/
.
1
0
1
1
6
2
/
T
l
A
C
_
A
_
0
0
2
1
1
1
5
6
6
6
3
1
/
/
T
l
A
C
_
A
_
0
0
2
1
1
P
D
.
F
B
j
G
u
e
S
T
T
Ö
N
0
8
S
e
P
e
M
B
e
R
2
0
2
3