Transactions of the Association for Computational Linguistics, 2 (2014) 219–230. Action Editor: Alexander Koller.

Submitted 11/2013; Revised 1/2014; Published 5/2014. © 2014 Association for Computational Linguistics.

Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence

Md Arafat Sultan†, Steven Bethard‡ and Tamara Sumner†
†Institute of Cognitive Science and Department of Computer Science, University of Colorado Boulder
‡Department of Computer and Information Sciences, University of Alabama at Birmingham
arafat.sultan@colorado.edu, bethard@cis.uab.edu, sumner@colorado.edu

Abstract

We present a simple, easy-to-replicate monolingual aligner that demonstrates state-of-the-art performance while relying on almost no supervision and a very small number of external resources. Based on the hypothesis that words with similar meanings represent potential pairs for alignment if located in similar contexts, we propose a system that operates by finding such pairs. In two intrinsic evaluations on alignment test data, our system achieves F1 scores of 88–92%, demonstrating 1–3% absolute improvement over the previous best system. Moreover, in two extrinsic evaluations our aligner outperforms existing aligners, and even a naive application of the aligner approaches state-of-the-art performance in each extrinsic task.

1 Introduction

Monolingual alignment is the task of discovering and aligning similar semantic units in a pair of sentences expressed in a natural language. Such alignments provide valuable information regarding how and to what extent the two sentences are related. Consequently, alignment is a central component of a number of important tasks involving text comparison: textual entailment recognition, textual similarity identification, paraphrase detection, question answering and text summarization, to name a few.

The high utility of monolingual alignment has spawned significant research on the topic in the recent past. Major efforts that have treated alignment as a standalone problem (MacCartney et al., 2008; Thadani and McKeown, 2011; Yao et al., 2013a) are primarily supervised, thanks to the manually aligned corpus with training and test sets from Microsoft Research (Brockett, 2007). Primary concerns of such work include both quality and speed, since alignment is frequently a component of larger NLP tasks. Driven by similar motivations, we seek to devise a lightweight, easy-to-construct aligner that produces high-quality output and is applicable to various end tasks.

Amid a variety of problem formulations and ingenious approaches to alignment, we take a step back and closely examine the effectiveness of two frequently made assumptions: 1) related semantic units in two sentences must be similar or related in their meaning, and 2) commonalities in their semantic contexts in the respective sentences provide additional evidence of their relatedness (MacCartney et al., 2008; Thadani and McKeown, 2011; Yao et al., 2013a; Yao et al., 2013b). Alignment, based solely on these two assumptions, reduces to finding the best combination of pairs of similar semantic units in similar contexts. Exploiting existing resources to identify the similarity of semantic units, we search for robust techniques to identify contextual commonalities.

Dependency trees are a commonly used structure for this purpose. While they remain a central part of our aligner, we expand the horizons of dependency-based alignment beyond exact matching by systematically exploiting the notion of "type equivalence" with a small hand-crafted set of equivalent dependency types. In addition, we augment dependency-based alignment with surface-level text analysis.

While phrasal alignments are important and have been investigated in multiple studies, we focus primarily on word alignments (which have been shown to form the vast majority of alignments (≥95%) in multiple human-annotated corpora (Yao et al., 2013b)), keeping the framework flexible enough to allow incorporation of phrasal alignments in future.

Evaluation of our aligner on the benchmark dataset reported in (Brockett, 2007) shows an F1 score of 91.7%: a 3.1% absolute improvement over the previous best system (Yao et al., 2013a), corresponding to a 27.2% error reduction. It shows superior performance also on the dataset reported in (Thadani et al., 2012). In addition, we present results of two extrinsic evaluations, namely textual similarity identification and paraphrase detection. Our aligner not only outperforms existing aligners in each task, but also approaches top systems for the extrinsic tasks.

2 Background

Monolingual alignment has been applied to various NLP tasks including textual entailment recognition (Hickl et al., 2006; Hickl and Bensley, 2007), paraphrase identification (Das and Smith, 2009; Madnani et al., 2012), and textual similarity assessment (Bär et al., 2012; Han et al., 2013) – in some cases explicitly, i.e., as a separate module. But many such systems resort to simplistic and/or ad-hoc strategies for alignment, and in most such work the alignment modules were not separately evaluated on alignment benchmarks, making their direct assessment difficult.

With the introduction of the MSR alignment corpus (Brockett, 2007), developed from the second Recognizing Textual Entailment challenge data (Bar-Haim et al., 2006), direct evaluation and comparison of aligners became possible. The first aligner trained and evaluated on the corpus was a phrasal aligner called MANLI (MacCartney et al., 2008). It represents alignments as sets of different edit operations (where a sequence of edits turns one input sentence into the other) and finds an optimal set of edits via a simulated annealing search. Weights of different edit features are learned from the training set of the MSR alignment corpus using a perceptron learning algorithm. MANLI incorporates only shallow features characterizing contextual similarity: relative positions of the two phrases being aligned (or not) in the two sentences, and boolean features representing whether or not the preceding and following tokens of the two phrases are similar.

Thadani and McKeown (2011) substituted MANLI's simulated annealing-based decoding with integer linear programming, and achieved a considerable speed-up. More importantly for our discussion, they found contextual evidence in the form of syntactic constraints useful in better aligning stop words. Thadani et al. (2012) further extended the model by adding features characterizing dependency arc edits, effectively bringing stronger influence of contextual similarity into alignment decisions. Again the performance improved as a consequence.

The most successful aligner to date both in terms of accuracy and speed, called JacanaAlign, was developed by Yao et al. (2013a). In contrast to the earlier systems, JacanaAlign is a word aligner that formulates alignment as a sequence labeling problem. Each word in the source sentence is labeled with the corresponding target word index if an alignment is found. It employs a conditional random field to assign the labels and uses a feature set similar to MANLI's in terms of the information they encode (with some extensions). Contextual features include only semantic match of the left and the right neighbors of the two words and their POS tags. Even though JacanaAlign outperformed the MANLI enhancements despite having fewer contextual features, it is difficult to compare the role of context in the two models because of the large paradigmatic disparity. An extension of JacanaAlign was proposed for phrasal alignments in (Yao et al., 2013b), but the contextual features remained largely unchanged.

Noticeable in all the above systems is the use of contextual evidence as a feature for alignment, but in our opinion, not to an extent sufficient to harness its full potential.
Even though deeper dependency-based modeling of contextual commonalities can be found in some other studies (Kouylekov and Magnini, 2005; Chambers et al., 2007; Chang et al., 2010; Yao et al., 2013c), we believe there is further scope for systematic exploitation of contextual evidence for alignment, which we aim to do in this work.

In contrast, word semantic similarity has been a central component of most aligners; various measures of word similarity have been utilized, including string similarity, resource-based similarity (derived from one or more lexical resources like WordNet) and distributional similarity (computed from word co-occurrence statistics in large corpora).

An important trade-off between precision, coverage and speed exists here, and aligners commonly rely on only a subset of these measures (Thadani and McKeown, 2011; Yao et al., 2013a). We use the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013), which is a large resource of lexical and phrasal paraphrases constructed using bilingual pivoting (Bannard and Callison-Burch, 2005) over large parallel corpora.

3 System

Our system operates as a pipeline of alignment modules that differ in the types of word pairs they align. Figure 1 is a simplistic representation of the pipeline.

Figure 1: System overview (a pipeline of four modules: align identical word sequences, align named entities, align content words, align stop words).

Each module makes use of contextual evidence to make alignment decisions. In addition, the last two modules are informed by a word semantic similarity algorithm. Because of their phrasal nature, we treat named entities separately from other content words. The rationale behind the order in which the modules are arranged is discussed later in this section (Section 3.3.5). Before discussing each alignment module in detail, we describe the system components that identify word and contextual similarity.

3.1 Word Similarity

The ability to correctly identify semantic similarity between words is crucial to our aligner, since contextual evidence is important only for similar words. Instead of treating word similarity as a continuous variable, we define three levels of similarity. The first level is an exact word or lemma match, represented by a similarity score of 1. The second level represents semantic similarity between two terms which are not identical. To identify such word pairs, we employ the Paraphrase Database (PPDB; http://paraphrase.org). We use the largest (XXXL) of the PPDB's lexical paraphrase packages and treat all pairs identically by ignoring the accompanying statistics. We customize the resource by removing pairs of identical words or lemmas and adding lemmatized forms of the remaining pairs. For now, we use the term ppdbSim to refer to the similarity of each word pair in this modified version of PPDB (which is a value in (0, 1)) and later explain how we determine it (Section 3.3.5). Finally, any pair of different words which is absent from PPDB is assigned a zero similarity score.
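The three similarity levels just described fit in a few lines of code. The sketch below is only an illustration of this scheme, not the released implementation; the lemmatizer callable and the set of PPDB lexical paraphrase pairs are assumed stand-ins for whatever resources a reimplementation would load, and 0.9 is the ppdbSim value selected later in Section 3.3.5.

    # A minimal sketch of the three-level word similarity described above.
    # `lemma` is an assumed lemmatizer callable; `ppdb_pairs` is an assumed set
    # of (word, word) lexical paraphrase pairs loaded from PPDB.
    PPDB_SIM = 0.9  # value selected on the MSR alignment dev set (Section 3.3.5)

    def word_sim(w1, w2, lemma, ppdb_pairs):
        """Return 1 for an exact word/lemma match, ppdbSim for a PPDB pair, else 0."""
        if w1.lower() == w2.lower() or lemma(w1) == lemma(w2):
            return 1.0
        if (lemma(w1), lemma(w2)) in ppdb_pairs or (lemma(w2), lemma(w1)) in ppdb_pairs:
            return PPDB_SIM
        return 0.0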
3.2 Extracting Contextual Evidence

Our alignment modules collect contextual evidence from two complementary sources: syntactic dependencies and words occurring within a small textual vicinity of the two words to be aligned. The application of each kind assumes a common principle of minimal evidence. Formally, given two input sentences S and T, we consider two words s ∈ S and t ∈ T to form a candidate pair for alignment if ∃rs ∈ S and ∃rt ∈ T such that: 1. (s, t) […]

Figure 2: Equivalent dependency types dobj and rcmod. (S: He wrote a book. T: I read the book he wrote.)

[…] also an (rs, rt) alignment if (s, t) […]

Table 1: Equivalent dependency structures. (For each pair of lexical categories of (s, t) and (rs, rt) and each parent–child orientation, the table lists the sets of S and T dependency types that are treated as equivalent.)

Algorithm 1: depContext(S, T, i, j, EQ)
Input:
  1. S, T: sentences to be aligned
  2. i: index of a word in S
  3. j: index of a word in T
  4. EQ: dependency type equivalences (Table 1)
Output: context = {(k, l)}: pairs of word indexes
  context ← {(k, l) : wordSim(sk, tl) > 0
             ∧ (i, k, τs) ∈ dependencies(S)
             ∧ (j, l, τt) ∈ dependencies(T)
             ∧ POS(si) = POS(tj) ∧ POS(sk) = POS(tl)
             ∧ (τs = τt ∨ (POS(si), POS(sk), τs, τt) ∈ EQ)}

[…] dependencies is limited by the accuracy of the parser, and (b) we investigate equivalence types for only inter-lexical-category alignment in this study. Therefore we apply an additional model of word context: the textual neighborhood of s in S and t in T. Extraction of contextual evidence for content words from the textual neighborhood is described in Algorithm 2.

Algorithm 2: textContext(S, T, i, j, STOP)
Input:
  1. S, T: sentences to be aligned
  2. i: index of a word in S
  3. j: index of a word in T
  4. STOP: a set of stop words
Output: context = {(k, l)}: pairs of word indexes
  Ci ← {k : k ∈ [i−3, i+3] ∧ k ≠ i ∧ sk ∉ STOP}
  Cj ← {l : l ∈ [j−3, j+3] ∧ l ≠ j ∧ tl ∉ STOP}
  context ← Ci × Cj

Like the dependency-based module, it accumulates evidence for each (si, tj) pair by inspecting multiple pairs of neighboring words. But instead of aligning only words within a lexical category, this module also performs inter-category alignment, considering content words within a (3, 3) window of si and tj as neighbors. We implement relational equivalence here by holding any two positions within the window equally contributive and mutually comparable as sources of contextual evidence.
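As a concrete illustration of Algorithm 2, the sketch below extracts the textual-neighborhood context of a word pair; it assumes the sentences are given as token lists and the stop words as a set, which are conventions of this sketch rather than details taken from the text.

    # A sketch of Algorithm 2 (textContext): the neighbors of a word are the
    # non-stopword tokens within 3 positions on either side, and the context of
    # a word pair is the Cartesian product of the two neighbor sets.
    from itertools import product

    def text_context(s, t, i, j, stop):
        c_i = [k for k in range(max(0, i - 3), min(len(s), i + 4))
               if k != i and s[k].lower() not in stop]
        c_j = [l for l in range(max(0, j - 3), min(len(t), j + 4))
               if l != j and t[l].lower() not in stop]
        return list(product(c_i, c_j))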

3.3 The Alignment Algorithm

We now describe each alignment module in the pipeline and their sequence of operation.

3.3.1 Identical Word Sequences

The presence of a common word sequence in S and T indicates, for each word in the sequence, a word in the other sentence that is (a) identical and (b) contextually similar. We observe the results of aligning identical words in such sequences of length n containing at least one content word. This simple heuristic demonstrates a high precision (≈97%) on the MSR alignment dev set for n ≥ 2, and therefore we consider membership in such sequences as the simplest form of contextual evidence in our system and align all identical word sequence pairs in S and T containing at least one content word. From this point on, we will refer to this module as wsAlign.

3.3.2 Named Entities

We align named entities separately to enable the alignment of full and partial mentions (and acronyms) of the same entity. We use the Stanford Named Entity Recognizer (Finkel et al., 2005) to identify named entities in S and T. After aligning the exact term matches, any unmatched term of a partial mention is aligned to all terms in the full mention. The module recognizes only first-letter acronyms and aligns an acronym to all terms in the full mention of the corresponding name. Since named entities are instances of nouns, named entity alignment is also informed by contextual evidence (which we discuss in the next section), but happens before alignment of other generic content words. Parents (or children) of a named entity are simply the parents (or children) of its head word. We will refer to this module as neAlign from this point on.

3.3.3 Content Words

Extraction of contextual evidence for promising content word pairs has already been discussed in Section 3.2, covering both dependency-based context and textual context. Algorithm 3 (cwDepAlign) describes the dependency-based alignment process.

Algorithm 3: cwDepAlign(S, T, EQ, AE, w, STOP)
Input:
  1. S, T: sentences to be aligned
  2. EQ: dependency type equivalences (Table 1)
  3. AE: already aligned word pair indexes
  4. w: weight of word similarity relative to contextual similarity
  5. STOP: a set of stop words
Output: A = {(i, j)}: word index pairs of aligned words {(si, tj)} where si ∈ S and tj ∈ T
   1: Ψ ← ∅; ΛΨ ← ∅; Φ ← ∅
   2: for si ∈ S, tj ∈ T do
   3:   if si ∉ STOP ∧ ¬∃tl : (i, l) ∈ AE
   4:      ∧ tj ∉ STOP ∧ ¬∃sk : (k, j) ∈ AE
   5:      ∧ wordSim(si, tj) > 0 then
   6:     context ← depContext(S, T, i, j, EQ)
   7:     contextSim ← Σ(k,l)∈context wordSim(sk, tl)
   8:     if contextSim > 0 then
   9:       Ψ ← Ψ ∪ {(i, j)}
  10:       ΛΨ(i, j) ← context
  11:       Φ(i, j) ← w ∗ wordSim(si, tj)
  12:                  + (1 − w) ∗ contextSim
  13: Linearize and sort Ψ in decreasing order of Φ(i, j)
  14: A ← ∅
  15: for (i, j) ∈ Ψ do
  16:   if ¬∃l : (i, l) ∈ A
  17:      ∧ ¬∃k : (k, j) ∈ A then
  18:     A ← A ∪ {(i, j)}
  19:     for (k, l) ∈ ΛΨ(i, j) do
  20:       if ¬∃q : (k, q) ∈ A ∪ AE
  21:          ∧ ¬∃p : (p, l) ∈ A ∪ AE then
  22:         A ← A ∪ {(k, l)}

For each potentially alignable pair (si, tj), the dependency-based context is extracted as described in Algorithm 1, and context similarity is calculated as the sum of the word similarities of the (sk, tl) context word pairs (lines 2–7). (The wordSim method returns a similarity score in {0, ppdbSim, 1}.) The alignment score of the (si, tj) pair is then a weighted sum of word and contextual similarity (lines 8–12). (We discuss how the weights are set in Section 3.3.5.) The module then aligns (si, tj) pairs with non-zero evidence in decreasing order of this score (lines 13–18). In addition, it aligns all the pairs that contributed contextual evidence for the (si, tj) alignment (lines 19–22). Note that we implement a one-to-one alignment whereby a word gets aligned at most once within the module.
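The greedy selection in lines 13–18 of Algorithm 3 (and lines 11–16 of Algorithm 4 below) can be summarized by the following sketch, which assumes the candidate pairs and their weighted scores have already been computed; the alignment of contributing context pairs (lines 19–22) is omitted here.

    # A sketch of the greedy one-to-one selection: candidate pairs are visited
    # in decreasing order of their alignment score, and a pair is accepted only
    # if neither of its words has been aligned yet. `scores` maps (i, j) index
    # pairs to w * wordSim + (1 - w) * contextSim values.
    def greedy_one_to_one(scores):
        aligned_s, aligned_t, alignments = set(), set(), []
        for (i, j) in sorted(scores, key=scores.get, reverse=True):
            if i not in aligned_s and j not in aligned_t:
                alignments.append((i, j))
                aligned_s.add(i)
                aligned_t.add(j)
        return alignments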

Algorithm 4 (cwTextAlign) presents alignment based on similarities in the textual neighborhood.

Algorithm 4: cwTextAlign(S, T, AE, w, STOP)
Input:
  1. S, T: sentences to be aligned
  2. AE: existing alignments by word indexes
  3. w: weight of word similarity relative to contextual similarity
  4. STOP: a set of stop words
Output: A = {(i, j)}: word index pairs of aligned words {(si, tj)} where si ∈ S and tj ∈ T
   1: Ψ ← ∅; Φ ← ∅
   2: for si ∈ S, tj ∈ T do
   3:   if si ∉ STOP ∧ ¬∃tl : (i, l) ∈ AE
   4:      ∧ tj ∉ STOP ∧ ¬∃sk : (k, j) ∈ AE
   5:      ∧ wordSim(si, tj) > 0 then
   6:     Ψ ← Ψ ∪ {(i, j)}
   7:     context ← textContext(S, T, i, j, STOP)
   8:     contextSim ← Σ(k,l)∈context wordSim(sk, tl)
   9:     Φ(i, j) ← w ∗ wordSim(si, tj)
  10:                + (1 − w) ∗ contextSim
  11: Linearize and sort Ψ in decreasing order of Φ(i, j)
  12: A ← ∅
  13: for (i, j) ∈ Ψ do
  14:   if ¬∃l : (i, l) ∈ A
  15:      ∧ ¬∃k : (k, j) ∈ A then
  16:     A ← A ∪ {(i, j)}

For each potentially alignable pair (si, tj), Algorithm 2 is used to extract the context, which is a set of neighboring content word pairs (lines 2–7). The contextual similarity is the sum of the similarities of these pairs (line 8), and the alignment score is a weighted sum of word similarity and contextual similarity (lines 9, 10). The alignment score is then used to make one-to-one word alignment decisions (lines 11–16). Considering textual neighbors as weaker sources of evidence, we do not align the neighbors.

Note that in cwTextAlign we also align semantically similar content word pairs (si, tj) with no contextual similarities if no pairs (sk, tj) or (si, tl) exist with a higher alignment score. This is a consequence of our observation of the MSR alignment dev data, where we find that more often than not, content words are inherently sufficiently meaningful to be aligned even in the absence of contextual evidence when there are no competing pairs. The content word alignment module is thus itself a pipeline of cwDepAlign and cwTextAlign.

3.3.4 Stop Words

We follow the contextual evidence-based approach to align stop words as well, some of which get aligned as part of word sequence alignment as discussed in Section 3.3.1, and neighbor alignment as discussed in Section 3.3.3. For the rest we utilize dependencies and textual neighborhoods as before, with three adjustments.

Firstly, since stop word alignment is the last module in our pipeline, rather than considering all semantically similar word pairs for contextual similarity, we consider only aligned pairs. Secondly, since many stop words (e.g. determiners, modals) typically demonstrate little variation in the dependencies they engage in, we ignore type equivalences for stop words and implement only exact matching of dependencies. (Stop words in general can participate in equivalent dependencies, but we leave constructing a corresponding mapping for future work.) Finally, for textual neighborhood, we separately check alignments of the left and the right neighbors and aggregate the evidence to determine alignment – again due to the primarily syntactic nature of the interaction of stop words with their neighbors.

Thus stop words are also aligned in a sequence of dependency and textual neighborhood-based alignments. We assume two corresponding modules named swDepAlign and swTextAlign, respectively.

3.3.5 The Algorithm

Our full alignment pipeline is shown as the method align in Algorithm 5.

Algorithm 5: align(S, T, EQ, w, STOP)
Input:
  1. S, T: sentences to be aligned
  2. EQ: dependency type equivalences (Table 1)
  3. w: weight of word similarity relative to contextual similarity
  4. STOP: a set of stop words
Output: A = {(i, j)}: word index pairs of aligned words {(si, tj)} where si ∈ S and tj ∈ T
  A ← wsAlign(S, T)
  A ← A ∪ neAlign(S, T, EQ, A, w)
  A ← A ∪ cwDepAlign(S, T, EQ, A, w, STOP)
  A ← A ∪ cwTextAlign(S, T, A, w, STOP)
  A ← A ∪ swDepAlign(S, T, A, w, STOP)
  A ← A ∪ swTextAlign(S, T, A, w, STOP)
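To make the control flow of Algorithm 5 concrete, the sketch below shows one way the growing alignment set can be threaded through the modules so that downstream modules skip already aligned words; the module callables stand in for wsAlign, neAlign and the other modules above, and are an assumption of this sketch rather than code from the released aligner.

    # A minimal pipeline driver in the spirit of Algorithm 5. Each module is an
    # assumed callable taking (s, t, aligned_so_far) -- aligned_so_far plays the
    # role of AE in Algorithms 3 and 4 -- and returning new (i, j) index pairs.
    def run_pipeline(s, t, modules):
        aligned = set()
        for module in modules:
            aligned |= set(module(s, t, aligned))
        return aligned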

Note that the strict order of the alignment modules limits the scope of downstream modules, since each such module discards any word that has already been aligned by an earlier module (this is accomplished via the variable A; the corresponding parameter in Algorithms 3 and 4 is AE). The rationales behind the specific order of the modules can now be explained: (1) wsAlign is a module with relatively higher precision, (2) it is convenient to align named entities before other content words to enable alignment of entity mentions of different lengths, (3) dependency-based evidence was observed to be more reliable (i.e. of higher precision) than textual evidence on the MSR alignment dev set, and (4) stop word alignments are dependent on existing content word alignments.

The aligner assumes two free parameters: ppdbSim and w (in Algorithms 3 and 4). To determine their values, we exhaustively search through the two-dimensional space of (ppdbSim, w) combinations for ppdbSim, w ∈ {0.1, ..., 0.9, 1}; the combination (0.9, 0.9) yields the best F1 score on the MSR alignment dev set. We use these values for the aligner in all of its subsequent applications.
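The exhaustive search just described is a simple grid search. A hedged sketch follows, where f1_on_dev is an assumed callable that runs the aligner on the MSR dev set with the given parameter values and returns its F1 score; it is a stand-in, not part of the original code.

    # A sketch of the two-dimensional parameter search over (ppdbSim, w).
    def grid_search(f1_on_dev):
        grid = [round(0.1 * k, 1) for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0
        return max(((p, w) for p in grid for w in grid),
                   key=lambda pw: f1_on_dev(*pw))  # (0.9, 0.9) on the MSR dev set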
4 Evaluation

We evaluate the performance of our aligner both intrinsically and extrinsically on multiple corpora.

4.1 Intrinsic Evaluation

The MSR alignment dataset (Brockett, 2007; http://www.cs.biu.ac.il/nlp/files/RTE2006Aligned.zip) was designed to train and directly evaluate automated aligners. Three annotators individually aligned words and phrases in 1600 pairs of premise and hypothesis sentences from the RTE2 challenge data (divided into dev and test sets, each consisting of 800 sentences). The dataset has subsequently been used to assess several top-performing aligners (MacCartney et al., 2008; Thadani and McKeown, 2011; Yao et al., 2013a; Yao et al., 2013b). We use the test set for evaluation in the same manner as these studies: (a) we apply majority rule to select from the three sets of annotations for each sentence and discard three-way disagreements, and (b) we evaluate only on the sure links (word pairs that annotators indicated should certainly be aligned, as opposed to possible links).

We test the generalizability of the aligner by evaluating it, unchanged (i.e. with identical parameter values), on a second alignment corpus: the Edinburgh++ corpus (Thadani et al., 2012; http://www.ling.ohio-state.edu/~scott/#edinburgh-plusplus). Its test set consists of 306 pairs; each pair is aligned by at most two annotators, and we adopt the random selection policy described in (Thadani et al., 2012) to resolve disagreements.

Table 2 shows the results. For each corpus, it shows precision (the percentage of predicted alignments that match the gold annotations), recall (the percentage of gold alignments discovered by the aligner), F1 score, and the percentage of sentences that received the exact gold alignments from the aligner (denoted by E).

Table 2: Results of intrinsic evaluation on two datasets

           System                      P%     R%     F1%    E%
  MSR      MacCartney et al. (2008)    85.4   85.3   85.3   21.3
           Thadani & McKeown (2011)    89.5   86.2   87.8   33.0
           Yao et al. (2013a)          93.7   84.0   88.6   35.3
           Yao et al. (2013b)          92.1   82.8   86.8   29.1
           This Work                   93.7   89.8   91.7   43.8
  EDB++    Yao et al. (2013a)          91.3   82.0   86.4   15.0
           Yao et al. (2013b)          90.4   81.9   85.9   13.7
           This Work                   93.5   82.5   87.6   18.3
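For reference, the metrics in Table 2 can be computed as in the sketch below, which treats each sentence pair's predicted and gold alignments as sets of word-index pairs; this is an illustration of the definitions above, not the official evaluation script.

    # P, R, F1 over all alignments, and the exact-match rate E over sentence pairs.
    def evaluate(predicted, gold):
        """predicted, gold: lists of sets of (i, j) pairs, one set per sentence pair."""
        n_pred = sum(len(p) for p in predicted)
        n_gold = sum(len(g) for g in gold)
        n_match = sum(len(p & g) for p, g in zip(predicted, gold))
        precision = n_match / n_pred if n_pred else 0.0
        recall = n_match / n_gold if n_gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        exact = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
        return precision, recall, f1, exact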

je

D
o
w
n
o
un
d
e
d

F
r
o
m
h

t
t

p

:
/
/

d
je
r
e
c
t
.

m

je
t
.

e
d
toi

/
t

un
c
je
/

je

un
r
t
je
c
e

p
d

F
/

d
o

je
/

.

1
0
1
1
6
2

/
t

je

un
c
_
un
_
0
0
1
7
8
1
5
6
6
8
3
6

/

/
t

je

un
c
_
un
_
0
0
1
7
8
p
d

.

F

b
oui
g
toi
e
s
t

t

o
n
0
7
S
e
p
e
m
b
e
r
2
0
2
3

227

On the MSR test set, our aligner shows a 3.1% improvement in F1 score over the previous best system (Yao et al., 2013a), with a 27.2% error reduction. Importantly, it demonstrates a considerable increase in recall without a loss of precision. The E score also increases as a consequence. On the Edinburgh++ test set, our system achieves a 1.2% increase in F1 score (an error reduction of 8.8%) over the previous best system (Yao et al., 2013a), with improvements in both precision and recall. This is a remarkable result that demonstrates the general applicability of the aligner, as no parameter tuning took place.

4.1.1 Ablation Test

We perform a set of ablation tests to assess the importance of the aligner's individual components. Each row of Table 3 beginning with (-) shows a feature excluded from the aligner and two associated sets of metrics, showing the performance of the resulting algorithm on the two alignment corpora.

Table 3: Ablation test results

                              MSR                  EDB++
  Feature                     P%    R%    F1%      P%    R%    F1%
  Original                    93.7  89.8  91.7     93.5  82.5  87.6
  (-) Word Similarity         95.2  86.3  90.5     95.1  77.3  85.3
  (-) Contextual Evidence     81.3  86.0  83.6     86.4  80.6  83.4
  (-) Dependencies            94.2  88.8  91.4     93.8  81.3  87.1
  (-) Text Neighborhood       85.5  90.4  87.9     90.4  84.3  87.2
  (-) Stop Words              94.2  88.3  91.2     92.2  80.0  85.7

Without a word similarity module, recall drops, as would be expected. Without contextual evidence (word sequences, dependencies and textual neighbors), precision drops considerably and recall also falls. Without dependencies, the aligner still gives state-of-the-art results, which points to the possibility of a very fast yet high-performance aligner. Without evidence from textual neighbors, however, the precision of the aligner suffers badly. Textual neighbors find alignments across different lexical categories, a type of alignment that is currently not supported by our dependency equivalences. Extending the set of dependency equivalences might alleviate this issue. Finally, without stop words (i.e. while aligning content words only), recall drops just a little for each corpus, which is expected as content words form the vast majority of non-identical word alignments.

4.2 Extrinsic Evaluation

We extrinsically evaluate our system on textual similarity identification and paraphrase detection. Here we discuss each task and the results of evaluation.

4.2.1 Semantic Textual Similarity

Given two short input text fragments (commonly sentences), the goal of this task is to identify the degree to which the two fragments are semantically similar. The *SEM 2013 STS task (Agirre et al., 2013) assessed a number of STS systems on four test datasets by comparing their output scores to human annotations. The Pearson correlation coefficient with human annotations was computed individually for each test set, and a weighted sum of the correlations was used as the final evaluation metric (the weight for each dataset was proportional to its size).

We apply our aligner to the task by aligning each sentence pair and taking the proportion of content words aligned in the two sentences (normalizing with the harmonic mean of their numbers of content words) as a proxy of their semantic similarity. Only three of the four STS 2013 datasets are freely available (headlines, OnWN, and FNWN; http://ixa2.si.ehu.es/sts/), which we use for our experiments (leaving out the SMT dataset). These three sets contain 1500 annotated sentence pairs in total.
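One plausible reading of the proxy score described above is sketched below; the exact normalization is not spelled out further in the text, so the harmonic-mean denominator here is an assumption, as is the is_content predicate used to pick out content words.

    # Proportion of aligned content-word pairs, normalized by the harmonic mean
    # of the two sentences' content-word counts. `alignments` is a set of (i, j)
    # word-index pairs produced by the aligner.
    def sts_proxy(s, t, alignments, is_content):
        n_s = sum(1 for w in s if is_content(w))
        n_t = sum(1 for w in t if is_content(w))
        if n_s == 0 or n_t == 0:
            return 0.0
        aligned = sum(1 for i, j in alignments
                      if is_content(s[i]) and is_content(t[j]))
        harmonic_mean = 2.0 * n_s * n_t / (n_s + n_t)
        return aligned / harmonic_mean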
Table 4 shows the results. The first row shows the performance of the top system in the task. With a direct application of our aligner (no parameter tuning), our STS algorithm achieves a 67.15% weighted correlation, which would earn it the 7th rank among 90 participating systems. Considering the fact that alignment is only one of many components of STS, this result is truly promising.

Table 4: Extrinsic evaluation on STS 2013 data

  System              Correl.%   Rank
  Han et al. (2013)   73.7       1 (original)
  JacanaAlign         46.2       66
  This Work           67.2       7

For comparison, we also evaluate the previous best aligner, JacanaAlign (Yao et al., 2013a), on STS 2013 data (the JacanaAlign public release is used, which is a version of the original aligner with extra lexical resources; https://code.google.com/p/jacana/). We apply three different values derived from its output as proxies of semantic similarity: (a) the aligned content word proportion, (b) the Viterbi decoding score, and (c) the normalized decoding score. Of the three, (b) gives the best results, which we show in row 2 of Table 4. Our aligner outperforms JacanaAlign by a large margin.

4.2.2 Paraphrase Identification

The goal of paraphrase identification is to decide if two sentences have the same meaning. The output is a yes/no decision instead of a real-valued similarity score as in STS. We use the MSR paraphrase corpus (Dolan et al., 2004; 4076 dev pairs, 1725 test pairs; http://research.microsoft.com/en-us/downloads/607d14d9-20cd-47e3-85bc-a2f65cd28042/) to evaluate our aligner and compare it with other aligners. Following earlier work (MacCartney et al., 2008; Yao et al., 2013b), we use a normalized alignment score of the two sentences to make a decision based on a threshold, which we set using the dev set. Sentence pairs with a higher-than-threshold score are taken to be paraphrases and the rest non-paraphrases.
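The threshold-based decision can be sketched as follows; the text does not state how the threshold is tuned on the dev set, so maximizing dev accuracy is an assumption here, and score is an assumed callable returning the normalized alignment score of a sentence pair (for instance, the proportion-based proxy above).

    # Pick the dev-set threshold and classify pairs against it.
    def choose_threshold(dev_pairs, dev_labels, score):
        scores = [score(s, t) for s, t in dev_pairs]
        return max(sorted(set(scores)),
                   key=lambda th: sum((sc >= th) == lbl
                                      for sc, lbl in zip(scores, dev_labels)))

    def is_paraphrase(s, t, threshold, score):
        return score(s, t) >= threshold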

Again, this is an oversimplified application of the aligner, even more so than in STS, since a small change in the linguistic properties of two sentences (e.g. polarity or modality) can turn them into non-paraphrases despite having a high degree of alignment. So the aligner was not expected to demonstrate state-of-the-art performance, but it still gets close, as shown in Table 5.

Table 5: Extrinsic evaluation on MSR paraphrase data

  System                  Acc.%   P%     R%     F1%
  Madnani et al. (2012)   77.4    79.0   89.9   84.1
  Yao et al. (2013a)      70.0    72.6   88.1   79.6
  Yao et al. (2013b)      68.1    68.6   95.8   79.9
  This Work               73.4    76.6   86.4   81.2

The first column shows the accuracy of each system in classifying the input sentences into one of two classes: true (paraphrases) and false (non-paraphrases). The rest of the columns show the performance of the system for the true class in terms of precision, recall, and F1 score. Italicized numbers represent scores that were not reported by the authors of the corresponding papers, but have been reconstructed from the reported data (and hence are likely to have small precision errors).

The first row shows the best performance by any system on the test set, to the best of our knowledge. The next two rows show the performances of two state-of-the-art aligners (the performances of both systems were reported in (Yao et al., 2013b)). The last row shows the performance of our aligner. Although it does worse than the best paraphrase system, it outperforms the other aligners.

5 Discussion

Our experiments reveal that a word aligner based on simple measures of lexical and contextual similarity can demonstrate state-of-the-art accuracy. However, as alignment is frequently a component of larger tasks, high accuracy alone is not always sufficient. Other dimensions of an aligner's usability include speed, consumption of computing resources, replicability, and generalizability to different applications. Our design goals include achieving a balance among such multifarious and conflicting goals.

A speed advantage of our aligner stems from formulating the problem as one-to-one word alignment and thus avoiding an expensive decoding phase. The presence of multiple phases is offset by discarding already aligned words in subsequent phases. The use of PPDB as the only (hashable) word similarity resource helps in reducing latency as well as space requirements. As shown in Section 4.1.1, further speedup could be achieved with only a small performance degradation by considering only the textual neighborhood as the source of contextual evidence.

However, the two major goals that we believe the aligner achieves to the greatest extent are replicability and generalizability. The easy replicability of the aligner stems from its use of only basic and frequently used NLP modules (a lemmatizer, a POS tagger, an NER module, and a dependency parser, all available as part of the Stanford CoreNLP suite, http://nlp.stanford.edu/downloads/corenlp.shtml; we use a Python wrapper, https://github.com/dasmith/stanford-corenlp-python) and a single word similarity resource (PPDB).

We experimentally show that the aligner can be successfully applied to different alignment datasets as well as multiple end tasks. We believe a design characteristic that enhances the generalizability of the aligner is its minimal dependence on the MSR alignment training data, which originates from a textual entailment corpus having unique properties such as disparities in the lengths of the input sentences and a directional nature of their relationship (i.e., the premise implying the hypothesis, but not vice versa). A related potential reason is the symmetry of the aligner's output (caused by its assumption of no directionality) – the fact that it outputs the same set of alignments regardless of the order of the input sentences, in contrast to most existing aligners.

Major limitations of the aligner include the inability to align phrases, including multiword expressions. It is incapable of capturing and exploiting long-distance dependencies among words (e.g. coreferences). No word similarity resource is perfect and PPDB is no exception; therefore certain word alignments also remain undetected.

6 Conclusions

We show how contextual evidence can be used to construct a monolingual word aligner with certain desired properties, including state-of-the-art accuracy, easy replicability, and high generalizability.
Some potential avenues for future work include: allowing phrase-level alignment via phrasal similarity resources (e.g. the phrasal paraphrases of PPDB), including other sources of similarity such as vector space models or WordNet relations, expanding the set of dependency equivalences and/or using semantic role equivalences, and formulating our alignment algorithm as objective optimization rather than greedy search.

The aligner is available for download at https://github.com/ma-sultan/monolingual-word-aligner.

Acknowledgments

This material is based in part upon work supported by the National Science Foundation under Grant Numbers EHR/0835393 and EHR/0835381. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. *SEM 2013 shared task: Semantic Textual Similarity. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics, 32-43.

Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with Bilingual Parallel Corpora. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 597-604.

Daniel Bär, Chris Biemann, Iryna Gurevych, and Torsten Zesch. 2012. UKP: Computing semantic textual similarity by combining multiple content similarity measures. In Proceedings of the First Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics, 435-440.

Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and Idan Szpektor. 2006. The Second PASCAL Recognising Textual Entailment Challenge. In Proceedings of The Second PASCAL Recognising Textual Entailment Challenge.

Chris Brockett. 2007. Aligning the RTE 2006 Corpus. Technical Report MSR-TR-2007-77, Microsoft Research.

Nathanael Chambers, Daniel Cer, Trond Grenager, David Hall, Chloe Kiddon, Bill MacCartney, Marie-Catherine de Marneffe, Daniel Ramage, Eric Yeh, and Christopher D. Manning. 2007. Learning alignments and leveraging natural logic. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Association for Computational Linguistics, 165-170.

Ming-Wei Chang, Dan Goldwasser, Dan Roth, and Vivek Srikumar. 2010. Discriminative Learning over Constrained Latent Representations. In Proceedings of the 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 429-437.

Dipanjan Das and Noah A. Smith. 2009. Paraphrase Identification as Probabilistic Quasi-Synchronous Recognition. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Association for Computational Linguistics, 468-476.

Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating Typed Dependency Parses from Phrase Structure Parses. In Proceedings of the International Conference on Language Resources and Evaluation. 449-454.

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. Stanford typed dependencies manual. Technical Report, Stanford University.

Bill Dolan, Chris Quirk, and Chris Brockett. 2004. Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. In Proceedings of the International Conference on Computational Linguistics. Association for Computational Linguistics, 350-356.

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 363-370.

Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 758-764.

Lushan Han, Abhay Kashyap, Tim Finin, James Mayfield, and Jonathan Weese. 2013. UMBC EBIQUITY-CORE: Semantic Textual Similarity Systems. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics, Volume 1. Association for Computational Linguistics, 44-52.
Andrew Hickl and Jeremy Bensley. 2007. A Discourse Commitment-Based Framework for Recognizing Textual Entailment. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Association for Computational Linguistics, 171-176.

Andrew Hickl, Jeremy Bensley, John Williams, Kirk Roberts, Bryan Rink, and Ying Shi. 2006. Recognizing Textual Entailment with LCC's GROUNDHOG System. In Proceedings of the Second PASCAL Challenges Workshop on Recognizing Textual Entailment.

Milen Kouylekov and Bernardo Magnini. 2005. Recognizing textual entailment with tree edit distance algorithms. In Proceedings of the PASCAL Challenges Workshop: Recognising Textual Entailment Challenge, 17-20.

Bill MacCartney, Michel Galley, and Christopher D. Manning. 2008. A Phrase-Based Alignment Model for Natural Language Inference. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 802-811.

Nitin Madnani, Joel Tetreault, and Martin Chodorow. 2012. Re-examining Machine Translation Metrics for Paraphrase Identification. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 182-190.

Kapil Thadani and Kathleen McKeown. 2011. Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 254-259.

Kapil Thadani, Scott Martin, and Michael White. 2012. A Joint Phrasal and Dependency Model for Paraphrase Alignment. In Proceedings of COLING 2012: Posters. 1229-1238.

Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich Part-of-speech Tagging with a Cyclic Dependency Network. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 173-180.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013a. A Lightweight and High Performance Monolingual Word Aligner. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 702-707.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013b. Semi-Markov Phrase-based Monolingual Alignment. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 590-600.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013c. Answer Extraction as Sequence Tagging with Tree Edit Distance. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 858-867.