Transactions of the Association for Computational Linguistics, vol. 4, pp. 113–125, 2016. Action Editor: Noah Smith.
Submission batch: 10/2015; Revision batch: 2/2016; Published 4/2016.
© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 License.
A Joint Model for Answer Sentence Ranking and Answer Extraction

Md Arafat Sultan†, Vittorio Castelli‡, Radu Florian‡
†Institute of Cognitive Science and Department of Computer Science, University of Colorado, Boulder, CO
‡IBM T.J. Watson Research Center, Yorktown Heights, NY
arafat.sultan@colorado.edu, vittorio@us.ibm.com, raduf@us.ibm.com

Abstract

Answer sentence ranking and answer extraction are two key challenges in question answering that have traditionally been treated in isolation, i.e., as independent tasks. In this article, we (1) explain how both tasks are related at their core by a common quantity, and (2) propose a simple and intuitive joint probabilistic model that addresses both via joint computation but task-specific application of that quantity. In our experiments with two TREC datasets, our joint model substantially outperforms state-of-the-art systems in both tasks.

1 Introduction

One of the original goals of AI was to build machines that can naturally interact with humans. Over time, the challenges became apparent and language processing emerged as one of AI's most puzzling areas. Nevertheless, major breakthroughs have still been made in several important tasks; with IBM's Watson (Ferrucci et al., 2010) significantly outperforming human champions in the quiz contest Jeopardy!, question answering (QA) is definitely one such task.

QA comes in various forms, each supporting specific kinds of user requirements. Consider a scenario where a system is given a question and a set of sentences each of which may or may not contain an answer to that question. The goal of answer extraction is to extract a precise answer in the form of a short span of text in one or more of those sentences. In this form, QA meets users' immediate information needs. Answer sentence ranking, on the other hand, is the task of assigning a rank to each sentence so that the ones that are more likely to contain an answer are ranked higher. In this form, QA is similar to information retrieval and presents greater opportunities for further exploration and learning. In this article, we propose a novel approach to jointly solving these two well-studied yet open QA problems.

Most answer sentence ranking algorithms operate under the assumption that the degree of syntactic and/or semantic similarity between questions and answer sentences is a sufficiently strong predictor of answer sentence relevance (Wang et al., 2007; Yih et al., 2013; Yu et al., 2014; Severyn and Moschitti, 2015). On the other hand, answer extraction algorithms frequently assess candidate answer phrases based primarily on their own properties relative to the question (e.g., whether the question is a who question and the phrase refers to a person), making inadequate or no use of sentence-level evidence (Yao et al., 2013a; Severyn and Moschitti, 2013).

Both these assumptions, however, are simplistic, and fail to capture the core requirements of the two tasks. Table 1 shows a question, and three candidate answer sentences only one of which (S(1)) actually answers the question. Ranking models that rely solely on text similarity are highly likely to incorrectly assign similar ranks to S(1) and S(2). Such models would fail to utilize the key piece of evidence against S(2) that it does not contain any temporal information, necessary to answer a when question. Similarly, an extraction model that relies only on the features of a candidate phrase might extract the temporal expression "the year 1666" in S(3) as an answer despite a clear lack of sentence-level evidence.

In view of the above, we propose a joint model for answer sentence ranking and answer extraction that utilizes both sentence and phrase-level evidence to solve each task.

Q: When was the Hale Bopp comet discovered?
S(1): The comet was first spotted by Hale and Bopp, both US astronomers, on July 22, 1995.
S(2): Hale-Bopp, a large comet, was observed for the first time in China.
S(3): The law of gravity was discovered in the year 1666 by Sir Isaac Newton.

Table 1: A question and three candidate answer sentences.

More concretely, we (1) design task-specific probabilistic models for ranking and extraction, exploiting features of candidate answer sentences and their phrases, respectively, and (2) combine the two models in a simple, intuitive step to build a joint probabilistic model for both tasks. This two-step approach facilitates construction of new joint models from any existing solutions to the two tasks. On a publicly available TREC dataset (Wang et al., 2007), our joint model demonstrates an improvement in ranking by over 10 MAP and MRR scores over the current state of the art. It also outperforms state-of-the-art extraction systems on two TREC datasets (Wang et al., 2007; Yao et al., 2013c).

2 Background

In this section, we provide a formal description of the two tasks and establish terminology that we follow in later sections. The Wang et al. (2007) dataset has been the benchmark for most recent work on the two tasks as well as our own. Therefore, we situate our description in the specific context of this dataset. We also discuss related prior work.

2.1 Answer Sentence Ranking

Given a question Q and a set of candidate answer sentences {S(1), ..., S(N)}, the goal in answer sentence ranking is to assign each S(i) an integer rank rank_Q(S(i))¹ so that for any pair (i, j), rank_Q(S(i)) < rank_Q(S(j)) only if S(i) is the more likely of the two to contain an answer to Q. Each (Q, S(i)) pair in the dataset has a binary label (1: S(i) contains an answer to Q, 0: it does not). A supervised ranking model must learn to rank test answer sentences from such binary annotations in the training data. Existing models accomplish this by learning to assign a relevance score to each (Q, S(i)) pair; these scores then can be used to rank the sentences.

QA rankers predominantly operate under the hypothesis that this relevance score is a function of the syntactic and/or semantic similarities between Q and S(i). Wang et al. (2007), for example, learn the probability of generating Q from S(i) using syntactic transformations under a quasi-synchronous grammar formalism. The tree edit models of Heilman and Smith (2010) and Yao et al. (2013a) compute minimal tree edit sequences to align S(i) to Q, and use logistic regression to map features of edit sequences to a relevance score. Wang and Manning (2010) employ structured prediction to compute probabilities for tree edit sequences. Yao et al. (2013b) align related phrases in Q and each S(i) using a semi-Markov CRF model and rank candidates based on their decoding scores. Yih et al. (2013) use an array of lexical semantic similarity resources, from which they derive features for a binary classifier. Convolutional neural network models proposed by Yu et al. (2014) and Severyn and Moschitti (2015) compute distributional semantic vectors of Q and S(i) to assess their semantic similarity.

In a contrasting approach, Severyn and Moschitti (2013) connect the question focus word in Q with potential answer phrases in S(i) using a shallow syntactic tree representation. Importantly, unlike most rankers, their model utilizes key information in individual S(i) phrases which encodes the degree of type-compatibility between Q and S(i). But it fails to robustly align concepts in Q and S(i) due to a simplistic lemma-match policy.

Our joint model factors in both semantic similarity and question-answer type-compatibility features for ranking. Moreover, our semantic similarity features (described in Section 4) are informed by recent advances in the area of short text similarity identification (Agirre et al., 2014; Agirre et al., 2015).

¹The full form of the function: rank_Q(S(i), {S(1), ..., S(N)}).

2.2 Answer Extraction

Given a question Q and a set of candidate answer sentences {S(1), ..., S(N)}, the goal in answer extraction is to extract from the latter a short chunk C of text (a word or a sequence of contiguous words) which is a precise answer to Q. In Table 1, "July 22, 1995" and "1995" in S(1) are two such answers.

Each positive (Q, S(i)) pair in the Wang et al. (2007) dataset is annotated by Yao et al. (2013a) with a gold answer chunk C(i)_g in S(i). Associated with each Q is also a regexp pattern P that specifies one or more gold answer chunks for Q. Being a regexp pattern, P can accommodate variants of a gold answer chunk as well as multiple gold chunks. For instance, the pattern "1995" for the example in Table 1 matches both "July 22, 1995" and "1995". An extraction algorithm extracts an answer chunk C, which is matched against P during evaluation.

Extraction of C is a multistep process. Existing solutions adopt a generic framework, which we outline in Algorithm 1. In each S(i), candidate answer chunks C(i) are first identified and evaluated according to some criteria (steps 1-4). The best chunk C(i)* in S(i) is then identified (step 5). From these "locally best" chunks, groups of equivalent chunks are formed (step 6), where some predefined criteria for chunk equivalence are used (e.g., non-zero word overlap). The quality of each group is computed as an aggregate over the qualities of its member chunks (steps 7-8), and finally a representative chunk from the best group is extracted as C (steps 9-10).

Algorithm 1: Answer Extraction Framework
Input:  1. Q: a question sentence.
        2. {S(1), ..., S(N)}: candidate answer sentences.
Output: C: a short and precise answer to Q.
 1  for i ∈ {1, ..., N} do
 2      C(i) ← candidate chunks in S(i)
 3      for c ∈ C(i) do
 4          φ(c) ← quality of c as an answer to Q
 5      C(i)* ← argmax_{c ∈ C(i)} φ(c)
 6  {G(1)_C, ..., G(M)_C} ← groups of chunks in {C(1), ..., C(N)} s.t. chunks in each G(i)_C are semantically equivalent under some criteria
 7  for g ∈ {G(1)_C, ..., G(M)_C} do
 8      φ(g) ← Σ_{c ∈ g} φ(c)
 9  G(*)_C ← argmax_{g ∈ {G(1)_C, ..., G(M)_C}} φ(g)
10  C ← a member of G(*)_C

There are, however, details that need to be filled in within this generic framework, specifically in steps 2, 4, 6 and 10 of the algorithm. Solutions differ in these specifics. Here we discuss two state-of-the-art systems (Yao et al., 2013a; Severyn and Moschitti, 2013), which are the only systems that have been evaluated on the Wang et al. (2007) regexp patterns.

Yao et al. (2013a) use a conditional random field (CRF) to simultaneously identify chunks (step 2) and compute their φ values (step 4). Their chunking features include the POS, DEP and NER tags of words. Additional features are employed for chunk quality estimation, e.g., the question type and focus, properties of the edit operation associated with the word according to their tree edit model (see Section 2.1), and so on.

Severyn and Moschitti (2013) employ a two-step process. First, they extract all NP chunks for step 2, as other types of chunks rarely contain answers to TREC-style factoid questions. A kernel-based binary classifier is then trained to compute a score for each chunk (step 4). Relational links established between expected answer types and compatible chunk entity types (e.g., HUM ↔ PERSON, DATE ↔ DATE/TIME/NUMBER) provide the information necessary for classification. For step 6, both systems rely on a simple word overlap strategy: chunks with common content words are grouped together. Neither article discusses the specifics of step 10.

We adhere to this generic framework with our own models and features; but importantly, through the use of sentence-level evidence in step 4, our joint model demonstrates a substantial improvement in accuracy.
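
To make the framework concrete, the following is a minimal Python sketch of Algorithm 1. The chunker, chunk scorer, and equivalence test are passed in as placeholder functions standing in for the model-specific choices discussed above; this illustrates the generic framework, not the code of any cited system.

```python
def extract_answer(question, sentences, get_chunks, score_chunk, equivalent):
    """Generic answer extraction (Algorithm 1).

    get_chunks(sent)        -> candidate chunks in a sentence (step 2)
    score_chunk(q, sent, c) -> quality phi(c) of a chunk (step 4)
    equivalent(c1, c2)      -> chunk-equivalence criterion (step 6)
    """
    best = []                                    # locally best chunk per sentence
    for sent in sentences:
        chunks = get_chunks(sent)
        if chunks:
            scored = [(score_chunk(question, sent, c), c) for c in chunks]
            best.append(max(scored, key=lambda sc: sc[0]))        # step 5
    groups = []                                  # step 6: group equivalent chunks
    for score, chunk in best:
        for group in groups:
            if any(equivalent(chunk, c) for _, c in group):
                group.append((score, chunk))
                break
        else:
            groups.append([(score, chunk)])
    if not groups:
        return None
    best_group = max(groups, key=lambda g: sum(s for s, _ in g))  # steps 7-9
    return max(best_group, key=lambda sc: sc[0])[1]               # step 10
```

Section 3.2 below specifies our own instantiations of steps 2, 4, 6 and 10.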

2.3 Coupled Ranking and Extraction

Yao et al. (2013c) present a ranker that utilizes token-level extraction features. The question sentence is augmented with such features to formulate a search query, which is fed as input to a search engine for ranked retrieval from a pool of candidate answer sentences. They experimentally show that downstream extraction from top retrievals in this list is more accurate than if the query is not expanded with the extraction features.

We take a different approach where numeric predictions from separate ranking and extraction modules are combined to jointly perform both tasks (Section 3). Yao et al. build on an existing ranker that supports query expansion and token-level characterization of candidate answer sentences. We assume no such system features, facilitating coupling of arbitrary models including new experimental ones. For extraction, Yao et al. simply rely on better upstream ranking, whereas our joint model provides a precise mathematical formulation of answer chunk quality as a function of both chunk and sentence relevance to the question. We observe a large increase in end-to-end extraction accuracy over the Yao et al. model in our experiments.

3 Approach

We first train separate probabilistic models for answer sentence ranking and answer extraction, for each of which we take an approach similar to that of existing models. Probabilities learned by the two task-specific models are then combined to construct our joint model. This section discusses the details of this two-step process.

3.1 Answer Sentence Ranking

Let the following logistic function represent the probability that a candidate answer sentence S(i) contains an answer to a question Q:

    $P(S^{(i)} \mid Q) = \frac{1}{1 + e^{-\theta_r^\top f_r(Q, S^{(i)})}}$    (1)

where f_r(Q, S(i)) is a set of features each of which is a unique measure of semantic similarity between Q and S(i), and θ_r is the weight vector learned during model training. We describe our feature set for ranking in Section 4.

Given P(S(i) | Q) values for i ∈ {1, ..., N}, ranking is straightforward: rank_Q(S(i)) < rank_Q(S(j)) iff P(S(i) | Q) > P(S(j) | Q). Note that a smaller numeric value represents a higher rank.

3.2 Answer Extraction

We follow the framework in Algorithm 1 for answer extraction. Below we describe our implementation of the generic steps:

1. Step 2: We adopt the strategy of Severyn and Moschitti (2013) of extracting only the NP chunks, for which we use a regexp chunker.

2. Step 4: The quality φ(c) of a candidate chunk c in S(i) is given by the following logistic function:

    $\phi(c) = P(c \mid Q, S^{(i)}) = \frac{1}{1 + e^{-\theta_e^\top f_e(Q, S^{(i)}, c)}}$    (2)

where f_e(Q, S(i), c) is the feature set for chunk c relative to Q, and θ_e is the weight vector learned during model training. Our feature set for extraction is described in Section 5.

3. Step 6: Given an existing set of (possibly empty) chunk groups {G(1)_C, ..., G(M)_C}, a new chunk c is added to group G(i)_C if (1) all content words in c are in at least one member of G(i)_C, or (2) there exists a member of G(i)_C all of whose content words are in c. If no such group is found, a new group G(M+1)_C is created with c as its only member.

4. Step 10: We extract the longest chunk in G(*)_C as the best answer C.

Additionally, we retain only the top t of all the answer candidates extracted in step 5 to prevent propagation of noisy chunks to later steps. The value of t is set using the Wang et al. (2007) DEV set.

3.3 Joint Ranking and Extraction

The primary goal of the joint model is to facilitate the application of both chunk-level and sentence-level features to ranking as well as extraction. To that end, it first computes the joint probability that (1) S(i) contains an answer to Q, and (2) c ∈ C(i) is a correct answer chunk:

    $P(S^{(i)}, c \mid Q) = P(S^{(i)} \mid Q) \times P(c \mid Q, S^{(i)})$    (3)

where the two terms on the right hand side are given by Equations (1) and (2), respectively. Both ranking and extraction are then driven by task-appropriate application of this common quantity.
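
In code, Equations (1)-(3) reduce to two logistic scorers and a product. The sketch below assumes that feature vectors and learned weight vectors are given as NumPy arrays; the actual feature sets are described in Sections 4 and 5.

```python
import numpy as np

def sigmoid(z):
    """Logistic function shared by both task-specific models."""
    return 1.0 / (1.0 + np.exp(-z))

def p_sentence(theta_r, f_r):
    """Equation (1): P(S(i) | Q) from ranking features f_r(Q, S(i))."""
    return sigmoid(theta_r @ f_r)

def p_chunk(theta_e, f_e):
    """Equation (2): phi(c) = P(c | Q, S(i)) from chunk features f_e(Q, S(i), c)."""
    return sigmoid(theta_e @ f_e)

def p_joint(theta_r, f_r, theta_e, f_e):
    """Equation (3): P(S(i), c | Q) = P(S(i) | Q) * P(c | Q, S(i))."""
    return p_sentence(theta_r, f_r) * p_chunk(theta_e, f_e)
```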

Given Equation (3), the condition for ranking is redefined as follows: rank_Q(S(i)) < rank_Q(S(j)) iff max_{c ∈ C(i)} P(S(i), c | Q) > max_{c ∈ C(j)} P(S(j), c | Q). This new condition rewards an S(i) that not only is highly semantically similar to Q, but also contains a chunk c which is a likely answer to Q. For extraction, the joint probability in Equation (3) replaces the conditional in Equation (2) for step 4 of Algorithm 1: φ(c) = P(S(i), c | Q). Again, this new definition of φ(c) rewards a chunk c that is (1) type-compatible with Q, and (2) well-supported by the content of the containing sentence S(i).

Equation (3) assigns equal weight to the ranking and the extraction model. To learn these weights from data, we implement a variation of the joint model that employs a second-level regressor:

    $P(S^{(i)}, c \mid Q) = \frac{1}{1 + e^{-\theta_2^\top f_2(Q, S^{(i)}, c)}}$    (4)

where the feature vector f_2 consists of the two probabilities in Equations (1) and (2), and θ_2 is the weight vector. While P(S(i), c | Q) is computed using a different formula in this model, the methods for ranking and extraction based on it remain the same as above.

From here on, we will refer to the models in Sections 3.1 and 3.2 as our standalone ranking and extraction models, respectively, and the models in this section as the joint probabilistic model (Equation (3)) and the stacked (regression) model (Equation (4)).

3.4 Learning

The standalone ranking model is trained using the 0/1 labels assigned to (Q, S(i)) pairs in the Wang et al. (2007) dataset. For standalone extraction, we use for training the gold chunk annotations C(i)_g associated with (Q, S(i)) pairs: a candidate NP chunk in S(i) is considered a positive example for (Q, S(i)) iff it contains C(i)_g and S(i) is an actual answer sentence. For both ranking and extraction, the corresponding weight vector θ is learned by minimizing the following L2-regularized loss function:

    $J(\theta) = -\frac{1}{T} \sum_{i=1}^{T} \left[ y^{(i)} \log P^{(i)} + (1 - y^{(i)}) \log(1 - P^{(i)}) \right] + \lambda \|\theta\|^2$

where T is the number of training examples, y(i) is the gold label for example i and P(i) is the model-predicted probability of example i being a positive example (given by Equations (1) and (2)). Learning of θ_2 for the stacked model works in a similar fashion, where level 1 predictions for training QA pairs (according to Equations (1) and (2)) serve as feature vectors.
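
Since this loss is the standard objective of L2-regularized logistic regression, training can be sketched with scikit-learn, the implementation used in our experiments (Section 6). The construction of the training matrices is elided and assumed given; the level-2 step follows the stacking recipe just described.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_models(Xr, yr, Xe, ye, Xr_pairs, Xe_pairs, y_joint, C=1.0):
    """Level 1: standalone ranking and extraction models (Equations (1)-(2)).
    Level 2: stacked regressor over their probabilities (Equation (4)).
    C is scikit-learn's inverse regularization strength, tuned on DEV."""
    ranker = LogisticRegression(C=C).fit(Xr, yr)      # (Q, S(i)) similarity features
    extractor = LogisticRegression(C=C).fit(Xe, ye)   # (Q, S(i), c) chunk features
    # Level-1 probabilities on training (Q, S(i), c) triples become level-2 features.
    f2 = np.column_stack([
        ranker.predict_proba(Xr_pairs)[:, 1],         # P(S(i) | Q)
        extractor.predict_proba(Xe_pairs)[:, 1],      # P(c | Q, S(i))
    ])
    stacked = LogisticRegression(C=C).fit(f2, y_joint)
    return ranker, extractor, stacked
```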

4 Answer Sentence Ranking Features

Instead of reinventing similarity features for our QA ranker, we derive our feature set from the winning system (Sultan et al., 2015) at the SemEval 2015 Semantic Textual Similarity (STS) task (Agirre et al., 2015). STS is an annually held SemEval competition, where systems output real-valued similarity scores for input sentence pairs. Hundreds of systems have been evaluated over the past few years (Agirre et al., 2012; Agirre et al., 2013; Agirre et al., 2014; Agirre et al., 2015); our chosen system was shown to outperform all other systems from all years of SemEval STS (Sultan et al., 2015).

In order to compute the degree of semantic similarity between a question Q and a candidate answer sentence S(i), we draw features from two sources: (1) lexical alignment between Q and S(i), and (2) vector representations of Q and S(i), derived from their word embeddings. While the original STS system employs ridge regression, we use these features within a logistic regression model for QA ranking.

4.1 Alignment Features

We align related words in Q and S(i) using a monolingual aligner originally proposed by Sultan et al. (2014). Here we give a brief description of our implementation, which employs arguably more principled methods to solve a set of subproblems. See the original article for further details.

The aligner computes for each word pair across Q and S(i) a semantic similarity score sim_W ∈ [0, 1] using PPDB, a large database of lexical paraphrases developed using bilingual pivoting (Ganitkevitch et al., 2013). Specifically, it allows three different levels of similarity: 1 if the two words or their lemmas are identical, a value ppdbSim ∈ (0, 1) if the word pair is present in PPDB (the XXXL database)², and 0 otherwise. It also computes the degree of similarity sim_C between the two words' contexts in their respective sentences. This similarity is computed as the sum of word similarities in two different types of contexts: (1) a dependency neighborhood of size 2 (i.e., parents, grandparents, children and grandchildren), and (2) a surface-form neighborhood of size 3 (i.e., 3 words to the left and 3 words to the right). Stop words are skipped during neighbor selection. Unlike the Sultan et al. (2014) aligner, which allows a single neighbor word to be matched to multiple similar words in the other sentence, we match neighbors using a max-weighted bipartite matching algorithm, where word similarities serve as edge weights.

Every word pair across Q and S(i) receives a final weight given by w · sim_W + (1 − w) · sim_C, where w ∈ [0, 1]. While Sultan et al. use a greedy best-first algorithm to align words based on these weights, we use them as edge weights in a max-weighted bipartite matching of word pairs (details follow).

We adopt the strategy of the original aligner of starting with high-precision alignments and increasing the recall in later steps. To this end, we align in the following order: (1) identical word sequences with at least one content word, (2) named entities, (3) content words, and (4) stop words. Following the original aligner, no additional context matching is performed in step 1 since a sequence itself provides contextual evidence for its tokens. For each of steps 2-4, words/entities of the corresponding type are aligned using max-weighted bipartite matching as described above (multiword named entities are considered single units in step 2); other word types and already aligned words are discarded. The values of w and ppdbSim are derived using a grid search on an alignment dataset (Brockett, 2007).

Given aligned words in the QA pair, our first feature computes the proportion of aligned content words in Q and S(i), combined:

    $sim_A(Q, S^{(i)}) = \frac{n_{ac}(Q) + n_{ac}(S^{(i)})}{n_c(Q) + n_c(S^{(i)})}$

where n_ac(·) and n_c(·) represent the number of aligned content words and the total number of content words in a sentence, respectively.

S(i) can be arbitrarily long and still contain an answer to Q. In the above similarity measure, longer answer sentences are penalized due to a larger number of unaligned words. To counter this phenomenon, we add a measure of coverage of Q by S(i) to the original feature set of Sultan et al. (2015):

    $cov_A(Q, S^{(i)}) = \frac{n_{ac}(Q)}{n_c(Q)}$

4.2 A Semantic Vector Feature

Neural word embeddings (Mikolov et al., 2013; Baroni et al., 2014; Pennington et al., 2014) have been highly successful as distributional word representations in the recent past. We utilize the 400-dimensional word embeddings developed by Baroni et al. (2014)³ to construct sentence-level embeddings for Q and S(i), which we then compare to compute a similarity score. To construct the vector representation V_S of a given sentence S, we first extract the content word lemmas C_S = {C_S(1), ..., C_S(M)} in S. The vectors representing these lemmas are then added to generate the sentence vector:

    $V_S = \sum_{i=1}^{M} V_{C_S^{(i)}}$

Finally, a similarity measure for Q and S(i) is derived by taking the cosine similarity between their vector representations:

    $sim_E(Q, S^{(i)}) = \frac{V_Q \cdot V_{S^{(i)}}}{|V_Q|\,|V_{S^{(i)}}|}$

This simple bag-of-words model was found to augment the alignment-based feature well in the evaluations reported by Sultan et al. (2015).

sim_A, cov_A and sim_E constitute our final feature set. As we show in Section 6, this small feature set outperforms the current state of the art in answer sentence ranking.

²http://www.cis.upenn.edu/~ccb/ppdb/
³http://clic.cimec.unitn.it/composes/semantic-vectors.html
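
The three features can be sketched as follows, with the aligner abstracted to precomputed sets of aligned content words and the embeddings to a plain dict of NumPy vectors; this illustrates the formulas above rather than reproducing the full alignment pipeline.

```python
import numpy as np

def sim_a(q_content, s_content, q_aligned, s_aligned):
    """Proportion of aligned content words in Q and S(i), combined."""
    return (len(q_aligned) + len(s_aligned)) / (len(q_content) + len(s_content))

def cov_a(q_content, q_aligned):
    """Coverage of Q by S(i): proportion of Q's content words that are aligned."""
    return len(q_aligned) / len(q_content)

def sim_e(q_lemmas, s_lemmas, emb):
    """Cosine similarity of additive sentence vectors over content word lemmas
    (assumes every lemma has an embedding in emb)."""
    vq = np.sum([emb[w] for w in q_lemmas], axis=0)
    vs = np.sum([emb[w] for w in s_lemmas], axis=0)
    return float(vq @ vs / (np.linalg.norm(vq) * np.linalg.norm(vs)))
```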

5 Answer Extraction Features

As mentioned in Section 3.2, we consider only NP chunks as answer candidates for extraction. Our chunk features can be categorized into two broad groups, which we describe in this section. For the following discussion, let (Q, S(i), c) be our question, answer sentence, answer chunk triple.

5.1 Question-Independent Features

These features represent properties of c independent of the nature of Q. For example, our first two features fire if all content words in c are present in Q or align to words in Q. Such chunks rarely contain an answer, regardless of the type of Q.

Yao et al. (2013a) report an observation that answer chunks often appear close to aligned content words of specific types in S(i). To model this phenomenon, we adopt their features specifying the distance of c from the nearest aligned content word w_a in S(i) and the POS/DEP/NER tags of w_a. In addition, to encode the total amount of local evidence present for c, we employ the proportions of aligned content words in its dependency (size = 2) and surface (size = 3) contexts in S(i).

5.2 Features Containing the Question Type

These features are of the form "question-type|x", where x can be an elementary (i.e., unit) or composite feature. The rationale is that certain features are informative primarily in the context of certain question types (e.g., a likely answer to a when question is a chunk containing the NER tag DATE).

Headword Features. We extract the headword of c and use its POS/DEP/NER tags as features (appended to the question type). A headword in the subject position of S(i) or with PERSON as its NER tag, for example, is a likely answer to a who question.

Question Focus. The question focus word represents the entity about which the question is being asked. For example, in "What is the largest country in the world?", the focus word is "country". For question types like what and which, properties of the question focus largely determine the nature of the answer. In the above example, the focus word indicates that GPE is a likely NER tag for the answer.

We extract the question focus using a rule-based system originally designed for a different application, under the assumption that a question could span multiple sentences. The rule-based system is loosely inspired by the work of Lally et al. (2012), from which it differs radically because the questions in the Jeopardy! game are expressed as answers. The focus extractor first determines the question word or words, which is then used in conjunction with the parse tree to decide whether the question word itself or some other word in the sentence is the actual focus.

We pair the headword POS/DEP/NER tags with the focus word and its POS/NER tags, and add each such pair (appended to the question type) to our feature set. There are nine features here; examples include question-type|question-focus-word|headword-pos-tag and question-type|question-focus-ner-tag|headword-ner-tag.

We also employ the true/false labels of the following propositions as features (in conjunction with the question type): (1) the question focus word is in c, (2) the question focus POS tag is in the POS tags of c, and (3) the question focus NER tag is of the form x or xDESC, and x is in the NER tags of c, for some x (e.g., GPE).

Chunk Tags. In many cases, it is not the headword of c which is the answer; for example, in Q: "How many states are there in the US?" and c: "50 states", the headword of c is "states". To extend our unit of attention from the headword to the entire chunk, we first construct vocabularies of POS and NER tags, V_pos and V_ner, from training data. For each possible tag in V_pos, we then use the presence/absence of that tag in the POS tag sequence for c as a feature (in conjunction with the question type). We repeat the process for V_ner. For the above c, for instance, an informative feature which is likely to fire is: "question-type=how-many | the NER tags of c include CARDINAL".

Partial Alignment. For some question types, part of a correct answer chunk is often aligned to a question word (e.g., Q: "How many players are on the field during a soccer game?", c: "22 players"). To inform our model of such occurrences, we employ two features, the true/false labels of the following propositions: (1) c is partially aligned, (2) c is not aligned at all (each in conjunction with the question type).
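
As an illustration of these "question-type|x" templates, the sketch below emits sparse binary features as strings; the feature names and the dict-based inputs are illustrative assumptions, not the paper's literal feature strings.

```python
def question_type_features(q_type, focus, c, pos_vocab, ner_vocab):
    """focus: dict with the question focus word and its POS/NER tags;
    c: dict with the chunk's words and its headword/chunk POS/DEP/NER tags."""
    feats = set()
    # Headword features: POS/DEP/NER tags of the chunk's headword.
    for tag in (c["head_pos"], c["head_dep"], c["head_ner"]):
        feats.add(f"{q_type}|head:{tag}")
    # Question-focus pairings, e.g. focus NER tag with headword NER tag.
    feats.add(f"{q_type}|focus-ner:{focus['ner']}|head-ner:{c['head_ner']}")
    feats.add(f"{q_type}|focus-word-in-chunk:{focus['word'] in c['words']}")
    # Chunk-tag features: presence/absence of each vocabulary tag in the chunk.
    for tag in pos_vocab:
        feats.add(f"{q_type}|chunk-pos-{tag}:{tag in c['pos']}")
    for tag in ner_vocab:
        feats.add(f"{q_type}|chunk-ner-{tag}:{tag in c['ner']}")
    return feats
```

For the "50 states" example above, a feature such as how-many|chunk-ner-CARDINAL:True would then fire.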

6 Experiments

6.1 Data

The Wang et al. (2007) corpus is created from Text REtrieval Conference (TREC) 8-13 QA data. It consists of a set of factoid questions, and for each question, a set of candidate answer sentences. Each answer candidate is automatically drawn from a larger document based on two selection criteria: (1) a non-zero content word overlap with the question, or (2) a match with the gold regexp answer pattern for the question (training only). TRAIN pairs are drawn from TREC 8-12; DEV and TEST pairs are drawn from TREC 13. Details of the TRAIN/DEV/TEST split are given in Table 2.

Dataset     #Questions  #QA Pairs  %Positive
TRAIN-ALL   1,229       53,417     12.0
TRAIN       94          4,718      7.4
DEV         82          1,148      19.3
TEST        100         1,517      18.7

Table 2: Summary of the Wang et al. (2007) corpus.

TRAIN-ALL is a large set of automatically judged (thus noisy) QA pairs: a sentence is considered a positive example if it matches the gold answer pattern for the corresponding question. TRAIN is a much smaller subset of TRAIN-ALL, containing pairs that are manually corrected for errors. Manual judgment is produced for DEV and TEST pairs, too.

For answer extraction, Yao et al. (2013a) add to each QA pair the correct answer chunk(s). The gold TREC patterns are used to first identify relevant chunks in each answer sentence. TRAIN, DEV and TEST are then manually corrected for errors.

The Wang et al. (2007) dataset also comes with POS/DEP/NER tags for each sentence. They use the MXPOST tagger (Ratnaparkhi, 1996) for POS tagging, the MSTParser (McDonald et al., 2005) to generate typed dependency trees, and the BBN Identifinder (Bikel et al., 1999) for NER tagging. Although we have access to a state-of-the-art information pipeline that produces better tags, this paper aims to study the effect of the proposed models and of our features on system performance, rather than on additional variables; therefore, to support comparison with prior work, we rely on the tags provided with the dataset for all our experiments.

6.2 Answer Sentence Ranking

We adopt the standard evaluation procedure and metrics for QA rankers reported in the literature.

6.2.1 Evaluation Metrics

Our metrics for ranking are Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR). Here we define both in terms of simpler metrics.

Precision at K. Given a question Q and a set of candidate answer sentences {S(1), ..., S(N)}, let the output of a ranker be [R(1), ..., R(N)], so that each R(i) ∈ {S(1), ..., S(N)} and the predicted rank of R(i) is higher than the predicted rank of R(j) whenever i < j. Precision at K (P@K) is then the proportion of the top K sentences R(1), ..., R(K) that contain an answer to Q.

Mean Average Precision. The average precision (AP) for Q is the average of the P@K values computed at the ranks of the answer-bearing sentences in the ranker's output. MAP is the mean AP over all test questions.

Mean Reciprocal Rank. The reciprocal rank (RR) for Q is 1/K, where K is the predicted rank of the first answer-bearing sentence in the output. MRR is the mean RR over all test questions.
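
For reference, these metrics can be computed as follows from per-question 0/1 relevance lists ordered by predicted rank; this sketch mirrors the standard definitions implemented by the trec_eval script used below.

```python
def average_precision(rels):
    """rels: 0/1 relevance labels of a question's sentences, best rank first."""
    hits, ap_sum = 0, 0.0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            ap_sum += hits / k            # precision at k at each relevant rank
    return ap_sum / hits if hits else 0.0

def mean_average_precision(rel_lists):
    return sum(average_precision(r) for r in rel_lists) / len(rel_lists)

def mean_reciprocal_rank(rel_lists):
    total = 0.0
    for rels in rel_lists:
        rank = next((k for k, rel in enumerate(rels, start=1) if rel), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(rel_lists)
```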

6.2.2 Setup

Questions with only correct or only incorrect candidate answer sentences are excluded from evaluation; most questions (>95%) are retained after this exclusion. We use the logistic regression implementation of Scikit-learn (Pedregosa et al., 2011) and use the Wang et al. (2007) DEV set to set C, the regularization strength parameter. The standard trec_eval script is used to generate all results.

6.2.3 Results

Table 3 shows performances of our ranking models and recent baseline systems on TEST. Our QA similarity features (i.e., the standalone ranker) outperform all baselines with both TRAIN and TRAIN-ALL, although the additional noisy examples in the latter are not found to improve results. More importantly, we get improvements of substantially larger magnitudes using our joint models: more than 10 MAP and MRR points over the state-of-the-art system of Severyn and Moschitti (2015) with TRAIN-ALL for the joint probabilistic model.

Model                           MAP%   MRR%
TRAIN
  Shnarch (2013)                68.60  75.40
  Yih et al. (2013)             70.92  77.00
  Yu et al. (2014)              70.58  78.00
  Severyn & Moschitti (2015)    73.29  79.62
  Our Standalone Model          76.05  83.99
  Our Joint Probabilistic Model 81.59  89.09
  Our Stacked Model             80.77  86.85
TRAIN-ALL
  Yu et al. (2014)              71.13  78.46
  Severyn & Moschitti (2015)    74.59  80.78
  Our Standalone Model          75.68  83.09
  Our Joint Probabilistic Model 84.95  91.95
  Our Stacked Model             82.56  90.69

Table 3: Answer sentence ranking results.

Unlike the standalone model, the joint models also benefit from the additional noisy examples in TRAIN-ALL. These results support the central argument of this paper that joint modeling is a better approach to answer sentence ranking.

6.3 Answer Extraction

We follow the procedure reported in prior work (Yao et al., 2013a; Severyn and Moschitti, 2013) to evaluate the answer chunks extracted by the system.

6.3.1 Evaluation Metrics

Precision. Given a set of questions, the precision of an answer extraction system is the proportion of its extracted answers that are correct (i.e., match the corresponding gold regexp pattern).

Recall. Recall is the proportion of questions for which the system extracted a correct answer.

F1 Score. The F1 score is the harmonic mean of precision and recall. It captures the system's accuracy and coverage in a single metric.
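
These are end-to-end metrics over questions rather than per-pair classification scores. A small sketch, assuming one extracted answer string (or None) per question and a gold regexp pattern per question:

```python
import re

def extraction_prf(answers, patterns):
    """answers: extracted answer per question (None if none); patterns: gold regexps."""
    attempted = [(a, p) for a, p in zip(answers, patterns) if a is not None]
    correct = sum(1 for a, p in attempted if re.search(p, a))
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(patterns)              # denominator: all questions
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```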

6.3.2 Setup

Following prior work, we (1) retain the 89 questions in the Wang et al. (2007) TEST set that have at least one correct answer, and (2) train only with chunks in correct answer sentences to avoid extreme bias towards false labels (both the standalone extraction model and stage 2 of the stacked model). As in ranking, we use Scikit-learn for logistic regression and set the regularization parameter C using DEV.

6.3.3 Results

Table 4 shows performances of our extraction models on the Wang et al. TEST set. The joint probabilistic model demonstrates top performance for both TRAIN and TRAIN-ALL. With TRAIN-ALL, it correctly answers 68 of the 89 test questions (5 more than the previous best model of Severyn and Moschitti (2013)). The stacked model also performs well with the larger training set. Again, these results support the central claim of the paper that answer extraction can be made better through joint modeling.

Model                           P%    R%    F1%
TRAIN
  Yao et al. (2013a)            55.2  53.9  54.5
  Severyn & Moschitti (2013)    66.2  66.2  66.2
  Our Standalone Model          62.9  62.9  62.9
  Our Joint Probabilistic Model 69.7  69.7  69.7
  Our Stacked Model             62.9  62.9  62.9
TRAIN-ALL
  Yao et al. (2013a)            63.6  62.9  63.3
  Severyn & Moschitti (2013)    70.8  70.8  70.8
  Our Standalone Model          70.8  70.8  70.8
  Our Joint Probabilistic Model 76.4  76.4  76.4
  Our Stacked Model             73.0  73.0  73.0

Table 4: Answer extraction results on the Wang et al. (2007) test set.

Table 5 shows performances of our standalone and joint probabilistic models (trained on TRAIN-ALL) on different TEST question types. The joint model is the better of the two across types, achieving good results on all question types except what.

Question Type  Count  ST     JP
what           37     51.4   56.8
when           19     100.0  100.0
where          11     100.0  90.9
who/whom       10     60.0   70.0
why            1      0.0    0.0
how many       9      77.8   100.0
how long       2      50.0   100.0

Table 5: F1% of the STandalone and the Joint Probabilistic extraction model across question types.

A particularly challenging subtype of what questions are what be questions, answers to which often go beyond NP chunk boundaries. A human-extracted answer to the question "What is Muslim Brotherhood's goal?" in the Wang et al. (2007) corpus, for example, is "advocates turning Egypt into a strict Muslim state by political means." What in general is nevertheless the most difficult question type, since unlike questions like who or when, answers do not have strict categories (e.g., a fixed set of NER tags).

6.3.4 Qualitative Analysis

We closely examine QA pairs for which the joint probabilistic model extracts a correct answer chunk but the standalone model does not. Table 6 shows two such questions, with two candidate answer sentences for each. Candidate answer chunks are marked with asterisks.

Q: How many years was Jack Welch with GE?
  (1) "Six Sigma has galvanized our company with an intensity the likes of which I have never seen in my *40 years* at GE," said John Welch, chairman of General Electric.  (ST: .517, JP: .113)
  (2) So fervent a proselytizer is Welch that GE has spent *three years* and more than $1 billion to convert all of its divisions to the Six Sigma faith.  (ST: .714, JP: .090)
Q: What kind of ship is the Liberty Bell 7?
  (3) Newport plans to retrieve the recovery vessel first, then go after Liberty Bell 7, the only U.S. *manned spacecraft* lost after a successful mission.  (ST: .838, JP: .278)
  (4) "It will be a big relief" once the capsule is aboard ship, *Curt Newport* said before setting sail Thursday.  (ST: .388, JP: .003)

Table 6: Scores computed by the STandalone (ST) and the Joint Probabilistic (JP) model for candidate chunks (marked) in four Wang et al. (2007) test sentences. Joint model scores for non-answer chunks (rows 2 and 4) are much lower.

For the first question, only the sentence in row 1 contains an answer. The standalone model assigns a higher score to the non-answer chunk in row 2, but the use of sentence-level features enables the joint model to identify the more relevant chunk in row 1. Note that the joint model score, being a product of two probabilities, is always lower than the standalone model score. However, only the relative score matters in this case, as the chunk with the highest overall score is eventually selected for extraction.

For the second question, both models compute a lower score for the non-answer chunk "Curt Newport" than the answer chunk "manned spacecraft". However, the incorrect chunk appears in several candidate answer sentences (not shown here), resulting in a high overall score for the standalone model (Algorithm 1: steps 7 and 8). The joint model assigns a much lower score to each instance of this chunk due to weak sentence-level evidence, eventually resulting in the extraction of the correct chunk.

6.3.5 A Second Extraction Dataset

Yao et al. (2013c) report an extraction dataset containing 99 test questions, derived from the MIT109 test collection (Lin and Katz, 2006) of TREC pairs. Each question in this dataset has 10 candidate answer sentences. We compare the performance of our joint probabilistic model with that of their extraction model, which extracts answers from top candidate sentences identified by their coupled ranker (Section 2.3).⁴ Models are trained on their training set of 2,205 questions and 22,043 candidate QA pairs.

Model                           P%    R%    F1%
Yao et al. (2013c)              35.4  17.2  23.1
Our Joint Probabilistic Model   83.8  83.8  83.8

Table 7: Performances of two joint extraction models on the Yao et al. (2013c) test set.

As shown in Table 7, our model outperforms the Yao et al. model by a surprisingly large margin, correctly answering 83 of the 99 test questions.

Interestingly, our standalone model extracts six more correct answers in this dataset than the joint model.

⁴We compare with only their extraction model, as the larger ranking dataset is not available anymore. Precision and recall are reported at http://cs.jhu.edu/~xuchen/packages/jacana-ir-acl2013-data-results.tar.bz2.

A close examination reveals that in all six cases, this is caused by the presence of correct answer chunks in non-answer sentences. Table 8 shows an example, where the correct answer chunk "Steve Sloan" appears in all four candidate sentences, of which only the first is actually relevant to the question. The standalone model assigns high scores to all four instances and as a result observes a high overall score for the chunk. The joint model, on the other hand, recognizes the false positives, and consequently observes a smaller overall score for the chunk. However, this desired behavior eventually results in a wrong extraction. These results have key implications for the evaluation of answer extraction systems: metrics that assess performance on individual QA pairs can enable finer-grained evaluation than what end-to-end extraction metrics offer.

Candidate Answer Sentence
  (1) Another perk is getting to work with his son, Barry Van Dyke, who has a regular role as Detective *Steve Sloan* on "Diagnosis".  (ST: .861, JP: .338)
  (2) This is only the third time in school history the Raiders have begun a season 6-0 and the first since 1976, when *Steve Sloan*, in his second season as coach, led them to an 8-0 start and 10-2 overall record.  (ST: .494, JP: .010)
  (3) He also represented several Alabama coaches, including Ray Perkins, Bill Curry, *Steve Sloan* and Wimp Sanderson.  (ST: .334, JP: .007)
  (4) Bart Starr, Joe Namath, Ken Stabler, *Steve Sloan*, Scott Hunter and Walter Lewis are but a few of the legends on the wall of the Crimson Tide quarterbacks coach.  (ST: .334, JP: .009)

Table 8: Scores computed by the STandalone (ST) and the Joint Probabilistic (JP) model for NP chunks (marked) in four Yao et al. (2013c) test sentences for the question: Who is the detective on 'Diagnosis Murder'? The standalone model assigns high probabilities to non-answer chunks in the last three sentences, subsequently corrected by the joint model.

7 Discussion

Our two-step approach to joint modeling, consisting of constructing separate models for ranking and extraction first and then coupling their predictions, offers at least two advantages. First, predictions from any given pair of ranking and extraction systems can be combined, since such systems must compute a score for a QA pair or an answer chunk in order to differentiate among candidates. Coupling of the ranking and extraction systems of Yao et al. (2013a) and Severyn and Moschitti (2013), for example, is straightforward within our framework. Second, this approach supports the use of task-appropriate training data for ranking and extraction, which can provide a key advantage. For example, while answer sentence ranking systems use both correct and incorrect candidate answer sentences for model training, existing answer extraction systems discard the latter in order to maintain a (relatively) balanced class distribution (Yao et al., 2013a; Severyn and Moschitti, 2013). Through the separation of the ranking and extraction models during training, our approach naturally supports such task-specific sampling of training data.

A potentially limiting factor in our extraction model is the assumption that answers are always expressed neatly in NP chunks. While models that make no such assumption exist (e.g., the CRF model of Yao et al. (2013a)), extraction of long answers (such as the one discussed in Section 6.3.3) is still difficult in practice due to their unconstrained nature.

8 Conclusions and Future Work

We present a joint model for the important QA tasks of answer sentence ranking and answer extraction. By exploiting the interconnected nature of the two tasks, our model demonstrates substantial performance improvements over previous best systems for both. Additionally, our ranking model applies recent advances in the computation of short text similarity to QA, providing stronger similarity features.

An obvious direction for future work is the inclusion of new features for each task. Answer sentence ranking, for example, can benefit from phrasal alignment and long-distance context representation. Answer extraction for what questions can be made better using a lexical answer type feature, or world knowledge (such as "blue is a color") derived from semantic networks like WordNet.

Our model also facilitates straightforward integration of features/predictions from other existing systems for both tasks, for example, the convolutional neural sentence model of Severyn and Moschitti (2015) for ranking. Finally, more sophisticated techniques are required for extraction of the final answer chunk based on individual chunk scores across QA pairs.

Acknowledgments

We thank the reviewers for their valuable comments and suggestions. We also thank Xuchen Yao and Aliaksei Severyn for clarification of their work.

References

Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. In Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 385–393, Montreal, Canada.

Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. *SEM 2013 Shared Task: Semantic Textual Similarity. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics, pages 32–43, Atlanta, Georgia, USA.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2014. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation, pages 81–91, Dublin, Ireland.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, and Janyce Wiebe. 2015. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. In Proceedings of the 9th International Workshop on Semantic Evaluation, pages 252–263, Denver, Colorado, USA.

Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't Count, Predict! A Systematic Comparison of Context-Counting vs. Context-Predicting Semantic Vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 238–247, Baltimore, Maryland, USA.

Daniel M. Bikel, Richard Schwartz, and Ralph M. Weischedel. 1999. An Algorithm that Learns What's in a Name. Machine Learning, 34(1-3):211–231.

Chris Brockett. 2007. Aligning the RTE 2006 Corpus. Technical Report MSR-TR-2007-77, Microsoft Research.

David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine, 31(3):59–79.

Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics, pages 758–764, Atlanta, Georgia, USA.

Michael Heilman and Noah A. Smith. 2010. Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions. In Proceedings of the 2010 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1011–1019, Los Angeles, California, USA.

Adam Lally, John M. Prager, Michael C. McCord, Branimir K. Boguraev, Siddharth Patwardhan, James Fan, Paul Fodor, and Jennifer Chu-Carroll. 2012. Question Analysis: How Watson Reads a Clue. IBM Journal of Research and Development, 56(3.4):2:1–2:14.

Jimmy Lin and Boris Katz. 2006. Building a Reusable Test Collection for Question Answering. Journal of the American Society for Information Science and Technology, 57(7):851–861.

Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online Large-Margin Training of Dependency Parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, Michigan, USA.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations Workshop, Scottsdale, Arizona, USA.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, Doha, Qatar.

Adwait Ratnaparkhi. 1996. A Maximum Entropy Model for Part-of-Speech Tagging. In Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing, pages 133–142, Philadelphia, Pennsylvania, USA.

Aliaksei Severyn and Alessandro Moschitti. 2013. Automatic Feature Engineering for Answer Selection and Extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 458–467, Seattle, Washington, USA.

Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 373–382, Santiago, Chile.

Eyal Shnarch. 2013. Probabilistic Models for Lexical Inference. PhD Thesis, Bar Ilan University.

Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence. Transactions of the Association for Computational Linguistics, 2:219–230.

Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2015. DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition. In Proceedings of the 9th International Workshop on Semantic Evaluation, pages 148–153, Denver, Colorado, USA.

Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 22–32, Prague, Czech Republic.

Mengqiu Wang and Christopher D. Manning. 2010. Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1164–1172, Beijing, China.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013a. Answer Extraction as Sequence Tagging with Tree Edit Distance. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 858–867, Atlanta, Georgia, USA.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013b. Semi-Markov Phrase-Based Monolingual Alignment. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 590–600, Seattle, Washington, USA.

Xuchen Yao, Benjamin Van Durme, and Peter Clark. 2013c. Automatic Coupling of Answer Extraction and Information Retrieval. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 159–165, Sofia, Bulgaria.

Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak. 2013. Question Answering Using Enhanced Lexical Semantic Models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1744–1753, Sofia, Bulgaria.

Lei Yu, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. 2014. Deep Learning for Answer Sentence Selection. In Proceedings of the Deep Learning and Representation Learning Workshop, NIPS 2014, Montréal, Canada.
