Transactions of the Association for Computational Linguistics, vol. 4, pp. 113–125, 2016. Action Editor: Noah Smith.
Submission batch: 10/2015; Revision batch: 2/2016; Published 4/2016.
© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

A Joint Model for Answer Sentence Ranking and Answer Extraction

Md Arafat Sultan†, Vittorio Castelli‡, Radu Florian‡
†Institute of Cognitive Science and Department of Computer Science, University of Colorado, Boulder, CO
‡IBM T. J. Watson Research Center, Yorktown Heights, NY
arafat.sultan@colorado.edu, vittorio@us.ibm.com, raduf@us.ibm.com

Abstract

Answer sentence ranking and answer extraction are two key challenges in question answering that have traditionally been treated in isolation, i.e., as independent tasks. In this article, we (1) explain how both tasks are related at their core by a common quantity, and (2) propose a simple and intuitive joint probabilistic model that addresses both via joint computation but task-specific application of that quantity. In our experiments with two TREC datasets, our joint model substantially outperforms state-of-the-art systems in both tasks.

1 Introduction

One of the original goals of AI was to build machines that can naturally interact with humans. Over time, the challenges became apparent and language processing emerged as one of AI's most puzzling areas. Nevertheless, major breakthroughs have still been made in several important tasks; with IBM's Watson (Ferrucci et al., 2010) significantly outperforming human champions in the quiz contest Jeopardy!, question answering (QA) is definitely one such task.

QA comes in various forms, each supporting specific kinds of user requirements. Consider a scenario where a system is given a question and a set of sentences, each of which may or may not contain an answer to that question. The goal of answer extraction is to extract a precise answer in the form of a short span of text in one or more of those sentences. In this form, QA meets users' immediate information needs. Answer sentence ranking, on the other hand, is the task of assigning a rank to each sentence so that the ones that are more likely to contain an answer are ranked higher. In this form, QA is similar to information retrieval and presents greater opportunities for further exploration and learning. In this article, we propose a novel approach to jointly solving these two well-studied yet open QA problems.

Most answer sentence ranking algorithms operate under the assumption that the degree of syntactic and/or semantic similarity between questions and answer sentences is a sufficiently strong predictor of answer sentence relevance (Wang et al., 2007; Yih et al., 2013; Yu et al., 2014; Severyn and Moschitti, 2015). On the other hand, answer extraction algorithms frequently assess candidate answer phrases based primarily on their own properties relative to the question (e.g., whether the question is a who question and the phrase refers to a person), making inadequate or no use of sentence-level evidence (Yao et al., 2013a; Severyn and Moschitti, 2013).

Both these assumptions, however, are simplistic, and fail to capture the core requirements of the two tasks. Table 1 shows a question and three candidate answer sentences, only one of which (S(1)) actually answers the question. Ranking models that rely solely on text similarity are highly likely to incorrectly assign similar ranks to S(1) and S(2). Such models would fail to utilize the key piece of evidence against S(2): that it does not contain any temporal information, necessary to answer a when question. Similarly, an extraction model that relies only on the features of a candidate phrase might extract the temporal expression "the year 1666" in S(3) as an answer despite a clear lack of sentence-level evidence.

Q     When was the Hale Bopp comet discovered?
S(1)  The comet was first spotted by Hale and Bopp, both US astronomers, on July 22, 1995.
S(2)  Hale-Bopp, a large comet, was observed for the first time in China.
S(3)  The law of gravity was discovered in the year 1666 by Sir Isaac Newton.

Table 1: A question and three candidate answer sentences.

In view of the above, we propose a joint model for answer sentence ranking and answer extraction that utilizes both sentence and phrase-level evidence to solve each task. More concretely, we (1) design task-specific probabilistic models for ranking and extraction, exploiting features of candidate answer sentences and their phrases, respectively, and (2) combine the two models in a simple, intuitive step to build a joint probabilistic model for both tasks. This two-step approach facilitates construction of new joint models from any existing solutions to the two tasks. On a publicly available TREC dataset (Wang et al., 2007), our joint model demonstrates an improvement in ranking of over 10 MAP and MRR points over the current state of the art. It also outperforms state-of-the-art extraction systems on two TREC datasets (Wang et al., 2007; Yao et al., 2013c).

2 Background

In this section, we provide a formal description of the two tasks and establish terminology that we follow in later sections. The Wang et al. (2007) dataset has been the benchmark for most recent work on the two tasks as well as our own. Therefore, we situate our description in the specific context of this dataset. We also discuss related prior work.

2.1 Answer Sentence Ranking

Given a question Q and a set of candidate answer sentences {S(1), ..., S(N)}, the goal in answer sentence ranking is to assign each S(i) an integer rank rank_Q(S(i)) so that for any pair (i, j), rank_Q(S(i)) < rank_Q(S(j)) holds only if label(S(i)) ≥ label(S(j)), where label is binary (1: S(i) contains an answer to Q, 0: it does not). (We write rank_Q(S(i)) for brevity; the full form of the function is rank_Q(S(i), {S(1), ..., S(N)}).) A supervised ranking model must learn to rank test answer sentences from such binary annotations in the training data. Existing models accomplish this by learning to assign a relevance score to each (Q, S(i)) pair; these scores then can be used to rank the sentences.
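Concretely, once relevance scores are available, ranking is a simple sort. The following is a minimal Python sketch of that step (the function name and score representation are ours); rank 1 is the highest rank:

def ranks_from_scores(scores):
    # scores[i] is the predicted relevance of sentence S(i);
    # returns integer ranks, where rank 1 goes to the highest score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks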
QA rankers predominantly operate under the hypothesis that this relevance score is a function of the syntactic and/or semantic similarities between Q and S(i). Wang et al. (2007), for example, learn the probability of generating Q from S(i) using syntactic transformations under a quasi-synchronous grammar formalism. The tree edit models of Heilman and Smith (2010) and Yao et al. (2013a) compute minimal tree edit sequences to align S(i) to Q, and use logistic regression to map features of edit sequences to a relevance score. Wang and Manning (2010) employ structured prediction to compute probabilities for tree edit sequences. Yao et al. (2013b) align related phrases in Q and each S(i) using a semi-Markov CRF model and rank candidates based on their decoding scores. Yih et al. (2013) use an array of lexical semantic similarity resources, from which they derive features for a binary classifier. Convolutional neural network models proposed by Yu et al. (2014) and Severyn and Moschitti (2015) compute distributional semantic vectors of Q and S(i) to assess their semantic similarity.

In a contrasting approach, Severyn and Moschitti (2013) connect the question focus word in Q with potential answer phrases in S(i) using a shallow syntactic tree representation. Importantly, unlike most rankers, their model utilizes key information in individual S(i) phrases which encodes the degree of type-compatibility between Q and S(i). But it fails to robustly align concepts in Q and S(i) due to a simplistic lemma-match policy.

Our joint model factors in both semantic similarity and question-answer type-compatibility features for ranking. Moreover, our semantic similarity features (described in Section 4) are informed by recent advances in the area of short text similarity identification (Agirre et al., 2014; Agirre et al., 2015).

2.2 Answer Extraction

Given a question Q and a set of candidate answer sentences {S(1), ..., S(N)}, the goal in answer extraction is to extract from the latter a short chunk C of text (a word or a sequence of contiguous words) which is a precise answer to Q. In Table 1, "July 22, 1995" and "1995" in S(1) are two such answers.

Each positive (Q, S(i)) pair in the Wang et al. (2007) dataset is annotated by Yao et al. (2013a) with a gold answer chunk C(i)_g in S(i). Associated with each Q is also a regexp pattern P that specifies one or more gold answer chunks for Q. Being a regexp pattern, P can accommodate variants of a gold answer chunk as well as multiple gold chunks. For instance, the pattern "1995" for the example in Table 1 matches both "July 22, 1995" and "1995". An extraction algorithm extracts an answer chunk C, which is matched against P during evaluation.

Extraction of C is a multistep process. Existing solutions adopt a generic framework, which we outline in Algorithm 1. In each S(i), candidate answer chunks C(i) are first identified and evaluated according to some criteria (steps 1–4). The best chunk C(i)* in S(i) is then identified (step 5). From these "locally best" chunks, groups of equivalent chunks are formed (step 6), where some predefined criteria for chunk equivalence are used (e.g., non-zero word overlap). The quality of each group is computed as an aggregate over the qualities of its member chunks (steps 7–8), and finally a representative chunk from the best group is extracted as C (steps 9–10).

Algorithm 1: Answer Extraction Framework
Input: Q: a question sentence; {S(1), ..., S(N)}: candidate answer sentences.
Output: C: a short and precise answer to Q.
 1  for i ∈ {1, ..., N} do
 2      C(i) ← candidate chunks in S(i)
 3      for c ∈ C(i) do
 4          φ(c) ← quality of c as an answer to Q
 5      C(i)* ← argmax_{c ∈ C(i)} φ(c)
 6  {G_C(1), ..., G_C(M)} ← groups of chunks in {C(1)*, ..., C(N)*} s.t. chunks in each G_C(i) are semantically equivalent under some criteria
 7  for g ∈ {G_C(1), ..., G_C(M)} do
 8      φ(g) ← Σ_{c ∈ g} φ(c)
 9  G_C(*) ← argmax_{g ∈ {G_C(1), ..., G_C(M)}} φ(g)
10  C ← a member of G_C(*)
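For concreteness, the following is a minimal Python sketch of this generic framework. The callables get_chunks, phi, equivalent and pick are placeholders for the system-specific choices in steps 2, 4, 6 and 10; everything else follows the algorithm directly.

def extract_answer(question, sentences, get_chunks, phi, equivalent, pick):
    # Steps 1-5: score every candidate chunk and keep the "locally best"
    # chunk (with its quality) in each candidate sentence.
    best = []
    for s in sentences:
        chunks = get_chunks(s)                                # step 2
        if not chunks:
            continue
        scored = [(c, phi(question, s, c)) for c in chunks]   # steps 3-4
        best.append(max(scored, key=lambda cq: cq[1]))        # step 5
    if not best:
        return None
    # Step 6: group equivalent locally best chunks.
    groups = []
    for c, q in best:
        for g in groups:
            if any(equivalent(c, m) for m, _ in g):
                g.append((c, q))
                break
        else:
            groups.append([(c, q)])
    # Steps 7-8: a group's quality aggregates its members' qualities.
    best_group = max(groups, key=lambda g: sum(q for _, q in g))  # step 9
    return pick([c for c, _ in best_group])                       # step 10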
There are, however, details that need to be filled in within this generic framework, specifically in steps 2, 4, 6 and 10 of the algorithm. Solutions differ in these specifics. Here we discuss two state-of-the-art systems (Yao et al., 2013a; Severyn and Moschitti, 2013), which are the only systems that have been evaluated on the Wang et al. (2007) regexp patterns.

Yao et al. (2013a) use a conditional random field (CRF) to simultaneously identify chunks (step 2) and compute their φ values (step 4). Their chunking features include the POS, DEP and NER tags of words. Additional features are employed for chunk quality estimation, e.g., the question type and focus, properties of the edit operation associated with the word according to their tree edit model (see Section 2.1), and so on.

Severyn and Moschitti (2013) employ a two-step process. First, they extract all NP chunks for step 2, as other types of chunks rarely contain answers to TREC-style factoid questions. A kernel-based binary classifier is then trained to compute a score for each chunk (step 4). Relational links established between expected answer types and compatible chunk entity types (e.g., HUM ↔ PERSON, DATE ↔ DATE/TIME/NUMBER) provide the information necessary for classification.

For step 6, both systems rely on a simple word overlap strategy: chunks with common content words are grouped together. Neither article discusses the specifics of step 10. We adhere to this generic framework with our own models and features; but importantly, through the use of sentence-level evidence in step 4, our joint model demonstrates a substantial improvement in accuracy.

2.3 Coupled Ranking and Extraction

Yao et al. (2013c) present a ranker that utilizes token-level extraction features. The question sentence is augmented with such features to formulate a search query, which is fed as input to a search engine for ranked retrieval from a pool of candidate answer sentences. They experimentally show that downstream extraction from top retrievals in this list is more accurate than if the query is not expanded with the extraction features.

We take a different approach where numeric predictions from separate ranking and extraction modules are combined to jointly perform both tasks (Section 3). Yao et al. build on an existing ranker that supports query expansion and token-level characterization of candidate answer sentences. We assume no such system features, facilitating coupling of arbitrary models including new experimental ones. For extraction, Yao et al. simply rely on better upstream ranking, whereas our joint model provides a precise mathematical formulation of answer chunk quality as a function of both chunk and sentence relevance to the question. We observe a large increase in end-to-end extraction accuracy over the Yao et al. model in our experiments.

3 Approach

We first train separate probabilistic models for answer sentence ranking and answer extraction, for each of which we take an approach similar to that of existing models. Probabilities learned by the two task-specific models are then combined to construct our joint model. This section discusses the details of this two-step process.

3.1 Answer Sentence Ranking

Let the following logistic function represent the probability that a candidate answer sentence S(i) contains an answer to a question Q:

    P(S^{(i)} \mid Q) = \frac{1}{1 + e^{-\theta_r^\top f_r(Q, S^{(i)})}}    (1)

where f_r(Q, S(i)) is a set of features each of which is a unique measure of semantic similarity between Q and S(i), and θ_r is the weight vector learned during model training. We describe our feature set for ranking in Section 4.

Given P(S(i)|Q) values for i ∈ {1, ..., N}, ranking is straightforward: rank_Q(S(i)) < rank_Q(S(j)) iff P(S(i)|Q) > P(S(j)|Q). Note that a smaller numeric value represents a higher rank.

3.2 Answer Extraction

We follow the framework in Algorithm 1 for answer extraction. Below we describe our implementation of the generic steps:

1. Step 2: We adopt the strategy of Severyn and Moschitti (2013) of extracting only the NP chunks, for which we use a regexp chunker.

2. Step 4: The quality φ(c) of a candidate chunk c in S(i) is given by the following logistic function:

    \phi(c) = P(c \mid Q, S^{(i)}) = \frac{1}{1 + e^{-\theta_e^\top f_e(Q, S^{(i)}, c)}}    (2)

where f_e(Q, S(i), c) is the feature set for chunk c relative to Q, and θ_e is the weight vector learned during model training. Our feature set for extraction is described in Section 5.

3. Step 6: Given an existing set of (possibly empty) chunk groups {G_C(1), ..., G_C(M)}, a new chunk c is added to group G_C(i) if (1) all content words in c are in at least one member of G_C(i), or (2) there exists a member of G_C(i) all of whose content words are in c. If no such group is found, a new group G_C(M+1) is created with c as its only member.

4. Step 10: We extract the longest chunk in G_C(*) as the best answer C.

Additionally, we retain only the top t of all the answer candidates extracted in step 5 to prevent propagation of noisy chunks to later steps. The value of t is set using the Wang et al. (2007) DEV set.
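The grouping rule of step 6 and the selection rule of step 10 above admit a direct implementation. The sketch below assumes chunks are plain strings, a whitespace tokenization, and a caller-supplied stopword list:

def content_words(chunk, stopwords):
    # Content words of a chunk: lowercased tokens that are not stopwords.
    return {w for w in chunk.lower().split() if w not in stopwords}

def add_to_group(chunk, groups, stopwords):
    # Step 6: join a group if the chunk's content words are covered by
    # some member, or some member's content words are covered by the
    # chunk; otherwise start a new group.
    cw = content_words(chunk, stopwords)
    for g in groups:
        if any(cw <= content_words(m, stopwords) or
               content_words(m, stopwords) <= cw for m in g):
            g.append(chunk)
            return
    groups.append([chunk])

def pick_answer(best_group):
    # Step 10: extract the longest chunk in the best group.
    return max(best_group, key=len)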

3.3 Joint Ranking and Extraction

The primary goal of the joint model is to facilitate the application of both chunk-level and sentence-level features to ranking as well as extraction. To that end, it first computes the joint probability that (1) S(i) contains an answer to Q, and (2) c ∈ C(i) is a correct answer chunk:

    P(S^{(i)}, c \mid Q) = P(S^{(i)} \mid Q) \times P(c \mid Q, S^{(i)})    (3)

where the two terms on the right hand side are given by Equations (1) and (2), respectively. Both ranking and extraction are then driven by task-appropriate application of this common quantity.

Given Equation (3), the condition for ranking is redefined as follows: rank_Q(S(i)) < rank_Q(S(j)) iff max_{c ∈ C(i)} P(S(i), c|Q) > max_{c ∈ C(j)} P(S(j), c|Q). This new condition rewards an S(i) that not only is highly semantically similar to Q, but also contains a chunk c which is a likely answer to Q.

For extraction, the joint probability in Equation (3) replaces the conditional in Equation (2) for step 4 of Algorithm 1: φ(c) = P(S(i), c|Q). Again, this new definition of φ(c) rewards a chunk c that is (1) type-compatible with Q, and (2) well-supported by the content of the containing sentence S(i).

Equation (3) assigns equal weight to the ranking and the extraction model. To learn these weights from data, we implement a variation of the joint model that employs a second-level regressor:

    P(S^{(i)}, c \mid Q) = \frac{1}{1 + e^{-\theta_2^\top f_2(Q, S^{(i)}, c)}}    (4)

where the feature vector f_2 consists of the two probabilities in Equations (1) and (2), and θ_2 is the weight vector. While P(S(i), c|Q) is computed using a different formula in this model, the methods for ranking and extraction based on it remain the same as above.

From here on, we will refer to the models in Sections 3.1 and 3.2 as our standalone ranking and extraction models, respectively, and the models in this section as the joint probabilistic model (Equation (3)) and the stacked (regression) model (Equation (4)).

3.4 Learning

The standalone ranking model is trained using the 0/1 labels assigned to (Q, S(i)) pairs in the Wang et al. (2007) dataset. For standalone extraction, we use for training the gold chunk annotations C(i)_g associated with (Q, S(i)) pairs: a candidate NP chunk in S(i) is considered a positive example for (Q, S(i)) iff it contains C(i)_g and S(i) is an actual answer sentence. For both ranking and extraction, the corresponding weight vector θ is learned by minimizing the following L2-regularized loss function:

    J(\theta) = -\frac{1}{T} \sum_{i=1}^{T} \left[ y^{(i)} \log P^{(i)} + (1 - y^{(i)}) \log(1 - P^{(i)}) \right] + \lambda \lVert \theta \rVert^2

where T is the number of training examples, y(i) is the gold label for example i and P(i) is the model-predicted probability of example i being a positive example (given by Equations (1) and (2)). Learning of θ_2 for the stacked model works in a similar fashion, where level 1 predictions for training QA pairs (according to Equations (1) and (2)) serve as feature vectors.
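Putting Equations (1)–(4) together, the sketch below shows one way the pieces could be wired up with Scikit-learn's logistic regression (the classifier used in our experiments; see Section 6.2.2). The feature matrices F_r and F_e stand in for the feature sets of Sections 4 and 5, and all names are illustrative:

import numpy as np
from sklearn.linear_model import LogisticRegression

ranker = LogisticRegression()     # theta_r of Eq. (1); C tuned on DEV
extractor = LogisticRegression()  # theta_e of Eq. (2); C tuned on DEV
# ... ranker.fit(F_r, y_r); extractor.fit(F_e, y_e) on the training data ...

def p_sentence(fr):               # Eq. (1): P(S|Q) from ranking features
    return ranker.predict_proba([fr])[0, 1]

def p_chunk(fe):                  # Eq. (2): P(c|Q, S) from chunk features
    return extractor.predict_proba([fe])[0, 1]

def p_joint(fr, fe):              # Eq. (3): P(S, c|Q), the common quantity
    return p_sentence(fr) * p_chunk(fe)

# Ranking: order sentences by the max over their chunks of p_joint.
# Extraction: use p_joint as phi(c) in step 4 of Algorithm 1.

# Stacked variant, Eq. (4): a second-level regressor whose two features
# are the level-1 probabilities from Eqs. (1) and (2).
stacker = LogisticRegression()    # theta_2
# ... stacker.fit(np.column_stack([p1_train, p2_train]), y_train) ...

def p_stacked(p1, p2):
    return stacker.predict_proba([[p1, p2]])[0, 1]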

4 Answer Sentence Ranking Features

Instead of reinventing similarity features for our QA ranker, we derive our feature set from the winning system (Sultan et al., 2015) at the SemEval 2015 Semantic Textual Similarity (STS) task (Agirre et al., 2015). STS is an annually held SemEval competition, where systems output real-valued similarity scores for input sentence pairs. Hundreds of systems have been evaluated over the past few years (Agirre et al., 2012; Agirre et al., 2013; Agirre et al., 2014; Agirre et al., 2015); our chosen system was shown to outperform all other systems from all years of SemEval STS (Sultan et al., 2015).

In order to compute the degree of semantic similarity between a question Q and a candidate answer sentence S(i), we draw features from two sources: (1) lexical alignment between Q and S(i), and (2) vector representations of Q and S(i), derived from their word embeddings. While the original STS system employs ridge regression, we use these features within a logistic regression model for QA ranking.

4.1 Alignment Features

We align related words in Q and S(i) using a monolingual aligner originally proposed by Sultan et al. (2014). Here we give a brief description of our implementation, which employs arguably more principled methods to solve a set of subproblems. See the original article for further details.

The aligner computes for each word pair across Q and S(i) a semantic similarity score simW ∈ [0, 1] using PPDB, a large database of lexical paraphrases developed using bilingual pivoting (Ganitkevitch et al., 2013). Specifically, it allows three different levels of similarity: 1 if the two words or their lemmas are identical, a value ppdbSim ∈ (0, 1) if the word pair is present in PPDB (the XXXL database; http://www.cis.upenn.edu/~ccb/ppdb/), and 0 otherwise.

It also computes the degree of similarity simC between the two words' contexts in their respective sentences. This similarity is computed as the sum of word similarities in two different types of contexts: (1) a dependency neighborhood of size 2 (i.e. parents, grandparents, children and grandchildren), and (2) a surface-form neighborhood of size 3 (i.e. 3 words to the left and 3 words to the right). Stopwords are skipped during neighbor selection. Unlike the Sultan et al. (2014) aligner, which allows a single neighbor word to be matched to multiple similar words in the other sentence, we match neighbors using a max-weighted bipartite matching algorithm, where word similarities serve as edge weights.

Every word pair across Q and S(i) receives a final weight given by w · simW + (1 − w) · simC, where w ∈ [0, 1]. While Sultan et al. use a greedy best-first algorithm to align words based on these weights, we use them as edge weights in a max-weighted bipartite matching of word pairs (details follow).

We adopt the strategy of the original aligner of starting with high-precision alignments and increasing the recall in later steps. To this end, we align in the following order: (1) identical word sequences with at least one content word, (2) named entities, (3) content words, and (4) stopwords. Following the original aligner, no additional context matching is performed in step 1 since a sequence itself provides contextual evidence for its tokens. For each of steps 2–4, words/entities of the corresponding type are aligned using max-weighted bipartite matching as described above (multiword named entities are considered single units in step 2); other word types and already aligned words are discarded. The values of w and ppdbSim are derived using a grid search on an alignment dataset (Brockett, 2007).

Given aligned words in the QA pair, our first feature computes the proportion of aligned content words in Q and S(i), combined:

    sim_A(Q, S^{(i)}) = \frac{n_{ac}(Q) + n_{ac}(S^{(i)})}{n_c(Q) + n_c(S^{(i)})}

where n_ac(·) and n_c(·) represent the number of aligned content words and the total number of content words in a sentence, respectively.

S(i) can be arbitrarily long and still contain an answer to Q. In the above similarity measure, longer answer sentences are penalized due to a larger number of unaligned words. To counter this phenomenon, we add a measure of coverage of Q by S(i) to the original feature set of Sultan et al. (2015):

    cov_A(Q, S^{(i)}) = \frac{n_{ac}(Q)}{n_c(Q)}

4.2 A Semantic Vector Feature

Neural word embeddings (Mikolov et al., 2013; Baroni et al., 2014; Pennington et al., 2014) have been highly successful as distributional word representations in the recent past. We utilize the 400-dimensional word embeddings developed by Baroni et al. (2014) (available at http://clic.cimec.unitn.it/composes/semantic-vectors.html) to construct sentence-level embeddings for Q and S(i), which we then compare to compute a similarity score.

To construct the vector representation V_S of a given sentence S, we first extract the content word lemmas C_S = {c_S(1), ..., c_S(M)} in S. The vectors representing these lemmas are then added to generate the sentence vector:

    V_S = \sum_{i=1}^{M} V_{c_S^{(i)}}

Finally, a similarity measure for Q and S(i) is derived by taking the cosine similarity between their vector representations:

    sim_E(Q, S^{(i)}) = \frac{V_Q \cdot V_{S^{(i)}}}{\lVert V_Q \rVert \, \lVert V_{S^{(i)}} \rVert}

This simple bag-of-words model was found to augment the alignment-based feature well in the evaluations reported by Sultan et al. (2015). sim_A, cov_A and sim_E constitute our final feature set. As we show in Section 6, this small feature set outperforms the current state of the art in answer sentence ranking.
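The three features, and the bipartite matching used by the aligner, could be sketched as follows. We assume precomputed alignment counts, a lemma-to-vector lookup for the Baroni et al. (2014) embeddings, and SciPy's assignment solver for the max-weighted matching; all names are illustrative:

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_words(weights):
    # Max-weighted bipartite matching of word pairs, where weights[i][j]
    # = w * simW + (1 - w) * simC for the i-th Q word and j-th S word.
    rows, cols = linear_sum_assignment(-np.asarray(weights))
    return list(zip(rows, cols))

def sim_a(n_ac_q, n_ac_s, n_c_q, n_c_s):
    # simA: proportion of aligned content words in Q and S(i), combined.
    return (n_ac_q + n_ac_s) / (n_c_q + n_c_s)

def cov_a(n_ac_q, n_c_q):
    # covA: coverage of the question by the answer sentence.
    return n_ac_q / n_c_q

def sim_e(q_lemmas, s_lemmas, emb):
    # simE: cosine similarity of additive sentence vectors built from
    # content-lemma embeddings (emb maps a lemma to its vector).
    q_vecs = [emb[w] for w in q_lemmas if w in emb]
    s_vecs = [emb[w] for w in s_lemmas if w in emb]
    if not q_vecs or not s_vecs:
        return 0.0
    vq, vs = np.sum(q_vecs, axis=0), np.sum(s_vecs, axis=0)
    return float(vq @ vs / (np.linalg.norm(vq) * np.linalg.norm(vs)))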

5 Answer Extraction Features

As mentioned in Section 3.2, we consider only NP chunks as answer candidates for extraction. Our chunk features can be categorized into two broad groups, which we describe in this section. For the following discussion, let (Q, S(i), c) be our question, answer sentence, answer chunk triple.

5.1 Question-Independent Features

These features represent properties of c independent of the nature of Q. For example, our first two features fire if all content words in c are present in Q or align to words in Q. Such chunks rarely contain an answer, regardless of the type of Q.

Yao et al. (2013a) report an observation that answer chunks often appear close to aligned content words of specific types in S(i). To model this phenomenon, we adopt their features specifying the distance of c from the nearest aligned content word w_a in S(i) and the POS/DEP/NER tags of w_a. In addition, to encode the total amount of local evidence present for c, we employ the proportions of aligned content words in its dependency (size = 2) and surface (size = 3) contexts in S(i).

5.2 Features Containing the Question Type

These features are of the form "question-type|x", where x can be an elementary (i.e. unit) or composite feature. The rationale is that certain features are informative primarily in the context of certain question types (e.g., a likely answer to a when question is a chunk containing the NER tag DATE).

Headword Features. We extract the headword of c and use its POS/DEP/NER tags as features (appended to the question type). A headword in the subject position of S(i) or with PERSON as its NER tag, for example, is a likely answer to a who question.

Question Focus. The question focus word represents the entity about which the question is being asked. For example, in "What is the largest country in the world?", the focus word is "country". For question types like what and which, properties of the question focus largely determine the nature of the answer. In the above example, the focus word indicates that GPE is a likely NER tag for the answer.

We extract the question focus using a rule-based system originally designed for a different application, under the assumption that a question could span multiple sentences. The rule-based system is loosely inspired by the work of Lally et al. (2012), from which it differs radically because the questions in the Jeopardy! game are expressed as answers. The focus extractor first determines the question word or words, which is then used in conjunction with the parse tree to decide whether the question word itself or some other word in the sentence is the actual focus.

We pair the headword POS/DEP/NER tags with the focus word and its POS/NER tags, and add each such pair (appended to the question type) to our feature set. There are nine features here; examples include question-type|question-focus-word|headword-pos-tag and question-type|question-focus-ner-tag|headword-ner-tag. We also employ the true/false labels of the following propositions as features (in conjunction with the question type): (1) the question focus word is in c, (2) the question focus POS tag is in the POS tags of c, and (3) the question focus NER tag is of the form x or x_DESC, and x is in the NER tags of c, for some x (e.g., GPE).

Chunk Tags. In many cases, it is not the headword of c which is the answer; for example, in Q: "How many states are there in the US?" and c: "50 states", the headword of c is "states". To extend our unit of attention from the headword to the entire chunk, we first construct vocabularies of POS and NER tags, V_pos and V_ner, from training data. For each possible tag in V_pos, we then use the presence/absence of that tag in the POS tag sequence for c as a feature (in conjunction with the question type). We repeat the process for V_ner. For the above c, for example, an informative feature which is likely to fire is: "question-type=how-many | the NER tags of c include CARDINAL".
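As an illustration, the question-type-conjoined chunk-tag features could be generated as follows (the feature naming scheme and tag-set representation are ours):

def chunk_tag_features(q_type, chunk_pos_tags, chunk_ner_tags, v_pos, v_ner):
    # One binary feature per tag in the POS and NER vocabularies built
    # from training data, each conjoined with the question type.
    feats = {}
    for tag in v_pos:
        feats[q_type + "|pos:" + tag] = tag in chunk_pos_tags
    for tag in v_ner:
        feats[q_type + "|ner:" + tag] = tag in chunk_ner_tags
    return feats

# For c = "50 states" and a how-many question, the feature
# "how-many|ner:CARDINAL" would fire (assuming CARDINAL is in v_ner).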
Partial Alignment. For some question types, part of a correct answer chunk is often aligned to a question word (e.g., Q: "How many players are on the field during a soccer game?", c: "22 players"). To inform our model of such occurrences, we employ two features, the true/false labels of the following propositions: (1) c is partially aligned, (2) c is not aligned at all (each in conjunction with the question type).

6 Experiments

6.1 Data

The Wang et al. (2007) corpus is created from Text REtrieval Conference (TREC) 8–13 QA data. It consists of a set of factoid questions, and for each question, a set of candidate answer sentences. Each answer candidate is automatically drawn from a larger document based on two selection criteria: (1) a non-zero content word overlap with the question, or (2) a match with the gold regexp answer pattern for the question (training only).
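A sketch of these two criteria follows; the whitespace tokenization and the stopword list are assumptions of the sketch, not of the corpus construction:

import re

def is_candidate(question, sentence, answer_pattern, stopwords, training):
    # Criterion 1: non-zero content-word overlap with the question.
    q_content = {w for w in question.lower().split() if w not in stopwords}
    s_content = {w for w in sentence.lower().split() if w not in stopwords}
    if q_content & s_content:
        return True
    # Criterion 2 (training only): the sentence matches the gold regexp
    # answer pattern for the question.
    return training and re.search(answer_pattern, sentence) is not None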

TRAIN pairs are drawn from TREC 8–12; DEV and TEST pairs are drawn from TREC 13. Details of the TRAIN/DEV/TEST split are given in Table 2. TRAIN-ALL is a large set of automatically judged (thus noisy) QA pairs: a sentence is considered a positive example if it matches the gold answer pattern for the corresponding question. TRAIN is a much smaller subset of TRAIN-ALL, containing pairs that are manually corrected for errors. Manual judgment is produced for DEV and TEST pairs, too.

Dataset     #Questions  #QA Pairs  %Positive
TRAIN-ALL   1,229       53,417     12.0
TRAIN       94          4,718      7.4
DEV         82          1,148      19.3
TEST        100         1,517     18.7

Table 2: Summary of the Wang et al. (2007) corpus.

For answer extraction, Yao et al. (2013a) add to each QA pair the correct answer chunk(s). The gold TREC patterns are used to first identify relevant chunks in each answer sentence. TRAIN, DEV and TEST are then manually corrected for errors.

The Wang et al. (2007) dataset also comes with POS/DEP/NER tags for each sentence. They use the MXPOST tagger (Ratnaparkhi, 1996) for POS tagging, the MSTParser (McDonald et al., 2005) to generate typed dependency trees, and the BBN Identifinder (Bikel et al., 1999) for NER tagging. Although we have access to a state-of-the-art information pipeline that produces better tags, this paper aims to study the effect of the proposed models and of our features on system performance, rather than on additional variables; therefore, to support comparison with prior work, we rely on the tags provided with the dataset for all our experiments.

6.2 Answer Sentence Ranking

We adopt the standard evaluation procedure and metrics for QA rankers reported in the literature.

6.2.1 Evaluation Metrics

Our metrics for ranking are Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR). Here we define both in terms of simpler metrics.

Precision at K. Given a question Q and a set of candidate answer sentences {S(1), ..., S(N)}, let the output of a ranker be [R(1), ..., R(N)], so that each R(i) ∈ {S(1), ..., S(N)} and the predicted rank of R(i) is higher than the predicted rank of R(j) whenever i < j. Precision at K is then the proportion of the top K sentences R(1), ..., R(K) that contain an answer to Q.

Average precision for Q is the mean of the precision-at-K values computed at every rank K that holds a correct answer sentence; MAP is the mean of average precision over all test questions. The reciprocal rank for Q is 1/K, where K is the best (smallest) rank at which a correct answer sentence is placed; MRR is the mean of reciprocal rank over all test questions.

6.2.2 Setup

Questions with no correct candidate answer sentence are excluded from evaluation; most questions (>95%) are retained after this exclusion. We use the logistic regression implementation of Scikit-learn (Pedregosa et al., 2011) and use the Wang et al. (2007) DEV set to set C, the regularization strength parameter. The standard trec_eval script is used to generate all results.
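The ranking metrics reduce to a few lines over per-question lists of 0/1 relevance labels ordered by predicted rank. The sketch below is ours (the official numbers in this paper come from trec_eval):

def average_precision(ranked_labels):
    # ranked_labels: 0/1 labels of the candidates, best predicted rank first.
    hits, precisions = 0, []
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / k)   # precision at K, at each hit
    return sum(precisions) / hits if hits else 0.0

def reciprocal_rank(ranked_labels):
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            return 1.0 / k
    return 0.0

def map_and_mrr(all_ranked_labels):
    # Means over questions; questions with no correct candidate are
    # excluded beforehand, as described in the setup above.
    n = len(all_ranked_labels)
    return (sum(map(average_precision, all_ranked_labels)) / n,
            sum(map(reciprocal_rank, all_ranked_labels)) / n)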

6.2.3 Results

Table 3 shows performances of our ranking models and recent baseline systems on TEST. Our QA similarity features (i.e. the standalone ranker) outperform all baselines with both TRAIN and TRAIN-ALL, although the additional noisy examples in the latter are not found to improve results. More importantly, we get improvements of substantially larger magnitudes using our joint models: more than 10 MAP and MRR points over the state-of-the-art system of Severyn and Moschitti (2015) with TRAIN-ALL for the joint probabilistic model.

Model                          MAP%   MRR%
TRAIN
Shnarch (2013)                 68.60  75.40
Yih et al. (2013)              70.92  77.00
Yu et al. (2014)               70.58  78.00
Severyn & Moschitti (2015)     73.29  79.62
Our Standalone Model           76.05  83.99
Our Joint Probabilistic Model  81.59  89.09
Our Stacked Model              80.77  86.85
TRAIN-ALL
Yu et al. (2014)               71.13  78.46
Severyn & Moschitti (2015)     74.59  80.78
Our Standalone Model           75.68  83.09
Our Joint Probabilistic Model  84.95  91.95
Our Stacked Model              82.56  90.69

Table 3: Answer sentence ranking results.

Unlike the standalone model, the joint models also benefit from the additional noisy examples in TRAIN-ALL. These results support the central argument of this paper that joint modeling is a better approach to answer sentence ranking.

6.3 Answer Extraction

We follow the procedure reported in prior work (Yao et al., 2013a; Severyn and Moschitti, 2013) to evaluate the answer chunks extracted by the system.

6.3.1 Evaluation Metrics

Precision. Given a set of questions, the precision of an answer extraction system is the proportion of its extracted answers that are correct (i.e. match the corresponding gold regexp pattern).

Recall. Recall is the proportion of questions for which the system extracted a correct answer.

F1 Score. The F1 score is the harmonic mean of precision and recall. It captures the system's accuracy and coverage in a single metric.

6.3.2 Setup

Following prior work, we (1) retain the 89 questions in the Wang et al. (2007) TEST set that have at least one correct answer, and (2) train only with chunks in correct answer sentences to avoid extreme bias towards false labels (both the standalone extraction model and stage 2 of the stacked model). As in ranking, we use Scikit-learn for logistic regression and set the regularization parameter C using DEV.
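A minimal sketch of the evaluation in Sections 6.3.1–6.3.2, assuming extractions maps each question to its extracted chunk (or None) and gold_patterns holds the gold regexp patterns (the mapping-based interface is illustrative):

import re

def extraction_prf(extractions, gold_patterns):
    answered = [q for q, c in extractions.items() if c is not None]
    correct = sum(1 for q in answered
                  if re.search(gold_patterns[q], extractions[q]))
    p = correct / len(answered) if answered else 0.0   # precision
    r = correct / len(extractions)                     # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1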

6.3.3 Results

Table 4 shows performances of our extraction models on the Wang et al. TEST set. The joint probabilistic model demonstrates top performance for both TRAIN and TRAIN-ALL. With TRAIN-ALL, it correctly answers 68 of the 89 test questions (5 more than the previous best model of Severyn and Moschitti (2013)). The stacked model also performs well with the larger training set. Again, these results support the central claim of the paper that answer extraction can be made better through joint modeling.

Model                          P%    R%    F1%
TRAIN
Yao et al. (2013a)             55.2  53.9  54.5
Severyn & Moschitti (2013)     66.2  66.2  66.2
Our Standalone Model           62.9  62.9  62.9
Our Joint Probabilistic Model  69.7  69.7  69.7
Our Stacked Model              62.9  62.9  62.9
TRAIN-ALL
Yao et al. (2013a)             63.6  62.9  63.3
Severyn & Moschitti (2013)     70.8  70.8  70.8
Our Standalone Model           70.8  70.8  70.8
Our Joint Probabilistic Model  76.4  76.4  76.4
Our Stacked Model              73.0  73.0  73.0

Table 4: Answer extraction results on the Wang et al. (2007) test set.

Table 5 shows performances of our standalone and joint probabilistic models (trained on TRAIN-ALL) on different TEST question types. The joint model is the better of the two across types, achieving good results on all question types except what.

Question Type  Count  ST     JP
what           37     51.4   56.8
when           19     100.0  100.0
where          11     100.0  90.9
who/whom       10     60.0   70.0
why            1      0.0    0.0
how many       9      77.8   100.0
how long       2      50.0   100.0

Table 5: F1% of the STandalone and the Joint Probabilistic extraction model across question types.

A particularly challenging subtype of what questions are what be questions, answers to which often go beyond NP chunk boundaries. A human-extracted answer to the question "What is Muslim Brotherhood's goal?" in the Wang et al. (2007) corpus, for example, is "advocates turning Egypt into a strict Muslim state by political means." What in general is nevertheless the most difficult question type, since unlike questions like who or when, answers do not have strict categories (e.g., a fixed set of NER tags).

6.3.4 Qualitative Analysis

We closely examine QA pairs for which the joint probabilistic model extracts a correct answer chunk but the standalone model does not. Table 6 shows two such questions, with two candidate answer sentences for each; candidate answer chunks are marked with asterisks.

Q: How many years was Jack Welch with GE?
(1) "Six Sigma has galvanized our company with an intensity the likes of which I have never seen in my *40 years* at GE," said John Welch, chairman of General Electric.   ST .517  JP .113
(2) So fervent a proselytizer is Welch that GE has spent *three years* and more than $1 billion to convert all of its divisions to the Six Sigma faith.   ST .714  JP .090

Q: What kind of ship is the Liberty Bell 7?
(3) Newport plans to retrieve the recovery vessel first, then go after Liberty Bell 7, the only U.S. *manned spacecraft* lost after a successful mission.   ST .838  JP .278
(4) "It will be a big relief" once the capsule is aboard ship, *Curt Newport* said before setting sail Thursday.   ST .388  JP .003

Table 6: Scores computed by the STandalone and the Joint Probabilistic model for candidate chunks (marked with asterisks) in four Wang et al. (2007) test sentences. Joint model scores for non-answer chunks (rows 2 and 4) are much lower.

For the first question, only the sentence in row 1 contains an answer. The standalone model assigns a higher score to the non-answer chunk in row 2, but the use of sentence-level features enables the joint model to identify the more relevant chunk in row 1. Note that the joint model score, being a product of two probabilities, is always lower than the standalone model score. However, only the relative score matters in this case, as the chunk with the highest overall score is eventually selected for extraction.

For the second question, both models compute a lower score for the non-answer chunk "Curt Newport" than the answer chunk "manned spacecraft". However, the incorrect chunk appears in several candidate answer sentences (not shown here), resulting in a high overall score for the standalone model (Algorithm 1: steps 7 and 8). The joint model assigns a much lower score to each instance of this chunk due to weak sentence-level evidence, eventually resulting in the extraction of the correct chunk.

6.3.5 A Second Extraction Dataset

Yao et al. (2013c) report an extraction dataset containing 99 test questions, derived from the MIT109 test collection (Lin and Katz, 2006) of TREC pairs. Each question in this dataset has 10 candidate answer sentences. We compare the performance of our joint probabilistic model with that of their extraction model, which extracts answers from top candidate sentences identified by their coupled ranker (Section 2.3). (We compare with only their extraction model, as the larger ranking dataset is not available anymore; precision and recall figures are reported at http://cs.jhu.edu/~xuchen/packages/jacana-ir-acl2013-data-results.tar.bz2.) Models are trained on their training set of 2,205 questions and 22,043 candidate QA pairs. As shown in Table 7, our model outperforms the Yao et al. model by a surprisingly large margin, correctly answering 83 of the 99 test questions.

Model                          P%    R%    F1%
Yao et al. (2013c)             35.4  17.2  23.1
Our Joint Probabilistic Model  83.8  83.8  83.8

Table 7: Performances of two joint extraction models on the Yao et al. (2013c) test set.

Interestingly, our standalone model extracts six more correct answers in this dataset than the joint model.

A close examination reveals that in all six cases, this is caused by the presence of correct answer chunks in non-answer sentences. Table 8 shows an example, where the correct answer chunk "Steve Sloan" appears in all four candidate sentences, of which only the first is actually relevant to the question. The standalone model assigns high scores to all four instances and as a result observes a high overall score for the chunk. The joint model, on the other hand, recognizes the false positives, and consequently observes a smaller overall score for the chunk. However, this desired behavior eventually results in a wrong extraction. These results have key implications for the evaluation of answer extraction systems: metrics that assess performance on individual QA pairs can enable finer-grained evaluation than what end-to-end extraction metrics offer.

(1) Another perk is getting to work with his son, Barry Van Dyke, who has a regular role as Detective *Steve Sloan* on "Diagnosis".   ST .861  JP .338
(2) This is only the third time in school history the Raiders have begun a season 6-0 and the first since 1976, when *Steve Sloan*, in his second season as coach, led them to an 8-0 start and 10-2 overall record.   ST .494  JP .010
(3) He also represented several Alabama coaches, including Ray Perkins, Bill Curry, *Steve Sloan* and Wimp Sanderson.   ST .334  JP .007
(4) Bart Starr, Joe Namath, Ken Stabler, *Steve Sloan*, Scott Hunter and Walter Lewis are but a few of the legends on the wall of the Crimson Tide quarterbacks coach.   ST .334  JP .009

Table 8: Scores computed by the STandalone and the Joint Probabilistic model for NP chunks (marked with asterisks) in four Yao et al. (2013c) test sentences for the question: Who is the detective on 'Diagnosis Murder'? The standalone model assigns high probabilities to non-answer chunks in the last three sentences, subsequently corrected by the joint model.

7 Discussion

Our two-step approach to joint modeling, consisting of constructing separate models for ranking and extraction first and then coupling their predictions, offers at least two advantages. First, predictions from any given pair of ranking and extraction systems can be combined, since such systems must compute a score for a QA pair or an answer chunk in order to differentiate among candidates. Coupling of the ranking and extraction systems of Yao et al. (2013a) and Severyn and Moschitti (2013), for example, is straightforward within our framework. Second, this approach supports the use of task-appropriate training data for ranking and extraction, which can provide key advantage. For example, while answer sentence ranking systems use both correct and incorrect candidate answer sentences for model training, existing answer extraction systems discard the latter in order to maintain a (relatively) balanced class distribution (Yao et al., 2013a; Severyn and Moschitti, 2013). Through the separation of the ranking and extraction models during training, our approach naturally supports such task-specific sampling of training data.

A potentially limiting factor in our extraction model is the assumption that answers are always expressed neatly in NP chunks. While models that make no such assumption exist (e.g., the CRF model of Yao et al. (2013a)), extraction of long answers (such as the one discussed in Section 6.3.3) is still difficult in practice due to their unconstrained nature.

8 Conclusions and Future Work

We present a joint model for the important QA tasks of answer sentence ranking and answer extraction. By exploiting the interconnected nature of the two tasks, our model demonstrates substantial performance improvements over previous best systems for both. Additionally, our ranking model applies recent advances in the computation of short text similarity to QA, providing stronger similarity features.

An obvious direction for future work is the inclusion of new features for each task. Answer sentence ranking, for example, can benefit from phrasal alignment and long-distance context representation. Answer extraction for what questions can be made better using a lexical answer type feature, or world knowledge (such as "blue is a color") derived from semantic networks like WordNet.

borde(suchas“blueisacolor”)derivedfromsemanticnetworkslikeWordNet.Ourmodelalsofacilitatesstraightforwardintegrationoffeatures/predictionsfromotherexistingsystemsforbothtasks,forex-ample,theconvolutionalneuralsentencemodelofSeverynandMoschitti(2015)forranking.Finally,moresophisticatedtechniquesarerequiredforextrac-tionofthefinalanswerchunkbasedonindividualchunkscoresacrossQApairs.AcknowledgmentsWethankthereviewersfortheirvaluablecommentsandsuggestions.WealsothankXuchenYaoandAliakseiSeverynforclarificationoftheirwork.ReferencesEnekoAgirre,DanielCer,MonaDiab,andAitorGonzalez-Agirre.2012.SemEval-2012task6:APilotonSemanticTextualSimilarity.InProceedingsoftheSixthInternationalWorkshoponSemanticEvaluation,pages385–393,Montreal,Canada.EnekoAgirre,DanielCer,MonaDiab,AitorGonzalez-Agirre,andWeiweiGuo.2013.*SEM2013SharedTask:SemanticTextualSimilarity.InProceedingsoftheSecondJointConferenceonLexicalandComputa-tionalSemantics,pages32–43,Atlanta,Georgia,USA.EnekoAgirre,CarmenBanea,ClaireCardie,DanielCer,MonaDiab,AitorGonzalez-Agirre,WeiweiGuo,RadaMihalcea,GermanRigau,andJanyceWiebe.2014.SemEval-2014Task10:MultilingualSemanticTex-tualSimilarity.InProceedingsofthe8thInterna-tionalWorkshoponSemanticEvaluation,pages81–91,Dublin,Ireland.EnekoAgirre,CarmenBanea,ClaireCardie,DanielCer,MonaDiab,AitorGonzalez-Agirre,WeiweiGuo,I˜nigoLopez-Gazpio,MontseMaritxalar,RadaMihalcea,Ger-manRigau,LarraitzUria,andJanyceWiebe.2015.SemEval-2015Task2:SemanticTextualSimilarity,Inglés,SpanishandPilotonInterpretability.InPro-ceedingsofthe9thInternationalWorkshoponSemanticEvaluation,pages252–263,Denver,Colorado,USA.MarcoBaroni,GeorgianaDinu,andGerm´anKruszewski.2014.Don’tCount,Predict!ASystematicComparisonofContext-Countingvs.Context-PredictingSemanticVectors.InProceedingsofthe52ndAnnualMeetingoftheAssociationforComputationalLinguistics,pages238–247,Baltimore,Maryland,USA.DanielM.Bikel,RichardSchwartz,andRalphM.Weischedel.1999.AnAlgorithmthatLearnswhat’sinaName.MachineLearning,34(1-3):211–231.ChrisBrockett.2007.AligningtheRTE2006Cor-pus.TechnicalReportMSR-TR-2007-77,MicrosoftResearch.DavidFerrucci,EricBrown,JenniferChu-Carroll,JamesFan,DavidGondek,AdityaA.Kalyanpur,AdamLally,J.WilliamMurdock,EricNyberg,JohnPrager,NicoSchlaefer,andChrisWelty.2010.BuildingWatson:AnOverviewoftheDeepQAProject.AIMagazine,31(3),pages59–79.JuriGanitkevitch,BenjaminVanDurme,andChrisCallison-Burch.2013.PPDB:TheParaphraseDatabase.InProceedingsofthe2013ConferenceoftheNorthAmericanChapteroftheAssociationforCompu-tationalLinguistics,pages758–764,Atlanta,Georgia,USA.MichaelHeilmanandNoahA.Smith2010.TreeEditModelsforRecognizingTextualEntailments,Para-phrases,andAnswerstoQuestions.InProceedingsofthe2010ConferenceoftheNorthAmericanChap-teroftheAssociationforComputationalLinguistics:HumanLanguageTechnologies,pages1011–1019,LosAngeles,California,USA.AdamLally,JohnM.Prager,MichaelC.McCord,Bran-imirK.Boguraev,SiddharthPatwardhan,JamesFan,PaulFodor,andJenniferChu-Carroll.2012.QuestionAnalysis:HowWatsonReadsaClue.IBMJournalofResearchandDevelopment,56(3.4):2:1–2:14.JimmyLinandBorisKatz.2006.BuildingaReusableTestCollectionforQuestionAnswering.JournaloftheAmericanSocietyforInformationScienceandTechnol-ogy,57(7):851–861.RyanMcDonald,KobyCrammer,andFernadoPereira.2005.OnlineLarge-MarginTrainingofDependencyParsers.InProceedingsofthe43stAnnualMeetingoftheAssociationforComputationalLinguistics,AnnArbor,Michigan,USA.TomasMikolov,KaiChen,GregCorrado,andJeffreyDean.2013.EfficientEstimationofWordRepresenta-tionsinVectorSpace.InProceedingsoftheInterna-tionalConferenceonLearningRepre
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, Doha, Qatar.

Adwait Ratnaparkhi. 1996. A Maximum Entropy Model for Part-of-Speech Tagging. In Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing, pages 133–142, Philadelphia, Pennsylvania, USA.

Aliaksei Severyn and Alessandro Moschitti. 2013. Automatic Feature Engineering for Answer Selection and Extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 458–467, Seattle, Washington, USA.

Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 373–382, Santiago, Chile.

Eyal Shnarch. 2013. Probabilistic Models for Lexical Inference. PhD Thesis, Bar Ilan University.

Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence. Transactions of the Association for Computational Linguistics, 2:219–230.

Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2015. DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition. In Proceedings of the 9th International Workshop on Semantic Evaluation, pages 148–153, Denver, Colorado, USA.

Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 22–32, Prague, Czech Republic.

Mengqiu Wang and Christopher D. Manning. 2010. Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1164–1172, Beijing, China.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013a. Answer Extraction as Sequence Tagging with Tree Edit Distance. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 858–867, Atlanta, Georgia, USA.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013b. Semi-Markov Phrase-Based Monolingual Alignment. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 590–600, Seattle, Washington, USA.

Xuchen Yao, Benjamin Van Durme, and Peter Clark. 2013c. Automatic Coupling of Answer Extraction and Information Retrieval. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 159–165, Sofia, Bulgaria.

Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak. 2013. Question Answering using Enhanced Lexical Semantic Models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1744–1753, Sofia, Bulgaria.

Lei Yu, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. 2014. Deep Learning for Answer Sentence Selection. In Proceedings of the Deep Learning and Representation Learning Workshop, NIPS 2014, Montréal, Canada.
