Transactions of the Association for Computational Linguistics, vol. 4, pp. 113–125, 2016. Action Editor: Noah Smith.
Submission batch: 10/2015; Revision batch: 2/2016; Published 4/2016.
© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
A Joint Model for Answer Sentence Ranking and Answer Extraction

Md Arafat Sultan†  Vittorio Castelli‡  Radu Florian‡
†Institute of Cognitive Science and Department of Computer Science, University of Colorado, Boulder, CO
‡IBM T.J. Watson Research Center, Yorktown Heights, NY
arafat.sultan@colorado.edu, vittorio@us.ibm.com, raduf@us.ibm.com

Abstract

Answer sentence ranking and answer extraction are two key challenges in question answering that have traditionally been treated in isolation, i.e., as independent tasks. In this article, we (1) explain how both tasks are related at their core by a common quantity, and (2) propose a simple and intuitive joint probabilistic model that addresses both via joint computation but task-specific application of that quantity. In our experiments with two TREC datasets, our joint model substantially outperforms state-of-the-art systems in both tasks.

1 Introduction

One of the original goals of AI was to build machines that can naturally interact with humans. Over time, the challenges became apparent and language processing emerged as one of AI's most puzzling areas. Nevertheless, major breakthroughs have still been made in several important tasks; with IBM's Watson (Ferrucci et al., 2010) significantly outperforming human champions in the quiz contest Jeopardy!, question answering (QA) is definitely one such task.

QA comes in various forms, each supporting specific kinds of user requirements. Consider a scenario where a system is given a question and a set of sentences, each of which may or may not contain an answer to that question. The goal of answer extraction is to extract a precise answer in the form of a short span of text in one or more of those sentences. In this form, QA meets users' immediate information needs. Answer sentence ranking, on the other hand, is the task of assigning a rank to each sentence so that the ones that are more likely to contain an answer are ranked higher. In this form, QA is similar to information retrieval and presents greater opportunities for further exploration and learning. In this article, we propose a novel approach to jointly solving these two well-studied yet open QA problems.

Most answer sentence ranking algorithms operate under the assumption that the degree of syntactic and/or semantic similarity between questions and answer sentences is a sufficiently strong predictor of answer sentence relevance (Wang et al., 2007; Yih et al., 2013; Yu et al., 2014; Severyn and Moschitti, 2015). On the other hand, answer extraction algorithms frequently assess candidate answer phrases based primarily on their own properties relative to the question (e.g., whether the question is a who question and the phrase refers to a person), making inadequate or no use of sentence-level evidence (Yao et al., 2013a; Severyn and Moschitti, 2013).

Both these assumptions, however, are simplistic, and fail to capture the core requirements of the two tasks. Table 1 shows a question and three candidate answer sentences, only one of which (S(1)) actually answers the question. Ranking models that rely solely on text similarity are highly likely to incorrectly assign similar ranks to S(1) and S(2). Such models would fail to utilize the key piece of evidence against S(2): it does not contain any temporal information, necessary to answer a when question. Similarly, an extraction model that relies only on the features of a candidate phrase might extract the temporal expression "the year 1666" in S(3) as an answer despite a clear lack of sentence-level evidence.

In view of the above, we propose a joint model for answer sentence ranking and answer extraction that utilizes both sentence and phrase-level evidence to solve each task. More concretely, we (1) design task-specific probabilistic models for ranking and extraction, exploiting features of candidate answer sentences and their phrases, respectively, and (2) combine the two models in a simple, intuitive step to build a joint probabilistic model for both tasks. This two-step approach facilitates construction of new joint models from any existing solutions to the two tasks. On a publicly available TREC dataset (Wang et al., 2007), our joint model demonstrates an improvement in ranking of over 10 MAP and MRR points over the current state of the art. It also outperforms state-of-the-art extraction systems on two TREC datasets (Wang et al., 2007; Yao et al., 2013c).

Q     When was the Hale Bopp comet discovered?
S(1)  The comet was first spotted by Hale and Bopp, both US astronomers, on July 22, 1995.
S(2)  Hale-Bopp, a large comet, was observed for the first time in China.
S(3)  The law of gravity was discovered in the year 1666 by Sir Isaac Newton.

Table 1: A question and three candidate answer sentences.

2 Background

In this section, we provide a formal description of the two tasks and establish terminology that we follow in later sections. The Wang et al. (2007) dataset has been the benchmark for most recent work on the two tasks as well as our own. Therefore, we situate our description in the specific context of this dataset. We also discuss related prior work.

2.1 Answer Sentence Ranking

Given a question Q and a set of candidate answer sentences {S(1), ..., S(N)}, the goal in answer sentence ranking is to assign each S(i) an integer rank rank_Q(S(i)) (for brevity; the full form of the function is rank_Q(S(i), {S(1), ..., S(N)})) so that for any pair (i, j), rank_Q(S(i)) < rank_Q(S(j)) if S(i) is more relevant to Q than S(j). In the Wang et al. (2007) dataset, relevance takes the form of a binary label (1: S(i) contains an answer to Q, 0: it does not). A supervised ranking model must learn to rank test answer sentences from such binary annotations in the training data.

Existing models accomplish this by learning to assign a relevance score to each (Q, S(i)) pair; these scores can then be used to rank the sentences. QA rankers predominantly operate under the hypothesis that this relevance score is a function of the syntactic and/or semantic similarities between Q and S(i). Wang et al. (2007), for example, learn the probability of generating Q from S(i) using syntactic transformations under a quasi-synchronous grammar formalism. The tree edit models of Heilman and Smith (2010) and Yao et al. (2013a) compute minimal tree edit sequences to align S(i) to Q, and use logistic regression to map features of edit sequences to a relevance score. Wang and Manning (2010) employ structured prediction to compute probabilities for tree edit sequences. Yao et al. (2013b) align related phrases in Q and each S(i) using a semi-Markov CRF model and rank candidates based on their decoding scores. Yih et al. (2013) use an array of lexical semantic similarity resources, from which they derive features for a binary classifier. Convolutional neural network models proposed by Yu et al. (2014) and Severyn and Moschitti (2015) compute distributional semantic vectors of Q and S(i) to assess their semantic similarity.

In a contrasting approach, Severyn and Moschitti (2013) connect the question focus word in Q with potential answer phrases in S(i) using a shallow syntactic tree representation. Importantly, unlike most rankers, their model utilizes key information in individual S(i) phrases which encodes the degree of type-compatibility between Q and S(i). But it fails to robustly align concepts in Q and S(i) due to a simplistic lemma-match policy.

Our joint model factors in both semantic similarity and question-answer type-compatibility features for ranking. Moreover, our semantic similarity features (described in Section 4) are informed by recent advances in the area of short text similarity identification (Agirre et al., 2014; Agirre et al., 2015).
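
For concreteness, the scores-to-ranks step shared by all of these rankers can be sketched as follows (our illustration; the scoring model itself is whatever system produced the relevance scores):

```python
def rank_sentences(scores):
    """Turn relevance scores into integer ranks (1 = most relevant).

    scores: one relevance score per candidate answer sentence,
    given in the original sentence order.
    """
    # Sort candidate indices by descending score; a sentence's rank
    # is its 1-based position in that sorted order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order):
        ranks[i] = position + 1
    return ranks

print(rank_sentences([0.2, 0.9, 0.4]))  # [3, 1, 2]
```
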
F B j G u e S T T O N 0 7 S e P e M B e R 2 0 2 3 115 advancesintheareaofshorttextsimilarityidentifi-cation(Agirreetal.,2014;Agirreetal.,2015).2.2AnswerExtractionGivenaquestionQandasetofcandidateanswersen-tences{S(1),...,S(N)},thegoalinanswerextractionistoextractfromthelatterashortchunkCoftext(awordorasequenceofcontiguouswords)whichisapreciseanswertoQ.InTable1,“July22,1995”and“1995”inS(1)aretwosuchanswers.Eachpositive(Q,S(ich))pairintheWangetal.(2007)datasetisannotatedbyYaoetal.(2013A)withagoldanswerchunkC(ich)ginS(ich).AssociatedwitheachQisalsoaregexppatternPthatspeci-fiesoneormoregoldanswerchunksforQ.Beingaregexppattern,Pcanaccommodatevariantsofagoldanswerchunkaswellasmultiplegoldchunks.Forinstance,thepattern“1995”fortheexampleinTable1matchesboth“July22,1995”and“1995”.AnextractionalgorithmextractsananswerchunkC,whichismatchedagainstPduringevaluation.ExtractionofCisamultistepprocess.Existingsolutionsadoptagenericframework,whichweout-lineinAlgorithm1.IneachS(ich),candidateanswerchunksC(ich)arefirstidentifiedandevaluatedaccord-ingtosomecriteria(steps1–4).ThebestchunkC(ich)∗inS(ich)isthenidentified(step5).Fromthese“locallybest”chunks,groupsofequivalentchunksareformed(step6),wheresomepredefinedcriteriaforchunkequivalenceareused(e.g.,non-zerowordoverlap).Thequalityofeachgroupiscomputedasanaggre-gateoverthequalitiesofitsmemberchunks(steps7–8),andfinallyarepresentativechunkfromthebestgroupisextractedasC(steps9–10).Thereare,Jedoch,detailsthatneedtobefilledinwithinthisgenericframework,specificallyinsteps2,4,6and10ofthealgorithm.Solutionsdifferinthesespecifics.Herewediscusstwostate-of-the-artsystems(Yaoetal.,2013a;SeverynandMoschitti,2013),whicharetheonlysystemsthathavebeenevaluatedontheWangetal.(2007)regexppatterns.Yaoetal.(2013A)useaconditionalrandomfield(CRF)tosimultaneouslyidentifychunks(step2)andcomputetheirφvalues(step4).Theirchunkingfea-turesincludethePOS,DEPandNERtagsofwords.Additionalfeaturesareemployedforchunkqualityestimation,e.g.,thequestiontypeandfocus,prop-ertiesoftheeditoperationassociatedwiththewordAlgorithm1:AnswerExtractionFrameworkInput:1.Q:aquestionsentence.2.{S(1),...,S(N)}:candidateanswersentences.Output:C:ashortandpreciseanswertoQ.1fori∈{1,...,N}do2C(ich)←candidatechunksinS(ich)3forc∈C(ich)do4φ(C)←qualityofcasananswertoQ5C(ich)∗←argmaxc∈C(ich)(Phi(C))6{G(1)C,...,G(M)C}←groupsofchunksin{C(1)∗,...,C(N)∗}s.t.chunksineachG(ich)Caresemanticallyequivalentundersomecriteria7forg∈{G(1)C,...,G(M)C}do8φ(G)←Pc∈gφ(C)9G(∗)C←argmaxg∈{G(1)C,...,G(M)C}(Phi(G))10C←amemberofG(∗)Caccordingtotheirtreeeditmodel(seeSection2.1),andsoon.SeverynandMoschitti(2013)employatwo-stepprocess.First,theyextractallNPchunksforstep2,asothertypesofchunksrarelycontainanswerstoTREC-stylefactoidquestions.Akernel-basedbinaryclassifieristhentrainedtocomputeascoreforeachchunk(step4).Relationallinksestab-lishedbetweenexpectedanswertypesandcompati-blechunkentitytypes(e.g.,HUM↔PERSON,DATE↔DATE/TIME/NUMBER)providetheinformationnecessaryforclassification.Forstep6,bothsystemsrelyonasimplewordoverlapstrategy:chunkswithcommoncontentwordsaregroupedtogether.Neitherarticlediscussesthespecificsofstep10.Weadheretothisgenericframeworkwithourownmodelsandfeatures;butimportantly,throughtheuseofsentence-levelevidenceinstep4,ourjointmodeldemonstratesasubstantialimprovementinaccuracy.2.3CoupledRankingandExtractionYaoetal.(2013C)presentarankerthatutilizestoken-levelextractionfeatures.Thequestionsentenceisaugmentedwithsuchfeaturestoformulateasearch l D o w n o a d e d f r o m h t t p : / / D ich R e C T . M ich T . 
2.3 Coupled Ranking and Extraction

Yao et al. (2013c) present a ranker that utilizes token-level extraction features. The question sentence is augmented with such features to formulate a search query, which is fed as input to a search engine for ranked retrieval from a pool of candidate answer sentences. They experimentally show that downstream extraction from top retrievals in this list is more accurate than if the query is not expanded with the extraction features.

We take a different approach where numeric predictions from separate ranking and extraction modules are combined to jointly perform both tasks (Section 3). Yao et al. build on an existing ranker that supports query expansion and token-level characterization of candidate answer sentences. We assume no such system features, facilitating coupling of arbitrary models including new experimental ones. For extraction, Yao et al. simply rely on better upstream ranking, whereas our joint model provides a precise mathematical formulation of answer chunk quality as a function of both chunk and sentence relevance to the question. We observe a large increase in end-to-end extraction accuracy over the Yao et al. model in our experiments.

3 Approach

We first train separate probabilistic models for answer sentence ranking and answer extraction, for each of which we take an approach similar to that of existing models. Probabilities learned by the two task-specific models are then combined to construct our joint model. This section discusses the details of this two-step process.

3.1 Answer Sentence Ranking

Let the following logistic function represent the probability that a candidate answer sentence S(i) contains an answer to a question Q:

P(S^{(i)} \mid Q) = \frac{1}{1 + e^{-\theta_r^{\top} f_r(Q, S^{(i)})}}    (1)

where f_r(Q, S(i)) is a set of features each of which is a unique measure of semantic similarity between Q and S(i), and θ_r is the weight vector learned during model training. We describe our feature set for ranking in Section 4.

Given P(S(i) | Q) values for i ∈ {1, ..., N}, ranking is straightforward: rank_Q(S(i)) < rank_Q(S(j)) iff P(S(i) | Q) > P(S(j) | Q). Note that a smaller numeric value represents a higher rank.

3.2 Answer Extraction

We follow the framework in Algorithm 1 for answer extraction. Below we describe our implementation of the generic steps:

1. Step 2: We adopt the strategy of Severyn and Moschitti (2013) of extracting only the NP chunks, for which we use a regexp chunker.

2. Step 4: The quality φ(c) of a candidate chunk c in S(i) is given by the following logistic function:

   \phi(c) = P(c \mid Q, S^{(i)}) = \frac{1}{1 + e^{-\theta_e^{\top} f_e(Q, S^{(i)}, c)}}    (2)

   where f_e(Q, S(i), c) is the feature set for chunk c relative to Q, and θ_e is the weight vector learned during model training. Our feature set for extraction is described in Section 5.

3. Step 6: Given an existing set of (possibly empty) chunk groups {G_C(1), ..., G_C(M)}, a new chunk c is added to group G_C(i) if (1) all content words in c are in at least one member of G_C(i), or (2) there exists a member of G_C(i) all of whose content words are in c. If no such group is found, a new group G_C(M+1) is created with c as its only member (a sketch of this test follows the list).

4. Step 10: We extract the longest chunk in G_C(*) as the best answer c.

Additionally, we retain only the top t of all the answer candidates extracted in step 5 to prevent propagation of noisy chunks to later steps. The value of t is set using the Wang et al. (2007) DEV set.
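
A sketch of the step 6 grouping test above, with chunks reduced to their content-word sets (computing those sets is assumed to happen upstream):

```python
def add_to_groups(chunk_words, groups):
    """Assign a chunk to an existing group or open a new one (step 6).

    chunk_words: set of content words of the new chunk.
    groups: list of groups, each a list of members' content-word sets.
    """
    for g in groups:
        # Condition (1): the chunk's content words are all contained in
        # some member; condition (2): some member's content words are
        # all contained in the chunk.
        if any(chunk_words <= m or m <= chunk_words for m in g):
            g.append(chunk_words)
            return
    groups.append([chunk_words])  # new group with c as its only member

groups = []
add_to_groups({"july", "1995"}, groups)
add_to_groups({"1995"}, groups)          # joins the first group
add_to_groups({"astronomers"}, groups)   # opens a second group
print(len(groups))  # 2
```
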
3.3 Joint Ranking and Extraction

The primary goal of the joint model is to facilitate the application of both chunk-level and sentence-level features to ranking as well as extraction. To that end, it first computes the joint probability that (1) S(i) contains an answer to Q, and (2) c ∈ C(i) is a correct answer chunk:

P(S^{(i)}, c \mid Q) = P(S^{(i)} \mid Q) \times P(c \mid Q, S^{(i)})    (3)

where the two terms on the right hand side are given by Equations (1) and (2), respectively. Both ranking and extraction are then driven by task-appropriate application of this common quantity.

Given Equation (3), the condition for ranking is redefined as follows: rank_Q(S(i)) < rank_Q(S(j)) iff max_{c ∈ C(i)} P(S(i), c | Q) > max_{c ∈ C(j)} P(S(j), c | Q). This new condition rewards an S(i) that not only is highly semantically similar to Q, but also contains a chunk c which is a likely answer to Q.

For extraction, the joint probability in Equation (3) replaces the conditional in Equation (2) for step 4 of Algorithm 1: φ(c) = P(S(i), c | Q). Again, this new definition of φ(c) rewards a chunk c that is (1) type-compatible with Q, and (2) well-supported by the content of the containing sentence S(i).

Equation (3) assigns equal weight to the ranking and the extraction model. To learn these weights from data, we implement a variation of the joint model that employs a second-level regressor:

P(S^{(i)}, c \mid Q) = \frac{1}{1 + e^{-\theta_2^{\top} f_2(Q, S^{(i)}, c)}}    (4)

where the feature vector f_2 consists of the two probabilities in Equations (1) and (2), and θ_2 is the weight vector. While P(S(i), c | Q) is computed using a different formula in this model, the methods for ranking and extraction based on it remain the same as above.

From here on, we will refer to the models in Sections 3.1 and 3.2 as our standalone ranking and extraction models, respectively, and the models in this section as the joint probabilistic model (Equation (3)) and the stacked (regression) model (Equation (4)).

3.4 Learning

The standalone ranking model is trained using the 0/1 labels assigned to (Q, S(i)) pairs in the Wang et al. (2007) dataset. For standalone extraction, we use for training the gold chunk annotations C_g(i) associated with (Q, S(i)) pairs: a candidate NP chunk in S(i) is considered a positive example for (Q, S(i)) iff it contains C_g(i) and S(i) is an actual answer sentence. For both ranking and extraction, the corresponding weight vector θ is learned by minimizing the following L2-regularized loss function:

J(\theta) = -\frac{1}{T} \sum_{i=1}^{T} \left[ y^{(i)} \log(p^{(i)}) + (1 - y^{(i)}) \log(1 - p^{(i)}) \right] + \lambda \|\theta\|^2

where T is the number of training examples, y(i) is the gold label for example i, and p(i) is the model-predicted probability of example i being a positive example (given by Equations (1) and (2)). Learning of θ_2 for the stacked model works in a similar fashion, where level 1 predictions for training QA pairs (according to Equations (1) and (2)) serve as feature vectors.
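
To make the task-specific applications of Equation (3) concrete, here is a small sketch with hypothetical probabilities for the Table 1 candidates (the grouping steps of Algorithm 1 are omitted for brevity):

```python
def joint_probability(p_sent, p_chunk):
    """Equation (3): P(S, c | Q) = P(S | Q) * P(c | Q, S)."""
    return p_sent * p_chunk

# Hypothetical model outputs: P(S | Q) per sentence (Equation (1))
# and P(c | Q, S) per chunk (Equation (2)).
candidates = [
    (0.9, {"July 22, 1995": 0.8, "US astronomers": 0.1}),  # S(1)
    (0.8, {"China": 0.3}),                                 # S(2)
    (0.2, {"the year 1666": 0.7}),                         # S(3)
]

# Ranking: order sentences by max over their chunks of P(S, c | Q).
ranking = sorted(
    range(len(candidates)),
    key=lambda i: -max(joint_probability(candidates[i][0], p)
                       for p in candidates[i][1].values()))
print(ranking)  # [0, 1, 2]: S(1) ranked first

# Extraction: phi(c) in step 4 of Algorithm 1 becomes P(S, c | Q).
best = max((joint_probability(p_s, p_c), c)
           for p_s, chunks in candidates
           for c, p_c in chunks.items())
print(best[1])  # 'July 22, 1995', with score 0.9 * 0.8 = 0.72
```
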
4 Answer Sentence Ranking Features

Instead of reinventing similarity features for our QA ranker, we derive our feature set from the winning system (Sultan et al., 2015) at the SemEval 2015 Semantic Textual Similarity (STS) task (Agirre et al., 2015). STS is an annually held SemEval competition, where systems output real-valued similarity scores for input sentence pairs. Hundreds of systems have been evaluated over the past few years (Agirre et al., 2012; Agirre et al., 2013; Agirre et al., 2014; Agirre et al., 2015); our chosen system was shown to outperform all other systems from all years of SemEval STS (Sultan et al., 2015).

In order to compute the degree of semantic similarity between a question Q and a candidate answer sentence S(i), we draw features from two sources: (1) lexical alignment between Q and S(i), and (2) vector representations of Q and S(i), derived from their word embeddings. While the original STS system employs ridge regression, we use these features within a logistic regression model for QA ranking.

4.1 Alignment Features

We align related words in Q and S(i) using a monolingual aligner originally proposed by Sultan et al. (2014). Here we give a brief description of our implementation, which employs arguably more principled methods to solve a set of subproblems. See the original article for further details.

The aligner computes for each word pair across Q and S(i) a semantic similarity score simW ∈ [0, 1] using PPDB, a large database of lexical paraphrases developed using bilingual pivoting (Ganitkevitch et al., 2013). Specifically, it allows three different levels of similarity: 1 if the two words or their lemmas are identical, a value ppdbSim ∈ (0, 1) if the word pair is present in PPDB (the XXXL database; http://www.cis.upenn.edu/~ccb/ppdb/), and 0 otherwise.

It also computes the degree of similarity simC between the two words' contexts in their respective sentences. This similarity is computed as the sum of word similarities in two different types of contexts: (1) a dependency neighborhood of size 2 (i.e., parents, grandparents, children and grandchildren), and (2) a surface-form neighborhood of size 3 (i.e., 3 words to the left and 3 words to the right). Stopwords are skipped during neighbor selection. Unlike the Sultan et al. (2014) aligner, which allows a single neighbor word to be matched to multiple similar words in the other sentence, we match neighbors using a max-weighted bipartite matching algorithm, where word similarities serve as edge weights.

Every word pair across Q and S(i) receives a final weight given by w · simW + (1 − w) · simC, where w ∈ [0, 1]. While Sultan et al. use a greedy best-first algorithm to align words based on these weights, we use them as edge weights in a max-weighted bipartite matching of word pairs (details follow).

We adopt the strategy of the original aligner of starting with high-precision alignments and increasing the recall in later steps. To this end, we align in the following order: (1) identical word sequences with at least one content word, (2) named entities, (3) content words, and (4) stopwords. Following the original aligner, no additional context matching is performed in step 1 since a sequence itself provides contextual evidence for its tokens. For each of steps 2–4, words/entities of the corresponding type are aligned using max-weighted bipartite matching as described above (multiword named entities are considered single units in step 2); other word types and already aligned words are discarded. The values of w and ppdbSim are derived using a grid search on an alignment dataset (Brockett, 2007).

Given aligned words in the QA pair, our first feature computes the proportion of aligned content words in Q and S(i), combined:

\mathrm{sim}_A(Q, S^{(i)}) = \frac{n_{ac}(Q) + n_{ac}(S^{(i)})}{n_c(Q) + n_c(S^{(i)})}

where n_ac(·) and n_c(·) represent the number of aligned content words and the total number of content words in a sentence, respectively.

S(i) can be arbitrarily long and still contain an answer to Q. In the above similarity measure, longer answer sentences are penalized due to a larger number of unaligned words. To counter this phenomenon, we add a measure of coverage of Q by S(i) to the original feature set of Sultan et al. (2015):

\mathrm{cov}_A(Q, S^{(i)}) = \frac{n_{ac}(Q)}{n_c(Q)}

4.2 A Semantic Vector Feature

Neural word embeddings (Mikolov et al., 2013; Baroni et al., 2014; Pennington et al., 2014) have been highly successful as distributional word representations in the recent past. We utilize the 400-dimensional word embeddings developed by Baroni et al. (2014) (available at http://clic.cimec.unitn.it/composes/semantic-vectors.html) to construct sentence-level embeddings for Q and S(i), which we then compare to compute a similarity score.

To construct the vector representation V_S of a given sentence S, we first extract the content word lemmas C_S = {c_S(1), ..., c_S(M)} in S. The vectors representing these lemmas are then added to generate the sentence vector:

V_S = \sum_{i=1}^{M} V_{c_S^{(i)}}

Finally, a similarity measure for Q and S(i) is derived by taking the cosine similarity between their vector representations:

\mathrm{sim}_E(Q, S^{(i)}) = \frac{V_Q \cdot V_{S^{(i)}}}{|V_Q| \, |V_{S^{(i)}}|}

This simple bag-of-words model was found to augment the alignment-based feature well in the evaluations reported by Sultan et al. (2015).

sim_A, cov_A and sim_E constitute our final feature set. As we show in Section 6, this small feature set outperforms the current state of the art in answer sentence ranking.
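
For the word- and neighbor-matching steps, a minimal sketch of max-weighted bipartite matching using SciPy's assignment solver (our choice of implementation; the paper specifies only the matching formulation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def max_weight_matching(weights):
    """One-to-one word alignment maximizing total edge weight.

    weights: |Q| x |S| matrix; weights[i][j] is the combined weight
    w * simW + (1 - w) * simC for the word pair (i, j).
    """
    w = np.asarray(weights, dtype=float)
    rows, cols = linear_sum_assignment(w, maximize=True)
    # Keep only pairs that contribute positive weight.
    return [(i, j) for i, j in zip(rows, cols) if w[i, j] > 0]

print(max_weight_matching([[0.9, 0.0],
                           [0.4, 0.4]]))  # [(0, 0), (1, 1)]
```
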

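Given the alignment and an embedding lookup, the three ranking features reduce to a few lines; a sketch under the assumption that content-word counts and lemma lists are precomputed (all names here are illustrative):

```python
import numpy as np

def sim_a(n_aligned_q, n_aligned_s, n_content_q, n_content_s):
    """Proportion of aligned content words in Q and S combined."""
    return (n_aligned_q + n_aligned_s) / (n_content_q + n_content_s)

def cov_a(n_aligned_q, n_content_q):
    """Coverage of Q by S: aligned fraction of Q's content words."""
    return n_aligned_q / n_content_q

def sim_e(q_lemmas, s_lemmas, embed):
    """Cosine similarity of added content-lemma embeddings.

    embed: mapping from a lemma to its word vector, e.g., the
    400-dimensional Baroni et al. (2014) vectors.
    """
    v_q = np.sum([embed[l] for l in q_lemmas], axis=0)
    v_s = np.sum([embed[l] for l in s_lemmas], axis=0)
    return float(v_q @ v_s / (np.linalg.norm(v_q) * np.linalg.norm(v_s)))
```
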
5 Answer Extraction Features

As mentioned in Section 3.2, we consider only NP chunks as answer candidates for extraction. Our chunk features can be categorized into two broad groups, which we describe in this section. For the following discussion, let (Q, S(i), c) be our question, answer sentence, answer chunk triple.

5.1 Question-Independent Features

These features represent properties of c independent of the nature of Q. For example, our first two features fire if all content words in c are present in Q or align to words in Q. Such chunks rarely contain an answer, regardless of the type of Q.

Yao et al. (2013a) report an observation that answer chunks often appear close to aligned content words of specific types in S(i). To model this phenomenon, we adopt their features specifying the distance of c from the nearest aligned content word w_a in S(i) and the POS/DEP/NER tags of w_a. In addition, to encode the total amount of local evidence present for c, we employ the proportions of aligned content words in its dependency (size = 2) and surface (size = 3) contexts in S(i).

5.2 Features Containing the Question Type

These features are of the form "question-type|x", where x can be an elementary (i.e., unit) or composite feature. The rationale is that certain features are informative primarily in the context of certain question types (e.g., a likely answer to a when question is a chunk containing the NER tag DATE).

Headword Features. We extract the headword of c and use its POS/DEP/NER tags as features (appended to the question type). A headword in the subject position of S(i) or with PERSON as its NER tag, for example, is a likely answer to a who question.

Question Focus. The question focus word represents the entity about which the question is being asked. For example, in "What is the largest country in the world?", the focus word is "country". For question types like what and which, properties of the question focus largely determine the nature of the answer. In the above example, the focus word indicates that GPE is a likely NER tag for the answer.

We extract the question focus using a rule-based system originally designed for a different application, under the assumption that a question could span multiple sentences. The rule-based system is loosely inspired by the work of Lally et al. (2012), from which it differs radically because the questions in the Jeopardy! game are expressed as answers. The focus extractor first determines the question word or words, which is then used in conjunction with the parse tree to decide whether the question word itself or some other word in the sentence is the actual focus.

We pair the headword POS/DEP/NER tags with the focus word and its POS/NER tags, and add each such pair (appended to the question type) to our feature set. There are nine features here; examples include question-type|question-focus-word|headword-pos-tag and question-type|question-focus-ner-tag|headword-ner-tag.

We also employ the true/false labels of the following propositions as features (in conjunction with the question type): (1) the question focus word is in c, (2) the question focus POS tag is in the POS tags of c, and (3) the question focus NER tag is of the form x or x_DESC, and x is in the NER tags of c, for some x (e.g., GPE).

Chunk Tags. In many cases, it is not the headword of c which is the answer; for example, in Q: "How many states are there in the US?" and c: "50 states", the headword of c is "states". To extend our unit of attention from the headword to the entire chunk, we first construct vocabularies of POS and NER tags, V_pos and V_ner, from training data. For each possible tag in V_pos, we then use the presence/absence of that tag in the POS tag sequence for c as a feature (in conjunction with the question type). We repeat the process for V_ner. For the above c, for instance, an informative feature which is likely to fire is: "question-type = how-many | the NER tags of c include CARDINAL".

Partial Alignment. For some question types, part of a correct answer chunk is often aligned to a question word (e.g., Q: "How many players are on the field during a soccer game?", c: "22 players"). To inform our model of such occurrences, we employ two features, the true/false labels of the following propositions: (1) c is partially aligned, (2) c is not aligned at all (each in conjunction with the question type).
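
As an illustration of the "question-type|x" templates, a sketch that materializes two of them as sparse binary features (the feature-name strings are our own; the paper does not prescribe a format):

```python
def question_type_features(q_type, chunk_ner_tags, focus_word, chunk_words):
    """Binary features conjoined with the question type (Section 5.2)."""
    feats = set()
    # Chunk-tag template: one feature per NER tag present in the chunk.
    for tag in chunk_ner_tags:
        feats.add(f"{q_type}|chunk-ner-includes-{tag}")
    # Question-focus template: does the focus word occur in the chunk?
    feats.add(f"{q_type}|focus-word-in-chunk={focus_word in chunk_words}")
    return feats

print(sorted(question_type_features(
    "how-many", {"CARDINAL"}, "states", {"50", "states"})))
# ['how-many|chunk-ner-includes-CARDINAL',
#  'how-many|focus-word-in-chunk=True']
```
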
6 Experiments

6.1 Data

The Wang et al. (2007) corpus is created from Text REtrieval Conference (TREC) 8–13 QA data. It consists of a set of factoid questions, and for each question, a set of candidate answer sentences. Each answer candidate is automatically drawn from a larger document based on two selection criteria: (1) a non-zero content word overlap with the question, or (2) a match with the gold regexp answer pattern for the question (training only).
TRAIN pairs are drawn from TREC 8–12; DEV and TEST pairs are drawn from TREC 13. Details of the TRAIN/DEV/TEST split are given in Table 2.

Dataset     #Questions   #QA Pairs   %Positive
TRAIN-ALL   1,229        53,417      12.0
TRAIN       94           4,718       7.4
DEV         82           1,148       19.3
TEST        100          1,517       18.7

Table 2: Summary of the Wang et al. (2007) corpus.

TRAIN-ALL is a large set of automatically judged (thus noisy) QA pairs: a sentence is considered a positive example if it matches the gold answer pattern for the corresponding question. TRAIN is a much smaller subset of TRAIN-ALL, containing pairs that are manually corrected for errors. Manual judgment is produced for DEV and TEST pairs, too.

For answer extraction, Yao et al. (2013a) add to each QA pair the correct answer chunk(s). The gold TREC patterns are used to first identify relevant chunks in each answer sentence. TRAIN, DEV and TEST are then manually corrected for errors.

The Wang et al. (2007) dataset also comes with POS/DEP/NER tags for each sentence. They use the MXPOST tagger (Ratnaparkhi, 1996) for POS tagging, the MSTParser (McDonald et al., 2005) to generate typed dependency trees, and the BBN Identifinder (Bikel et al., 1999) for NER tagging. Although we have access to a state-of-the-art information pipeline that produces better tags, this paper aims to study the effect of the proposed models and of our features on system performance, rather than of additional variables; therefore, to support comparison with prior work, we rely on the tags provided with the dataset for all our experiments.

6.2 Answer Sentence Ranking

We adopt the standard evaluation procedure and metrics for QA rankers reported in the literature.

6.2.1 Evaluation Metrics

Our metrics for ranking are Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR). Here we define both in terms of simpler metrics.

Precision at K. Given a question Q and a set of candidate answer sentences {S(1), ..., S(N)}, let the output of a ranker be [R(1), ..., R(N)], so that each R(i) ∈ {S(1), ..., S(N)} and the predicted rank of R(i) is higher than the predicted rank of R(j) whenever i < j. The precision at K for Q is then the proportion of the top K sentences [R(1), ..., R(K)] that actually contain an answer to Q.

Average Precision and MAP. The average precision (AP) for Q is the average of the precision-at-K values computed at every rank K that holds a correct answer sentence. MAP is the mean AP over all questions.

Reciprocal Rank and MRR. The reciprocal rank (RR) for Q is 1/r, where r is the rank of the highest-ranked correct answer sentence. MRR is the mean RR over all questions.
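
These definitions translate directly into code; a short sketch (ours) for a single question, assuming the list contains at least one correct sentence (guaranteed by the setup below); MAP and MRR are then the means of these per-question values:

```python
def average_precision(labels):
    """AP for one ranked list; labels are 0/1 in predicted rank order
    (1 = the sentence contains an answer)."""
    hits, precisions = 0, []
    for k, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision at rank k
    return sum(precisions) / len(precisions)

def reciprocal_rank(labels):
    """1 / rank of the highest-ranked correct sentence."""
    return 1.0 / (labels.index(1) + 1)

ranked = [0, 1, 0, 1]                 # labels of one ranked list
print(average_precision(ranked))      # (1/2 + 2/4) / 2 = 0.5
print(reciprocal_rank(ranked))        # 1/2 = 0.5
```
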
6.2.2 Setup

Following common practice, questions with only correct or only incorrect candidate answer sentences are excluded from evaluation; most questions (>95%) are retained after this exclusion. We use the logistic regression implementation of Scikit-learn (Pedregosa et al., 2011) and use the Wang et al. (2007) DEV set to set C, the regularization strength parameter. The standard trec_eval script is used to generate all results.

6.2.3 Results

Table 3 shows performances of our ranking models and recent baseline systems on TEST. Our QA similarity features (i.e., the standalone ranker) outperform all baselines with both TRAIN and TRAIN-ALL, although the additional noisy examples in the latter are not found to improve results.

Model                           MAP%   MRR%
TRAIN
Shnarch (2013)                  68.60  75.40
Yih et al. (2013)               70.92  77.00
Yu et al. (2014)                70.58  78.00
Severyn & Moschitti (2015)      73.29  79.62
Our Standalone Model            76.05  83.99
Our Joint Probabilistic Model   81.59  89.09
Our Stacked Model               80.77  86.85
TRAIN-ALL
Yu et al. (2014)                71.13  78.46
Severyn & Moschitti (2015)      74.59  80.78
Our Standalone Model            75.68  83.09
Our Joint Probabilistic Model   84.95  91.95
Our Stacked Model               82.56  90.69

Table 3: Answer sentence ranking results.

More importantly, we get improvements of substantially larger magnitudes using our joint models: more than 10 MAP and MRR points over the state-of-the-art system of Severyn and Moschitti (2015) with TRAIN-ALL for the joint probabilistic model. Unlike the standalone model, the joint models also benefit from the additional noisy examples in TRAIN-ALL. These results support the central argument of this paper that joint modeling is a better approach to answer sentence ranking.

6.3 Answer Extraction

We follow the procedure reported in prior work (Yao et al., 2013a; Severyn and Moschitti, 2013) to evaluate the answer chunks extracted by the system.

6.3.1 Evaluation Metrics

Precision. Given a set of questions, the precision of an answer extraction system is the proportion of its extracted answers that are correct (i.e., match the corresponding gold regexp pattern).

Recall. Recall is the proportion of questions for which the system extracted a correct answer.

F1 Score. The F1 score is the harmonic mean of precision and recall. It captures the system's accuracy and coverage in a single metric.
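
A sketch (ours) of these three metrics; note that whenever a system produces an answer for every question, precision, recall and F1 coincide, which is consistent with the identical values in many rows of Table 4 below:

```python
def extraction_metrics(n_questions, n_answered, n_correct):
    """Section 6.3.1 metrics over a question set.

    n_questions: number of questions evaluated.
    n_answered:  questions for which the system extracted an answer.
    n_correct:   extracted answers matching the gold regexp pattern.
    """
    precision = n_correct / n_answered
    recall = n_correct / n_questions
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# E.g., 68 correct answers with one answer per question on the
# 89-question test set: P = R = F1 = 68/89, roughly 0.764.
print(extraction_metrics(89, 89, 68))
```
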
6.3.2 Setup

Following prior work, we (1) retain the 89 questions in the Wang et al. (2007) TEST set that have at least one correct answer, and (2) train only with chunks in correct answer sentences to avoid extreme bias towards false labels (both the standalone extraction model and stage 2 of the stacked model). As in ranking, we use Scikit-learn for logistic regression and set the regularization parameter C using DEV.

6.3.3 Results

Table 4 shows performances of our extraction models on the Wang et al. TEST set.

Model                           P%    R%    F1%
TRAIN
Yao et al. (2013a)              55.2  53.9  54.5
Severyn & Moschitti (2013)      66.2  66.2  66.2
Our Standalone Model            62.9  62.9  62.9
Our Joint Probabilistic Model   69.7  69.7  69.7
Our Stacked Model               62.9  62.9  62.9
TRAIN-ALL
Yao et al. (2013a)              63.6  62.9  63.3
Severyn & Moschitti (2013)      70.8  70.8  70.8
Our Standalone Model            70.8  70.8  70.8
Our Joint Probabilistic Model   76.4  76.4  76.4
Our Stacked Model               73.0  73.0  73.0

Table 4: Answer extraction results on the Wang et al. (2007) test set.

The joint probabilistic model demonstrates top performance for both TRAIN and TRAIN-ALL. With TRAIN-ALL, it correctly answers 68 of the 89 test questions (5 more than the previous best model of Severyn and Moschitti (2013)). The stacked model also performs well with the larger training set. Again, these results support the central claim of the paper that answer extraction can be made better through joint modeling.

Table 5 shows performances of our standalone and joint probabilistic models (trained on TRAIN-ALL) on different TEST question types.

Question Type   Count   ST      JP
what            37      51.4    56.8
when            19      100.0   100.0
where           11      100.0   90.9
who/whom        10      60.0    70.0
why             1       0.0     0.0
how many        9       77.8    100.0
how long        2       50.0    100.0

Table 5: F1% of the STandalone and the Joint Probabilistic extraction model across question types.

The joint model is the better of the two across types, achieving good results on all question types except what. A particularly challenging subtype of what questions are what be questions, answers to which often go beyond NP chunk boundaries. A human-extracted answer to the question "What is Muslim Brotherhood's goal?" in the Wang et al. (2007) corpus, for example, is "advocates turning Egypt into a strict Muslim state by political means." What in general is nevertheless the most difficult question type, since unlike questions like who or when, answers do not have strict categories (e.g., a fixed set of NER tags).

6.3.4 Qualitative Analysis

We closely examine QA pairs for which the joint probabilistic model extracts a correct answer chunk but the standalone model does not. Table 6 shows two such questions, with two candidate answer sentences for each; candidate answer chunks are bracketed.

Question: How many years was Jack Welch with GE?
1. "Six Sigma has galvanized our company with an intensity the likes of which I have never seen in my [40 years] at GE," said John Welch, chairman of General Electric. (ST: .517, JP: .113)
2. So fervent a proselytizer is Welch that GE has spent [three years] and more than $1 billion to convert all of its divisions to the Six Sigma faith. (ST: .714, JP: .090)

Question: What kind of ship is the Liberty Bell 7?
3. Newport plans to retrieve the recovery vessel first, then go after Liberty Bell 7, the only U.S. [manned spacecraft] lost after a successful mission. (ST: .838, JP: .278)
4. "It will be a big relief" once the capsule is aboard ship, [Curt Newport] said before setting sail Thursday. (ST: .388, JP: .003)

Table 6: Scores computed by the STandalone and the Joint Probabilistic model for candidate chunks (bracketed) in four Wang et al. (2007) test sentences. Joint model scores for non-answer chunks (rows 2 and 4) are much lower.

For the first question, only the sentence in row 1 contains an answer. The standalone model assigns a higher score to the non-answer chunk in row 2, but the use of sentence-level features enables the joint model to identify the more relevant chunk in row 1. Note that the joint model score, being a product of two probabilities, is always lower than the standalone model score. However, only the relative score matters in this case, as the chunk with the highest overall score is eventually selected for extraction.

For the second question, both models compute a lower score for the non-answer chunk "Curt Newport" than the answer chunk "manned spacecraft". However, the incorrect chunk appears in several candidate answer sentences (not shown here), resulting in a high overall score for the standalone model (Algorithm 1: steps 7 and 8). The joint model assigns a much lower score to each instance of this chunk due to weak sentence-level evidence, eventually resulting in the extraction of the correct chunk.

6.3.5 A Second Extraction Dataset

Yao et al. (2013c) report an extraction dataset containing 99 test questions, derived from the MIT109 test collection (Lin and Katz, 2006) of TREC pairs. Each question in this dataset has 10 candidate answer sentences. We compare the performance of our joint probabilistic model with that of their extraction model, which extracts answers from top candidate sentences identified by their coupled ranker (Section 2.3). (We compare with only their extraction model, as the larger ranking dataset is not available anymore. Precision and recall are reported at http://cs.jhu.edu/~xuchen/packages/jacana-ir-acl2013-data-results.tar.bz2.) Models are trained on their training set of 2,205 questions and 22,043 candidate QA pairs.

Model                           P%    R%    F1%
Yao et al. (2013c)              35.4  17.2  23.1
Our Joint Probabilistic Model   83.8  83.8  83.8

Table 7: Performances of two joint extraction models on the Yao et al. (2013c) test set.

As shown in Table 7, our model outperforms the Yao et al. model by a surprisingly large margin, correctly answering 83 of the 99 test questions.

Interestingly, our standalone model extracts six more correct answers in this dataset than the joint model.
A close examination reveals that in all six cases, this is caused by the presence of correct answer chunks in non-answer sentences. Table 8 shows an example, where the correct answer chunk "Steve Sloan" appears in all four candidate sentences, of which only the first is actually relevant to the question.

1. Another perk is getting to work with his son, Barry Van Dyke, who has a regular role as Detective [Steve Sloan] on "Diagnosis". (ST: .861, JP: .338)
2. This is only the third time in school history the Raiders have begun a season 6-0 and the first since 1976, when [Steve Sloan], in his second season as coach, led them to an 8-0 start and 10-2 overall record. (ST: .494, JP: .010)
3. He also represented several Alabama coaches, including Ray Perkins, Bill Curry, [Steve Sloan] and Wimp Sanderson. (ST: .334, JP: .007)
4. Bart Starr, Joe Namath, Ken Stabler, [Steve Sloan], Scott Hunter and Walter Lewis are but a few of the legends on the wall of the Crimson Tide quarterbacks coach. (ST: .334, JP: .009)

Table 8: Scores computed by the STandalone and the Joint Probabilistic model for NP chunks (bracketed) in four Yao et al. (2013c) test sentences for the question: Who is the detective on 'Diagnosis Murder'? The standalone model assigns high probabilities to non-answer chunks in the last three sentences, subsequently corrected by the joint model.

The standalone model assigns high scores to all four instances and as a result observes a high overall score for the chunk. The joint model, on the other hand, recognizes the false positives, and consequently observes a smaller overall score for the chunk. However, this desired behavior eventually results in a wrong extraction. These results have key implications for the evaluation of answer extraction systems: metrics that assess performance on individual QA pairs can enable finer-grained evaluation than what end-to-end extraction metrics offer.

7 Discussion

Our two-step approach to joint modeling, consisting of constructing separate models for ranking and extraction first and then coupling their predictions, offers at least two advantages. First, predictions from any given pair of ranking and extraction systems can be combined, since such systems must compute a score for a QA pair or an answer chunk in order to differentiate among candidates. Coupling of the ranking and extraction systems of Yao et al. (2013a) and Severyn and Moschitti (2013), for example, is straightforward within our framework. Second, this approach supports the use of task-appropriate training data for ranking and extraction, which can provide a key advantage. For example, while answer sentence ranking systems use both correct and incorrect candidate answer sentences for model training, existing answer extraction systems discard the latter in order to maintain a (relatively) balanced class distribution (Yao et al., 2013a; Severyn and Moschitti, 2013). Through the separation of the ranking and extraction models during training, our approach naturally supports such task-specific sampling of training data.

A potentially limiting factor in our extraction model is the assumption that answers are always expressed neatly in NP chunks. While models that make no such assumption exist (e.g., the CRF model of Yao et al. (2013a)), extraction of long answers (such as the one discussed in Section 6.3.3) is still difficult in practice due to their unconstrained nature.

8 Conclusions and Future Work

We present a joint model for the important QA tasks of answer sentence ranking and answer extraction. By exploiting the interconnected nature of the two tasks, our model demonstrates substantial performance improvements over previous best systems for both. Additionally, our ranking model applies recent advances in the computation of short text similarity to QA, providing stronger similarity features.

An obvious direction for future work is the inclusion of new features for each task. Answer sentence ranking, for example, can benefit from phrasal alignment and long-distance context representation. Answer extraction for what questions can be made better using a lexical answer type feature, or world knowledge (such as "blue is a color") derived from semantic networks like WordNet.
Our model also facilitates straightforward integration of features/predictions from other existing systems for both tasks, for example, the convolutional neural sentence model of Severyn and Moschitti (2015) for ranking. Finally, more sophisticated techniques are required for extraction of the final answer chunk based on individual chunk scores across QA pairs.

Acknowledgments

We thank the reviewers for their valuable comments and suggestions. We also thank Xuchen Yao and Aliaksei Severyn for clarification of their work.

References

Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. In Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 385–393, Montreal, Canada.

Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. *SEM 2013 Shared Task: Semantic Textual Similarity. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics, pages 32–43, Atlanta, Georgia, USA.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2014. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation, pages 81–91, Dublin, Ireland.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, and Janyce Wiebe. 2015. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. In Proceedings of the 9th International Workshop on Semantic Evaluation, pages 252–263, Denver, Colorado, USA.

Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't Count, Predict! A Systematic Comparison of Context-Counting vs. Context-Predicting Semantic Vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 238–247, Baltimore, Maryland, USA.

Daniel M. Bikel, Richard Schwartz, and Ralph M. Weischedel. 1999. An Algorithm that Learns What's in a Name. Machine Learning, 34(1-3):211–231.

Chris Brockett. 2007. Aligning the RTE 2006 Corpus. Technical Report MSR-TR-2007-77, Microsoft Research.

David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine, 31(3):59–79.

Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics, pages 758–764, Atlanta, Georgia, USA.

Michael Heilman and Noah A. Smith. 2010. Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions. In Proceedings of the 2010 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1011–1019, Los Angeles, California, USA.

Adam Lally, John M. Prager, Michael C. McCord, Branimir K. Boguraev, Siddharth Patwardhan, James Fan, Paul Fodor, and Jennifer Chu-Carroll. 2012. Question Analysis: How Watson Reads a Clue. IBM Journal of Research and Development, 56(3.4):2:1–2:14.

Jimmy Lin and Boris Katz. 2006. Building a Reusable Test Collection for Question Answering. Journal of the American Society for Information Science and Technology, 57(7):851–861.

Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online Large-Margin Training of Dependency Parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, Michigan, USA.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations Workshop, Scottsdale, Arizona, USA.
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, Doha, Qatar.
Adwait Ratnaparkhi. 1996. A Maximum Entropy Model for Part-of-Speech Tagging. In Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing, pages 133–142, Philadelphia, Pennsylvania, USA.

Aliaksei Severyn and Alessandro Moschitti. 2013. Automatic Feature Engineering for Answer Selection and Extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 458–467, Seattle, Washington, USA.

Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 373–382, Santiago, Chile.

Eyal Shnarch. 2013. Probabilistic Models for Lexical Inference. PhD thesis, Bar Ilan University.

Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence. Transactions of the Association for Computational Linguistics, 2:219–230.

Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2015. DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition. In Proceedings of the 9th International Workshop on Semantic Evaluation, pages 148–153, Denver, Colorado, USA.

Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 22–32, Prague, Czech Republic.

Mengqiu Wang and Christopher D. Manning. 2010. Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1164–1172, Beijing, China.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013a. Answer Extraction as Sequence Tagging with Tree Edit Distance. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 858–867, Atlanta, Georgia, USA.

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013b. Semi-Markov Phrase-Based Monolingual Alignment. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 590–600, Seattle, Washington, USA.

Xuchen Yao, Benjamin Van Durme, and Peter Clark. 2013c. Automatic Coupling of Answer Extraction and Information Retrieval. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 159–165, Sofia, Bulgaria.

Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak. 2013. Question Answering using Enhanced Lexical Semantic Models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1744–1753, Sofia, Bulgaria.

Lei Yu, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. 2014. Deep Learning for Answer Sentence Selection. In Proceedings of the Deep Learning and Representation Learning Workshop, NIPS 2014, Montréal, Canada.