Transactions of the Association for Computational Linguistics, vol. 4, pp. 113–125, 2016. Action Editor: Noah Smith.

Submission batch: 10/2015; Revision batch: 2/2016; Published 4/2016.

© 2016 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

A Joint Model for Answer Sentence Ranking and Answer Extraction

Md Arafat Sultan†, Vittorio Castelli‡, Radu Florian‡
† Institute of Cognitive Science and Department of Computer Science, University of Colorado, Boulder, CO
‡ IBM T.J. Watson Research Center, Yorktown Heights, NY
arafat.sultan@colorado.edu, vittorio@us.ibm.com, raduf@us.ibm.com

Abstract

Answer sentence ranking and answer extraction are two key challenges in question answering that have traditionally been treated in isolation, i.e., as independent tasks. In this article, we (1) explain how both tasks are related at their core by a common quantity, and (2) propose a simple and intuitive joint probabilistic model that addresses both via joint computation but task-specific application of that quantity. In our experiments with two TREC datasets, our joint model substantially outperforms state-of-the-art systems in both tasks.

1 Introduction

One of the original goals of AI was to build machines that can naturally interact with humans. Over time, the challenges became apparent and language processing emerged as one of AI's most puzzling areas. Nevertheless, major breakthroughs have still been made in several important tasks; with IBM's Watson (Ferrucci et al., 2010) significantly outperforming human champions in the quiz contest Jeopardy!, question answering (QA) is definitely one such task.

QA comes in various forms, each supporting specific kinds of user requirements. Consider a scenario where a system is given a question and a set of sentences, each of which may or may not contain an answer to that question. The goal of answer extraction is to extract a precise answer in the form of a short span of text in one or more of those sentences. In this form, QA meets users' immediate information needs. Answer sentence ranking, on the other hand, is the task of assigning a rank to each sentence so that the ones that are more likely to contain an answer are ranked higher. In this form, QA is similar to information retrieval and presents greater opportunities for further exploration and learning. In this article, we propose a novel approach to jointly solving these two well-studied yet open QA problems.

Most answer sentence ranking algorithms operate under the assumption that the degree of syntactic and/or semantic similarity between questions and answer sentences is a sufficiently strong predictor of answer sentence relevance (Wang et al., 2007; Yih et al., 2013; Yu et al., 2014; Severyn and Moschitti, 2015). On the other hand, answer extraction algorithms frequently assess candidate answer phrases based primarily on their own properties relative to the question (e.g., whether the question is a who question and the phrase refers to a person), making inadequate or no use of sentence-level evidence (Yao et al., 2013a; Severyn and Moschitti, 2013).

Both these assumptions, however, are simplistic and fail to capture the core requirements of the two tasks. Table 1 shows a question and three candidate answer sentences, only one of which ($S^{(1)}$) actually answers the question. Ranking models that rely solely on text similarity are highly likely to incorrectly assign similar ranks to $S^{(1)}$ and $S^{(2)}$. Such models would fail to utilize the key piece of evidence against $S^{(2)}$: it does not contain any temporal information, which is necessary to answer a when question. Similarly, an extraction model that relies only on the features of a candidate phrase might extract the temporal expression “the year 1666” in $S^{(3)}$ as an answer despite a clear lack of sentence-level evidence.

In view of the above, we propose a joint model
for answer sentence ranking and answer extraction that utilizes both sentence- and phrase-level evidence to solve each task. More concretely, we (1) design task-specific probabilistic models for ranking and extraction, exploiting features of candidate answer sentences and their phrases, respectively, and (2) combine the two models in a simple, intuitive step to build a joint probabilistic model for both tasks. This two-step approach facilitates construction of new joint models from any existing solutions to the two tasks. On a publicly available TREC dataset (Wang et al., 2007), our joint model improves ranking by over 10 MAP and MRR points over the current state of the art. It also outperforms state-of-the-art extraction systems on two TREC datasets (Wang et al., 2007; Yao et al., 2013c).

Q: When was the Hale Bopp comet discovered?
$S^{(1)}$: The comet was first spotted by Hale and Bopp, both US astronomers, on July 22, 1995.
$S^{(2)}$: Hale-Bopp, a large comet, was observed for the first time in China.
$S^{(3)}$: The law of gravity was discovered in the year 1666 by Sir Isaac Newton.
Table 1: A question and three candidate answer sentences.

2 Background

In this section, we provide a formal description of the two tasks and establish terminology that we follow in later sections. The Wang et al. (2007) dataset has been the benchmark for most recent work on the two tasks as well as our own. Therefore, we situate our description in the specific context of this dataset. We also discuss related prior work.

2.1 Answer Sentence Ranking

Given a question $Q$ and a set of candidate answer sentences $\{S^{(1)}, \ldots, S^{(N)}\}$, the goal in answer sentence ranking is to assign each $S^{(i)}$ an integer $\mathrm{rank}_Q(S^{(i)})$ so that for any pair $(i, j)$, $\mathrm{rank}_Q(S^{(i)})$ [...] $P(S^{(j)} \mid Q)$. Note that a smaller numeric value represents a higher rank.

3.2 Answer Extraction

We follow the framework in Algorithm 1 for answer extraction. Below we describe our implementation of the generic steps:

1. Step 2: We adopt the strategy of Severyn and Moschitti (2013) of extracting only the NP chunks, for which we use a regexp chunker.
2. Step 4: The quality $\phi(c)$ of a candidate chunk $c$ in $S^{(i)}$ is given by the following logistic function:
$$\phi(c) = P(c \mid Q, S^{(i)}) = \frac{1}{1 + e^{-\theta_e^{T} f_e(Q, S^{(i)}, c)}} \qquad (2)$$
where $f_e(Q, S^{(i)}, c)$ is the feature set for chunk $c$ relative to $Q$, and $\theta_e$ is the weight vector learned during model training. Our feature set for extraction is described in Section 5.
3. Step 6: Given an existing (possibly empty) set of chunk groups $\{G_C^{(1)}, \ldots, G_C^{(M)}\}$, a new chunk $c$ is added to group $G_C^{(i)}$ if (1) all content words in $c$ are in at least one member of $G_C^{(i)}$, or (2) there exists a member of $G_C^{(i)}$ all of whose content words are in $c$. If no such group is found, a new group $G_C^{(M+1)}$ is created with $c$ as its only member.
4. Step 10: We extract the longest chunk in $G_C^{(*)}$ as the best answer $C$.

Additionally, we retain only the top $t$ of all the answer candidates extracted in step 5 to prevent propagation of noisy chunks to later steps. The value of $t$ is set using the Wang et al. (2007) DEV set.

3.3 Joint Ranking and Extraction

The primary goal of the joint model is to facilitate the application of both chunk-level and sentence-level features to ranking as well as extraction. To that end, it first computes the joint probability that (1) $S^{(i)}$ contains an answer to $Q$, and (2) $c \in C^{(i)}$ is a correct answer chunk:
$$P(S^{(i)}, c \mid Q) = P(S^{(i)} \mid Q) \times P(c \mid Q, S^{(i)}) \qquad (3)$$
where the two terms on the right-hand side are given by Equations (1) and (2), respectively. Both ranking
and extraction are then driven by task-appropriate application of this common quantity.

Given Equation (3), the condition for ranking is redefined as follows:
$$\mathrm{rank}_Q(S^{(i)}) < \mathrm{rank}_Q(S^{(j)}) \iff \max_{c \in C^{(i)}} P(S^{(i)}, c \mid Q) > \max_{c \in C^{(j)}} P(S^{(j)}, c \mid Q)$$
This new condition rewards an $S^{(i)}$ that not only is highly semantically similar to $Q$, but also contains a chunk $c$ which is a likely answer to $Q$.

For extraction, the joint probability in Equation (3) replaces the conditional in Equation (2) for step 4 of Algorithm 1: $\phi(c) = P(S^{(i)}, c \mid Q)$. Again, this new definition of $\phi(c)$ rewards a chunk $c$ that is (1) type-compatible with $Q$, and (2) well-supported by the content of the containing sentence $S^{(i)}$.

Equation (3) assigns equal weight to the ranking and the extraction model. To learn these weights from data, we implement a variation of the joint model that employs a second-level regressor:
$$P(S^{(i)}, c \mid Q) = \frac{1}{1 + e^{-\theta_2^{T} f_2(Q, S^{(i)}, c)}} \qquad (4)$$
where the feature vector $f_2$ consists of the two probabilities in Equations (1) and (2), and $\theta_2$ is the weight vector. While $P(S^{(i)}, c \mid Q)$ is computed using a different formula in this model, the methods for ranking and extraction based on it remain the same as above.

From here on, we will refer to the models in Sections 3.1 and 3.2 as our standalone ranking and extraction models, respectively, and the models in this section as the joint probabilistic model (Equation (3)) and the stacked (regression) model (Equation (4)).

3.4 Learning

The standalone ranking model is trained using the 0/1 labels assigned to $(Q, S^{(i)})$ pairs in the Wang et al. (2007) dataset. For standalone extraction, we use for training the gold chunk annotations $C_g^{(i)}$ associated with $(Q, S^{(i)})$ pairs: a candidate NP chunk in $S^{(i)}$ is considered a positive example for $(Q, S^{(i)})$ iff it contains $C_g^{(i)}$ and $S^{(i)}$ is an actual answer sentence. For both ranking and extraction, the corresponding weight vector $\theta$ is learned by minimizing the following L2-regularized loss function:
$$J(\theta) = -\frac{1}{T} \sum_{i=1}^{T} \left[ y^{(i)} \log P^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - P^{(i)}\right) \right] + \lambda \lVert \theta \rVert^{2}$$
where $T$ is the number of training examples, $y^{(i)}$ is the gold label for example $i$, and $P^{(i)}$ is the model-predicted probability of example $i$ being a positive example (given by Equations (1) and (2)). Learning of $\theta_2$ for the stacked model works in a similar fashion, where level 1 predictions for training QA pairs (according to Equations (1) and (2)) serve as feature vectors.

4 Answer Sentence Ranking Features

Instead of reinventing similarity features for our QA ranker, we derive our feature set from the winning system (Sultan et al., 2015) at the SemEval 2015 Semantic Textual Similarity (STS) task (Agirre et al., 2015). STS is an annually held SemEval competition, where systems output real-valued similarity scores for input sentence pairs. Hundreds of systems have been evaluated over the past few years (Agirre et al., 2012; Agirre et al., 2013; Agirre et al., 2014; Agirre et al., 2015); our chosen system was shown to outperform all other systems from all years of SemEval STS (Sultan et al., 2015).

In order to compute the degree of semantic similarity between a question $Q$ and a candidate answer sentence $S^{(i)}$, we draw features from two sources: (1) lexical alignment between $Q$ and $S^{(i)}$, and (2) vector representations of $Q$ and $S^{(i)}$, derived from their word embeddings. While the original STS system employs ridge regression, we use these features within a logistic regression model for QA ranking.

4.1 Alignment Features

We align related words in $Q$ and $S^{(i)}$ using a monolingual aligner originally proposed by Sultan et al. (2014). Here we give a brief description of our implementation, which employs arguably more principled methods to solve a set of subproblems. See the original article for further details.

The aligner computes for each word pair across $Q$ and $S^{(i)}$ a semantic similarity score $\mathit{sim}_W \in [0, 1]$ using PPDB, a large database of lexical paraphrases developed using bilingual pivoting (Ganitkevitch et al., 2013). Specifically, it allows three different levels of similarity: 1 if the two words or their lemmas are identical, a value $\mathit{ppdbSim} \in (0, 1)$ if the word pair is present in PPDB (the XXXL database, available at http://www.cis.upenn.edu/~ccb/ppdb/), and 0
otherwise. It also computes the degree of similarity $\mathit{sim}_C$ between the two words' contexts in their respective sentences. This similarity is computed as the sum of word similarities in two different types of contexts: (1) a dependency neighborhood of size 2 (i.e., parents, grandparents, children and grandchildren), and (2) a surface-form neighborhood of size 3 (i.e., 3 words to the left and 3 words to the right). Stop words are skipped during neighbor selection. Unlike the Sultan et al. (2014) aligner, which allows a single neighbor word to be matched to multiple similar words in the other sentence, we match neighbors using a max-weighted bipartite matching algorithm, where word similarities serve as edge weights.

Every word pair across $Q$ and $S^{(i)}$ receives a final weight given by $w \cdot \mathit{sim}_W + (1 - w) \cdot \mathit{sim}_C$, where $w \in [0, 1]$. While Sultan et al. use a greedy best-first algorithm to align words based on these weights, we use them as edge weights in a max-weighted bipartite matching of word pairs (details follow).

We adopt the strategy of the original aligner of starting with high-precision alignments and increasing the recall in later steps. To this end, we align in the following order: (1) identical word sequences with at least one content word, (2) named entities, (3) content words, and (4) stop words. Following the original aligner, no additional context matching is performed in step 1, since a sequence itself provides contextual evidence for its tokens. For each of steps 2–4, words/entities of the corresponding type are aligned using max-weighted bipartite matching as described above (multiword named entities are considered single units in step 2); other word types and already aligned words are discarded. The values of $w$ and $\mathit{ppdbSim}$ are derived using a grid search on an alignment dataset (Brockett, 2007).

Given aligned words in the QA pair, our first feature computes the proportion of aligned content words in $Q$ and $S^{(i)}$, combined:
$$\mathit{sim}_A(Q, S^{(i)}) = \frac{n_{ac}(Q) + n_{ac}(S^{(i)})}{n_c(Q) + n_c(S^{(i)})}$$
where $n_{ac}(\cdot)$ and $n_c(\cdot)$ represent the number of aligned content words and the total number of content words in a sentence, respectively.

$S^{(i)}$ can be arbitrarily long and still contain an answer to $Q$. In the above similarity measure, longer answer sentences are penalized due to a larger number of unaligned words. To counter this phenomenon, we add a measure of coverage of $Q$ by $S^{(i)}$ to the original feature set of Sultan et al. (2015):
$$\mathit{cov}_A(Q, S^{(i)}) = \frac{n_{ac}(Q)}{n_c(Q)}$$

4.2 A Semantic Vector Feature

Neural word embeddings (Mikolov et al., 2013; Baroni et al., 2014; Pennington et al., 2014) have been highly successful as distributional word representations in the recent past. We utilize the 400-dimensional word embeddings developed by Baroni et al. (2014), available at http://clic.cimec.unitn.it/composes/semantic-vectors.html, to construct sentence-level embeddings for $Q$ and $S^{(i)}$, which we then compare to compute a similarity score.

To construct the vector representation $V_S$ of a given sentence $S$, we first extract the content word lemmas $C_S = \{C_S^{(1)}, \ldots, C_S^{(M)}\}$ in $S$. The vectors representing these lemmas are then added to generate the sentence vector:
$$V_S = \sum_{i=1}^{M} V_{C_S^{(i)}}$$
Finally, a similarity measure for $Q$ and $S^{(i)}$ is derived by taking the cosine similarity between their vector representations:
$$\mathit{sim}_E(Q, S^{(i)}) = \frac{V_Q \cdot V_{S^{(i)}}}{|V_Q|\,|V_{S^{(i)}}|}$$
This simple bag-of-words model was found to augment the alignment-based feature well in the evaluations reported by Sultan et al. (2015).

$\mathit{sim}_A$, $\mathit{cov}_A$ and $\mathit{sim}_E$ constitute our final feature set. As we show in Section 6, this small feature set outperforms the current state of the art in answer sentence ranking.

5 Answer Extraction Features

As mentioned in Section 3.2, we consider only NP chunks as answer candidates for extraction. Our chunk features can be categorized into two broad groups, which we describe in this section. For the following discussion, let $(Q, S^{(i)}, c)$ be our question, answer sentence, answer chunk triple.
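Once the aligner's counts and the word embeddings are available, the three ranking features of Section 4 reduce to a few lines of arithmetic. The following is a minimal illustrative sketch, not the authors' implementation. It assumes that the counts $n_{ac}(\cdot)$ and $n_c(\cdot)$ have already been produced by the aligner. Embeddings are represented as a plain dictionary from lemma to a list of floats, a hypothetical stand-in for the 400-dimensional Baroni et al. (2014) vectors. Lemmas without a vector are skipped, which is an assumption; the text does not say how they are handled. The function names `sim_a`, `cov_a`, `sentence_vector` and `sim_e` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def sim_a(n_ac_q: int, n_c_q: int, n_ac_s: int, n_c_s: int) -> float:
    """simA: aligned content words in Q and S over all content words in Q and S."""
    total = n_c_q + n_c_s
    return (n_ac_q + n_ac_s) / total if total else 0.0


def cov_a(n_ac_q: int, n_c_q: int) -> float:
    """covA: fraction of Q's content words that are aligned, i.e. coverage of Q by S."""
    return n_ac_q / n_c_q if n_c_q else 0.0


def sentence_vector(lemmas: Sequence[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """V_S: element-wise sum of the embeddings of a sentence's content-word lemmas.

    Assumption (not stated in the text): lemmas without an embedding are skipped.
    """
    vec = [0.0] * dim
    for lemma in lemmas:
        for k, x in enumerate(embeddings.get(lemma, ())):
            vec[k] += x
    return vec


def sim_e(v_q: Sequence[float], v_s: Sequence[float]) -> float:
    """simE: cosine similarity of two sentence vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(v_q, v_s))
    norm = math.sqrt(sum(a * a for a in v_q)) * math.sqrt(sum(b * b for b in v_s))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Q: 4 content words, 3 aligned; S: 10 content words, 3 aligned.
    # simA is dragged down by the unaligned words of the long S; covA is not.
    print(round(sim_a(3, 4, 3, 10), 3), cov_a(3, 4))  # prints: 0.429 0.75
    # Toy 3-dimensional embeddings with made-up values.
    emb = {"comet": [1.0, 0.0, 0.0], "discover": [0.0, 1.0, 0.0], "spot": [0.0, 0.9, 0.1]}
    v_q = sentence_vector(["comet", "discover"], emb, dim=3)
    v_s = sentence_vector(["comet", "spot", "astronomer"], emb, dim=3)  # "astronomer" has no vector
    print(round(sim_e(v_q, v_s), 3))  # prints: 0.996
```

The toy numbers show why $\mathit{cov}_A$ was added. The same three aligned question words give $\mathit{cov}_A = 0.75$ however long $S^{(i)}$ is, whereas $\mathit{sim}_A$ keeps falling as unaligned words are added to $S^{(i)}$.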

5.1 Question-Independent Features

These features represent properties of $c$ independent of the nature of $Q$. For example, our first two features fire if all content words in $c$ are present in $Q$ or align to words in $Q$. Such chunks rarely contain an answer, regardless of the type of $Q$.

Yao et al. (2013a) observe that answer chunks often appear close to aligned content words of specific types in $S^{(i)}$. To model this phenomenon, we adopt their features specifying the distance of $c$ from the nearest aligned content word $w_a$ in $S^{(i)}$ and the POS/DEP/NER tags of $w_a$. In addition, to encode the total amount of local evidence present for $c$, we employ the proportions of aligned content words in its dependency (size = 2) and surface (size = 3) contexts in $S^{(i)}$.

5.2 Features Containing the Question Type

These features are of the form “question-type|x”, where x can be an elementary (i.e., unit) or composite feature. The rationale is that certain features are informative primarily in the context of certain question types (e.g., a likely answer to a when question is a chunk containing the NER tag DATE).

Headword Features. We extract the headword of $c$ and use its POS/DEP/NER tags as features (appended to the question type). A headword in the subject position of $S^{(i)}$ or with PERSON as its NER tag, for example, is a likely answer to a who question.

Question Focus. The question focus word represents the entity about which the question is being asked. For example, in “What is the largest country in the world?”, the focus word is “country”. For question types like what and which, properties of the question focus largely determine the nature of the answer. In the above example, the focus word indicates that GPE is a likely NER tag for the answer.

We extract the question focus using a rule-based system originally designed for a different application, under the assumption that a question could span multiple sentences. The rule-based system is loosely inspired by the work of Lally et al. (2012), from which it differs radically because the questions in the Jeopardy! game are expressed as answers. The focus extractor first determines the question word or words, which are then used in conjunction with the parse tree to decide whether the question word itself or some other word in the sentence is the actual focus.

We pair the headword POS/DEP/NER tags with the focus word and its POS/NER tags, and add each such pair (appended to the question type) to our feature set. There are nine features here; examples include question-type|question-focus-word|headword-pos-tag and question-type|question-focus-ner-tag|headword-ner-tag.

We also employ the true/false labels of the following propositions as features (in conjunction with the question type): (1) the question focus word is in $c$, (2) the question focus POS tag is in the POS tags of $c$, and (3) the question focus NER tag is of the form x or xDESC, and x is in the NER tags of $c$, for some x (e.g., GPE).

Chunk Tags. In many cases, it is not the headword of $c$ which is the answer; for example, in Q: “How many states are there in the US?” and c: “50 states”, the headword of $c$ is “states”. To extend our unit of attention from the headword to the entire chunk, we first construct vocabularies of POS and NER tags, $V_{pos}$ and $V_{ner}$, from training data. For each possible tag in $V_{pos}$, we then use the presence/absence of that tag in the POS tag sequence for $c$ as a feature (in conjunction with the question type). We repeat the process for $V_{ner}$. For the above $c$, for instance, an informative feature which is likely to fire is: “question-type=how-many | the NER tags of c include CARDINAL”.

Partial Alignment. For some question types, part of a correct answer chunk is often aligned to a question word (e.g., Q: “How many players are on the field during a soccer game?”, c: “22 players”). To inform our model of such occurrences, we employ two features, namely the true/false labels of the following propositions: (1) $c$ is partially aligned, (2) $c$ is not aligned at all (each in conjunction with the question type).

6 Experiments

6.1 Data

The Wang et al. (2007) corpus is created from Text REtrieval Conference (TREC) 8–13 QA data. It consists of a set of factoid questions and, for each question, a set of candidate answer sentences. Each answer candidate is automatically drawn from a larger document based on two selection criteria: (1) a non-zero content word overlap with the question, or (2)
a match with the gold regexp answer pattern for the question (training only).

Dataset   | # Questions | # QA Pairs | % Positive
TRAIN-ALL | 1,229       | 53,417     | 12.0
TRAIN     | 94          | 4,718      | 7.4
DEV       | 82          | 1,148      | 19.3
TEST      | 100         | 1,517      | 18.7
Table 2: Summary of the Wang et al. (2007) corpus.

TRAIN pairs are drawn from TREC 8–12; DEV and TEST pairs are drawn from TREC 13. Details of the TRAIN/DEV/TEST split are given in Table 2. TRAIN-ALL is a large set of automatically judged (thus noisy) QA pairs: a sentence is considered a positive example if it matches the gold answer pattern for the corresponding question. TRAIN is a much smaller subset of TRAIN-ALL, containing pairs that are manually corrected for errors. Manual judgment is produced for DEV and TEST pairs, too.

For answer extraction, Yao et al. (2013a) add to each QA pair the correct answer chunk(s). The gold TREC patterns are used to first identify relevant chunks in each answer sentence. TRAIN, DEV and TEST are then manually corrected for errors.

The Wang et al. (2007) dataset also comes with POS/DEP/NER tags for each sentence. They use the MXPOST tagger (Ratnaparkhi, 1996) for POS tagging, the MSTParser (McDonald et al., 2005) to generate typed dependency trees, and the BBN Identifinder (Bikel et al., 1999) for NER tagging. Although we have access to a state-of-the-art information pipeline that produces better tags, this paper aims to study the effect of the proposed models and of our features on system performance, rather than on additional variables; therefore, to support comparison with prior work, we rely on the tags provided with the dataset for all our experiments.

6.2 Answer Sentence Ranking

We adopt the standard evaluation procedure and metrics for QA rankers reported in the literature.

6.2.1 Evaluation Metrics

Our metrics for ranking are Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR). Here we define both in terms of simpler metrics.

Precision at K. Given a question $Q$ and a set of candidate answer sentences $\{S^{(1)}, \ldots, S^{(N)}\}$, let the output of a ranker be $[R^{(1)}, \ldots, R^{(N)}]$, so that each $R^{(i)} \in \{S^{(1)}, \ldots, S^{(N)}\}$ and the predicted rank of $R^{(i)}$ is higher than the predicted rank of $R^{(j)}$ whenever $i$ [...] (>95%) are retained after this exclusion.

We use the logistic regression implementation of Scikit-learn (Pedregosa et al., 2011) and use the Wang et al. (2007) DEV set to set $C$, the inverse regularization strength parameter. The standard trec_eval script is used to generate all results.

6.2.3 Results

Table 3 shows performances of our ranking models and recent baseline systems on TEST. Our QA similarity features (i.e., the standalone ranker) outperform all baselines with both TRAIN and TRAIN-ALL, although the additional noisy examples in the latter are not found to improve results.

More importantly, we get improvements of substantially larger magnitudes using our joint models: more than 10 MAP and MRR points over the state-of-the-art system of Severyn and Moschitti (2015) with TRAIN-ALL for the joint probabilistic model.
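The following minimal sketch makes the joint ranking condition of Section 3.3 and the two ranking metrics of Section 6.2.1 concrete. It assumes that the sentence probabilities $P(S^{(i)} \mid Q)$ from Equation (1) and the chunk probabilities $P(c \mid Q, S^{(i)})$ from Equation (2) have already been computed. All function names and toy numbers are hypothetical. A sentence with no candidate chunk is given a joint score of 0, an assumption the text does not address. The reported results come from the trec_eval script, not from code like this.

```python
from typing import List, Sequence


def joint_sentence_score(p_sentence: float, p_chunks: Sequence[float]) -> float:
    """max over chunks c of P(S, c | Q) = P(S | Q) * P(c | Q, S), following Equation (3)."""
    if not p_chunks:
        return 0.0  # assumption: no candidate chunk means no evidence for an answer
    return max(p_sentence * p_c for p_c in p_chunks)


def rank_by_joint_score(p_sentences: Sequence[float],
                        p_chunks: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the candidate sentences, best first (highest joint score = rank 1)."""
    scores = [joint_sentence_score(ps, pcs) for ps, pcs in zip(p_sentences, p_chunks)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of precision@k over the positions k that hold a correct sentence."""
    hits, total = 0, 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def reciprocal_rank(ranked_labels: Sequence[int]) -> float:
    """1 / position of the first correct sentence (0.0 if there is none)."""
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            return 1.0 / k
    return 0.0


if __name__ == "__main__":
    # One question, two candidates; only sentence 1 is a correct answer sentence.
    labels = [0, 1]
    p_sent = [0.6, 0.5]            # P(S|Q): the standalone ranker prefers sentence 0
    p_chunk = [[0.2, 0.3], [0.9]]  # P(c|Q,S): sentence 1 holds a likely answer chunk
    standalone = sorted(range(2), key=lambda i: p_sent[i], reverse=True)
    joint = rank_by_joint_score(p_sent, p_chunk)
    print(standalone, reciprocal_rank([labels[i] for i in standalone]))  # [0, 1] 0.5
    print(joint, reciprocal_rank([labels[i] for i in joint]))            # [1, 0] 1.0
```

MAP and MRR are then the means of `average_precision` and `reciprocal_rank` over the evaluated test questions. In the toy example, the joint score promotes the sentence that contains a likely answer chunk even though its sentence-level probability is lower, which is the behavior the redefined ranking condition is meant to reward.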

Model                         | MAP%  | MRR%
TRAIN                         |       |
Shnarch (2013)                | 68.60 | 75.40
Yih et al. (2013)             | 70.92 | 77.00
Yu et al. (2014)              | 70.58 | 78.00
Severyn & Moschitti (2015)    | 73.29 | 79.62
Our Standalone Model          | 76.05 | 83.99
Our Joint Probabilistic Model | 81.59 | 89.09
Our Stacked Model             | 80.77 | 86.85
TRAIN-ALL                     |       |
Yu et al. (2014)              | 71.13 | 78.46
Severyn & Moschitti (2015)    | 74.59 | 80.78
Our Standalone Model          | 75.68 | 83.09
Our Joint Probabilistic Model | 84.95 | 91.95
Our Stacked Model             | 82.56 | 90.69
Table 3: Answer sentence ranking results.

Unlike the standalone model, the joint models also benefit from the additional noisy examples in TRAIN-ALL. These results support the central argument of this paper that joint modeling is a better approach to answer sentence ranking.

6.3 Answer Extraction

We follow the procedure reported in prior work (Yao et al., 2013a; Severyn and Moschitti, 2013) to evaluate the answer chunks extracted by the system.

6.3.1 Evaluation Metrics

Precision. Given a set of questions, the precision of an answer extraction system is the proportion of its extracted answers that are correct (i.e., match the corresponding gold regexp pattern).

Recall. Recall is the proportion of questions for which the system extracted a correct answer.

F1 Score. The F1 score is the harmonic mean of precision and recall. It captures the system's accuracy and coverage in a single metric.

6.3.2 Setup

Following prior work, we (1) retain the 89 questions in the Wang et al. (2007) TEST set that have at least one correct answer, and (2) train only with chunks in correct answer sentences to avoid extreme bias towards false labels (both the standalone extraction model and stage 2 of the stacked model). As in ranking, we use Scikit-learn for logistic regression and set the regularization parameter C using DEV.

Model                         | P%   | R%   | F1%
TRAIN                         |      |      |
Yao et al. (2013a)            | 55.2 | 53.9 | 54.5
Severyn & Moschitti (2013)    | 66.2 | 66.2 | 66.2
Our Standalone Model          | 62.9 | 62.9 | 62.9
Our Joint Probabilistic Model | 69.7 | 69.7 | 69.7
Our Stacked Model             | 62.9 | 62.9 | 62.9
TRAIN-ALL                     |      |      |
Yao et al. (2013a)            | 63.6 | 62.9 | 63.3
Severyn & Moschitti (2013)    | 70.8 | 70.8 | 70.8
Our Standalone Model          | 70.8 | 70.8 | 70.8
Our Joint Probabilistic Model | 76.4 | 76.4 | 76.4
Our Stacked Model             | 73.0 | 73.0 | 73.0
Table 4: Answer extraction results on the Wang et al. (2007) test set.

6.3.3 Results

Table 4 shows performances of our extraction models on the Wang et al. TEST set. The joint probabilistic model demonstrates top performance for both TRAIN and TRAIN-ALL. With TRAIN-ALL, it correctly answers 68 of the 89 test questions (5 more than the previous best model of Severyn and Moschitti (2013)). The stacked model also performs well with the larger training set. Again, these results support the central claim of the paper that answer extraction can be made better through joint modeling.

Question Type | Count | ST    | JP
what          | 37    | 51.4  | 56.8
when          | 19    | 100.0 | 100.0
where         | 11    | 100.0 | 90.9
who/whom      | 10    | 60.0  | 70.0
why           | 1     | 0.0   | 0.0
how many      | 9     | 77.8  | 100.0
how long      | 2     | 50.0  | 100.0
Table 5: F1% of the STandalone and the Joint Probabilistic extraction model across question types.

Table 5 shows performances of our standalone and joint probabilistic models (trained on TRAIN-ALL) on different TEST question types. The joint model is the better of the two overall, achieving good
results on all question types except what.

A particularly challenging subtype of what questions are what be questions, answers to which often go beyond NP chunk boundaries. A human-extracted answer to the question “What is Muslim Brotherhood's goal?” in the Wang et al. (2007) corpus, for example, is “advocates turning Egypt into a strict Muslim state by political means.” What in general is nevertheless the most difficult question type, since unlike questions like who or when, answers do not have strict categories (e.g., a fixed set of NER tags).

Question | Candidate Answer Sentence | ST | JP
How many years was Jack Welch with GE? | “Six Sigma has galvanized our company with an intensity the likes of which I have never seen in my 40 years at GE,” said John Welch, chairman of General Electric. | .517 | .113
How many years was Jack Welch with GE? | So fervent a proselytizer is Welch that GE has spent three years and more than $1 billion to convert all of its divisions to the Six Sigma faith. | .714 | .090
What kind of ship is the Liberty Bell 7? | Newport plans to retrieve the recovery vessel first, then go after Liberty Bell 7, the only U.S. manned spacecraft lost after a successful mission. | .838 | .278
What kind of ship is the Liberty Bell 7? | “It will be a big relief” once the capsule is aboard ship, Curt Newport said before setting sail Thursday. | .388 | .003
Table 6: Scores computed by the STandalone and the Joint Probabilistic model for candidate chunks (boldfaced) in four (Wang et al., 2007) test sentences. Joint model scores for non-answer chunks (rows 2 and 4) are much lower.

6.3.4 Qualitative Analysis

We closely examine QA pairs for which the joint probabilistic model extracts a correct answer chunk but the standalone model does not. Table 6 shows two such questions, with two candidate answer sentences for each. Candidate answer chunks are boldfaced.

For the first question, only the sentence in row 1 contains an answer. The standalone model assigns a higher score to the non-answer chunk in row 2, but the use of sentence-level features enables the joint model to identify the more relevant chunk in row 1. Note that the joint model score, being a product of two probabilities, is always lower than the standalone model score. However, only the relative score matters in this case, as the chunk with the highest overall score is eventually selected for extraction.

For the second question, both models compute a lower score for the non-answer chunk “Curt Newport” than for the answer chunk “manned spacecraft”. However, the incorrect chunk appears in several candidate answer sentences (not shown here), resulting in a high overall score for the standalone model (Algorithm 1: steps 7 and 8). The joint model assigns a much lower score to each instance of this chunk due to weak sentence-level evidence, eventually resulting in the extraction of the correct chunk.

Model                         | P%   | R%   | F1%
Yao et al. (2013c)            | 35.4 | 17.2 | 23.1
Our Joint Probabilistic Model | 83.8 | 83.8 | 83.8
Table 7: Performances of two joint extraction models on the Yao et al. (2013c) test set.

6.3.5 A Second Extraction Dataset

Yao et al. (2013c) report an extraction dataset containing 99 test questions, derived from the MIT 109 test collection (Lin and Katz, 2006) of TREC pairs. Each question in this dataset has 10 candidate answer sentences. We compare the performance of our joint probabilistic model with that of their extraction model, which extracts answers from top candidate sentences identified by their coupled ranker (Section 2.3).⁴ Models are trained on their training set of 2,205 questions and 22,043 candidate QA pairs.

As shown in Table 7, our model outperforms the Yao et al. model by a surprisingly large margin, correctly answering 83 of the 99 test questions.

⁴ We compare with only their extraction model, as the larger ranking dataset is not available anymore. Precision and recall are reported at http://cs.jhu.edu/~xuchen/packages/jacana-ir-acl2013-data-results.tar.bz2.

Interestingly, our standalone model extracts six more correct answers in this dataset than the joint
model. A close examination reveals that in all six cases, this is caused by the presence of correct answer chunks in non-answer sentences. Table 8 shows an example, where the correct answer chunk “Steve Sloan” appears in all four candidate sentences, of which only the first is actually relevant to the question. The standalone model assigns high scores to all four instances and as a result observes a high overall score for the chunk. The joint model, on the other hand, recognizes the false positives, and consequently observes a smaller overall score for the chunk. However, this desired behavior eventually results in a wrong extraction. These results have key implications for the evaluation of answer extraction systems: metrics that assess performance on individual QA pairs can enable finer-grained evaluation than what end-to-end extraction metrics offer.

Candidate Answer Sentence | ST | JP
Another perk is getting to work with his son, Barry Van Dyke, who has a regular role as Detective Steve Sloan on “Diagnosis”. | .861 | .338
This is only the third time in school history the Raiders have begun a season 6-0 and the first since 1976, when Steve Sloan, in his second season as coach, led them to an 8-0 start and 10-2 overall record. | .494 | .010
He also represented several Alabama coaches, including Ray Perkins, Bill Curry, Steve Sloan and Wimp Sanderson. | .334 | .007
Bart Starr, Joe Namath, Ken Stabler, Steve Sloan, Scott Hunter and Walter Lewis are but a few of the legends on the wall of the Crimson Tide quarterbacks coach. | .334 | .009
Table 8: Scores computed by the STandalone and the Joint Probabilistic model for NP chunks (boldfaced) in four Yao et al. (2013c) test sentences for the question: Who is the detective on ‘Diagnosis Murder’? The standalone model assigns high probabilities to non-answer chunks in the last three sentences, subsequently corrected by the joint model.

7 Discussion

Our two-step approach to joint modeling, consisting of constructing separate models for ranking and extraction first and then coupling their predictions, offers at least two advantages. First, predictions from any given pair of ranking and extraction systems can be combined, since such systems must compute a score for a QA pair or an answer chunk in order to differentiate among candidates. Coupling of the ranking and extraction systems of Yao et al. (2013a) and Severyn and Moschitti (2013), for example, is straightforward within our framework. Second, this approach supports the use of task-appropriate training data for ranking and extraction, which can provide a key advantage. For example, while answer sentence ranking systems use both correct and incorrect candidate answer sentences for model training, existing answer extraction systems discard the latter in order to maintain a (relatively) balanced class distribution (Yao et al., 2013a; Severyn and Moschitti, 2013). Through the separation of the ranking and extraction models during training, our approach naturally supports such task-specific sampling of training data.

A potentially limiting factor in our extraction model is the assumption that answers are always expressed neatly in NP chunks. While models that make no such assumption exist (e.g., the CRF model of Yao et al. (2013a)), extraction of long answers (such as the one discussed in Section 6.3.3) is still difficult in practice due to their unconstrained nature.

8 Conclusions and Future Work

We present a joint model for the important QA tasks of answer sentence ranking and answer extraction. By exploiting the interconnected nature of the two tasks, our model demonstrates substantial performance improvements over previous best systems for both. Additionally, our ranking model applies recent advances in the computation of short text similarity to QA, providing stronger similarity features.

An obvious direction for future work is the inclusion of new features for each task. Answer sentence ranking, for example, can benefit from phrasal alignment and long-distance context representation. Answer extraction for what questions can be made better using a lexical answer type feature, or world knowledge (such as “blue is a color”) derived from semantic networks like WordNet. Our model also facilitates straightforward integration of features/predictions from other existing systems for both tasks, for example, the convolutional neural sentence model of Severyn and Moschitti (2015) for ranking. Finally, more sophisticated techniques are required for extraction of the final answer chunk based on individual chunk scores across QA pairs.

Acknowledgments

We thank the reviewers for their valuable comments and suggestions. We also thank Xuchen Yao and Aliaksei Severyn for clarification of their work.

References

Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. In Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 385–393, Montreal, Canada.

Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. *SEM 2013 Shared Task: Semantic Textual Similarity. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics, pages 32–43, Atlanta, Georgia, USA.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2014. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation, pages 81–91, Dublin, Ireland.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, and Janyce Wiebe. 2015. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. In Proceedings of the 9th International Workshop on Semantic Evaluation, pages 252–263, Denver, Colorado, USA.

Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't Count, Predict! A Systematic Comparison of Context-Counting vs. Context-Predicting Semantic Vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 238–247, Baltimore, Maryland, USA.

Daniel M. Bikel, Richard Schwartz, and Ralph M. Weischedel. 1999. An Algorithm that Learns What's in a Name. Machine Learning, 34(1-3):211–231.

Chris Brockett. 2007. Aligning the RTE 2006 Corpus. Technical Report MSR-TR-2007-77, Microsoft Research.

David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine, 31(3):59–79.

Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics, pages 758–764, Atlanta, Georgia, USA.

Michael Heilman and Noah A. Smith. 2010. Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions. In Proceedings of the 2010 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1011–1019, Los Angeles, California, USA.

Adam Lally, John M. Prager, Michael C. McCord, Branimir K. Boguraev, Siddharth Patwardhan, James Fan, Paul Fodor, and Jennifer Chu-Carroll. 2012. Question Analysis: How Watson Reads a Clue. IBM Journal of Research and Development, 56(3.4):2:1–2:14.

Jimmy Lin and Boris Katz. 2006. Building a Reusable Test Collection for Question Answering. Journal of the American Society for Information Science and Technology, 57(7):851–861.

Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online Large-Margin Training of Dependency Parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, Michigan, USA.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations Workshop, Scottsdale, Arizona, USA.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher D. Mannin
g.2014.GloVe:GlobalVectorsforWordRepresentation.InProceedingsofthe2014ConferenceonEmpiricalMethodsinNaturalLanguageProcessing,pages1532–1543,Doha,Qatar.
Adwait Ratnaparkhi. 1996. A Maximum Entropy Model for Part-of-Speech Tagging. In Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing, pages 133–142, Philadelphia, Pennsylvania, USA.
Aliaksei Severyn and Alessandro Moschitti. 2013. Automatic Feature Engineering for Answer Selection and Extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 458–467, Seattle, Washington, USA.
Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 373–382, Santiago, Chile.
Eyal Shnarch. 2013. Probabilistic Models for Lexical Inference. PhD Thesis, Bar Ilan University.
Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence. Transactions of the Association for Computational Linguistics, 2:219–230.
Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2015. DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition. In Proceedings of the 9th International Workshop on Semantic Evaluation, pages 148–153, Denver, Colorado, USA.
Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 22–32, Prague, Czech Republic.
Mengqiu Wang and Christopher D. Manning. 2010. Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1164–1172, Beijing, China.
Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013a. Answer Extraction as Sequence Tagging with Tree Edit Distance. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 858–867, Atlanta, Georgia, USA.
Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013b. Semi-Markov Phrase-Based Monolingual Alignment. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 590–600, Seattle, Washington, USA.
Xuchen Yao, Benjamin Van Durme, and Peter Clark. 2013c. Automatic Coupling of Answer Extraction and Information Retrieval. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 159–165, Sofia, Bulgaria.
Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak. 2013. Question Answering using Enhanced Lexical Semantic Models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1744–1753, Sofia, Bulgaria.
Lei Yu, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. 2014. Deep Learning for Answer Sentence Selection. In Proceedings of the Deep Learning and Representation Learning Workshop, NIPS 2014, Montréal, Canada.