Transactions of the Association for Computational Linguistics, vol. 6, pp. 197–210, 2018. Action Editor: Hinrich Schütze.
Submission batch: 6/2017; Revision batch: 9/2017; Published 4/2018.
© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Knowledge Completion for Generics using Guided Tensor Factorization

Hanie Sedghi∗
Google Brain
Mountain View, California, U.S.A.
hsedghi@google.com

Ashish Sabharwal
Allen Institute for Artificial Intelligence (AI2)
Seattle, Washington, U.S.A.
AshishS@allenai.org

Abstract
Given a knowledge base or KB containing (noisy) facts about common nouns or generics, such as "all trees produce oxygen" or "some animals live in forests", we consider the problem of inferring additional such facts at a precision similar to that of the starting KB. Such KBs capture general knowledge about the world, and are crucial for various applications such as question answering. Unlike commonly studied named entity KBs such as Freebase, generics KBs involve quantification, have more complex underlying regularities, tend to be more incomplete, and violate the commonly used locally closed world assumption (LCWA). We show that existing KB completion methods struggle with this new task, and present the first approach that is successful. Our results demonstrate that external information, such as relation schemas and entity taxonomies, if used appropriately, can be a surprisingly powerful tool in this setting. First, our simple yet effective knowledge guided tensor factorization approach achieves state-of-the-art results on two generics KBs (80% precise) for science, doubling their size at 74%–86% precision. Second, our novel taxonomy guided, submodular, active learning method for collecting annotations about rare entities (e.g., oriole, a bird) is 6x more effective at inferring further new facts about them than multiple active learning baselines.

1 Introduction
We consider the problem of completing a partial knowledge base (KB) containing facts about generics or common nouns, represented as a third-order tensor of (source, relation, target) triples, such as (butterfly, pollinate, flower) and (thermometer, measure, temperature). Such facts capture common knowledge that humans have about the world. They are arguably essential for intelligent agents with human-like conversational abilities as well as for specific applications such as question answering. We demonstrate that state-of-the-art KB completion methods perform poorly when faced with generics. Our strategies for incorporating external knowledge and for obtaining additional annotations for rare entities provide the first successful solution to this challenging new task.

∗ This work was done while the author was affiliated with the Allen Institute for Artificial Intelligence.

Since generics represent classes of similar individuals, the truth value $y_i$ of a generics triple $x_i = (s, r, t)$ depends on the quantification semantics one associates with s and t. Indeed, the semantics of generics statements can be ambiguous, even self-contradictory, due to cultural norms. As Leslie (2008) points out, 'ducks lay eggs' is generally considered true, while 'ducks are female' is generally considered false, even though it holds for a broader set of ducks than the former statement.

To avoid deep philosophical issues, we fix a particular mathematical semantics that is especially relevant for noisy facts derived automatically from text. We associate s with a categorical quantification from {all, some, none} and associate t (implicitly) with some. For instance, "all butterflies pollinate (some) flower" and "some animals live in (some) forest". When such triples are presented to humans, they are phrased as: is it true that all butterflies pollinate some flower? As a notational shortcut, we treat the quantification of s as the categorical label $y_i$ for the triple $x_i$. For example, (butterfly, pollinate, flower)
is labeled all while (animal, live in, forest) is labeled some. Given a noisy KB of such labeled triples, the task is to infer more triples.

Tensor factorization and graph-based methods have both been found to be very effective for expanding knowledge bases. However, they have focused on named entity KBs such as Freebase (Bollacker et al., 2008), which involve relations with clear semantics such as liveIn and isACityIn and disambiguated entities such as Barack Obama or Hawaii. Completing KBs that involve facts about generics brings up new challenges, as evidenced by our empirical results when using existing methods.

It has been observed that Horn clauses often reliably connect predicates in the named-entity setting. For instance, for any person x, city y, and country z, (x, liveIn, y) & (y, isACityIn, z) ⇒ (x, liveIn, z). With generics, however, clear patterns or reliable first-order logic rules are rare. This is partly because each generic represents a collection of individuals that are often similar with respect to some relations and different with respect to others. For instance, (x, liveIn, mountain) is true for many cats and caribou. Yet there is little tangible similarity between the two animals, and it is unclear what, if anything, can be carried over from one to the other. On the other hand, if we take two animals that share a 'parent' in some taxonomy (e.g., reindeer and deer), then the likelihood of knowledge transfer increases.

We propose to make use of additional rich background knowledge that complements the information present in the KB itself. Examples are a taxonomic hierarchy of entities (available from sources such as WordNet (Miller, 1995)) and the corresponding entity types and relation schema. Our key insight is that, if used appropriately, taxonomic and schema information can make tensor factorization methods vastly more effective at deriving high-precision facts about generics.

Intuitively, for generics, many properties of interest are themselves generic (e.g., living in forests, as opposed to living in a specific forest) and tend to be shared by siblings in a taxonomy (e.g., finch, oriole, and hummingbird). In contrast, siblings of named entities (e.g., various people) often differ substantially in the properties we typically care about and model (e.g., who they are married to, where they live, etc.). Methods that use type information are thus more promising for generics than for classical NLP tasks involving named entities.

We propose three ways of using this information and empirically demonstrate the effectiveness of each on two variants of a KB of elementary level science facts (Dalvi et al., 2017).¹

First, we observe that simply imposing schema consistency (Section 3.1) on derived facts can significantly boost state-of-the-art methods such as Holographic Embeddings (HolE) (Nickel et al., 2016b). With this change, HolE goes from nearly no new facts at 80% precision to over 10,000 new facts, starting with a generics KB of a similar size. Other embedding methods, such as TransE (Bordes et al., 2013), RESCAL (Nickel et al., 2011), and SICTF (Nimishakavi et al., 2016) (which uses schema information as well), also produced no new facts at 80% precision. Graph-based completion methods did not scale to our densely connected tensors.²

Second, one can further boost performance by transferring knowledge up and down the taxonomic hierarchy, using the quantification semantics of generics (Section 3.2). We show that expanding the starting tensor this way before applying tensor factorization is complementary. It yields a statistically significantly higher precision (86.4% as opposed to 82%) over new facts at the same yield.

Finally, we propose a novel limited-budget taxonomy guided active learning method to address the significant incompleteness of generics KBs, by quantifying uncertainty via siblings (Section 4). Dalvi et al. (2017) have observed that, when using information extraction methods, it is much harder to derive reliable facts about generics than about named entities. This makes generics KBs vastly incomplete, with no or very little information about certain entities such as caribou or oriole.

¹ We are unaware of other large generics KBs. Our method does not employ rules or choices specific to this dataset and is expected to generalize to other generics KBs, as and when they become available.
² On the smaller Animals tensor (to be described later), PRA (Lao et al., 2011) generated very few high-precision facts after 30 hours. SFE (Gardner and Mitchell, 2015) was unable to finish training a classifier for any relation after a day, in part due to the high connectivity of generics like animal. On the other hand, HolE is trained in a couple of minutes even on the larger Science tensor, and can be made even faster using the method of Hayashi and Shimbo (2017).
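The first contribution above, imposing schema consistency on derived facts, is formalized in Sections 2 and 3.1. At test time it amounts to relabeling a predicted triple (s, r, t) as none when no domain-range pair in the schema $S_r$ admits the types of both s and t. The following is a minimal sketch of that check, not the authors' code. It assumes each entity maps to a set of type names and that $S_r$ is stored as a list of (domain types, range types) pairs. The data layout and the function names `is_schema_consistent` and `apply_schema` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical layout for S_r: relation -> list of (domain types, range types) pairs.
Schema = Dict[str, List[Tuple[Set[str], Set[str]]]]


def is_schema_consistent(triple: Triple, entity_types: Dict[str, Set[str]], schema: Schema) -> bool:
    """Return True if some domain-range pair of the relation admits both arguments."""
    s, r, t = triple
    s_types = entity_types.get(s, set())
    t_types = entity_types.get(t, set())
    # A relation with no schema entry admits no pair, so every triple with it is
    # inconsistent (mirrors the "for every i" condition in the definition of S_r).
    return any(s_types & domain and t_types & rng for domain, rng in schema.get(r, []))


def apply_schema(predictions: Dict[Triple, str], entity_types: Dict[str, Set[str]],
                 schema: Schema) -> Dict[Triple, str]:
    """Test-time schema check: relabel schema-inconsistent predicted triples as 'none'."""
    return {
        triple: (label if is_schema_consistent(triple, entity_types, schema) else "none")
        for triple, label in predictions.items()
    }
```

Suppose, for example, that a toy schema for pollinate only admits insect sources and plant targets. Then a predicted (rock, pollinate, flower) is relabeled none, while (butterfly, pollinate, flower) keeps its label. Training-time use of the schema (constraining negative samples) is a separate step and is not shown here.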
Our active learning approach addresses the following question. Given a new entity³ $\tilde e$ and a budget B, what is a good set Q of B queries about $\tilde e$ to annotate (via humans), such that expanding the original tensor with Q helps a KB completion method infer many more high-precision facts about $\tilde e$? We propose a correlation-based measure of the uncertainty of each unannotated triple (i.e., a potential query) involving $\tilde e$. It is based on how frequently the corresponding triple is true for $\tilde e$'s siblings in the taxonomic hierarchy (Section 4.1). We then develop a submodular objective function, and a corresponding greedy $(1-1/e)$-approximation, to search for a small subset of triples to annotate that near-optimally balances diversity with coverage (Section 4.2). We demonstrate that annotating this balanced subset makes tensor factorization derive substantially more new and interesting facts than several active learning baselines. For example, take a budget to annotate 100 queries about a new entity oriole, with annotation followed by tensor factorization. Random queries lead to no new true facts at all, imposing schema consistency results in 83 new facts, and our proposed method ends up with 483 new facts. This demonstrates that well-designed, intelligent queries can be substantially more effective at gathering facts about the new entity.

In summary, this work tackles, for the first time, the challenging task of knowledge completion for generics by imposing consistency with external knowledge. Our efficient sibling-guided active learning approach addresses the paucity of facts about certain entities, successfully inferring a substantial number of new facts about them.

1.1 Related Work
KB completion approaches fall into two main classes: graph-based methods, and methods employing low-dimensional embeddings via matrix or tensor factorization. The former use graph traversal techniques to complete the KB, by learning which types of paths or transitions are indicative of which relation between the start and end points (Lao et al., 2011; Gardner and Mitchell, 2015). This class of solutions, unfortunately, does not scale well to our setting (cf. Footnote 2). This appears to be due, at least in part, to the different connectivity characteristics of generics tensors compared to named entity ones such as FB15k (Bordes et al., 2013). Advances in the latter class have led to several embedding-based methods that are highly successful at KB completion for named entities (Nickel et al., 2011; Riedel et al., 2013; Dong et al., 2014; Trouillon et al., 2016; Nickel et al., 2016a). We compare against many of these, including variants of HolE, TransE, and RESCAL.

³ Unless otherwise stated, we will henceforth use entity to refer to a singular common noun that represents a class or group of individuals, such as animal, hummingbird, forest, etc.

Recent work on incorporating entity type and relation schema in tensor factorization (Krompaß et al., 2014; Krompaß et al., 2015; Xie et al., 2016b) has focused on factual databases about named entities. As discussed earlier, these have very different characteristics than generics tensors. Nimishakavi et al. (2016) use entity type information as a matrix in the context of non-negative RESCAL for schema induction on medical research documents. As a byproduct, they complete missing entries in the tensor in a schema-compatible manner. We show that our proposal performs better on generics tensors than their method, SICTF. SICTF, in turn, is meant to be an improvement over the TRESCAL system of Chang et al. (2014), which also incorporates types in RESCAL in a similar manner. Recently, Schütze et al. (2017) proposed a neural model for fine-grained entity typing and for robustly using type information to improve relation extraction, but this is targeted at Freebase-style named entities.

For schema-aware discriminative training of embeddings, Xie et al. (2016b) use a flexible ratio of negative samples from both schema-consistent and schema-inconsistent triples. Their combined ideas, however, do not improve upon vanilla HolE (one of our baselines) on the standard FB15k (Bordes et al., 2013) dataset. They also consider imposing hierarchical types for Freebase, since entities may have different meanings when they have different types. This issue typically does not apply to generics KBs. Komninos and Manandhar (2017) use type information along with additional textual evidence for knowledge base completion on the FB15k237 dataset. They learn embeddings for types, along with entities and relations, and show that this way of incorporating type information makes a (small) contribution towards improving performance. Incorporating given first-order logic rules has been explored for the simpler case of matrix factorization (Rocktäschel et al., 2015; Demeester et al., 2016). Existing first-order logic rule extraction methods, however, struggle to find meaningful rules for generics, so this approach is not yet viable in our setting.

Xie et al. (2016a) consider inferring facts about a new entity $\tilde e$ given a 'description' of that entity. They use Convolutional Neural Networks (CNNs) to encode the description, deriving an embedding for $\tilde e$. In our context, such a description would correspond to knowing some factual triples about $\tilde e$, which is a restricted version of our active learning setting. Krishnamurthy and Singh (2013) consider active learning for a particular kind of tensor decomposition, namely CP or Candecomp/Parafac decomposition into a low-dimensional space. They start with an empty tensor and look for the most informative slices and columns to fill completely in order to achieve optimal sample complexity. Their framework builds upon an incoherence assumption on the column space, which does not apply to generics KBs. Hegde and Talukdar (2015) use an entity-centric information extraction (IE) approach for obtaining new facts about entities of interest. Narasimhan et al. (2016) use a reinforcement learning approach to issue search queries that acquire additional evidence for a candidate fact. Both of these works, and others along similar lines, are advanced IE techniques that operate by searching for new documents and extracting facts from them. This is different from the KB completion task, where the only sources of information are the starting KB and possibly some details about the involved entities and relations.

2 Tensors of Generics
We consider knowledge expressed in terms of (source, relation, target) triples, abbreviated as (s, r, t). Such a triple may refer to (subject, predicate, object) style facts commonly used in information extraction. Each source and target is an entity that is a generic noun, e.g., animals, habitats, or food items. Examples of relations include foundIn, eat, etc. As mentioned earlier, with each generics triple (s, r, t) we associate a categorical truth value $q \in \{\text{all}, \text{some}, \text{none}\}$, defining the quantification semantics "q s r (some) t". For instance, "some animals live in (some) forest" and "all dogs eat (some) bone". Given a set K of such triples with annotated truth values, the task is to predict additional triples K′ that are also likely to be true.

In addition to a list of triples, we assume access to background information in the form of entity types and the corresponding relation schema, as well as a taxonomic hierarchy.⁴ Let $\mathcal{E}_T$ denote the set of possible entity types. For each relation r, the relation schema imposes a type constraint on the entities that may appear as its source or target. Specifically, let $[\ell]$ denote the set $\{1, 2, \ldots, \ell\}$. The schema for r is a collection $S_r = \{(D_r^{(i)}, R_r^{(i)}) \subseteq \mathcal{E}_T \times \mathcal{E}_T \mid i \in [\ell]\}$ of domain-range pairs with the following property: the truth value of (s, r, t) is none whenever, for every $i \in [\ell]$, $s \notin D_r^{(i)}$ or $t \notin R_r^{(i)}$. For example, the relation foundIn may be associated with the schema $S_{\text{foundIn}}$ = {(animal, location), (insect, animal), (plant, habitat), …}. Similarly, the taxonomic hierarchy defines a partial order H over all entities that captures the "isa" relation, with direct links such as isa(dog, mammal) or isa(gerbil, rodent). We use this information to extract "siblings" of a given entity, i.e., entities that share a common parent (this may easily be generalized to any common ancestor).

3 Guided Knowledge Completion
We begin with an overview of tensor factorization for KB completion for generics. Let (s, r, t) be a generics triple associated with a categorical quantification label $q \in \{\text{all}, \text{some}, \text{none}\}$. For example, ((cat, havePart, whiskers), all), ((cat, liveIn, homes), some), and ((cat, eat, bear), none). Predicting such labels is thus a multi-class classification problem. Given a set K of labeled triples, the goal of tensor factorization is to learn a low-dimensional embedding h for each entity and relation, such that some function f of h best captures the given labels. Given a new triple, we can then use f and the learned h to predict the probability of each label for it. K often contains only "positive" triples, i.e., those with label all or some. A common step in discriminative training for h is thus negative sampling, i.e., generating additional triples that (are expected to) have

⁴ We do not assume that the schema or taxonomy is perfect, and instead rely on these only for heuristic guidance.
label none. With $[m]$ denoting the set $\{1, 2, \ldots, m\}$ as before, let $K = \{(x_i, y_i), i \in [m]\}$ be a set of triples $x_i = (s_i, r_i, t_i)$ and corresponding labels $y_i \in \{1, 2, 3\}$, equivalent to the categorical quantification labels $q_i \in \{\text{all}, \text{some}, \text{none}\}$. We learn entity and relation embeddings $\Theta$ that minimize the multinomial logistic loss, defined as:

$$\min_{\Theta} \sum_{i=1}^{m} \sum_{k=1}^{3} -\mathbb{1}\{y_i = k\} \log \Pr(y_i = k \mid x_i, \Theta) = \min_{\Theta} \sum_{i=1}^{m} \sum_{k=1}^{3} -\mathbb{1}\{y_i = k\} \log \sigma\big(y_i\, f(h_{r_i}, h_{s_i}, h_{t_i})\big) \qquad (1)$$

where $h_r, h_s, h_t \in \mathbb{R}^d$ denote the learned embeddings (latent vectors) for r, s, t, respectively, and $\sigma(\cdot)$ is the sigmoid function defined as $\sigma(z) = \frac{1}{1 + \exp(-z)}$.

If the all categorical label for generics is unavailable,⁵ we can simplify the label space to {some, none}, modeled as $y_i \in \{\pm 1\}$, and reduce the model to binary classification:

$$\min_{\Theta} \sum_{i=1}^{m} \log\big[1 + \exp\big(-y_i\, f(h_{r_i}, h_{s_i}, h_{t_i})\big)\big]. \qquad (2)$$

We remark that this generics task with only two labels appears superficially similar to the standard KB completion task for named entities. However, the underlying challenges and solutions are different. For instance, using taxonomic information (as opposed to just entity types) as a guide is uniquely suited to generics KBs. The reason is that a generic entity refers to a set of individuals, and the natural subset/superset relation among such sets forms a taxonomy, whereas in standard KBs an entity refers to one specific individual. This prevents taxonomy-based rules from providing useful information for standard KBs, while our results demonstrate their high value when reasoning with generics. Differences like this lead to differences in what is successful in each setting and what is not.

⁵ This happens to be the case for current generics KBs, but is expected to change with increasing interest in the research community. A step in this direction is a recent version of the Aristo TupleKB, http://allenai.org/data/aristo-tuple-kb, which includes most as a quantification label, in addition to some.

While all our proposed schemes are embedding-oblivious, for concreteness we describe and evaluate them for the Holographic Embedding or HolE (Nickel et al., 2016b). HolE models the score underlying the label probability as:

$$f(h_r, h_s, h_t) = h_r^{\top} (h_s \circ h_t) \qquad (3)$$

where $\circ : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ denotes circular correlation, defined as:

$$[a \circ b]_k = \sum_{i=0}^{d-1} a_i\, b_{(i+k) \bmod d}. \qquad (4)$$

Intuitively, the k-th dimension of circular correlation captures how related a is to b when the dimensions of the latter are shifted (circularly, via the mod operation) by k. In particular, $[a \circ b]_0$ is simply the dot product of a and b. As can be deduced from Eqns. (3)–(4), this model resembles circular convolution, but can capture, to some extent, relations that are asymmetric between the source and target entities. This is because $a \circ b$ is not the same as $b \circ a$ but is rather "flipped": $[a \circ b]_k = [b \circ a]_{(d-k) \bmod d}$. Consider the $d \times d$ matrix $M_{ab}$ of element-wise relationships between a and b. The HolE embedding of a relation r between a and b then defines a weighted sum of circular anti-diagonals of $M_{ab}$. Circular correlation can be computed using the fast Fourier transform (FFT), making HolE quite efficient in practice. Hayashi and Shimbo (2017) recently showed that HolE and complex embeddings (Trouillon et al., 2016), another state-of-the-art method for KB completion, are equivalent and differ only in terms of constraints on initial values. Further, they proposed a linear-time computation for HolE that stays fully within the frequency domain of the FFT.

3.1 Incorporating Types and Relation Schema (ITRS)
As described earlier, the relation schema $S_r$ restricts the sources and targets that may occur with a relation r. We can incorporate this knowledge both at training and at test time. At test time, this simply translates to relabeling schema-inconsistent predicted triples as none. At training time, it can be incorporated as a constraint on the random negative samples that
the method generates to complement the given, typically positive, triples for training.

In general, consider the ratio of random negative samples drawn from the entire tensor T to random negative samples drawn from the schema-consistent portion T′ of T. This ratio is a parameter that should be tuned so that the resulting negative samples mimic the true distribution of labels. Whether the locally closed world assumption (LCWA) holds plays an important role in determining this ratio. However, the idea of mixing the two kinds of negative samples has been used in the literature without considering the nature of the dataset. This has produced seemingly contradictory empirical results on the optimal ratio (Li et al., 2016; Xie et al., 2016b; Shi and Weninger, 2017; Xie et al., 2017). As discussed later, we found sampling from T to work best on our datasets.

3.2 Incorporating Entity Taxonomy (IET)
It is challenging to come up with complex Horn or first-order logic rules for generics, as each entity represents a class of individuals that may not all behave identically. However, we can derive simple yet highly effective rules based on categorical quantification labels, leveraging the fact that entities come from different levels in a taxonomic hierarchy. Let p be the parent entity for the entity set $\{c_i\}$. Note that $c_i$ itself is a generic, that is, a class of individuals rather than a single individual. This allows one to make meaningful existential statements such as: if a property holds for all or most members of even one class $c_i$, then it holds for some (reasonable number of) members of its parent class p. We use the following rules:⁶

$$((p, r_j, t_j), \text{all}) \Rightarrow \forall i\; ((c_i, r_j, t_j), \text{all})$$
$$\forall i\; ((c_i, r_j, t_j), \text{all}) \Rightarrow ((p, r_j, t_j), \text{all})$$
$$\exists i\; ((c_i, r_j, t_j), \text{all}) \Rightarrow ((p, r_j, t_j), \text{some})$$
$$\exists i\; ((c_i, r_j, t_j), \text{some}) \Rightarrow ((p, r_j, t_j), \text{some})$$

We apply these rules to address the sparsity of generics tensors, making tensor factorization more robust. Specifically, given initial triples K, we use the applicable rules to derive additional triples K′. We then perform tensor factorization on K ∪ K′ and revisit the triples in K′ using their predicted label probabilities. This makes the approach robust to taxonomic errors. Instead of assuming that each triple in K′ is true, we use it only as a prior and let tensor factorization determine the final prediction based on the global patterns it finds.

⁶ The last rule may not be appropriate for KBs where some may refer to the extreme case of a single individual. This is not the case for the KBs we use for our evaluation.

4 Active Learning for New or Rare Entities
To address the incomplete nature of generics KBs, we consider two kinds of entities. Rare entities are those for which we have very few facts. New entities are present in the taxonomy, but the KB contains no facts about them. The goal is to use tensor factorization to generate high-quality facts about such entities. For instance, consider the task of inferring facts about oriole, where all we know is that it is a bird. We assume a restricted budget on the number of facts about oriole we can query for human annotation. Using these annotations, we would like to predict many more high-quality facts about it.

Given a fixed query budget B, what is the optimal set of queries we should generate for human annotation about a new or rare entity $\tilde e$ for this task? We view this as an active learning problem and propose a two-step algorithm. First, we use taxonomy-guided uncertainty sampling to propose a list L of triples to potentially query. Next, we describe a submodular objective function and a corresponding greedy algorithm, linear in the size of L for a fixed budget. The algorithm chooses an approximately optimal subset $\hat L \subseteq L$ satisfying $|\hat L| = B$. We then use $\hat L$ for human annotation, append the result to the original KB, and perform tensor factorization to predict additional new facts about $\tilde e$. For notational simplicity and without loss of generality, throughout this section we consider the case where $\tilde e$ appears as the source entity in the triple. The ideas apply equally when $\tilde e$ appears as the target entity.

4.1 Knowledge Guided Uncertainty Quantification
We now discuss the active learning, and specifically uncertainty sampling, method we use to propose a list of triples to query. Uncertainty sampling considers the uncertainty of each possible triple $(\tilde e, r_i, e_i)$. This uncertainty is quantified by how far the conditional probability of the fact is from 0.5 (values closer to 0.5 indicate more uncertainty), given the facts we already
know from the KB (Settles, 2012). The question is how to model this conditional probability. A simple baseline is to consider Random queries, i.e., r and e are selected randomly from the list of relations and entities in the tensor, respectively.

To infer information about $\tilde e$, we propose the following approximation for the conditional probability of a new fact about $\tilde e$ given the KB. Let $\tilde E_{\tilde e} = \{e \mid \mathrm{corr}(\tilde e, e) > 0\}$ be the set of entities that are correlated with $\tilde e$. Let $\Omega = \{((e_i, r_i, e'_i), y_i) \mid e_i \in \tilde E_{\tilde e}\}$ be the set of known facts about such entities, where $y_i$ is the label for the triple $(e_i, r_i, e'_i)$. We have:

$$\Pr\big(f(h_{r_i}, h_{\tilde e}, h_{e'_i})\big) \approx \frac{1}{|\Omega|} \sum_{e_i \in \tilde E_{\tilde e}} \mathrm{corr}(\tilde e, e_i)\, y_i. \qquad (5)$$

In practice, however, we cannot measure $\mathrm{corr}(\tilde e, e_i)$ for every entry in the KB, as we do not have complete information about $\tilde e$. One simple idea is to assume that every entity is correlated with $\tilde e$: $\mathrm{corr}(\tilde e, e_i) = 1\ \forall e_i \in E$. We refer to this as Schema Consistent query proposal, because it amounts to summing over all possible (hence schema-consistent) facts.

Since we have access to taxonomy information, we can do a more precise, Sibling Guided, approximation.⁷ We propose the following approximation for $\mathrm{corr}(\tilde e, e_i)$ for $e_i \in E$:

$$\mathrm{corr}(\tilde e, e_i) = \begin{cases} 1 & \text{if } e_i \in \mathrm{sibling}(\tilde e) \\ 0 & \text{otherwise.} \end{cases} \qquad (6)$$

Eqns. (5) and (6) can be used to infer uncertain triples. If every sibling of $\tilde e$ has relationship r with an entity e′, we can infer for "free" that this is the case for $\tilde e$ as well. On the other hand, when siblings disagree in this respect, there is more uncertainty about $(\tilde e, r, e')$ according to (5) and (6), which makes this triple a good candidate to query. In our example of oriole, the siblings are the birds that exist in the tensor, e.g., hummingbird, finch, woodpecker, etc. All of them satisfy (eat, insect), and hence we infer this for oriole. But there is no agreement on (appearIn, farm), and hence this is added to the query list.

⁷ One may also define corr based on entity similarity in a distributional space. One challenge here is that such similarity generally doesn't preserve types. For example, in a distributional space dog may co-occur more often with, and thus be "closer" to, bone or barking than siblings such as cat or other pet animals. The siblings are more helpful in our setting.

Algorithm 1: Active Learning for Query Proposal
input: new entity $\tilde e$, KB, taxonomy, lower bound $\kappa_M$ on agreement, lower bound $\tau_L$ on uncertainty, upper bound $\tau_U$ on uncertainty
1: extract the list $S_{\tilde e}$ of $\mathrm{sibling}(\tilde e)$ using the taxonomy
2: for each $e_i \in S_{\tilde e}$, add all facts about $e_i$ to Ω
3: for each candidate triple $(\tilde e, r_i, e'_i)$ such that $(r_i, e'_i)$ occurs in Ω do
4:   use (5)–(6) to estimate $p = \Pr(f(h_{r_i}, h_{\tilde e}, h_{e'_i}))$
5:   if $p \ge \kappa_M$ then add $(\tilde e, r_i, e'_i)$ to M
6:   if $\tau_L \le p \le \tau_U$ then add $(\tilde e, r_i, e'_i)$ to L
output: L, M

Algorithm 1 formalizes this process. We set an upper bound ($\tau_U$) and a lower bound ($\tau_L$) on the conditional probability (Eqn. (5)) that quantifies the uncertainty. The triples whose estimate lies between these bounds form the set $L = \{(\tilde e, r_i, e_i), i \in I\}$ of triples to query. Using another high threshold $\kappa_M > \tau_U$, we also infer the set $M = \{(\tilde e, r_j, e_j), j \in J\}$ of triples that a large majority of siblings agree upon. Hence, $\tilde e$ is expected to agree with these triples as well. Some triples have a conditional probability estimate between $\tau_U$ and $\kappa_M$. These are considered neither certain enough to include in M nor uncertain enough to justify adding to L for human annotation in the hope of learning from them. Similarly, triples with a conditional probability estimate lower than $\tau_L$ are discarded. The output of Algorithm 1 is the list L to query and the list M to add directly to the knowledge base.

4.2 Efficient Subset Selection
Given the list L from Algorithm 1, which we can write in short as $L = \{(r_i, e_i), i \in I\}$, the problem is to find the "best" subset $\hat L$. A baseline for such a selection is to choose the top k queries. We refer to this as TK subset selection.

Viewing subset selection as a combinatorial problem, we devise an objective F that models several natural properties of this subset. We then prove that F is submodular. That is, the marginal gain in F obtained by adding one more item l to a set does not increase as the set grows.⁸ Importantly, this implies that there is a simple known greedy algorithm that can efficiently compute a worst-case $(1-1/e)$-approximation of

⁸ Formally, for $L'' \subseteq L' \subseteq L$ and for $l = (r_l, e_l) \in L \setminus L'$, we have $F(L'' \cup l) - F(L'') \ge F(L' \cup l) - F(L')$.
the global optimum of F (Nemhauser et al., 1978). We refer to this as SM subset selection.

Since the queried samples will eventually be fed into tensor factorization, we would like $\hat L$ to cover as many relations, and as many entities (for the other argument of the triple), as possible. In addition, we would like $\hat L$ to be diverse, i.e., to prioritize relations and entities that are more varied.⁹ At the same time, we would also want to minimize redundancy, i.e., avoid choosing relations (entities) that are too similar.

Let $F(\hat L, R_{\hat L}, E_{\hat L})$ denote our objective, where $R_{\hat L}$ and $E_{\hat L}$ are the sets of relations and entities in $\hat L$, respectively. We decompose it as:

$$F(\hat L, R_{\hat L}, E_{\hat L}) = w_C\, C(\hat L, R_{\hat L}, E_{\hat L}) + w_D\, D(\hat L, R_{\hat L}, E_{\hat L}) - w_R\, R(\hat L, R_{\hat L}, E_{\hat L}) \qquad (7)$$

where the terms on the right-hand side correspond to coverage, diversity, and redundancy, respectively, and $w_C, w_D, w_R$ are the corresponding non-negative weights. Next, we propose functional forms for these terms. Any function that captures the described properties can be used instead, as long as the objective remains submodular.

Let R and E denote the sets of relations and entities in the KB, respectively. The coverage simply captures the fraction of entities and relations that we have included in $\hat L$:

$$C(\hat L, R_{\hat L}, E_{\hat L}) = \frac{|R_{\hat L}|}{|R|} + \frac{|E_{\hat L}|}{|E|}.$$

The diversity of $\hat L$ is the sum of the diversity measures of the entities and relations included in the set:

$$D(\hat L, R_{\hat L}, E_{\hat L}) = \sum_{(r, e) \in \hat L} \big[V_r + V_e\big], \qquad V_r = \frac{|ES_r| + |ET_r|}{|E|}, \qquad V_e = \frac{|R_e| + |ES_e|}{|R| + |E|}.$$

Here $V_r$ and $V_e$ represent the diversity measures of relation r and entity e, respectively. We use $ES_r$ and $ET_r$ to denote the sets of sources and targets that appear with relation r in the KB. $R_e$ is the set of relations in the KB that have e as their target. $ES_e$ is the set of entities that appear as the first entity of a triple in the KB whose second entity is e. The diversity measure of a relation r is thus the number of entities that appear in the KB as its source or target, divided by the total number of entities. Similarly, the diversity of an entity e is the number of relations involving e plus the number of source entities that co-occur with e in a relation, divided by the total number of relations and entities. Note that the diversity measure is an intrinsic characteristic of each entity and relation. It is dictated by the KB, is independent of the set L, and can thus be computed in advance.

⁹ This agrees with the sampling method of Chen et al. (2014) for factorizing coherent matrices with missing values, which chooses samples with probability proportional to their local coherence.

Algorithm 2: Query Subset Selection
input: KB, budget B, query list L from Alg. 1
1: ∀(r, e) ∈ L, compute the diversity measures $V_r$, $V_e$
2: $\hat L \leftarrow \emptyset$
3: for j = 1 to B do
4:   ∀ l ∈ L \ $\hat L$: $G(l) = F(\hat L \cup l) - F(\hat L)$, for F in (7)
5:   select $l^* = \arg\max_{l \in L \setminus \hat L} G(l)$
6:   add $l^*$ to $\hat L$
output: $\hat L$

As described above, redundancy is a measure of similarity between relations (entities) in $\hat L$. Tensor factorization yields an embedding for each relation (entity) given the facts it participates in. Therefore, the learned embeddings are one of the best options for capturing similarities. Let $h_e$ (and $h_r$) denote the learned embedding for entity e (and relation r, resp.). We define

$$R(\hat L, R_{\hat L}, E_{\hat L}) = \sum_{r_1, r_2 \in \hat L} \|h_{r_1} - h_{r_2}\| + \sum_{e_1, e_2 \in \hat L} \|h_{e_1} - h_{e_2}\|.$$

This completes the definition of all pieces of our objective function F from Eqn. (7). In Algorithm 2, we present our efficient greedy method to select a subset of L that approximately optimizes F. Despite being a greedy approach that simply adds the currently most valuable single query to $\hat L$ and
repeats, the submodular nature of F, which we will prove shortly, guarantees that Algorithm 2 provides an approximation that, even in the worst case, is no worse than a factor of 1 − 1/e from the (unknown) true optimum of F. This is formalized in the following theorem.

Theorem 1. Given a tensor KB, a budget B, and a candidate query list L, the quality $F(\hat{L}, R_{\hat{L}}, E_{\hat{L}})$ of the output $\hat{L}$ of Algorithm 2 is a $(1 - 1/e)$-approximation of the global optimum of F.

Proof. In order to prove the result, it suffices to show that $F(\hat{L}, R_{\hat{L}}, E_{\hat{L}})$ in Equation (7) is submodular (Nemhauser et al., 1978). To this end, we show that for $L'' \subseteq L' \subseteq L$ and for $l = (r_l, e_l) \in L \setminus L'$,

$$F(L'' \cup l) - F(L'') \ge F(L' \cup l) - F(L').$$

Since addition preserves submodularity and the weights $w_C, w_D, w_R$ are non-negative, it suffices to show that each term in F is submodular.

First, consider the coverage term, $C(\hat{L}, R_{\hat{L}}, E_{\hat{L}})$. In order to prove that it is submodular, we verify:

$$\frac{|R_{L'' \cup l}| - |R_{L''}|}{|R|} \ge \frac{|R_{L' \cup l}| - |R_{L'}|}{|R|}, \qquad \frac{|E_{L'' \cup l}| - |E_{L''}|}{|E|} \ge \frac{|E_{L' \cup l}| - |E_{L'}|}{|E|}.$$

Note that for the numerators of each of the above inequalities, the difference can be either +1 or 0. Since $L'' \subseteq L'$, we have $R_{L''} \subseteq R_{L'}$ and $E_{L''} \subseteq E_{L'}$, so whenever $r_l$ (respectively $e_l$) is new with respect to $L'$, it is also new with respect to $L''$. Hence the LHS is never less than the RHS, and the inequalities hold.

Next, consider the diversity term, $D(\hat{L}, R_{\hat{L}}, E_{\hat{L}})$. The above argument applies directly here as well.

Finally, consider the redundancy term. In order to show that $-R(\hat{L}, R_{\hat{L}}, E_{\hat{L}})$ is submodular, note that when taking the difference between $R(L'' \cup l)$ and $R(L'')$, the terms that correspond to both entities (or both relations) being in $L''$ cancel out. The same holds for $R(L' \cup l) - R(L')$. We thus have:

$$R(L'' \cup l) - R(L'') = \sum_{r_1 \in l,\; r_2 \in L''} \|h_{r_1} - h_{r_2}\| + \sum_{e_1 \in l,\; e_2 \in L''} \|h_{e_1} - h_{e_2}\|,$$

$$R(L' \cup l) - R(L') = \sum_{r_1 \in l,\; r_2 \in L'} \|h_{r_1} - h_{r_2}\| + \sum_{e_1 \in l,\; e_2 \in L'} \|h_{e_1} - h_{e_2}\|.$$

Since $L'' \subseteq L'$ and norms are non-negative, $R(L'' \cup l) - R(L'') \le R(L' \cup l) - R(L')$. Negating both sides reverses the inequality, proving that $-R(\hat{L}, R_{\hat{L}}, E_{\hat{L}})$ is submodular. Combining the three items concludes the proof. ∎

We will complement this theoretical guarantee in the experiments section (cf. Table 3) by empirically comparing the performance of our query proposal and subset selection methods with baselines.

5 Experiments

We begin with a description of the datasets and the general setup, then evaluate the effectiveness of our guided KB completion approach, and end with an evaluation of our active learning method.¹⁰

5.1 Dataset and Setup

To assess the quality of our guided KB completion method, we consider the only large existing knowledge bases about generics that we are aware of:

1. A Science tensor containing facts about various scientific activities, entities (e.g., animals, instruments, body parts), units, locations, occupations, etc. (Dalvi et al., 2017).¹¹ This starting tensor has a precision of about 80% and acts as a valuable resource for challenging tasks such as question answering. Our goal is to start with this tensor and infer more scientific facts at a similar or higher level of precision.

2. An Animals sub-tensor of the Science tensor, which focuses on facts about animals and also has a similar starting precision. Again, the goal is to infer more facts about animals.

The mainstream approach for KB completion is to focus on entities that are mentioned sufficiently often. For instance, the commonly used FB15K dataset guarantees that every entity appears at least 100 times. As a milder version of this, we focus on the subset of the starting tensors where every entity appears at least 20 times. The resulting statistics of the tensors we use here are shown in Table 1.

¹⁰ Data and code available from the authors.
¹¹ AristoTupleKB v0, http://allenai.org/data/aristo-tuple-kb.
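The frequency filter described above (every entity appearing at least 20 times), together with the 3/1/1 train/validation/test split reported with Table 1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released preprocessing. We assume that an entity's count is the number of triples in which it appears as source or target, and that pruning is repeated until every remaining entity meets the threshold, because removing triples can push other entities below it (the text does not say whether a single pass was used). We also assume the split is a seeded random shuffle. Both function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

# A triple is (source, relation, target), e.g. ("butterfly", "pollinate", "flower").
Triple = Tuple[str, str, str]


def filter_by_entity_frequency(triples: List[Triple], min_count: int = 20) -> List[Triple]:
    """Keep triples whose source and target each occur in >= min_count kept triples.

    Hypothetical helper. Assumptions: an entity's count is the number of triples that
    mention it as source or target, and pruning repeats until nothing more is removed.
    """
    kept = list(triples)
    while True:
        counts = Counter()
        for source, _, target in kept:
            counts[source] += 1
            counts[target] += 1
        filtered = [
            (s, r, t) for (s, r, t) in kept
            if counts[s] >= min_count and counts[t] >= min_count
        ]
        if len(filtered) == len(kept):  # fixed point: every remaining entity qualifies
            return filtered
        kept = filtered  # dropping triples may lower other counts, so check again


def split_3_1_1(triples: List[Triple], seed: int = 0):
    """Split triples into train/validation/test in a 3:1:1 ratio (assumed random)."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = 3 * len(shuffled) // 5
    n_valid = len(shuffled) // 5
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    toy = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d")]
    # "a" and "d" occur once, so their triples go; "b" and "c" then drop to one each.
    print(filter_by_entity_frequency(toy, min_count=2))  # prints []
```

With the default `min_count=20`, the first function mirrors the threshold stated in the text; under the same counting assumption, the FB15K guarantee of 100 occurrences would correspond to `min_count=100`. The iterative loop is the design choice that makes "every entity appears at least k times" hold for the final output rather than only for the input counts.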
Dataset | #Entities | #Relations | #Triples
Animals | 224 | 129 | 10,604
Science | 1,255 | 1,513 | 66,643
Table 1: Datasets, with a 3/1/1 train/validation/test split.

This data, which is the only one we are aware of with generics, does not include ((s, r, t), all)-style triples. We therefore use the objective function in Eqn. (2) rather than the multi-class one in Eqn. (1). Despite this limitation of the dataset and its superficial similarity to the binary classification task underlying standard (non-generics) KB completion, our results reveal that extending a generics KB is surprisingly difficult for existing methods.

Dalvi et al. (2017) use a pipeline consisting of Open IE (Banko et al., 2007) extractions, aggregation, and cleanup via crowdsourcing to generate the Science tensor. These facts come with a relevant WordNet (Miller, 1995) based taxonomy, entity types (derived from WordNet 'synsets'), and relation schema. Our method capitalizes on this additional information¹² to perform high-quality knowledge completion.

Our evaluation metric is the accuracy of the top k triples generated by various KB completion methods. We also visualize entire precision-recall curves, where possible. While this metric requires human annotation and is thus more cumbersome than fully automatic metrics, it is arguably more suitable for evaluating generative tasks with a massive output space, such as KB completion. In this setting, evaluation against a relatively small held-out test set can be misleading: a method may be highly accurate at generating thousands of valid and useful triples even if it does not necessarily classify specific held-out instances accurately. While measures such as MAP and MRR have been used in the past to alleviate this, they provide only a partial solution to the inherent difficulty of evaluating generative systems. Annotation-efficient evaluation methods have recently been proposed to address this challenge (Sabharwal and Sedghi, 2017).

¹² In order to limit potential error propagation, we collapse the taxonomy to the top two levels in our experiments.

[Figure 1 plot: Precision (0.4 to 1.0) vs. Yield (0 to 10,000); curves: HolE + ITRS + IET, HolE + ITRS, SICTF, HolE]
Figure 1: Precision-yield curves for various embedding-based methods on the Animals tensor. State-of-the-art named-entity inspired approaches (black, pink) have low precision even at a low yield. TransE is omitted due to its very low precision here, around 10%. Our method (HolE + ITRS + IET, green) doubles the size of the starting tensor at a precision of 86.4%.

5.2 Guided KB Completion

We first compare our method (Section 3) with existing KB completion techniques on the Animals tensor, and then demonstrate that its effectiveness carries over scalably to the larger Science tensor as well. In what follows, T denotes the tensor under consideration.

We examine two alternatives for generating negative samples: given a triple (s, r, t) ∈ T, replace s with (1) any entity s′ or (2) an entity s′ of the same type as s. The resulting perturbed triple (s′, r, t) is then treated as a negative sample if it is not present in T. We also considered a weighted combination of (1) and (2), and found random sampling to be the most reliable on our datasets. This is consistent with the commonly used LCWA assumption not being applicable to these tensors.

As baselines, we consider extensions of three state-of-the-art embedding-based KB completion methods: HolE, TransE, and RESCAL. As mentioned earlier, two leading graph-based methods, SFE and PRA, did not scale well. Both vanilla TransE and RESCAL resulted in poor performance; we thus report numbers only for their extensions. Specifically, we consider 3 baselines: (1) HolE, (2) TransE + Schema, and (3) SICTF, which extends
RESCAL and incorporates schema.

HolE | SICTF | HolE + ITRS + IET
(farm, join, farm) | (penguin, has part, tooth) | (salmon, thrive in, water)
(family, join, family) | (mosquito, spread, parasite) | (animal, give birth to, animal)
(tree, resemble, tree) | (spider, has part, skin) | (duck, feed in, water)
(water, is known as, water) | (elephant, eat, fish) | (fish, migrate to, water)
(virus, attract, virus) | (shark, has part, skin) | (fish, thrive in, water)
(animal, resemble, animal) | (crab, eat, insect) | (turtle, swim in, water)
(tree, is known as, tree) | (snake, eat, fish) | (salmon, swim in, water)
(habitat, is known as, habitat) | (otter, has part, tooth) | (turtle, live in, water)
(envment., is known as, envment.) | (meat, attract, hummingbird) | (animal, chew, food)
(man, join, man) | (spider, has part, claw) | (insect, destroy, tree)
(bird, give birth to, bird) | (turtle, has part, tooth) | (farm, possess, horse)
(region, is known as, region) | (human, eat, plant) | (fish, swim in, ocean)
(virus, derive from, virus) | (monkey, has part, wing) | (turtle, feed in, water)
(food, resemble, food) | (dolphin, has part, tooth) | (turtle, float in, water)
(bird, is known as, bird) | (carnivore, live in, water) | (dinosaur, walk on, leg)
(field, resemble, field) | (lizard, eat, fish) | (turtle, migrate to, water)
(fish, is known as, fish) | (pelican, has part, tooth) | (turtle, return to, water)
(bird, resemble, bird) | (caterpillar, turn into, bird) | (man, ride, cattle)
(grass, graze in, man) | (bee, pollinate, garden) | (turtle, swim in, ocean)
(animal, is known as, animal) | (virus, infect, bird) | (fish, float in, ocean)
Table 2: Top 20 predictions by various methods, with invalid triples underlined and uninteresting ones, such as (X, is known as, X) or (Y, resembles, Y), shown in italics. While some of this assessment can be subjective, it is evident that our method, HolE + ITRS + IET, generates many more triples that are valid and interesting than competing approaches.

Figure 1 shows the resulting precision-yield curves for the predictions made by each method on the Animals dataset containing 10.6K facts. Specifically, for each method, we rank the predictions based on the method's assigned score and compute the precision of the top k predictions for varying k. As expected, we observe a generally decreasing trend as k increases. TransE + ITRS gave a precision of only around 10% and is omitted from the plot.

We make two observations. First, deriving new facts for these generics tensors at a high precision is challenging! Specifically, none
of the baseline methods (black and pink curves), which represent the state of the art for named-entity tensors, achieves a yield of more than 10% of T (i.e., 1K predictions) even at a precision of just 60%.

Second, external information, if used appropriately, can be surprisingly powerful in this setting. Specifically, simply incorporating relation schema (ITRS, blue curve) allows HolE-based completion to double the size of the starting tensor T by producing over 10K new triples at a precision of 82%. Further, incorporating entity taxonomy (IET, green curve) to address tensor sparsity results in the same yield at a statistically significantly higher precision of 86.4%.

It turns out that not only does our method result in substantially improved PR curves, it also generates qualitatively more interesting and useful generic facts about the world than previous methods. We illustrate this in Table 2, which lists the top 20 predictions made by various approaches. The triples shown in red are false predictions (e.g., (penguin, has part, tooth), (grass, graze in, man), (caterpillar, turn into, bird)) or uninteresting ones (e.g., (water, is known as, water)). As we see, a vast majority of the top 20 predictions made by both vanilla HolE and SICTF fall into these categories. On the other hand, our method, HolE + ITRS + IET, predicts 19 true triples out of the top 20, including interesting scientific facts that were evidently missing from the starting tensor, such as (salmon, thrive in, water), (fish, swim in, ocean), and (insect, destroy, tree).

Finally, we evaluate our proposal on the entire Science dataset with 66.6K facts. Since graph-based methods did not scale well even to the much smaller Animals dataset and other methods performed substantially worse there, we focus here on the scalability and prediction quality of our method. We found that HolE + ITRS + IET scales well to this high dimension, doubling the number of facts by adding 66K new facts at 74% precision. Although the Science tensor is 1,000 times larger than the Animals tensor, the method took only 10x longer to run (3 minutes on the Animals tensor vs. 56 minutes on the Science tensor, using a 2.8 GHz, 16 GB MacBook Pro). With additional improvements such as parallelization, it is straightforward to scale the method further to substantially larger tensors.

Query Proposal | Subset Selection | From Annotation | From Sibling Agreement | From Tensor Factorization | Total
Random | - | 0 | - | 0 | 0
Schema Consistent | TK | 73 | - | 10 | 83
Schema Consistent | SM | 57 | - | 27 | 84
Sibling Guided | TK | 96 | 17 | 211 | 324
Sibling Guided | SM | 100 | 17 | 366 | 483
Table 3: Active learning for new entities: number of new true triples inferred (from annotation, sibling agreement, tensor factorization, and in total) for a representative new entity ẽ, when querying 100 facts about ẽ for human annotation.

5.3 Active Learning for New Entities

To assess the quality of our active learning mechanism (Section 4), we consider predicting facts about a new entity ẽ that is not in the Animals tensor. For illustration, we choose ẽ from the Science tensor vocabulary while ensuring that it is present in the WordNet taxonomy.

The setup is as follows. We first use a query generation mechanism (Random, Schema Consistent, or Sibling Guided; cf. Section 4.1) to propose an ordered list L of facts about ẽ to annotate. Next, we perform subset selection (Top-k or TK, Submodular or SM; cf. Section 4.2) on L to identify a subset L̂ of up to 100 most promising queries. These are then annotated, and the true ones are fed into tensor factorization as additional input to infer further new facts about ẽ.

In Table 3, we assess the quality of L̂ in two ways, when |L̂| = 100: how many true facts L̂ contains, and how many new facts overall this annotation produces about ẽ. Figure 2 provides a complementary view, focusing on the overall number of new facts inferred as |L̂| increases. While these illustrative numbers are for a representative new entity, reindeer, the overall trend and order of numbers remained the same for the other new entities we experimented with.

[Figure 2 plot: New Facts Inferred (0 to 500) vs. Number of Facts Annotated (50 to 100); curves: Sibling Guided + SM, Sibling Guided + TK, Schema Consistent + TK, Schema Consistent + SM, Random Queries]
Figure 2: Active learning for new entities: total number of new inferred facts (y-axis) for various human annotation query sizes (x-axis). The use of subset selection (green triangles, top) and sibling information (blue circles, 2nd from top) vastly outperforms various baselines.

We mention some highlights from Table 3. First, not surprisingly, randomly choosing triples about ẽ to annotate is ineffective. Second, choosing schema consistent triples results in 73 true triples (out of 100), but these facts help tensor factorization very little, resulting in only 10 additional new triples about ẽ. Our proposed sibling guided querying mechanism results not only in nearly all 100 facts being true, along with 17 true facts inferred from sibling agreement (set M in Alg. 1), but also, combined with submodular subset selection for balancing diversity with coverage (Alg. 2), ultimately results in 483 new
facts about ẽ. These facts cover interesting new information such as (reindeer, eat, fruit), (wolf, chase, reindeer), and (reindeer, provide, hair).

Finally, the plot in Figure 2 demonstrates that the qualitative trends remain the same, irrespective of the number |L̂| of queries annotated. Overall, our sibling guided queries with submodular subset selection (green triangles, top-most curve) ultimately result in 5.8 times more new facts about ẽ than a non-trivial, uncertainty-based, schema consistent baseline (black stars, 3rd curve from the top). This attests to the efficacy of the method on this challenging problem and dataset.

6 Conclusion

This work explores KB completion for a new class of problems, namely completing generics KBs, which is an essential step for including general world knowledge in intelligent machines. The differences between generics and the much-studied named entity KBs cause existing techniques, out of the box, either to scale poorly or to produce facts at an undesirably low precision. We demonstrate that incorporating entity taxonomy and relation schema appropriately can be highly effective for generics KBs. Further, to address the scarcity of facts about certain entities in such KBs, we present a novel active learning approach using sibling guided uncertainty estimation along with submodular subset selection. The proposed techniques substantially outperform various baselines, setting a new state of the art for this challenging class of completion problems.

Our method is applicable to KBs that have an associated entity taxonomy and relation schema. It is expected to be successful when information from siblings can be used to guide what is likely to be true and what is a good candidate to query for a given entity. We focus on KBs of generics, where such information is available and, as we show, highly valuable for effective KB completion.

Why does our use of types work substantially better in our setting than the use of types in various baselines? One hypothesis is the following. The use of complicated models requires substantial data and information. In our KB, the information appears so sparse and incomplete that using types in complicated ways is not productive. Our proposal instead attempts to use type information only to gently enhance the signal and reduce noise, before performing tensor decomposition. We hope this work will trigger further exploration of knowledge bases with generics, a key aspect of machine intelligence.

Acknowledgments

The authors would like to thank Peter Clark for fruitful discussions, valuable feedback, and crowdsourcing annotations; Matt Gardner for constructive comments and for assessing graph-based completion methods on our datasets; and Udai Saini and Partha Talukdar for evaluating their CNTF approach on our datasets.

References

Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In IJCAI, pages 2670–2676.
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, pages 1247–1250. ACM.
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NIPS, pages 2787–2795.
Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Christopher Meek. 2014. Typed tensor decomposition of knowledge bases for relation extraction. In EMNLP, pages 1568–1579.
Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, and Rachel Ward. 2014. Coherent matrix completion. In ICML.
Bhavana Dalvi, Niket Tandon, and Peter Clark. 2017. Domain-targeted, high precision knowledge extraction. TACL, 5:233–246.
Thomas Demeester, Tim Rocktäschel, and Sebastian Riedel. 2016. Lifted rule injection for relation embeddings. In EMNLP, pages 1389–1399.
Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge Vault: A web-scale approach to probabilistic knowledge fusion. In KDD, pages 601–610. ACM.
Matthew Gardner and Tom M. Mitchell. 2015. Efficient and expressive knowledge base completion using subgraph feature extraction. In EMNLP, pages 1488–1498.
Katsuhiko Hayashi and Masashi Shimbo. 2017. On the equivalence of holographic and complex embeddings for link prediction. In ACL, pages 554–559.
Manjunath Hegde and Partha P. Talukdar. 2015. An entity-centric approach for overcoming knowledge graph sparsity. In EMNLP, pages 530–535.
Alexandros Komninos and Suresh Manandhar. 2017. Feature-rich networks for knowledge base completion. In ACL, pages 324–329.
Akshay Krishnamurthy and Aarti Singh. 2013. Low-rank matrix and tensor completion via adaptive sampling. In NIPS, pages 836–844.
Denis Krompaß, Maximilian Nickel, and Volker Tresp. 2014. Large-scale factorization of type-constrained multi-relational data. In 2014 International Conference on Data Science and Advanced Analytics (DSAA), pages 18–24. IEEE.
Denis Krompaß, Stephan Baier, and Volker Tresp. 2015. Type-constrained representation learning in knowledge graphs. In International Semantic Web Conference, pages 640–655. Springer.
Ni Lao, Tom Mitchell, and William W. Cohen. 2011. Random walk inference and learning in a large scale knowledge base. In EMNLP, pages 529–539.
Sarah-Jane Leslie. 2008. Generics: Cognition and acquisition. Philosophical Review, 117(1):1–47.
Xiang Li, Aynaz Taheri, Lifu Tu, and Kevin Gimpel. 2016. Commonsense knowledge base completion. In ACL, pages 1445–1455.
George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.
Karthik Narasimhan, Adam Yala, and Regina Barzilay. 2016. Improving information extraction by acquiring external evidence with reinforcement learning. In EMNLP, pages 2355–2365.
George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions-I. Mathematical Programming, 14(1):265–294.
Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In ICML, pages 809–816.
Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2016a. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33.
Maximilian Nickel, Lorenzo Rosasco, and Tomaso A. Poggio. 2016b. Holographic embeddings of knowledge graphs. In AAAI, pages 1955–1961.
Madhav Nimishakavi, Uday Singh Saini, and Partha Talukdar. 2016. Relation schema induction using tensor factorization with side information. In EMNLP, pages 414–423.
Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In HLT-NAACL, pages 74–84.
Tim Rocktäschel, Sameer Singh, and Sebastian Riedel. 2015. Injecting logical background knowledge into embeddings for relation extraction. In NAACL, pages 1119–1129.
Ashish Sabharwal and Hanie Sedghi. 2017. How good are my predictions? Efficiently approximating precision-recall curves for massive datasets. In UAI.
Hinrich Schütze, Yadollah Yaghoobzadeh, and Heike Adel. 2017. Noise mitigation for neural entity typing and relation extraction. In EACL, pages 1183–1194.
Burr Settles. 2012. Active Learning. Morgan & Claypool.
Baoxu Shi and Tim Weninger. 2017. ProjE: Embedding projection for knowledge graph completion. In AAAI, pages 1236–1242.
Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In ICML, pages 2071–2080.
Ruobing Xie, Zhiyuan Liu, Jia Jia, Huanbo Luan, and Maosong Sun. 2016a. Representation learning of knowledge graphs with entity descriptions. In AAAI, pages 2659–2665.
Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2016b. Representation learning of knowledge graphs with hierarchical types. In IJCAI, pages 2965–2971.
Qizhe Xie, Xuezhe Ma, Zihang Dai, and Eduard H. Hovy. 2017. An interpretable knowledge transfer model for knowledge base completion. In ACL, pages 950–962.