Transactions of the Association for Computational Linguistics, vol. 3, pp. 243–255, 2015. Action Editor: Ryan McDonald.
Submission batch: 1/2015; Revision batch: 4/2015; Published 5/2015.
© 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 Licence.
Combining Minimally-supervised Methods for Arabic Named Entity Recognition

Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio
School of Computer Science and Electronic Engineering
University of Essex
Colchester, United Kingdom
{mjaltha, udo, poesio}@essex.ac.uk

Abstract

Supervised methods can achieve high performance on NLP tasks, such as Named Entity Recognition (NER), but new annotations are required for every new domain and/or genre change. This has motivated research in minimally supervised methods such as semi-supervised learning and distant learning, but neither technique has yet achieved performance levels comparable to those of supervised methods. Semi-supervised methods tend to have very high precision but comparatively low recall, whereas distant learning tends to achieve higher recall but lower precision. This complementarity suggests that better results may be obtained by combining the two types of minimally supervised methods. In this paper we present a novel approach to Arabic NER using a combination of semi-supervised and distant learning techniques. We trained a semi-supervised NER classifier and another one using distant learning techniques, and then combined them using a variety of classifier combination schemes, including the Bayesian Classifier Combination (BCC) procedure recently proposed for sentiment analysis. According to our results, the BCC model leads to an increase in performance of 8 percentage points over the best base classifiers.

1 Introduction

Supervised learning techniques are very effective and widely used to solve many NLP problems, including NER (Sekine and others, 1998; Benajiba et al., 2007a; Darwish, 2013). The main disadvantage of supervised techniques, however, is the need for a large annotated corpus. Although a considerable amount of annotated data is available for many languages, including Arabic (Zaghouani, 2014), changing the domain or expanding the set of classes always requires domain-specific experts and new annotated data, both of which demand time and effort. Therefore, much of the current research on NER focuses on approaches that require minimal human intervention to export the named entity (NE) classifiers to new domains and to expand NE classes (Nadeau, 2007; Nothman et al., 2013).

Semi-supervised (Abney, 2010) and distant learning approaches (Mintz et al., 2009; Nothman et al., 2013) are alternatives to supervised methods that do not require manually annotated data. These approaches have proved to be effective and easily adaptable to new NE types. However, the performance of such methods tends to be lower than that achieved with supervised methods (Althobaiti et al., 2013; Nadeau, 2007; Nothman et al., 2013).

We propose combining these two minimally supervised methods in order to exploit their respective strengths and thereby obtain better results. Semi-supervised learning tends to be more precise than distant learning, which in turn leads to higher recall than semi-supervised learning. In this work, we use various classifier combination schemes to combine the minimal supervision methods. Most previous studies have examined classifier combination schemes to combine multiple supervised-learning systems (Florian et al., 2003; Saha and Ekbal, 2013), but this research is the first to combine minimal supervision approaches. In addition,
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
244
we report our results from testing the recently proposed Independent Bayesian Classifier Combination (IBCC) scheme (Kim and Ghahramani, 2012; Levenberg et al., 2014) and comparing it with traditional voting methods for ensemble combination.

2 Background

2.1 Arabic NER

A lot of research has been devoted to Arabic NER over the past ten years. Much of the initial work employed hand-written rule-based techniques (Mesfar, 2007; Shaalan and Raza, 2009; Elsebai et al., 2009). More recent approaches to Arabic NER are based on supervised learning techniques. The most common supervised learning techniques investigated for Arabic NER are Maximum Entropy (ME) (Benajiba et al., 2007b), Support Vector Machines (SVMs) (Benajiba et al., 2008), and Conditional Random Fields (CRFs) (Benajiba and Rosso, 2008; Abdul-Hamid and Darwish, 2010).

Darwish (2013) presented cross-lingual features for NER that make use of the linguistic properties and knowledge bases of another language. In his study, English capitalisation features and an English knowledge base (DBpedia) were exploited as discriminative features for Arabic NER. A large Machine Translation (MT) phrase table and Wikipedia cross-lingual links were used for translation between Arabic and English. The results showed an overall F-score of 84.3% with an improvement of 5.5% over a strong baseline system on a standard dataset (the ANERcorp set collected by Benajiba et al. (2007a)).

Abdallah et al. (2012) proposed a hybrid NER system for Arabic that integrates a rule-based system with a decision tree classifier. Their integrated approach increased the F-score by between 8% and 14% when compared to the original rule-based system and the pure machine learning technique. Oudah and Shaalan (2012) also developed hybrid Arabic NER systems that integrate a rule-based approach with three different supervised techniques: decision trees, SVMs, and logistic regression. Their best hybrid system outperforms state-of-the-art Arabic NER systems (Benajiba and Rosso, 2008; Abdallah et al., 2012) on standard test sets.

2.2 Minimal Supervision and NER

Much current research seeks adequate alternatives to expensive corpus annotation that address the limitations of supervised learning methods: the need for substantial human intervention and the limited number of NE classes that can be handled by the system. Semi-supervised techniques and distant learning are examples of methods that require minimal supervision.

Semi-supervised learning (SSL) (Abney, 2010) has been used for various NLP tasks, including NER (Nadeau, 2007). 'Bootstrapping' is the most common semi-supervised technique. Bootstrapping involves a small degree of supervision, such as a set of seeds, to initiate the learning process (Nadeau and Sekine, 2007). An early study that introduced mutual bootstrapping and proved highly influential is (Riloff and Jones, 1999). They presented an algorithm that begins with a set of seed examples of a particular entity type. Then, all contexts found around these seeds in a large corpus are compiled, ranked, and used to find new examples. Pasca et al. (2006) used the same bootstrapping technique as Riloff and Jones (1999), but applied the technique to very large corpora and managed to generate one million facts with a precision rate of about 88%. AbdelRahman et al. (2010) proposed to integrate bootstrapping semi-supervised pattern recognition and a Conditional Random Fields (CRFs) classifier. They used semi-supervised pattern recognition in order to generate patterns that were then used as features in the CRFs classifier.

Distant learning (DL) is another popular paradigm that avoids the high cost of supervision. It depends on the use of external knowledge (e.g., encyclopedias such as Wikipedia, unlabelled large corpora, or external semantic repositories) to increase the performance of the classifier, or to automatically create new resources for use in the learning process (Mintz et al., 2009; Nguyen and Moschitti, 2011). Nothman et al. (2013) automatically created massive, multilingual training annotations for NER by exploiting the text and internal structure of Wikipedia. They first categorised Wikipedia articles into a specific set of named entity types across nine languages: Dutch, English, French, German, Italian, Polish, Portuguese, Russian, and Spanish. Then, Wikipedia's links were transformed into named entity annotations based on the NE types of the target articles. Following this approach, millions of words were annotated in the aforementioned nine languages. Their method for automatically deriving corpora from Wikipedia outperformed the methods proposed by Richman and Schone (2008) and Mika et al. (2008) when testing the Wikipedia-trained models on CoNLL shared task data and other gold-standard corpora.

Alotaibi and Lee (2013) presented a methodology to automatically build two NE-annotated sets from Arabic Wikipedia. The corpora were built by transforming links into NE annotations according to the NE type of the target articles. POS-tagging, morphological analysis, and linked NE phrases were used to detect other mentions of NEs that appear without links in text. Their Wikipedia-trained model performed well when tested on various newswire test sets, but it did not surpass the performance of the supervised classifier that is trained and tested on datasets drawn from the same domain.

2.3 Classifier Combination and NER

We are not aware of any previous work combining minimally supervised methods for the NER task in Arabic or any other natural language, but there are many studies that have examined classifier combination schemes to combine various supervised-learning systems. Florian et al. (2003) presented the best system at the NER CoNLL 2003 task, with an F-score value equal to 88.76%. They used a combination of four diverse NE classifiers: a transformation-based learning classifier, a Hidden Markov Model (HMM) classifier, a robust risk minimization classifier based on a regularized winnow method (Zhang et al., 2002), and an ME classifier. The features they used included tokens, POS and chunk tags, affixes, gazetteers, and the output of two other NE classifiers trained on richer datasets. Their methods for combining the results of the four NE classifiers improved the overall performance by 17–21% when compared with the best performing classifier.

Saha and Ekbal (2013) studied classifier combination techniques for various NER models under single and multi-objective optimisation frameworks. They used seven diverse classifiers (naive Bayes, decision tree, memory-based learner, HMM, ME, CRFs, and SVMs) to build a number of voting models based on identified text features that are selected mostly without domain knowledge. The combination methods used were binary and real vote-based ensembles. They reported that the proposed multi-objective optimisation classifier ensemble with real voting outperforms the individual classifiers, the three baseline ensembles, and the corresponding single-objective classifier ensemble.

3 Two Minimally Supervised NER Classifiers

Two main minimally supervised approaches have been used for NER: semi-supervised learning (Althobaiti et al., 2013) and distant supervision (Nothman et al., 2013). We developed state-of-the-art classifiers of both types that will be used as base classifiers in this paper. Our implementations of these classifiers are explained in Section 3.1 and Section 3.2.

3.1 Semi-supervised Learning

As previously mentioned, the most common SSL technique is bootstrapping, which only requires a set of seeds to initiate the learning process. We used an algorithm adapted from Althobaiti et al. (2013) that contains three components, as shown in Figure 1.

[Figure 1: The Three Components of the SSL System — seed instances feed a loop of Pattern Induction, Instance Extraction, and Instance Ranking/Selection.]

The algorithm begins with a list of a few examples of a given NE type (e.g., 'London' and 'Paris' can be used as seed examples for location entities) and learns patterns (P) that are used to find more examples (candidate NEs). These examples are eventually sorted and used again as seed examples for the next iteration.

Our algorithm does not use plain frequencies, since absolute frequency does not always produce good examples. This is because bad examples will be extracted by one pattern, however unwantedly, as many times as the bad examples appear in the text in relatively similar contexts. Meanwhile, good examples are best extracted using more than one pattern, since they occur in a wider variety of contexts in the text. Instead, our algorithm ranks candidate NEs according to the number of different patterns that are used to extract them, since pattern variety is a better cue to semantics than absolute frequency (Baroni et al., 2010).

After sorting the examples according to the number of distinct patterns, all examples but the top m are discarded, where m is set to the number of examples from the previous iteration, plus one. These m examples will be used in the next iteration, and so on. For example, if we start the algorithm with 20 seed instances, the following iteration will start with 21, and the next one will start with 22, and so on. This procedure is necessary in order to carefully include examples from one iteration to another and to ensure that bad instances are not passed on to the next iteration. The same procedure was applied by (Althobaiti et al., 2013).

3.2 Distant Learning

For distant learning we follow the state-of-the-art approach to exploiting Wikipedia for Arabic NER, as in (Althobaiti et al., 2014). Our distant learning system exploits many of Wikipedia's features, such as anchor texts, redirects, and inter-language links, in order to automatically develop an Arabic NE-annotated corpus, which is used later to train a state-of-the-art supervised classifier. The three steps of this approach are:

1. Classify Wikipedia articles into a set of NE types.
2. Annotate the Wikipedia text as follows:
   • Identify and label matching text in the title and the first sentence of each article.
   • Label linked phrases in the text according to the NE type of the target article.
   • Compile a list of alternative titles for articles and filter out ambiguous ones.
   • Identify and label matching phrases in the list and the Wikipedia text.
3. Filter sentences to prevent noisy sentences from being included in the corpus.

We briefly explain these steps in the following sections.

3.2.1 Classifying Wikipedia Articles

The Wikipedia articles in the dataset need to be classified into the set of named entity types in the classification scheme. We conduct an experiment that uses simple bag-of-words features extracted from different portions of the Wikipedia document and metadata such as categories, the infobox table, and tokens from the article title and first sentence of the document. To improve the accuracy of document classification, tokens are distinguished based on their location in the document. Therefore, category and infobox features are marked with suffixes to differentiate them from tokens extracted from the article's body text (Tardif et al., 2009). The feature set is represented by Term Frequency-Inverse Document Frequency (TF-IDF). In order to develop a Wikipedia document classifier to categorise Wikipedia documents into CoNLL NE types, namely person, location, organisation, miscellaneous, or other, we use a set of 4,000 manually classified Wikipedia articles that are available free online (Alotaibi and Lee, 2012). 80% of the 4,000 hand-classified Wikipedia articles are used for training, and 20% for evaluation. The Wikipedia document classifier that we train performs well, achieving an F-score of 90%. The classifier is then used to classify all Wikipedia articles. At the end of this stage, we obtain a list of pairs containing each Wikipedia article and its NE type in preparation for the next stage: developing the NE-tagged training corpus.

3.2.2 The Annotation Process

To begin the annotation process, we identify matching terms in the article title and the first sentence and then tag the matching phrases with the NE type of the article. The system adopts partial matching, where all corresponding words in the title and the first sentence should first be identified. Then, the system annotates them and all words in between (Althobaiti et al., 2014).

The next step is to transform the links between Wikipedia articles into NE annotations according to the NE type of the link target. Wikipedia also contains a fair amount of NEs without links. We follow the technique proposed by Nothman et al. (2013), which suggests inferring additional links using the aliases for each article.
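The link-annotation step just described can be illustrated with a small sketch. This is our own illustrative code, not the paper's implementation; the function names and toy data are invented, and CoNLL-style B-/I- tags are assumed for the output.

```python
# Illustrative sketch (not the paper's code) of turning Wikipedia links into
# NE annotations: every linked phrase is tagged with the NE type that the
# document classifier assigned to its target article; unlinked tokens stay "O".

def annotate(tokens, links, article_types):
    """tokens: sentence as a token list; links: {(start, end): target_article}
    token spans over that list; article_types: NE type per article title."""
    tags = ["O"] * len(tokens)
    for (start, end), target in links.items():
        ne_type = article_types.get(target)
        if ne_type is None:          # target was not classified as an NE
            continue
        tags[start] = "B-" + ne_type
        for i in range(start + 1, end):
            tags[i] = "I-" + ne_type
    return tags

tokens = ["Obama", "visited", "London", "yesterday"]
links = {(0, 1): "Barack Obama", (2, 3): "London"}
article_types = {"Barack Obama": "PER", "London": "LOC"}
print(annotate(tokens, links, article_types))  # → ['B-PER', 'O', 'B-LOC', 'O']
```

In this sketch the NE type of the link target does all the work; the alias-based matching discussed next would add further spans to `links` for mentions that carry no explicit link.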
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
247
Thus, we compile a list of alternative titles, including anchor texts and NE redirects (i.e., the linked phrases and redirected pages that refer to NE articles). It is necessary to filter the list, however, to remove noisy alternative titles, which usually appear due to (a) one-word meaningful named entities that are ambiguous when taken out of context, and (b) multi-word alternative titles that contain apposition words (e.g., 'President', 'Vice Minister'). To this end we use the filtering algorithm proposed by Althobaiti et al. (2014) (see Algorithm 1).

In this algorithm a capitalisation probability measure for Arabic is introduced. This involves finding the English gloss for each one-word alternative name and then computing its probability of being capitalised in the English Wikipedia. In order to find the English gloss for Arabic words, Wikipedia Arabic-to-English cross-lingual links are exploited. In case the English gloss for the Arabic word cannot be found using inter-language links, an online translator is used. Before translating the Arabic word, a light stemmer is used to remove prefixes and conjunctions in order to acquire the translation of the word itself without its associated affixes. The capitalisation probability is computed as follows:

Pr[EN] = f(EN)isCapitalised / (f(EN)isCapitalised + f(EN)notCapitalised)

where EN is the English gloss of the alternative name; f(EN)isCapitalised is the number of times the English gloss EN is capitalised in the English Wikipedia; and f(EN)notCapitalised is the number of times the English gloss EN is not capitalised in the English Wikipedia. By specifying a capitalisation threshold constraint, ambiguous one-word titles are prevented from being included in the list of alternative titles. The capitalisation threshold is set to 0.75, as suggested in (Althobaiti et al., 2014). A multi-word alternative name is also omitted if any of its words belong to the list of apposition words.

Algorithm 1: Filtering Alternative Names
Input: a set L = {l1, l2, ..., ln} of all alternative names of Wikipedia articles
Output: a set RL = {rl1, rl2, ..., rln} of reliable alternative names
1   for i ← 1 to n do
2       T ← split li into tokens
3       if T.size() >= 2 then
            /* all tokens of T must not belong to the apposition list */
4           if !containAppositiveWord(T) then
5               add li to the set RL
6       else
7           lightstem ← findLightStem(li)
8           englishgloss ← translate(lightstem)
            /* compute the capitalisation probability for the English gloss */
9           capprob ← compCapProb(englishgloss)
10          if capprob > 0.75 then
11              add li to the set RL

3.2.3 Building the Corpus

The last stage is to incorporate sentences into the final corpus. We refer to this dataset as the Wikipedia-derived corpus (WDC). It contains 165,119 sentences of around 6 million tokens. Our model was then trained on the WDC corpus. In this paper we refer to this model as the DL classifier. The WDC dataset is available online.1 We also plan to make the models available to the research community.

[footnote 1: https://sites.google.com/site/mahajalthobaiti/resources]

4 Classifier Combination

4.1 The Case for Classifier Combination

In what follows we use SSL to refer to our semi-supervised classifier (see Section 3.1) and DL to refer to our distant learning classifier (see Section 3.2). Table 1 shows the results of both classifiers when tested on the ANERcorp test set (see Section 5 for details about the dataset).

Table 1: The results of the SSL and DL classifiers on the ANERcorp test set.
NEs      Classifier  Precision  Recall  Fβ=1
PER      SSL         85.91      51.10   64.08
         DL          80.01      45.11   57.69
LOC      SSL         87.91      62.48   73.04
         DL          75.21      67.14   70.95
ORG      SSL         84.27      40.30   54.52
         DL          74.10      57.02   64.45
Overall  SSL         86.03      51.29   64.27
         DL          76.44      56.42   64.92

As is apparent in Table 1, the SSL classifier tends to be more precise at the expense of recall. The distant learning technique is lower in precision than the semi-supervised learning technique, but higher in recall. Generally, preference is given to the distant supervision classifier in terms of F-score.

The classifiers have different strengths. Our semi-supervised algorithm iterates between pattern extraction and candidate NE extraction and selection. Only the candidate NEs that the classifier is most confident of are added at each iteration, which results in the high precision. The SSL classifier performs better than distant learning in detecting NEs that appear in reliable/regular patterns. These patterns are usually learned easily during the training phase, either because they contain important NE indicators2 or because they are supported by many reliable candidate NEs. For example, the SSL classifier has a high probability of successfully detecting "Obama" and "Louis van Gaal" as person names in the following (Arabic) sentences, given here in English translation:

• "President Obama said on a visit to Britain ..."
• "Louis van Gaal, the manager of Manchester United, said that ..."

The patterns extracted from such sentences in the newswire domain are learned easily during the training phase, as they contain good NE indicators like "president" and "manager".

[footnote 2: Also known as trigger words, which help in identifying NEs within text.]

Our distant learning method relies on Wikipedia structure and links to automatically create NE-annotated data. It also depends on Wikipedia features, such as inter-language links and redirects, to handle the rich morphology of Arabic without the need to perform excessive pre-processing steps (e.g., POS-tagging, deep morphological analysis), which has a slight negative effect on the precision of the DL classifier. The recall of the DL classifier, however, is high, covering as many NEs as possible in all possible domains. Therefore, the DL classifier is better than the SSL classifier in detecting NEs that appear in ambiguous contexts (they can be used for different NE types) and with no obvious clues (NE indicators), for example, detecting "Ferrari" and "Nokia" as organization names in the following sentences:

• "Alonso got ahead of the Renault driver who prevented Ferrari from ..."
• "Nokia's speech came a day after the completion of the deal"

The strengths and weaknesses of the SSL and DL classifiers indicate that a classifier ensemble could perform better than its individual components.

4.2 Classifier Combination Methods

Classifier combination methods are suitable when we need to make the best use of the predictions of multiple classifiers to enable higher-accuracy classifications. Dietterich (2000a) reviews many methods for constructing ensembles and explains why classifier combination techniques can often gain better performance than any base classifier. Tulyakov et al. (2008) introduce various categories of classifier combination according to different criteria, including the type of the classifier's output and the level at which the combinations operate. Several empirical and theoretical studies have been conducted to compare ensemble methods such as boosting, randomisation, and bagging techniques (Maclin and Opitz, 1997; Dietterich, 2000b; Bauer and Kohavi, 1999). Ghahramani and Kim (2003) explore a general framework for a Bayesian model combination that explicitly models the relationship between each classifier's output and the unknown true label. As such, multiclass Bayesian Classifier Combination (BCC) models are developed to combine predictions of multiple classifiers. Their proposed method for BCC in the machine learning context is derived directly from the method proposed in (Haitovsky et al., 2002) for modelling disagreement between human assessors, which in turn is an extension of (Dawid and Skene, 1979). Similar studies for modelling data annotation using a variety of methods are presented in (Carpenter, 2008; Cohn and Specia, 2013). Simpson et al. (2013) present a variant of BCC in which they consider the use of a principled approximate Bayesian method, variational Bayes (VB), as an inference technique instead of using Gibbs sampling. They also alter the model so as to use point values for hyper-parameters, instead of placing exponential hyper-priors over them.
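The capitalisation-probability filter of Section 3.2.2 (Algorithm 1) can be sketched as follows. The counts and function names below are invented for illustration; only the formula for Pr[EN] and the 0.75 threshold come from the paper.

```python
# Illustrative sketch of the capitalisation-probability filter used to prune
# ambiguous one-word alternative titles. The counts are invented; in the paper
# they come from the English Wikipedia, and the threshold 0.75 follows
# Althobaiti et al. (2014).

def cap_probability(capitalised_count, lowercase_count):
    # Pr[EN] = f(EN)isCapitalised / (f(EN)isCapitalised + f(EN)notCapitalised)
    total = capitalised_count + lowercase_count
    return capitalised_count / total if total else 0.0

def keep_one_word_title(english_gloss, counts, threshold=0.75):
    """counts maps a gloss to hypothetical (capitalised, not-capitalised)
    occurrence counts in the English Wikipedia."""
    cap, low = counts.get(english_gloss, (0, 0))
    return cap_probability(cap, low) > threshold

counts = {"Obama": (980, 20), "president": (300, 700)}  # invented counts
print(keep_one_word_title("Obama", counts))      # → True: reliable title
print(keep_one_word_title("president", counts))  # → False: ambiguous, filtered
```

Multi-word titles would bypass this check and instead be tested against the apposition-word list, as in branch 3–5 of Algorithm 1.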
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
249
The following sections detail the combination methods used in this paper to combine the minimally supervised classifiers for Arabic NER.

4.2.1 Voting

Voting is the most common method in classifier combination because of its simplicity and acceptable results (Van Halteren et al., 2001; Van Erp et al., 2002). Each classifier is allowed to vote for the class of its choice. It is common to take the majority vote, where each base classifier is given one vote and the class with the highest number of votes is chosen. In the case of a tie, when two or more classes receive the same number of votes, a random selection is taken from among the winning classes. It is useful, however, if base classifiers are distinguished by their quality. For this purpose, weights are used to encode the importance of each base classifier (Van Erp et al., 2002). Equal voting assumes that all classifiers have the same quality (Van Halteren et al., 2001). Weighted voting, on the other hand, gives more weight to classifiers of better quality. So, each classifier is weighted according to its overall precision, or its precision and recall on the class it suggests.

Formally, given K classifiers, a widely used combination scheme is the linear interpolation of the classifiers' class probability distributions:

P(C | S_1^K(w)) = Σ_{k=1}^{K} P_k(C | S_k(w)) · λ_k(w)

where P_k(C | S_k(w)) is an estimation of the probability that the correct classification is C given S_k(w), the class for the word w as suggested by classifier k, and λ_k(w) is the weight that specifies the importance given to each classifier k in the combination. P_k(C | S_k(w)) is computed as follows:

P_k(C | S_k(w)) = 1 if S_k(w) = C, and 0 otherwise.

For equal voting, each classifier should have the same weight (e.g., λ_k(w) = 1/K). In the case of weighted voting, the weight associated with each classifier can be computed from its precision and/or recall as illustrated above.

4.2.2 Independent Bayesian Classifier Combination (IBCC)

Using a Bayesian approach to classifier combination (BCC) provides a mathematical combination framework in which many classifiers, with various distributions and training features, can be combined to provide more accurate information. This framework explicitly models the relationship between each classifier's output and the unknown true label (Levenberg et al., 2014). This section describes the Bayesian approach to classifier combination we adopted in this paper which, like the work of Levenberg et al. (2014), is based on Simpson et al.'s (2013) simplification of Ghahramani and Kim's (2003) model.

For the i-th data point, the true label t_i is assumed to be generated by a multinomial distribution with parameter δ: p(t_i = j | δ) = δ_j, which models the class proportions. True labels may take values t_i = 1 ... J, where J is the number of true classes. It is also assumed that there are K base classifiers. The outputs of the classifiers are assumed to be discrete with values l = 1 ... L, where L is the number of possible outputs. The output c_i^(k) of classifier k is assumed to be generated by a multinomial distribution with parameters π_j^(k): p(c_i^(k) = l | t_i = j, π_j^(k)) = π_{j,l}^(k), where π^(k) is the confusion matrix for classifier k, which quantifies the decision-making abilities of each base classifier. As in Simpson et al.'s (2013) study, we assume that the parameters π_j^(k) and δ have Dirichlet prior distributions with hyper-parameters α_{0,j}^(k) = [α_{0,j1}^(k), α_{0,j2}^(k), ..., α_{0,jL}^(k)] and ν = [ν_{0,1}, ν_{0,2}, ..., ν_{0,J}], respectively.

Given the observed class labels and based on the above prior, the joint distribution over all variables for the IBCC model is

p(δ, Π, t, c | A_0, ν) = ∏_{i=1}^{I} { δ_{t_i} ∏_{k=1}^{K} π_{t_i, c_i^(k)}^(k) } p(δ | ν) p(Π | A_0),

where Π = {π_j^(k) | j = 1...J, k = 1...K} and A_0 = {α_{0,j}^(k) | j = 1...J, k = 1...K}. The conditional probability of a test data point t_i being assigned class j is given by

p(t_i = j) = ρ_{ij} / Σ_{y=1}^{J} ρ_{iy},
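For 1-of-K classifier outputs, the voting scheme of Section 4.2.1 reduces to summing each classifier's weight for the class it votes for, since P_k(C | S_k(w)) is an indicator. A minimal sketch, with names of our own choosing and weights standing in for λ_k:

```python
# Minimal sketch of the linear-interpolation voting scheme (our names, not the
# paper's code). votes[k] is the single class S_k(w) output by classifier k;
# weights[k] stands in for lambda_k (e.g. precision on a validation set).
from collections import defaultdict

def combine(votes, weights):
    scores = defaultdict(float)
    for label, weight in zip(votes, weights):
        # P_k(C | S_k(w)) is 1 for the voted class and 0 otherwise, so the
        # interpolation reduces to summing weights per class.
        scores[label] += weight
    return max(scores, key=scores.get)

# Equal voting: both classifiers weighted 1/K (ties broken arbitrarily).
print(combine(["B-PER", "B-PER"], [0.5, 0.5]))    # agreement → "B-PER"
# Weighted voting: the more precise classifier outweighs the other.
print(combine(["B-PER", "B-LOC"], [0.86, 0.76]))  # first vote wins → "B-PER"
```

With only two base classifiers, weighted voting effectively lets the higher-weight classifier decide every disagreement, which is consistent with the precision-heavy behaviour of the voting rows reported later in Table 3.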
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
250
where ρ_{ij} = δ_j ∏_{k=1}^{K} π_{j, c_i^(k)}^(k).

In our implementation we used point values for A_0, as in (Simpson et al., 2013). The values of the hyper-parameters A_0 offered a natural method to include any prior knowledge. Thus, they can be regarded as pseudo-counts of prior observations, and they can be chosen to represent any prior level of uncertainty in the confusion matrices, Π. Our inference technique for the unknown variables (δ, π, and t) was Gibbs sampling, as in (Ghahramani and Kim, 2003; Simpson et al., 2013). Figure 2 shows the directed graphical model for IBCC, in which c_i^(k) represents observed values, circular nodes are variables with distributions, and square nodes are variables instantiated with point values.

[Figure 2: The directed graph of IBCC — hyper-parameters ν_0 and α_0^(k), class proportions δ, confusion matrices π^(k), observed outputs c_i^(k), and true labels t_i, plated over classifiers k = 1...K and data points i = 1...I.]

5 Data

In this section, we describe the two datasets we used:

• Validation set3 (NEWS + BBC NEWS): 90% of this dataset is used to estimate the weight of each base classifier, and 10% is used to perform error analysis.
• Test set (ANERcorp test set): this dataset is used to evaluate the different classifier combination methods.

[footnote 3: Also known as the development set.]

The validation set is composed of two datasets: NEWS and BBC NEWS. The NEWS set contains around 15k tokens collected by Darwish (2013) from the RSS feed of the Arabic (Egypt) version of news.google.com from October 2012. We created the BBC NEWS corpus by collecting a representative sample of news from the BBC in May 2014. It contains around 3k tokens and covers different types of news, such as politics, economics, and entertainment.

The ANERcorp test set makes up 20% of the whole ANERcorp set. The ANERcorp set is a newswire corpus built and manually tagged especially for the Arabic NER task by Benajiba et al. (2007a), and contains around 150k tokens. This test set is commonly used in the Arabic NER literature to evaluate supervised classifiers (Benajiba and Rosso, 2008; Abdul-Hamid and Darwish, 2010; Abdallah et al., 2012; Oudah and Shaalan, 2012) and minimally-supervised classifiers (Alotaibi and Lee, 2013; Althobaiti et al., 2013; Althobaiti et al., 2014), which allows us to review the performance of the combined classifiers and compare it to the performance of each base classifier.

6 Experimental Analysis

6.1 Experimental Setup

In the IBCC model, the validation data was used as known t_i to ground the estimates of the model parameters. The hyper-parameters were set as α_j^(k) = 1 and ν_j = 1 (Kim and Ghahramani, 2012; Levenberg et al., 2014). The initial values for the random variables were set as follows: (a) the class proportion δ was initialised to the result of counting t_i, and (b) the confusion matrix π was initialised to the result of counting t_i against the output of each classifier c^(k). Gibbs sampling was run well past stability (i.e., 1000 iterations); stability was actually reached in approximately 100 iterations.

All parameters required in the voting methods were specified using the validation set. We examined two different voting methods: equal voting and weighted voting. In the case of equal voting, each classifier was given an equal weight, 1/K, where K was the number of classifiers to be combined. In weighted voting, the overall precision was used in order to give preference to classifiers of good quality.
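Given point estimates of δ and the confusion matrices, the IBCC decision rule of Section 4.2.2, p(t_i = j) ∝ δ_j ∏_k π^(k)_{j, c_i^(k)}, is a short computation. The sketch below uses invented numbers; in the paper these parameters are inferred by Gibbs sampling rather than fixed by hand.

```python
# Sketch of the IBCC decision rule for one data point, given point estimates
# of the class proportions delta and the per-classifier confusion matrices pi
# (invented numbers for illustration). Classes here: 0 = NE, 1 = not-NE.

def ibcc_posterior(delta, confusions, outputs):
    """delta[j]: class proportion; confusions[k][j][l]: pi^(k)_{j,l};
    outputs[k]: observed label c_i^(k) from classifier k."""
    rho = []
    for j in range(len(delta)):
        p = delta[j]                      # rho_ij = delta_j * prod_k pi...
        for k, c in enumerate(outputs):
            p *= confusions[k][j][c]
        rho.append(p)
    z = sum(rho)                          # normalise over classes
    return [r / z for r in rho]

delta = [0.3, 0.7]
confusions = [
    [[0.9, 0.1], [0.2, 0.8]],   # a high-precision classifier (SSL-like)
    [[0.7, 0.3], [0.4, 0.6]],   # a noisier, higher-recall one (DL-like)
]
post = ibcc_posterior(delta, confusions, [0, 0])  # both classifiers say "NE"
print(round(post[0], 3))  # → 0.771
```

The confusion matrices let agreement from a weak classifier count for less than agreement from a strong one, which is the key difference from weight-free voting.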
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
251
6.2 Results and Discussion

6.2.1 A Simple Baseline Combined Classifier

A proposed combined classifier simply and straightforwardly makes decisions based on the agreed decisions of the base classifiers, namely the SSL classifier and the DL classifier. That is, if the base classifiers agree on the NE type of a certain word, then it is annotated with the agreed NE type. In the case of disagreement, the word is considered not a named entity. Table 2 shows the results of this combined classifier, which is considered a baseline in this paper.

Table 2: The results of the baseline.
              Precision  Recall  Fβ=1
Person        97.31      24.69   39.39
Location      98.35      40.01   56.88
Organisation  97.38      33.20   49.52
Overall       97.68      32.63   48.92

The results of the combined classifier show very high precision, which indicates that both base classifiers are mostly accurate. The base classifiers also commit different errors, which is evident in the low recall. The accuracy and diversity of the single classifiers are the main conditions for a combined classifier to have better accuracy than any of its components (Dietterich, 2000a). Therefore, in the next section we take into consideration various classifier combination methods in order to aggregate the best decisions of the SSL and DL classifiers, and to improve overall performance.

6.2.2 Combined Classifiers: Classifier Combination Methods

The SSL and DL classifiers are trained with two different algorithms using different training data. The SSL classifier is trained on the ANERcorp training data, while the DL classifier is trained on a corpus automatically derived from Arabic Wikipedia, as explained in Sections 3.1 and 3.2. We combine the SSL and DL classifiers using the three classifier combination methods, namely equal voting, weighted voting, and IBCC. Table 3 shows the results of these classifier combination methods. The IBCC scheme outperforms all voting techniques and base classifiers in terms of F-score. Regarding precision, the voting techniques show the highest scores. However, the high precision is accompanied by a reduction in recall for both voting methods. The IBCC combination method also has relatively high precision compared to the precision of the base classifiers. Much better recall is registered for IBCC, but it is still low.

Table 3: The performances of various combination methods.
NEs      Combination Method  Precision  Recall  Fβ=1
PER      Equal Voting        79.99      41.88   54.97
         Weighted Voting     80.15      44.24   57.01
         IBCC                77.87      63.86   70.17
LOC      Equal Voting        86.87      30.66   45.32
         Weighted Voting     87.48      30.23   44.93
         IBCC                81.52      59.86   69.03
ORG      Equal Voting        97.01      29.97   45.79
         Weighted Voting     98.11      30.98   47.09
         IBCC                95.44      34.31   50.47
Overall  Equal Voting        87.96      34.17   49.22
         Weighted Voting     88.58      35.15   50.33
         IBCC                84.94      52.68   65.03

NEs      Base Classifier     Precision  Recall  Fβ=1
Overall  SSL                 86.03      51.29   64.27
         DL                  76.44      56.42   64.92

6.2.3 Combined Classifiers: Restriction of the Combination Process

An error analysis of the validation set shows that 10.01% of the NEs were correctly detected by the semi-supervised classifier but considered not NEs by the distant learning classifier. At the same time, the distant learning classifier managed to correctly detect 25.44% of the NEs that were considered not NEs by the semi-supervised classifier. We also noticed that the false positive rates, i.e., the possibility of considering a word an NE when it is actually not, are very low (0.66% and 2.45% for the semi-supervised and distant learning classifiers, respectively). These low false positive rates and the high percentage of NEs that are detected and missed by the two classifiers in a mutually exclusive way can be exploited to obtain better results, more specifically, to increase recall without negatively affecting precision. Therefore, we restricted the combination process to only include situations where the base classifiers agree or disagree on the NE type of a certain word. The combination process is ignored in cases where the base classifiers only disagree on detecting NEs. For example, if the base classifiers disagree on whether a certain word is an NE or not, the word is automatically considered an NE. Figure 3 provides some examples that illustrate the restrictions we applied to the combination process. The annotations in the examples are based on the CoNLL 2003 annotation guidelines (Chinchor et al., 1999).

[Figure 3: Examples of restricting the combination process. When the SSL and DL classifiers predict B-PER and B-LOC, the combination method is applied; for O versus B-LOC the output is B-LOC; for B-ORG versus O the output is B-ORG; when both predict B-PER, the combination method is applied.]

Restricting the combination process in this way increases recall without negatively affecting precision, as seen in Table 4. The increase in recall makes the overall F-score for all combination methods higher than those of the base classifiers. This way of using the IBCC model results in a performance level that is superior to all of the individual classifiers and the other voting-based combined classifiers. Therefore, the IBCC model leads to a 12% increase over the performance of the best base classifier, while the voting methods increase the performance by around 7%–10%. These results highlight the role of restricting the combination, which affects the performance of the combination methods and gives more control over how and when the predictions of the base classifiers should be combined.

6.2.4 Comparing Combined Classifiers: Statistical Significance of Results

We tested whether the difference in performance between the three classifier combination methods (equal voting, weighted voting, and IBCC) is significant using two different statistical tests over the results of these combination methods on the ANERcorp test set. An alpha level of 0.01 was used as the significance criterion for all statistical tests. First, we ran a non-parametric sign test. The small p-value (p ≪ 0.01) for each pair of the three combina-

NEs      Combination Method  Precision  Recall  Fβ=1
PER      Equal Voting        74.46      61.88   67.59
         Weighted Voting     77.77      63.50   69.91
         IBCC                77.88      64.56   70.60
LOC      Equal Voting        74.04      71.36   72.68
         Weighted Voting     74.05      73.70   73.86
         IBCC
76.2075.9176.05ORGEqualVoting76.0163.9769.47WeightedVoting76.3066.6071.12IBCC78.9166.6572.26OverallEqualVoting74.8465.7469.99WeightedVoting76.0467.9371.76IBCC77.6669.0473.10NEsBaseClassifiersPrecisionRecallFβ=1OverallSSL86.0351.2964.27DL76.4456.4264.92Table4:Theperformancesofvariouscombinationmeth-odswhenrestrictingthecombinationprocess.tionmethods,asseeninTable5,suggeststhatthesemethodsaresignificantlydifferent.Theonlycom-parisonwherenosignificancewasfoundisequalvotingvs.weightedvoting,whenweusedthemtocombinethedatawithoutanyrestrictions(p=0.3394).CombinationMethods(WithoutRestriction)EqualVotingWeightedVotingIBCCEqualVotingWeightedVoting0.3394IBCC<2.2E-16<2.2E-16CombinationMethods(WithRestriction)EqualVotingWeightedVotingIBCCEqualVotingWeightedVoting1.78E-07IBCC<2.2E-161.97E-06Table5:Thesigntestresults(exactpvalues)forthepair-wisecomparisonsofthecombinationmethods.Second,weusedabootstrapsampling(EfronandTibshirani,1994),whichisbecomingthedefactostandardsinNLP(Søgaardetal.,2014).Table6compareseachpairofthethreecombinationmeth-odsusingabootstrapsamplingoverdocumentswith10,000replicates.Itshowsthep-valuesandconfi-denceintervalsofthedifferencebetweenmeans.
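The bootstrap comparison over documents can be sketched as follows. This is a minimal illustration rather than the authors' exact implementation: it assumes a per-document evaluation score (e.g., F-score) is available for each of the two systems being compared, and the function name and signature are hypothetical.

```python
import random

def bootstrap_compare(scores_a, scores_b, replicates=10_000, seed=0):
    """Paired bootstrap over documents: resample documents with
    replacement and estimate a p-value and a 95% confidence interval
    for the difference between the mean scores of two systems."""
    assert len(scores_a) == len(scores_b)
    n = len(scores_a)
    rng = random.Random(seed)
    observed = sum(scores_b) / n - sum(scores_a) / n
    diffs = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]  # resample documents
        diffs.append(sum(scores_b[i] for i in idx) / n
                     - sum(scores_a[i] for i in idx) / n)
    diffs.sort()
    # Two-sided p-value: twice the smaller tail of replicates whose
    # mean difference falls on the "wrong" side of zero.
    below = sum(1 for d in diffs if d <= 0)
    p_value = min(1.0, 2 * min(below, replicates - below) / replicates)
    ci = (diffs[int(0.025 * replicates)], diffs[int(0.975 * replicates)])
    return observed, p_value, ci
```

With 10,000 replicates this mirrors the setup reported in Table 6; note that the doubled tail proportion used here is one common convention for the two-sided bootstrap p-value among several.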
Combination With Restriction
Combination Methods Comparison     p-value   [95% CI]
Weighted Voting, Equal Voting      0.000     [0.270, 0.600]
IBCC, Equal Voting                 0.000     [0.539, 0.896]
IBCC, Weighted Voting              0.000     [0.157, 0.426]

Combination Without Restriction
Combination Methods Comparison     p-value   [95% CI]
Weighted Voting, Equal Voting      0.508     [-0.365, 0.349]
IBCC, Equal Voting                 0.000     [4.800, 6.122]
IBCC, Weighted Voting              0.000     [4.783, 6.130]

Table 6: The bootstrap test results (p-values and CI) for the pairwise comparisons of the combination methods.

The differences in performance between almost all of the three combination methods are highly significant. The one exception is the comparison between equal voting and weighted voting when they are used as a combination method without restriction, which shows a non-significant difference (p-value = 0.508, CI = -0.365 to 0.349). Generally, the IBCC scheme performs significantly better than the voting-based combination methods whether or not we impose restrictions on the combination process, as can be seen in Table 3 and Table 4.

7 Conclusion

Major advances over the past decade have occurred in Arabic NER with regard to utilising various supervised systems, exploring different features, and producing manually annotated corpora that mostly cover the standard set of NE types. More effort and time for additional manual annotations are required when expanding the set of NE types, or exporting NE classifiers to new domains. This has motivated research in minimally supervised methods, such as semi-supervised learning and distant learning, but the performance of such methods is lower than that achieved by supervised methods. However, semi-supervised methods and distant learning tend to have different strengths, which suggests that better results may be obtained by combining these methods. Therefore, we trained two classifiers based on distant learning and semi-supervision techniques, and then combined them using a variety of classifier combination schemes. Our main contributions include the following:

• We presented a novel approach to Arabic NER using a combination of semi-supervised learning and distant supervision.

• We used the Independent Bayesian Classifier Combination (IBCC) scheme for NER, and compared it to traditional voting methods.

• We introduced the classifier combination restriction as a means of controlling how and when the predictions of base classifiers should be combined.

This research demonstrated that combining the two minimal supervision approaches using various classifier combination methods leads to better results for NER. The use of IBCC improves the performance by 8 percentage points over the best base classifier, whereas the improvement in performance when using voting methods is only 4 to 6 percentage points. Although all combination methods result in an accurate classification, the IBCC model achieves better recall than the other traditional combination methods. Our experiments also showed how restricting the combination process can increase the recall of all the combination methods without negatively affecting the precision.

The approach we proposed in this paper can be easily adapted to new NE types and different domains without the need for human intervention. In addition, there are many ways to restrict the combination process according to an application's preferences, favouring either high precision or high recall. For example, we may obtain a high-precision combined classifier if we do not combine the predictions of the base classifiers for a certain word, and automatically consider it not an NE, whenever one of the base classifiers considers this word not an NE.

References

Sherief Abdallah, Khaled Shaalan, and Muhammad Shoaib. 2012. Integrating rule-based system with classification for Arabic named entity recognition. In Computational Linguistics and Intelligent Text Processing, pages 311–322. Springer.

Samir AbdelRahman, Mohamed Elarnaoty, Marwa Magdy, and Aly Fahmy. 2010. Integrated machine
learning techniques for Arabic named entity recognition. IJCSI, 7:27–36.

Ahmed Abdul-Hamid and Kareem Darwish. 2010. Simplified feature set for Arabic named entity recognition. In Proceedings of the 2010 Named Entities Workshop, pages 110–115. Association for Computational Linguistics.

Steven Abney. 2010. Semisupervised Learning for Computational Linguistics. CRC Press.

Fahd Alotaibi and Mark Lee. 2012. Mapping Arabic Wikipedia into the named entities taxonomy. In Proceedings of COLING 2012: Posters, pages 43–52, Mumbai, India, December. The COLING 2012 Organizing Committee.

Fahd Alotaibi and Mark Lee. 2013. Automatically developing a fine-grained Arabic named entity corpus and gazetteer by utilizing Wikipedia. In IJCNLP.

Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio. 2013. A semi-supervised learning approach to Arabic named entity recognition. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, pages 32–40, Hissar, Bulgaria, September. INCOMA Ltd., Shoumen, Bulgaria.

Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio. 2014. Automatic creation of Arabic named entity annotated corpus using Wikipedia. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 106–115, Gothenburg.

Marco Baroni, Brian Murphy, Eduard Barbu, and Massimo Poesio. 2010. Strudel: A corpus-based semantic model based on properties and types. Cognitive Science, 34(2):222–254.

Eric Bauer and Ron Kohavi. 1999. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1-2):105–139.

Yassine Benajiba and Paolo Rosso. 2008. Arabic named entity recognition using conditional random fields. In Proc. of Workshop on HLT & NLP within the Arabic World, LREC, volume 8, pages 143–153.

Yassine Benajiba, Paolo Rosso, and José Miguel Benedí Ruiz. 2007a. ANERsys: An Arabic named entity recognition system based on maximum entropy. In Computational Linguistics and Intelligent Text Processing, pages 143–153. Springer.

Yassine Benajiba, Paolo Rosso, and José Miguel Benedí Ruiz. 2007b. ANERsys: An Arabic named entity recognition system based on maximum entropy. In Computational Linguistics and Intelligent Text Processing, pages 143–153. Springer.

Yassine Benajiba, Mona Diab, Paolo Rosso, et al. 2008. Arabic named entity recognition: An SVM-based approach. In Proceedings of 2008 Arab International Conference on Information Technology (ACIT), pages 16–18.

Bob Carpenter. 2008. Multilevel Bayesian models of categorical data annotation. Unpublished manuscript. Available online at http://lingpipe-blog.com/lingpipe-white-papers/, last accessed 15 March 2015.

Nancy Chinchor, Erica Brown, Lisa Ferro, and Patty Robinson. 1999. 1999 Named Entity Recognition Task Definition. MITRE and SAIC.

Trevor Cohn and Lucia Specia. 2013. Modelling annotator bias with multi-task Gaussian processes: An application to machine translation quality estimation. In ACL, pages 32–42.

Kareem Darwish. 2013. Named entity recognition using cross-lingual resources: Arabic as an example. In ACL, pages 1558–1567.

Alexander Philip Dawid and Allan M. Skene. 1979. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pages 20–28.

Thomas G. Dietterich. 2000a. Ensemble methods in machine learning. In Multiple Classifier Systems, volume 1857 of Lecture Notes in Computer Science, pages 1–15. Springer Berlin Heidelberg.

Thomas G. Dietterich. 2000b. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–157.

Bradley Efron and Robert J. Tibshirani. 1994. An Introduction to the Bootstrap. CRC Press.

Ali Elsebai, Farid Meziane, and Fatma Zohra Belkredim. 2009. A rule based persons names Arabic extraction system. Communications of the IBIMA, 11(6):53–59.

Radu Florian, Abe Ittycheriah, Hongyan Jing, and Tong Zhang. 2003. Named entity recognition through classifier combination. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4, pages 168–171. Association for Computational Linguistics.

Zoubin Ghahramani and Hyun-Chul Kim. 2003. Bayesian classifier combination. Technical report, University College London.

Y. Haitovsky, A. Smith, and Y. Liu. 2002. Modelling disagreements among and within raters' assessments from the Bayesian point of view. In Draft. Presented at the Valencia meeting 2002.

Hyun-Chul Kim and Zoubin Ghahramani. 2012. Bayesian classifier combination. In International Conference on Artificial Intelligence and Statistics, pages 619–627.
Abby Levenberg, Stephen Pulman, Karo Moilanen, Edwin Simpson, and Stephen Roberts. 2014. Predicting economic indicators from web text using sentiment composition. International Journal of Computer and Communication Engineering, 3(2):109–115.

Richard Maclin and David Opitz. 1997. An empirical evaluation of bagging and boosting. AAAI/IAAI, 1997:546–551.

Slim Mesfar. 2007. Named entity recognition for Arabic using syntactic grammars. In Natural Language Processing and Information Systems, pages 305–316. Springer.

Peter Mika, Massimiliano Ciaramita, Hugo Zaragoza, and Jordi Atserias. 2008. Learning to tag and tagging to learn: A case study on Wikipedia. volume 23, pages 26–33.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2, ACL '09, pages 1003–1011, Stroudsburg, PA, USA. Association for Computational Linguistics.

David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26.

David Nadeau. 2007. Semi-supervised named entity recognition: learning to recognize 100 entity types with little supervision.

Truc-Vien T. Nguyen and Alessandro Moschitti. 2011. End-to-end relation extraction using distant supervision from external semantic repositories. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2, HLT '11, pages 277–282, Stroudsburg, PA, USA. Association for Computational Linguistics.

Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, and James R. Curran. 2013. Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, 194:151–175.

Mai Oudah and Khaled F. Shaalan. 2012. A pipeline Arabic named entity recognition using a hybrid approach. In COLING, pages 2159–2176.

Marius Pasca, Dekang Lin, Jeffrey Bigham, Andrei Lifchits, and Alpa Jain. 2006. Organizing and searching the world wide web of facts - step one: the one-million fact extraction challenge. In AAAI, volume 6, pages 1400–1405.

Alexander E. Richman and Patrick Schone. 2008. Mining wiki resources for multilingual named entity recognition. In ACL, pages 1–9.

Ellen Riloff and Rosie Jones. 1999. Learning dictionaries for information extraction by multi-level bootstrapping. In AAAI, pages 474–479.

Sriparna Saha and Asif Ekbal. 2013. Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition. Data & Knowledge Engineering, 85:15–39.

Satoshi Sekine et al. 1998. NYU: Description of the Japanese NE system used for MET-2. In Proceedings of the Seventh Message Understanding Conference (MUC-7), volume 17.

Khaled Shaalan and Hafsa Raza. 2009. NERA: Named entity recognition for Arabic. Journal of the American Society for Information Science and Technology, 60(8):1652–1663.

Edwin Simpson, Stephen Roberts, Ioannis Psorakis, and Arfon Smith. 2013. Dynamic Bayesian combination of multiple imperfect classifiers. In Decision Making and Imperfection, pages 1–35. Springer.

Anders Søgaard, Anders Johannsen, Barbara Plank, Dirk Hovy, and Hector Martinez. 2014. What's in a p-value in NLP? In Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL 2014), pages 1–10.

Sam Tardif, James R. Curran, and Tara Murphy. 2009. Improved text categorisation for Wikipedia named entities. In Proceedings of the Australasian Language Technology Association Workshop, pages 104–108.

Sergey Tulyakov, Stefan Jaeger, Venu Govindaraju, and David Doermann. 2008. Review of classifier combination methods. In Machine Learning in Document Analysis and Recognition, pages 361–386. Springer.

Merijn Van Erp, Louis Vuurpijl, and Lambert Schomaker. 2002. An overview and comparison of voting methods for pattern recognition. In Eighth International Workshop on Frontiers in Handwriting Recognition, pages 195–200. IEEE.

Hans Van Halteren, Walter Daelemans, and Jakub Zavrel. 2001. Improving accuracy in word class tagging through the combination of machine learning systems. Computational Linguistics, 27(2):199–229.

Wajdi Zaghouani. 2014. Critical survey of the freely available Arabic corpora. In Workshop on Free/Open-Source Arabic Corpora and Corpora Processing Tools, pages 1–8, Reykjavik, Iceland.

Tong Zhang, Fred Damerau, and David Johnson. 2002. Text chunking based on a generalization of Winnow. The Journal of Machine Learning Research, 2:615–637.