Transactions of the Association for Computational Linguistics, vol. 3, pp. 243–255, 2015. Action Editor: Ryan McDonald.
Submission batch: 1/2015; Revision batch: 4/2015; Published 5/2015.
© 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 Licence.
Combining Minimally-supervised Methods for Arabic Named Entity Recognition

Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio
School of Computer Science and Electronic Engineering
University of Essex
Colchester, United Kingdom
{mjaltha, udo, poesio}@essex.ac.uk

Abstract

Supervised methods can achieve high performance on NLP tasks, such as Named Entity Recognition (NER), but new annotations are required for every new domain and/or genre change. This has motivated research in minimally supervised methods such as semi-supervised learning and distant learning, but neither technique has yet achieved performance levels comparable to those of supervised methods. Semi-supervised methods tend to have very high precision but comparatively low recall, whereas distant learning tends to achieve higher recall but lower precision. This complementarity suggests that better results may be obtained by combining the two types of minimally supervised methods. In this paper we present a novel approach to Arabic NER using a combination of semi-supervised and distant learning techniques. We trained a semi-supervised NER classifier and another one using distant learning techniques, and then combined them using a variety of classifier combination schemes, including the Bayesian Classifier Combination (BCC) procedure recently proposed for sentiment analysis. According to our results, the BCC model leads to an increase in performance of 8 percentage points over the best base classifiers.

1 Introduction

Supervised learning techniques are very effective and widely used to solve many NLP problems, including NER (Sekine and others, 1998; Benajiba et al., 2007a; Darwish, 2013). The main disadvantage of supervised techniques, however, is the need for a large annotated corpus. Although a considerable amount of annotated data is available for many languages, including Arabic (Zaghouani, 2014), changing the domain or expanding the set of classes always requires domain-specific experts and new annotated data, both of which demand time and effort. Therefore, much of the current research on NER focuses on approaches that require minimal human intervention to export the named entity (NE) classifiers to new domains and to expand NE classes (Nadeau, 2007; Nothman et al., 2013).

Semi-supervised (Abney, 2010) and distant learning approaches (Mintz et al., 2009; Nothman et al., 2013) are alternatives to supervised methods that do not require manually annotated data. These approaches have proved to be effective and easily adaptable to new NE types. However, the performance of such methods tends to be lower than that achieved with supervised methods (Althobaiti et al., 2013; Nadeau, 2007; Nothman et al., 2013).

We propose combining these two minimally supervised methods in order to exploit their respective strengths and thereby obtain better results. Semi-supervised learning tends to be more precise than distant learning, which in turn leads to higher recall than semi-supervised learning. In this work, we use various classifier combination schemes to combine the minimal supervision methods. Most previous studies have examined classifier combination schemes to combine multiple supervised-learning systems (Florian et al., 2003; Saha and Ekbal, 2013), but this research is the first to combine minimal supervision approaches. In addition,
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
244
we report our results from testing the recently proposed Independent Bayesian Classifier Combination (IBCC) scheme (Kim and Ghahramani, 2012; Levenberg et al., 2014) and comparing it with traditional voting methods for ensemble combination.

2 Background

2.1 Arabic NER

A lot of research has been devoted to Arabic NER over the past ten years. Much of the initial work employed hand-written rule-based techniques (Mesfar, 2007; Shaalan and Raza, 2009; Elsebai et al., 2009). More recent approaches to Arabic NER are based on supervised learning techniques. The most common supervised learning techniques investigated for Arabic NER are Maximum Entropy (ME) (Benajiba et al., 2007b), Support Vector Machines (SVMs) (Benajiba et al., 2008), and Conditional Random Fields (CRFs) (Benajiba and Rosso, 2008; Abdul-Hamid and Darwish, 2010).

Darwish (2013) presented cross-lingual features for NER that make use of the linguistic properties and knowledge bases of another language. In his study, English capitalisation features and an English knowledge base (DBpedia) were exploited as discriminative features for Arabic NER. A large Machine Translation (MT) phrase table and Wikipedia cross-lingual links were used for translation between Arabic and English. The results showed an overall F-score of 84.3% with an improvement of 5.5% over a strong baseline system on a standard dataset (the ANERcorp set collected by Benajiba et al. (2007a)).

Abdallah et al. (2012) proposed a hybrid NER system for Arabic that integrates a rule-based system with a decision tree classifier. Their integrated approach increased the F-score by between 8% and 14% when compared to the original rule-based system and the pure machine learning technique. Oudah and Shaalan (2012) also developed hybrid Arabic NER systems that integrate a rule-based approach with three different supervised techniques: decision trees, SVMs, and logistic regression. Their best hybrid system outperforms state-of-the-art Arabic NER systems (Benajiba and Rosso, 2008; Abdallah et al., 2012) on standard test sets.

2.2 Minimal Supervision and NER

Much current research seeks adequate alternatives to expensive corpus annotation that address the limitations of supervised learning methods: the need for substantial human intervention and the limited number of NE classes that can be handled by the system. Semi-supervised techniques and distant learning are examples of methods that require minimal supervision.

Semi-supervised learning (SSL) (Abney, 2010) has been used for various NLP tasks, including NER (Nadeau, 2007). 'Bootstrapping' is the most common semi-supervised technique. Bootstrapping involves a small degree of supervision, such as a set of seeds, to initiate the learning process (Nadeau and Sekine, 2007). An early study that introduced mutual bootstrapping and proved highly influential is (Riloff and Jones, 1999). They presented an algorithm that begins with a set of seed examples of a particular entity type. Then, all contexts found around these seeds in a large corpus are compiled, ranked, and used to find new examples. Pasca et al. (2006) used the same bootstrapping technique as Riloff and Jones (1999), but applied the technique to very large corpora and managed to generate one million facts with a precision rate of about 88%. AbdelRahman et al. (2010) proposed to integrate bootstrapping semi-supervised pattern recognition and a Conditional Random Fields (CRFs) classifier. They used semi-supervised pattern recognition in order to generate patterns that were then used as features in the CRFs classifier.

Distant learning (DL) is another popular paradigm that avoids the high cost of supervision. It depends on the use of external knowledge (e.g., encyclopedias such as Wikipedia, unlabelled large corpora, or external semantic repositories) to increase the performance of the classifier, or to automatically create new resources for use in the learning process (Mintz et al., 2009; Nguyen and Moschitti, 2011). Nothman et al. (2013) automatically created massive, multilingual training annotations for NER by exploiting the text and internal structure of Wikipedia. They first categorised Wikipedia articles into a specific set of named entity types across nine languages: Dutch, English, French, German, Italian, Polish, Portuguese, Russian, and Spanish. Then, Wikipedia's links were transformed into named entity annotations based on the NE types of the target articles. Following this approach, millions of words were annotated in the aforementioned nine languages. Their method for automatically deriving corpora from Wikipedia outperformed the methods proposed by Richman and Schone (2008) and Mika et al. (2008) when testing the Wikipedia-trained models on CoNLL shared task data and other gold-standard corpora.

Alotaibi and Lee (2013) presented a methodology to automatically build two NE-annotated sets from Arabic Wikipedia. The corpora were built by transforming links into NE annotations according to the NE type of the target articles. POS-tagging, morphological analysis, and linked NE phrases were used to detect other mentions of NEs that appear without links in text. Their Wikipedia-trained model performed well when tested on various newswire test sets, but it did not surpass the performance of the supervised classifier that is trained and tested on datasets drawn from the same domain.

2.3 Classifier Combination and NER

We are not aware of any previous work combining minimally supervised methods for the NER task in Arabic or any other natural language, but there are many studies that have examined classifier combination schemes to combine various supervised-learning systems. Florian et al. (2003) presented the best system at the NER CoNLL 2003 task, with an F-score value equal to 88.76%. They used a combination of four diverse NE classifiers: a transformation-based learning classifier, a Hidden Markov Model (HMM) classifier, a robust risk minimization classifier based on a regularized winnow method (Zhang et al., 2002), and an ME classifier. The features they used included tokens, POS and chunk tags, affixes, gazetteers, and the output of two other NE classifiers trained on richer datasets. Their methods for combining the results of the four NE classifiers improved the overall performance by 17–21% when compared with the best performing classifier.

Saha and Ekbal (2013) studied classifier combination techniques for various NER models under single and multi-objective optimisation frameworks. They used seven diverse classifiers (naive Bayes, decision tree, memory-based learner, HMM, ME, CRFs, and SVMs) to build a number of voting models based on identified text features that are selected mostly without domain knowledge. The combination methods used were binary and real vote-based ensembles. They reported that the proposed multi-objective optimisation classifier ensemble with real voting outperforms the individual classifiers, the three baseline ensembles, and the corresponding single-objective classifier ensemble.

3 Two Minimally Supervised NER Classifiers

Two main minimally supervised approaches have been used for NER: semi-supervised learning (Althobaiti et al., 2013) and distant supervision (Nothman et al., 2013). We developed state-of-the-art classifiers of both types that will be used as base classifiers in this paper. Our implementations of these classifiers are explained in Section 3.1 and Section 3.2.

3.1 Semi-supervised Learning

As previously mentioned, the most common SSL technique is bootstrapping, which only requires a set of seeds to initiate the learning process. We used an algorithm adapted from Althobaiti et al. (2013) that contains three components, as shown in Figure 1.

[Figure 1: The Three Components of the SSL System — seed instances feed a loop of Pattern Induction, Instance Extraction, and Instance Ranking/Selection.]

The algorithm begins with a list of a few examples of a given NE type (e.g., 'London' and 'Paris' can be used as seed examples for location entities) and learns patterns (P) that are used to find more examples (candidate NEs). These examples are eventually sorted and used again as seed examples for the next iteration.

Our algorithm does not use plain frequencies, since absolute frequency does not always produce good examples. This is because bad examples will be extracted by one pattern, however unwantedly, as many times as the bad examples appear in the text in relatively similar contexts. Meanwhile, good examples are best extracted using more than one pattern, since they occur in a wider variety of contexts in the text. Instead, our algorithm ranks candidate NEs according to the number of different patterns that are used to extract them, since pattern variety is a better cue to semantics than absolute frequency (Baroni et al., 2010).

After sorting the examples according to the number of distinct patterns, all examples but the top m are discarded, where m is set to the number of examples from the previous iteration, plus one. These m examples will be used in the next iteration, and so on. For example, if we start the algorithm with 20 seed instances, the following iteration will start with 21, and the next one will start with 22, and so on. This procedure is necessary in order to carefully include examples from one iteration to another and to ensure that bad instances are not passed on to the next iteration. The same procedure was applied by (Althobaiti et al., 2013).

3.2 Distant Learning

For distant learning we follow the state-of-the-art approach to exploiting Wikipedia for Arabic NER, as in (Althobaiti et al., 2014). Our distant learning system exploits many of Wikipedia's features, such as anchor texts, redirects, and inter-language links, in order to automatically develop an Arabic NE-annotated corpus, which is used later to train a state-of-the-art supervised classifier. The three steps of this approach are:

1. Classify Wikipedia articles into a set of NE types.
2. Annotate the Wikipedia text as follows:
   • Identify and label matching text in the title and the first sentence of each article.
   • Label linked phrases in the text according to the NE type of the target article.
   • Compile a list of alternative titles for articles and filter out ambiguous ones.
   • Identify and label matching phrases in the list and the Wikipedia text.
3. Filter sentences to prevent noisy sentences from being included in the corpus.

We briefly explain these steps in the following sections.

3.2.1 Classifying Wikipedia Articles

The Wikipedia articles in the dataset need to be classified into the set of named entity types in the classification scheme. We conduct an experiment that uses simple bag-of-words features extracted from different portions of the Wikipedia document and metadata such as categories, the infobox table, and tokens from the article title and first sentence of the document. To improve the accuracy of document classification, tokens are distinguished based on their location in the document. Therefore, category and infobox features are marked with suffixes to differentiate them from tokens extracted from the article's body text (Tardif et al., 2009). The feature set is represented by Term Frequency-Inverse Document Frequency (TF-IDF). In order to develop a Wikipedia document classifier to categorise Wikipedia documents into CoNLL NE types, namely person, location, organisation, miscellaneous, or other, we use a set of 4,000 manually classified Wikipedia articles that are available free online (Alotaibi and Lee, 2012). 80% of the 4,000 hand-classified Wikipedia articles are used for training, and 20% for evaluation. The Wikipedia document classifier that we train performs well, achieving an F-score of 90%. The classifier is then used to classify all Wikipedia articles. At the end of this stage, we obtain a list of pairs containing each Wikipedia article and its NE type in preparation for the next stage: developing the NE-tagged training corpus.

3.2.2 The Annotation Process

To begin the annotation process, we identify matching terms in the article title and the first sentence and then tag the matching phrases with the NE type of the article. The system adopts partial matching, where all corresponding words in the title and the first sentence should first be identified. Then, the system annotates them and all words in between (Althobaiti et al., 2014).

The next step is to transform the links between Wikipedia articles into NE annotations according to the NE type of the link target. Wikipedia also contains a fair amount of NEs without links. We follow the technique proposed by Nothman et al. (2013), which suggests inferring additional links using the aliases for each article.
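The link-annotation step just described can be illustrated with a small sketch. This is our own illustrative code, not the paper's implementation; the function names and toy data are invented, and CoNLL-style B-/I- tags are assumed for the output.

```python
# Illustrative sketch (not the paper's code) of turning Wikipedia links into
# NE annotations: every linked phrase is tagged with the NE type that the
# document classifier assigned to its target article; unlinked tokens stay "O".

def annotate(tokens, links, article_types):
    """tokens: sentence as a token list; links: {(start, end): target_article}
    token spans over that list; article_types: NE type per article title."""
    tags = ["O"] * len(tokens)
    for (start, end), target in links.items():
        ne_type = article_types.get(target)
        if ne_type is None:          # target was not classified as an NE
            continue
        tags[start] = "B-" + ne_type
        for i in range(start + 1, end):
            tags[i] = "I-" + ne_type
    return tags

tokens = ["Obama", "visited", "London", "yesterday"]
links = {(0, 1): "Barack Obama", (2, 3): "London"}
article_types = {"Barack Obama": "PER", "London": "LOC"}
print(annotate(tokens, links, article_types))  # → ['B-PER', 'O', 'B-LOC', 'O']
```

In this sketch the NE type of the link target does all the work; the alias-based matching discussed next would add further spans to `links` for mentions that carry no explicit link.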
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
247
Thus, we compile a list of alternative titles, including anchor texts and NE redirects (i.e., the linked phrases and redirected pages that refer to NE articles). It is necessary to filter the list, however, to remove noisy alternative titles, which usually appear due to (a) one-word meaningful named entities that are ambiguous when taken out of context, and (b) multi-word alternative titles that contain apposition words (e.g., 'President', 'Vice Minister'). To this end we use the filtering algorithm proposed by Althobaiti et al. (2014) (see Algorithm 1).

In this algorithm a capitalisation probability measure for Arabic is introduced. This involves finding the English gloss for each one-word alternative name and then computing its probability of being capitalised in the English Wikipedia. In order to find the English gloss for Arabic words, Wikipedia Arabic-to-English cross-lingual links are exploited. In case the English gloss for the Arabic word cannot be found using inter-language links, an online translator is used. Before translating the Arabic word, a light stemmer is used to remove prefixes and conjunctions in order to acquire the translation of the word itself without its associated affixes. The capitalisation probability is computed as follows:

Pr[EN] = f(EN)isCapitalised / (f(EN)isCapitalised + f(EN)notCapitalised)

where EN is the English gloss of the alternative name; f(EN)isCapitalised is the number of times the English gloss EN is capitalised in the English Wikipedia; and f(EN)notCapitalised is the number of times the English gloss EN is not capitalised in the English Wikipedia. By specifying a capitalisation threshold constraint, ambiguous one-word titles are prevented from being included in the list of alternative titles. The capitalisation threshold is set to 0.75, as suggested in (Althobaiti et al., 2014). A multi-word alternative name is also omitted if any of its words belong to the list of apposition words.

Algorithm 1: Filtering Alternative Names
Input: a set L = {l1, l2, ..., ln} of all alternative names of Wikipedia articles
Output: a set RL = {rl1, rl2, ..., rln} of reliable alternative names
1   for i ← 1 to n do
2       T ← split li into tokens
3       if T.size() >= 2 then
            /* all tokens of T must not belong to the apposition list */
4           if !containAppositiveWord(T) then
5               add li to the set RL
6       else
7           lightstem ← findLightStem(li)
8           englishgloss ← translate(lightstem)
            /* compute the capitalisation probability for the English gloss */
9           capprob ← compCapProb(englishgloss)
10          if capprob > 0.75 then
11              add li to the set RL

3.2.3 Building the Corpus

The last stage is to incorporate sentences into the final corpus. We refer to this dataset as the Wikipedia-derived corpus (WDC). It contains 165,119 sentences of around 6 million tokens. Our model was then trained on the WDC corpus. In this paper we refer to this model as the DL classifier. The WDC dataset is available online.1 We also plan to make the models available to the research community.

[footnote 1: https://sites.google.com/site/mahajalthobaiti/resources]

4 Classifier Combination

4.1 The Case for Classifier Combination

In what follows we use SSL to refer to our semi-supervised classifier (see Section 3.1) and DL to refer to our distant learning classifier (see Section 3.2). Table 1 shows the results of both classifiers when tested on the ANERcorp test set (see Section 5 for details about the dataset).

Table 1: The results of the SSL and DL classifiers on the ANERcorp test set.
NEs      Classifier  Precision  Recall  Fβ=1
PER      SSL         85.91      51.10   64.08
         DL          80.01      45.11   57.69
LOC      SSL         87.91      62.48   73.04
         DL          75.21      67.14   70.95
ORG      SSL         84.27      40.30   54.52
         DL          74.10      57.02   64.45
Overall  SSL         86.03      51.29   64.27
         DL          76.44      56.42   64.92

As is apparent in Table 1, the SSL classifier tends to be more precise at the expense of recall. The distant learning technique is lower in precision than the semi-supervised learning technique, but higher in recall. Generally, preference is given to the distant supervision classifier in terms of F-score.

The classifiers have different strengths. Our semi-supervised algorithm iterates between pattern extraction and candidate NE extraction and selection. Only the candidate NEs that the classifier is most confident of are added at each iteration, which results in the high precision. The SSL classifier performs better than distant learning in detecting NEs that appear in reliable/regular patterns. These patterns are usually learned easily during the training phase, either because they contain important NE indicators2 or because they are supported by many reliable candidate NEs. For example, the SSL classifier has a high probability of successfully detecting "Obama" and "Louis van Gaal" as person names in the following (Arabic) sentences, given here in English translation:

• "President Obama said on a visit to Britain ..."
• "Louis van Gaal, the manager of Manchester United, said that ..."

The patterns extracted from such sentences in the newswire domain are learned easily during the training phase, as they contain good NE indicators like "president" and "manager".

[footnote 2: Also known as trigger words, which help in identifying NEs within text.]

Our distant learning method relies on Wikipedia structure and links to automatically create NE-annotated data. It also depends on Wikipedia features, such as inter-language links and redirects, to handle the rich morphology of Arabic without the need to perform excessive pre-processing steps (e.g., POS-tagging, deep morphological analysis), which has a slight negative effect on the precision of the DL classifier. The recall of the DL classifier, however, is high, covering as many NEs as possible in all possible domains. Therefore, the DL classifier is better than the SSL classifier in detecting NEs that appear in ambiguous contexts (they can be used for different NE types) and with no obvious clues (NE indicators), for example, detecting "Ferrari" and "Nokia" as organization names in the following sentences:

• "Alonso got ahead of the Renault driver who prevented Ferrari from ..."
• "Nokia's speech came a day after the completion of the deal"

The strengths and weaknesses of the SSL and DL classifiers indicate that a classifier ensemble could perform better than its individual components.

4.2 Classifier Combination Methods

Classifier combination methods are suitable when we need to make the best use of the predictions of multiple classifiers to enable higher-accuracy classifications. Dietterich (2000a) reviews many methods for constructing ensembles and explains why classifier combination techniques can often gain better performance than any base classifier. Tulyakov et al. (2008) introduce various categories of classifier combination according to different criteria, including the type of the classifier's output and the level at which the combinations operate. Several empirical and theoretical studies have been conducted to compare ensemble methods such as boosting, randomisation, and bagging techniques (Maclin and Opitz, 1997; Dietterich, 2000b; Bauer and Kohavi, 1999). Ghahramani and Kim (2003) explore a general framework for a Bayesian model combination that explicitly models the relationship between each classifier's output and the unknown true label. As such, multiclass Bayesian Classifier Combination (BCC) models are developed to combine predictions of multiple classifiers. Their proposed method for BCC in the machine learning context is derived directly from the method proposed in (Haitovsky et al., 2002) for modelling disagreement between human assessors, which in turn is an extension of (Dawid and Skene, 1979). Similar studies for modelling data annotation using a variety of methods are presented in (Carpenter, 2008; Cohn and Specia, 2013). Simpson et al. (2013) present a variant of BCC in which they consider the use of a principled approximate Bayesian method, variational Bayes (VB), as an inference technique instead of using Gibbs sampling. They also alter the model so as to use point values for hyper-parameters, instead of placing exponential hyper-priors over them.
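The capitalisation-probability filter of Section 3.2.2 (Algorithm 1) can be sketched as follows. The counts and function names below are invented for illustration; only the formula for Pr[EN] and the 0.75 threshold come from the paper.

```python
# Illustrative sketch of the capitalisation-probability filter used to prune
# ambiguous one-word alternative titles. The counts are invented; in the paper
# they come from the English Wikipedia, and the threshold 0.75 follows
# Althobaiti et al. (2014).

def cap_probability(capitalised_count, lowercase_count):
    # Pr[EN] = f(EN)isCapitalised / (f(EN)isCapitalised + f(EN)notCapitalised)
    total = capitalised_count + lowercase_count
    return capitalised_count / total if total else 0.0

def keep_one_word_title(english_gloss, counts, threshold=0.75):
    """counts maps a gloss to hypothetical (capitalised, not-capitalised)
    occurrence counts in the English Wikipedia."""
    cap, low = counts.get(english_gloss, (0, 0))
    return cap_probability(cap, low) > threshold

counts = {"Obama": (980, 20), "president": (300, 700)}  # invented counts
print(keep_one_word_title("Obama", counts))      # → True: reliable title
print(keep_one_word_title("president", counts))  # → False: ambiguous, filtered
```

Multi-word titles would bypass this check and instead be tested against the apposition-word list, as in branch 3–5 of Algorithm 1.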
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
249
The following sections detail the combination methods used in this paper to combine the minimally supervised classifiers for Arabic NER.

4.2.1 Voting

Voting is the most common method in classifier combination because of its simplicity and acceptable results (Van Halteren et al., 2001; Van Erp et al., 2002). Each classifier is allowed to vote for the class of its choice. It is common to take the majority vote, where each base classifier is given one vote and the class with the highest number of votes is chosen. In the case of a tie, when two or more classes receive the same number of votes, a random selection is taken from among the winning classes. It is useful, however, if base classifiers are distinguished by their quality. For this purpose, weights are used to encode the importance of each base classifier (Van Erp et al., 2002). Equal voting assumes that all classifiers have the same quality (Van Halteren et al., 2001). Weighted voting, on the other hand, gives more weight to classifiers of better quality. So, each classifier is weighted according to its overall precision, or its precision and recall on the class it suggests.

Formally, given K classifiers, a widely used combination scheme is the linear interpolation of the classifiers' class probability distributions:

P(C | S_1^K(w)) = Σ_{k=1}^{K} P_k(C | S_k(w)) · λ_k(w)

where P_k(C | S_k(w)) is an estimation of the probability that the correct classification is C given S_k(w), the class for the word w as suggested by classifier k, and λ_k(w) is the weight that specifies the importance given to each classifier k in the combination. P_k(C | S_k(w)) is computed as follows:

P_k(C | S_k(w)) = 1 if S_k(w) = C, and 0 otherwise.

For equal voting, each classifier should have the same weight (e.g., λ_k(w) = 1/K). In the case of weighted voting, the weight associated with each classifier can be computed from its precision and/or recall as illustrated above.

4.2.2 Independent Bayesian Classifier Combination (IBCC)

Using a Bayesian approach to classifier combination (BCC) provides a mathematical combination framework in which many classifiers, with various distributions and training features, can be combined to provide more accurate information. This framework explicitly models the relationship between each classifier's output and the unknown true label (Levenberg et al., 2014). This section describes the Bayesian approach to classifier combination we adopted in this paper which, like the work of Levenberg et al. (2014), is based on Simpson et al.'s (2013) simplification of Ghahramani and Kim's (2003) model.

For the i-th data point, the true label t_i is assumed to be generated by a multinomial distribution with parameter δ: p(t_i = j | δ) = δ_j, which models the class proportions. True labels may take values t_i = 1 ... J, where J is the number of true classes. It is also assumed that there are K base classifiers. The outputs of the classifiers are assumed to be discrete with values l = 1 ... L, where L is the number of possible outputs. The output c_i^(k) of classifier k is assumed to be generated by a multinomial distribution with parameters π_j^(k): p(c_i^(k) = l | t_i = j, π_j^(k)) = π_{j,l}^(k), where π^(k) is the confusion matrix for classifier k, which quantifies the decision-making abilities of each base classifier. As in Simpson et al.'s (2013) study, we assume that the parameters π_j^(k) and δ have Dirichlet prior distributions with hyper-parameters α_{0,j}^(k) = [α_{0,j1}^(k), α_{0,j2}^(k), ..., α_{0,jL}^(k)] and ν = [ν_{0,1}, ν_{0,2}, ..., ν_{0,J}], respectively.

Given the observed class labels and based on the above prior, the joint distribution over all variables for the IBCC model is

p(δ, Π, t, c | A_0, ν) = ∏_{i=1}^{I} { δ_{t_i} ∏_{k=1}^{K} π_{t_i, c_i^(k)}^(k) } p(δ | ν) p(Π | A_0),

where Π = {π_j^(k) | j = 1...J, k = 1...K} and A_0 = {α_{0,j}^(k) | j = 1...J, k = 1...K}. The conditional probability of a test data point t_i being assigned class j is given by

p(t_i = j) = ρ_{ij} / Σ_{y=1}^{J} ρ_{iy},
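For 1-of-K classifier outputs, the voting scheme of Section 4.2.1 reduces to summing each classifier's weight for the class it votes for, since P_k(C | S_k(w)) is an indicator. A minimal sketch, with names of our own choosing and weights standing in for λ_k:

```python
# Minimal sketch of the linear-interpolation voting scheme (our names, not the
# paper's code). votes[k] is the single class S_k(w) output by classifier k;
# weights[k] stands in for lambda_k (e.g. precision on a validation set).
from collections import defaultdict

def combine(votes, weights):
    scores = defaultdict(float)
    for label, weight in zip(votes, weights):
        # P_k(C | S_k(w)) is 1 for the voted class and 0 otherwise, so the
        # interpolation reduces to summing weights per class.
        scores[label] += weight
    return max(scores, key=scores.get)

# Equal voting: both classifiers weighted 1/K (ties broken arbitrarily).
print(combine(["B-PER", "B-PER"], [0.5, 0.5]))    # agreement → "B-PER"
# Weighted voting: the more precise classifier outweighs the other.
print(combine(["B-PER", "B-LOC"], [0.86, 0.76]))  # first vote wins → "B-PER"
```

With only two base classifiers, weighted voting effectively lets the higher-weight classifier decide every disagreement, which is consistent with the precision-heavy behaviour of the voting rows reported later in Table 3.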
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
250
where ρ_{ij} = δ_j ∏_{k=1}^{K} π_{j, c_i^(k)}^(k).

In our implementation we used point values for A_0, as in (Simpson et al., 2013). The values of the hyper-parameters A_0 offered a natural method to include any prior knowledge. Thus, they can be regarded as pseudo-counts of prior observations, and they can be chosen to represent any prior level of uncertainty in the confusion matrices, Π. Our inference technique for the unknown variables (δ, π, and t) was Gibbs sampling, as in (Ghahramani and Kim, 2003; Simpson et al., 2013). Figure 2 shows the directed graphical model for IBCC, in which c_i^(k) represents observed values, circular nodes are variables with distributions, and square nodes are variables instantiated with point values.

[Figure 2: The directed graph of IBCC — hyper-parameters ν_0 and α_0^(k), class proportions δ, confusion matrices π^(k), observed outputs c_i^(k), and true labels t_i, plated over classifiers k = 1...K and data points i = 1...I.]

5 Data

In this section, we describe the two datasets we used:

• Validation set3 (NEWS + BBC NEWS): 90% of this dataset is used to estimate the weight of each base classifier, and 10% is used to perform error analysis.
• Test set (ANERcorp test set): this dataset is used to evaluate the different classifier combination methods.

[footnote 3: Also known as the development set.]

The validation set is composed of two datasets: NEWS and BBC NEWS. The NEWS set contains around 15k tokens collected by Darwish (2013) from the RSS feed of the Arabic (Egypt) version of news.google.com from October 2012. We created the BBC NEWS corpus by collecting a representative sample of news from the BBC in May 2014. It contains around 3k tokens and covers different types of news, such as politics, economics, and entertainment.

The ANERcorp test set makes up 20% of the whole ANERcorp set. The ANERcorp set is a newswire corpus built and manually tagged especially for the Arabic NER task by Benajiba et al. (2007a), and contains around 150k tokens. This test set is commonly used in the Arabic NER literature to evaluate supervised classifiers (Benajiba and Rosso, 2008; Abdul-Hamid and Darwish, 2010; Abdallah et al., 2012; Oudah and Shaalan, 2012) and minimally-supervised classifiers (Alotaibi and Lee, 2013; Althobaiti et al., 2013; Althobaiti et al., 2014), which allows us to review the performance of the combined classifiers and compare it to the performance of each base classifier.

6 Experimental Analysis

6.1 Experimental Setup

In the IBCC model, the validation data was used as known t_i to ground the estimates of the model parameters. The hyper-parameters were set as α_j^(k) = 1 and ν_j = 1 (Kim and Ghahramani, 2012; Levenberg et al., 2014). The initial values for the random variables were set as follows: (a) the class proportion δ was initialised to the result of counting t_i, and (b) the confusion matrix π was initialised to the result of counting t_i against the output of each classifier c^(k). Gibbs sampling was run well past stability (i.e., 1000 iterations); stability was actually reached in approximately 100 iterations.

All parameters required in the voting methods were specified using the validation set. We examined two different voting methods: equal voting and weighted voting. In the case of equal voting, each classifier was given an equal weight, 1/K, where K was the number of classifiers to be combined. In weighted voting, the overall precision was used in order to give preference to classifiers of good quality.
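Given point estimates of δ and the confusion matrices, the IBCC decision rule of Section 4.2.2, p(t_i = j) ∝ δ_j ∏_k π^(k)_{j, c_i^(k)}, is a short computation. The sketch below uses invented numbers; in the paper these parameters are inferred by Gibbs sampling rather than fixed by hand.

```python
# Sketch of the IBCC decision rule for one data point, given point estimates
# of the class proportions delta and the per-classifier confusion matrices pi
# (invented numbers for illustration). Classes here: 0 = NE, 1 = not-NE.

def ibcc_posterior(delta, confusions, outputs):
    """delta[j]: class proportion; confusions[k][j][l]: pi^(k)_{j,l};
    outputs[k]: observed label c_i^(k) from classifier k."""
    rho = []
    for j in range(len(delta)):
        p = delta[j]                      # rho_ij = delta_j * prod_k pi...
        for k, c in enumerate(outputs):
            p *= confusions[k][j][c]
        rho.append(p)
    z = sum(rho)                          # normalise over classes
    return [r / z for r in rho]

delta = [0.3, 0.7]
confusions = [
    [[0.9, 0.1], [0.2, 0.8]],   # a high-precision classifier (SSL-like)
    [[0.7, 0.3], [0.4, 0.6]],   # a noisier, higher-recall one (DL-like)
]
post = ibcc_posterior(delta, confusions, [0, 0])  # both classifiers say "NE"
print(round(post[0], 3))  # → 0.771
```

The confusion matrices let agreement from a weak classifier count for less than agreement from a strong one, which is the key difference from weight-free voting.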
je
D
o
w
n
o
un
d
e
d
F
r
o
m
h
t
t
p
:
/
/
d
je
r
e
c
t
.
m
je
t
.
e
d
toi
/
t
un
c
je
/
je
un
r
t
je
c
e
–
p
d
F
/
d
o
je
/
.
1
0
1
1
6
2
/
t
je
un
c
_
un
_
0
0
1
3
6
1
5
6
6
7
6
4
/
/
t
je
un
c
_
un
_
0
0
1
3
6
p
d
.
F
b
oui
g
toi
e
s
t
t
o
n
0
9
S
e
p
e
m
b
e
r
2
0
2
3
251
6.2 Results and Discussion

6.2.1 A Simple Baseline Combined Classifier

A proposed combined classifier simply and straightforwardly makes decisions based on the agreed decisions of the base classifiers, namely the SSL classifier and the DL classifier. That is, if the base classifiers agree on the NE type of a certain word, then it is annotated with the agreed NE type. In the case of disagreement, the word is considered not a named entity. Table 2 shows the results of this combined classifier, which is considered a baseline in this paper.

Table 2: The results of the baseline.
              Precision  Recall  Fβ=1
Person        97.31      24.69   39.39
Location      98.35      40.01   56.88
Organisation  97.38      33.20   49.52
Overall       97.68      32.63   48.92

The results of the combined classifier show very high precision, which indicates that both base classifiers are mostly accurate. The base classifiers also commit different errors, which is evident in the low recall. The accuracy and diversity of the single classifiers are the main conditions for a combined classifier to have better accuracy than any of its components (Dietterich, 2000a). Therefore, in the next section we take into consideration various classifier combination methods in order to aggregate the best decisions of the SSL and DL classifiers, and to improve overall performance.

6.2.2 Combined Classifiers: Classifier Combination Methods

The SSL and DL classifiers are trained with two different algorithms using different training data. The SSL classifier is trained on the ANERcorp training data, while the DL classifier is trained on a corpus automatically derived from Arabic Wikipedia, as explained in Sections 3.1 and 3.2. We combine the SSL and DL classifiers using the three classifier combination methods, namely equal voting, weighted voting, and IBCC. Table 3 shows the results of these classifier combination methods. The IBCC scheme outperforms all voting techniques and base classifiers in terms of F-score. Regarding precision, the voting techniques show the highest scores. However, the high precision is accompanied by a reduction in recall for both voting methods. The IBCC combination method also has relatively high precision compared to the precision of the base classifiers. Much better recall is registered for IBCC, but it is still low.

Table 3: The performances of various combination methods.
NEs      Combination Method  Precision  Recall  Fβ=1
PER      Equal Voting        79.99      41.88   54.97
         Weighted Voting     80.15      44.24   57.01
         IBCC                77.87      63.86   70.17
LOC      Equal Voting        86.87      30.66   45.32
         Weighted Voting     87.48      30.23   44.93
         IBCC                81.52      59.86   69.03
ORG      Equal Voting        97.01      29.97   45.79
         Weighted Voting     98.11      30.98   47.09
         IBCC                95.44      34.31   50.47
Overall  Equal Voting        87.96      34.17   49.22
         Weighted Voting     88.58      35.15   50.33
         IBCC                84.94      52.68   65.03

NEs      Base Classifier     Precision  Recall  Fβ=1
Overall  SSL                 86.03      51.29   64.27
         DL                  76.44      56.42   64.92

6.2.3 Combined Classifiers: Restriction of the Combination Process

An error analysis of the validation set shows that 10.01% of the NEs were correctly detected by the semi-supervised classifier but considered not NEs by the distant learning classifier. At the same time, the distant learning classifier managed to correctly detect 25.44% of the NEs that were considered not NEs by the semi-supervised classifier. We also noticed that the false positive rates, i.e., the possibility of considering a word an NE when it is actually not, are very low (0.66% and 2.45% for the semi-supervised and distant learning classifiers, respectively). These low false positive rates and the high percentage of NEs that are detected and missed by the two classifiers in a mutually exclusive way can be exploited to obtain better results, more specifically, to increase recall without negatively affecting precision. Therefore, we restricted the combination process to only include situations where the base classifiers agree or disagree on the NE type of a certain word. The combination process is ignored in cases where the base classifiers only disagree on detecting NEs. For example, if the base classifiers disagree on whether a certain word is an NE or not, the word is automatically considered an NE. Figure 3 provides some examples that illustrate the restrictions we applied to the combination process. The annotations in the examples are based on the CoNLL 2003 annotation guidelines (Chinchor et al., 1999).

[Figure 3: Examples of restricting the combination process. When the SSL and DL classifiers predict B-PER and B-LOC, the combination method is applied; for O versus B-LOC the output is B-LOC; for B-ORG versus O the output is B-ORG; when both predict B-PER, the combination method is applied.]

Restricting the combination process in this way increases recall without negatively affecting precision, as seen in Table 4. The increase in recall makes the overall F-score for all combination methods higher than those of the base classifiers. This way of using the IBCC model results in a performance level that is superior to all of the individual classifiers and the other voting-based combined classifiers. Therefore, the IBCC model leads to a 12% increase over the performance of the best base classifier, while the voting methods increase the performance by around 7%–10%. These results highlight the role of restricting the combination, which affects the performance of the combination methods and gives more control over how and when the predictions of the base classifiers should be combined.

6.2.4 Comparing Combined Classifiers: Statistical Significance of Results

We tested whether the difference in performance between the three classifier combination methods (equal voting, weighted voting, and IBCC) is significant using two different statistical tests over the results of these combination methods on the ANERcorp test set. An alpha level of 0.01 was used as the significance criterion for all statistical tests. First, we ran a non-parametric sign test. The small p-value (p ≪ 0.01) for each pair of the three combina-

NEs      Combination Method  Precision  Recall  Fβ=1
PER      Equal Voting        74.46      61.88   67.59
         Weighted Voting     77.77      63.50   69.91
         IBCC                77.88      64.56   70.60
LOC      Equal Voting        74.04      71.36   72.68
         Weighted Voting     74.05      73.70   73.86
         IBCC
76.2075.9176.05ORGEqualVoting76.0163.9769.47WeightedVoting76.3066.6071.12IBCC78.9166.6572.26OverallEqualVoting74.8465.7469.99WeightedVoting76.0467.9371.76IBCC77.6669.0473.10NEsBaseClassifiersPrecisionRecallFβ=1OverallSSL86.0351.2964.27DL76.4456.4264.92Table4:Theperformancesofvariouscombinationmeth-odswhenrestrictingthecombinationprocess.tionmethods,asseeninTable5,suggeststhatthesemethodsaresignificantlydifferent.Theonlycom-parisonwherenosignificancewasfoundisequalvotingvs.weightedvoting,whenweusedthemtocombinethedatawithoutanyrestrictions(p=0.3394).CombinationMethods(WithoutRestriction)EqualVotingWeightedVotingIBCCEqualVotingWeightedVoting0.3394IBCC<2.2E-16<2.2E-16CombinationMethods(WithRestriction)EqualVotingWeightedVotingIBCCEqualVotingWeightedVoting1.78E-07IBCC<2.2E-161.97E-06Table5:Thesigntestresults(exactpvalues)forthepair-wisecomparisonsofthecombinationmethods.Second,weusedabootstrapsampling(EfronandTibshirani,1994),whichisbecomingthedefactostandardsinNLP(Søgaardetal.,2014).Table6compareseachpairofthethreecombinationmeth-odsusingabootstrapsamplingoverdocumentswith10,000replicates.Itshowsthep-valuesandconfi-denceintervalsofthedifferencebetweenmeans.
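The bootstrap comparison over documents can be sketched as follows. This is a minimal illustration rather than the authors' exact implementation: it assumes a per-document evaluation score (e.g., F-score) is available for each of the two systems being compared, and the function name and signature are hypothetical.

```python
import random

def bootstrap_compare(scores_a, scores_b, replicates=10_000, seed=0):
    """Paired bootstrap over documents: resample documents with
    replacement and estimate a p-value and a 95% confidence interval
    for the difference between the mean scores of two systems."""
    assert len(scores_a) == len(scores_b)
    n = len(scores_a)
    rng = random.Random(seed)
    observed = sum(scores_b) / n - sum(scores_a) / n
    diffs = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]  # resample documents
        diffs.append(sum(scores_b[i] for i in idx) / n
                     - sum(scores_a[i] for i in idx) / n)
    diffs.sort()
    # Two-sided p-value: twice the smaller tail of replicates whose
    # mean difference falls on the "wrong" side of zero.
    below = sum(1 for d in diffs if d <= 0)
    p_value = min(1.0, 2 * min(below, replicates - below) / replicates)
    ci = (diffs[int(0.025 * replicates)], diffs[int(0.975 * replicates)])
    return observed, p_value, ci
```

With 10,000 replicates this mirrors the setup reported in Table 6; note that the doubled tail proportion used here is one common convention for the two-sided bootstrap p-value among several.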
Combination With Restriction
Combination Methods Comparison     p-value   [95% CI]
Weighted Voting, Equal Voting      0.000     [0.270, 0.600]
IBCC, Equal Voting                 0.000     [0.539, 0.896]
IBCC, Weighted Voting              0.000     [0.157, 0.426]

Combination Without Restriction
Combination Methods Comparison     p-value   [95% CI]
Weighted Voting, Equal Voting      0.508     [-0.365, 0.349]
IBCC, Equal Voting                 0.000     [4.800, 6.122]
IBCC, Weighted Voting              0.000     [4.783, 6.130]

Table 6: The bootstrap test results (p-values and CI) for the pairwise comparisons of the combination methods.

The differences in performance between almost all of the three combination methods are highly significant. The one exception is the comparison between equal voting and weighted voting when they are used as a combination method without restriction, which shows a non-significant difference (p-value = 0.508, CI = -0.365 to 0.349). Generally, the IBCC scheme performs significantly better than the voting-based combination methods whether or not we impose restrictions on the combination process, as can be seen in Table 3 and Table 4.

7 Conclusion

Major advances over the past decade have occurred in Arabic NER with regard to utilising various supervised systems, exploring different features, and producing manually annotated corpora that mostly cover the standard set of NE types. More effort and time for additional manual annotations are required when expanding the set of NE types, or exporting NE classifiers to new domains. This has motivated research in minimally supervised methods, such as semi-supervised learning and distant learning, but the performance of such methods is lower than that achieved by supervised methods. However, semi-supervised methods and distant learning tend to have different strengths, which suggests that better results may be obtained by combining these methods. Therefore, we trained two classifiers based on distant learning and semi-supervision techniques, and then combined them using a variety of classifier combination schemes. Our main contributions include the following:

• We presented a novel approach to Arabic NER using a combination of semi-supervised learning and distant supervision.

• We used the Independent Bayesian Classifier Combination (IBCC) scheme for NER, and compared it to traditional voting methods.

• We introduced the classifier combination restriction as a means of controlling how and when the predictions of base classifiers should be combined.

This research demonstrated that combining the two minimal supervision approaches using various classifier combination methods leads to better results for NER. The use of IBCC improves the performance by 8 percentage points over the best base classifier, whereas the improvement in performance when using voting methods is only 4 to 6 percentage points. Although all combination methods result in an accurate classification, the IBCC model achieves better recall than the other traditional combination methods. Our experiments also showed how restricting the combination process can increase the recall of all the combination methods without negatively affecting the precision.

The approach we proposed in this paper can be easily adapted to new NE types and different domains without the need for human intervention. In addition, there are many ways to restrict the combination process according to an application's preferences, favouring either high precision or high recall. For example, we may obtain a high-precision combined classifier if we do not combine the predictions of the base classifiers for a certain word, and automatically consider it not an NE, whenever one of the base classifiers considers this word not an NE.

References

Sherief Abdallah, Khaled Shaalan, and Muhammad Shoaib. 2012. Integrating rule-based system with classification for Arabic named entity recognition. In Computational Linguistics and Intelligent Text Processing, pages 311–322. Springer.

Samir AbdelRahman, Mohamed Elarnaoty, Marwa Magdy, and Aly Fahmy. 2010. Integrated machine
learning techniques for Arabic named entity recognition. IJCSI, 7:27–36.

Ahmed Abdul-Hamid and Kareem Darwish. 2010. Simplified feature set for Arabic named entity recognition. In Proceedings of the 2010 Named Entities Workshop, pages 110–115. Association for Computational Linguistics.

Steven Abney. 2010. Semisupervised Learning for Computational Linguistics. CRC Press.

Fahd Alotaibi and Mark Lee. 2012. Mapping Arabic Wikipedia into the named entities taxonomy. In Proceedings of COLING 2012: Posters, pages 43–52, Mumbai, India, December. The COLING 2012 Organizing Committee.

Fahd Alotaibi and Mark Lee. 2013. Automatically developing a fine-grained Arabic named entity corpus and gazetteer by utilizing Wikipedia. In IJCNLP.

Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio. 2013. A semi-supervised learning approach to Arabic named entity recognition. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, pages 32–40, Hissar, Bulgaria, September. INCOMA Ltd., Shoumen, Bulgaria.

Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio. 2014. Automatic creation of Arabic named entity annotated corpus using Wikipedia. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 106–115, Gothenburg.

Marco Baroni, Brian Murphy, Eduard Barbu, and Massimo Poesio. 2010. Strudel: A corpus-based semantic model based on properties and types. Cognitive Science, 34(2):222–254.

Eric Bauer and Ron Kohavi. 1999. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1-2):105–139.

Yassine Benajiba and Paolo Rosso. 2008. Arabic named entity recognition using conditional random fields. In Proc. of Workshop on HLT & NLP within the Arabic World, LREC, volume 8, pages 143–153.

Yassine Benajiba, Paolo Rosso, and José Miguel Benedí Ruiz. 2007a. ANERsys: An Arabic named entity recognition system based on maximum entropy. In Computational Linguistics and Intelligent Text Processing, pages 143–153. Springer.

Yassine Benajiba, Paolo Rosso, and José Miguel Benedí Ruiz. 2007b. ANERsys: An Arabic named entity recognition system based on maximum entropy. In Computational Linguistics and Intelligent Text Processing, pages 143–153. Springer.

Yassine Benajiba, Mona Diab, Paolo Rosso, et al. 2008. Arabic named entity recognition: An SVM-based approach. In Proceedings of 2008 Arab International Conference on Information Technology (ACIT), pages 16–18.

Bob Carpenter. 2008. Multilevel Bayesian models of categorical data annotation. Unpublished manuscript. Available online at http://lingpipe-blog.com/lingpipe-white-papers/, last accessed 15 March 2015.

Nancy Chinchor, Erica Brown, Lisa Ferro, and Patty Robinson. 1999. 1999 Named Entity Recognition Task Definition. MITRE and SAIC.

Trevor Cohn and Lucia Specia. 2013. Modelling annotator bias with multi-task Gaussian processes: An application to machine translation quality estimation. In ACL, pages 32–42.

Kareem Darwish. 2013. Named entity recognition using cross-lingual resources: Arabic as an example. In ACL, pages 1558–1567.

Alexander Philip Dawid and Allan M. Skene. 1979. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pages 20–28.

Thomas G. Dietterich. 2000a. Ensemble methods in machine learning. In Multiple Classifier Systems, volume 1857 of Lecture Notes in Computer Science, pages 1–15. Springer Berlin Heidelberg.

Thomas G. Dietterich. 2000b. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–157.

Bradley Efron and Robert J. Tibshirani. 1994. An Introduction to the Bootstrap. CRC Press.

Ali Elsebai, Farid Meziane, and Fatma Zohra Belkredim. 2009. A rule based persons names Arabic extraction system. Communications of the IBIMA, 11(6):53–59.

Radu Florian, Abe Ittycheriah, Hongyan Jing, and Tong Zhang. 2003. Named entity recognition through classifier combination. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4, pages 168–171. Association for Computational Linguistics.

Zoubin Ghahramani and Hyun-Chul Kim. 2003. Bayesian classifier combination. Technical report, University College London.

Y. Haitovsky, A. Smith, and Y. Liu. 2002. Modelling disagreements among and within raters' assessments from the Bayesian point of view. In Draft. Presented at the Valencia meeting 2002.

Hyun-Chul Kim and Zoubin Ghahramani. 2012. Bayesian classifier combination. In International Conference on Artificial Intelligence and Statistics, pages 619–627.
Abby Levenberg, Stephen Pulman, Karo Moilanen, Edwin Simpson, and Stephen Roberts. 2014. Predicting economic indicators from web text using sentiment composition. International Journal of Computer and Communication Engineering, 3(2):109–115.

Richard Maclin and David Opitz. 1997. An empirical evaluation of bagging and boosting. AAAI/IAAI, 1997:546–551.

Slim Mesfar. 2007. Named entity recognition for Arabic using syntactic grammars. In Natural Language Processing and Information Systems, pages 305–316. Springer.

Peter Mika, Massimiliano Ciaramita, Hugo Zaragoza, and Jordi Atserias. 2008. Learning to tag and tagging to learn: A case study on Wikipedia. volume 23, pages 26–33.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2, ACL '09, pages 1003–1011, Stroudsburg, PA, USA. Association for Computational Linguistics.

David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26.

David Nadeau. 2007. Semi-supervised named entity recognition: learning to recognize 100 entity types with little supervision.

Truc-Vien T. Nguyen and Alessandro Moschitti. 2011. End-to-end relation extraction using distant supervision from external semantic repositories. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2, HLT '11, pages 277–282, Stroudsburg, PA, USA. Association for Computational Linguistics.

Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, and James R. Curran. 2013. Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, 194:151–175.

Mai Oudah and Khaled F. Shaalan. 2012. A pipeline Arabic named entity recognition using a hybrid approach. In COLING, pages 2159–2176.

Marius Pasca, Dekang Lin, Jeffrey Bigham, Andrei Lifchits, and Alpa Jain. 2006. Organizing and searching the world wide web of facts - step one: the one-million fact extraction challenge. In AAAI, volume 6, pages 1400–1405.

Alexander E. Richman and Patrick Schone. 2008. Mining wiki resources for multilingual named entity recognition. In ACL, pages 1–9.

Ellen Riloff and Rosie Jones. 1999. Learning dictionaries for information extraction by multi-level bootstrapping. In AAAI, pages 474–479.

Sriparna Saha and Asif Ekbal. 2013. Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition. Data & Knowledge Engineering, 85:15–39.

Satoshi Sekine et al. 1998. NYU: Description of the Japanese NE system used for MET-2. In Proceedings of the Seventh Message Understanding Conference (MUC-7), volume 17.

Khaled Shaalan and Hafsa Raza. 2009. NERA: Named entity recognition for Arabic. Journal of the American Society for Information Science and Technology, 60(8):1652–1663.

Edwin Simpson, Stephen Roberts, Ioannis Psorakis, and Arfon Smith. 2013. Dynamic Bayesian combination of multiple imperfect classifiers. In Decision Making and Imperfection, pages 1–35. Springer.

Anders Søgaard, Anders Johannsen, Barbara Plank, Dirk Hovy, and Hector Martinez. 2014. What's in a p-value in NLP? In Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL 2014), pages 1–10.

Sam Tardif, James R. Curran, and Tara Murphy. 2009. Improved text categorisation for Wikipedia named entities. In Proceedings of the Australasian Language Technology Association Workshop, pages 104–108.

Sergey Tulyakov, Stefan Jaeger, Venu Govindaraju, and David Doermann. 2008. Review of classifier combination methods. In Machine Learning in Document Analysis and Recognition, pages 361–386. Springer.

Merijn Van Erp, Louis Vuurpijl, and Lambert Schomaker. 2002. An overview and comparison of voting methods for pattern recognition. In Eighth International Workshop on Frontiers in Handwriting Recognition, pages 195–200. IEEE.

Hans Van Halteren, Walter Daelemans, and Jakub Zavrel. 2001. Improving accuracy in word class tagging through the combination of machine learning systems. Computational Linguistics, 27(2):199–229.

Wajdi Zaghouani. 2014. Critical survey of the freely available Arabic corpora. In Workshop on Free/Open-Source Arabic Corpora and Corpora Processing Tools, pages 1–8, Reykjavik, Iceland.

Tong Zhang, Fred Damerau, and David Johnson. 2002. Text chunking based on a generalization of Winnow. The Journal of Machine Learning Research, 2:615–637.