Transactions of the Association for Computational Linguistics, 2 (2014) 419–434. Action Editor: Alexander Koller.

Submitted 10/2013; Revised 6/2014; Published 10/2014. © 2014 Association for Computational Linguistics.


Building a State-of-the-Art Grammatical Error Correction System

Alla Rozovskaya
Center for Computational Learning Systems
Columbia University
New York, NY 10115
alla@ccls.columbia.edu

Dan Roth
Department of Computer Science
University of Illinois
Urbana, IL 61801
danr@illinois.edu

Abstract

This paper identifies and examines the key principles underlying building a state-of-the-art grammatical error correction system. We do this by analyzing the Illinois system that placed first among seventeen teams in the recent CoNLL-2013 shared task on grammatical error correction. The system focuses on five different types of errors common among non-native English writers. We describe four design principles that are relevant for correcting all of these errors, analyze the system along these dimensions, and show how each of these dimensions contributes to the performance.

1 Introduction

The field of text correction has seen an increased interest in the past several years, with a focus on correcting grammatical errors made by English as a Second Language (ESL) learners. Three competitions devoted to error correction for non-native writers took place recently: HOO-2011 (Dale and Kilgarriff, 2011), HOO-2012 (Dale et al., 2012), and the CoNLL-2013 shared task (Ng et al., 2013). The most recent and most prominent among these, the CoNLL-2013 shared task, covers several common ESL errors, including article and preposition usage mistakes, mistakes in noun number, and various verb errors, as illustrated in Fig. 1.¹ Seventeen teams that participated in the task developed a wide array of approaches that include discriminative classifiers, language models, statistical machine-translation systems, and rule-based modules. Many of the systems also made use of linguistic resources such as additional annotated learner corpora, and defined high-level features that take into account syntactic and semantic knowledge.

Nowadays *phone/phones *has/have many functionalities, *included/including *∅/a camera and *∅/a Wi-Fi receiver.

Figure 1: Examples of representative ESL errors.

Even though the systems incorporated similar resources, the scores varied widely. The top system, from the University of Illinois, obtained an F1 score of 31.20², while the second team scored 25.01 and the median result was 8.48 points.³ These results suggest that there is not enough understanding of what works best and what elements are essential for building a state-of-the-art error correction system.

¹ The CoNLL-2014 shared task that completed at the time of writing this paper was an extension of the CoNLL-2013 competition (Ng et al., 2014) but addressed all types of errors. The Illinois-Columbia submission, a slightly extended version of the Illinois CoNLL-2013 system, ranked at the top. For a description of the Illinois-Columbia submission, we refer the reader to Rozovskaya et al. (2014a).
² The state-of-the-art performance of the Illinois system discussed here is with respect to individual components for different errors. Improvements in Rozovskaya and Roth (2013) over the Illinois system that are due to joint learning and inference are orthogonal, and the analysis in this paper still applies there.
³ F1 might not be the ideal metric for this task but this was the one chosen in the evaluation. See more in Sec. 6.

In this paper, we identify key principles for building a robust grammatical error correction system and show their importance in the context of the shared task. We do this by analyzing the Illinois system and evaluating it along several dimensions: choice


of learning algorithm; choice of training data (native or annotated learner data); model adaptation to the mistakes made by the writers; and the use of linguistic knowledge. For each dimension, several implementations are compared, including, when possible, approaches chosen by other teams. We also validate the obtained results on another learner corpus. Overall, this paper makes two contributions: (1) we explain the success of the Illinois system, and (2) we provide an understanding and qualitative analysis of different dimensions that are essential for success in this task, with the goal of aiding future research on it. Given that the Illinois system has been the top system in four competitive evaluations over the last few years (HOO and CoNLL), we believe that the analysis we propose will be useful for researchers in this area.

In the next section, we present the CoNLL-2013 competition. Sec. 3 gives an overview of the approaches adopted by the top five teams. Sec. 4 describes the Illinois system. In Sec. 5, the analysis of the Illinois system is presented. Sec. 6 offers a brief discussion, and Sec. 7 concludes the paper.

2 Task Description

The CoNLL-2013 shared task focuses on five common mistakes made by ESL writers: article/determiner, preposition, noun number, verb agreement, verb form. The training data of the shared task is the NUCLE corpus (Dahlmeier et al., 2013), which contains essays written by learners of English (we also refer to it as learner data or shared task training data). The test data consists of 50 essays by students from the same linguistic background. The training and the test data contain 1.2M and 29K words, respectively.

Table 1 shows the number of errors by type and the error rates. Determiner errors are the most common and account for 42.1% of all errors in training. Note that the test data contains a much larger proportion of annotated mistakes; e.g. determiner errors occur four times more often in the test data than in the training data (only 2.4% of noun phrases in the training data have determiner errors, versus 10% in the test data). The differences might be attributed to differences in annotation standards, annotators, or writers, as the test data was annotated at a later time.

Error        Number of errors and error rates
             Train            Test
Art.         6658 (2.4%)      690 (10.0%)
Prep.        2404 (2.0%)      311 (10.7%)
Noun         3779 (1.6%)      396 (6.0%)
Verb agr.    1527 (2.0%)      124 (5.2%)
Verb form    1453 (0.8%)      122 (2.5%)

Table 1: Statistics on annotated errors in the CoNLL-2013 shared task data. Percentage denotes the error rates, i.e. the number of erroneous instances with respect to the total number of relevant instances in the data.

The shared task provided two sets of test annotations: the original annotated data and a set with additional revisions that also includes alternative annotations proposed by participants. Clearly, having alternative answers is the right approach as there are typically multiple ways to correct an error. However, because the alternatives are based on the error analysis of the participating systems, the revised set may be biased (Ng et al., 2013). Consequently, we report results on the original set.

3 Model Dimensions

Table 2 summarizes approaches and methodologies of the top five systems. The prevailing approach consists in building a statistical model either on learner data or on a much larger corpus of native English data. For native data, several teams make use of the Web 1T 5-gram corpus (henceforth Web1T, (Brants and Franz, 2006)). NARA employs a statistical machine translation model for two error types; two systems have rule-based components for selected errors.

Based on the analysis of the Illinois system, we identify the following, inter-dependent, dimensions that will be examined in this work:

1. Learning algorithm: Most of the teams, including Illinois, built statistical models. We show that the choice of the learning algorithm is very important and affects the performance of the system.
2. Adaptation to learner errors: Previous studies, e.g. (Rozovskaya and Roth, 2011), showed that adaptation, i.e. developing models that utilize knowledge about error patterns of the non-native writers, is extremely important. We summarize adaptation techniques proposed earlier and examine their impact on the performance of the system.
3. Linguistic knowledge: It is essential to use some linguistic knowledge when developing error correction modules, e.g., to identify which type of verb


error occurs in a given context, before the appropriate correction module is employed. We describe and evaluate the contribution of these elements.
4. Training data: We discuss the advantages of training on learner data or native English data in the context of the shared task and in broader context.

System                               Error              Approach
Illinois (Rozovskaya et al., 2013)   Art.               AP model on NUCLE with word, POS, shallow parse features
                                     Prep.              NB model trained on Web1T and adapted to learner errors
                                     Noun/Agr./Form     NB model trained on Web1T
NTHU (Kao et al., 2013)              All                Count model with backoff trained on Web1T
HIT (Xiang et al., 2013)             Art./Prep./Noun    ME on NUCLE with word, POS, dependency features
                                     Agr./Form          Rule-based
NARA (Yoshimoto et al., 2013)        Art./Prep.         SMT model trained on learner data from Lang-8 corpus
                                     Noun               ME model on NUCLE with word, POS and dependency features
                                     Agr./Form          Treelet LM on Gigaword and Penn TreeBank corpora
UMC (Xing et al., 2013)              Art./Prep.         Two LMs – on NUCLE and Web1T corpus – with voting
                                     Noun               Rules and ME model on NUCLE + LM trained on Web1T
                                     Agr./Form          ME model on NUCLE (agr.) and rules (form)

Table 2: Top systems in the CoNLL-2013 shared task. The second column indicates the error type; the third column describes the approach adopted by the system. ME stands for Maximum Entropy; LM stands for language model; SMT stands for Statistical Machine Translation; AP stands for Averaged Perceptron; NB stands for Naïve Bayes.

Classifier   Art.   Prep.   Noun   Agr.   Form
Train        254K   103K    240K   75K    175K
Test         6K     2.5K    2.6K   2.4K   4.8K

Table 3: Number of candidate words by classifier type.

4 The Illinois System

The Illinois system consists of five machine-learning models, each specializing in correcting one of the errors described above. The words that are selected as input to a classifier are called candidates (Table 3). In the preposition system, for example, candidates are determined by surface forms. In other systems, determining the candidates might be more involved.

All modules take as input the corpus documents pre-processed with a part-of-speech tagger⁴ (Even-Zohar and Roth, 2001) and shallow parser⁵ (Punyakanok and Roth, 2001). In the Illinois submission, some modules are trained on native data, others on learner data. The modules trained on learner data make use of a discriminative algorithm, while native-trained modules make use of the Naïve Bayes (NB) algorithm. The Illinois system has an option for a post-processing step where corrections that always result in a false positive in training are ignored, but this option is not used here.

⁴ http://cogcomp.cs.illinois.edu/page/softwareview/POS
⁵ http://cogcomp.cs.illinois.edu/page/softwareview/Chunker

4.1 Determiner Errors

The majority of determiner errors involve articles, although some errors also involve pronouns. The Illinois system addresses only article errors. Candidates include articles (“a”, “an”, “the”)⁶ and omissions, by considering noun-phrase-initial contexts where an article is likely to be omitted. The confusion set for articles is thus {a, the, ∅}. The article classifier is the same as the one in the HOO shared tasks (Rozovskaya et al., 2012; Rozovskaya et al., 2011), where it demonstrated superior performance. It is a discriminative model that makes use of the Averaged Perceptron algorithm (AP, (Freund and Schapire, 1996)) implemented with LBJava (Rizzolo and Roth, 2010) and is trained on learner data with rich features and adaptation to learner errors. See Sec. 5.2 and Sec. 5.3.

⁶ The variants “a” and “an” are collapsed to one class.

4.2 Preposition Errors

Similar to determiners, we distinguish three types of preposition mistakes: choosing an incorrect preposition, using a superfluous preposition, and omitting a preposition. In contrast to determiners, for learners of many first language backgrounds, most of the preposition errors are replacements, i.e., where the


author correctly recognized the need for a preposition, but chose the wrong one (Leacock et al., 2010). However, learner errors depend on the first language; in NUCLE, spurious prepositions occur more frequently: 29% versus 18% of all preposition mistakes in other learner corpora (Rozovskaya and Roth, 2010a; Yannakoudakis et al., 2011).

The Illinois preposition classifier is a NB model trained on Web1T that uses word n-gram features in the 4-word window around the preposition. The 4-word window refers to the four words before and the four words after the preposition, e.g. “problem as the search of alternative resources to the” for the preposition “of”. Features consist of word n-grams of various lengths spanning the target preposition. For example, “the search of” is a 3-gram feature. The model is adapted to likely preposition confusions using the priors method (see Sec. 5.2). The Illinois model targets replacement errors of the 12 most common English prepositions. Here we augment it to identify spurious prepositions. The confusion set for prepositions is as follows: {in, of, on, for, to, at, about, with, from, by, into, during, ∅}.

4.3 Agreement and Form Errors

The Illinois system implements two verb modules – agreement and form – that consist of the following components: (1) candidate identification; (2) determining the relevant module for each candidate based on verb finiteness; (3) correction modules for each error type. The confusion set for verbs depends on the target word and includes its morphological variants (Table 4). For irregular verbs, the past participle form is included, while the past tense form is not (i.e. “given” is included but “gave” is not), since tense errors are not part of the task. To generate morphological variants, the system makes use of a morphological analyzer verbMorph; it assumes (1) a list of valid verb lemmas (compiled using a POS-tagged version of the NYT section of the Gigaword corpus) and (2) a list of irregular English verbs.⁷

⁷ The tool and more detail about it can be found at http://cogcomp.cs.illinois.edu/page/publicationview/743

“Hence, the environmental factors also *contributes/contribute to various difficulties, *giving/given problems in nuclear technology.”

Error   Confusion set
Agr.    {INF=contribute, S=contributes}
Form    {INF=give, ED=given, ING=giving, S=gives}

Table 4: Confusion sets for agreement and form. For irregular verbs, the second candidate in the confusion set for Verb form is the past participle.

The Candidate Identification stage selects the set of words that are presented as input to the classifier. This is a crucial step: errors missed at this stage will not be detected by the later stages. See Sec. 5.3.

Verb Finiteness is used in the Illinois system to separately process verbs that fulfill different grammatical functions and thus are marked for different grammatical properties. See Sec. 5.3.

Correction Modules: The agreement module is a binary classifier. The form module is a 4-class system. Both classifiers are trained on the Web1T corpus.

4.4 Noun Errors

Noun number errors involve confusing singular and plural noun forms (e.g. “phone” instead of “phones” in Fig. 1) and are the second most common error type in the NUCLE corpus after determiner mistakes (Table 1). The Illinois noun module is trained on the Web1T corpus using NB. Similar to verbs, candidate identification is an important step in the noun classifier. See Sec. 5.3.

5 System Analysis

In this section, we evaluate the Illinois system along the four dimensions identified in Sec. 3, compare its components to alternative configurations implemented by other teams, and present additional experiments that further analyze each dimension. While a direct comparison with other systems is not always possible due to other differences between the systems, we believe that these results are still useful. Table 5 lists systems used for comparison. It is important to note that the dimensions are not independent. For instance, there is a correlation between algorithm choice and training data.

Dimension                     Systems used in the comparison
Learn. alg. (Sec. 5.1)        NTHU, UMC
Adaptation (Sec. 5.2)         Error inflation: HIT
Ling. knowledge (Sec. 5.3)    Cand. identification: NTHU, HIT
                              Verb finiteness: NTHU
Train. data (Sec. 5.4)        HIT, NARA

Table 5: System comparisons. Column 1 indicates the dimension, and column 2 lists systems whose approaches provide a relevant point of comparison.
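The n-gram features of the preposition classifier (Sec. 4.2) can be illustrated with a short sketch. The Python snippet below is not the Illinois implementation; it only enumerates the word n-grams that span a target position inside a 4-word window. The function name `spanning_ngrams` and the range of n-gram lengths (2 to 5, picked here because Web1T stores n-grams of at most five words) are assumptions made for illustration.

```python
def spanning_ngrams(tokens, target_idx, window=4, min_n=2, max_n=5):
    """Return all word n-grams that contain tokens[target_idx].

    Hypothetical helper: the n-grams are restricted to `window` words on
    each side of the target, and their lengths run from min_n to max_n
    (both limits are illustrative assumptions, not taken from the paper).
    """
    lo = max(0, target_idx - window)                 # left edge of the window
    hi = min(len(tokens), target_idx + window + 1)   # right edge (exclusive)
    features = []
    for n in range(min_n, max_n + 1):
        # Every start position for which the n-gram still covers the target.
        for start in range(target_idx - n + 1, target_idx + 1):
            end = start + n
            if start >= lo and end <= hi:
                features.append(" ".join(tokens[start:end]))
    return features


# Example from Sec. 4.2: the preposition "of" with its 4-word window.
example_tokens = "problem as the search of alternative resources to the".split()
example_features = spanning_ngrams(example_tokens, example_tokens.index("of"))
print(example_features)
```

For the example window, the sketch produces features such as the 3-gram “the search of” mentioned in Sec. 4.2. In a count-based model, each feature would be looked up once per member of the confusion set, with the candidate substituted for the target word.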


Results are reported on the test data using F1 computed with the CoNLL scorer (Dahlmeier and Ng, 2012). Error-specific results are generated based on the output of individual modules. Note that these are not directly comparable to error-specific results in the CoNLL overview paper: the latter are approximate as the organizers did not have the error type information for corrections in the output. The complete system includes the union of corrections made by each of these modules, where the corrections are applied in order. Ordering overlapping candidates⁸ might potentially affect the final output, when modules correctly identify an error but propose different corrections, but this does not happen in practice. Modules that are part of the Illinois submission are marked with an asterisk in all tables.

⁸ Overlapping candidates are included in more than one module: if “work” is tagged as NN, it is included in the noun module, but also in the form module (as a valid verb lemma).

To demonstrate that our findings are not specific to CoNLL, we also show results on the FCE dataset. It is produced by learners from seventeen first language backgrounds and contains 500,000 words from the Cambridge Learner Corpus (CLC) (Yannakoudakis et al., 2011). We split the corpus into two equal parts – training and test. The statistics are shown in Appendix Tables A.16 and A.17.

5.1 Dim. 1: Learning Algorithm

Rozovskaya and Roth (2011, Sec. 3) discuss the relations between the amount of training data, learning algorithms, and the resulting performance. They show that on training sets of similar sizes, discriminative classifiers outperform other machine learning methods on this task. Following these results, the Illinois article module that is trained on the NUCLE corpus uses the discriminative approach AP. Most of the other teams that train on the NUCLE corpus also use a discriminative method.

However, when a very large native training set such as the Web1T corpus is available, it is often advantageous to use it. The Web1T corpus is a collection of n-gram counts of length one to five over a corpus of 10^12 words. Since the corpus does not come with complete sentences, it is not straightforward to make use of a discriminative classifier because of the limited window provided around each example: training a discriminative model would limit the surrounding context features to a 2-word window. Because we wish to make use of the context features that extend beyond the 2-word window, it is only possible to use count-based methods, such as NB or LM. Several teams make use of the Web1T corpus: UMC uses a count-based LM for article, preposition, and noun number errors; NTHU addresses all errors with a count-based model with backoff, which is essentially a variation of a language model with backoff. The Illinois system employs the Web1T corpus for all errors, except articles, using NB.

Training Naïve Bayes for Deletions and Insertions: The reason for not using the Web1T corpus for article errors is that training NB on Web1T for deletions and insertions presents a problem, and the majority of article errors are of this type. Recall that Web1T contains only n-gram counts, which makes it difficult to estimate the prior count for the ∅ candidate. (With access to complete sentences, the prior of ∅ is estimated by counting the total number of ∅ candidates; e.g., in case of articles, the number of NPs with ∅ article is computed.) We solve this problem by treating the article and the word following it as one target. For instance, to estimate prior counts for the article candidates in front of the word “camera” in “including camera”, we obtain counts for “camera”, “a camera”, “the camera”. In the case of the ∅ candidate, the word “camera” acts as the target. Thus, the confusion set for the article classifier is modified as follows: instead of the three articles (as shown in Sec. 4.1), each member of the confusion set is a concatenation of the article and the word that follows it, e.g. {a camera, the camera, camera}. The counts for contextual features are obtained similarly, e.g. a feature that includes a preceding word would correspond to the count of “including x”, where x can take any value from the confusion set.

The above solution allows us to train NB for article errors and to extend the preposition classifier to handle extraneous preposition errors (Table 6). Rozovskaya and Roth (2011) study several algorithms trained on the Web1T corpus and observe that, when evaluated with the same context window size, NB performs better than other count-based methods. In order to show the impact of the algorithm choice, in Table 6, we compare LM and NB models. Both models use word n-grams spanning the target word in the 4-word window. We train LMs


with SRILM (Stolcke, 2002) using Jelinek-Mercer linear interpolation as a smoothing method (Chen and Goodman, 1996). On the CoNLL test data, NB outperforms LM on all errors; on the FCE corpus, NB is superior on all errors, except preposition errors, where LM outperforms NB only very slightly. We attribute this to the fact that the preposition problem has more labels; when there is a big confusion set, more features have default smooth weights, so there is no advantage to running NB. We found that with fewer classes (6 rather than 12 prepositions), NB outperforms LM. It is also possible that when we have a lot of labels, the theoretical difference between the algorithms disappears. Note that NB can be improved via adaptation (next section) and then it outperforms the LM also for preposition errors.

Error    Model    F1
                  CoNLL    FCE
Art.     LM       21.11    24.15
         NB       32.45    30.78
Prep.    LM       12.09    30.01
         NB       14.04    29.40
Noun     LM       40.72    32.41
         NB*      42.60    34.40
Agr.     LM       20.65    33.53
         NB*      26.46    36.42
Form     LM       13.40    08.46
         NB*      14.50    12.16

Table 6: Comparison of learning models. Web1T corpus. Modules that are part of the Illinois submission are marked with an asterisk.

5.2 Dim. 2: Adaptation to Learner Errors

In the previous section, the models were trained on native data. These models have no notion of the error patterns of the learners. Here we discuss model adaptation to learner errors, i.e. developing models that utilize the knowledge about the types of mistakes learners make. Adaptation is based on the fact that learners make mistakes in a systematic manner, e.g. errors are influenced by the writer’s first language (Gass and Selinker, 1992; Ionin et al., 2008). There are different ways to adapt a model that depend on the type of training data (learner or native) and the algorithm choice. The key application of adaptation is for models trained on native English data, because the learned models do not know anything about the errors learners make. With adaptation, models trained on native data can use the author’s word (the source word) as a feature and thus propose a correction based on what the author originally wrote. This is crucial, as the source word is an important piece of information (Rozovskaya and Roth, 2010b). Below, several adaptation techniques are summarized and evaluated. The Illinois system makes use of adaptation in the article model via the inflation method and adapts its NB preposition classifier trained on Web1T with the priors method.

Adapting NB: The priors method (Rozovskaya and Roth, 2011, Sec. 4) is an adaptation technique for a NB model trained on native English data; it is based on changing the distribution of priors over the correction candidates. Candidate prior is a special parameter in NB; when NB is trained on native data, candidate priors correspond to the relative frequencies of the candidates in the native corpus and do not provide any information on the real distribution of mistakes and the dependence of the correction on the word used by the author. In the priors method, candidate priors are changed using an error confusion matrix based on learner data that specifies how likely each confusion pair is.

Source   Candidates
         ED        INF       ING       S
ED       0.99675   0.00192   0.00103   0.00030
INF      0.00177   0.99630   0.00168   0.00025
ING      0.00124   0.00447   0.99407   0.00022
S        0.00054   0.00544   0.00132   0.99269

Table 7: Priors confusion matrix used for adapting NB. Each entry shows Prob(candidate|source), where source corresponds to the verb form chosen by the author.

Table 7 shows the confusion matrix for verb form errors, computed on the NUCLE data. Adapted priors are dependent on the author’s original verb form used: let s be a form of the verb appearing in the source text, and c a correction candidate. Then the adapted prior of c given s is:

\[ \mathrm{prior}(c \mid s) = \frac{C(s, c)}{C(s)} \]

where C(s) denotes the number of times s appeared in the learner data, and C(s, c) denotes the number of times c was the correct form when s was used by a writer. The adapted priors differ by the source: the probability of candidate INF when the source form is S is more than twice that when the source form is


ED; the probability that S is the correct form is very high, which reflects the low error rates.

Table 8 compares NB and NB-adapted models. Because of the dichotomy in the error rates in CoNLL training and test data, we also show experiments using 5-fold cross-validation on the training data. Adaptation always helps on the CoNLL training data and the FCE data (except noun errors), but on the test data it only helps on article and verb form errors. This is due to discrepancies in the error rates, as adaptation exploits the property that learner errors are systematic. Indeed, when priors are estimated on the test data (in 5-fold cross-validation), the performance improves, e.g. the preposition module attains an F1 of 18.05 instead of 12.14.

Error    Model          F1
                        CoNLL Train    CoNLL Test    FCE
Art.     NB             18.28          32.45         30.78
         NB-adapted     19.18          34.49         31.76
Prep.    NB             09.03          14.04         29.40
         NB-adapted*    10.94          12.14         32.22
Noun     NB*            23.06          42.60         34.40
         NB-adapted     22.89          42.31         32.38
Agr.     NB*            16.72          26.46         36.42
         NB-adapted     17.62          23.46         38.57
Form     NB*            11.93          14.50         12.16
         NB-adapted     14.63          18.35         16.67

Table 8: Adapting NB with the priors method. All models are trained on the Web1T corpus. Modules that are part of the Illinois submission are marked with an asterisk.

Concerning lack of improvement on noun number errors, we hypothesize that these errors differ from the other mistakes in that the appropriate form strongly depends on the surface form of the noun, which would, in turn, suggest that the dependency of the label on the grammatical form of the source that the adaptation is trying to discover is weak. Indeed, the prior distribution of {singular, plural} label space does not change much when the source feature is taken into account. The unadapted priors for “singular” and “plural” are 0.75 and 0.25, respectively. Similarly, the adapted priors (singular|plural) and (plural|singular) are 0.034 and 0.016, respectively. In other words, the unadapted prior probability for “plural” is three times lower than for “singular”, which does not change much with adaptation. This is different for other errors. For instance, in case of verb agreement, the unadapted prior for “plural” is 0.617, more than three times the “singular” prior of 0.20. With adaptation, these priors become almost the same (0.016 and 0.012).

Adapting AP: The AP is a discriminative learning algorithm and does not use priors on the set of candidates. In order to reflect our estimate of the error distribution, the AP algorithm is adapted differently, by introducing into the native data artificial errors, in a rate that reflects the errors made by the ESL writers (Rozovskaya and Roth, 2010b). The idea is to simulate learner errors in training, through artificial mistakes (also produced using an error confusion matrix).⁹ The original method was proposed for models trained on native data. This technique can be further enhanced using the error inflation method (Rozovskaya et al., 2012, Sec. 6) applied to models trained on native or learner data.

⁹ The idea of using artificial errors goes back to Izumi et al. (2003) and was also used in Foster and Andersen (2009). The approach discussed here refers to the adaptation method in Rozovskaya and Roth (2010b) that generates artificial errors using the distribution of naturally-occurring errors.

The Illinois system uses error inflation in its article classifier. Because this classifier is trained on learner data, the source article can be used as a feature. However, since learner errors are sparse, the source feature encourages the model to abstain from flagging a mistake, which results in low recall. The error inflation technique addresses this problem by boosting the proportion of errors in the training data. It does this by generating additional artificial errors using the error distribution from the training set.

Table 9 shows the results of adapting the AP classifier using error inflation. (We omit noun results, since the noun AP model performs better without the source feature, which is similar to the noun NB model, as discussed above.) The inflation method improves recall and, consequently, F1. It should be noted that although inflation also decreases precision it is still helpful. In fact, because of the low error rates, performance on the CoNLL dataset with natural errors is very poor, often resulting in F1 being equal to 0 due to no errors being detected.

Inflation vs. Sampling: To demonstrate the impact of error inflation, we compare it against sampling, an approach used by other teams – e.g. HIT – that improves recall by removing correct examples in training. The HIT article model is similar to the


Illinois model but scored three points below. Table 10 shows that sampling falls behind the inflation method, since it considerably reduces the training size to achieve similar error rates. The proportion of errors in training in each row is identical: sampling achieves the error rates by removing correct examples, whereas the inflation method converts some positive examples to artificial mistakes. Inflation constant shows how many correct instances remain; smaller inflation values correspond to more erroneous instances in training; the sampling approach, correspondingly, removes more positive examples.

Error    Model                       F1
                                     CoNLL    FCE
Art.     AP (natural errors)         07.06    27.65
         AP (infl. const. 0.9)*      24.61    30.96
Prep.    AP (natural errors)         0.0      14.69
         AP (infl. const. 0.7)       07.37    34.77
Agr.     AP (natural errors)         0.0      08.05
         AP (infl. const. 0.8)       17.06    31.03
Form     AP (natural errors)         0.0      01.56
         AP (infl. const. 0.9)       10.53    09.43

Table 9: Adapting AP using error inflation. Models are trained on learner data with word n-gram features and the source feature. Inflation constant shows how many correct instances remain (e.g. 0.9 indicates that 90% of correct examples are unchanged, while 10% are converted to mistakes). Modules that are part of the Illinois submission are marked with an asterisk.

Infl. constant    F1
                  Sampling    Inflation
0.90              23.22       24.61
0.85              27.75       29.29
0.80              30.04       33.47
0.70              33.02       35.52
0.60              32.78       35.03

Table 10: Comparison of the inflation and sampling methods on article errors (CoNLL). The proportion of errors in training in each row is identical.

To summarize, we have demonstrated the impact of error inflation by comparing it to a similar method used by another team; we have also shown that further improvements can be obtained by adapting NB to learner errors using the priors method, when training and test data exhibit similar error patterns.

5.3 Dim. 3: Linguistic Knowledge

The use of linguistic knowledge is important in several components of the error correction system: feature engineering, candidate identification, and special techniques for correcting verb errors.

Error    Features                  F1
                                   CoNLL    FCE
Art.     n-gram                    24.61    30.96
         n-gram+POS+chunk*         33.50    35.66
Agr.     n-gram                    17.06    31.03
         n-gram+POS                24.14    35.29
         n-gram+POS+syntax         27.93    41.23

Table 11: Feature evaluation. Models are trained on learner data, use the source word and error inflation. Modules that are part of the Illinois submission are marked with an asterisk.

Features: It is known from many NLP tasks that feature engineering is important, and this is the case here. Note that this is relevant only when training on learner data, as models trained on Web1T can make use of n-gram features only, but for the NUCLE corpus we have several layers of linguistic annotation.¹⁰ We found that for article and agreement errors, using deeper linguistic knowledge is especially beneficial. The article features in the Illinois module, in addition to the surface form of the context, encode POS and shallow parse properties. These features are presented in Rozovskaya et al. (2013, Table 3) and Appendix Table A.19. The Illinois agreement module is trained on Web1T but further analysis reveals that it is better to train on learner data with rich features. The word n-gram and POS agreement features are the same as those in the article module. Syntactic features encode properties of the subject of the verb and are presented in Rozovskaya et al. (2014b, Table 7) and Appendix Table A.18; these are based on the syntactic parser (Klein and Manning, 2003) and the dependency converter (Marneffe et al., 2006).

Table 11 shows that adding rich features is helpful. Notably, adding deeper syntactic knowledge to the agreement module is useful, although parse features are likely to contain more noise.¹¹ Foster (2007) and Lee and Seneff (2008) observe a degradation in the performance of syntactic parsers due to grammatical noise that also includes agreement errors. For articles, we chose to add syntactic knowledge from shallow parse as it is likely to be sufficient for articles and more accurate than full-parse features.

¹⁰ Feature engineering will also be relevant when training on a native corpus that has linguistic annotation.
¹¹ Parse features have also been found useful in preposition error correction (Tetreault et al., 2010).

Candidate Identification for errors on open-class


words is rarely discussed but is a crucial step: it is not possible to identify the relevant candidates using a closed list of words, and the procedure needs to rely on pre-processing tools, whose performance on learner data is suboptimal.¹² Rozovskaya et al. (2014b, Sec. 5.1) describe and evaluate several candidate selection methods for verbs. The Illinois system implements their best method that addresses pre-processing errors, by selecting words tagged as verbs as well as words tagged as NN, whose lemma is on the list of valid verb lemmas (Sec. 4.3).

¹² Candidate selection is also difficult for closed-class errors in the case of omissions, e.g. articles, but article errors have been studied rather extensively, e.g. (Han et al., 2006), and we have no room to elaborate on it here.

Following descriptions provided by several teams, we evaluate several candidate selection methods for nouns. The first method includes words tagged as NN or NNS that head an NP. NTHU and HIT use this method; NTHU obtained the second best noun score, after the Illinois system; its model is also trained on Web1T. The second method includes all words tagged as NN and NNS and is used in several other systems, e.g. SZEG (Berend et al., 2013). The above procedures suffer from pre-processing errors. The Illinois method addresses this problem by adding words that end in common noun suffixes, e.g. “ment”, “ments”, and “ist”. The percentage of noun errors selected as candidates by each method and the impact of each method on the performance are shown in Table 12. The Illinois method has the best result on both datasets; on CoNLL, it improves F1 score by 2 points and recovers 43% of the candidates that are missed by the first approach. On FCE, the second method is able to recover more erroneous candidates, but it does not perform as well as the last method, possibly due to the number of noisy candidates it generates. To conclude, pre-processing mistakes should be taken into consideration when correcting errors, especially on open-class words.

Candidate ident. method    Error recall (%)      F1
                           CoNLL     FCE         CoNLL    FCE
NP heads                   87.72     92.32       40.47    34.16
All nouns                  89.50     95.29       41.08    33.16
Nouns+heuristics*          92.84     94.86       42.60    34.40

Table 12: Nouns: effect of candidate identification methods on the correction performance. Models are trained using NB. Error recall denotes the percentage of nouns containing number errors that are selected as candidates. Modules that are part of the Illinois submission are marked with an asterisk.

Using Verb Finiteness to Correct Verb Errors: As shown in Table 4, the surface realizations that correspond to the agreement candidates are a subset of the possible surface realizations of the form classifier. One natural approach, thus, is to train one classifier to predict the correct surface form of the verb. However, the same surface realization may correspond to multiple grammatical properties. This observation motivates the approach that corrects agreement and form errors separately (Rozovskaya et al., 2014b). It uses the linguistic notion of verb finiteness (Radford, 1988) that distinguishes between finite and non-finite verbs, which fulfill different grammatical functions and thus are marked for different grammatical properties.

Verb finiteness is used to direct each verb to the appropriate classifier. The candidates for the agreement module are verbs that take agreement markers: the finite surface forms of the be-verbs (“is”, “are”, “was”, and “were”), auxiliaries “have” and “has”, and finite verbs tagged as VB and VBZ that have explicit subjects (identified with the parser). The form candidates are non-finite verbs and some of the verbs whose finiteness is ambiguous.

Table 13 compares the two approaches: when all verbs are handled together; and when verbs are processed separately. All of the classifiers use surface form and POS features of the words in the 4-word window around the verb. Several subsets of these features were tried; the single classifier uses the best combination, which is the same word and POS features shown in Appendix Table A.19. Finiteness-based classifier (I) uses the same features for agreement and form as the single classifier.

Training method                    F1
                                   CoNLL    FCE
One classifier                     16.43    21.14
Finiteness-based training (I)      18.59    27.72
Finiteness-based training (II)     21.08    29.98

Table 13: Improvement due to separate training for verb errors. Models are trained using the AP algorithm.

When training separately, we can also explore whether different errors benefit from different features; finiteness-based classifier (II) optimizes features for each classifier. The differences in the feature sets are minor and consist of removing several

unigram word and POS features of tokens that do not appear immediately next to the verb. Recall from the discussion on features that the agreement module can be further improved by adding syntactic knowledge. In the next section, it is shown that an even better approach is to train on learner data for agreement mistakes and on native data for form errors.

The results in Table 13 are for AP models but similar improvements due to separate training are observed for NB models trained on Web1T. Note that the NTHU system also corrects all verb errors using a model trained on Web1T but handles all these errors together; its verb module scored 8 F1 points below the Illinois one. While there are other differences between the two systems, the results suggest that part of the improvement within the Illinois system is indeed due to handling the two errors separately.

5.4 Dim. 4: Training Data

NUCLE is a large corpus produced by learners of the same language background as the test data. Because of its large size, training on this corpus is a natural choice. Indeed, many teams follow this approach. On the other hand, an important issue in the CoNLL task is the difference between the training and test sets, which has an impact on the selection of the training set: the large Web1T has more coverage and allows for better generalization. We show that for some errors it is especially advantageous to train on a larger corpus of native data. It should be noted that while we refer to the Web1T corpus as “native”, it certainly contains data from language learners; we assume that the noise can be neglected.

Table 14 compares models trained on native and learner data in their best configurations based on the training data. Overall, we find that Web1T is clearly preferable for noun errors. We attribute this to the observation that noun number usage strongly depends on the surface form of the noun, and not just the contextual cues and syntactic structure. For example, certain nouns in English tend to be used exclusively in singular or plural form. Thus, considerably more data compared to other error types is required to learn model parameters.

Error    Train.     Learning          Features        F1
         data       algorithm                         CoNLL     FCE
Art.     Native     NB-adapt.         n-gram          34.49     31.76
         Learner    AP-infl.*         +POS+chunk      33.50     35.66
Prep.    Native     LM; NB-adapt.     n-gram          12.09     32.22
         Learner    AP-infl.          n-gram          10.26     33.93
Noun     Native     NB*               n-gram          42.60     32.38
         Learner    AP-infl.          +POS            19.22     17.28
Agr.     Native     NB-adapt.         n-gram          23.46     38.57
         Learner    AP-infl.          +POS+syntax     27.93     41.23
Form     Native     NB-adapt.         n-gram          18.35     16.67
         Learner    AP-infl.          +POS            12.32     12.02

Table 14: Choice of training data: learner vs. native (Web1T). For prepositions, LM is chosen for CoNLL, and NB-adapted for FCE. Modules that are part of the Illinois submission are marked with an asterisk.

On article and preposition errors, native-trained models perform slightly better on CoNLL, while learner-trained models are better on FCE. We conjecture that the FCE training set is more similar to the respective test data and thus provides an advantage over training on native data.

On verb agreement errors, native-trained models perform better than those trained on learner data, when the same n-gram features are used. However, when we add POS and syntactic knowledge, training on learner data is advantageous. Finally, for verb form errors, there is an advantage when training on a lot of native data, although the difference is not as substantial as for noun errors. This suggests that unlike agreement mistakes that are better addressed using syntax, form errors, similarly to nouns, benefit from training on a lot of data with n-gram features.

To summarize, choice of the training data is an important consideration for building a robust system. Researchers compared native- and learner-trained models for prepositions (Han et al., 2010; Cahill et al., 2013), while the analysis in this work addresses five error types, showing that errors behave differently, and evaluates on two corpora.¹³

¹³ For studies that directly combine native and learner data in training, see Gamon (2010) and Dahlmeier and Ng (2011).

6 Discussion

In Table 15, we show the results of the system, where the best modules are selected based on the performance on the training data. We also show the Illinois modules (without post-processing). The following changes are made with respect to the Illinois submission: the preposition system is based on an LM and enhanced to handle spurious preposition errors (thus the Illinois result of 7.10 shown here is

different from the 12.14 in Table 8); the agreement classifier is trained on the learner data using AP with rich features and error inflation; the form classifier is adapted to learner mistakes, whereas the Illinois submission trains NB without adaptation. The key improvements are observed with respect to least frequent errors, so the overall improvement is small. Importantly, the Illinois system already takes into account the four dimensions analyzed in this paper.

Error    Illinois submission     This work
         Model         F1        Model         F1
Art.     AP-infl.      33.50     AP-infl.      33.50
Prep.    NB-adapt.      7.10     LM            12.09
Noun     NB            42.60     NB            42.60
Agr.     NB            26.14     AP-infl.      27.93
Form     NB            14.50     NB-adapt.     18.35
All                    31.43                   31.75

Table 15: Results on CoNLL of the Illinois system (without post-processing) and this work. NB and LM models are trained on Web1T; AP models are trained on NUCLE. The modules that differ from the Illinois submission are those for Prep., Agr., and Form.

In CoNLL-2013, systems were compared using F1. Practical systems, however, should be tuned for good precision to guarantee that the overall quality of the text does not go down. Clearly, optimizing for F1 does not ensure that the system improves the quality of the text (see Appendix B). A different evaluation metric based on the accuracy of the data is proposed in Rozovskaya and Roth (2010b). For further discussion of evaluation metrics, see also Wagner (2012) and Chodorow et al. (2012).

It is also worth noting that the obtained results underestimate the performance because the agreement on what constitutes a mistake can be quite low (Madnani et al., 2011), so providing alternative corrections is important. The revised annotations address this problem. The Illinois system improves its F1 from 31.20 to 42.14 on revised annotations. However, these numbers are still an underestimation because the analysis typically eliminates precision errors but not recall errors. This is not specific to CoNLL: an error analysis of the false positives in CLC that includes the FCE showed an increase in precision from 33% to 85% and 33% to 75% for preposition and article errors (Gamon, 2010).

An error analysis of the training data also allows us to determine prominent groups of system errors and identify areas for potential improvement, which we outline below.

Cascading NLP errors: In the example below, the Illinois system incorrectly changes “need” to “needs” as it considers “victim” to be the subject of that verb: “Also, not only the kidnappers and the victim needs to be tracked down, but also jailbreakers.”

Errors in interacting linguistic structures: The Illinois system considers every word independently and thus cannot handle interacting phenomena. In the example below, the article and the noun number classifiers propose corrections that result in an ungrammatical structure “such a situations”: “In such situation, individuals will lose their basic privacy.” This problem is addressed via global models (Rozovskaya and Roth, 2013) and results in an improvement over the Illinois system.

Errors due to limited context: The Illinois system does not consider context beyond sentence level. In the example below, the system incorrectly proposes to delete “the” but the wider context indicates that the definite article is more appropriate here: “We have to admit that how to prevent the abuse and how to use it reasonably depend on a sound legal system, and it means surveillance has its own restriction.”

7 Conclusion

We identified key design principles in developing a state-of-the-art error correction system. We did this through analysis of the top system in the CoNLL-2013 shared task along several dimensions. The key dimensions that we identified and analyzed concern the choice of a learning algorithm, adaptation to learner mistakes, linguistic knowledge, and the choice of the training data. We showed that the decisions in each case depend both on the type of a mistake and the specific setting, e.g. how much annotated learner data is available. Furthermore, we provided points of comparison with other systems along these four dimensions.

Acknowledgments

We thank Peter Chew and the anonymous reviewers for the feedback. Most of this work was done while the first author was at the University of Illinois. This material is based on research sponsored by DARPA under agreement number FA8750-13-2-0008 and by the Army Research Laboratory (ARL) under agreement W911NF-09-2-0053. Any opinions, findings, conclusions or recommendations are those of the authors and do not necessarily reflect the view of the agencies.

References

G. Berend, V. Vincze, S. Zarrieß, and R. Farkas. 2013. LFG-based features for noun number and article grammatical errors. In Proceedings of CoNLL: Shared Task.
T. Brants and A. Franz. 2006. Web 1T 5-gram Version 1. Linguistic Data Consortium.
A. Cahill, N. Madnani, J. Tetreault, and D. Napolitano. 2013. Robust systems for preposition error correction using Wikipedia revisions. In Proceedings of NAACL.
S. Chen and J. Goodman. 1996. An empirical study of smoothing techniques for language modeling. In Proceedings of ACL.
M. Chodorow, M. Dickinson, R. Israel, and J. Tetreault. 2012. Problems in evaluating grammatical error detection systems. In Proceedings of COLING.
D. Dahlmeier and H. T. Ng. 2011. Grammatical error correction with alternating structure optimization. In Proceedings of ACL.
D. Dahlmeier and H. T. Ng. 2012. A beam-search decoder for grammatical error correction. In Proceedings of EMNLP-CoNLL.
D. Dahlmeier, H. T. Ng, and S. M. Wu. 2013. Building a large annotated corpus of learner English: The NUS corpus of learner English. In Proceedings of the NAACL Workshop on Innovative Use of NLP for Building Educational Applications.
R. Dale and A. Kilgarriff. 2011. Helping Our Own: The HOO 2011 pilot shared task. In Proceedings of the 13th European Workshop on Natural Language Generation.
R. Dale, I. Anisimoff, and G. Narroway. 2012. A report on the preposition and determiner error correction shared task. In Proceedings of the NAACL Workshop on Innovative Use of NLP for Building Educational Applications.
Y. Even-Zohar and D. Roth. 2001. A sequential model for multiclass classification. In Proceedings of EMNLP.
J. Foster and Ø. Andersen. 2009. GenERRate: Generating errors for use in grammatical error detection. In Proceedings of the NAACL Workshop on Innovative Use of NLP for Building Educational Applications.
J. Foster. 2007. Treebanks gone bad: Generating a treebank of ungrammatical English. In Proceedings of the IJCAI Workshop on Analytics for Noisy Unstructured Data.
Y. Freund and R. E. Schapire. 1996. Experiments with a new boosting algorithm. In Proceedings of the 13th International Conference on Machine Learning.
M. Gamon. 2010. Using mostly native data to correct errors in learners’ writing. In Proceedings of NAACL.
S. Gass and L. Selinker. 1992. Language transfer in language learning. John Benjamins.
N. Han, M. Chodorow, and C. Leacock. 2006. Detecting errors in English article usage by non-native speakers. Journal of Natural Language Engineering, 12(2):115–129.
N. Han, J. Tetreault, S. Lee, and J. Ha. 2010. Using an error-annotated learner corpus to develop an ESL/EFL error correction system. In Proceedings of LREC.
T. Ionin, M. L. Zubizarreta, and S. Bautista. 2008. Sources of linguistic knowledge in the second language acquisition of English articles. Lingua, 118:554–576.
E. Izumi, K. Uchimoto, T. Saiga, T. Supnithi, and H. Isahara. 2003. Automatic error detection in the Japanese learners’ English spoken data. In Proceedings of ACL.
T.-H. Kao, Y.-W. Chang, H.-W. Chiu, T.-H. Yen, J. Boisson, J.-C. Wu, and J. S. Chang. 2013. CoNLL-2013 shared task: Grammatical error correction NTHU system description. In Proceedings of CoNLL: Shared Task.
D. Klein and C. D. Manning. 2003. Fast exact inference with a factored model for natural language parsing. In Proceedings of NIPS.
C. Leacock, M. Chodorow, M. Gamon, and J. Tetreault. 2010. Automated Grammatical Error Detection for Language Learners. Morgan and Claypool Publishers.
J. Lee and S. Seneff. 2008. Correcting misuse of verb forms. In Proceedings of ACL.
N. Madnani, M. Chodorow, J. Tetreault, and A. Rozovskaya. 2011. They can help: Using crowdsourcing to improve the evaluation of grammatical error detection systems. In Proceedings of ACL.
M. Marneffe, B. MacCartney, and Ch. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC.
H. T. Ng, S. M. Wu, Y. Wu, Ch. Hadiwinoto, and J. Tetreault. 2013. The CoNLL-2013 shared task on grammatical error correction. In Proceedings of CoNLL: Shared Task.
H. T. Ng, S. M. Wu, T. Briscoe, C. Hadiwinoto, R. H. Susanto, and C. Bryant. 2014. The CoNLL-2014 shared task on grammatical error correction. In Proceedings of CoNLL: Shared Task.
V. Punyakanok and D. Roth. 2001. The use of classifiers in sequential inference. In Proceedings of NIPS.
A. Radford. 1988. Transformational Grammar. Cambridge University Press.
N. Rizzolo and D. Roth. 2010. Learning Based Java for Rapid Development of NLP Systems. In Proceedings of LREC.

A. Rozovskaya and D. Roth. 2010a. Annotating ESL errors: Challenges and rewards. In Proceedings of the NAACL Workshop on Innovative Use of NLP for Building Educational Applications.
A. Rozovskaya and D. Roth. 2010b. Training paradigms for correcting errors in grammar and usage. In Proceedings of NAACL.
A. Rozovskaya and D. Roth. 2011. Algorithm selection and model adaptation for ESL correction tasks. In Proceedings of ACL.
A. Rozovskaya and D. Roth. 2013. Joint learning and inference for grammatical error correction. In Proceedings of EMNLP.
A. Rozovskaya, M. Sammons, J. Gioja, and D. Roth. 2011. University of Illinois system in HOO text correction shared task. In Proceedings of the European Workshop on Natural Language Generation (ENLG).
A. Rozovskaya, M. Sammons, and D. Roth. 2012. The UI system in the HOO 2012 shared task on error correction. In Proceedings of the NAACL Workshop on Innovative Use of NLP for Building Educational Applications.
A. Rozovskaya, K.-W. Chang, M. Sammons, and D. Roth. 2013. The University of Illinois system in the CoNLL-2013 shared task. In Proceedings of CoNLL: Shared Task.
A. Rozovskaya, K.-W. Chang, M. Sammons, D. Roth, and N. Habash. 2014a. The University of Illinois and Columbia system in the CoNLL-2014 shared task. In Proceedings of CoNLL: Shared Task.
A. Rozovskaya, D. Roth, and V. Srikumar. 2014b. Correcting grammatical verb errors. In Proceedings of EACL.
A. Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Proceedings of International Conference on Spoken Language Processing.
J. Tetreault, J. Foster, and M. Chodorow. 2010. Using parse features for preposition selection and error detection. In Proceedings of ACL.
J. Wagner. 2012. Detecting Grammatical Errors with Treebank-Induced, Probabilistic Parsers. Ph.D. thesis.
Y. Xiang, B. Yuan, Y. Zhang, X. Wang, W. Zheng, and C. Wei. 2013. A hybrid model for grammatical error correction. In Proceedings of CoNLL: Shared Task.
J. Xing, L. Wang, D. F. Wong, L. S. Chao, and X. Zeng. 2013. UM-Checker: A hybrid system for English grammatical error correction. In Proceedings of CoNLL: Shared Task.
H. Yannakoudakis, T. Briscoe, and B. Medlock. 2011. A new dataset and method for automatically grading ESOL texts. In Proceedings of ACL.
I. Yoshimoto, T. Kose, K. Mitsuzawa, K. Sakaguchi, T. Mizumoto, Y. Hayashibe, M. Komachi, and Y. Matsumoto. 2013. NAIST at 2013 CoNLL grammatical error correction shared task. In Proceedings of CoNLL: Shared Task.

Appendix A Features and Additional Information about the Data

Classifier    Art.    Prep.    Noun    Agr.    Form
Train         43K     20K      39K     22K     37K
Test          43K     20K      39K     22K     37K

Table A.16: Number of candidate words by classifier type in training and test data (FCE).

Error        Number of errors and error rate
             Train            Test
Art.         2336 (5.4%)      2290 (5.3%)
Prep.        1263 (6.4%)      1205 (6.1%)
Noun          858 (2.2%)       805 (2.0%)
Verb agr.     319 (1.5%)       330 (1.4%)
Verb form     104 (0.3%)       127 (0.3%)

Table A.17: Statistics on annotated errors in the FCE corpus. Percentage denotes the error rates, i.e. the number of erroneous instances with respect to the total number of relevant instances in the data.

Features                   Description
(1) subjHead, subjPOS      the surface form and the POS tag of the subject head
(2) subjDet                determiner of the subject NP
(3) subjDistance           distance between the verb and the subject head
(4) subjNumber             Sing: singular pronouns and nouns; Pl: plural pronouns and nouns
(5) subjPerson             3rdSing: “she”, “he”, “it”, singular nouns; Not3rdSing: “we”, “you”, “they”, plural nouns; 1stSing: “I”
(6) conjunctions           (1)&(3); (4)&(5)

Table A.18: Verb agreement features that use syntactic knowledge.

Appendix B Evaluation Metrics

Here, we discuss the CoNLL-2013 shared task evaluation metric and provide a little bit more detail on the performance of the Illinois modules in this context. As shown in Table 1 in Sec. 2, over 90% of words (about 98% in training) are used correctly. The low error rates are the key reason the error correction task is so difficult: it is quite challenging for a system to improve over a writer that already performs at the level of over 90%. Indeed, very few NLP tasks already have systems that perform at that level.

The error sparsity makes it very challenging to identify mistakes accurately. In fact, the highest precision of 46.45%, as calculated by the shared task evaluation metric, is achieved by the Illinois system. However, once the precision drops below 50%, the system introduces more mistakes than it identifies.

[Figure 2: Precision/Recall curves by error type (Article, Prep, Noun, Agreement, Form). Axes: recall and precision.]

We can look at individual modules and see whether for any type of mistake the system improves the quality of the text. Fig. 2 shows Precision/Recall curves for the system in Table 15. It is interesting to note that performance varies widely by error type. The easiest are noun and article usage errors: for nouns, we can do pretty well at the recall point 20% (with the corresponding precision of over 60%); for articles, the precision is around 50% at the recall value of 20%. For agreement errors, we can get a precision of 55% with a very high threshold (identifying only 5% of mistakes). Finally, on two mistakes, preposition and verb form, the system never achieves a precision over 50%.
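
The 50% precision threshold discussed in this appendix can be illustrated with a small worked example. The sketch below is not the shared task scorer; it is a minimal illustration under the simplifying assumption that every false positive introduces exactly one new error and every true positive removes exactly one. The function names and all counts are hypothetical and chosen only to mimic the operating points mentioned above (recall of 20% with precision of 60%, and a lower-precision case).

```python
from typing import Tuple


def correction_metrics(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) computed from counts of proposed edits."""
    proposed = true_pos + false_pos   # all edits the system proposed
    gold = true_pos + false_neg       # all errors annotated in the text
    precision = true_pos / proposed if proposed else 0.0
    recall = true_pos / gold if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def net_error_change(true_pos: int, false_pos: int) -> int:
    """Change in the number of errors in the text after applying all edits.

    Assumption (not from the paper): each false positive adds one error and
    each true positive removes one, so the result is negative exactly when
    precision is above 50%.
    """
    return false_pos - true_pos


# Hypothetical counts: precision 60% at recall 20% (text quality improves).
high_precision = correction_metrics(true_pos=30, false_pos=20, false_neg=120)
high_precision_change = net_error_change(true_pos=30, false_pos=20)

# Hypothetical counts: precision 40% at recall 20% (text quality degrades).
low_precision = correction_metrics(true_pos=20, false_pos=30, false_neg=80)
low_precision_change = net_error_change(true_pos=20, false_pos=30)
```

Under these assumed counts, both configurations find the same fraction of the errors, but only the first one leaves the text with fewer errors than it started with, which is why tuning for precision matters more than F1 in a practical system.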

Feature type: Word n-gram
  Features: wB, w2B, w3B, wA, w2A, w3A, wBwA, w2BwB, wAw2A, w3Bw2BwB, w2BwBwA, wBwAw2A, wAw2Aw3A, w4Bw3Bw2BwB, w3Bw2BwBwA, w2BwBwAw2A, wBwAw2Aw3A, wAw2Aw3Aw4A

Feature type: POS
  Features: pB, p2B, p3B, pA, p2A, p3A, pBpA, p2BpB, pAp2A, pBwB, pAwA, p2Bw2B, p2Aw2A, p2BpBpA, pBpAp2A, pAp2Ap3A

Feature type: Chunk
  Feature group NP1: headWord, npWords, NC, adj&headWord, adjTag&headWord, adj&NC, adjTag&NC, npTags&headWord, npTags&NC
  Feature group NP2: headWord&headPOS, headNumber
  Feature group wordsAfterNP: headWord&wordAfterNP, npWords&wordAfterNP, headWord&2wordsAfterNP, npWords&2wordsAfterNP, headWord&3wordsAfterNP, npWords&3wordsAfterNP
  Feature group wordBeforeNP: wB&f_i ∀i ∈ NP1
  Feature group Verb: verb, verb&f_i ∀i ∈ NP1
  Feature group Preposition: prep&f_i ∀i ∈ NP1

Table A.19: Features used in the article error correction system. wB and wA denote the word immediately before and after the target, respectively; and pB and pA denote the POS tag before and after the target. headWord denotes the head of the NP complement. NC stands for noun compound and is active if the second to last word in the NP is tagged as a noun. Verb features are active if the NP is the direct object of a verb. Preposition features are active if the NP is immediately preceded by a preposition. Adj feature is active if the first word (or the second word preceded by an adverb) in the NP is an adjective. NpWords and npTags denote all words (POS tags) in the NP.
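
The word n-gram templates in Table A.19 follow a regular pattern: unigrams at distances 1 to 3 on each side of the target, plus every contiguous window of 2 to 4 context words that touches or spans the target position. The sketch below shows one way such templates could be generated; it is an illustration of the naming scheme, not the Illinois implementation. The function names, the `PAD` symbol, and the choice to pad positions that fall outside the sentence are all assumptions introduced here, since the table does not say how sentence boundaries are handled.

```python
from typing import Dict, List

PAD = "<pad>"  # hypothetical filler for positions outside the sentence


def context_word(tokens: List[str], target: int, offset: int) -> str:
    """Word at `offset` positions from the target (negative means before it)."""
    i = target + offset
    return tokens[i] if 0 <= i < len(tokens) else PAD


def position_name(offset: int) -> str:
    """Name a context position as in Table A.19: wB, w2B, ..., wA, w2A, ..."""
    side = "B" if offset < 0 else "A"
    distance = abs(offset)
    return f"w{side}" if distance == 1 else f"w{distance}{side}"


def word_ngram_features(tokens: List[str], target: int) -> Dict[str, str]:
    """Build the 18 word n-gram features around the token at index `target`.

    The target token itself is never part of a feature value.
    """
    feats: Dict[str, str] = {}
    # Unigrams: up to three words before and after the target.
    for distance in (1, 2, 3):
        for offset in (-distance, distance):
            feats[position_name(offset)] = context_word(tokens, target, offset)
    # N-grams (n = 2..4): `before` words on the left, n - before on the right.
    for n in (2, 3, 4):
        for before in range(n, -1, -1):
            offsets = list(range(-before, 0)) + list(range(1, n - before + 1))
            name = "".join(position_name(o) for o in offsets)
            feats[name] = " ".join(context_word(tokens, target, o) for o in offsets)
    return feats


# Usage: features for the article "a" in a short example sentence.
example_tokens = ["he", "bought", "a", "new", "camera", "yesterday"]
example_feats = word_ngram_features(example_tokens, target=2)
```

The POS templates in the table (pB, p2B, pBpA, and so on) could be produced the same way by passing a list of POS tags instead of words and renaming the prefix; the mixed word and POS conjunctions such as pBwB would need a separate step.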
