Transactions of the Association for Computational Linguistics, 2 (2014) 169–180. Action Editor: Eric Fosler-Lussier.
Submitted 11/2013; Revised 2/2014; Published 4/2014. © 2014 Association for Computational Linguistics.
Segmentation for Efficient Supervised Language Annotation with an Explicit Cost-Utility Tradeoff

Matthias Sperber (1), Mirjam Simantzik (2), Graham Neubig (3), Satoshi Nakamura (3), Alex Waibel (1)
(1) Karlsruhe Institute of Technology, Institute for Anthropomatics, Germany
(2) Mobile Technologies GmbH, Germany
(3) Nara Institute of Science and Technology, AHC Laboratory, Japan
matthias.sperber@kit.edu, mirjam.simantzik@jibbigo.com, neubig@is.naist.jp, s-nakamura@is.naist.jp, waibel@kit.edu

Abstract

In this paper, we study the problem of manually correcting automatic annotations of natural language in as efficient a manner as possible. We introduce a method for automatically segmenting a corpus into chunks such that many uncertain labels are grouped into the same chunk, while human supervision can be omitted altogether for other segments. A tradeoff must be found for segment sizes. Choosing short segments allows us to reduce the number of highly confident labels that are supervised by the annotator, which is useful because these labels are often already correct and supervising correct labels is a waste of effort. In contrast, long segments reduce the cognitive effort due to context switches. Our method helps find the segmentation that optimizes supervision efficiency by defining user models to predict the cost and utility of supervising each segment and solving a constrained optimization problem balancing these contradictory objectives. A user study demonstrates noticeable gains over pre-segmented, confidence-ordered baselines on two natural language processing tasks: speech transcription and word segmentation.

1 Introduction

Many natural language processing (NLP) tasks require human supervision to be useful in practice, be it to collect suitable training material or to meet some desired output quality. Given the high cost of human intervention, how to minimize the supervision effort is an important research problem. Previous works in areas such as active learning, post editing, and interactive pattern recognition have investigated this question with notable success (Settles, 2008; Specia, 2011; González-Rubio et al., 2010).

[Figure 1: Three automatic transcripts of the sentence "It was a bright cold day in April, and the clocks were striking thirteen", each reading "It was a bright cold (they) in (apron), and (a) clocks were striking thirteen", with recognition errors in parentheses. The underlined parts are to be corrected by a human for (a) sentences, (b) words, or (c) the proposed segmentation.]

The most common framework for efficient annotation in the NLP context consists of training an NLP system on a small amount of baseline data, and then running the system on unannotated data to estimate confidence scores of the system's predictions (Settles, 2008). Sentences with the lowest confidence are then used as the data to be annotated (Figure 1(a)). However, it has been noted that when the NLP system in question already has relatively high accuracy, annotating entire sentences can be wasteful, as most words will already be correct (Tomanek and Hahn, 2009; Neubig et al., 2011). In these cases, it is possible to achieve much higher benefit per annotated word by annotating sub-sentential units (Figure 1(b)). However, as Settles et al. (2008) point out, simply maximizing the benefit per annotated instance is not enough, as the real supervision effort varies
greatly across instances. This is particularly important in the context of choosing segments to annotate, as human annotators heavily rely on semantics and context information to process language, and intuitively, a consecutive sequence of words can be supervised faster and more accurately than the same number of words spread out over several locations in a text. This intuition can also be seen in our empirical data in Figure 2, which shows that for the speech transcription and word segmentation tasks described later in Section 5, short segments had a longer annotation time per word. Based on this fact, we argue it would be desirable to present the annotator with a segmentation of the data into easily supervisable chunks that are both large enough to reduce the number of context switches, and small enough to prevent unnecessary annotation (Figure 1(c)).

[Figure 2: Average annotation time per instance (avg. time/instance [sec], plotted over segment lengths 1–19) for the transcription and word segmentation tasks. For both tasks, the effort clearly increases for short segments.]

In this paper, we introduce a new strategy for natural language supervision tasks that attempts to optimize supervision efficiency by choosing an appropriate segmentation. It relies on a user model that, given a specific segment, predicts the cost and the utility of supervising that segment. Given this user model, the goal is to find a segmentation that minimizes the total predicted cost while maximizing the utility. We balance these two criteria by defining a constrained optimization problem in which one criterion is the optimization objective, while the other criterion is used as a constraint. Doing so allows specifying practical optimization goals such as "remove as many errors as possible given a limited time budget," or "annotate data to obtain some required classifier accuracy in as little time as possible."

Solving this optimization task is computationally difficult: it is an NP-hard problem. Nevertheless, we demonstrate that by making realistic assumptions about the segment length, an optimal solution can be found using an integer linear programming formulation for mid-sized corpora, as are common for supervised annotation tasks. For larger corpora, we provide simple heuristics to obtain an approximate solution in a reasonable amount of time.

Experiments over two example scenarios demonstrate the usefulness of our method: post editing for speech transcription, and active learning for Japanese word segmentation. Our model predicts noticeable efficiency gains, which are confirmed in experiments with human annotators.

2 Problem Definition

The goal of our method is to find a segmentation over a corpus of word tokens $w_1^N$ that optimizes supervision efficiency according to some predictive user model. The user model is denoted as a set of functions $u_{l,k}(w_a^b)$ that evaluate any possible subsequence $w_a^b$ of tokens in the corpus according to criteria $l \in L$ and supervision modes $k \in K$.

Let us illustrate this with an example. Sperber et al. (2013) defined a framework for speech transcription in which an initial, erroneous transcript is created using automatic speech recognition (ASR), and an annotator corrects the transcript either by correcting the words by keyboard, by respeaking the content, or by leaving the words as is. In this case, we could define K = {TYPE, RESPEAK, SKIP}, each constant representing one of these three supervision modes. Our method will automatically determine the appropriate supervision mode for each segment. The user model in this example might evaluate every segment according to two criteria L: a cost criterion (in terms of supervision time) and a utility criterion (in terms of the number of removed errors), when using each mode. Intuitively, respeaking should be assigned both lower cost (because speaking is faster than typing), but also lower utility than typing on a keyboard (because respeaking recognition errors can occur). The SKIP mode denotes the special, unsupervised mode that always returns 0 cost and 0 utility.

Other possible supervision modes include multiple input modalities (Suhm et al., 2001), several human annotators with different expertise and cost
(Donmez and Carbonell, 2008), and correction vs. translation from scratch in machine translation (Specia, 2011). Similarly, cost could instead be expressed in monetary terms, or the utility function could predict the improvement of a classifier when the resulting annotation is not intended for direct human consumption, but as training data for a classifier in an active learning framework.

3 Optimization Framework

Given this setting, we are interested in simultaneously finding optimal locations and supervision modes for all segments, according to the given criteria. Each resulting segment will be assigned exactly one of these supervision modes. We denote a segmentation of the $N$ tokens of corpus $w_1^N$ into $M \le N$ segments by specifying segment boundary markers $s_1^{M+1} = (s_1{=}1, s_2, \dots, s_{M+1}{=}N{+}1)$. Setting a boundary marker $s_i = a$ means that we put a segment boundary before the $a$-th word token (or the end-of-corpus marker for $a = N+1$). Thus our corpus is segmented into token sequences $[(w_{s_j}, \dots, w_{s_{j+1}-1})]_{j=1}^{M}$. The supervision modes assigned to each segment are denoted by $m_j$. We favor those segmentations that minimize the cumulative value $\sum_{j=1}^{M} u_{l,m_j}(w_{s_j}^{s_{j+1}-1})$ for each criterion $l$. For any criterion where larger values are intuitively better, we flip the sign before defining $u_{l,m_j}(w_{s_j}^{s_{j+1}-1})$ to maintain consistency (e.g., negative number of errors removed).

3.1 Multiple Criteria Optimization

In the case of a single criterion ($|L| = 1$), we obtain a simple, single-objective unconstrained linear optimization problem, efficiently solvable via dynamic programming (Terzi and Tsaparas, 2006).
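For the single-criterion case, such a dynamic program can be written down in a few lines. The following is an illustrative sketch (the function and variable names and the span limit are our own, not taken from the paper or from Terzi and Tsaparas (2006)); u(i, j, k) stands in for $u_{l,k}(w_i^{j-1})$, the user-model value of grouping tokens $i$ through $j-1$ under mode $k$:

```python
# Minimal sketch: single-criterion segmentation as a dynamic program
# over boundary positions 1 .. N+1 (a boundary at i precedes token i).
def segment_single_criterion(N, modes, u, max_span=20):
    """best[j] = minimal cumulative u over segmentations of tokens 1..j-1."""
    INF = float("inf")
    best = [INF] * (N + 2)
    back = [None] * (N + 2)
    best[1] = 0.0
    for j in range(2, N + 2):                 # next boundary position
        for i in range(max(1, j - max_span), j):
            for k in modes:
                c = best[i] + u(i, j, k)      # u(i, j, k) ~ u_{l,k}(w_i^{j-1})
                if c < best[j]:
                    best[j], back[j] = c, (i, k)
    # trace back the chosen segments and their supervision modes
    segs, j = [], N + 1
    while j > 1:
        i, k = back[j]
        segs.append((i, j - 1, k))
        j = i
    return best[N + 1], segs[::-1]

# Toy usage: 5 tokens; typing costs 1 unit per token, skipping costs 0.
value, segs = segment_single_criterion(
    5, ["TYPE", "SKIP"], lambda i, j, k: float(j - i) if k == "TYPE" else 0.0)
print(value, segs)
```

With a span limit of max_span tokens (cf. Section 3.2), this runs in O(N · max_span · |K|) time.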
However, in practice one usually encounters several competing criteria, such as cost and utility, and here we will focus on this more realistic setting. We balance competing criteria by using one as an optimization objective, and the others as constraints. (This approach is known as the bounded objective function method in the multi-objective optimization literature (Marler and Arora, 2004). The very popular weighted sum method merges criteria into a single efficiency measure, but is problematic in our case because the number of supervised tokens is unspecified: unless the weights are carefully chosen, the algorithm might find, e.g., the completely unsupervised or completely supervised segmentation to be most "efficient.") Let criterion $l_0$ be the optimization objective criterion, and let $C_l$ denote the constraining constants for the criteria $l \in L_{l_0} = L \setminus \{l_0\}$. We state the optimization problem:

$$\min_{M;\; s_1^{M+1};\; m_1^M} \;\sum_{j=1}^{M} u_{l_0,m_j}\!\left(w_{s_j}^{s_{j+1}-1}\right)$$

$$\text{s.t.}\quad \sum_{j=1}^{M} u_{l,m_j}\!\left(w_{s_j}^{s_{j+1}-1}\right) \le C_l \qquad (\forall l \in L_{l_0})$$

This constrained optimization problem is difficult to solve. In fact, the NP-hard multiple-choice knapsack problem (Pisinger, 1994) corresponds to a special case of our problem in which the number of segments is equal to the number of tokens, implying that our more general problem is NP-hard as well.

In order to overcome this problem, we reformulate the search for the optimal segmentation as a resource-constrained shortest path problem in a directed, acyclic multigraph. While still not efficiently solvable in theory, this problem is well studied in domains such as vehicle routing and crew scheduling (Irnich and Desaulniers, 2005), and it is known that in many practical situations the problem can be solved reasonably efficiently using integer linear programming relaxations (Toth and Vigo, 2001).

[Figure 3: Excerpt of a segmentation graph for an example transcription task similar to Figure 1 (some edges are omitted for readability). Edges are labeled with their mode, the predicted number of errors that can be removed, and the necessary supervision time, e.g. [TYPE: 2/5] or [RESPEAK: 1.5/2]. A segmentation scheme might prefer solid edges over dashed ones in this example.]
In our formalism, the set of nodes $V$ represents the spaces between neighboring tokens, at which the algorithm may insert segment boundaries. A node with index $i$ represents a segment break before the $i$-th token, and thus the sequence of the indices in a path directly corresponds to $s_1^{M+1}$. Edges $E$ denote the grouping of tokens between the respective nodes into one segment. Edges are always directed from left to right, and labeled with a supervision mode. In addition, each edge between nodes $i$ and $j$ is assigned $u_{l,k}(w_i^{j-1})$, the corresponding predicted value for each criterion $l \in L$ and supervision mode $k \in K$, indicating that the supervision mode of the $j$-th segment in a path directly corresponds to $m_j$. Figure 3 shows an example of what the resulting graph may look like.

Our original optimization problem is now equivalent to finding the shortest path between the first and last nodes according to criterion $l_0$, while obeying the given resource constraints. According to a widely used formulation for the resource-constrained shortest path problem, we can define $E_{ij}$ as the set of competing edges between $i$ and $j$, and express this optimization problem with the following integer linear program (ILP), where $n$ denotes the last node:

$$\min_x \;\sum_{i,j \in V}\;\sum_{k \in E_{ij}} x_{ijk}\, u_{l_0,k}\!\left(w_i^{j-1}\right) \qquad (1)$$

$$\text{s.t.}\quad \sum_{i,j \in V}\;\sum_{k \in E_{ij}} x_{ijk}\, u_{l,k}\!\left(w_i^{j-1}\right) \le C_l \qquad (\forall l \in L_{l_0}) \qquad (2)$$

$$\sum_{i \in V,\, k \in E_{ij}} x_{ijk} \;=\; \sum_{i \in V,\, k \in E_{ji}} x_{jik} \qquad (\forall j \in V \setminus \{1, n\}) \qquad (3)$$

$$\sum_{j \in V,\, k \in E_{1j}} x_{1jk} = 1 \qquad (4)$$

$$\sum_{i \in V,\, k \in E_{in}} x_{ink} = 1 \qquad (5)$$

$$x_{ijk} \in \{0, 1\} \qquad (\forall x_{ijk} \in x) \qquad (6)$$

The variables $x = \{x_{ijk} \mid i, j \in V, k \in E_{ij}\}$ denote the activation of the $k$-th edge between nodes $i$ and $j$. The shortest path according to the minimization objective (1) that still meets the resource constraints for the specified criteria (2) is to be computed. The degree constraints (3, 4, 5) specify that all but the first and last nodes must have as many incoming as outgoing edges, while the first node must have exactly one outgoing, and the last node exactly one incoming edge. Finally, the integrality condition (6) forces all edges to be either fully activated or fully deactivated. The outlined problem formulation can be solved directly by using off-the-shelf ILP solvers; here, we employ GUROBI (Gurobi Optimization, 2012).
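To make this concrete, here is a minimal sketch of how Eqs. (1)–(6) might be set up with Gurobi's Python interface. It is not the implementation used in the paper: the corpus size, the placeholder user-model values, the span limit, and the time budget are illustrative assumptions.

```python
# Minimal sketch: the segmentation ILP of Eqs. (1)-(6) in gurobipy.
import gurobipy as gp
from gurobipy import GRB

N = 6                  # tokens; boundary nodes are 1 .. N+1
MODES = ["TYPE", "RESPEAK", "SKIP"]
MAX_SPAN = 3           # cf. Section 3.2: only edges up to a length limit
TIME_BUDGET = 10.0     # constraint constant C_l for the cost criterion

def cost(i, j, k):     # placeholder for u_{cost,k}(w_i^{j-1})
    return 0.0 if k == "SKIP" else (j - i) * (1.0 if k == "RESPEAK" else 2.0)

def utility(i, j, k):  # placeholder for u_{utility,k}(w_i^{j-1}), sign-flipped
    return 0.0 if k == "SKIP" else -(j - i) * (0.85 if k == "RESPEAK" else 0.95)

m = gp.Model("segmentation")
edges = [(i, j, k) for i in range(1, N + 1)
         for j in range(i + 1, min(i + MAX_SPAN, N + 1) + 1) for k in MODES]
x = m.addVars(edges, vtype=GRB.BINARY, name="x")          # Eq. (6)

# Eq. (1): minimize sign-flipped utility (= remove as many errors as possible)
m.setObjective(gp.quicksum(x[e] * utility(*e) for e in edges), GRB.MINIMIZE)
# Eq. (2): resource constraint on the cost criterion
m.addConstr(gp.quicksum(x[e] * cost(*e) for e in edges) <= TIME_BUDGET)
# Eqs. (3)-(5): flow conservation at inner nodes, source and sink degrees
for v in range(2, N + 1):
    m.addConstr(gp.quicksum(x[i, j, k] for (i, j, k) in edges if j == v)
                == gp.quicksum(x[i, j, k] for (i, j, k) in edges if i == v))
m.addConstr(gp.quicksum(x[i, j, k] for (i, j, k) in edges if i == 1) == 1)
m.addConstr(gp.quicksum(x[i, j, k] for (i, j, k) in edges if j == N + 1) == 1)

m.optimize()
if m.status == GRB.OPTIMAL:
    path = sorted(e for e in edges if x[e].X > 0.5)
    print([f"tokens [{i},{j-1}] -> {k}" for (i, j, k) in path])
```

A real instance would populate the edge values from the user models of Section 4 instead of the placeholder functions above.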
3.2 Heuristics for Approximation

In general, edges are inserted for every supervision mode between every combination of two nodes. The search space can be constrained by removing some of these edges to increase efficiency. In this study, we only consider edges spanning at most 20 tokens.

For cases in which larger corpora are to be annotated, or when the acceptable delay for delivering results is small, a suitable segmentation can be found approximately. The easiest way would be to partition the corpus, e.g. according to its individual documents, divide the budget constraints evenly across all partitions, and then segment each partition independently. More sophisticated methods might approximate the Pareto front for each partition, and distribute the budgets in an intelligent way.

4 User Modeling

While the proposed framework is able to optimize the segmentation with respect to each criterion, it also rests upon the assumption that we can provide user models $u_{l,k}(w_i^{j-1})$ that accurately evaluate every segment according to the specified criteria and supervision modes. In this section, we discuss our strategies for estimating three conceivable criteria: annotation cost, correction of errors, and improvement of a classifier.

4.1 Annotation Cost Modeling

Modeling cost requires solving a regression problem from features of a candidate segment to annotation cost, for example in terms of supervision time. Appropriate input features depend on the task, but should include notions of complexity (e.g. a confidence measure) and length of the segment, as both are expected to strongly influence supervision time.

We propose using Gaussian process (GP) regression for cost prediction, a state-of-the-art nonparametric Bayesian regression technique (Rasmussen and Williams, 2006; code available at http://www.gaussianprocess.org/gpml/). As reported on a similar task by Cohn and Specia (2013), and confirmed by our preliminary experiments, GP regression significantly outperforms popular techniques such as support vector regression and least-squares linear regression. We also follow their settings for GP, employing GP regression with a squared exponential kernel with automatic relevance determination. Depending on the number of users and the amount of training data available for each user, models may be trained separately for each user (as we do here), or in a combined fashion via multi-task learning as proposed by Cohn and Specia (2013).

It is also crucial for the predictions to be reliable throughout the whole relevant space of segments. If the cost of certain types of segments is systematically underpredicted, the segmentation algorithm might be misled to prefer these, possibly a large number of times. (For instance, consider a model that predicts well for segments of medium size or longer, but underpredicts the supervision time of single-token segments. This may lead the segmentation algorithm to put every token into its own segment, which is clearly undesirable.) An effective trick to prevent such underpredictions is to predict the log time instead of the actual time. In this way, errors in the critical low end are penalized more strongly, and the time can never become negative.
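As an illustration, the following sketch implements such a cost model with scikit-learn's GP regression rather than the GPML package linked above (an assumption made for self-containedness). It uses an RBF (squared exponential) kernel with one length scale per feature, which yields automatic relevance determination, and fits the log supervision time as just described; all feature names and example values are illustrative.

```python
# Minimal sketch: GP regression on log supervision time with an ARD kernel.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Features per segment: [token count, audio duration (s), mean confidence]
X = np.array([[3, 1.2, 0.55], [8, 3.1, 0.80], [15, 6.0, 0.92], [5, 2.0, 0.70]])
y_seconds = np.array([9.0, 14.5, 21.0, 11.0])   # observed annotation times

# Fit in log space: underpredictions at the low end are penalized more
# strongly, and predicted times can never become negative (Section 4.1).
kernel = RBF(length_scale=[1.0, 1.0, 1.0]) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, np.log(y_seconds))

segment = np.array([[4, 1.5, 0.60]])
log_t, std = gp.predict(segment, return_std=True)
print(f"predicted time: {np.exp(log_t[0]):.1f}s (log-space std {std[0]:.2f})")
```

Predictions are exponentiated back to seconds, so they are always positive.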
4.2 Error Correction Modeling

As one utility measure, we can use the number of errors corrected, a useful measure for post editing tasks over automatically produced annotations. In order to measure how many errors can be removed by supervising a particular segment, we must estimate both how many errors are in the automatic annotation, and how reliably a human can remove these for a given supervision mode.

Most machine learning techniques can estimate confidence scores in the form of posterior probabilities. To estimate the number of errors, we can sum over one minus the posterior for all tokens, which estimates the Hamming distance from the reference annotation. This measure is appropriate for tasks in which the number of tokens is fixed in advance (e.g. a part-of-speech estimation task), and a reasonable approximation for tasks in which the number of tokens is not known in advance (e.g. speech transcription, cf. Section 5.1.1).

Predicting the particular tokens at which a human will make a mistake is known to be a difficult task (Olson and Olson, 1990), but a simplifying constant human error rate can still be useful. For example, in the task from Section 2, we may suspect a certain number of errors in a transcript segment, and predict, say, 95% of those errors to be removed via typing, but only 85% via respeaking.
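A minimal sketch of this utility model follows; the per-mode reliability constants mirror the hypothetical numbers above and, like the function names, are illustrative rather than taken from the paper.

```python
# Minimal sketch: predicted number of errors removed by supervising a segment.
HUMAN_RELIABILITY = {"TYPE": 0.95, "RESPEAK": 0.85, "SKIP": 0.0}

def predicted_errors(posteriors):
    """Expected Hamming distance to the reference: sum of (1 - posterior)."""
    return sum(1.0 - p for p in posteriors)

def utility(posteriors, mode):
    """Predicted number of errors a human removes in the given mode."""
    return HUMAN_RELIABILITY[mode] * predicted_errors(posteriors)

print(utility([0.9, 0.4, 0.7], "TYPE"))   # 0.95 * 1.0 = 0.95
```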
4.3 Classifier Improvement Modeling

Another reasonable utility measure is the accuracy of a classifier trained on the data we choose to annotate in an active learning framework. Confidence scores have been found useful for ranking particular tokens with regard to how much they will improve a classifier (Settles, 2008). Here, we may similarly score segment utility as the sum of its token confidences, although care must be taken to normalize and calibrate the token confidences to be linearly comparable before doing so. While the resulting utility score has no interpretation in absolute terms, it can still be used as an optimization objective (cf. Section 5.2.1).

5 Experiments

In this section, we present experimental results examining the effectiveness of the proposed method over two tasks: speech transcription and Japanese word segmentation. (Software and experimental data can be downloaded from http://www.msperber.com/research/tacl-segmentation/.)

5.1 Speech Transcription Experiments

Accurate speech transcripts are a much-demanded NLP product, useful by themselves, as training material for ASR, or as input for follow-up tasks like speech translation. With recognition accuracies plateauing, manually correcting (post editing) automatic speech transcripts has become popular. Common approaches are to identify words (Sanchez-Cortina et al., 2012) or (sub-)sentences (Sperber et al., 2013) of low confidence, and have a human editor correct these.

5.1.1 Experimental Setup

We conducted a user study in which participants post-edited speech transcripts, given a fixed goal word error rate. The transcription setup was such that the transcriber could see the ASR transcript of the parts before and after the segment that he was editing, providing context if needed. When imprecise time alignment resulted in segment breaks that were
slightly "off," as happened occasionally, that context helped guess what was said. The segment itself was transcribed from scratch, as opposed to editing the ASR transcript; besides being arguably more efficient when the ASR transcript contains many mistakes (Nanjo et al., 2006; Akita et al., 2009), preliminary experiments also showed that supervision time is far easier to predict this way. Figure 4 illustrates what the setup looked like.

We used a self-developed transcription tool to conduct experiments. It presents our computed segments one by one, allows convenient input and playback via keyboard shortcuts, and logs user interactions with their timestamps. A selection of TED talks (www.ted.com; English talks on technology, entertainment, and design) served as experimental data. While some of these talks contain jargon such as medical terms, they are presented by skilled speakers, making them comparably easy to understand. Initial transcripts were created using the Janus recognition toolkit (Soltau et al., 2001) with a standard, TED-optimized setup. We used confusion networks for decoding and obtaining confidence scores.

For reasons of simplicity, and better comparability to our baseline, we restricted our experiment to two supervision modes: TYPE and SKIP. We conducted experiments with 3 participants, 1 with several years of experience in transcription, 2 with none. Each participant received an explanation on the transcription guidelines, and a short hands-on training to learn to use our tool. Next, they transcribed a balanced selection of 200 segments of varying length and quality in random order. This data was used to train the user models.

Finally, each participant transcribed another 2 TED talks, with word error rate (WER) 19.96% (predicted: 22.33%). We set a target (predicted) WER of 15% as our optimization constraint (depending on the level of accuracy required by our final application, this target may be set lower or higher), and minimized the predicted supervision time as our objective function. Both TED talks were transcribed once using the baseline strategy, and once using the proposed strategy. The order of both strategies was reversed between talks, to minimize learning bias due to transcribing each talk twice.

The baseline strategy was adopted according to Sperber et al. (2013): We segmented the talk into natural, subsentential units, using Matusov et al. (2006)'s segmenter, which we tuned to reproduce the TED subtitle segmentation, producing a mean segment length of 8.6 words. Segments were added in order of increasing average word confidence, until the user model predicted a WER < 15%. The second segmentation strategy was the proposed method, similarly with a resource constraint of WER < 15%.

Supervision time was predicted via GP regression (cf. Section 4.1), using segment length, audio duration, and mean confidence as input features. The output variable was assumed subject to additive Gaussian noise with zero mean; a variance of 5 seconds was chosen empirically to minimize the mean squared error. Utility prediction (cf. Section 4.2) was based on posterior scores obtained from the confusion networks. We found it important to calibrate them, as the posteriors were overconfident especially in the upper range. To do so, we automatically transcribed a development set of TED data, grouped the recognized words into buckets according to their posteriors, and determined the average number of errors per word in each bucket from an alignment with the reference transcript. The mapping from average posterior to average number of errors was estimated via GP regression. The result was summed over all tokens, and multiplied by a constant human confidence, separately determined for each participant. (More elaborate methods for WER estimation exist, such as by Ogawa et al. (2013), but if our method achieves improvements using simple Hamming distance, incorporating more sophisticated measures will likely achieve similar, or even better accuracy.)

5.1.2 Simulation Results

To convey a better understanding of the potential gains afforded by our method, we first present a simulated experiment. We assume a transcriber who makes no mistakes, and needs exactly the amount of time predicted by a user model trained on the data of a randomly selected participant. We compare three scenarios: a baseline simulation, in which the baseline segments are transcribed in ascending order of confidence; a simulation using the proposed method, in which we change the WER constraint in small increments; finally, an oracle simulation, which uses
[Figure excerpt: segments presented with assigned supervision modes, e.g. (3) SKIP: "nineteen forty six until today you see the green"; (4) TYPE: …]
5.2.1 Experimental Setup

Neubig et al. (2011) have proposed a pointwise method for Japanese word segmentation that can be trained using partially annotated sentences, which makes it attractive in combination with active learning, as well as our segmentation method. The authors released their method as a software package "KyTea" that we employed in this user study. We used KyTea's active learning domain adaptation toolkit (http://www.phontron.com/kytea/active.html) as a baseline.

For data, we used the Balanced Corpus of Contemporary Written Japanese (BCCWJ), created by Maekawa (2008), with the internet Q&A subcorpus as in-domain data, and the white paper subcorpus as background data, a domain adaptation scenario. Sentences were drawn from the in-domain corpus, and the manually annotated data was then used to train KyTea, along with the pre-annotated background data. The goal (objective function) was to improve KyTea's classification accuracy on an in-domain test set, given a constrained time budget of 30 minutes. There were again 2 supervision modes: ANNOTATE and SKIP. Note that this is essentially a batch active learning setup with only one iteration.

We conducted experiments with one expert with several years of experience with Japanese word segmentation annotation, and three non-expert native speakers with no prior experience. Japanese word segmentation is not a trivial task, so we provided non-experts with training, including an explanation of the segmentation standard, a supervised test with immediate feedback and explanations, and hands-on training to get used to the annotation software.

Supervision time was predicted via GP regression (cf. Section 4.1), using the segment length and mean confidence as input features. As before, the output variable was assumed subject to additive Gaussian noise with zero mean and 5 seconds variance. To obtain training data for these models, each participant annotated about 500 example instances, drawn from the adaptation corpus, grouped into segments and balanced regarding segment length and difficulty.

For utility modeling (cf. Section 4.3), we first normalized KyTea's confidence scores, which are given in terms of SVM margin, using a sigmoid function (Platt, 1999). The normalization parameter was selected so that the mean confidence on a development set corresponded to the actual classifier accuracy. We derive our measure of classifier improvement for correcting a segment by summing over one minus the calibrated confidence for each of its tokens. To analyze how well this measure describes the actual training utility, we trained KyTea using the background data plus disjoint groups of 100 in-domain instances with similar probabilities and measured the achieved reduction of prediction errors. The correlation between each group's mean utility and the achieved error reduction was 0.87. Note that we ignore the decaying returns usually observed as more data is added to the training set. Also, we did not attempt to model user errors. Employing a constant base error rate, as in the transcription scenario, would change segment utilities only by a constant factor, without changing the resulting segmentation.
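The following sketch illustrates this calibration and utility computation. The sigmoid form follows Platt (1999); the grid search used to tune the normalization parameter, and all example values, are our own illustrative assumptions, since the paper does not state how the parameter was selected.

```python
# Minimal sketch: sigmoid calibration of SVM margins and segment utility.
import math

def calibrate(margin, a):
    """Map an SVM margin to a (0, 1) confidence via a sigmoid."""
    return 1.0 / (1.0 + math.exp(-a * margin))

def segment_utility(margins, a):
    """Sum of (1 - calibrated confidence) over the segment's tokens."""
    return sum(1.0 - calibrate(m, a) for m in margins)

def tune(dev_margins, dev_accuracy, candidates=tuple(0.1 * i for i in range(1, 51))):
    """Pick `a` so mean dev-set confidence matches measured accuracy."""
    mean_conf = lambda a: sum(calibrate(m, a) for m in dev_margins) / len(dev_margins)
    return min(candidates, key=lambda a: abs(mean_conf(a) - dev_accuracy))

a = tune([0.4, 1.3, 2.2, 0.8, 3.0], dev_accuracy=0.95)
print(segment_utility([0.5, 1.8, 2.5], a))
```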
After creating the user models, we conducted the main experiment, in which each participant annotated data that was selected from a pool of 1000 in-domain sentences using two strategies. The first, baseline strategy was as proposed by Neubig et al. (2011). Queries are those instances with the lowest confidence scores. Each query is then extended to the left and right, until a word boundary is predicted. This strategy follows similar reasoning as was the premise to this paper: to decide whether or not a position in a text corresponds to a word boundary, the annotator has to acquire surrounding context information. This context acquisition is relatively time consuming, so he might as well label the surrounding instances with little additional effort. The second strategy was our proposed, more principled approach. Queries of both methods were shuffled to minimize bias due to learning effects. Finally, we trained KyTea using the results of both methods, and compared the achieved classifier improvement and supervision times.

5.2.2 User Study Results

Table 2 summarizes the results of our experiment. It shows that the annotations by each participant resulted in a better classifier for the proposed method than the baseline, but also took up considerably more time, a less clear improvement than for the transcription task. In fact, the total error for time predictions was as high as 12.5% on average,
where the baseline method tended to take less time than predicted, and the proposed method more time. This is in contrast to a much lower total error (within 1%) when cross-validating our user model training data. This is likely due to the fact that the data for training the user model was selected in a balanced manner, as opposed to selecting difficult examples, as our method is prone to do. Thus, we may expect much better predictions when selecting user model training data that is more similar to the test case.

Participant | Baseline time | Baseline acc. | Proposed time | Proposed acc.
Expert      | 25:50         | 96.17         | 32:45         | 96.55
NonExp1     | 22:05         | 95.79         | 26:44         | 95.98
NonExp2     | 23:37         | 96.15         | 31:28         | 96.21
NonExp3     | 25:23         | 96.38         | 33:36         | 96.45

Table 2: Word segmentation task results, for our expert and 3 non-expert participants. For each participant, the resulting classifier accuracy [%] after supervision is shown, along with the time [min.] they needed. The unsupervised accuracy was 95.14%.

Plotting classifier accuracy over annotation time draws a clearer picture. Let us first analyze the results for the expert annotator. Figure 6 (E.1) shows that the proposed method resulted in consistently better results, indicating that time predictions were still effective. Note that this comparison may put the proposed method at a slight disadvantage by comparing intermediate results despite optimizing globally. For the non-experts, the improvement over the baseline is less consistent, as can be seen in Figure 6 (N.1) for one representative. According to our analysis, this can be explained by two factors: (1) The non-experts' annotation error (6.5% on average) was much higher than the expert's (2.7%), resulting in a somewhat irregular classifier learning curve. (2) The variance in annotation time per segment was consistently higher for the non-experts than for the expert, indicated by an average per-segment prediction error of 71% vs. 58% relative to the mean actual value, respectively. Informally speaking, non-experts made more mistakes, and were more strongly influenced by the difficulty of a particular segment (which was higher on average with the proposed method, as indicated by a lower average confidence). (Note that the non-expert in the figure annotated much faster than the expert, which explains the comparable classification result despite making more annotation errors. This is in contrast to the other non-experts, who were slower.)

[Figure 6: Classifier improvement over time (classifier accuracy vs. annotation time [min.]), for the expert (panels E.1–E.4) and one non-expert (panels N.1–N.4), comparing the proposed method against the baseline. The graphs show numbers based on (1) actual annotations and user models as in Sections 4.1 and 4.3, (2) error-free annotations, (3) measured times replaced by predicted times, and (4) both reference annotations and replaced time predictions.]

In Figures 6 (2–4) we present a simulation experiment in which we first pretend as if annotators made no mistakes, then as if they needed exactly as much time as predicted for each segment, and then both. This cheating experiment works in favor of the proposed method, especially for the non-expert. We may conclude that our segmentation approach is effective for the word segmentation task, but requires more accurate time predictions. Better user models will certainly help, although for the presented scenario our method may be most useful for an expert annotator.
5.3 Computational Efficiency

Since our segmentation algorithm does not guarantee polynomial runtime, computational efficiency was a concern, but it did not turn out problematic. On a consumer laptop, the solver produced segmentations within a few seconds for a single document containing several thousand tokens, and within hours for corpora consisting of several dozen documents. Runtime increased roughly quadratically with respect to the number of segmented tokens. We feel that this is acceptable, considering that the time needed for human supervision will likely dominate the computation time, and reasonable approximations can be made as noted in Section 3.2.

6 Relation to Prior Work

Efficient supervision strategies have been studied across a variety of NLP-related research areas, and have received increasing attention in recent years. Examples include post editing for speech recognition (Sanchez-Cortina et al., 2012), interactive machine translation (González-Rubio et al., 2010), active learning for machine translation (Haffari et al., 2009; González-Rubio et al., 2011) and many other NLP tasks (Olsson, 2009), to name but a few studies.

It has also been recognized by the active learning community that correcting the most useful parts first is often not optimal in terms of efficiency, since these parts tend to be the most difficult to manually annotate (Settles et al., 2008). The authors advocate the use of a user model to predict the supervision effort, and select the instances with the best "bang-for-the-buck." This prediction of supervision effort was successful, and was further refined in other NLP-related studies (Tomanek et al., 2010; Specia, 2011; Cohn and Specia, 2013). Our approach to user modeling using GP regression is inspired by the latter.

Most studies on user models consider only supervision effort, while neglecting the accuracy of human annotations. The view of humans as a perfect oracle has been criticized (Donmez and Carbonell, 2008), since human errors are common and can negatively affect supervision utility. Research on human-computer interaction has identified the modeling of human errors as very difficult (Olson and Olson, 1990), depending on factors such as user experience, cognitive load, user interface design, and fatigue. Nevertheless, even the simple error model used in our post editing task was effective.

The active learning community has addressed the problem of balancing utility and cost in some more detail. The previously reported "bang-for-the-buck" approach is a very simple, greedy approach to combining both into one measure. A more theoretically founded scalar optimization objective is the net benefit (utility minus costs) as proposed by Vijayanarasimhan and Grauman (2009), but it is unfortunately restricted to applications where both can be expressed in terms of the same monetary unit. Vijayanarasimhan et al. (2010) and Donmez and Carbonell (2008) use a more practical approach that specifies a constrained optimization problem by allowing only a limited time budget for supervision. Our approach is a generalization thereof and allows either specifying an upper bound on the predicted cost, or a lower bound on the predicted utility.

The main novelty of our presented approach is the explicit modeling and selection of segments of various sizes, such that annotation efficiency is optimized according to the specified constraints. While some works (Sassano and Kurohashi, 2010; Neubig et al., 2011) have proposed using subsentential segments, we are not aware of any previous work that explicitly optimizes that segmentation.

7 Conclusion

We presented a method that can effectively choose a segmentation of a language corpus that optimizes supervision efficiency, considering not only the actual usefulness of each segment, but also the annotation cost. We reported noticeable improvements over strong baselines in two user studies. Future user experiments with more participants would be desirable to verify our observations, and to allow further analysis of different factors such as annotator expertise. Also, future research may improve the user modeling, which will be beneficial for our method.

Acknowledgments

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 287658, Bridges Across the Language Divide (EU-BRIDGE).
References

Yuya Akita, Masato Mimura, and Tatsuya Kawahara. 2009. Automatic Transcription System for Meetings of the Japanese National Congress. In Interspeech, pages 84–87, Brighton, UK.

Trevor Cohn and Lucia Specia. 2013. Modelling Annotator Bias with Multi-task Gaussian Processes: An Application to Machine Translation Quality Estimation. In Association for Computational Linguistics Conference (ACL), Sofia, Bulgaria.

Pinar Donmez and Jaime Carbonell. 2008. Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles. In Conference on Information and Knowledge Management (CIKM), pages 619–628, Napa Valley, California, USA.

Jesús González-Rubio, Daniel Ortiz-Martínez, and Francisco Casacuberta. 2010. Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures. In Association for Computational Linguistics Conference (ACL), Short Papers Track, pages 173–177, Uppsala, Sweden.

Jesús González-Rubio, Daniel Ortiz-Martínez, and Francisco Casacuberta. 2011. An active learning scenario for interactive machine translation. In International Conference on Multimodal Interfaces (ICMI), pages 197–200, Alicante, Spain.

Gurobi Optimization. 2012. Gurobi Optimizer Reference Manual.

Gholamreza Haffari, Maxim Roy, and Anoop Sarkar. 2009. Active Learning for Statistical Phrase-based Machine Translation. In North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conference (NAACL-HLT), pages 415–423, Boulder, CO, USA.

Stefan Irnich and Guy Desaulniers. 2005. Shortest Path Problems with Resource Constraints. In Column Generation, pages 33–65. Springer US.

Kikuo Maekawa. 2008. Balanced Corpus of Contemporary Written Japanese. In International Joint Conference on Natural Language Processing (IJCNLP), pages 101–102, Hyderabad, India.

R. Timothy Marler and Jasbir S. Arora. 2004. Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization, 26(6):369–395, April.

Evgeny Matusov, Arne Mauser, and Hermann Ney. 2006. Automatic Sentence Segmentation and Punctuation Prediction for Spoken Language Translation. In International Workshop on Spoken Language Translation (IWSLT), pages 158–165, Kyoto, Japan.

Hiroaki Nanjo, Yuya Akita, and Tatsuya Kawahara. 2006. Computer Assisted Speech Transcription System for Efficient Speech Archive. In Western Pacific Acoustics Conference (WESPAC), Seoul, Korea.

Graham Neubig, Yosuke Nakata, and Shinsuke Mori. 2011. Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis. In Association for Computational Linguistics: Human Language Technologies Conference (ACL-HLT), pages 529–533, Portland, OR, USA.

Atsunori Ogawa, Takaaki Hori, and Atsushi Nakamura. 2013. Discriminative Recognition Rate Estimation for N-Best List and Its Application to N-Best Rescoring. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 6832–6836, Vancouver, Canada.

Judith Reitman Olson and Gary Olson. 1990. The Growth of Cognitive Modeling in Human-Computer Interaction Since GOMS. Human-Computer Interaction, 5(2):221–265, June.

Fredrik Olsson. 2009. A literature survey of active machine learning in the context of natural language processing. Technical report, SICS Sweden.

David Pisinger. 1994. A Minimal Algorithm for the Multiple-Choice Knapsack Problem. European Journal of Operational Research, 83(2):394–410.

John C. Platt. 1999. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In Advances in Large Margin Classifiers, pages 61–74. MIT Press.

Carl E. Rasmussen and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA.

Isaias Sanchez-Cortina, Nicolas Serrano, Alberto Sanchis, and Alfons Juan. 2012. A prototype for Interactive Speech Transcription Balancing Error and Supervision Effort. In International Conference on Intelligent User Interfaces (IUI), pages 325–326, Lisbon, Portugal.

Manabu Sassano and Sadao Kurohashi. 2010. Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing. In Association for Computational Linguistics Conference (ACL), pages 356–365, Uppsala, Sweden.

Burr Settles, Mark Craven, and Lewis Friedland. 2008. Active Learning with Real Annotation Costs. In Neural Information Processing Systems Conference (NIPS) - Workshop on Cost-Sensitive Learning, Lake Tahoe, NV, United States.

Burr Settles. 2008. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1070–1079, Honolulu, USA.

Hagen Soltau, Florian Metze, Christian Fügen, and Alex Waibel. 2001. A One-Pass Decoder Based on Polymorphic Linguistic Context Assignment. In Automatic Speech Recognition and Understanding
Workshop (ASRU), pages 214–217, Madonna di Campiglio, Italy.

Lucia Specia. 2011. Exploiting Objective Annotations for Measuring Translation Post-editing Effort. In Conference of the European Association for Machine Translation (EAMT), pages 73–80, Nice, France.

Matthias Sperber, Graham Neubig, Christian Fügen, Satoshi Nakamura, and Alex Waibel. 2013. Efficient Speech Transcription Through Respeaking. In Interspeech, pages 1087–1091, Lyon, France.

Bernhard Suhm, Brad Myers, and Alex Waibel. 2001. Multimodal error correction for speech user interfaces. Transactions on Computer-Human Interaction, 8(1):60–98.

Evimaria Terzi and Panayiotis Tsaparas. 2006. Efficient algorithms for sequence segmentation. In SIAM Conference on Data Mining (SDM), Bethesda, Maryland, USA.

Katrin Tomanek and Udo Hahn. 2009. Semi-Supervised Active Learning for Sequence Labeling. In International Joint Conference on Natural Language Processing (IJCNLP), pages 1039–1047, Singapore.

Katrin Tomanek, Udo Hahn, and Steffen Lohmann. 2010. A Cognitive Cost Model of Annotations Based on Eye-Tracking Data. In Association for Computational Linguistics Conference (ACL), pages 1158–1167, Uppsala, Sweden.

Paolo Toth and Daniele Vigo. 2001. The Vehicle Routing Problem. Society for Industrial & Applied Mathematics (SIAM), Philadelphia.

Sudheendra Vijayanarasimhan and Kristen Grauman. 2009. What's It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 2262–2269, Miami Beach, Florida, USA.

Sudheendra Vijayanarasimhan, Prateek Jain, and Kristen Grauman. 2010. Far-sighted active learning on a budget for image and video recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 3035–3042, San Francisco, California, USA, June.