Transactions of the Association for Computational Linguistics, 2 (2014) 169–180. Action Editor: Eric Fosler-Lussier.

Submitted 11/2013; Revised 2/2014; Published 4/2014. © 2014 Association for Computational Linguistics.

Segmentation for Efficient Supervised Language Annotation with an Explicit Cost-Utility Tradeoff

Matthias Sperber (1), Mirjam Simantzik (2), Graham Neubig (3), Satoshi Nakamura (3), Alex Waibel (1)

(1) Karlsruhe Institute of Technology, Institute for Anthropomatics, Germany
(2) Mobile Technologies GmbH, Germany
(3) Nara Institute of Science and Technology, AHC Laboratory, Japan

matthias.sperber@kit.edu, mirjam.simantzik@jibbigo.com, neubig@is.naist.jp, s-nakamura@is.naist.jp, waibel@kit.edu

Abstract

In this paper, we study the problem of manually correcting automatic annotations of natural language in as efficient a manner as possible. We introduce a method for automatically segmenting a corpus into chunks such that many uncertain labels are grouped into the same chunk, while human supervision can be omitted altogether for other segments. A tradeoff must be found for segment sizes. Choosing short segments allows us to reduce the number of highly confident labels that are supervised by the annotator, which is useful because these labels are often already correct and supervising correct labels is a waste of effort. In contrast, long segments reduce the cognitive effort due to context switches. Our method helps find the segmentation that optimizes supervision efficiency by defining user models to predict the cost and utility of supervising each segment and solving a constrained optimization problem balancing these contradictory objectives. A user study demonstrates noticeable gains over pre-segmented, confidence-ordered baselines on two natural language processing tasks: speech transcription and word segmentation.

1 Introduction

Many natural language processing (NLP) tasks require human supervision to be useful in practice, be it to collect suitable training material or to meet some desired output quality. Given the high cost of human intervention, how to minimize the supervision effort is an important research problem. Previous works in areas such as active learning, postediting, and interactive pattern recognition have investigated this question with notable success (Settles, 2008; Specia, 2011; González-Rubio et al., 2010).

[Figure 1: Three automatic transcripts of the sentence "It was a bright cold day in April, and the clocks were striking thirteen", with recognition errors in parentheses: "It was a bright cold (they) in (apron), and (a) clocks were striking thirteen." The underlined parts are to be corrected by a human for (a) phrases, (b) words, or (c) the proposed segmentation.]

The most common framework for efficient annotation in the NLP context consists of training an NLP system on a small amount of baseline data, and then running the system on unannotated data to estimate confidence scores of the system's predictions (Settles, 2008). Sentences with the lowest confidence are then used as the data to be annotated (Figure 1(a)). However, it has been noted that when the NLP system in question already has relatively high accuracy, annotating entire sentences can be wasteful, as most words will already be correct (Tomanek and Hahn, 2009; Neubig et al., 2011). In these cases, it is possible to achieve much higher benefit per annotated word by annotating sub-sentential units (Figure 1(b)). However, as Settles et al. (2008) point out, simply maximizing the benefit per annotated instance is not enough, as the real supervision effort varies

greatly across instances. This is particularly important in the context of choosing segments to annotate: human annotators heavily rely on semantics and context information to process language, and intuitively, a consecutive sequence of words can be supervised faster and more accurately than the same number of words spread out over several locations in a text. This intuition can also be seen in our empirical data in Figure 2, which shows that for the speech transcription and word segmentation tasks described later in Section 5, short segments had a longer annotation time per word. Based on this fact, we argue it would be desirable to present the annotator with a segmentation of the data into easily supervisable chunks that are both large enough to reduce the number of context switches, and small enough to prevent unnecessary annotation (Figure 1(c)).

[Figure 2: Average annotation time per instance, plotted over different segment lengths (1 to 19 tokens). For both tasks, the effort clearly increases for short segments.]

In this paper, we introduce a new strategy for natural language supervision tasks that attempts to optimize supervision efficiency by choosing an appropriate segmentation. It relies on a user model that, given a specific segment, predicts the cost and the utility of supervising that segment. Given this user model, the goal is to find a segmentation that minimizes the total predicted cost while maximizing the utility. We balance these two criteria by defining a constrained optimization problem in which one criterion is the optimization objective, while the other criterion is used as a constraint. Doing so allows specifying practical optimization goals such as "remove as many errors as possible given a limited time budget," or "annotate data to obtain some required classifier accuracy in as little time as possible."

Solving this optimization task is computationally difficult: it is NP-hard. Nevertheless, we demonstrate that by making realistic assumptions about the segment length, an optimal solution can be found using an integer linear programming formulation for mid-sized corpora, as are common for supervised annotation tasks. For larger corpora, we provide simple heuristics to obtain an approximate solution in a reasonable amount of time.

Experiments over two example scenarios demonstrate the usefulness of our method: postediting for speech transcription, and active learning for Japanese word segmentation. Our model predicts noticeable efficiency gains, which are confirmed in experiments with human annotators.

2 Problem Definition

The goal of our method is to find a segmentation over a corpus of word tokens $w_1^N$ that optimizes supervision efficiency according to some predictive user model. The user model is denoted as a set of functions $u_{l,k}(w_a^b)$ that evaluate any possible subsequence $w_a^b$ of tokens in the corpus according to criteria $l \in L$ and supervision modes $k \in K$.

Let us illustrate this with an example. Sperber et al. (2013) defined a framework for speech transcription in which an initial, erroneous transcript is created using automatic speech recognition (ASR), and an annotator corrects the transcript either by correcting the words by keyboard, by respeaking the content, or by leaving the words as is. In this case, we could define K = {TYPE, RESPEAK, SKIP}, each constant representing one of these three supervision modes. Our method will automatically determine the appropriate supervision mode for each segment. The user model in this example might evaluate every segment according to two criteria L: a cost criterion (in terms of supervision time) and a utility criterion (in terms of the number of removed errors), when using each mode. Intuitively, respeaking should be assigned both lower cost (because speaking is faster than typing), but also lower utility than typing on a keyboard (because respeaking recognition errors can occur). The SKIP mode denotes the special, unsupervised mode that always returns 0 cost and 0 utility.

Other possible supervision modes include multiple input modalities (Suhm et al., 2001), several human annotators with different expertise and cost

(Donmez and Carbonell, 2008), and correction vs. translation from scratch in machine translation (Specia, 2011). Similarly, cost could instead be expressed in monetary terms, or the utility function could predict the improvement of a classifier when the resulting annotation is not intended for direct human consumption, but as training data for a classifier in an active learning framework.

3 Optimization Framework

Given this setting, we are interested in simultaneously finding optimal locations and supervision modes for all segments, according to the given criteria. Each resulting segment will be assigned exactly one of these supervision modes. We denote a segmentation of the $N$ tokens of corpus $w_1^N$ into $M \le N$ segments by specifying segment boundary markers $s_1^{M+1} = (s_1{=}1, s_2, \dots, s_{M+1}{=}N{+}1)$. Setting a boundary marker $s_i = a$ means that we put a segment boundary before the $a$-th word token (or the end-of-corpus marker for $a = N+1$). Thus our corpus is segmented into token sequences $[(w_{s_j}, \dots, w_{s_{j+1}-1})]_{j=1}^{M}$. The supervision modes assigned to each segment are denoted by $m_j$. We favor those segmentations that minimize the cumulative value $\sum_{j=1}^{M} u_{l,m_j}(w_{s_j}^{s_{j+1}-1})$ for each criterion $l$. For any criterion where larger values are intuitively better, we flip the sign before defining $u_{l,m_j}(w_{s_j}^{s_{j+1}-1})$ to maintain consistency (e.g. the negative number of errors removed).

3.1 Multiple Criteria Optimization

In the case of a single criterion ($|L| = 1$), we obtain a simple, single-objective unconstrained linear optimization problem, efficiently solvable via dynamic programming (Terzi and Tsaparas, 2006). However, in practice one usually encounters several competing criteria, such as cost and utility, and here we will focus on this more realistic setting. We balance competing criteria by using one as an optimization objective, and the others as constraints.[1] Let criterion $l_0$ be the optimization objective criterion, and let $C_l$ denote the constraining constants for the criteria $l \in L_{l_0} = L \setminus \{l_0\}$. We state the optimization problem:

$$\min_{M;\ s_1^{M+1};\ m_1^M} \sum_{j=1}^{M} u_{l_0,m_j}\big(w_{s_j}^{s_{j+1}-1}\big) \quad \text{s.t.} \quad \sum_{j=1}^{M} u_{l,m_j}\big(w_{s_j}^{s_{j+1}-1}\big) \le C_l \quad (\forall l \in L_{l_0})$$

[1] This approach is known as the bounded objective function method in the multi-objective optimization literature (Marler and Arora, 2004). The very popular weighted sum method merges criteria into a single efficiency measure, but is problematic in our case because the number of supervised tokens is unspecified. Unless the weights are carefully chosen, the algorithm might find, e.g., the completely unsupervised or completely supervised segmentation to be most "efficient."

[Figure 3: Excerpt of a segmentation graph for an example transcription task similar to Figure 1 (some edges are omitted for readability). Edges are labeled with their mode, the predicted number of errors that can be removed, and the necessary supervision time, e.g. [RESPEAK: 1.5/2], [TYPE: 2/5], [SKIP: 0/0]. A segmentation scheme might prefer solid edges over dashed ones in this example.]

This constrained optimization problem is difficult to solve. In fact, the NP-hard multiple-choice knapsack problem (Pisinger, 1994) corresponds to a special case of our problem in which the number of segments is equal to the number of tokens, implying that our more general problem is NP-hard as well.

In order to overcome this problem, we reformulate the search for the optimal segmentation as a resource-constrained shortest path problem in a directed, acyclic multigraph. While still not efficiently solvable in theory, this problem is well studied in domains such as vehicle routing and crew scheduling (Irnich and Desaulniers, 2005), and it is known that in many practical situations the problem can be solved reasonably efficiently using integer linear programming relaxations (Toth and Vigo, 2001).

In our formalism, the set of nodes $V$ represents the spaces between neighboring tokens, at which the algorithm may insert segment boundaries. A node with index $i$ represents a segment break before the $i$-th token, and thus the sequence of the indices in a path directly corresponds to $s_1^{M+1}$. Edges $E$ denote the grouping of tokens between the respective

nodes into one segment. Edges are always directed from left to right, and labeled with a supervision mode. In addition, each edge between nodes $i$ and $j$ is assigned $u_{l,k}(w_i^{j-1})$, the corresponding predicted value for each criterion $l \in L$ and supervision mode $k \in K$; the supervision mode of the $j$-th segment in a path directly corresponds to $m_j$. Figure 3 shows an example of what the resulting graph may look like.

Our original optimization problem is now equivalent to finding the shortest path between the first and last nodes according to criterion $l_0$, while obeying the given resource constraints. Following a widely used formulation for the resource-constrained shortest path problem, we can define $E_{ij}$ as the set of competing edges between $i$ and $j$, and express this optimization problem with the following integer linear program (ILP):

$$\min_x \sum_{i,j \in V} \sum_{k \in E_{ij}} x_{ijk}\, u_{l_0,k}(w_i^{j-1}) \quad (1)$$

$$\text{s.t.} \quad \sum_{i,j \in V} \sum_{k \in E_{ij}} x_{ijk}\, u_{l,k}(w_i^{j-1}) \le C_l \quad (\forall l \in L_{l_0}) \quad (2)$$

$$\sum_{i \in V,\, k \in E_{ij}} x_{ijk} = \sum_{i \in V,\, k \in E_{ji}} x_{jik} \quad (\forall j \in V \setminus \{1, n\}) \quad (3)$$

$$\sum_{j \in V,\, k \in E_{1j}} x_{1jk} = 1 \quad (4)$$

$$\sum_{i \in V,\, k \in E_{in}} x_{ink} = 1 \quad (5)$$

$$x_{ijk} \in \{0, 1\} \quad (\forall x_{ijk} \in x) \quad (6)$$

The variables $x = \{x_{ijk} \mid i,j \in V,\ k \in E_{ij}\}$ denote the activation of the $k$-th edge between nodes $i$ and $j$. The shortest path according to the minimization objective (1) that still meets the resource constraints for the specified criteria (2) is to be computed. The degree constraints (3, 4, 5) specify that all but the first and last nodes must have as many incoming as outgoing edges, while the first node must have exactly one outgoing, and the last node exactly one incoming edge. Finally, the integrality condition (6) forces all edges to be either fully activated or fully deactivated. The outlined problem formulation can be solved directly by using off-the-shelf ILP solvers; here we employ GUROBI (Gurobi Optimization, 2012).

3.2 Heuristics for Approximation

In general, edges are inserted for every supervision mode between every combination of two nodes. The search space can be constrained by removing some of these edges to increase efficiency. In this study, we only consider edges spanning at most 20 tokens. For cases in which larger corpora are to be annotated, or when the acceptable delay for delivering results is small, a suitable segmentation can be found approximately. The easiest way would be to partition the corpus, e.g. according to its individual documents, divide the budget constraints evenly across all partitions, and then segment each partition independently. More sophisticated methods might approximate the Pareto front for each partition, and distribute the budgets in an intelligent way.

4 User Modeling

While the proposed framework is able to optimize the segmentation with respect to each criterion, it also rests upon the assumption that we can provide user models $u_{l,k}(w_i^{j-1})$ that accurately evaluate every segment according to the specified criteria and supervision modes. In this section, we discuss our strategies for estimating three conceivable criteria: annotation cost, correction of errors, and improvement of a classifier.

4.1 Annotation Cost Modeling

Modeling cost requires solving a regression problem from features of a candidate segment to annotation cost, for example in terms of supervision time. Appropriate input features depend on the task, but should include notions of complexity (e.g. a confidence measure) and length of the segment, as both are expected to strongly influence supervision time.

We propose using Gaussian process (GP) regression for cost prediction, a state-of-the-art nonparametric Bayesian regression technique (Rasmussen and Williams, 2006).[2] As reported on a similar task by Cohn and Specia (2013), and confirmed by our preliminary experiments, GP regression significantly outperforms popular techniques such as support vector regression and least-squares linear regression. We also follow their settings for GP, employing GP regression with a squared exponential kernel with automatic relevance determination. Depending on the number of users and the amount of training data available for each user, models may be trained separately for each user (as we do here), or in a combined fashion via multi-task learning as proposed by Cohn and Specia (2013).

[2] Code available at http://www.gaussianprocess.org/gpml/

It is also crucial for the predictions to be reliable throughout the whole relevant space of segments. If the cost of certain types of segments is systematically underpredicted, the segmentation algorithm might be misled to prefer these, possibly a large number of times.[3] An effective trick to prevent such underpredictions is to predict the log time instead of the actual time. In this way, errors in the critical low end are penalized more strongly, and the time can never become negative.

[3] For instance, consider a model that predicts well for segments of medium size or longer, but underpredicts the supervision time of single-token segments. This may lead the segmentation algorithm to put every token into its own segment, which is clearly undesirable.

4.2 Error Correction Modeling

As one utility measure, we can use the number of errors corrected, a useful measure for postediting tasks over automatically produced annotations. In order to measure how many errors can be removed by supervising a particular segment, we must estimate both how many errors are in the automatic annotation, and how reliably a human can remove these for a given supervision mode.

Most machine learning techniques can estimate confidence scores in the form of posterior probabilities. To estimate the number of errors, we can sum over one minus the posterior for all tokens, which estimates the Hamming distance from the reference annotation. This measure is appropriate for tasks in which the number of tokens is fixed in advance (e.g. a part-of-speech estimation task), and a reasonable approximation for tasks in which the number of tokens is not known in advance (e.g. speech transcription, cf. Section 5.1.1).

Predicting the particular tokens at which a human will make a mistake is known to be a difficult task (Olson and Olson, 1990), but a simplifying constant human error rate can still be useful. For example, in the task from Section 2, we may suspect a certain number of errors in a transcript segment, and predict, say, 95% of those errors to be removed via typing, but only 85% via respeaking.

4.3 Classifier Improvement Modeling

Another reasonable utility measure is the accuracy of a classifier trained on the data we choose to annotate in an active learning framework. Confidence scores have been found useful for ranking particular tokens with regard to how much they will improve a classifier (Settles, 2008). Here, we may similarly score segment utility as the sum of its token confidences, although care must be taken to normalize and calibrate the token confidences to be linearly comparable before doing so. While the resulting utility score has no interpretation in absolute terms, it can still be used as an optimization objective (cf. Section 5.2.1).

5 Experiments

In this section, we present experimental results examining the effectiveness of the proposed method over two tasks: speech transcription and Japanese word segmentation.[4]

[4] Software and experimental data can be downloaded from http://www.msperber.com/research/tacl-segmentation/

5.1 Speech Transcription Experiments

Accurate speech transcripts are a much-demanded NLP product, useful by themselves, as training material for ASR, or as input for follow-up tasks like speech translation. With recognition accuracies plateauing, manually correcting (postediting) automatic speech transcripts has become popular. Common approaches are to identify words (Sanchez-Cortina et al., 2012) or (sub-)phrases (Sperber et al., 2013) of low confidence, and have a human editor correct these.

5.1.1 Experimental Setup

We conduct a user study in which participants post-edited speech transcripts, given a fixed goal word error rate. The transcription setup was such that the transcriber could see the ASR transcript of parts before and after the segment that he was editing, providing context if needed. When imprecise time alignment resulted in segment breaks that were

slightly "off," as happened occasionally, that context helped guess what was said. The segment itself was transcribed from scratch, as opposed to editing the ASR transcript; besides being arguably more efficient when the ASR transcript contains many mistakes (Nanjo et al., 2006; Akita et al., 2009), preliminary experiments also showed that supervision time is far easier to predict this way. Figure 4 illustrates what the setup looked like.

[Figure 4: Result of our segmentation method (excerpt), e.g.: (3) SKIP: "nineteen forty six until today you see the green"; (4) TYPE; (5) SKIP: "Interstate conflict"; (6) TYPE; (7) SKIP. TYPE segments are displayed empty and should be transcribed from scratch. For SKIP segments, the ASR transcript is displayed to provide context. When annotating a segment, the corresponding audio is played back.]

We used a self-developed transcription tool to conduct the experiments. It presents our computed segments one by one, allows convenient input and playback via keyboard shortcuts, and logs user interactions with their timestamps. A selection of TED talks[5] (English talks on technology, entertainment, and design) served as experimental data. While some of these talks contain jargon such as medical terms, they are presented by skilled speakers, making them comparably easy to understand. Initial transcripts were created using the Janus recognition toolkit (Soltau et al., 2001) with a standard, TED-optimized setup. We used confusion networks for decoding and obtaining confidence scores.

[5] www.ted.com

For reasons of simplicity, and better comparability to our baseline, we restricted our experiment to two supervision modes: TYPE and SKIP. We conducted experiments with 3 participants, 1 with several years of experience in transcription, 2 with none. Each participant received an explanation of the transcription guidelines, and a short hands-on training to learn to use our tool. Next, they transcribed a balanced selection of 200 segments of varying length and quality in random order. This data was used to train the user models. Finally, each participant transcribed another 2 TED talks, with word error rate (WER) 19.96% (predicted: 22.33%). We set a target (predicted) WER of 15% as our optimization constraint,[6] and minimized the predicted supervision time as our objective function. Both TED talks were transcribed once using the baseline strategy, and once using the proposed strategy. The order of the two strategies was reversed between talks, to minimize learning bias due to transcribing each talk twice.

[6] Depending on the level of accuracy required by our final application, this target may be set lower or higher.

The baseline strategy was adopted from Sperber et al. (2013): We segmented the talk into natural, subsentential units, using Matusov et al. (2006)'s segmenter, which we tuned to reproduce the TED subtitle segmentation, producing a mean segment length of 8.6 words. Segments were added in order of increasing average word confidence, until the user model predicted a WER below 15%. The second segmentation strategy was the proposed method, similarly with a resource constraint of WER < 15%.

Supervision time was predicted via GP regression (cf. Section 4.1), using segment length, audio duration, and mean confidence as input features. The output variable was assumed subject to additive Gaussian noise with zero mean; a variance of 5 seconds was chosen empirically to minimize the mean squared error. Utility prediction (cf. Section 4.2) was based on posterior scores obtained from the confusion networks. We found it important to calibrate them, as the posteriors were overconfident, especially in the upper range. To do so, we automatically transcribed a development set of TED data, grouped the recognized words into buckets according to their posteriors, and determined the average number of errors per word in each bucket from an alignment with the reference transcript. The mapping from average posterior to average number of errors was estimated via GP regression. The result was summed over all tokens, and multiplied by a constant human confidence, separately determined for each participant.[7]

[7] More elaborate methods for WER estimation exist, such as by Ogawa et al. (2013), but if our method achieves improvements using the simple Hamming distance, incorporating more sophisticated measures will likely achieve similar, or even better, accuracy.

5.1.2 Simulation Results

To convey a better understanding of the potential gains afforded by our method, we first present a simulated experiment. We assume a transcriber who makes no mistakes, and needs exactly the amount of time predicted by a user model trained on the data of a randomly selected participant. We compare three scenarios: a baseline simulation, in which the baseline segments are transcribed in ascending order of confidence; a simulation using the proposed method, in which we change the WER constraint in small increments; and finally an oracle simulation, which uses the proposed method, but with a utility model that knows the actual number of errors in each segment. For each supervised segment, we simply replace the ASR output with the reference, and measure the resulting WER.

[Figure 5: Simulation of postediting on an example TED talk (postediting time in minutes vs. resulting WER in %, for the baseline, proposed, and oracle strategies). The proposed method reduces the WER considerably faster than the baseline at first; later both converge. The much superior oracle simulation indicates room for further improvement.]

Figure 5 shows the simulation on an example TED talk, based on an initial transcript with 21.9% WER. The proposed method is able to reduce the WER faster than the baseline, up to a certain point where they converge. The oracle simulation is even faster, indicating room for improvement through better confidence scores.

5.1.3 User Study Results

Table 1 shows the results of the user study. First, we note that the WER estimation by our utility model was off by about 2.5%: While the predicted improvement in WER was from 22.33% to 15.0%, the actual improvement was from 19.96% to about 12.5%. The actual resulting WER was consistent across all users, and we observe strong, consistent reductions in supervision time for all participants.

Table 1: Transcription task results. For each user, the resulting WER [%] after supervision is shown, along with the time [min] they needed. The unsupervised WER was 19.96%.

Participant | Baseline WER / Time | Proposed WER / Time
P1          | 12.26 / 44:05       | 12.18 / 33:01
P2          | 12.75 / 36:19       | 12.77 / 29:54
P3          | 12.70 / 52:42       | 12.50 / 37:57
AVG         | 12.57 / 44:22       | 12.48 / 33:37

Prediction of the necessary supervision time was accurate: averaged over participants, 45:41 minutes were predicted for the baseline, 44:22 minutes measured; for the proposed method, 32:11 minutes were predicted, 33:37 minutes measured. On average, participants removed 6.68 errors per minute using the baseline, and 8.93 errors per minute using the proposed method, a speed-up of 25.2%. Note that predicted and measured values are not strictly comparable: in the experiments, to provide a fair comparison, participants transcribed the same talks twice (once using the baseline, once the proposed method, in alternating order), resulting in a noticeable learning effect. The user model, on the other hand, is trained to predict the case in which a transcriber conducts only one transcription pass.

As an interesting finding, without being informed about the order of the baseline and proposed methods, participants reported that transcribing according to the proposed segmentation seemed harder, as they found the baseline segmentation more linguistically reasonable. However, this perceived increase in difficulty did not show in the efficiency numbers.

5.2 Japanese Word Segmentation Experiments

Word segmentation is the first step in NLP for languages that are commonly written without word boundaries, such as Japanese and Chinese. We apply our method to a task in which we domain-adapt a word segmentation classifier via active learning. In this experiment, participants annotated whether or not a word boundary occurred at certain positions in a Japanese sentence. The tokens to be grouped into segments are the positions between adjacent characters.

5.2.1 Experimental Setup

Neubig et al. (2011) have proposed a pointwise method for Japanese word segmentation that can be trained using partially annotated sentences, which makes it attractive in combination with active learning, as well as with our segmentation method. The authors released their method as a software package "KyTea" that we employed in this user study. We used KyTea's active learning domain adaptation toolkit[8] as a baseline.

[8] http://www.phontron.com/kytea/active.html

For data, we used the Balanced Corpus of Contemporary Written Japanese (BCCWJ), created by Maekawa (2008), with the internet Q&A subcorpus as in-domain data, and the white paper subcorpus as background data, a domain adaptation scenario. Sentences were drawn from the in-domain corpus, and the manually annotated data was then used to train KyTea, along with the pre-annotated background data. The goal (objective function) was to improve KyTea's classification accuracy on an in-domain test set, given a constrained time budget of 30 minutes. There were again 2 supervision modes: ANNOTATE and SKIP. Note that this is essentially a batch active learning setup with only one iteration.

We conducted experiments with one expert with several years of experience with Japanese word segmentation annotation, and three non-expert native speakers with no prior experience. Japanese word segmentation is not a trivial task, so we provided the non-experts with training, including an explanation of the segmentation standard, a supervised test with immediate feedback and explanations, and hands-on training to get used to the annotation software.

Supervision time was predicted via GP regression (cf. Section 4.1), using the segment length and mean confidence as input features. As before, the output variable was assumed subject to additive Gaussian noise with zero mean and 5 seconds variance. To obtain training data for these models, each participant annotated about 500 example instances, drawn from the adaptation corpus, grouped into segments and balanced regarding segment length and difficulty.

For utility modeling (cf. Section 4.3), we first normalized KyTea's confidence scores, which are given in terms of SVM margin, using a sigmoid function (Platt, 1999). The normalization parameter was selected so that the mean confidence on a development set corresponded to the actual classifier accuracy. We derive our measure of classifier improvement for correcting a segment by summing over one minus the calibrated confidence for each of its tokens. To analyze how well this measure describes the actual training utility, we trained KyTea using the background data plus disjoint groups of 100 in-domain instances with similar probabilities and measured the achieved reduction of prediction errors. The correlation between each group's mean utility and the achieved error reduction was 0.87. Note that we ignore the decaying returns usually observed as more data is added to the training set. Also, we did not attempt to model user errors. Employing a constant base error rate, as in the transcription scenario, would change segment utilities only by a constant factor, without changing the resulting segmentation.

After creating the user models, we conducted the main experiment, in which each participant annotated data that was selected from a pool of 1000 in-domain sentences using two strategies. The first, baseline strategy was as proposed by Neubig et al. (2011): queries are those instances with the lowest confidence scores, and each query is then extended to the left and right, until a word boundary is predicted. This strategy follows similar reasoning as was the premise of this paper: to decide whether or not a position in a text corresponds to a word boundary, the annotator has to acquire surrounding context information. This context acquisition is relatively time consuming, so he might as well label the surrounding instances with little additional effort. The second strategy was our proposed, more principled approach. Queries of both methods were shuffled to minimize bias due to learning effects. Finally, we trained KyTea using the results of both methods, and compared the achieved classifier improvement and supervision times.

5.2.2 User Study Results

Table 2 summarizes the results of our experiment. It shows that the annotations by each participant resulted in a better classifier for the proposed method than for the baseline, but also took considerably more time, a less clear improvement than for the transcription task. In fact, the total error of the time predictions was as high as 12.5% on average,

Participant   Baseline        Proposed
              Time    Acc.    Time    Acc.
Expert        25:50   96.17   32:45   96.55
NonExp1       22:05   95.79   26:44   95.98
NonExp2       23:37   96.15   31:28   96.21
NonExp3       25:23   96.38   33:36   96.45

Table 2: Word segmentation task results, for our expert and 3 non-expert participants. For each participant, the resulting classifier accuracy [%] after supervision is shown, along with the time [min] they needed. The unsupervised accuracy was 95.14%.

where the baseline method tended to take less time than predicted, and the proposed method more time. This is in contrast to a much lower total error (within 1%) when cross-validating our user model training data. This is likely due to the fact that the data for training the user model was selected in a balanced manner, as opposed to selecting difficult examples, as our method is prone to do. Thus, we may expect much better predictions when selecting user model training data that is more similar to the test case.

Plotting classifier accuracy over annotation time draws a clearer picture. Let us first analyze the results for the expert annotator. Figure 6 (E.1) shows that the proposed method resulted in consistently better results, indicating that time predictions were still effective. Note that this comparison may put the proposed method at a slight disadvantage by comparing intermediate results despite optimizing globally. For the non-experts, the improvement over the baseline is less consistent, as can be seen in Figure 6 (N.1) for one representative. According to our analysis, this can be explained by two factors: (1) The non-experts' annotation error (6.5% on average) was much higher than the expert's (2.7%), resulting in a somewhat irregular classifier learning curve. (2) The variance in annotation time per segment was consistently higher for the non-experts than for the expert, indicated by an average per-segment prediction error of 71% vs. 58% relative to the mean actual value, respectively. Informally speaking, non-experts made more mistakes, and were more strongly influenced by the difficulty of a particular segment (which was higher on average with the proposed method, as indicated by a lower average confidence).9

Figure 6: Classifier improvement over time, depicted for the expert (E) and a non-expert (N). The graphs show numbers based on (1) actual annotations and user models as in Sections 4.1 and 4.3, (2) error-free annotations, (3) measured times replaced by predicted times, and (4) both reference annotations and replaced time predictions. (Panels E.1–E.4 and N.1–N.4 compare the proposed method against the baseline; x-axis: annotation time [min], y-axis: classifier accuracy.)

In Figures 6 (2–4) we present a simulation experiment in which we first pretend as if annotators made no mistakes, then as if they needed exactly as much time as predicted for each segment, and then both. This cheating experiment works in favor of the proposed method, especially for the non-expert. We may conclude that our segmentation approach is effective for the word segmentation task, but requires more accurate time predictions. Better user models will certainly help, although for the presented scenario our method may be most useful for an expert annotator.

9 Note that the non-expert in the figure annotated much faster than the expert, which explains the comparable classification result despite making more annotation errors. This is in contrast to the other non-experts, who were slower.
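The per-segment time predictions discussed above come from a GP regression user model. The following is a minimal, self-contained sketch of such a predictor, assuming a single scalar feature (segment length in tokens) and a squared-exponential kernel; the actual features, kernel, and hyperparameters used in our experiments are not reproduced here, and the data below is purely illustrative.

```python
import math

def rbf(a, b, ls=5.0, var=1.0):
    # Squared-exponential kernel on a scalar feature (here: segment length).
    return var * math.exp(-((a - b) ** 2) / (2.0 * ls ** 2))

def solve(A, y):
    # Solve A x = y via Gauss-Jordan elimination with partial pivoting
    # (adequate since the kernel matrix here is small and well-conditioned).
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_predict(train_x, train_y, test_x, noise=0.1):
    # GP posterior mean at test_x; targets are centred on their empirical mean.
    mu = sum(train_y) / len(train_y)
    y = [t - mu for t in train_y]
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    alpha = solve(K, y)
    return [mu + sum(rbf(x, a) * w for a, w in zip(train_x, alpha))
            for x in test_x]

# Toy data: observed annotation time (minutes) vs. segment length (tokens).
lengths = [2, 4, 6, 8, 10]
times = [5.0, 9.0, 14.0, 18.0, 23.0]
pred = gp_predict(lengths, times, [5, 7])
```

The posterior mean interpolates smoothly between observed segments, so predicted times for unseen segment lengths fall between those of their neighbors.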

5.3 Computational Efficiency

Since our segmentation algorithm does not guarantee polynomial runtime, computational efficiency was a concern, but did not turn out problematic. On a consumer laptop, the solver produced segmentations within a few seconds for a single document containing several thousand tokens, and within hours for corpora consisting of several dozen documents. Runtime increased roughly quadratically with respect to the number of segmented tokens. We feel that this is acceptable, considering that the time needed for human supervision will likely dominate the computation time, and reasonable approximations can be made as noted in Section 3.2.

6 Relation to Prior Work

Efficient supervision strategies have been studied across a variety of NLP-related research areas, and received increasing attention in recent years. Examples include post-editing for speech recognition (Sanchez-Cortina et al., 2012), interactive machine translation (González-Rubio et al., 2010), active learning for machine translation (Haffari et al., 2009; González-Rubio et al., 2011) and many other NLP tasks (Olsson, 2009), to name but a few studies.

It has also been recognized by the active learning community that correcting the most useful parts first is often not optimal in terms of efficiency, since these parts tend to be the most difficult to manually annotate (Settles et al., 2008). The authors advocate the use of a user model to predict the supervision effort, and select the instances with the best "bang-for-the-buck." This prediction of supervision effort was successful, and was further refined in other NLP-related studies (Tomanek et al., 2010; Specia, 2011; Cohn and Specia, 2013). Our approach to user modeling using GP regression is inspired by the latter.

Most studies on user models consider only supervision effort, while neglecting the accuracy of human annotations. The view of humans as a perfect oracle has been criticized (Donmez and Carbonell, 2008), since human errors are common and can negatively affect supervision utility. Research on human-computer interaction has identified the modeling of human errors as very difficult (Olson and Olson, 1990), depending on factors such as user experience, cognitive load, user interface design, and fatigue. Nevertheless, even the simple error model used in our post-editing task was effective.

The active learning community has addressed the problem of balancing utility and cost in some more detail. The previously reported "bang-for-the-buck" approach is a very simple, greedy approach to combine both into one measure. A more theoretically founded scalar optimization objective is the net benefit (utility minus costs) as proposed by Vijayanarasimhan and Grauman (2009), but it is unfortunately restricted to applications where both can be expressed in terms of the same monetary unit. Vijayanarasimhan et al. (2010) and Donmez and Carbonell (2008) use a more practical approach that specifies a constrained optimization problem by allowing only a limited time budget for supervision. Our approach is a generalization thereof and allows either specifying an upper bound on the predicted cost, or a lower bound on the predicted utility.

The main novelty of our presented approach is the explicit modeling and selection of segments of various sizes, such that annotation efficiency is optimized according to the specified constraints. While some works (Sassano and Kurohashi, 2010; Neubig et al., 2011) have proposed using subsentential segments, we are not aware of any previous work that explicitly optimizes that segmentation.

7 Conclusion

We presented a method that can effectively choose a segmentation of a language corpus that optimizes supervision efficiency, considering not only the actual usefulness of each segment, but also the annotation cost. We reported noticeable improvements over strong baselines in two user studies. Future user experiments with more participants would be desirable to verify our observations, and allow further analysis of different factors such as annotator expertise. Also, future research may improve the user modeling, which will be beneficial for our method.

Acknowledgments

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287658 Bridges Across the Language Divide (EU-BRIDGE).
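The budget-constrained selection discussed in Section 6 (a limited time budget for supervision, with predicted per-segment costs and utilities) can be illustrated with a toy 0/1-knapsack sketch. This is not the solver-based formulation used in the paper; the per-segment numbers below are hypothetical, and the DP assumes integer costs.

```python
def select_segments(costs, utils, budget):
    # 0/1 knapsack DP: maximize total predicted utility subject to a
    # total predicted cost (e.g. annotation minutes) not exceeding the budget.
    n = len(costs)
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c, u = costs[i - 1], utils[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                  # skip segment i-1
            if c <= b and best[i - 1][b - c] + u > best[i][b]:
                best[i][b] = best[i - 1][b - c] + u      # take segment i-1
    chosen, b = [], budget                               # backtrack chosen set
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)

# Hypothetical predictions: cost in minutes, utility in arbitrary units.
value, picked = select_segments([3, 4, 2, 5], [5.0, 6.0, 3.0, 8.0], budget=7)
```

The dual constraint mentioned in Section 6 (a lower bound on predicted utility, minimizing cost) could be handled analogously by swapping the roles of cost and utility in the DP.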


References

Yuya Akita, Masato Mimura, and Tatsuya Kawahara. 2009. Automatic Transcription System for Meetings of the Japanese National Congress. In Interspeech, pages 84-87, Brighton, UK.

Trevor Cohn and Lucia Specia. 2013. Modelling Annotator Bias with Multi-task Gaussian Processes: An Application to Machine Translation Quality Estimation. In Association for Computational Linguistics Conference (ACL), Sofia, Bulgaria.

Pinar Donmez and Jaime Carbonell. 2008. Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles. In Conference on Information and Knowledge Management (CIKM), pages 619-628, Napa Valley, CA, USA.

Jesús González-Rubio, Daniel Ortiz-Martínez, and Francisco Casacuberta. 2010. Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures. In Association for Computational Linguistics Conference (ACL), Short Papers Track, pages 173-177, Uppsala, Sweden.

Jesús González-Rubio, Daniel Ortiz-Martínez, and Francisco Casacuberta. 2011. An active learning scenario for interactive machine translation. In International Conference on Multimodal Interfaces (ICMI), pages 197-200, Alicante, Spain.

Gurobi Optimization. 2012. Gurobi Optimizer Reference Manual.

Gholamreza Haffari, Maxim Roy, and Anoop Sarkar. 2009. Active Learning for Statistical Phrase-based Machine Translation. In North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conference (NAACL-HLT), pages 415-423, Boulder, CO, USA.

Stefan Irnich and Guy Desaulniers. 2005. Shortest Path Problems with Resource Constraints. In Column Generation, pages 33-65. Springer US.

Kikuo Maekawa. 2008. Balanced Corpus of Contemporary Written Japanese. In International Joint Conference on Natural Language Processing (IJCNLP), pages 101-102, Hyderabad, India.

R. Timothy Marler and Jasbir S. Arora. 2004. Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization, 26(6):369-395, April.

Evgeny Matusov, Arne Mauser, and Hermann Ney. 2006. Automatic Sentence Segmentation and Punctuation Prediction for Spoken Language Translation. In International Workshop on Spoken Language Translation (IWSLT), pages 158-165, Kyoto, Japan.

Hiroaki Nanjo, Yuya Akita, and Tatsuya Kawahara. 2006. Computer Assisted Speech Transcription System for Efficient Speech Archive. In Western Pacific Acoustics Conference (WESPAC), Seoul, Korea.

Graham Neubig, Yosuke Nakata, and Shinsuke Mori. 2011. Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis. In Association for Computational Linguistics: Human Language Technologies Conference (ACL-HLT), pages 529-533, Portland, OR, USA.

Atsunori Ogawa, Takaaki Hori, and Atsushi Nakamura. 2013. Discriminative Recognition Rate Estimation for N-Best List and Its Application to N-Best Rescoring. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 6832-6836, Vancouver, Canada.

Judith Reitman Olson and Gary Olson. 1990. The Growth of Cognitive Modeling in Human-Computer Interaction Since GOMS. Human-Computer Interaction, 5(2):221-265, June.

Fredrik Olsson. 2009. A literature survey of active machine learning in the context of natural language processing. Technical report, SICS Sweden.

David Pisinger. 1994. A Minimal Algorithm for the Multiple-Choice Knapsack Problem. European Journal of Operational Research, 83(2):394-410.

John C. Platt. 1999. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In Advances in Large Margin Classifiers, pages 61-74. MIT Press.

Carl E. Rasmussen and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA.

Isaias Sanchez-Cortina, Nicolas Serrano, Alberto Sanchis, and Alfons Juan. 2012. A prototype for Interactive Speech Transcription Balancing Error and Supervision Effort. In International Conference on Intelligent User Interfaces (IUI), pages 325-326, Lisbon, Portugal.

Manabu Sassano and Sadao Kurohashi. 2010. Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing. In Association for Computational Linguistics Conference (ACL), pages 356-365, Uppsala, Sweden.

Burr Settles, Mark Craven, and Lewis Friedland. 2008. Active Learning with Real Annotation Costs. In Neural Information Processing Systems Conference (NIPS) - Workshop on Cost-Sensitive Learning, Lake Tahoe, NV, USA.

Burr Settles. 2008. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1070-1079, Honolulu, USA.

Hagen Soltau, Florian Metze, Christian Fügen, and Alex Waibel. 2001. A One-Pass Decoder Based on Polymorphic Linguistic Context Assignment. In Automatic Speech Recognition and Understanding Workshop (ASRU), pages 214-217, Madonna di Campiglio, Italy.

Lucia Specia. 2011. Exploiting Objective Annotations for Measuring Translation Post-editing Effort. In Conference of the European Association for Machine Translation (EAMT), pages 73-80, Nice, France.

Matthias Sperber, Graham Neubig, Christian Fügen, Satoshi Nakamura, and Alex Waibel. 2013. Efficient Speech Transcription Through Respeaking. In Interspeech, pages 1087-1091, Lyon, France.

Bernhard Suhm, Brad Myers, and Alex Waibel. 2001. Multimodal error correction for speech user interfaces. Transactions on Computer-Human Interaction, 8(1):60-98.

Evimaria Terzi and Panayiotis Tsaparas. 2006. Efficient algorithms for sequence segmentation. In SIAM Conference on Data Mining (SDM), Bethesda, MD, USA.

Katrin Tomanek and Udo Hahn. 2009. Semi-Supervised Active Learning for Sequence Labeling. In International Joint Conference on Natural Language Processing (IJCNLP), pages 1039-1047, Singapore.

Katrin Tomanek, Udo Hahn, and Steffen Lohmann. 2010. A Cognitive Cost Model of Annotations Based on Eye-Tracking Data. In Association for Computational Linguistics Conference (ACL), pages 1158-1167, Uppsala, Sweden.

Paolo Toth and Daniele Vigo. 2001. The Vehicle Routing Problem. Society for Industrial & Applied Mathematics (SIAM), Philadelphia.

Sudheendra Vijayanarasimhan and Kristen Grauman. 2009. What's It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 2262-2269, Miami Beach, FL, USA.

Sudheendra Vijayanarasimhan, Prateek Jain, and Kristen Grauman. 2010. Far-sighted active learning on a budget for image and video recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 3035-3042, San Francisco, CA, USA, June.
