Transactions of the Association for Computational Linguistics, 2 (2014) 169–180. Action Editor: Eric Fosler-Lussier.
Submitted 11/2013; Revised 2/2014; Published 4/2014. © 2014 Association for Computational Linguistics.
Segmentation for Efficient Supervised Language Annotation with an Explicit Cost-Utility Tradeoff

Matthias Sperber (1), Mirjam Simantzik (2), Graham Neubig (3), Satoshi Nakamura (3), Alex Waibel (1)
(1) Karlsruhe Institute of Technology, Institute for Anthropomatics, Germany
(2) Mobile Technologies GmbH, Germany
(3) Nara Institute of Science and Technology, AHC Laboratory, Japan
matthias.sperber@kit.edu, mirjam.simantzik@jibbigo.com, neubig@is.naist.jp, s-nakamura@is.naist.jp, waibel@kit.edu

Abstract

In this paper, we study the problem of manually correcting automatic annotations of natural language in as efficient a manner as possible. We introduce a method for automatically segmenting a corpus into chunks such that many uncertain labels are grouped into the same chunk, while human supervision can be omitted altogether for other segments. A tradeoff must be found for segment sizes. Choosing short segments allows us to reduce the number of highly confident labels that are supervised by the annotator, which is useful because these labels are often already correct and supervising correct labels is a waste of effort. In contrast, long segments reduce the cognitive effort due to context switches. Our method helps find the segmentation that optimizes supervision efficiency by defining user models to predict the cost and utility of supervising each segment and solving a constrained optimization problem balancing these contradictory objectives. A user study demonstrates noticeable gains over pre-segmented, confidence-ordered baselines on two natural language processing tasks: speech transcription and word segmentation.

1 Introduction

Many natural language processing (NLP) tasks require human supervision to be useful in practice, be it to collect suitable training material or to meet some desired output quality. Given the high cost of human intervention, how to minimize the supervision effort is an important research problem. Previous works in areas such as active learning, post editing, and interactive pattern recognition have investigated this question with notable success (Settles, 2008; Specia, 2011; González-Rubio et al., 2010).

[Figure 1: Three automatic transcripts of the sentence "It was a bright cold day in April, and the clocks were striking thirteen", each reading "It was a bright cold (they) in (apron), and (a) clocks were striking thirteen", with recognition errors in parentheses. The underlined parts are to be corrected by a human for (a) sentences, (b) words, or (c) the proposed segmentation.]

The most common framework for efficient annotation in the NLP context consists of training an NLP system on a small amount of baseline data, and then running the system on unannotated data to estimate confidence scores of the system's predictions (Settles, 2008). Sentences with the lowest confidence are then used as the data to be annotated (Figure 1(a)). However, it has been noted that when the NLP system in question already has relatively high accuracy, annotating entire sentences can be wasteful, as most words will already be correct (Tomanek and Hahn, 2009; Neubig et al., 2011). In these cases, it is possible to achieve much higher benefit per annotated word by annotating sub-sentential units (Figure 1(b)). However, as Settles et al. (2008) point out, simply maximizing the benefit per annotated instance is not enough, as the real supervision effort varies
greatly across instances. This is particularly important in the context of choosing segments to annotate, as human annotators heavily rely on semantics and context information to process language, and intuitively, a consecutive sequence of words can be supervised faster and more accurately than the same number of words spread out over several locations in a text. This intuition can also be seen in our empirical data in Figure 2, which shows that for the speech transcription and word segmentation tasks described later in Section 5, short segments had a longer annotation time per word. Based on this fact, we argue it would be desirable to present the annotator with a segmentation of the data into easily supervisable chunks that are both large enough to reduce the number of context switches, and small enough to prevent unnecessary annotation (Figure 1(c)).

[Figure 2: Average annotation time per instance (avg. time/instance [sec], plotted over segment lengths 1–19) for the transcription and word segmentation tasks. For both tasks, the effort clearly increases for short segments.]

In this paper, we introduce a new strategy for natural language supervision tasks that attempts to optimize supervision efficiency by choosing an appropriate segmentation. It relies on a user model that, given a specific segment, predicts the cost and the utility of supervising that segment. Given this user model, the goal is to find a segmentation that minimizes the total predicted cost while maximizing the utility. We balance these two criteria by defining a constrained optimization problem in which one criterion is the optimization objective, while the other criterion is used as a constraint. Doing so allows specifying practical optimization goals such as "remove as many errors as possible given a limited time budget," or "annotate data to obtain some required classifier accuracy in as little time as possible."

Solving this optimization task is computationally difficult: it is an NP-hard problem. Nevertheless, we demonstrate that by making realistic assumptions about the segment length, an optimal solution can be found using an integer linear programming formulation for mid-sized corpora, as are common for supervised annotation tasks. For larger corpora, we provide simple heuristics to obtain an approximate solution in a reasonable amount of time.

Experiments over two example scenarios demonstrate the usefulness of our method: post editing for speech transcription, and active learning for Japanese word segmentation. Our model predicts noticeable efficiency gains, which are confirmed in experiments with human annotators.

2 Problem Definition

The goal of our method is to find a segmentation over a corpus of word tokens $w_1^N$ that optimizes supervision efficiency according to some predictive user model. The user model is denoted as a set of functions $u_{l,k}(w_a^b)$ that evaluate any possible subsequence $w_a^b$ of tokens in the corpus according to criteria $l \in L$ and supervision modes $k \in K$.

Let us illustrate this with an example. Sperber et al. (2013) defined a framework for speech transcription in which an initial, erroneous transcript is created using automatic speech recognition (ASR), and an annotator corrects the transcript either by correcting the words by keyboard, by respeaking the content, or by leaving the words as is. In this case, we could define K = {TYPE, RESPEAK, SKIP}, each constant representing one of these three supervision modes. Our method will automatically determine the appropriate supervision mode for each segment. The user model in this example might evaluate every segment according to two criteria L: a cost criterion (in terms of supervision time) and a utility criterion (in terms of the number of removed errors), when using each mode. Intuitively, respeaking should be assigned both lower cost (because speaking is faster than typing), but also lower utility than typing on a keyboard (because respeaking recognition errors can occur). The SKIP mode denotes the special, unsupervised mode that always returns 0 cost and 0 utility.

Other possible supervision modes include multiple input modalities (Suhm et al., 2001), several human annotators with different expertise and cost
(Donmez and Carbonell, 2008), and correction vs. translation from scratch in machine translation (Specia, 2011). Similarly, cost could instead be expressed in monetary terms, or the utility function could predict the improvement of a classifier when the resulting annotation is not intended for direct human consumption, but as training data for a classifier in an active learning framework.

3 Optimization Framework

Given this setting, we are interested in simultaneously finding optimal locations and supervision modes for all segments, according to the given criteria. Each resulting segment will be assigned exactly one of these supervision modes. We denote a segmentation of the $N$ tokens of corpus $w_1^N$ into $M \le N$ segments by specifying segment boundary markers $s_1^{M+1} = (s_1{=}1, s_2, \dots, s_{M+1}{=}N{+}1)$. Setting a boundary marker $s_i = a$ means that we put a segment boundary before the $a$-th word token (or the end-of-corpus marker for $a = N+1$). Thus our corpus is segmented into token sequences $[(w_{s_j}, \dots, w_{s_{j+1}-1})]_{j=1}^{M}$. The supervision modes assigned to each segment are denoted by $m_j$. We favor those segmentations that minimize the cumulative value $\sum_{j=1}^{M} u_{l,m_j}(w_{s_j}^{s_{j+1}-1})$ for each criterion $l$. For any criterion where larger values are intuitively better, we flip the sign before defining $u_{l,m_j}(w_{s_j}^{s_{j+1}-1})$ to maintain consistency (e.g., negative number of errors removed).

3.1 Multiple Criteria Optimization

In the case of a single criterion ($|L| = 1$), we obtain a simple, single-objective unconstrained linear optimization problem, efficiently solvable via dynamic programming (Terzi and Tsaparas, 2006).
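For the single-criterion case, such a dynamic program can be written down in a few lines. The following is an illustrative sketch (the function and variable names and the span limit are our own, not taken from the paper or from Terzi and Tsaparas (2006)); u(i, j, k) stands in for $u_{l,k}(w_i^{j-1})$, the user-model value of grouping tokens $i$ through $j-1$ under mode $k$:

```python
# Minimal sketch: single-criterion segmentation as a dynamic program
# over boundary positions 1 .. N+1 (a boundary at i precedes token i).
def segment_single_criterion(N, modes, u, max_span=20):
    """best[j] = minimal cumulative u over segmentations of tokens 1..j-1."""
    INF = float("inf")
    best = [INF] * (N + 2)
    back = [None] * (N + 2)
    best[1] = 0.0
    for j in range(2, N + 2):                 # next boundary position
        for i in range(max(1, j - max_span), j):
            for k in modes:
                c = best[i] + u(i, j, k)      # u(i, j, k) ~ u_{l,k}(w_i^{j-1})
                if c < best[j]:
                    best[j], back[j] = c, (i, k)
    # trace back the chosen segments and their supervision modes
    segs, j = [], N + 1
    while j > 1:
        i, k = back[j]
        segs.append((i, j - 1, k))
        j = i
    return best[N + 1], segs[::-1]

# Toy usage: 5 tokens; typing costs 1 unit per token, skipping costs 0.
value, segs = segment_single_criterion(
    5, ["TYPE", "SKIP"], lambda i, j, k: float(j - i) if k == "TYPE" else 0.0)
print(value, segs)
```

With a span limit of max_span tokens (cf. Section 3.2), this runs in O(N · max_span · |K|) time.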
However, in practice one usually encounters several competing criteria, such as cost and utility, and here we will focus on this more realistic setting. We balance competing criteria by using one as an optimization objective, and the others as constraints. (This approach is known as the bounded objective function method in the multi-objective optimization literature (Marler and Arora, 2004). The very popular weighted sum method merges criteria into a single efficiency measure, but is problematic in our case because the number of supervised tokens is unspecified: unless the weights are carefully chosen, the algorithm might find, e.g., the completely unsupervised or completely supervised segmentation to be most "efficient.") Let criterion $l_0$ be the optimization objective criterion, and let $C_l$ denote the constraining constants for the criteria $l \in L_{l_0} = L \setminus \{l_0\}$. We state the optimization problem:

$$\min_{M;\; s_1^{M+1};\; m_1^M} \;\sum_{j=1}^{M} u_{l_0,m_j}\!\left(w_{s_j}^{s_{j+1}-1}\right)$$

$$\text{s.t.}\quad \sum_{j=1}^{M} u_{l,m_j}\!\left(w_{s_j}^{s_{j+1}-1}\right) \le C_l \qquad (\forall l \in L_{l_0})$$

This constrained optimization problem is difficult to solve. In fact, the NP-hard multiple-choice knapsack problem (Pisinger, 1994) corresponds to a special case of our problem in which the number of segments is equal to the number of tokens, implying that our more general problem is NP-hard as well.

In order to overcome this problem, we reformulate the search for the optimal segmentation as a resource-constrained shortest path problem in a directed, acyclic multigraph. While still not efficiently solvable in theory, this problem is well studied in domains such as vehicle routing and crew scheduling (Irnich and Desaulniers, 2005), and it is known that in many practical situations the problem can be solved reasonably efficiently using integer linear programming relaxations (Toth and Vigo, 2001).

[Figure 3: Excerpt of a segmentation graph for an example transcription task similar to Figure 1 (some edges are omitted for readability). Edges are labeled with their mode, the predicted number of errors that can be removed, and the necessary supervision time, e.g. [TYPE: 2/5] or [RESPEAK: 1.5/2]. A segmentation scheme might prefer solid edges over dashed ones in this example.]
In our formalism, the set of nodes $V$ represents the spaces between neighboring tokens, at which the algorithm may insert segment boundaries. A node with index $i$ represents a segment break before the $i$-th token, and thus the sequence of the indices in a path directly corresponds to $s_1^{M+1}$. Edges $E$ denote the grouping of tokens between the respective nodes into one segment. Edges are always directed from left to right, and labeled with a supervision mode. In addition, each edge between nodes $i$ and $j$ is assigned $u_{l,k}(w_i^{j-1})$, the corresponding predicted value for each criterion $l \in L$ and supervision mode $k \in K$, indicating that the supervision mode of the $j$-th segment in a path directly corresponds to $m_j$. Figure 3 shows an example of what the resulting graph may look like.

Our original optimization problem is now equivalent to finding the shortest path between the first and last nodes according to criterion $l_0$, while obeying the given resource constraints. According to a widely used formulation for the resource-constrained shortest path problem, we can define $E_{ij}$ as the set of competing edges between $i$ and $j$, and express this optimization problem with the following integer linear program (ILP), where $n$ denotes the last node:

$$\min_x \;\sum_{i,j \in V}\;\sum_{k \in E_{ij}} x_{ijk}\, u_{l_0,k}\!\left(w_i^{j-1}\right) \qquad (1)$$

$$\text{s.t.}\quad \sum_{i,j \in V}\;\sum_{k \in E_{ij}} x_{ijk}\, u_{l,k}\!\left(w_i^{j-1}\right) \le C_l \qquad (\forall l \in L_{l_0}) \qquad (2)$$

$$\sum_{i \in V,\, k \in E_{ij}} x_{ijk} \;=\; \sum_{i \in V,\, k \in E_{ji}} x_{jik} \qquad (\forall j \in V \setminus \{1, n\}) \qquad (3)$$

$$\sum_{j \in V,\, k \in E_{1j}} x_{1jk} = 1 \qquad (4)$$

$$\sum_{i \in V,\, k \in E_{in}} x_{ink} = 1 \qquad (5)$$

$$x_{ijk} \in \{0, 1\} \qquad (\forall x_{ijk} \in x) \qquad (6)$$

The variables $x = \{x_{ijk} \mid i, j \in V, k \in E_{ij}\}$ denote the activation of the $k$-th edge between nodes $i$ and $j$. The shortest path according to the minimization objective (1) that still meets the resource constraints for the specified criteria (2) is to be computed. The degree constraints (3, 4, 5) specify that all but the first and last nodes must have as many incoming as outgoing edges, while the first node must have exactly one outgoing, and the last node exactly one incoming edge. Finally, the integrality condition (6) forces all edges to be either fully activated or fully deactivated. The outlined problem formulation can be solved directly by using off-the-shelf ILP solvers; here, we employ GUROBI (Gurobi Optimization, 2012).
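To make this concrete, here is a minimal sketch of how Eqs. (1)–(6) might be set up with Gurobi's Python interface. It is not the implementation used in the paper: the corpus size, the placeholder user-model values, the span limit, and the time budget are illustrative assumptions.

```python
# Minimal sketch: the segmentation ILP of Eqs. (1)-(6) in gurobipy.
import gurobipy as gp
from gurobipy import GRB

N = 6                  # tokens; boundary nodes are 1 .. N+1
MODES = ["TYPE", "RESPEAK", "SKIP"]
MAX_SPAN = 3           # cf. Section 3.2: only edges up to a length limit
TIME_BUDGET = 10.0     # constraint constant C_l for the cost criterion

def cost(i, j, k):     # placeholder for u_{cost,k}(w_i^{j-1})
    return 0.0 if k == "SKIP" else (j - i) * (1.0 if k == "RESPEAK" else 2.0)

def utility(i, j, k):  # placeholder for u_{utility,k}(w_i^{j-1}), sign-flipped
    return 0.0 if k == "SKIP" else -(j - i) * (0.85 if k == "RESPEAK" else 0.95)

m = gp.Model("segmentation")
edges = [(i, j, k) for i in range(1, N + 1)
         for j in range(i + 1, min(i + MAX_SPAN, N + 1) + 1) for k in MODES]
x = m.addVars(edges, vtype=GRB.BINARY, name="x")          # Eq. (6)

# Eq. (1): minimize sign-flipped utility (= remove as many errors as possible)
m.setObjective(gp.quicksum(x[e] * utility(*e) for e in edges), GRB.MINIMIZE)
# Eq. (2): resource constraint on the cost criterion
m.addConstr(gp.quicksum(x[e] * cost(*e) for e in edges) <= TIME_BUDGET)
# Eqs. (3)-(5): flow conservation at inner nodes, source and sink degrees
for v in range(2, N + 1):
    m.addConstr(gp.quicksum(x[i, j, k] for (i, j, k) in edges if j == v)
                == gp.quicksum(x[i, j, k] for (i, j, k) in edges if i == v))
m.addConstr(gp.quicksum(x[i, j, k] for (i, j, k) in edges if i == 1) == 1)
m.addConstr(gp.quicksum(x[i, j, k] for (i, j, k) in edges if j == N + 1) == 1)

m.optimize()
if m.status == GRB.OPTIMAL:
    path = sorted(e for e in edges if x[e].X > 0.5)
    print([f"tokens [{i},{j-1}] -> {k}" for (i, j, k) in path])
```

A real instance would populate the edge values from the user models of Section 4 instead of the placeholder functions above.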
3.2 Heuristics for Approximation

In general, edges are inserted for every supervision mode between every combination of two nodes. The search space can be constrained by removing some of these edges to increase efficiency. In this study, we only consider edges spanning at most 20 tokens.

For cases in which larger corpora are to be annotated, or when the acceptable delay for delivering results is small, a suitable segmentation can be found approximately. The easiest way would be to partition the corpus, e.g. according to its individual documents, divide the budget constraints evenly across all partitions, and then segment each partition independently. More sophisticated methods might approximate the Pareto front for each partition, and distribute the budgets in an intelligent way.

4 User Modeling

While the proposed framework is able to optimize the segmentation with respect to each criterion, it also rests upon the assumption that we can provide user models $u_{l,k}(w_i^{j-1})$ that accurately evaluate every segment according to the specified criteria and supervision modes. In this section, we discuss our strategies for estimating three conceivable criteria: annotation cost, correction of errors, and improvement of a classifier.

4.1 Annotation Cost Modeling

Modeling cost requires solving a regression problem from features of a candidate segment to annotation cost, for example in terms of supervision time. Appropriate input features depend on the task, but should include notions of complexity (e.g. a confidence measure) and length of the segment, as both are expected to strongly influence supervision time.

We propose using Gaussian process (GP) regression for cost prediction, a state-of-the-art nonparametric Bayesian regression technique (Rasmussen and Williams, 2006; code available at http://www.gaussianprocess.org/gpml/). As reported on a similar task by Cohn and Specia (2013), and confirmed by our preliminary experiments, GP regression significantly outperforms popular techniques such as support vector regression and least-squares linear regression. We also follow their settings for GP, employing GP regression with a squared exponential kernel with automatic relevance determination. Depending on the number of users and the amount of training data available for each user, models may be trained separately for each user (as we do here), or in a combined fashion via multi-task learning as proposed by Cohn and Specia (2013).

It is also crucial for the predictions to be reliable throughout the whole relevant space of segments. If the cost of certain types of segments is systematically underpredicted, the segmentation algorithm might be misled to prefer these, possibly a large number of times. (For instance, consider a model that predicts well for segments of medium size or longer, but underpredicts the supervision time of single-token segments. This may lead the segmentation algorithm to put every token into its own segment, which is clearly undesirable.) An effective trick to prevent such underpredictions is to predict the log time instead of the actual time. In this way, errors in the critical low end are penalized more strongly, and the time can never become negative.
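As an illustration, the following sketch implements such a cost model with scikit-learn's GP regression rather than the GPML package linked above (an assumption made for self-containedness). It uses an RBF (squared exponential) kernel with one length scale per feature, which yields automatic relevance determination, and fits the log supervision time as just described; all feature names and example values are illustrative.

```python
# Minimal sketch: GP regression on log supervision time with an ARD kernel.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Features per segment: [token count, audio duration (s), mean confidence]
X = np.array([[3, 1.2, 0.55], [8, 3.1, 0.80], [15, 6.0, 0.92], [5, 2.0, 0.70]])
y_seconds = np.array([9.0, 14.5, 21.0, 11.0])   # observed annotation times

# Fit in log space: underpredictions at the low end are penalized more
# strongly, and predicted times can never become negative (Section 4.1).
kernel = RBF(length_scale=[1.0, 1.0, 1.0]) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, np.log(y_seconds))

segment = np.array([[4, 1.5, 0.60]])
log_t, std = gp.predict(segment, return_std=True)
print(f"predicted time: {np.exp(log_t[0]):.1f}s (log-space std {std[0]:.2f})")
```

Predictions are exponentiated back to seconds, so they are always positive.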
4.2 Error Correction Modeling

As one utility measure, we can use the number of errors corrected, a useful measure for post editing tasks over automatically produced annotations. In order to measure how many errors can be removed by supervising a particular segment, we must estimate both how many errors are in the automatic annotation, and how reliably a human can remove these for a given supervision mode.

Most machine learning techniques can estimate confidence scores in the form of posterior probabilities. To estimate the number of errors, we can sum over one minus the posterior for all tokens, which estimates the Hamming distance from the reference annotation. This measure is appropriate for tasks in which the number of tokens is fixed in advance (e.g. a part-of-speech estimation task), and a reasonable approximation for tasks in which the number of tokens is not known in advance (e.g. speech transcription, cf. Section 5.1.1).

Predicting the particular tokens at which a human will make a mistake is known to be a difficult task (Olson and Olson, 1990), but a simplifying constant human error rate can still be useful. For example, in the task from Section 2, we may suspect a certain number of errors in a transcript segment, and predict, say, 95% of those errors to be removed via typing, but only 85% via respeaking.
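A minimal sketch of this utility model follows; the per-mode reliability constants mirror the hypothetical numbers above and, like the function names, are illustrative rather than taken from the paper.

```python
# Minimal sketch: predicted number of errors removed by supervising a segment.
HUMAN_RELIABILITY = {"TYPE": 0.95, "RESPEAK": 0.85, "SKIP": 0.0}

def predicted_errors(posteriors):
    """Expected Hamming distance to the reference: sum of (1 - posterior)."""
    return sum(1.0 - p for p in posteriors)

def utility(posteriors, mode):
    """Predicted number of errors a human removes in the given mode."""
    return HUMAN_RELIABILITY[mode] * predicted_errors(posteriors)

print(utility([0.9, 0.4, 0.7], "TYPE"))   # 0.95 * 1.0 = 0.95
```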
4.3 Classifier Improvement Modeling

Another reasonable utility measure is the accuracy of a classifier trained on the data we choose to annotate in an active learning framework. Confidence scores have been found useful for ranking particular tokens with regard to how much they will improve a classifier (Settles, 2008). Here, we may similarly score segment utility as the sum of its token confidences, although care must be taken to normalize and calibrate the token confidences to be linearly comparable before doing so. While the resulting utility score has no interpretation in absolute terms, it can still be used as an optimization objective (cf. Section 5.2.1).

5 Experiments

In this section, we present experimental results examining the effectiveness of the proposed method over two tasks: speech transcription and Japanese word segmentation. (Software and experimental data can be downloaded from http://www.msperber.com/research/tacl-segmentation/.)

5.1 Speech Transcription Experiments

Accurate speech transcripts are a much-demanded NLP product, useful by themselves, as training material for ASR, or as input for follow-up tasks like speech translation. With recognition accuracies plateauing, manually correcting (post editing) automatic speech transcripts has become popular. Common approaches are to identify words (Sanchez-Cortina et al., 2012) or (sub-)sentences (Sperber et al., 2013) of low confidence, and have a human editor correct these.

5.1.1 Experimental Setup

We conducted a user study in which participants post-edited speech transcripts, given a fixed goal word error rate. The transcription setup was such that the transcriber could see the ASR transcript of the parts before and after the segment that he was editing, providing context if needed. When imprecise time alignment resulted in segment breaks that were
slightly "off," as happened occasionally, that context helped guess what was said. The segment itself was transcribed from scratch, as opposed to editing the ASR transcript; besides being arguably more efficient when the ASR transcript contains many mistakes (Nanjo et al., 2006; Akita et al., 2009), preliminary experiments also showed that supervision time is far easier to predict this way. Figure 4 illustrates what the setup looked like.

We used a self-developed transcription tool to conduct experiments. It presents our computed segments one by one, allows convenient input and playback via keyboard shortcuts, and logs user interactions with their timestamps. A selection of TED talks (www.ted.com; English talks on technology, entertainment, and design) served as experimental data. While some of these talks contain jargon such as medical terms, they are presented by skilled speakers, making them comparably easy to understand. Initial transcripts were created using the Janus recognition toolkit (Soltau et al., 2001) with a standard, TED-optimized setup. We used confusion networks for decoding and obtaining confidence scores.

For reasons of simplicity, and better comparability to our baseline, we restricted our experiment to two supervision modes: TYPE and SKIP. We conducted experiments with 3 participants, 1 with several years of experience in transcription, 2 with none. Each participant received an explanation on the transcription guidelines, and a short hands-on training to learn to use our tool. Next, they transcribed a balanced selection of 200 segments of varying length and quality in random order. This data was used to train the user models.

Finally, each participant transcribed another 2 TED talks, with word error rate (WER) 19.96% (predicted: 22.33%). We set a target (predicted) WER of 15% as our optimization constraint (depending on the level of accuracy required by our final application, this target may be set lower or higher), and minimized the predicted supervision time as our objective function. Both TED talks were transcribed once using the baseline strategy, and once using the proposed strategy. The order of both strategies was reversed between talks, to minimize learning bias due to transcribing each talk twice.

The baseline strategy was adopted according to Sperber et al. (2013): We segmented the talk into natural, subsentential units, using Matusov et al. (2006)'s segmenter, which we tuned to reproduce the TED subtitle segmentation, producing a mean segment length of 8.6 words. Segments were added in order of increasing average word confidence, until the user model predicted a WER < 15%. The second segmentation strategy was the proposed method, similarly with a resource constraint of WER < 15%.

Supervision time was predicted via GP regression (cf. Section 4.1), using segment length, audio duration, and mean confidence as input features. The output variable was assumed subject to additive Gaussian noise with zero mean; a variance of 5 seconds was chosen empirically to minimize the mean squared error. Utility prediction (cf. Section 4.2) was based on posterior scores obtained from the confusion networks. We found it important to calibrate them, as the posteriors were overconfident especially in the upper range. To do so, we automatically transcribed a development set of TED data, grouped the recognized words into buckets according to their posteriors, and determined the average number of errors per word in each bucket from an alignment with the reference transcript. The mapping from average posterior to average number of errors was estimated via GP regression. The result was summed over all tokens, and multiplied by a constant human confidence, separately determined for each participant. (More elaborate methods for WER estimation exist, such as by Ogawa et al. (2013), but if our method achieves improvements using simple Hamming distance, incorporating more sophisticated measures will likely achieve similar, or even better accuracy.)

5.1.2 Simulation Results

To convey a better understanding of the potential gains afforded by our method, we first present a simulated experiment. We assume a transcriber who makes no mistakes, and needs exactly the amount of time predicted by a user model trained on the data of a randomly selected participant. We compare three scenarios: a baseline simulation, in which the baseline segments are transcribed in ascending order of confidence; a simulation using the proposed method, in which we change the WER constraint in small increments; finally, an oracle simulation, which uses
[Figure excerpt: segments presented with assigned supervision modes, e.g. (3) SKIP: "nineteen forty six until today you see the green"; (4) TYPE: …]
5.2.1 Experimental Setup

Neubig et al. (2011) have proposed a pointwise method for Japanese word segmentation that can be trained using partially annotated sentences, which makes it attractive in combination with active learning, as well as our segmentation method. The authors released their method as a software package "KyTea" that we employed in this user study. We used KyTea's active learning domain adaptation toolkit (http://www.phontron.com/kytea/active.html) as a baseline.

For data, we used the Balanced Corpus of Contemporary Written Japanese (BCCWJ), created by Maekawa (2008), with the internet Q&A subcorpus as in-domain data, and the white paper subcorpus as background data, a domain adaptation scenario. Sentences were drawn from the in-domain corpus, and the manually annotated data was then used to train KyTea, along with the pre-annotated background data. The goal (objective function) was to improve KyTea's classification accuracy on an in-domain test set, given a constrained time budget of 30 minutes. There were again 2 supervision modes: ANNOTATE and SKIP. Note that this is essentially a batch active learning setup with only one iteration.

We conducted experiments with one expert with several years of experience with Japanese word segmentation annotation, and three non-expert native speakers with no prior experience. Japanese word segmentation is not a trivial task, so we provided non-experts with training, including an explanation of the segmentation standard, a supervised test with immediate feedback and explanations, and hands-on training to get used to the annotation software.

Supervision time was predicted via GP regression (cf. Section 4.1), using the segment length and mean confidence as input features. As before, the output variable was assumed subject to additive Gaussian noise with zero mean and 5 seconds variance. To obtain training data for these models, each participant annotated about 500 example instances, drawn from the adaptation corpus, grouped into segments and balanced regarding segment length and difficulty.

For utility modeling (cf. Section 4.3), we first normalized KyTea's confidence scores, which are given in terms of SVM margin, using a sigmoid function (Platt, 1999). The normalization parameter was selected so that the mean confidence on a development set corresponded to the actual classifier accuracy. We derive our measure of classifier improvement for correcting a segment by summing over one minus the calibrated confidence for each of its tokens. To analyze how well this measure describes the actual training utility, we trained KyTea using the background data plus disjoint groups of 100 in-domain instances with similar probabilities and measured the achieved reduction of prediction errors. The correlation between each group's mean utility and the achieved error reduction was 0.87. Note that we ignore the decaying returns usually observed as more data is added to the training set. Also, we did not attempt to model user errors. Employing a constant base error rate, as in the transcription scenario, would change segment utilities only by a constant factor, without changing the resulting segmentation.
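The following sketch illustrates this calibration and utility computation. The sigmoid form follows Platt (1999); the grid search used to tune the normalization parameter, and all example values, are our own illustrative assumptions, since the paper does not state how the parameter was selected.

```python
# Minimal sketch: sigmoid calibration of SVM margins and segment utility.
import math

def calibrate(margin, a):
    """Map an SVM margin to a (0, 1) confidence via a sigmoid."""
    return 1.0 / (1.0 + math.exp(-a * margin))

def segment_utility(margins, a):
    """Sum of (1 - calibrated confidence) over the segment's tokens."""
    return sum(1.0 - calibrate(m, a) for m in margins)

def tune(dev_margins, dev_accuracy, candidates=tuple(0.1 * i for i in range(1, 51))):
    """Pick `a` so mean dev-set confidence matches measured accuracy."""
    mean_conf = lambda a: sum(calibrate(m, a) for m in dev_margins) / len(dev_margins)
    return min(candidates, key=lambda a: abs(mean_conf(a) - dev_accuracy))

a = tune([0.4, 1.3, 2.2, 0.8, 3.0], dev_accuracy=0.95)
print(segment_utility([0.5, 1.8, 2.5], a))
```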
After creating the user models, we conducted the main experiment, in which each participant annotated data that was selected from a pool of 1000 in-domain sentences using two strategies. The first, baseline strategy was as proposed by Neubig et al. (2011). Queries are those instances with the lowest confidence scores. Each query is then extended to the left and right, until a word boundary is predicted. This strategy follows similar reasoning as was the premise to this paper: to decide whether or not a position in a text corresponds to a word boundary, the annotator has to acquire surrounding context information. This context acquisition is relatively time consuming, so he might as well label the surrounding instances with little additional effort. The second strategy was our proposed, more principled approach. Queries of both methods were shuffled to minimize bias due to learning effects. Finally, we trained KyTea using the results of both methods, and compared the achieved classifier improvement and supervision times.

5.2.2 User Study Results

Table 2 summarizes the results of our experiment. It shows that the annotations by each participant resulted in a better classifier for the proposed method than the baseline, but also took up considerably more time, a less clear improvement than for the transcription task. In fact, the total error for time predictions was as high as 12.5% on average,
where the baseline method tended to take less time than predicted, and the proposed method more time. This is in contrast to a much lower total error (within 1%) when cross-validating our user model training data. This is likely due to the fact that the data for training the user model was selected in a balanced manner, as opposed to selecting difficult examples, as our method is prone to do. Thus, we may expect much better predictions when selecting user model training data that is more similar to the test case.

Participant | Baseline time | Baseline acc. | Proposed time | Proposed acc.
Expert      | 25:50         | 96.17         | 32:45         | 96.55
NonExp1     | 22:05         | 95.79         | 26:44         | 95.98
NonExp2     | 23:37         | 96.15         | 31:28         | 96.21
NonExp3     | 25:23         | 96.38         | 33:36         | 96.45

Table 2: Word segmentation task results, for our expert and 3 non-expert participants. For each participant, the resulting classifier accuracy [%] after supervision is shown, along with the time [min.] they needed. The unsupervised accuracy was 95.14%.

Plotting classifier accuracy over annotation time draws a clearer picture. Let us first analyze the results for the expert annotator. Figure 6 (E.1) shows that the proposed method resulted in consistently better results, indicating that time predictions were still effective. Note that this comparison may put the proposed method at a slight disadvantage by comparing intermediate results despite optimizing globally. For the non-experts, the improvement over the baseline is less consistent, as can be seen in Figure 6 (N.1) for one representative. According to our analysis, this can be explained by two factors: (1) The non-experts' annotation error (6.5% on average) was much higher than the expert's (2.7%), resulting in a somewhat irregular classifier learning curve. (2) The variance in annotation time per segment was consistently higher for the non-experts than for the expert, indicated by an average per-segment prediction error of 71% vs. 58% relative to the mean actual value, respectively. Informally speaking, non-experts made more mistakes, and were more strongly influenced by the difficulty of a particular segment (which was higher on average with the proposed method, as indicated by a lower average confidence). (Note that the non-expert in the figure annotated much faster than the expert, which explains the comparable classification result despite making more annotation errors. This is in contrast to the other non-experts, who were slower.)

[Figure 6: Classifier improvement over time (classifier accuracy vs. annotation time [min.]), for the expert (panels E.1–E.4) and one non-expert (panels N.1–N.4), comparing the proposed method against the baseline. The graphs show numbers based on (1) actual annotations and user models as in Sections 4.1 and 4.3, (2) error-free annotations, (3) measured times replaced by predicted times, and (4) both reference annotations and replaced time predictions.]

In Figures 6 (2–4) we present a simulation experiment in which we first pretend as if annotators made no mistakes, then as if they needed exactly as much time as predicted for each segment, and then both. This cheating experiment works in favor of the proposed method, especially for the non-expert. We may conclude that our segmentation approach is effective for the word segmentation task, but requires more accurate time predictions. Better user models will certainly help, although for the presented scenario our method may be most useful for an expert annotator.
5.3 Computational Efficiency

Since our segmentation algorithm does not guarantee polynomial runtime, computational efficiency was a concern, but it did not turn out problematic. On a consumer laptop, the solver produced segmentations within a few seconds for a single document containing several thousand tokens, and within hours for corpora consisting of several dozen documents. Runtime increased roughly quadratically with respect to the number of segmented tokens. We feel that this is acceptable, considering that the time needed for human supervision will likely dominate the computation time, and reasonable approximations can be made as noted in Section 3.2.

6 Relation to Prior Work

Efficient supervision strategies have been studied across a variety of NLP-related research areas, and have received increasing attention in recent years. Examples include post editing for speech recognition (Sanchez-Cortina et al., 2012), interactive machine translation (González-Rubio et al., 2010), active learning for machine translation (Haffari et al., 2009; González-Rubio et al., 2011) and many other NLP tasks (Olsson, 2009), to name but a few studies.

It has also been recognized by the active learning community that correcting the most useful parts first is often not optimal in terms of efficiency, since these parts tend to be the most difficult to manually annotate (Settles et al., 2008). The authors advocate the use of a user model to predict the supervision effort, and select the instances with the best "bang-for-the-buck." This prediction of supervision effort was successful, and was further refined in other NLP-related studies (Tomanek et al., 2010; Specia, 2011; Cohn and Specia, 2013). Our approach to user modeling using GP regression is inspired by the latter.

Most studies on user models consider only supervision effort, while neglecting the accuracy of human annotations. The view of humans as a perfect oracle has been criticized (Donmez and Carbonell, 2008), since human errors are common and can negatively affect supervision utility. Research on human-computer interaction has identified the modeling of human errors as very difficult (Olson and Olson, 1990), depending on factors such as user experience, cognitive load, user interface design, and fatigue. Nevertheless, even the simple error model used in our post editing task was effective.

The active learning community has addressed the problem of balancing utility and cost in some more detail. The previously reported "bang-for-the-buck" approach is a very simple, greedy approach to combining both into one measure. A more theoretically founded scalar optimization objective is the net benefit (utility minus costs) as proposed by Vijayanarasimhan and Grauman (2009), but it is unfortunately restricted to applications where both can be expressed in terms of the same monetary unit. Vijayanarasimhan et al. (2010) and Donmez and Carbonell (2008) use a more practical approach that specifies a constrained optimization problem by allowing only a limited time budget for supervision. Our approach is a generalization thereof and allows either specifying an upper bound on the predicted cost, or a lower bound on the predicted utility.

The main novelty of our presented approach is the explicit modeling and selection of segments of various sizes, such that annotation efficiency is optimized according to the specified constraints. While some works (Sassano and Kurohashi, 2010; Neubig et al., 2011) have proposed using subsentential segments, we are not aware of any previous work that explicitly optimizes that segmentation.

7 Conclusion

We presented a method that can effectively choose a segmentation of a language corpus that optimizes supervision efficiency, considering not only the actual usefulness of each segment, but also the annotation cost. We reported noticeable improvements over strong baselines in two user studies. Future user experiments with more participants would be desirable to verify our observations, and to allow further analysis of different factors such as annotator expertise. Also, future research may improve the user modeling, which will be beneficial for our method.

Acknowledgments

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 287658, Bridges Across the Language Divide (EU-BRIDGE).
References

Yuya Akita, Masato Mimura, and Tatsuya Kawahara. 2009. Automatic Transcription System for Meetings of the Japanese National Congress. In Interspeech, pages 84–87, Brighton, UK.

Trevor Cohn and Lucia Specia. 2013. Modelling Annotator Bias with Multi-task Gaussian Processes: An Application to Machine Translation Quality Estimation. In Association for Computational Linguistics Conference (ACL), Sofia, Bulgaria.

Pinar Donmez and Jaime Carbonell. 2008. Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles. In Conference on Information and Knowledge Management (CIKM), pages 619–628, Napa Valley, California, USA.

Jesús González-Rubio, Daniel Ortiz-Martínez, and Francisco Casacuberta. 2010. Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures. In Association for Computational Linguistics Conference (ACL), Short Papers Track, pages 173–177, Uppsala, Sweden.

Jesús González-Rubio, Daniel Ortiz-Martínez, and Francisco Casacuberta. 2011. An active learning scenario for interactive machine translation. In International Conference on Multimodal Interfaces (ICMI), pages 197–200, Alicante, Spain.

Gurobi Optimization. 2012. Gurobi Optimizer Reference Manual.

Gholamreza Haffari, Maxim Roy, and Anoop Sarkar. 2009. Active Learning for Statistical Phrase-based Machine Translation. In North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conference (NAACL-HLT), pages 415–423, Boulder, CO, USA.

Stefan Irnich and Guy Desaulniers. 2005. Shortest Path Problems with Resource Constraints. In Column Generation, pages 33–65. Springer US.

Kikuo Maekawa. 2008. Balanced Corpus of Contemporary Written Japanese. In International Joint Conference on Natural Language Processing (IJCNLP), pages 101–102, Hyderabad, India.

R. Timothy Marler and Jasbir S. Arora. 2004. Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization, 26(6):369–395, April.

Evgeny Matusov, Arne Mauser, and Hermann Ney. 2006. Automatic Sentence Segmentation and Punctuation Prediction for Spoken Language Translation. In International Workshop on Spoken Language Translation (IWSLT), pages 158–165, Kyoto, Japan.

Hiroaki Nanjo, Yuya Akita, and Tatsuya Kawahara. 2006. Computer Assisted Speech Transcription System for Efficient Speech Archive. In Western Pacific Acoustics Conference (WESPAC), Seoul, Korea.

Graham Neubig, Yosuke Nakata, and Shinsuke Mori. 2011. Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis. In Association for Computational Linguistics: Human Language Technologies Conference (ACL-HLT), pages 529–533, Portland, OR, USA.

Atsunori Ogawa, Takaaki Hori, and Atsushi Nakamura. 2013. Discriminative Recognition Rate Estimation for N-Best List and Its Application to N-Best Rescoring. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 6832–6836, Vancouver, Canada.

Judith Reitman Olson and Gary Olson. 1990. The Growth of Cognitive Modeling in Human-Computer Interaction Since GOMS. Human-Computer Interaction, 5(2):221–265, June.

Fredrik Olsson. 2009. A literature survey of active machine learning in the context of natural language processing. Technical report, SICS Sweden.

David Pisinger. 1994. A Minimal Algorithm for the Multiple-Choice Knapsack Problem. European Journal of Operational Research, 83(2):394–410.

John C. Platt. 1999. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In Advances in Large Margin Classifiers, pages 61–74. MIT Press.

Carl E. Rasmussen and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA.

Isaias Sanchez-Cortina, Nicolas Serrano, Alberto Sanchis, and Alfons Juan. 2012. A prototype for Interactive Speech Transcription Balancing Error and Supervision Effort. In International Conference on Intelligent User Interfaces (IUI), pages 325–326, Lisbon, Portugal.

Manabu Sassano and Sadao Kurohashi. 2010. Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing. In Association for Computational Linguistics Conference (ACL), pages 356–365, Uppsala, Sweden.

Burr Settles, Mark Craven, and Lewis Friedland. 2008. Active Learning with Real Annotation Costs. In Neural Information Processing Systems Conference (NIPS) - Workshop on Cost-Sensitive Learning, Lake Tahoe, NV, United States.

Burr Settles. 2008. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1070–1079, Honolulu, USA.

Hagen Soltau, Florian Metze, Christian Fügen, and Alex Waibel. 2001. A One-Pass Decoder Based on Polymorphic Linguistic Context Assignment. In Automatic Speech Recognition and Understanding
Workshop (ASRU), pages 214–217, Madonna di Campiglio, Italy.

Lucia Specia. 2011. Exploiting Objective Annotations for Measuring Translation Post-editing Effort. In Conference of the European Association for Machine Translation (EAMT), pages 73–80, Nice, France.

Matthias Sperber, Graham Neubig, Christian Fügen, Satoshi Nakamura, and Alex Waibel. 2013. Efficient Speech Transcription Through Respeaking. In Interspeech, pages 1087–1091, Lyon, France.

Bernhard Suhm, Brad Myers, and Alex Waibel. 2001. Multimodal error correction for speech user interfaces. Transactions on Computer-Human Interaction, 8(1):60–98.

Evimaria Terzi and Panayiotis Tsaparas. 2006. Efficient algorithms for sequence segmentation. In SIAM Conference on Data Mining (SDM), Bethesda, Maryland, USA.

Katrin Tomanek and Udo Hahn. 2009. Semi-Supervised Active Learning for Sequence Labeling. In International Joint Conference on Natural Language Processing (IJCNLP), pages 1039–1047, Singapore.

Katrin Tomanek, Udo Hahn, and Steffen Lohmann. 2010. A Cognitive Cost Model of Annotations Based on Eye-Tracking Data. In Association for Computational Linguistics Conference (ACL), pages 1158–1167, Uppsala, Sweden.

Paolo Toth and Daniele Vigo. 2001. The Vehicle Routing Problem. Society for Industrial & Applied Mathematics (SIAM), Philadelphia.

Sudheendra Vijayanarasimhan and Kristen Grauman. 2009. What's It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 2262–2269, Miami Beach, Florida, USA.

Sudheendra Vijayanarasimhan, Prateek Jain, and Kristen Grauman. 2010. Far-sighted active learning on a budget for image and video recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 3035–3042, San Francisco, California, USA, June.