Transactions of the Association for Computational Linguistics, vol. 6, pp. 467–481, 2018. Action Editor: Jordan Boyd-Graber.
Submission batch: 11/2017; Revision batch: 2/2018; Published 7/2018.
© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Detecting Institutional Dialog Acts in Police Traffic Stops

Vinodkumar Prabhakaran (Stanford Univ., CA), Camilla Griffiths (Stanford Univ., CA), Hang Su (UC Berkeley, CA), Prateek Verma (Stanford Univ., CA), Nelson Morgan (ICSI Berkeley, CA), Jennifer L. Eberhardt (Stanford Univ., CA), Dan Jurafsky (Stanford Univ., CA)
{vinodkpg,camillag,jleberhardt,jurafsky}@stanford.edu, {suhang3240,prateek119}@gmail.com, morgan@uprise.org

Abstract

We apply computational dialog methods to police body-worn camera footage to model conversations between police officers and community members in traffic stops. Relying on the theory of institutional talk, we develop a labeling scheme for police speech during traffic stops, and a tagger to detect institutional dialog acts (Reasons, Searches, Offering Help) from transcribed text at the turn (78% F-score) and stop (89% F-score) level. We then develop speech recognition and segmentation algorithms to detect these acts at the stop level from raw camera audio (81% F-score, with even higher accuracy for crucial acts like conveying the reason for the stop). We demonstrate that the dialog structures produced by our tagger could reveal whether officers follow law enforcement norms like introducing themselves, explaining the reason for the stop, and asking permission for searches. This work may therefore inform and aid efforts to ensure the procedural justice of police-community interactions.

1 Introduction

Improving the relationship between police officers and the communities they serve is a critical societal goal. We propose to study this relationship by applying NLP techniques to conversations between officers and community members in traffic stops. Traffic stops are one of the most common forms of police contact with community members, with 10% of U.S. adults pulled over every year (Langton and Durose, 2013). Yet past research on what people experience during these traffic stops has mainly been limited to self-reported behavior and post-hoc narratives (Lundman and Kaufman, 2003; Engel, 2005; Brunson, 2007; Epp et al., 2014).

The rapid adoption of body-worn cameras by police departments in the U.S. (laws in 60% of states in the U.S. encourage the use of body cameras) and across the world has provided unprecedented insight into traffic stops.[1] While footage from these cameras is used as evidence in contentious cases, the unstructured nature and immense volume of video data mean that most of this footage is untapped.

Recent work by Voigt et al. (2017) demonstrated that body-worn camera footage could be used not just as evidence in court, but as data. They developed algorithms to automatically detect the degree of respect that officers communicated to drivers in close to 1,000 routine traffic stops captured on camera. It was the first study to use machine learning techniques to extract insights from this footage.

This footage can be further used to unearth the structure of police-community interactions and gain a more comprehensive picture of the traffic stop as an everyday institutional practice. For instance, knowing which requests the officer makes, and whether and when they introduce themselves or explain the reason for the stop, is a novel way to measure procedural justice, a set of fairness principles recommended by the President's Task Force on 21st Century Policing,[2] and endorsed by police departments across the U.S.

[1] https://en.wikipedia.org/wiki/Body_worn_video_(police_equipment)
[2] http://www.theiacp.org/TaskForceReport
Downloaded from http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00031/1567658/tacl_a_00031.pdf by guest on 07 September 2023
We propose automatically extracting dialog structure from body camera footage to contribute to our understanding of police-community interactions. We rely on the notion of institutional talk (Heritage, 2005), which posits that dialog acts, topics, and narrative are heavily defined by the institutional context. Traffic stops are a kind of institutional talk, as are, for example, doctor-patient interactions, counseling conversations, and citizen calls for help from police. We introduce a model of institutional acts for traffic stop conversations. Since the officer holds a position of power within this institutional context, their dialog behavior has a greater influence in shaping the conversation (Coupland et al., 1991; Gnisci, 2005); hence, we focus on the institutional acts performed by the officer in this paper.

Contributions of our paper: 1) A typology of institutional dialog acts to model the structure of police-driver interactions during traffic stops. 2) An institutional act tagger that works from transcribed words (78% F-score) or from raw audio (60% F-score). 3) A classifier that uses this dialog structure to detect acts at the stop level (e.g., "Does this stop contain a Reason?") (81% F-score from raw audio). 4) An analysis of salient dialog structure patterns in traffic stops, demonstrating its potential as a tool for police departments to assess and improve police-community interactions.

2 Background

Computational work on human-human conversation has long focused on dialog structure, beginning with the influential work of Grosz showing the homology between dialog and task structure (Grosz, 1977). Recent work has integrated speech act theory (Austin, 1975) and conversational analysis (Schegloff and Sacks, 1973; Sacks et al., 1974; Schegloff, 1979) into models of dialog acts for domains like meetings (Ang et al., 2005), telephone calls (Stolcke et al., 2006), emails (Cohen et al., 2004), chats (Kim et al., 2010), and Twitter (Ritter et al., 2010).

Our models extend this work by drawing on the notion of institutional talk (Atkinson and Drew, 1979), an application of conversational analysis to environments in which the goals of participants are institution-specific. Actions, their sequences, and interpretations during institutional talk depend not only on the speaker (as speech act theory suggests) or the dialog (as conversational analysts argue); they are also inherently tied to the institutional context.

Institutional talk has been used as a tool to understand the work of social institutions. For example, Whalen and Zimmerman (1987) studied dialog structure in transcripts of citizen calls for help. They observed that the "regular, repetitive and reproducible features of calls for police, fire or paramedic services […] arise from situated practices responsive to the sequential and institutional contexts of this type of call". Such recurring patterns in language and conversation exist across different institutional contexts such as doctor-patient interactions, psychological counseling, sales calls, courtroom conversations, as well as traffic stops (Heritage, 2005).

Deviations from these sequential configurations are consequential. A police officer failing to explain the reason for the traffic stop can lead to aggravation in the driver (Giles et al., 2007), and an officer's perceived communication skills (e.g., do they listen, take civilian views into account) predict civilians' attitudes towards the police (Giles et al., 2006).

These findings demonstrate the importance of understanding the role of institutional context in shaping conversation structure. In doing so, our paper also draws on recent research on automatically extracting structure from human-human dialog. Drawing on Grosz's original insights, Bangalore et al. (2006) show how to extract a hierarchical task structure for catalog ordering dialogs with subtasks like opening, contact-information, order-item, related-offers, and summary. Prabhakaran et al. (2012) and Prabhakaran et al. (2014) employ dialog act analysis to study correlates of gender and power in work emails, while Althoff et al. (2016) studied structural aspects of successful counseling conversations, and Yang et al. (2013) and Chandrasekaran et al. (2017) investigated structures in online classroom conversations that predict success or need for intervention. Our work also draws on an important line of unsupervised work that models topical structure of conversations (Blei and Moreno, 2001; Eisenstein and Barzilay, 2008; Paul, 2012; Nguyen et al., 2012).

Our work is closely related to the active line of research in NLP on dialog act classification. Recently, recurrent neural network-based dialog act taggers, e.g., Khanpour et al. (2016), Li and Wu (2016), and
Liu et al. (2017), have posted state-of-the-art performance on benchmark datasets such as the Switchboard corpus (Jurafsky et al., 1997) and MRDA (Ang et al., 2005). Since these corpora come from significantly different domains (telephone conversations and meeting transcripts, respectively) than ours, and since we are interested specifically in the institutional acts (e.g., did the officer request documentation from the driver?) rather than the general dialog acts (did the officer issue a request?), these taggers do not directly serve our purpose. Furthermore, our data is an order of magnitude smaller (around 7K sentences) than these corpora, making it infeasible to train in-domain recurrent networks.

Prior to neural network approaches, support vector machines and conditional random fields (Cohen et al., 2004; Kim et al., 2010; Kim et al., 2012; Omuya et al., 2013) were the state-of-the-art algorithms on this task. These approaches also incorporated contextual and structural information into the classifier. For instance, Kim et al. (2012) used lexical information from previous utterances in predicting the dialog act of a current utterance, and Omuya et al. (2013) use features such as the relative position of an utterance w.r.t. the whole dialog. We draw from this line of work; we also experiment with positional and contextual features in addition to lexical features. Furthermore, we use features that capture the institutional context of the conversation.

3 Institutional Dialog Acts of Traffic Stops

We begin with a framework for analyzing the structure of interactions in this important but understudied domain of traffic stop conversations, developed by applying a data-oriented approach to body camera footage. Our goal is to create a framework that can be a tool for police departments, policymakers, and the general public to understand, assess and improve policing practices.

3.1 Data

We use the Voigt et al. (2017) dataset of body camera audio from 981 vehicle stops conducted by the Oakland Police Department during the month of April 2014. This amounts to 35 hours of speech, hand-transcribed to 94K speaker turns and 757K words.

Officer: Sir, hello, my name's Officer [NAME] of the Oakland Police Department. [GREETING]
Driver: Hi.
Officer: The reason why I pulled you over is when you passed me back there you were texting or talking on your cell phone. [REASON]
Driver: I was looking at a text, yes.
Officer: Okay. Do you have um, what year is the car you're driving? [DETAILS]
Driver: It's a 2010.
Officer: 2010. Do you still live in [ADDRESS]? [DETAILS]
Driver: Yes.
[…]
Officer: All right, sir. This is a citation for having your cell phone in your hand while you're driving. [] You actually have two months on or before June 7th to take care of the citation, okay? Please drive carefully. [SANCTION; POSITIVE CLOSING]
Driver: Okay.
Officer: Thank you.

Table 1: Excerpt from a traffic stop conversation with institutional acts in [blue] (names/addresses redacted).

3.2 Traffic Stops as Institutional Talk

Traffic stops possess all three characteristics of institutional talk (Heritage, 2005): i) participants' goals are tied to their institution-relevant identity (e.g., officer & driver); ii) there are special constraints on what is allowable within the interaction; iii) there are special inferences that are particular to the context. Table 1 presents an excerpt from a traffic stop conversation from our corpus: the officer greets the community member, gives the reason for the stop, asks about personal details, issues the sanction, and closes by encouraging safe driving. We are interested in such recurring sequences of institution-specific dialog acts, or institutional acts, which combine aspects of dialog acts and those of topical segments, all conditioned by the institutional context.

3.3 Developing the Typology

To develop the taxonomy of institutional dialog acts, we begin with a data-oriented exploration: identifying recurring sequences of topic segments using the (unsupervised) mixed membership Markov model (Paul, 2012).[3] Figure 1 shows the topic segments assigned by a 10-topic model on the traffic stop of Table 1. The model identified different spans of conversation: the officer gives the reason for the stop (orange), asks for documents (blue), collects driver information (purple), and then, at the end, there are spans of issuing a sanction (beige) and closing (yellow).

[3] We trained the model on a subset of 541 stop transcripts from our data, exploring different numbers of topics.

Figure 1: Topic assignments from Mixed Membership Markov Modeling (Paul, 2012) on a sample stop (turns go from top to bottom; x-axis shows probabilities assigned to each topic; right are the top topic words). The model identifies the reason for the stop (orange), driver's documents (blue), driver's address and demographics (purple), the sanction (beige) and closing (yellow).

While these topical assignments helpfully suggest a high-level notion of the structure of these conversations, they do not capture the specific acts officers perform. We next turned to the procedural justice literature, which highlights specific acts. For instance, questioning the driver's legitimacy for being somewhere (why are you here?) or driving a car (whose car is it?) are acts that trigger negative reactions in drivers (Epp et al., 2014). On the other hand, officers introducing themselves and explaining the reasons for the stop are important procedural justice facets that communicate fairness and respect (Ramsey and Robinson, 2015). Informed by the procedural justice literature, the President's Task Force recommendations, and a review of the unsupervised topic segments, two of the authors manually analyzed twenty stop transcripts to identify institutional dialog acts.

We focused on acts that tend to recur (e.g., citations), and those with procedural justice interest (e.g., reasons, introductions), teasing apart acts with similar goals but different illocutionary force (explicitly stating vs. implying the reason for the stop; or requesting to search the vehicle vs. stating that a search was being conducted). This process resulted in an initial coding scheme of twenty two institutional acts in nine categories. We also observe that the recurring acts by community members were often in response to officers' acts (e.g., responding to demographic questions), as the officers' position of power gives them greater influence in shaping the conversation (Giles et al., 2007). Hence, we focus on officer speech to capture our institutional act annotations.

3.4 Annotating Institutional Acts

From each stop transcript, we selected all officer turns (excluding those directed to the radio dispatcher), and annotated each sentence of each turn. In the first round, three annotators annotated the same 10 stops using the taxonomy and manual developed above with an average pair-wise inter-annotator agreement of κ=0.79. We discussed the sources of disagreement, ratified the annotations, and updated the annotation manual to clarify act descriptions. During this process, we also updated the annotation manual to include four additional institutional acts, resulting in a set of twenty five acts in eleven categories. Table 2 presents this final typology, along with actual examples from our data.

We then performed two subsequent rounds of three-way parallel annotations, obtaining average pair-wise κ values of 0.84 and 0.88, respectively. Once we obtained high agreement, we conducted a fourth round where each annotator annotated a separate set of 30 stops. Stops were chosen at random from the entire corpus for each round; however, seven of the previously annotated stops were incorrectly included in the final round of annotations, resulting in a total of 113 annotated stops (7081 sentences, 4245 turns). Table 1 shows resulting labels.

4 Learning to Detect Institutional Acts

We now investigate whether we can train a model that can automatically detect the institutional acts during the course of a traffic stop. In Sections 5-7, we present an institutional act tagger, and describe three increasingly difficult evaluation settings:

1. Using manual transcripts: We train and test an institutional act tagger on the manual transcripts. This task is similar to dialog act tagging (e.g., Stolcke et al., 2006), but it has the important distinction that it needs to capture dialog structure at the intersection of the general dialog acts
(e.g., requests, responses) and the topical structure. Section 5 presents the experiments on building the institutional act tagger for this domain.

2. Using ASR: We develop an automatic speech recognizer that works in our domain, and use the text it generates, instead of manual transcripts, to train and test the model. The downstream institutional act tagging framework stays the same. This setting is not fully automatic, as we still rely on the manually identified segments of audio where officers spoke. Section 6 first presents experiments on building the ASR system for this domain, and then presents results on using ASR-generated text for institutional act tagging.

3. From raw audio: We build automatic means to detect the segments of officers' speech, apply the ASR on those segments, and then use the text thus produced to detect institutional acts, building a fully automatic tagger with no human intervention. Section 7 first describes the experiments on detecting the officers' speech automatically, and then presents results on institutional act tagging in this fully automatic setting.

For all our experiments, we merge labels from all sentences in each turn, making this a multi-label (instead of multi-class) classification task.[4] Only around 7% of the institutional act bearing utterances had multiple acts. Common co-occurrences were GREETING and REASON, and GREETING and ORDERS, e.g., Hey, turn the car off. How you doing?

[4] We present turn-level (instead of sentence-level) predictions to facilitate comparisons with experiments presented in Sections 6 & 7; sentence-level experiments were performed using manual transcripts and yielded slightly better numbers.

Event (Coarse-grained) | Event (Fine-grained) | Count | Example Utterances
GREETING | Greeting | 98 | "Whats up, yall?", "How you doing, man?", "Hello."
 | Introduction | 16 | "Hi. Im officer, Oakland PD"
REASON | Question Awareness | 12 | "You know why Im pulling you over?"
 | Explicit | 127 | "Reason I pulled you over is for a cell phone violation."
 | Implicit | 19 | "Didnt see the stop sign?"
DOCUMENTS | Requesting Documents | 252 | "You have your drivers license, registration and insurance?"
DETAILS | Demographics | 71 | "How old are you?", "Whats your last name?"
 | Address | 65 | "What's your address?", "Where do you live at?"
SANCTION | Issuing Citation | 37 | "Okay, as I say, the reason I'm citing you is for failure to yield to oncoming traffic."
 | Issuing Fix-it Ticket | 31 | "I'll give you a fix-it ticket for the headlight, left front headlight, all right?"
 | Issuing Warning | 19 | "I'll give you a warning today."
 | Mention Lenience | 50 | "Im cutting you guys a break"
POSITIVE CLOSING | Farewell | 86 | "All right. Drive safe", "All right, guys. Take care", "Have a good day."
ORDERS | Hands On Wheel | 9 | "Hey just keep your hands on the steering wheel man"
 | Turn Car Off | 37 | "Hey, turn the car off"
LEGITIMACY | Vehicle Ownership | 41 | "This your car?"
 | Questioning Intent | 15 | "What are you doing out here?"
HISTORY | Warrants | 3 | "Do you know you got a little warrant too?"
 | Probation/Parole | 16 | "You know you're on probation, right?"
 | Arrests | 4 | "Do you, um, have you ever been arrested?"
OFFER HELP | Giving Voice | 19 | "Do you have any questions?", "You understand?"
 | Offering Help | 5 | "Need any help getting back on the traffic?", "You need directions?"
SEARCH | Request for Search | 3 | "Do you mind if I uh search the car?"
 | Statement of Search | 7 | "Youre on probation so you have a search clause."
 | Weapons | 15 | "You got nothing on you I need to worry about?", "No weapons, right?"

Table 2: Typology of institutional acts during traffic stops. Column 1 shows the 11-way coarse-grained groupings. Column 2 shows the 25-way fine-grained institutional act labels used for annotations, and Column 3 shows the number of sentences labeled with each act.

5 Institutional Act Tagging from Manual Transcripts

We adopt a supervised machine learning approach to the task of institutional act tagging. We draw from prior work in the area of dialog act modeling, while also adding features that specifically capture the institutional context of traffic stop conversations.

5.1 Algorithms

We compared three supervised text classification methods: Support Vector Machine (SVM) (Cortes and Vapnik, 1995) and Extremely Randomized
Trees (ERT) (Geurts et al., 2006),[5] which are efficient and tend to work well with smaller datasets like ours, and Convolutional Neural Network (CNN) (Kim, 2014), which captures variable length patterns without feature engineering. For SVM, we use the one-vs-all multi-label algorithm (ERT and CNN inherently deal with multi-label classification) and use the balanced mode to address the skewed label distribution (0.5% to 3.5% positive cases). In the balanced mode, positive and negative examples are balanced at training time. For CNN, we use two convolutional layers of filter sizes 3 and 4 and 20 filters with ReLU activation and max-pooling with pool size 2. This is followed by two dense layers, and a final layer with sigmoid activation and binary cross entropy loss to handle multi-label classification.

While some prior work in dialog act tagging (e.g., Kim et al., 2010; Kim et al., 2012) has shown that sequence tagging algorithms such as conditional random fields (CRF) have some advantage over text classification approaches such as SVMs, preliminary experiments using CRFs revealed this not to be the case in our corpus.

5.2 Features

Lexical features: We used unigrams and bigrams as indicator features for SVM and ERT. We initialize the input layer of CNN with word embeddings trained using our entire transcribed dataset.[6]

Pattern features: We use indicator features for two types of patterns. 1) For each institutional act, we hand-crafted a list of linguistic patterns; e.g., the pattern feature for GREETING included how are you, hello, and good morning, among others. 2) We use a semi-automatically built dictionary of offenses (e.g., taillight), obtained by querying the word embedding model trained on all transcripts with a seed list of offenses, resulting in a large list of offenses and variations of their usage (e.g., break light, rear lite) with high incidence in some acts (e.g., REASON, SANCTION).

Structural features: 1) The number of words in the utterance, since some acts (e.g., GREETING) require fewer words than others (e.g., SANCTION). We binned this feature into four bins: <3, 4-10, 11-20, and >20. 2) The position of the utterance within the conversation (e.g., SANCTION is likely to happen late, and GREETING early), binned to one or more of: first five, first quarter, first third, first half, last half, last third, last quarter, and last five.

Other features: We tried other features such as 1) ngrams from previous utterances, 2) ngrams from driver's responses, 3) dependency parse patterns, 4) word/sentence embeddings, and 5) topic assignments obtained from the mixed membership Markov model (Paul, 2012) discussed in Section 3.3. These features turned out not to be helpful for this task, and we do not include those results here.

[5] ERT is a variant of the random forest algorithm, with the difference that the splits at each step are selected at random rather than using a preset criterion.
[6] In preliminary experiments, we found that SVMs using these word embeddings (or GloVe embeddings) performed worse than using ngram features directly.

Algorithm | P | R | F
Extremely Randomized Trees | 80.9 | 63.6 | 71.2
Conv. Neural Network | 77.4 | 57.3 | 65.8
SVM | 78.9 | 76.2 | 77.5
SVM (-ngrams) | 15.4 | 83.3 | 26.0
SVM (-patterns) | 78.4 | 74.4 | 76.4
SVM (-structure) | 76.3 | 74.2 | 75.3
SVM (-patterns & structure) | 76.3 | 71.9 | 74.0

Table 3: Micro-averaged precision (P), recall (R) and F-score (F) for experiments using manual transcripts.

5.3 Experiments and Results

Table 3 presents micro-averaged (i.e., weighted average of each class) precision, recall and F-measure obtained on 10-fold cross validation.[7] While ERT posted the highest precision of 80.9% at a low recall of 63.6%, SVM reported the highest recall of 76.2% among the three algorithms without a huge dent in precision. Overall, we obtain the best micro-averaged F-score of 77.5% using SVM. CNN performed worse than both ERT and SVM.[8]

[7] CNN: batch size of 10, dropout of 0.3, Adam optimizer, 10 epochs. SVM: C=1, linear kernel. ERT: 100 estimators, max tree depth 75, number of features capped at 20% of all features. Parameter values obtained using grid-search within the training set for each fold.
[8] Since CNN performed much worse than SVM with lexical features alone (last row), presumably because of the small amount of data, we did not perform more CNN experiments.

We also performed an ablation study to see the relative importance of features in the SVM
model. As expected, the ngram features contribute the most; removing them drastically lowered performance. Patterns and structural features had a smaller impact on performance.

We inspected the weights assigned to the features by a model trained on the entire dataset. The models created for each institutional act had at least one pattern or structure feature in the top twenty five features. Figure 2 shows the feature weights assigned to the model detecting GREETING. The model up-weighted utterances with greeting patterns (GREETINGS), first utterances (FIRST), and utterances in the first quarter (FIRSTQUART), while down-weighting longer utterances (LENGTH11-20) and those that mention lenience (LENIENCE).

Figure 2: Top 25 most (by absolute value) weighted features in the GREETING model.

6 Institutional Act Tagging using ASR

The institutional act tagger of Section 5 relies on manual transcriptions, making it not scalable to the thousands of traffic stops conducted every month. We now investigate using automatic speech recognition, while assuming manual segmentation, i.e., we know the time segments where an officer spoke to the driver; in the next section we explore the additional task of automatic officer turn detection.

6.1 Data Augmentation

Traffic stops have considerable noise (wind, traffic, horns), overlap, and difficult vocabulary (names, addresses, jargon), making it a challenging domain for off-the-shelf automatic speech recognizers (ASR). However, our 35 hours of transcribed speech is insufficient to train a domain-specific recognizer. We therefore employ two data augmentation techniques.

First, we perturb our data by frame-shifting and filterbank adjustment following the procedure described in Ko et al. (2015). In frame-shifting, we change the starting point of each frame, making features generated from these frames slightly different from the original ones. For filterbank adjustment, we move the locations of the center frequencies of filterbank triangular frequency bins during feature extraction. This method increases our training data 5-fold to 180 hours. Second, we make use of the 300-hour Switchboard telephone speech dataset (Godfrey and Holliman, 1997) to create additional data. We first upsample Switchboard speech to the 16 kHz of our data, and then mix it with noise samples randomly picked from regions of our data where speech is not identified, using a random speech-to-noise ratio between 0 and 10. This method contributes another 300 hours of speech for training.

Data | Recordings | Utterances | Hours
Train | 603+2435 | 407,408 | 494
Dev | 66 | 3,241 | 3.6
Test | 113 | 4,248 | 4.6

Table 4: Data used to build the ASR models.

6.2 Acoustic Modeling

We implemented two acoustic models, a Bi-directional Long Short-Term Memory network (BLSTM) (Graves et al., 2013) and a Deep Neural Net Hidden Markov Model (DNN-HMM) tri-phone baseline. While LSTM based approaches generally work better, they are much slower to train, so we wanted to know if their word error improvements indeed translated to act tagger improvements.

DNN-HMM system training follows the standard pipeline in the Kaldi toolkit (Povey et al., 2011; Veselý et al., 2013). Frame alignments generated from a traditional Gaussian mixture model based system are used as targets, and 40-dimension fMLLR features (Gales, 1998) are used as inputs to the DNN to aid speaker adaptation. The network was trained using Restricted Boltzmann Machine (RBM) based pretraining (Salakhutdinov et al., 2007) and then discriminatively trained using stochastic gradient descent with cross-entropy as the loss function. (Veselý
et al., 2013) describes more training details.

We trained the BLSTM using the recipe proposed by Mohamed et al. (2015). The BLSTM is used to model short segments of speech (with a sliding window of 40 frames), and predict frame-level HMM states at each time frame.[9] We use 6 hidden layers and 512 LSTM cells in each direction. Dropout (Srivastava et al., 2014), peephole connections (Gers et al., 2002) and gradient clipping are adopted to stabilize training (Sak et al., 2014). As in DNN-HMM training, fMLLR features and frame alignments are used as inputs and targets respectively.

For decoding, frame posteriors from the acoustic model are fed into a weighted finite state transducer with HMMs, context-dependent tri-phone models, a lexicon,[10] and a 3-gram language model with Kneser-Ney smoothing (Kneser and Ney, 1995).

[9] Note that this recipe is different from the end-to-end approach where the LSTM model takes in the whole utterance and predicts phone/word outputs directly (Graves and Jaitly, 2014).
[10] CMU dictionary (CMUdict v0.7a) is used.

6.3 Language Model Data Augmentation

To mitigate language model data scarcity, we use transcriptions from the Switchboard and Fisher (Cieri et al., 2004) corpora, adding about 3.12M and 21.1M words, respectively. Separate language models are trained on these datasets, and then interpolated with the traffic stop language model; interpolation weights were chosen by minimizing perplexity on a separate Dev set. Table 5 shows the perplexities of different language models on this Dev set.

Data | Perplexity
Traffic stops | 79.4
+ Switchboard | 75.9
+ Fisher | 74.3

Table 5: Language model perplexity on Dev set.

6.4 Evaluating ASR Models

Table 4 shows statistics of the data used to build the ASR system. We kept aside the 113 institutional act annotated stops from Section 3 as test set. The remaining 669 stops were divided 9:1 into Train and Dev sets. The Train set also includes the 2435 recordings from the Switchboard corpus.

Table 6 shows word error rates under different settings. Overall, we obtain relatively high error rates, largely due to the noisy environment of the audio in this domain. BLSTM performs better than DNN-HMM, consistent with prior research (Mohamed et al., 2015; Sak et al., 2014).[11] Interpolating Switchboard and Fisher language models provides a further boost of 0.7 percentage points.

Model | Dev | Test
DNN | 57.0 | 48.5
BLSTM | 49.7 | 45.0
BLSTM (-data augmentation) | 56.9 | 51.4
BLSTM (-LM interpolation) | 50.2 | 45.7

Table 6: Word error rate for different ASR models.

[11] Note that our Test set, designed for measuring institutional act detection, consists of only police officers talking close to the camera; hence the word error rate can be lower than on the Dev set, which is designed to measure overall ASR performance and includes community member speech as well.

6.5 Institutional Act Tagging Experiments

We now use text generated by ASR to train and test the institutional act tagger of Section 5. To increase recall, we also made use of N-best list output from the ASR systems, collecting ngram and pattern features from the top 10 candidate transcriptions. The L1 penalty in the SVM limits the impact of the resulting noisier ngrams on precision.

ASR Source | 1Best | 10Best
DNN | 57.2 | 63.6
BLSTM | 65.0 | 65.3

Table 7: Micro-averaged F-scores on institutional act prediction using different ASR sources.

Table 7 presents micro-averaged F-scores. BLSTM with 10Best obtained the best F-score of 65.3. While using 10Best lists only helped marginally for BLSTM, it helped the DNN enough to eliminate most of the gap in performance with BLSTMs. Our results suggest that downstream tasks with efficiency constraints could employ DNNs without a huge dent in performance by making use of NBest or lattice output.
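As a minimal sketch of the N-best feature collection described in Section 6.5, the following Python snippet pools unigram and bigram indicator features across several candidate transcriptions of one officer turn. The function names (`ngrams`, `nbest_indicator_features`), the whitespace tokenization, and the example hypotheses are illustrative assumptions, not the authors' implementation; the pattern features and the SVM itself are left out.

```python
from typing import Iterable, List, Set


def ngrams(tokens: List[str], n: int) -> Set[str]:
    """Return the set of n-grams (joined by spaces) found in a token list."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def nbest_indicator_features(hypotheses: Iterable[str], max_n: int = 2) -> Set[str]:
    """Union of 1..max_n-gram indicator features over all ASR hypotheses.

    Assumption: each hypothesis is a plain string and lowercased whitespace
    tokenization is good enough for illustration.
    """
    features: Set[str] = set()
    for hypothesis in hypotheses:
        tokens = hypothesis.lower().split()
        for n in range(1, max_n + 1):
            features |= ngrams(tokens, n)
    return features


# Hypothetical 1-best and N-best output for a single officer turn.
one_best = ["the season i pull you over"]
n_best = one_best + ["the reason i pulled you over"]

features_1best = nbest_indicator_features(one_best)
features_nbest = nbest_indicator_features(n_best)

# Features that only become available once lower-ranked hypotheses are pooled.
recovered = sorted(features_nbest - features_1best)
```

Pooling can only add features relative to the 1-best transcription, which fits the stated aim of increasing recall; the extra, noisier ngrams are what the sparsity-inducing penalty is meant to keep in check.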
ASR Source    1Best    10Best
DNN           43.7     56.0
BLSTM         53.8     59.8

Table 8: Micro-averaged F-scores on institutional act prediction from raw audio using different ASR sources.

7 Institutional Act Tagging from Raw Audio

We now turn to the task of detecting institutional acts directly from raw body camera audio. This requires detecting spans with speech activity and distinguishing them from noise (voice activity detection), and identifying segments spoken by the police officers.

7.1 Finding Officer Speech Segments

Our goal is to find regions of the audio with a high probability of being officer speech. We could not build a standard supervised officer-versus-other classifier, because the stops contain large untranscribed regions of officer speech (we did not transcribe segments where the officer was, for example, talking to the dispatcher in the car). We therefore instead built a two-output classifier to discriminate between officer and community member speech, and used a tuned threshold (0.55) on the posterior probability of officer as our voice activity detector, drawing on the intuitions of Williams and Ellis (1999) and Verma et al. (2015), who found that posterior features on speech tasks also improved speech/non-speech performance. Our model is a 3-layer fully connected neural network with 1024 neurons trained with cross entropy loss.¹² Figure 3 sketches the architecture. We run the classifier on each 0.5 second span (recall = .97 and precision = .90 on the Dev set of Table 4), and then merge classifications into a single turn if adjacent spans are classified as officer speech, with a 500 ms lenience for pauses.

¹² Patch of 210 ms with a stride of 50 ms. Audio was downsampled to 16 kHz, and converted to a 21-dimensional magnitude mel-filterbank representation covering frequencies from 0-8 kHz. FFT size was 512 with 10 ms hop and 30 ms frame size.

Figure 3: Detecting Officer Speech segments.

7.2 Institutional Act Tagging Experiments

We now present experiments using the automatically identified officer speech segments. At training time, we use the ASR generated text using gold segments; at test time, we use the same ASR model to generate text for the predicted segments. Since the predicted segments do not exactly match gold segments, we use a fuzzy-matching approach for evaluation. If a gold segment contains an act and an overlapping predicted segment has the same act, we consider it a true positive. If a gold segment contains an act, but none of the overlapping predicted segments have that act, it is counted as a false negative. If an act is identified in one of the predicted segments, without any of the overlapping gold segments having it, then we consider it a false positive.

Table 8 presents results using this evaluation scheme. Again, BLSTM using the 10Best strategy obtained the best F-score of 59.8%. Both BLSTM and DNN benefited significantly from using the 10Best likely predictions. As in the ASR experiments, the DNN substantially closes the gap in performance by using the 10Best strategy.

8 Stop Level Act Detection

Our three previous sets of models focused on labeling each officer turn with one or more institutional acts. For many purposes, it suffices to ask a far simpler question: does an act occur somewhere in the traffic stop? From a procedural justice standpoint, for example, we want to know whether the officer explained the reason for the stop; we may not care about the turn in which the reason occurred. We call this task stop-level act detection, in which each stop is labeled as a positive instance of an act if that particular act occurred in it in the gold labels.
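Stop-level act detection as defined above amounts to an "any over turns" aggregation followed by standard per-act scoring. The following Python fragment is a minimal sketch of that idea, not the authors' implementation: the function names `stop_level_acts` and `precision_recall_f1`, the representation of each turn as a set of act labels, and the toy labels in the usage note are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def stop_level_acts(turn_labels: Iterable[Set[str]]) -> Set[str]:
    """Return every act tagged on at least one officer turn of a single stop."""
    acts: Set[str] = set()
    for labels in turn_labels:
        acts |= set(labels)
    return acts


def precision_recall_f1(gold: List[Set[str]], pred: List[Set[str]],
                        act: str) -> Tuple[float, float, float]:
    """Score one act over a list of stops, treating each stop as one instance."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if act in g and act in p)
    fp = sum(1 for g, p in pairs if act not in g and act in p)
    fn = sum(1 for g, p in pairs if act in g and act not in p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

As a usage example under the same assumptions, a stop whose turns were tagged `[{"GREETING"}, {"REASON", "DOCUMENTS"}]` is a positive instance for all three acts, while a stop in which no turn was tagged REASON is a negative instance for REASON regardless of how many turns it contains.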
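Returning briefly to the raw-audio pipeline of Section 7, the span-merging step of Section 7.1 and the fuzzy-matching evaluation of Section 7.2 can be illustrated with a short Python sketch. It is written under stated assumptions and is not the authors' code: the function names are hypothetical, a span is taken to be officer speech when its posterior is at or above the threshold, two segments are taken to overlap when they share a positive-length interval, and counts are accumulated per (segment, act) pair. The defaults 0.5 s, 0.55 and 500 ms are the values reported in the text.

```python
from typing import Dict, List, Set, Tuple

Segment = Tuple[float, float]                   # (start_sec, end_sec)
LabeledSegment = Tuple[float, float, Set[str]]  # segment plus its act labels


def merge_officer_spans(posteriors: List[float], span_sec: float = 0.5,
                        threshold: float = 0.55,
                        max_gap_sec: float = 0.5) -> List[Segment]:
    """Threshold per-span officer posteriors and merge nearby spans into turns."""
    turns: List[Segment] = []
    for i, p in enumerate(posteriors):
        if p < threshold:  # assumption: at-or-above threshold means officer speech
            continue
        start, end = i * span_sec, (i + 1) * span_sec
        if turns and start - turns[-1][1] <= max_gap_sec:
            turns[-1] = (turns[-1][0], end)  # extend the turn across a short pause
        else:
            turns.append((start, end))
    return turns


def overlaps(a: Tuple, b: Tuple) -> bool:
    """True if two segments share a positive-length interval (assumed definition)."""
    return a[0] < b[1] and b[0] < a[1]


def fuzzy_match_counts(gold: List[LabeledSegment],
                       pred: List[LabeledSegment]) -> Dict[str, int]:
    """Count true positives, false negatives and false positives by overlap."""
    counts = {"tp": 0, "fn": 0, "fp": 0}
    for g in gold:
        # Acts predicted on any segment overlapping this gold segment.
        nearby = set().union(*(p[2] for p in pred if overlaps(g, p)))
        for act in g[2]:
            counts["tp" if act in nearby else "fn"] += 1
    for p in pred:
        # Acts present on any gold segment overlapping this predicted segment.
        nearby = set().union(*(g[2] for g in gold if overlaps(p, g)))
        counts["fp"] += sum(1 for act in p[2] if act not in nearby)
    return counts
```

Precision, recall and F-score then follow from the three counts in the usual way. Counting per (segment, act) pair is one reasonable reading of the scheme; other readings, such as counting each act once per stop, would give different totals.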
                                     Using Manual Transcripts    Using ASR Transcripts    Using Raw Audio
Event                        Count   Prec.   Rec.   F-meas.      Prec.   Rec.   F-meas.   Prec.   Rec.   F-meas.
GREETING                     80      92.3    90.0   91.1         84.5    88.8   86.6      70.2    91.3   79.4
REASON                       96      94.7    93.8   94.2         94.3    86.5   90.2      96.4    84.4   90.0
DOCUMENTS                    100     97.0    97.0   97.0         95.9    93.0   94.4      96.8    92.0   94.4
DETAILS                      56      86.2    89.3   87.7         68.8    78.6   73.3      66.1    66.1   66.1
SANCTION                     79      94.1    81.0   87.1         84.2    81.0   82.6      90.3    82.3   86.1
POSITIVE CLOSING             71      91.2    87.3   89.2         84.4    76.1   80.0      90.6    67.6   77.4
ORDERS                       32      87.1    84.4   85.7         90.3    87.5   88.9      96.6    87.5   91.8
LEGITIMACY                   41      78.4    70.7   74.4         89.7    63.4   74.3      85.7    29.3   43.6
HISTORY                      11      77.8    63.6   70.0         75.0    54.6   63.2      71.4    45.5   55.6
OFFER HELP                   18      71.4    83.3   76.9         82.4    77.8   80.0      82.4    77.8   80.0
SEARCH                       10      70.0    70.0   70.0         66.7    20.0   30.8      60.0    30.0   40.0
Micro Average (Weighted)             90.4    87.5   89.0         86.5    81.7   84.0      85.5    77.1   81.1
Macro Average (Unweighted)           85.5    82.8   83.9         83.3    73.4   76.8      82.4    68.5   73.1

Table 9: Stop level institutional act presence detection results (for each label).

Our algorithm is simple: run our best turn-based act tagger, and if the tagger labels an institutional act anywhere in the conversation, tag the conversation as having that class.¹³ We explore all three settings: manual segments and transcripts, manual segments with ASR, and automatic segments with ASR.

We compare our results with a dialog-structure-ignorant lexical baseline: simply merge all text features (ngrams and patterns) from all the officer turns in a stop and use them to classify whether the stop did or didn't contain an act. Our goal here is to see whether dialog structure is useful for this task; if so, the tagger based on dialog turns should outperform the global text classifier.

Table 10 shows that using the output of the turn-based classifier to do stop classification offers a huge advantage over the structure-ignorant baseline, reducing F-score error by 49% while using manual transcripts, and by 22% while applied to raw audio.

Table 9 and Table 11 summarize the different experiments presented in Sections 4-8. Table 9 breaks down performance for each of the 11 acts, while Table 11 compares turn-level to stop-level results.

Despite our relatively small training resources (113 stops with dialog act labels, ASR and segmentation training data from one month), performance at the stop level directly from raw audio is surprisingly high. For instance, in detecting whether or not the community member was told the reason they were stopped, an important question for procedural justice, we obtained around 96% precision with 84% recall from raw camera audio.

¹³ We use the best system from each set of experiments: SVM model using ngrams, patterns, and structure features trained on manual transcripts or from the BLSTM ASR model.

                                 P       R       F
Manual (Lexical baseline)        79.6    77.6    78.6
Manual (Our Tagger)              90.4    87.5    89.0
ASR (Lexical baseline)           78.0    75.6    76.8
ASR (Our Tagger)                 86.5    81.7    84.0
Raw Audio (Lexical baseline)     79.6    71.4    75.2
Raw Audio (Our Tagger)           85.5    77.1    81.1

Table 10: Stop level institutional act detection using our tagger, compared to a lexical baseline model trained on all the words spoken by the officer, without accounting for the dialog structure.

Text source              Manual    ASR       ASR
Segmentation source      Manual    Manual    Auto
Turn level               77.5      65.3      59.8
Stop level               89.0      84.0      81.1

Table 11: Summary: Micro-averaged F-scores across different text/segmentation sources.

9 Conversation Trajectories

The institutional acts that happen during a traffic stop, when they occur, and in what order are all of importance to police departments. For instance, the President's Task Force on 21st Century Policing recommends (and some departments require) that officers identify themselves and state the reason for the stop as an important aspect of fairness. However,
police departments currently have no way of easily measuring how consistently such policies are carried out during traffic stops. They also have no way to test the effectiveness of any training programs or policy updates that are meant to affect these conversations.

Figure 4: Prototypical conversation structure of traffic stops; transition probabilities based on 900 stops from April 2014.

Figure 5: Presence of institutional acts in the 900 stops of black or white drivers from the month of April 2014.

In this section, we demonstrate that our institutional act tagger provides an efficient and reliable tool for departments to detect and monitor conversational patterns during traffic stops. Specifically, we focus on conversational openings, a fundamental aspect of conversations (Schegloff and Sacks, 1973) that is also important for procedural justice (Whalen and Zimmerman, 1987; Ramsey and Robinson, 2015). For instance, do officers start the conversations with a greeting? Are the drivers told the reason why they were stopped? Was the reason given before or after asking for their documentation?

We first apply our high performance (78% F-score at turn level; 89% at stop level) tagging model on manual transcripts. Figure 5 shows the percentage of stops in which each of the eleven institutional acts was present. In around 17% of stops, no reason was provided at all. Only 69% of the stops started with a greeting, and an even smaller percentage of stops ended with a positive closing. While these high level statistics provide a window into these conversations, our institutional act tagger allows us to gain deeper perspectives.

Using the turn-level tags assigned by our system, we calculate the transition probabilities between dialog acts. Figure 4 shows a traffic stop 'narrative schema' or script, extracted from the high probability transitions. Variations from this prototypical script can be a useful tool for police departments to study how police-community interactions differ across different squads, city locations, or driver characteristics like race.

Figure 6, for example, shows different conversational paths that officers take before explaining the reason for the stop. In over a quarter of the stops, either the reason is not given, or it is given after issuing orders or requesting documents. These violations of policing recommendations or requirements can impact the drivers' attitude and perception of the legitimacy of the institution.

Figure 6: Conversational Paths to Giving Reason.

10 Discussion

In this section, we outline some of the limitations of our work and discuss future directions of research.

First, our work is based on data from a single police department (the Oakland Police Department in the State of California) in the U.S. The schema we developed may need to be updated for it to be applicable to other police departments, especially those in other countries, where the laws, policies and culture around policing may be significantly different.
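As an aside on the analysis of Section 9, the transition probabilities behind a script such as the one in Figure 4 can be estimated by counting consecutive act pairs and normalizing per source act. The Python fragment below is a minimal sketch under assumptions that are ours rather than the paper's: each stop is represented as a list of single act labels in turn order, the `<START>` and `<END>` boundary markers are hypothetical additions, and the function name `transition_probabilities` is illustrative. Turns carrying several acts, or none, would need a flattening rule that the text does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(stops: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood estimate of P(next act | current act) over all stops."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for acts in stops:
        # Hypothetical boundary markers make openings and closings visible.
        path = ["<START>"] + list(acts) + ["<END>"]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for prev, ctr in counts.items()
    }
```

Under these assumptions, the row for `<START>` gives the share of stops that open with each act (for example, with a greeting), and a prototypical script can be read off by following the highest-probability transition out of each act.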
Due to the sensitive nature of the data, we will not be able to publicly release the raw annotations described in Section 3.4. However, we will release the labeling scheme for institutional acts in traffic stops, along with the annotation manual. We believe that it will serve as a starting point for future researchers working in this domain.

Like any data-oriented approach, our machine learning models may have captured the idiosyncrasies of the particular department represented in our dataset. Since we are not aware of any other police departments' body-worn camera footage that is available for research, we have no way to guarantee that our models are directly applicable to other police departments' data.

Our institutional act tagger enables us to perform large scale social science analyses controlling for various confounds, which is infeasible to perform using hand-labeled data. However, although our models obtain high performance in detecting individual institutional acts, they may also capture biases that exist in the data (Hopkins and King, 2010). Hence, our models should be corrected for biases before they may be used to estimate proportions in any category of stops.

In this paper, we focus on officers' speech alone, since the conversational initiative with respect to the institutional acts lies mostly with the officer. However, drivers' speech may also need to be taken into account sometimes; e.g., if an officer says "yes" to a driver's question "did you stop me for running the red light?", the officer has in fact given the reason for the stop even though their words alone don't convey that fact. Moreover, drivers' speech may also contribute to how the conversations are shaped. However, since the camera is further away from the driver than the officer, and since the environment is noisy, the audio quality of drivers' speech is poor, and further work is required to extract useful information from drivers' speech. This is an important line of future work.

The video information from the body-camera footage may potentially help in the diarization and segmentation tasks, and in analyzing the effects the institutional acts have on the driver. However, since many of the stops occur at night when the video is often dark, it is not straightforward to extract useful information from it. This is another direction of future work.

11 Conclusion

In this paper, we developed a typology of institutional dialog acts to model the structure of police officer interactions with drivers in traffic stops. It enables a fine-grained and contextualized analysis of dialog structure that generic dialog acts fail to provide. We built supervised taggers for detecting these institutional dialog acts from interactions captured on police body-worn cameras, achieving around 78% F-score at the turn level and 89% F-score at the stop level. Our tagger detects institutional acts at the stop level directly from raw body-camera audio with 81% F-score, with even higher accuracy on important acts like giving the reason for a stop. Finally, we use our institutional act tagger on one month's worth of stops to extract insights about the frequency and order in which these acts occur.

The strains on police-community relations in the U.S. make it ever more important to develop insights into how conversations between police and community members are shaped. Until now, we have not had a reliable way to understand the dynamics of these stops. In this paper, we present a novel way to look at these conversations and gain actionable insights into their structure. Being able to automatically extract this information directly from raw body-worn camera footage holds immense potential not only for police departments, but also for policy makers and the general public alike to understand and improve this ubiquitous institutional practice.

The core contribution of this paper is a technical one of detecting institutional acts in the domain of traffic stops, from text and from unstructured audio files extracted from raw body-worn camera footage. Current work aims to improve the performance of the segmentation and diarization components, with the hope of reducing some of the performance gap with our system run on gold transcripts. We also plan to extend the preliminary analyses we describe in Section 9, for instance, studying how the different conversational paths and the presence or absence of certain acts (such as greetings or reason) shape the rest of the conversation, including how they change the community member's language use. Finally, our model allows us to study whether police training has an effect on the kinds of conversations that police officers have with the communities they serve.
Acknowledgments

We thank the anonymous reviewers as well as the action editor, Jordan Boyd-Graber, for helpful feedback on an earlier draft of this paper. This research was supported by a John D. and Catherine T. MacArthur Foundation award granted to J. L. Eberhardt and D. Jurafsky, as well as NSF grants IIS-1514268 and IIS-1159679. We also thank the City of Oakland and the Oakland Police Department for their support and cooperation in this project.

References

Tim Althoff, Kevin Clark, and Jure Leskovec. 2016. Large-scale analysis of counseling conversations: An application of natural language processing to mental health. Transactions of the Association for Computational Linguistics, 4:463–476.

Jeremy Ang, Yang Liu, and Elizabeth Shriberg. 2005. Automatic dialog act segmentation and classification in multiparty meetings. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 1061–1064. IEEE.

J. Maxwell Atkinson and Paul Drew. 1979. Order in Court. Springer.

John Langshaw Austin. 1975. How To Do Things With Words. Oxford University Press.

Srinivas Bangalore, Giuseppe Di Fabbrizio, and Amanda Stent. 2006. Learning the structure of task-driven human-human dialogs. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 201–208. Association for Computational Linguistics.

David M. Blei and Pedro J. Moreno. 2001. Topic segmentation with an aspect hidden Markov model. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 343–348. ACM.

Rod K. Brunson. 2007. "Police Don't Like Black People": African-American Young Men's Accumulated Police Experiences. Criminology & Public Policy, 6(1):71–101.

Muthu Kumar Chandrasekaran, Carrie Epp, Min-Yen Kan, and Diane Litman. 2017. Using discourse signals for robust instructor intervention prediction. In Proceedings of the AAAI Conference on Artificial Intelligence.

Christopher Cieri, David Miller, and Kevin Walker. 2004. The Fisher corpus: A resource for the next generations of speech-to-text. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). European Language Resources Association (ELRA).

William W. Cohen, Vitor R. Carvalho, and Tom M. Mitchell. 2004. Learning to classify email into "speech acts". In Proceedings of the Conference on Empirical Methods in Natural Language Processing, volume 4, pages 309–316. Association for Computational Linguistics.

Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20(3):273–297.

Justine Coupland, Nikolas Coupland, and Howard Giles. 1991. Accommodation theory, communication, context and consequences. Contexts of Accommodation, pages 1–68.

Jacob Eisenstein and Regina Barzilay. 2008. Bayesian unsupervised topic segmentation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 334–343. Association for Computational Linguistics.

Robin S. Engel. 2005. Citizens' perceptions of distributive and procedural injustice during traffic stops with police. Journal of Research in Crime and Delinquency, 42(4):445–481.

Charles R. Epp, Steven Maynard-Moody, and Donald P. Haider-Markel. 2014. Pulled Over: How Police Stops Define Race and Citizenship. University of Chicago Press.

Mark J. F. Gales. 1998. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech & Language, 12(2):75–98.

Felix A. Gers, Nicol N. Schraudolph, and Jürgen Schmidhuber. 2002. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 3(Aug):115–143.

Pierre Geurts, Damien Ernst, and Louis Wehenkel. 2006. Extremely randomized trees. Machine Learning, 63(1):3–42.

Howard Giles, Jennifer Fortman, René Dailey, Valerie Barker, Christopher Hajek, Michelle Chernikoff Anderson, and Nicholas O. Rule. 2006. Communication accommodation: Law enforcement and the public. Applied Interpersonal Communication Matters: Family, Health, and Community Relations, 5:241–269.

Howard Giles, Christopher Hajek, Valerie Barker, Mei-Chen Lin, Yan Bing Zhang, Mary Lee Hummert, and Michelle C. Anderson. 2007. Accommodation and institutional talk: Communicative dimensions of police-civilian interactions. In Language, Discourse and Social Psychology, pages 131–159. Springer.

Augusto Gnisci. 2005. Sequential strategies of accommodation: A new method in courtroom. British Journal of Social Psychology, 44(4):621–643.
John J. Godfrey and Edward Holliman. 1997. Switchboard-1 release 2. Linguistic Data Consortium, Philadelphia, 926:927.

Alex Graves and Navdeep Jaitly. 2014. Towards end-to-end speech recognition with recurrent neural networks. In International Conference on Machine Learning, pages 1764–1772.

Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6645–6649. IEEE.

Barbara J. Grosz. 1977. The representation and use of focus in dialogue understanding. Technical report, SRI International Menlo Park United States.

John Heritage. 2005. Conversation analysis and institutional talk. Handbook of Language and Social Interaction, pages 103–147.

Daniel J. Hopkins and Gary King. 2010. A method of automated nonparametric content analysis for social science. American Journal of Political Science, 54(1):229–247.

Daniel Jurafsky, Rebecca Bates, Noah Coccaro, Rachel Martin, Marie Meteer, Klaus Ries, Elizabeth Shriberg, Andreas Stolcke, Paul Taylor, and Carol Van Ess-Dykema. 1997. Automatic detection of discourse structure for speech recognition and understanding. In Proceedings of the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding, pages 88–95. IEEE.

Hamed Khanpour, Nishitha Guntakandla, and Rodney Nielsen. 2016. Dialogue act classification in domain-independent conversations using a deep recurrent neural network. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, pages 2012–2021.

Su Nam Kim, Lawrence Cavedon, and Timothy Baldwin. 2010. Classifying dialogue acts in one-on-one live chats. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 862–871. Association for Computational Linguistics.

Su Nam Kim, Lawrence Cavedon, and Timothy Baldwin. 2012. Classifying dialogue acts in multi-party live chats. In Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, pages 463–472.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751. Association for Computational Linguistics.

Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for M-gram language modeling. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 181–184. IEEE.

Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. 2015. Audio augmentation for speech recognition. In Proceedings of Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 3586–3589.

Lynn Langton and Matthew R. Durose. 2013. Police behavior during traffic and street stops, 2011. US Department of Justice, Office of Justice Programs, Bureau of Justice Statistics, Washington, DC.

Wei Li and Yunfang Wu. 2016. Multi-level gated recurrent neural network for dialog act classification. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, pages 1970–1979.

Yang Liu, Kun Han, Zhao Tan, and Yun Lei. 2017. Using context information for dialog act classification in DNN framework. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2170–2178.

Richard J. Lundman and Robert L. Kaufman. 2003. Driving while black: Effects of race, ethnicity, and gender on citizen self-reports of traffic stops and police actions. Criminology, 41(1):195–220.

Abdel-rahman Mohamed, Frank Seide, Dong Yu, Jasha Droppo, Andreas Stolcke, Geoffrey Zweig, and Gerald Penn. 2015. Deep bi-directional recurrent networks over spectral windows. In Proceedings of 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 78–83. IEEE.

Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. 2012. SITS: A hierarchical nonparametric model using speaker identity for topic segmentation in multiparty conversations. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pages 78–87. Association for Computational Linguistics.

Adinoyi Omuya, Vinodkumar Prabhakaran, and Owen Rambow. 2013. Improving the quality of minority class identification in dialog act tagging. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 802–807.

Michael J. Paul. 2012. Mixed membership Markov models for unsupervised conversation modeling. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 94–104. Association for Computational Linguistics.
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlíček, Yanmin Qian, Petr Schwarz, Jan Silovský, Georg Stemmer, and Karel Veselý. 2011. The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society.

Vinodkumar Prabhakaran, Owen Rambow, and Mona Diab. 2012. Predicting overt display of power in written dialogs. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 518–522. Association for Computational Linguistics.

Vinodkumar Prabhakaran, Emily E. Reid, and Owen Rambow. 2014. Gender and power: How gender and gender environment affect manifestations of power. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1965–1976. Association for Computational Linguistics.

Charles H. Ramsey and Laurie O. Robinson. 2015. Final report of the President's task force on 21st century policing. Washington, DC: Office of Community Oriented Policing Services.

Alan Ritter, Colin Cherry, and Bill Dolan. 2010. Unsupervised modeling of Twitter conversations. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 172–180. Association for Computational Linguistics.

Harvey Sacks, Emanuel A. Schegloff, and Gail Jefferson. 1974. A simplest systematics for the organization of turn-taking for conversation. Language, pages 696–735.

Haşim Sak, Andrew Senior, and Françoise Beaufays. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Fifteenth Annual Conference of the International Speech Communication Association.

Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. 2007. Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning, pages 791–798. ACM.

Emanuel A. Schegloff and Harvey Sacks. 1973. Opening up closings. Semiotica, 8(4):289–327.

Emanuel A. Schegloff. 1979. Identification and recognition in telephone conversation openings. Everyday Language: Studies in Ethnomethodology, New York, Irvington, pages 23–78.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Andreas Stolcke, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, and Marie Meteer. 2006. Dialogue act modeling for automatic tagging and recognition of conversational speech. Dialogue, 26(3).

Prateek Verma, T. P. Vinutha, Parthe Pandit, and Preeti Rao. 2015. Structural segmentation of Hindustani concert audio with posterior features. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 136–140. IEEE.

Karel Veselý, Mirko Hannemann, and Lukas Burget. 2013. Semi-supervised training of deep neural networks. In Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE.

Rob Voigt, Nicholas P. Camp, Vinodkumar Prabhakaran, William L. Hamilton, Rebecca C. Hetey, Camilla M. Griffiths, David Jurgens, Dan Jurafsky, and Jennifer L. Eberhardt. 2017. Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences, 114(25):6521–6526.

Marilyn R. Whalen and Don H. Zimmerman. 1987. Sequential and institutional contexts in calls for help. Social Psychology Quarterly, pages 172–185.

Gethin Williams and Daniel P. W. Ellis. 1999. Speech/music discrimination based on posterior probability features. In Proceedings of the Sixth European Conference on Speech Communication and Technology.

Diyi Yang, Tanmay Sinha, David Adamson, and Carolyn P. Rose. 2013. Turn on, tune in, drop out: Anticipating student dropouts in massive open online courses. In Proceedings of the 2013 NIPS Data-Driven Education Workshop, volume 10, pages 13–20.