Transactions of the Association for Computational Linguistics, vol. 6, pp. 467–481, 2018. Action Editor: Jordan Boyd-Graber.
Submission batch: 11/2017; Revision batch: 2/2018; Published 7/2018.
© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Detecting Institutional Dialog Acts in Police Traffic Stops

Vinodkumar Prabhakaran (Stanford Univ., CA), Camilla Griffiths (Stanford Univ., CA), Hang Su (UC Berkeley, CA), Prateek Verma (Stanford Univ., CA), Nelson Morgan (ICSI Berkeley, CA), Jennifer L. Eberhardt (Stanford Univ., CA), Dan Jurafsky (Stanford Univ., CA)
{vinodkpg,camillag,jleberhardt,jurafsky}@stanford.edu, {suhang3240,prateek119}@gmail.com, morgan@uprise.org

Abstract

We apply computational dialog methods to police body-worn camera footage to model conversations between police officers and community members in traffic stops. Relying on the theory of institutional talk, we develop a labeling scheme for police speech during traffic stops, and a tagger to detect institutional dialog acts (Reasons, Searches, Offering Help) from transcribed text at the turn (78% F-score) and stop (89% F-score) level. We then develop speech recognition and segmentation algorithms to detect these acts at the stop level from raw camera audio (81% F-score, with even higher accuracy for crucial acts like conveying the reason for the stop). We demonstrate that the dialog structures produced by our tagger could reveal whether officers follow law enforcement norms like introducing themselves, explaining the reason for the stop, and asking permission for searches. This work may therefore inform and aid efforts to ensure the procedural justice of police-community interactions.

1 Introduction

Improving the relationship between police officers and the communities they serve is a critical societal goal. We propose to study this relationship by applying NLP techniques to conversations between officers and community members in traffic stops. Traffic stops are one of the most common forms of police contact with community members, with 10% of U.S. adults pulled over every year (Langton and Durose, 2013). Yet past research on what people experience during these traffic stops has mainly been limited to self-reported behavior and post-hoc narratives (Lundman and Kaufman, 2003; Engel, 2005; Brunson, 2007; Epp et al., 2014).

The rapid adoption of body-worn cameras by police departments in the U.S. (laws in 60% of states in the U.S. encourage the use of body cameras) and across the world has provided unprecedented insight into traffic stops.1 While footage from these cameras is used as evidence in contentious cases, the unstructured nature and immense volume of video data means that most of this footage is untapped.

Recent work by Voigt et al. (2017) demonstrated that body-worn camera footage could be used not just as evidence in court, but as data. They developed algorithms to automatically detect the degree of respect that officers communicated to drivers in close to 1,000 routine traffic stops captured on camera. It was the first study to use machine learning techniques to extract insights from this footage.

This footage can be further used to unearth the structure of police-community interactions and gain a more comprehensive picture of the traffic stop as an everyday institutional practice. For instance, knowing which requests the officer makes, whether and when they introduce themselves or explain the reason for the stop is a novel way to measure procedural justice, a set of fairness principles recommended by the President's Task Force on 21st Century Policing,2 and endorsed by police departments across the U.S.

1 https://en.wikipedia.org/wiki/Body_worn_video_(police_equipment)
2 http://www.theiacp.org/TaskForceReport


We propose automatically extracting dialog structure from body camera footage to contribute to our understanding of police-community interactions. We rely on the notion of institutional talk (Heritage, 2005), which posits that dialog acts, topics, and narrative are heavily defined by the institutional context. Traffic stops are a kind of institutional talk; as are, for example, doctor-patient interactions, counseling conversations, and citizen calls for help from police. We introduce a model of institutional acts for traffic stop conversations. Since the officer holds a position of power within this institutional context, their dialog behavior has a greater influence in shaping the conversation (Coupland et al., 1991; Gnisci, 2005); hence, we focus on the institutional acts performed by the officer in this paper.

Contributions of our paper: 1) A typology of institutional dialog acts to model the structure of police-driver interactions during traffic stops. 2) An institutional act tagger that works from transcribed words (78% F-score) or from raw audio (60% F-score). 3) A classifier that uses this dialog structure to detect acts at the stop level (e.g., "Does this stop contain a Reason?") (81% F-score from raw audio). 4) An analysis of salient dialog structure patterns in traffic stops, demonstrating its potential as a tool for police departments to assess and improve police-community interactions.

2 Background

Computational work on human-human conversation has long focused on dialog structure, beginning with the influential work of Grosz showing the homology between dialog and task structure (Grosz, 1977). Recent work has integrated speech act theory (Austin, 1975) and conversational analysis (Schegloff and Sacks, 1973; Sacks et al., 1974; Schegloff, 1979) into models of dialog acts for domains like meetings (Ang et al., 2005), telephone calls (Stolcke et al., 2006), emails (Cohen et al., 2004), chats (Kim et al., 2010), and Twitter (Ritter et al., 2010).

Our models extend this work by drawing on the notion of institutional talk (Atkinson and Drew, 1979), an application of conversational analysis to environments in which the goals of participants are institution-specific. Actions, their sequences, and interpretations during institutional talk depend not only on the speaker (as speech act theory suggests) or the dialog (as conversational analysts argue), but they are inherently tied to the institutional context.

Institutional talk has been used as a tool to understand the work of social institutions. For example, Whalen and Zimmerman (1987) studied dialog structure in transcripts of citizen calls for help. They observed that the "regular, repetitive and reproducible features of calls for police, fire or paramedic services […] arise from situated practices responsive to the sequential and institutional contexts of this type of call". Such recurring patterns in language and conversation exist across different institutional contexts such as doctor-patient interactions, psychological counseling, sales calls, courtroom conversations, as well as traffic stops (Heritage, 2005).

Deviations from these sequential configurations are consequential. A police officer failing to explain the reason for the traffic stop can lead to aggravation in the driver (Giles et al., 2007), and an officer's perceived communication skills (e.g. do they listen, take civilian views into account) predict civilians' attitudes towards the police (Giles et al., 2006).

These findings demonstrate the importance of understanding the role of institutional context in shaping conversation structure. In doing so, our paper also draws on recent research on automatically extracting structure from human-human dialog. Drawing on Grosz's original insights, Bangalore et al. (2006) show how to extract a hierarchical task structure for catalog ordering dialogs with subtasks like opening, contact-information, order-item, related-offers, and summary. Prabhakaran et al. (2012) and Prabhakaran et al. (2014) employ dialog act analysis to study correlates of gender and power in work emails, while Althoff et al. (2016) studied structural aspects of successful counseling conversations, and Yang et al. (2013) and Chandrasekaran et al. (2017) investigated structures in online classroom conversations that predict success or need for intervention. Our work also draws on an important line of unsupervised work that models topical structure of conversations (Blei and Moreno, 2001; Eisenstein and Barzilay, 2008; Paul, 2012; Nguyen et al., 2012).

Our work is closely related to the active line of research in NLP on dialog act classification. Recently, recurrent neural network-based dialog act taggers, e.g., Khanpour et al. (2016), Li and Wu (2016) and


Liu et al. (2017), have posted state-of-the-art performance on benchmark datasets such as the Switchboard corpus (Jurafsky et al., 1997) and MRDA (Ang et al., 2005). Since these corpora come from significantly different domains (telephone conversations and meeting transcripts, respectively) than ours, and since we are interested specifically in the institutional acts (e.g., did the officer request documentation from the driver?) rather than the general dialog acts (did the officer issue a request?), these taggers do not directly serve our purpose. Furthermore, our data is an order of magnitude smaller (around 7K sentences) than these corpora, making it infeasible to train in-domain recurrent networks.

Prior to neural network approaches, support vector machines and conditional random fields (Cohen et al., 2004; Kim et al., 2010; Kim et al., 2012; Omuya et al., 2013) were the state-of-the-art algorithms on this task. These approaches also incorporated contextual and structural information into the classifier. For instance, Kim et al. (2012) used lexical information from previous utterances in predicting the dialog act of a current utterance, and Omuya et al. (2013) uses features such as the relative position of an utterance w.r.t. the whole dialog. We draw from this line of work; we also experiment with positional and contextual features in addition to lexical features. Furthermore, we use features that capture the institutional context of the conversation.

3 Institutional Dialog Acts of Traffic Stops

We begin with a framework for analyzing the structure of interactions in this important but understudied domain of traffic stop conversations, developed by applying a data-oriented approach to body camera footage. Our goal is to create a framework that can be a tool for police departments, policymakers, and the general public to understand, assess and improve policing practices.

3.1 Data

We use the Voigt et al. (2017) dataset of body camera audio from 981 vehicle stops conducted by the Oakland Police Department during the month of April 2014. This amounts to 35 hours of speech, hand-transcribed to 94K speaker turns and 757K words.

Officer: Sir, hello, my name's Officer [NAME] of the Oakland Police Department. [GREETING]
Driver: Hi.
Officer: The reason why I pulled you over is when you passed me back there you were texting or talking on your cell phone. [REASON]
Driver: I was looking at a text, yes.
Officer: Okay. Do you have um, what year is the car you're driving? [DETAILS]
Driver: It's a 2010.
Officer: 2010. Do you still live in [ADDRESS]? [DETAILS]
Driver: Yes.
[…]
Officer: All right, sir. This is a citation for having your cell phone in your hand while you're driving. […] You actually have two months on or before June 7th to take care of the citation, okay? Please drive carefully. [SANCTION; POSITIVE CLOSING]
Driver: Okay.
Officer: Thank you.

Table 1: Excerpt from a traffic stop conversation with institutional acts in [blue] (names/addresses redacted).

3.2 Traffic Stops as Institutional Talk

Traffic stops possess all three characteristics of institutional talk (Heritage, 2005): i) participants' goals are tied to their institution-relevant identity (e.g. officer & driver); ii) there are special constraints on what is allowable within the interaction; iii) there are special inferences that are particular to the context. Table 1 presents an excerpt from a traffic stop conversation from our corpus: The officer greets the community member, gives the reason for the stop, asks about personal details, issues the sanction, and closes by encouraging safe driving. We are interested in such recurring sequences of institution-specific dialog acts, or institutional acts, which combine aspects of dialog acts and those of topical segments, all conditioned by the institutional context.

3.3 Developing the Typology

To develop the taxonomy of institutional dialog acts, we begin with a data-oriented exploration: identifying recurring sequences of topic segments using the (unsupervised) mixed membership Markov model (Paul, 2012).3 Figure 1 shows the topic segments assigned by a 10-topic model on the traffic stop of Table 1. The model identified different spans of conversation.

3 We trained the model on a subset of 541 stop transcripts from our data, exploring different numbers of topics.


Figure 1: Topic assignments from Mixed Membership Markov Modeling (Paul, 2012) on a sample stop (turns go from top to bottom; x-axis shows probabilities assigned to each topic; right are the top topic words). The model identifies the reason for the stop (orange), driver's documents (blue), driver's address and demographics (purple), the sanction (beige) and closing (yellow).

The officer gives the reason for the stop (orange), asks for documents (blue), collects driver information (purple), then in the end, there are spans of issuing a sanction (beige) and closing (yellow). While these topical assignments helpfully suggest a high-level notion of the structure of these conversations, they do not capture the specific acts officers do.

We next turned to the procedural justice literature, which highlights specific acts. For instance, questioning the driver's legitimacy for being somewhere (why are you here?) or driving a car (whose car is it?) are acts that trigger negative reactions in drivers (Epp et al., 2014). On the other hand, officers introducing themselves and explaining the reasons for the stop are important procedural justice facets that communicate fairness and respect (Ramsey and Robinson, 2015).

Informed by the procedural justice literature, the President's Task Force recommendations, and a review of the unsupervised topic segments, two of the authors manually analyzed twenty stop transcripts to identify institutional dialog acts. We focused on acts that tend to recur (e.g. citations), and those with procedural justice interest (e.g. reasons, introductions), teasing apart acts with similar goals but different illocutionary force (explicitly stating vs. implying the reason for the stop; or requesting to search the vehicle vs. stating that a search was being conducted). This process resulted in an initial coding scheme of twenty-two institutional acts in nine categories. We also observe that the recurring acts by community members were often in response to officers' acts (e.g., responding to demographic questions), as the officer's position of power gives them higher influence in shaping the conversation (Giles et al., 2007). Hence, we focus on officer speech to capture our institutional act annotations.

3.4 Annotating Institutional Acts

From each stop transcript, we selected all officer turns (excluding those directed to the radio dispatcher), and annotated each sentence of each turn. In the first round, three annotators annotated the same 10 stops using the taxonomy and manual developed above with an average pair-wise inter-annotator agreement of κ = 0.79. We discussed the sources of disagreement, ratified the annotations, and updated the annotation manual to clarify act descriptions. During this process, we also updated the annotation manual to include four additional institutional acts, resulting in a set of twenty-five acts in eleven categories. Table 2 presents this final typology, along with actual examples from our data. We then performed two subsequent rounds of three-way parallel annotations, obtaining average pair-wise κ values of 0.84 and 0.88, respectively.

Once we obtained high agreement, we conducted a fourth round where each annotator annotated a separate set of 30 stops. Stops were chosen at random from the entire corpus for each round; however, seven of the previously annotated stops were incorrectly included in the final round of annotations, resulting in a total of 113 annotated stops (7,081 sentences, 4,245 turns). Table 1 shows resulting labels.

4 Learning to Detect Institutional Acts

We now investigate whether we can train a model that can automatically detect the institutional acts during the course of a traffic stop. In Sections 5-7, we present an institutional act tagger, and describe three increasingly difficult evaluation settings:

1. Using manual transcripts: We train and test an institutional act tagger on the manual transcripts. This task is similar to dialog act tagging (e.g., Stolcke et al. (2006)), but it has the important distinction that it needs to capture dialog structure at the intersection of the general dialog acts


(e.g., requests, responses) and the topical structure. Section 5 presents the experiments on building the institutional act tagger for this domain.

2. Using ASR: We develop an automatic speech recognizer that works in our domain, and use the text it generates, instead of manual transcripts, to train and test the model. The downstream institutional act tagging framework stays the same. This setting is not fully automatic, as we still rely on the manually identified segments of audio where officers spoke. Section 6 first presents experiments on building the ASR system for this domain, and then presents results on using ASR-generated text for institutional act tagging.

3. From raw audio: We build automatic means to detect the segments of officers' speech, apply the ASR on those segments, and then use the text thus produced to detect institutional acts, building a fully automatic tagger with no human intervention. Section 7 first describes the experiments on detecting the officers' speech automatically, and then presents results on institutional act tagging in this fully automatic setting.

Event (coarse-grained) | Event (fine-grained) | Count | Example utterances
GREETING | Greeting | 98 | "What's up, y'all?", "How you doing, man?", "Hello."
GREETING | Introduction | 16 | "Hi. I'm officer, Oakland PD"
REASON | Question Awareness | 12 | "You know why I'm pulling you over?"
REASON | Explicit | 127 | "Reason I pulled you over is for a cell phone violation."
REASON | Implicit | 19 | "Didn't see the stop sign?"
DOCUMENTS | Requesting Documents | 252 | "You have your driver's license, registration and insurance?"
DETAILS | Demographics | 71 | "How old are you?", "What's your last name?"
DETAILS | Address | 65 | "What's your address?", "Where do you live at?"
SANCTION | Issuing Citation | 37 | "Okay, as I say, the reason I'm citing you is for failure to yield to oncoming traffic."
SANCTION | Issuing Fix-it Ticket | 31 | "I'll give you a fix-it ticket for the headlight, left front headlight, all right?"
SANCTION | Issuing Warning | 19 | "I'll give you a warning today."
SANCTION | Mention Lenience | 50 | "I'm cutting you guys a break"
POSITIVE CLOSING | Farewell | 86 | "All right. Drive safe", "All right, guys. Take care", "Have a good day."
ORDERS | Hands On Wheel | 9 | "Hey just keep your hands on the steering wheel man"
ORDERS | Turn Car Off | 37 | "Hey, turn the car off"
LEGITIMACY | Vehicle Ownership | 41 | "This your car?"
LEGITIMACY | Questioning Intent | 15 | "What are you doing out here?"
HISTORY | Warrants | 3 | "Do you know you got a little warrant too?"
HISTORY | Probation/Parole | 16 | "You know you're on probation, right?"
HISTORY | Arrests | 4 | "Do you, um, have you ever been arrested?"
OFFER HELP | Giving Voice | 19 | "Do you have any questions?", "You understand?"
OFFER HELP | Offering Help | 5 | "Need any help getting back on the traffic?", "You need directions?"
SEARCH | Request for Search | 3 | "Do you mind if I uh search the car?"
SEARCH | Statement of Search | 7 | "You're on probation so you have a search clause."
SEARCH | Weapons | 15 | "You got nothing on you I need to worry about?", "No weapons, right?"

Table 2: Typology of institutional acts during traffic stops. Column 1 shows the 11-way coarse-grained groupings. Column 2 shows the 25-way fine-grained institutional act labels used for annotations, and Column 3 shows the number of sentences labeled with each act.

For all our experiments, we merge labels from all sentences in each turn, making this a multi-label (instead of multi-class) classification task.4 Only around 7% of the institutional act bearing utterances had multiple acts. Common co-occurrences were GREETING and REASON, and GREETING and ORDERS, e.g., Hey, turn the car off. How you doing?

4 We present turn-level (instead of sentence-level) predictions to facilitate comparisons with experiments presented in Sections 6 & 7; sentence-level experiments were performed using manual transcripts and yielded slightly better numbers.

5 Institutional Act Tagging from Manual Transcripts

We adopt a supervised machine learning approach to the task of institutional act tagging. We draw from prior work in the area of dialog act modeling, while also adding features that specifically capture the institutional context of traffic stop conversations.

5.1 Algorithms

We compared three supervised text classification methods: Support Vector Machine (SVM) (Cortes and Vapnik, 1995) and Extremely Randomized


Trees (ERT) (Geurts et al., 2006),5 which are efficient and tend to work well with smaller datasets like ours, and Convolutional Neural Network (CNN) (Kim, 2014), which captures variable length patterns without feature engineering. For SVM, we use the one-vs-all multi-label algorithm (ERT and CNN inherently deal with multi-label classification) and use the balanced mode to address the skewed label distribution (0.5% to 3.5% positive cases). In the balanced mode, positive and negative examples are balanced at training time. For CNN, we use two convolutional layers of filter sizes 3 and 4 and 20 filters with relu activation and max-pooling with pool size 2. This is followed by two dense layers, and a final layer with sigmoid activation and binary cross entropy loss to handle multi-label classification.

While some prior work in dialog act tagging (e.g., Kim et al. (2010), Kim et al. (2012)) has shown that sequence tagging algorithms such as conditional random fields (CRF) have some advantage over text classification approaches such as SVMs, preliminary experiments using CRFs revealed this to not be the case in our corpus.

5 ERT is a variant of the random forest algorithm, with the difference that the splits at each step are selected at random rather than using a preset criterion.

5.2 Features

Lexical features: We used unigrams and bigrams as indicator features for SVM and ERT. We initialize the input layer of the CNN with word embeddings trained using our entire transcribed dataset.6

Pattern features: We use indicator features for two types of patterns. 1) For each institutional act, we hand-crafted a list of linguistic patterns; e.g., the pattern feature for GREETING included how are you, hello, and good morning, among others. 2) We use a semi-automatically built dictionary of offenses (e.g., taillight) by querying the word embedding model trained on all transcripts with a seed list of offenses, resulting in a large list of offenses and variations of their usage (e.g., break light, rear lite) with high incidence in some acts (e.g., REASON, SANCTION).

6 In preliminary experiments, we found that SVMs using these word embeddings (or GloVe embeddings) performed worse than using ngram features directly.

Structural features: 1) The number of words in the utterance, since some acts (e.g., GREETING) require fewer words than others (e.g., SANCTION). We binned this feature into four bins: <3, 4-10, 11-20, and >20. 2) The position of the utterance within the conversation (e.g., SANCTION is likely to happen late, and GREETING early), binned to one or more of: first five, first quarter, first third, first half, last half, last third, last quarter, and last five.

Other features: We tried other features such as 1) ngrams from previous utterances, 2) ngrams from the driver's responses, 3) dependency parse patterns, 4) word/sentence embeddings, and 5) topic assignments obtained from the mixed membership Markov model (Paul, 2012) discussed in Section 3.3. These features turned out not to be helpful for this task, and we do not include those results here.

Algorithm | P | R | F
Extremely Randomized Trees | 80.9 | 63.6 | 71.2
Conv. Neural Network | 77.4 | 57.3 | 65.8
SVM | 78.9 | 76.2 | 77.5
SVM (-ngrams) | 15.4 | 83.3 | 26.0
SVM (-patterns) | 78.4 | 74.4 | 76.4
SVM (-structure) | 76.3 | 74.2 | 75.3
SVM (-patterns & structure) | 76.3 | 71.9 | 74.0

Table 3: Micro-averaged precision (P), recall (R) and F-score (F) for experiments using manual transcripts.

5.3 Experiments and Results

Table 3 presents micro-averaged (i.e., weighted average of each class) precision, recall and F-measure obtained on 10-fold cross validation.7 While ERT posted the highest precision of 80.9% at a low recall of 63.6%, SVM reported the highest recall of 76.2% without a huge dent in precision. Overall, we obtain the best micro-averaged F-score of 77.5% using SVM. CNN performed worse than both ERT and SVM.8 We also performed an ablation study to see the relative importance of features in the SVM model.

7 CNN: batch size of 10, dropout of 0.3, adam, 10 epochs. SVM: C=1, linear kernel. ERT: 100 estimators, max tree depth 75, # of features capped at 20% of all features. Parameter values obtained using grid-search within the training set for each fold.
8 Since CNN performed much worse than SVM with lexical features alone (last row), presumably because of the small amount of data, we did not perform more CNN experiments.
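The one-vs-all SVM setup of Section 5.1 can be sketched in scikit-learn, with ngram indicator features and balanced class weighting; the toy turns and act labels below are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy officer turns with (possibly multiple) act labels per turn.
turns = ["how are you doing today",
         "reason i pulled you over is the cell phone",
         "do you have your license and registration",
         "hello sir how you doing"]
acts = [{"GREETING"}, {"REASON"}, {"DOCUMENTS"}, {"GREETING"}]

vec = CountVectorizer(ngram_range=(1, 2))   # unigram + bigram indicators
X = vec.fit_transform(turns)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(acts)                 # one binary column per act

# One binary linear SVM per act; class_weight="balanced" reweights the
# rare positive class, as in the paper's "balanced mode".
clf = OneVsRestClassifier(LinearSVC(class_weight="balanced", C=1.0))
clf.fit(X, Y)
```

A turn can then be tagged with `mlb.inverse_transform(clf.predict(vec.transform([...])))`, which returns the set of acts whose per-act SVM fires.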


Figure 2: Top 25 most (by absolute value) weighted features in the GREETING model.

As expected, the ngram features contribute the most; removing them drastically lowered performance. Patterns and structural features had a smaller impact on performance.

We inspected the weights assigned to the features by a model trained on the entire dataset. The models created for each institutional act had at least one pattern or structure feature in the top twenty-five features. Figure 2 shows the feature weights assigned to the model detecting GREETING. The model up-weighted utterances with greeting patterns (GREETINGS), first utterances (FIRST), and utterances in the first quarter (FIRSTQUART), while down-weighting longer utterances (LENGTH11-20) and those that mention lenience (LENIENCE).

6 Institutional Act Tagging using ASR

The institutional act tagger of Section 5 relies on manual transcriptions, making it not scalable to the thousands of traffic stops conducted every month. We now investigate using automatic speech recognition, while assuming manual segmentation, i.e., we know the time segments where an officer spoke to the driver; in the next section we explore the additional task of automatic officer turn detection.

6.1 Data Augmentation

Traffic stops have considerable noise (wind, traffic, horns), overlap, and difficult vocabulary (names, addresses, jargon), making it a challenging domain for off-the-shelf automatic speech recognizers (ASR). However, our 35 hours of transcribed speech is insufficient to train a domain-specific recognizer. We therefore employ two data augmentation techniques.

Data | Recordings | Utterances | Hours
Train | 603+2435 | 407,408 | 494
Dev | 66 | 3,241 | 3.6
Test | 113 | 4,248 | 4.6

Table 4: Data used to build the ASR models.

First, we perturb our data by frame-shifting and filterbank adjustment following the procedure described in Ko et al. (2015). In frame-shifting, we change the starting point of each frame, making features generated from these frames slightly different from the original ones. For filterbank adjustment, we move the locations of the center frequencies of filterbank triangular frequency bins during feature extraction. This method increases our training data 5-fold to 180 hours. Second, we make use of the 300-hour Switchboard telephone speech dataset (Godfrey and Holliman, 1997) to create additional data. We first upsample Switchboard speech to the 16 kHz of our data, and then mix them with noise samples randomly picked from our data where speech is not identified, using a random speech-to-noise ratio between 0 and 10. This method contributes another 300 hours of speech for training.

6.2 Acoustic Modeling

We implemented two acoustic models, a Bi-directional Long Short-Term Memory network (BLSTM) (Graves et al., 2013) and a Deep Neural Net Hidden Markov Model (DNN-HMM) tri-phone baseline. While LSTM-based approaches generally work better, they are much slower to train, so we wanted to know if their word error improvements indeed translated to act tagger improvements.

DNN-HMM system training follows the standard pipeline in the Kaldi toolkit (Povey et al., 2011; Veselý et al., 2013). Frame alignments generated from a traditional Gaussian mixture model based system are used as targets and 40-dimension fMLLR features (Gales, 1998) are used as inputs to the DNN to aid speaker adaptation. The network was trained using Restricted Boltzmann Machine (RBM) based pretraining (Salakhutdinov et al., 2007) and then discriminatively trained using stochastic gradient descent with cross-entropy as loss function. Veselý et al. (2013) describe more training details.
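The noise-mixing step of Section 6.1 can be sketched as follows; the function name and the exact scaling are ours, and treating the speech-to-noise ratio as dB is our assumption (the paper only specifies a random ratio between 0 and 10):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Additively mix noise into speech at a target speech-to-noise ratio (dB)."""
    # Tile or trim the noise sample to the length of the speech.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that 10 * log10(p_speech / p_scaled_noise) == snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Augment each upsampled utterance with a random ratio in [0, 10].
rng = np.random.default_rng(0)
snr = rng.uniform(0, 10)
```

In the paper's pipeline, `speech` would be an upsampled Switchboard utterance and `noise` a non-speech region sampled from the body camera audio.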


We trained the BLSTM using the recipe proposed by Mohamed et al. (2015). The BLSTM is used to model short segments of speech (with a sliding window of 40 frames), and predict frame-level HMM states at each time frame.9 We use 6 hidden layers and 512 LSTM cells in each direction. Dropout (Srivastava et al., 2014), peephole connections (Gers et al., 2002) and gradient clipping are adopted to stabilize training (Sak et al., 2014). As in DNN-HMM training, fMLLR features and frame alignments are used as inputs and targets respectively.

For decoding, frame posteriors from the acoustic model are fed into a weighted finite state transducer with HMMs, context-dependent tri-phone models, a lexicon,10 and a 3-gram language model with Kneser-Ney smoothing (Kneser and Ney, 1995).

9 Note that this recipe is different from the end-to-end approach where the LSTM model takes in the whole utterance and predicts phone/word outputs directly (Graves and Jaitly, 2014).
10 The CMU dictionary (CMUdict v0.7a) is used.

6.3 Language Model Data Augmentation

To mitigate language model data scarcity, we use transcriptions from the Switchboard and Fisher (Cieri et al., 2004) corpora, adding about 3.12M and 21.1M words, respectively. Separate language models are trained on these datasets, and then interpolated with the traffic stop language model; interpolation weights were chosen by minimizing perplexity on a separate Dev set. Table 5 shows the perplexities of different language models on this Dev set.

Data | Perplexity
Traffic stops | 79.4
+ Switchboard | 75.9
+ Fisher | 74.3

Table 5: Language model perplexity on the Dev set.

6.4 Evaluating ASR Models

Table 4 shows statistics of the data used to build the ASR system. We kept aside the 113 institutional act annotated stops from Section 3 as the Test set. The remaining 669 stops were divided 9:1 into Train and Dev sets. The Train set also includes the 2,435 recordings from the Switchboard corpora.

Model | Dev | Test
DNN | 57.0 | 48.5
BLSTM | 49.7 | 45.0
BLSTM (-data augmentation) | 56.9 | 51.4
BLSTM (-LM interpolation) | 50.2 | 45.7

Table 6: Word error rate for different ASR models.

Table 6 shows word error rates under different settings. Overall, we obtain relatively high error rates, largely due to the noisy environment of the audio in this domain. BLSTM performs better than DNN-HMM, consistent with prior research (Mohamed et al., 2015; Sak et al., 2014).11 Interpolating Switchboard and Fisher language models provides a further boost of 0.7 percentage points.

6.5 Institutional Act Tagging Experiments

We now use text generated by ASR to train and test the institutional act tagger of Section 4. To increase recall, we also made use of N-best list output from the ASR systems, collecting ngram and pattern features from the top 10 candidate transcriptions. The L1 penalty in the SVM limits the impact of the resulting noisier ngrams on precision.

ASR Source | 1 Best | 10 Best
DNN | 57.2 | 63.6
BLSTM | 65.0 | 65.3

Table 7: Micro-averaged F-scores on institutional act prediction using different ASR sources.

Table 7 presents micro-averaged F-scores. BLSTM with 10 Best obtained the best F-score of 65.3. While using 10 Best lists only helped marginally for BLSTM, it helped the DNN enough to eliminate most of the gap in performance with BLSTMs. Our results suggest that downstream tasks with efficiency constraints could employ DNNs without a huge dent in performance by making use of N-best or lattice output.

11 Note that our Test set, designed for measuring institutional act detection, consists of only police officers talking close to the camera; hence the word error rate can be lower than the Dev, which is designed to measure overall ASR performance and includes community member speech as well.
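The interpolation-weight search of Section 6.3 can be sketched with toy unigram models; add-one smoothing, the single interpolation weight, and the grid search are our simplifications of what a full n-gram toolkit would do:

```python
import math
from collections import Counter

def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def perplexity(lm, tokens):
    """exp of the average negative log-probability of the tokens."""
    logp = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-logp / len(tokens))

def interpolate(lm_in, lm_out, lam):
    """lam * in-domain + (1 - lam) * out-of-domain probabilities."""
    return {w: lam * lm_in[w] + (1 - lam) * lm_out[w] for w in lm_in}

def best_lambda(lm_in, lm_out, dev_tokens,
                grid=tuple(0.1 * i for i in range(1, 10))):
    """Pick the interpolation weight that minimizes Dev-set perplexity."""
    return min(grid, key=lambda l: perplexity(interpolate(lm_in, lm_out, l),
                                              dev_tokens))
```

With an in-domain model trained on traffic stop transcripts and out-of-domain models from Switchboard/Fisher, the weight that minimizes Dev perplexity is the one used for decoding.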


7 Institutional Act Tagging from Raw Audio

We now turn to the task of detecting institutional acts directly from raw body camera audio. This requires detecting spans with speech activity and distinguishing them from noise (voice activity detection) and identifying segments spoken by the police officers.

7.1 Finding Officer Speech Segments

Our goal is to find regions of the audio with a high probability of being officer speech. We could not build a standard supervised officer-versus-other classifier, because the stops contain large untranscribed regions of officer speech (we did not transcribe segments where the officer was, for example, talking to the dispatcher in the car). We therefore instead built a two-output classifier to discriminate between officer and community member speech, and used a tuned threshold (0.55) on the posterior probability of officer as our voice activity detector, drawing on the intuitions of Williams and Ellis (1999) and Verma et al. (2015), who found that posterior features on speech tasks also improved speech/non-speech performance. Our model is a 3-layer fully connected neural network with 1024 neurons trained with cross entropy loss.12 Figure 3 sketches the architecture. We run the classifier on each .5 second span (recall = .97 and precision = .90 on the Dev set of Table 4), and then merge classifications to a single turn if adjacent spans are classified as officer speech, with a 500 ms lenience for pauses.

12 Patch of 210 ms with a stride of 50 ms. Audio was downsampled to 16 kHz, and converted to a 21-dimensional magnitude mel-filterbank representation covering frequencies from 0-8 kHz. FFT size was 512 with 10 ms hop and 30 ms frame size.

Figure 3: Detecting Officer Speech segments.

7.2 Institutional Act Tagging Experiments

We now present experiments using the automatically identified officer speech segments. At training time, we use the ASR-generated text using gold segments; at test time, we use the same ASR model to generate text for the predicted segments. Since the predicted segments do not exactly match gold segments, we use a fuzzy-matching approach for evaluation. If a gold segment contains an act and an overlapping predicted segment has the same act, we consider it a true positive. If a gold segment contains an act, but none of the overlapping predicted segments have that act, it is counted as a false negative. If an act is identified in one of the predicted segments, without any of the overlapping gold segments having it, then we consider it a false positive.

ASR Source | 1 Best | 10 Best
DNN | 43.7 | 56.0
BLSTM | 53.8 | 59.8

Table 8: Micro-averaged F-scores on institutional act prediction from raw audio using different ASR sources.

Table 8 presents results using this evaluation scheme. Again, BLSTM using the 10 Best strategy obtained the best F-score of 59.8%. Both BLSTM and DNN benefited significantly from using the 10 Best likely predictions. As in the ASR experiments, the DNN substantially closes the gap in performance by using the 10 Best strategy.

8 Stop Level Act Detection

Our three previous sets of models focused on labeling each officer turn with one or more institutional acts. For many purposes, it suffices to ask a far simpler question: does an act occur somewhere in the traffic stop? From a procedural justice standpoint, for example, we want to know whether the officer explained the reason for the stop; we may not care about the turn in which the reason occurred. We call this task stop-level act detection, in which each stop is labeled as a positive instance of an act if that particular act occurred in it in the gold labels.
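The fuzzy-matching evaluation of Section 7.2 can be sketched as follows; representing segments as `(start, end, set_of_acts)` triples is our assumption:

```python
def overlaps(a, b):
    """True if time segments a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def fuzzy_counts(gold, pred, act):
    """Count (TP, FN, FP) for one act under the overlap-based matching scheme.

    gold, pred: lists of (start, end, set_of_acts) segments.
    """
    tp = fn = fp = 0
    # A gold segment bearing the act is a TP if any overlapping predicted
    # segment carries the same act, else an FN.
    for gs, ge, gacts in gold:
        if act not in gacts:
            continue
        if any(act in pacts and overlaps((gs, ge), (ps, pe))
               for ps, pe, pacts in pred):
            tp += 1
        else:
            fn += 1
    # A predicted segment bearing the act with no overlapping gold segment
    # carrying it is an FP.
    for ps, pe, pacts in pred:
        if act in pacts and not any(
                act in gacts and overlaps((ps, pe), (gs, ge))
                for gs, ge, gacts in gold):
            fp += 1
    return tp, fn, fp
```

Precision, recall, and F-score per act then follow from the three counts in the usual way.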

                           Manual Transcripts        ASR Transcripts           Raw Audio
Event              Count   Prec.  Rec.   F-meas.     Prec.  Rec.   F-meas.     Prec.  Rec.   F-meas.
GREETING             80    92.3   90.0   91.1        84.5   88.8   86.6        70.2   91.3   79.4
REASON               96    94.7   93.8   94.2        94.3   86.5   90.2        96.4   84.4   90.0
DOCUMENTS           100    97.0   97.0   97.0        95.9   93.0   94.4        96.8   92.0   94.4
DETAILS              56    86.2   89.3   87.7        68.8   78.6   73.3        66.1   66.1   66.1
SANCTION             79    94.1   81.0   87.1        84.2   81.0   82.6        90.3   82.3   86.1
POSITIVE CLOSING     71    91.2   87.3   89.2        84.4   76.1   80.0        90.6   67.6   77.4
ORDERS               32    87.1   84.4   85.7        90.3   87.5   88.9        96.6   87.5   91.8
LEGITIMACY           41    78.4   70.7   74.4        89.7   63.4   74.3        85.7   29.3   43.6
HISTORY              11    77.8   63.6   70.0        75.0   54.6   63.2        71.4   45.5   55.6
OFFER HELP           18    71.4   83.3   76.9        82.4   77.8   80.0        82.4   77.8   80.0
SEARCH               10    70.0   70.0   70.0        66.7   20.0   30.8        60.0   30.0   40.0
Micro Average (Weighted)   90.4   87.5   89.0        86.5   81.7   84.0        85.5   77.1   81.1
Macro Average (Unweighted) 85.5   82.8   83.9        83.3   73.4   76.8        82.4   68.5   73.1

Table 9: Stop-level institutional act presence detection results (for each label).

Our algorithm is simple: run our best turn-based act tagger, and if the tagger labels an institutional act anywhere in the conversation, tag the conversation as having that class.[13] We explore all three settings: manual segments and transcripts, manual segments with ASR, and automatic segments with ASR.

[13] We use the best system from each set of experiments: the SVM model using ngrams, patterns, and structure features trained on manual transcripts or on the BLSTM ASR output.

We compare our results with a dialog-structure-ignorant lexical baseline: simply merge all text features (ngrams and patterns) from all the officer turns in a stop and use them to classify whether the stop did or didn't contain an act. Our goal here is to see whether dialog structure is useful for this task; if so, the tagger based on dialog turns should outperform the global text classifier.

                                  P      R      F
Manual (Lexical baseline)        79.6   77.6   78.6
Manual (Our Tagger)              90.4   87.5   89.0
ASR (Lexical baseline)           78.0   75.6   76.8
ASR (Our Tagger)                 86.5   81.7   84.0
Raw Audio (Lexical baseline)     79.6   71.4   75.2
Raw Audio (Our Tagger)           85.5   77.1   81.1

Table 10: Stop-level institutional act detection using our tagger, compared to a lexical baseline model trained on all the words spoken by the officer, without accounting for the dialog structure.

Table 10 shows that using the output of the turn-based classifier to do stop classification offers a huge advantage over the structure-ignorant baseline, reducing F-score error by 49% when using manual transcripts, and by 22% when applied to raw audio.

Text source             Manual   ASR      ASR
Segmentation source     Manual   Manual   Auto
Turn level              77.5     65.3     59.8
Stop level              89.0     84.0     81.1

Table 11: Summary: Micro-averaged F-scores across different text/segmentation sources.

Table 9 and Table 11 summarize the different experiments presented in Sections 4-8. Table 9 breaks down performance for each of the 11 acts, while Table 11 compares turn-level to stop-level results.

Despite our relatively small training resources (113 stops with dialog act labels, ASR and segmentation training data from one month), performance at the stop level directly from raw audio is surprisingly high. For instance, in detecting whether or not the community member was told the reason they were stopped, an important question for procedural justice, we obtained around 96% precision with 84% recall from raw camera audio.

9 Conversation Trajectories

The institutional acts that happen during a traffic stop, when they occur, and in what order are all of importance to police departments. For instance, the President's Task Force on 21st Century Policing recommends (and some departments require) that officers identify themselves and state the reason for the stop as an important aspect of fairness. However,

police departments currently have no way of easily measuring how consistently such policies are carried out during traffic stops. They also have no way to test the effectiveness of any training programs or policy updates that are meant to affect these conversations.

Figure 4: Prototypical conversation structure of traffic stops; transition probabilities based on 900 stops from April 2014.

Figure 5: Presence of institutional acts in the 900 stops of black or white drivers from the month of April 2014.

In this section, we demonstrate that our institutional act tagger provides an efficient and reliable tool for departments to detect and monitor conversational patterns during traffic stops. Specifically, we focus on conversational openings, a fundamental aspect of conversations (Schegloff and Sacks, 1973) that is also important for procedural justice (Whalen and Zimmerman, 1987; Ramsey and Robinson, 2015). For instance, do officers start the conversations with a greeting? Are the drivers told the reason why they were stopped? Was the reason given before or after asking for their documentation?

We first apply our high-performance (78% F-score at the turn level; 89% at the stop level) tagging model to manual transcripts. Figure 5 shows the percentage of stops in which each of the eleven institutional acts was present. Around 17% of stops did not provide a reason at all. Only 69% of the stops started with a greeting, and an even smaller percentage of stops ended with a positive closing. While these high-level statistics provide a window into these conversations, our institutional act tagger allows us to gain deeper perspectives.

Using the turn-level tags assigned by our system, we calculate the transition probabilities between dialog acts. Figure 4 shows a traffic stop 'narrative schema' or script, extracted from the high-probability transitions. Variations from this prototypical script can be a useful tool for police departments to study how police-community interactions differ across squads, city locations, or driver characteristics like race.

Figure 6, for example, shows different conversational paths that officers take before explaining the reason for the stop. In over a quarter of the stops, either the reason is not given, or it is given after issuing orders or requesting documents. These violations of policing recommendations or requirements can impact the drivers' attitudes and perception of the legitimacy of the institution.

Figure 6: Conversational paths to giving the reason.

10 Discussion

In this section, we outline some of the limitations of our work and discuss future directions of research. First, our work is based on data from a single police department (the Oakland Police Department in the State of California) in the U.S. The schema we developed may need to be updated to be applicable to other police departments, especially those in other countries, where the laws, policies, and culture around policing may be significantly different.
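The transition probabilities behind the narrative schema of Section 9 can be estimated by simple bigram counting over the per-stop sequences of turn-level act tags. A minimal sketch, where the list-of-sequences input format is an assumption rather than the paper's actual representation:

```python
from collections import Counter, defaultdict

def transition_probs(stops):
    """stops: one chronological sequence of institutional acts per stop,
    e.g. ["GREETING", "REASON", "DOCUMENTS"].  Returns, for each act,
    the probability distribution over the act that follows it."""
    counts = defaultdict(Counter)
    for seq in stops:
        for cur, nxt in zip(seq, seq[1:]):   # adjacent act pairs
            counts[cur][nxt] += 1
    return {
        act: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
        for act, nxts in counts.items()
    }
```

A schema such as Figure 4 can then be read off by keeping only the high-probability transitions.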

Due to the sensitive nature of the data, we will not be able to publicly release the raw annotations described in Section 3.4. However, we will release the labeling scheme for institutional acts in traffic stops, along with the annotation manual. We believe that it will serve as a starting point for future researchers working in this domain.

Like any data-oriented approach, our machine learning models may have captured the idiosyncrasies of the particular department represented in our dataset. Since we are not aware of any other police departments' body-worn camera footage that is available for research, we have no way to guarantee that our models are directly applicable to other police departments' data.

Our institutional act tagger enables us to perform large-scale social science analyses controlling for various confounds, which is infeasible to do using hand-labeled data. However, although our models obtain high performance in detecting individual institutional acts, they may also capture biases that exist in the data (Hopkins and King, 2010). Hence, our models should be corrected for biases before they may be used to estimate proportions in any category of stops.

In this paper, we focus on officers' speech alone, since the conversational initiative with respect to the institutional acts lies mostly with the officer. However, drivers' speech may also need to be taken into account sometimes; e.g., if an officer says yes to a driver's question "did you stop me for running the red light?", the officer has in fact given the reason for the stop even though their words alone don't convey that fact. Moreover, drivers' speech may also contribute to how the conversations are shaped. However, since the camera is farther away from the driver than from the officer, and since the environment is noisy, the audio quality of drivers' speech is poor, and further work is required to extract useful information from it. This is an important line of future work.

The video information from the body-camera footage may potentially help in the diarization and segmentation tasks, and in analyzing the effects the institutional acts have on the driver. However, since many of the stops occur at night when the video is often dark, it is not straightforward to extract useful information from it. This is another direction of future work.

11 Conclusion

In this paper, we developed a typology of institutional dialog acts to model the structure of police officer interactions with drivers in traffic stops. It enables a fine-grained and contextualized analysis of dialog structure that generic dialog acts fail to provide. We built supervised taggers for detecting these institutional dialog acts from interactions captured on police body-worn cameras, achieving around 78% F-score at the turn level and 89% F-score at the stop level. Our tagger detects institutional acts at the stop level directly from raw body-camera audio with 81% F-score, with even higher accuracy on important acts like giving the reason for a stop. Finally, we used our institutional act tagger on one month's worth of stops to extract insights about the frequency and order in which these acts occur.

The strains on police-community relations in the U.S. make it ever more important to develop insights into how conversations between police and community members are shaped. Until now, we have not had a reliable way to understand the dynamics of these stops. In this paper, we presented a novel way to look at these conversations and gain actionable insights into their structure. Being able to automatically extract this information directly from raw body-worn camera footage holds immense potential not only for police departments, but also for policy makers and the general public alike to understand and improve this ubiquitous institutional practice.

The core contribution of this paper is a technical one: detecting institutional acts in the domain of traffic stops, from text and from unstructured audio extracted from raw body-worn camera footage. Current work aims to improve the performance of the segmentation and diarization components, with the hope of reducing some of the performance gap with our system run on gold transcripts. We also plan to extend the preliminary analyses we describe in Section 9, for instance, studying how the different conversational paths and the presence or absence of certain acts (such as greetings or reasons) shape the rest of the conversation, including how they change the community member's language use. Finally, our model allows us to study whether police training has an effect on the kinds of conversations that police officers have with the communities they serve.

Acknowledgments

We thank the anonymous reviewers as well as the action editor, Jordan Boyd-Graber, for helpful feedback on an earlier draft of this paper. This research was supported by a John D. and Catherine T. MacArthur Foundation award granted to J. L. Eberhardt and D. Jurafsky, as well as NSF grants IIS-1514268 and IIS-1159679. We also thank the City of Oakland and the Oakland Police Department for their support and cooperation in this project.

References

Tim Althoff, Kevin Clark, and Jure Leskovec. 2016. Large-scale analysis of counseling conversations: An application of natural language processing to mental health. Transactions of the Association for Computational Linguistics, 4:463–476.

Jeremy Ang, Yang Liu, and Elizabeth Shriberg. 2005. Automatic dialog act segmentation and classification in multiparty meetings. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 1061–1064. IEEE.

J. Maxwell Atkinson and Paul Drew. 1979. Order in Court. Springer.

John Langshaw Austin. 1975. How To Do Things With Words. Oxford University Press.

Srinivas Bangalore, Giuseppe Di Fabbrizio, and Amanda Stent. 2006. Learning the structure of task-driven human-human dialogs. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 201–208. Association for Computational Linguistics.

David M. Blei and Pedro J. Moreno. 2001. Topic segmentation with an aspect hidden Markov model. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 343–348. ACM.

Rod K. Brunson. 2007. "Police don't like black people": African-American young men's accumulated police experiences. Criminology & Public Policy, 6(1):71–101.

Muthu Kumar Chandrasekaran, Carrie Epp, Min-Yen Kan, and Diane Litman. 2017. Using discourse signals for robust instructor intervention prediction. In Proceedings of the AAAI Conference on Artificial Intelligence.

Christopher Cieri, David Miller, and Kevin Walker. 2004. The Fisher corpus: A resource for the next generations of speech-to-text. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). European Language Resources Association (ELRA).

William W. Cohen, Vitor R. Carvalho, and Tom M. Mitchell. 2004. Learning to classify email into "speech acts". In Proceedings of the Conference on Empirical Methods in Natural Language Processing, volume 4, pages 309–316. Association for Computational Linguistics.

Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20(3):273–297.

Justine Coupland, Nikolas Coupland, and Howard Giles. 1991. Accommodation theory, communication, context and consequences. Contexts of Accommodation, pages 1–68.

Jacob Eisenstein and Regina Barzilay. 2008. Bayesian unsupervised topic segmentation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 334–343. Association for Computational Linguistics.

Robin S. Engel. 2005. Citizens' perceptions of distributive and procedural injustice during traffic stops with police. Journal of Research in Crime and Delinquency, 42(4):445–481.

Charles R. Epp, Steven Maynard-Moody, and Donald P. Haider-Markel. 2014. Pulled Over: How Police Stops Define Race and Citizenship. University of Chicago Press.

Mark J. F. Gales. 1998. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech & Language, 12(2):75–98.

Felix A. Gers, Nicol N. Schraudolph, and Jürgen Schmidhuber. 2002. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 3(Aug):115–143.

Pierre Geurts, Damien Ernst, and Louis Wehenkel. 2006. Extremely randomized trees. Machine Learning, 63(1):3–42.

Howard Giles, Jennifer Fortman, René Dailey, Valerie Barker, Christopher Hajek, Michelle Chernikoff Anderson, and Nicholas O. Rule. 2006. Communication accommodation: Law enforcement and the public. Applied Interpersonal Communication Matters: Family, Health, and Community Relations, 5:241–269.

Howard Giles, Christopher Hajek, Valerie Barker, Mei-Chen Lin, Yan Bing Zhang, Mary Lee Hummert, and Michelle C. Anderson. 2007. Accommodation and institutional talk: Communicative dimensions of police-civilian interactions. In Language, Discourse and Social Psychology, pages 131–159. Springer.

Augusto Gnisci. 2005. Sequential strategies of accommodation: A new method in courtroom. British Journal of Social Psychology, 44(4):621–643.

John J. Godfrey and Edward Holliman. 1997. Switchboard-1 release 2. Linguistic Data Consortium, Philadelphia, 926:927.

Alex Graves and Navdeep Jaitly. 2014. Towards end-to-end speech recognition with recurrent neural networks. In International Conference on Machine Learning, pages 1764–1772.

Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6645–6649. IEEE.

Barbara J. Grosz. 1977. The representation and use of focus in dialogue understanding. Technical report, SRI International, Menlo Park, United States.

John Heritage. 2005. Conversation analysis and institutional talk. Handbook of Language and Social Interaction, pages 103–147.

Daniel J. Hopkins and Gary King. 2010. A method of automated nonparametric content analysis for social science. American Journal of Political Science, 54(1):229–247.

Daniel Jurafsky, Rebecca Bates, Noah Coccaro, Rachel Martin, Marie Meteer, Klaus Ries, Elizabeth Shriberg, Andreas Stolcke, Paul Taylor, and Carol Van Ess-Dykema. 1997. Automatic detection of discourse structure for speech recognition and understanding. In Proceedings of the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding, pages 88–95. IEEE.

Hamed Khanpour, Nishitha Guntakandla, and Rodney Nielsen. 2016. Dialogue act classification in domain-independent conversations using a deep recurrent neural network. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, pages 2012–2021.

Su Nam Kim, Lawrence Cavedon, and Timothy Baldwin. 2010. Classifying dialogue acts in one-on-one live chats. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 862–871. Association for Computational Linguistics.

Su Nam Kim, Lawrence Cavedon, and Timothy Baldwin. 2012. Classifying dialogue acts in multi-party live chats. In Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, pages 463–472.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751. Association for Computational Linguistics.

Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for M-gram language modeling. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 181–184. IEEE.

Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. 2015. Audio augmentation for speech recognition. In Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 3586–3589.

Lynn Langton and Matthew R. Durose. 2013. Police behavior during traffic and street stops, 2011. US Department of Justice, Office of Justice Programs, Bureau of Justice Statistics, Washington, DC.

Wei Li and Yunfang Wu. 2016. Multi-level gated recurrent neural network for dialog act classification. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, pages 1970–1979.

Yang Liu, Kun Han, Zhao Tan, and Yun Lei. 2017. Using context information for dialog act classification in DNN framework. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2170–2178.

Richard J. Lundman and Robert L. Kaufman. 2003. Driving while black: Effects of race, ethnicity, and gender on citizen self-reports of traffic stops and police actions. Criminology, 41(1):195–220.

Abdel-rahman Mohamed, Frank Seide, Dong Yu, Jasha Droppo, Andreas Stoicke, Geoffrey Zweig, and Gerald Penn. 2015. Deep bi-directional recurrent networks over spectral windows. In Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 78–83. IEEE.

Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. 2012. SITS: A hierarchical nonparametric model using speaker identity for topic segmentation in multiparty conversations. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pages 78–87. Association for Computational Linguistics.

Adinoyi Omuya, Vinodkumar Prabhakaran, and Owen Rambow. 2013. Improving the quality of minority class identification in dialog act tagging. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 802–807.

Michael J. Paul. 2012. Mixed membership Markov models for unsupervised conversation modeling. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 94–104. Association for Computational Linguistics.

Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlíček, Yanmin Qian, Petr Schwarz, Jan Silovský, Georg Stemmer, and Karel Veselý. 2011. The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society.

Vinodkumar Prabhakaran, Owen Rambow, and Mona Diab. 2012. Predicting overt display of power in written dialogs. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 518–522. Association for Computational Linguistics.

Vinodkumar Prabhakaran, Emily E. Reid, and Owen Rambow. 2014. Gender and power: How gender and gender environment affect manifestations of power. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1965–1976. Association for Computational Linguistics.

Charles H. Ramsey and Laurie O. Robinson. 2015. Final report of the President's Task Force on 21st Century Policing. Washington, DC: Office of Community Oriented Policing Services.

Alan Ritter, Colin Cherry, and Bill Dolan. 2010. Unsupervised modeling of Twitter conversations. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 172–180. Association for Computational Linguistics.

Harvey Sacks, Emanuel A. Schegloff, and Gail Jefferson. 1974. A simplest systematics for the organization of turn-taking for conversation. Language, pages 696–735.

Haşim Sak, Andrew Senior, and Françoise Beaufays. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Fifteenth Annual Conference of the International Speech Communication Association.

Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. 2007. Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning, pages 791–798. ACM.

Emanuel A. Schegloff and Harvey Sacks. 1973. Opening up closings. Semiotica, 8(4):289–327.

Emanuel A. Schegloff. 1979. Identification and recognition in telephone conversation openings. Everyday Language: Studies in Ethnomethodology, New York, Irvington, pages 23–78.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Andreas Stolcke, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, and Marie Meteer. 2006. Dialogue act modeling for automatic tagging and recognition of conversational speech. Dialogue, 26(3).

Prateek Verma, T. P. Vinutha, Parthe Pandit, and Preeti Rao. 2015. Structural segmentation of Hindustani concert audio with posterior features. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 136–140. IEEE.

Karel Veselý, Mirko Hannemann, and Lukas Burget. 2013. Semi-supervised training of deep neural networks. In Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE.

Rob Voigt, Nicholas P. Camp, Vinodkumar Prabhakaran, William L. Hamilton, Rebecca C. Hetey, Camilla M. Griffiths, David Jurgens, Dan Jurafsky, and Jennifer L. Eberhardt. 2017. Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences, 114(25):6521–6526.

Marilyn R. Whalen and Don H. Zimmerman. 1987. Sequential and institutional contexts in calls for help. Social Psychology Quarterly, pages 172–185.

Gethin Williams and Daniel P. W. Ellis. 1999. Speech/music discrimination based on posterior probability features. In Proceedings of the Sixth European Conference on Speech Communication and Technology.

Diyi Yang, Tanmay Sinha, David Adamson, and Carolyn P. Rose. 2013. Turn on, tune in, drop out: Anticipating student dropouts in massive open online courses. In Proceedings of the 2013 NIPS Data-Driven Education Workshop, volume 10, pages 13–20.

Transactions of the Association for Computational Linguistics, vol. 6, pp. 467–481, 2018. Action Editor: Jordan Boyd-Graber.
下载pdf