Transactions of the Association for Computational Linguistics, vol. 6, pp. 91–106, 2018. Action Editor: Alexander Clark.

Submission batch: 7/2017; Revision batch: 10/2017; Published 2/2018.

© 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Towards Evaluating Narrative Quality In Student Writing

Swapna Somasundaran¹, Michael Flor¹, Martin Chodorow², Hillary Molloy³, Binod Gyawali¹, Laura McCulla¹
¹Educational Testing Service, 660 Rosedale Road, Princeton, NJ 08541, USA
²Hunter College and the Graduate Center, CUNY, New York, NY 10065, USA
³Educational Testing Service, 90 New Montgomery Street, San Francisco, CA 94105, USA
{ssomasundaran,mflor,hmolloy,bgyawali,LMcCulla}@ets.org
martin.chodorow@hunter.cuny.edu

Abstract

This work lays the foundation for automated assessments of narrative quality in student writing. We first manually score essays for narrative-relevant traits and sub-traits, and measure inter-annotator agreement. We then explore linguistic features that are indicative of good narrative writing and use them to build an automated scoring system. Experiments show that our features are more effective in scoring specific aspects of narrative quality than a state-of-the-art feature set.

1 Introduction

Narrative, which includes personal experiences and stories, real or imagined, is a medium of expression that is used from the very early stages of a child’s life. Narratives are also employed in various capacities in school instruction and assessment. For example, the Common Core State Standards, an educational initiative in the United States that details requirements for student knowledge in grades K-12, employs literature/narratives as one of its three language arts genres. With the increased focus on automated evaluation of student writing in educational settings (Adams, 2014), automated methods for evaluating narrative essays at scale are becoming increasingly important.

Automated scoring of narrative essays is a challenging area, and one that has not been explored extensively in NLP research. Previous work on automated essay scoring has focused on informational, argumentative, persuasive and source-based writing constructs (Stab and Gurevych, 2017; Nguyen and Litman, 2016; Farra et al., 2015; Somasundaran et al., 2014; Beigman Klebanov et al., 2014; Shermis and Burstein, 2013). Similarly, operational essay scoring engines (Attali and Burstein, 2006; Elliot, 2003) are geared towards evaluating language proficiency in general. In this work, we lay the groundwork and present the first results for automated scoring of narrative essays, focusing on narrative quality.

One of the challenges in narrative quality analysis is the scarcity of scored essays in this genre. We describe a detailed manual annotation study on scoring student essays along multiple dimensions of narrative quality, such as narrative development and narrative organization. Using a scoring rubric adapted from the U.S. Common Core State Standards, we annotated 942 essays written for 18 different essay-prompts by students from three different grade levels. This data set provides a variety of story types and language proficiency levels. We measured inter-annotator agreement to understand reliability of scoring stories for traits (e.g., development) as well as sub-traits (e.g., plot development and the use of narrative techniques).

A number of techniques for writing good stories are targeted by the scoring rubrics. We implemented a system for automatically scoring different traits of narratives, using linguistic features that capture some of those techniques. We investigated the effectiveness of each feature for scoring narrative traits and analyzed the results to identify sources of errors.

The main contributions of this work are as follows: (1) To the best of our knowledge, this is the first detailed annotation study on scoring narrative essays for different aspects of narrative quality.

(2) We present an automated system for scoring narrative quality, with linguistic features specific to encoding aspects of good story-telling. This system outperforms a state-of-the-art essay-scoring system. (3) We present analyses of trait and overall scoring of narrative essays, which provide insights into the aspects of narratives that are easy/difficult for humans and machines to evaluate.

2 Related Work

2.1 Narrative assessments

Researchers have approached manual assessments of creative writing in a variety of ways. The “consensual assessment technique” (Amabile, 1982; Broekkamp et al., 2009) evaluates students’ creative writing on criteria such as creativity, originality and technical quality. Consensus scoring is used, but the genre is considered to be too subjective for close agreement between scorers.

Story-telling in children has been studied and evaluated using a number of techniques. For example, the Test of Narrative Language (Gillam and Pearson, 2004) is a standardized, picture-based, norm-referenced measure of narrative ability, used to identify language disabilities. Stein and Glenn (1979) used a story-schema approach to evaluate story recall in school children. Miller and Chapman (1985) adapted it to score story re-telling, mainly for clinical purposes. Similarly, narrative re-telling is recorded and analyzed for length, syntax, cohesion, and story grammar in the Strong Narrative Assessment Procedure (Strong et al., 1998). The Index of Narrative Complexity (Petersen et al., 2008) scores oral narratives on several dimensions and is used to study the effectiveness of clinical interventions.

Olinghouse and Leaird (2009) used picture-prompts for eliciting narratives from about 200 students at the 2nd and 4th grade levels. The stories were evaluated for organization, development and creative vocabulary, but the study focused on vocabulary characteristics at different grade levels. McKeough et al. (2006) studied 150 student narratives in order to compare talented and average writers.

Halpin and Moore (2006) analyzed students’ re-telling of exemplar stories. They focused on event extraction, with the final goal of providing advice in an interactive story-telling environment. Passonneau et al. (2007) annotated oral retellings of the same story on three consecutive days in order to study and model children’s comprehension.

2.2 Narrative Analysis in Computational Linguistics

Research on narratives in Computational Linguistics has employed fables, fairy tales, and literary texts, aiming at representing, understanding and extracting information, e.g., Charniak (1972). Goyal et al. (2010) analyzed Aesop’s fables, producing automatic plot-unit representations (Lehnert, 1981) with a task-specific knowledge base of affect.

Character traits and personas in stories have also been analyzed. For example, Elsner (2012) proposed a rich representation of story-characters for the purpose of summarizing and representing novels. Bamman et al. (2014) automatically inferred latent character types in English novels. Valls-Vargas et al. (2014) extracted characters and roles from Russian folk tales, based on their actions. Chaturvedi et al. (2015) analyzed short stories for characters’ desires and built a system to recognize desire fulfillment, using textual entailment.

Researchers have also studied social networks and have modeled relationships in stories (Elson et al., 2010; Celikyilmaz et al., 2010). Agarwal et al. (2013) modeled character interactions from Alice in Wonderland for the purpose of social network analysis. Chaturvedi et al. (2016) modeled character relationships in novels, using structured prediction.

Wiebe (1994) proposed a method for tracking psychological points of view in narratives, looking at private states and subjective sentences. Ovesdotter Alm and Sproat (2005) studied emotional sequencing and trajectories in 22 Grimm’s fairy tales. Ware et al. (2011) analyzed dimensions of conflict in four simple, constructed stories, with the goal of evaluating story content. Similarly, Swanson et al. (2014) analyzed blog narratives for narrative clause sub-types such as orientation, action and evaluation. Reagan et al. (2016) used sentiment analysis to generate emotional profiles for English novels. NLP methods have also been used for modeling and understanding narrative structures (Finlayson, 2012; Elson, 2012). See Finlayson (2013) and Mani (2012) for detailed literature surveys.

One important aspect of a narrative is that it

conveys a sequence of events (Fludernik, 2009; Almeida, 1995). Chambers and Jurafsky (2009; 2008) presented techniques for the automatic acquisition of event chains and event schemas (Chambers, 2013), which are related to earlier notions of scripts as prepackaged chunks of knowledge (Schank and Abelson, 1977). This line of research has received a great deal of attention (Nguyen et al., 2015; Balasubramanian et al., 2013; Jans et al., 2012; McIntyre and Lapata, 2010). For narratives, Ouyang and McKeown (2015) focused on automatic detection of compelling events. Bogel et al. (2014) worked on extraction and temporal ordering of events in narratives. Based on the ‘Narrative Cloze Test’ (Chambers and Jurafsky, 2008), Mostafazadeh et al. (2016) presented a framework for evaluating story understanding algorithms, the ‘Story Cloze Test’, whose goal is to predict a held-out continuation of a short story.

Our research differs significantly from previous work. We aim to evaluate, on an integer scale, the quality of narratives in student-generated essays. Insights from previous work on narrative analysis can be useful for our purposes if they capture narrative techniques employed by student writers, and if they correlate with scores representing narrative quality. It is still an open question whether an elaborate representation and understanding of the story is needed for evaluating student writing, or whether encoding features that capture different narrative aspects might be sufficient. Further, depending on the type of story, not all aspects of narrative analysis may come into play. For example, plot construction and narrative elements such as conflict may be central to creating a hypothetical story about an antique trunk, but not so much in a personal story about a travel experience. To the best of our knowledge, this work makes a first attempt at investigating the evaluation of narrative quality using automated methods.

2.3 Automated essay scoring

There are a number of automated essay scoring (AES) systems, many of which are used operationally, such as e-rater® (Attali and Burstein, 2006), Intellimetric (Elliot, 2003), the Intelligent Essay Assessor (Landauer et al., 2003) and Project Essay Grade (Page, 1994). However, these previous studies have not been focused on narratives.

In a somewhat related study to this one, Somasundaran et al. (2015) scored oral narratives that were generated by international students in response to a series of pictures. Some of the features used in that study overlap with our work due to the overlap in the genre; however, their focus was on scoring the response for language proficiency. Graph features, which we have used in this work, have been shown to be effective in capturing idea development in essays (Somasundaran et al., 2016). This work also employs graph features, but it is one of the many we explore for encoding the various linguistic phenomena that characterize good narratives.

3 Data

Our data comprises narrative essays written by school students in the Criterion® program¹, an online writing evaluation service from Educational Testing Service. It is a web-based, instructor-led writing tool that helps students plan, write and revise their essays. Narrative essays were obtained from grade levels 7, 10 and 12. Each essay was written in response to one of 18 story-telling prompts related to personal experiences, hypothetical situations, or fictional stories. Below are some example prompts:

[Personal Experience] There are moments in everyone’s lives when they feel pride and accomplishment after completing a challenging task. Write a story about your proudest moment.

[Hypothetical Situation] Pretend that one morning you wake up and find out that you’ve become your teacher for a day! What happened? What do you do? Do you learn anything? Write a story about what happens. Use your imagination!

[Fictional Story] Throughout the years, many have placed messages in sealed bottles and dropped the bottles into the ocean where they eventually washed up on foreign shores. Occasionally the finder has even contacted the sender. Write a story about finding your own message in a bottle.

¹ https://criterion.ets.org/criterion

The average essay length in our data is 320 words, with a range of 3 to 1310 words and a standard deviation of 195. A sample essay, “Message in a bottle”, in response to the fiction story prompt above is presented below:

Last year, I went back to my hometown. There was a big beautiful beach on which I had often played as a child. Nevertheless, when I went to the beach, it changed. I looked a great deal of trash, and

many animal disappeared. Without original breath-taking scene, there had been destroyed very well. All of a sudden, I watched a bottle When I walked on the beach. I opened the bottle with my curiosity. There was a message in the bottle. The message was “Whoever you are, please help this beach. We need more clean beach to survive.” I was surprised that this message should be from the sea creature. They need humans’ help, or they would die. Therefore, I persuaded the other people who live there to clean the beach immediately. They all agreed to come and to help those animals. Finally, with a lot of people’s help, the beach became beautiful as before. I thought that those who under the sea were very comfortable and happy to live a clean surroundings.

4 Scoring Narrative Essays

Our work focuses on automatically evaluating and scoring the proficiency of narrative construction in student essays. Therefore, we use a rubric² created by education experts and teachers, and presented by Smarter Balanced, an assessment aligned to U.S. State Standards for grades K-12.

² https://portal.smarterbalanced.org/library/en/performance-task-writing-rubric-narrative.pdf

4.1 Trait Scoring

The scoring rubric provides guidelines for scoring essays along three traits (dimensions): Purpose/Organization (hereafter, referred to as Organization or Org.), Development/Elaboration (Development or Dev.) and Conventions (or Conv.). Each of the dimensions is described below.

4.1.1 Organization

Organization is concerned with the way a story is arranged in general. It focuses on event coherence, on whether the story has a coherent start and ending, and whether there is a plot to hold all the pieces of the story together. This dimension is judged on a scale of 1-4 integer score points, with 4 being the highest score. The rubric provides the following criteria for an essay of score point 4 in terms of five aspects or sub-traits: “The organization of the narrative is fully sustained and the focus is clear and maintained throughout: 1. an effective Plot; 2. effectively establishes Character/Setting/POV; 3. consistent use of a variety of Transitioning strategies; 4. natural, logical Sequencing of events; 5. effective Opening/Closing.”

An essay is judged non-scorable if it is insufficient, written in a language other than English, off-topic, or off-purpose. Such essays are assigned a score of 0 in our scheme.

4.1.2 Development

Development focuses on how the story is developed. It evaluates whether the story provides vivid descriptions, and whether there is character development. This dimension is also judged on a scale of 1-4 integer score points, with 4 being the highest score. As in the scoring of Organization, in our scheme, non-scorable essays are assigned a 0 score for Development. The rubric provides the following criteria for an essay of score point 4 in terms of five aspects or sub-traits: “The narrative provides thorough, effective elaboration using relevant details, dialogue, and/or description: 1. clearly developed Character/Setting/Events; 2. connections made to Source Materials; 3. effective use of a variety of Narrative Techniques; 4. effective use of sensory, concrete, and figurative Language; 5. effective, appropriate Style.”

4.1.3 Conventions

This dimension evaluates the language proficiency, judged on a scale of 1-3 integer score points, with 3 being the highest score. According to the rubrics, the following characterizes an essay of score point 3: “The response demonstrates an adequate command of conventions: adequate use of correct sentence formation, punctuation, capitalization, grammar usage, and spelling.”

4.2 Sub-trait scoring

As noted above, Organization and Development are each composed of 5 sub-traits. We scored these sub-traits manually using the same 4-point scale as the main trait scores. This yields 10 sub-trait scores in addition to the 3 main trait scores, for a total of 13 manually assigned scores per essay. We produced guidelines and selected a small set of benchmark essays for training two scorers.

4.3 Narrative and Total Scores

Based on the human-assigned trait scores, we derive Narrative and Total composite scores for each essay. The Narrative score for each essay is calculated by summing the Organization and Development trait scores. This gives the essay a Narrative score on an integer scale from 0 to 8. We sum up the three trait scores (Organization + Development + Conventions) to get a Total score on an integer scale from 0 to 11. Even though Narrative and Total composites are not defined separately/independently from their components, they provide us with an estimate of how manual and automated scoring will perform on these data for scenarios where, for example, a single overall score has to be assigned.

5 Annotation and Data Statistics

Two research assistants, both co-authors on the paper but not involved in system development, performed the scoring. Both annotators are native speakers of English with more than four years of linguistic annotation experience. Using the scoring rubric described above, the lead annotator created a guideline and benchmark dataset of 20 essays spanning all score points. This was used for training a second annotator and three researchers (all co-authors on the paper), and the resulting feedback was used to refine the guidelines. Two rounds of training were conducted, with 10 and 20 essays respectively. A score discrepancy of more than one point for any of the traits triggered a discussion in order to bring the scores closer (that is, the scores should only differ by one point). Exact agreement was not sought due to the very subjective nature of judging stories. One of the researchers served as adjudicator for the discussions. No specific training was performed for the sub-traits; disagreements on sub-traits were discussed only within trait-level discussions.

Once the training was completed, a total of 942 essays³ were scored. Of these, 598 essays were singly scored and 344 essays were double-scored to measure agreement. Scoring of each essay thus involved assigning 13 scores (3 traits + 10 sub-traits) and took approximately 10 to 20 minutes. Table 1 shows the distribution of scores across the score-points for the three traits.⁴

³ For data requests see https://www.ets.org/research/contact/data_requests/.
⁴ The “Message in a Bottle” sample essay in Section 3 received scores of Org.: 3, Dev.: 4, and Conv.: 3. The high score for Conventions reflects the rubric’s requirement of adequate (but not stellar) command of language usage.

Score      0     1     2     3     4
Org.      40    63   217   381   241
Dev.      40    84   270   319   229
Conv.      -   115   365   462     -
Table 1: Score distributions for traits

5.1 Inter-annotator Agreement

To calculate agreement, we use Quadratic Weighted Kappa (QWK) (Cohen, 1968), a well-established metric in assessment that takes into account agreement due to chance. It is equivalent to a form of intra-class correlation and, in most cases, is comparable to Pearson’s r. The QWKs calculated over 344 doubly annotated essays are reported in Table 2. The three main traits are shown in bold, the sub-traits are prefixed with a “:”, and the composite traits (Narrative and Total) are shown in italics.

Trait:Sub-trait                  QWK
Organization                     0.71
  :Plot                          0.62
  :Characters/Setting/POV        0.65
  :Transitioning                 0.57
  :Sequencing                    0.63
  :Opening/Closing               0.66
Development                      0.73
  :Characters/Setting/Events     0.68
  :Narrative Techniques          0.64
  :Language                      0.59
  :Source Materials              0.52
  :Style                         0.58
Convention                       0.46
Narrative (Org.+Dev.)            0.76
Total (Org.+Dev.+Conv.)          0.76
Table 2: Inter-annotator agreement

For the Organization and Development traits, which capture the narrative aspects of writing,

scoring agreement is quite high: Organization (QWK=0.71) and Development (QWK=0.73). This result is promising as it indicates that Organization and Development of story-telling can be reliably scored by humans. Surprisingly, the agreement for the non-narrative dimension, Conventions, is only moderate (QWK=0.46). Discussion between the two annotators revealed that the criteria for the score points in Conventions were very subjective. For example, they had difficulty deciding when a Conventions violation, such as a specific grammatical error, was severe, and how much variety among the error types was needed to move the Conventions score from one score point to another.

Table 2 shows that agreement for all sub-traits is lower than agreement for the corresponding trait. Sub-trait agreement results also show that some story traits are more reliably scored than others. For example, it is easier to evaluate good openings and closings in stories (QWK=0.66) than to evaluate the quality of story style (QWK=0.58). Evaluation of stylistic devices and whether they indeed enhance the story is rather subjective.

Agreement for the Narrative and Total scores is also quite good. Narrative achieves a higher QWK than its individual components. The high agreement of the Total scores is interesting, as it incorporates the Conventions scores, on which substantial agreement was not achieved.

5.2 Inter-trait correlations

Previous research on writing has shown that traits are usually correlated (Lee et al., 2010; Bacha, 2001; Klein et al., 1998). We also observed this in our data. Inter-trait correlations (Pearson’s r) are shown in Table 3. Scores for Organization and Development are highly correlated (r=0.88), and each is also correlated with Conventions (r=0.40 and 0.42, respectively), albeit not as strongly. Not surprisingly, the composite scores, Narrative and Total, are highly correlated with their components.

         Org.   Dev.   Conv.  Nar.   Tot.
Org.     1.00   0.88   0.40   0.97   0.93
Dev.            1.00   0.42   0.97   0.94
Conv.                  1.00   0.42   0.64
Nar.                          1.00   0.97
Total                                1.00
Table 3: Score correlations for traits, Narrative and Total.

6 Linguistic Features

We used the scoring rubric as a guideline for exploring construct-relevant features with a view towards automated analysis. We developed sets of features for the different narrative characteristics. Each set is described in detail in the following sections.

6.1 Transition Feature Set

Effective organization of ideas and events is typically achieved with the use of discourse markers. In order to encode effective transitioning, we compiled a transition-cue lexicon, and constructed features based on it.

We compiled a list of 234 discourse cues from the Penn Discourse Treebank (PDTB) manual (Prasad et al., 2008), and we manually collected a list of transition cues from the web by mining websites that provide tips on good essay/narrative writing. The latter, with a total of 484 unigrams and multi-word expressions, is more focused on cues that are used commonly to write stories (e.g., cues that provide locational or temporal connections) than the former.

Using the lexicon, we extracted two features from each essay: the number of cues in the essay and that number divided by the essay length. These two features form the Transition feature set.

6.2 Event-oriented Feature Set

Events are the building blocks of narratives, and good story-telling involves skillfully stringing events together. We construct an event-based feature set, Events, to capture event cohesion and coherence. Following the methodology proposed by Chambers and Jurafsky (2008), we built a database of event pairs from the GigaWord Fifth Edition corpus (Parker et al., 2011). Specifically, we used the Annotated Gigaword distribution (Napoles et al., 2012), which has been automatically annotated with typed dependency information (de Marneffe and Manning, 2008). Following Chambers and Jurafsky (2008), we define events as verbs in a text (excluding be/have/do) and pairs of events are defined as those verbs that share arguments in the text.
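As a rough illustration of the two Transition features from Section 6.1 (cue count, and cue count divided by essay length), the following is a minimal sketch and not the authors' implementation. The function name `transition_features`, the regular-expression tokenizer, the use of token count as essay length, and the greedy longest-match rule for multi-word cues are all assumptions made for this example; the tiny lexicon stands in for the PDTB and web-derived cue lists, which are not reproduced here.

```python
import re
from typing import Iterable, Tuple


def transition_features(essay: str, cue_lexicon: Iterable[str]) -> Tuple[int, float]:
    """Return (number of transition cues, cues per token) for one essay.

    Assumptions (not from the paper): cues are lowercase, whitespace-separated
    strings; essay length is the number of regex tokens; overlapping cues are
    resolved by taking the longest cue that starts at each position.
    """
    cues = {" ".join(cue.lower().split()) for cue in cue_lexicon}
    tokens = re.findall(r"[a-z']+", essay.lower())
    if not tokens:
        return 0, 0.0

    # Longest cue length in tokens bounds the window we need to check.
    max_len = max((len(cue.split()) for cue in cues), default=0)

    count = 0
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in cues:
                count += 1
                i += n  # skip past the matched cue
                break
        else:
            i += 1  # no cue starts here
    return count, count / len(tokens)


if __name__ == "__main__":
    # Hypothetical mini-lexicon, for illustration only.
    lexicon = {"all of a sudden", "therefore", "finally", "then"}
    text = ("All of a sudden, I saw a bottle. Therefore, I opened it. "
            "Finally, we went home.")
    print(transition_features(text, lexicon))  # (3, 0.1875)
```

Matching the longest cue first keeps a multi-word expression such as "all of a sudden" from also being counted through any shorter cue it contains; whether the original system handled overlaps this way is not stated in the text.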

In the present work we limit our scope to the following set of (typed dependency) arguments: nsubj, dobj, nsubjpass, xsubj, csubj, csubjpass.

To estimate event cohesion, we extract all event pairs from an essay after pre-processing it with the Stanford Core NLP toolkit (Manning et al., 2014). Event tokens from the essay are linked into pairs when they share a filler in their arguments. For essays, we use Stanford co-reference resolution for matching fillers of verb-argument slots. For all event pairs extracted from an essay, we query the events database to retrieve the pair association value (we use the point-wise mutual information (Church and Hanks, 1990)). We define three quantitative measures to encode event cohesion: (1) total count of event pairs in the essay; (2) proportion of in-essay event-pairs that are actually found in the events database; (3) proportion of in-essay event-pairs that have substantial association (we use PMI ≥ 2).

We also capture aspects of coherent event sequencing. For this, we compute event chains, which are defined as sequences of events that share the same actor or object, in subject or direct object role (Chambers and Jurafsky, 2008). Specifically, we encode the following additional features in the Events feature set: (4) the length of the longest chain found in the essay (i.e., number of event pairs in the chain); (5) the score of the longest chain (computed as the sum of PMI values for all links (event pairs) of the chain); (6) the length of the second longest chain found in the essay; (7) the score of the highest scoring chain in the essay; (8) the score of the second highest scoring chain in the essay; (9) the score of the lowest scoring chain in the essay; (10) the sum of scores for all chains in the essay. For each of the features 4-10, we also produce a feature that is normalized by the log of the essay length (log word-count).

6.3 Subjectivity-based Feature Set

Evaluative and subjective language is used to describe characters (e.g., foolish, smart), situations (e.g., grand, impoverished) and characters’ private states (e.g., thoughts, beliefs, happiness, sadness) (Wiebe, 1994). These are evidenced when characters are described and story-lines are developed.

We use two lexicons for detecting sentiment and subjective words: the MPQA subjectivity lexicon (Wilson et al., 2005) and a sentiment lexicon, ASSESS, developed for essay scoring (Beigman Klebanov et al., 2012). MPQA associates a positive/negative/neutral polarity category with its entries, while ASSESS assigns a positive/negative/neutral polarity probability to its entries. We consider a term from ASSESS to be polar if the sum of positive and negative probabilities is greater than 0.65 (based on manual inspection of the lexicon). The neutral category in MPQA comprises subjective terms that indicate speech acts and private states (e.g., view, assess, believe), which is valuable for our purposes. The neutral category in ASSESS consists of non-subjective words (e.g., woman, technologies), which we ignore. The polar entries of the two lexicons differ too. ASSESS provides polarity for words based on the emotions they evoke. For example, alive, awakened and birth are highly positive, while crash, bombings and cyclone are strongly negative.

We construct a Subjectivity feature set comprised of 6 features encoding, for each essay, the presence (a binary feature) and the count of MPQA and ASSESS polar words and MPQA neutral words.

6.4 Detailing Feature Set

Providing specific details, such as names to characters, and describing the story elements, helps in developing the narrative and providing depth to the story. Proper nouns, adjectives and adverbs come into play when a writer provides descriptions. Thus, we create a Details feature set comprised of a total of 6 features encoding, separately, the presence (a binary feature) and the count of proper nouns, adjectives and adverbs.

6.5 Graph Feature Set

Graph statistics have been reported to be effective for capturing development and coherence in essays (Mesgar and Strube, 2016; Somasundaran et al., 2016). We closely follow the implementation and features described in Somasundaran et al. (2016) for capturing narrative development (due to space constraints we refer the reader to the original paper). Graphs were constructed from essays by representing each content word (word type) in a sentence as a node in the graph. Links were drawn between words belonging to adjacent sentences. Features based on connectivity, shape and PageRank were extracted,

giving a total of 19 Graph features. Specifically, the features used were: percentage of nodes with degrees one, two and three; the highest, second-highest and median degree in the graph; the highest degree divided by the total number of links; the top three PageRank values in the graph, their respective negative logarithms, and their essay length-normalized versions; the median PageRank value in the graph, its negative log and essay length-normalized version.

6.6 Content word usage

Content word usage, also known as lexical density (Ure, 1971), refers to the amount of open-class (content) words used in an essay. The greater the proportion of content words in a text, the more difficult or advanced it is (Yu, 2010; O’Loughlin, 1995), and it has been suggested that, for academic discourse, too much lexical density is detrimental to clarity (Halliday and Martin, 1993). The Content feature is the inverse of the proportion of content words (POS tagged noun/verb/adjective/adverb) to all words of an essay.

6.7 Pronoun Usage

The use of pronouns in story-writing has several important aspects. On one hand, pronouns can indicate the point of view (perspective) in which the story is written (Fludernik, 2009; Rimmon-Kenan, 2002). Perspective is important in both construction and comprehension of narrative (Rimmon-Kenan, 2002). The use of pronouns is also related to reader engagement (Mentzell et al., 1999) and immersion (Oatley, 1999). Stories with first person pronouns lead to stronger reader immersion, while stories written in third person lead to stronger reader arousal (Hartung et al., 2016). In our data, we counted personal pronouns (e.g., I, he, it), including contractions (e.g., he’s), and possessive pronouns (e.g., my, his). For each story, the counts were normalized by essay length. A single feature, Pronoun, was encoded using the proportion of first and third person singular pronouns in the essay.

6.8 Modal Feature

As an account of connected events, a narrative typically uses the past tense. By contrast, modals appear before untensed verbs and generally refer to the present or the future. They express the degree of ability (can, could), probability (shall, will, would, may, might), or obligation/necessity (should, must). An overabundance of modals in an essay might be an indication that it is not a narrative or is only marginally so. This idea is captured in the Modal feature, which is the proportion of modals to all words of an essay.

6.9 Stative Verbs

Stative verbs are verbs that describe states, and are typically contrasted with dynamic verbs, which describe events (actions and activities) (Vendler, 1967). In narrative texts, stative verbs are often used in descriptive passages (Smith, 2005), but they do not contribute to the progression of events in a story (Almeida, 1995; Prince, 1973). Our conjecture is that if a text contains too many stative verbs, then it may not have enough of an event sequence, which is a hallmark of a narrative. We compiled a list of 62 English stative verbs (e.g., know, own, resemble, prefer) from various linguistic resources on the web. During processing of an essay, we identify verbs by POS tags, and stative verbs via list-lookup. Separately, we identify copular uses of “to be” and count them as statives. Our feature, Statives, is the proportion of stative verbs out of all verbs in an essay.

7 Experiments

Our experiments investigate the following questions: (1) Is it possible to score narrative quality traits in essays using automated methods? (2) Which of our feature sets are effective for scoring narrative quality traits? (3) How do our narrative-inspired features perform as compared to a baseline that is competitive but does not specifically address the narrative construct? (4) How does overall scoring of narrative essays differ from trait scoring? (5) What are the best feature combinations for narrative scoring?

To answer these questions, we built and evaluated scoring systems for each trait, overall Narrative and Total scores. In each case, we performed detailed ablation studies at the feature-set level. We have 10 feature sets (9 feature sets described above plus a baseline feature set); thus 1024 feature set combinations were investigated. As our traits are highly correlated, we used all of our features for building

systems for each trait, leaving it to the ablation process to reveal the most promising feature set combination.

7.1 Baseline

E-rater (Attali and Burstein, 2006), a state-of-the-art commercial system for automatic essay scoring, uses a comprehensive suite of features covering many aspects of writing quality, such as grammar, language use, mechanics, fluency, style, organization, and development. We use all of the features from e-rater, a total of 10 features, as the Baseline feature set. While e-rater is not designed for trait scoring, it incorporates features that address the traits of interest in this work. Development and Organization are captured by features that, among other things, count and encode the number and length of discourse elements such as thesis, main points, supporting ideas, and conclusion (Burstein et al., 2003).

7.2 Results

We experimented with Linear Regression, Support Vector Regression (RBF kernel), Random Forests, and Elastic Net learners from the scikit-learn toolkit (Pedregosa et al., 2011), with 10-fold cross-validation on 942 essays. As Linear Regression results were consistently better, both for Baseline and for our features, we only report results from this learner. Trimming of the predicted linear regression output was performed; that is, if the predicted score was above the max score, or below the min score, it was assigned the max or the min score, respectively. Bootstrapping experiments (Berg-Kirkpatrick et al., 2012; Efron and Tibshirani, 1994) were performed to test for statistical significance (we used 1000 bootstrap samples).

For each trait-scoring experiment, we extracted all the features (described in Section 6) from the essays and used the corresponding human trait scores for training and testing. Thus, the input essays and their features are the same across all experiments. What varies is the trait to be predicted and, consequently, the performance of feature sets as well as the best feature combination.

Table 4 shows the performance of Baseline, the individual features, all features, and the best feature combination, for all three traits, overall Narrative and Total scoring. Performance of individual features that exhibit some predictive power is also shown in the table. The single-measure features Modal, Pronoun, Content, and Statives show no predictive power individually (QWKs = 0) and are omitted from the table for space reasons.

Organization. Understandably, Baseline performs poorly for scoring Organization in narratives, as its focus is evaluating overall writing proficiency. Individual feature sets, Details, Transition, Events and Subjectivity, have some predictive capability, but it is not very high. This is not surprising as they each encode only a specific aspect of narrative quality. The Graph feature set outperforms the Baseline feature set, but the difference is not statistically significant. When all features are put together (All features), the QWK obtained is 0.56, which is substantially higher than Baseline (p < 0.001), but is not as good as the best performing feature set. The best combination of our proposed features (Details + Modal + Pronoun + Content + Graph + Subjectivity + Transition) achieves a QWK of 0.60, substantially better performance than Baseline (p < 0.001), reflecting an improvement of 13 percentage points. This result indicates that developing features to encode narrative quality is important for evaluating Organization in narrative essays. Most of our explored feature sets, even those that do not individually perform well, are part of the best system. Two feature sets that are not present in the best feature combination are Statives and Events. The exclusion of the former is reasonable: stative verbs are related to story development. The exclusion of Events is surprising, as it intuitively encodes the coherence of events, impacting the organization of the essay. The best feature combination that includes Events achieves a QWK of 0.58. The Baseline features are not part of the best system, confirming our intuition that features that specifically encode narrative quality are needed for this narrative trait.

From our ablation results, we inspected the top 10 best-performing feature set combinations in order to determine which features consistently produce good systems. Pronoun, Content, Graph and Subjectivity were a part of all 10 of the 10 top systems, Transition was in 9, Details was in 7 and Modal was in 6 feature sets. This suggests that singleton features such as
Pronoun and Content are indeed useful, even though they cannot be used in isolation.

Feature set                  Organization  Development  Conventions  Narrative  Total
Baseline                     0.47          0.51         0.44         0.53       0.60
Details                      0.36          0.41         0.19         0.39       0.41
Transition                   0.39          0.50         0.23         0.49       0.48
Events                       0.39          0.43         0.26         0.45       0.45
Subjectivity                 0.41          0.47         0.20         0.47       0.46
Graph                        0.49          0.54         0.17         0.56       0.54
All features                 0.56          0.63         0.46         0.65       0.67
Best feature combination*    0.60          0.66         0.50         0.67       0.70

Table 4: Performance (QWK) on predicting traits and Narrative and Total scores. Best feature combinations (*): for Organization: Details + Modal + Pronoun + Content + Graph + Subjectivity + Transition; for Development: Details + Modal + Content + Graph + Statives + Transition; for Conventions: Baseline + Details + Graph; for Narrative: Baseline + Details + Modal + Pronoun + Content + Graph + Statives + Subjectivity + Transition; for Total: Details + Baseline + Modal + Content + Graph + Subjectivity + Transition.

Development. We observe trends similar to those seen with the Organization trait: the Baseline feature set does not capture Development very effectively, and some individual feature sets have predictive power for this trait but perform poorly. Graph outperforms Baseline, but this is not statistically significant. Using all of the available features produces QWK = 0.63, a significant improvement over Baseline (p < 0.001). The best system achieves a performance of QWK = 0.66, outperforming Baseline by 15 percentage points (p < 0.001). The best feature combination contains 6 of the 9 proposed features and differs from the best features for Organization by the inclusion of Statives and the exclusion of Pronoun and Subjectivity. Content, Graph and Transition also occur in all of the top 10 best-performing systems.

Conventions. Even though scoring language conventions is not the focus of this work, we were curious how well our features evaluate this dimension. We observe that overall performance is lower than for the other two traits, which is to be expected as we do not have high human inter-rater agreement to start with. The Baseline e-rater feature set is the best performing individual feature set, and the narrative-specific features
perform rather poorly. Using all features (QWK = 0.46) only produces a 2-point improvement over Baseline, which is not statistically significant. Adding Details and Graph to Baseline produces the best system, an improvement of 6 percentage points, QWK = 0.50 (p < 0.001). All three features are also the only feature sets that consistently occur in all the 10 top-performing systems.

Narrative. In general, the results for Narrative scoring follow the same trends as the results for Organization. Graph features outperform the Baseline significantly (p < 0.05). Using all available features produces a significant improvement in performance (0.65 QWK; p < 0.001). Baseline features are now a part of the best feature set combination (Baseline + Details + Modal + Pronoun + Content + Graph + Statives + Subjectivity + Transition), which achieves a QWK of 0.67, an improvement of 14 percentage points (p < 0.001). The best feature combination without the Baseline features achieves QWK = 0.66, and this is not statistically different from the performance of the best system. Modal, Content, and Graph occur in all 10, and Subjectivity and Transition occur in nine of the top 10 feature combinations.

Total. For Total scoring, the Baseline feature set is the best performing individual feature set, with QWK = 0.60. Using all features produces a significant (p < 0.001) performance boost at 0.67 QWK. The best feature combination (Details + Baseline + Modal + Content + Graph + Subjectivity + Transition) improves over Baseline by 10 percentage points,
with a QWK of 0.70 (p < 0.001). The best result obtained by a feature combination without Baseline (Details + Modal + Content + Graph + Subjectivity + Transition) is QWK = 0.68, which is significantly higher than the Baseline performance (p < 0.001), indicating that our features are able to effectively score essays by themselves, as well as in combination with the Baseline features to get an improved system. Except for Details and Transition, all features of the best system also occur in all the top-10 systems.

8 Analysis and Discussion

The results show that our proposed features vary in effectiveness. Graph features proved to be more effective than Transition, Subjectivity and Details. The effectiveness of single-measure features (Pronoun, Statives, Content and Modal) was evident from their inclusion in the best combination models. Although Events was reasonably predictive on its own for Organization and Development, it was not found in the best performing combinations, nor did it participate in the top 10 feature sets for any of the traits. This surprising result suggests that other features, which are correlated with Events, must be stronger indicators of narrative competence.

Our results also show no clear segregation of features by trait, as most of the features appearing in the best models for Organization and Development were the same. We attribute this to the high correlation between the human scores for the two traits; a model that is good for one will be good for the other.

8.1 Correlation Study

We performed correlation analysis to test if our intuitions regarding the feature sets, as discussed in Section 6, are supported by the data, and to study the effect of length. Length is a well-known confounding factor in essay scoring as longer essays tend to get higher scores (Chodorow and Burstein, 2004). This also applies to narratives, as it is difficult to tell a good story without using a sufficient number of words. In our data, Pearson correlations of essay length with human scores are: Conv.: 0.35, Dev.: 0.58, Org.: 0.54. However, it is important that our encoded features capture more than just the length of the narrative. In order to test this, we conducted correlation analysis between each feature and human trait score by partialling out length.

Feat     Org              Dev              Conv
Base.     0.19 (0.28)      0.19 (0.41)      0.39 (0.43)
Detl.     0.17 (0.21)      0.16 (0.20)      0.08 (0.18)
Trans.   -0.10 (0.23)     -0.15 (0.22)     -0.05 (0.27)
Event     0.20 (0.27)      0.19 (0.26)      0.14 (0.19)
Subj.     0.17 (0.48)      0.19 (0.52)      0.07 (0.12)
Graph     0.36 (0.61)      0.39 (0.65)      0.06 (0.28)
Cont.    -0.19 (-0.30)    -0.20 (-0.31)    -0.20 (-0.28)
Pron.     0.19 (0.18)      0.17 (0.17)      0.12 (0.10)
Modal    -0.17 (-0.17)    -0.21 (-0.17)    -0.01 (-0.18)
Statv.   -0.10 (-0.18)    -0.10 (-0.18)    -0.05 (-0.11)

Table 5: Maximal partial correlations with scores, controlling for length (simple correlations in parentheses).

Table 5 shows the maximal partial correlation of each feature set with the human scores. For feature sets that contain only a single feature (e.g., Modal), we directly report the partial correlation for that feature. For feature sets that contain multiple features, due to space constraints, we report the maximal partial correlation achieved by any feature within that set (note that, within a set, different features might have maximum values for different traits). The value in the parentheses indicates the corresponding feature's simple correlation with score.

We observe that for all features except Pronoun and Modal, the correlation with score drops when length is accounted for, indicating the influence of essay length on scores. This effect is more pronounced in features that employ counts (e.g., counts of adverbs), as more support is found in longer essays. The baseline is correlated more with conventions than the two narrative traits. An opposite effect is seen for our narrative-specific features. The negative sign for Statives, Content and Modal supports our intuitions regarding these features: more use of these reduces story quality.

8.2 Error Analysis

Table 6 shows the human-machine confusion matrix for Development trait scores. Confusion matrices for other traits also show a similar trend. We observe that most of the errors in score prediction are at adjacent score points. This is perhaps in part due to our human-human agreement criterion during data annotation: disagreement of one score point did not trigger adjudication.

Human \ Machine    0     1     2     3     4     total
0                  8     9     18    5     0     40
1                  8     28    43    5     0     84
2                  1     8     159   101   1     270
3                  0     0     83    205   31    319
4                  0     0     9     125   95    229

Table 6: Human-machine confusion matrix for Development trait scores.

The system encounters more difficulty predicting the correct scores at the ends of the scale (score points 0-1 and score point 4). The difficulty with scores 0 and 1 is partially attributable to the small amount of training data for these scores.

In a more detailed analysis of the human-machine discrepancies, we first focus on the forty essays that were rated 0 by the annotators (Table 6, row 1). The machine and human agreed on only eight of these. All eight are non-narratives, and seven of them are extremely short (3 to 51 words). Twenty-seven of the remaining 32 were well-written, long, non-narrative essays (and thus off-purpose according to our rubric). For example, one of the essays, which was written for a “describe a travel experience” prompt, presented a discussion about the educational advantages of travel in general.

Next, we consider the 84 essays (all narratives) that were rated 1 by the annotators (row 2 of Table 6). Of these, the eight that were scored 0 by the machine were rather short (length 15 to 69 words) and poorly written. The human and the machine agreed on 28 essays, whose average length was somewhat longer (93 words). For the 43 essays that the machine over-scored by 1 point, the average length was 154 words. All five essays that the machine over-scored by 2 points were long, ranging from 200 to 421 words, but were either expository essays or were very poorly written. This scoring pattern suggests that human-machine disagreement is at least partially rooted in essay length.

For the essays that were rated 4 by the human annotators (Table 6, last row), the machine underestimated nine essays by 2 points. These essays were relatively short (from 135 to 383 words). For comparison, in the 125 essays where the machine underestimated the human score by only one point, the average length was 418 words. For the 95 essays that were scored 4 by both the human and machine, the average length was 653 words. A similar effect of length was seen among the essays scored 2 and 3 by the human annotators.

The error analysis at the lowest range of human scores demonstrates that an accurate system must be able to properly handle non-narrative essays. One possible solution is to consider coupling our system with a binary narrative classifier that would flag non-narrative essays. Further research is also clearly needed to reduce the influence of essay length on automated scoring. This was particularly demonstrated for essays where writers managed to produce well-written, but very short, stories that were under-scored by the machine.

9 Conclusions and Future Work

In this article, we have presented evidence that humans can reliably score development and organization traits and their sub-traits in narratives, and that some sub-traits can be more reliably scored than others. We have also presented evidence that automated systems with narrative-specific features can reliably score narrative quality traits and can do so significantly better than a state-of-the-art system designed to assess general writing proficiency.

Scoring narrative essays is challenging because typically there is no right answer, nor any limit to the creative possibilities in effective story-telling. In this work, we have explored only the proverbial tip of the iceberg in terms of features and methods for scoring narrative essays. While we are encouraged by our results, we believe that further improvement will require more elaborate representations of story content and meaning. Accordingly, we plan to explore automated evaluation of narrative sub-traits, including plot, point of view and character development, and of the relationships among them.

References

Caralee J. Adams. 2014. Essay-grading software seen as time-saving tool. Education Week, 33(25):13–15.
Apoorv Agarwal, Anup Kotalwar, and Owen Rambow. 2013. Automatic extraction of social networks from literary text: A case study on Alice in Wonderland. In Proceedings of the 6th International Joint Conference on Natural Language Processing, pages 1202–1208.

Michael J. Almeida. 1995. Time in narratives. Deixis in Narrative: A Cognitive Science Perspective, pages 159–189.

Teresa M. Amabile. 1982. Social psychology of creativity: A consensual assessment technique. Journal of Personality and Social Psychology, 43(5):997–1013.

Yigal Attali and Jill Burstein. 2006. Automated essay scoring with e-rater v.2.0. Journal of Technology, Learning, and Assessment, 4:3.

Nahla Bacha. 2001. Writing evaluation: What can analytic versus holistic essay scoring tell us? System, 29(3):371–383.

Niranjan Balasubramanian, Stephen Soderland, Mausam, and Oren Etzioni. 2013. Generating coherent event schemas at scale. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1721–1731, Seattle, WA, October.

David Bamman, Ted Underwood, and Noah A. Smith. 2014. A Bayesian mixed effects model of literary character. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 370–379, Baltimore, MD, USA, June.

Beata Beigman Klebanov, Jill Burstein, Nitin Madnani, Adam Faulkner, and Joel Tetreault. 2012. Building subjectivity lexicon(s) from scratch for essay data. Computational Linguistics and Intelligent Text Processing, pages 591–602.

Beata Beigman Klebanov, Nitin Madnani, Jill Burstein, and Swapna Somasundaran. 2014. Content importance models for scoring writing from sources. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 247–252.

Taylor Berg-Kirkpatrick, David Burkett, and Dan Klein. 2012. An empirical investigation of statistical significance in NLP. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 995–1005. Association for Computational Linguistics.

Thomas Bögel, Jannik Strötgen, and Michael Gertz. 2014. Computational narratology: Extracting tense clusters from narrative texts. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation, Reykjavik, Iceland, May. European Language Resources Association (ELRA).

Hein Broekkamp, Tanja Janssen, and Huub van den Bergh. 2009. Is there a relationship between literature reading and creative writing? Journal of Creative Behavior, 43(4):281–296.

Jill Burstein, Daniel Marcu, and Kevin Knight. 2003. Finding the write stuff: Automatic identification of discourse structure in student essays. IEEE Intelligent Systems, 18(1):32–39.

Asli Celikyilmaz, Dilek Hakkani-Tur, Hua He, Greg Kondrak, and Denilson Barbosa. 2010. The actor-topic model for extracting social networks in literary narrative. In Proceedings of the 24th Annual Conference on Neural Information Processing Systems.

Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of ACL-08: HLT, pages 789–797.

Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 602–610.

Nathanael Chambers. 2013. Event schema induction with a probabilistic entity-driven model. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1798–1807, Seattle, WA, October.

Eugene Charniak. 1972. Toward a model of children's story comprehension. Technical report, MIT, Cambridge, MA, USA.

Snigdha Chaturvedi, Dan Goldwasser, and Hal Daumé III. 2015. Ask, and shall you receive?: Understanding desire fulfillment in natural language text. arXiv preprint arXiv:1511.09460.

Snigdha Chaturvedi, Shashank Srivastava, Hal Daumé III, and Chris Dyer. 2016. Modeling evolving relationships between characters in literary novels. In Proceedings of the Thirtieth Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence, pages 2704–2710. Association for the Advancement of Artificial Intelligence Press.

Martin Chodorow and Jill Burstein. 2004. Beyond essay length: Evaluating e-rater's performance on TOEFL essays. TOEFL research report 73, Educational Testing Service, Princeton, NJ, USA.

Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29, March.

Jacob Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4):213.

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In COLING Workshop on Cross-framework and Cross-domain Parser Evaluation.

Bradley Efron and Robert J. Tibshirani. 1994. An introduction to the bootstrap. CRC press.

Scott Elliot. 2003. Intellimetric: From here to validity. Automated essay scoring: A cross-disciplinary perspective, pages 71–86.

Micha Elsner. 2012. Character-based kernels for novelistic plot structure. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 634–644. Association for Computational Linguistics.

David K. Elson, Nicholas Dames, and Kathleen R. McKeown. 2010. Extracting social networks from literary fiction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 138–147. Association for Computational Linguistics.

David K. Elson. 2012. Modeling Narrative Discourse. Ph.D. thesis, Columbia University.

Noura Farra, Swapna Somasundaran, and Jill Burstein. 2015. Scoring persuasive essays using opinions and their targets. In Tenth Workshop on Innovative Use of NLP for Building Educational Applications.

Mark Alan Finlayson. 2012. Learning narrative structure from annotated folktales. Ph.D. thesis, Massachusetts Institute of Technology.

Mark A. Finlayson. 2013. A survey of corpora in computational and cognitive narrative science. Sprache Und Datenverarbeitung (International Journal for Language Data Processing), 37(1–2).

Monika Fludernik. 2009. An Introduction to Narratology. Routledge, London.
Ronald B. Gillam and Nils A. Pearson. 2004. TNL: Test of Narrative Language. Austin, TX: Pro-Ed.

Amit Goyal, Ellen Riloff, and Hal Daumé III. 2010. Automatically producing plot unit representations for narrative text. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Boston, MA.

Michael A. K. Halliday and James R. Martin. 1993. Writing Science: Literacy and Discursive Power. The Falmer Press, London.

Harry Halpin and Johanna D. Moore. 2006. Event extraction in a plot advice agent. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 857–864, Stroudsburg, PA, USA. Association for Computational Linguistics.

Franziska Hartung, Michael Burke, Peter Hagoort, and Roel M. Willems. 2016. Taking perspective: Personal pronouns affect experiential aspects of literary reading. PLoS ONE, 11(5).

Bram Jans, Steven Bethard, Ivan Vulic, and Marie Francine Moens. 2012. Skip N-grams and ranking functions for predicting script events. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 336–344, Avignon, France, April.

Stephen P. Klein, Brian M. Stecher, Richard J. Shavelson, Daniel McCaffrey, Tor Ormseth, Robert M. Bell, Kathy Comfort, and Abdul R. Othman. 1998. Analytic versus holistic scoring of science performance tasks. Applied Measurement in Education, 11(2):121–137.

Thomas K. Landauer, Darrell Laham, and Peter W. Foltz. 2003. Automated scoring and annotation of essays with the intelligent essay assessor. Automated essay scoring: A cross-disciplinary perspective, pages 87–112.

Yong-Won Lee, Claudia Gentile, and Robert Kantor. 2010. Toward automated multi-trait scoring of essays: Investigating links among holistic, analytic, and text feature scores. Applied Linguistics, 31(3):391–417.

Wendy G. Lehnert. 1981. Plot units and narrative summarization. Cognitive Science, 5(4):293–331.

Inderjeet Mani. 2012. Computational modeling of narrative. Synthesis Lectures on Human Language Technologies, 5(3):1–142.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics System Demonstrations, pages 55–60.

Neil McIntyre and Mirella Lapata. 2010. Plot induction and evolutionary search for story generation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1562–1572, Uppsala, Sweden.

Anne McKeough, Randy Genereux, and Joan Jeary. 2006. Structure, content, and language usage: How do exceptional and average story writers differ? High Ability Studies, 17(2):203–223.

Phyllis Mentzell, Elizabeth Vander Lei, and Duane H. Roen. 1999. Audience Considerations for Evaluating Writing. Evaluating Writing: The Role of Teacher's Knowledge about Text, Learning, and Culture.

Mohsen Mesgar and Michael Strube. 2016. Lexical coherence graph modeling using word embeddings. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1414–1423.
Jon Miller and Robin Chapman. 1985. Systematic Analysis of Language Transcripts. Madison, WI: Language Analysis Laboratory.

Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Pushmeet Kohli, Lucy Vanderwende, and James Allen. 2016. A corpus and cloze evaluation for deeper understanding of commonsense stories. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 839–849, San Diego, California, June 12-17. Association for Computational Linguistics.

Courtney Napoles, Matthew Gormley, and Benjamin Van Durme. 2012. Annotated Gigaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction & Web-scale Knowledge Extraction, pages 95–100.

Huy Nguyen and Diane J. Litman. 2016. Improving Argument Mining in Student Essays by Learning and Exploiting Argument Indicators versus Essay Topics. In Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference, pages 485–490.

Kiem-Hieu Nguyen, Xavier Tannier, Olivier Ferret, and Romaric Besancon. 2015. Generative event schema induction with entity disambiguation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 188–197, Beijing, China, July.

Keith Oatley. 1999. Meetings of minds: Dialogue, sympathy, and identification, in reading fiction. Poetics, 26:439–454.

Natalie G. Olinghouse and Jacqueline T. Leaird. 2009. The relationship between measures of vocabulary and narrative writing quality in second- and fourth-grade students. Reading & Writing, 22(5):545–565.

Kieran O'Loughlin. 1995. Lexical density in candidate output on direct and semi-direct versions of an oral proficiency test. Language Testing, 12:217–237.

Jessica Ouyang and Kathleen McKeown. 2015. Modeling reportable events as turning points in narrative. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2149–2158, Lisbon, Portugal.

Cecilia Ovesdotter Alm and Richard Sproat. 2005. Emotional sequencing and development in fairy tales. In International Conference on Affective Computing and Intelligent Interaction, pages 668–674. Springer.

Ellis Batten Page. 1994. Computer grading of student prose, using modern concepts and software. The Journal of Experimental Education, 62(2):127–142.

Robert Parker, David Graff, Junbo Kong, Ke Chen, and Kazuaki Maeda. 2011. English Gigaword Fifth Edition. Philadelphia: Linguistic Data Consortium.

Rebecca J. Passonneau, Adam Goodkind, and Elena T. Levy. 2007. Annotation of Children's Oral Narrations: Modeling Emergent Narrative Skills for Computational Applications. In Proceedings of the Twentieth International Florida Artificial Intelligence Research Society Conference, pages 253–258.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Douglas B. Petersen, Sandra Laing Gillam, and Ronald B. Gillam. 2008. Emerging procedures in narrative assessment: The index of narrative complexity. Topics in Language Disorders, 28(2):115–130.

Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind K. Joshi, and Bonnie L. Webber. 2008. The Penn Discourse TreeBank 2.0. In Proceedings of the Sixth International Conference on Language Resources and Evaluation.

Gerald Prince. 1973. A Grammar of Stories: An Introduction. Mouton, The Hague.

Andrew J. Reagan, Lewis Mitchell, Dilan Kiley, Christopher M. Danforth, and Peter Sheridan Dodds. 2016. The emotional arcs of stories are dominated by six basic shapes. The European Physical Journal Data Science, 5(1):31.

Shlomith Rimmon-Kenan. 2002. Narrative Fiction: Contemporary Poetics. Routledge, London.

Roger C. Schank and Robert P. Abelson. 1977. Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge Structures. Lawrence Erlbaum Associates, Hillsdale, NJ, USA.

Mark D. Shermis and Jill Burstein. 2013. Handbook of Automated Essay Evaluation: Current Applications and New Directions. Routledge.

Carlota S. Smith. 2005. Aspectual entities and tense in discourse. In Paula Kempchinsky and Roumyana Slabakova, editors, Aspectual Inquiries, pages 223–237. Springer Netherlands, Dordrecht.

Swapna Somasundaran, Jill Burstein, and Martin Chodorow. 2014. Lexical chaining for measuring discourse coherence quality in test-taker essays. In Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers.

Swapna Somasundaran, Chong Min Lee, Martin Chodorow, and Xinhao Wang. 2015. Automated scoring of picture-based story narration. In Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications.

Swapna Somasundaran, Brian Riordan, Binod Gyawali, and Su-Youn Yoon. 2016. Evaluating argumentative and narrative essays using graphs. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, pages 1568–1578, Osaka, Japan, December.

Christian Stab and Iryna Gurevych. 2017. Recognizing Insufficiently Supported Arguments in Argumentative Essays. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Long Papers, volume 1.

Nancy L. Stein and Christine G. Glenn. 1979. An Analysis of Story Comprehension in Elementary School Children: A test of a schema. New Directions in Discourse Processing.

Carol J. Strong, Mercer Mayer, and Marianna Mayer. 1998. The Strong Narrative Assessment Procedure. Thinking Publications.

Reid Swanson, Elahe Rahimtoroghi, Thomas Corcoran, and Marilyn A. Walker. 2014. Identifying narrative clause types in personal stories. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, page 171.

Jean Ure. 1971. Lexical density and register differentiation. In Perren G.E. and Trim J.L.M., editors, Applications of linguistics. Selected papers of the Second International Congress of Applied Linguistics, Cambridge 1969, pages 443–452. Cambridge University Press, Cambridge, UK.

Josep Valls-Vargas, Jichen Zhu, and Santiago Ontañón. 2014. Toward automatic role identification in unannotated folk tales. In Proceedings of the Tenth Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence and Interactive Digital Entertainment, pages 188–194. Advancement of Artificial Intelligence Press.

Zeno Vendler. 1967. Linguistics in Philosophy. Cornell University Press, Ithaca, NY.

Stephen G. Ware, Brent E. Harrison, Robert Michael Young, and David L. Roberts. 2011. Initial results for measuring four dimensions of narrative conflict. In The Fourth Workshop on Intelligent Narrative Technologies at the 2011 AI and Interactive Digital Entertainment Conference.

Janyce M. Wiebe. 1994. Tracking point of view in narrative. Computational Linguistics, 20(2):233–287.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 347–354. Association for Computational Linguistics.

Guoxing Yu. 2010. Lexical diversity in writing and speaking task performances. Applied Linguistics, 31(2):236–259.
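
As a supplementary illustration of the evaluation described in Section 7.2, the following minimal Python sketch shows the score-trimming step and a quadratic weighted kappa (QWK) computation. It is not the authors' implementation. The function names trim_score and qwk_from_confusion are hypothetical, the 0 to 4 score range is taken from Table 6, and the QWK formula is assumed to be the standard quadratically weighted form of Cohen's (1968) kappa. The count matrix reproduces Table 6 (human scores in rows, machine scores in columns).

```python
def trim_score(predicted, min_score=0, max_score=4):
    """Clip a raw regression output to the score scale.

    The 0-4 defaults are an assumption based on the scale in Table 6.
    """
    return max(min_score, min(max_score, predicted))


def qwk_from_confusion(matrix):
    """Quadratic weighted kappa from a square human-by-machine count matrix."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    human_totals = [sum(row) for row in matrix]
    machine_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            observed += weight * matrix[i][j]
            expected += weight * human_totals[i] * machine_totals[j] / n

    if expected == 0:
        raise ValueError("kappa is undefined when each rater uses one score point")
    return 1.0 - observed / expected


# Counts from Table 6: rows are human scores 0-4, columns are machine scores 0-4.
TABLE_6 = [
    [8, 9, 18, 5, 0],
    [8, 28, 43, 5, 0],
    [1, 8, 159, 101, 1],
    [0, 0, 83, 205, 31],
    [0, 0, 9, 125, 95],
]

print(trim_score(4.7), trim_score(-0.3))
print(round(qwk_from_confusion(TABLE_6), 2))
```

Applied to the Table 6 counts, this sketch gives a QWK of about 0.66, in line with the Development result for the best feature combination in Table 4.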
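
The length-controlled analysis of Section 8.1 can be illustrated in a similar way. The sketch below computes a first-order partial correlation between a feature and a score while partialling out essay length, using the standard formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The function names and the eight-essay toy data are hypothetical; the toy values are constructed so that the feature-score association is driven entirely by length, and they are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(feature, score, length):
    """Correlation of feature with score after partialling out length."""
    r_fs = pearson(feature, score)
    r_fl = pearson(feature, length)
    r_sl = pearson(score, length)
    return (r_fs - r_fl * r_sl) / math.sqrt((1 - r_fl ** 2) * (1 - r_sl ** 2))


# Hypothetical toy data: both the feature and the score rise with length,
# but their remaining variation is unrelated.
lengths = [400, 400, 400, 400, 200, 200, 200, 200]
scores = [4, 4, 2, 2, 2, 2, 0, 0]
feature = [4, 2, 4, 2, 2, 0, 2, 0]

simple = pearson(feature, scores)
partial = partial_correlation(feature, scores, lengths)
print("simple correlation:", simple)
print("partial correlation controlling for length:", partial)
```

In this toy case the simple correlation is 0.5 while the partial correlation is zero up to floating-point rounding, which mirrors the pattern in Table 5 where correlations shrink once length is accounted for.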