Transactions of the Association for Computational Linguistics, 2 (2014) 79–92. Action Editor: Mirella Lapata.
Submitted 12/2013; Published 2/2014. © 2014 Association for Computational Linguistics.
The Language Demographics of Amazon Mechanical Turk

Ellie Pavlick¹, Matt Post², Ann Irvine², Dmitry Kachaev², Chris Callison-Burch¹,²
¹ Computer and Information Science Department, University of Pennsylvania
² Human Language Technology Center of Excellence, Johns Hopkins University

Abstract

We present a large scale study of the languages spoken by bilingual workers on Mechanical Turk (MTurk). We establish a methodology for determining the language skills of anonymous crowd workers that is more robust than simple surveying. We validate workers’ self-reported language skill claims by measuring their ability to correctly translate words, and by geolocating workers to see if they reside in countries where the languages are likely to be spoken. Rather than posting a one-off survey, we posted paid tasks consisting of 1,000 assignments to translate a total of 10,000 words in each of 100 languages. Our study ran for several months, and was highly visible on the MTurk crowdsourcing platform, increasing the chances that bilingual workers would complete it. Our study was useful both to create bilingual dictionaries and to act as a census of the bilingual speakers on MTurk. We use this data to recommend languages with the largest speaker populations as good candidates for other researchers who want to develop crowdsourced, multilingual technologies. To further demonstrate the value of creating data via crowdsourcing, we hire workers to create bilingual parallel corpora in six Indian languages, and use them to train statistical machine translation systems.

1 Overview

Crowdsourcing is a promising new mechanism for collecting data for natural language processing research. Access to a fast, cheap, and flexible workforce allows us to collect new types of data, potentially enabling new language technologies. Because crowdsourcing platforms like Amazon Mechanical Turk (MTurk) give researchers access to a worldwide workforce, one obvious application of crowdsourcing is the creation of multilingual technologies. With an increasing number of active crowd workers located outside of the United States, there is even the potential to reach fluent speakers of lower resource languages. In this paper, we investigate the feasibility of hiring language informants on MTurk by conducting the first large-scale demographic study of the languages spoken by workers on the platform.

There are several complicating factors when trying to take a census of workers on MTurk. The workers’ identities are anonymized, and Amazon provides no information about their countries of origin or their language abilities. Posting a simple survey to have workers report this information may be inadequate, since (a) many workers may never see the survey, (b) many opt not to do one-off surveys since potential payment is low, and (c) validating the answers of respondents is not straightforward.

Our study establishes a methodology for determining the language demographics of anonymous crowd workers that is more robust than simple surveying. We ask workers what languages they speak and what country they live in, and validate their claims by measuring their ability to correctly translate words and by recording their geolocation. To increase the visibility and the desirability of our tasks, we post 1,000 assignments in each of 100 languages. These tasks each consist of translating 10 foreign words into English. Two of the 10 words have known translations, allowing us to validate that the workers’ translations are accurate. We construct bilingual dictionaries with up to 10,000 entries, with the majority of entries being new.

Surveying thousands of workers allows us to analyze current speaker populations for 100 languages.
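
As a rough illustration of the embedded-control idea described above (two of the ten words in each task have known translations), the following Python sketch computes each worker's accuracy on the control words and collects the non-exact answers that would need a separate synonym check. It is a minimal sketch under stated assumptions, not the code used in the study: the function names `normalize` and `score_controls`, the tuple layout of `submissions`, the lowercase-and-strip normalization, and the toy Spanish examples are all hypothetical.

```python
from collections import defaultdict


def normalize(word):
    """Lowercase and strip surrounding whitespace (an assumed normalization)."""
    return word.strip().lower()


def score_controls(submissions, gold):
    """Return per-worker accuracy on control words plus non-exact answers.

    submissions: iterable of (worker_id, foreign_word, translation) tuples.
    gold: dict mapping control foreign words to one known English translation.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    needs_review = []  # non-exact answers, candidates for a synonym check
    for worker_id, foreign_word, translation in submissions:
        if foreign_word not in gold:
            continue  # not a control word, so its translation is unknown
        total[worker_id] += 1
        if normalize(translation) == normalize(gold[foreign_word]):
            correct[worker_id] += 1
        else:
            needs_review.append((worker_id, translation, gold[foreign_word]))
    accuracy = {worker: correct[worker] / total[worker] for worker in total}
    return accuracy, needs_review


# Hypothetical toy data: two control words and two workers.
gold = {"perro": "dog", "casa": "house"}
submissions = [
    ("w1", "perro", "Dog"),     # exact match after normalization
    ("w1", "casa", "home"),     # non-exact, sent to review
    ("w1", "libro", "book"),    # not a control word, ignored here
    ("w2", "perro", "cat"),     # non-exact, sent to review
    ("w2", "casa", "house "),   # exact match after normalization
]
accuracy, needs_review = score_controls(submissions, gold)
```

Exact matching alone undercounts correct answers such as "home" for "house", which is why the non-exact pairs are returned separately rather than simply being marked wrong.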
Downloaded from http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00167/1566879/tacl_a_00167.pdf by guest on 07 September 2023
Figure 1: The number of workers per country. This map was generated based on geolocating the IP address of 4,983 workers in our study. Omitted are 60 workers who were located in more than one country during the study, and 238 workers who could not be geolocated. The size of the circles represents the number of workers from each country. The two largest are India (1,998 workers) and the United States (866). To calibrate the sizes: the Philippines has 142 workers, Egypt has 25, Russia has 10, and Sri Lanka has 4.

The data also allows us to answer questions like: How quickly is work completed in a given language? Are crowdsourced translations reliably good? How often do workers misrepresent their language abilities to obtain financial rewards?

2 Background and Related Work

Amazon’s Mechanical Turk (MTurk) is an online marketplace for work that gives employers and researchers access to a large, low-cost workforce. MTurk allows employers to provide micropayments in return for workers completing microtasks. The basic units of work on MTurk are called ‘Human Intelligence Tasks’ (HITs). MTurk was designed to accommodate tasks that are difficult for computers, but simple for people. This facilitates research into human computation, where people can be treated as a function call (von Ahn, 2005; Little et al., 2009; Quinn and Bederson, 2011). It has application to research areas like human-computer interaction (Bigham et al., 2010; Bernstein et al., 2010), computer vision (Sorokin and Forsyth, 2008; Deng et al., 2010; Rashtchian et al., 2010), speech processing (Marge et al., 2010; Lane et al., 2010; Parent and Eskenazi, 2011; Eskenazi et al., 2013), and natural language processing (Snow et al., 2008; Callison-Burch and Dredze, 2010; Laws et al., 2011).

On MTurk, researchers who need work completed are called ‘Requesters’, and workers are often referred to as ‘Turkers’. MTurk is a true market, meaning that Turkers are free to choose to complete the HITs which interest them, and Requesters can price their tasks competitively to try to attract workers and have their tasks done quickly (Faridani et al., 2011; Singer and Mittal, 2011). Turkers remain anonymous to Requesters, and all payment occurs through Amazon. Requesters are able to accept submitted work or reject work that does not meet their standards. Turkers are only paid if a Requester accepts their work.

Several reports examine Mechanical Turk as an economic market (Ipeirotis, 2010a; Lehdonvirta and Ernkvist, 2011). When Amazon introduced MTurk, it first offered payment only in Amazon credits, and later offered direct payment in US dollars. More recently, it has expanded to include one foreign currency, the Indian rupee. Despite its payments being limited to two currencies or Amazon credits, MTurk claims over half a million workers from 190 countries (Amazon, 2013). This suggests that its worker population should represent a diverse set of languages.
A demographic study by Ipeirotis (2010b) focused on age, gender, marital status, income levels, motivation for working on MTurk, and whether workers used it as a primary or supplemental form of income. The study contrasted Indian and US workers. Ross et al. (2010) completed a longitudinal follow-on study. A number of other studies have informally investigated Turkers’ language abilities. Munro and Tily (2011) compiled survey responses of 2,000 Turkers, revealing that four of the six most represented languages come from India (the top six being Hindi, Malayalam, Tamil, Spanish, French, and Telugu). Irvine and Klementiev (2010) had Turkers evaluate the accuracy of translations that had been automatically induced from monolingual texts. They examined translations of 100 words in 42 low-resource languages, and reported geolocated countries for their workers (India, the US, Romania, Pakistan, Macedonia, Latvia, Bangladesh, and the Philippines). Irvine and Klementiev discussed the difficulty of quality control and assessing the plausibility of workers’ language skills for rare languages, which we address in this paper.

Several researchers have investigated using MTurk to build bilingual parallel corpora for machine translation, a task which stands to benefit from low cost, high volume translation on demand (Germann, 2001). Ambati et al. (2010) conducted a pilot study by posting 25 sentences to MTurk for Spanish, Chinese, Hindi, Telugu, Urdu, and Haitian Creole. In a study of 2000 Urdu sentences, Zaidan and Callison-Burch (2011) presented methods for achieving professional-level translation quality from Turkers by soliciting multiple English translations of each foreign sentence. Zbib et al. (2012) used crowdsourcing to construct a 1.5 million word parallel corpus of dialect Arabic and English, training a statistical machine translation system that produced higher quality translations of dialect Arabic than a system trained on 100 times more Modern Standard Arabic-English parallel data. Zbib et al. (2013) conducted a systematic study that showed that training an MT system on crowdsourced translations resulted in the same performance as training on professional translations, at 1/5 the cost. Hu et al. (2010; 2011) performed crowdsourced translation by having monolingual speakers collaborate and iteratively improve MT output. Several researchers have examined cost optimization using active learning techniques to select the most useful sentences or fragments to translate (Ambati and Vogel, 2010; Bloodgood and Callison-Burch, 2010; Ambati, 2012).

To contrast our research with previous work, the main contributions of this paper are: (1) a robust methodology for assessing the bilingual skills of anonymous workers, (2) the largest-scale census to date of language skills of workers on MTurk, and (3) a detailed analysis of the data gathered in our study.

Table 1: Self-reported native language of 3,216 bilingual Turkers. Not shown are 49 languages with ≤20 speakers. We omit 1,801 Turkers who did not report their native language, 243 who reported 2 native languages, and 83 with ≥3 native languages.
English 689; Tamil 253; Malayalam 219; Hindi 149; Spanish 131; Telugu 87; Chinese 86; Romanian 85; Portuguese 82; Arabic 74; Kannada 72; German 66;
French 63; Polish 61; Urdu 56; Tagalog 54; Marathi 48; Russian 44; Italian 43; Bengali 41; Gujarati 39; Hebrew 38; Dutch 37; Turkish 35;
Vietnamese 34; Macedonian 31; Cebuano 29; Swedish 26; Bulgarian 25; Swahili 23; Hungarian 23; Catalan 22; Thai 22; Lithuanian 21; Punjabi 21; Others ≤20

3 Experimental Design

The central task in this study was to investigate Mechanical Turk’s bilingual population. We accomplished this through self-reported surveys combined with a HIT to translate individual words for 100 languages. We evaluate the accuracy of the workers’ translations against known translations. In cases where these were not exact matches, we used a second pass monolingual HIT, which asked English speakers to evaluate if a worker-provided translation was a synonym of the known translation.

Demographic questionnaire. At the start of each HIT, Turkers were asked to complete a brief survey about their language abilities. The survey asked the following questions:
• Is [language] your native language?
• How many years have you spoken [language]?
• Is English your native language?
• How many years have you spoken English?
• What country do you live in?

We automatically collected each worker’s current location by geolocating their IP address. A total of 5,281 unique workers completed our HITs. Of these, 3,625 provided answers to our survey questions, and we were able to geolocate 5,043. Figure 1 plots the location of workers across 106 countries. Table 1 gives the most common self-reported native languages.

Selection of languages. We drew our data from the different language versions of Wikipedia. We selected the 100 languages with the largest number of articles¹ (Table 2). For each language, we chose the 1,000 most viewed articles over a 1 year period,² and extracted the 10,000 most frequent words from them. The resulting vocabularies served as the input to our translation HIT.

Translation HIT. For the translation task, we asked Turkers to translate individual words. We showed each word in the context of three sentences that were drawn from Wikipedia. Turkers were allowed to mark that they were unable to translate a word. Each task contained 10 words, 8 of which were words with unknown translations, and 2 of which were quality control words with known translations. We gave special instruction for translating names of people and places, giving examples of how to handle ‘Barack Obama’ and ‘Australia’ using their interlanguage links. For languages with non-Latin alphabets, names were transliterated.

The task paid $0.15 for the translation of 10 words. Each set of 10 words was independently translated by three separate workers. 5,281 workers completed 256,604 translation assignments, totaling more than 3 million words, over a period of three and a half months.

Gold standard translations. A set of gold standard translations was automatically harvested from Wikipedia for every language to use as embedded controls. We used Wikipedia’s inter-language links to pair titles of English articles with their corresponding foreign article’s title. To get a more translatable set of pairs, we excluded any pairs where: (1) the English word was not present in the WordNet ontology (Miller, 1995), (2) either article title was longer than a single word, (3) the English Wikipedia page was a subcategory of person or place, or (4) the English and the foreign titles were identical or a substring of the other.

¹ http://meta.wikimedia.org/wiki/List_of_Wikipedias
² http://dumps.wikimedia.org/other/pagecounts-raw/

Table 2: A list of the languages that were used in our study, grouped by the number of Wikipedia articles in the language. Each language’s code is given in parentheses. These language codes are used in other figures throughout this paper.
500K+ ARTICLES: German (de), English (en), Spanish (es), French (fr), Italian (it), Japanese (ja), Dutch (nl), Polish (pl), Portuguese (pt), Russian (ru)
100K-500K ARTICLES: Arabic (ar), Bulgarian (bg), Catalan (ca), Czech (cs), Danish (da), Esperanto (eo), Basque (eu), Persian (fa), Finnish (fi), Hebrew (he), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Korean (ko), Lithuanian (lt), Malay (ms), Norwegian (Bokmal) (no), Romanian (ro), Slovak (sk), Slovenian (sl), Serbian (sr), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi), Waray-Waray (war), Chinese (zh)
10K-100K ARTICLES: Afrikaans (af), Amharic (am), Asturian (ast), Azerbaijani (az), Belarusian (be), Bengali (bn), Bishnupriya Manipuri (bpy), Breton (br), Bosnian (bs), Cebuano (ceb), Welsh (cy), Zazaki (diq), Greek (el), West Frisian (fy), Irish (ga), Galician (gl), Gujarati (gu), Haitian (ht), Armenian (hy), Icelandic (is), Javanese (jv), Georgian (ka), Kannada (kn), Kurdish (ku), Luxembourgish (lb), Latvian (lv), Malagasy (mg), Macedonian (mk), Malayalam (ml), Marathi (mr), Neapolitan (nap), Low Saxon (nds), Nepali (ne), Newar / Nepal Bhasa (new), Norwegian (Nynorsk) (nn), Piedmontese (pms), Sicilian (scn), Serbo-Croatian (sh), Albanian (sq), Sundanese (su), Swahili (sw), Tamil (ta), Telugu (te), Thai (th), Tagalog (tl), Urdu (ur), Yoruba (yo)
<10K ARTICLES: Central Bicolano (bcl), Tibetan (bo), Ilokano (ilo), Punjabi (pa), Kapampangan (pam), Pashto (ps), Sindhi (sd), Somali (so), Uzbek (uz), Wolof (wo)

Figure 2: Days to complete the translation HITs for 40 of the languages. Tick marks represent the completion of individual assignments.

Manual evaluation of non-identical translations. We counted all translations that exactly matched the gold standard translation as correct. For non-exact matches we created a second-pass quality assurance HIT. Turkers were shown a pair of English words, one of which was a Turker’s translation of the foreign word used for quality control, and the other of which was the gold-standard translation of the foreign word. Evaluators were asked whether the two words had the same meaning, and chose between three answers: ‘Yes’, ‘No’, or ‘Related but not synonymous.’ Examples of meaning equivalent pairs include:
Table 3: Translation quality when partitioning the translations into two groups, one containing translations submitted by Turkers whose location is within regions that plausibly speak the foreign language, and the other containing translations from Turkers outside those regions. In general, in-region Turkers provide higher quality translations. (**) indicates differences significant at p=0.05, (*) at p=0.10.
Language | Avg. Turker quality (# Ts), in region | Avg. Turker quality (# Ts), out of region | Primary locations of Turkers in region | Primary locations of Turkers out of region
Hindi | 0.63 (296) | 0.69 (7) | India (284), UAE (5), UK (3) | Saudi Arabia (2), Russia (1), Oman (1)
Tamil | 0.65 (273) ** | 0.25 (2) | India (266), US (3), Canada (2) | Tunisia (1), Egypt (1)
Malayalam | 0.76 (234) | 0.83 (2) | India (223), UAE (6), US (3) | Saudi Arabia (1), Maldives (1)
Spanish | 0.81 (191) | 0.84 (18) | US (122), Mexico (16), Spain (14) | India (15), New Zealand (1), Brazil (1)
French | 0.75 (170) | 0.82 (11) | India (62), US (45), France (23) | Greece (2), Netherlands (1), Japan (1)
Chinese | 0.60 (116) | 0.55 (21) | US (75), Singapore (13), China (9) | Hong Kong (6), Australia (3), Germany (2)
German | 0.82 (91) | 0.77 (41) | Germany (48), US (25), Austria (7) | India (34), Netherlands (1), Greece (1)
Italian | 0.86 (90) * | 0.80 (42) | Italy (42), US (29), Romania (7) | India (33), Ireland (2), Spain (2)
Amharic | 0.14 (16) ** | 0.01 (99) | US (14), Ethiopia (2) | India (70), Georgia (9), Macedonia (5)
Kannada | 0.70 (105) | NA (0) | India (105) |
Arabic | 0.74 (60) ** | 0.60 (45) | Egypt (19), Jordan (16), Morocco (9) | US (19), India (11), Canada (3)
Sindhi | 0.19 (96) | 0.06 (9) | India (58), Pakistan (37), US (1) | Macedonia (4), Georgia (2), Indonesia (2)
Portuguese | 0.87 (101) | 0.96 (3) | Brazil (44), Portugal (31), US (15) | Romania (1), Japan (1), Israel (1)
Turkish | 0.76 (76) | 0.80 (27) | Turkey (38), US (18), Macedonia (8) | India (19), Pakistan (4), Taiwan (1)
Telugu | 0.80 (102) | 0.50 (1) | India (98), US (3), UAE (1) | Saudi Arabia (1)
Irish | 0.74 (54) | 0.71 (47) | US (39), Ireland (13), UK (2) | India (36), Romania (5), Macedonia (2)
Swedish | 0.73 (54) | 0.71 (45) | US (25), Sweden (22), Finland (3) | India (23), Macedonia (6), Croatia (2)
Czech | 0.71 (45) * | 0.61 (50) | US (17), Czech Republic (14), Serbia (5) | Macedonia (22), India (10), UK (5)
Russian | 0.15 (67) * | 0.12 (27) | US (36), Moldova (7), Russia (6) | India (14), Macedonia (4), UK (3)
Breton | 0.17 (3) | 0.18 (89) | US (3) | India (83), Macedonia (2), China (1)

et al., 2013) to compile the list of countries where each language is spoken. Table 3 compares the average translation quality of assignments completed within the region of each language to the quality of assignments completed outside that region.

Our workers reported speaking 95 languages natively. US workers alone reported 61 native languages. Overall, 4,297 workers were located in a region likely to speak the language from which they were translating, and 2,778 workers were located in countries considered out of region (meaning that about a third of our 5,281 Turkers completed HITs in multiple languages).

Table 3 shows the differences in translation quality when computed using in-region versus out-of-region Turkers, for the languages with the greatest number of workers. Within region workers typically produced higher quality translations. Given the number of Indian workers on Mechanical Turk, it is unsurprising that they represent the majority of out-of-region workers. For the languages that had more than 75 out of region workers (Malay, Amharic, Icelandic, Sicilian, Wolof, and Breton), Indian workers represented at least 70% of the out of region workers in each language.

A few languages stand out for having suspiciously strong performance by out of region workers, notably Irish and Swedish, for which out of region workers account for a near equivalent volume and quality of translations to the in region workers. This is admittedly implausible, considering the relatively small number of Irish speakers worldwide, and the very low number living in the countries in which our Turkers were based (primarily India). Such results highlight the fact that cheating using online translation resources is a real problem, and despite our best efforts to remove workers using Google Translate, some cheating is still evident. Restricting to within region workers is an effective way to reduce the prevalence of cheating. We discuss the languages which are best supported by true native speakers in section 6.

Speed of translation. Figure 2 gives the completion times for 40 languages. The 10 languages to finish in the shortest amount of time were: Tamil, Malayalam, Telugu, Hindi, Macedonian, Spanish, Serbian, Romanian, Gujarati, and Marathi. Seven of the ten fastest languages are from India, which is unsurprising given the geographic distribution of workers. Some languages follow the pattern of having a smattering of assignments completed early, with the rate picking up later.

Figure 6: The total volume of translations (measured in English words) as a function of elapsed days. Languages shown: Malayalam, Tamil, Telugu, Hindi, Urdu, Bengali.

Table 4: Size of parallel corpora and bilingual dictionaries collected for each language.
language | sentence pairs | English + foreign words | dictionary entries
Bengali | 22k | 732k | 22k
Hindi | 40k | 1,488k | 22k
Malayalam | 32k | 863k | 23k
Tamil | 38k | 916k | 25k
Telugu | 46k | 1,097k | 21k
Urdu | 35k | 1,356k | 20k

Figure 6 gives the throughput of the full-sentence translation task for the six Indian languages. The fastest language was Malayalam, for which we collected half a million words of translations in just under a week. Table 4 gives the size of the data set that we created for each of these languages.

Training SMT systems. We trained statistical translation models from the parallel corpora that we created for the six Indian languages using the Joshua machine translation system (Post et al., 2012). Table 5 shows the translation performance when trained on the bitexts alone, and when incorporating the bilingual dictionaries created in our earlier HIT. The scores reflect the performance when tested on held out sentences from the training data. Adding the dictionaries to the training set produces consistent performance gains, ranging from 1 to 5 BLEU points. This represents a substantial improvement. It is worth noting, however, that while the source documents for the full sentences used for testing were kept disjoint from those used for training, there is overlap between the source materials for the dictionaries and those from the test set, since both the dictionaries and the bitext source sentences were drawn from Wikipedia.

Table 5: BLEU scores for translating into English using bilingual parallel corpora by themselves, and with the addition of single-word dictionaries. Scores are calculated using four reference translations and represent the mean of three MERT runs.
language | trained on bitexts alone | bitext + dictionaries | BLEU ∆
Bengali | 12.03 | 17.29 | 5.26
Hindi | 16.19 | 18.10 | 1.91
Malayalam | 6.65 | 9.72 | 3.07
Tamil | 8.08 | 9.66 | 1.58
Telugu | 11.94 | 13.70 | 1.76
Urdu | 19.22 | 21.98 | 2.76

6 Discussion

Crowdsourcing platforms like Mechanical Turk give researchers instant access to a diverse set of bilingual workers. This opens up exciting new avenues for researchers to develop new multilingual systems. The demographics reported in this study are likely to shift over time. Amazon may expand its payments to new currencies. Posting long-running HITs in other languages may recruit more speakers of those languages. New crowdsourcing platforms may emerge. The data presented here provides a valuable snapshot of the current state of MTurk, and the methods used can be applied generally in future research.

Based on our study, we can confidently recommend 13 languages as good candidates for research now: Dutch, French, German, Gujarati, Italian, Kannada, Malayalam, Portuguese, Romanian, Serbian, Spanish, Tagalog, and Telugu. These languages have large Turker populations who complete tasks quickly and accurately. Table 6 summarizes the strengths and weaknesses of all 100 languages covered in our study. Several other languages are viable
candidates provided adequate quality control mechanisms are used to select good workers.

Since Mechanical Turk provides financial incentives for participation, many workers attempt to complete tasks even if they do not have the language skills necessary to do so. Since MTurk does not provide any information about worker demographics, including their language competencies, it can be hard to exclude such workers. As a result, naive data collection on MTurk may result in noisy data. A variety of techniques should be incorporated into crowdsourcing pipelines to ensure high quality data. As a best practice, we suggest: (1) restricting workers to countries that plausibly speak the foreign language of interest, (2) embedding gold standard controls or administering language pretests, rather than relying solely on self-reported language skills, and (3) excluding workers whose translations have high overlap with online machine translation systems like Google Translate. If cheating using external resources is likely, then also consider (4) recording information like time spent on a HIT (cumulative and on individual items), patterns in keystroke logs, tab/window focus, etc.

Although our study targeted bilingual workers on Mechanical Turk, and neglected monolingual workers, we believe our results reliably represent the current speaker populations, since the vast majority of the work available on the crowdsourced platform is currently English-only. We therefore assume the number of non-English speakers is small. In the future, it may be desirable to recruit monolingual foreign workers. In such cases, we recommend other tests to validate their language abilities in place of our translation test. These could include performing narrative cloze, or listening to audio files containing speech in different languages and identifying their language.

Table 6: The green box (the first row here) shows the best languages to target on MTurk. These languages have many workers who generate high quality results quickly. We defined many workers as 50 or more active in-region workers, high quality as ≥70% accuracy on the gold standard controls, and fast if all of the 10,000 words were completed within two weeks.
workers | quality | speed | languages
many | high | fast | Dutch, French, German, Gujarati, Italian, Kannada, Malayalam, Portuguese, Romanian, Serbian, Spanish, Tagalog, Telugu
many | high | slow | Arabic, Hebrew, Irish, Punjabi, Swedish, Turkish
many | low or medium | fast | Hindi, Marathi, Tamil, Urdu
many | low or medium | slow | Bengali, Bishnupriya Manipuri, Cebuano, Chinese, Nepali, Newar, Polish, Russian, Sindhi, Tibetan
few | high | fast | Bosnian, Croatian, Macedonian, Malay, Serbo-Croatian
few | high | slow | Afrikaans, Albanian, Aragonese, Asturian, Basque, Belarusian, Bulgarian, Central Bicolano, Czech, Danish, Finnish, Galician, Greek, Haitian, Hungarian, Icelandic, Ilokano, Indonesian, Japanese, Javanese, Kapampangan, Kazakh, Korean, Lithuanian, Low Saxon, Malagasy, Norwegian (Bokmal), Sicilian, Slovak, Slovenian, Thai, Ukrainian, Uzbek, Waray-Waray, West Frisian, Yoruba
few | low or medium | fast | (none)
few | low or medium | slow | Amharic, Armenian, Azerbaijani, Breton, Catalan, Georgian, Latvian, Luxembourgish, Neapolitan, Norwegian (Nynorsk), Pashto, Piedmontese, Somali, Sundanese, Swahili, Tatar, Vietnamese, Walloon, Welsh
none | low or medium | slow | Esperanto, Ido, Kurdish, Persian, Quechua, Wolof, Zazaki

7 Data release

With the publication of this paper, we are releasing all data and code used in this study. Our data release includes the raw data, along with bilingual dictionaries that are filtered to be high quality. It will include 256,604 translation assignments from 5,281 Turkers and 20,952 synonym assignments from 1,005 Turkers, along with meta information like geolocation
and time submitted, plus external dictionaries used for validation. The dictionaries will contain 1.5M total translated words in 100 languages, along with code to filter the dictionaries based on different criteria. The data also includes parallel corpora for six Indian languages, ranging in size from 700,000 to 1.5 million words.

8 Acknowledgements

This material is based on research sponsored by a DARPA Computer Science Study Panel phase 3 award entitled “Crowdsourcing Translation” (contract D12PC00368). The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements by DARPA or the U.S. Government. This research was supported by the Johns Hopkins University Human Language Technology Center of Excellence and through gifts from Microsoft and Google.

The authors would like to thank the anonymous reviewers for their thoughtful comments, which substantially improved this paper.

References

Amazon. 2013. Service summary tour for requesters on Amazon Mechanical Turk. https://requester.mturk.com/tour.
Vamshi Ambati and Stephan Vogel. 2010. Can crowds build parallel corpora for machine translation systems? In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk. Association for Computational Linguistics.
Vamshi Ambati, Stephan Vogel, and Jaime Carbonell. 2010. Active learning and crowd-sourcing for machine translation. In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC).
Vamshi Ambati. 2012. Active Learning and Crowdsourcing for Machine Translation in Low Resource Scenarios. Ph.D. thesis, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
Michael S. Bernstein, Greg Little, Robert C. Miller, Björn Hartmann, Mark S. Ackerman, David R. Karger, David Crowell, and Katrina Panovich. 2010. Soylent: a word processor with a crowd inside. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST).
Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samual White, and Tom Yeh. 2010. VizWiz: nearly real-time answers to visual questions. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST).
Michael Bloodgood and Chris Callison-Burch. 2010. Large-scale cost-focused active learning for statistical machine translation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
Chris Callison-Burch and Mark Dredze. 2010. Creating speech and language data with Amazon’s Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pages 1–12, Los Angeles, June. Association for Computational Linguistics.
Jia Deng, Alexander Berg, Kai Li, and Li Fei-Fei. 2010. What does classifying more than 10,000 image categories tell us? In Proceedings of the 12th European Conference of Computer Vision (ECCV), pages 71–84.
Maxine Eskenazi, Gina-Anne Levow, Helen Meng, Gabriel Parent, and David Suendermann. 2013. Crowdsourcing for Speech Processing, Applications to Data Collection, Transcription and Assessment. Wiley.
Siamak Faridani, Björn Hartmann, and Panagiotis G. Ipeirotis. 2011. What’s the right price? pricing tasks for finishing on time. In Third AAAI Human Computation Workshop (HCOMP’11).
Ulrich Germann. 2001. Building a statistical machine translation system from scratch: How much bang for the buck can we expect? In ACL 2001 Workshop on Data-Driven Machine Translation, Toulouse, France.
Chang Hu, Benjamin B. Bederson, and Philip Resnik. 2010. Translation by iterative collaboration between monolingual users. In Proceedings of ACM SIGKDD Workshop on Human Computation (HCOMP).
Chang Hu, Philip Resnik, Yakov Kronrod, Vladimir Eidelman, Olivia Buzek, and Benjamin B. Bederson. 2011. The value of monolingual crowdsourcing in a real-world translation scenario: Simulation using haitian creole emergency sms messages. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 399–404, Edinburgh, Scotland, July. Association for Computational Linguistics.
Panagiotis G. Ipeirotis. 2010a. Analyzing the mechanical turk marketplace. In ACM XRDS, December.
Panagiotis G. Ipeirotis. 2010b. Demographics of Mechanical Turk. Technical Report Working paper CeDER-10-01, New York University, Stern School of Business.
Ann Irvine and Alexandre Klementiev. 2010. Using Mechanical Turk to annotate lexicons for less commonly used languages. In Workshop on Creating Speech and Language Data with MTurk.
Ian Lane, Matthias Eck, Kay Rottmann, and Alex Waibel. 2010. Tools for collecting speech corpora via mechanical-turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, Los Angeles.
Florian Laws, Christian Scheible, and Hinrich Schütze. 2011. Active learning with amazon mechanical turk. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland.
Matthew Lease, Jessica Hullman, Jeffrey P. Bigham, Michael S. Bernstein, Juho Kim, Walter Lasecki, Saeideh Bakhshi, Tanushree Mitra, and Robert C. Miller. 2013. Mechanical Turk is not anonymous. http://dx.doi.org/10.2139/ssrn.2228728.
Vili Lehdonvirta and Mirko Ernkvist. 2011. Knowledge map of the virtual economy: Converting the virtual economy into development potential. http://www.infodev.org/en/Document.1056.pdf, April. An InfoDev Publication.
M. Paul Lewis, Gary F. Simons, and Charles D. Fennig (eds.). 2013. Ethnologue: Languages of the world, seventeenth edition. http://www.ethnologue.com.
Greg Little, Lydia B. Chilton, Rob Miller, and Max Goldman. 2009. Turkit: Tools for iterative tasks on mechanical turk. In Proceedings of the Workshop on Human Computation at the International Conference on Knowledge Discovery and Data Mining (KDD-HCOMP ’09), Paris.
Matthew Marge, Satanjeev Banerjee, and Alexander Rudnicky. 2010. Using the Amazon Mechanical Turk to transcribe and annotate meeting speech for extractive summarization. In Workshop on Creating Speech and Language Data with MTurk.
George A. Miller. 1995. WordNet: a lexical database for english. Communications of the ACM, 38(11):39–41.
Robert Munro and Hal Tily. 2011. The start of the art: Introduction to the workshop on crowdsourcing technologies for language and cognition studies. In Crowdsourcing Technologies for Language and Cognition Studies, Boulder.
Scott Novotney and Chris Callison-Burch. 2010. Cheap, fast and good enough: Automatic speech recognition with non-expert transcription. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 207–215. Association for Computational Linguistics.
Gabriel Parent and Maxine Eskenazi. 2011. Speaking to the crowd: looking at past achievements in using crowdsourcing for speech and predicting future challenges. In Proceedings Interspeech 2011, Special Session on Crowdsourcing.
Matt Post, Chris Callison-Burch, and Miles Osborne. 2012. Constructing parallel corpora for six indian languages via crowdsourcing. In Proceedings of the Seventh Workshop on Statistical Machine Translation, pages 401–409, Montréal, Canada, June. Association for Computational Linguistics.
Alexander J. Quinn and Benjamin B. Bederson. 2011. Human computation: A survey and taxonomy of a growing field. In Computer Human Interaction (CHI).
Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. 2010. Collecting image annotations using Amazon’s Mechanical Turk. In Workshop on Creating Speech and Language Data with MTurk.
Joel Ross, Lilly Irani, M. Six Silberman, Andrew Zaldivar, and Bill Tomlinson. 2010. Who are the crowdworkers?: Shifting demographics in Amazon Mechanical Turk. In alt.CHI session of CHI 2010 extended abstracts on human factors in computing systems, Atlanta, Georgia.
Yaron Singer and Manas Mittal. 2011. Pricing mechanisms for online labor markets. In Third AAAI Human Computation Workshop (HCOMP’11).
Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of EMNLP.
Alexander Sorokin and David Forsyth. 2008. Utility data annotation with amazon mechanical turk. In First IEEE Workshop on Internet Vision at CVPR.
Luis von Ahn. 2005. Human Computation. Ph.D. thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
Omar F. Zaidan and Chris Callison-Burch. 2011. Crowdsourcing translation: Professional quality from non-professionals. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1220–1229. Association for Computational Linguistics.
Rabih Zbib, Erika Malchiodi, Jacob Devlin, David Stallard, Spyros Matsoukas, Richard Schwartz, John Makhoul, Omar F. Zaidan, and Chris Callison-Burch. 2012. Machine translation of Arabic dialects. In The 2012 Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics.
Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, and John Makhoul. 2013. Systematic comparison of professional and crowdsourced reference translations for machine translation. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia.