Transactions of the Association for Computational Linguistics, vol. 3, pp. 271–282, 2015. Action Editor: Hal Daumé III.
Submission batch: 3/2015; Published 5/2015.

© 2015 Association for Computational Linguistics. Distributed under a CC-BY-NC-SA 4.0 license.

Domain Adaptation for Syntactic and Semantic Dependency Parsing Using Deep Belief Networks

Haitong Yang, Tao Zhuang and Chengqing Zong
National Laboratory of Pattern Recognition
Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
{htyang,tao.zhuang,cqzong}@nlpr.ia.ac.cn

Abstract

In current systems for syntactic and semantic dependency parsing, people usually define a very high-dimensional feature space to achieve good performance. But these systems often suffer severe performance drops on out-of-domain test data due to the diversity of features of different domains. This paper focuses on how to relieve this domain adaptation problem with the help of unlabeled target domain data. We propose a deep learning method to adapt both syntactic and semantic parsers. With additional unlabeled target domain data, our method can learn a latent feature representation (LFR) that is beneficial to both domains. Experiments on English data in the CoNLL 2009 shared task show that our method largely reduces the performance drop on out-of-domain test data. Moreover, we get a Macro F1 score that is 2.32 points higher than the best system in the CoNLL 2009 shared task in out-of-domain tests.

1 Introduction

Both syntactic and semantic dependency parsing are standard tasks in the NLP community. State-of-the-art models perform well if the test data comes from the domain of the training data. But if the test data comes from a different domain, the performance drops severely. The results of the shared tasks of CoNLL 2008 and 2009 (Surdeanu et al., 2008; Hajič et al., 2009) also substantiate this argument. To relieve the domain adaptation problem, in this paper we propose a deep learning method for both syntactic and semantic parsers. We focus on the situation in which, besides source domain training data and target domain test data, we also have some unlabeled target domain data.

Many syntactic and semantic parsers are developed using a supervised learning paradigm, where each data sample is represented as a vector of features, usually a high-dimensional one. The performance degradation on target domain test data is mainly caused by the diversity of features of different domains, i.e., many features in target domain test data are never seen in source domain training data.

Previous work has shown that using word clusters to replace the sparse lexicalized features (Koo et al., 2008; Turian et al., 2010) helps relieve the performance degradation on the target domain. But for syntactic and semantic parsing, people also use a lot of syntactic features, i.e., features extracted from syntactic trees. For example, the relation path between a predicate and an argument is a syntactic feature used in semantic dependency parsing (Johansson and Nugues, 2008). Figure 1 shows an example of this relation path feature. Obviously, syntactic features like this are also very sparse and usually specific to each domain. The method of clustering fails in generalizing these kinds of features. Our method, however, is very different from clustering specific features and substituting these features using their clusters. Instead, we attack the domain adaptation problem by learning a latent feature representation (LFR) for different domains, which is similar to Titov (2011). Formally, we propose a Deep Belief Network (DBN) model to represent a data sample using a vector of latent features. This latent feature vector is inferred by our DBN model

Downloaded from http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00138/1566772/tacl_a_00138.pdf by guest on 08 September 2023

[Figure 1: A path feature example for the sentence "She wants to pay you a visit." The red edges are the path between She and visit, and thus the relation path feature between them is SBJ↑OPRD↓IM↓OBJ↓.]

based on the data sample's original feature vector. Our DBN model is trained in an unsupervised manner on original feature vectors of data in both domains: training data from the source domain, and unlabeled data from the target domain. So our DBN model can produce a common feature representation for data from both domains. A common feature representation can make two domains more similar and thus is very helpful for domain adaptation (Blitzer et al., 2006). Discriminative models using our latent features adapt better to the target domain than models using original features.

Discriminative models in syntactic and semantic parsers usually use millions of features. Applying a typical DBN to learn a sensible LFR on that many original features is computationally too expensive and impractical (Raina et al., 2009). Therefore, we constrain the DBN by splitting the original features into groups. In this way, we largely reduce the computational cost and make LFR learning practical.

We carried out experiments on the English data of the CoNLL 2009 shared task. We use a basic pipelined system and compare the effectiveness of the two feature representations: the original feature representation and our LFR. Using the original features, the performance drop on out-of-domain test data is 10.58 points in Macro F1 score. In contrast, using the LFR, the performance drop is only 4.97 points. And we have achieved a Macro F1 score of 80.83% on the out-of-domain test data. As far as we know, this is the best result on this data set to date.

2 Related Work

Dependency parsing and semantic role labeling (SRL) are two standard tasks in the NLP community. There has been much work on the two tasks (McDonald et al., 2005; Gildea and Jurafsky, 2002; Yang and Zong, 2014; Zhuang and Zong, 2010a; Zhuang and Zong, 2010b, etc.). Among them, research on domain adaptation for dependency parsing and SRL is directly related to our work. Dredze et al. (2007) show that domain adaptation is hard for dependency parsing based on results in the CoNLL 2007 shared task (Nivre et al., 2007). Chen et al. (2008) adapted a syntactic dependency parser by learning reliable information on shorter dependencies in unlabeled target domain data. But they do not consider the task of semantic dependency parsing. Huang et al. (2010) used an HMM-based latent variable language model to adapt an SRL system. Their method is tailored for a chunking-based SRL system and can hardly be applied to our dependency-based task. Weston et al. (2008) used deep neural networks to improve an SRL system. But their tests are on in-domain data.

On methodology, the work in Glorot et al. (2011) and Titov (2011) is closely related to ours. They also focus on learning LFRs for domain adaptation. However, their work deals with domain adaptation for sentiment classification, which uses far fewer features and training samples. So they do not need to worry about computational cost as much as we do. Titov (2011) used a graphical model that has only one layer of hidden variables. In contrast, we need to use a model with two layers of hidden variables and split the first hidden layer to reduce computational cost. The model of Titov (2011) also embodies a specific classifier. But our model is independent of the classifier to be used. Glorot et al. (2011) used a model called Stacked Denoising Auto-Encoders, which also contains multiple hidden layers. However, they do not exploit the hierarchical structure of their model to reduce computational cost. By splitting, our model contains far fewer parameters than theirs. In fact, the models in Glorot et al. (2011) and Titov (2011) cannot be applied to our task simply because of the high computational cost.

3 Our DBN Model for LFR

In discriminative models, each data sample is represented as a vector of features. Our DBN model maps this original feature vector to a vector of latent


features. And we use this latent feature vector to represent the sample, i.e., we replace the whole original feature vector by the latent feature vector. In this section, we introduce how our DBN model represents a data sample as a vector of latent features. Before introducing our DBN model, we first review a simpler model called the Restricted Boltzmann Machine (RBM) (Hinton et al., 2006). When training a DBN model, the RBM is used as a basic unit of the DBN.

3.1 Restricted Boltzmann Machines

An RBM is an undirected graphical model with a layer of visible variables $v = (v_1, \ldots, v_m)$ and a layer of hidden variables $h = (h_1, \ldots, h_n)$. These variables are binary. Figure 2 shows a graphical representation of an RBM.

[Figure 2: Graphical representations of an RBM: (a) represents an RBM; (b) is a more compact representation.]

The parameters of an RBM are $\theta = (W, a, b)$, where $W = (w_{ij})_{m \times n}$ is a matrix with $w_{ij}$ being the weight for the edge between $v_i$ and $h_j$, and $a = (a_1, \ldots, a_m)$, $b = (b_1, \ldots, b_n)$ are bias vectors for $v$ and $h$ respectively. The probabilistic model of an RBM is:

$$p(v, h \mid \theta) = \frac{1}{Z(\theta)} \exp(-E(v, h)) \qquad (1)$$

where

$$E(v, h) = -\sum_{i=1}^{m} a_i v_i - \sum_{j=1}^{n} b_j h_j - \sum_{i=1}^{m} \sum_{j=1}^{n} v_i w_{ij} h_j$$

$$Z(\theta) = \sum_{v, h} \exp(-E(v, h))$$

Because the connections in an RBM are only between visible and hidden variables, the conditional distribution over a hidden or a visible variable is quite simple:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{m} v_i w_{ij}\Big) \qquad (2)$$

$$p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j=1}^{n} h_j w_{ij}\Big) \qquad (3)$$

where $\sigma(x) = 1/(1 + \exp(-x))$ is the logistic sigmoid function. An RBM can be efficiently trained on a sequence of visible vectors using the Contrastive Divergence method (Hinton, 2002).

3.2 The Problem of Large Scale

In our syntactic and semantic parsing tasks, all features are binary. So each data sample (a shift action in syntactic parsing or an argument candidate in semantic parsing) is represented as a binary feature vector. By treating a sample's feature vector as the visible variable vector in an RBM, and taking hidden variables as latent features, we could get the LFR of this sample using the RBM. However, for our syntactic and semantic parsing tasks, training such an RBM is computationally impractical due to the following considerations. Let $m$ and $n$ denote respectively the number of visible and hidden variables in the RBM. Then there are $O(mn)$ parameters in this RBM. If we train the RBM on $d$ samples, then the time complexity for Contrastive Divergence training is $O(mnd)$. For syntactic or semantic parsing, there are over 1 million unique binary features, and millions of training samples. That means both $m$ and $d$ are on the order of $10^6$. With $m$ and $d$ of that order, $n$ should not be chosen too small if we want a sensible LFR (Hinton, 2010). Our experience indicates that $n$ should be at least on the order of $10^3$. Now we see why the $O(mnd)$ complexity is formidable for our task.

3.3 Our DBN Model

A DBN is a probabilistic generative model that is composed of multiple layers of stochastic, latent variables (Hinton et al., 2006). The motivation for using a DBN is two-fold. First, previous research has shown that a deep network can capture high-level correlations between visible variables better than an RBM (Bengio, 2009). Second, as shown in the preceding subsection, the large scale of our task poses


[Figure 3: Our DBN model. The blue nodes stand for the visible variables ($v$) and the blank nodes stand for the hidden variables ($h^1$ and $h^2$). These symbols are also used in the figures of the following subsections.]

a great challenge for learning an LFR. By manipulating the hierarchical structure of a DBN, we can significantly reduce the number of parameters in the DBN model. This largely reduces the computational cost for training the DBN. Without this technique, it is impractical to learn a DBN model with that many parameters on large training sets.

As shown in Figure 3, our DBN model contains two layers of hidden variables, $h^1$ and $h^2$, and a visible vector $v$. The visible vector corresponds to a sample's original feature vector. The second-layer hidden variable vector $h^2$ is used as the LFR of this sample.

Suppose there are $m$, $n_1$, $n_2$ variables in $v$, $h^1$, $h^2$ respectively. To reduce the number of parameters in the DBN, we split its first layer ($h^1 - v$) into $k$ groups, as we will explain in the following subsection. We confine the connections in this layer to variables within the same group. So there are only $m n_1 / k$ parameters in the first layer. Without splitting, the number of parameters would be $m n_1$, and learning that many parameters would require too much computation. By splitting, we reduce the number of parameters by a factor of $k$. If we choose $k$ big enough, learning is feasible.

The second layer ($h^2 - h^1$) is fully connected, so that the variables in the second layer can capture the relations between variables in different groups in the first layer. There are $n_1 n_2$ parameters in the second layer. Because $n_1$ and $n_2$ are relatively small, learning the parameters in the second layer is also feasible.

In summary, by splitting the first layer into groups, we have largely reduced the number of parameters in our DBN model. This makes learning our DBN model practical for our task. In our task, visible variables correspond to original binary features and the second-layer hidden variables are used as the LFR of these original features. One deficiency of splitting is that the relationships between original features in different groups cannot be captured by hidden variables in the first layer. However, this deficiency is compensated for by using the second layer to capture relationships between all variables in the first layer. In this way, the second layer still captures the relationships between all original features indirectly.

3.3.1 Splitting Features into Groups

When we split the first layer into $k$ groups, every group, except the last one, contains $\lfloor m/k \rfloor$ visible variables and $\lfloor n_1/k \rfloor$ hidden variables. The last group contains the remaining visible and hidden variables. But how should we split the visible variables, i.e., the original features, into these groups? Of course there are many ways to split the original features, but it is difficult to find a good principle for splitting. So we tried two splitting strategies in this paper.

The first strategy is very simple. We arrange all features in the order in which they appear in the training data. Suppose each group contains $r$ original features. We just put the first $r$ unique features of the training data into the first group, the following $r$ unique features into the second group, and so on.

The second strategy is more sophisticated. All features can be divided into three categories: the common features, the source-specific features and the target-specific features. Its main idea is to make each group contain the three categories of features evenly, which we think makes the distribution of features close to the 'true' distribution over domains.

Let $F_s$ and $F_t$ denote the sets of features that appear in source and target domain data respectively. We collect $F_s$ and $F_t$ from our training data. The features in $F_s$ and $F_t$ are kept in the order in which they appear in the training data. And let $F_{s \cap t} = F_s \cap F_t$ (the common features), $F_{s \setminus t} = F_s \setminus F_t$ (the source-specific features), and $F_{t \setminus s} = F_t \setminus F_s$ (the target-specific features). So, to evenly distribute features in $F_{s \cap t}$, $F_{s \setminus t}$ and $F_{t \setminus s}$ to each group, each group should consist of $|F_{s \cap t}|/k$, $|F_{s \setminus t}|/k$ and $|F_{t \setminus s}|/k$ features from $F_{s \cap t}$, $F_{s \setminus t}$ and $F_{t \setminus s}$ respectively. Therefore, we put the first $|F_{s \cap t}|/k$ features from $F_{s \cap t}$, the first $|F_{s \setminus t}|/k$ features from $F_{s \setminus t}$ and the first $|F_{t \setminus s}|/k$ features from $F_{t \setminus s}$ into the first group. Similarly, we put the second $|F_{s \cap t}|/k$ features from $F_{s \cap t}$, the second $|F_{s \setminus t}|/k$ features from $F_{s \setminus t}$ and the second $|F_{t \setminus s}|/k$ features from $F_{t \setminus s}$ into the second group. The intuition of this strategy is to let features in $F_{s \cap t}$ act as pivot features that link features in $F_{s \setminus t}$ and $F_{t \setminus s}$ in each group. In this way, the first hidden layer might capture better relationships between features from source and target domains.

3.3.2 LFR of a Sample

Given a sample represented as a vector of original features, our DBN model will represent it as a vector of latent features. The sample's original feature vector corresponds to the visible vector $v$ in our DBN model in Figure 3. Our DBN model uses the second-layer hidden variable vector $h^2$ to represent this sample. Therefore, we must infer the values of the hidden variables in the second layer given the visible vector. This inference can be done using the methods in Hinton et al. (2006). Given the visible vector, the values of the hidden variables in every layer can be efficiently inferred in a single, bottom-up pass.

3.4 Training Our DBN Model

Inference in a DBN is simple and fast. Nonetheless, training a DBN is more complicated. A DBN can be trained in two stages: greedy layer-wise pretraining and fine tuning (Hinton et al., 2006).

3.4.1 Greedy Layer-wise Pretraining

In this stage, the DBN is treated as a stack of RBMs as shown in Figure 4. The second layer is treated as a single RBM. The first layer is treated as $k$ parallel RBMs with each group being one RBM. These $k$ RBMs are parallel because their visible variable vectors constitute a partition of the original feature vector. In this stage, we train these constituent RBMs in a bottom-up layer-wise manner.

To learn parameters in the first layer, we only need to learn the parameters of each RBM in the first layer. With the original feature vector $v$ given, these $k$ RBMs can be trained using the Contrastive Divergence method (Hinton, 2002). After the first layer is trained, we fix the parameters in the first layer and start to train the second layer.

[Figure 4: Stack of RBMs in pretraining.]

For the RBM of the second layer, its visible variables are the hidden variables in the first layer. Given an original
feature vector $v$, we first infer the activation probabilities for the hidden variables in the first layer using equation (2). And we use these activation probabilities as values for visible variables in the second-layer RBM. Then we train the second-layer RBM using the contrastive divergence algorithm. Note that the activation probabilities are not binary values. But this is only a trick for training, because using probabilities generally produces better models (Hinton et al., 2006). This trick does not change our assumption that each variable is binary.

3.4.2 Fine Tuning

The greedy layer-wise pretraining initializes the parameters of our DBN to sensible values. But these values are not optimal and the parameters need to be fine tuned. For fine tuning, we unroll the DBN to form an autoencoder as in Hinton and Salakhutdinov (2006), which is shown in Figure 5. In this autoencoder, the stochastic activities of binary hidden variables are replaced by their activation probabilities. So the autoencoder is in essence a feed-forward neural network. We tune the parameters of our DBN model on this autoencoder using the backpropagation algorithm.

[Figure 5: Unrolling the DBN.]

4 Domain Adaptation with Our DBN Model

In this section, we introduce how to use our DBN model to adapt a basic syntactic and semantic dependency parsing system to the target domain.

4.1 The Basic Pipelined System

We build a typical pipelined system, which first analyzes syntactic dependencies, and then analyzes semantic dependencies. This basic system only serves as a platform for experimenting with different feature representations. So we just briefly introduce our basic system in this subsection.

4.1.1 Syntactic Dependency Parsing

For syntactic dependency parsing, we use a deterministic shift-reduce method as in Nivre et al. (2006). It has four basic actions: left-arc, right-arc, shift, and reduce. A classifier is used to determine an action at each step. To decide the label for each dependency link, we extend the left/right-arc actions to their corresponding multi-label actions, leading to 31 left-arc and 66 right-arc actions. Together with shift and reduce, parsing action classification becomes a 99-class problem. We add arcs to the dependency graph in an arc-eager manner as in Hall et al. (2007). We also projectivize the non-projective sequences in the training data using the transformation from Nivre and Nilsson (2005). A maximum entropy classifier is used to make decisions at each step. The features utilized are the same as those in Zhao et al. (2008).

4.1.2 Semantic Dependency Parsing

Our semantic dependency parser is similar to the one in Che et al. (2009). We first train a predicate sense classifier on the training data, using the same features as in Che et al. (2009). Again, a maximum entropy classifier is employed. Given a predicate, we need to decide its semantic dependency relation with each word in the sentence. To reduce the number of argument candidates, we adopt the pruning strategy in Zhao et al. (2009), which is adapted from the strategy in Xue and Palmer (2004). In the semantic role classification stage, we use a maximum entropy classifier to predict the probabilities of a candidate being each semantic role. We train two different classifiers for verb and noun predicates using the same features as in Che et al. (2009). We use a simple method for postprocessing. If there are duplicate arguments for ARG0∼ARG5, we preserve the one with the highest classification probability and remove its duplicates.

4.2 Adapting the Basic System to Target Domain

In our basic pipelined system, both the syntactic and semantic dependency parsers are built using discriminative models. We train a syntactic parsing model and a semantic parsing model using the original feature representation. We will refer to this syntactic parsing model as OriSynModel, and the semantic parsing model as OriSemModel. However, these two models do not adapt well to the target domain. So we use the LFR of our DBN model to train new syntactic and semantic parsing models. We will refer to the new syntactic parsing model as LatSynModel, and the new semantic parsing model as LatSemModel. Details of using our DBN model are as follows.

4.2.1 Adapting the Syntactic Parser

The input data for training our DBN model are the original feature vectors on training and unlabeled data. Therefore, to train our DBN model, we first need to extract the original features for syntactic parsing on these data. Features on training data can be directly extracted using gold-standard annotations. On unlabeled data, however, some features cannot be directly extracted. This is because our syntactic parser uses history-based features, which depend on previous actions taken when parsing a sentence. Therefore, features on unlabeled data can only be extracted after the data are parsed. To solve this problem, we first parse the unlabeled data using the already trained OriSynModel. In this way, we


can obtain the features on the unlabeled data. Because of the poor performance of the OriSynModel on the target domain, the extracted features on unlabeled data contain some noise. However, experiments show that our DBN model can still learn a good LFR despite the noise in the extracted features. Using the LFR, we can train the syntactic parsing model LatSynModel. Then by applying the LFR on test and unlabeled data, we can parse the data using LatSynModel. Experiments in later sections show that the LatSynModel adapts much better to the target domain than the OriSynModel.

4.2.2 Adapting the Semantic Parser

The situation here is similar to the adaptation of the syntactic parser. Features on training data can be directly extracted. To extract features on unlabeled data, we need to have syntactic dependency trees on this data. So we use our LatSynModel to parse the unlabeled data first. And we automatically identify predicates on unlabeled data using a classifier as in Che et al. (2008). Then we extract the original features for semantic parsing on unlabeled data. By feeding original features extracted on these data to our DBN model, we learn the LFR for semantic dependency parsing. Using the LFR, we can train the semantic parsing model LatSemModel.

5 Experiments

5.1 Experiment Setup

5.1.1 Experiment Data

We use the English data in the CoNLL 2009 shared task for experiments. The training data and in-domain test data are from the WSJ corpus, whereas the out-of-domain test data is from the Brown corpus. We also use unlabeled data consisting of the following sections of the Brown corpus: K, L, M, N, P. The test data are excerpts from fiction. The unlabeled data are also excerpts from fiction or stories, which are similar to the test data. Although the unlabeled data is actually annotated in Release 3 of the Penn Treebank, we do not use any information contained in the annotation, only using the raw texts. The training, test and unlabeled data contain 39279, 425, and 16407 sentences respectively.

5.1.2 Settings of Our DBN Model

For the syntactic parsing task, there are 748,598 original features in total. We use 7,486 hidden variables in the first layer and 3,743 hidden variables in the second layer. For semantic parsing, there are 1,074,786 original features. We use 10,748 hidden variables in the first layer and 5,374 hidden variables in
the second layer.

In our DBN models, we need to determine the number of groups $k$. Because a larger $k$ means less computational cost, $k$ should not be set too small. We empirically set $k$ as follows: according to our experience, each group should contain about 5000 original features. We have about $10^6$ original features in our tasks. So we estimate $k \approx 10^6/5000 = 200$. And we set $k$ to be 200 in the DBN models for both syntactic and semantic parsing. As for the splitting strategy, we use the more sophisticated one in subsection 3.3.1 because it should generate better results than the simple one.

5.1.3 Details of DBN Training

In greedy pretraining of the DBN, the contrastive divergence algorithm is configured as follows: the training data is divided into mini-batches, each containing 100 samples. The weights are updated with a learning rate of 0.3, momentum of 0.9, and weight decay of 0.0001. Each layer is trained for 30 passes (epochs) over the entire training data.

In fine-tuning, the backpropagation algorithm is configured as follows: the training data is divided into mini-batches, each containing 50 samples. The weights are updated with a learning rate of 0.1, momentum of 0.9, and weight decay of 0.0001. The fine-tuning is repeated for 50 epochs over the entire training data.

We use the fast computing technique in Raina et al. (2009) to learn the LFRs. Moreover, in greedy pretraining, we train the RBMs in the first layer in parallel.

5.2 Results and Discussion

We use the official evaluation measures of the CoNLL 2009 shared task, which consist of three different scores: (i) syntactic dependencies are scored using the labeled attachment score, (ii) semantic dependencies are evaluated using a labeled F1 score, and (iii) the overall task is scored with a macro average of the two previous scores. The three scores above are represented by LAS, Sem F1, and Macro F1 respectively in this paper.

Test data   System   LAS     Sem F1   Macro F1
WSJ         Ori      87.63   84.82    86.24
WSJ         Lat      87.30   84.25    85.80
Brown       Ori      79.72   71.57    75.67
Brown       Lat      82.84   78.75    80.83

Table 1: The results of our basic and adapted systems

5.2.1 Comparison with Un-adapted System

Our basic system uses the OriSynModel for syntactic parsing, and the OriSemModel for semantic parsing. Our adapted system uses the LatSynModel for syntactic parsing, and the LatSemModel for semantic parsing. The results of these two systems are shown in Table 1, in which our basic and adapted systems are denoted as Ori and Lat respectively.

From the results in Table 1, we can see that Lat performs slightly worse than Ori on in-domain WSJ test data. But on the out-of-domain Brown test data, Lat performs much better than Ori, with an improvement of about 5 points in Macro F1 score. This shows the effectiveness of our method for domain adaptation tasks.

5.2.2 Different Splitting Configurations

As described in subsection 5.1.2, we have empirically set the number of groups $k$ to be 200 and chosen the more sophisticated splitting strategy. In this subsection, we experiment with different splitting configurations to see their effects.

Under each splitting configuration, we learn the LFRs using our DBN models. Using the LFRs, we test our adapted systems on both in-domain and out-of-domain data. Therefore we get many test results, each corresponding to a splitting configuration. The in-domain and out-of-domain test results are reported in Table 2 and Table 3 respectively. In these two tables, 's1' and 's2' represent the simple and the more sophisticated splitting strategies in subsection 3.3.1 respectively. 'k' represents the number of groups in our DBN models. For both syntactic and semantic parsing, we use the same $k$ in their DBN models. The 'Time' column reports the training time of our DBN models for both syntactic and semantic parsing. The unit of the 'Time' column is the hour. Please note that we only need to train our DBN models once. We report the training time in Table 2 and, for easy viewing, repeat those training times in Table 3. But this does not mean we need to train new DBN models for the out-of-domain test.

Str   k     Time (h)   LAS     Sem F1   Macro F1
s1    100   392        85.95   82.42    84.19
s1    200   261        85.76   82.14    83.95
s1    300   218        85.48   81.68    83.58
s1    400   196        84.80   80.24    82.52
s2    100   392        86.22   83.03    84.63
s2    200   261        86.10   82.89    84.50
s2    300   218        85.72   82.24    83.98
s2    400   196        84.96   81.13    83.05

Table 2: Results of different splitting configurations on in-domain WSJ development data

Str   k     Time (h)   LAS     Sem F1   Macro F1
s1    100   392        82.81   78.77    80.82
s1    200   261        82.73   78.49    80.63
s1    300   218        82.44   77.90    80.37
s1    400   196        81.83   76.72    79.31
s2    100   392        82.95   79.03    81.03
s2    200   261        82.84   78.75    80.83
s2    300   218        82.63   78.34    80.50
s2    400   196        81.97   76.98    79.51

Table 3: Results of different splitting configurations on out-of-domain Brown test data

From Tables 2 and 3 we get the following observations:

First, although the more sophisticated splitting strategy 's2' generates slightly better results than the simple strategy 's1', the difference is not significant. This means that the hierarchical structure of our DBN model can robustly capture the relationships between features. Even with the simple splitting strategy 's1', we still get quite good results.

Second, the 'Time' column in Table 2 shows that different splitting strategies with the same $k$ value have the same training time. This is reasonable because training time only depends on the number of parameters in our DBN model. And different splitting strategies do not affect the number of parameters in our DBN model.
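To make the parameter argument concrete, the following is a minimal Python sketch that counts the weights of a split DBN under the group sizes described in subsection 3.3.1 (floor(m/k) visible and floor(n1/k) hidden variables per group, with the remainder placed in the last group). The function names `group_sizes` and `dbn_weight_count` are hypothetical, bias terms are ignored, and the layer sizes are the syntactic-parsing settings from subsection 5.1.2. It is an illustration under these assumptions, not the implementation used in the experiments.

```python
def group_sizes(total, k):
    """Split `total` variables into k groups: floor(total/k) in each
    of the first k-1 groups, and all remaining variables in the last."""
    base = total // k
    sizes = [base] * (k - 1)
    sizes.append(total - base * (k - 1))
    return sizes


def dbn_weight_count(m, n1, n2, k):
    """Return (first_layer, second_layer) weight counts for a DBN whose
    first layer is split into k block-wise RBMs and whose second layer
    is fully connected. Bias terms are not counted."""
    visible = group_sizes(m, k)
    hidden = group_sizes(n1, k)
    # Connections exist only between variables of the same group.
    first_layer = sum(v * h for v, h in zip(visible, hidden))
    second_layer = n1 * n2
    return first_layer, second_layer


if __name__ == "__main__":
    # Sizes for syntactic parsing, taken from subsection 5.1.2.
    m, n1, n2 = 748_598, 7_486, 3_743
    unsplit = m * n1
    for k in (100, 200, 300, 400):
        first, second = dbn_weight_count(m, n1, n2, k)
        print(f"k={k}: first layer {first}, second layer {second}, "
              f"first-layer reduction x{unsplit / first:.1f}")
```

The count depends only on the group sizes, not on which features are assigned to which group, which agrees with the observation that 's1' and 's2' have the same training time for a given k. The second-layer term n1 * n2 is unaffected by k.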


Third, the number of groups $k$ affects both the training time and the final results. When $k$ increases, the training time decreases but the results degrade. As $k$ gets larger, the time reduction gets less obvious, but the degradation of results gets more obvious. When $k$ = 100, 200, 300, there is not much difference between the results. This shows that the results of our DBN model are not sensitive to the value of $k$ within a range of 100 around our initial estimate of 200. But when $k$ is further away from our estimate, e.g. $k = 400$, the results get significantly worse.

Please note that the results in Tables 2 and 3 are not used to tune the parameter $k$ or to choose a splitting strategy in our DBN model. As mentioned in subsection 5.1.2, we have chosen $k = 200$ and the more sophisticated splitting strategy beforehand. In this paper, we always use the results with $k = 200$ and the 's2' strategy as our main results, even though the results with $k = 100$ are better.

5.3 The Size of Unlabeled Target Domain Data

An interesting question for our method is how much unlabeled target domain data should be used. To empirically answer this question, we learn several LFRs by gradually adding more unlabeled data to train our DBN model. We compared the performance of these LFRs as shown in Figure 6.

[Figure 6: Macro F1 scores on test data with respect to the size of unlabeled target domain data used in DBN training. The horizontal axis is the number of sentences in unlabeled target domain data (0 to 18000) and the vertical axis is the Macro F1 score (74 to 88). The two curves are Target Domain Test and Source Domain Test.]

From Figure 6, we can see that by adding more unlabeled target domain data, our system adapts better to the target domain with only a small degradation of results on the source domain. However, as more unlabeled data is used, the improvement on the target domain gradually gets smaller.

5.4 Comparison with Other Methods

In this subsection, we compare our method with several systems. These are described below.

Daume07. Daumé III (2007) proposed a simple and effective adaptation method whose main idea is to augment the feature vector. They took each feature in the original problem and made three versions of it: a general version, a source-specific version and a target-specific version. Thus, the augmented source data contains only general and source-specific versions; the augmented target data contains general and target-specific versions. In the baseline system, we adopt the same technique for dependency and semantic parsing.

Chen. The participation system of Zhao et al. (2009), which reached the best result in the out-of-domain test of the CoNLL 2009 shared task.

In Daumé III and Marcu (2006), they presented and discussed several 'obvious' ways to attack the domain adaptation problem without developing new algorithms. Following their idea, we construct similar systems.

OnlySrc. The system is trained on only the data of the source domain (News).

OnlyTgt. The system is trained on only the data of the target domain (Fiction).

All. The system is trained on all data of the source domain and the target domain.

It is worth noting that training the systems of Daume07, OnlyTgt and All needs labeled data of the target domain. We utilize OnlySrc to parse the unlabeled data of the target domain to generate the labeled data.

The comparison results are shown in Table 4, in which the 'Diff' column is the difference of scores on in-domain and out-of-domain test data.

First, we compare OnlySrc, OnlyTgt and All. We can see that OnlyTgt performs very poorly both in the source domain and in the target domain. It is not hard to understand that OnlyTgt performs poorly in the source domain because of the adaptation problem. OnlyTgt also performs poorly in the target domain. We think the main reason is that OnlyTgt is trained on the auto-parsed data in which there are


Score      System    WSJ     Brown   Diff
LAS        OnlySrc   87.63   79.72    7.91
LAS        OnlyTgt   73.25   78.30    5.05
LAS        All       87.41   80.54    6.87
LAS        Daume07   87.47   80.46    7.01
LAS        Chen      89.19   82.38    6.81
LAS        Ours      87.30   82.84    4.46
Sem F1     OnlySrc   84.82   71.57   13.25
Sem F1     OnlyTgt   73.74   70.34    3.40
Sem F1     All       84.68   72.75   11.93
Sem F1     Daume07   84.52   72.90   11.62
Sem F1     Chen      86.15   74.58   11.57
Sem F1     Ours      84.25   78.75    5.50
Macro F1   OnlySrc   86.24   75.67   10.57
Macro F1   OnlyTgt   73.50   74.32    0.82
Macro F1   All       86.04   76.65    9.40
Macro F1   Daume07   86.00   76.68    9.32
Macro F1   Chen      87.69   78.51    9.18
Macro F1   Ours      85.80   80.83    4.97

Table 4: Comparison with other methods.

many parsing errors. But we note that All performs better than both OnlySrc and OnlyTgt on the target domain test, although its training data contains some auto-parsed data. Therefore, the data of the target domain, labeled or unlabeled, has potential for alleviating the adaptation problem of different domains. But All just puts the auto-parsed data of the target domain into the training set. Thus, its improvement on the test data of the target domain is limited. In fact, how to use the data of the target domain, especially the unlabeled data, in the adaptation problem is still an open and hot topic in NLP and machine learning.

Second, we compare Daume07, All and our method. In Daume07, they reported improvement on the target domain test. But one point to note is that the target domain data used in their experiments is labeled, while in our case there is only unlabeled data. We can see that Daume07 has comparable performance with All, in which there is no adaptation strategy besides adding more data of the target domain. We think the main reason is that there are many parsing errors in the data of the target domain. But our method performs much better than Daume07 and All even though some faulty data are also utilized in our system. This suggests that our method successfully learns new robust representations for different domains, even when there are some noisy data.

Third, we compare Chen with our method. Chen reached the best result in the out-of-domain test of the CoNLL 2009 shared task. The results in Table 4 show that Chen's system performs better than ours on in-domain test data, especially on LAS score. Chen's system uses a sophisticated graph-based syntactic dependency parser. Graph-based parsers use substantially more features, e.g. more than $1.3 \times 10^7$ features are used in McDonald et al. (2005). Learning an LFR for that many features would take
months of time using our DBN model. So at present we only use a transition-based parser. The better performance of Chen’s system mainly comes from their sophisticated syntactic parsing method.

To reduce the sparsity of features, Chen’s system uses word cluster features as in Koo et al. (2008). On out-of-domain tests, however, our system still performs much better than Chen’s, especially on semantic parsing. To our knowledge, on out-of-domain tests on this data set, our system has obtained the best performance to date. More importantly, the performance difference between in-domain and out-of-domain tests is much smaller in our system. This shows that our system adapts much better to the target domain.

6 Conclusions

In this paper, we propose a DBN model to learn LFRs for syntactic and semantic parsers. These LFRs are common representations of original features in both source and target domains. Syntactic and semantic parsers using the LFRs adapt to the target domain much better than the same parsers using the original feature representation. Our model provides a unified method that adapts both syntactic and semantic dependency parsers to a new domain. In the future, we hope to further scale up our method to adapt parsing models using substantially more features, such as graph-based syntactic dependency parsing models. We will also search for better splitting strategies for our DBN model. Finally, although our experiments are conducted on syntactic and semantic parsing, it is expected that the proposed approach can be applied to the domain adaptation of other tasks with little adaptation effort.

Acknowledgements

The research work has been partially funded by the Natural Science Foundation of China under Grant No. 61333018 and supported by the West Light Foundation of Chinese Academy of Sciences under Grant No. LHXZ201301. We thank the three anonymous reviewers and the Action Editor for their helpful comments and suggestions.

References

Yoshua Bengio. 2009. Learning Deep Architectures for AI. In Foundations and Trends in Machine Learning, 2(1):1-127.
John Blitzer, Ryan McDonald and Fernando Pereira. 2006. Domain Adaptation with structural correspondence learning. In Proceedings of ACL-2006.
Wanxiang Che, Zhenghua Li, Yuxuan Hu, Yongqiang Li, Bing Qin, Ting Liu and Sheng Li. 2008. A Cascaded Syntactic and Semantic Dependency Parsing System. In Proceedings of CoNLL-2008 shared task.
Wanxiang Che, Zhenghua Li, Yongqiang Li, Yuhang Guo, Bing Qin and Ting Liu. 2009. Multilingual Dependency-based Syntactic and Semantic Parsing. In Proceedings of CoNLL-2009 shared task.
Wenliang Chen, Youzheng Wu and Hitoshi Isahara. 2008. Learning reliable information for dependency parsing adaptation. In Proceedings of COLING-2008.
Hal Daumé III. 2007. Frustratingly Easy Domain Adaptation. In Proceedings of ACL-2007.
Hal Daumé III and Daniel Marcu. 2006. Domain Adaptation for Statistical Classifiers. In Journal of Artificial Intelligence Research, 26(2006), 101-126.
Mark Dredze, John Blitzer, Partha P. Talukdar, Kuzman Ganchev, Joao Graca and Fernando Pereira. 2007. Frustratingly Hard Domain Adaptation for Dependency Parsing. In Proceedings of EMNLP-CoNLL-2007.
Xavier Glorot, Antoine Bordes and Yoshua Bengio. 2011. Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach. In Proceedings of International Conference on Machine Learning (ICML) 2011.
Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. In Computational Linguistics, 28(3):245-288.
I. Goodfellow, Q. Le, A. Saxe and A. Ng. 2009. Measuring invariances in deep networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS) 2011.
Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue and Yi Zhang. 2009. The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages. In Proceedings of CoNLL-2009.
J. Hall, J. Nilsson, J. Nivre, G. Eryiğit, B. Megyesi, M. Nilsson, and M. Saers. 2007. Single Malt or Blended? A Study in Multilingual Parser Optimization. In Proceedings of EMNLP-CoNLL-2007.
Geoffrey Hinton. 2010. A Practical Guide to Training Restricted Boltzmann Machines. In Technical report 2010-003, Machine Learning Group, University of Toronto.
Geoffrey Hinton. 2002. Training products of experts by minimizing contrastive divergence. In Neural Computation, 14(8):1711-1800.
Geoffrey Hinton, Simon Osindero and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets. In Neural Computation, 18(7):1527-1554.
Geoffrey Hinton and R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. In Science, 313(5786), 504-507.
Richard Johansson and Pierre Nugues. 2008. Dependency-based semantic role labeling of PropBank. In Proceedings of EMNLP-2008.
Terry Koo, Xavier Carreras and Michael Collins. 2008. Simple Semi-supervised Dependency Parsing. In Proceedings of ACL-HLT-2008.
Lluís Màrquez, Xavier Carreras, Kenneth C. Litkowski and Suzanne Stevenson. 2008. Semantic Role Labeling: An Introduction to the Special Issue. In Computational Linguistics, 34(2):145-159.
Ryan McDonald, Fernando Pereira, Jan Hajič, and Kiril Ribarov. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of NAACL-HLT-2005.
J. Nivre, J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret. 2007. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of CoNLL-2007.
J. Nivre, J. Hall, J. Nilsson, G. Eryiğit and S. Marinov. 2006. Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines. In Proceedings of CoNLL-2006.
J. Nivre and J. Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of ACL-2005.
Rajat Raina, Anand Madhavan, and Andrew Y. Ng. 2009. Large-scale Deep Unsupervised Learning using Graphics Processors. In Proceedings of the 26th


Annual International Conference on Machine Learning (ICML), pages 152-164.
Mihai Surdeanu, Richard Johansson, Adam Meyers, Lluís Màrquez and Joakim Nivre. 2008. The CoNLL-2008 Shared Task on Joint Parsing of Syntactic and Semantic Dependencies. In Proceedings of CoNLL-2008.
Ivan Titov. 2011. Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation. In Proceedings of ACL-2011.
Joseph Turian, Lev Ratinov and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of ACL-2010.
J. Weston, F. Ratle, and R. Collobert. 2008. Deep Learning via Semi-Supervised Embedding. In Proceedings of International Conference on Machine Learning (ICML).
Nianwen Xue and Martha Palmer. 2004. Calibrating features for semantic role labeling. In Proceedings of EMNLP-2004.
Haitong Yang and Chengqing Zong. 2014. Multi-Predicate Semantic Role Labeling. In Proceedings of EMNLP-2014.
Hai Zhao, Wenliang Chen, Chunyu Kit and Guodong Zhou. 2009. Multilingual Dependency Learning: Exploiting Rich Features for Tagging Syntactic and Semantic Dependencies. In Proceedings of CoNLL-2009 shared task.
Hai Zhao and Chunyu Kit. 2008. Parsing Syntactic and Semantic Dependencies with Two Single-Stage Maximum Entropy Models. In Proceedings of CoNLL-2008.
Tao Zhuang and Chengqing Zong. 2010a. A Minimum Error Weighting Combination Strategy for Chinese Semantic Role Labeling. In Proceedings of COLING 2010.
Tao Zhuang and Chengqing Zong. 2010b. Joint Inference for Bilingual Semantic Role Labeling. In Proceedings of EMNLP 2010.
