Robust Dialogue State Tracking with Weak Supervision and Sparse Data
Michael Heck, Nurul Lubis, Carel van Niekerk, Shutong Feng, Christian Geishauser, Hsien-Chin Lin, Milica Gašić
Heinrich Heine University Düsseldorf, Germany
{heckmi,lubis,niekerk,fengs,geishaus,linh,gasic}@hhu.de
Abstract: Generalizing dialogue state tracking (DST) to new data is especially challenging due to the strong reliance on abundant and fine-grained supervision during training. Sample sparsity, distributional shift, and the occurrence of new concepts

Learning Fair Representations via Rate-Distortion Maximization
Somnath Basu Roy Chowdhury and Snigdha Chaturvedi
UNC Chapel Hill, USA
{somnath, snigdha}@cs.unc.edu
Abstract: Text representations learned by machine learning models often encode undesirable demographic information of the user. Predictive models based on these representations can rely on such information, resulting in biased decisions. We present a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), that removes protected

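The excerpt ends before the method is described. As a loose illustration only: debiasing by rate-distortion maximization is typically built on a coding-rate objective of the form below. This particular function and the eps parameter follow Ma et al.'s rate-distortion formulation, an assumption on our part rather than FaRM's confirmed recipe.

```python
import numpy as np

def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """Coding rate (rate-distortion) of a d x n feature matrix Z:
    R(Z) = 1/2 * logdet(I + d/(n * eps^2) * Z Z^T).
    A higher rate means less compressible features; maximizing it for a
    protected group is one way to scrub group-identifying structure."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

# toy usage: 8-dimensional features for 100 examples
rng = np.random.default_rng(0)
print(coding_rate(rng.normal(size=(8, 100))))
```
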
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
Amir Feder1,10*, Katherine A. Keith2, Emaad Manzoor3, Reid Pryzant4, Dhanya Sridhar5, Zach Wood-Doughty6, Jacob Eisenstein7, Justin Grimmer8, Roi Reichart1, Margaret E. Roberts9, Brandon M. Stewart10, Victor Veitch7,11, and Diyi Yang12
1Technion – Israel Institute of Technology, Israel; 2Williams College, USA; 3University of Wisconsin – Madison, USA; 4Microsoft, USA; 5Columbia University, Canada; 6Northwestern University, USA

Getting BART to Ride the Idiomatic Train: Learning to Represent Idiomatic Expressions
Ziheng Zeng and Suma Bhat
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, IL, USA
{zzeng13, spbhat2}@illinois.edu
Abstract: Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have been a classical challenge to NLP, including pre-trained language models that drive today's state-of-the-art.

DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon
Robin Algayres1,3,5, Tristan Ricoul3, Julien Karadayi3, Hugo Laurençon3, Salah Zaiem3, Abdelrahman Mohamed2, Benoît Sagot4, Emmanuel Dupoux1,3,4,5
1Meta AI Research, France; 2Meta AI Research, USA; 3ENS/PSL, Paris, France; 4EHESS, Paris, France; 5Inria, Paris, France
{robin.algayres, benoit.sagot}@inria.fr, dpx@fb.com
Abstract: Finding word boundaries in continuous speech is challenging as there is little or no equivalent of a

On Decoding Strategies for Neural Text Generators
Gian Wiher, Clara Meister, Ryan Cotterell
ETH Zürich, Switzerland
{gian.wiher, clara.meister, ryan.cotterell}@inf.ethz.ch
Abstract: When generating text from probabilistic models, the chosen decoding strategy has a profound effect on the resulting text. Yet the properties elicited by various decoding strategies do not always transfer across natural language generation tasks. For example, while mode-seeking methods like

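The excerpt breaks off while contrasting decoding strategies. The sketch below illustrates the mode-seeking versus stochastic distinction the abstract alludes to; the toy vocabulary and probabilities are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy next-token distribution over a 5-word vocabulary
vocab = ["the", "cat", "sat", "on", "mat"]
probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])

# mode-seeking: greedy decoding always picks the argmax token
greedy = vocab[int(np.argmax(probs))]

# stochastic: ancestral sampling draws from the full distribution
sampled = vocab[rng.choice(len(vocab), p=probs)]

# top-k sampling truncates to the k most probable tokens, then renormalizes
k = 2
top_k = np.argsort(probs)[-k:]
p_top = probs[top_k] / probs[top_k].sum()
topk_tok = vocab[rng.choice(top_k, p=p_top)]

print(greedy, sampled, topk_tok)
```

Greedy decoding is deterministic; the two sampling variants trade diversity against the risk of low-probability continuations, which is one axis along which strategies behave differently across generation tasks.
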
How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
Timothee Mickus* (University of Helsinki, Finland), Denis Paperno (Utrecht University, The Netherlands), Mathieu Constant (Université de Lorraine, CNRS, ATILF, France)
timothee.mickus@helsinki.fi, d.paperno@uu.nl, Mathieu.Constant@univ-lorraine.fr
Abstract: Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors

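As a rough sketch of what "a sum of vector factors" can mean, the toy residual stack below records each sublayer's additive contribution. This is a simplification (no layer norm or attention internals, which the paper does account for); the sublayers here are invented stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# two toy sublayers standing in for an attention update and a feedforward update
W_attn, W_ff = rng.normal(size=(d, d)), rng.normal(size=(d, d))
sublayers = [lambda h: np.tanh(h @ W_attn), lambda h: np.tanh(h @ W_ff)]

def forward_with_factors(x, sublayers):
    """Run a residual stack, recording each additive contribution:
    the final embedding equals the input plus every sublayer update."""
    factors, h = [x.copy()], x
    for sublayer in sublayers:
        update = sublayer(h)
        factors.append(update)
        h = h + update
    assert np.allclose(h, np.sum(factors, axis=0))
    return h, factors

emb, factors = forward_with_factors(rng.normal(size=d), sublayers)
```
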
Compositional Generalization in Multilingual Semantic Parsing over Wikidata
Ruixiang Cui, Rahul Aralikatte, Heather Lent, and Daniel Hershcovich
Department of Computer Science, University of Copenhagen, Denmark
{rc, rahul, hcl, dh}@di.ku.dk
Abstract: Semantic parsing (SP) allows humans to leverage vast knowledge resources through natural interaction. However, parsers are mostly designed for and evaluated on English resources, such as CFQ (Keysers et al., 2020), the cur-

Learning English with Peppa Pig
Mitja Nikolaus (Aix-Marseille University, France), Afra Alishahi (Tilburg University, The Netherlands), Grzegorz Chrupała (Tilburg University, The Netherlands)
mitja.nikolaus@univ-amu.fr, a.alishahi@uvt.nl, grzegorz@chrupala.me
Abstract: Recent computational models of the acquisition of spoken language via grounding in perception exploit associations between spoken and visual modalities and learn to represent speech and visual data in a joint vector space. A major unresolved issue

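A minimal sketch of learning a joint speech-visual space, assuming a symmetric contrastive (InfoNCE-style) objective over paired clips; the paper's actual loss may differ.

```python
import numpy as np

def info_nce(speech_emb: np.ndarray, video_emb: np.ndarray, temp: float = 0.1) -> float:
    """Symmetric contrastive loss over a batch of paired clips: matching
    speech/video pairs (the diagonal) should score higher than mismatches."""
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    logits = s @ v.T / temp                               # pairwise similarity
    diag = np.arange(len(s))
    log_p_s2v = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_v2s = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -(log_p_s2v[diag, diag].mean() + log_p_v2s[diag, diag].mean()) / 2

rng = np.random.default_rng(0)
print(info_nce(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))))
```
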
Temporal Effects on Pre-trained Models for Language Processing Tasks
Oshin Agarwal (University of Pennsylvania, USA), Ani Nenkova (Adobe Research, USA)
oagarwal@seas.upenn.edu, nenkova@adobe.com
Abstract: Keeping the performance of language technologies optimal as time passes is of great practical interest. We study temporal effects on model performance on downstream language tasks, establishing a nuanced terminology for such discussion and identifying factors essential to conduct a

Dependency Parsing with Backtracking using Deep Reinforcement Learning
Franck Dary, Maxime Petit, Alexis Nasr
Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France
{franck.dary,maxime.petit,alexis.nasr}@lis-lab.fr
Abstract: Greedy algorithms for NLP such as transition-based parsing are prone to error propagation. One way to overcome this problem is to allow the algorithm to backtrack and explore an alternative solution in cases where new evidence

A Survey of Text Games for Reinforcement Learning Informed by Natural Language
Philip Osborne, Heido Nõmm, André Freitas
Department of Computer Science, University of Manchester, United Kingdom
philiposbornedata@gmail.com, heidonomm@gmail.com, andre.freitas@manchester.ac.uk
Abstract: Reinforcement Learning has shown success in a number of complex virtual environments. However, many challenges still

Reducing Conversational Agents' Overconfidence Through Linguistic Calibration
Sabrina J. Mielke1,2, Arthur Szlam2, Emily Dinan2, Y-Lan Boureau2
1Department of Computer Science, Johns Hopkins University, USA; 2Facebook AI Research, USA
sjmielke@jhu.edu, {aszlam,edinan,ylan}@fb.com
Abstract: While improving neural dialogue agents' factual accuracy is the object of much research, another important aspect of communication, less studied in the setting of neural dialogue, is transparency about ignorance. In this work,

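The excerpt stops before the method; as a purely hypothetical illustration of what "linguistic calibration" can look like, a model's numeric confidence can be verbalized as a hedge. Thresholds and phrasings below are invented.

```python
def verbalize_confidence(p: float) -> str:
    """Map a model's scalar confidence onto a linguistic hedge
    (thresholds and phrasings are invented for illustration)."""
    if p < 0.3:
        return "I don't know, but maybe"
    if p < 0.7:
        return "I'm not sure, but I think"
    return "I'm confident that"

print(verbalize_confidence(0.55), "the capital of Australia is Canberra.")
```
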
Saturated Transformers are Constant-Depth Threshold Circuits
William Merrill*†, Ashish Sabharwal*, Noah A. Smith*‡
*Allen Institute for AI, USA; †New York University, USA; ‡University of Washington, USA
willm@nyu.edu, {ashishs, noah}@allenai.org
Abstract: Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with hard attention are quite

Generate, Annotate, and Learn: NLP with Synthetic Text
Xuanli He1, Islam Nassar1, Jamie Kiros2, Gholamreza Haffari1, Mohammad Norouzi2
1Monash University, Australia; 2Google Research, Brain Team, Canada
{xuanli.he1, gholamreza.haffari}@monash.edu, mnorouzi@google.com
Abstract: This paper studies the use of language models as a source of synthetic unlabeled text for NLP. We formulate a general framework called "generate, annotate, and learn (GAL)" to take advantage of synthetic text

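A toy rendering of one generate-annotate-learn round. Every function here is a hypothetical stand-in (a keyword-rule "teacher", a memorizing "student"), not the paper's models; only the three-step shape of the loop comes from the abstract.

```python
import random

random.seed(0)

# Hypothetical stand-ins: an "LM" emitting short strings, a keyword-rule
# "teacher" that annotates them, and a "student" that memorizes the labels.
def generate_text() -> str:
    return " ".join(random.choices(["good", "bad", "movie", "plot"], k=3))

def teacher(text: str) -> str:
    return "pos" if "good" in text else "neg"

def train(pairs):
    memory = dict(pairs)
    return lambda x: memory.get(x, "neg")

synthetic = [generate_text() for _ in range(100)]      # generate
pseudo_labeled = [(x, teacher(x)) for x in synthetic]  # annotate
student = train(pseudo_labeled)                        # learn
print(student(synthetic[0]))  # agrees with the teacher on seen text
```
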
High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics
Markus Freitag, David Grangier, Qijun Tan, Bowen Liang
Google Research, USA
{freitag, grangier, qijuntan, bowenl}@google.com
Abstract: In Neural Machine Translation, it is typically assumed that the sentence with the highest estimated probability should also be the translation with the highest quality as measured by humans. In this work, we

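A minimal Minimum Bayes Risk decoding sketch: choose the candidate with the highest expected utility against sampled pseudo-references rather than the highest-probability one. The token-overlap utility below is a toy stand-in for the neural metrics the title refers to.

```python
def utility(hyp: str, ref: str) -> float:
    """Toy token-overlap utility (Jaccard); a stand-in for a neural metric."""
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

def mbr_decode(candidates, references):
    """Return the candidate with the highest average utility
    against the pseudo-references."""
    return max(candidates,
               key=lambda c: sum(utility(c, r) for r in references) / len(references))

samples = ["the cat sat", "a cat sat down", "the dog ran"]
print(mbr_decode(samples, samples))  # model samples double as pseudo-references
```

Unlike MAP decoding, the winner here is the candidate most similar on average to the other plausible outputs, which is exactly how MBR can prefer a high-quality sentence over the single most probable one.
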
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
Yiding Hao, Dana Angluin, and Robert Frank
Yale University, New Haven, CT, USA
{yiding.hao, robert.frank, dana.angluin}@yale.edu
Abstract: This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We

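The UHAT/AHAT distinction admits a direct sketch: unique hard attention returns the value at a single maximal-scoring position, while averaging hard attention averages the values over all maximal positions. Leftmost tie-breaking for UHAT is an assumption made for the example.

```python
import numpy as np

def unique_hard_attention(scores, values):
    """UHAT: attend to exactly one position with the maximal score
    (ties broken here by the leftmost position, an assumption)."""
    return values[int(np.argmax(scores))]

def averaging_hard_attention(scores, values):
    """AHAT: average the values at all positions that achieve the maximal score."""
    mask = scores == scores.max()
    return values[mask].mean(axis=0)

scores = np.array([1.0, 3.0, 3.0, 0.5])
values = np.arange(8.0).reshape(4, 2)
print(unique_hard_attention(scores, values))     # value at position 1
print(averaging_hard_attention(scores, values))  # mean of positions 1 and 2
```
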
Minimum Description Length Recurrent Neural Networks
Nur Lan1,2, Michal Geyer2, Emmanuel Chemla1,3*, Roni Katzir2*
1École Normale Supérieure, France; 2Tel Aviv University, Israel; 3EHESS, PSL University, CNRS
{nlan,chemla}@ens.fr, michalgeyer@mail.tau.ac.il, rkatzir@tauex.tau.ac.il
Abstract: We train neural networks to optimize a Minimum Description Length score, that is, to balance between the complexity of the network and its accuracy at a task. We show that networks optimizing this

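A hedged sketch of an MDL score of the kind the abstract describes: bits needed to encode the network plus bits needed to encode the data given the network (its negative log2-likelihood). Treating network complexity as a single bit count is a simplification, not the paper's exact encoding scheme.

```python
import math

def mdl_score(network_bits: float, predicted_probs) -> float:
    """MDL = model description length + data description length,
    where the data cost is the surprisal of each observation in bits."""
    data_bits = sum(-math.log2(p) for p in predicted_probs)
    return network_bits + data_bits

# a tiny net (say, 200 bits to encode) predicting four observed symbols
print(mdl_score(200.0, [0.9, 0.8, 0.95, 0.7]))
```

A larger network can lower the data term by fitting better, but only wins under MDL if the improved fit saves more bits than the extra parameters cost, which is the complexity/accuracy balance the abstract refers to.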