Saturated Transformers are Constant-Depth Threshold Circuits William Merrill∗† Ashish Sabharwal∗ Noah A. Smith∗‡ ∗Allen Institute for AI, USA †New York University, USA ‡University of Washington, USA willm@nyu.edu {ashishs, noah}@allenai.org Abstract Transformers have become a standard…
Generate, Annotate, and Learn: NLP with Synthetic Text
Generate, Annotate, and Learn: NLP with Synthetic Text Xuanli He1 Islam Nassar1 Jamie Kiros2 Gholamreza Haffari1 Mohammad Norouzi2 1Monash University, Australia 2Google Research, Brain Team, Canada {xuanli.he1, gholamreza.haffari}@monash.edu, mnorouzi@google.com Abstract This paper studies the use…
High Quality Rather than High Model Probability:
High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics Markus Freitag, David Grangier, Qijun Tan, Bowen Liang Google Research, USA {freitag, grangier, qijuntan, bowenl}@google.com Abstract In Neural Machine Translation, it…
Formal Language Recognition by Hard Attention Transformers:
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity Yiding Hao, Dana Angluin, and Robert Frank Yale University New Haven, CT, USA {yiding.hao, robert.frank, dana.angluin}@yale.edu Abstract This paper analyzes three formal models of…
Minimum Description Length Recurrent Neural Networks
Minimum Description Length Recurrent Neural Networks Nur Lan1,2, Michal Geyer2, Emmanuel Chemla1,3∗, Roni Katzir2∗ 1École Normale Supérieure, France 2Tel Aviv University, Israel 3EHESS, PSL University, CNRS {nlan,chemla}@ens.fr michalgeyer@mail.tau.ac.il rkatzir@tauex.tau.ac.il Abstract We train neural networks to…
Heterogeneous Supervised Topic Models
Heterogeneous Supervised Topic Models Dhanya Sridhar♦ and Hal Daumé III† and David Blei♠ ♦Université de Montréal and Mila-Quebec AI Institute, Canada dhanya.sridhar@mila.quebec †University of Maryland and Microsoft Research, USA hal3@umd.edu ♠Columbia University, USA david.blei@columbia.edu…
Fact Checking with Insufficient Evidence
Fact Checking with Insufficient Evidence Pepa Atanasova Jakob Grue Simonsen Christina Lioma Isabelle Augenstein Department of Computer Science, University of Copenhagen, Denmark {pepa, simonsen, c.lioma, augenstein}@di.ku.dk Abstract Automating the fact checking (FC) process relies on…
True Few-Shot Learning with Prompts—A Real-World Perspective
True Few-Shot Learning with Prompts—A Real-World Perspective Timo Schick and Hinrich Schütze Center for Information and Language Processing (CIS), LMU Munich, Germany schickt@cis.lmu.de, inquiries@cislmu.org Abstract Prompt-based approaches excel at few-shot learning. However, Perez et…
Data-to-text Generation with Variational Sequential Planning
Data-to-text Generation with Variational Sequential Planning Ratish Puduppully and Yao Fu and Mirella Lapata Institute for Language, Cognition and Computation School of Informatics, University of Edinburgh 10 Crichton Street, Edinburgh EH8 9AB, UK r.puduppully@sms.ed.ac.uk yao.fu@ed.ac.uk…
Uncertainty Estimation and Reduction of Pre-trained Models for
Uncertainty Estimation and Reduction of Pre-trained Models for Text Regression Yuxia Wang♦ Daniel Beck♦ Timothy Baldwin♦ Karin Verspoor†♦ ♦The University of Melbourne, Melbourne, Victoria, Australia †RMIT University, Melbourne, Victoria, Australia yuxiaw@student.unimelb.edu.au d.beck@unimelb.edu.au tb@ldwin.net karin.verspoor@rmit.edu.au…