Documentation

EDITOR: An Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints

EDITOR: An Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints Weijia Xu University of Maryland weijia@cs.umd.edu Marine Carpuat University of Maryland marine@cs.umd.edu Abstract We introduce an Edit-Based TransfOrmer with Repositioning (EDITOR), which makes sequence generation flexible by seamlessly allowing users to specify preferences in output lexical choice. Building on recent models for non-autoregressive sequence generation (Gu et al., 2019), EDITOR

Read more »
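
The decoding style this abstract builds on replaces left-to-right generation with iterative edits, and EDITOR's named addition is a repositioning operation over the tokens already in the hypothesis. Below is a toy oracle sketch of that loop in Python, not the paper's model: repositioning reorders (and can drop) existing tokens, insertion opens placeholder slots, and filling writes a word into each slot. The function name and the oracle behavior are invented for illustration; a real system predicts every edit with trained classifiers.

# Toy oracle standing in for learned edit predictors (illustration only).
def oracle_decode(target, constraints):
    # Seed the hypothesis with the user's preferred words; because they
    # are ordinary tokens, the soft constraints stay editable later.
    hyp = list(constraints)
    # Repositioning: move each kept token to where the (oracle) target
    # wants it; tokens the target never uses are dropped.
    hyp = [tok for tok in target if tok in hyp]
    # Insertion: open a placeholder slot wherever a word is missing.
    out, it = [], iter(hyp)
    nxt = next(it, None)
    for tok in target:
        if tok == nxt:
            out.append(tok)
            nxt = next(it, None)
        else:
            out.append("<plh>")
    # Filling: a learned model would predict one word per placeholder;
    # the oracle simply copies the target word.
    return [t if o == "<plh>" else o for o, t in zip(out, target)]

print(oracle_decode(["we", "love", "new", "york"], ["york", "love"]))
# -> ['we', 'love', 'new', 'york']: the constraint words were reordered,
#    not deleted and regenerated, which is the point of repositioning.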

Aligning Faithful Interpretations with their Social Attribution

Aligning Faithful Interpretations with their Social Attribution Alon Jacovi Bar Ilan University alonjacovi@gmail.com Yoav Goldberg Bar Ilan University and Allen Institute for AI yoav.goldberg@gmail.com Abstract We find that the requirement of model interpretations to be faithful is vague and incomplete. With interpretation by textual highlights as a case study, we present several failure cases. Borrowing concepts from social science, we identify that the

Read more »

Morphology Matters: A Multilingual Language Modeling Analysis

Morphology Matters: A Multilingual Language Modeling Analysis Hyunji Hayley Park University of Illinois hpark129@illinois.edu Katherine J. Zhang Carnegie Mellon University kjzhang@cmu.edu Coleman Haley Johns Hopkins University chaley7@jhu.edu Kenneth Steimel Indiana University ksteimel@iu.edu Han Liu University of Chicago∗ hanliu@uchicago.edu Lane Schwartz University of Illinois lanes@illinois.edu Abstract Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or

Read more »

Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories

Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories Jakob Prange Nathan Schneider Vivek Srikumar Georgetown University University of Utah {jp1724, nathan.schneider}@georgetown.edu svivek@cs.utah.edu

Read more »
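
The abstract for this entry was lost to PDF extraction, but the title carries the idea: decode each supertag as a structured category tree rather than as one atomic label, so rare categories share structure with frequent ones. A minimal sketch of such a category tree, assuming CCG-style slash categories (the class and the example are invented for illustration):

# Hypothetical representation of a complex category as a binary tree.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cat:
    label: str                       # atomic category ("S", "NP") or a slash
    left: Optional["Cat"] = None
    right: Optional["Cat"] = None

    def __str__(self):
        if self.left is None:        # leaf: atomic category
            return self.label
        return f"({self.left}{self.label}{self.right})"

# A transitive verb's category (S\NP)/NP, built node by node, the way a
# tree-structured decoder could predict it top-down:
transitive_verb = Cat("/", Cat("\\", Cat("S"), Cat("NP")), Cat("NP"))
print(transitive_verb)               # ((S\NP)/NP)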

Infusing Finetuning with Semantic Dependencies

Infusing Finetuning with Semantic Dependencies Zhaofeng Wu♠ Hao Peng♠ Noah A. Smith♠♦ ♠Paul G. Allen School of Computer Science & Engineering, University of Washington ♦Allen Institute for Artificial Intelligence {zfw7,hapeng,nasmith}@cs.washington.edu Abstract For natural language processing systems, two kinds of evidence support the use of text representations from neural language models "pretrained" on large unannotated corpora: performance on application-inspired benchmarks (Peters et al., 2018, inter alia

Read more »

WikiAsp: A Dataset for Multi-domain Aspect-based Summarization

WikiAsp: A Dataset for Multi-domain Aspect-based Summarization Prashant Budania2 Hiroaki Hayashi1 Peng Wang2 Chris Ackerson2 Raj Neervannan2 Graham Neubig1 1Language Technologies Institute, Carnegie Mellon University 2AlphaSense {hiroakih,gneubig}@cs.cmu.edu {pbudania,pwang,cackerson,rneervannan}@alpha-sense.com Abstract Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to

Read more »

Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering

Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering Ben Bogin1 Sanjay Subramanian2 Matt Gardner2 Jonathan Berant1,2 1Tel-Aviv University 2Allen Institute for AI {ben.bogin,joberant}@cs.tau.ac.il, {sanjays,mattg}@allenai.org Abstract Answering questions that involve multi-step reasoning requires decomposing them and using the answers of intermediate steps to reach the final answer. However, state-of-the-art models in grounded question answering often do not explicitly perform decomposition, leading to difficulties

Read more »
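
The abstract's premise, that multi-step questions decompose into intermediate answers, is easy to see on a toy grounded-QA example. The scene and the hand-written steps below are invented for illustration; the paper's point is that state-of-the-art models skip this decomposition, and its contribution is inducing such compositional structure latently rather than executing hand-written programs.

# "How many blue objects are left of the square?", decomposed by hand.
scene = [
    {"shape": "square",   "color": "red",  "x": 5},
    {"shape": "circle",   "color": "blue", "x": 2},
    {"shape": "triangle", "color": "blue", "x": 1},
    {"shape": "circle",   "color": "red",  "x": 0},
]

square = next(o for o in scene if o["shape"] == "square")  # step 1: find the anchor
left_of = [o for o in scene if o["x"] < square["x"]]       # step 2: spatial relation
blue = [o for o in left_of if o["color"] == "blue"]        # step 3: attribute filter
print(len(blue))                                           # step 4: aggregate -> 2

Each intermediate value (square, left_of, blue) answers a sub-question; the final answer is only reachable through them.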

KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation Xiaozhi Wang1, Tianyu Gao3, Zhaocheng Zhu4,5, Zhengyan Zhang1, Zhiyuan Liu1,2∗, Juanzi Li1,2, and Jian Tang4,6,7∗ 1Department of CST, BNRist; 2KIRC, Institute for AI, Tsinghua University, Beijing, China {wangxz20,zy-z19}@mails.tsinghua.edu.cn {liuzy,lijuanzi}@tsinghua.edu.cn 3Department of Computer Science, Princeton University, Princeton, New Jersey, USA tianyug@princeton.edu 4Mila-Québec AI Institute; 5Université de Montréal; 6HEC Montréal, Canada zhaocheng.zhu@umontreal.ca, jian.tang@hec.ca 7CIFAR AI

Read more »

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals Yanai Elazar1,2 Shauli Ravfogel1,2 Alon Jacovi1 Yoav Goldberg1,2 1Computer Science Department, Bar Ilan University 2Allen Institute for Artificial Intelligence {yanaiela,shauli.ravfogel,alonjacovi,yoav.goldberg}@gmail.com Abstract A growing body of work makes use of probing in order to investigate the working of neural models, often considered black boxes. Recently, an ongoing debate emerged surrounding the limitations of the probing paradigm. In this work,

Read more »
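
The "amnesic counterfactual" of the title suggests making a model forget a property and then checking whether its behavior changes, which is a behavioral test rather than a pure probing one. Below is a minimal numpy sketch of one plausible mechanic, removing a single linear direction by null-space projection; the representations and the property direction here are random stand-ins, and the paper's actual removal procedure may differ.

import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(1000, 64))     # stand-in contextual representations
w = rng.normal(size=64)             # direction assumed to encode the property
w /= np.linalg.norm(w)

# Null-space projection: subtract each vector's component along w, so a
# linear probe for the property can no longer read it off H_amnesic.
H_amnesic = H - np.outer(H @ w, w)
print(np.abs(H_amnesic @ w).max())  # ~1e-15: the direction is gone

# Behavioral step: feed H and H_amnesic through the same frozen model
# head and compare task performance; a large drop is evidence the model
# actually used the removed property, not merely encoded it.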

Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement

Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement Alireza Mohammadshahi Idiap Research Institute / EPFL alireza.mohammadshahi@idiap.ch James Henderson Idiap Research Institute james.henderson@idiap.ch Abstract We propose the Recursive Non-autoregressive Graph-to-Graph Transformer architecture (RNGTr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. We demonstrate the power and effectiveness

Read more »
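
The refinement loop the abstract describes is simple to sketch: every iteration re-predicts the entire dependency graph at once (non-autoregressively), conditioned on the previous graph, until the prediction stops changing. The toy predictor below repairs one arc per pass purely to make the convergence visible; in the paper the predictor is a trained Graph-to-Graph Transformer, and the helper names here are invented. Heads are 1-indexed token positions, with 0 for the root.

def refine_parse(predict_graph, sentence, max_iters=5):
    heads = [0] * len(sentence)                      # initial graph: everything -> root
    for _ in range(max_iters):
        new_heads = predict_graph(sentence, heads)   # one-shot re-prediction of all arcs
        if new_heads == heads:                       # fixed point: stop early
            break
        heads = new_heads
    return heads

def toy_predictor(sentence, heads):
    gold = [2, 3, 0]                                 # the->dog, dog->barks, barks->ROOT
    fixed = list(heads)
    for i, (h, g) in enumerate(zip(heads, gold)):
        if h != g:                                   # repair the first wrong arc
            fixed[i] = g
            break
    return fixed

print(refine_parse(toy_predictor, ["the", "dog", "barks"]))   # [2, 3, 0]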

Modeling Content and Context with Deep Relational Learning

Modeling Content and Context with Deep Relational Learning Maria Leonor Pacheco and Dan Goldwasser Department of Computer Science Purdue University West Lafayette, IN 47907 {pachecog, dgoldwas}@purdue.edu Abstract Building models for realistic natural language tasks requires dealing with long texts and accounting for complicated structural dependencies. Neural-symbolic representations have emerged as a way to combine the reasoning capabilities of symbolic methods with the expressiveness

Read more »

Augmenting Transformers with KNN-Based Composite Memory for Dialog

Augmenting Transformers with KNN-Based Composite Memory for Dialog Angela Fan Facebook AI Research Université de Lorraine LORIA angelafan@fb.com Claire Gardent CNRS/LORIA claire.gardent@loria.fr Chloé Braud CNRS/IRIT chloe.braud@irit.fr Antoine Bordes Facebook AI Research abordes@fb.com Abstract Various machine learning tasks can benefit from access to external information of different modalities, such as text and images. Recent work has focused on learning architectures with large memories capable of storing

Read more »
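
A minimal sketch of the KNN lookup the title implies, with all shapes and names invented for illustration: encode the dialogue context, score it against a bank of pre-encoded memory entries (knowledge sentences, image features, and so on), and hand the top K to the generator as extra conditioning input.

import numpy as np

rng = np.random.default_rng(0)
memory_keys = rng.normal(size=(10_000, 128))   # pre-encoded memory entries
memory_texts = [f"fact-{i}" for i in range(10_000)]

def knn_lookup(query_vec, k=4):
    scores = memory_keys @ query_vec           # inner-product similarity
    top = np.argpartition(-scores, k)[:k]      # indices of the K best entries
    return [memory_texts[i] for i in top[np.argsort(-scores[top])]]

context_vec = rng.normal(size=128)             # stand-in for an encoded dialogue turn
print(knn_lookup(context_vec))                 # candidates passed to the decoder

A design note on this kind of memory: because it lives outside the network, it can be grown or edited without retraining; only the encoders and the generator are learned.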

On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs Adina Williams∗1 Ryan Cotterell∗,2,3 Lawrence Wolf-Sonkin4 Damián Blasi5 Hanna Wallach6 1Facebook AI Research 2ETH Zürich 3University of Cambridge 4Johns Hopkins University 5Universität Zürich 6Microsoft Research adinawilliams@fb.com ryan.cotterell@inf.ethz.ch lawrencews@jhu.edu damian.blasi@uzh.ch wallach@microsoft.com Abstract We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to

Read more »
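
One concrete reading of "tools from information theory" here is the mutual information between a noun's grammatical gender and the adjectives (or verbs) it co-occurs with: if gender and adjective choice were unrelated, the MI would be zero. A toy calculation on invented counts (the paper works at corpus scale in six gendered languages):

from collections import Counter
from math import log2

# Invented (gender, adjective-lemma) co-occurrence pairs.
pairs = ([("fem", "small")] * 30 + [("fem", "old")] * 10
         + [("masc", "small")] * 10 + [("masc", "old")] * 30)

n = len(pairs)
p_xy = {k: v / n for k, v in Counter(pairs).items()}
p_g = {g: v / n for g, v in Counter(g for g, _ in pairs).items()}
p_a = {a: v / n for a, v in Counter(a for _, a in pairs).items()}

# I(G; A) = sum over (g, a) of p(g,a) * log2( p(g,a) / (p(g) p(a)) )
mi = sum(p * log2(p / (p_g[g] * p_a[a])) for (g, a), p in p_xy.items())
print(f"I(gender; adjective) = {mi:.3f} bits")   # ~0.189 here; 0 if independent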

Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

Deciphering Undersegmented Ancient Scripts Using Phonetic Prior Jiaming Luo CSAIL, MIT j_luo@csail.mit.edu Frederik Hartmann University of Konstanz frederik.hartmann@uni-konstanz.de Enrico Santus Bayer enrico.santus@bayer.com Regina Barzilay CSAIL, MIT regina@csail.mit.edu Yuan Cao Google Brain yuancao@google.com Abstract Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined.

Read more »

Efficient Content-Based Sparse Attention with Routing Transformers

Efficient Content-Based Sparse Attention with Routing Transformers Aurko Roy Mohammad Saffar Ashish Vaswani David Grangier Google Research {aurkor,msaffar,avaswani,grangier}@google.com Abstract Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic computation and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focused on attending to local sliding windows or

Read more »
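
The excerpt stops just before the paper's own idea: route tokens into content-based clusters and let each token attend only within its cluster, which with about sqrt(n) clusters of sqrt(n) tokens brings the cost from O(n^2) down to roughly O(n^1.5). A crude numpy sketch of that routing idea; it uses random centroids where the paper learns them with online k-means, and raw inputs as untrained stand-ins for the query/key/value projections.

import numpy as np

def clustered_attention(x, n_clusters):
    n, d = x.shape
    # "Routing": assign each token to its most similar centroid.
    rng = np.random.default_rng(0)
    centroids = x[rng.choice(n, n_clusters, replace=False)]
    assign = np.argmax(x @ centroids.T, axis=1)
    out = np.zeros_like(x)
    for c in range(n_clusters):
        idx = np.where(assign == c)[0]
        if idx.size == 0:
            continue
        q = k = v = x[idx]                     # untrained stand-ins for Q, K, V
        att = np.exp(q @ k.T / np.sqrt(d))     # attention only inside the cluster
        out[idx] = (att / att.sum(-1, keepdims=True)) @ v
    return out

x = np.random.default_rng(1).normal(size=(64, 16))
print(clustered_attention(x, n_clusters=8).shape)   # (64, 16)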

Revisiting Multi-Domain Machine Translation

Revisiting Multi-Domain Machine Translation MinhQuang Pham† ‡, Josep Maria Crego†, François Yvon‡ ‡Université Paris-Saclay, CNRS, LIMSI, 91400, Orsay, France francois.yvon@limsi.fr †SYSTRAN, 5 rue Feydeau, 75002 Paris, France {minhquang.pham,josep.crego}@systrangroup.com Abstract When building machine translation systems, one often needs to make the best out of heterogeneous sets of parallel data in training, and to robustly handle inputs from unexpected domains in testing. This multi-domain scenario has

Read more »

Reducing Confusion in Active Learning for Part-Of-Speech Tagging

Reducing Confusion in Active Learning for Part-Of-Speech Tagging Aditi Chaudhary1, Antonios Anastasopoulos2,∗, Zaid Sheikh1, Graham Neubig1 1Language Technologies Institute, Carnegie Mellon University 2Department of Computer Science, George Mason University {aschaudh,zsheikh,gneubig}@cs.cmu.edu antonis@gmu.edu Abstract Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS)

Read more »
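
As background for the selection step the abstract describes, here is the standard margin-based uncertainty baseline; the paper's contribution is a more refined criterion, and the probabilities below are random stand-ins for a trained tagger's outputs. The idea: pick the pool tokens whose top two tag probabilities are closest, since those are the ones the tagger confuses.

import numpy as np

rng = np.random.default_rng(0)
pool_probs = rng.dirichlet(np.ones(17), size=500)   # 17 POS tags, 500 pool tokens

def select_batch(probs, k=20):
    srt = np.sort(probs, axis=1)
    margin = srt[:, -1] - srt[:, -2]        # small margin = confusable token
    return np.argsort(margin)[:k]           # most uncertain tokens first

to_annotate = select_batch(pool_probs)
print(to_annotate[:5])                      # indices sent to the annotator
# After annotation, retrain the tagger and repeat until the budget runs out.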

A Primer in BERTology: What We Know About How BERT Works

A Primer in BERTology: What We Know About How BERT Works Anna Rogers Center for Social Data Science University of Copenhagen arogers@sodas.ku.dk Olga Kovaleva Dept. of Computer Science University of Massachusetts Lowell okovalev@cs.uml.edu Anna Rumshisky Dept. of Computer Science University of Massachusetts Lowell arum@cs.uml.edu Abstract Transformer-based models have pushed the state of the art in many areas of NLP, but our understanding of what is

Read more »