There Once Was a Really Bad Poet, It Was Automated but You Didn’t Know It
Jianyou Wang1, Xiaoxuan Zhang1, Yuren Zhou2, Christopher Suh1, Cynthia Rudin1,2
Duke University {1Computer Science, 2Statistics} Department, United States
jw542@duke.edu, zhangxiaoxuanaa@gmail.com, yuren.zhou@duke.edu, csuh09@gmail.com, cynthia@cs.duke.edu

Abstract: Limerick generation exemplifies some of the most difficult challenges faced in poetry generation, as the poems must tell a story in only five lines…

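The limerick form itself is easy to state programmatically. As an illustration of the form constraint only (not the paper's generation system), the sketch below checks a candidate poem's AABBA rhyme scheme using the third-party `pronouncing` package, a CMU Pronouncing Dictionary wrapper.

```python
# Sketch: verify the AABBA rhyme scheme of a candidate limerick.
# Illustrative only -- not the paper's generation pipeline. Assumes the
# third-party `pronouncing` package (CMU Pronouncing Dictionary wrapper).
import pronouncing

def rhyme_part(word: str) -> str | None:
    """Return the rhyming part of a word's first pronunciation, if known."""
    phones = pronouncing.phones_for_word(word.lower())
    return pronouncing.rhyming_part(phones[0]) if phones else None

def is_aabba(lines: list[str]) -> bool:
    """True if the five line-final words follow the limerick AABBA scheme."""
    if len(lines) != 5:
        return False
    ends = [line.split()[-1].strip(".,!?;:") for line in lines]
    parts = [rhyme_part(w) for w in ends]
    if any(p is None for p in parts):
        return False  # out-of-vocabulary word: cannot verify
    a1, a2, b1, b2, a3 = parts
    return a1 == a2 == a3 and b1 == b2 and a1 != b1

poem = [
    "There once was a man from Peru",
    "Who dreamed he was eating his shoe",
    "He woke with a fright",
    "In the middle of the night",
    "To find that his dream had come true",
]
print(is_aabba(poem))  # True
```
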
Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition
Abbas Ghaddar, Philippe Langlais†, Ahmad Rashid, and Mehdi Rezagholizadeh
Huawei Noah’s Ark Lab, Montreal Research Center, Canada; †RALI/DIRO, Université de Montréal, Canada
abbas.ghaddar@huawei.com, felipe@iro.umontreal.ca, ahmad.rashid@huawei.com, mehdi.rezagholizadeh@huawei.com

Abstract: In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity. We introduce NRB, a new…

Dialogue State Tracking with Incremental Reasoning
Lizi Liao, Le Hong Long, Yunshan Ma, Wenqiang Lei, Tat-Seng Chua
School of Computing, National University of Singapore
{liaolizi.llz, yunshan.ma, wenqianglei}@gmail.com, lehonglong@u.nus.edu, chuats@comp.nus.edu.sg

Abstract: Tracking dialogue states to better interpret user goals and feed downstream policy learning is a bottleneck in dialogue management. Common practice has been to treat it as a problem of classifying dialogue content into a…

Characterizing English Variation across Social Media Communities with BERT
Li Lucy and David Bamman
University of California, Berkeley
{lucy3 li, dbamman}@berkeley.edu

Abstract: Much previous work characterizing language variation across Internet social groups has focused on the types of words used by these groups. We extend this type of study by employing BERT to characterize variation in the senses of words as well, analyzing…

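The general recipe the abstract describes, obtaining contextual embeddings of a target word and grouping them into sense clusters, can be sketched as follows. This is a generic illustration rather than the authors' exact pipeline; the `bert-base-uncased` checkpoint and the cluster count are assumptions.

```python
# Sketch: characterize sense variation by clustering BERT's contextual
# embeddings of one target word across contexts. Generic illustration of
# the technique, not the authors' pipeline; checkpoint and cluster count
# below are assumptions.
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_target(sentence: str, target: str) -> torch.Tensor:
    """Mean of the final-layer vectors for the target word's subtokens."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, 768)
    target_ids = tok(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i : i + len(target_ids)] == target_ids:
            return hidden[i : i + len(target_ids)].mean(dim=0)
    raise ValueError(f"{target!r} not found in sentence")

sentences = [
    "the python shed its skin near the river bank",
    "i write most of my scripts in python these days",
    "python is great for quick data analysis",
]
X = torch.stack([embed_target(s, "python") for s in sentences]).numpy()
# Sentence 0 (the snake sense) should land in its own cluster.
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))
```
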
Optimizing over subsequences generates context-sensitive languages
Andrew Lamont
University of Massachusetts Amherst
alamont@linguist.umass.edu

Abstract: Phonological generalizations are finite-state. While Optimality Theory is a popular framework for modeling phonology, it is known to generate non-finite-state mappings and languages. This paper demonstrates that Optimality Theory is capable of generating non-context-free languages, contributing to the characterization of its generative capacity. This is achieved with minimal…

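For readers unfamiliar with Optimality Theory's evaluation step, the toy sketch below shows the core optimization: among candidate outputs, the winner has the lexicographically least profile of violations under a constraint ranking. The constraints and candidates here are invented textbook examples, not the paper's subsequence construction.

```python
# Toy Optimality Theory evaluator: the winner minimizes the vector of
# constraint violations, compared lexicographically under the ranking.
# Generic OT illustration, not the paper's subsequence construction.

def ot_winner(candidates, ranked_constraints):
    """Return the candidate with the least violation profile."""
    return min(candidates, key=lambda c: tuple(f(c) for f in ranked_constraints))

# Two toy constraints over strings, for the input /tat/:
# *CODA penalizes a word-final consonant; MAX penalizes deleted segments.
no_coda = lambda cand: 1 if cand and cand[-1] not in "aeiou" else 0
max_io = lambda cand: len("tat") - len(cand)

# With *CODA ranked above MAX, deleting the final /t/ wins.
print(ot_winner(["tat", "ta"], [no_coda, max_io]))  # -> "ta"
# Reranking MAX above *CODA makes the faithful candidate win.
print(ot_winner(["tat", "ta"], [max_io, no_coda]))  # -> "tat"
```
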
Data-to-text Generation with Macro Planning
Ratish Puduppully and Mirella Lapata
Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB
r.puduppully@sms.ed.ac.uk, mlap@inf.ed.ac.uk

Abstract: Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or variants thereof. These models generate text that is fluent (but often imprecise) and perform quite poorly at selecting appropriate…

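The contrast the abstract draws, fluent generation versus poor content selection, is what motivates planning before generating. The sketch below illustrates the two-stage idea on invented data: stage 1 selects and orders records into a document plan, stage 2 verbalizes each plan step. It is a toy with hand-written rules and templates, not the paper's neural macro planner.

```python
# Sketch of plan-then-generate data-to-text: stage 1 selects and orders
# records into a document plan, stage 2 verbalizes each plan step.
# Toy illustration of the macro-planning idea; records, selection rule,
# and templates are invented for the example.
records = [
    {"type": "team", "name": "Hawks", "pts": 102, "win": True},
    {"type": "team", "name": "Magic", "pts": 95, "win": False},
    {"type": "player", "name": "Young", "pts": 37},
    {"type": "player", "name": "Fultz", "pts": 12},
]

def plan(recs):
    """Stage 1: keep teams and high-scoring players, teams first."""
    kept = [r for r in recs if r["type"] == "team" or r["pts"] >= 20]
    return sorted(kept, key=lambda r: (r["type"] != "team", -r["pts"]))

def realize(step):
    """Stage 2: verbalize one plan step with a template."""
    if step["type"] == "team":
        outcome = "a win" if step["win"] else "a loss"
        return f"The {step['name']} finished with {step['pts']} points in {outcome}."
    return f"{step['name']} led the way with {step['pts']} points."

print(" ".join(realize(s) for s in plan(records)))
```
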
Iterative Paraphrastic Augmentation with Discriminative Span Alignment
Ryan Culkin, J. Edward Hu, Elias Stengel-Eskin, Guanghui Qin, Benjamin Van Durme
Johns Hopkins University
{rculkin, edward.hu, elias, gqin, vandurme}@jhu.edu

Abstract: We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing datasets or the rapid creation of new datasets using a…

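A rough sketch of the label-projection step this strategy implies: paraphrase a sentence, align it to the original, and carry an annotated span across the alignment. The paper trains a discriminative aligner; here `difflib.SequenceMatcher` stands in as a crude aligner, and the example sentences are invented.

```python
# Sketch: project a labeled token span from a sentence onto its
# paraphrase. The paper's aligner is a trained discriminative model;
# difflib's SequenceMatcher is a crude stand-in to show the projection.
from difflib import SequenceMatcher

def project_span(src_tokens, tgt_tokens, span):
    """Map a (start, end) token span in src onto tgt via matching blocks."""
    start, end = span
    sm = SequenceMatcher(a=src_tokens, b=tgt_tokens)
    hits = []
    for block in sm.get_matching_blocks():
        for k in range(block.size):
            if start <= block.a + k < end:
                hits.append(block.b + k)
    return (min(hits), max(hits) + 1) if hits else None

src = "the committee approved the new budget yesterday".split()
tgt = "yesterday the new budget was approved by the committee".split()
s, e = project_span(src, tgt, span=(4, 6))  # "new budget" in src
print(tgt[s:e])  # ['new', 'budget']
```
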
Neural OCR Post-Hoc Correction of Historical Corpora
Lijun Lyu1, Maria Koutraki1, Martin Krickl2, Besnik Fetahu1,3
1L3S Research Center, Leibniz University of Hannover, Hannover, Germany; 2Austrian National Library, Vienna, Austria; 3Amazon, Seattle, WA, USA
lyu@L3S.de, koutraki@L3S.de, martin.krickl@onb.ac.at, besnikf@amazon.com

Abstract: Optical character recognition (OCR) is crucial for a deeper access to historical collections. OCR needs to account for orthographic variations, typefaces, or language…

A Computational Framework for Slang Generation
Zhewei Sun1, Richard Zemel1,2, Yang Xu1,2
1Department of Computer Science, University of Toronto, Toronto, Canada; 2Vector Institute for Artificial Intelligence, Toronto, Canada
{zheweisun, zemel, yangxu}@cs.toronto.edu

Abstract: Slang is a common type of informal language, but its flexible nature and paucity of data resources present challenges for existing natural language systems. We take an initial step toward machine…

Decontextualization: Making Sentences Stand-Alone
Eunsol Choi2∗, Jennimaria Palomaki1, Matthew Lamm1, Tom Kwiatkowski1, Dipanjan Das1, Michael Collins1
1Google Research; 2Department of Computer Science, The University of Texas at Austin
eunsol@cs.utexas.edu, {jpalomaki,mrlamm,tomkwiat,dipanjand,mjcollins}@google.com

Abstract: Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context and use that meaning in a new context. Taking excerpts of text can be…

An Error Analysis Framework for Shallow Surface Realization
Anastasia Shimorina, Yannick Parmentier, Claire Gardent
Université de Lorraine, CNRS, LORIA, F-54000 Nancy, France
{anastasia.shimorina, yannick.parmentier, claire.gardent}@loria.fr

Abstract: The metrics standardly used to evaluate Natural Language Generation (NLG) models, such as BLEU or METEOR, fail to provide information on which linguistic factors impact performance. Focusing on Surface Realization (SR), the task of converting an unordered dependency…

Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages
Edoardo M. Pontiκ, Ivan Vulićκ, Ryan Cotterellκ,ζ, Marinela Parovićκ, Roi Reichartτ, Anna Korhonenκ
κUniversity of Cambridge; ζETH Zürich; τTechnion, IIT
κ{ep490,iv250,rdc42,mp939,alk23}@cam.ac.uk, τroiri@ie.technion.ac.il

Abstract: Most combinations of NLP tasks and language varieties lack in-domain examples for supervised training because of the paucity of annotated data. How can neural models make sample-efficient…

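One way to picture parameter space factorization: weights for a (task, language) pair are generated from separate task and language latents, so an unseen pairing can reuse factors each learned in other combinations. The sketch below assumes an additive combination and a linear hypernetwork; all shapes and names are illustrative, not the paper's parameterization.

```python
# Sketch: factorize a parameter space over (task, language) pairs. Each
# pair's classifier weights come from a task latent plus a language
# latent, so an unseen combination reuses both trained factors. The
# additive combination, linear hypernetwork, and shapes are assumptions.
import torch
import torch.nn as nn

N_TASKS, N_LANGS, LATENT, HIDDEN, N_CLASSES = 4, 30, 32, 128, 5

task_emb = nn.Embedding(N_TASKS, LATENT)
lang_emb = nn.Embedding(N_LANGS, LATENT)
# Hypernetwork mapping a combined latent to flat classifier weights.
hyper = nn.Linear(LATENT, HIDDEN * N_CLASSES + N_CLASSES)

def classifier_params(task_id: int, lang_id: int):
    z = task_emb(torch.tensor(task_id)) + lang_emb(torch.tensor(lang_id))
    flat = hyper(z)
    W = flat[: HIDDEN * N_CLASSES].view(N_CLASSES, HIDDEN)
    b = flat[HIDDEN * N_CLASSES :]
    return W, b

# Zero-shot: a (task, language) pair never seen together still gets
# parameters, because each factor was trained in other combinations.
W, b = classifier_params(task_id=2, lang_id=17)
x = torch.randn(HIDDEN)      # encoder output for one example
print((W @ x + b).shape)     # torch.Size([5])
```
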
SummEval: Re-evaluating Summarization Evaluation
Alexander R. Fabbri†∗, Wojciech Kryściński‡∗, Bryan McCann‡, Caiming Xiong‡, Richard Socher‡, Dragomir Radev†‡
†Yale University; ‡Salesforce Research
{alexander.fabbri,dragomir.radev}@yale.edu, {kryscinski,cxiong}@salesforce.com, richard@socher.org, bryan.mccann.is@gmail.com

Abstract: The scarcity of comprehensive up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along…

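As a concrete example of the kind of automatic metric under re-evaluation here, the snippet below scores one candidate summary against a reference with ROUGE via the third-party `rouge_score` package. It illustrates metric usage generally; it is not the SummEval toolkit itself.

```python
# Sketch: score a system summary against a reference with ROUGE, the
# kind of automatic metric the paper re-evaluates. Uses the third-party
# `rouge_score` package; this is not the SummEval toolkit.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
reference = "the cat sat quietly on the warm mat all afternoon"
candidate = "a cat was sitting on the warm mat in the afternoon"
for name, score in scorer.score(reference, candidate).items():
    print(f"{name}: precision={score.precision:.2f} "
          f"recall={score.recall:.2f} f1={score.fmeasure:.2f}")
```
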
Extractive Opinion Summarization in Quantized Transformer Spaces
Stefanos Angelidis1, Reinald Kim Amplayo1, Yoshihiko Suhara2, Xiaolan Wang2, Mirella Lapata1
1University of Edinburgh; 2Megagon Labs
s.angelidis@ed.ac.uk, reinald.kim@ed.ac.uk, yoshi@megagon.ai, xiaolan@megagon.ai, mlap@inf.ed.ac.uk

Abstract: We present the Quantized Transformer (QT), an unsupervised system for extractive opinion summarization. QT is inspired by Vector-Quantized Variational Autoencoders, which we repurpose for popularity-driven summarization. It uses a clustering interpretation of the…

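The vector-quantization step QT borrows from VQ-VAEs can be sketched directly: each sentence encoding snaps to its nearest codebook vector, sentences sharing a code form a cluster, and frequently used codes mark popular opinions. The dimensions and random codebook below are illustrative stand-ins for learned quantities.

```python
# Sketch: the vector-quantization step repurposed from VQ-VAEs. Each
# sentence encoding is snapped to its nearest codebook vector; sentences
# sharing a code form a cluster, and popular codes indicate popular
# opinions. Dimensions and the random codebook are stand-ins.
import torch

D, K = 16, 8                      # embedding dim, codebook size
codebook = torch.randn(K, D)      # learned in the real model

def quantize(h: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Map an encoding to (nearest codebook vector, its index)."""
    dists = torch.cdist(h.unsqueeze(0), codebook).squeeze(0)  # (K,)
    idx = int(torch.argmin(dists))
    return codebook[idx], idx

sentence_encodings = torch.randn(100, D)   # stand-in encoder outputs
codes = [quantize(h)[1] for h in sentence_encodings]
# Popularity-driven selection: the most frequently used codes point to
# the clusters (opinions) supported by the most sentences.
print(torch.bincount(torch.tensor(codes), minlength=K))
```
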
Unsupervised Learning of KB Queries in Task-Oriented Dialogs
Dinesh Raghu∗1,2, Nikhil Gupta†3, and Mausam1
1IIT Delhi, New Delhi, India; 2IBM Research, New Delhi, India; 3LimeChat, Gurgaon, India
diraghu1@in.ibm.com, nikhil@limechat.ai, mausam@cse.iitd.ac.in

Abstract: Task-oriented dialog (TOD) systems often need to formulate knowledge base (KB) queries corresponding to the user intent and use the query results to generate system responses. Existing approaches require dialog datasets…

Adaptive Semiparametric Language Models
Dani Yogatama, Cyprien de Masson d’Autume, Lingpeng Kong
DeepMind, London, United Kingdom
{dyogatama,cyprien,lingpenk}@google.com

Abstract: We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture. Our model uses extended short-term context by caching local hidden states—similar to transformer-XL—and global long-term memory by retrieving a set…

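A hedged sketch of the semiparametric idea: blend the parametric model's next-token distribution with one induced from retrieved (hidden state, next token) memory entries. The fixed interpolation weight, random memory contents, and neighbour count below are placeholders; this is not the paper's model, whose combination is learned.

```python
# Sketch: mix a parametric LM's next-token distribution with one built
# from an episodic memory of (hidden state -> next token) pairs. The
# interpolation weight lam, random memory, and k are placeholders.
import torch
import torch.nn.functional as F

V, D, K_NN = 1000, 64, 4
memory_keys = torch.randn(500, D)            # cached hidden states
memory_vals = torch.randint(0, V, (500,))    # the tokens that followed

def mix_distributions(h, parametric_logits, lam=0.3):
    """P = (1 - lam) * parametric + lam * retrieval-based distribution."""
    sims = memory_keys @ h                    # inner-product search
    top = torch.topk(sims, K_NN)
    weights = F.softmax(top.values, dim=0)    # weight the k neighbours
    p_mem = torch.zeros(V)
    p_mem.index_add_(0, memory_vals[top.indices], weights)
    p_lm = F.softmax(parametric_logits, dim=0)
    return (1 - lam) * p_lm + lam * p_mem

h = torch.randn(D)          # current hidden state
logits = torch.randn(V)     # output of the transformer's softmax head
print(mix_distributions(h, logits).sum())  # ~1.0, a valid distribution
```
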
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva1,2, Daniel Khashabi2, Elad Segal1, Tushar Khot2, Dan Roth3, Jonathan Berant1,2
1Tel Aviv University; 2Allen Institute for AI; 3University of Pennsylvania
morgeva@mail.tau.ac.il, {danielk,tushark}@allenai.org, elad.segal@gmail.com, danroth@seas.upenn.edu, joberant@cs.tau.ac.il

Abstract: A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned…

Sparse, Dense, and Attentional Representations for Text Retrieval
Yi Luan∗, Jacob Eisenstein∗, Kristina Toutanova∗, Michael Collins
Google Research
{luanyi, jeisenstein, kristout, mjcollins}@google.com

Abstract: Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional…

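The first sentence of this abstract is nearly pseudocode already; below is a minimal runnable version of the dual-encoder scoring loop. The random `encode` function is a stand-in for a trained encoder tower; any model producing fixed-size vectors fits the same interface.

```python
# Sketch of dual-encoder retrieval as described in the abstract: encode
# documents and queries into dense vectors, score by inner product, take
# the top-k. The random "encoder" is a stand-in for a trained tower.
import numpy as np

rng = np.random.default_rng(0)
D = 128

def encode(texts: list[str]) -> np.ndarray:
    """Stand-in encoder: replace with a trained dual-encoder tower."""
    return rng.normal(size=(len(texts), D)).astype(np.float32)

docs = ["doc one ...", "doc two ...", "doc three ..."]
doc_vecs = encode(docs)                      # precomputed offline, (N, D)

def retrieve(query: str, k: int = 2) -> list[tuple[str, float]]:
    q = encode([query])[0]                   # (D,)
    scores = doc_vecs @ q                    # one inner product per doc
    top = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in top]

print(retrieve("which doc?"))
```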