Documentation


Unsupervised Bitext Mining and Translation via Self-Trained Contextual Embeddings

Unsupervised Bitext Mining and Translation via Self-Trained Contextual Embeddings
Phillip Keung, Julian Salazar, Yichao Lu (Amazon); Noah A. Smith (University of Washington; Allen Institute for AI)
{keung,julsal,yichaolu}@amazon.com, nasmith@cs.washington.edu

Abstract: We describe an unsupervised method to create pseudo-parallel corpora for machine translation (MT) from unaligned text. We use multilingual BERT to create source and target sentence embeddings for nearest-neighbor search and adapt the model…

Read more »
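
A minimal sketch of the mining step described above, assuming sentence embeddings have already been computed (e.g., by pooling multilingual BERT hidden states). The mutual-nearest-neighbor and ratio-margin criteria are common practice for bitext mining and are assumptions here, not necessarily the paper's exact recipe:

```python
import numpy as np

def mine_bitext(src_emb: np.ndarray, tgt_emb: np.ndarray,
                k: int = 4, threshold: float = 1.05):
    """Return (src_idx, tgt_idx, score) for candidate parallel pairs."""
    # Cosine similarity between every source and target sentence.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T

    # Mean similarity to each side's k nearest neighbors.
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)  # per source sentence
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)  # per target sentence

    pairs = []
    for i in range(sim.shape[0]):
        j = int(sim[i].argmax())
        # Margin score: similarity relative to each side's neighborhood.
        margin = sim[i, j] / ((knn_src[i] + knn_tgt[j]) / 2.0)
        # Keep only mutual nearest neighbors with a high margin score.
        if int(sim[:, j].argmax()) == i and margin > threshold:
            pairs.append((i, j, float(margin)))
    return pairs
```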

Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining

Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining
Ananya B. Sai*, Akash Kumar Mohankumar*, Siddhartha Arora, Mitesh M. Khapra (Robert Bosch Centre for Data Science and Artificial Intelligence, Indian Institute of Technology Madras)
{ananya,miteshk}@cse.iitm.ac.in, {makashkumar99,sidarora1990}@gmail.com

Abstract: There is an increasing focus on model-based dialog evaluation metrics such as ADEM, RUBER, and the more recent BERT-based metrics. These…

Read more »

Best-First Beam Search

Best-First Beam Search
Clara Meister (ETH Zürich); Tim Vieira (Johns Hopkins University); Ryan Cotterell (University of Cambridge; ETH Zürich)
clara.meister@inf.ethz.ch, tim.vieira@gmail.com, ryan.cotterell@inf.ethz.ch

Abstract: Decoding for many NLP tasks requires an effective heuristic algorithm for approximating exact search because the problem of searching the full output space is often intractable, or impractical in many settings. The default algorithm for this job is beam search—a pruned…

Read more »
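
The abstract contrasts standard beam search with a best-first ordering of the same search. As a rough illustration (not the paper's exact algorithm, which manages its priority queue more carefully), the sketch below pops the highest-scoring prefix first and uses a per-length expansion cap to play the role of the beam; `next_scores` is a hypothetical stand-in for a model's per-token log-probabilities:

```python
import heapq
import itertools

def best_first_beam_search(next_scores, beam_size, eos, max_len=20):
    """Pop the highest-scoring prefix first; expand at most `beam_size`
    prefixes of any given length (a simplified beam constraint)."""
    counter = itertools.count()          # tie-breaker so the heap never
    heap = [(0.0, next(counter), [])]    # has to compare token lists
    expanded = {}                        # prefix length -> expansions used

    while heap:
        neg_score, _, prefix = heapq.heappop(heap)
        if prefix and prefix[-1] == eos:
            return prefix, -neg_score    # first completed hypothesis wins
        n = expanded.get(len(prefix), 0)
        if n >= beam_size or len(prefix) >= max_len:
            continue
        expanded[len(prefix)] = n + 1
        for token, logp in next_scores(prefix):
            heapq.heappush(heap, (neg_score - logp, next(counter),
                                  prefix + [token]))
    return None, float("-inf")

# Toy usage: next_scores(prefix) -> [("a", -1.6), ("b", -0.4), ("</s>", -0.9)]
# best_first_beam_search(next_scores, beam_size=3, eos="</s>")
```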

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Syntactic Structure Distillation Pretraining for Bidirectional Encoders
Adhiguna Kuncoro* (DeepMind; University of Oxford); Lingpeng Kong* (DeepMind); Daniel Fried* (University of California, Berkeley); Dani Yogatama, Laura Rimell, Chris Dyer (DeepMind); Phil Blunsom (DeepMind; University of Oxford)
{akuncoro,lingpenk,dyogatama,laurarimell,cdyer,pblunsom}@google.com, dfried@cs.berkeley.edu

Abstract: Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed…

Read more »
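
Structure distillation of this kind is typically implemented as a KL term between a syntax-aware teacher's word distribution and the student's prediction at the same position. A minimal numpy version of such a loss, offered as an illustration rather than the paper's exact objective:

```python
import numpy as np

def distillation_loss(teacher_probs, student_logits):
    """KL(teacher || student), averaged over (masked) positions.
    Both inputs have shape (positions, vocab)."""
    # Numerically stable log-softmax of the student logits.
    z = student_logits - student_logits.max(axis=-1, keepdims=True)
    log_student = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    kl = (teacher_probs
          * (np.log(teacher_probs + 1e-12) - log_student)).sum(axis=-1)
    return kl.mean()
```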

Interactive Text Ranking with Bayesian Optimization: A Case Study on Community QA and Summarization

Interactive Text Ranking with Bayesian Optimization: A Case Study on Community QA and Summarization
Edwin Simpson (Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt; Department of Computer Science, University of Bristol); Yang Gao (UKP Lab, TU Darmstadt; Department of Computer Science, Royal Holloway, University of London); Iryna Gurevych (Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt, https://www.informatik.tu-darmstadt.de/)
edwin.simpson@bristol.ac.uk, yang.gao@rhul.ac.uk

Abstract: For many NLP applications, such as question answering and summarization, the goal is to select the best solution…

Read more »
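
As a toy illustration of the interactive loop the title suggests, the sketch below fits a Gaussian-process surrogate to the ratings collected so far and picks the next candidate to show the user by an upper-confidence-bound rule; the RBF kernel and UCB acquisition are generic choices, not necessarily the paper's setup:

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between the row vectors of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length ** 2))

def next_query(X_all, queried, ratings, beta=2.0, noise=1e-2):
    """GP posterior over all candidates; return the index to query next."""
    Xq = X_all[queried]
    K_inv = np.linalg.inv(rbf(Xq, Xq) + noise * np.eye(len(queried)))
    Ks = rbf(X_all, Xq)
    mu = Ks @ K_inv @ np.asarray(ratings, dtype=float)
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, K_inv, Ks)  # prior variance is 1
    ucb = mu + beta * np.sqrt(np.clip(var, 0.0, None))
    ucb[queried] = -np.inf               # never re-query a rated candidate
    return int(ucb.argmax())
```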

Multilingual Denoising Pre-training for Neural Machine Translation

Multilingual Denoising Pre-training for Neural Machine Translation
Yinhan Liu‡*, Jiatao Gu†*, Naman Goyal†*, Xian Li†, Sergey Edunov†, Marjan Ghazvininejad†, Mike Lewis†, and Luke Zettlemoyer‡ (†Facebook AI; ‡Birch Technology)
{jgu,naman,xianl,edunov,ghazvini,mikelewis,lsz}@fb.com, yinhan@birch.ai

Abstract: This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART—a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many…

Read more »
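
As a rough illustration of the denoising objective's input side, the sketch below permutes sentence order and masks token spans; mBART reportedly masks 35% of words with Poisson-sampled span lengths, but this toy version (short uniform spans, whitespace tokenization) is an assumption for illustration only:

```python
import random

MASK = "<mask>"

def add_noise(sentences, mask_ratio=0.35, seed=0):
    """Permute sentence order, then replace roughly `mask_ratio` of the
    tokens, span by span, with a single <mask> token per span."""
    rng = random.Random(seed)
    sentences = list(sentences)
    rng.shuffle(sentences)                       # sentence permutation noise

    noisy = []
    for sent in sentences:
        tokens = sent.split()
        to_mask = int(round(mask_ratio * len(tokens)))
        out, i = [], 0
        while i < len(tokens):
            if to_mask > 0 and rng.random() < mask_ratio:
                span = min(rng.randint(1, 3), to_mask, len(tokens) - i)
                out.append(MASK)                 # one mask replaces the span
                i += span
                to_mask -= span
            else:
                out.append(tokens[i])
                i += 1
        noisy.append(" ".join(out))
    return noisy
```

The model is then trained to reconstruct the original sentences from this corrupted input.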

oLMpics - On What Language Model Pre-training Captures

oLMpics - On What Language Model Pre-training Captures
Alon Talmor (The Allen Institute for AI; Tel-Aviv University); Yanai Elazar (The Allen Institute for AI; Bar-Ilan University); Yoav Goldberg (The Allen Institute for AI; Bar-Ilan University); Jonathan Berant (The Allen Institute for AI; Tel-Aviv University)
{alontalmor@mail,joberant@cs}.tau.ac.il, {yanaiela,yoav.goldberg}@gmail.com

Abstract: Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In…

Read more »
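
Probes of this kind are often cast as masked-LM multiple-choice questions. The snippet below shows the general zero-shot recipe with Hugging Face transformers: compare candidate fillers by their log-probability at the [MASK] slot. The model choice and the example template are mine, not taken from the paper's task suite:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def score_candidates(template, candidates):
    """Log-probability of each single-token candidate at the [MASK] slot."""
    enc = tok(template, return_tensors="pt")
    mask_pos = (enc.input_ids[0] == tok.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logp = model(**enc).logits[0, mask_pos].log_softmax(-1)
    return {c: logp[tok.convert_tokens_to_ids(c)].item() for c in candidates}

# e.g. score_candidates(
#     "A 21 year old person is [MASK] than a 35 year old person.",
#     ["younger", "older"])
```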

Synthesizing Parallel Data of User-Generated Texts with Zero-Shot Neural Machine Translation

Synthesizing Parallel Data of User-Generated Texts with Zero-Shot Neural Machine Translation
Benjamin Marie, Atsushi Fujita (National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan)
{bmarie,atsushi.fujita}@nict.go.jp

Abstract: Neural machine translation (NMT) systems are usually trained on clean parallel data. They can perform very well for translating clean in-domain texts. However, as demonstrated by previous work, the translation quality significantly…

Read more »

Consistent Transcription and Translation of Speech

Consistent Transcription and Translation of Speech
Matthias Sperber, Hendra Setiawan, Christian Gollan, Udhyakumar Nallasamy, Matthias Paulik (Apple)
{sperber,hendra,cgollan,udhay,mpaulik}@apple.com

Abstract: The conventional paradigm in speech translation starts with a speech recognition step to generate transcripts, followed by a translation step with the automatic transcripts as input. To address various shortcomings of this paradigm, recent work explores end-to-end trainable direct models that translate without transcribing. However,…

Read more »
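
For concreteness, the conventional cascade the abstract describes is simple function composition; `asr` and `mt` below are hypothetical model interfaces, not Apple's implementation:

```python
def cascade_translate(audio, asr, mt):
    """Two-step speech translation: recognize, then translate."""
    transcript = asr(audio)          # step 1: speech recognition
    translation = mt(transcript)     # step 2: text-to-text translation
    return transcript, translation
```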

Sketch-Driven Regular Expression Generation from Natural Language and Examples

Sketch-Driven Regular Expression Generation from Natural Language and Examples
Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett (Department of Computer Science, The University of Texas at Austin); Xinyu Wang (Computer Science and Engineering Department, University of Michigan, Ann Arbor)
{xiye,qchen,isil,gdurrett}@cs.utexas.edu, xwangsd@umich.edu

Abstract: Recent systems for converting natural language descriptions into regular expressions (regexes) have achieved some success, but typically deal with short, formulaic text…

Read more »
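
The "examples" in the title are positive and negative strings that a candidate regex must satisfy. A simple consistency filter over candidates, shown below, illustrates that idea; it is not the paper's synthesizer:

```python
import re

def consistent(pattern, positives, negatives):
    """A candidate regex must accept every positive and reject every
    negative example string (and must compile at all)."""
    try:
        rx = re.compile(pattern)
    except re.error:
        return False
    return (all(rx.fullmatch(p) for p in positives)
            and not any(rx.fullmatch(n) for n in negatives))

def filter_candidates(candidates, positives, negatives):
    return [p for p in candidates if consistent(p, positives, negatives)]
```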

Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension

Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension
Max Bartolo, Alastair Roberts, Johannes Welbl, Sebastian Riedel, Pontus Stenetorp (Department of Computer Science, University College London)
{m.bartolo,a.roberts,j.welbl,s.riedel,p.stenetorp}@cs.ucl.ac.uk

Abstract: Innovations in annotation methodology have been a catalyst for Reading Comprehension (RC) datasets and models. One recent trend to challenge current RC models is to involve a model in the annotation process: humans create questions adversarially,…

Read more »
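
The model-in-the-loop protocol the abstract sketches reduces to a simple acceptance test; the model interface and the exact-match criterion below are assumptions for illustration:

```python
def is_adversarial(model, passage, question, gold_answer):
    """Keep a human-written question only if the model fails on it.
    `model` is a hypothetical RC interface: (passage, question) -> answer."""
    prediction = model(passage, question)
    return prediction.strip().lower() != gold_answer.strip().lower()
```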

An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models

An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models
Lifu Tu* (Toyota Technological Institute at Chicago); Garima Lalwani, Spandana Gella (Amazon AI); He He* (New York University)
lifu@ttic.edu, {glalwani,sgella}@amazon.com, hehe@cs.nyu.edu

Abstract: Recent work has shown that pre-trained language models such as BERT improve robustness to spurious correlations in the dataset. Intrigued by these results, we find that…

Read more »

Nested Named Entity Recognition via Second-best Sequence Learning and Decoding

Nested Named Entity Recognition via Second-best Sequence Learning and Decoding
Takashi Shibuya (Carnegie Mellon University; Sony Corporation, Tokyo 141-8610, Japan); Eduard Hovy (Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.)
shibuyat@jp.sony.com, hovy@cmu.edu

Abstract: When an entity name contains other names within it, the identification of all combinations of names can become difficult and expensive. We propose a new method to recognize not only outermost named entities but also…

Read more »
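
A structural sketch of the decoding idea the title names: decode the best label path over a span, record its entities, then re-decode inside each entity span while forbidding the path already found there (the "second-best" part), recursing until no smaller entities appear. `viterbi` and `spans_of` are hypothetical helpers over a linear-chain model's scores, so this shows the control flow rather than the paper's exact algorithm:

```python
def decode_nested(scores, viterbi, spans_of, lo=0, hi=None, forbid=()):
    """Recursively decode entities from outermost to innermost.

    `viterbi(scores, lo, hi, forbid)` returns the best label path over
    [lo, hi) that differs from every path in `forbid` (or None if there is
    none); `spans_of(path, offset)` yields (start, end, label) entity spans.
    """
    hi = len(scores) if hi is None else hi
    path = viterbi(scores, lo, hi, forbid=list(forbid))
    if path is None:
        return []
    entities = []
    for start, end, label in spans_of(path, offset=lo):
        entities.append((start, end, label))
        # Search for nested entities strictly inside this one, excluding the
        # sub-path just accepted; restricting the recursion to strictly
        # smaller spans guarantees termination.
        if end - start > 1 and (start, end) != (lo, hi):
            inner = path[start - lo:end - lo]
            entities += decode_nested(scores, viterbi, spans_of,
                                      start, end, forbid=[inner])
    return entities
```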

Task-Oriented Dialogue as Dataflow Synthesis

Task-Oriented Dialogue as Dataflow Synthesis
David Hall, Jacob Andreas, John Bufe, Jean Crawford, Kate Crim, Hao Fang, Alan Guo, Wendy Iwaszuk, Smriti Jha, Percy Liang, Aleksandr Nisnevich, Adam Pauls, Subhro Roy, Stephon Striplin, Yu Su, Izabela Witoszko, Jason Wolfe, Jesse Rusak, Christopher H. Lin, David Burkett, Jordan DeLoach, Charles Chen, Leah Dorner, Dan Klein, Kristin Hayes, Kellie Hill, Jayant Krishnamurthy, Ilya Lintsbakh, Dmitrij Petters, Brent…

Read more »

Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs

Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Leonardo F. R. Ribeiro (Research Training Group AIPHES and UKP Lab, Technische Universität Darmstadt); Yue Zhang (School of Engineering, Westlake University); Claire Gardent (CNRS/LORIA, Nancy, France); Iryna Gurevych (UKP Lab, Technische Universität Darmstadt)
ribeiro@aiphes.tu-darmstadt.de, yue.zhang@wias.org.cn, claire.gardent@loria.fr, gurevych@ukp.informatik.tu-darmstadt.de

Abstract: Recent graph-to-text models generate text from graph-based data using either global or local aggregation to learn node representations.…

Read more »
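
The global-versus-local distinction can be made concrete with two toy aggregators over node features X and an adjacency matrix A; these are generic GNN-style and Transformer-style updates for illustration, not the paper's architecture:

```python
import numpy as np

def local_aggregate(A, X):
    """GNN-style: each node averages the features of its graph neighbors."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    return (A @ X) / deg

def global_aggregate(X):
    """Transformer-style: each node attends to every node, ignoring edges."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    return (w / w.sum(axis=1, keepdims=True)) @ X
```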

What Does My QA Model Know?

What Does My QA Model Know? Devising Controlled Probes Using Expert Knowledge
Kyle Richardson and Ashish Sabharwal (Allen Institute for AI, Seattle, WA, USA)
{kyler,ashishs}@allenai.org

Abstract: Open-domain question answering (QA) involves many knowledge and reasoning challenges, but are successful QA models actually learning such knowledge when trained on benchmark QA tasks? We investigate this via several new diagnostic tasks probing whether multiple-choice QA models…

Read more »

AMR Similarity Metrics from Principles

AMR Similarity Metrics from Principles
Juri Opitz, Letitia Parcalabescu, and Anette Frank (Department for Computational Linguistics, Heidelberg University, 69120 Heidelberg)
{opitz,parcalabescu,frank}@cl.uni-heidelberg.de

Abstract: Different metrics have been proposed to compare Abstract Meaning Representation (AMR) graphs. The canonical SMATCH metric (Cai and Knight, 2013) aligns the variables of two graphs and assesses triple matches. The recent SEMBLEU metric (Song and Gildea, 2019)…

Read more »
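
For readers unfamiliar with SMATCH, its core quantity is an F1 over matched triples under a variable alignment; the full metric additionally searches for a good alignment (e.g., by hill-climbing), which this sketch omits:

```python
def triple_f1(triples_a, triples_b, alignment):
    """F1 over matched (subject, relation, object) triples, with graph A's
    variables renamed into graph B's namespace by `alignment` (a dict).
    Real SMATCH also searches over alignments; that search is omitted."""
    mapped = {(alignment.get(s, s), rel, alignment.get(o, o))
              for s, rel, o in triples_a}
    matched = len(mapped & set(triples_b))
    if matched == 0:
        return 0.0
    prec = matched / len(mapped)
    rec = matched / len(set(triples_b))
    return 2 * prec * rec / (prec + rec)
```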

PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models

PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models
Eyal Ben-David*, Carmel Rabinovitz*, Roi Reichart (Technion, Israel Institute of Technology)
{eyalbd12@campus.|carmelrab@campus.|roiri@}technion.ac.il

Abstract: Pivot-based neural representation models have led to significant progress in domain adaptation for NLP. However, previous research following this approach utilizes only labeled data from the source domain and unlabeled data from the source and target domains, but neglects to incorporate…

Read more »
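
Pivot-based methods of this family rest on pivot features: features frequent in both domains and predictive of the task label. A classic selection heuristic, sketched below with assumed plain-text documents and binary labels (an illustration of the general idea, not PERL's exact procedure):

```python
from collections import Counter
import numpy as np

def select_pivots(src_docs, src_labels, tgt_docs, n_pivots=100, min_df=10):
    """Rank features by label correlation, keeping only those frequent
    in BOTH domains (document frequency >= min_df)."""
    src_df = Counter(w for d in src_docs for w in set(d.split()))
    tgt_df = Counter(w for d in tgt_docs for w in set(d.split()))
    shared = [w for w, c in src_df.items()
              if c >= min_df and tgt_df[w] >= min_df]
    y = np.asarray(src_labels, dtype=float)

    def label_corr(w):
        # Binary presence of the feature across source documents.
        x = np.array([float(w in d.split()) for d in src_docs])
        return 0.0 if x.std() == 0 else abs(np.corrcoef(x, y)[0, 1])

    return sorted(shared, key=label_corr, reverse=True)[:n_pivots]
```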