Maintaining Common Ground in Dynamic Environments

Maintaining Common Ground in Dynamic Environments
Takuma Udagawa1 and Akiko Aizawa1,2
1The University of Tokyo, Tokyo, Japan
2National Institute of Informatics, Tokyo, Japan
{takuma udagawa,aizawa}@nii.ac.jp

Abstract: Common grounding is the process of creating and maintaining mutual understandings, which is a critical aspect of sophisticated human communication. While various task settings have been proposed in existing literature, they mostly focus on creating common ground…

Read more »

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott
University of Copenhagen; University of Cambridge; ETH Zürich; Tokyo Institute of Technology
emanuele@di.ku.dk, rcotterell@inf.ethz.ch, okazaki@c.titech.ac.jp, de@di.ku.dk

Abstract: Large-scale pretraining and task-specific fine-tuning is now the standard methodology for many tasks in computer vision and natural language processing. Recently, a multitude of methods have been…

Read more »

How Can We Know When Language Models Know?

How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering
Zhengbao Jiang†, Jun Araki‡, Haibo Ding‡, Graham Neubig†
†Language Technologies Institute, Carnegie Mellon University, United States
‡Bosch Research, United States
{zhengbaj,gneubig}@cs.cmu.edu, {jun.araki,haibo.ding}@us.bosch.com

Abstract: Recent works have shown that language models (LMs) capture different types of knowledge regarding facts or common sense. However, because no model is perfect,…
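
Calibration here asks whether a model's confidence matches its accuracy; a standard way to quantify the gap is expected calibration error (ECE). A minimal sketch with illustrative inputs (this is the generic metric, not the paper's specific method):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between mean confidence and accuracy per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of examples in bin
    return ece

# Toy example: an overconfident QA model (confidences and outcomes made up).
print(expected_calibration_error([0.9, 0.8, 0.95, 0.6], [1, 0, 1, 1]))
```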

Read more »

Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree-Structured Topic Guidance

Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree-Structured Topic Guidance
Masaru Isonuma1, Junichiro Mori1,2, Danushka Bollegala3, Ichiro Sakata1
1The University of Tokyo, Japan; 2RIKEN, Japan; 3University of Liverpool, United Kingdom
isonuma@ipr-ctr.t.u-tokyo.ac.jp, mori@mi.u-tokyo.ac.jp, danushka@liverpool.ac.uk, isakata@ipr-ctr.t.u-tokyo.ac.jp

Abstract: This paper presents a novel unsupervised abstractive summarization method for opinionated texts. While the basic variational autoencoder-based models assume a unimodal Gaussian prior for the latent…
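
The "unimodal Gaussian prior" refers to the standard VAE objective's KL term, which pulls every latent code toward a single N(0, I). A minimal sketch of that closed-form term (generic VAE math, not the paper's tree-structured prior):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), the VAE regularizer
    that ties the latent code to a single unimodal Gaussian prior."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# Toy posterior parameters for a 2-dimensional latent code.
print(kl_to_standard_normal(np.array([0.5, -0.2]), np.array([0.0, 0.3])))
```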

Read more »

Relevance-guided Supervision for OpenQA with ColBERT

Relevance-guided Supervision for OpenQA with ColBERT
Omar Khattab, Christopher Potts, Matei Zaharia
Stanford University, United States
okhattab@stanford.edu, cgpotts@stanford.edu, matei@cs.stanford.edu

Abstract: Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a…
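
For context, ColBERT scores a query-passage pair by late interaction: each query token embedding takes its maximum similarity over the passage's token embeddings, and these maxima are summed. A minimal sketch (random arrays stand in for BERT token embeddings, which ColBERT additionally L2-normalizes):

```python
import numpy as np

def maxsim_score(q_emb, d_emb):
    """ColBERT-style late interaction: for each query token embedding,
    take its max similarity over passage token embeddings, then sum."""
    sims = q_emb @ d_emb.T            # (num_query_tokens, num_passage_tokens)
    return sims.max(axis=1).sum()

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 128))    # 8 query tokens, 128-dim embeddings
d = rng.normal(size=(120, 128))  # 120 passage tokens
print(maxsim_score(q, d))
```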

Read more »

Neural Modeling for Named Entities and Morphology (NEMO2)

Neural Modeling for Named Entities and Morphology (NEMO2)
Dan Bareket1,2 and Reut Tsarfaty1
1Bar Ilan University, Ramat-Gan, Israel
2Open Media and Information Lab (OMILab), The Open University of Israel, Israel
dbareket@gmail.com, reut.tsarfaty@biu.ac.il

Abstract: Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens. Morphologically rich languages (MRLs) pose a challenge to this basic formulation, as…
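
The "classification over a sequence of tokens" formulation can be made concrete with a minimal BIO-tagged example (sentence and labels are illustrative, not from the paper's Hebrew data):

```python
# Standard token-level NER formulation: one BIO label per token.
tokens = ["Dan", "Bareket", "works", "in", "Ramat-Gan"]
labels = ["B-PER", "I-PER", "O", "O", "B-LOC"]

# MRLs complicate this picture: a single space-delimited token can fuse
# several morphemes, so the unit being labeled is no longer obvious.
assert len(tokens) == len(labels)
```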

Read more »

Sensitivity as a Complexity Measure for Sequence Classification Tasks

Sensitivity as a Complexity Measure for Sequence Classification Tasks
Michael Hahn, Stanford University, United States (mhahn2@stanford.edu)
Dan Jurafsky, Stanford University, United States (jurafsky@stanford.edu)
Richard Futrell, University of California, Irvine, United States (rfutrell@uci.edu)

Abstract: We introduce a theoretical framework for understanding and predicting the complexity of sequence classification tasks, using a novel extension of the theory of Boolean function sensitivity. The sensitivity of a function, given…
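
For readers unfamiliar with the underlying notion, the (max) sensitivity of a Boolean function counts how many single-bit flips of an input can change its output. A brute-force sketch of the classical definition (the paper's contribution is an extension of this theory to sequence classification):

```python
from itertools import product

def sensitivity_at(f, x):
    """Number of single-bit flips of x that change f's output."""
    flips = 0
    for i in range(len(x)):
        y = list(x)
        y[i] ^= 1
        if f(tuple(y)) != f(x):
            flips += 1
    return flips

def sensitivity(f, n):
    """Max sensitivity of f over all n-bit inputs."""
    return max(sensitivity_at(f, x) for x in product((0, 1), repeat=n))

# PARITY is maximally sensitive: every single bit flip changes the output.
parity = lambda x: sum(x) % 2
print(sensitivity(parity, n=4))  # -> 4
```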

Read more »

Neural Event Semantics for Grounded Language Understanding

Neural Event Semantics for Grounded Language Understanding
Shyamal Buch, Li Fei-Fei, Noah D. Goodman
Stanford University, United States
{shyamal,feifeili}@cs.stanford.edu, ngoodman@stanford.edu

Abstract: We present a new conjunctivist framework, neural event semantics (NES), for compositional grounded language understanding. Our approach treats all words as classifiers that compose to form a sentence meaning by multiplying output scores. These classifiers apply to spatial regions (events) and NES…
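
The "multiplying output scores" idea can be illustrated with a toy conjunctivist scorer (random linear classifiers stand in for NES's learned word classifiers; this illustrates the composition rule, not the paper's architecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sentence_score(word_classifiers, event):
    """Conjunctivist composition: each word scores the event in [0, 1],
    and the sentence meaning is the product of those scores."""
    scores = [sigmoid(w @ event) for w in word_classifiers]
    return np.prod(scores)

rng = np.random.default_rng(0)
event = rng.normal(size=16)                      # features of a spatial region
words = [rng.normal(size=16) for _ in range(3)]  # one linear classifier per word
print(sentence_score(words, event))
```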

Read more »

Gender Bias in Machine Translation

Gender Bias in Machine Translation
Beatrice Savoldi1,2, Marco Gaido1,2, Luisa Bentivogli2, Matteo Negri2, Marco Turchi2
1University of Trento, Italy
2Fondazione Bruno Kessler, Italy
{bsavoldi,mgaido,bentivo,negri,turchi}@fbk.eu

Abstract: Machine translation (MT) technology has facilitated our daily tasks by providing accessible shortcuts for gathering, processing, and communicating information. However, it can suffer from biases that harm users and society at large. As a relatively new field of…

Read more »

Let’s Play Mono-Poly: BERT Can Reveal Words’ Polysemy Level and Partitionability into Senses

Let’s Play Mono-Poly: BERT Can Reveal Words’ Polysemy Level and Partitionability into Senses
Aina Garí Soler, Université Paris-Saclay, CNRS, LISN, 91400 Orsay, France (aina.gari@limsi.fr)
Marianna Apidianaki, Department of Digital Humanities, University of Helsinki, Helsinki, Finland (marianna.apidianaki@helsinki.fi)

Abstract: Pre-trained language models (LMs) encode rich information about linguistic structure but their knowledge about lexical polysemy remains unclear. We propose a novel experimental setup for analyzing this…
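
One common way to probe this in the contextualized-embedding literature is to measure how similar a word's BERT vectors are across contexts. A minimal sketch of that generic measure (random vectors stand in for real BERT outputs; this is not necessarily the paper's exact setup):

```python
import numpy as np

def self_similarity(context_embeddings):
    """Average pairwise cosine similarity of one word's contextualized
    vectors across contexts. Intuitively, monosemous words should score
    higher than highly polysemous ones."""
    X = np.asarray(context_embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    n = len(X)
    return (sims.sum() - n) / (n * (n - 1))  # mean of off-diagonal entries

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10, 768))  # stand-in for BERT vectors of one word
print(self_similarity(vectors))
```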

Read more »

SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching

SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching
Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Jianfeng Gao
Microsoft Research, Redmond, United States
{bapeng,chunyl,jincli,shahins,lars.liden,jfgao}@microsoft.com

Abstract: We present a new method, SOLOIST, that uses transfer learning and machine teaching to build task bots at scale. We parameterize classical modular task-oriented dialog systems using a Transformer-based auto-regressive language model, which subsumes…
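
As a rough illustration of what "parameterizing a modular system with a single auto-regressive LM" means, a dialog turn can be flattened into one training sequence. The field markup below is an illustrative assumption, not the paper's exact format:

```python
# Schematic SOLOIST-style serialization: dialog history, belief state,
# DB result, and response are concatenated into a single token sequence
# that one auto-regressive LM is trained to generate left-to-right.
# Field names are illustrative, not the paper's exact markup.
turn = (
    "user: book a table for two tonight "
    "=> belief: restaurant { people = 2, time = tonight } "
    "=> db: 3 matches "
    "=> response: which cuisine would you like?"
)
print(turn)
```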

Read more »

Classifying Argumentative Relations Using Logical Mechanisms and Argumentation Schemes

Classifying Argumentative Relations Using Logical Mechanisms and Argumentation Schemes
Yohan Jo1, Seojin Bang1, Chris Reed2, Eduard Hovy1
1School of Computer Science, Carnegie Mellon University, United States
2Centre for Argument Technology, University of Dundee, United Kingdom
1{yohanj,seojinb,ehovy}@andrew.cmu.edu, 2c.a.reed@dundee.ac.uk

Abstract: While argument mining has achieved significant success in classifying argumentative relations between statements (support, attack, and neutral), we have a limited computational understanding of logical…

Read more »

Strong Equivalence of TAG and CCG

Strong Equivalence of TAG and CCG
Lena Katharina Schiffer and Andreas Maletti
Faculty of Mathematics and Computer Science, Universität Leipzig, Germany
P.O. Box 100 920, D-04009 Leipzig, Germany
{schiffer,maletti}@informatik.uni-leipzig.de

Abstract: Tree-adjoining grammar (TAG) and combinatory categorial grammar (CCG) are two well-established mildly context-sensitive grammar formalisms that are known to have the same expressive power on strings (i.e., generate the same class of string…

Read more »

Revisiting Few-shot Relation Classification: Evaluation Data and Classification Schemes

Revisiting Few-shot Relation Classification: Evaluation Data and Classification Schemes
Ofer Sabo1, Yanai Elazar1,2, Yoav Goldberg1,2, Ido Dagan1
1Computer Science Department, Bar Ilan University, Israel
2Allen Institute for Artificial Intelligence
{ofersabo,yanaiela,yoav.goldberg,ido.k.dagan}@gmail.com

Abstract: We explore few-shot learning (FSL) for relation classification (RC). Focusing on the realistic scenario of FSL, in which a test instance might not belong to any of the target categories (none-of-the-above, NOTA), we…
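
One simple way to realize the NOTA option (not necessarily the paper's classification scheme) is nearest-prototype matching with an abstention threshold:

```python
import numpy as np

def classify_with_nota(query, prototypes, threshold=0.5):
    """Few-shot classification with a none-of-the-above (NOTA) option:
    if the query is not similar enough to any class prototype, abstain.
    The cosine-similarity threshold is an illustrative assumption."""
    q = query / np.linalg.norm(query)
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = P @ q
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else "NOTA"

rng = np.random.default_rng(0)
protos = rng.normal(size=(5, 64))  # one prototype per target relation
print(classify_with_nota(rng.normal(size=64), protos))
```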

Read more »

Efficient Computation of Expectations under Spanning Tree Distributions

Efficient Computation of Expectations under Spanning Tree Distributions
Ran Zmigrod, University of Cambridge, United Kingdom (rz279@cam.ac.uk)
Tim Vieira, Johns Hopkins University, United States (tim.f.vieira@gmail.com)
Ryan Cotterell, ETH Zürich (ryan.cotterell@inf.ethz.ch)

Abstract: We give a general framework for inference in spanning tree models. We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree…
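
For intuition, the zeroth-order quantity behind such expectations, the partition function, has a closed form via Kirchhoff's Matrix-Tree theorem. The sketch below handles the undirected, symmetric-weight case, whereas the paper targets the directed, edge-factored distributions used in non-projective dependency parsing:

```python
import numpy as np

def log_partition(weights):
    """Log of the total weight of all spanning trees of a graph with
    symmetric edge weights, via Kirchhoff's Matrix-Tree theorem:
    Z = any cofactor of the graph Laplacian L = diag(row sums) - W."""
    W = np.asarray(weights, dtype=float)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    _, logdet = np.linalg.slogdet(L[1:, 1:])  # delete row/column 0
    return logdet

# 4 nodes, all edge weights 1: K4 has 4^(4-2) = 16 spanning trees (Cayley).
W = np.ones((4, 4))
print(np.exp(log_partition(W)))  # ~16.0
```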

Read more »

Pretraining the Noisy Channel Model for Task-Oriented Dialogue

Pretraining the Noisy Channel Model for Task-Oriented Dialogue
Qi Liu2∗, Lei Yu1, Laura Rimell1, and Phil Blunsom1,2
1DeepMind, United Kingdom
2University of Oxford, United Kingdom
qi.liu@cs.ox.ac.uk, {leiyu,laurarimell,pblunsom}@google.com

Abstract: Direct decoding for task-oriented dialogue is known to suffer from the explaining-away effect, manifested in models that prefer short and generic responses. Here we argue for the use of Bayes’ theorem to factorize the dialogue task…
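
The Bayes factorization argued for here rewrites p(response | context) ∝ p(context | response) · p(response), so candidates are reranked by a channel term that must "explain" the context plus a language-model prior. A toy sketch with made-up log-probabilities (lam is an assumed interpolation weight, not the paper's exact objective):

```python
def noisy_channel_score(log_p_context_given_resp, log_p_resp, lam=1.0):
    """Bayes factorization: p(resp | ctx) is proportional to
    p(ctx | resp) * p(resp). The channel term penalizes generic responses
    that fail to explain the context; lam weights the LM prior."""
    return log_p_context_given_resp + lam * log_p_resp

# Toy reranking of two candidate responses (log-probs are illustrative):
candidates = {
    "ok":                   noisy_channel_score(-9.0, -1.0),
    "table for two at 7pm": noisy_channel_score(-2.5, -6.0),
}
print(max(candidates, key=candidates.get))  # -> "table for two at 7pm"
```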

Read more »

Self-supervised Regularization for Text Classification

Self-supervised Regularization for Text Classification
Meng Zhou∗, Shanghai Jiao Tong University, China (zhoumeng9904@sjtu.edu.cn)
Zechen Li∗, Northeastern University, United States (li.zec@northeastern.edu)
Pengtao Xie†, UC San Diego, United States (p1xie@eng.ucsd.edu)

Abstract: Text classification is a widely studied problem and has broad applications. In many real-world problems, the number of texts for training classification models is limited, which renders these models prone to overfitting. To address this problem,…

Read more »

Evaluating Document Coherence Modeling

Evaluating Document Coherence Modeling
Aili Shen♣, Meladel Mistica♣, Bahar Salehi♣, Hang Li♦, Timothy Baldwin♣, Jianzhong Qi♣
♣The University of Melbourne, Australia
♦AI Lab at ByteDance, China
{aili.shen, misticam, tbaldwin, jianzhong.qi}@unimelb.edu.au, baharsalehi@gmail.com, lihang.lh@bytedance.com

Abstract: While pretrained language models (LMs) have driven impressive gains over morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear. As a step towards…

Read more »