Nurse is Closer to Woman than Surgeon? Mitigating Gender-Biased Proximities in Word Embeddings
Nurse is Closer to Woman than Surgeon? Mitigating Gender-Biased Proximities in Word Embeddings Vaibhav Kumar1∗ Tenzin Singhay Bhotia1∗ Vaibhav Kumar1∗ Tanmoy Chakraborty2 1Delhi Technological University, New Delhi, India 2IIIT-Delhi, India 1{kumar.vaibhav1o1, tenzinbhotia0, vaibhavk992}@gmail.com 2tanmoy@iiitd.ac.in Abstract Word embeddings are the standard model for semantic and syntactic representations of words. Unfortunately, these models have been shown to exhibit undesirable word associations resulting from gender, racial, and
Topic Modeling in Embedding Spaces
Topic Modeling in Embedding Spaces Adji B. Dieng Columbia University New York, NY, USA abd2141@columbia.edu Francisco J. R. Ruiz∗ DeepMind London, UK franrruiz@google.com David M. Blei Columbia University New York, NY, USA david.blei@columbia.edu Abstract Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end,
How Can We Know What Language Models Know?
How Can We Know What Language Models Know? Zhengbao Jiang1∗ Frank F. Xu1∗ Jun Araki2 Graham Neubig1 1Language Technologies Institute, Carnegie Mellon University 2Bosch Research North America {zhengbaj,fangzhex,gneubig}@cs.cmu.edu jun.araki@us.bosch.com Abstract Recent work has presented intriguing results examining the knowledge contained in language models (LMs) by having the LM fill in the blanks of prompts such as ‘‘Obama is a ___ by profession’’. These prompts are
Consistent Unsupervised Estimators for Anchored PCFGs
Consistent Unsupervised Estimators for Anchored PCFGs Alexander Clark Department of Philosophy King’s College London alexsclark@gmail.com Nathanaël Fijalkow CNRS, LaBRI, Bordeaux, and The Alan Turing Institute of Data Science, London nathanael.fijalkow@labri.fr Abstract Learning probabilistic context-free grammars (PCFGs) from strings has been a classic problem in computational linguistics since Horning (1969). Here we present an algorithm based on distributional learning that is a consistent estimator for a
BLiMP: The Benchmark of Linguistic Minimal Pairs for English
BLiMP: The Benchmark of Linguistic Minimal Pairs for English Alex Warstadt1, Alicia Parrish1, Haokun Liu2, Anhad Mohananey2, Wei Peng2, Sheng-Fu Wang1, Samuel R. Bowman1,2,3 1Department of Linguistics New York University 2Department of Computer Science New York University 3Center for Data Science New York University {warstadt,alicia.v.parrish,haokunliu,anhad, weipeng,shengfu.wang,bowman}@nyu.edu Abstract We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP),1 a challenge set for evaluating the linguistic knowledge of language
Hierarchical Mapping for Crosslingual Word Embedding Alignment
Hierarchical Mapping for Crosslingual Word Embedding Alignment Ion Madrazo Azpiazu and Maria Soledad Pera Department of Computer Science Boise State University {ionmadrazo,solepera}@boisestate.edu Abstract The alignment of word embedding spaces in different languages into a common crosslingual space has recently been in vogue. Strategies that do so compute pairwise alignments and then map multiple languages to a single pivot language (most often
Better Document-Level Machine Translation with Bayes’ Rule
Better Document-Level Machine Translation with Bayes’ Rule Lei Yu1, Laurent Sartran1, Wojciech Stokowiec1, Wang Ling1, Lingpeng Kong1, Phil Blunsom1,2, Chris Dyer1 1DeepMind, 2University of Oxford {leiyu, lsartran, wstokowiec, lingwang, lingpenk, pblunsom, cdyer}@google.com Abstract We show that Bayes’ rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents—a compelling benefit because parallel documents are
Syntax-Guided Controlled Generation of Paraphrases
Syntax-Guided Controlled Generation of Paraphrases Ashutosh Kumar1 Kabir Ahuja2∗ Raghuram Vadapalli3∗ Partha Talukdar1 1Indian Institute of Science, Bangalore 2Microsoft Research, Bangalore 3Google, London ashutosh@iisc.ac.in, kabirahuja2431@gmail.com raghuram.4350@gmail.com, ppt@iisc.ac.in Abstract Given a sentence (e.g., ‘‘I like mangoes’’) and a constraint (e.g., sentiment flip), the goal of controlled text generation is to produce a sentence that adapts the input sentence to meet the requirements of the constraint (e.g.,
TYDI QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
TYDI QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages Jonathan H. Clark Eunsol Choi Michael Collins Dan Garrette Tom Kwiatkowski Vitaly Nikolaev Jennimaria Palomaki Google Research tydiqa@google.com Abstract Confidently making progress on multilingual modeling requires challenging, trustworthy evaluations. We present TYDI QA—a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TYDI QA are diverse with
Learning Lexical Subspaces in a Distributional Vector Space
Learning Lexical Subspaces in a Distributional Vector Space Kushal Arora∗ Aishik Chakraborty∗ Jackie C. K. Cheung School of Computer Science, McGill University Québec AI Institute (Mila) {kushal.arora,aishik.chakraborty}@mail.mcgill.ca, jcheung@cs.mcgill.ca Abstract In this paper, we propose LEXSUB, a novel approach towards unifying lexical and distributional semantics. We inject knowledge about lexical-semantic relations into distributional word embeddings by defining subspaces of the distributional vector space in
How Furiously Can Colorless Green Ideas Sleep?
How Furiously Can Colorless Green Ideas Sleep? Sentence Acceptability in Context Jey Han Lau1,7 Carlos Armendariz2 Shalom Lappin2,3,4 Matthew Purver2,5 Chang Shu6,7 1The University of Melbourne 2Queen Mary University of London 3University of Gothenburg 4King’s College London 5Jožef Stefan Institute 6University of Nottingham Ningbo China 7DeepBrain jeyhan.lau@gmail.com, c.santosarmendariz@qmul.ac.uk shalom.lappin@gu.se, m.purver@qmul.ac.uk, scxcs1@nottingham.edu.cn Abstract We study the influence of context on sentence acceptability. First, we compare the
CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset Qi Zhu1, Kaili Huang2, Zheng Zhang1, Xiaoyan Zhu1, Minlie Huang1∗ 1Dept. of Computer Science and Technology, 1Institute for Artificial Intelligence, 1Beijing National Research Center for Information Science and Technology, 2Dept. of Industrial Engineering, Tsinghua University, Beijing, China {zhu-q18,hkl16,z-zhang15}@mails.tsinghua.edu.cn {zxy-dcs,aihuang}@tsinghua.edu.cn Abstract To advance multi-domain (cross-domain) dialogue modeling as well as alleviate the shortage of Chinese task-oriented
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks Sascha Rothe Google Research rothe@google.com Shashi Narayan Google Research shashinarayan@google.com Aliaksei Severyn Google Research severyn@google.com Abstract Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple benchmarks while saving significant amounts of compute time. So far the focus has
Unsupervised Discourse Constituency Parsing Using Viterbi EM
Unsupervised Discourse Constituency Parsing Using Viterbi EM Noriki Nishida and Hideki Nakayama Graduate School of Information Science and Technology The University of Tokyo {nishida,nakayama}@nlab.ci.i.u-tokyo.ac.jp Abstract In this paper, we introduce an unsupervised discourse constituency parsing algorithm. We use Viterbi EM with a margin-based criterion to train a span-based discourse parser in an unsupervised manner. We also propose initialization methods for Viterbi training of discourse
Machine Learning–Driven Language Assessment
Machine Learning–Driven Language Assessment Burr Settles and Geoffrey T. LaFlair Duolingo Pittsburgh, PA USA {burr,geoff}@duolingo.com Masato Hagiwara∗ Octanove Labs Seattle, WA USA masato@octanove.com Abstract We describe a method for rapidly creating language proficiency assessments, and provide experimental evidence that such tests can be valid, reliable, and secure. Our approach is the first to use machine learning and natural language processing to induce proficiency
Target-Guided Structured Attention Network for Target-Dependent Sentiment Analysis
Target-Guided Structured Attention Network for Target-Dependent Sentiment Analysis Ji Zhang Chengyao Chen Pengfei Liu Chao He Cane Wing-Ki Leung Wisers AI Lab, Wisers Information Limited, HKSAR, China {jasonzhang, stacychen, chaohe, caneleung}@wisers.com, ppfliu@gmail.com Abstract Target-dependent sentiment analysis (TDSA) aims to classify the sentiment of a text towards a given target. The major challenge of this task lies in modeling the semantic relatedness between a target and
Decoding Brain Activity Associated with Literal and Metaphoric Sentence Comprehension Using Distributional Semantic Models
Decoding Brain Activity Associated with Literal and Metaphoric Sentence Comprehension Using Distributional Semantic Models Vesna G. Djokic† Jean Maillard‡ Luana Bulat‡ Ekaterina Shutova† †ILLC, University of Amsterdam, The Netherlands ‡Dept. of Computer Science & Technology, University of Cambridge, United Kingdom vesna@imsquared.eu, jean@maillard.it, ltf24@cam.ac.uk, e.shutova@uva.nl Abstract Recent years have seen a growing interest within the natural language processing (NLP) community in evaluating the ability of semantic
Does Syntax Need to Grow on Trees?
Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks R. Thomas McCoy Department of Cognitive Science Johns Hopkins University tom.mccoy@jhu.edu Robert Frank Department of Linguistics Yale University robert.frank@yale.edu Tal Linzen Department of Cognitive Science Johns Hopkins University tal.linzen@jhu.edu Abstract Learners that are exposed to the same training data might generalize differently due to differing inductive biases. In neural