Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times? Byung-Doh Oh Department of Linguistics The Ohio State University, USA oh.531@osu.edu William Schuler Department of Linguistics The Ohio State University, USA schuler.77@osu.edu Abstract This work presents a linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are
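Surprisal, the quantity this line of work compares against human reading times, is simply a word's negative log-probability under the language model. A minimal sketch, using made-up per-word probabilities rather than the output of any real model:

```python
import math

def surprisal(prob):
    """Surprisal of a word: its negative log2-probability under the model."""
    return -math.log2(prob)

# Hypothetical probabilities a language model might assign to each word
# of "the cat sat" given its left context; the numbers are illustrative.
word_probs = {"the": 0.2, "cat": 0.01, "sat": 0.05}
surprisals = {w: surprisal(p) for w, p in word_probs.items()}
```

Less probable words carry higher surprisal, which is the quantity regressed against per-word reading times in studies like this one.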
Discontinuous Combinatory Constituency Parsing Zhousi Chen and Mamoru Komachi Faculty of Systems Design Tokyo Metropolitan University 6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan {chen-zhousi@ed., komachi@}tmu.ac.jp Abstract We extend a pair of continuous combinator-based constituency parsers (one binary and one multi-branching) into a discontinuous pair. Our parsers iteratively compose constituent vectors from word embeddings without any grammar constraints. Their empirical complexities are subquadratic. Our
Coreference Resolution through a seq2seq Transition-Based System Bernd Bohnet1, Chris Alberti2, Michael Collins2 1Google Research, The Netherlands 2Google Research, USA {bohnetbd,chrisalberti,mjcollins}@google.com Abstract Most recent coreference resolution systems use search algorithms over possible spans to identify mentions and resolve coreference. We instead present a coreference resolution system that uses a text-to-text (seq2seq) paradigm to predict mentions and links jointly. We implement the coreference system
FeelingBlue: A Corpus for Understanding the Emotional Connotation of Color in Context Amith Ananthram1 and Olivia Winn1 and Smaranda Muresan1,2 1Department of Computer Science, Columbia University, USA 2Data Science Institute, Columbia University, USA {amith,olivia,smara}@cs.columbia.edu Abstract While the link between color and emotion has been widely studied, how context-based changes in color impact the intensity of perceived emotions is not well understood. In this work,
Domain-Specific Word Embeddings with Structure Prediction David Lassner1,2∗ Stephanie Brandl1,2,3∗ Anne Baillot4 Shinichi Nakajima1,2,5 1TU Berlin, Germany 2BIFOLD, Germany 3University of Copenhagen, Denmark 4Le Mans Université, France 5RIKEN Center for AIP, Japan {lassner@tu-berlin.de,brandl@di.ku.dk} ∗Authors contributed equally. Abstract Complementary to finding good general word embeddings, an important question for representation learning is to find dynamic word embeddings, for example, across time or domain. Current
Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization Thomas Effland Columbia University, USA teffland@cs.columbia.edu Michael Collins Google Research, USA mjcollins@google.com Abstract We present Expected Statistic Regularization (ESR), a novel regularization technique that utilizes low-order multi-task structural statistics to shape model distributions for semi-supervised learning on low-resource datasets. We study ESR in the context of cross-lingual transfer for syntactic analysis (POS tagging and
Locally Typical Sampling Clara Meister1 Tiago Pimentel2 Gian Wiher1 Ryan Cotterell1,2 1ETH Zürich, Switzerland 2University of Cambridge, UK clara.meister@inf.ethz.ch tp472@cam.ac.uk gian.wiher@inf.ethz.ch ryan.cotterell@inf.ethz.ch Abstract Today’s probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics (e.g., perplexity). This discrepancy has puzzled the language generation community for the last few years.
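The core idea behind locally typical sampling is to restrict sampling to tokens whose surprisal is close to the conditional distribution's entropy, rather than simply to the highest-probability tokens. A minimal sketch of that filtering step, over a toy next-token distribution (the probabilities and the `tau` mass threshold are illustrative):

```python
import math

def typical_filter(probs, tau=0.9):
    """Keep the tokens whose surprisal deviates least from the
    distribution's entropy, until their cumulative probability mass
    reaches tau; return the renormalized filtered distribution.
    `probs` maps token -> probability."""
    entropy = -sum(p * math.log(p) for p in probs.values())
    # Rank tokens by how far their surprisal is from the entropy.
    ranked = sorted(probs, key=lambda t: abs(-math.log(probs[t]) - entropy))
    kept, mass = [], 0.0
    for tok in ranked:
        kept.append(tok)
        mass += probs[tok]
        if mass >= tau:
            break
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}
```

Note that, unlike top-p sampling, this can exclude the single most probable token when its surprisal is far below the entropy.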
Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation Llion Jones† Richard Sproat† Haruko Ishikawa† Alexander Gutkin‡ †Google Japan ‡Google UK {llion,rws,ishikawa,agutkin}@google.com Abstract If one sees the place name Houston Mercer Dog Run in New York, how does one know how to pronounce it? Assuming one knows that Houston in New York is pronounced differently from Houston, Texas, then one can probably
OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue Zhi Chen1, Yuncong Liu1, Lu Chen1∗, Su Zhu2, Mengyue Wu1, Kai Yu1∗ 1X-LANCE Lab, Department of Computer Science and Engineering MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University State Key Lab of Media Convergence Production Technology and Systems, Beijing, China 2AISpeech Co., Ltd., Suzhou, China {zhenchi713, chenlusz, kai.yu}@sjtu.edu.cn Abstract This paper presents
Meta-Learning a Cross-lingual Manifold for Semantic Parsing Tom Sherborne and Mirella Lapata Institute for Language, Cognition and Computation School of Informatics, University of Edinburgh 10 Crichton Street, Edinburgh EH8 9AB, UK tom.sherborne@ed.ac.uk, mlap@inf.ed.ac.uk Abstract Localizing a semantic parser to support new languages requires effective cross-lingual generalization. Recent work has found success with machine-translation or zero-shot methods, although these approaches can struggle to model
On the Role of Negative Precedent in Legal Outcome Prediction Josef Valvoda Ryan Cotterell Simone Teufel University of Cambridge, UK ETH Zürich, Switzerland {jv406,sht25}@cam.ac.uk ryan.cotterell@inf.ethz.ch Abstract Every legal case sets a precedent by developing the law in one of the following two ways. It either expands its scope, in which case it sets positive precedent, or it narrows it, in which case it sets
FAITHDIAL: A Faithful Benchmark for Information-Seeking Dialogue Nouha Dziri† ♦ § Ehsan Kamalloo† Mo Yu¶∗ Edoardo M. Ponti♣ Sivan Milton‡ Osmar Zaiane† § Siva Reddy♦ ‡ ‡McGill University, Canada †University of Alberta, Canada ♦Mila – Quebec AI Institute, Canada ¶WeChat AI, Tencent, USA ♣University of Edinburgh, UK §Alberta Machine Intelligence Institute (Amii), Canada dziri@cs.ualberta.ca
Morphology Without Borders: Clause-Level Morphology Omer Goldman Bar Ilan University, Israel omer.goldman@gmail.com Reut Tsarfaty Bar Ilan University, Israel reut.tsarfaty@biu.ac.il Abstract Morphological tasks use large multi-lingual datasets that organize words into inflection tables, which then serve as training and evaluation data for various tasks. However, a closer inspection of these data reveals profound cross-linguistic inconsistencies, which arise from the lack of a clear linguistic
The Emergence of Argument Structure in Artificial Languages Tom Bosc Mila Université de Montréal, Canada bosct@mila.quebec Pascal Vincent Meta AI, Mila Université de Montréal, Canada CIFAR AI Chair vincentp@iro.umontreal.ca Abstract Computational approaches to the study of language emergence can help us understand how natural languages are shaped by cognitive and sociocultural factors. Previous work focused on tasks where agents refer to a single entity.
An End-to-End Contrastive Self-Supervised Learning Framework for Language Understanding Hongchao Fang, Pengtao Xie∗ University of California San Diego, USA p1xie@eng.ucsd.edu Abstract Self-supervised learning (SSL) methods such as Word2vec, BERT, and GPT have shown great effectiveness in language understanding. Contrastive learning, as a recent SSL approach, has attracted increasing attention in NLP. Contrastive learning learns data representations by predicting whether two augmented data instances
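The contrastive objective described here scores an anchor against one positive (an augmentation of the same instance) and several negatives (other instances), and trains the model to rank the positive highest. A minimal InfoNCE-style sketch over toy embedding vectors (the vectors and the `temperature` value are illustrative, not from the paper):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as number lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: the negative log-softmax
    probability of the positive among positive + negatives."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # stabilize the log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```

The loss approaches zero when the anchor is much closer to its positive than to any negative, and grows as negatives become indistinguishable from the positive.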
A Survey on Cross-Lingual Summarization Jiaan Wang1∗, Fandong Meng2†, Duo Zheng4, Yunlong Liang2 Zhixu Li3†, Jianfeng Qu1 and Jie Zhou2 1School of Computer Science and Technology, Soochow University, Suzhou, China 2Pattern Recognition Center, WeChat AI, Tencent Inc, China 3Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China 4Beijing University of Posts and Telecommunications, Beijing, China jawang1@stu.suda.edu.cn, {fandongmeng,yunlonliang,withtomzhou}@tencent.com zd@bupt.edu.cn, zhixuli@fudan.edu.cn, jfqu@suda.edu.cn
Neuron-level Interpretation of Deep NLP Models: A Survey Hassan Sajjad♣∗ Nadir Durrani♠∗ Fahim Dalvi♠∗ ♣Faculty of Computer Science, Dalhousie University, Canada† ♠Qatar Computing Research Institute, HBKU, Doha, Qatar hsajjad@dal.ca, {ndurrani, faimaduddin}@hbku.edu.qa Abstract The proliferation of Deep Neural Networks in various domains has seen an increased need for interpretability of these models. Preliminary work done along this line, and papers that surveyed such, are focused
Template-based Abstractive Microblog Opinion Summarization Iman Munire Bilal1,4, Bo Wang2,4, Adam Tsakalidis3,4, Dong Nguyen5, Rob Procter1,4, Maria Liakata1,3,4 1Department of Computer Science, University of Warwick, UK 2Center for Precision Psychiatry, Massachusetts General Hospital, USA 3School of Electronic Engineering and Computer Science, Queen Mary University of London, UK 4The Alan Turing Institute, London, UK 5Department of Information and Computing Sciences, Utrecht University, The Netherlands {iman.bilal|rob.procter}@warwick.ac.uk bwang29@mgh.harvard.edu