Improving Candidate Generation
Improving Candidate Generation for Low-resource Cross-lingual Entity Linking Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, Graham Neubig Language Technologies Institute Carnegie Mellon University {shuyanzh,srijhwan,jwieting,jgc,gneubig}@cs.cmu.edu Abstract Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from
A Knowledge-Enhanced Pretraining Model
A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation Jian Guan1 Fei Huang1 Zhihao Zhao2 Xiaoyan Zhu1 Minlie Huang1∗ 1Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China 2School of Software, Beihang University, Beijing, China 1Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems 1Beijing National Research Center for Information Science and Technology j-guan19@mails.tsinghua.edu.cn,f-huang18@mails.tsinghua.edu.cn, extsuioku@gmail.com, zxy-dcs@tsinghua.edu.cn, aihuang@tsinghua.edu.cn Abstract Story generation, namely,
Attention-Passing Models for
Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation Matthias Sperber1, Graham Neubig2, Jan Niehues1, Alex Waibel1,2 1Karlsruhe Institute of Technology, Germany 2Carnegie Mellon University, USA {first}.{last}@kit.edu, gneubig@cs.cmu.edu Abstract Speech translation has traditionally been approached through cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, and a machine translation system trained on parallel texts. Several recent works have shown
A Graph-based Model for Joint Chinese Word Segmentation and
A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing Hang Yan, Xipeng Qiu∗, Xuanjing Huang School of Computer Science, Fudan University, China Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, China {hyan19, xpqiu, xjhuang}@fudan.edu.cn Abstract Chinese word segmentation and dependency parsing are two fundamental tasks for Chinese natural language processing. The dependency parsing is defined at the word-level. Therefore word segmentation is
SpanBERT: Improving Pre-training by Representing
SpanBERT: Improving Pre-training by Representing and Predicting Spans Mandar Joshi∗† Danqi Chen∗‡§ Yinhan Liu§ Daniel S. Weld†¶ Luke Zettlemoyer†§ Omer Levy§ † Allen School of Computer Science & Engineering, University of Washington, Seattle, WA {mandar90,weld,lsz}@cs.washington.edu ‡ Computer Science Department, Princeton University, Princeton, NJ danqic@cs.princeton.edu ¶ Allen Institute of Artificial Intelligence, Seattle {danw}@allenai.org § Facebook AI Research, Seattle {danqi,yinhanliu,lsz,omerlevy}@fb.com Abstract We present SpanBERT, a pre-training method
Membership Inference Attacks on Sequence-to-Sequence Models:
Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? Sorami Hisamoto∗ Works Applications s@89.io Kevin Duh Matt Post Johns Hopkins University {post,kevinduh}@cs.jhu.edu Abstract Data privacy is an important issue for “machine learning as a service” providers. We focus on the problem of membership inference attacks: Given a data sample and black-box access to a model’s API, determine whether
What BERT Is Not: Lessons from a New Suite of Psycholinguistic
What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models Allyson Ettinger Department of Linguistics University of Chicago aettinger@uchicago.edu Abstract Pre-training by language modeling has become a popular and successful approach to NLP tasks, but we have yet to understand exactly what linguistic capacities these pre-training processes confer upon models. In this paper we introduce a suite of diagnostics drawn
AMR-To-Text Generation with Graph Transformer
AMR-To-Text Generation with Graph Transformer Tianming Wang, Xiaojun Wan, Hanqi Jin Wangxuan Institute of Computer Technology, Peking University The MOE Key Laboratory of Computational Linguistics, Peking University {wangtm, wanxiaojun, jinhanqi}@pku.edu.cn Abstract Abstract meaning representation (AMR)-to-text generation is the challenging task of generating natural language texts from AMR graphs, where nodes represent concepts and edges denote relations. The current state-of-the-art methods use graph-to-sequence models;
Paraphrase-Sense-Tagged Sentences
Paraphrase-Sense-Tagged Sentences Anne Cocos and Chris Callison-Burch Department of Computer and Information Science University of Pennsylvania odonnell.anne@gmail.com, ccb@cis.upenn.edu Abstract Many natural language processing tasks require discriminating the particular meaning of a word in context, but building corpora for developing sense-aware models can be a challenge. We present a large resource of example usages for words having a particular meaning, called Paraphrase-Sense-Tagged Sentences
Deep Contextualized Self-training for Low Resource Dependency Parsing
Deep Contextualized Self-training for Low Resource Dependency Parsing Guy Rotman and Roi Reichart Faculty of Industrial Engineering and Management, Technion, IIT grotman@campus.technion.ac.il roiri@ie.technion.ac.il Abstract Neural dependency parsing has proven very effective, achieving state-of-the-art results on numerous domains and languages. Unfortunately, it requires large amounts of labeled data, which is costly and laborious to create. In this paper we propose a self-training algorithm that alleviates
Insertion-based Decoding with Automatically Inferred Generation Order
Insertion-based Decoding with Automatically Inferred Generation Order Jiatao Gu†, Qi Liu¶∗, and Kyunghyun Cho‡† †Facebook AI Research ¶University of Oxford ‡New York University, CIFAR Azrieli Global Scholar †{jgu, kyunghyuncho}@fb.com ‡qi.liu@st-hughs.ox.ac.uk Abstract Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through
Efficient Contextual Representation Learning
Efficient Contextual Representation Learning With Continuous Outputs Liunian Harold Li†, Patrick H. Chen∗, Cho-Jui Hsieh∗, Kai-Wei Chang∗ †Peking University ∗University of California, Los Angeles liliunian@pku.edu.cn, patrickchen@g.ucla.edu {chohsieh, kwchang}@cs.ucla.edu Abstract Contextual representation models have achieved great success in improving various downstream natural language processing tasks. However, these language-model-based encoders are difficult to train due to their large parameter size and high computational complexity. By carefully
Massively Multilingual Sentence Embeddings for Zero-Shot
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond Mikel Artetxe University of the Basque Country (UPV/EHU)∗ mikel.artetxe@ehu.eus Holger Schwenk Facebook AI Research schwenk@fb.com Abstract We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts. Our system uses a single BiLSTM encoder with a shared byte-pair encoding vocabulary
Weakly Supervised Domain Detection
Weakly Supervised Domain Detection Yumo Xu and Mirella Lapata Institute for Language, Cognition and Computation School of Informatics, University of Edinburgh 10 Crichton Street, Edinburgh EH8 9AB yumo.xu@ed.ac.uk, mlap@inf.ed.ac.uk Abstract In this paper we introduce domain detection as a new natural language processing task. We argue that the ability to detect textual segments that are domain-heavy (i.e., sentences or phrases that are representative of
Morphological Analysis Using a Sequence Decoder
Morphological Analysis Using a Sequence Decoder Ekin Akyürek∗ Erenay Dayanık∗ Deniz Yuret† Koç University Artificial Intelligence Laboratory, İstanbul, Turkey eakyurek13,edayanik16,dyuret@ku.edu.tr Abstract We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed size vector representation and the decoder generates the sequence
Decomposing Generalization
Decomposing Generalization: Models of Generic, Habitual, and Episodic Statements Venkata Govindarajan University of Rochester Benjamin Van Durme Johns Hopkins University Aaron Steven White University of Rochester Abstract We present a novel semantic framework for modeling linguistic expressions of generalization (generic, habitual, and episodic statements) as combinations of simple, real-valued referential properties of predicates and their arguments. We use this framework to construct a dataset
Graph Convolutional Network with Sequential Attention for
Graph Convolutional Network with Sequential Attention for Goal-Oriented Dialogue Systems Suman Banerjee and Mitesh M. Khapra Department of Computer Science and Engineering, Robert Bosch Centre for Data Science and Artificial Intelligence (RBC-DSAI), Indian Institute of Technology Madras, India {suman, miteshk}@cse.iitm.ac.in Abstract Domain-specific goal-oriented dialogue systems typically require modeling three types of inputs, namely, (i) the knowledge-base associated with the domain, (ii) the history
Tabula Nearly Rasa: Probing the Linguistic Knowledge of Character-level
Tabula Nearly Rasa: Probing the Linguistic Knowledge of Character-level Neural Language Models Trained on Unsegmented Text Michael Hahn∗ Stanford University mhahn2@stanford.edu Marco Baroni Facebook AI Research UPF Linguistics Department Catalan Institution for Research and Advanced Studies mbaroni@gmail.com Abstract Recurrent neural networks (RNNs) have reached striking performance in many natural language processing tasks. This has renewed interest in whether these generic sequence processing devices are inducing