Natural Language Processing

Posted 2020-10-09 09:17:11

The following article is reposted from 英语语言学 (English Linguistics).

Today's arXiv cs.CL listing contains 108 papers in total.

Knowledge Graphs (1 paper)


[1]: Joint Semantics and Data-Driven Path Representation for Knowledge Graph Inference
Authors: Guanglin Niu, Bo Li, Yongfei Zhang, Yongpan Sheng, Chuan Shi, Jingyang Li, Shiliang Pu
Comments: 12 pages, 6 tables, 4 figures
Link: https://arxiv.org/abs/2010.02602

摘要:Inference on a large-scale knowledge graph (KG) is of great importance for KG applications like question answering. The path-based reasoning models can leverage much information over paths other than pure triples in the KG, which face several challenges: all the existing path-based methods are data-driven, lacking explainability for path representation. Besides, some methods either consider only relational paths or ignore the heterogeneity between entities and relations both contained in paths, which cannot capture the rich semantics of paths well. To address the above challenges, in this work, we propose a novel joint semantics and data-driven path representation that balances explainability and generalization in the framework of KG embedding. More specifically, we inject horn rules to obtain the condensed paths by the transparent and explainable path composition procedure. The entity converter is designed to transform the entities along paths into the representations in the semantic level similar to relations for reducing the heterogeneity between entities and relations, in which the KGs both with and without type information are considered. Our proposed model is evaluated on two classes of tasks: link prediction and path query answering task. The experimental results show that it has a significant performance gain over several different state-of-the-art baselines.
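
To make the idea of rule-based path condensation concrete, here is a minimal sketch (not taken from the paper) that rewrites adjacent relations in a path whenever a composition rule applies; the rules, relation names, and the greedy rewriting order are illustrative assumptions:

# Minimal sketch: condense a relation path using composition rules of the form
# (r1, r2) -> r_composed, repeatedly rewriting adjacent relation pairs.
# The example rules and relation names are made up for illustration.
def condense_path(path, rules):
    """Greedily replace adjacent relation pairs that match a composition rule."""
    changed = True
    while changed:
        changed = False
        for i in range(len(path) - 1):
            key = (path[i], path[i + 1])
            if key in rules:
                path = path[:i] + [rules[key]] + path[i + 2:]
                changed = True
                break
    return path

rules = {("born_in", "city_of"): "nationality",              # hypothetical Horn rule
         ("nationality", "official_language"): "speaks"}     # hypothetical Horn rule
path = ["born_in", "city_of", "official_language"]
print(condense_path(path, rules))   # -> ['speaks']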


Text-to-Speech (1 paper)


[1]: The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS
Authors: Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda
Comments: Accepted to the ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020
Link: https://arxiv.org/abs/2010.02434

摘要:This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice conversion challenge (VCC) 2020. We consider a naive approach for voice conversion (VC), which is to first transcribe the input speech with an automatic speech recognition (ASR) model, followed by using the transcriptions to generate the voice of the target with a text-to-speech (TTS) model. We revisit this method under a sequence-to-sequence (seq2seq) framework by utilizing ESPnet, an open-source end-to-end speech processing toolkit, and the many well-configured pretrained models provided by the community. Official evaluation results show that our system comes out top among the participating systems in terms of conversion similarity, demonstrating the promising ability of seq2seq models to convert speaker identity. The implementation is made open-source at: this https URL.
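
The cascade itself is easy to picture; the sketch below shows the two-stage flow with stand-in model classes rather than the actual ESPnet recipes, whose interfaces are not described in the abstract:

# A naive cascade voice-conversion sketch: ASR transcribes the source utterance,
# then a target-speaker TTS model re-synthesizes the text. Both classes here are
# placeholders, not real ESPnet APIs.
class DummyASR:
    def transcribe(self, waveform):
        return "hello world"            # pretend recognition result

class DummyTargetTTS:
    def synthesize(self, text):
        return [0.0] * 16000            # pretend 1 second of target-speaker audio

def convert(source_waveform, asr, tts):
    text = asr.transcribe(source_waveform)   # source speech -> text
    return tts.synthesize(text)              # text -> target speaker's voice

converted = convert([0.0] * 16000, DummyASR(), DummyTargetTTS())
print(len(converted))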


Speech Recognition (1 paper)


[1]: Fine-Grained Grounding for Multimodal Speech Recognition
Authors: Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott
Comments: Accepted to Findings of EMNLP 2020
Link: https://arxiv.org/abs/2010.02384

摘要:Multimodal automatic speech recognition systems integrate information from images to improve speech recognition quality, by grounding the speech in the visual context. While visual signals have been shown to be useful for recovering entities that have been masked in the audio, these models should be capable of recovering a broader range of word types. Existing systems rely on global visual features that represent the entire image, but localizing the relevant regions of the image will make it possible to recover a larger set of words, such as adjectives and verbs. In this paper, we propose a model that uses finer-grained visual information from different parts of the image, using automatic object proposals. In experiments on the Flickr8K Audio Captions Corpus, we find that our model improves over approaches that use global visual features, that the proposals enable the model to recover entities and other related words, such as adjectives, and that improvements are due to the model's ability to localize the correct proposals.


Part-of-Speech Tagging (1 paper)


[1]: Position-Aware Tagging for Aspect Sentiment Triplet Extraction
Authors: Lu Xu, Hao Li, Wei Lu, Lidong Bing
Comments: 15 pages, 10 figures, accepted by EMNLP 2020
Link: https://arxiv.org/abs/2010.02609

摘要:Aspect Sentiment Triplet Extraction (ASTE) is the task of extracting the triplets of target entities, their associated sentiment, and opinion spans explaining the reason for the sentiment. Existing research efforts mostly solve this problem using pipeline approaches, which break the triplet extraction process into several stages. Our observation is that the three elements within a triplet are highly related to each other, and this motivates us to build a joint model to extract such triplets using a sequence tagging approach. However, how to effectively design a tagging approach to extract the triplets that can capture the rich interactions among the elements is a challenging research question. In this work, we propose the first end-to-end model with a novel position-aware tagging scheme that is capable of jointly extracting the triplets. Our experimental results on several existing datasets show that jointly capturing elements in the triplet using our approach leads to improved performance over the existing approaches. We also conducted extensive experiments to investigate the model effectiveness and robustness.


Reasoning and Inference (4 papers)


[1]: PRover: Proof Generation for Interpretable Reasoning over Rules
Authors: Swarnadeep Saha, Sayan Ghosh, Shashank Srivastava, Mohit Bansal
Comments: EMNLP 2020 (15 pages)
Link: https://arxiv.org/abs/2010.02830

摘要:Recent work by Clark et al. (2020) shows that transformers can act as 'soft theorem provers' by answering questions over explicitly provided knowledge in natural language. In our work, we take a step closer to emulating formal theorem provers, by proposing PROVER, an interpretable transformer-based model that jointly answers binary questions over rule-bases and generates the corresponding proofs. Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm. During inference, a valid proof, satisfying a set of global constraints is generated. We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation, with strong generalization performance. First, PROVER generates proofs with an accuracy of 87%, while retaining or improving performance on the QA task, compared to RuleTakers (up to 6% improvement on zero-shot evaluation). Second, when trained on questions requiring lower depths of reasoning, it generalizes significantly better to higher depths (up to 15% improvement). Third, PROVER obtains near perfect QA accuracy of 98% using only 40% of the training data. However, generating proofs for questions requiring higher depths of reasoning becomes challenging, and the accuracy drops to 65% for 'depth 5', indicating significant scope for future work. Our code and models are publicly available at this https URL



[2]: An Exploration of Arbitrary-Order Sequence Labeling via Energy-Based Inference Networks
Authors: Lifu Tu, Tianyu Liu, Kevin Gimpel
Comments: EMNLP 2020. The first two authors contributed equally
Link: https://arxiv.org/abs/2010.02789

摘要:Many tasks in natural language processing involve predicting structured outputs, e.g., sequence labeling, semantic role labeling, parsing, and machine translation. Researchers are increasingly applying deep representation learning to these problems, but the structured component of these approaches is usually quite simplistic. In this work, we propose several high-order energy terms to capture complex dependencies among labels in sequence labeling, including several that consider the entire label sequence. We use neural parameterizations for these energy terms, drawing from convolutional, recurrent, and self-attention networks. We use the framework of learning energy-based inference networks (Tu and Gimpel, 2018) for dealing with the difficulties of training and inference with such models. We empirically demonstrate that this approach achieves substantial improvement using a variety of high-order energy terms on four sequence labeling tasks, while having the same decoding speed as simple, local classifiers. We also find high-order energies to help in noisy data conditions.



[3]: Efficient Inference For Neural Machine Translation
Authors: Yi-Te Hsu, Sarthak Garg, Yi-Hsiu Liao, Ilya Chatsviorkin
Link: https://arxiv.org/abs/2010.02416

摘要:Large Transformer models have achieved state-of-the-art results in neural machine translation and have become standard in the field. In this work, we look for the optimal combination of known techniques to optimize inference speed without sacrificing translation quality. We conduct an empirical study that stacks various approaches and demonstrates that combination of replacing decoder self-attention with simplified recurrent units, adopting a deep encoder and a shallow decoder architecture and multi-head attention pruning can achieve up to 109% and 84% speedup on CPU and GPU respectively and reduce the number of parameters by 25% while maintaining the same translation quality in terms of BLEU.



[4]: Inference Strategies for Machine Translation with Conditional Masking
Authors: Julia Kreutzer, George Foster, Colin Cherry
Comments: EMNLP 2020
Link: https://arxiv.org/abs/2010.02352

摘要:Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance, and investigate a number of simple heuristics motivated by this perspective. We identify a thresholding strategy that has advantages over the standard "mask-predict" algorithm, and provide analyses of its behavior on machine translation tasks.
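
As a rough illustration of the two inference schedules being compared, here is a toy decoding loop in which score_positions stands in for a real CMLM; the schedules and the threshold value are schematic assumptions, not the paper's exact algorithm:

# Toy sketch of CMLM inference: at each iteration, re-mask the positions the model
# is least confident about (mask-predict) or all positions below a probability
# threshold (thresholding). score_positions is a stand-in for a real CMLM.
import numpy as np

rng = np.random.default_rng(0)

def score_positions(tokens):
    # Pretend per-position confidence; a real CMLM would return p(y_i | x, y_obs).
    return rng.uniform(0.3, 1.0, size=len(tokens))

def decode(length, iterations=4, threshold=None):
    tokens = ["<mask>"] * length
    for t in range(iterations):
        probs = score_positions(tokens)
        tokens = [f"tok{i}" for i in range(length)]          # pretend predictions
        if threshold is not None:
            remask = np.where(probs < threshold)[0]          # threshold schedule
        else:
            k = int(length * (iterations - t - 1) / iterations)
            remask = np.argsort(probs)[:k]                   # mask-predict schedule
        for i in remask:
            tokens[i] = "<mask>"
    return tokens

print(decode(8))                     # standard mask-predict schedule
print(decode(8, threshold=0.7))      # threshold-based re-masking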


Natural Language Generation (11 papers)


[1]: Keep CALM and Explore: Language Models for Action Generation in Text-based Games
Authors: Shunyu Yao, Rohan Rao, Matthew Hausknecht, Karthik Narasimhan
Comments: EMNLP 2020
Link: https://arxiv.org/abs/2010.02903

摘要:Text-based games present a unique challenge for autonomous agents to operate in natural language and handle enormous action spaces. In this paper, we propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state. Our key insight is to train language models on human gameplay, where people demonstrate linguistic priors and a general game sense for promising actions conditioned on game history. We combine CALM with a reinforcement learning agent which re-ranks the generated action candidates to maximize in-game rewards. We evaluate our approach using the Jericho benchmark, on games unseen by CALM during training. Our method obtains a 69% relative improvement in average game score over the previous state-of-the-art model. Surprisingly, on half of these games, CALM is competitive with or better than other models that have access to ground truth admissible actions. Code and data are available at this https URL.



[2]: COD3S: Diverse Generation with Discrete Semantic Signatures
Authors: Nathaniel Weir, João Sedoc, Benjamin Van Durme
Comments: EMNLP2020 preprint
Link: https://arxiv.org/abs/2010.02882

摘要:We present COD3S, a novel method for generating semantically diverse sentences using neural sequence-to-sequence (seq2seq) models. Conditioned on an input, seq2seq models typically produce semantically and syntactically homogeneous sets of sentences and thus perform poorly on one-to-many sequence generation tasks. Our two-stage approach improves output diversity by conditioning generation on locality-sensitive hash (LSH)-based semantic sentence codes whose Hamming distances highly correlate with human judgments of semantic textual similarity. Though it is generally applicable, we apply COD3S to causal generation, the task of predicting a proposition's plausible causes or effects. We demonstrate through automatic and human evaluation that responses produced using our method exhibit improved diversity without degrading task performance.
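
The LSH ingredient can be sketched in a few lines: random hyperplanes map a sentence embedding to a bit signature, and the Hamming distance between signatures tracks the distance between embeddings. The dimensionality, number of bits, and toy embeddings below are arbitrary choices for illustration:

# Sketch of locality-sensitive hashing for sentence embeddings: the sign of random
# projections gives a bit signature; Hamming distance between signatures correlates
# with the distance between the original vectors.
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 256, 16
hyperplanes = rng.normal(size=(n_bits, dim))

def signature(embedding):
    return (hyperplanes @ embedding > 0).astype(int)

def hamming(a, b):
    return int(np.sum(a != b))

e1 = rng.normal(size=dim)
e2 = e1 + 0.1 * rng.normal(size=dim)   # a near-paraphrase embedding
e3 = rng.normal(size=dim)              # an unrelated embedding
print(hamming(signature(e1), signature(e2)))  # small
print(hamming(signature(e1), signature(e3)))  # roughly n_bits / 2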



[3]: Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation
Authors: Minki Kang, Moonsu Han, Sung Ju Hwang
Comments: 19 pages, 9 figures, EMNLP 2020
Link: https://arxiv.org/abs/2010.02705

摘要:We propose a method to automatically generate a domain- and task-adaptive maskings of the given text for self-supervised pre-training, such that we can effectively adapt the language model to a particular target task (e.g. question answering). Specifically, we present a novel reinforcement learning-based framework which learns the masking policy, such that using the generated masks for further pre-training of the target language model helps improve task performance on unseen texts. We use off-policy actor-critic with entropy regularization and experience replay for reinforcement learning, and propose a Transformer-based policy network that can consider the relative importance of words in a given text. We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets using BERT and DistilBERT as the language models, on which it outperforms rule-based masking strategies, by automatically learning optimal adaptive maskings.



[4]: Incorporating Behavioral Hypotheses for Query Generation
Authors: Ruey-Cheng Chen, Chia-Jung Lee
Comments: EMNLP 2020 short paper, 6 pages
Link: https://arxiv.org/abs/2010.02667

摘要:Generative neural networks have been shown effective on query suggestion. Commonly posed as a conditional generation problem, the task aims to leverage earlier inputs from users in a search session to predict queries that they will likely issue at a later time. User inputs come in various forms such as querying and clicking, each of which can imply different semantic signals channeled through the corresponding behavioral patterns. This paper induces these behavioral biases as hypotheses for query generation, where a generic encoder-decoder Transformer framework is presented to aggregate arbitrary hypotheses of choice. Our experimental results show that the proposed approach leads to significant improvements on top-$k$ word error rate and Bert F1 Score compared to a recent BART model.



[5]: StyleDGPT: Stylized Response Generation with Pre-trained Language Models
Authors: Ze Yang, Wei Wu, Can Xu, Xinnian Liang, Jiaqi Bai, Liran Wang, Wei Wang, Zhoujun Li
Comments: Findings of EMNLP2020
Link: https://arxiv.org/abs/2010.02569

摘要:Generating responses following a desired style has great potentials to extend applications of open-domain dialogue systems, yet is refrained by lacking of parallel data for training. In this work, we explore the challenging task with pre-trained language models that have brought breakthrough to various natural language tasks. To this end, we introduce a KL loss and a style classifier to the fine-tuning step in order to steer response generation towards the target style in both a word-level and a sentence-level. Comprehensive empirical studies with two public datasets indicate that our model can significantly outperform state-of-the-art methods in terms of both style consistency and contextual coherence.



[6]: Investigating African-American Vernacular English in Transformer-Based Text Generation
Authors: Sophie Groenwold, Lily Ou, Aesha Parekh, Samhita Honnavalli, Sharon Levy, Diba Mirza, William Yang Wang
Comments: 7 pages, EMNLP 2020
Link: https://arxiv.org/abs/2010.02510

摘要:The growth of social media has encouraged the written use of African American Vernacular English (AAVE), which has traditionally been used only in oral contexts. However, NLP models have historically been developed using dominant English varieties, such as Standard American English (SAE), due to text corpora availability. We investigate the performance of GPT-2 on AAVE text by creating a dataset of intent-equivalent parallel AAVE/SAE tweet pairs, thereby isolating syntactic structure and AAVE- or SAE-specific language for each pair. We evaluate each sample and its GPT-2 generated text with pretrained sentiment classifiers and find that while AAVE text results in more classifications of negative sentiment than SAE, the use of GPT-2 generally increases occurrences of positive sentiment for both. Additionally, we conduct human evaluation of AAVE and SAE text generated with GPT-2 to compare contextual rigor and overall quality.



[7]: GRUEN for Evaluating Linguistic Quality of Generated Text
Authors: Wanzheng Zhu, Suma Bhat
Link: https://arxiv.org/abs/2010.02498

摘要:Automatic evaluation metrics are indispensable for evaluating generated text. To date, these metrics have focused almost exclusively on the content selection aspect of the system output, ignoring the linguistic quality aspect altogether. We bridge this gap by proposing GRUEN for evaluating Grammaticality, non-Redundancy, focUs, structure and coherENce of generated text. GRUEN utilizes a BERT-based model and a class of syntactic, semantic, and contextual features to examine the system output. Unlike most existing evaluation metrics which require human references as an input, GRUEN is reference-less and requires only the system output. Besides, it has the advantage of being unsupervised, deterministic, and adaptable to various tasks. Experiments on seven datasets over four language generation tasks show that the proposed metric correlates highly with human judgments.



[8]: CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
Authors: Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Li, Jilin Chen, Alex Beutel, Ed Chi
Comments: 6 pages, accepted to EMNLP 2020
Link: https://arxiv.org/abs/2010.02338

摘要:NLP models are shown to suffer from robustness issues, i.e., a model's prediction can be easily changed under small perturbations to the input. In this work, we present a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, generates adversarial texts through controllable attributes that are known to be invariant to task labels. For example, in order to attack a model for sentiment classification over product reviews, we can use the product categories as the controllable attribute which would not change the sentiment of the reviews. Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts, compared to many existing adversarial text generation approaches. We further use our generated adversarial examples to improve models through adversarial training, and we demonstrate that our generated attacks are more robust against model re-training and different model architectures.



[9]: KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation
Authors: Wenhu Chen, Yu Su, Xifeng Yan, William Yang Wang
Comments: Accepted to Main Conference of EMNLP 2020 as Long Paper
Link: https://arxiv.org/abs/2010.02307

摘要:Data-to-text generation has recently attracted substantial interests due to its wide applications. Existing methods have shown impressive performance on an array of tasks. However, they rely on a significant amount of labeled data for each task, which is costly to acquire and thus limits their application to new tasks and domains. In this paper, we propose to leverage pre-training and transfer learning to address this issue. We propose a knowledge-grounded pre-training (KGPT), which consists of two parts, 1) a general knowledge-grounded generation model to generate knowledge-enriched text. 2) a pre-training paradigm on a massive knowledge-grounded text corpus crawled from the web. The pre-trained model can be fine-tuned on various data-to-text generation tasks to generate task-specific text. We adopt three settings, namely fully-supervised, zero-shot, few-shot to evaluate its effectiveness. Under the fully-supervised setting, our model can achieve remarkable gains over the known baselines. Under zero-shot setting, our model without seeing any examples achieves over 30 ROUGE-L on WebNLG while all other baselines fail. Under the few-shot setting, our model only needs about one-fifteenth as many labeled examples to achieve the same level of performance as baseline models. These experiments consistently prove the strong generalization ability of our proposed framework: this https URL.



[10]: PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation
Authors: Xinyu Hua, Lu Wang
Comments: Accepted at EMNLP 2020 as a long paper
Link: https://arxiv.org/abs/2010.02301

摘要:Pre-trained Transformers have enabled impressive breakthroughs in generating long and fluent text, yet their outputs are often "rambling" without coherently arranged content. In this work, we present a novel content-controlled text generation framework, PAIR, with planning and iterative refinement, which is built upon a large model, BART. We first adapt the BERT model to automatically construct the content plans, consisting of keyphrase assignments and their corresponding sentence-level positions. The BART model is employed for generation without modifying its structure. We then propose a refinement algorithm to gradually enhance the generation quality within the sequence-to-sequence framework. Evaluation with automatic metrics shows that adding planning consistently improves the generation quality on three distinct domains, with an average of 20 BLEU points and 12 METEOR points improvements. In addition, human judges rate our system outputs to be more relevant and coherent than comparisons without planning.



[11]: Acrostic Poem Generation
Authors: Rajat Agarwal, Katharina Kann
Comments: EMNLP 2020
Link: https://arxiv.org/abs/2010.02239

摘要:We propose a new task in the area of computational creativity: acrostic poem generation in English. Acrostic poems are poems that contain a hidden message; typically, the first letter of each line spells out a word or short phrase. We define the task as a generation task with multiple constraints: given an input word, 1) the initial letters of each line should spell out the provided word, 2) the poem's semantics should also relate to it, and 3) the poem should conform to a rhyming scheme. We further provide a baseline model for the task, which consists of a conditional neural language model in combination with a neural rhyming model. Since no dedicated datasets for acrostic poem generation exist, we create training data for our task by first training a separate topic prediction model on a small set of topic-annotated poems and then predicting topics for additional poems. Our experiments show that the acrostic poems generated by our baseline are received well by humans and do not lose much quality due to the additional constraints. Last, we confirm that poems generated by our model are indeed closely related to the provided prompts, and that pretraining on Wikipedia can boost performance.


Text Classification (3 papers)


[1]: Aspect Sentiment Classification with Aspect-Specific Opinion Spans
Authors: Lu Xu, Lidong Bing, Wei Lu, Fei Huang
Comments: EMNLP 2020
Link: https://arxiv.org/abs/2010.02696

摘要:Aspect based sentiment analysis, predicting sentiment polarity of given aspects, has drawn extensive attention. Previous attention-based models emphasize using aspect semantics to help extract opinion features for classification. However, these works are either not able to capture opinion spans as a whole, or not able to capture variable-length opinion spans. In this paper, we present a neat and effective structured attention model by aggregating multiple linear-chain CRFs. Such a design allows the model to extract aspect-specific opinion spans and then evaluate sentiment polarity by exploiting the extracted opinion features. The experimental results on four datasets demonstrate the effectiveness of the proposed model, and our analysis demonstrates that our model can capture aspect-specific opinion spans.



[2]: Cross-Lingual Text Classification with Minimal Resources by Transferring a Sparse Teacher
Authors: Giannis Karamanolakis, Daniel Hsu, Luis Gravano
Comments: Accepted to Findings of EMNLP 2020 (Long Paper)
Link: https://arxiv.org/abs/2010.02562

摘要:Cross-lingual text classification alleviates the need for manually labeled documents in a target language by leveraging labeled documents from other languages. Existing approaches for transferring supervision across languages require expensive cross-lingual resources, such as parallel corpora, while less expensive cross-lingual representation learning approaches train classifiers without target labeled documents. In this work, we propose a cross-lingual teacher-student method, CLTS, that generates "weak" supervision in the target language using minimal cross-lingual resources, in the form of a small number of word translations. Given a limited translation budget, CLTS extracts and transfers only the most important task-specific seed words across languages and initializes a teacher classifier based on the translated seed words. Then, CLTS iteratively trains a more powerful student that also exploits the context of the seed words in unlabeled target documents and outperforms the teacher. CLTS is simple and surprisingly effective in 18 diverse languages: by transferring just 20 seed words, even a bag-of-words logistic regression student outperforms state-of-the-art cross-lingual methods (e.g., based on multilingual BERT). Moreover, CLTS can accommodate any type of student classifier: leveraging a monolingual BERT student leads to further improvements and outperforms even more expensive approaches by up to 12% in accuracy. Finally, CLTS addresses emerging tasks in low-resource languages using just a small number of word translations.
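
A toy rendering of the teacher-student idea, with invented seed words and documents rather than the paper's data or exact procedure: the teacher labels target-language text by counting translated seed words, and a bag-of-words student is then trained on those pseudo-labels:

# Toy sketch of a seed-word teacher and a bag-of-words student. The teacher labels a
# target-language document by counting translated seed words; the student learns from
# those pseudo-labels and can generalize beyond the seed words themselves.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

seed_words = {"pos": {"bueno", "excelente"}, "neg": {"malo", "terrible"}}  # toy translations

def teacher_label(doc):
    tokens = set(doc.lower().split())
    pos = len(tokens & seed_words["pos"])
    neg = len(tokens & seed_words["neg"])
    return 1 if pos >= neg else 0

docs = ["producto excelente y bueno", "servicio terrible", "muy bueno", "malo malo"]
pseudo = [teacher_label(d) for d in docs]          # teacher's weak labels

X = CountVectorizer().fit_transform(docs)
student = LogisticRegression().fit(X, pseudo)      # student trained on pseudo-labels
print(student.predict(X))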



[3]: Identifying Spurious Correlations for Robust Text Classification
Authors: Zhao Wang, Aron Culotta
Comments: Findings of EMNLP-2020
Link: https://arxiv.org/abs/2010.02458

摘要:The predictions of text classifiers are often driven by spurious correlations -- e.g., the term `Spielberg' correlates with positively reviewed movies, even though the term itself does not semantically convey a positive sentiment. In this paper, we propose a method to distinguish spurious and genuine correlations in text classification. We treat this as a supervised classification problem, using features derived from treatment effect estimators to distinguish spurious correlations from "genuine" ones. Due to the generic nature of these features and their small dimensionality, we find that the approach works well even with limited training examples, and that it is possible to transport the word classifier to new domains. Experiments on four datasets (sentiment classification and toxicity detection) suggest that using this approach to inform feature selection also leads to more robust classification, as measured by improved worst-case accuracy on the samples affected by spurious correlations.


Information Extraction (2 papers)


[1]: Extracting Implicitly Asserted Propositions in Argumentation
Authors: Yohan Jo, Jacky Visser, Chris Reed, Eduard Hovy
Comments: EMNLP 2020
Link: https://arxiv.org/abs/2010.02654

摘要:Argumentation accommodates various rhetorical devices, such as questions, reported speech, and imperatives. These rhetorical tools usually assert argumentatively relevant propositions rather implicitly, so understanding their true meaning is key to understanding certain arguments properly. However, most argument mining systems and computational linguistics research have paid little attention to implicitly asserted propositions in argumentation. In this paper, we examine a wide range of computational methods for extracting propositions that are implicitly asserted in questions, reported speech, and imperatives in argumentation. By evaluating the models on a corpus of 2016 U.S. presidential debates and online commentary, we demonstrate the effectiveness and limitations of the computational models. Our study may inform future research on argument mining and the semantics of these rhetorical devices in argumentation.



[2]: MedFilter: Extracting Task-relevant Utterances from Medical Dialogue through Integration of Discourse Structure and Ontological Knowledge
Authors: Sopan Khosla, Shikhar Vashishth, Jill Fain Lehman, Carolyn Rose
Comments: Accepted as Long Paper to EMNLP 2020
Link: https://arxiv.org/abs/2010.02246

摘要:Information extraction from conversational data is particularly challenging because the task-centric nature of conversation allows for effective communication of implicit information by humans, but is challenging for machines. The challenges may differ between utterances depending on the role of the speaker within the conversation, especially when relevant expertise is distributed asymmetrically across roles. Further, the challenges may also increase over the conversation as more shared context is built up through information communicated implicitly earlier in the dialogue. In this paper, we propose the novel modeling approach MedFilter, which addresses these insights in order to increase performance at identifying and categorizing task-relevant utterances, and in so doing, positively impacts performance at a downstream information extraction task. We evaluate this approach on a corpus of nearly 7,000 doctor-patient conversations where MedFilter is used to identify medically relevant contributions to the discussion (achieving a 10% improvement over SOTA baselines in terms of area under the PR curve). Identifying task-relevant utterances benefits downstream medical processing, achieving improvements of 15%, 105%, and 23% respectively for the extraction of symptoms, medications, and complaints.


Question Answering (4 papers)


[1]: QADiscourse -- Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines
Authors: Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan
Comments: To appear at EMNLP 2020
Link: https://arxiv.org/abs/2010.02815

摘要:Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding. However, annotating discourse relations typically requires expert annotators. Recently, different semantic aspects of a sentence have been represented and crowd-sourced via question-and-answer (QA) pairs. This paper proposes a novel representation of discourse relations as QA pairs, which in turn allows us to crowd-source wide-coverage data annotated with discourse relations, via an intuitively appealing interface for composing such questions and answers. Based on our proposed representation, we collect a novel and wide-coverage QADiscourse dataset, and present baseline algorithms for predicting QADiscourse relations.



[2]: Context Modeling with Evidence Filter for Multiple Choice Question Answering
Authors: Sicheng Yu, Hao Zhang, Wei Jing, Jing Jiang
Comments: 6 pages, 2 figures
Link: https://arxiv.org/abs/2010.02649

摘要:Multiple-Choice Question Answering (MCQA) is a challenging task in machine reading comprehension. The main challenge in MCQA is to extract "evidence" from the given context that supports the correct answer. In the OpenbookQA dataset, the requirement of extracting "evidence" is particularly important due to the mutual independence of sentences in the context. Existing work tackles this problem by annotated evidence or distant supervision with rules which overly rely on human efforts. To address the challenge, we propose a simple yet effective approach termed evidence filtering to model the relationships between the encoded contexts with respect to different options collectively and to potentially highlight the evidence sentences and filter out unrelated sentences. In addition to reducing human effort compared with prior approaches, we show through extensive experiments on OpenbookQA that the proposed approach outperforms the models that use the same backbone and more training data; our parameter analysis also demonstrates the interpretability of our approach.



[3]: DaNetQA: a yes/no Question Answering Dataset for the Russian Language
Authors: Taisia Glushkova, Alexey Machnev, Alena Fenogenova, Tatiana Shavrina, Ekaterina Artemova, Dmitry I. Ignatov
Comments: Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Skolkovo, Russia, October 15-16, 2020, Revised Selected Papers. Lecture Notes in Computer Science (this https URL), Springer 2020
Link: https://arxiv.org/abs/2010.02605

摘要:DaNetQA, a new question-answering corpus, follows (Clark et. al, 2019) design: it comprises natural yes/no questions. Each question is paired with a paragraph from Wikipedia and an answer, derived from the paragraph. The task is to take both the question and a paragraph as input and come up with a yes/no answer, i.e. to produce a binary output. In this paper, we present a reproducible approach to DaNetQA creation and investigate transfer learning methods for task and language transferring. For task transferring we leverage three similar sentence modelling tasks: 1) a corpus of paraphrases, Paraphraser, 2) an NLI task, for which we use the Russian part of XNLI, 3) another question answering task, SberQUAD. For language transferring we use English to Russian translation together with multilingual language fine-tuning.



[4]: Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Authors: Wei Han, Hantao Huang, Tao Han
Comments: Accepted in COLING2020
Link: https://arxiv.org/abs/2010.02582

摘要:Image text carries essential information to understand the scene and perform reasoning. Text-based visual question answering (text VQA) task focuses on visual questions that require reading text in images. Existing text VQA systems generate an answer by selecting from optical character recognition (OCR) texts or a fixed vocabulary. Positional information of text is underused and there is a lack of evidence for the generated answer. As such, this paper proposes a localization-aware answer prediction network (LaAP-Net) to address this challenge. Our LaAP-Net not only generates the answer to the question but also predicts a bounding box as evidence of the generated answer. Moreover, a context-enriched OCR representation (COR) for multimodal fusion is proposed to facilitate the localization task. Our proposed LaAP-Net outperforms existing approaches on three benchmark datasets for the text VQA task by a noticeable margin.


Machine Translation (6 papers)


[1]: On the Sparsity of Neural Machine Translation Models
Authors: Yong Wang, Longyue Wang, Victor O.K. Li, Zhaopeng Tu
Comments: EMNLP 2020
Link: https://arxiv.org/abs/2010.02646

摘要:Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information.



[2]: Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation
Authors: Wenxiang Jiao, Xing Wang, Shilin He, Irwin King, Michael R. Lyu, Zhaopeng Tu
Comments: Accepted to EMNLP 2020 main conference, 12 pages
Link: https://arxiv.org/abs/2010.02552

摘要:Large-scale training datasets lie at the core of the recent success of neural machine translation (NMT) models. However, the complex patterns and potential noises in the large-scale data make training NMT models difficult. In this work, we explore to identify the inactive training examples which contribute less to the model performance, and show that the existence of inactive examples depends on the data distribution. We further introduce data rejuvenation to improve the training of NMT models on large-scale datasets by exploiting inactive examples. The proposed framework consists of three phases. First, we train an identification model on the original training data, and use it to distinguish inactive examples and active examples by their sentence-level output probabilities. Then, we train a rejuvenation model on the active examples, which is used to re-label the inactive examples with forward-translation. Finally, the rejuvenated examples and the active examples are combined to train the final NMT model. Experimental results on WMT14 English-German and English-French datasets show that the proposed data rejuvenation consistently and significantly improves performance for several strong NMT models. Extensive analyses reveal that our approach stabilizes and accelerates the training process of NMT models, resulting in final models with better generalization capability.
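
The three phases can be written down as a pipeline sketch; train_nmt, sentence_prob, and forward_translate below are placeholders standing in for real NMT training, scoring, and decoding runs, and the threshold value is illustrative:

# Sketch of the data-rejuvenation pipeline described above, with placeholder functions.
def train_nmt(pairs):
    return {"trained_on": len(pairs)}          # placeholder "model"

def sentence_prob(model, src, tgt):
    return 0.5                                  # placeholder sentence-level probability

def forward_translate(model, src):
    return "re-labeled " + src                  # placeholder forward translation

def rejuvenate(parallel_data, threshold=0.1):
    identifier = train_nmt(parallel_data)                                          # phase 1: identification model
    active = [(s, t) for s, t in parallel_data if sentence_prob(identifier, s, t) >= threshold]
    inactive = [(s, t) for s, t in parallel_data if sentence_prob(identifier, s, t) < threshold]
    rejuvenator = train_nmt(active)                                                 # phase 2: rejuvenation model
    rejuvenated = [(s, forward_translate(rejuvenator, s)) for s, _ in inactive]     # re-label inactive examples
    return train_nmt(active + rejuvenated)                                          # phase 3: final NMT model

print(rejuvenate([("ein Satz", "a sentence"), ("noch einer", "another one")]))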



[3]: Multi-task Learning for Multilingual Neural Machine Translation
Authors: Yiren Wang, ChengXiang Zhai, Hany Hassan Awadalla
Comments: EMNLP 2020
Link: https://arxiv.org/abs/2010.02523

摘要:While monolingual data has been shown to be useful in improving bilingual neural machine translation (NMT), effectively and efficiently leveraging monolingual data for Multilingual NMT (MNMT) systems is a less explored area. In this work, we propose a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data. We conduct extensive empirical studies on MNMT systems with 10 language pairs from WMT datasets. We show that the proposed approach can effectively improve the translation quality for both high-resource and low-resource languages with large margin, achieving significantly better results than the individual bilingual models. We also demonstrate the efficacy of the proposed approach in the zero-shot setup for language pairs without bitext training data. Furthermore, we show the effectiveness of MTL over pre-training approaches for both NMT and cross-lingual transfer learning NLU tasks; the proposed approach outperforms massive scale models trained on single task.



[4]: Iterative Domain-Repaired Back-Translation
Authors: Hao-Ran Wei, Zhirui Zhang, Boxing Chen, Weihua Luo
Comments: EMNLP 2020 long paper
Link: https://arxiv.org/abs/2010.02473

摘要:In this paper, we focus on the domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent. One common and effective strategy for this case is exploiting in-domain monolingual data with the back-translation method. However, the synthetic parallel data is very noisy because they are generated by imperfect out-of-domain systems, resulting in the poor performance of domain adaptation. To address this issue, we propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair (DR) model to refine translations in synthetic bilingual data. To this end, we construct corresponding data for the DR model training by round-trip translating the monolingual sentences, and then design the unified training framework to optimize paired DR and NMT models jointly. Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach, achieving 15.79 and 4.47 BLEU improvements on average over unadapted models and back-translation.



[5]: Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
作者:Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddee Hassan Muhammad, Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkabir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, Ghollah Kioko, Espoir Murhabazi, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Emezue, Bonaventure Dossou, Blessing Sibanda, Blessing Itoro Bassey, Ayodele Olabiyi, Arshath Ramkilowan
Comments: Findings of EMNLP 2020
Link: https://arxiv.org/abs/2010.02353

摘要:Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under this https URL.



[6]: We Don't Speak the Same Language: Interpreting Polarization through Machine Translation
Authors: Ashiqur R. KhudaBukhsh, Rupak Sarkar, Mark S. Kamlet, Tom M. Mitchell
Link: https://arxiv.org/abs/2010.02339

摘要:Polarization among US political parties, media and elites is a widely studied topic. Prominent lines of prior research across multiple disciplines have observed and analyzed growing polarization in social media. In this paper, we present a new methodology that offers a fresh perspective on interpreting polarization through the lens of machine translation. With a novel proposition that two sub-communities are speaking in two different \emph{languages}, we demonstrate that modern machine translation methods can provide a simple yet powerful and interpretable framework to understand the differences between two (or more) large-scale social media discussion data sets at the granularity of words. Via a substantial corpus of 86.6 million comments by 6.5 million users on over 200,000 news videos hosted by YouTube channels of four prominent US news networks, we demonstrate that simple word-level and phrase-level translation pairs can reveal deep insights into the current political divide -- what is \emph{black lives matter} to one can be \emph{all lives matter} to the other.


Automatic Summarization (3 papers)


[1]: Stepwise Extractive Summarization and Planning with Structured Transformers
Authors: Shashi Narayan, Joshua Maynez, Jakub Adamek, Daniele Pighin, Blaž Bratanič, Ryan McDonald
Comments: 17 pages, EMNLP 2020
Link: https://arxiv.org/abs/2010.02744

摘要:We propose encoder-centric stepwise models for extractive summarization using structured transformers -- HiBERT and Extended Transformers. We enable stepwise summarization by injecting the previously generated summary into the structured transformer as an auxiliary sub-structure. Our models are not only efficient in modeling the structure of long inputs, but they also do not rely on task-specific redundancy-aware modeling, making them a general purpose extractive content planner for different tasks. When evaluated on CNN/DailyMail extractive summarization, stepwise models achieve state-of-the-art performance in terms of Rouge without any redundancy aware modeling or sentence filtering. This also holds true for Rotowire table-to-text generation, where our models surpass previously reported metrics for content selection, planning and ordering, highlighting the strength of stepwise modeling. Amongst the two structured transformers we test, stepwise Extended Transformers provides the best performance across both datasets and sets a new standard for these challenges.



[2]: SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy
Authors: Umanga Bista, Alexander Patrick Mathews, Aditya Krishna Menon, Lexing Xie
Comments: 15 pages
Link: https://arxiv.org/abs/2010.02568

摘要:Most work on multi-document summarization has focused on generic summarization of information present in each individual document set. However, the under-explored setting of update summarization, where the goal is to identify the new information present in each set, is of equal practical interest (e.g., presenting readers with updates on an evolving news topic). In this work, we present SupMMD, a novel technique for generic and update summarization based on the maximum mean discrepancy from kernel two-sample testing. SupMMD combines both supervised learning for salience and unsupervised learning for coverage and diversity. Further, we adapt multiple kernel learning to make use of similarity across multiple information sources (e.g., text features and knowledge based concepts). We show the efficacy of SupMMD in both generic and update summarization tasks by meeting or exceeding the current state-of-the-art on the DUC-2004 and TAC-2009 datasets.
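
For reference, the empirical (biased) MMD statistic that the method builds on can be computed in a few lines of numpy with an RBF kernel; the data and bandwidth below are toy choices:

# Empirical (biased) MMD^2 between two samples X and Y under an RBF kernel:
# MMD^2 = mean k(X,X) - 2 * mean k(X,Y) + mean k(Y,Y).
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    return (rbf_kernel(X, X, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean())

rng = np.random.default_rng(0)
same = rng.normal(size=(50, 8))
shifted = rng.normal(loc=1.0, size=(50, 8))
print(mmd2(same, rng.normal(size=(50, 8))))  # close to 0 for matching distributions
print(mmd2(same, shifted))                   # clearly larger for shifted distributions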



[3]: Multi-Fact Correction in Abstractive Text Summarization
Authors: Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu
Comments: 12 pages, accepted at EMNLP2020
Link: https://arxiv.org/abs/2010.02443

摘要:Pre-trained neural abstractive summarization systems have dominated extractive strategies on news summarization performance, at least in terms of ROUGE. However, system-generated abstractive summaries often face the pitfall of factual inconsistency: generating incorrect facts with respect to the source text. To address this challenge, we propose Span-Fact, a suite of two factual correction models that leverages knowledge learned from question answering models to make corrections in system-generated summaries via span selection. Our models employ single or multi-masking strategies to either iteratively or auto-regressively replace entities in order to ensure semantic consistency w.r.t. the source text, while retaining the syntactic structure of summaries generated by abstractive summarization models. Experiments show that our models significantly boost the factual consistency of system-generated summaries without sacrificing summary quality in terms of both automatic metrics and human evaluation.


Textual Entailment (1 paper)


[1]: Universal Natural Language Processing with Limited Annotations: Try Few-shot Textual Entailment as a Start
Authors: Wenpeng Yin, Nazneen Fatema Rajani, Dragomir Radev, Richard Socher, Caiming Xiong
Comments: EMNLP2020 Long, camera-ready
Link: https://arxiv.org/abs/2010.02584

摘要:A standard way to address different NLP problems is by first constructing a problem-specific dataset, then building a model to fit this dataset. To build the ultimate artificial intelligence, we desire a single machine that can handle diverse new problems, for which task-specific annotations are limited. We bring up textual entailment as a unified solver for such NLP problems. However, current research of textual entailment has not spilled much ink on the following questions: (i) How well does a pretrained textual entailment system generalize across domains with only a handful of domain-specific examples? and (ii) When is it worth transforming an NLP task into textual entailment? We argue that the transforming is unnecessary if we can obtain rich annotations for this task. Textual entailment really matters particularly when the target NLP task has insufficient annotations.
Universal NLP can be probably achieved through different routines. In this work, we introduce Universal Few-shot textual Entailment (UFO-Entail). We demonstrate that this framework enables a pretrained entailment model to work well on new entailment domains in a few-shot setting, and show its effectiveness as a unified solver for several downstream NLP tasks such as question answering and coreference resolution when the end-task annotations are limited. Code: this https URL


Word Sense Disambiguation (1 paper)


[1]: A Novel Challenge Set for Hebrew Morphological Disambiguation and Diacritics Restoration
Authors: Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Moshe Koppel, Reut Tsarfaty
Link: https://arxiv.org/abs/2010.02864

摘要:One of the primary tasks of morphological parsers is the disambiguation of homographs. Particularly difficult are cases of unbalanced ambiguity, where one of the possible analyses is far more frequent than the others. In such cases, there may not exist sufficient examples of the minority analyses in order to properly evaluate performance, nor to train effective classifiers. In this paper we address the issue of unbalanced morphological ambiguities in Hebrew. We offer a challenge set for Hebrew homographs -- the first of its kind -- containing substantial attestation of each analysis of 21 Hebrew homographs. We show that the current SOTA of Hebrew disambiguation performs poorly on cases of unbalanced ambiguity. Leveraging our new dataset, we achieve a new state-of-the-art for all 21 words, improving the overall average F1 score from 0.67 to 0.95. Our resulting annotated datasets are made publicly available for further research.


Sentiment Analysis (3 papers)


[1]: A Multi-Task Incremental Learning Framework with Category Name Embedding for Aspect-Category Sentiment Analysis
Authors: Zehui Dai, Cheng Peng, Huajie Chen, Yadong Ding
Comments: EMNLP 2020 camera ready
Link: https://arxiv.org/abs/2010.02784

摘要:(T)ACSA tasks, including aspect-category sentiment analysis (ACSA) and targeted aspect-category sentiment analysis (TACSA), aims at identifying sentiment polarity on predefined categories. Incremental learning on new categories is necessary for (T)ACSA real applications. Though current multi-task learning models achieve good performance in (T)ACSA tasks, they suffer from catastrophic forgetting problems in (T)ACSA incremental learning tasks. In this paper, to make multi-task learning feasible for incremental learning, we proposed Category Name Embedding network (CNE-net). We set both encoder and decoder shared among all categories to weaken the catastrophic forgetting problem. Besides the origin input sentence, we applied another input feature, i.e., category name, for task discrimination. Our model achieved state-of-the-art on two (T)ACSA benchmark datasets. Furthermore, we proposed a dataset for (T)ACSA incremental learning and achieved the best performance compared with other strong baselines.



[2]: Multi-Instance Multi-Label Learning Networks for Aspect-Category Sentiment Analysis
Authors: Yuncong Li, Cunxiang Yin, Sheng-hua Zhong, Xu Pan
Comments: Long paper accepted by EMNLP 2020
Link: https://arxiv.org/abs/2010.02656

摘要:Aspect-category sentiment analysis (ACSA) aims to predict sentiment polarities of sentences with respect to given aspect categories. To detect the sentiment toward a particular aspect category in a sentence, most previous methods first generate an aspect category-specific sentence representation for the aspect category, then predict the sentiment polarity based on the representation. These methods ignore the fact that the sentiment of an aspect category mentioned in a sentence is an aggregation of the sentiments of the words indicating the aspect category in the sentence, which leads to suboptimal performance. In this paper, we propose a Multi-Instance Multi-Label Learning Network for Aspect-Category sentiment analysis (AC-MIMLLN), which treats sentences as bags, words as instances, and the words indicating an aspect category as the key instances of the aspect category. Given a sentence and the aspect categories mentioned in the sentence, AC-MIMLLN first predicts the sentiments of the instances, then finds the key instances for the aspect categories, finally obtains the sentiments of the sentence toward the aspect categories by aggregating the key instance sentiments. Experimental results on three public datasets demonstrate the effectiveness of AC-MIMLLN.



[3]: Sentiment Analysis for Reinforcement Learning
Authors: Ameet Deshpande, Eve Fleisig
Comments: Work in progress
Link: https://arxiv.org/abs/2010.02316

摘要:While reinforcement learning (RL) has been successful in natural language processing (NLP) domains such as dialogue generation and text-based games, it typically faces the problem of sparse rewards that leads to slow or no convergence. Traditional methods that use text descriptions to extract only a state representation ignore the feedback inherently present in them. In text-based games, for example, descriptions like "Good Job! You ate the food}" indicate progress, and descriptions like "You entered a new room" indicate exploration. Positive and negative cues like these can be converted to rewards through sentiment analysis. This technique converts the sparse reward problem into a dense one, which is easier to solve. Furthermore, this can enable reinforcement learning without rewards, in which the agent learns entirely from these intrinsic sentiment rewards. This framework is similar to intrinsic motivation, where the environment does not necessarily provide the rewards, but the agent analyzes and realizes them by itself. We find that providing dense rewards in text-based games using sentiment analysis improves performance under some conditions.
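
The reward-shaping idea can be sketched directly: score the environment's textual feedback for sentiment and add that score to the sparse environment reward. The keyword lists below are a toy stand-in for a real sentiment classifier:

# Toy sketch: convert textual game feedback into a dense auxiliary reward via
# sentiment scoring, then add it to the sparse environment reward.
POSITIVE = {"good", "job", "won", "ate", "found"}
NEGATIVE = {"died", "lost", "stuck", "cannot"}

def sentiment_reward(description):
    tokens = set(description.lower().replace("!", "").split())
    return 0.1 * (len(tokens & POSITIVE) - len(tokens & NEGATIVE))

def shaped_reward(env_reward, description):
    return env_reward + sentiment_reward(description)

print(shaped_reward(0.0, "Good job! You ate the food"))   # positive shaping
print(shaped_reward(0.0, "You died in the dark room"))     # negative shaping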


Syntactic Parsing (2 papers)


[1]: Please Mind the Root: Decoding Arborescences for Dependency Parsing
Authors: Ran Zmigrod, Tim Vieira, Ryan Cotterell
Link: https://arxiv.org/abs/2010.02550

摘要:The connection between dependency trees and spanning trees is exploited by the NLP community to train and to decode graph-based dependency parsers. However, the NLP literature has missed an important difference between the two structures: only one edge may emanate from the root in a dependency tree. We analyzed the output of state-of-the-art parsers on many languages from the Universal Dependency Treebank: although these parsers are often able to learn that trees which violate the constraint should be assigned lower probabilities, their ability to do so unsurprisingly degrades as the size of the training set decreases. In fact, the worst constraint-violation rate we observe is 24%. Prior work has proposed an inefficient algorithm to enforce the constraint, which adds a factor of n to the decoding runtime. We adapt an algorithm due to Gabow and Tarjan (1984) to dependency parsing, which satisfies the constraint without compromising the original runtime.
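
The constraint at issue is simple to state in code. Assuming a head-array representation where index 0 is the artificial root (an assumption of this sketch, not notation from the paper), a violation check looks like this:

# Check the single-root constraint on a predicted dependency tree, where heads[i]
# is the head of token i+1 and head value 0 denotes the artificial root.
def violates_single_root(heads):
    return sum(1 for h in heads if h == 0) != 1

print(violates_single_root([0, 1, 1]))     # one root edge  -> False (valid)
print(violates_single_root([0, 0, 1]))     # two root edges -> True  (violation)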



[2]: On the Role of Supervision in Unsupervised Constituency Parsing
Authors: Haoyue Shi, Karen Livescu, Kevin Gimpel
Comments: EMNLP 2020. Project page: this https URL
Link: https://arxiv.org/abs/2010.02423

摘要:We analyze several recent unsupervised constituency parsing models, which are tuned with respect to the parsing $F_1$ score on the Wall Street Journal (WSJ) development set (1,700 sentences). We introduce strong baselines for them, by training an existing supervised parsing model (Kitaev and Klein, 2018) on the same labeled examples they access. When training on the 1,700 examples, or even when using only 50 examples for training and 5 for development, such a few-shot parsing approach can outperform all the unsupervised parsing methods by a significant margin. Few-shot parsing can be further improved by a simple data augmentation method and self-training. This suggests that, in order to arrive at fair conclusions, we should carefully consider the amount of labeled data used for model development. We propose two protocols for future work on unsupervised parsing: (i) use fully unsupervised criteria for hyperparameter tuning and model selection; (ii) use as few labeled examples as possible for model development, and compare to few-shot parsing trained on the same labeled examples.


模型(5篇)


[1]:Analyzing Individual Neurons in Pre-trained Language Models
标题:预训练语言模型中单个神经元的分析
作者:Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Yonatan Belinkov
备注:Accepted in EMNLP 2020
链接:https://arxiv.org/abs/2010.02695

摘要:While a lot of analysis has been carried out to demonstrate linguistic knowledge captured by the representations learned within deep NLP models, very little attention has been paid towards individual neurons. We carry out a neuron-level analysis using core linguistic tasks of predicting morphology, syntax and semantics, on pre-trained language models, with questions like: i) do individual neurons in pre-trained models capture linguistic information? ii) which parts of the network learn more about certain linguistic phenomena? iii) how distributed or focused is the information? and iv) how do various architectures differ in learning these properties? We found small subsets of neurons to predict linguistic tasks, with lower level tasks (such as morphology) localized in fewer neurons, compared to the higher-level task of predicting syntax. Our study also reveals interesting cross-architectural comparisons. For example, we found neurons in XLNet to be more localized and disjoint when predicting properties compared to BERT and others, where they are more distributed and coupled.



[2]:On the Branching Bias of Syntax Extracted from Pre-trained Language  Models
标题:论从预训练语言模型中提取的句法的分支偏差
作者:Huayang Li, Lemao Liu, Guoping Huang, Shuming Shi
备注:EMNLP 2020 findings
链接:https://arxiv.org/abs/2010.02448

摘要:Many efforts have been devoted to extracting constituency trees from pre-trained language models, often proceeding in two stages: feature definition and parsing. However, this kind of method may suffer from a branching bias, which inflates performance on languages whose branching direction matches the bias. In this work, we propose quantitatively measuring the branching bias by comparing the performance gap on a language and its reversed language, which is agnostic to both language models and extraction methods. Furthermore, we analyze the impacts of three factors on the branching bias, namely parsing algorithms, feature definitions, and language models. Experiments show that several existing works exhibit branching biases, and some implementations of these three factors can introduce the branching bias.
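
A rough sketch of how such a bias measure could be computed, assuming an external parse_f1 evaluation function and that the gold trees of the reversed corpus are mirrored accordingly; the paper's exact protocol may differ.

# Sketch: branching bias as the F1 gap between a language and its reversed counterpart.
def reverse_corpus(sentences):
    """Reverse the token order of every sentence to flip the branching direction."""
    return [list(reversed(tokens)) for tokens in sentences]

def branching_bias(parse_f1, sentences, gold_trees, mirrored_gold_trees):
    """Positive values suggest the extraction method favors the original branching direction.
    parse_f1 is an assumed evaluator; mirrored_gold_trees are the gold trees of the reversed corpus."""
    return parse_f1(sentences, gold_trees) - parse_f1(reverse_corpus(sentences), mirrored_gold_trees)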



[3]:Improving Neural Topic Models using Knowledge Distillation
标题:用知识蒸馏改进神经话题模型
作者:Alexander Hoyle, Pranav Goel, Philip Resnik
备注:Accepted to EMNLP 2020
链接:https://arxiv.org/abs/2010.02377

摘要:Topic models are often used to identify human-interpretable topics to help make sense of large document collections. We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers. Our modular method can be straightforwardly applied with any neural topic model to improve topic quality, which we demonstrate using two models having disparate architectures, obtaining state-of-the-art topic coherence. We show that our adaptable framework not only improves performance in the aggregate over all estimated topics, as is commonly reported, but also in head-to-head comparisons of aligned topics.



[4]:Investigating representations of verb bias in neural language models
标题:动词偏误在神经语言模型中的表征研究
作者:Robert D. Hawkins, Takateru Yamakoshi, Thomas L. Griffiths, Adele E. Goldberg
备注:Accepted to EMNLP
链接:https://arxiv.org/abs/2010.02375

摘要:Languages typically provide more than one grammatical construction to express certain types of messages. A speaker's choice of construction is known to depend on multiple factors, including the choice of main verb -- a phenomenon known as \emph{verb bias}. Here we introduce DAIS, a large benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation. This dataset includes 200 unique verbs and systematically varies the definiteness and length of arguments. We use this dataset, as well as an existing corpus of naturally occurring data, to evaluate how well recent neural language models capture human preferences. Results show that larger models perform better than smaller models, and transformer architectures (e.g. GPT-2) tend to out-perform recurrent architectures (e.g. LSTMs) even under comparable parameter and training settings. Additional analyses of internal feature representations suggest that transformers may better integrate specific lexical information with grammatical constructions.
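
The comparison itself can be sketched with an off-the-shelf LM from the transformers library: score both dative variants and see which one the model prefers. The sentence pair below is invented for illustration and is not taken from DAIS.

# Sketch: comparing GPT-2 log-probabilities of two dative constructions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence):
    """Total log-probability of a sentence under GPT-2."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean negative log-likelihood per predicted token
    return -loss.item() * (ids.size(1) - 1)

do = sentence_logprob("The teacher gave the student a book.")    # double-object variant
pd = sentence_logprob("The teacher gave a book to the student.") # prepositional variant
print("prefers double-object" if do > pd else "prefers prepositional dative")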



[5]:InfoBERT: Improving Robustness of Language Models from An Information  Theoretic Perspective
标题:InfoBERT:从信息论的角度提高语言模型的健壮性
作者:Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu
备注:20 pages, 8 tables, 2 figures
链接:https://arxiv.org/abs/2010.02329

摘要:Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable facing the threats of textual adversarial attacks. We aim to address this problem from an information-theoretic perspective, and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) a Robust Feature regularizer, which increases the mutual information between local robust features and global features. We provide a principled way to theoretically analyze and improve the robustness of representation learning for language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy over several adversarial datasets on Natural Language Inference (NLI) and Question Answering (QA) tasks.


其他(59篇)


[1]:LOGAN: Local Group Bias Detection by Clustering
标题:LOGAN:基于聚类的局部群偏差检测
作者:Jieyu Zhao, Kai-Wei Chang
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.02867

摘要:Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
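
One way to read the clustering idea, sketched under assumed inputs (instance features, per-instance correctness, and a binary group indicator); this is not the released LOGAN implementation.

# Sketch: cluster instances, then inspect the per-cluster accuracy gap between two groups.
import numpy as np
from sklearn.cluster import KMeans

def local_bias_by_cluster(features, correct, group, n_clusters=10, seed=0):
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(features)
    gaps = []
    for c in range(n_clusters):
        in_c = labels == c
        if np.any(in_c & (group == 0)) and np.any(in_c & (group == 1)):
            acc0 = correct[in_c & (group == 0)].mean()
            acc1 = correct[in_c & (group == 1)].mean()
            gaps.append((c, float(acc0 - acc1)))  # a large |gap| flags a locally biased region
    return gaps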



[2]:Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs
标题:词嵌入中性别偏见评估的稳健性与可靠性:基准词对的作用
作者:Haiyang Zhang, Alison Sneyd, Mark Stevenson
备注:Accepted at AACL-IJCNLP 2020
链接:https://arxiv.org/abs/2010.02847

摘要:It has been shown that word embeddings can exhibit gender bias, and various methods have been proposed to quantify this. However, the extent to which the methods are capturing social stereotypes inherited from the data has been debated. Bias is a complex concept and there exist multiple ways to define it. Previous work has leveraged gender word pairs to measure bias and extract biased analogies. We show that the reliance on these gendered pairs has strong limitations: bias measures based off of them are not robust and cannot identify common types of real-world bias, whilst analogies utilising them are unsuitable indicators of bias. In particular, the well-known analogy "man is to computer-programmer as woman is to homemaker" is due to word similarity rather than societal bias. This has important implications for work on measuring bias in embeddings and related work debiasing embeddings.
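
For context, a base-pair bias score of the kind whose robustness the paper questions can be sketched as a projection onto gendered difference directions; the embedding dictionary and word pairs here are assumptions for illustration.

# Sketch: measuring word bias against gendered base pairs such as (he, she).
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pair_based_bias(emb, word, pairs=(("he", "she"), ("man", "woman"))):
    """emb is a dict mapping words to vectors; positive scores lean toward the first
    element of each pair, negative toward the second."""
    directions = [emb[a] - emb[b] for a, b in pairs if a in emb and b in emb]
    return float(np.mean([cosine(emb[word], d) for d in directions]))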



[3]:Semantic Evaluation for Text-to-SQL with Distilled Test Suites
标题:基于抽取测试套件的文本到SQL语义评估
作者:Ruiqi Zhong, Tao Yu, Dan Klein
备注:EMNLP 2020 Long Paper
链接:https://arxiv.org/abs/2010.02840

摘要:We propose test suite accuracy to approximate semantic accuracy for Text-to-SQL models. Our method distills a small test suite of databases that achieves high code coverage for the gold query from a large number of randomly generated databases. At evaluation time, it computes the denotation accuracy of the predicted queries on the distilled test suite, hence calculating a tight upper-bound for semantic accuracy efficiently. We use our proposed method to evaluate 21 models submitted to the Spider leader board and manually verify that our method is always correct on 100 examples. In contrast, the current Spider metric leads to a 2.5% false negative rate on average and 8.1% in the worst case, indicating that test suite accuracy is needed. Our implementation, along with distilled test suites for eleven Text-to-SQL datasets, is publicly available.
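
The evaluation loop can be sketched with sqlite3: a prediction counts as correct only if its denotation matches the gold query's denotation on every database in the (distilled) suite. Paths and schemas are assumptions; the released implementation may differ.

# Sketch: test-suite denotation accuracy for Text-to-SQL.
import sqlite3

def denotation(db_path, sql):
    """Execute a query and return its rows in a canonical order."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(sql).fetchall()
    return sorted(rows, key=repr)  # order-insensitive comparison

def test_suite_accuracy(pred_sql, gold_sql, db_paths):
    try:
        return all(denotation(db, pred_sql) == denotation(db, gold_sql) for db in db_paths)
    except sqlite3.Error:
        return False  # unexecutable predictions count as incorrect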



[4]:Intrinsic Probing through Dimension Selection
标题:基于维度选择的内在探测
作者:Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell
备注:To appear EMNLP 2020
链接:https://arxiv.org/abs/2010.02812

摘要:Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks. Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it. In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted. To enable intrinsic probing, we propose a novel framework based on a decomposable multivariate Gaussian probe that allows us to determine whether the linguistic information in word embeddings is dispersed or focal. We then probe fastText and BERT for various morphosyntactic attributes across 36 languages. We find that most attributes are reliably encoded by only a few neurons, with fastText concentrating its linguistic structure more than BERT.



[5]:Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech  to Standard German Text Corpus
标题:瑞士议会语料库,一个自动对齐的瑞士德语演讲标准德语文本语料库
作者:Michel Plüss, Lukas Neukom, Manfred Vogel
备注:5 pages, 0 figures
链接:https://arxiv.org/abs/2010.02810

摘要:We present a forced sentence alignment procedure for Swiss German speech and Standard German text. It is able to create a speech-to-text corpus in a fully automatic fashion, given an audio recording and the corresponding unaligned transcript. Compared to a manual alignment, it achieves a mean IoU of 0.8401 with a sentence recall of 0.9491. When applying our IoU estimate filter, the mean IoU can be further improved to 0.9271 at the cost of a lower sentence recall of 0.4881. Using this procedure, we created the Swiss Parliaments Corpus, an automatically aligned Swiss German speech to Standard German text corpus. 65 % of the raw data could be transformed to sentence-level audio-text-pairs, resulting in 293 hours of training data. We have made the corpus freely available for download.
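
The IoU figures above are computed over time intervals; a minimal version of that metric for a single aligned sentence pair looks like the following (a generic interval IoU, not the project's exact code).

# Sketch: intersection-over-union between a predicted and a manual time alignment.
def interval_iou(pred, gold):
    """pred and gold are (start, end) pairs in seconds."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0

print(interval_iou((0.0, 4.0), (1.0, 5.0)))  # 0.6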



[6]:Learning to Ignore: Long Document Coreference with Bounded Memory Neural  Networks
标题:学习忽略:有界记忆神经网络的长文档共指
作者:Shubham Toshniwal, Sam Wiseman, Allyson Ettinger, Karen Livescu, Kevin Gimpel
备注:EMNLP 2020 camera ready
链接:https://arxiv.org/abs/2010.02807

摘要:Long document coreference resolution remains a challenging task due to the large memory and runtime requirements of current models. Recent work doing incremental coreference resolution using just the global representation of entities shows practical benefits but requires keeping all entities in memory, which can be impractical for long documents. We argue that keeping all entities in memory is unnecessary, and we propose a memory-augmented neural network that tracks only a small bounded number of entities at a time, thus guaranteeing a linear runtime in length of document. We show that (a) the model remains competitive with models with high memory and computational requirements on OntoNotes and LitBank, and (b) the model learns an efficient memory management strategy easily outperforming a rule-based strategy.



[7]:Textual Supervision for Visually Grounded Spoken Language Understanding
标题:面向视觉落地口语理解的文本监督
作者:Bertrand Higy, Desmond Elliott, Grzegorz Chrupała
备注:Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.02806

摘要:Visually-grounded models of spoken language understanding extract semantic information directly from speech, without relying on transcriptions. This is useful for low-resource languages, where transcriptions can be expensive or impossible to obtain. Recent work showed that these models can be improved if transcriptions are available at training time. However, it is not clear how an end-to-end approach compares to a traditional pipeline-based approach when one has access to transcriptions. Comparing different strategies, we find that the pipeline approach works better when enough text is available. With low-resource languages in mind, we also show that translations can be effectively used in place of transcriptions but more data is needed to obtain similar results.



[8]:Tackling the Low-resource Challenge for Canonical Segmentation
标题:解决规范分割的低资源挑战
作者:Manuel Mager, Özlem Çetinoğlu, Katharina Kann
备注:Accepted to EMNLP 2020
链接:https://arxiv.org/abs/2010.02804

摘要:Canonical morphological segmentation consists of dividing words into their standardized morphemes. Here, we are interested in approaches for the task when training data is limited. We compare model performance in a simulated low-resource setting for the high-resource languages German, English, and Indonesian to experiments on new datasets for the truly low-resource languages Popoluca and Tepehua. We explore two new models for the task, borrowing from the closely related area of morphological generation: an LSTM pointer-generator and a sequence-to-sequence model with hard monotonic attention trained with imitation learning. We find that, in the low-resource setting, the novel approaches outperform existing ones on all languages by up to 11.4% accuracy. However, while accuracy in emulated low-resource scenarios is over 50% for all languages, for the truly low-resource languages Popoluca and Tepehua, our best model only obtains 37.4% and 28.4% accuracy, respectively. Thus, we conclude that canonical segmentation is still a challenging task for low-resource languages.



[9]:COSMIC: COmmonSense knowledge for eMotion Identification in  Conversations
标题:宇宙:会话中情感识别的常识知识
作者:Deepanway Ghosal, Navonil Majumder, Alexander Gelbukh, Rada Mihalcea, Soujanya Poria
链接:https://arxiv.org/abs/2010.02795

摘要:In this paper, we address the task of utterance level emotion recognition in conversations using commonsense knowledge. We propose COSMIC, a new framework that incorporates different elements of commonsense such as mental states, events, and causal relations, and build upon them to learn interactions between interlocutors participating in a conversation. Current state-of-the-art methods often encounter difficulties in context propagation, emotion shift detection, and differentiating between related emotion classes. By learning distinct commonsense representations, COSMIC addresses these challenges and achieves new state-of-the-art results for emotion recognition on four different benchmark conversational datasets. Our code is available at this https URL.



[10]:Towards Coalgebras in Stylometry
标题:迈向文体计量学中的余代数
作者:Joël A. Doat
备注:9 pages, 12 figures
链接:https://arxiv.org/abs/2010.02733

摘要:The syntactic behaviour of texts can highly vary depending on their contexts (e.g. author, genre, etc.). From the standpoint of stylometry, it can be helpful to objectively measure this behaviour. In this paper, we discuss how coalgebras are used to formalise the notion of behaviour by embedding syntactic features of a given text into probabilistic transition systems. By introducing the behavioural distance, we are then able to quantitatively measure differences between points in these systems and thus, comparing features of different texts. Furthermore, the behavioural distance of points can be approximated by a polynomial-time algorithm.



[11]:SlotRefine: A Fast Non-Autoregressive Model for Joint Intent Detection  and Slot Filling
标题:SlotRefine:一种用于联合意图检测和槽填充的快速非自回归模型
作者:Di Wu, Liang Ding, Fan Lu, Jian Xie
备注:To appear in main conference of EMNLP 2020
链接:https://arxiv.org/abs/2010.02693

摘要:Slot filling and intent detection are two main tasks in a spoken language understanding (SLU) system. In this paper, we propose a novel non-autoregressive model named SlotRefine for joint intent detection and slot filling. Besides, we design a novel two-pass iteration mechanism to handle the uncoordinated slots problem caused by the conditional independence of the non-autoregressive model. Experiments demonstrate that our model significantly outperforms previous models in the slot filling task, while considerably speeding up decoding (up to 10.77x). In-depth analyses show that 1) pretraining schemes could further enhance our model; 2) the two-pass mechanism indeed remedies the uncoordinated slots problem.



[12]:BERT Knows Punta Cana is not just beautiful, it's gorgeous: Ranking  Scalar Adjectives with Contextualised Representations
标题:BERT知道Punta Cana不仅漂亮,而且很华丽:用语境化表征对标量形容词进行排序
作者:Aina Garí Soler, Marianna Apidianaki
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.02686

摘要:Adjectives like pretty, beautiful and gorgeous describe positive properties of the nouns they modify but with different intensity. These differences are important for natural language understanding and reasoning. We propose a novel BERT-based approach to intensity detection for scalar adjectives. We model intensity by vectors directly derived from contextualised representations and show they can successfully rank scalar adjectives. We evaluate our models both intrinsically, on gold standard datasets, and on an Indirect Question Answering task. Our results demonstrate that BERT encodes rich knowledge about the semantics of scalar adjectives, and is able to provide better quality intensity rankings than static embeddings and previous models with access to dedicated resources.



[13]:Poison Attacks against Text Datasets with Conditional Adversarially  Regularized Autoencoder
标题:基于条件对抗正则化自动编码器的文本数据集投毒攻击
作者:Alvin Chan, Yi Tay, Yew-Soon Ong, Aston Zhang
备注:Accepted in EMNLP-Findings 2020, Camera Ready Version
链接:https://arxiv.org/abs/2010.02684

摘要:This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. More concretely, we present a 'backdoor poisoning' attack on NLP models. Our poisoning attack utilizes conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples by poison injection in latent space. Just by adding 1% poisoned data, our experiments show that a victim BERT finetuned classifier's predictions can be steered to the poison target class with success rates of >80% when the input hypothesis is injected with the poison signature, demonstrating that NLI and text classification systems face a huge security risk.



[14]:Automatic Metaphor Interpretation Using Word Embeddings
标题:基于词嵌入的隐喻自动解释
作者:Kfir Bar, Nachum Dershowitz, Lena Dankin
备注:Presented at 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2018
链接:https://arxiv.org/abs/2010.02665

摘要:We suggest a model for metaphor interpretation using word embeddings trained over a relatively large corpus. Our system handles nominal metaphors, like "time is money". It generates a ranked list of potential interpretations of given metaphors. Candidate meanings are drawn from collocations of the topic ("time") and vehicle ("money") components, automatically extracted from a dependency-parsed corpus. We explore adding candidates derived from word association norms (common human responses to cues). Our ranking procedure considers similarity between candidate interpretations and metaphor components, measured in a semantic vector space. Lastly, a clustering algorithm removes semantically related duplicates, thereby allowing other candidate interpretations to attain higher rank. We evaluate using a set of annotated metaphors.



[15]:Detecting Attackable Sentences in Arguments
标题:检测论证中的可攻击句子
作者:Yohan Jo, Seojin Bang, Emaad Manzoor, Eduard Hovy, Chris Reed
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.02660

摘要:Finding attackable sentences in an argument is the first step toward successful refutation in argumentation. We present a first large-scale analysis of sentence attackability in online arguments. We analyze driving reasons for attacks in argumentation and identify relevant characteristics of sentences. We demonstrate that a sentence's attackability is associated with many of these characteristics regarding the sentence's content, proposition types, and tone, and that an external knowledge source can provide useful information about attackability. Building on these findings, we demonstrate that machine learning models can automatically detect attackable sentences in arguments, significantly better than several baselines and comparably well to laypeople.



[16]:If beam search is the answer, what was the question?
标题:如果束搜索是答案,那么问题是什么?
作者:Clara Meister, Tim Vieira, Ryan Cotterell
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.02650

摘要:Quite surprisingly, exact maximum a posteriori (MAP) decoding of neural language generators frequently leads to low-quality results. Rather, most state-of-the-art results on language generation tasks are attained using beam search despite its overwhelmingly high search error rate. This implies that the MAP objective alone does not express the properties we desire in text, which merits the question: if beam search is the answer, what was the question? We frame beam search as the exact solution to a different decoding objective in order to gain insights into why high probability under a model alone may not indicate adequacy. We find that beam search enforces uniform information density in text, a property motivated by cognitive science. We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models. Additionally, we analyze the text produced using various decoding strategies and see that, in our neural machine translation experiments, the extent to which this property is adhered to strongly correlates with BLEU.
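
The flavor of a UID-style decoding objective can be sketched as log-probability penalized by the spread of per-token surprisals; the exact regularizer in the paper may take a different form, so treat the squared-deviation penalty below as an assumption.

# Sketch: scoring candidates by probability minus a uniform-information-density penalty.
def uid_score(token_logprobs, lam=1.0):
    surprisals = [-lp for lp in token_logprobs]
    mean = sum(surprisals) / len(surprisals)
    variance = sum((s - mean) ** 2 for s in surprisals) / len(surprisals)
    return sum(token_logprobs) - lam * variance

candidates = {"even": [-0.9, -1.1, -1.0], "spiky": [-0.1, -2.8, -0.1]}  # equal total log-prob
best = max(candidates, key=lambda k: uid_score(candidates[k]))  # picks "even"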



[17]:On the Sub-Layer Functionalities of Transformer Decoder
标题:论Transformer解码器的子层功能
作者:Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, Zhaopeng Tu
备注:Findings of the 2020 Conference on Empirical Methods in Natural Language Processing (Long)
链接:https://arxiv.org/abs/2010.02648

摘要:There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role. During translation, the decoder must predict output tokens by considering both the source-language text from the encoder and the target-language prefix produced in previous steps. In this work, we study how Transformer-based decoders leverage information from the source and target languages -- developing a universal probe task to assess how information is propagated through each module of each decoder layer. We perform extensive experiments on three major translation datasets (WMT En-De, En-Fr, and En-Zh). Our analysis provides insight on when and where decoders leverage different sources. Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance -- a significant reduction in computation and number of parameters, and consequently a significant boost to both training and inference speed.



[18]:Neural Speech Synthesis for Estonian
标题:爱沙尼亚语神经语音合成
作者:Liisa Rätsep, Liisi Piits, Hille Pajupuu, Indrek Hein, Mark Fišel
备注:9 pages in Estonian
链接:https://arxiv.org/abs/2010.02636

摘要:This technical report describes the results of a collaboration between the NLP research group at the University of Tartu and the Institute of Estonian Language on improving neural speech synthesis for Estonian. The report (written in Estonian) describes the project results, the summary of which is: (1) Speech synthesis data from 6 speakers for a total of 92.4 hours is collected and openly released (CC-BY-4.0). Data available at this https URL and this https URL. (2) Software and models for neural speech synthesis are released open-source (MIT license). Available at this https URL. (3) We ran evaluations of the new models and compared them to other existing solutions (HMM-based HTS models from EKI, this http URL, and Google's speech synthesis for Estonian, accessed via this https URL). Evaluation includes voice acceptability MOS scores for sentence-level and longer excerpts, detailed error analysis and evaluation of the pre-processing module.



[19]:On the Interplay Between Fine-tuning and Sentence-level Probing for  Linguistic Knowledge in Pre-trained Transformers
标题:论预训练Transformer中语言知识的微调与句子级探测的相互作用
作者:Marius Mosbach, Anna Khokhlova, Michael A. Hedderich, Dietrich Klakow
备注:Accepted at Findings of EMNLP 2020 and BlackboxNLP 2020
链接:https://arxiv.org/abs/2010.02616

摘要:Fine-tuning pre-trained contextualized embedding models has become an integral part of the NLP pipeline. At the same time, probing has emerged as a way to investigate the linguistic knowledge captured by pre-trained models. Very little is, however, understood about how fine-tuning affects the representations of pre-trained models and thereby the linguistic knowledge they encode. This paper contributes towards closing this gap. We study three different pre-trained models: BERT, RoBERTa, and ALBERT, and investigate through sentence-level probing how fine-tuning affects their representations. We find that for some probing tasks fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning introduces or even removes linguistic knowledge from a pre-trained model. These changes, however, vary greatly across different models, fine-tuning and probing tasks. Our analysis reveals that while fine-tuning indeed changes the representations of a pre-trained model and these changes are typically larger for higher layers, only in very few cases, fine-tuning has a positive effect on probing accuracy that is larger than just using the pre-trained model with a strong pooling method. Based on our findings, we argue that both positive and negative effects of fine-tuning on probing require a careful interpretation.



[20]:Converting the Point of View of Messages Spoken to Virtual Assistants
标题:转换与虚拟助理交谈的消息的观点
作者:Isabelle G. Lee, Vera Zu, Sai Srujana Buddi, Dennis Liang, Jack G.M. Fitzgerald
备注:10 pages, 11 figures, Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.02600

摘要:Virtual Assistants can be quite literal at times. If the user says "tell Bob I love him," most virtual assistants will extract the message "I love him" and send it to the user's contact named Bob, rather than properly converting the message to "I love you." We designed a system to allow virtual assistants to take a voice message from one user, convert the point of view of the message, and then deliver the result to its target user. We developed a rule-based model, which integrates a linear text classification model, part-of-speech tagging, and constituency parsing with rule-based transformation methods. We also investigated Neural Machine Translation (NMT) approaches, including LSTMs, CopyNet, and T5. We explored 5 metrics to gauge both naturalness and faithfulness automatically, and we chose to use BLEU plus METEOR for faithfulness and relative perplexity using a separately trained language model (GPT) for naturalness. Transformer-Copynet and T5 performed similarly on faithfulness metrics, with T5 achieving a slight edge: a BLEU score of 63.8 and a METEOR score of 83.0. CopyNet was the most natural, with a relative perplexity of 1.59. CopyNet also has 37 times fewer parameters than T5. We have publicly released our dataset, which is composed of 46,565 crowd-sourced samples.
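
A toy version of the rule-based point-of-view conversion can be sketched with a pronoun map; it assumes every third-person object pronoun refers to the recipient, which is exactly the kind of ambiguity the full system resolves with tagging and parsing.

# Sketch: naive point-of-view conversion for dictated messages.
import re

PRONOUN_MAP = {"he": "you", "she": "you", "him": "you", "her": "you", "his": "your"}

def convert_pov(message):
    def swap(match):
        word = match.group(0)
        repl = PRONOUN_MAP.get(word.lower(), word)
        return repl.capitalize() if word[0].isupper() else repl
    return re.sub(r"\b\w+\b", swap, message)

print(convert_pov("I love him"))  # -> "I love you"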



[21]:Embedding Words in Non-Vector Space with Unsupervised Graph Learning
标题:基于无监督图学习的非向量空间单词嵌入
作者:Max Ryabinin, Sergei Popov, Liudmila Prokhorenkova, Elena Voita
备注:Accepted as a long paper for EMNLP 2020. 15 pages, 6 figures
链接:https://arxiv.org/abs/2010.02598

摘要:It has become a de-facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a latent hierarchical structure, and this structure has to be revealed and encoded by word embeddings. We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end. In our setting, each word is a node in a weighted graph and the distance between words is the shortest path distance between the corresponding nodes. We adopt a recent method learning a representation of data in the form of a differentiable weighted graph and use it to modify the GloVe training algorithm. We show that our graph-based representations substantially outperform vector-based methods on word similarity and analogy tasks. Our analysis reveals that the structure of the learned graphs is hierarchical and similar to that of WordNet, the geometry is highly non-trivial and contains subgraphs with different local topology.



[22]:Semantically Driven Sentence Fusion: Modeling and Evaluation
标题:语义驱动的句子融合:建模与评价
作者:Eyal Ben-David, Orgad Keller, Eric Malmi, Idan Szpektor, Roi Reichart
备注:This paper was accepted to Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.02592

摘要:Sentence fusion is the task of joining related sentences into coherent text. Current training and evaluation schemes for this task are based on single reference ground-truths and do not account for valid fusion variants. We show that this hinders models from robustly capturing the semantic relationship between input sentences. To alleviate this, we present an approach in which ground-truth solutions are automatically expanded into multiple references via curated equivalence classes of connective phrases. We apply this method to a large-scale dataset and use the augmented dataset for both model training and evaluation. To improve the learning of semantic representation using multiple references, we enrich the model with auxiliary discourse classification tasks under a multi-tasking framework. Our experiments highlight the improvements of our approach over state-of-the-art models.



[23]:Scene Graph Modification Based on Natural Language Commands
标题:基于自然语言命令的场景图修改
作者:Xuanli He, Quan Hung Tran, Gholamreza Haffari, Walter Chang, Trung Bui, Zhe Lin, Franck Dernoncourt, Nhan Dam
备注:Accepted to the Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.02591

摘要:Structured representations like graphs and parse trees play a crucial role in many Natural Language Processing systems. In recent years, the advancements in multi-turn user interfaces necessitate the need for controlling and updating these structured representations given new sources of information. Although there have been many efforts focusing on improving the performance of the parsers that map text to graphs or parse trees, very few have explored the problem of directly manipulating these representations. In this paper, we explore the novel problem of graph modification, where the systems need to learn how to update an existing scene graph given a new user's command. Our novel models based on graph-based sparse transformer and cross attention information fusion outperform previous systems adapted from the machine translation and graph generation literature. We further contribute our large graph modification datasets to the research community to encourage future research for this new problem.



[24]:CoRefi: A Crowd Sourcing Suite for Coreference Annotation
标题:CoRefi:一个用于共指注释的众包套件
作者:Aaron Bornstein, Arie Cattan, Ido Dagan
备注:EMNLP 2020 system demonstration paper
链接:https://arxiv.org/abs/2010.02588

摘要:Coreference annotation is an important, yet expensive and time-consuming, task, which often involves expert annotators trained on complex decision guidelines. To enable cheaper and more efficient annotation, we present CoRefi, a web-based coreference annotation suite, oriented for crowdsourcing. Beyond the core coreference annotation tool, CoRefi provides guided onboarding for the task as well as a novel algorithm for a reviewing phase. CoRefi is open source and directly embeds into any website, including popular crowdsourcing platforms.
CoRefi Demo: this http URL. Video Tour: this http URL. Github Repo: this https URL



[25]:Dissecting Span Identification Tasks with Performance Prediction
标题:基于性能预测的跨距识别任务剖析
作者:Sean Papay, Roman Klinger, Sebastian Padó
备注:accepted at EMNLP 2020
链接:https://arxiv.org/abs/2010.02587

摘要:Span identification (in short, span ID) tasks such as chunking, NER, or code-switching detection, ask models to identify and classify relevant spans in a text. Despite being a staple of NLP, and sharing a common structure, there is little insight on how these tasks' properties influence their difficulty, and thus little guidance on what model families work well on span ID tasks, and why. We analyze span ID tasks via performance prediction, estimating how well neural architectures do on different tasks. Our contributions are: (a) we identify key properties of span ID tasks that can inform performance prediction; (b) we carry out a large-scale experiment on English data, building a model to predict performance for unseen span ID tasks that can support architecture choices; (c), we investigate the parameters of the meta model, yielding new insights on how model and task properties interact to affect span ID performance. We find, e.g., that span frequency is especially important for LSTMs, and that CRFs help when spans are infrequent and boundaries non-distinctive.



[26]:Knowing What You Know: Calibrating Dialogue Belief State Distributions  via Ensembles
标题:知己知彼:通过集合校准对话信念状态分布
作者:Carel van Niekerk, Michael Heck, Christian Geishauser, Hsien-Chin Lin, Nurul Lubis, Marco Moresi, Milica Gašić
备注:7 pages, 9 figures, to be published in Findings of EMNLP 2020, code available at:this https URL
链接:https://arxiv.org/abs/2010.02586

摘要:The ability to accurately track what happens during a conversation is essential for the performance of a dialogue system. Current state-of-the-art multi-domain dialogue state trackers achieve just over 55% accuracy on the current go-to benchmark, which means that in almost every second dialogue turn they place full confidence in an incorrect dialogue state. Belief trackers, on the other hand, maintain a distribution over possible dialogue states. However, they lack in performance compared to dialogue state trackers, and do not produce well calibrated distributions. In this work we present state-of-the-art performance in calibration for multi-domain dialogue belief trackers using a calibrated ensemble of models. Our resulting dialogue belief tracker also outperforms previous dialogue belief tracking models in terms of accuracy.



[27]:The Multilingual Amazon Reviews Corpus
标题:多语种亚马逊评论语料库
作者:Phillip Keung, Yichao Lu, György Szarvas, Noah A. Smith
备注:To appear in EMNLP 2020
链接:https://arxiv.org/abs/2010.02573

摘要:We present the Multilingual Amazon Reviews Corpus (MARC), a large-scale collection of Amazon reviews for multilingual text classification. The corpus contains reviews in English, Japanese, German, French, Spanish, and Chinese, which were collected between 2015 and 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID, and the coarse-grained product category (e.g., 'books', 'appliances', etc.) The corpus is balanced across the 5 possible star ratings, so each rating constitutes 20% of the reviews in each language. For each language, there are 200,000, 5,000, and 5,000 reviews in the training, development, and test sets, respectively. We report baseline results for supervised text classification and zero-shot cross-lingual transfer learning by fine-tuning a multilingual BERT model on reviews data. We propose the use of mean absolute error (MAE) instead of classification accuracy for this task, since MAE accounts for the ordinal nature of the ratings.
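
A small worked example of why MAE suits ordinal star ratings better than plain accuracy (invented predictions, not corpus results):

# Sketch: MAE distinguishes near-misses from maximally wrong predictions; accuracy does not.
def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def mae(gold, pred):
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)

gold = [5, 5, 1, 1]
near = [4, 4, 2, 2]   # always off by one star
far  = [1, 1, 5, 5]   # maximally wrong
print(accuracy(gold, near), accuracy(gold, far))  # 0.0 0.0
print(mae(gold, near), mae(gold, far))            # 1.0 4.0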



[28]:Does the Objective Matter? Comparing Training Objectives for Pronoun  Resolution
标题:目标重要吗?代词消解训练目标的比较研究
作者:Yordan Yordanov, Oana-Maria Camburu, Vid Kocijan, Thomas Lukasiewicz
备注:Accepted to the EMNLP 2020 conference
链接:https://arxiv.org/abs/2010.02570

摘要:Hard cases of pronoun resolution have been used as a long-standing benchmark for commonsense reasoning. In the recent literature, pre-trained language models have been used to obtain state-of-the-art results on pronoun resolution. Overall, four categories of training and evaluation objectives have been introduced. The variety of training datasets and pre-trained language models used in these works makes it unclear whether the choice of training objective is critical. In this work, we make a fair comparison of the performance and seed-wise stability of four models that represent the four categories of objectives. Our experiments show that the objective of sequence ranking performs the best in-domain, while the objective of semantic similarity between candidates and pronoun performs the best out-of-domain. We also observe a seed-wise instability of the model using sequence ranking, which is not the case when the other objectives are used.



[29]:LEGAL-BERT: The Muppets straight out of Law School
标题:LEGAL-BERT:法学院毕业的木偶
作者:Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
备注:5 pages, short paper in Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.02559

摘要:BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.



[30]:PolicyQA: A Reading Comprehension Dataset for Privacy Policies
标题:PolicyQA:隐私策略的阅读理解数据集
作者:Wasi Uddin Ahmad, Jianfeng Chi, Yuan Tian, Kai-Wei Chang
备注:EMNLP Findings 2020 (short paper)
链接:https://arxiv.org/abs/2010.02557

摘要:Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. On the contrary, we argue that providing users with a short text span from policy documents reduces the burden of searching the target information from a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.



[31]:Do Explicit Alignments Robustly Improve Multilingual Encoders?
标题:显式对齐是否有力地改进了多语言编码器?
作者:Shijie Wu, Mark Dredze
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.02537

摘要:Multilingual BERT (mBERT), XLM-RoBERTa (XLMR) and other unsupervised multilingual encoders can effectively learn cross-lingual representation. Explicit alignment objectives based on bitexts like Europarl or MultiUN have been shown to further improve these representations. However, word-level alignments are often suboptimal and such bitexts are unavailable for many languages. In this paper, we propose a new contrastive alignment objective that can better utilize such signal, and examine whether these previous alignment methods can be adapted to noisier sources of aligned data: a randomly sampled 1 million pair subset of the OPUS collection. Additionally, rather than report results on a single dataset with a single model run, we report the mean and standard deviation of multiple runs with different seeds, on four datasets and tasks. Our more extensive analysis finds that, while our new objective outperforms previous work, overall these methods do not improve performance with a more robust evaluation framework. Furthermore, the gains from using a better underlying model eclipse any benefits from alignment training. These negative results dictate more care in evaluating these methods and suggest limitations in applying explicit alignment objectives.



[32]:An Empirical Study of Tokenization Strategies for Various Korean NLP  Tasks
标题:不同韩国语NLP任务标记化策略的实证研究
作者:Kyubyong Park, Joohong Lee, Seongbo Jang, Dawoon Jung
备注:Accepted to AACL-IJCNLP 2020
链接:https://arxiv.org/abs/2010.02534

摘要:Typically, tokenization is the very first step in most text processing works. As a token serves as an atomic unit that embeds the contextual information of text, how to define a token plays a decisive role in the performance of a model. Even though Byte Pair Encoding (BPE) has been considered the de facto standard tokenization method due to its simplicity and universality, it still remains unclear whether BPE works best across all languages and tasks. In this paper, we test several tokenization strategies in order to answer our primary research question, that is, "What is the best tokenization strategy for Korean NLP tasks?" Experimental results demonstrate that a hybrid approach of morphological segmentation followed by BPE works best in Korean to/from English machine translation and natural language understanding tasks such as KorNLI, KorSTS, NSMC, and PAWS-X. As an exception, for KorQuAD, the Korean extension of SQuAD, BPE segmentation turns out to be the most effective.



[33]:Efficient Meta Lifelong-Learning with Limited Memory
标题:有限记忆的有效元终身学习
作者:Zirui Wang, Sanket Vaibhav Mehta, Barnabás Póczos, Jaime Carbonell
备注:Published as a main conference paper at EMNLP 2020
链接:https://arxiv.org/abs/2010.02500

摘要:Current natural language processing models work well on a single task, yet they often fail to continuously learn new tasks without forgetting previous ones as they are re-trained throughout their lifetime, a challenge known as lifelong learning. State-of-the-art lifelong language learning methods store past examples in episodic memory and replay them at both training and inference time. However, as we show later in our experiments, there are three significant impediments: (1) needing unrealistically large memory module to achieve good performance, (2) suffering from negative transfer, (3) requiring multiple local adaptation steps for each test example that significantly slows down the inference speed. In this paper, we identify three common principles of lifelong learning methods and propose an efficient meta-lifelong framework that combines them in a synergistic fashion. To achieve sample efficiency, our method trains the model in a manner that it learns a better initialization for local adaptation. Extensive experiments on text classification and question answering benchmarks demonstrate the effectiveness of our framework by achieving state-of-the-art performance using merely 1% memory size and narrowing the gap with multi-task learning. We further show that our method alleviates both catastrophic forgetting and negative transfer at the same time.



[34]:Joint Turn and Dialogue level User Satisfaction Estimation on  Multi-Domain Conversations
标题:多域会话中轮次级与对话级用户满意度的联合估计
作者:Praveen Kumar Bodigutla, Aditya Tiwari, Josep Vallas Vargas, Lazaros Polymenakos, Spyros Matsoukas
备注:Findings of EMNLP, 2020
链接:https://arxiv.org/abs/2010.02495

摘要:Dialogue level quality estimation is vital for optimizing data driven dialogue management. Current automated methods to estimate turn and dialogue level user satisfaction employ hand-crafted features and rely on complex annotation schemes, which reduce the generalizability of the trained models. We propose a novel user satisfaction estimation approach which minimizes an adaptive multi-task loss function in order to jointly predict turn-level Response Quality labels provided by experts and explicit dialogue-level ratings provided by end users. The proposed BiLSTM based deep neural net model automatically weighs each turn's contribution towards the estimated dialogue-level rating, implicitly encodes temporal dependencies, and removes the need to hand-craft features.
On dialogues sampled from 28 Alexa domains, two dialogue systems and three user groups, the joint dialogue-level satisfaction estimation model achieved up to an absolute 27% (0.43->0.70) and 7% (0.63->0.70) improvement in linear correlation performance over baseline deep neural net and benchmark Gradient boosting regression models, respectively.



[35]:Help! Need Advice on Identifying Advice
标题:救命啊!需要关于识别建议的建议吗
作者:Venkata Subrahmanyan Govindarajan, Benjamin T Chen, Rebecca Warholic, Katrin Erk, Junyi Jessy Li
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.02494

摘要:Humans use language to accomplish a wide variety of tasks - asking for and giving advice being one of them. In online advice forums, advice is mixed in with non-advice, like emotional support, and is sometimes stated explicitly, sometimes implicitly. Understanding the language of advice would equip systems with a better grasp of language pragmatics; practically, the ability to identify advice would drastically increase the efficiency of advice-seeking online, as well as advice-giving in natural language generation systems.
We present a dataset in English from two Reddit advice forums - r/AskParents and r/needadvice - annotated for whether sentences in posts contain advice or not. Our analysis reveals rich linguistic phenomena in advice discourse. We present preliminary models showing that while pre-trained language models are able to capture advice better than rule-based systems, advice identification is challenging, and we identify directions for future research.



[36]:Dynamic Semantic Matching and Aggregation Network for Few-shot Intent  Detection
标题:用于少样本意图检测的动态语义匹配与聚合网络
作者:Hoang Nguyen, Chenwei Zhang, Congying Xia, Philip S. Yu
备注:10 pages, 3 figures To appear in Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.02481

摘要:Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances. Although recent works demonstrate that multi-level matching plays an important role in transferring learned knowledge from seen training classes to novel testing classes, they rely on a static similarity measure and overly fine-grained matching components. These limitations inhibit generalizing capability towards Generalized Few-shot Learning settings where both seen and novel classes are co-existent. In this paper, we propose a novel Semantic Matching and Aggregation Network where semantic components are distilled from utterances via multi-head self-attention with additional dynamic regularization constraints. These semantic components capture high-level information, resulting in more effective matching between instances. Our multi-perspective matching method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances. We also propose a more challenging evaluation setting that considers classification on the joint all-class label space. Extensive experimental results demonstrate the effectiveness of our method. Our code and data are publicly available.



[37]:Pretrained Language Model Embryology: The Birth of ALBERT
标题:预训练语言模型胚胎学:ALBERT的诞生
作者:David C. Chiang, Sung-Feng Huang, Hung-yi Lee
备注:Accepted to EMNLP 2020, short paper
链接:https://arxiv.org/abs/2010.02480

摘要:While behaviors of pretrained language models (LMs) have been thoroughly examined, what happened during pretraining is rarely studied. We thus investigate the developmental process from a set of randomly initialized parameters to a totipotent language model, which we refer to as the embryology of a pretrained language model. Our results show that ALBERT learns to reconstruct and predict tokens of different parts of speech (POS) in different learning speeds during pretraining. We also find that linguistic knowledge and world knowledge do not generally improve as pretraining proceeds, nor do downstream tasks' performance. These findings suggest that knowledge of a pretrained model varies during pretraining, and having more pretrain steps does not necessarily provide a model with more comprehensive knowledge. We will provide source codes and pretrained models to reproduce our results at this https URL.



[38]:Are Words Commensurate with Actions? Quantifying Commitment to a Cause  from Online Public Messaging
标题:言行一致吗?从网上公开信息中量化对某项事业的承诺
作者:Zhao Wang, Jennifer Cutler, Aron Culotta
备注:In IEEE International Conference on Data Mining (ICDM) Workshop on Data science for Human Performance in Social Networks, 2017
链接:https://arxiv.org/abs/2010.02466

摘要:Public entities such as companies and politicians increasingly use online social networks to communicate directly with their constituencies. Often, this public messaging is aimed at aligning the entity with a particular cause or issue, such as the environment or public health. However, as a consumer or voter, it can be difficult to assess an entity's true commitment to a cause based on public messaging. In this paper, we present a text classification approach to categorize a message according to its commitment level toward a cause. We then compare the volume of such messages with external ratings based on entities' actions (e.g., a politician's voting record with respect to the environment or a company's rating from environmental non-profits). We find that by distinguishing between low- and high- level commitment messages, we can more reliably identify truly committed entities. Furthermore, by measuring the discrepancy between classified messages and external ratings, we can identify entities whose public messaging does not align with their actions, thereby providing a methodology to identify potentially "inauthentic" messaging campaigns.



[39]:Modeling Preconditions in Text with a Crowd-sourced Dataset
标题:使用众包数据集对文本中的前提条件进行建模
作者:Heeyoung Kwon, Mahnaz Koupaee, Pratyush Singh, Gargi Sawhney, Anmol Shukla, Keerthi Kumar Kallur, Nathanael Chambers, Niranjan Balasubramanian
链接:https://arxiv.org/abs/2010.02429

摘要:Preconditions provide a form of logical connection between events that explains why some events occur together and information that is complementary to the more widely studied relations such as causation, temporal ordering, entailment, and discourse relations. Modeling preconditions in text has been hampered in part due to the lack of large scale labeled data grounded in text. This paper introduces PeKo, a crowd-sourced annotation of preconditions between event pairs in newswire, an order of magnitude larger than prior text annotations. To complement this new corpus, we also introduce two challenge tasks aimed at modeling preconditions: (i) Precondition Identification -- a standard classification task defined over pairs of event mentions, and (ii) Precondition Generation -- a generative task aimed at testing a more general ability to reason about a given event. Evaluation on both tasks shows that modeling preconditions is challenging even for today's large language models (LM). This suggests that precondition knowledge is not easily accessible in LM-derived representations alone. Our generation results show that fine-tuning an LM on PeKo yields better conditional relations than when trained on raw text or temporally-ordered corpora.



[40]:UNQOVERing Stereotyping Biases via Underspecified Questions
标题:UNQOVER:通过欠指定问题揭示刻板印象偏见
作者:Tao Li, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Vivek Srikumar
备注:Accepted at Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.02428

摘要:While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions. We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence. We design a formalism that isolates the aforementioned errors. As case studies, we use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion. We probe five transformer-based QA models trained on two QA datasets, along with their underlying language models. Our broad study reveals that (1) all these models, with and without fine-tuning, have notable stereotyping biases in these classes; (2) larger models often have higher bias; and (3) the effect of fine-tuning on bias varies strongly with the dataset and the model size.



[41]:Efficient One-Pass End-to-End Entity Linking for Questions
标题:面向问题的高效单遍端到端实体链接
作者:Belinda Z. Li, Sewon Min, Srinivasan Iyer, Yashar Mehdad, Wen-tau Yih
备注:9 pages, EMNLP 2020
链接:https://arxiv.org/abs/2010.02413

摘要:We present ELQ, a fast end-to-end entity linking model for questions, which uses a biencoder to jointly perform mention detection and linking in one pass. Evaluated on WebQSP and GraphQuestions with extended annotations that cover multiple entities per question, ELQ outperforms the previous state of the art by a large margin of +12.7% and +19.6% F1, respectively. With a very fast inference time (1.57 examples/s on a single CPU), ELQ can be useful for downstream question answering systems. In a proof-of-concept experiment, we demonstrate that using ELQ significantly improves the downstream QA performance of GraphRetriever (arXiv:1911.03868). Code and data available at this https URL



[42]:Adversarial Grammatical Error Correction
标题:对抗性语法错误更正
作者:Vipul Raheja, Dimitrios Alikaniotis
备注:13 Pages, EMNLP 2020
链接:https://arxiv.org/abs/2010.02407

摘要:Recent works in Grammatical Error Correction (GEC) have leveraged the progress in Neural Machine Translation (NMT), to learn rewrites from parallel corpora of grammatically incorrect and corrected sentences, achieving state-of-the-art results. At the same time, Generative Adversarial Networks (GANs) have been successful in generating realistic texts across many different tasks by learning to directly minimize the difference between human-generated and synthetic text. In this work, we present an adversarial learning approach to GEC, using the generator-discriminator framework. The generator is a Transformer model, trained to produce grammatically correct sentences given grammatically incorrect ones. The discriminator is a sentence-pair classification model, trained to judge a given pair of grammatically incorrect-correct sentences on the quality of grammatical correction. We pre-train both the discriminator and the generator on parallel texts and then fine-tune them further using a policy gradient method that assigns high rewards to sentences which could be true corrections of the grammatically incorrect text. Experimental results on FCE, CoNLL-14, and BEA-19 datasets show that Adversarial-GEC can achieve competitive GEC quality compared to NMT-based baselines.



[43]:Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning
标题:基于结构化最近邻学习的简单有效的少样本命名实体识别
作者:Yi Yang, Arzoo Katiyar
备注:Accepted by EMNLP 2020
链接:https://arxiv.org/abs/2010.02405

摘要:We present a simple few-shot named entity recognition (NER) system based on nearest neighbor learning and structured inference. Our system uses a supervised NER model trained on the source domain, as a feature extractor. Across several test domains, we show that a nearest neighbor classifier in this feature-space is far more effective than the standard meta-learning approaches. We further propose a cheap but effective method to capture the label dependencies between entity tags without expensive CRF training. We show that our method of combining structured decoding with nearest neighbor learning achieves state-of-the-art performance on standard few-shot NER evaluation tasks, improving F1 scores by $6\%$ to $16\%$ absolute points over prior meta-learning based systems.
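
The nearest-neighbor step can be sketched in a few lines of NumPy; the structured (transition-aware) decoding described in the abstract would be applied on top of these per-token decisions, and the function below is an illustration rather than the authors' code.

```python
import numpy as np

def nn_tag(token_vec, support_vecs, support_tags):
    """Assign the tag of the nearest support token in feature space.

    token_vec:    [d]    feature of a target-domain token
    support_vecs: [n, d] features of the few labeled support tokens
    support_tags: list of n tag strings aligned with support_vecs

    The features come from a source-domain NER model used as a frozen
    feature extractor, as described in the abstract.
    """
    dists = np.linalg.norm(support_vecs - token_vec, axis=1)
    return support_tags[int(dists.argmin())]
```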



[44]:Guiding Attention for Self-Supervised Learning with Transformers
标题:用于Transformer自监督学习的注意力引导
作者:Ameet Deshpande, Karthik Narasimhan
备注:Accepted to Findings of EMNLP, 2020
链接:https://arxiv.org/abs/2010.02399

摘要:In this paper, we propose a simple and effective technique to allow for efficient self-supervised learning with bi-directional Transformers. Our approach is motivated by recent studies demonstrating that self-attention patterns in trained models contain a majority of non-linguistic regularities. We propose a computationally efficient auxiliary loss function to guide attention heads to conform to such patterns. Our method is agnostic to the actual pre-training objective and results in faster convergence of models as well as better performance on downstream tasks compared to the baselines, achieving state-of-the-art results in low-resource settings. Surprisingly, we also find that linguistic properties of attention heads are not necessarily correlated with language modeling performance.
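
As an illustration of the kind of auxiliary loss the abstract describes, one can penalize the distance between each head's attention map and a fixed, non-linguistic target pattern; the pattern choice and loss form below are assumptions, not the paper's exact objective.

```python
import numpy as np

def previous_token_pattern(seq_len):
    """One example of a fixed, non-linguistic attention pattern: each position
    attends to the previous token (the first position attends to itself)."""
    p = np.zeros((seq_len, seq_len))
    p[0, 0] = 1.0
    for i in range(1, seq_len):
        p[i, i - 1] = 1.0
    return p

def attention_guidance_loss(attn, patterns):
    """Mean squared distance between each head's attention map and its target
    pattern; this would be added to the pre-training loss with a small weight.

    attn, patterns: [num_heads, seq_len, seq_len]
    """
    return float(np.mean((np.asarray(attn) - np.asarray(patterns)) ** 2))
```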



[45]:Plan Optimization to Bilingual Dictionary Induction for Low-Resource Language Families
标题:面向低资源语系双语词典归纳的计划优化
作者:Arbi Haza Nasution, Yohei Murakami, Toru Ishida
备注:29 pages, 16 figures, 9 tables, accepted for publication in ACM TALLIP
链接:https://arxiv.org/abs/2010.02396

摘要:Creating a bilingual dictionary is the first crucial step in enriching low-resource languages. Especially for closely-related languages, the constraint-based approach has been shown to be useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, manual creation by bilingual native speakers must be considered. Even when several machine-readable bilingual dictionaries already exist, it remains difficult to determine the execution order of the constraint-based approach that minimizes the total cost of comprehensively creating multiple bilingual dictionaries. Plan optimization is therefore crucial for composing the order of bilingual dictionary creation while taking the available methods and their costs into account. We formalize the plan optimization for creating bilingual dictionaries as a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan, i.e., the one with the least total cost, before fully implementing the constraint-based bilingual lexicon induction. We model a beta prior distribution over bilingual lexicon induction precision, with language similarity and polysemy of the topology as the $\alpha$ and $\beta$ parameters; this prior is further used to model the cost function and the state transition probability. We estimate the cost of the all-investment plan as a baseline for evaluating the proposed MDP-based approach, using total cost as the evaluation metric. After using the posterior beta distribution from the first batch of experiments to construct the prior beta distribution for the second batch, the results show a 61.5\% cost reduction compared to the estimated all-investment plan and a 39.4\% cost reduction compared to the estimated MDP optimal plan. The MDP-based proposal thus outperforms the baseline in total cost.
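
As a toy illustration of how a Beta prior over induction precision can feed a cost estimate, consider the sketch below; the mapping from language similarity and polysemy to $\alpha$ and $\beta$, and the cost function itself, are simplifications of what the paper defines.

```python
def expected_precision(alpha, beta):
    """Mean of a Beta(alpha, beta) prior over induction precision."""
    return alpha / (alpha + beta)

def plan_step_cost(n_pairs, manual_cost_per_pair, alpha, beta):
    """Illustrative cost model (an assumption, not the paper's exact function):
    induced pairs that are expected to be wrong must be corrected manually, so
    the expected cost grows with (1 - expected precision)."""
    return n_pairs * (1.0 - expected_precision(alpha, beta)) * manual_cost_per_pair

# Per the abstract, alpha and beta are derived from language similarity and
# polysemy of the topology; the values below are arbitrary placeholders.
print(plan_step_cost(n_pairs=10000, manual_cost_per_pair=1.0, alpha=8.0, beta=2.0))
```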



[46]:A Generalized Constraint Approach to Bilingual Dictionary Induction for Low-Resource Language Families
标题:低资源语族双语词典归纳的广义约束方法
作者:Arbi Haza Nasution, Yohei Murakami, Toru Ishida
备注:30 pages, 13 figures, 14 tables, published in ACM TALLIP
链接:https://arxiv.org/abs/2010.02395

摘要:The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. The pivot language and cognate recognition approaches have been proven useful for inducing bilingual lexicons for such languages. We propose constraint-based bilingual lexicon induction for closely-related languages by extending constraints from the recent pivot-based induction technique and further enabling multiple symmetry assumption cycles to reach many more cognates in the transgraph. We further identify cognate synonyms to obtain many-to-many translation pairs. This paper utilizes four datasets: one Austronesian low-resource language and three Indo-European high-resource languages. We use three constraint-based methods from our previous work, the Inverse Consultation method, and translation pairs generated from the Cartesian product of input dictionaries as baselines. We evaluate our results using the metrics of precision, recall, and F-score. Our customizable approach allows the user to conduct cross-validation to predict the optimal hyperparameters (cognate threshold and cognate synonym threshold) with various combinations of heuristics and numbers of symmetry assumption cycles to obtain the highest F-score. Our proposed methods achieve statistically significant improvements in precision and F-score over our previous constraint-based methods. The results show that our method has the potential to complement other bilingual dictionary creation methods, such as word alignment models using parallel corpora, for high-resource languages while handling low-resource languages well.



[47]:Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks
标题:Mixup-Transformer:面向NLP任务的动态数据增强
作者:Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip S. Yu, Lifang He
备注:Accepted by COLING 2020
链接:https://arxiv.org/abs/2010.02394

摘要:Mixup is a recent data augmentation technique that linearly interpolates input examples and the corresponding labels. It has shown strong effectiveness in image classification by interpolating images at the pixel level. Inspired by this line of research, in this paper, we explore i) how to apply mixup to natural language processing tasks, since text data can hardly be mixed in the raw format; and ii) whether mixup is still effective in transformer-based learning models such as BERT. To achieve this goal, we incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks while keeping the whole end-to-end training system intact. We evaluate the proposed framework by running extensive experiments on the GLUE benchmark. Furthermore, we also examine the performance of mixup-transformer in low-resource scenarios by reducing the training data by a certain ratio. Our studies show that mixup is a domain-independent data augmentation technique for pre-trained language models, resulting in significant performance improvements for transformer-based models.
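
Since raw text cannot be interpolated directly, mixup is applied to hidden representations; a minimal sketch of interpolating two sentence representations (for example, the [CLS] vectors of a transformer encoder) and their labels is given below, with the Beta-sampling hyperparameter chosen arbitrarily.

```python
import numpy as np

def mixup_hidden(h1, h2, y1, y2, alpha=0.2, rng=None):
    """Interpolate two sentence representations and their labels.

    h1, h2: [d] hidden representations of two training examples
    y1, y2: [num_classes] one-hot (or soft) label vectors
    alpha:  Beta distribution parameter for the mixing coefficient (arbitrary here)
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)             # mixing coefficient in (0, 1)
    h_mix = lam * h1 + (1.0 - lam) * h2      # mixed representation fed to the classifier
    y_mix = lam * y1 + (1.0 - lam) * y2      # mixed target used in the loss
    return h_mix, y_mix
```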



[48]:Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning
标题:将互动小说游戏建模为结合强化学习的多段落阅读理解
作者:Xiaoxiao Guo, Mo Yu, Yupeng Gao, Chuang Gan, Murray Campbell, Shiyu Chang
备注:Accepted to EMNLP 2020
链接:https://arxiv.org/abs/2010.02386

摘要:Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques. In contrast to previous text games with mostly synthetic texts, IF games pose language understanding challenges on the human-written textual descriptions of diverse and sophisticated game worlds and language generation challenges on the action command generation from less restricted combinatorial space. We take a novel perspective of IF game solving and re-formulate it as Multi-Passage Reading Comprehension (MPRC) tasks. Our approaches utilize the context-query attention mechanisms and the structured prediction in MPRC to efficiently generate and evaluate action outputs and apply an object-centric historical observation retrieval strategy to mitigate the partial observability of the textual observations. Extensive experiments on the recent IF benchmark (Jericho) demonstrate clear advantages of our approaches achieving high winning rates and low data requirements compared to all previous approaches. Our source code is available at: this https URL.



[49]:Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
标题:理解SPIGOT的机制:用于潜在结构学习的替代梯度
作者:Tsvetomila Mihaylova, Vlad Niculae, André F. T. Martins
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.02357

摘要:Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) as well as the recently-proposed SPIGOT - a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases.
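
For context, the straight-through estimator (STE) discussed in the abstract can be written as a tiny PyTorch autograd function: the forward pass takes a hard argmax, while the backward pass pretends the operation was the identity. This is a generic STE sketch, not SPIGOT or the paper's new estimators.

```python
import torch

class StraightThroughArgmax(torch.autograd.Function):
    """Minimal straight-through estimator for argmax (a generic illustration
    of the surrogate-gradient family discussed in the abstract)."""

    @staticmethod
    def forward(ctx, scores):
        # forward: hard one-hot of the argmax (zero gradient almost everywhere)
        hard = torch.zeros_like(scores)
        hard.scatter_(-1, scores.argmax(dim=-1, keepdim=True), 1.0)
        return hard

    @staticmethod
    def backward(ctx, grad_output):
        # backward: pretend the forward was the identity and pass gradients through
        return grad_output

scores = torch.randn(2, 5, requires_grad=True)
one_hot = StraightThroughArgmax.apply(scores)
one_hot.sum().backward()   # gradients reach `scores` despite the argmax
```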



[50]:SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup
标题:SeqMix:通过序列混合增强主动序列标注
作者:Rongzhi Zhang, Yue Yu, Chao Zhang
备注:EMNLP 2020 Long Paper
链接:https://arxiv.org/abs/2010.02322

摘要:Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each iteration, which is an inefficient way of leveraging human annotations. We propose a simple but effective data augmentation method to improve the label efficiency of active sequence labeling. Our method, SeqMix, simply augments the queried samples by generating extra labeled sequences in each iteration. The key difficulty is to generate plausible sequences along with token-level labels. In SeqMix, we address this challenge by performing mixup for both sequences and token-level labels of the queried samples. Furthermore, we design a discriminator during sequence mixup, which judges whether the generated sequences are plausible or not. Our experiments on Named Entity Recognition and Event Detection tasks show that SeqMix can improve the standard active sequence labeling method by $2.27\%$--$3.75\%$ in terms of $F_1$ scores. The code and data for SeqMix can be found at this https URL.
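
A stripped-down sketch of the mixup step for sequence labeling is shown below; SeqMix's sub-sequence selection, discriminator-based plausibility filtering, and exact mixing variants are omitted, and the vocabulary-snapping helper is an illustrative assumption.

```python
import numpy as np

def seqmix(emb_a, labels_a, emb_b, labels_b, alpha=8.0, rng=None):
    """Token-level mixup for sequence labeling (illustrative sketch).

    emb_a, emb_b:       [seq_len, d]        token embeddings of two queried sequences
    labels_a, labels_b: [seq_len, num_tags] one-hot token labels
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    mixed_emb = lam * emb_a + (1.0 - lam) * emb_b
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b   # soft token labels
    return mixed_emb, mixed_labels

def snap_to_vocab(mixed_emb, vocab_emb):
    """Map each mixed embedding to the vocabulary token with the highest dot
    product, one simple way to keep the generated sequence plausible (the paper
    additionally uses a discriminator for this purpose)."""
    return np.argmax(mixed_emb @ vocab_emb.T, axis=-1)
```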



[51]:Conversational Document Prediction to Assist Customer Care Agents
标题:会话式文档预测以帮助客户服务代理
作者:Jatin Ganhotra, Haggai Roitman, Doron Cohen, Nathaniel Mills, Chulaka Gunasekara, Yosi Mass, Sachindra Joshi, Luis Lastras, David Konopnicki
备注:EMNLP 2020. The released Twitter dataset is available at: this https URL
链接:https://arxiv.org/abs/2010.02305

摘要:A frequent pattern in customer care conversations is the agents responding with appropriate webpage URLs that address users' needs. We study the task of predicting the documents that customer care agents can use to facilitate users' needs. We also introduce a new public dataset which supports the aforementioned problem. Using this dataset and two others, we investigate state-of-the-art deep learning (DL) and information retrieval (IR) models for the task. Additionally, we analyze the practicality of such systems in terms of inference time complexity. Our results show that a hybrid IR+DL approach provides the best of both worlds.



[52]:Semi-Supervised Speech-Language Joint Pre-Training for Spoken Language Understanding
标题:面向口语理解的半监督语音-语言联合预训练
作者:Yu-An Chung, Chenguang Zhu, Michael Zeng
链接:https://arxiv.org/abs/2010.02295

摘要:Spoken language understanding (SLU) requires a model to analyze input acoustic signals to understand their linguistic content and make predictions. To boost the models' performance, various pre-training methods have been proposed to utilize large-scale unlabeled text and speech data. However, the inherent disparities between the two modalities necessitate a mutual analysis. In this paper, we propose a novel semi-supervised learning method, AlignNet, to jointly pre-train the speech and language modules. Besides self-supervised masked language modeling of the two individual modules, AlignNet aligns representations from paired speech and transcripts in a shared latent semantic space. Thus, during fine-tuning, the speech module alone can produce representations carrying both acoustic information and contextual semantic knowledge. Experimental results verify the effectiveness of our approach on various SLU tasks. For example, AlignNet improves the previous state-of-the-art accuracy on the Spoken SQuAD dataset by 6.2%.
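
One simple form such an alignment objective could take is to pull the pooled speech representation toward the representation of its paired transcript; the squared-distance loss below is an assumed illustration, not necessarily AlignNet's exact objective.

```python
import numpy as np

def alignment_loss(speech_repr, text_repr):
    """Pull paired speech and transcript representations together in the shared
    latent space (illustrative squared-distance form).

    speech_repr: [batch, d] pooled outputs of the speech module
    text_repr:   [batch, d] pooled outputs of the language module
    """
    return float(np.mean(np.sum((speech_repr - text_repr) ** 2, axis=-1)))
```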



[53]:Effects of Naturalistic Variation in Goal-Oriented Dialog
标题:目标导向对话中自然变异的影响
作者:Jatin Ganhotra, Robert Moore, Sachindra Joshi, Kahini Wadhawan
备注:Findings of EMNLP 2020. The updated datasets are available at: this https URL
链接:https://arxiv.org/abs/2010.02260

摘要:Existing benchmarks used to evaluate the performance of end-to-end neural dialog systems lack a key component: natural variation present in human conversations. Most datasets are constructed through crowdsourcing, where the crowd workers follow a fixed template of instructions while enacting the role of a user/agent. This results in straight-forward, somewhat routine, and mostly trouble-free conversations, as crowd workers do not think to represent the full range of actions that occur naturally with real users. In this work, we investigate the impact of naturalistic variation on two goal-oriented datasets: bAbI dialog task and Stanford Multi-Domain Dataset (SMD). We also propose new and more effective testbeds for both datasets, by introducing naturalistic variation by the user. We observe that there is a significant drop in performance (more than 60% in Ent. F1 on SMD and 85% in per-dialog accuracy on bAbI task) of recent state-of-the-art end-to-end neural methods such as BossNet and GLMP on both datasets.



[54]:An Ensemble Approach to Automatic Structuring of Radiology Reports
标题:放射科报告自动组织的集成方法
作者:Morteza Pourreza Shahri, Amir Tahmasebi, Bingyang Ye, Henghui Zhu, Javed Aslam, Timothy Ferris
备注:Accepted by the 3rd Workshop on Clinical NLP at EMNLP 2020
链接:https://arxiv.org/abs/2010.02256

摘要:Automatic structuring of electronic medical records is of high demand for clinical workflow solutions to facilitate extraction, storage, and querying of patient care information. However, developing a scalable solution is extremely challenging, specifically for radiology reports, as most healthcare institutes use either no template or department/institute specific templates. Moreover, radiologists' reporting style varies from one to another as sentences are telegraphic and do not follow general English grammar rules. We present an ensemble method that consolidates the predictions of three models, capturing various attributes of textual information for automatic labeling of sentences with section labels. These three models are: 1) Focus Sentence model, capturing context of the target sentence; 2) Surrounding Context model, capturing the neighboring context of the target sentence; and finally, 3) Formatting/Layout model, aimed at learning report formatting cues. We utilize Bi-directional LSTMs, followed by sentence encoders, to acquire the context. Furthermore, we define several features that incorporate the structure of reports. We compare our proposed approach against multiple baselines and state-of-the-art approaches on a proprietary dataset as well as 100 manually annotated radiology notes from the MIMIC-III dataset, which we are making publicly available. Our proposed approach significantly outperforms other approaches by achieving 97.1% accuracy.
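
The consolidation step can be illustrated as weighted averaging of the three models' per-sentence section-label probabilities; the weighting scheme below is an assumption, as the paper may combine the predictions differently.

```python
import numpy as np

def ensemble_section_label(p_focus, p_context, p_layout, weights=(1.0, 1.0, 1.0)):
    """Weighted-average ensemble over the three per-sentence predictors.

    p_focus, p_context, p_layout: [num_sections] class probabilities for one
    sentence from the Focus Sentence, Surrounding Context, and
    Formatting/Layout models, respectively.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    p = w[0] * np.asarray(p_focus) + w[1] * np.asarray(p_context) + w[2] * np.asarray(p_layout)
    return int(np.argmax(p))   # index of the predicted section label
```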



[55]:Learning to Generalize for Sequential Decision Making
标题:学习归纳以进行顺序决策
作者:Xusen Yin, Ralph Weischedel, Jonathan May
备注:Findings of EMNLP2020, 18 pages
链接:https://arxiv.org/abs/2010.02229

摘要:We consider problems of making sequences of decisions to accomplish tasks, interacting via the medium of language. These problems are often tackled with reinforcement learning approaches. We find that these models do not generalize well when applied to novel task domains. However, the large amount of computation necessary to adequately train and explore the search space of sequential decision making, under a reinforcement learning paradigm, precludes the inclusion of large contextualized language models, which might otherwise enable the desired generalization ability. We introduce a teacher-student imitation learning methodology and a means of converting a reinforcement learning model into a natural language understanding model. Together, these methodologies enable the introduction of contextualized language models into the sequential decision making problem space. We show that models can learn faster and generalize more, leveraging both the imitation learning and the reformulation. Our models exceed teacher performance on various held-out decision problems, by up to 7% on in-domain problems and 24% on out-of-domain problems.



[56]:Unsupervised Hierarchical Concept Learning
标题:无监督层次概念学习
作者:Sumegh Roychowdhury, Sumedh A. Sontakke, Nikaash Puri, Mausoom Sarkar, Milan Aggarwal, Pinkesh Badjatiya, Balaji Krishnamurthy, Laurent Itti
链接:https://arxiv.org/abs/2010.02556

摘要:Discovering concepts (or temporal abstractions) in an unsupervised manner from demonstration data in the absence of an environment is an important problem. Organizing these discovered concepts hierarchically at different levels of abstraction is useful in discovering patterns, building ontologies, and generating tutorials from demonstration data. However, recent work to discover such concepts without access to any environment does not discover relationships (or a hierarchy) between these discovered concepts. In this paper, we present a Transformer-based concept abstraction architecture UNHCLE (pronounced uncle) that extracts a hierarchy of concepts in an unsupervised way from demonstration data. We empirically demonstrate how UNHCLE discovers meaningful hierarchies using datasets from Chess and Cooking domains. Finally, we show how UNHCLE learns meaningful language labels for concepts by using demonstration data augmented with natural language for cooking and chess. All of our code is available at this https URL.



[57]:A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments
标题:面向恶劣环境下短时语音说话人验证的统一深度学习框架
作者:Youngmoon Jung, Yeunju Choi, Hyungjun Lim, Hoirin Kim
备注:19 pages, 10 figures, 13 tables
链接:https://arxiv.org/abs/2010.02477

摘要:Speaker verification (SV) has recently attracted considerable research interest due to the growing popularity of virtual assistants. At the same time, there is an increasing requirement for an SV system: it should be robust to short speech segments, especially in noisy and reverberant environments. In this paper, we consider one more important requirement for practical applications: the system should be robust to an audio stream containing long non-speech segments, where a voice activity detection (VAD) is not applied. To meet these two requirements, we introduce feature pyramid module (FPM)-based multi-scale aggregation (MSA) and self-adaptive soft VAD (SAS-VAD). We present the FPM-based MSA to deal with short speech segments in noisy and reverberant environments. Also, we use the SAS-VAD to increase the robustness to long non-speech segments. To further improve the robustness to acoustic distortions (i.e., noise and reverberation), we apply a masking-based speech enhancement (SE) method. We combine SV, VAD, and SE models in a unified deep learning framework and jointly train the entire network in an end-to-end manner. To the best of our knowledge, this is the first work combining these three models in a deep learning framework. We conduct experiments on Korean indoor (KID) and VoxCeleb datasets, which are corrupted by noise and reverberation. The results show that the proposed method is effective for SV in the challenging conditions and performs better than the baseline i-vector and deep speaker embedding systems.



[58]:Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays
标题:学习视觉-语义嵌入以报告胸部X光片的异常发现
作者:Jianmo Ni, Chun-Nan Hsu, Amilcare Gentili, Julian McAuley
备注:7 pages, 2 figures, to be published in Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.02467

摘要:Automatic medical image report generation has drawn growing attention due to its potential to alleviate radiologists' workload. Existing work on report generation often trains encoder-decoder networks to generate complete reports. However, such models are affected by data bias (e.g., label imbalance) and face common issues inherent in text generation models (e.g., repetition). In this work, we focus on reporting abnormal findings on radiology images; instead of training on complete radiology reports, we propose a method to identify abnormal findings from the reports in addition to grouping them with unsupervised clustering and minimal rules. We formulate the task as cross-modal retrieval and propose Conditional Visual-Semantic Embeddings to align images and fine-grained abnormal findings in a joint embedding space. We demonstrate that our method is able to retrieve abnormal findings and outperforms existing generation models on both clinical correctness and text generation metrics.
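
Cross-modal retrieval of this kind is typically trained with a max-margin ranking objective over image-finding pairs; the sketch below shows a generic triplet loss, while the paper's conditional visual-semantic embeddings add further structure on top of this basic idea.

```python
import numpy as np

def triplet_ranking_loss(img_vec, pos_text_vec, neg_text_vec, margin=0.2):
    """Generic max-margin retrieval loss for a joint embedding space (an
    illustration, not the paper's exact conditional objective). Vectors are
    assumed L2-normalized, so dot products act as cosine similarities."""
    pos = float(np.dot(img_vec, pos_text_vec))   # similarity to the paired finding
    neg = float(np.dot(img_vec, neg_text_vec))   # similarity to a mismatched finding
    return max(0.0, margin - pos + neg)
```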



[59]:ERFit: Entropic Regression Fit Matlab Package, for Data-Driven System Identification of Underlying Dynamic Equations
标题:ERFit:用于底层动力学方程数据驱动系统辨识的熵回归拟合Matlab软件包
作者:Abd AlRahman AlMomani, Erik Bollt
备注:7 pages, 2 figures
链接:https://arxiv.org/abs/2010.02411

摘要:Data-driven sparse system identification has become a general framework for a wide range of problems in science and engineering. It is a problem of growing importance in applied machine learning and artificial intelligence algorithms. In this work, we developed the Entropic Regression Software Package (ERFit), a MATLAB package for sparse system identification using the entropic regression method. The code requires minimal supervision, with a wide range of options that make it adapt easily to different problems in science and engineering. ERFit is available at this https URL.

