今日 cs.CL方向共计124篇文章。
知识图谱(2篇)
[1]:On the Complementary Nature of Knowledge Graph Embedding, Fine Grain Entity Types, and Language Modeling
标题:知识图嵌入、细粒度实体类型与语言建模的互补性
作者:Rajat Patel, Francis Ferraro
备注:To appear at the EMNLP 2020 Workshop on Deep Learning Inside Out
链接:https://arxiv.org/abs/2010.05732
摘要:We demonstrate the complementary natures of neural knowledge graph embedding, fine-grain entity type prediction, and neural language modeling. We show that a language model-inspired knowledge graph embedding approach yields both improved knowledge graph embeddings and fine-grain entity type representations. Our work also shows that jointly modeling both structured knowledge tuples and language improves both.
[2]:RatE: Relation-Adaptive Translating Embedding for Knowledge Graph Completion
标题:RatE:用于知识图谱补全的关系自适应平移嵌入
作者:Hao Huang, Guodong Long, Tao Shen, Jing Jiang, Chengqi Zhang
备注:Accepted to appear at COLING 2020
链接:https://arxiv.org/abs/2010.04863
摘要:Many graph embedding approaches have been proposed for knowledge graph completion via link prediction. Among those, translating embedding approaches enjoy the advantages of light-weight structure, high efficiency and great interpretability. Especially when extended to complex vector space, they show the capability in handling various relation patterns including symmetry, antisymmetry, inversion and composition. However, previous translating embedding approaches defined in complex vector space suffer from two main issues: 1) representing and modeling capacities of the model are limited by the translation function with rigorous multiplication of two complex numbers; and 2) embedding ambiguity caused by one-to-many relations is not explicitly alleviated. In this paper, we propose a relation-adaptive translation function built upon a novel weighted product in complex space, where the weights are learnable, relation-specific and independent of the embedding size. The translation function requires only eight more scalar parameters per relation, but improves expressive power and alleviates the embedding ambiguity problem. Based on the function, we then present our Relation-adaptive translating Embedding (RatE) approach to score each graph triple. Moreover, a novel negative sampling method is proposed to utilize both prior knowledge and self-adversarial learning for effective optimization. Experiments verify that RatE achieves state-of-the-art performance on four link prediction benchmarks.
文本朗读(1篇)
[1]:Improving Low Resource Code-switched ASR using Augmented Code-switched TTS
标题:利用增强的语码转换TTS改进低资源语码转换ASR
作者:Yash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi
备注:Interspeech 2020, 5 pages
链接:https://arxiv.org/abs/2010.05549
摘要:Building Automatic Speech Recognition (ASR) systems for code-switched speech has recently gained renewed attention due to the widespread use of speech technologies in multilingual communities worldwide. End-to-end ASR systems are a natural modeling choice due to their ease of use and superior performance in monolingual settings. However, it is well known that end-to-end systems require large amounts of labeled speech. In this work, we investigate improving code-switched ASR in low resource settings via data augmentation using code-switched text-to-speech (TTS) synthesis. We propose two targeted techniques to effectively leverage TTS speech samples: 1) Mixup, an existing technique to create new training samples via linear interpolation of existing samples, applied to TTS and real speech samples, and 2) a new loss function, used in conjunction with TTS samples, to encourage code-switched predictions. We report significant improvements in ASR performance achieving absolute word error rate (WER) reductions of up to 5%, and measurable improvement in code switching using our proposed techniques on a Hindi-English code-switched ASR task.
推理分析(4篇)
[1]:Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning
标题:回到未来:面向反事实与溯因常识推理的基于反向传播的无监督解码
作者:Lianhui Qin, Vered Shwartz, Peter West, Chandra Bhagavatula, Jena Hwang, Ronan Le Bras, Antoine Bosselut, Yejin Choi
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.05906
摘要:Abductive and counterfactual reasoning, core abilities of everyday human cognition, require reasoning about what might have happened at time t, while conditioning on multiple contexts from the relative past and future. However, simultaneous incorporation of past and future contexts using generative language models (LMs) can be challenging, as they are trained either to condition only on the past context or to perform narrowly scoped text-infilling. In this paper, we propose DeLorean, a new unsupervised decoding algorithm that can flexibly incorporate both the past and future contexts using only off-the-shelf, left-to-right language models and no supervision. The key intuition of our algorithm is incorporating the future through back-propagation, during which, we only update the internal representation of the output while fixing the model parameters. By alternating between forward and backward propagation, DeLorean can decode the output representation that reflects both the left and right contexts. We demonstrate that our approach is general and applicable to two nonmonotonic reasoning tasks: abductive text generation and counterfactual story revision, where DeLorean outperforms a range of unsupervised and some supervised methods, based on automatic and human evaluation.
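The core trick can be pictured with a small sketch: freeze the language model, keep a soft (differentiable) representation of the output span, and run gradient steps so that the frozen model assigns high likelihood to the future context. The snippet below is an illustrative simplification under stated assumptions, not the authors' implementation; TinyLM and all hyperparameters are stand-ins.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyLM(nn.Module):
        """Toy stand-in for a frozen left-to-right language model."""
        def __init__(self, vocab=100, dim=32):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.rnn = nn.GRU(dim, dim, batch_first=True)
            self.head = nn.Linear(dim, vocab)
        def forward_embeds(self, embs):
            h, _ = self.rnn(embs)
            return self.head(h)

    def backprop_decode(lm, past_ids, future_ids, out_len, steps=50, lr=0.5):
        vocab = lm.head.out_features
        for p in lm.parameters():
            p.requires_grad_(False)                   # model weights stay fixed
        y_logits = torch.zeros(1, out_len, vocab, requires_grad=True)
        opt = torch.optim.Adam([y_logits], lr=lr)
        for _ in range(steps):
            # Forward pass: feed past context, soft output, and future context.
            soft_y = F.softmax(y_logits, -1) @ lm.embed.weight    # expected embeddings
            embs = torch.cat([lm.embed(past_ids), soft_y, lm.embed(future_ids)], dim=1)
            logits = lm.forward_embeds(embs)
            # Backward pass: update only the output representation so that the
            # frozen model makes the future tokens likely.
            n_fut = future_ids.size(1)
            pred = logits[:, -n_fut - 1:-1, :]
            loss = F.cross_entropy(pred.reshape(-1, vocab), future_ids.reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()
        return y_logits.argmax(-1)

    lm = TinyLM()
    past = torch.randint(0, 100, (1, 5))
    future = torch.randint(0, 100, (1, 4))
    print(backprop_decode(lm, past, future, out_len=6))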
[2]:Social Commonsense Reasoning with Multi-Head Knowledge Attention
标题:基于多头知识注意力的社会常识推理
作者:Debjit Paul, Anette Frank
备注:Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.05587
摘要:Social Commonsense Reasoning requires understanding of text, knowledge about social events and their pragmatic implications, as well as commonsense reasoning skills. In this work we propose a novel multi-head knowledge attention model that encodes semi-structured commonsense inference rules and learns to incorporate them in a transformer-based reasoning cell. We assess the model's performance on two tasks that require different reasoning skills: Abductive Natural Language Inference and Counterfactual Invariance Prediction as a new task. We show that our proposed model improves performance over strong state-of-the-art models (i.e., RoBERTa) across both reasoning tasks. Notably we are, to the best of our knowledge, the first to demonstrate that a model that learns to perform counterfactual reasoning helps predicting the best explanation in an abductive reasoning task. We validate the robustness of the model's reasoning capabilities by perturbing the knowledge and provide qualitative analysis on the model's knowledge incorporation capabilities.
[3]:OCNLI: Original Chinese Natural Language Inference
标题:OCNLI:原生中文自然语言推理
作者:Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler, Lawrence S. Moss
备注:Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.05444
摘要:Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for most of the world's languages. In this paper, we present the first large-scale NLI dataset (consisting of ~56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI). Unlike recent attempts at extending NLI to other languages, our dataset does not rely on any automatic translation or non-expert annotation. Instead, we elicit annotations from native speakers specializing in linguistics. We follow closely the annotation protocol used for MNLI, but create new strategies for eliciting diverse hypotheses. We establish several baseline results on our dataset using state-of-the-art pre-trained models for Chinese, and find even the best performing models to be far outpaced by human performance (~12% absolute performance gap), making it a challenging new resource that we hope will help to accelerate progress in Chinese NLU. To the best of our knowledge, this is the first human-elicited MNLI-style corpus for a non-English language.
[4]:Beyond Language: Learning Commonsense from Images for Reasoning
标题:超越语言:从图像中学习常识进行推理
作者:Wanqing Cui, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng
备注:Accepted to EMNLP'20 Findings
链接:https://arxiv.org/abs/2010.05001
摘要:This paper proposes a novel approach to learn commonsense from images, instead of limited raw texts or costly constructed knowledge bases, for the commonsense reasoning problem in NLP. Our motivation comes from the fact that an image is worth a thousand words, where richer scene information could be leveraged to help distill the commonsense knowledge, which is often hidden in languages. Our approach, namely Loire, consists of two stages. In the first stage, a bi-modal sequence-to-sequence approach is utilized to conduct the scene layout generation task, based on a text representation model ViBERT. In this way, the required visual scene knowledge, such as spatial relations, will be encoded in ViBERT by the supervised learning process with some bi-modal data like COCO. Then ViBERT is concatenated with a pre-trained language model to perform the downstream commonsense reasoning tasks. Experimental results on two commonsense reasoning problems, i.e. commonsense question answering and pronoun resolution, demonstrate that Loire outperforms traditional language-based methods. We also give some case studies to show what knowledge is learned from images and explain how the generated scene layout helps the commonsense reasoning process.
自然语言生成(10篇)
[1]:Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data
标题:受控幻觉:学习从带噪数据中忠实地生成
作者:Katja Filippova
链接:https://arxiv.org/abs/2010.05873
摘要:Neural text generation (data- or text-to-text) demonstrates remarkable performance when training data is abundant which for many applications is not the case. To collect a large corpus of parallel data, heuristic rules are often used but they inevitably let noise into the data, such as phrases in the output which cannot be explained by the input. Consequently, models pick up on the noise and may hallucinate--generate fluent but unsupported text. Our contribution is a simple but powerful technique to treat such hallucinations as a controllable aspect of the generated text, without dismissing any input and without modifying the model architecture. On the WikiBio corpus (Lebret et al., 2016), a particularly noisy dataset, we demonstrate the efficacy of the technique both in an automatic and in a human evaluation.
[2]:Reformulating Unsupervised Style Transfer as Paraphrase Generation
标题:将无监督风格转换转化为释义生成
作者:Kalpesh Krishna, John Wieting, Mohit Iyyer
备注:EMNLP 2020 camera-ready (26 pages)
链接:https://arxiv.org/abs/2010.05700
摘要:Modern NLP defines the task of style transfer as modifying the style of a given sentence without appreciably changing its semantics, which implies that the outputs of style transfer systems should be paraphrases of their inputs. However, many existing systems purportedly designed for style transfer inherently warp the input's meaning through attribute transfer, which changes semantic properties such as sentiment. In this paper, we reformulate unsupervised style transfer as a paraphrase generation problem, and present a simple methodology based on fine-tuning pretrained language models on automatically generated paraphrase data. Despite its simplicity, our method significantly outperforms state-of-the-art style transfer systems on both human and automatic evaluations. We also survey 23 style transfer papers and discover that existing automatic metrics can be easily gamed and propose fixed variants. Finally, we pivot to a more real-world style transfer setting by collecting a large dataset of 15M sentences in 11 diverse styles, which we use for an in-depth analysis of our system.
[3]:Meta-Context Transformers for Domain-Specific Response Generation
标题:面向特定领域回复生成的元上下文Transformer
作者:Debanjana Kar, Suranjana Samanta, Amar Prakash Azad
备注:7+2 pages, 6 figures, 4 tables
链接:https://arxiv.org/abs/2010.05572
摘要:Despite the tremendous success of neural dialogue models in recent years, they still suffer from a lack of relevance, diversity, and sometimes coherence in generated responses. Lately, transformer-based models, such as GPT-2, have revolutionized the landscape of dialogue generation by capturing the long-range structures through language modeling. Though these models have exhibited excellent language coherence, they often lack relevance and terms when used for domain-specific response generation. In this paper, we present DSRNet (Domain Specific Response Network), a transformer-based model for dialogue response generation by reinforcing domain-specific attributes. In particular, we extract meta attributes from context and infuse them with the context utterances for better attention over domain-specific key terms and relevance. We study the use of DSRNet in a multi-turn multi-interlocutor environment for domain-specific response generation. In our experiments, we evaluate DSRNet on the Ubuntu dialogue datasets, which are mainly composed of technical-domain dialogues for IT issue resolution, and also on the CamRest676 dataset, which contains restaurant domain conversations. Trained with a maximum likelihood objective, our model shows significant improvement over the state-of-the-art for multi-turn dialogue systems supported by better BLEU and semantic similarity (BertScore) scores. Besides, we also observe that the responses produced by our model carry higher relevance due to the presence of domain-specific key attributes that exhibit better overlap with the attributes of the context. Our analysis shows that the performance improvement is mostly due to the infusion of key terms along with dialogues, which results in better attention over domain-relevant terms. Other contributing factors include joint modeling of dialogue context with the domain-specific meta attributes and topics.
[4]:Toward Cross-Lingual Definition Generation for Language Learners
标题:面向语言学习者的跨语言定义生成
作者:Cunliang Kong, Liner Yang, Tianzuo Zhang, Qinan Fan, Zhenghao Liu, Yun Chen, Erhong Yang
链接:https://arxiv.org/abs/2010.05533
摘要:Generating dictionary definitions automatically can prove useful for language learners. However, cross-lingual definition generation remains a challenging task. In this work, we propose to generate definitions in English for words in various languages. To achieve this, we present a simple yet effective approach based on publicly available pretrained language models. In this approach, models can be directly applied to other languages after being trained on the English dataset. We demonstrate the effectiveness of this approach on zero-shot definition generation. Experiments and manual analyses on newly constructed datasets show that our models have a strong cross-lingual transfer ability and can generate fluent English definitions for Chinese words. We further measure the lexical complexity of generated and reference definitions. The results show that the generated definitions are much simpler, making them more suitable for language learners.
[5]:Evaluating Factuality in Generation with Dependency-level Entailment
标题:用依存级蕴涵评价生成文本的事实性
作者:Tanya Goyal, Greg Durrett
备注:Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.05478
摘要:Despite significant progress in text generation models, a serious limitation is their tendency to produce text that is factually inconsistent with information in the input. Recent work has studied whether textual entailment systems can be used to identify factual errors; however, these sentence-level entailment models are trained to solve a different problem than generation filtering and they do not localize which part of a generation is non-factual. In this paper, we propose a new formulation of entailment that decomposes it at the level of dependency arcs. Rather than focusing on aggregate decisions, we instead ask whether the semantic relationship manifested by individual dependency arcs in the generated output is supported by the input. Human judgments on this task are difficult to obtain; we therefore propose a method to automatically create data based on existing entailment or paraphrase corpora. Experiments show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods or those based on question generation, while additionally localizing the erroneous parts of the generation.
[6]:VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles
标题:VMSMO:学习为基于视频的新闻文章生成多模式摘要
作者:Mingzhe Li, Xiuying Chen, Shen Gao, Zhangming Chan, Dongyan Zhao, Rui Yan
备注:Accepted by The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
链接:https://arxiv.org/abs/2010.05406
摘要:A popular multimedia news format nowadays is providing users with a lively video and a corresponding news article, which is employed by influential news media including CNN, BBC, and social media including Twitter and Weibo. In such a case, automatically choosing a proper cover frame of the video and generating an appropriate textual summary of the article can help editors save time, and readers make the decision more effectively. Hence, in this paper, we propose the task of Video-based Multimodal Summarization with Multimodal Output (VMSMO) to tackle such a problem. The main challenge in this task is to jointly model the temporal dependency of video with semantic meaning of article. To this end, we propose a Dual-Interaction-based Multimodal Summarizer (DIMS), consisting of a dual interaction module and multimodal generator. In the dual interaction module, we propose a conditional self-attention mechanism that captures local semantic information within video and a global-attention mechanism that handles the semantic relationship between news text and video from a high level. Extensive experiments conducted on a large-scale real-world VMSMO dataset show that DIMS achieves the state-of-the-art performance in terms of both automatic metrics and human evaluations.
[7]:A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies
标题:基于BERT的干扰项生成方案:多任务与负答案训练策略
作者:Ho-Lam Chung, Ying-Hong Chan, Yao-Chung Fan
备注:Accepted by EMNLP2020 Findings
链接:https://arxiv.org/abs/2010.05384
摘要:In this paper, we investigate the following two limitations of the existing distractor generation (DG) methods. First, the quality of the existing DG methods is still far from practical use. There is still room for DG quality improvement. Second, the existing DG designs are mainly for single distractor generation. However, for practical MCQ preparation, multiple distractors are desired. Aiming at these goals, in this paper, we present a new distractor generation scheme with multi-tasking and negative answer training strategies for effectively generating multiple distractors. The experimental results show that (1) our model advances the state-of-the-art result from 28.65 to 39.81 (BLEU 1 score) and (2) the generated multiple distractors are diverse and show strong distracting power for multiple-choice questions.
[8]:Controllable Multi-Character Psychology-Oriented Story Generation
标题:可控多人物心理导向的故事生成
作者:Feifei Xu, Xinpeng Wang, Yunpu Ma, Volker Tresp, Yuyi Wang, Shanlin Zhou, Haizhou Du
备注:Accepted by CIKM2020
链接:https://arxiv.org/abs/2010.05230
摘要:Story generation, which aims to generate a long and coherent story automatically based on the title or an input sentence, is an important research area in the field of natural language generation. There is relatively little work on story generation with assigned emotions. Most existing works focus on using only one specific emotion to control the generation of a whole story and ignore the emotional changes in the characters in the course of the story. In our work, we aim to design an emotional line for each character that considers multiple emotions common in psychological theories, with the goal of generating stories with richer emotional changes in the characters. To the best of our knowledge, this work is the first to focus on characters' emotional lines in story generation. We present a novel model-based attention mechanism that we call SoCP (Storytelling of multi-Character Psychology). We show that the proposed model can generate stories considering the changes in the psychological state of different characters. To take into account the particularity of the model, in addition to commonly used evaluation indicators (BLEU, ROUGE, etc.), we introduce the accuracy rate of psychological state control as a novel evaluation metric. The new indicator reflects the effect of the model on the psychological state control of story characters. Experiments show that with SoCP, the generated stories follow the psychological state for each character according to both automatic and human evaluations.
[9]:Cue-word Driven Neural Response Generation with a Shrinking Vocabulary
标题:基于收缩词表的线索词驱动神经回复生成
作者:Qiansheng Wang, Yuxin Liu, Chengguo Lv, Zhen Wang, Guohong Fu
链接:https://arxiv.org/abs/2010.04927
摘要:Open-domain response generation is the task of generating sensible and informative responses to the source sentence. However, neural models tend to generate safe and meaningless responses. While cue-word introducing approaches encourage responses with concrete semantics and have shown tremendous potential, they still fail to explore diverse responses during decoding. In this paper, we propose a novel but natural approach that can produce multiple cue-words during decoding, and then uses the produced cue-words to drive decoding and shrinks the decoding vocabulary. Thus the neural generation model can explore the full space of responses and discover informative ones with efficiency. Experimental results show that our approach significantly outperforms several strong baseline models with much lower decoding complexity. Especially, our approach can converge to concrete semantics more efficiently during decoding.
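As a rough illustration of vocabulary shrinking during decoding (the concrete shrinking rule and the stand-in decoder output below are invented for the example, not taken from the paper), one can mask the decoder's logits so that only a currently allowed candidate set can be emitted:

    import torch

    def restrict_logits(logits, allowed_ids):
        """Mask every token outside the allowed candidate set before picking the next token."""
        mask = torch.full_like(logits, float("-inf"))
        mask[..., allowed_ids] = 0.0
        return logits + mask

    vocab_size = 10
    allowed = list(range(vocab_size))          # start from the full vocabulary
    for step in range(3):
        logits = torch.randn(1, vocab_size)    # stand-in for the decoder's output logits
        next_id = restrict_logits(logits, allowed).argmax(-1).item()
        allowed = [i for i in allowed if i >= next_id]   # toy shrinking rule for illustration
        print(step, next_id, allowed)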
[10]:Data Agnostic RoBERTa-based Natural Language to SQL Query Generation
标题:基于数据无关RoBERTa的自然语言到SQL查询生成
作者:Debaditya Pal, Harsh Sharma, Kaustubh Chaudhari
备注:8 Pages, 2 figures. Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
链接:https://arxiv.org/abs/2010.05243
摘要:Relational databases are among the most widely used architectures to store massive amounts of data in the modern world. However, there is a barrier between these databases and the average user. The user often lacks the knowledge of a query language such as SQL required to interact with the database. The NL2SQL task aims at finding deep learning approaches to solve this problem by converting natural language questions into valid SQL queries. Given the sensitive nature of some databases and the growing need for data privacy, we have presented an approach with data privacy at its core. We have passed RoBERTa embeddings and data-agnostic knowledge vectors into LSTM based submodels to predict the final query. Although we have not achieved state of the art results, we have eliminated the need for the table data, right from the training of the model, and have achieved a test set execution accuracy of 76.7%. By eliminating the table data dependency while training we have created a model capable of zero shot learning based on the natural language question and table schema alone.
文本分类(2篇)
[1]:Adversarial Self-Supervised Data-Free Distillation for Text Classification
标题:文本分类的对抗性自监督无数据蒸馏
作者:Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu
备注:11 pages, 5 figures, Accepted to EMNLP2020
链接:https://arxiv.org/abs/2010.04883
摘要:Large pre-trained transformer-based language models have achieved impressive results on a wide range of NLP tasks. In the past few years, Knowledge Distillation(KD) has become a popular paradigm to compress a computationally expensive model to a resource-efficient lightweight model. However, most KD algorithms, especially in NLP, rely on the accessibility of the original training dataset, which may be unavailable due to privacy issues. To tackle this problem, we propose a novel two-stage data-free distillation method, named Adversarial self-Supervised Data-Free Distillation (AS-DFD), which is designed for compressing large-scale transformer-based models (e.g., BERT). To avoid text generation in discrete space, we introduce a Plug & Play Embedding Guessing method to craft pseudo embeddings from the teacher's hidden knowledge. Meanwhile, with a self-supervised module to quantify the student's ability, we adapt the difficulty of pseudo embeddings in an adversarial training manner. To the best of our knowledge, our framework is the first data-free distillation framework designed for NLP tasks. We verify the effectiveness of our method on several text classification datasets.
[2]:End to End Binarized Neural Networks for Text Classification
标题:用于文本分类的端到端二值化神经网络
作者:Harshil Jain, Akshat Agarwal, Kumar Shridhar, Denis Kleyko
备注:14 pages. Accepted at the SustaiNLP Workshop on Simple and Efficient Natural Language Processing at EMNLP 2020
链接:https://arxiv.org/abs/2010.05223
摘要:Deep neural networks have demonstrated their superior performance in almost every Natural Language Processing task; however, their increasing complexity raises concerns. In particular, these networks require expensive computational hardware, and the training budget is a concern for many. Even for a trained network, the inference phase can be too demanding for resource-constrained devices, thus limiting its applicability. The state-of-the-art transformer models are a vivid example. Simplifying the computations performed by a network is one way of relaxing the complexity requirements. In this paper, we propose an end-to-end binarized neural network architecture for the intent classification task. In order to fully utilize the potential of end-to-end binarization, both input representations (vector embeddings of token statistics) and the classifier are binarized. We demonstrate the efficiency of such an architecture on the intent classification of short texts over three datasets and for text classification with a larger dataset. The proposed architecture achieves results comparable to the state of the art on standard intent classification datasets while using roughly 20-40% less memory and training time. Furthermore, the individual components of the architecture, such as binarized vector embeddings of documents or binarized classifiers, can be used separately in architectures that are not necessarily fully binary.
信息抽取(7篇)
[1]:Feature Extraction of Text for Deep Learning Algorithms: Application on Fake News Detection
标题:面向深度学习的文本特征提取算法在虚假新闻检测中的应用
作者:HyeonJun Kim
备注:8 pages
链接:https://arxiv.org/abs/2010.05496
摘要:Feature extraction is an important process in machine learning and even deep learning, as it makes algorithms function more efficiently and more accurately. In natural language processing for deception detection, such as fake news detection, several statistical approaches to feature extraction have been introduced (e.g. N-gram). In this research, it is shown that deep learning algorithms applied to the alphabet frequencies of the original text of a news article, without any information about the sequence of the alphabet, can be used to classify fake news and trustworthy news with high accuracy (85%). As this pre-processing method makes the data notably compact while retaining the features needed by the classifier, it seems that alphabet frequencies contain some useful features for understanding the complex context or meaning of the original text.
[2]:InfoMiner at WNUT-2020 Task 2: Transformer-based Covid-19 Informative Tweet Extraction
标题:InfoMiner在WNUT-2020任务2:基于Transformer的Covid-19信息性推文抽取
作者:Hansi Hettiarachchi, Tharindu Ranasinghe
备注:Accepted to the 6th Workshop on Noisy User-generated Text (W-NUT) at EMNLP 2020
链接:https://arxiv.org/abs/2010.05327
摘要:Identifying informative tweets is an important step when building information extraction systems based on social media. WNUT-2020 Task 2 was organised to recognise informative tweets from noisy tweets. In this paper, we present our approach to tackle the task objective using transformers. Overall, our approach achieves 10th place in the final rankings, with an F1 score of 0.9004 on the test set.
[3]:Weakly Supervised Medication Regimen Extraction from Medical Conversations
标题:基于弱监督方法从医疗对话中抽取用药方案
作者:Dhruvesh Patel, Sandeep Konam, Sai P. Selvaraj
备注:To appear in the Proceedings of the Clinical Natural Language Processing Workshop, EMNLP, 2020
链接:https://arxiv.org/abs/2010.05317
摘要:Automated Medication Regimen (MR) extraction from medical conversations can not only improve recall and help patients follow through with their care plan, but also reduce the documentation burden for doctors. In this paper, we focus on extracting spans for frequency, route and change, corresponding to medications discussed in the conversation. We first describe a unique dataset of annotated doctor-patient conversations and then present a weakly supervised model architecture that can perform span extraction using noisy classification data. The model utilizes an attention bottleneck inside a classification model to perform the extraction. We experiment with several variants of attention scoring and projection functions and propose a novel transformer-based attention scoring function (TAScore). The proposed combination of TAScore and Fusedmax projection achieves a 10 point increase in Longest Common Substring F1 compared to the baseline of additive scoring plus softmax projection.
[4]:Hierarchical Evidence Set Modeling for Automated Fact Extraction and Verification
标题:面向自动事实提取与验证的层次证据集建模
作者:Shyam Subramanian, Kyumin Lee
备注:12 pages, 7 figures. Accepted to EMNLP 2020
链接:https://arxiv.org/abs/2010.05111
摘要:Automated fact extraction and verification is a challenging task that involves finding relevant evidence sentences from a reliable corpus to verify the truthfulness of a claim. Existing models either (i) concatenate all the evidence sentences, leading to the inclusion of redundant and noisy information; or (ii) process each claim-evidence sentence pair separately and aggregate all of them later, missing the early combination of related sentences for more accurate claim verification. Unlike the prior works, in this paper, we propose Hierarchical Evidence Set Modeling (HESM), a framework to extract evidence sets (each of which may contain multiple evidence sentences), and verify a claim to be supported, refuted or not enough info, by encoding and attending the claim and evidence sets at different levels of hierarchy. Our experimental results show that HESM outperforms 7 state-of-the-art methods for fact extraction and claim verification. Our source code is available at this https URL.
[5]:Information Extraction from Swedish Medical Prescriptions with Sig-Transformer Encoder
标题:利用Sig变换器编码器从瑞典医学处方中提取信息
作者:John Pougue Biyong, Bo Wang, Terry Lyons, Alejo J Nevado-Holgado
链接:https://arxiv.org/abs/2010.04897
摘要:Relying on large pretrained language models such as Bidirectional Encoder Representations from Transformers (BERT) for encoding and adding a simple prediction layer has led to impressive performance in many clinical natural language processing (NLP) tasks. In this work, we present a novel extension to the Transformer architecture, by incorporating signature transform with the self-attention model. This architecture is added between embedding and prediction layers. Experiments on a new Swedish prescription data show the proposed architecture to be superior in two of the three information extraction tasks, comparing to baseline models. Finally, we evaluate two different embedding approaches between applying Multilingual BERT and translating the Swedish text to English then encode with a BERT model pretrained on clinical notes.
[6]:Relation Extraction as Two-way Span-Prediction
标题:基于双向跨度预测的关系抽取
作者:Amir DN Cohen, Shachar Rosenman, Yoav Goldberg
链接:https://arxiv.org/abs/2010.04829
摘要:The current supervised relation classification (RC) task uses a single embedding to represent the relation between a pair of entities. We argue that a better approach is to treat the RC task as a Question answering (QA) like span prediction problem. We present a span-prediction based system for RC and evaluate its performance compared to the embedding based system. We achieve state-of-the-art results on the TACRED and SemEval task 8 datasets.
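A hedged sketch of the recasting itself (the question templates and field names below are hypothetical, not taken from the paper): each candidate relation is verbalized in both directions, and a span-prediction reader is then asked to recover the annotated entities.

    def to_qa_examples(text, subj, obj, relation_templates):
        """Turn one relation-classification instance into QA-style span-prediction examples."""
        examples = []
        for rel, (q_for_subj, q_for_obj) in relation_templates.items():
            examples.append({"relation": rel, "question": q_for_obj.format(subj=subj),
                             "context": text, "answer": obj})
            examples.append({"relation": rel, "question": q_for_subj.format(obj=obj),
                             "context": text, "answer": subj})
        return examples

    templates = {  # hypothetical verbalizations, one pair per relation
        "per:employee_of": ("Who works for {obj}?",
                            "Which organization does {subj} work for?"),
    }
    for ex in to_qa_examples("Alice joined Acme in 2019.", "Alice", "Acme", templates):
        print(ex["relation"], "|", ex["question"], "->", ex["answer"])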
[7]:Extracting Angina Symptoms from Clinical Notes Using Pre-Trained Transformer Architectures
标题:使用预训练Transformer架构从临床记录中提取心绞痛症状
作者:Aaron S. Eisman, Nishant R. Shah, Carsten Eickhoff, George Zerveas, Elizabeth S. Chen, Wen-Chih Wu, Indra Neil Sarkar
链接:https://arxiv.org/abs/2010.05757
摘要:Anginal symptoms can connote increased cardiac risk and a need for change in cardiovascular management. This study evaluated the potential to extract these symptoms from physician notes using the Bidirectional Encoder from Transformers language model fine-tuned on a domain-specific corpus. The history of present illness section of 459 expert-annotated primary care physician notes from consecutive patients referred for cardiac testing without known atherosclerotic cardiovascular disease was included. Notes were annotated for positive and negative mentions of chest pain and shortness of breath characterization. The results demonstrate high sensitivity and specificity for the detection of chest pain or discomfort, substernal chest pain, shortness of breath, and dyspnea on exertion. The small sample size limited the extraction of factors related to provocation and palliation of chest pain. This study provides a promising starting point for the natural language processing of physician notes to characterize clinically actionable anginal symptoms.
问答系统(4篇)
[1]:Counterfactual Variable Control for Robust and Interpretable Question Answering
标题:鲁棒可解释问答的反事实变量控制
作者:Sicheng Yu, Yulei Niu, Shuohang Wang, Jing Jiang, Qianru Sun
链接:https://arxiv.org/abs/2010.05581
摘要:Deep neural network based question answering (QA) models are neither robust nor explainable in many cases. For example, a multiple-choice QA model, tested without any question input, is surprisingly "capable" of predicting most of the correct options. In this paper, we inspect such spurious "capability" of QA models using causal inference. We find the crux is the shortcut correlation, e.g., unrobust word alignment between passage and options learned by the models. We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation and preserves the comprehensive reasoning for robust QA. Specifically, we leverage a multi-branch architecture that allows us to disentangle robust and shortcut correlations in the training process of QA. We then conduct two novel CVC inference methods (on trained models) to capture the effect of comprehensive reasoning as the final prediction. For evaluation, we conduct extensive experiments using two BERT backbones on both multi-choice and span-extraction QA benchmarks. The results show that our CVC achieves high robustness against a variety of adversarial attacks in QA while maintaining good interpretation ability.
[2]:Localizing Open-Ontology QA Semantic Parsers in a Day Using Machine Translation
标题:利用机器翻译在一天内实现开放本体QA语义解析器的本地化
作者:Mehrad Moradshahi, Giovanni Campagna, Sina J. Semnani, Silei Xu, Monica S. Lam
备注:Published in EMNLP 2020
链接:https://arxiv.org/abs/2010.05106
摘要:We propose Semantic Parser Localizer (SPL), a toolkit that leverages Neural Machine Translation (NMT) systems to localize a semantic parser for a new language. Our methodology is to (1) generate training data automatically in the target language by augmenting machine-translated datasets with local entities scraped from public websites, (2) add a few-shot boost of human-translated sentences and train a novel XLMR-LSTM semantic parser, and (3) test the model on natural utterances curated using human translators.
We assess the effectiveness of our approach by extending the current capabilities of Schema2QA, a system for English Question Answering (QA) on the open web, to 10 new languages for the restaurants and hotels domains. Our models achieve an overall test accuracy ranging between 61% and 69% for the hotels domain and between 64% and 78% for restaurants domain, which compares favorably to 69% and 80% obtained for English parser trained on gold English data and a few examples from validation set. We show our approach outperforms the previous state-of-the-art methodology by more than 30% for hotels and 40% for restaurants with localized ontologies for the subset of languages tested.
Our methodology enables any software developer to add a new language capability to a QA system for a new domain, leveraging machine translation, in less than 24 hours.
[3]:AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data
标题:AutoQA:从数据库到只有合成训练数据的QA语义解析器
作者:Silei Xu, Sina J. Semnani, Giovanni Campagna, Monica S. Lam
备注:To appear in EMNLP 2020
链接:https://arxiv.org/abs/2010.04806
摘要:We propose AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Given a database schema and its data, AutoQA automatically generates a large set of high-quality questions for training that covers different database operations. It uses automatic paraphrasing combined with template-based parsing to find alternative expressions of an attribute in different parts of speech. It also uses a novel filtered auto-paraphraser to generate correct paraphrases of entire sentences. We apply AutoQA to the Schema2QA dataset and obtain an average logical form accuracy of 62.9% when tested on natural questions, which is only 6.4% lower than a model trained with expert natural language annotations and paraphrase data collected from crowdworkers. To demonstrate the generality of AutoQA, we also apply it to the Overnight dataset. AutoQA achieves 69.8% answer accuracy, 16.4% higher than the state-of-the-art zero-shot models and only 5.2% lower than the same model trained with human data.
[4]:Open-Domain Question Answering Goes Conversational via Question Rewriting
标题:开放域问答通过问题重写进行会话
作者:Raviteja Anantha, Svitlana Vakulenko, Zhucheng Tu, Shayne Longpre, Stephen Pulman, Srinivas Chappidi
备注:15 pages, 10 tables, 3 figures
链接:https://arxiv.org/abs/2010.04898
摘要:We introduce a new dataset for Question Rewriting in Conversational Context (QReCC), which contains 14K conversations with 81K question-answer pairs. The task in QReCC is to find answers to conversational questions within a collection of 10M web pages (split into 54M passages). Answers to questions in the same conversation may be distributed across several web pages. QReCC provides annotations that allow us to train and evaluate individual subtasks of question rewriting, passage retrieval and reading comprehension required for the end-to-end conversational question answering (QA) task. We report the effectiveness of a strong baseline approach that combines the state-of-the-art model for question rewriting, and competitive models for open-domain QA. Our results set the first baseline for the QReCC dataset with F1 of 19.07, compared to the human upper bound of 74.47, indicating the difficulty of the setup and a large room for improvement.
机器翻译(11篇)
[1]:Controllable Paraphrasing and Translation with a Syntactic Exemplar
标题:基于句法范例的可控释义与翻译
作者:Mingda Chen, Sam Wiseman, Kevin Gimpel
链接:https://arxiv.org/abs/2010.05856
摘要:Most prior work on exemplar-based syntactically controlled paraphrase generation relies on automatically-constructed large-scale paraphrase datasets. We sidestep this prerequisite by adapting models from prior work to be able to learn solely from bilingual text (bitext). Despite only using bitext for training, and in near zero-shot conditions, our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions. To evaluate these tasks quantitatively, we create three novel evaluation datasets. Our experimental results show that our models achieve competitive results on controlled paraphrase generation and strong performance on controlled machine translation. Analysis shows that our models learn to disentangle semantics and syntax in their latent representations.
[2]:Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation
标题:集体智慧:利用自适应知识精馏改进低资源神经机器翻译
作者:Fahimeh Saleh, Wray Buntine, Gholamreza Haffari
链接:https://arxiv.org/abs/2010.05445
摘要:Scarcity of parallel sentence-pairs poses a significant hurdle for training high-quality Neural Machine Translation (NMT) models in bilingually low-resource scenarios. A standard approach is transfer learning, which involves taking a model trained on a high-resource language-pair and fine-tuning it on the data of the low-resource MT condition of interest. However, it is not clear generally which high-resource language-pair offers the best transfer learning for the target MT setting. Furthermore, different transferred models may have complementary semantic and/or syntactic strengths, hence using only one model may be sub-optimal. In this paper, we tackle this problem using knowledge distillation, where we propose to distill the knowledge of ensemble of teacher models to a single student model. As the quality of these teacher models varies, we propose an effective adaptive knowledge distillation approach to dynamically adjust the contribution of the teacher models during the distillation process. Experiments on transferring from a collection of six language pairs from IWSLT to five low-resource language-pairs from TED Talks demonstrate the effectiveness of our approach, achieving up to +0.9 BLEU score improvement compared to strong baselines.
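The adaptive weighting can be sketched as follows; the specific rule used here (a softmax over negative teacher losses, so that teachers that explain the reference better contribute more) is an illustrative assumption, not the paper's exact scheme.

    import torch
    import torch.nn.functional as F

    def adaptive_distill_loss(student_logits, teacher_logits_list, target_ids, T=2.0):
        """KL distillation from several teachers with adaptively weighted contributions."""
        vocab = student_logits.size(-1)
        teacher_nll = torch.stack([
            F.cross_entropy(t.reshape(-1, vocab), target_ids.reshape(-1))
            for t in teacher_logits_list])
        weights = F.softmax(-teacher_nll, dim=0)        # better teacher -> larger weight
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        loss = 0.0
        for w, t in zip(weights, teacher_logits_list):
            p_teacher = F.softmax(t / T, dim=-1)
            loss = loss + w * F.kl_div(log_p_student, p_teacher, reduction="batchmean")
        return loss * T * T

    B, L, V = 2, 5, 50
    student = torch.randn(B, L, V, requires_grad=True)
    teachers = [torch.randn(B, L, V) for _ in range(3)]
    targets = torch.randint(0, V, (B, L))
    print(adaptive_distill_loss(student, teachers, targets))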
[3]:It's not a Non-Issue: Negation as a Source of Error in Machine Translation
标题:这并非无关紧要:否定是机器翻译中的错误来源
作者:Md Mosharaf Hossain, Antonios Anastasopoulos, Eduardo Blanco, Alexis Palmer
备注:Accepted at the Findings of EMNLP2020
链接:https://arxiv.org/abs/2010.05432
摘要:As machine translation (MT) systems progress at a rapid pace, questions of their adequacy linger. In this study we focus on negation, a universal, core property of human language that significantly affects the semantics of an utterance. We investigate whether translating negation is an issue for modern MT systems using 17 translation directions as test bed. Through thorough analysis, we find that indeed the presence of negation can significantly impact downstream quality, in some cases resulting in quality reductions of more than 60%. We also provide a linguistically motivated analysis that directly explains the majority of our findings. We release our annotations and code to replicate our analysis here: this https URL.
[4]:Addressing Exposure Bias With Document Minimum Risk Training: Cambridge at the WMT20 Biomedical Translation Task
标题:用文档级最小风险训练解决暴露偏差:剑桥大学参加WMT20生物医学翻译任务
作者:Danielle Saunders, Bill Byrne
备注:WMT20 biomedical task
链接:https://arxiv.org/abs/2010.05333
摘要:The 2020 WMT Biomedical translation task evaluated Medline abstract translations. This is a small-domain translation task, meaning limited relevant training data with very distinct style and vocabulary. Models trained on such data are susceptible to exposure bias effects, particularly when training sentence pairs are imperfect translations of each other. This can result in poor behaviour during inference if the model learns to neglect the source sentence.
The UNICAM entry addresses this problem during fine-tuning using a robust variant on Minimum Risk Training. We contrast this approach with data-filtering to remove `problem' training examples. Under MRT fine-tuning we obtain good results for both directions of English-German and English-Spanish biomedical translation. In particular we achieve the best English-to-Spanish translation result and second-best Spanish-to-English result, despite using only single models with no ensembling.
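For readers unfamiliar with Minimum Risk Training, the sentence-level objective can be summarized in a few lines: sample a small set of candidate translations, renormalize their model probabilities with a smoothing exponent, and minimize the expected risk (e.g., 1 minus sentence-level BLEU). This is a generic MRT sketch, not the UNICAM document-level variant.

    import torch

    def mrt_loss(candidate_logprobs, candidate_risks, alpha=0.005):
        """Expected risk over sampled candidates, with q(y) proportional to p(y)**alpha."""
        q = torch.softmax(alpha * candidate_logprobs, dim=0)
        return (q * candidate_risks).sum()

    logprobs = torch.tensor([-12.3, -14.1, -13.0], requires_grad=True)  # 3 sampled outputs
    risks = torch.tensor([0.35, 0.80, 0.50])                            # 1 - sentBLEU
    loss = mrt_loss(logprobs, risks)
    loss.backward()
    print(loss.item(), logprobs.grad)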
[5]:Neural Machine Translation Doesn't Translate Gender Coreference Right Unless You Make It
标题:神经机器翻译无法正确翻译性别共指,除非你让它这样做
作者:Danielle Saunders, Rosie Sallis, Bill Byrne
备注:Workshop on Gender Bias in NLP, 2020
链接:https://arxiv.org/abs/2010.05332
摘要:Neural Machine Translation (NMT) has been shown to struggle with grammatical gender that is dependent on the gender of human referents, which can cause gender bias effects. Many existing approaches to this problem seek to control gender inflection in the target language by explicitly or implicitly adding a gender feature to the source sentence, usually at the sentence level.
In this paper we propose schemes for incorporating explicit word-level gender inflection tags into NMT. We explore the potential of this gender-inflection controlled translation when the gender feature can be determined from a human reference, assessing on English-to-Spanish and English-to-German translation.
We find that simple existing approaches can over-generalize a gender-feature to multiple entities in a sentence, and suggest an effective alternative in the form of tagged coreference adaptation data. We also propose an extension to assess translations of gender-neutral entities from English given a corresponding linguistic convention in the inflected target language.
[6]:Machine Translation of Mathematical Text
标题:数学文本的机器翻译
作者:Aditya Ohri, Tanya Schmah
备注:14 pages, 2 figures
链接:https://arxiv.org/abs/2010.05229
摘要:We have implemented a machine translation system, the PolyMath Translator, for LaTeX documents containing mathematical text. The current implementation translates English LaTeX to French LaTeX, attaining a BLEU score of 53.5 on a held-out test corpus of mathematical sentences. It produces LaTeX documents that can be compiled to PDF without further editing. The system first converts the body of an input LaTeX document into English sentences containing math tokens, using the pandoc universal document converter to parse LaTeX input. We have trained a Transformer-based translator model, using OpenNMT, on a combined corpus containing a small proportion of domain-specific sentences. Our full system uses both this Transformer model and Google Translate, the latter being used as a backup to better handle linguistic features that do not appear in our training dataset. If the Transformer model does not have confidence in its translation, as determined by a high perplexity score, then we use Google Translate with a custom glossary. This backup was used 26% of the time on our test corpus of mathematical sentences. The PolyMath Translator is available as a web service at this http URL.
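The confidence-based fallback amounts to a simple routing rule. The sketch below only illustrates that control flow under stated assumptions: `primary` is assumed to return a translation together with a perplexity score, `backup` stands in for the secondary service with a custom glossary, and the threshold value is arbitrary.

    def translate_with_fallback(sentence, primary, backup, max_perplexity=20.0):
        """Use the primary model unless its perplexity signals low confidence."""
        translation, perplexity = primary(sentence)
        if perplexity > max_perplexity:
            return backup(sentence), "backup"
        return translation, "primary"

    # toy stand-ins for the two systems
    primary = lambda s: (s.upper(), 35.0)
    backup = lambda s: s[::-1]
    print(translate_with_fallback("sin(x) est continue", primary, backup))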
[7]:Lexically Cohesive Neural Machine Translation with Copy Mechanism
标题:具有复制机制的词汇衔接神经机器翻译
作者:Vipul Mishra, Chenhui Chu, Yuki Arase
链接:https://arxiv.org/abs/2010.05193
摘要:Lexically cohesive translations preserve consistency in word choices in document-level translation. We employ a copy mechanism into a context-aware neural machine translation model to allow copying words from previous translation outputs. Different from previous context-aware neural machine translation models that handle all the discourse phenomena implicitly, our model explicitly addresses the lexical cohesion problem by boosting the probabilities to output words consistently. We conduct experiments on Japanese to English translation using an evaluation dataset for discourse translation. The results showed that the proposed model significantly improved lexical cohesion compared to previous context-aware models.
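A generic copy-mechanism mixture (pointer-generator style) looks like the sketch below: the final distribution blends the decoder's vocabulary distribution with a copy distribution obtained by scattering attention weights onto the ids of the tokens that may be copied (here, previously generated context). This is a textbook formulation, not the paper's exact model.

    import torch

    def copy_distribution(vocab_logits, attn_weights, copy_ids, p_gen):
        """vocab_logits: (B, V); attn_weights, copy_ids: (B, L); p_gen: (B, 1)."""
        p_vocab = torch.softmax(vocab_logits, dim=-1)
        p_copy = torch.zeros_like(p_vocab).scatter_add(1, copy_ids, attn_weights)
        return p_gen * p_vocab + (1.0 - p_gen) * p_copy

    B, V, L = 1, 20, 4
    out = copy_distribution(torch.randn(B, V),
                            torch.softmax(torch.randn(B, L), -1),
                            torch.randint(0, V, (B, L)),
                            torch.full((B, 1), 0.7))
    print(out.sum())   # sums to ~1.0, a valid distribution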
[8]:SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task
标题:SJTU-NICT的WMT20新闻翻译任务监督和非监督神经机器翻译系统
作者:Zuchao Li, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita
备注:WMT20
链接:https://arxiv.org/abs/2010.05122
摘要:In this paper, we introduce our joint team SJTU-NICT's participation in the WMT 2020 machine translation shared task. In this shared task, we participated in four translation directions of three language pairs: English-Chinese and English-Polish on the supervised machine translation track, and German-Upper Sorbian on the low-resource and unsupervised machine translation tracks. Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques: document-enhanced NMT, XLM pre-trained language model enhanced NMT, bidirectional translation as a pre-training, reference language based UNMT, data-dependent gaussian prior objective, and BT-BLEU collaborative filtering self-training. We also used the TF-IDF algorithm to filter the training set to obtain a set whose domain is more similar to that of the test set for fine-tuning. In our submissions, the primary systems won first place on the English to Chinese, Polish to English, and German to Upper Sorbian translation directions.
[9]:Zero-Shot Translation Quality Estimation with Explicit Cross-Lingual Patterns
标题:基于显式跨语言模式的零样本翻译质量评估
作者:Lei Zhou, Liang Ding, Koichi Takeda
备注:To appear in WMT2020
链接:https://arxiv.org/abs/2010.04989
摘要:This paper describes our submission to the WMT 2020 Shared Task on Sentence Level Direct Assessment, Quality Estimation (QE). In this study, we empirically reveal the mismatching issue when directly adopting BERTScore for QE. Specifically, there are many mismatching errors between the source sentence and translated candidate sentence under token pairwise similarity. In response to this issue, we propose to expose explicit cross-lingual patterns, e.g., word alignments and generation score, to our proposed zero-shot models. Experiments show that our proposed QE model with explicit cross-lingual patterns could alleviate the mismatching issue, thereby improving the performance. Encouragingly, our zero-shot QE method achieves performance comparable to the supervised QE method, and even outperforms the supervised counterpart on 2 out of 6 directions. We expect our work to shed light on zero-shot QE model improvement.
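The token pairwise similarity at the heart of BERTScore, whose greedy matching is where the mismatching errors arise, can be written compactly; the sketch below uses random embeddings and omits the word-alignment and generation-score features the paper adds.

    import torch
    import torch.nn.functional as F

    def greedy_match_f1(src_emb, hyp_emb):
        """BERTScore-style greedy matching over token embeddings (one row per token)."""
        src = F.normalize(src_emb, dim=-1)
        hyp = F.normalize(hyp_emb, dim=-1)
        sim = src @ hyp.t()                        # (Ls, Lh) cosine similarities
        recall = sim.max(dim=1).values.mean()      # source token -> best hypothesis token
        precision = sim.max(dim=0).values.mean()   # hypothesis token -> best source token
        return 2 * precision * recall / (precision + recall)

    print(greedy_match_f1(torch.randn(5, 16), torch.randn(7, 16)))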
[10]:On Long-Tailed Phenomena in Neural Machine Translation
标题:神经机器翻译中的长尾现象
作者:Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze
备注:Accepted to Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.04924
摘要:State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens, tackling which remains a major challenge. The analysis of long-tailed phenomena in the context of structured prediction tasks is further hindered by the added complexities of search during inference. In this work, we quantitatively characterize such long-tailed phenomena at two levels of abstraction, namely, token classification and sequence generation. We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation by incorporating the inductive biases of beam search in the training process. We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy across different language pairs, especially on the generation of low-frequency words. We have released the code to reproduce our results.
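For orientation, the standard focal loss that the proposed Anti-Focal loss builds on is shown below; the modulating factor (1 - p)^gamma is the ingredient the paper adapts to the dynamics of beam search, and the exact Anti-Focal formulation is not reproduced here.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0):
        """Standard focal loss over token classification logits (reference only)."""
        log_p = F.log_softmax(logits, dim=-1)
        log_p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
        p_t = log_p_t.exp()
        return (-(1.0 - p_t) ** gamma * log_p_t).mean()

    print(focal_loss(torch.randn(8, 100), torch.randint(0, 100, (8,))))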
[11]:ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization
标题:ChrEn:濒危语言振兴的切罗基英语机器翻译
作者:Shiyue Zhang, Benjamin Frey, Mohit Bansal
备注:EMNLP 2020 (19 pages)
链接:https://arxiv.org/abs/2010.04791
摘要:Cherokee is a highly endangered Native American language spoken by the Cherokee people. The Cherokee culture is deeply embedded in its language. However, there are approximately only 2,000 fluent first language Cherokee speakers remaining in the world, and the number is declining every year. To help save this endangered language, we introduce ChrEn, a Cherokee-English parallel dataset, to facilitate machine translation research between Cherokee and English. Compared to some popular machine translation language pairs, ChrEn is extremely low-resource, only containing 14k sentence pairs in total. We split our parallel data in ways that facilitate both in-domain and out-of-domain evaluation. We also collect 5k Cherokee monolingual data to enable semi-supervised learning. Besides these datasets, we propose several Cherokee-English and English-Cherokee machine translation systems. We compare SMT (phrase-based) versus NMT (RNN-based and Transformer-based) systems; supervised versus semi-supervised (via language model, back-translation, and BERT/Multilingual-BERT) methods; as well as transfer learning versus multilingual joint training with 4 other languages. Our best results are 15.8/12.7 BLEU for in-domain and 6.5/5.0 BLEU for out-of-domain Chr-En/EnChr translations, respectively, and we hope that our dataset and systems will encourage future work by the community for Cherokee language revitalization. Our data, code, and demo will be publicly available at this https URL.
自动摘要(2篇)
[1]:Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis
标题:定量论证总结及超越:跨领域关键点分析
作者:Roy Bar-Haim, Yoav Kantor, Lilach Eden, Roni Friedman, Dan Lahav, Noam Slonim
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.05369
摘要:When summarizing a collection of views, arguments or opinions on some topic, it is often desirable not only to extract the most salient points, but also to quantify their prevalence. Work on multi-document summarization has traditionally focused on creating textual summaries, which lack this quantitative aspect. Recent work has proposed to summarize arguments by mapping them to a small set of expert-generated key points, where the salience of each key point corresponds to the number of its matching arguments. The current work advances key point analysis in two important respects: first, we develop a method for automatic extraction of key points, which enables fully automatic analysis, and is shown to achieve performance comparable to a human expert. Second, we demonstrate that the applicability of key point analysis goes well beyond argumentation data. Using models trained on publicly available argumentation datasets, we achieve promising results in two additional domains: municipal surveys and user reviews. An additional contribution is an in-depth evaluation of argument-to-key point matching models, where we substantially outperform previous results.
[2]:CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems
标题:CDEvalSumm:神经摘要系统跨数据集评价的实证研究
作者:Yiran Chen, Pengfei Liu, Ming Zhong, Zi-Yi Dou, Danqing Wang, Xipeng Qiu, Xuanjing Huang
备注:13 pages, Findings of EMNLP2020
链接:https://arxiv.org/abs/2010.05139
摘要:Neural network-based models augmented with unsupervised pre-trained knowledge have achieved impressive performance on text summarization. However, most existing evaluation methods are limited to an in-domain setting, where summarizers are trained and evaluated on the same dataset. We argue that this approach can narrow our understanding of the generalization ability for different summarization systems. In this paper, we perform an in-depth analysis of characteristics of different datasets and investigate the performance of different summarization models under a cross-dataset setting, in which a summarizer trained on one corpus will be evaluated on a range of out-of-domain corpora. A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation ways (i.e. abstractive and extractive) on model generalization ability. Further, experimental results shed light on the limitations of existing summarizers. Brief introduction and supplementary code can be found in this https URL.
情感分析(3篇)
[1]:A Sentiment-Controllable Topic-to-Essay Generator with Topic Knowledge Graph
标题:基于主题知识图谱的情感可控主题到作文生成器
作者:Lin Qiao, Jianhao Yan, Fandong Meng, Zhendong Yang, Jie Zhou
备注:Accepted as a regular paper in Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.05511
摘要:Generating a vivid, novel, and diverse essay with only several given topic words is a challenging task of natural language generation. In previous work, there are two problems left unsolved: neglect of sentiment beneath the text and insufficient utilization of topic-related knowledge. Therefore, we propose a novel Sentiment-Controllable topic-to-essay generator with a Topic Knowledge Graph enhanced decoder, named SCTKG, which is based on the conditional variational autoencoder (CVAE) framework. We firstly inject the sentiment information into the generator for controlling sentiment for each sentence, which leads to various generated essays. Then we design a Topic Knowledge Graph enhanced decoder. Unlike existing models that use knowledge entities separately, our model treats the knowledge graph as a whole and encodes more structured, connected semantic information in the graph to generate a more relevant essay. Experimental results show that our SCTKG can generate sentiment controllable essays and outperform the state-of-the-art approach in terms of topic relevance, fluency, and diversity on both automatic and human evaluation.
[2]:HPCC-YNU at SemEval-2020 Task 9: A Bilingual Vector Gating Mechanism for Sentiment Analysis of Code-Mixed Text
标题:HPCC-YNU在SemEval-2020任务9:代码混合文本情感分析的双语向量选通机制
作者:Jun Kong, Jin Wang, Xuejie Zhang
备注:6 pages, 3 figures
链接:https://arxiv.org/abs/2010.04935
摘要:It is fairly common to use code-mixing on a social media platform to express opinions and emotions in multilingual societies. The purpose of this task is to detect the sentiment of code-mixed social media text. Code-mixed text poses a great challenge for the traditional NLP system, which currently uses monolingual resources to deal with the problem of multilingual mixing. This task has been solved in the past using lexicon lookup in respective sentiment dictionaries and using a long short-term memory (LSTM) neural network for monolingual resources. In this paper, we (my codalab username is kongjun) present a system that uses a bilingual vector gating mechanism for bilingual resources to complete the task. The model consists of two main parts: the vector gating mechanism, which combines the character and word levels, and the attention mechanism, which extracts the important emotional parts of the text. The results show that the proposed system outperforms the baseline algorithm. We achieved fifth place in Spanglish and 19th place in Hinglish. The code of this paper is available at this https URL
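As a rough illustration of the gating idea described in the abstract (not the authors' exact architecture), a vector gate can interpolate between character-level and word-level representations dimension by dimension. The dimensions and module names below are placeholders.
```python
# A minimal sketch of a vector gating mechanism fusing character-level and
# word-level representations; the paper's exact formulation may differ.
import torch
import torch.nn as nn

class VectorGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, char_repr: torch.Tensor, word_repr: torch.Tensor) -> torch.Tensor:
        # g in (0, 1)^dim decides, per dimension, how much to trust each view.
        g = torch.sigmoid(self.gate(torch.cat([char_repr, word_repr], dim=-1)))
        return g * char_repr + (1.0 - g) * word_repr

# Usage: fuse a batch of 300-d character and word vectors.
fused = VectorGate(300)(torch.randn(8, 300), torch.randn(8, 300))
```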
[3]:Structured Self-Attention Weights Encode Semantics in Sentiment Analysis
标题:情绪分析中的结构化自我注意权重编码语义
作者:Zhengxuan Wu, Thanh-Son Nguyen, Desmond C. Ong
备注:10 pages
链接:https://arxiv.org/abs/2010.04922
摘要:Neural attention, especially the self-attention made popular by the Transformer, has become the workhorse of state-of-the-art natural language processing (NLP) models. Very recent work suggests that the self-attention in the Transformer encodes syntactic information; here, we show that self-attention scores encode semantics by considering sentiment analysis tasks. In contrast to gradient-based feature attribution methods, we propose a simple and effective Layer-wise Attention Tracing (LAT) method to analyze structured attention weights. We apply our method to Transformer models trained on two tasks that have surface dissimilarities, but share common semantics: sentiment analysis of movie reviews and time-series valence prediction in life story narratives. Across both tasks, words with high aggregated attention weights were rich in emotional semantics, as quantitatively validated by an emotion lexicon labeled by human annotators. Our results show that structured attention weights encode rich semantics in sentiment analysis, and match human interpretations of semantics.
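A hedged sketch of how attention weights might be traced back through layers to score input tokens follows; the paper's LAT procedure is defined over structured attention and may differ from this rollout-style aggregation, which is shown only to make the idea concrete.
```python
# Illustrative aggregation of attention across layers to score input tokens;
# not the paper's exact LAT algorithm.
import numpy as np

def trace_attention(layer_attentions):
    """layer_attentions: list of (heads, seq, seq) arrays, one per layer."""
    joint = None
    for attn in layer_attentions:
        a = attn.mean(axis=0)                       # average over heads
        a = a / a.sum(axis=-1, keepdims=True)       # re-normalize rows
        joint = a if joint is None else a @ joint   # propagate back through layers
    return joint.sum(axis=0)                        # aggregate weight per input token

scores = trace_attention([np.random.rand(8, 12, 12) for _ in range(6)])
```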
句法分析(4篇)
[1]:HUJI-KU at MRP 2020: Two Transition-based Neural Parsers
标题:HUJI-KU在MRP 2020:两个基于转换的神经解析器
作者:Ofir Arviv, Ruixiang Cui, Daniel Hershcovich
链接:https://arxiv.org/abs/2010.05710
摘要:This paper describes the HUJI-KU system submission to the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2020 Conference for Computational Language Learning (CoNLL), employing TUPA and the HIT-SCIR parser, which were, respectively, the baseline system and winning system in the 2019 MRP shared task. Both are transition-based parsers using BERT contextualized embeddings. We generalized TUPA to support the newly-added MRP frameworks and languages, and experimented with multitask learning with the HIT-SCIR parser. We reached 4th place in both the cross-framework and cross-lingual tracks.
[2]:Improving Compositional Generalization in Semantic Parsing
标题:改进语义解析中的组合泛化
作者:Inbar Oren, Jonathan Herzig, Nitish Gupta, Matt Gardner, Jonathan Berant
链接:https://arxiv.org/abs/2010.05647
摘要:Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently. Specifically, compositional generalization, i.e., whether a model generalizes to new structures built of components observed during training, has sparked substantial interest. In this work, we investigate compositional generalization in semantic parsing, a natural test-bed for compositional generalization, as output programs are constructed from sub-components. We analyze a wide variety of models and propose multiple extensions to the attention module of the semantic parser, aiming to improve compositional generalization. We find that the following factors improve compositional generalization: (a) using contextual representations, such as ELMo and BERT, (b) informing the decoder what input tokens have previously been attended to, (c) training the decoder attention to agree with pre-computed token alignments, and (d) downsampling examples corresponding to frequent program templates. While we substantially reduce the gap between in-distribution and OOD generalization, performance on OOD compositions is still substantially lower.
[3]:Second-Order Neural Dependency Parsing with Message Passing and End-to-End Training
标题:基于消息传递和端到端训练的二阶神经依存句法分析
作者:Xinyu Wang, Kewei Tu
备注:Accepted to AACL 2020. 7 pages
链接:https://arxiv.org/abs/2010.05003
摘要:In this paper, we propose second-order graph-based neural dependency parsing using message passing and end-to-end neural networks. We empirically show that our approaches match the accuracy of very recent state-of-the-art second-order graph-based neural dependency parsers and have significantly faster speed in both training and testing. We also empirically show the advantage of second-order parsing over first-order parsing and observe that the usefulness of the head-selection structured constraint vanishes when using BERT embedding.
[4]:Compressing Transformer-Based Semantic Parsing Models using Compositional Code Embeddings
标题:使用组合代码嵌入压缩基于Transformer的语义解析模型
作者:Prafull Prakash, Saurabh Kumar Shashidhar, Wenlong Zhao, Subendhu Rongali, Haidar Khan, Michael Kayser
备注:Accepted at EMNLP 2020 (Findings); 7 Pages
链接:https://arxiv.org/abs/2010.05002
摘要:The current state-of-the-art task-oriented semantic parsing models use BERT or RoBERTa as pretrained encoders; these models have huge memory footprints. This poses a challenge to their deployment for voice assistants such as Amazon Alexa and Google Assistant on edge devices with limited memory budgets. We propose to learn compositional code embeddings to greatly reduce the sizes of BERT-base and RoBERTa-base. We also apply the technique to DistilBERT, ALBERT-base, and ALBERT-large, three already compressed BERT variants which attain similar state-of-the-art performances on semantic parsing with much smaller model sizes. We observe 95.15% ~ 98.46% embedding compression rates and 20.47% ~ 34.22% encoder compression rates, while preserving greater than 97.5% semantic parsing performances. We provide the recipe for training and analyze the trade-off between code embedding sizes and downstream performances.
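A minimal sketch of what a compositional code embedding lookup can look like: each word is stored as a handful of discrete codes, and its vector is reconstructed by summing the selected rows of small codebooks. The codebook sizes and dimensions below are illustrative, not the paper's configuration, and the code-learning step is omitted.
```python
# Hedged sketch of compositional code embeddings: a word's vector is the sum
# of one learnable row from each of M small codebooks.
import torch
import torch.nn as nn

class CodeEmbedding(nn.Module):
    def __init__(self, num_codebooks=8, codes_per_book=64, dim=768):
        super().__init__()
        # M codebooks, each with K learnable vectors.
        self.codebooks = nn.Parameter(torch.randn(num_codebooks, codes_per_book, dim) * 0.02)

    def forward(self, codes: torch.Tensor) -> torch.Tensor:
        # codes: (batch, M) integer indices, one index per codebook.
        m = torch.arange(codes.size(1), device=codes.device)
        return self.codebooks[m, codes, :].sum(dim=1)   # (batch, dim)

emb = CodeEmbedding()(torch.randint(0, 64, (4, 8)))
```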
模型(10篇)
[1]:Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
标题:梯度疫苗:大规模多语言模型中多任务优化的研究与改进
作者:Zirui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao
链接:https://arxiv.org/abs/2010.05874
摘要:Massively multilingual models subsuming tens or even hundreds of languages pose great challenges to multi-task optimization. While it is a common practice to apply a language-agnostic procedure optimizing a joint multilingual task objective, how to properly characterize and take advantage of its underlying problem structure for improving optimization efficiency remains under-explored. In this paper, we attempt to peek into the black-box of multilingual optimization through the lens of loss function geometry. We find that gradient similarity measured along the optimization trajectory is an important signal, which correlates well with not only language proximity but also the overall model performance. Such observation helps us to identify a critical limitation of existing gradient-based multi-task learning methods, and thus we derive a simple and scalable optimization procedure, named Gradient Vaccine, which encourages more geometrically aligned parameter updates for close tasks. Empirically, our method obtains significant model performance gains on multilingual machine translation and XTREME benchmark tasks for multilingual language models. Our work reveals the importance of properly measuring and utilizing language proximity in multilingual optimization, and has broader implications for multi-task learning beyond multilingual modeling.
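To make the loss-geometry intuition concrete, here is a hedged sketch of gradient de-conflicting between two tasks. Gradient Vaccine itself adapts a target cosine similarity per task pair; the simpler rule below (projecting out negatively aligned components, as in PCGrad) is shown only to illustrate the quantity being measured.
```python
# Sketch: measure cosine similarity between two task gradients and remove the
# conflicting component. Not the Gradient Vaccine update rule itself.
import torch

def deconflict(g_i: torch.Tensor, g_j: torch.Tensor) -> torch.Tensor:
    cos = torch.dot(g_i, g_j) / (g_i.norm() * g_j.norm() + 1e-12)
    if cos < 0:  # conflicting updates: project out the component of g_i along g_j
        g_i = g_i - torch.dot(g_i, g_j) / (g_j.norm() ** 2 + 1e-12) * g_j
    return g_i
```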
[2]:Probing Pretrained Language Models for Lexical Semantics
标题:探究预训练语言模型中的词汇语义
作者:Ivan Vulić, Edoardo Maria Ponti, Robert Litschko, Goran Glavaš, Anna Korhonen
备注:EMNLP 2020: Long paper
链接:https://arxiv.org/abs/2010.05731
摘要:The success of large pretrained language models (LMs) such as BERT and RoBERTa has sparked interest in probing their representations, in order to unveil what types of knowledge they implicitly capture. While prior research focused on morphosyntactic, semantic, and world knowledge, it remains unclear to which extent LMs also derive lexical type-level knowledge from words in context. In this work, we present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks, addressing the following questions: 1) How do different lexical knowledge extraction strategies (monolingual versus multilingual source LM, out-of-context versus in-context encoding, inclusion of special tokens, and layer-wise averaging) impact performance? How consistent are the observed effects across tasks and languages? 2) Is lexical knowledge stored in few parameters, or is it scattered throughout the network? 3) How do these representations fare against traditional static word vectors in lexical tasks? 4) Does the lexical information emerging from independently trained monolingual LMs display latent similarities? Our main results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks. Moreover, we validate the claim that lower Transformer layers carry more type-level lexical knowledge, but also show that this knowledge is distributed across multiple layers.
[3]:Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models
标题:结构监督改进神经语言模型中的小样本学习和句法泛化
作者:Ethan Wilcox, Peng Qian, Richard Futrell, Ryosuke Kohita, Roger Levy, Miguel Ballesteros
备注:To appear at EMNLP 2020
链接:https://arxiv.org/abs/2010.05725
摘要:Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts. We assess the ability of modern neural language models to reproduce this behavior in English and evaluate the effect of structural supervision on learning outcomes. First, we assess few-shot learning capabilities by developing controlled experiments that probe models' syntactic nominal number and verbal argument structure generalizations for tokens seen as few as two times during training. Second, we assess invariance properties of learned representation: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to a transformed context (e.g., an interrogative sentence). We test four models trained on the same dataset: an n-gram baseline, an LSTM, and two LSTM-variants trained with explicit structural supervision (Dyer et al.,2016; Charniak et al., 2016). We find that in most cases, the neural models are able to induce the proper syntactic generalizations after minimal exposure, often from just two examples during training, and that the two structurally supervised models generalize more accurately than the LSTM model. All neural models are able to leverage information learned in base contexts to drive expectations in transformed contexts, indicating that they have learned some invariance properties of syntax.
[4]:Gradient-based Analysis of NLP Models is Manipulable
标题:基于梯度的NLP模型分析是可被操纵的
作者:Junlin Wang, Jens Tuyls, Eric Wallace, Sameer Singh
链接:https://arxiv.org/abs/2010.05419
摘要:Gradient-based analysis methods, such as saliency map visualizations and adversarial input perturbations, have found widespread use in interpreting neural NLP models due to their simplicity, flexibility, and most importantly, their faithfulness. In this paper, however, we demonstrate that the gradients of a model are easily manipulable, and thus bring into question the reliability of gradient-based analyses. In particular, we merge the layers of a target model with a Facade that overwhelms the gradients without affecting the predictions. This Facade can be trained to have gradients that are misleading and irrelevant to the task, such as focusing only on the stop words in the input. On a variety of NLP tasks (text classification, NLI, and QA), we show that our method can manipulate numerous gradient-based analysis techniques: saliency maps, input reduction, and adversarial perturbations all identify unimportant or targeted tokens as being highly important. The code and a tutorial for this paper are available at this http URL.
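The core trick, keeping predictions numerically unchanged while letting an auxiliary network dominate the input gradients, can be illustrated with a toy construction. This is not the paper's actual layer-merging procedure, only a sketch of why gradient-based saliency can be decoupled from model behavior.
```python
# Toy illustration: the forward value equals target(x), but the gradient w.r.t.
# the input is dominated by the facade network. Not the paper's merging method.
import torch
import torch.nn as nn

target = nn.Linear(10, 2)
facade = nn.Sequential(nn.Linear(10, 2), nn.Tanh())

x = torch.randn(1, 10, requires_grad=True)
f = facade(x)
logits = target(x) + 100.0 * (f - f.detach())  # value unchanged, gradient mostly facade's
logits.sum().backward()
print(x.grad)  # gradient-based saliency now reflects the facade, not the target model
```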
[5]:Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU
标题:非增量编码器时代的增量处理:增量NLU双向模型的经验评估
作者:Brielen Madureira, David Schlangen
备注:Accepted to the EMNLP 2020 conference (long paper)
链接:https://arxiv.org/abs/2010.05330
摘要:While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence that is to be encoded is available in full, to be processed either forwards and backwards (BiLSTMs) or as a whole (Transformers). We investigate how they behave under incremental interfaces, when partial output must be provided based on partial input seen up to a certain time step, which may happen in interactive systems. We test five models on various NLU datasets and compare their performance using three incremental evaluation metrics. The results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality. The "omni-directional" BERT model, which achieves better non-incremental performance, is impacted more by the incremental access. This can be alleviated by adapting the training regime (truncated training), or the testing procedure, by delaying the output until some right context is available or by incorporating hypothetical right contexts generated by a language model like GPT-2.
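Conceptually, the incremental interface can be emulated by re-running a non-incremental model on growing prefixes and recording the partial outputs it commits to; `model.predict` below is a hypothetical stand-in for any sequence tagger, and the evaluation metrics themselves are not shown.
```python
# Sketch of evaluating a non-incremental tagger under an incremental interface:
# at each time step the model only sees the prefix and must commit to labels.
def incremental_outputs(model, tokens):
    partial = []
    for t in range(1, len(tokens) + 1):
        labels = model.predict(tokens[:t])   # hypothetical predict() over a prefix
        partial.append(labels)
    return partial  # partial outputs can then be scored with incremental metrics
```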
[6]:Towards Accurate and Reliable Energy Measurement of NLP Models
标题:面向NLP模型的准确可靠能耗测量
作者:Qingqing Cao, Aruna Balasubramanian, Niranjan Balasubramanian
备注:Accepted to SustaiNLP 2020 (co-located with EMNLP 2020)
链接:https://arxiv.org/abs/2010.05248
摘要:Accurate and reliable measurement of energy consumption is critical for making well-informed design choices when choosing and training large scale NLP models. In this work, we show that existing software-based energy measurements are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption. We conduct energy measurement experiments with four different models for a question answering task. We quantify the error of existing software-based energy measurements by using a hardware power meter that provides highly accurate energy measurements. Our key takeaway is the need for a more accurate energy estimation model that takes into account hardware variabilities and the non-linear relationship between resource utilization and energy consumption. We release the code and data at this https URL.
[7]:PHICON: Improving Generalization of Clinical Text De-identification Models via Data Augmentation
标题:PHICON:通过数据扩充改进临床文本去识别模型的泛化
作者:Xiang Yue, Shuang Zhou
备注:Accepted by The 3rd ClinicalNLP Workshop at EMNLP'20
链接:https://arxiv.org/abs/2010.05143
摘要:De-identification is the task of identifying protected health information (PHI) in the clinical text. Existing neural de-identification models often fail to generalize to a new dataset. We propose a simple yet effective data augmentation method PHICON to alleviate the generalization issue. PHICON consists of PHI augmentation and Context augmentation, which creates augmented training corpora by replacing PHI entities with named-entities sampled from external sources, and by changing background context with synonym replacement or random word insertion, respectively. Experimental results on the i2b2 2006 and 2014 de-identification challenge datasets show that PHICON can help three selected de-identification models boost F1-score (by at most 8.6%) on cross-dataset test setting. We also discuss how much augmentation to use and how each augmentation method influences the performance.
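A rough sketch of the two augmentation operations follows, with placeholder entity lists and a deliberately simplified tag scheme; the actual PHICON pipeline also handles multi-token entities and synonym replacement, which are omitted here.
```python
# Hedged sketch of PHI augmentation (swap PHI tokens for sampled entities) and
# context augmentation (random word insertion at a non-PHI position).
import random

PHI_REPLACEMENTS = {"NAME": ["Jordan Lee", "Maria Gomez"], "HOSPITAL": ["St. Mary Clinic"]}

def phi_augment(tokens, tags):
    # Replace each PHI-tagged token with an entity sampled from an external list.
    return [random.choice(PHI_REPLACEMENTS[t]) if t in PHI_REPLACEMENTS else tok
            for tok, t in zip(tokens, tags)]

def context_augment(tokens, tags, filler="recently"):
    # Insert a word at a random non-PHI position (synonym replacement omitted).
    i = random.choice([k for k, t in enumerate(tags) if t == "O"] or [0])
    return tokens[:i] + [filler] + tokens[i:], tags[:i] + ["O"] + tags[i:]
```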
[8]:Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions
标题:学术文档中文档级定义检测:现有模型、错误分析和未来发展方向
作者:Dongyeop Kang, Andrew Head, Risham Sidhu, Kyle Lo, Daniel S. Weld, Marti A. Hearst
备注:Workshop on Scholarly Document Processing (SDP), EMNLP 2020
链接:https://arxiv.org/abs/2010.05129
摘要:The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. Despite prior work on definition detection, current approaches are far from being accurate enough to use in real-world applications. In this paper, we first perform in-depth error analysis of the current best performing definition detection system and discover major causes of errors. Based on this analysis, we develop a new definition detection system, HEDDEx, that utilizes syntactic features, transformer encoders, and heuristic filters, and evaluate it on a standard sentence-level benchmark. Because current benchmarks evaluate randomly sampled sentences, we propose an alternative evaluation that assesses every sentence within a document. This allows for evaluating recall in addition to precision. HEDDEx outperforms the leading system on both the sentence-level and the document-level tasks, by 12.7 F1 points and 14.4 F1 points, respectively. We note that performance on the high-recall document-level task is much lower than in the standard evaluation approach, due to the necessity of incorporation of document structure as features. We discuss remaining challenges in document-level definition detection, ideas for improvements, and potential issues for the development of reading aid applications.
[9]:When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models
标题:当Hearst模式不够用时:用分布式模型改进语料库上位词检测
作者:Changlong Yu, Jialong Han, Peifeng Wang, Yangqiu Song, Hongming Zhang, Wilfred Ng, Shuming Shi
备注:Accepted by EMNLP2020 Main Conference
链接:https://arxiv.org/abs/2010.04941
摘要:We address hypernymy detection, i.e., whether an is-a relationship exists between words (x, y), with the help of large textual corpora. Most conventional approaches to this task have been categorized to be either pattern-based or distributional. Recent studies suggest that pattern-based ones are superior, if large-scale Hearst pairs are extracted and fed, with the sparsity of unseen (x, y) pairs relieved. However, they become invalid in some specific sparsity cases, where x or y is not involved in any pattern. For the first time, this paper quantifies the non-negligible existence of those specific cases. We also demonstrate that distributional methods are ideal to make up for pattern-based ones in such cases. We devise a complementary framework, under which a pattern-based and a distributional model collaborate seamlessly in cases which they each prefer. On several benchmark datasets, our framework achieves competitive improvements and the case study shows its better interpretability.
[10]:Discourse structure interacts with reference but not syntax in neural language models
标题:在神经语言模型中,语篇结构与指称而非句法相互作用
作者:Forrest Davis, Marten van Schijndel
备注:Proceedings of the 2020 Conference on Computational Natural Language Learning (CoNLL 2020)
链接:https://arxiv.org/abs/2010.04887
摘要:Language models (LMs) trained on large quantities of text have been claimed to acquire abstract linguistic representations. Our work tests the robustness of these abstractions by focusing on the ability of LMs to learn interactions between different linguistic representations. In particular, we utilized stimuli from psycholinguistic studies showing that humans can condition reference (i.e. coreference resolution) and syntactic processing on the same discourse structure (implicit causality). We compared both transformer and long short-term memory LMs to find that, contrary to humans, implicit causality only influences LM behavior for reference, not syntax, despite model representations that encode the necessary discourse information. Our results further suggest that LM behavior can contradict not only learned representations of discourse but also syntactic agreement, pointing to shortcomings of standard language modeling.
其他(64篇)
[1]:Multi-Stage Pre-training for Low-Resource Domain Adaptation
标题:低资源领域适应的多阶段预训练
作者:Rong Zhang, Revanth Gangi Reddy, Md Arafat Sultan, Vittorio Castelli, Anthony Ferritto, Radu Florian, Efsun Sarioglu Kayi, Salim Roukos, Avirup Sil, Todd Ward
备注:Accepted at EMNLP 2020
链接:https://arxiv.org/abs/2010.05904
摘要:Transfer learning techniques are particularly useful in NLP tasks where a sizable amount of high-quality annotated data is difficult to obtain. Current approaches directly adapt a pre-trained language model (LM) on in-domain text before fine-tuning to downstream tasks. We show that extending the vocabulary of the LM with domain-specific terms leads to further gains. To a bigger effect, we utilize structure in the unlabeled data to create auxiliary synthetic tasks, which helps the LM transfer to downstream tasks. We apply these approaches incrementally on a pre-trained Roberta-large LM and show considerable performance gain on three tasks in the IT domain: Extractive Reading Comprehension, Document Ranking and Duplicate Question Detection.
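One way to realize the vocabulary-extension step with the Hugging Face transformers API is sketched below; the domain terms are made-up examples, and the continued in-domain pre-training itself is omitted.
```python
# Sketch: extend a pretrained tokenizer with domain-specific terms and resize
# the embedding matrix before continued pre-training. Terms are illustrative.
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForMaskedLM.from_pretrained("roberta-large")

domain_terms = ["hypervisor", "kubectl", "stacktrace"]   # placeholder IT-domain terms
num_added = tokenizer.add_tokens(domain_terms)
model.resize_token_embeddings(len(tokenizer))             # new rows are randomly initialized
```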
[2]:Human-centric Dialog Training via Offline Reinforcement Learning
标题:基于离线强化学习的人本对话训练
作者:Natasha Jaques, Judy Hanwen Shen, Asma Ghandeharioun, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Shane Gu, Rosalind Picard
备注:To appear in EMNLP 2020 (long paper)
链接:https://arxiv.org/abs/2010.05848
摘要:How can we train a dialog model to produce better conversations by learning from human feedback, without the risk of humans teaching it harmful chat behaviors? We start by hosting models online, and gather human feedback from real-time, open-ended conversations, which we then use to train and improve the models using offline reinforcement learning (RL). We identify implicit conversational cues including language similarity, elicitation of laughter, sentiment, and more, which indicate positive human feedback, and embed these in multiple reward functions. A well-known challenge is that learning an RL policy in an offline setting usually fails due to the lack of ability to explore and the tendency to make over-optimistic estimates of future reward. These problems become even harder when using RL for language models, which can easily have a 20,000 action vocabulary and many possible reward functions. We solve the challenge by developing a novel class of offline RL algorithms. These algorithms use KL-control to penalize divergence from a pre-trained prior language model, and use a new strategy to make the algorithm pessimistic, instead of optimistic, in the face of uncertainty. We test the resulting dialog model with ratings from 80 users in an open-domain setting and find it achieves significant improvements over existing deep offline RL approaches. The novel offline RL method is viable for improving any existing generative dialog model using a static dataset of human feedback.
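The KL-control component can be summarized as a per-token penalty on divergence from the pre-trained prior language model; beta and the tensors below are placeholders, and the paper's algorithms additionally make pessimistic value estimates under uncertainty, which this sketch does not show.
```python
# Minimal sketch of KL-control: shape the reward with a penalty for drifting
# away from a pre-trained prior policy.
import torch

def kl_controlled_reward(reward, logp_policy, logp_prior, beta=0.1):
    # reward, logp_*: (batch, seq_len) tensors of per-token values
    return reward - beta * (logp_policy - logp_prior)
```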
[3]:Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations
标题:BERT的分层指导训练:学习增量细化的文档表示
作者:Nikolaos Manginas, Ilias Chalkidis, Prodromos Malakasiotis
备注:5 pages, short paper at SPNLP 2020 (EMNLP 2020 Workshop)
链接:https://arxiv.org/abs/2010.05763
摘要:Although BERT is widely used by the NLP community, little is known about its inner workings. Several attempts have been made to shed light on certain aspects of BERT, often with contradicting conclusions. A much raised concern focuses on BERT's over-parameterization and under-utilization issues. To this end, we propose a novel approach to fine-tune BERT in a structured manner. Specifically, we focus on Large Scale Multilabel Text Classification (LMTC) where documents are assigned one or more labels from a large predefined set of hierarchically organized labels. Our approach guides specific BERT layers to predict labels from specific hierarchy levels. Experimenting with two LMTC datasets, we show that this structured fine-tuning approach not only yields better classification results but also leads to better parameter utilization.
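A hedged sketch of the guidance idea: attach a coarse-label head to an intermediate BERT layer and a fine-label head to the top layer. The layer index, label counts, and [CLS] pooling choice are illustrative rather than the paper's configuration.
```python
# Sketch of layer-wise guided training: different layers supervise different
# levels of the label hierarchy.
import torch.nn as nn
from transformers import BertModel

class LayerGuidedClassifier(nn.Module):
    def __init__(self, n_coarse=20, n_fine=500, coarse_layer=6):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.coarse_layer = coarse_layer
        self.coarse_head = nn.Linear(self.bert.config.hidden_size, n_coarse)
        self.fine_head = nn.Linear(self.bert.config.hidden_size, n_fine)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids, attention_mask=attention_mask, output_hidden_states=True)
        coarse = self.coarse_head(out.hidden_states[self.coarse_layer][:, 0])  # [CLS] at mid layer
        fine = self.fine_head(out.hidden_states[-1][:, 0])                     # [CLS] at top layer
        return coarse, fine  # train with a multi-label loss at each hierarchy level
```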
[4]:Dynamic Memory Enhanced Transformer for End-to-end Task-Oriented Dialogue System
标题:面向任务的端到端对话系统的动态记忆增强变换器
作者:Yanjie Gou, Yinjie Lei, Lingqiao Liu
备注:First version of this work
链接:https://arxiv.org/abs/2010.05740
摘要:Recent studies try to build task-oriented dialogue systems in an end-to-end manner, and existing works have made great progress on this task. However, there are still two issues that need to be considered: (1) how to effectively represent the knowledge bases and incorporate them into the dialogue system, and (2) how to efficiently reason over the knowledge bases given queries. To solve these issues, we design a novel Transformer-based Dynamic Memory Network (DMN) with a novel Memory Mask scheme, which can dynamically generate context-aware knowledge base representations and reason over the knowledge bases simultaneously. Furthermore, we incorporate the dynamic memory network into the Transformer and propose the Dynamic Memory Enhanced Transformer (DMET), which can aggregate information from dialogue history and knowledge bases to generate better responses. Through extensive experiments, we show that our method achieves superior performance over the state-of-the-art methods.
[5]:Using Type Information to Improve Entity Coreference Resolution
标题:利用类型信息改进实体共指消解
作者:Sopan Khosla, Carolyn Rose
备注:Accepted as Long Paper at CODI workshop EMNLP 2020
链接:https://arxiv.org/abs/2010.05738
摘要:Coreference resolution (CR) is an essential part of discourse analysis. Most recently, neural approaches have been proposed to improve over SOTA models from earlier paradigms. So far none of the published neural models leverage external semantic knowledge such as type information. This paper offers the first such model and evaluation, demonstrating modest gains in accuracy by introducing either gold standard or predicted types. In the proposed approach, type information serves both to (1) improve mention representation and (2) create a soft type consistency check between coreference candidate mentions. Our evaluation covers two different grain sizes of types over four different benchmark corpora.
[6]:EFSG: Evolutionary Fooling Sentences Generator
标题:EFSG:进化式愚弄句生成器
作者:Marco Di Giovanni, Marco Brambilla
备注:13 pages, 19 figures
链接:https://arxiv.org/abs/2010.05736
摘要:Large pre-trained language representation models (LMs) have recently achieved a large number of successes in many NLP tasks. In 2018 BERT, and later its successors (e.g. RoBERTa), obtained state-of-the-art results on classical benchmark tasks, such as the GLUE benchmark. Since then, works on adversarial attacks have been published to test their generalization properties and robustness. In this work, we design the Evolutionary Fooling Sentences Generator (EFSG), a model- and task-agnostic adversarial attack algorithm built using an evolutionary approach to generate false-positive sentences for binary classification tasks. We successfully apply EFSG to the CoLA and MRPC tasks, on BERT and RoBERTa, comparing performances. The results prove the presence of weak spots in state-of-the-art LMs. We finally test adversarial training as a data augmentation defence against EFSG, obtaining stronger models with no loss of accuracy when tested on the original datasets.
[7]:Modelling Lexical Ambiguity with Density Matrices
标题:用密度矩阵建模词汇歧义
作者:Francois Meyer, Martha Lewis
链接:https://arxiv.org/abs/2010.05670
摘要:Words can have multiple senses. Compositional distributional models of meaning have been argued to deal well with finer shades of meaning variation known as polysemy, but are not so well equipped to handle word senses that are etymologically unrelated, or homonymy. Moving from vectors to density matrices allows us to encode a probability distribution over different senses of a word, and can also be accommodated within a compositional distributional model of meaning. In this paper we present three new neural models for learning density matrices from a corpus, and test their ability to discriminate between word senses on a range of compositional datasets. When paired with a particular composition method, our best model outperforms existing vector-based compositional models as well as strong sentence encoders.
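The move from vectors to density matrices can be illustrated directly: an ambiguous word becomes a convex combination of outer products of its sense vectors. The sense vectors and mixture weights below are placeholders; the paper learns such representations neurally from a corpus.
```python
# Sketch: build a density matrix (PSD, trace 1) from unit-norm sense vectors
# and a probability distribution over senses.
import numpy as np

def density_matrix(sense_vectors, probs):
    rho = np.zeros((sense_vectors.shape[1],) * 2)
    for v, p in zip(sense_vectors, probs):
        v = v / np.linalg.norm(v)
        rho += p * np.outer(v, v)
    return rho

rho = density_matrix(np.random.randn(2, 50), [0.7, 0.3])  # two senses, 50-d space
```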
[8]:From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks
标题:从Hero到Zéroe:低级对抗性攻击基准
作者:Steffen Eger, Yannik Benz
备注:Authors accidentally in wrong order; cannot be undone due to conference constraints
链接:https://arxiv.org/abs/2010.05648
摘要:Adversarial attacks are label-preserving modifications to inputs of machine learning classifiers designed to fool machines but not humans. Natural Language Processing (NLP) has mostly focused on high-level attack scenarios such as paraphrasing input texts. We argue that these are less realistic in typical application scenarios such as in social media, and instead focus on low-level attacks on the character-level. Guided by human cognitive abilities and human robustness, we propose the first large-scale catalogue and benchmark of low-level adversarial attacks, which we dub Zéroe, encompassing nine different attack modes including visual and phonetic adversaries. We show that RoBERTa, NLP's current workhorse, fails on our attacks. Our dataset provides a benchmark for testing robustness of future more human-like NLP models.
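One kind of low-level, character-level perturbation the benchmark targets can be sketched as a visual substitution attack; the substitution map and perturbation rate here are invented for illustration and do not reproduce the nine Zéroe attack modes exactly.
```python
# Illustrative character-level "visual" attack: swap Latin characters for
# visually similar Cyrillic ones at a fixed rate.
import random

VISUAL_MAP = {"a": "а", "e": "е", "o": "о"}  # Latin -> visually similar Cyrillic

def visual_attack(text, rate=0.3):
    return "".join(VISUAL_MAP[c] if c in VISUAL_MAP and random.random() < rate else c
                   for c in text)

print(visual_attack("the movie was excellent"))
```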
[9]:Predicting Clinical Trial Results by Implicit Evidence Integration
标题:应用内隐证据整合预测临床试验结果
作者:Qiao Jin, Chuanqi Tan, Mosha Chen, Xiaozhong Liu, Songfang Huang
备注:EMNLP 2020 long paper
链接:https://arxiv.org/abs/2010.05639
摘要:Clinical trials provide essential guidance for practicing Evidence-Based Medicine, though often accompanied by unendurable costs and risks. To optimize the design of clinical trials, we introduce a novel Clinical Trial Result Prediction (CTRP) task. In the CTRP framework, a model takes a PICO-formatted clinical trial proposal with its background as input and predicts the result, i.e. how the Intervention group compares with the Comparison group in terms of the measured Outcome in the studied Population. While structured clinical evidence is prohibitively expensive for manual collection, we exploit large-scale unstructured sentences from medical literature that implicitly contain PICOs and results as evidence. Specifically, we pre-train a model to predict the disentangled results from such implicit evidence and fine-tune the model with limited data on the downstream datasets. Experiments on the benchmark Evidence Integration dataset show that the proposed model outperforms the baselines by large margins, e.g., with a 10.7% relative gain over BioBERT in macro-F1. Moreover, the performance improvement is also validated on another dataset composed of clinical trials related to COVID-19.
[10]:Contextual Modulation for Relation-Level Metaphor Identification
标题:关系级隐喻识别的语境调节
作者:Omnia Zayed, John P. McCrae, Paul Buitelaar
备注:accepted at Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.05633
摘要:Identifying metaphors in text is very challenging and requires comprehending the underlying comparison. The automation of this cognitive process has gained wide attention lately. However, the majority of existing approaches concentrate on word-level identification by treating the task as either single-word classification or sequential labelling without explicitly modelling the interaction between the metaphor components. On the other hand, while existing relation-level approaches implicitly model this interaction, they ignore the context where the metaphor occurs. In this work, we address these limitations by introducing a novel architecture for identifying relation-level metaphoric expressions of certain grammatical relations based on contextual modulation. In a methodology inspired by works in visual reasoning, our approach is based on conditioning the neural network computation on the deep contextualised features of the candidate expressions using feature-wise linear modulation. We demonstrate that the proposed architecture achieves state-of-the-art results on benchmark datasets. The proposed methodology is generic and could be applied to other textual classification problems that benefit from contextual interaction.
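Feature-wise linear modulation, the conditioning mechanism named in the abstract, amounts to predicting a per-feature scale and shift from the context; the dimensions below are placeholders and the surrounding classification architecture is omitted.
```python
# Minimal sketch of feature-wise linear modulation (FiLM): modulate the
# candidate expression's features with parameters predicted from the context.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, ctx_dim: int, feat_dim: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(ctx_dim, 2 * feat_dim)

    def forward(self, features: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(context).chunk(2, dim=-1)
        return gamma * features + beta

modulated = FiLM(768, 300)(torch.randn(4, 300), torch.randn(4, 768))
```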
[11]:Load What You Need: Smaller Versions of Multilingual BERT
标题:加载您需要的:多语言BERT的较小版本
作者:Amine Abdaoui, Camille Pradel, Grégoire Sigel
链接:https://arxiv.org/abs/2010.05609
摘要:Pre-trained Transformer-based models are achieving state-of-the-art results on a variety of Natural Language Processing data sets. However, the size of these models is often a drawback for their deployment in real production applications. In the case of multilingual models, most of the parameters are located in the embeddings layer. Therefore, reducing the vocabulary size should have an important impact on the total number of parameters. In this paper, we propose to generate smaller models that handle a smaller number of languages according to the targeted corpora. We present an evaluation of smaller versions of multilingual BERT on the XNLI data set, but we believe that this method may be applied to other multilingual transformers. The obtained results confirm that we can generate smaller models that achieve comparable results, while reducing up to 45% of the total number of parameters. We compared our models with DistilmBERT (a distilled version of multilingual BERT) and showed that unlike language reduction, distillation induced a 1.7% to 6% drop in the overall accuracy on the XNLI data set. The presented models and code are publicly available.
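A hedged sketch of the vocabulary-reduction idea using the Hugging Face API: collect the sub-word ids actually used by the target-language corpora and keep only those embedding rows. A real implementation must also rebuild the tokenizer so token ids match the reduced matrix; that remapping is omitted here, and the corpora are toy examples.
```python
# Sketch: keep only the embedding rows for sub-words observed in target corpora.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

corpus = ["Ein kleines Beispiel.", "Un petit exemple."]   # placeholder target corpora
keep_ids = sorted({i for text in corpus for i in tokenizer(text)["input_ids"]})

old_emb = model.get_input_embeddings().weight.data
new_emb = torch.nn.Embedding(len(keep_ids), old_emb.size(1))
new_emb.weight.data = old_emb[keep_ids].clone()
model.set_input_embeddings(new_emb)  # tokenizer id remapping still required
```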
[12]:The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
标题:解释室里的大象:当我们有显著性方法时,为什么要用注意力来解释?
作者:Jasmijn Bastings, Katja Filippova
备注:Accepted at BlackboxNLP 2020
链接:https://arxiv.org/abs/2010.05607
摘要:There is a recent surge of interest in using attention as explanation of model predictions, with mixed evidence on whether attention can be used as such. While attention conveniently gives us one weight per input token and is easily extracted, it is often unclear toward what goal it is used as explanation. We find that often that goal, whether explicitly stated or not, is to find out what input tokens are the most relevant to a prediction, and that the implied user for the explanation is a model developer. For this goal and user, we argue that input saliency methods are better suited, and that there are no compelling reasons to use attention, despite the coincidence that it provides a weight for each input. With this position paper, we hope to shift some of the recent focus on attention to saliency methods, and for authors to clearly state the goal and user for their explanations.
[13]:MultiWOZ 2.3: A multi-domain task-oriented dataset enhanced with annotation corrections and co-reference annotation
标题:MultiWOZ 2.3:一个多域面向任务的数据集,通过标注更正和共指标注得到增强
作者:Ting Han, Ximing Liu, Ryuichi Takanobu, Yixin Lian, Chongxuan Huang, Wei Peng, Minlie Huang
链接:https://arxiv.org/abs/2010.05594
摘要:Task-oriented dialogue systems have made unprecedented progress with multiple state-of-the-art (SOTA) models underpinned by a number of publicly available MultiWOZ datasets. Dialogue state annotations are error-prone, leading to sub-optimal performance. Various efforts have been put into rectifying the annotation errors present in the original MultiWOZ dataset. In this paper, we introduce MultiWOZ 2.3, in which we differentiate incorrect annotations in dialogue acts from dialogue states, identifying a lack of co-reference when publishing the updated dataset. To ensure consistency between dialogue acts and dialogue states, we implement co-reference features and unify annotations of dialogue acts and dialogue states. We update the state-of-the-art performance of natural language understanding and dialog state tracking on MultiWOZ 2.3, where the results show significant improvements over previous versions of the MultiWOZ dataset (2.0-2.2).
[14]:Carbon to Diamond: An Incident Remediation Assistant System From Site Reliability Engineers' Conversations in Hybrid Cloud Operations
标题:碳到钻石:从混合云操作的站点可靠性工程师对话中获得的事件补救辅助系统
作者:Suranjana Samanta, Ajay Gupta, Prateeti Mohapatra, Amar Prakash Azad
备注:6 Pages, 5 figures, 2 tables
链接:https://arxiv.org/abs/2010.05569
摘要:Conversational channels are changing the landscape of hybrid cloud service management. These channels are becoming important avenues for Site Reliability Engineers (SREs) to collaboratively resolve an incident or issue. Identifying segmented conversations and extracting key insights or artefacts from them can help engineers to improve the efficiency of the incident remediation process by using information retrieval mechanisms for similar incidents. However, it has been empirically observed that due to the semi-formal behavior of such conversations (human language) they are very unique in nature and also contain a lot of domain-specific terms. This makes it difficult to directly use the standard natural language processing frameworks that are popularly used in standard NLP tasks. In this paper, we build a framework that taps into the conversational channels and uses various learning methods to (a) understand and extract key artefacts from conversations like diagnostic steps and resolution actions taken, and (b) present an approach to identify past conversations about similar issues. Experimental results on our dataset show the efficacy of our proposed method.
[15]:Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards
标题:基于文档级跨任务一致性奖励的联合语义分析
作者:Rahul Aralikatte, Mostafa Abdou, Heather Lent, Daniel Hershcovich, Anders Søgaard
链接:https://arxiv.org/abs/2010.05567
摘要:Coreference resolution and semantic role labeling are NLP tasks that capture different aspects of semantics, indicating respectively, which expressions refer to the same entity, and what semantic roles expressions serve in the sentence. However, they are often closely interdependent, and both generally necessitate natural language understanding. Do they form a coherent abstract representation of documents? We present a neural network architecture for joint coreference resolution and semantic role labeling for English, and train graph neural networks to model the 'coherence' of the combined shallow semantic graph. Using the resulting coherence score as a reward for our joint semantic analyzer, we use reinforcement learning to encourage global coherence over the document and between semantic annotations. This leads to improvements on both tasks in multiple datasets from different domains, and across a range of encoders of different expressivity, calling, we believe, for a more holistic approach to semantics in NLP.
[16]:The National Corpus of Contemporary Welsh: Project Report | Y Corpws Cenedlaethol Cymraeg Cyfoes: Adroddiad y Prosiect
标题:当代威尔士国家语料库:项目报告
作者:Dawn Knight, Steve Morris, Tess Fitzpatrick, Paul Rayson, Irena Spasić, Enlli Môn Thomas
备注:English-Welsh bilingual project report
链接:https://arxiv.org/abs/2010.05542
摘要:This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project. The report lays out the theoretical underpinnings of the research, demonstrating how the project has built on and extended this theory. We also raise and discuss some of the key operational questions that arose during the course of the project, outlining the ways in which they were answered, the impact of these decisions on the resource that has been produced and the longer-term contribution they will make to practices in corpus-building. Finally, we discuss some of the applications and the utility of the work, outlining the impact that CorCenCC is set to have on a range of different individuals and user groups.
[17]:FILM: A Fast, Interpretable, and Low-rank Metric Learning Approach for Sentence Matching
标题:FILM:一种快速、可解释、低秩的句子匹配度量学习方法
作者:Xiangru Tang, Alan Aw
链接:https://arxiv.org/abs/2010.05523
摘要:Detection of semantic similarity plays a vital role in sentence matching. It requires learning discriminative representations of natural language. Recently, owing to increasingly sophisticated model architectures, impressive progress has been made, along with time-consuming training processes and non-interpretable inference. To alleviate this problem, we explore a metric learning approach, named FILM (Fast, Interpretable, and Low-rank Metric learning), to efficiently find a highly discriminative projection of the high-dimensional data. We formulate this metric learning problem as a manifold optimization problem and solve it with the Cayley transformation method with the Barzilai-Borwein step size. In experiments, we apply FILM with a triplet loss minimization objective to the Quora Challenge and Semantic Textual Similarity (STS) Task. The results demonstrate that the FILM method achieves superior performance as well as the fastest computation speed, which is consistent with our theoretical analysis of time complexity.
[18]:Pre-trained Language Model Based Active Learning for Sentence Matching
标题:基于预训练语言模型的句子匹配主动学习
作者:Guirong Bai, Shizhu He, Kang Liu, Jun Zhao, Zaiqing Nie
备注:Accepted by the conference of coling 2020
链接:https://arxiv.org/abs/2010.05522
摘要:Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Differing from previous active learning, it can provide linguistic criteria to measure instances and help select more efficient instances for annotation. Experiments demonstrate our approach can achieve greater accuracy with fewer labeled training instances.
[19]:Unseen Target Stance Detection with Adversarial Domain Generalization
标题:基于对抗域泛化的未见目标立场检测
作者:Zhen Wang, Qiansheng Wang, Chengguo Lv, Xue Cao, Guohong Fu
链接:https://arxiv.org/abs/2010.05471
摘要:Although stance detection has made great progress in the past few years, it is still facing the problem of unseen targets. In this study, we investigate the domain difference between targets and thus incorporate attention-based conditional encoding with adversarial domain generalization to perform unseen target stance detection. Experimental results show that our approach achieves new state-of-the-art performance on the SemEval-2016 dataset, demonstrating the importance of domain difference between targets in unseen target stance detection.
[20]:COGS: A Compositional Generalization Challenge Based on Semantic Interpretation
标题:COGS:基于语义解释的组合泛化挑战
作者:Najoung Kim, Tal Linzen
备注:Accepted to EMNLP 2020
链接:https://arxiv.org/abs/2010.05465
摘要:Natural language is characterized by compositionality: the meaning of a complex expression is constructed from the meanings of its constituent parts. To facilitate the evaluation of the compositional abilities of language processing architectures, we introduce COGS, a semantic parsing dataset based on a fragment of English. The evaluation portion of COGS contains multiple systematic gaps that can only be addressed by compositional generalization; these include new combinations of familiar syntactic structures, or new combinations of familiar words and familiar structures. In experiments with Transformers and LSTMs, we found that in-distribution accuracy on the COGS test set was near-perfect (96-99%), but generalization accuracy was substantially lower (16-35%) and showed high sensitivity to random seed (±6-8%). These findings indicate that contemporary standard NLP models are limited in their compositional generalization capacity, and position COGS as a good way to measure progress.
[21]:MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
标题:MAF:弱监督短语定位的多模态对齐框架
作者:Qinxin Wang, Hao Tan, Sheng Shen, Michael W. Mahoney, Zhewei Yao
链接:https://arxiv.org/abs/2010.05379
摘要:Phrase localization is a task that studies the mapping from textual phrases to regions of an image. Given difficulties in annotating phrase-to-object datasets at scale, we develop a Multimodal Alignment Framework (MAF) to leverage more widely-available caption-image datasets, which can then be used as a form of weak supervision. We first present algorithms to model phrase-object relevance by leveraging fine-grained visual representations and visually-aware language representations. By adopting a contrastive objective, our method uses information in caption-image pairs to boost the performance in weakly-supervised scenarios. Experiments conducted on the widely-adopted Flickr30k dataset show a significant improvement over existing weakly-supervised methods. With the help of the visually-aware language representations, we can also improve the previous best unsupervised result by 5.56%. We conduct ablation studies to show that both our novel model and our weakly-supervised strategies significantly contribute to our strong results.
[22]:Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)
标题:学习哪些特征重要:RoBERTa(最终)获得了对语言学泛化的偏好
作者:Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman
备注:accepted at EMNLP 2020
链接:https://arxiv.org/abs/2010.05358
摘要:One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding. However, we want pretrained models to learn not only to represent linguistic features, but also to use those features preferentially during fine-tuning. With this goal in mind, we introduce a new English-language diagnostic set called MSGS (the Mixed Signals Generalization Set), which consists of 20 ambiguous binary classification tasks that we use to test whether a pretrained model prefers linguistic or surface generalizations during fine-tuning. We pretrain RoBERTa models from scratch on quantities of data ranging from 1M to 1B words and compare their performance on MSGS to the publicly available RoBERTa-base. We find that models can learn to represent linguistic features with little pretraining data, but require far more data to learn to prefer linguistic generalizations over surface ones. Eventually, with about 30B words of pretraining data, RoBERTa-base does demonstrate a linguistic bias with some regularity. We conclude that while self-supervised pretraining is an effective way to learn helpful inductive biases, there is likely room to improve the rate at which models learn which features matter.
[23]:A Knowledge-Driven Approach to Classifying Object and Attribute Coreferences in Opinion Mining
标题:基于知识驱动的意见挖掘对象属性共指分类方法
作者:Jiahua Chen, Shuai Wang, Sahisnu Mazumder, Bing Liu
备注:Accepted to Proceedings of EMNLP 2020 (Findings)
链接:https://arxiv.org/abs/2010.05357
摘要:Classifying and resolving coreferences of objects (e.g., product names) and attributes (e.g., product aspects) in opinionated reviews is crucial for improving the opinion mining performance. However, the task is challenging as one often needs to consider domain-specific knowledge (e.g., iPad is a tablet and has aspect resolution) to identify coreferences in opinionated reviews. Also, compiling a handcrafted and curated domain-specific knowledge base for each domain is very time consuming and arduous. This paper proposes an approach to automatically mine and leverage domain-specific knowledge for classifying objects and attribute coreferences. The approach extracts domain-specific knowledge from unlabeled review data and trains a knowledge-aware neural coreference classification model to leverage (useful) domain knowledge together with general commonsense knowledge for the task. Experimental evaluation on real-world datasets involving five domains (product types) shows the effectiveness of the approach.
[24]:Do Language Embeddings Capture Scales?
标题:语言嵌入能捕获尺度吗?
作者:Xikun Zhang, Deepak Ramachandran, Ian Tenney, Yanai Elazar, Dan Roth
备注:Accepted at EMNLP Findings 2020 and EMNLP BlackboxNLP workshop 2020
链接:https://arxiv.org/abs/2010.05345
摘要:Pretrained Language Models (LMs) have been shown to possess significant linguistic, common sense, and factual knowledge. One form of knowledge that has not been studied yet in this context is information about the scalar magnitudes of objects. We show that pretrained language models capture a significant amount of this information but are short of the capability required for general common-sense reasoning. We identify contextual information in pre-training and numeracy as two key factors affecting their performance and show that a simple method of canonicalizing numbers can have a significant effect on the results.
[25]:We Can Detect Your Bias: Predicting the Political Ideology of News Articles
标题:我们可以发现你的偏见:预测新闻文章的政治意识形态
作者:Ramy Baly, Giovanni Da San Martino, James Glass, Preslav Nakov
备注:Political bias, bias in news, neural networks bias, adversarial adaptation, triplet loss, transformers, recurrent neural networks
链接:https://arxiv.org/abs/2010.05338
摘要:We explore the task of predicting the leading political ideology or bias of news articles. First, we collect and release a large dataset of 34,737 articles that were manually annotated for political ideology (left, center, or right), which is well-balanced across both topics and media. We further use a challenging experimental setup where the test examples come from media that were not seen during training, which prevents the model from learning to detect the source of the target news article instead of predicting its political ideology. From a modeling perspective, we propose an adversarial media adaptation, as well as a specially adapted triplet loss. We further add background information about the source, and we show that it is quite helpful for improving article-level prediction. Our experimental results show very sizable improvements over using state-of-the-art pre-trained Transformers in this challenging setup.
[26]:Multilingual Offensive Language Identification with Cross-lingual Embeddings
标题:基于跨语言嵌入的多语种攻击性语言识别
作者:Tharindu Ranasinghe, Marcos Zampieri
备注:Accepted to EMNLP 2020
链接:https://arxiv.org/abs/2010.05324
摘要:Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g. hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English partially because most annotated datasets available contain English data. In this paper, we take advantage of English data available by applying cross-lingual contextual word embeddings and transfer learning to make predictions in languages with less resources. We project predictions on comparable data in Bengali, Hindi, and Spanish and we report results of 0.8415 F1 macro for Bengali, 0.8568 F1 macro for Hindi, and 0.7513 F1 macro for Spanish. Finally, we show that our approach compares favorably to the best systems submitted to recent shared tasks on these three languages, confirming the robustness of cross-lingual contextual embeddings and transfer learning for this task.
[27]:TransQuest at WMT2020: Sentence-Level Direct Assessment
标题:TransQuest在WMT2020:句子级直接评估
作者:Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
备注:Accepted to WMT 2020
链接:https://arxiv.org/abs/2010.05318
摘要:This paper presents the team TransQuest's participation in Sentence-Level Direct Assessment shared task in WMT 2020. We introduce a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. The proposed methods achieve state-of-the-art results surpassing the results obtained by OpenKiwi, the baseline used in the shared task. We further fine tune the QE framework by performing ensemble and data augmentation. Our approach is the winning solution in all of the language pairs according to the WMT 2020 official results.
[28]:Automated Prediction of Medieval Arabic Diacritics
标题:中世纪阿拉伯变音符号的自动预测
作者:Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter
链接:https://arxiv.org/abs/2010.05269
摘要:This study uses a character-level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic. The results improve over the online tool used as a baseline. A diacritization model has been published openly through an easy-to-use Python package available on PyPI and Zenodo. We have found that context size should be considered when optimizing a feasible prediction model.
[29]:Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
标题:从语境化词表征中无监督地蒸馏句法信息
作者:Shauli Ravfogel, Yanai Elazar, Jacob Goldberger, Yoav Goldberg
备注:Accepted in BlackboxNLP@EMNLP2020
链接:https://arxiv.org/abs/2010.05265
摘要:Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors, that discards the lexical semantics, but keeps the structural information. To this end, we automatically generate groups of sentences which are structurally similar but semantically different, and use metric-learning approach to learn a transformation that emphasizes the structural component that is encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original contextualized representations in a few-shot parsing setting.
[30]:Few-shot Learning for Multi-label Intent Detection
标题:多标签意图检测的小样本学习
作者:Yutai Hou, Yongkui Lai, Yushan Wu, Wanxiang Che, Ting Liu
链接:https://arxiv.org/abs/2010.05256
摘要:In this paper, we study the few-shot multi-label classification for user intent detection. For multi-label intent detection, state-of-the-art work estimates label-instance relevance scores and uses a threshold to select multiple associated intent labels. To determine appropriate thresholds with only a few examples, we first learn universal thresholding experience on data-rich domains, and then adapt the thresholds to certain few-shot domains with a calibration based on nonparametric learning. For better calculation of label-instance relevance score, we introduce label name embedding as anchor points in representation space, which refines representations of different classes to be well-separated from each other. Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings.
[31]:Connecting the Dots Between Fact Verification and Fake News Detection
标题:将事实核实与虚假新闻检测联系起来
作者:Qifei Li, Wangchunshu Zhou
备注:Accepted to COLING 2020
链接:https://arxiv.org/abs/2010.05202
摘要:Fact verification models have enjoyed a fast advancement in the last two years with the development of pre-trained language models like BERT and the release of large scale datasets such as FEVER. However, the challenging problem of fake news detection has not benefited from the improvement of fact verification models, which is closely related to fake news detection. In this paper, we propose a simple yet effective approach to connect the dots between fact verification and fake news detection. Our approach first employs a text summarization model pre-trained on news corpora to summarize the long news article into a short claim. Then we use a fact verification model pre-trained on the FEVER dataset to detect whether the input news article is real or fake. Our approach makes use of the recent success of fact verification models and enables zero-shot fake news detection, alleviating the need of large-scale training data to train fake news detection models. Experimental results on FakenewsNet, a benchmark dataset for fake news detection, demonstrate the effectiveness of our proposed approach.
[32]:Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only
标题:仅使用英文标注检测多种语言的食源性疾病投诉
作者:Ziyi Liu, Giannis Karamanolakis, Daniel Hsu, Luis Gravano
备注:Accepted for the 11th International Workshop on Health Text Mining and Information Analysis (LOUHI@EMNLP 2020)
链接:https://arxiv.org/abs/2010.05194
摘要:Health departments have been deploying text classification systems for the early detection of foodborne illness complaints in social media documents such as Yelp restaurant reviews. Current systems have been successfully applied for documents in English and, as a result, a promising direction is to increase coverage and recall by considering documents in additional languages, such as Spanish or Chinese. Training previous systems for more languages, however, would be expensive, as it would require the manual annotation of many documents for each new target language. To address this challenge, we consider cross-lingual learning and train multilingual classifiers using only the annotations for English-language reviews. Recent zero-shot approaches based on pre-trained multi-lingual BERT (mBERT) have been shown to effectively align languages for aspects such as sentiment. Interestingly, we show that those approaches are less effective for capturing the nuances of foodborne illness, our public health application of interest. To improve performance without extra annotations, we create artificial training documents in the target language through machine translation and train mBERT jointly for the source (English) and target language. Furthermore, we show that translating labeled documents to multiple languages leads to additional performance improvements for some target languages. We demonstrate the benefits of our approach through extensive experiments with Yelp restaurant reviews in seven languages. Our classifiers identify foodborne illness complaints in multilingual reviews from the Yelp Challenge dataset, which highlights the potential of our general approach for deployment in health departments.
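A rough translate-train sketch of the approach described above: each labeled English review is paired with a machine translation into the target language and multilingual BERT is fine-tuned on the union. The translate() helper is a placeholder, not part of any specific MT API.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)

def translate(text, target_lang):              # placeholder for an MT system
    raise NotImplementedError

def build_training_texts(english_reviews, target_lang):
    texts, labels = [], []
    for review, label in english_reviews:
        texts.append(review); labels.append(label)                          # source language
        texts.append(translate(review, target_lang)); labels.append(label)  # target language
    return texts, labels

# One supervised step over a toy English batch (translations omitted here):
batch = tok(["I got sick after eating there", "Great service"],
            return_tensors="pt", padding=True, truncation=True)
out = model(**batch, labels=torch.tensor([1, 0]))
out.loss.backward()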
[33]:Learning Adaptive Language Interfaces through Decomposition
标题:通过分解学习自适应语言接口
作者:Siddharth Karamcheti, Dorsa Sadigh, Percy Liang
备注:Accepted at the 1st Workshop for Interactive and Executable Semantic Parsing (IntEx-SemPar) @ EMNLP 2020. 11 pages, 5 figures
链接:https://arxiv.org/abs/2010.05190
摘要:Our goal is to create an interactive natural language interface that efficiently and reliably learns from users to complete tasks in simulated robotics settings. We introduce a neural semantic parsing system that learns new high-level abstractions through decomposition: users interactively teach the system by breaking down high-level utterances describing novel behavior into low-level steps that it can understand. Unfortunately, existing methods either rely on grammars which parse sentences with limited flexibility, or neural sequence-to-sequence models that do not learn efficiently or reliably from individual examples. Our approach bridges this gap, demonstrating the flexibility of modern neural systems, as well as the one-shot reliable generalization of grammar-based methods. Our crowdsourced interactive experiments suggest that over time, users complete complex tasks more efficiently while using our system by leveraging what they just taught. At the same time, getting users to trust the system enough to be incentivized to teach high-level utterances is still an ongoing challenge. We end with a discussion of some of the obstacles we need to overcome to fully realize the potential of the interactive paradigm.
[34]:fairseq S2T: Fast Speech-to-Text Modeling with fairseq
标题:fairseq S2T:用fairseq快速进行语音到文本建模
作者:Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino
备注:Accepted to AACL 2020 Demo
链接:https://arxiv.org/abs/2010.05171
摘要:We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing, model training to offline (online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T documentation and examples are available at this https URL.
[35]:A General Model of Conversational Dynamics and an Example Application in Serious Illness Communication
标题:会话动力学的一般模型及其在重病沟通中的应用示例
作者:Laurence A. Clarfeld, Robert Gramling, Donna M. Rizzo, Margaret J. Eppstein
备注:34 pages, 20 figures, submitted to PLOS One (in review)
链接:https://arxiv.org/abs/2010.05164
摘要:Conversation has been a primary means for the exchange of information since ancient times. Understanding patterns of information flow in conversations is a critical step in assessing and improving communication quality. In this paper, we describe COnversational DYnamics Model (CODYM) analysis, a novel approach for studying patterns of information flow in conversations. CODYMs are Markov Models that capture sequential dependencies in the lengths of speaker turns. The proposed method is automated and scalable, and preserves the privacy of the conversational participants. The primary function of CODYM analysis is to quantify and visualize patterns of information flow, concisely summarized over sequential turns from one or more conversations. Our approach is general and complements existing methods, providing a new tool for use in the analysis of any type of conversation. As an important first application, we demonstrate the model on transcribed conversations between palliative care clinicians and seriously ill patients. These conversations are dynamic and complex, taking place amidst heavy emotions, and include difficult topics such as end-of-life preferences and patient values. We perform a versatile set of CODYM analyses that (a) establish the validity of the model by confirming known patterns of conversational turn-taking and word usage, (b) identify normative patterns of information flow in serious illness conversations, and (c) show how these patterns vary across narrative time and differ under expressions of anger, fear and sadness. Potential applications of CODYMs range from assessment and training of effective healthcare communication to comparing conversational dynamics across language and culture, with the prospect of identifying universal similarities and unique "fingerprints" of information flow.
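A minimal CODYM-style sketch, assuming turns are binned by word count and modeled with a first-order Markov chain; the bin edges and the toy conversation below are illustrative, not the paper's published settings.

import numpy as np

def bin_turn(word_count, edges=(5, 15)):
    return sum(word_count > e for e in edges)          # 0=short, 1=medium, 2=long

def transition_matrix(turn_lengths, n_bins=3):
    counts = np.zeros((n_bins, n_bins))
    bins = [bin_turn(t) for t in turn_lengths]
    for prev, nxt in zip(bins, bins[1:]):
        counts[prev, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)  # row-normalized transitions

conversation = [3, 22, 7, 41, 2, 9, 18]                    # words per speaker turn
print(transition_matrix(conversation))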
[36]:Safe Reinforcement Learning with Natural Language Constraints
标题:自然语言约束下的安全强化学习
作者:Tsung-Yen Yang, Michael Hu, Yinlam Chow, Peter J. Ramadge, Karthik Narasimhan
备注:The first two authors contributed equally
链接:https://arxiv.org/abs/2010.05150
摘要:In this paper, we tackle the problem of learning control policies for tasks when provided with constraints in natural language. In contrast to instruction following, language here is used not to specify goals, but rather to describe situations that an agent must avoid during its exploration of the environment. Specifying constraints in natural language also differs from the predominant paradigm in safe reinforcement learning, where safety criteria are enforced by hand-defined cost functions. While natural language allows for easy and flexible specification of safety constraints and budget limitations, its ambiguous nature presents a challenge when mapping these specifications into representations that can be used by techniques for safe reinforcement learning. To address this, we develop a model that contains two components: (1) a constraint interpreter to encode natural language constraints into vector representations capturing spatial and temporal information on forbidden states, and (2) a policy network that uses these representations to output a policy with minimal constraint violations. Our model is end-to-end differentiable and we train it using a recently proposed algorithm for constrained policy optimization. To empirically demonstrate the effectiveness of our approach, we create a new benchmark task for autonomous navigation with crowd-sourced free-form text specifying three different types of constraints. Our method outperforms several baselines by achieving 6-7 times higher returns and 76% fewer constraint violations on average. Dataset and code to reproduce our experiments are available at this https URL.
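An architecture-only sketch of the two components named in the abstract: a text encoder that produces a constraint representation, and a policy conditioned on it. The GRU encoder, flat observation, and all dimensions are simplifications; the paper's interpreter predicts spatial and temporal information about forbidden states rather than a single vector.

import torch
import torch.nn as nn

class ConstraintInterpreter(nn.Module):
    def __init__(self, vocab=5000, emb=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True)

    def forward(self, token_ids):
        _, h = self.gru(self.emb(token_ids))
        return h.squeeze(0)                      # constraint embedding (batch, hidden)

class ConstrainedPolicy(nn.Module):
    def __init__(self, obs_dim=32, hidden=128, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + hidden, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))

    def forward(self, obs, constraint_emb):
        return torch.softmax(self.net(torch.cat([obs, constraint_emb], dim=-1)), dim=-1)

interp, policy = ConstraintInterpreter(), ConstrainedPolicy()
probs = policy(torch.randn(1, 32), interp(torch.randint(0, 5000, (1, 12))))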
[37]:Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task
标题:提前计划:段落完成任务的自我监督文本规划
作者:Dongyeop Kang, Eduard Hovy
备注:EMNLP 2020
链接:https://arxiv.org/abs/2010.05141
摘要:Despite the recent success of contextualized language models on various NLP tasks, a language model by itself cannot capture the textual coherence of a long, multi-sentence document (e.g., a paragraph). Humans often make structural decisions about what to say and how to say it before producing utterances. Guiding surface realization with such high-level decisions and structuring text in a coherent way is essentially called a planning process. Where can the model learn such high-level coherence? A paragraph itself contains various forms of inductive coherence signals (called self-supervision in this work), such as sentence order, topical keywords, rhetorical structures, and so on. Motivated by that, this work proposes a new paragraph completion task, PARCOM: predicting masked sentences in a paragraph. However, the task is challenging because it requires predicting and selecting appropriate topical content with respect to the given context. To address that, we propose a self-supervised text planner, SSPlanner, that predicts what to say first (content prediction), then guides the pretrained language model (surface realization) using the predicted content. SSPlanner outperforms the baseline generation models on the paragraph completion task in both automatic and human evaluation. We also find that a combination of noun and verb keywords is the most effective for content selection. As more content keywords are provided, overall generation quality also increases.
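A very rough approximation of the plan-then-generate idea, under clearly stated assumptions: here keywords are simply extracted from the visible context with spaCy as a stand-in for SSPlanner's learned content predictor, and GPT-2 stands in for the paper's generator; none of the model names are from the paper.

import spacy
from transformers import pipeline

nlp = spacy.load("en_core_web_sm")
generator = pipeline("text-generation", model="gpt2")

def complete_paragraph(context):
    # "content prediction" stand-in: noun/verb keywords from the context
    keywords = [t.lemma_ for t in nlp(context) if t.pos_ in {"NOUN", "VERB"}]
    # "surface realization" stand-in: condition the generator on the keywords
    prompt = context + " Keywords: " + ", ".join(keywords[:8]) + ". Next sentence:"
    return generator(prompt, max_new_tokens=30)[0]["generated_text"]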
[38]:On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks
标题:论自适应数据收集对极不平衡成对任务的重要性
作者:Stephen Mussmann, Robin Jia, Percy Liang
备注:In Findings of EMNLP 2020
链接:https://arxiv.org/abs/2010.05103
摘要:Many pairwise classification tasks, such as paraphrase detection and open-domain question answering, naturally have extreme label imbalance (e.g., $99.99\%$ of examples are negatives). In contrast, many recent datasets heuristically choose examples to ensure label balance. We show that these heuristics lead to trained models that generalize poorly: State-of-the art models trained on QQP and WikiQA each have only $2.4\%$ average precision when evaluated on realistically imbalanced test data. We instead collect training data with active learning, using a BERT-based embedding model to efficiently retrieve uncertain points from a very large pool of unlabeled utterance pairs. By creating balanced training data with more informative negative examples, active learning greatly improves average precision to $32.5\%$ on QQP and $20.1\%$ on WikiQA.
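A toy sketch of one uncertainty-sampling round consistent with the setup above: pairs whose predicted probability is closest to 0.5 are selected for annotation. The efficient BERT-embedding-based retrieval over the huge unlabeled pool is omitted, and the probabilities below are random stand-ins.

import numpy as np

def select_uncertain(pair_probs, k=100):
    uncertainty = -np.abs(pair_probs - 0.5)        # highest when p is near 0.5
    return np.argsort(uncertainty)[-k:]            # indices to send to annotators

pool_probs = np.random.default_rng(0).uniform(size=1_000_000)
to_label = select_uncertain(pool_probs, k=5)
print(to_label, pool_probs[to_label])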
[39]:Leveraging Spatial Information in Radiology Reports for Ischemic Stroke Phenotyping
标题:利用影像学报告中的空间信息进行缺血性中风分型
作者:Surabhi Datta, Shekhar Khanpara, Roy F. Riascos, Kirk Roberts
链接:https://arxiv.org/abs/2010.05096
摘要:Classifying fine-grained ischemic stroke phenotypes relies on identifying important clinical information. Radiology reports provide relevant information with context to determine such phenotype information. We focus on stroke phenotypes with location-specific information: brain region affected, laterality, stroke stage, and lacunarity. We use an existing fine-grained spatial information extraction system--Rad-SpatialNet--to identify clinically important information and apply simple domain rules on the extracted information to classify phenotypes. The performance of our proposed approach is promising (recall of 89.62% for classifying brain region and 74.11% for classifying brain region, side, and stroke stage together). Our work demonstrates that an information extraction system based on a fine-grained schema can be utilized to determine complex phenotypes with the inclusion of simple domain rules. These phenotypes have the potential to facilitate stroke research focusing on post-stroke outcome and treatment planning based on the stroke location.
[40]:Semi-supervised Formality Style Transfer using Language Model Discriminator and Mutual Information Maximization
标题:基于语言模型判别器和互信息最大化的半监督正式度风格迁移
作者:Kunal Chawla, Diyi Yang
备注:EMNLP 2020 Findings
链接:https://arxiv.org/abs/2010.05090
摘要:Formality style transfer is the task of converting informal sentences to grammatically-correct formal sentences, which can be used to improve performance of many downstream NLP tasks. In this work, we propose a semi-supervised formality style transfer model that utilizes a language model-based discriminator to maximize the likelihood of the output sentence being formal, which allows us to use maximization of token-level conditional probabilities for training. We further propose to maximize mutual information between source and target styles as our training objective instead of maximizing the regular likelihood that often leads to repetitive and trivial generated responses. Experiments showed that our model outperformed previous state-of-the-art baselines significantly in terms of both automated metrics and human judgement. We further generalized our model to unsupervised text style transfer task, and achieved significant improvements on two benchmark sentiment style transfer datasets.
[41]:Structural Knowledge Distillation
标题:结构化知识蒸馏
作者:Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
备注:Under review as a conference paper of ICLR 2021. 15 pages
链接:https://arxiv.org/abs/2010.05010
摘要:Knowledge distillation is a critical technique to transfer knowledge between models, typically from a large model (the teacher) to a smaller one (the student). The objective function of knowledge distillation is typically the cross-entropy between the teacher and the student's output distributions. However, for structured prediction problems, the output space is exponential in size; therefore, the cross-entropy objective becomes intractable to compute and optimize directly. In this paper, we derive a factorized form of the knowledge distillation objective for structured prediction, which is tractable for many typical choices of the teacher and student models. In particular, we show the tractability and empirical effectiveness of structural knowledge distillation between sequence labeling and dependency parsing models under four different scenarios: 1) the teacher and student share the same factorization form of the output structure scoring function; 2) the student factorization produces smaller substructures than the teacher factorization; 3) the teacher factorization produces smaller substructures than the student factorization; 4) the factorization forms from the teacher and the student are incompatible.
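A deliberately simplified sketch of scenario 1: when teacher and student share a token-level factorization (e.g. both are locally normalized sequence labelers), the distillation objective reduces to a per-position cross-entropy between their label distributions. The CRF- and parser-structured cases require the factorized derivations from the paper and are not shown.

import torch
import torch.nn.functional as F

def token_level_kd_loss(teacher_logits, student_logits):
    # both tensors: (batch, seq_len, n_labels)
    t = F.softmax(teacher_logits, dim=-1)          # teacher label distribution
    log_s = F.log_softmax(student_logits, dim=-1)  # student log-probabilities
    return -(t * log_s).sum(dim=-1).mean()         # cross-entropy per position, averaged

loss = token_level_kd_loss(torch.randn(4, 20, 10), torch.randn(4, 20, 10))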
[42]:Automated Concatenation of Embeddings for Structured Prediction
标题:用于结构预测的嵌入自动拼接
作者:Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
备注:We propose ACE, which achieves new SOTA for 6 NLP tasks over 23 datasets. Under review as a conference paper at ICLR 2021. 19 pages
链接:https://arxiv.org/abs/2010.05006
摘要:Pretrained contextualized embeddings are powerful word representations for structured prediction tasks. Recent work found that better word representations can be obtained by concatenating different types of embeddings. However, the selection of embeddings to form the best concatenated representation usually varies depending on the task and the collection of candidate embeddings, and the ever-increasing number of embedding types makes it a more difficult problem. In this paper, we propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks, based on a formulation inspired by recent progress on neural architecture search. Specifically, a controller alternately samples a concatenation of embeddings, according to its current belief of the effectiveness of individual embedding types in consideration for a task, and updates the belief based on a reward. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model, which is fed with the sampled concatenation as input and trained on a task dataset. Empirical results on 6 tasks and 23 datasets show that our approach outperforms strong baselines and achieves state-of-the-art performance with fine-tuned embeddings in the vast majority of evaluations.
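A toy, numpy-only version of the controller loop described above: a Bernoulli mask over candidate embedding types is sampled, a stubbed task model returns a dev-accuracy reward, and the Bernoulli logits get a REINFORCE-style update against a moving-average baseline. The candidate list, reward stub, and hyperparameters are all invented.

import numpy as np

rng = np.random.default_rng(0)
candidates = ["bert", "flair", "fasttext", "elmo", "xlm-r"]
logits = np.zeros(len(candidates))                 # controller's belief per embedding type

def train_task_model(selected):                    # stub: pretend to train and return dev accuracy
    return rng.uniform(0.7, 0.9) + 0.02 * ("bert" in selected)

baseline = 0.0
for step in range(50):
    probs = 1 / (1 + np.exp(-logits))
    mask = rng.uniform(size=len(candidates)) < probs          # sample a concatenation
    reward = train_task_model([c for c, m in zip(candidates, mask) if m])
    baseline = 0.9 * baseline + 0.1 * reward
    # gradient of the log Bernoulli likelihood, scaled by the advantage
    logits += 0.5 * (reward - baseline) * (mask.astype(float) - probs)

print({c: round(p, 2) for c, p in zip(candidates, 1 / (1 + np.exp(-logits)))})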
[43]:FIND: Human-in-the-Loop Debugging Deep Text Classifiers
标题:FIND:人工在环调试深度文本分类器
作者:Piyawat Lertvittayakumjorn, Lucia Specia, Francesca Toni
备注:17 pages including appendices; To appear at EMNLP 2020
链接:https://arxiv.org/abs/2010.04987
摘要:Since obtaining a perfect training dataset (i.e., a dataset which is considerably large, unbiased, and well-representative of unseen cases) is hardly possible, many real-world text classifiers are trained on the available, yet imperfect, datasets. These classifiers are thus likely to have undesirable properties. For instance, they may have biases against some sub-populations or may not work effectively in the wild due to overfitting. In this paper, we propose FIND -- a framework which enables humans to debug deep learning text classifiers by disabling irrelevant hidden features. Experiments show that by using FIND, humans can improve CNN text classifiers which were trained under different types of imperfect datasets (including datasets with biases and datasets with dissimilar train-test distributions).
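A sketch of the debugging mechanism FIND enables, assuming a simple 1-D CNN text classifier: a persistent binary mask zeroes out hidden features that a human has judged irrelevant or biased before the final classifier. How FIND surfaces each feature to the human (e.g. via word clouds) is not shown.

import torch
import torch.nn as nn

class MaskedTextCNN(nn.Module):
    def __init__(self, vocab=20000, emb=100, n_filters=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, n_filters, kernel_size=3, padding=1)
        self.register_buffer("feature_mask", torch.ones(n_filters))  # 0 disables a filter
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, token_ids):
        h = torch.relu(self.conv(self.emb(token_ids).transpose(1, 2)))  # (B, filters, T)
        pooled = h.max(dim=2).values
        return self.fc(pooled * self.feature_mask)

model = MaskedTextCNN()
model.feature_mask[torch.tensor([3, 17])] = 0.0    # a human disables two suspect features
logits = model(torch.randint(0, 20000, (4, 50)))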
[44]:An Empirical Investigation of Beam-Aware Training in Supertagging
标题:超标记中波束感知训练的实证研究
作者:Renato Negrinho, Matthew R. Gormley, Geoffrey J. Gordon
备注:EMNLP Findings 2020 camera-ready. Code can be found at this https URL
链接:https://arxiv.org/abs/2010.04980
摘要:Structured prediction is often approached by training a locally normalized model with maximum likelihood and decoding approximately with beam search. This approach leads to mismatches as, during training, the model is not exposed to its mistakes and does not use beam search. Beam-aware training aims to address these problems, but unfortunately, it is not yet widely used due to a lack of understanding about how it impacts performance, when it is most useful, and whether it is stable. Recently, Negrinho et al. (2018) proposed a meta-algorithm that captures beam-aware training algorithms and suggests new ones, but unfortunately did not provide empirical results. In this paper, we begin an empirical investigation: we train the supertagging model of Vaswani et al. (2016) and a simpler model with instantiations of the meta-algorithm. We explore the influence of various design choices and make recommendations for choosing them. We observe that beam-aware training improves performance for both models, with large improvements for the simpler model which must effectively manage uncertainty during decoding. Our results suggest that a model must be learned with search to maximize its effectiveness.
[45]:Can RNNs trained on harder subject-verb agreement instances still perform well on easier ones?
标题:在较难的主语动词一致实例上训练的RNN在较简单的实例上是否仍能表现得很好?
作者:Hritik Bansal, Gantavya Bhatt, Sumeet Agarwal
备注:15 pages, 3 figures, 13 Tables (including Appendix); Submitted for review
链接:https://arxiv.org/abs/2010.04976
摘要:The main subject and the associated verb in English must agree in grammatical number as per the Subject-Verb Agreement (SVA) phenomenon. It has been found that the presence of a noun between the verb and the main subject, whose grammatical number is opposite to that of the main subject, can cause speakers to produce a verb that agrees with the intervening noun rather than the main noun; the former thus acts as an agreement attractor. Such attractors have also been shown to pose a challenge for RNN models without explicit hierarchical bias to perform well on SVA tasks. Previous work suggests that syntactic cues in the input can aid such models to choose hierarchical rules over linear rules for number agreement. In this work, we investigate the effects of the choice of training data, training algorithm, and architecture on hierarchical generalization. We observe that the models under consideration fail to perform well on sentences with no agreement attractor when trained solely on natural sentences with at least one attractor. Even in the presence of this biased training set, implicit hierarchical bias in the architecture (as in the Ordered Neurons LSTM) is not enough to capture syntax-sensitive dependencies. These results suggest that current RNNs do not capture the underlying hierarchical rules of natural language, but rather use shallower heuristics for their predictions.
[46]:Tag Recommendation for Online Q&A Communities based on BERT Pre-Training Technique
标题:基于BERT预训练技术的在线问答社区标签推荐
作者:Navid Khezrian, Jafar Habibi, Issa Annamoradnejad
备注:5 pages, initial results
链接:https://arxiv.org/abs/2010.04971
摘要:Online Q&A and open source communities use tags and keywords to index, categorize, and search for specific content. The most obvious advantage of tag recommendation is the correct classification of information. In this study, we used the BERT pre-training technique in tag recommendation task for online Q&A and open-source communities for the first time. Our evaluation on freecode datasets show that the proposed method, called TagBERT, is more accurate compared to deep learning and other baseline methods. Moreover, our model achieved a high stability by solving the problem of previous researches, where increasing the number of tag recommendations significantly reduced model performance.
[47]:MS-Ranker: Accumulating Evidence from Potentially Correct Candidates for Answer Selection
标题:MS-Ranker:从潜在正确的候选答案中累积证据用于答案选择
作者:Yingxue Zhang, Fandong Meng, Peng Li, Ping Jian, Jie Zhou
链接:https://arxiv.org/abs/2010.04970
摘要:As conventional answer selection (AS) methods generally match the question with each candidate answer independently, they suffer from the lack of matching information between the question and the candidate. To address this problem, we propose a novel reinforcement learning (RL) based multi-step ranking model, named MS-Ranker, which accumulates information from potentially correct candidate answers as extra evidence for matching the question with a candidate. In specific, we explicitly consider the potential correctness of candidates and update the evidence with a gating mechanism. Moreover, as we use a listwise ranking reward, our model learns to pay more attention to the overall performance. Experiments on two benchmarks, namely WikiQA and SemEval-2016 CQA, show that our model significantly outperforms existing methods that do not rely on external resources.
[48]:Latent Tree Learning with Ordered Neurons: What Parses Does It Produce?
标题:有序神经元的潜在树学习:它产生什么样的语法分析?
作者:Yian Zhang
备注:6 pages, 3 figures, 3 tables, to appear in Proceedings of the 2020 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
链接:https://arxiv.org/abs/2010.04926
摘要:Recent latent tree learning models can learn constituency parsing without any exposure to human-annotated tree structures. One such model is ON-LSTM (Shen et al., 2019), which is trained on language modelling and has near-state-of-the-art performance on unsupervised parsing. In order to better understand the performance and consistency of the model as well as how the parses it generates are different from gold-standard PTB parses, we replicate the model with different restarts and examine their parses. We find that (1) the model has reasonably consistent parsing behaviors across different restarts, (2) the model struggles with the internal structures of complex noun phrases, (3) the model has a tendency to overestimate the height of the split points right before verbs. We speculate that both problems could potentially be solved by adopting a different training task other than unidirectional language modelling.
[49]:What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding
标题:位置嵌入学习什么?预训练语言模型位置编码的实证研究
作者:Yu-An Wang, Yun-Nung Chen
备注:Accepted by EMNLP 2020
链接:https://arxiv.org/abs/2010.04903
摘要:In recent years, pre-trained Transformers have dominated the majority of NLP benchmark tasks. Many variants of pre-trained Transformers have kept breaking out, and most focus on designing different pre-training objectives or variants of self-attention. Embedding position information in the self-attention mechanism is also an indispensable factor in Transformers, yet it is often treated in an ad hoc manner. Therefore, this paper carries out an empirical study on the position embeddings of mainstream pre-trained Transformers, which mainly focuses on two questions: 1) Do position embeddings really learn the meaning of positions? 2) How do these different learned position embeddings affect Transformers on NLP tasks? This paper focuses on providing new insight into pre-trained position embeddings through feature-level analysis and empirical experiments on most of the iconic NLP tasks. We believe our experimental results can guide future work in choosing a suitable positional encoding function for specific tasks given the application properties.
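A quick way to start the kind of feature-level analysis the abstract refers to, shown only for BERT's learned absolute positions: pull the position-embedding matrix from the pretrained checkpoint and inspect pairwise cosine similarities. This is an illustration, not the paper's full protocol.

import torch
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
pos = bert.embeddings.position_embeddings.weight.detach()      # (512, 768) learned positions
normed = pos / pos.norm(dim=1, keepdim=True)
similarity = normed @ normed.T                                  # (512, 512) cosine matrix
print(similarity[0, :5], similarity.shape)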
[50]:Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments
标题:面向双言和语码转换环境下的微方言识别
作者:Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Lyle Ungar
备注:Accepted in EMNLP 2020
链接:https://arxiv.org/abs/2010.04900
摘要:Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties. Inspired by geolocation research, we propose the novel task of Micro-Dialect Identification (MDI) and introduce MARBERT, a new language model with striking abilities to predict a fine-grained variety (as small as that of a city) given a single, short message. For modeling, we offer a range of novel spatially and linguistically-motivated multi-task learning models. To showcase the utility of our models, we introduce a new, large-scale dataset of Arabic micro-varieties (low-resource) suited to our tasks. MARBERT predicts micro-dialects with 9.9% F1, ~76X better than a majority class baseline. Our new language model also establishes new state-of-the-art on several external tasks.
[51]:Self-play for Data Efficient Language Acquisition
标题:面向数据高效语言习得的自我博弈
作者:Charles Lovering, Ellie Pavlick
链接:https://arxiv.org/abs/2010.04872
摘要:When communicating, people behave consistently across conversational roles: People understand the words they say and are able to produce the words they hear. To date, artificial agents developed for language tasks have lacked such symmetry, meaning agents trained to produce language are unable to understand it and vice-versa. In this work, we exploit the symmetric nature of communication in order to improve both the efficiency and quality of language acquisition in learning agents. Specifically, we consider the setting in which an agent must learn to both understand and generate words in an existing language, but with the assumption that access to interaction with "oracle" speakers of the language is very limited. We show that using self-play as a substitute for direct supervision enables the agent to transfer its knowledge across roles (e.g. training as a listener but testing as a speaker) and make better inferences about the ground truth lexicon using only a handful of interactions with the oracle.
[52]:How well does surprisal explain N400 amplitude under different experimental conditions?
标题:在不同的实验条件下,惊奇解释N400振幅有多好?
作者:James A. Michaelov, Benjamin K. Bergen
备注:To be presented at CoNLL 2020
链接:https://arxiv.org/abs/2010.04844
摘要:We investigate the extent to which word surprisal can be used to predict a neural measure of human language processing difficulty - the N400. To do this, we use recurrent neural networks to calculate the surprisal of stimuli from previously published neurolinguistic studies of the N400. We find that surprisal can predict N400 amplitude in a wide range of cases, and the cases where it cannot do so provide valuable insight into the neurocognitive processes underlying the response.
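Surprisal here is just the negative log-probability (in bits) of each word given its left context. The snippet below computes it with GPT-2 purely for convenience; the study uses recurrent-network language models, so treat this as an illustrative stand-in rather than a replication.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def surprisals(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)            # predictions for tokens 2..T
    token_lp = logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    bits = (-token_lp / torch.log(torch.tensor(2.0))).tolist()      # convert nats to bits
    return list(zip(tok.convert_ids_to_tokens(ids[0, 1:]), bits))

print(surprisals("The day was breezy so the boy went outside to fly a kite"))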
[53]:On Task-Level Dialogue Composition of Generative Transformer Model
标题:生成式Transformer模型的任务级对话组合研究
作者:Prasanna Parthasarathi, Arvind Neelakantan, Sharan Narang
备注:8 pages; Accepted at Workshop on Insights from Negative Results in NLP
链接:https://arxiv.org/abs/2010.04826
摘要:Task-oriented dialogue systems help users accomplish tasks such as booking a movie ticket and ordering food via conversation. Generative models parameterized by a deep neural network are widely used for next-turn response generation in such systems. It is natural for users of the system to want to accomplish multiple tasks within the same conversation, but the ability of generative models to compose multiple tasks is not well studied. In this work, we begin by studying the effect of training on human-human task-oriented dialogues on the ability of Transformer generative models to compose multiple tasks. To that end, we propose and explore two solutions: (1) creating synthetic multi-task dialogue data for training from human-human single-task dialogues and (2) forcing the encoder representation to be invariant to single- and multi-task dialogues using an auxiliary loss. The results from our experiments highlight the difficulty that even a sophisticated variant of the Transformer model has in learning to compose multiple tasks from single-task dialogues.
[54]:Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data
标题:反事实增强的SNLI训练数据并没有比未增强的数据产生更好的泛化效果
作者:William Huang, Haokun Liu, Samuel R. Bowman
链接:https://arxiv.org/abs/2010.04762
摘要:A growing body of work shows that models exploit annotation artifacts to achieve state-of-the-art performance on standard crowdsourced benchmarks---datasets collected from crowdworkers to create an evaluation task---while still failing on out-of-domain examples for the same task. Recent work has explored the use of counterfactually-augmented data---data built by minimally editing a set of seed examples to yield counterfactual labels---to augment training data associated with these benchmarks and build more robust classifiers that generalize better. However, Khashabi et al. (2020) find that this type of augmentation yields little benefit on reading comprehension tasks when controlling for dataset size and cost of collection. We build upon this work by using English natural language inference data to test model generalization and robustness and find that models trained on a counterfactually-augmented SNLI dataset do not generalize better than models trained on unaugmented datasets of similar size and that counterfactual augmentation can hurt performance, yielding models that are less robust to challenge examples. Counterfactual augmentation of natural language understanding data through standard crowdsourcing techniques does not appear to be an effective way of collecting training data and further innovation is required to make this general line of work viable.
[55]:Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model
标题:基于潜变量模型的跨语言形容词排序倾向研究
作者:Jun Yen Leung, Guy Emerson, Ryan Cotterell
备注:13 pages, 7 tables, 1 figure. To be published in EMNLP 2020 proceedings
链接:https://arxiv.org/abs/2010.04755
摘要:Across languages, multiple consecutive adjectives modifying a noun (e.g. "the big red dog") follow certain unmarked ordering rules. While explanatory accounts have been put forward, much of the work done in this area has relied primarily on the intuitive judgment of native speakers, rather than on corpus data. We present the first purely corpus-driven model of multi-lingual adjective ordering in the form of a latent-variable model that can accurately order adjectives across 24 different languages, even when the training and testing languages are different. We utilize this novel statistical model to provide strong converging evidence for the existence of universal, cross-linguistic, hierarchical adjective ordering tendencies.
[56]:MEEP: An Open-Source Platform for Human-Human Dialog Collection and End-to-End Agent Training
标题:MEEP:人-人对话收集与端到端智能体训练的开源平台
作者:Arkady Arkhangorodsky, Amittai Axelrod, Christopher Chu, Scot Fang, Yiqi Huang, Ajay Nagesh, Xing Shi, Boliang Zhang, Kevin Knight
备注:10 pages
链接:https://arxiv.org/abs/2010.04747
摘要:We create a new task-oriented dialog platform (MEEP) where agents are given considerable freedom in terms of utterances and API calls, but are constrained to work within a push-button environment. We include facilities for collecting human-human dialog corpora, and for training automatic agents in an end-to-end fashion. We demonstrate MEEP with a dialog assistant that lets users specify trip destinations.
[57]:Solving Historical Dictionary Codes with a Neural Language Model
标题:用神经语言模型求解历史字典码
作者:Christopher Chu, Raphael Valenti, Kevin Knight
备注:10 pages, 6 figures. To appear in EMNLP 2020
链接:https://arxiv.org/abs/2010.04746
摘要:We solve difficult word-based substitution codes by constructing a decoding lattice and searching that lattice with a neural language model. We apply our method to a set of enciphered letters exchanged between US Army General James Wilkinson and agents of the Spanish Crown in the late 1700s and early 1800s, obtained from the US Library of Congress. We are able to decipher 75.1% of the cipher-word tokens correctly.
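A toy beam search over a decoding lattice in the spirit of the abstract: each cipher token carries a few candidate plaintext words and hypotheses are ranked by a scoring function. The lm_score placeholder below is not a neural language model, and the candidate lattice is invented.

import heapq

def lm_score(words):                       # placeholder scorer; the paper uses a neural LM
    return -sum(len(w) for w in words)

def beam_search(lattice, beam_size=3):
    beams = [((), 0.0)]
    for candidates in lattice:             # plaintext candidates for the next cipher token
        expanded = [(prefix + (w,), lm_score(prefix + (w,)))
                    for prefix, _ in beams for w in candidates]
        beams = heapq.nlargest(beam_size, expanded, key=lambda x: x[1])
    return beams[0][0]

lattice = [["general", "spain"], ["sends", "gold", "men"], ["troops", "money"]]
print(beam_search(lattice))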
[58]:Learning to Pronounce Chinese Without a Pronunciation Dictionary
标题:不用读音词典学汉语发音
作者:Christopher Chu, Scot Fang, Kevin Knight
备注:7 pages. To appear in EMNLP 2020
链接:https://arxiv.org/abs/2010.04744
摘要:We demonstrate a program that learns to pronounce Chinese text in Mandarin, without a pronunciation dictionary. From non-parallel streams of Chinese characters and Chinese pinyin syllables, it establishes a many-to-many mapping between characters and pronunciations. Using unsupervised methods, the program effectively deciphers writing into speech. Its token-level character-to-syllable accuracy is 89%, which significantly exceeds the 22% accuracy of prior work.
[59]:Evaluating and Characterizing Human Rationales
标题:评估与刻画人类理由（rationales）
作者:Samuel Carton, Anirudh Rathore, Chenhao Tan
备注:14 pages, 15 figures, to appear in EMNLP 2020. Code is available at this https URL
链接:https://arxiv.org/abs/2010.04736
摘要:Two main approaches for evaluating the quality of machine-generated rationales are: 1) using human rationales as a gold standard; and 2) automated metrics based on how rationales affect model behavior. An open question, however, is how human rationales fare with these automatic metrics. Analyzing a variety of datasets and models, we find that human rationales do not necessarily perform well on these metrics. To unpack this finding, we propose improved metrics to account for model-dependent baseline performance. We then propose two methods to further characterize rationale quality, one based on model retraining and one on using "fidelity curves" to reveal properties such as irrelevance and redundancy. Our work leads to actionable suggestions for evaluating and characterizing rationales.
[60]:Deep Learning for Information Systems Research
标题:信息系统研究的深度学习
作者:Sagar Samtani, Hongyi Zhu, Balaji Padmanabhan, Yidong Chai, Hsinchun Chen
备注:56 pages total, 1 page title and authors, 42 pages main text, 13 pages appendix
链接:https://arxiv.org/abs/2010.05774
摘要:Artificial Intelligence (AI) has rapidly emerged as a key disruptive technology in the 21st century. At the heart of modern AI lies Deep Learning (DL), an emerging class of algorithms that has enabled today's platforms and organizations to operate at unprecedented efficiency, effectiveness, and scale. Despite significant interest, IS contributions in DL have been limited, which we argue is in part due to issues with defining, positioning, and conducting DL research. Recognizing the tremendous opportunity here for the IS community, this work clarifies, streamlines, and presents approaches for IS scholars to make timely and high-impact contributions. Related to this broader goal, this paper makes five timely contributions. First, we systematically summarize the major components of DL in a novel Deep Learning for Information Systems Research (DL-ISR) schematic that illustrates how technical DL processes are driven by key factors from an application environment. Second, we present a novel Knowledge Contribution Framework (KCF) to help IS scholars position their DL contributions for maximum impact. Third, we provide ten guidelines to help IS scholars generate rigorous and relevant DL-ISR in a systematic, high-quality fashion. Fourth, we present a review of prevailing journal and conference venues to examine how IS scholars have leveraged DL for various research inquiries. Finally, we provide a unique perspective on how IS scholars can formulate DL-ISR inquiries by carefully considering the interplay of business function(s), application areas(s), and the KCF. This perspective intentionally emphasizes inter-disciplinary, intra-disciplinary, and cross-IS tradition perspectives. Taken together, these contributions provide IS scholars a timely framework to advance the scale, scope, and impact of deep learning research.
[61]:ComStreamClust: A communicative text clustering approach to topic detection in streaming data
标题:ComStreamClust:一种用于流数据主题检测的通信文本聚类方法
作者:Ali Najafi, Araz Gholipour-Shilabin, Rahim Dehkharghani, Ali Mohammadpur-Fard, Meysam Asgari-Chenaghlu
备注:11 pages, 6 Figures, 4 Tables
链接:https://arxiv.org/abs/2010.05349
摘要:Topic detection is the task of determining and tracking hot topics in social media. Twitter is arguably the most popular platform for people to share their ideas with others about different issues. One such prevalent issue is the COVID-19 pandemic. Detecting and tracking topics on these kinds of issues would help governments and healthcare companies deal with this phenomenon. In this paper, we propose a novel communicative clustering approach, called ComStreamClust, for clustering sub-topics inside a broader topic, e.g. COVID-19. The proposed approach was evaluated on two datasets: COVID-19 and the FA CUP. The results obtained by ComStreamClust confirm the effectiveness of the proposed approach when compared to existing methods such as LDA.
[62]:A Defeasible Calculus for Zetetic Agents
标题:面向探究型（zetetic）智能体的可废止演算
作者:Jared Millson
链接:https://arxiv.org/abs/2010.05293
摘要:The study of defeasible reasoning unites epistemologists with those working in AI, in part, because both are interested in epistemic rationality. While it is traditionally thought to govern the formation and (with)holding of beliefs, epistemic rationality may also apply to the interrogative attitudes associated with our core epistemic practice of inquiry, such as wondering, investigating, and curiosity. Since generally intelligent systems should be capable of rational inquiry, AI researchers have a natural interest in the norms that govern interrogative attitudes. Following its recent coinage, we use the term "zetetic" to refer to the properties and norms associated with the capacity to inquire. In this paper, we argue that zetetic norms can be modeled via defeasible inferences to and from questions---a.k.a erotetic inferences---in a manner similar to the way norms of epistemic rationality are represented by defeasible inference rules. We offer a sequent calculus that accommodates the unique features of "erotetic defeat" and that exhibits the computational properties needed to inform the design of zetetic agents. The calculus presented here is an improved version of the one presented in Millson (2019), extended to cover a new class of defeasible erotetic inferences.
[63]:Constructing a Visual Relationship Authenticity Dataset
标题:构建视觉关系真实性数据集
作者:Chenhui Chu, Yuto Takebayashi, Mishra Vipul, Yuta Nakashima
链接:https://arxiv.org/abs/2010.05185
摘要:A visual relationship denotes a relationship between two objects in an image, which can be represented as a triplet of (subject; predicate; object). Visual relationship detection is crucial for scene understanding in images. Existing visual relationship detection datasets only contain true relationships that correctly describe the content in an image. However, distinguishing false visual relationships from true ones is also crucial for image understanding and grounded natural language processing. In this paper, we construct a visual relationship authenticity dataset, where both true and false relationships among all objects appearing in the captions of the Flickr30k Entities image caption dataset are annotated. The dataset is available at this https URL. We hope that this dataset can promote the study of both vision and language understanding.
[64]:Conformal retrofitting via Riemannian manifolds: distilling task-specific graphs into pretrained embeddings
标题:基于黎曼流形的共形改装（retrofitting）：将任务特定图蒸馏到预训练嵌入中
作者:Justin Dieter, Arun Tejasvi Chaganty
备注:14 pages, 5 figures
链接:https://arxiv.org/abs/2010.04842
摘要:Pretrained (language) embeddings are versatile, task-agnostic feature representations of entities, like words, that are central to many machine learning applications. These representations can be enriched through retrofitting, a class of methods that incorporate task-specific domain knowledge encoded as a graph over a subset of these entities. However, existing retrofitting algorithms face two limitations: they overfit the observed graph by failing to represent relationships with missing entities; and they underfit the observed graph by only learning embeddings in Euclidean manifolds, which cannot faithfully represent even simple tree-structured or cyclic graphs. We address these problems with two key contributions: (i) we propose a novel regularizer, a conformality regularizer, that preserves local geometry from the pretrained embeddings---enabling generalization to missing entities and (ii) a new Riemannian feedforward layer that learns to map pre-trained embeddings onto a non-Euclidean manifold that can better represent the entire graph. Through experiments on WordNet, we demonstrate that the conformality regularizer prevents even existing (Euclidean-only) methods from overfitting on link prediction for missing entities, and---together with the Riemannian feedforward layer---learns non-Euclidean embeddings that outperform them.