Natural Language Processing (cs.CL) Digest 0929

2789 reads · uploaded 2020-10-07 08:40:04

This article is reproduced from 理论语言学与古汉语 (Theoretical Linguistics and Old Chinese).

Today's arXiv cs.CL listing contains 58 papers in total.

Knowledge Graphs (2 papers)


[1]: KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning
Authors: Ye Liu, Yao Wan, Lifang He, Hao Peng, Philip S. Yu
Comments: 9 pages, 7 figures
Link: https://arxiv.org/abs/2009.12677

Abstract: Generative commonsense reasoning, which aims to empower machines to generate sentences with the capacity of reasoning over a set of concepts, is a critical bottleneck for text generation. Even the state-of-the-art pre-trained language generation models struggle at this task and often produce implausible and anomalous sentences. One reason is that they rarely consider incorporating the knowledge graph which can provide rich relational information among the commonsense concepts. To promote the ability of commonsense reasoning for text generation, we propose a novel knowledge graph-augmented pre-trained language generation model, KG-BART, which encompasses the complex relations of concepts through the knowledge graph and produces more logical and natural sentences as output. Moreover, KG-BART can leverage graph attention to aggregate the rich concept semantics, which enhances the model's generalization on unseen concept sets. Experiments on the benchmark CommonGen dataset verify the effectiveness of our proposed approach in comparison with several strong pre-trained language generation models; in particular, KG-BART outperforms BART by 15.98% and 17.49% in terms of BLEU-3 and BLEU-4. We also show that the context generated by our model can work as background scenarios to benefit downstream commonsense QA tasks.
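
As a side note for readers unfamiliar with graph attention: the abstract does not spell out KG-BART's exact attention computation, but a minimal GAT-style aggregation over a concept's neighbors, which the description loosely matches, can be sketched as follows (all shapes and names are illustrative assumptions, not the paper's):

```python
# Minimal sketch of graph-attention aggregation over concept embeddings,
# in the spirit of "graph attention to aggregate concept semantics".
import numpy as np

def graph_attention(h, neighbors, W, a):
    """One GAT-style layer: per-node attention over neighbors, then aggregate.

    h:         (n, d) concept embeddings
    neighbors: dict node -> list of neighbor node ids
    W:         (d, d2) shared projection
    a:         (2 * d2,) attention parameter vector
    """
    z = h @ W
    out = np.zeros((h.shape[0], W.shape[1]))
    for i, nbrs in neighbors.items():
        idx = [i] + list(nbrs)                       # self-loop + neighbors
        pairs = np.concatenate(
            [np.tile(z[i], (len(idx), 1)), z[idx]], axis=1)
        raw = pairs @ a                              # one score per edge
        raw = np.where(raw > 0, raw, 0.2 * raw)      # LeakyReLU(0.2)
        alpha = np.exp(raw - raw.max())
        alpha /= alpha.sum()                         # softmax over neighbors
        out[i] = alpha @ z[idx]                      # weighted aggregation
    return out

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                          # 4 concepts, dim 8
nbrs = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}        # toy concept graph
out = graph_attention(h, nbrs, rng.normal(size=(8, 8)), rng.normal(size=(16,)))
print(out.shape)                                     # (4, 8)
```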



[2]: QuatRE: Relation-Aware Quaternions for Knowledge Graph Embeddings
Authors: Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dinh Phung
Link: https://arxiv.org/abs/2009.12517

Abstract: We propose a simple and effective embedding model, named QuatRE, to learn quaternion embeddings for entities and relations in knowledge graphs. QuatRE aims to enhance correlations between head and tail entities given a relation within the Quaternion space with the Hamilton product. QuatRE achieves this by associating each relation with two quaternion vectors which are used to rotate the quaternion embeddings of the head and tail entities, respectively. To obtain the triple score, QuatRE rotates the rotated embedding of the head entity using the normalized quaternion embedding of the relation, followed by a quaternion inner product with the rotated embedding of the tail entity. Experimental results show that our QuatRE outperforms up-to-date embedding models on well-known benchmark datasets for knowledge graph completion.
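
The scoring function is described concretely enough to sketch. Below is a minimal numpy rendition of quaternion rotation and the inner-product score as the abstract describes it; the number of quaternions per embedding and the sign convention are assumptions:

```python
# Each embedding is d quaternions stored as a (d, 4) array.
import numpy as np

def hamilton(p, q):
    """Element-wise Hamilton product of two quaternion arrays of shape (d, 4)."""
    a1, b1, c1, d1 = p.T
    a2, b2, c2, d2 = q.T
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ], axis=1)

def normalize(q):
    """Normalize each quaternion to unit length (a pure rotation)."""
    return q / np.linalg.norm(q, axis=1, keepdims=True)

def score(h, t, r, r_h, r_t):
    """Higher score = more plausible triple (sign convention assumed)."""
    h_rot = hamilton(h, normalize(r_h))   # rotate head by its relation vector
    t_rot = hamilton(t, normalize(r_t))   # rotate tail by its relation vector
    h_r = hamilton(h_rot, normalize(r))   # rotate again by the relation itself
    return np.sum(h_r * t_rot)            # quaternion inner product

rng = np.random.default_rng(0)
d = 16                                    # 16 quaternions per embedding
h, t, r, r_h, r_t = (rng.normal(size=(d, 4)) for _ in range(5))
print(score(h, t, r, r_h, r_t))
```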


Reasoning and Analysis (3 papers)


[1]: Graph-based Multi-hop Reasoning for Long Text Generation
Authors: Liang Zhao, Jingjing Xu, Junyang Lin, Yichang Zhang, Hongxia Yang, Xu Sun
Link: https://arxiv.org/abs/2009.13282

Abstract: Long text generation is an important but challenging task. The main problem lies in learning sentence-level semantic dependencies, which traditional generative models often suffer from. To address this problem, we propose a Multi-hop Reasoning Generation (MRG) approach that incorporates multi-hop reasoning over a knowledge graph to learn semantic dependencies among sentences. MRG consists of two parts, a graph-based multi-hop reasoning module and a path-aware sentence realization module. The reasoning module is responsible for searching skeleton paths from a knowledge graph to imitate the imagination process in human writing for semantic transfer. Based on the inferred paths, the sentence realization module then generates a complete sentence. Unlike previous black-box models, MRG explicitly infers the skeleton path, which provides explanatory views to understand how the proposed model works. We conduct experiments on three representative tasks, including story generation, review generation, and product description generation. Automatic and manual evaluation show that our proposed method can generate more informative and coherent long text than strong baselines, such as pre-trained models (e.g., GPT-2) and knowledge-enhanced models.



[2]: Recurrent Inference in Text Editing
Authors: Ning Shi, Ziheng Zeng, Haotian Zhang, Yichen Gong
Comments: 9 pages, 4 figures, 3 tables, and 1 page appendix
Link: https://arxiv.org/abs/2009.12643

Abstract: In neural text editing, prevalent sequence-to-sequence based approaches directly map the unedited text either to the edited text or to the editing operations, where performance is degraded by limited source text encoding and long, varying decoding steps. To address this problem, we propose a new inference method, Recurrence, that iteratively performs editing actions, significantly narrowing the problem space. In each iteration, encoding the partially edited text, Recurrence decodes the latent representation, generates a short, fixed-length action, and applies the action to complete a single edit. For a comprehensive comparison, we introduce three types of text editing tasks: Arithmetic Operators Restoration (AOR), Arithmetic Equation Simplification (AES), and Arithmetic Equation Correction (AEC). Extensive experiments on these tasks with varying difficulties demonstrate that Recurrence achieves improvements over conventional inference methods.
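
To make the inference loop concrete, here is a toy rendition of the Recurrence idea on the AOR task, with a trivial rule-based search standing in for the learned model; only the iterate-encode-act loop is faithful to the description:

```python
# Iterative editing: each step emits one short, fixed-length action
# (position, operator) and applies it, instead of decoding the full rewrite.
import itertools

def gaps(tokens):
    """Positions between two adjacent numbers where an operator is missing."""
    return [i for i in range(1, len(tokens))
            if tokens[i - 1].isdigit() and tokens[i].isdigit()]

def propose_action(tokens):
    """One action (position, operator), or None if the equation is complete."""
    g = gaps(tokens)
    if not g:
        return None
    # Search operator assignments for all remaining gaps, but emit only the
    # first edit; later iterations re-encode the text and emit the rest.
    for ops in itertools.product("+-*", repeat=len(g)):
        cand = list(tokens)
        for offset, (pos, op) in enumerate(zip(g, ops)):
            cand.insert(pos + offset, op)
        lhs, rhs = " ".join(cand).split("=")
        if eval(lhs) == eval(rhs):
            return (g[0], ops[0])
    return None

def recurrent_inference(tokens, max_iters=10):
    for _ in range(max_iters):
        action = propose_action(tokens)              # encode state, pick action
        if action is None:
            break
        pos, op = action
        tokens = tokens[:pos] + [op] + tokens[pos:]  # apply a single edit
    return tokens

print(recurrent_inference(["2", "3", "=", "5"]))        # 2 + 3 = 5
print(recurrent_inference(["2", "3", "4", "=", "24"]))  # 2 * 3 * 4 = 24
```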



[3]: Modeling Dyadic Conversations for Personality Inference
Authors: Qiang Liu
Link: https://arxiv.org/abs/2009.12496

Abstract: Nowadays, automatic personality inference is drawing extensive attention from both academia and industry. Conventional methods are mainly based on user-generated content, e.g., an individual's profiles, likes, and texts on social media, which are actually not very reliable. In contrast, dyadic conversations between individuals can not only capture how one expresses oneself, but also reflect how one reacts to different situations. Rich contextual information in dyadic conversation can explain an individual's response during his or her conversation. In this paper, we propose a novel augmented Gated Recurrent Unit (GRU) model for learning unsupervised Personal Conversational Embeddings (PCE) based on dyadic conversations between individuals. We adjust the formulation of each layer of a conventional GRU with sequence-to-sequence learning and personal information of both sides of the conversation. Based on the learned PCE, we can infer the personality of each individual. We conduct experiments on the Movie Script dataset, which is collected from conversations between characters in movie scripts. We find that modeling dyadic conversations between individuals can significantly improve personality inference accuracy. Experimental results illustrate the successful performance of our proposed method.


Natural Language Generation (5 papers)


[1]: Injecting Entity Types into Entity-Guided Text Generation
Authors: Xiangyu Dong, Wenhao Yu, Chenguang Zhu, Meng Jiang
Comments: Preprint; under review as a conference paper; code is available at: this https URL
Link: https://arxiv.org/abs/2009.13401

Abstract: Recent successes in deep generative modeling have led to significant advances in natural language generation (NLG). Incorporating entities into neural generation models has demonstrated great improvements by helping to infer the summary topic and to generate coherent content. In order to enhance the role of entities in NLG, in this paper, we aim to model the entity type in the decoding phase to generate contextual words accurately. We develop a novel NLG model to produce a target sequence (i.e., a news article) based on a given list of entities. The generation quality depends significantly on whether the input entities are logically connected and expressed in the output. Our model has a multi-step decoder that injects the entity types into the process of entity mention generation. It first predicts whether the next token is a contextual word or an entity; if an entity, it then predicts the entity mention. It effectively embeds the entity's meaning into hidden states, making the generated words precise. Experiments on two public datasets demonstrate that type injection performs better than type-embedding concatenation baselines.



[2]: Transformers Are Better Than Humans at Identifying Generated Text
Authors: Antonis Maronikolakis, Mark Stevenson, Hinrich Schutze
Link: https://arxiv.org/abs/2009.13375

Abstract: Fake information spread via the internet and social media influences public opinion and user activity. Generative models enable fake content to be generated faster and more cheaply than had previously been possible. This paper examines the problem of identifying fake content generated by lightweight deep learning models. A dataset containing human and machine-generated headlines was created, and a user study indicated that humans were only able to identify the fake headlines in 45.3% of the cases. However, the most accurate automatic approach, transformers, achieved an accuracy of 94%, indicating that content generated from language models can be filtered out accurately.



[3]: Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning
Authors: Haochen Liu, Wentao Wang, Yiqi Wang, Hui Liu, Zitao Liu, Jiliang Tang
Comments: Accepted by EMNLP 2020
Link: https://arxiv.org/abs/2009.13028

Abstract: Dialogue systems play an increasingly important role in various aspects of our daily life. It is evident from recent research that dialogue systems trained on human conversation data are biased. In particular, they can produce responses that reflect people's gender prejudice. Many debiasing methods have been developed for various natural language processing tasks, such as word embedding. However, they are not directly applicable to dialogue systems because they are likely to force dialogue models to generate similar responses for different genders. This greatly degrades the diversity of the generated responses and immensely hurts the performance of the dialogue models. In this paper, we propose a novel adversarial learning framework, Debiased-Chat, to train dialogue models free from gender bias while preserving their performance. Extensive experiments on two real-world conversation datasets show that our framework significantly reduces gender bias in dialogue models while maintaining response quality.



[4]: Modeling Topical Relevance for Multi-Turn Dialogue Generation
Authors: Hainan Zhang, Yanyan Lan, Liang Pang, Hongshen Chen, Zhuoye Ding, Dawei Yin
Link: https://arxiv.org/abs/2009.12735

Abstract: Topic drift is a common phenomenon in multi-turn dialogue. Therefore, an ideal dialogue generation model should be able to capture the topic information of each context, detect the relevant context, and produce appropriate responses accordingly. However, existing models usually use word- or sentence-level similarities to detect the relevant contexts, which fail to capture topic-level relevance well. In this paper, we propose a new model, named STAR-BTM, to tackle this problem. First, the Biterm Topic Model is pre-trained on the whole training dataset. Then, topic-level attention weights are computed based on the topic representation of each context. Finally, the attention weights and the topic distribution are utilized in the decoding process to generate the corresponding responses. Experimental results on both Chinese customer service data and English Ubuntu dialogue data show that STAR-BTM significantly outperforms several state-of-the-art methods, in terms of both metric-based and human evaluations.



[5]: Stylized Dialogue Response Generation Using Stylized Unpaired Texts
Authors: Yinhe Zheng, Zikai Chen, Rongsheng Zhang, Shilei Huang, Xiaoxi Mao, Minlie Huang
Link: https://arxiv.org/abs/2009.12719

Abstract: Generating stylized responses is essential to build intelligent and engaging dialogue systems. However, this task is far from well-explored due to the difficulty of rendering a particular style in coherent responses, especially when the target style is embedded only in unpaired texts that cannot be directly used to train the dialogue model. This paper proposes a stylized dialogue generation method that can capture stylistic features embedded in unpaired texts. Specifically, our method can produce dialogue responses that are both coherent to the given context and conform to the target style. In this study, an inverse dialogue model is first introduced to predict possible posts for the input responses, and then this inverse model is used to generate stylized pseudo dialogue pairs based on these stylized unpaired texts. Further, these pseudo pairs are employed to train the stylized dialogue model with a joint training process, and a style routing approach is proposed to intensify stylistic features in the decoder. Automatic and manual evaluations on two datasets demonstrate that our method outperforms competitive baselines in producing coherent and style-intensive dialogue responses.


Text Classification (1 paper)


[1]: A Diagnostic Study of Explainability Techniques for Text Classification
Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
Link: https://arxiv.org/abs/2009.13295

Abstract: Recent developments in machine learning have introduced models that approach human performance at the cost of increased architectural complexity. Efforts to make the rationales behind the models' predictions transparent have inspired an abundance of new explainability techniques. Provided with an already trained model, they compute saliency scores for the words of an input instance. However, there exists no definitive guide on (i) how to choose such a technique given a particular application task and model architecture, and (ii) the benefits and drawbacks of using each such technique. In this paper, we develop a comprehensive list of diagnostic properties for evaluating existing explainability techniques. We then employ the proposed list to compare a set of diverse explainability techniques on downstream text classification tasks and neural network architectures. We also compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones. Overall, we find that gradient-based explanations perform best across tasks and model architectures, and we present further insights into the properties of the reviewed explainability techniques.
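
For readers who want to try one of the techniques studied, a minimal gradient×input saliency computation (one member of the gradient-based family the paper finds strongest) looks like this on a toy PyTorch classifier; the model is randomly initialized purely for illustration:

```python
# Gradient x input saliency: one score per input token.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, n_classes = 100, 16, 2
embed = nn.Embedding(vocab, dim)
clf = nn.Linear(dim, n_classes)

token_ids = torch.tensor([[5, 17, 42, 8]])      # one 4-token "sentence"
emb = embed(token_ids)                          # (1, 4, dim)
emb.retain_grad()                               # keep grads for this non-leaf
logits = clf(emb.mean(dim=1))                   # mean-pool, then classify
pred = int(logits.argmax(dim=-1))

logits[0, pred].backward()                      # d(predicted logit)/d(embedding)
saliency = (emb.grad * emb).sum(-1).squeeze(0)  # gradient x input, per token
print(saliency)                                 # one saliency score per token
```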


Information Retrieval (2 papers)


[1]: SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
Authors: Tiancheng Zhao, Xiaopeng Lu, Kyusong Lee
Comments: 11 pages
Link: https://arxiv.org/abs/2009.13013

Abstract: We introduce SPARTA, a novel neural retrieval method that shows great promise in performance, generalization, and interpretability for open-domain question answering. Unlike many neural ranking methods that use dense vector nearest neighbor search, SPARTA learns a sparse representation that can be efficiently implemented as an inverted index. The resulting representation enables scalable neural retrieval that does not require expensive approximate vector search and leads to better performance than its dense counterpart. We validated our approaches on 4 open-domain question answering (OpenQA) tasks and 11 retrieval question answering (ReQA) tasks. SPARTA achieves new state-of-the-art results across a variety of open-domain question answering tasks on both English and Chinese datasets, including open SQuAD, Natural Questions, CMRC, etc. Analysis also confirms that the proposed method creates human-interpretable representations and allows flexible control over the trade-off between performance and efficiency.
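
A rough sketch of the indexing idea, under the assumption that sparsification is done with a ReLU-and-log transform of term-answer dot products (the exact scoring details are not in the abstract):

```python
# Score every vocabulary term against each candidate once at indexing time,
# keep only the non-zero terms in an inverted index, and answer queries by
# summing term weights. All shapes are toy placeholders.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
vocab_size, dim, n_answers = 50, 8, 3
term_emb = rng.normal(size=(vocab_size, dim))      # one vector per vocab term
answer_emb = rng.normal(size=(n_answers, dim))     # one vector per candidate

# Indexing: log(ReLU(dot) + 1) zeroes out most terms, giving a sparse vector.
index = defaultdict(list)                          # term id -> [(answer, w)]
for a in range(n_answers):
    weights = np.log(np.maximum(term_emb @ answer_emb[a], 0.0) + 1.0)
    for t in np.nonzero(weights)[0]:
        index[int(t)].append((a, float(weights[t])))

def retrieve(query_terms):
    """Score = sum of indexed weights of the query's terms (bag of words)."""
    scores = np.zeros(n_answers)
    for t in query_terms:
        for a, w in index.get(t, []):
            scores[a] += w
    return scores.argsort()[::-1]                  # best answer first

print(retrieve([3, 7, 19]))                        # ranking over the 3 answers
```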



[2]: Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval
Authors: Wenhan Xiong, Xiang Lorraine Li, Srini Iyer, Jingfei Du, Patrick Lewis, William Yang Wang, Yashar Mehdad, Wen-tau Yih, Sebastian Riedel, Douwe Kiela, Barlas Oğuz
Link: https://arxiv.org/abs/2009.12756

Abstract: We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be applied to any unstructured text corpus. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
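
The hop mechanics can be sketched in a few lines; note that the real system re-encodes the question together with the retrieved passage text, which is mocked below by vector addition:

```python
# Two-hop dense retrieval: retrieve with the question vector, fold the top
# passage into the query, and retrieve again. Brute-force inner product
# search stands in for an ANN index.
import numpy as np

rng = np.random.default_rng(0)
passages = rng.normal(size=(1000, 64))             # dense passage vectors
passages /= np.linalg.norm(passages, axis=1, keepdims=True)

def retrieve(query, k=1):
    """Maximum inner product search over all passages (brute force)."""
    return np.argsort(passages @ query)[::-1][:k]

question = rng.normal(size=64)
hops = []
query = question
for _ in range(2):                                 # two retrieval hops
    top = int(retrieve(query, k=1)[0])
    hops.append(top)
    query = question + passages[top]               # mock re-encoding step

print(hops)                                        # one passage id per hop
```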


Information Extraction (4 papers)


[1]: Fancy Man Launches Zippo at WNUT 2020 Shared Task-1: A Bert Case Model for Wet Lab Entity Extraction
Authors: Haoding Meng, Qingcheng Zeng, Xiaoyang Fang, Zhexin Liang
Comments: EMNLP 2020 WNUT
Link: https://arxiv.org/abs/2009.12997

Abstract: Automatic or semi-automatic conversion of protocols that specify the steps of a lab procedure into a machine-readable format greatly benefits biological research. Processing these noisy, dense, and domain-specific lab protocols has drawn more and more interest with the development of deep learning. This paper presents our team's work on WNUT 2020 shared task-1, wet lab entity extraction: we studied several models that can be used for this task, including a BiLSTM-CRF model and a cased BERT model. We mainly discuss the performance differences of the cased BERT model under different conditions, such as transformers library versions and case sensitivity, which may not have received enough attention before.



[2]: Clustering-based Unsupervised Generative Relation Extraction
Authors: Chenhan Yuan, Ryan Rossi, Andrew Katz, Hoda Eldardiry
Comments: 11 pages, 5 figures
Link: https://arxiv.org/abs/2009.12681

Abstract: This paper focuses on the problem of unsupervised relation extraction. Existing probabilistic generative model-based relation extraction methods work by extracting sentence features and using these features as inputs to train a generative model. This model is then used to cluster similar relations. However, these methods do not consider correlations between sentences with the same entity pair during training, which can negatively impact model performance. To address this issue, we propose a Clustering-based Unsupervised generative Relation Extraction (CURE) framework that leverages an "Encoder-Decoder" architecture to perform self-supervised learning so the encoder can extract relation information. Given multiple sentences with the same entity pair as inputs, self-supervised learning is deployed by predicting the shortest path between entity pairs on the dependency graph of one of the sentences. After that, we extract the relation information using the well-trained encoder. Then, entity pairs that share the same relation are clustered based on their corresponding relation information. Each cluster is labeled with a few words based on the words in the shortest paths corresponding to the entity pairs in each cluster. These cluster labels also describe the meaning of these relation clusters. We compare the triplets extracted by our proposed framework (CURE) and baseline methods with a ground-truth Knowledge Base. Experimental results show that our model performs better than state-of-the-art models on both the New York Times (NYT) and United Nations Parallel Corpus (UNPC) standard datasets.



[3]: DWIE: an entity-centric dataset for multi-task document-level information extraction
Authors: Klim Zaporojets, Johannes Deleu, Chris Develder, Thomas Demeester
Link: https://arxiv.org/abs/2009.12626

Abstract: This paper presents DWIE, the 'Deutsche Welle corpus for Information Extraction', a newly created multi-task dataset that combines four main Information Extraction (IE) annotation sub-tasks: (i) Named Entity Recognition (NER), (ii) Coreference Resolution, (iii) Relation Extraction (RE), and (iv) Entity Linking. DWIE is conceived as an entity-centric dataset that describes interactions and properties of conceptual entities on the level of the complete document. This contrasts with currently dominant mention-driven approaches that start from the detection and classification of named entity mentions in individual sentences. Further, DWIE presented two main challenges when building and evaluating IE models for it. First, the use of traditional mention-level evaluation metrics for NER and RE tasks on the entity-centric DWIE dataset can result in measurements dominated by predictions on more frequently mentioned entities. We tackle this issue by proposing a new entity-driven metric that takes into account the number of mentions that compose each of the predicted and ground truth entities. Second, the document-level multi-task annotations require the models to transfer information between entity mentions located in different parts of the document, as well as between different tasks, in a joint learning setting. To realize this, we propose to use graph-based neural message passing techniques between document-level mention spans. Our experiments show an improvement of up to 5.5 F1 percentage points when incorporating neural graph propagation into our joint model. This demonstrates DWIE's potential to stimulate further research in graph neural networks for representation learning in multi-task IE. We make DWIE publicly available at this https URL.



[4]: Reinforcement Learning-based N-ary Cross-Sentence Relation Extraction
Authors: Chenhan Yuan, Ryan Rossi, Andrew Katz, Hoda Eldardiry
Comments: 10 pages, 3 figures, submitted to AAAI
Link: https://arxiv.org/abs/2009.12683

Abstract: Models for n-ary cross-sentence relation extraction based on distant supervision assume that consecutive sentences mentioning n entities describe the relation of these n entities. However, on one hand, this assumption introduces noisy labeled data and harms the models' performance. On the other hand, some non-consecutive sentences also describe one relation, and these sentences cannot be labeled under this assumption. In this paper, we relax this strong assumption by a weaker distant supervision assumption to address the second issue, and propose a novel sentence distribution estimator model to address the first problem. This estimator, a two-level agent reinforcement learning model, selects correctly labeled sentences to alleviate the effect of noisy data. In addition, a novel universal relation extractor with a hybrid approach of attention mechanism and PCNN is proposed such that it can be deployed in any setting, including consecutive and non-consecutive sentences. Experiments demonstrate that the proposed model can reduce the impact of noisy data and achieve better performance on the general n-ary cross-sentence relation extraction task compared to baseline models.


Question Answering (3 papers)


[1]: What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Authors: Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, Peter Szolovits
Comments: Submitted to AAAI 2021
Link: https://arxiv.org/abs/2009.13081

Abstract: Open domain question answering (OpenQA) tasks have recently been attracting more and more attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method can only achieve 36.7%, 42.0%, and 70.1% test accuracy on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future.



[2]: Unsupervised Pre-training for Biomedical Question Answering
Authors: Vaishnavi Kommaraju, Karthick Gunasekaran, Kun Li, Trapit Bansal, Andrew McCallum, Ivana Williams, Ana-Maria Istrate
Comments: To appear in BioASQ workshop 2020
Link: https://arxiv.org/abs/2009.12952

Abstract: We explore the suitability of unsupervised representation learning methods on biomedical text -- BioBERT, SciBERT, and BioSentVec -- for biomedical question answering. To further improve unsupervised representations for biomedical QA, we introduce a new pre-training task from unlabeled data designed to reason about biomedical entities in context. Our pre-training method consists of corrupting a given context by randomly replacing some mention of a biomedical entity with a random entity mention, and then querying the model with the correct entity mention in order to locate the corrupted part of the context. This de-noising task enables the model to learn good representations from abundant, unlabeled biomedical text that help QA tasks, and minimizes the train-test mismatch between the pre-training task and the downstream QA tasks by requiring the model to predict spans. Our experiments show that pre-training BioBERT on the proposed pre-training task significantly boosts performance and outperforms the previous best model from the 7th BioASQ Task 7b-Phase B challenge.
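
The data construction for this de-noising task is simple enough to sketch directly from the description; the entity list and example text below are toy placeholders:

```python
# Corrupt a context by swapping one entity mention for a random other entity;
# the model is then asked, given the correct mention, to locate the corrupted
# span. This builds one pre-training example from unlabeled text.
import random

random.seed(0)
ENTITIES = ["aspirin", "ibuprofen", "warfarin", "metformin"]

def corrupt(context, mentions):
    """Return (corrupted_text, query_entity, corrupted_char_span)."""
    target = random.choice(mentions)                      # mention to corrupt
    replacement = random.choice([e for e in ENTITIES if e != target])
    start = context.index(target)
    corrupted = context[:start] + replacement + context[start + len(target):]
    span = (start, start + len(replacement))              # span to predict
    return corrupted, target, span

text = "Patients on warfarin should avoid combining it with aspirin."
corrupted, query, span = corrupt(text, ["warfarin", "aspirin"])
print(corrupted)
print(f"query entity: {query}, corrupted span: {span}")
print(corrupted[span[0]:span[1]])                         # the injected entity
```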



[3]: Hierarchical Deep Multi-modal Network for Medical Visual Question Answering
Authors: Deepak Gupta, Swati Suman, Asif Ekbal
Comments: Accepted for publication at Expert Systems with Applications
Link: https://arxiv.org/abs/2009.12770

Abstract: Visual Question Answering in the medical domain (VQA-Med) plays an important role in providing medical assistance to end-users. These users are expected to raise either a straightforward question with a Yes/No answer or a challenging question that requires a detailed and descriptive answer. Existing techniques in VQA-Med fail to distinguish between the different question types, which sometimes complicates the simpler problems or over-simplifies the complicated ones, and maintaining several distinct systems for different question types can lead to confusion and discomfort for end-users. To address this issue, we propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries and then incorporates a query-specific approach for answer prediction. We refer to our proposed approach as Hierarchical Question Segregation based Visual Question Answering, in short HQS-VQA. Our contributions are three-fold, viz. firstly, we propose a question segregation (QS) technique for VQA-Med; secondly, we integrate the QS model into the hierarchical deep multi-modal neural network to generate proper answers to queries related to medical images; and thirdly, we study the impact of QS in Medical-VQA by comparing the performance of the proposed model with QS to that of a model without QS. We evaluate the performance of our proposed model on two benchmark datasets, viz. RAD and CLEF18. Experimental results show that our proposed HQS-VQA technique outperforms the baseline models by significant margins. We also conduct a detailed quantitative and qualitative analysis of the obtained results and discover potential causes of errors and their solutions.


Machine Translation (4 papers)


[1]: Aspects of Terminological and Named Entity Knowledge within Rule-Based Machine Translation Models for Under-Resourced Neural Machine Translation Scenarios
Authors: Daniel Torregrosa, Nivranshu Pasricha, Maraim Masoud, Bharathi Raja Chakravarthi, Juan Alonso, Noe Casas, Mihael Arcan
Link: https://arxiv.org/abs/2009.13398

Abstract: Rule-based machine translation is a machine translation paradigm where linguistic knowledge is encoded by an expert in the form of rules that translate text from the source to the target language. While this approach grants extensive control over the output of the system, the cost of formalising the needed linguistic knowledge is much higher than training a corpus-based system, where a machine learning approach is used to automatically learn to translate from examples. In this paper, we describe different approaches to leverage the information contained in rule-based machine translation systems to improve a corpus-based one, namely, a neural machine translation model, with a focus on a low-resource scenario. Three different kinds of information were used: morphological information, named entities, and terminology. In addition to evaluating the general performance of the system, we systematically analysed the performance of the proposed approaches when dealing with the targeted phenomena. Our results suggest that the proposed models have limited ability to learn from external information, and most approaches do not significantly alter the results of the automatic evaluation, but our preliminary qualitative evaluation shows that in certain cases the hypotheses generated by our system exhibit favourable behaviour, such as keeping the use of the passive voice.



[2]: Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation
Authors: Rajiv Movva, Jason Y. Zhao
Comments: 8 pages, 6 figures, 11 supplementary figures
Link: https://arxiv.org/abs/2009.13270

Abstract: Recent work on the lottery ticket hypothesis has produced highly sparse Transformers for NMT while maintaining BLEU. However, it is unclear how such pruning techniques affect a model's learned representations. By probing sparse Transformers, we find that complex semantic information is the first to be degraded. Analysis of internal activations reveals that higher layers diverge most over the course of pruning, gradually becoming less complex than their dense counterparts. Meanwhile, early layers of sparse models begin to perform more encoding. Attention mechanisms remain remarkably consistent as sparsity increases.



[3]: Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models
Authors: Subhajit Naskar, Amirmohammad Rooshenas, Simeng Sun, Mohit Iyyer, Andrew McCallum
Link: https://arxiv.org/abs/2009.13267

Abstract: The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score has been studied before for autoregressive neural machine translation (NMT) and resulted in alternative training algorithms (Ranzato et al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). However, MLE training remains the de facto approach for autoregressive NMT because of its computational efficiency and stability. Despite this mismatch between the training objective and task measure, we notice that the samples drawn from an MLE-trained NMT model support the desired distribution: there are samples with much higher BLEU scores compared to the beam decoding output. To benefit from this observation, we train an energy-based model to mimic the behavior of the task measure (i.e., the energy-based model assigns lower energy to samples with higher BLEU score), which results in a re-ranking algorithm based on the samples drawn from NMT: energy-based re-ranking (EBR). Our EBR consistently improves the performance of Transformer-based NMT: +3 BLEU points on Sinhala-English and +2.0 BLEU points on IWSLT'17 French-English tasks.
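
The interface of EBR is easy to sketch; the two stand-in callables below replace the trained NMT sampler and energy network, so only the sample-score-select flow is shown:

```python
# Energy-based re-ranking: sample from the NMT model, score each sample with
# the energy model (lower energy should track higher BLEU), keep the best.
import random

random.seed(0)

def nmt_sample(source, n=8):
    """Stand-in for drawing n samples from an MLE-trained NMT model."""
    return [f"{source} -> hypothesis {i}" for i in range(n)]

def energy(source, hypothesis):
    """Stand-in for the trained energy network."""
    return random.random()

def ebr_decode(source, n=8):
    candidates = nmt_sample(source, n)            # sample, don't beam-decode
    return min(candidates, key=lambda h: energy(source, h))

print(ebr_decode("le chat dort"))
```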



[4]: Inductively Representing Out-of-Knowledge-Graph Entities by Optimal Estimation Under Translational Assumptions
Authors: Damai Dai, Hua Zheng, Fuli Luo, Pengcheng Yang, Baobao Chang, Zhifang Sui
Link: https://arxiv.org/abs/2009.12765

Abstract: Conventional Knowledge Graph Completion (KGC) assumes that all test entities appear during training. However, in real-world scenarios, Knowledge Graphs (KG) evolve fast, with out-of-knowledge-graph (OOKG) entities added frequently, and we need to represent these entities efficiently. Most existing Knowledge Graph Embedding (KGE) methods cannot represent OOKG entities without costly retraining on the whole KG. To enhance efficiency, we propose a simple and effective method that inductively represents OOKG entities by their optimal estimation under translational assumptions. Given pretrained embeddings of the in-knowledge-graph (IKG) entities, our method needs no additional learning. Experimental results show that our method outperforms the state-of-the-art methods with higher efficiency on two KGC tasks with OOKG entities.
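
The intuition can be sketched with TransE-style embeddings (h + r ≈ t): an OOKG entity's embedding follows from its observed triples alone. The paper derives an optimal estimate; simple averaging is used below to show the idea:

```python
# Estimate an out-of-knowledge-graph entity X without retraining:
# use t - r when X appears as head and h + r when X appears as tail.
import numpy as np

rng = np.random.default_rng(0)
dim = 32
entity = {name: rng.normal(size=dim) for name in ["A", "B", "C"]}
relation = {name: rng.normal(size=dim) for name in ["r1", "r2"]}

# Triples connecting the unseen entity X to known (IKG) entities.
as_head = [("r1", "A"), ("r2", "B")]     # (X, r, t)  =>  X ~ t - r
as_tail = [("C", "r1")]                  # (h, r, X)  =>  X ~ h + r

estimates = [entity[t] - relation[r] for r, t in as_head]
estimates += [entity[h] + relation[r] for h, r in as_tail]
x_embedding = np.mean(estimates, axis=0)
print(x_embedding.shape)                 # (32,), usable for KGC scoring
```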


Automatic Summarization (1 paper)


[1]: Reducing Quantity Hallucinations in Abstractive Summarization
Authors: Zheng Zhao, Shay B. Cohen, Bonnie Webber
Comments: Accepted to Findings of EMNLP 2020
Link: https://arxiv.org/abs/2009.13312

Abstract: It is well-known that abstractive summaries are subject to hallucination, including material that is not supported by the original text. While summaries can be made hallucination-free by limiting them to general phrases, such summaries would fail to be very informative. Alternatively, one can try to avoid hallucinations by verifying that any specific entities in the summary appear in the original text in a similar context. This is the approach taken by our system, Herman. The system learns to recognize and verify quantity entities (dates, numbers, sums of money, etc.) in a beam-worth of abstractive summaries produced by state-of-the-art models, in order to up-rank those summaries whose quantity terms are supported by the original text. Experimental results demonstrate that the ROUGE scores of such up-ranked summaries have a higher Precision than summaries that have not been up-ranked, without a comparable loss in Recall, resulting in higher F1. A preliminary human evaluation of up-ranked vs. original summaries shows people's preference for the former.
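
A crude stand-in for the verification step (Herman learns it; a regex matcher is used here) makes the up-ranking logic concrete:

```python
# Up-rank the beam of summaries by how many of their quantity-like strings
# are actually supported by the source text.
import re

QUANTITY = re.compile(r"\$?\d[\d,]*(?:\.\d+)?%?")

def supported_fraction(summary, source):
    quantities = QUANTITY.findall(summary)
    if not quantities:
        return 1.0                                    # nothing to verify
    return sum(q in source for q in quantities) / len(quantities)

def uprank(beam, source):
    """beam: list of (summary, model_score); verified summaries rank first."""
    return sorted(beam,
                  key=lambda s: (supported_fraction(s[0], source), s[1]),
                  reverse=True)

source = "The company laid off 1,200 workers in 2019, about 8% of staff."
beam = [("It cut 1,500 jobs in 2019.", -1.0),         # hallucinated quantity
        ("It cut 1,200 jobs, 8% of staff.", -1.2)]
print(uprank(beam, source)[0][0])                     # the supported summary
```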


Textual Entailment (1 paper)


[1]: XTE: Explainable Text Entailment
Authors: Vivian S. Silva, André Freitas, Siegfried Handschuh
Comments: 44 pages, 7 figures. Submitted to the Artificial Intelligence Journal
Link: https://arxiv.org/abs/2009.12431

Abstract: Text entailment, the task of determining whether a piece of text logically follows from another piece of text, is a key component in NLP, providing input for many semantic applications such as question answering, text summarization, information extraction, and machine translation, among others. Entailment scenarios can range from a simple syntactic variation to more complex semantic relationships between pieces of text, but most approaches try a one-size-fits-all solution that usually favors some scenario to the detriment of another. Furthermore, for entailments requiring world knowledge, most systems still work as a "black box", providing a yes/no answer that does not explain the underlying reasoning process. In this work, we introduce XTE - Explainable Text Entailment - a novel composite approach for recognizing text entailment which analyzes the entailment pair to decide whether it must be resolved syntactically or semantically. Also, if a semantic matching is involved, we make the answer interpretable, using external knowledge bases composed of structured lexical definitions to generate natural language justifications that explain the semantic relationship holding between the pieces of text. Besides outperforming well-established entailment algorithms, our composite approach takes an important step towards Explainable AI, allowing interpretation of the inference model and making the semantic reasoning process explicit and understandable.


Models (5 papers)


[1]: Generative latent neural models for automatic word alignment
Authors: Anh Khoa Ngo Ho, François Yvon
Link: https://arxiv.org/abs/2009.13117

Abstract: Word alignments identify translational correspondences between words in a parallel sentence pair and are used, for instance, to learn bilingual dictionaries, to train statistical machine translation systems, or to perform quality estimation. Variational autoencoders have recently been used in various areas of natural language processing to learn, in an unsupervised way, latent representations that are useful for language generation tasks. In this paper, we study these models for the task of word alignment and propose and assess several evolutions of a vanilla variational autoencoder. We demonstrate that these techniques can yield competitive results compared to Giza++ and to a strong neural network alignment system for two language pairs.



[2]: A Simple and Efficient Ensemble Classifier Combining Multiple Neural Network Models on Social Media Datasets in Vietnamese
Authors: Huy Duc Huynh, Hang Thi-Thuy Do, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
Comments: Accepted by the 34th Pacific Asia Conference on Language, Information and Computation (PACLIC 2020)
Link: https://arxiv.org/abs/2009.13060

Abstract: Text classification is a popular topic of natural language processing, which has currently attracted numerous research efforts worldwide. The significant increase of data in social media requires the vast attention of researchers to analyze such data. There are various studies in this field in many languages, but few for the Vietnamese language. Therefore, this study aims to classify Vietnamese texts on social media from three different Vietnamese benchmark datasets. Advanced deep learning models are used and optimized in this study, including CNN, LSTM, and their variants. We also implement BERT, which has never been applied to these datasets. Our experiments find a suitable model for classification tasks on each specific dataset. To take advantage of single models, we propose an ensemble model combining the highest-performance models. Our single models reach positive results on each dataset. Moreover, our ensemble model achieves the best performance on all three datasets. We reach an F1-score of 86.96% on the HSD-VLSP dataset, 65.79% on the UIT-VSMEC dataset, and 92.79% and 89.70% for sentiments and topics on the UIT-VSFC dataset, respectively. Therefore, our models achieve better performance compared to previous studies on these datasets.



[3]: Multi-timescale representation learning in LSTM Language Models
Authors: Shivangi Mahto, Vy A. Vo, Javier S. Turek, Alexander G. Huth
Link: https://arxiv.org/abs/2009.12727

Abstract: Although neural language models are effective at capturing statistics of natural language, their representations are challenging to interpret. In particular, it is unclear how these models retain information over multiple timescales. In this work, we construct explicitly multi-timescale language models by manipulating the input and forget gate biases in a long short-term memory (LSTM) network. The distribution of timescales is selected to approximate power law statistics of natural language through a combination of exponentially decaying memory cells. We then empirically analyze the timescale of information routed through each part of the model using word ablation experiments and forget gate visualizations. These experiments show that the multi-timescale model successfully learns representations at the desired timescales, and that the distribution includes longer timescales than a standard LSTM. Further, information about high-, mid-, and low-frequency words is routed preferentially through units with the appropriate timescales. Thus we show how to construct language models with interpretable representations of different information timescales.
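
The bias manipulation is concrete: a unit whose forget gate sits at f retains information with decay f^n, giving timescale T = -1/ln(f), so a desired timescale fixes the forget-gate bias. A sketch, with a heavy-tailed inverse-gamma draw standing in for the paper's power-law-motivated choice of timescale distribution:

```python
# Assign each LSTM unit a fixed timescale by setting its forget-gate bias.
import numpy as np

def forget_bias_for_timescale(T):
    """Bias such that sigmoid(bias) = exp(-1/T), the gate value whose decay
    f**n has characteristic timescale T (in tokens)."""
    f = np.exp(-1.0 / T)
    return np.log(f / (1.0 - f))

rng = np.random.default_rng(0)
n_units = 8
timescales = 1.0 / rng.gamma(shape=1.5, scale=1.0, size=n_units)  # inv-gamma
biases = forget_bias_for_timescale(timescales)
for T, b in zip(timescales, biases):
    print(f"timescale {T:8.2f} tokens -> forget-gate bias {b:7.2f}")
```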



[4]: Techniques to Improve Q&A Accuracy with Transformer-based models on Large Complex Documents
Authors: Chejui Liao, Tabish Maniar, Sravanajyothi N, Anantha Sharma
Link: https://arxiv.org/abs/2009.12695

Abstract: This paper discusses the effectiveness of various text processing techniques, their combinations, and encodings to achieve a reduction of complexity and size in a given text corpus. The simplified text corpus is sent to BERT (or similar transformer-based models) for question answering and can produce more relevant responses to user queries. This paper takes a scientific approach to determine the benefits and effectiveness of various techniques and concludes with a best-fit combination that produces a statistically significant improvement in accuracy.



[5]: ARPA: Armenian Paraphrase Detection Corpus and Models
Authors: Arthur Malajyan, Karen Avetisyan, Tsolak Ghukasyan
Comments: To be published in the proceedings of Ivannikov Memorial Workshop 2020
Link: https://arxiv.org/abs/2009.12615

Abstract: In this work, we employ a semi-automatic method based on back translation to generate a sentential paraphrase corpus for the Armenian language. The initial collection of sentences is translated from Armenian to English and back twice, resulting in pairs of lexically distant but semantically similar sentences. The generated paraphrases are then manually reviewed and annotated. Using this method, train and test datasets are created, containing 2360 paraphrases in total. In addition, the datasets are used to train and evaluate BERT-based models for detecting paraphrases in Armenian, achieving results comparable to the state of the art for other languages.


Others (27 papers)


[1]: PIN: A Novel Parallel Interactive Network for Spoken Language Understanding
Authors: Peilin Zhou, Zhiqi Huang, Fenglin Liu, Yuexian Zou
Link: https://arxiv.org/abs/2009.13431

Abstract: Spoken Language Understanding (SLU) is an essential part of a spoken dialogue system, and typically consists of intent detection (ID) and slot filling (SF) tasks. Recently, recurrent neural network (RNN) based methods achieved the state-of-the-art for SLU. In the existing RNN-based approaches, ID and SF tasks are often jointly modeled to utilize the correlation information between them. However, we noted that, so far, efforts to obtain better performance by supporting bidirectional and explicit information exchange between ID and SF are not well studied. In addition, few studies attempt to capture local context information to enhance the performance of SF. Motivated by these findings, in this paper, a Parallel Interactive Network (PIN) is proposed to model the mutual guidance between ID and SF. Specifically, given an utterance, a Gaussian self-attentive encoder is introduced to generate a context-aware feature embedding of the utterance which is able to capture local context information. Taking the feature embedding of the utterance, a Slot2Intent module and an Intent2Slot module are developed to capture the bidirectional information flow for the ID and SF tasks. Finally, a cooperation mechanism is constructed to fuse the information obtained from the Slot2Intent and Intent2Slot modules to further reduce prediction bias. Experiments on two benchmark datasets, i.e., SNIPS and ATIS, demonstrate the effectiveness of our approach, which achieves competitive results with state-of-the-art models. More encouragingly, by using the feature embedding of the utterance generated by the pre-trained language model BERT, our method achieves the state-of-the-art among all comparison approaches.



[2]: Similarity Detection Pipeline for Crawling a Topic Related Fake News Corpus
Authors: Inna Vogel, Jeong-Eun Choi, Meghana Meghana
Link: https://arxiv.org/abs/2009.13367

Abstract: Fake news detection is a challenging task that aims to reduce the human time and effort needed to check the truthfulness of news. Automated approaches to combat fake news, however, are limited by the lack of labeled benchmark datasets, especially in languages other than English. Moreover, many publicly available corpora have specific limitations that make them difficult to use. To address this problem, our contribution is threefold. First, we propose a new, publicly available German topic-related corpus for fake news detection; to the best of our knowledge, this is the first corpus of its kind. Second, we developed a pipeline for crawling similar news articles. Third, we conduct different learning experiments to detect fake news. The best performance was achieved using sentence-level embeddings from SBERT in combination with a Bi-LSTM (k=0.88).



[3]: Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network
Authors: Shuqing Bian, Xu Chen, Wayne Xin Zhao, Kun Zhou, Yupeng Hou, Yang Song, Tao Zhang, Ji-Rong Wen
Link: https://arxiv.org/abs/2009.13299

Abstract: With the ever-increasing growth of online recruitment data, job-resume matching has become an important task to automatically match jobs with suitable resumes. This task is typically cast as a supervised text matching problem. Supervised learning is powerful when the labeled data is sufficient. However, on online recruitment platforms, job-resume interaction data is sparse and noisy, which affects the performance of job-resume matching algorithms. To alleviate these problems, in this paper, we propose a novel multi-view co-teaching network that learns from sparse interaction data for job-resume matching. Our network consists of two major components, namely a text-based matching model and a relation-based matching model. The two parts capture semantic compatibility in two different views and complement each other. In order to address the challenges of sparse and noisy data, we design two specific strategies to combine the two components. First, the two components share learned parameters or representations, so that the original representations of each component can be enhanced. More importantly, we adopt a co-teaching mechanism to reduce the influence of noise in the training data. The core idea is to let the two components help each other by selecting more reliable training instances. The two strategies focus on representation enhancement and data enhancement, respectively. Compared with pure text-based matching models, the proposed approach is able to learn better data representations from limited or even sparse interaction data, which is more resistant to noise in training data. Experimental results demonstrate that our model outperforms state-of-the-art methods for job-resume matching.



[4]: Pchatbot: A Large-Scale Dataset for Personalized Chatbot
Authors: Xiaohe Li, Hanxun Zhong, Yu Guo, Yueyuan Ma, Hongjin Qian, Zhanliang Liu, Zhicheng Dou, Ji-Rong Wen
Comments: 10 pages
Link: https://arxiv.org/abs/2009.13284

Abstract: Natural language dialogue systems have attracted great attention recently. As many dialogue models are data-driven, high-quality datasets are essential to these systems. In this paper, we introduce Pchatbot, a large-scale dialogue dataset which contains two subsets collected from Weibo and judicial forums, respectively. Different from existing datasets which only contain post-response pairs, we include anonymized user IDs as well as timestamps. This enables the development of personalized dialogue models which depend on the availability of users' historical conversations. Furthermore, the scale of Pchatbot is significantly larger than that of existing datasets, which might benefit data-driven models. Our preliminary experimental study shows that a personalized chatbot model trained on Pchatbot outperforms the corresponding ad-hoc chatbot models. We also demonstrate that using a larger dataset improves the quality of dialogue models.



[5]: Zero-shot Multi-Domain Dialog State Tracking Using Descriptive Rules
Authors: Edgar Altszyler, Pablo Brusco, Nikoletta Basiou, John Byrnes, Dimitra Vergyri
Link: https://arxiv.org/abs/2009.13275

Abstract: In this work, we present a framework for incorporating descriptive logical rules in state-of-the-art neural networks, enabling them to learn how to handle unseen labels without the introduction of any new training data. The rules are integrated into existing networks without modifying their architecture, through an additional term in the network's loss function that penalizes states of the network that do not obey the designed rules. As a case study, the framework is applied to an existing neural-based Dialog State Tracker. Our experiments demonstrate that the inclusion of logical rules allows the prediction of unseen labels without deteriorating the predictive capacity of the original system.
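
The loss-term integration can be sketched with a toy rule; the fuzzy-logic encoding of the rule and the weighting factor below are assumptions, not the paper's exact formulation:

```python
# Add a rule-violation penalty to the task loss without touching the
# architecture. Toy rule: "if the 'hotel' domain is active, the 'flight'
# domain must be inactive."
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2, requires_grad=True)     # (batch, [hotel, flight])
labels = torch.randint(0, 2, (4, 2)).float()       # multi-label targets

probs = torch.sigmoid(logits)
task_loss = F.binary_cross_entropy(probs, labels)

# Fuzzy-logic reading of "hotel -> not flight": the penalty is high only
# when both p(hotel) and p(flight) are high at once.
rule_penalty = (probs[:, 0] * probs[:, 1]).mean()

loss = task_loss + 0.5 * rule_penalty              # 0.5: rule weight (assumed)
loss.backward()
print(float(task_loss), float(rule_penalty))
```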



[6]: Augmented Natural Language for Generative Sequence Labeling
Authors: Ben Athiwaratkun, Cicero Nogueira dos Santos, Jason Krone, Bing Xiang
Comments: To appear at EMNLP 2020
Link: https://arxiv.org/abs/2009.13272

Abstract: We propose a generative framework for joint sequence labeling and sentence-level classification. Our model performs multiple sequence labeling tasks at once using a single, shared natural language output space. Unlike prior discriminative methods, our model naturally incorporates label semantics and shares knowledge across tasks. Our framework is general purpose, performing well on few-shot, low-resource, and high-resource tasks. We demonstrate these advantages on popular named entity recognition, slot labeling, and intent classification benchmarks. We set a new state-of-the-art for few-shot slot labeling, improving substantially upon the previous 5-shot (75.0% → 90.9%) and 1-shot (70.4% → 81.0%) state-of-the-art results. Furthermore, our model generates large improvements (46.27% → 63.83%) in low-resource slot labeling over a BERT baseline by incorporating label semantics. We also maintain competitive results on high-resource tasks, performing within two points of the state-of-the-art on all tasks and setting a new state-of-the-art on the SNIPS dataset.



[7]: Knowledge-Aware Procedural Text Understanding with Multi-Stage Training
Authors: Zhihan Zhang, Xiubo Geng, Tao Qin, Yunfang Wu, Daxin Jiang
Link: https://arxiv.org/abs/2009.13199

Abstract: We focus on the task of procedural text understanding, which aims to track entities' states and locations during a natural process. Although recent approaches have achieved substantial progress, they are far behind human performance. Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved. In this paper, we propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which leverages external knowledge sources to solve these issues. Specifically, we retrieve informative knowledge triples from ConceptNet and perform knowledge-aware reasoning while tracking the entities. Besides, we employ a multi-stage training schema which fine-tunes the BERT model over unlabeled data collected from Wikipedia before further fine-tuning it in the final model. Experimental results on two procedural text datasets, ProPara and Recipes, verify the effectiveness of the proposed methods, in which our model achieves state-of-the-art performance in comparison to various baselines.



[8]: Incomplete Utterance Rewriting as Semantic Segmentation
Authors: Qian Liu, Bei Chen, Jian-Guang Lou, Bin Zhou, Dongmei Zhang
Comments: To appear in EMNLP 2020 (Long Paper)
Link: https://arxiv.org/abs/2009.13166

Abstract: In recent years, the task of incomplete utterance rewriting has attracted considerable attention. Previous works usually shape it as a machine translation task and employ sequence-to-sequence architectures with a copy mechanism. In this paper, we present a novel and extensive approach, which formulates it as a semantic segmentation task. Instead of generating from scratch, such a formulation introduces edit operations and shapes the problem as the prediction of a word-level edit matrix. Benefiting from being able to capture both local and global information, our approach achieves state-of-the-art performance on several public datasets. Furthermore, our approach is four times faster than the standard approach in inference.
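
A much-simplified construction of such a word-level matrix from gold data, shown only to make the formulation concrete (the paper's matrix carries richer edit operations):

```python
# Cell (i, j) is 1 when context word i is inserted by the rewrite right
# after position j of the incomplete utterance.
def edit_matrix(context, utterance, rewrite):
    m = [[0] * (len(utterance) + 1) for _ in context]
    j = 0                                   # position in the incomplete utterance
    for word in rewrite:
        if j < len(utterance) and word == utterance[j]:
            j += 1                          # word already present; advance
        elif word in context:
            m[context.index(word)][j] = 1   # context word inserted at slot j
    return m

context = ["do", "you", "like", "beijing"]
utterance = ["why", "?"]                    # incomplete follow-up
rewrite = ["why", "like", "beijing", "?"]   # full, self-contained rewrite
for row in edit_matrix(context, utterance, rewrite):
    print(row)                              # "like" and "beijing" land in slot 1
```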



[9]: Neural Baselines for Word Alignment
Authors: Anh Khoa Ngo Ho, François Yvon
Comments: The 16th International Workshop on Spoken Language Translation, Nov 2019, Hong Kong SAR, China
Link: https://arxiv.org/abs/2009.13116

Abstract: Word alignments identify translational correspondences between words in a parallel sentence pair and are used, for instance, to learn bilingual dictionaries, to train statistical machine translation systems, or to perform quality estimation. In most areas of natural language processing, neural network models nowadays constitute the preferred approach, a situation that might also apply to word alignment models. In this work, we study and comprehensively evaluate neural models for unsupervised word alignment for four language pairs, contrasting several variants of neural models. We show that in most settings, neural versions of the IBM-1 and hidden Markov models vastly outperform their discrete counterparts. We also analyze typical alignment errors of the baselines that our models overcome, to illustrate the benefits and the limitations of these new models for morphologically rich languages.



[10]: Deep Transformers with Latent Depth
Authors: Xian Li, Asa Cooper Stickland, Yuqing Tang, Xiang Kong
Link: https://arxiv.org/abs/2009.13102

Abstract: The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks. However, how to leverage model capacity with large or variable depths is still an open challenge. We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection. As an extension of this framework, we propose a novel method to train one shared Transformer network for multilingual machine translation with different layer selection posteriors for each language pair. The proposed method alleviates the vanishing gradient issue and enables stable training of deep Transformers (e.g., 100 layers). We evaluate on WMT English-German machine translation and masked language modeling tasks, where our method outperforms existing approaches for training deeper Transformers. Experiments on multilingual machine translation demonstrate that this approach can effectively leverage increased model capacity and bring universal improvement for both many-to-one and one-to-many translation with diverse language pairs.



[11]: Reactive Supervision: A New Method for Collecting Sarcasm Data
Authors: Boaz Shmueli, Lun-Wei Ku, Soumya Ray
Comments: 7 pages, 2 figures, 8 tables. To be published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
Link: https://arxiv.org/abs/2009.13080

Abstract: Sarcasm detection is an important task in affective computing, requiring large amounts of labeled data. We introduce reactive supervision, a novel data collection method that utilizes the dynamics of online conversations to overcome the limitations of existing data collection techniques. We use the new method to create and release a first-of-its-kind large dataset of tweets with sarcasm perspective labels and new contextual features. The dataset is expected to advance sarcasm detection research. Our method can be adapted to other affective computing domains, thus opening up new research opportunities.



[12]: What does it mean to be language-agnostic? Probing multilingual sentence encoders for typological properties
Authors: Rochelle Choenni, Ekaterina Shutova
Link: https://arxiv.org/abs/2009.12862

Abstract: Multilingual sentence encoders have seen much success in cross-lingual model transfer for downstream NLP tasks. Yet, we know relatively little about the properties of individual languages or the general patterns of linguistic variation that they encode. We propose methods for probing sentence representations from state-of-the-art multilingual encoders (LASER, M-BERT, XLM and XLM-R) with respect to a range of typological properties pertaining to lexical, morphological and syntactic structure. In addition, we investigate how this information is distributed across all layers of the models. Our results show interesting differences in encoding linguistic variation associated with different pretraining strategies.



[13]: TernaryBERT: Distillation-aware Ultra-low Bit BERT
Authors: Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu
Link: https://arxiv.org/abs/2009.12812

Abstract: Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment on resource-constrained devices. In this work, we propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model. Specifically, we use both approximation-based and loss-aware ternarization methods and empirically investigate the ternarization granularity of different parts of BERT. Moreover, to reduce the accuracy degradation caused by the lower capacity of low bits, we leverage the knowledge distillation technique in the training process. Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms other BERT quantization methods, and even achieves comparable performance to the full-precision model while being 14.9x smaller.
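
Of the two ternarization families mentioned, the approximation-based one is easy to sketch with the common TWN-style threshold and scale (the granularity choices and distillation loop the paper studies are omitted):

```python
# Approximation-based ternarization: map each weight to {-a, 0, +a}.
import numpy as np

def ternarize(w):
    """TWN-style heuristic: threshold at 0.7 * mean|w|, then pick the scale
    that best fits the surviving weights."""
    delta = 0.7 * np.abs(w).mean()         # threshold heuristic
    mask = np.abs(w) > delta               # which weights survive
    alpha = np.abs(w[mask]).mean()         # shared scale for survivors
    return alpha * np.sign(w) * mask

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(4, 6))    # a small weight matrix
w_t = ternarize(w)
print(np.unique(np.round(w_t, 4)))         # exactly three values: -a, 0, +a
print(f"relative error: {np.linalg.norm(w - w_t) / np.linalg.norm(w):.3f}")
```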



[14]:A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution
Title: A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution
Authors: Hongming Zhang, Xinran Zhao, Yangqiu Song
Link: https://arxiv.org/abs/2009.12721

Abstract: Pronoun Coreference Resolution (PCR) is the task of resolving pronominal expressions to all mentions they refer to. Compared with the general coreference resolution task, the main challenge of PCR is coreference relation prediction rather than mention detection. As an important natural language understanding (NLU) component, pronoun resolution is crucial for many downstream tasks and still challenging for existing models, which motivates us to survey existing approaches and consider how to do better. In this survey, we first introduce representative datasets and models for the ordinary pronoun coreference resolution task. Then we focus on recent progress on hard pronoun coreference resolution problems (e.g., the Winograd Schema Challenge) to analyze how well current models understand commonsense. We conduct extensive experiments to show that even though current models achieve good performance on the standard evaluation set, they are still not ready for real applications (e.g., all SOTA models struggle to correctly resolve pronouns to infrequent objects). All experiment code is available at this https URL.



[15]:Local and non-local dependency learning and emergence of rule-like representations in speech data by Deep Convolutional Generative Adversarial Networks
Title: Local and non-local dependency learning and emergence of rule-like representations in speech data by Deep Convolutional Generative Adversarial Networks
Authors: Gašper Beguš
Link: https://arxiv.org/abs/2009.12711

Abstract: This paper argues that training GANs on local and non-local dependencies in speech data offers insights into how deep neural networks discretize continuous data and how symbolic-like rule-based morphophonological processes emerge in a deep convolutional architecture. Acquisition of speech has recently been modeled as a dependency between latent space and data generated by GANs in Beguš (arXiv:2006.03965), who models learning of a simple local allophonic distribution. We extend this approach to test learning of local and non-local phonological processes that include approximations of morphological processes. We further parallel outputs of the model to results of a behavioral experiment where human subjects are trained on the data used for training the GAN network. Four main conclusions emerge: (i) the networks provide useful information for computational models of language acquisition even if trained on a comparatively small dataset of an artificial grammar learning experiment; (ii) local processes are easier to learn than non-local processes, which matches both behavioral data in human subjects and typology in the world's languages. This paper also proposes (iii) how we can actively observe the network's progress in learning and explore the effect of training steps on learning representations by keeping latent space constant across different training steps. Finally, this paper shows that (iv) the network learns to encode the presence of a prefix with a single latent variable; by interpolating this variable, we can actively observe the operation of a non-local phonological process. The proposed technique for retrieving learning representations has general implications for our understanding of how GANs discretize continuous speech data and suggests that rule-like generalizations in the training data are represented as an interaction between variables in the network's latent space.
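
The latent-interpolation technique used to observe a learned process can be sketched as follows. The generator here is a small stand-in for a trained WaveGAN-style network; only the probing loop, varying one latent dimension while holding the rest of z fixed, reflects the paper's method.

import torch
import torch.nn as nn

generator = nn.Sequential(          # placeholder for a trained audio generator
    nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 16384), nn.Tanh()
)

z = torch.randn(100)                # a fixed latent code
with torch.no_grad():
    for value in torch.linspace(-4, 4, steps=9):
        z_probe = z.clone()
        z_probe[0] = value          # interpolate a single latent variable
        audio = generator(z_probe)  # inspect how the output changes
        print(float(value), float(audio.abs().mean()))

In the paper's setting, sweeping the prefix-encoding variable this way makes the presence or absence of the morphophonological process audible in the generated outputs.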



[16]:Neural Proof Nets
Title: Neural Proof Nets
Authors: Konstantinos Kogkalidis, Michael Moortgat, Richard Moot
Notes: 14 pages, CoNLL2020
Link: https://arxiv.org/abs/2009.12702

Abstract: Linear logic and the linear λ-calculus have a long-standing tradition in the study of natural language form and meaning. Among the proof calculi of linear logic, proof nets are of particular interest, offering an attractive geometric representation of derivations that is unburdened by the bureaucratic complications of conventional proof-theoretic formats. Building on recent advances in set-theoretic learning, we propose a neural variant of proof nets based on Sinkhorn networks, which allows us to cast parsing as the problem of extracting syntactic primitives and permuting them into alignment. Our methodology induces a batch-efficient, end-to-end differentiable architecture that actualizes a formally grounded yet highly efficient neuro-symbolic parser. We test our approach on ÆThel, a dataset of type-logical derivations for written Dutch, where it manages to correctly transcribe raw text sentences into proofs and terms of the linear λ-calculus with an accuracy as high as 70%.
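
At the heart of a Sinkhorn network is Sinkhorn normalization: alternately row- and column-normalizing a score matrix so that it approaches a doubly stochastic matrix, i.e., a differentiable relaxation of a permutation. A minimal log-domain version (the full parser around it is, of course, the paper's contribution):

import torch

def sinkhorn(scores: torch.Tensor, n_iters: int = 20, tau: float = 0.1):
    log_p = scores / tau
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # rows
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)  # columns
    return log_p.exp()

P = sinkhorn(torch.randn(5, 5))
print(P.sum(dim=0), P.sum(dim=1))   # both close to all-ones

Lowering the temperature tau pushes P toward a hard permutation, which is what lets the model learn the axiom linkings of a proof net with ordinary gradient descent.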



[17]:Automatic Arabic Dialect Identification Systems for Written Texts: A Survey
Title: Automatic Arabic Dialect Identification Systems for Written Texts: A Survey
Authors: Maha J. Althobaiti
Link: https://arxiv.org/abs/2009.12622

Abstract: Arabic dialect identification is a specific task of natural language processing, aiming to automatically predict the Arabic dialect of a given text. Arabic dialect identification is the first step in various natural language processing applications such as machine translation, multilingual text-to-speech synthesis, and cross-language text generation. Therefore, in the last decade, interest in the problem of Arabic dialect identification has increased. In this paper, we present a comprehensive survey of Arabic dialect identification research in written texts. We first define the problem and its challenges. The survey then critically discusses many aspects of the Arabic dialect identification task: we review traditional machine learning methods, deep learning architectures, and complex learning approaches, and detail the features and feature-representation techniques used to train the proposed systems. Moreover, we illustrate the taxonomy of Arabic dialects studied in the literature, the various levels of text processing at which Arabic dialect identification is conducted (e.g., token, sentence, and document level), as well as the available annotated resources, including evaluation benchmark corpora. Open challenges and issues are discussed at the end of the survey.
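
A representative baseline from the surveyed literature is character n-gram TF-IDF features with a linear classifier. The toy texts and labels below are illustrative placeholders, not drawn from any benchmark the survey covers.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["شلونك اليوم", "ازيك عامل ايه", "كيف دايرك"]   # toy examples
labels = ["gulf", "egyptian", "maghrebi"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["ايه الاخبار"]))

Character n-grams are popular here because dialectal cues often live in clitics and spelling variants rather than in whole words.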



[18]:Metaphor Detection using Deep Contextualized Word Embeddings
Title: Metaphor Detection using Deep Contextualized Word Embeddings
Authors: Shashwat Aggarwal, Ramesh Singh
Link: https://arxiv.org/abs/2009.12565

Abstract: Metaphors are ubiquitous in natural language, and their detection plays an essential role in many natural language processing tasks, such as language understanding and sentiment analysis. Most existing approaches for metaphor detection rely on complex, hand-crafted and fine-tuned feature pipelines, which greatly limit their applicability. In this work, we present an end-to-end method composed of deep contextualized word embeddings, bidirectional LSTMs and a multi-head attention mechanism to address the task of automatic metaphor detection. Our method, unlike many other existing approaches, requires only the raw text sequences as input features to detect the metaphoricity of a phrase. We compare the performance of our method against existing baselines on two benchmark datasets, TroFi and MOH-X. Experimental evaluations confirm the effectiveness of our approach.
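
The described pipeline (contextualized embeddings, then a BiLSTM, then multi-head attention, then per-token prediction) can be sketched schematically. The random input stands in for ELMo/BERT-style contextual features, and all hyperparameters are illustrative rather than the paper's.

import torch
import torch.nn as nn

class MetaphorTagger(nn.Module):
    def __init__(self, emb_dim=256, hidden=128, heads=4):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads,
                                          batch_first=True)
        self.out = nn.Linear(2 * hidden, 2)   # literal vs. metaphorical

    def forward(self, embeddings):
        h, _ = self.lstm(embeddings)          # contextual recoding
        a, _ = self.attn(h, h, h)             # self-attention over tokens
        return self.out(a)                    # (batch, seq, 2) logits

x = torch.randn(2, 12, 256)   # stand-in contextual word embeddings
print(MetaphorTagger()(x).shape)   # torch.Size([2, 12, 2])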



[19]:Topic-Aware Multi-turn Dialogue Modeling
Title: Topic-Aware Multi-turn Dialogue Modeling
Authors: Yi Xu, Hai Zhao, Zhuosheng Zhang
Link: https://arxiv.org/abs/2009.12539

Abstract: In retrieval-based multi-turn dialogue modeling, it remains a challenge to select the most appropriate response based on salient features extracted from the context utterances. As a conversation goes on, discourse-level topic shifts naturally arise across the multi-turn context. However, existing retrieval-based systems exploit only local topic words for context utterance representation and fail to capture these essential global, discourse-level topic-aware clues. Instead of taking topic-agnostic n-gram utterances as the processing unit for matching, this paper presents a novel topic-aware solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way, so that the resulting model can capture salient discourse-level topic shifts and thus effectively track topic flow during multi-turn conversation. Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and a Topic-Aware Dual-attention Matching (TADAM) Network, which matches each topic segment with the response in a dual cross-attention way. Experimental results on three public datasets show TADAM outperforms the state-of-the-art method by a large margin, especially by 3.4% on the E-commerce dataset, which exhibits obvious topic shifts.
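
The paper's segmentation algorithm is its own; as a generic illustration of unsupervised topic segmentation, one can simply start a new segment wherever the similarity between adjacent utterance vectors drops below a threshold. The vectors below are random stand-ins for utterance embeddings.

import numpy as np

def segment(utterance_vecs, threshold=0.5):
    segments, current = [], [0]
    for i in range(1, len(utterance_vecs)):
        a, b = utterance_vecs[i - 1], utterance_vecs[i]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        if sim < threshold:          # low similarity: a topic shift
            segments.append(current)
            current = []
        current.append(i)
    segments.append(current)
    return segments                  # lists of utterance indices per topic

vecs = np.random.default_rng(0).normal(size=(8, 64))
print(segment(vecs))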



[20]:iNLTK: Natural Language Toolkit for Indic Languages
Title: iNLTK: Natural Language Toolkit for Indic Languages
Authors: Gaurav Arora
Notes: Accepted at EMNLP2020's NLP-OSS workshop. Changes suggested by reviewers before final version are WIP
Link: https://arxiv.org/abs/2009.12534

Abstract: We present iNLTK, an open-source NLP library consisting of pre-trained language models and out-of-the-box support for Paraphrase Generation, Textual Similarity, Sentence Embeddings, Word Embeddings, Tokenization and Text Generation in 13 Indic Languages. By using pre-trained models from iNLTK for text classification on publicly available datasets, we significantly outperform previously reported results. On these datasets, we also show that by using pre-trained models and paraphrases from iNLTK, we can achieve more than 95% of the previous best performance by using less than 10% of the training data. iNLTK is already being widely used by the community and has 40,000+ downloads, 600+ stars and 100+ forks on GitHub. The library is available at this https URL.
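
Quick-start usage, following the library's documentation around the time of the paper; the API may have evolved since, so treat the exact calls as indicative rather than guaranteed.

from inltk.inltk import setup, tokenize, get_similar_sentences

setup('hi')   # one-time download of the Hindi model files

text = 'मैं आज बहुत खुश हूं'
print(tokenize(text, 'hi'))
# Paraphrase-style variants of the kind used in the low-data experiments:
print(get_similar_sentences(text, 5, 'hi'))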



[21]:Learning to Plan and Realize Separately for Open-Ended Dialogue Systems
Title: Learning to Plan and Realize Separately for Open-Ended Dialogue Systems
Authors: Sashank Santhanam, Zhuo Cheng, Brodie Mather, Bonnie Dorr, Archna Bhatia, Bryanna Hebenstreit, Alan Zemel, Adam Dalton, Tomek Strzalkowski, Samira Shaikh
Notes: Accepted at EMNLP 2020 (Findings)
Link: https://arxiv.org/abs/2009.12506

Abstract: Achieving true human-like ability to conduct a conversation remains an elusive goal for open-ended dialogue systems. We posit this is because extant approaches towards natural language generation (NLG) are typically construed as end-to-end architectures that do not adequately model human generation processes. To investigate, we decouple generation into two separate phases: planning and realization. In the planning phase, we train two planners to generate plans for response utterances. The realization phase uses response plans to produce an appropriate response. Through rigorous evaluations, both automated and human, we demonstrate that decoupling the process into planning and realization performs better than an end-to-end approach.



[22]:BET: A Backtranslation Approach for Easy Data Augmentation in Transformer-based Paraphrase Identification Context
Title: BET: A Backtranslation Approach for Easy Data Augmentation in Transformer-based Paraphrase Identification Context
Authors: Jean-Philippe Corbeil, Hadi Abdi Ghadivel
Link: https://arxiv.org/abs/2009.12452

Abstract: Newly introduced deep learning architectures, namely BERT, XLNet, RoBERTa and ALBERT, have proved robust on several NLP tasks. However, the datasets these architectures are trained on are fixed in size and generalizability. To alleviate this issue, we apply one of the most inexpensive solutions to update these datasets. We call this approach BET, with which we analyze backtranslation data augmentation on transformer-based architectures. Using the Google Translate API with ten intermediary languages from ten different language families, we externally evaluate the results in the context of automatic paraphrase identification in a transformer-based framework. Our findings suggest that BET improves paraphrase identification performance on the Microsoft Research Paraphrase Corpus (MRPC) by more than 3% in both accuracy and F1 score. We also analyze the augmentation in the low-data regime with downsampled versions of MRPC, the Twitter Paraphrase Corpus (TPC) and Quora Question Pairs. In many low-data cases, we observe a switch from a failing model on the test set to reasonable performance. The results demonstrate that BET is a highly promising data augmentation technique, both for pushing the current state of the art on existing datasets and for bootstrapping deep learning architectures in the low-data regime of a hundred samples.
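
The augmentation step itself is easy to reproduce. The paper drives the Google Translate API with ten intermediary languages; the sketch below substitutes a single open MarianMT round-trip (English to German and back) to show the same mechanism.

from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True)
    out = model.generate(**batch)
    return [tok.decode(t, skip_special_tokens=True) for t in out]

src = ["The quick brown fox jumps over the lazy dog."]
german = translate(src, "Helsinki-NLP/opus-mt-en-de")
back = translate(german, "Helsinki-NLP/opus-mt-de-en")
print(back)   # a paraphrase-like variant of the source sentence

Each round-trip yields a label-preserving paraphrase, so a paraphrase-identification pair can be augmented by backtranslating either or both sides.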



[23]:Hierarchical Sparse Variational Autoencoder for Text Encoding
Title: Hierarchical Sparse Variational Autoencoder for Text Encoding
Authors: Victor Prokhorov, Yingzhen Li, Ehsan Shareghi, Nigel Collier
Link: https://arxiv.org/abs/2009.12421

Abstract: In this paper we focus on unsupervised representation learning and propose a novel framework, the Hierarchical Sparse Variational Autoencoder (HSVAE), which imposes sparsity on sentence representations via direct optimisation of the Evidence Lower Bound (ELBO). Our experimental results illustrate that HSVAE is flexible and adapts nicely to the underlying characteristics of the corpus, which is reflected in the level of sparsity and its distributional patterns.



[24]:Visually Grounded Compound PCFGs
Title: Visually Grounded Compound PCFGs
Authors: Yanpeng Zhao, Ivan Titov
Notes: Accepted to EMNLP 2020. Our code is available at this https URL
Link: https://arxiv.org/abs/2009.12404

Abstract: Exploiting visual groundings for language understanding has recently been drawing much attention. In this work, we study visually grounded grammar induction and learn a constituency parser from both unlabeled text and its visual groundings. Existing work on this task (Shi et al., 2019) optimizes a parser via Reinforce and derives the learning signal only from the alignment of images and sentences. While their model is relatively accurate overall, its error distribution is very uneven, with low performance on certain constituent types (e.g., 26.2% recall on verb phrases, VPs) and high on others (e.g., 79.6% recall on noun phrases, NPs). This is not surprising, as the learning signal is likely insufficient for deriving all aspects of phrase-structure syntax and gradient estimates are noisy. We show that using an extension of the probabilistic context-free grammar model we can do fully-differentiable end-to-end visually grounded learning. Additionally, this enables us to complement the image-text alignment loss with a language modeling objective. On the MSCOCO test captions, our model establishes a new state of the art, outperforming its non-grounded version and, thus, confirming the effectiveness of visual groundings in constituency grammar induction. It also substantially outperforms the previous grounded model, with the largest improvements on more 'abstract' categories (e.g., +55.1% recall on VPs).



[25]:RecoBERT: A Catalog Language Model for Text-Based Recommendations
Title: RecoBERT: A Catalog Language Model for Text-Based Recommendations
Authors: Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Razin, Ori Katz, Noam Koenigstein
Link: https://arxiv.org/abs/2009.13292

Abstract: Language models that utilize extensive self-supervised pre-training from unlabeled text have recently been shown to significantly advance state-of-the-art performance in a variety of language understanding tasks. However, it is yet unclear if and how these recent models can be harnessed for conducting text-based recommendations. In this work, we introduce RecoBERT, a BERT-based approach for learning catalog-specialized language models for text-based item recommendations. We suggest novel training and inference procedures for scoring similarities between pairs of items that do not require item similarity labels. Both the training and the inference techniques were designed to utilize the unlabeled structure of textual catalogs and to minimize the discrepancy between them. By incorporating four scores during inference, RecoBERT can infer text-based item-to-item similarities more accurately than other techniques. In addition, we introduce a new language understanding task for wine recommendations using similarities based on professional wine reviews. As an additional contribution, we publish an annotated recommendation dataset crafted by human wine experts. Finally, we evaluate RecoBERT and compare it to various state-of-the-art NLP models on wine and fashion recommendation tasks.
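
The paper's four-score inference is its own; the underlying building block, scoring item-to-item text similarity from a BERT encoder without any similarity labels, looks roughly like this (mean pooling over the last hidden states is an assumption made here for illustration).

import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(text):
    batch = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**batch).last_hidden_state   # (1, seq, 768)
    return hidden.mean(dim=1).squeeze(0)           # mean-pooled item vector

a = embed("A bold Cabernet with dark fruit and firm tannins.")
b = embed("A full-bodied red wine, ripe blackcurrant, structured finish.")
print(torch.cosine_similarity(a, b, dim=0).item())

Catalog-specialized training, as in the paper, would continue pre-training such an encoder on the catalog's own title-description structure before scoring.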



[26]:BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes
Title: BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes
Authors: Xueping Peng, Guodong Long, Tao Shen, Sen Wang, Jing Jiang, Chengqi Zhang
Notes: 10 pages, 8 figures, accepted by IEEE ICDM 2020. arXiv admin note: substantial text overlap with arXiv:2006.10516
Link: https://arxiv.org/abs/2009.13252

Abstract: Electronic health records (EHRs) are longitudinal records of a patient's interactions with healthcare systems. A patient's EHR data is organized as a three-level hierarchy from top to bottom: patient journey - all the experiences of diagnoses and treatments over a period of time; individual visit - a set of medical codes in a particular visit; and medical code - a specific record in the form of medical codes. As EHRs amass in the millions, the potential benefits these data might hold for medical research and medical outcome prediction are staggering - including, for example, predicting future admissions to hospitals, diagnosing illnesses or determining the efficacy of medical treatments. Each of these analytics tasks requires a domain knowledge extraction method to transform the hierarchical patient journey into a vector representation for the subsequent prediction procedure. The representations should embed a sequence of visits and a set of medical codes with specific timestamps, which are crucial to any downstream prediction tasks. Hence, expressively powerful representations are appealing for boosting learning performance. To this end, we propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey. An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys, based solely on the proposed attention mechanism. We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset. The empirical results demonstrate that the proposed BiteNet model produces higher-quality representations than state-of-the-art baseline methods.



[27]:Visual Exploration and Knowledge Discovery from Biomedical Dark Data
Title: Visual Exploration and Knowledge Discovery from Biomedical Dark Data
Authors: Shashwat Aggarwal, Ramesh Singh
Link: https://arxiv.org/abs/2009.13059

Abstract: Data visualization techniques proffer efficient means to organize and present data in graphically appealing formats, which not only speeds up the process of decision making and pattern recognition but also enables decision-makers to fully understand data insights and make informed decisions. Over time, with the rise in technological and computational resources, there has been an exponential increase in the world's scientific knowledge. However, most of it lacks structure and cannot be easily categorized and imported into regular databases. This type of data is often termed Dark Data. Data visualization techniques provide a promising solution for exploring such data by allowing quick comprehension of information, the discovery of emerging trends, and the identification of relationships and patterns. In this empirical research study, we use the rich corpus of PubMed, comprising more than 30 million citations from the biomedical literature, to visually explore and understand its key insights using various information visualization techniques. We employ a natural language processing pipeline to discover knowledge in this biomedical dark data. The pipeline comprises lexical analysis techniques such as topic modeling, to extract inherent topics and major focus areas, and network graphs, to study the relationships among entities such as scientific documents, journals, researchers, and keywords. With this analytical research, we aim to proffer a potential solution to the problem of analyzing overwhelming amounts of information and to diminish the limitations of human cognition and perception in handling and examining such large volumes of data.
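
The topic-modeling stage of such a pipeline, in miniature: LDA over term counts, followed by the top words per topic. The toy documents stand in for PubMed abstracts; corpus access and parsing are out of scope here.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "gene expression cancer tumor cells",
    "protein binding molecular structure",
    "clinical trial patients treatment outcomes",
    "tumor suppressor gene mutation cancer",
    "randomized trial drug efficacy patients",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-5:][::-1]
    print(f"topic {k}:", [terms[i] for i in top])

The per-topic word lists and per-document topic mixtures are what then feed the visualizations (topic maps, co-occurrence networks, and so on).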

