The following article is from 语言学探索.
Today's cs.CL listing contains 40 papers in total.
Inference (1 paper)
[1]:Easy and Efficient Transformer: Scalable Inference Solution For large NLP model
Authors: Gongzheng li, Yadong Xi, Jingzhen Ding, Duan Wang, Bai Liu, Changjie Fan, Xiaoxi Mao, Zeng Zhao
Link: https://arxiv.org/abs/2104.12470
Abstract: Ultra-large-scale pre-training models can effectively improve the performance of a variety of tasks, but they also bring a heavy computational burden to inference. This paper introduces a series of ultra-large-scale pre-training model optimization methods that combine algorithmic characteristics with GPU hardware characteristics and, on this basis, proposes an inference engine -- Easy and Efficient Transformer (EET) -- which achieves a significant performance improvement over existing schemes.
We first introduce a pre-padding decoding mechanism that improves token parallelism for generation tasks. Then we design highly optimized kernels to remove sequence masks and achieve cost-free calculation for padding tokens, as well as to support long sequences and large embedding sizes. Thirdly, a user-friendly inference system with an easy service pipeline is introduced, which greatly reduces the difficulty of engineering deployment while maintaining high throughput. Compared to Faster Transformer's implementation of GPT-2 on A100, EET achieves a 1.5-15x state-of-the-art speedup varying with context length. EET is available at this https URL.
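The pre-padding idea is concrete enough to sketch. Below is a minimal, hedged illustration (not EET's actual CUDA implementation) of why left-padding a batch of prompts improves token parallelism: every sequence ends at the same position, so each decode step emits one aligned token per batch row with no per-sequence offset bookkeeping.

```python
import torch

def pre_pad(batch_ids, pad_id=0):
    """Left-pad a batch of prompts so every sequence ends at the same position.
    Newly generated tokens then stay column-aligned across the batch, which is
    the property pre-padding decoding exploits for token parallelism."""
    max_len = max(len(ids) for ids in batch_ids)
    padded, mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        padded.append([pad_id] * n_pad + ids)
        mask.append([0] * n_pad + [1] * len(ids))
    return torch.tensor(padded), torch.tensor(mask)

# Two prompts of different lengths share one aligned decode position.
ids, attn = pre_pad([[5, 6, 7], [8, 9]])
print(ids)   # tensor([[5, 6, 7], [0, 8, 9]])
print(attn)  # tensor([[1, 1, 1], [0, 1, 1]])
```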
Natural Language Generation (3 papers)
[1]:Auto Response Generation in Online Medical Chat Services
Authors: Hadi Jahanshahi, Syed Kazmi, Mucahit Cevik
Link: https://arxiv.org/abs/2104.12755
Abstract: Telehealth helps to facilitate access to medical professionals by enabling remote medical services for the patients. These services have become gradually popular over the years with the advent of necessary technological infrastructure. The benefits of telehealth have been even more apparent since the beginning of the COVID-19 crisis, as people have become less inclined to visit doctors in person during the pandemic. In this paper, we focus on facilitating the chat sessions between a doctor and a patient. We note that the quality and efficiency of the chat experience can be critical as the demand for telehealth services increases. Accordingly, we develop a smart auto-response generation mechanism for medical conversations that helps doctors respond to consultation requests efficiently, particularly during busy sessions. We explore over 900,000 anonymous, historical online messages between doctors and patients collected over nine months. We implement clustering algorithms to identify the most frequent responses by doctors and manually label the data accordingly. We then train machine learning algorithms using this preprocessed data to generate the responses. The considered algorithm has two steps: a filtering (i.e., triggering) model to filter out infeasible patient messages and a response generator to suggest the top-3 doctor responses for the ones that successfully pass the triggering phase. The method achieves 83.28% for precision@3 and shows robustness to its parameters.
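The two-step pipeline maps naturally onto standard components. Here is a toy sketch under loud assumptions: the messages, canned responses, and threshold are all invented, and the triggering filter is approximated by a confidence cutoff rather than the paper's separately trained model.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Invented stand-ins for the (private) doctor-patient chat data.
patient_msgs = ["hello doctor", "i have a headache", "thanks a lot", "fever since monday"]
labels = [0, 1, 2, 3]  # index into the clustered frequent doctor responses
canned = ["Hello, how can I help?", "How long have you had it?",
          "You're welcome.", "Do you have any other symptoms?"]

vec = TfidfVectorizer().fit(patient_msgs)
clf = LogisticRegression(max_iter=1000).fit(vec.transform(patient_msgs), labels)

def suggest(msg, k=3, min_conf=0.2):
    """Step 1 (triggering): drop messages with no confident suggestion.
    Step 2 (generation): rank canned responses and return the top k."""
    probs = clf.predict_proba(vec.transform([msg]))[0]
    if probs.max() < min_conf:
        return []
    return [canned[i] for i in np.argsort(probs)[::-1][:k]]

print(suggest("headache since monday"))
```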
[2]:Focused Attention Improves Document-Grounded Generation
Authors: Shrimai Prabhumoye, Kazuma Hashimoto, Yingbo Zhou, Alan W Black, Ruslan Salakhutdinov
Notes: Accepted at North American Chapter of the Association for Computational Linguistics (NAACL) 2021
Link: https://arxiv.org/abs/2104.12714
Abstract: Document grounded generation is the task of using the information provided in a document to improve text generation. This work focuses on two different document grounded generation tasks: the Wikipedia Update Generation task and dialogue response generation. Our work introduces two novel adaptations of large scale pre-trained encoder-decoder models focusing on building a context-driven representation of the document and enabling specific attention to the information in the document. Additionally, we provide a stronger BART baseline for these tasks. Our proposed techniques outperform existing methods on both automated (at least 48% increase in BLEU-4 points) and human evaluation for closeness to reference and relevance to the document. Furthermore, we perform comprehensive manual inspection of the generated output and categorize errors to provide insights into future directions in modeling these tasks.
[3]:On a Utilitarian Approach to Privacy Preserving Text Generation
Authors: Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier
Notes: 10 pages, 3 figures
Link: https://arxiv.org/abs/2104.11838
Abstract: Differentially-private mechanisms for text generation typically add carefully calibrated noise to input words and use the nearest neighbor to the noised input as the output word. When the noise is small in magnitude, these mechanisms are susceptible to reconstruction of the original sensitive text. This is because the nearest neighbor to the noised input is likely to be the original input. To mitigate this empirical privacy risk, we propose a novel class of differentially private mechanisms that parameterizes the nearest neighbor selection criterion in traditional mechanisms. Motivated by Vickrey auction, where only the second highest price is revealed and the highest price is kept private, we balance the choice between the first and the second nearest neighbors in the proposed class of mechanisms using a tuning parameter. This parameter is selected by empirically solving a constrained optimization problem for maximizing utility, while maintaining the desired privacy guarantees. We argue that this empirical measurement framework can be used to align different mechanisms along a common benchmark for their privacy-utility tradeoff, particularly when different distance metrics are used to calibrate the amount of noise added. Our experiments on real text classification datasets show up to 50% improvement in utility compared to the existing state-of-the-art with the same empirical privacy guarantee.
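The mechanism is simple enough to sketch numerically. The toy below is a hedged illustration of the idea only: i.i.d. Laplace noise stands in for the properly calibrated multivariate noise of metric-DP mechanisms, and the vocabulary, epsilon, and tuning parameter t are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "dog", "car", "bus"]   # toy vocabulary
E = rng.normal(size=(len(vocab), 8))   # toy word embeddings

def vickrey_release(word_idx, epsilon=5.0, t=0.5):
    """Noise the input embedding, then release the first or second nearest
    neighbor; t interpolates between always-first (t=0, the traditional
    mechanism) and always-second (t=1, Vickrey-style)."""
    noised = E[word_idx] + rng.laplace(scale=1.0 / epsilon, size=E.shape[1])
    d = np.linalg.norm(E - noised, axis=1)
    first, second = np.argsort(d)[:2]
    return vocab[second if rng.random() < t else first]

print([vickrey_release(0) for _ in range(5)])
```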
Text Classification (1 paper)
[1]:Incremental Few-shot Text Classification with Multi-round New Classes: Formulation, Dataset and System
Authors: Congying Xia, Wenpeng Yin, Yihao Feng, Philip Yu
Notes: 10 pages, accepted to NAACL 2021
Link: https://arxiv.org/abs/2104.11882
Abstract: Text classification is usually studied by labeling natural language texts with relevant categories from a predefined set. In the real world, new classes might keep challenging the existing system with limited labeled data. The system should be intelligent enough to recognize upcoming new classes with a few examples. In this work, we define a new task in the NLP domain, incremental few-shot text classification, where the system incrementally handles multiple rounds of new classes. For each round, there is a batch of new classes with a few labeled examples per class. Two major challenges exist in this new task: (i) For the learning process, the system should incrementally learn new classes round by round without re-training on the examples of preceding classes; (ii) For the performance, the system should perform well on new classes without much loss on preceding classes. In addition to formulating the new task, we also release two benchmark datasets in the incremental few-shot setting: intent classification and relation classification. Moreover, we propose two entailment approaches, ENTAILMENT and HYBRID, which show promise for solving this novel problem.
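The entailment reformulation (decide whether an utterance entails a label description) can be previewed with an off-the-shelf NLI model. This is a hedged sketch, not the paper's method: ENTAILMENT fine-tunes an entailment model on the few-shot rounds, whereas the snippet below only demonstrates the reformulation using a stock zero-shot pipeline (it downloads facebook/bart-large-mnli on first run); the utterance and intent labels are invented.

```python
from transformers import pipeline

# Intent classification cast as textual entailment: each candidate label
# becomes a hypothesis scored against the utterance as premise.
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
utterance = "please move my appointment to next friday"
labels = ["reschedule appointment", "cancel appointment", "book flight"]
print(nli(utterance, candidate_labels=labels)["labels"][0])
```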
Information Retrieval (1 paper)
[1]:GermanQuAD and GermanDPR: Improving Non-English Question Answering and Passage Retrieval
Authors: Timo Möller, Julian Risch, Malte Pietsch
Notes: See this https URL for downloading the datasets and models
Link: https://arxiv.org/abs/2104.12741
Abstract: A major challenge of research on non-English machine reading for question answering (QA) is the lack of annotated datasets. In this paper, we present GermanQuAD, a dataset of 13,722 extractive question/answer pairs. To improve the reproducibility of the dataset creation approach and foster QA research on other languages, we summarize lessons learned and evaluate reformulation of question/answer pairs as a way to speed up the annotation process. An extractive QA model trained on GermanQuAD significantly outperforms multilingual models and also shows that machine-translated training data cannot fully substitute hand-annotated training data in the target language. Finally, we demonstrate the wide range of applications of GermanQuAD by adapting it to GermanDPR, a training dataset for dense passage retrieval (DPR), and train and evaluate the first non-English DPR model.
Information Extraction (1 paper)
[1]:Explore BiLSTM-CRF-Based Models for Open Relation Extraction
Authors: Tao Ni, Qing Wang, Gabriela Ferraro
Link: https://arxiv.org/abs/2104.12333
Abstract: Extracting multiple relations from text sentences is still a challenge for current Open Relation Extraction (Open RE) tasks. In this paper, we develop several Open RE models based on the bidirectional LSTM-CRF (BiLSTM-CRF) neural network and different contextualized word embedding methods. We also propose a new tagging scheme to solve overlapping problems and enhance models' performance. From the evaluation results and comparisons between models, we select the best combination of tagging scheme, word embedder, and BiLSTM-CRF network to achieve an Open RE model with a remarkable extracting ability on multiple-relation sentences.
Question Answering (1 paper)
[1]:Ask & Explore: Grounded Question Answering for Curiosity-Driven Exploration
Authors: Jivat Neet Kaur, Yiding Jiang, Paul Pu Liang
Notes: Accepted at ICLR 2021 Workshop on Embodied Multimodal Learning
Link: https://arxiv.org/abs/2104.11902
Abstract: In many real-world scenarios where extrinsic rewards to the agent are extremely sparse, curiosity has emerged as a useful concept providing intrinsic rewards that enable the agent to explore its environment and acquire information to achieve its goals. Despite their strong performance on many sparse-reward tasks, existing curiosity approaches rely on an overly holistic view of state transitions, and do not allow for a structured understanding of specific aspects of the environment. In this paper, we formulate curiosity based on grounded question answering by encouraging the agent to ask questions about the environment and be curious when the answers to these questions change. We show that natural language questions encourage the agent to uncover specific knowledge about its environment, such as the physical properties of objects as well as their spatial relationships with other objects, which serve as valuable curiosity rewards to solve sparse-reward tasks more efficiently.
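The reward formulation (be curious when the answer to a grounded question changes) reduces to a small function. A minimal sketch under invented observation and QA-model interfaces:

```python
def qa_curiosity_reward(qa_model, questions, obs_t, obs_t1):
    """Intrinsic reward: count tracked questions whose grounded answer
    changed between consecutive observations."""
    return float(sum(qa_model(q, obs_t) != qa_model(q, obs_t1) for q in questions))

# Toy grounded QA model over dict-like observations.
toy_qa = lambda q, obs: obs.get(q, "unknown")
print(qa_curiosity_reward(toy_qa, ["door", "light"],
                          {"door": "closed"}, {"door": "open"}))  # 1.0
```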
Machine Translation (2 papers)
[1]:Reranking Machine Translation Hypotheses with Structured and Web-based Language Models
Authors: Wen Wang, Andreas Stolcke, Jing Zheng
Notes: With a correction to the math in the Figure 1 caption
Link: https://arxiv.org/abs/2104.12277
Abstract: In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system. These language models, developed from Constraint Dependency Grammar parses, tightly integrate knowledge of words, morphological and lexical features, and syntactic dependency constraints. Two structured language models are applied for N-best rescoring: one is an almost-parsing language model, and the other utilizes more syntactic features by explicitly modeling syntactic dependencies between words. We also investigate effective and efficient language modeling methods to use N-grams extracted from up to 1 teraword of web documents. We apply all these language models for N-best re-ranking on the NIST and DARPA GALE program 2006 and 2007 machine translation evaluation tasks and find that the combination of these language models increases the BLEU score up to 1.6% absolute on blind test sets.
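N-best rescoring with multiple language models is a log-linear combination at heart. A minimal sketch with invented hypothesis scores and weights (real systems tune such weights on a development set, e.g. with MERT):

```python
def rerank(nbest, weights):
    """Pick the hypothesis maximizing a weighted sum of log-scores from the
    translation model and the structured / web-scale language models."""
    return max(nbest, key=lambda h: sum(weights[k] * v for k, v in h["scores"].items()))

nbest = [{"text": "hyp A", "scores": {"tm": -2.1, "struct_lm": -5.0, "web_lm": -4.2}},
         {"text": "hyp B", "scores": {"tm": -2.3, "struct_lm": -4.1, "web_lm": -4.0}}]
print(rerank(nbest, {"tm": 1.0, "struct_lm": 0.5, "web_lm": 0.3})["text"])  # hyp B
```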
[2]:Modeling Coverage for Non-Autoregressive Neural Machine Translation
Authors: Yong Shan, Yang Feng, Chenze Shao
Notes: Accepted by the 2021 International Joint Conference on Neural Networks (IJCNN 2021)
Link: https://arxiv.org/abs/2104.11897
Abstract: Non-Autoregressive Neural Machine Translation (NAT) has achieved significant inference speedup by generating all tokens simultaneously. Despite its high efficiency, NAT usually suffers from two kinds of translation errors: over-translation (e.g. repeated tokens) and under-translation (e.g. missing translations), which eventually limits the translation quality. In this paper, we argue that these issues of NAT can be addressed through coverage modeling, which has proven useful in autoregressive decoding. We propose a novel Coverage-NAT to model the coverage information directly by a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement, which can remind the model whether a source token has been translated and improve the semantic consistency between the translation and the source, respectively. Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
Automatic Summarization (1 paper)
[1]:GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization
Authors: Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring
Notes: Accepted by the ACM International Conference on Multimedia Retrieval (ICMR), 2021
Link: https://arxiv.org/abs/2104.12465
Abstract: Traditional video summarization methods generate fixed video representations regardless of user interest. Therefore such methods limit users' expectations in content search and exploration scenarios. Multi-modal video summarization is one of the methods utilized to address this problem. When multi-modal video summarization is used to help video exploration, a text-based query is considered as one of the main drivers of video summary generation, as it is user-defined. Thus, encoding the text-based query and the video effectively are both important for the task of multi-modal video summarization. In this work, a new method is proposed that uses a specialized attention network and contextualized word representations to tackle this task. The proposed model consists of a contextualized video summary controller, multi-modal attention mechanisms, an interactive attention network, and a video summary generator. Based on the evaluation on the existing multi-modal video summarization benchmark, experimental results show that the proposed model is effective, with an increase of +5.88% in accuracy and +4.06% in F1-score, compared with the state-of-the-art method.
Word Sense Disambiguation (1 paper)
[1]:Non-Parametric Few-Shot Learning for Word Sense Disambiguation
Authors: Howard Chen, Mengzhou Xia, Danqi Chen
Notes: Our code is publicly available at: this https URL
Link: https://arxiv.org/abs/2104.12677
Abstract: Word sense disambiguation (WSD) is a long-standing problem in natural language processing. One significant challenge in supervised all-words WSD is to classify among senses for a majority of words that lie in the long-tail distribution. For instance, 84% of the annotated words have less than 10 examples in the SemCor training data. This issue is more pronounced as the imbalance occurs in both word and sense distributions. In this work, we propose MetricWSD, a non-parametric few-shot learning approach to mitigate this data imbalance issue. By learning to compute distances among the senses of a given word through episodic training, MetricWSD transfers knowledge (a learned metric space) from high-frequency words to infrequent ones. MetricWSD constructs the training episodes tailored to word frequencies and explicitly addresses the problem of the skewed distribution, as opposed to mixing all the words trained with parametric models in previous work. Without resorting to any lexical resources, MetricWSD obtains strong performance against parametric alternatives, achieving a 75.1 F1 score on the unified WSD evaluation benchmark (Raganato et al., 2017b). Our analysis further validates that infrequent words and senses enjoy significant improvement.
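The non-parametric core, classifying a usage by its distance to per-sense prototypes built from a handful of support examples, is easy to sketch. Everything below is invented: random vectors stand in for contextual encodings of the target word.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def wsd_by_prototype(query_vec, support_vecs_by_sense):
    """Nearest-prototype sense prediction: each sense is represented by the
    mean encoding of its few support examples, as in metric-based
    few-shot learning."""
    return max(support_vecs_by_sense,
               key=lambda s: cosine(query_vec, support_vecs_by_sense[s].mean(axis=0)))

rng = np.random.default_rng(1)
support = {"bank.finance": rng.normal(size=(3, 16)),
           "bank.river": rng.normal(size=(2, 16))}
print(wsd_by_prototype(rng.normal(size=16), support))
```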
Syntactic Parsing (1 paper)
[1]:Open Intent Discovery through Unsupervised Semantic Clustering and Dependency Parsing
Authors: Pengfei Liu, Youzhang Ning, King Keung Wu, Kun Li, Helen Meng
Link: https://arxiv.org/abs/2104.12114
Abstract: Intent understanding plays an important role in dialog systems, and is typically formulated as a supervised classification problem. However, it is challenging and time-consuming to design the intent labels manually to support a new domain. This paper proposes an unsupervised two-stage approach to discover intents and generate meaningful intent labels automatically from a collection of unlabeled utterances. In the first stage, we aim to generate a set of semantically coherent clusters where the utterances within each cluster convey the same intent. We obtain the utterance representation from various pre-trained sentence embeddings and present a metric of balanced score to determine the optimal number of clusters in K-means clustering. In the second stage, the objective is to generate an intent label automatically for each cluster. We extract the ACTION-OBJECT pair from each utterance using a dependency parser and take the most frequent pair within each cluster, e.g., book-restaurant, as the generated cluster label. We empirically show that the proposed unsupervised approach can generate meaningful intent labels automatically and achieves high precision and recall in utterance clustering and intent discovery.
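The second stage is a standard dependency-parsing pattern. A hedged sketch using spaCy (assumes `pip install spacy` plus the en_core_web_sm model; the utterances are invented, and the paper may use a different parser):

```python
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def action_object(utterance):
    """Extract (verb lemma, direct-object lemma) pairs from the parse."""
    return [(tok.head.lemma_, tok.lemma_) for tok in nlp(utterance) if tok.dep_ == "dobj"]

cluster = ["book a restaurant for tonight", "please book the restaurant", "reserve a table"]
pairs = Counter(p for u in cluster for p in action_object(u))
print(pairs.most_common(1))  # [(('book', 'restaurant'), 2)] -> label "book-restaurant"
```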
Models (1 paper)
[1]:PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Authors: Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, ZhenZhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng, Fangqing Jiang, Han Zhang, Lingfeng Deng, Yehong Zhang, Zhe Lin, Chao Zhang, Shaojie Zhang, Mingyue Guo, Shanzhi Gu, Gaojun Fan, Yaowei Wang, Xuefeng Jin, Qun Liu, Yonghong Tian
Notes: The technical report for PanGu-α
Link: https://arxiv.org/abs/2104.12369
Abstract: Large-scale Pretrained Language Models (PLMs) have become the new paradigm for Natural Language Processing (NLP). PLMs with hundreds of billions of parameters, such as GPT-3, have demonstrated strong performance on natural language understanding and generation with few-shot in-context learning. In this work, we present our practice on training large-scale autoregressive language models named PanGu-α, with up to 200 billion parameters. PanGu-α is developed under MindSpore and trained on a cluster of 2048 Ascend 910 AI processors. The training parallelism strategy is implemented based on MindSpore Auto-parallel, which composes five parallelism dimensions to scale the training task to 2048 processors efficiently, including data parallelism, op-level model parallelism, pipeline model parallelism, optimizer model parallelism and rematerialization. To enhance the generalization ability of PanGu-α, we collect 1.1TB of high-quality Chinese data from a wide range of domains to pretrain the model. We empirically test the generation ability of PanGu-α in various scenarios including text summarization, question answering, dialogue generation, etc. Moreover, we investigate the effect of model scales on the few-shot performances across a broad range of Chinese NLP tasks. The experimental results demonstrate the superior capabilities of PanGu-α in performing various tasks under few-shot or zero-shot settings.
Others (26 papers)
[1]:Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums
Authors: Jialin Yu, Laila Alrajhi, Anoushka Harit, Zhongtian Sun, Alexandra I. Cristea, Lei Shi
Link: https://arxiv.org/abs/2104.12643
Abstract: Massive Open Online Courses (MOOCs) have become a popular choice for e-learning thanks to their great flexibility. However, due to large numbers of learners and their diverse backgrounds, it is taxing to offer real-time support. Learners may post their feelings of confusion and struggle in the respective MOOC forums, but with the large volume of posts and high workloads for MOOC instructors, it is unlikely that the instructors can identify all learners requiring intervention. This problem has been studied as a Natural Language Processing (NLP) problem recently, and is known to be challenging, due to the imbalance of the data and the complex nature of the task. In this paper, we explore for the first time Bayesian deep learning on learner-based text posts with two methods: Monte Carlo Dropout and Variational Inference, as a new solution to assessing the need for instructor intervention for a learner's post. We compare models based on our proposed methods with probabilistic modelling to their baseline non-Bayesian counterparts under similar circumstances, for different cases of applying prediction. The results suggest that Bayesian deep learning offers a critical uncertainty measure that is not supplied by traditional neural networks. This adds more explainability, trust and robustness to AI, which is crucial in education-based applications. Additionally, it can achieve similar or better performance compared to non-probabilistic neural networks, while also yielding lower variance.
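Of the two methods named, Monte Carlo Dropout is the most compact to illustrate. A minimal sketch (the toy network and sample count are invented): dropout stays active at test time, and the spread across stochastic forward passes is the uncertainty measure the abstract refers to.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=20):
    """Average softmax outputs over stochastic forward passes with dropout
    enabled; the per-class standard deviation is the uncertainty signal."""
    model.train()  # keeps dropout layers active at inference
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(0), probs.std(0)

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 2))
mean, std = mc_dropout_predict(net, torch.randn(4, 16))
print(mean.shape, std.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```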
[2]:Evaluating the Values of Sources in Transfer Learning
Authors: Md Rizwan Parvez, Kai-Wei Chang
Notes: NAACL 2021 camera-ready
Link: https://arxiv.org/abs/2104.12567
Abstract: Transfer learning that adapts a model trained on data-rich sources to low-resource targets has been widely applied in natural language processing (NLP). However, when training a transfer model over multiple sources, not every source is equally useful for the target. To better transfer a model, it is essential to understand the values of the sources. In this paper, we develop SEAL-Shap, an efficient source valuation framework for quantifying the usefulness of the sources (e.g., domains/languages) in transfer learning based on the Shapley value method. Experiments and comprehensive analyses on both cross-domain and cross-lingual transfers demonstrate that our framework is not only effective in choosing useful transfer sources but also produces source values that match intuitive source-target similarity.
[3]:What Makes a Message Persuasive? Identifying Adaptations Towards Persuasiveness in Nine Exploratory Case Studies
Authors: Sebastian Duerr, Krystian Teodor Lange, Peter A. Gloor
Notes: 10 pages, 4 tables
Link: https://arxiv.org/abs/2104.12454
Abstract: The ability to persuade others is critical to professional and personal success. However, crafting persuasive messages is demanding and poses various challenges. We conducted nine exploratory case studies to identify adaptations that professional and non-professional writers make in written scenarios to increase their subjective persuasiveness. Furthermore, we identified challenges that those writers faced and identified strategies to resolve them with persuasive natural language generation, i.e., artificial intelligence. Our findings show that humans can achieve high degrees of persuasiveness (more so for professional-level writers), and artificial intelligence can complement them to achieve increased celerity and alignment in the process.
[4]:Attention vs non-attention for a Shapley-based explanation method
Authors: Tom Kersten, Hugh Mee Wong, Jaap Jumelet, Dieuwke Hupkes
Notes: Accepted for publication at DeeLIO 2021
Link: https://arxiv.org/abs/2104.12424
Abstract: The field of explainable AI has recently seen an explosion in the number of explanation methods for highly non-linear deep neural networks. The extent to which such methods -- that are often proposed and tested in the domain of computer vision -- are appropriate to address the explainability challenges in NLP is yet relatively unexplored. In this work, we consider Contextual Decomposition (CD) -- a Shapley-based input feature attribution method that has been shown to work well for recurrent NLP models -- and we test the extent to which it is useful for models that contain attention operations. To this end, we extend CD to cover the operations necessary for attention-based models. We then compare how long distance subject-verb relationships are processed by models with and without attention, considering a number of different syntactic structures in two different languages: English and Dutch. Our experiments confirm that CD can successfully be applied for attention-based models as well, providing an alternative Shapley-based attribution method for modern neural networks. In particular, using CD, we show that the English and Dutch models demonstrate similar processing behaviour, but that under the hood there are consistent differences between our attention and non-attention models.
[5]:Teaching NLP with Bracelets and Restaurant Menus: An Interactive Workshop for Italian Students
Authors: Ludovica Pannitto, Lucia Busso, Claudia Roberta Combei, Lucio Messina, Alessio Miaschi, Gabriele Sarti, Malvina Nissim
Notes: 11 pages, 16 figures, accepted at the Teaching NLP 2021 workshop
Link: https://arxiv.org/abs/2104.12422
Abstract: Although Natural Language Processing (NLP) is at the core of many tools young people use in their everyday life, high school curricula (in Italy) do not include any computational linguistics education. This lack of exposure makes the use of such tools less responsible than it could be and makes choosing computational linguistics as a university degree unlikely. To raise awareness, curiosity, and longer-term interest in young people, we have developed an interactive workshop designed to illustrate the basic principles of NLP and computational linguistics to high school Italian students aged between 13 and 18 years. The workshop takes the form of a game in which participants play the role of machines needing to solve some of the most common problems a computer faces in understanding language: from voice recognition to Markov chains to syntactic parsing. Participants are guided through the workshop with the help of instructors, who present the activities and explain core concepts from computational linguistics. The workshop was presented at numerous outlets in Italy between 2019 and 2021, both face-to-face and online.
[6]:A dissemination workshop for introducing young Italian students to NLP
Authors: Lucio Messina, Lucia Busso, Claudia Roberta Combei, Ludovica Pannitto, Alessio Miaschi, Gabriele Sarti, Malvina Nissim
Notes: 3 pages, 4 figures, accepted at the Teaching NLP 2021 workshop
Link: https://arxiv.org/abs/2104.12405
Abstract: We describe and make available the game-based material developed for a laboratory run at several Italian science festivals to popularize NLP among young students.
[7]:DADgraph: A Discourse-aware Dialogue Graph Neural Network for Multiparty Dialogue Machine Reading Comprehension
Authors: Jiaqi Li, Ming Liu, Zihao Zheng, Heng Zhang, Bing Qin, Min-Yen Kan, Ting Liu
Notes: Accepted by IJCNN 2021
Link: https://arxiv.org/abs/2104.12377
Abstract: Multiparty Dialogue Machine Reading Comprehension (MRC) differs from traditional MRC as models must handle the complex dialogue discourse structure, previously unconsidered in traditional MRC. To fully exploit such discourse structure in multiparty dialogue, we present a discourse-aware dialogue graph neural network, DADgraph, which explicitly constructs the dialogue graph using discourse dependency links and discourse relations. To validate our model, we perform experiments on the Molweni corpus, a large-scale MRC dataset built over multiparty dialogue annotated with discourse structure. Experiments on Molweni show that our discourse-aware model achieves statistically significant improvements compared against strong neural network MRC baselines.
[8]:A Sliding-Window Approach to Automatic Creation of Meeting Minutes
Authors: Jia Jin Koay, Alexander Roustai, Xiaojin Dai, Fei Liu
Notes: 8 pages
Link: https://arxiv.org/abs/2104.12324
Abstract: Meeting minutes record any subject matters discussed, decisions reached and actions taken at meetings. The importance of minuting cannot be overemphasized in a time when a significant number of meetings take place in the virtual space. In this paper, we present a sliding window approach to automatic generation of meeting minutes. It aims to tackle issues associated with the nature of spoken text, including lengthy transcripts and lack of document structure, which make it difficult to identify salient content to be included in the meeting minutes. Our approach combines a sliding window and a neural abstractive summarizer to navigate through the transcripts to find salient content. The approach is evaluated on transcripts of natural meeting conversations, where we compare results obtained for human transcripts and two versions of automatic transcripts and discuss how and to what extent the summarizer succeeds at capturing salient content.
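The navigation strategy is a plain sliding window. A hedged sketch with invented window sizes and a stub summarizer (any abstractive model, e.g. a fine-tuned BART, could fill that role):

```python
def sliding_windows(utterances, size=20, stride=10):
    """Yield overlapping windows short enough for a neural summarizer's input limit."""
    for start in range(0, max(1, len(utterances) - size + 1), stride):
        yield utterances[start:start + size]

def draft_minutes(utterances, summarize, size=20, stride=10):
    """Summarize each window; concatenated summaries form the draft minutes."""
    return [summarize(" ".join(w)) for w in sliding_windows(utterances, size, stride)]

# Stub summarizer: keep the first token of each window.
print(draft_minutes([f"utt{i}" for i in range(35)], lambda t: t.split()[0]))
```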
[9]:A Bi-Encoder LSTM Model For Learning Unstructured Dialogs
Authors: Diwanshu Shekhar, Pooran S. Negi, Mohammad Mahoor
Link: https://arxiv.org/abs/2104.12269
Abstract: Creating a data-driven model that is trained on a large dataset of unstructured dialogs is a crucial step in developing Retrieval-based Chatbot systems. This paper presents a Long Short Term Memory (LSTM) based architecture that learns unstructured multi-turn dialogs and provides results on the task of selecting the best response from a collection of given responses. Ubuntu Dialog Corpus Version 2 was used as the corpus for training. We show that our model achieves 0.8%, 1.0% and 0.3% higher accuracy for Recall@1, Recall@2 and Recall@5 respectively than the benchmark model. We also show results on experiments performed by using several similarity functions, model hyper-parameters and word embeddings on the proposed architecture.
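Recall@k on the Ubuntu corpus's 1-in-10 candidate setup is the headline metric here, so it is worth making precise. A self-contained sketch (random scores recover the expected 0.1 / 0.5 baselines for Recall@1 / Recall@5):

```python
import numpy as np

def recall_at_k(scores, k):
    """scores: (n_examples, n_candidates), column 0 holds the true response.
    An example counts as a hit when fewer than k candidates outscore the truth."""
    outranked_by = (scores > scores[:, :1]).sum(axis=1)
    return float((outranked_by < k).mean())

rng = np.random.default_rng(0)
s = rng.normal(size=(1000, 10))
print(recall_at_k(s, 1), recall_at_k(s, 5))  # ~0.1 and ~0.5 for random scoring
```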
[10]:Contextual Lexicon-Based Approach for Hate Speech and Offensive Language Detection
Authors: Francielle Alves Vargas, Fabiana Rodrigues de Góes, Isabelle Carvalho, Fabrício Benevenuto, Thiago Alexandre Salgueiro Pardo
Link: https://arxiv.org/abs/2104.12265
Abstract: This paper presents a new approach for offensive language and hate speech detection on social media. Our approach incorporates an offensive lexicon composed of implicit and explicit offensive and swearing expressions annotated with binary classes: context-dependent offensive and context-independent offensive. Due to the severity of the hate speech and offensive comments in Brazil and the lack of research in Portuguese, Brazilian Portuguese is the language used to validate our method. However, the proposal may be applied to any other language or domain. Based on the obtained results, the proposed approach achieved high performance, surpassing the current baselines for European and Brazilian Portuguese.
[11]:XLM-T: A Multilingual Language Model Toolkit for Twitter
Authors: Francesco Barbieri, Luis Espinosa Anke, Jose Camacho-Collados
Notes: Submitted to ACL demo. Code and data available at this https URL
Link: https://arxiv.org/abs/2104.12250
Abstract: Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a framework for using and evaluating multilingual language models in Twitter. This framework features two main assets: (1) a strong multilingual baseline consisting of an XLM-R (Conneau et al. 2020) model pre-trained on millions of tweets in over thirty languages, alongside starter code to subsequently fine-tune on a target task; and (2) a set of unified sentiment analysis Twitter datasets in eight different languages. This is a modular framework that can easily be extended to additional tasks, as well as integrated with recent efforts also aimed at the homogenization of Twitter-specific datasets (Barbieri et al. 2020).
[12]:Identifying Offensive Expressions of Opinion in Context
Authors: Francielle Alves Vargas, Isabelle Carvalho, Fabiana Rodrigues de Góes
Link: https://arxiv.org/abs/2104.12227
Abstract: Classic information extraction techniques consist in building questions and answers about the facts. However, it is still a challenge for subjective information extraction systems to identify opinions and feelings in context. In sentiment-based NLP tasks, there are few resources for information extraction, above all for offensive or hateful opinions in context. To fill this important gap, this short paper provides a new cross-lingual and contextual offensive lexicon, which consists of explicit and implicit offensive and swearing expressions of opinion, annotated in two different classes: context-dependent and context-independent offensive. In addition, we provide markers to identify hate speech. The annotation approach was evaluated at the expression level and achieves high human inter-annotator agreement. The provided offensive lexicon is available in Portuguese and English.
[13]:Transformers to Fight the COVID-19 Infodemic
Authors: Lasitha Uyangodage, Tharindu Ranasinghe, Hansi Hettiarachchi
Notes: Accepted to the Workshop on NLP for Internet Freedom (NLP4IF) at NAACL 2021
Link: https://arxiv.org/abs/2104.12201
Abstract: The massive spread of false information on social media has become a global risk, especially in a global pandemic situation like COVID-19. False information detection has thus become a surging research topic in recent months. The NLP4IF-2021 shared task on fighting the COVID-19 infodemic has been organised to strengthen research in false information detection, where participants are asked to predict seven different binary labels regarding false information in a tweet. The shared task has been organised in three languages: Arabic, Bulgarian and English. In this paper, we present our approach to tackle the task objective using transformers. Overall, our approach achieves a 0.707 mean F1 score in Arabic, 0.578 mean F1 score in Bulgarian and 0.864 mean F1 score in English, ranking 4th place in all the languages.
[14]:Automatic Post-Editing for Translating Chinese Novels to Vietnamese
Authors: Thanh Vu, Dai Quoc Nguyen
Link: https://arxiv.org/abs/2104.12128
Abstract: Automatic post-editing (APE) is an important remedy for reducing errors of raw translated texts that are produced by machine translation (MT) systems or software-aided translation. In this paper, we present the first attempt to tackle the APE task for Vietnamese. Specifically, we construct the first large-scale dataset of 5M Vietnamese translated and corrected sentence pairs. We then apply strong neural MT models to handle the APE task, using our constructed dataset. Experimental results from both automatic and human evaluations show the effectiveness of the neural MT models in handling the Vietnamese APE task.
[15]:Vietnamese Open-domain Complaint Detection in E-Commerce Websites
Authors: Nhung Thi-Hong Nguyen, Phuong Ha-Dieu Phan, Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
Link: https://arxiv.org/abs/2104.11969
Abstract: Customer product reviews play a role in improving the quality of products and services for organizations or brands. Complaining is an attitude that expresses dissatisfaction with an event or a product not meeting customer expectations. In this paper, we build a Vietnamese dataset (UIT-ViOCD), including 5,485 human-annotated reviews on four categories about product reviews on e-commerce sites. After the data collection phase, we proceed to the annotation task and achieve Am = 87% by Fleiss' Kappa. Then, we present an extensive methodology for the research purposes and achieve 92.16% by F1-score for identifying complaints. With these results, in the future, we want to build a system for open-domain complaint detection in e-commerce websites.
[16]:Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
Authors: Cheng Chen, Yichun Yin, Lifeng Shang, Zhi Wang, Xin Jiang, Xiao Chen, Qun Liu
Link: https://arxiv.org/abs/2104.11928
Abstract: Task-agnostic knowledge distillation, a teacher-student framework, has been proved effective for BERT compression. Although achieving promising results on NLP tasks, it requires enormous computational resources. In this paper, we propose Extract Then Distill (ETD), a generic and flexible strategy to reuse the teacher's parameters for efficient and effective task-agnostic distillation, which can be applied to students of any size. Specifically, we introduce two variants of ETD, ETD-Rand and ETD-Impt, which extract the teacher's parameters in a random manner and by following an importance metric, respectively. In this way, the student has already acquired some knowledge at the beginning of the distillation process, which makes the distillation process converge faster. We demonstrate the effectiveness of ETD on the GLUE benchmark and SQuAD. The experimental results show that: (1) compared with the baseline without an ETD strategy, ETD can save 70% of the computation cost. Moreover, it achieves better results than the baseline when using the same computing resource. (2) ETD is generic and has been proven effective for different distillation methods (e.g., TinyBERT and MiniLM) and students of different sizes. The source code will be publicly available upon publication.
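The "extract" step amounts to slicing teacher weight matrices consistently. A hedged sketch for one feed-forward sublayer, with an invented importance score (output-weight norm) standing in for the paper's importance metric:

```python
import torch

def extract_ffn(W1, b1, W2, width, mode="impt"):
    """Keep `width` of the teacher's intermediate FFN neurons, chosen at
    random (ETD-Rand) or by a score (ETD-Impt), and slice both projections
    consistently to initialize a thinner student."""
    if mode == "rand":
        idx = torch.randperm(W1.shape[0])[:width]
    else:
        idx = W2.abs().sum(dim=0).argsort(descending=True)[:width]
    return W1[idx], b1[idx], W2[:, idx]

W1, b1 = torch.randn(3072, 768), torch.randn(3072)  # hidden -> intermediate
W2 = torch.randn(768, 3072)                          # intermediate -> hidden
w1, b1_s, w2 = extract_ffn(W1, b1, W2, width=1024)
print(w1.shape, w2.shape)  # torch.Size([1024, 768]) torch.Size([768, 1024])
```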
[17]:Towards Trustworthy Deception Detection: Benchmarking Model Robustness across Domains, Modalities, and Languages
Authors: Maria Glenski, Ellyn Ayton, Robin Cosbey, Dustin Arendt, Svitlana Volkova
Link: https://arxiv.org/abs/2104.11761
Abstract: Evaluating model robustness is critical when developing trustworthy models, not only to gain deeper understanding of model behavior, strengths, and weaknesses, but also to develop future models that are generalizable and robust across expected environments a model may encounter in deployment. In this paper we present a framework for measuring model robustness for an important but difficult text classification task - deceptive news detection. We evaluate model robustness to out-of-domain data, modality-specific features, and languages other than English.
Our investigation focuses on three types of models: LSTM models trained on multiple datasets (Cross-Domain); several fusion LSTM models trained with images and text and evaluated with three state-of-the-art embeddings, BERT, ELMo, and GloVe (Cross-Modality); and character-level CNN models trained on multiple languages (Cross-Language). Our analyses reveal a significant drop in performance when testing neural models on out-of-domain data and non-English languages, which may be mitigated using diverse training data. We find that with additional image content as input, ELMo embeddings yield significantly fewer errors compared to BERT or GloVe. Most importantly, this work not only carefully analyzes deception model robustness but also provides a framework of these analyses that can be applied to new models or extended datasets in the future.
[18]:MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Authors: Aishwarya Kamath, Mannat Singh, Yann LeCun, Ishan Misra, Gabriel Synnaeve, Nicolas Carion
Link: https://arxiv.org/abs/2104.12763
Abstract: Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of interest from the image. However, this crucial module is typically used as a black box, trained independently of the downstream task and on a fixed vocabulary of objects and attributes. This makes it challenging for such systems to capture the long tail of visual concepts expressed in free form text. In this paper we propose MDETR, an end-to-end modulated detector that detects objects in an image conditioned on a raw text query, like a caption or a question. We use a transformer-based architecture to reason jointly over text and image by fusing the two modalities at an early stage of the model. We pre-train the network on 1.3M text-image pairs, mined from pre-existing multi-modal datasets having explicit alignment between phrases in text and objects in the image. We then fine-tune on several downstream tasks such as phrase grounding, referring expression comprehension and segmentation, achieving state-of-the-art results on popular benchmarks. We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting. We show that our pre-training approach provides a way to handle the long tail of object categories which have very few labelled instances. Our approach can be easily extended for visual question answering, achieving competitive performance on GQA and CLEVR. The code and models are available at this https URL.
[19]:InfographicVQA
Authors: Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C.V Jawahar
Link: https://arxiv.org/abs/2104.12756
Abstract: Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements. In this work, we explore the automatic understanding of infographic images by using Visual Question Answering. To this end, we present InfographicVQA, a new dataset that comprises a diverse collection of infographics along with natural language question and answer annotations. The collected questions require methods to jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with emphasis on questions that require elementary reasoning and basic arithmetic skills. Finally, we evaluate two strong baselines based on state-of-the-art multi-modal VQA models, and establish baseline performance for the new task. The dataset, code and leaderboard will be made available at this http URL.
[20]:Contextualized Keyword Representations for Multi-modal Retinal Image Captioning
Authors: Jia-Hong Huang, Ting-Wei Wu, Marcel Worring
Notes: Accepted by the ACM International Conference on Multimedia Retrieval (ICMR), 2021
Link: https://arxiv.org/abs/2104.12471
Abstract: Medical image captioning automatically generates a medical description to describe the content of a given medical image. A traditional medical image captioning model creates a medical description based only on a single medical image input. Hence, an abstract medical description or concept is hard to generate with the traditional approach. Such a method limits the effectiveness of medical image captioning. Multi-modal medical image captioning is one of the approaches utilized to address this problem. In multi-modal medical image captioning, textual input, e.g., expert-defined keywords, is considered as one of the main drivers of medical description generation. Thus, encoding the textual input and the medical image effectively are both important for the task of multi-modal medical image captioning. In this work, a new end-to-end deep multi-modal medical image captioning model is proposed. Contextualized keyword representations, textual feature reinforcement, and masked self-attention are used to develop the proposed approach. Based on the evaluation on the existing multi-modal medical image captioning dataset, experimental results show that the proposed model is effective, with an increase of +53.2% in BLEU-avg and +18.6% in CIDEr, compared with the state-of-the-art method.
[21]:Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis
Authors: Kosuke Futamata, Byeongseon Park, Ryuichi Yamamoto, Kentaro Tachibana
Notes: Submitted to INTERSPEECH 2021
Link: https://arxiv.org/abs/2104.12395
Abstract: We propose a novel phrase break prediction method that combines implicit features extracted from a pre-trained large language model, a.k.a. BERT, and explicit features extracted from a BiLSTM with linguistic features. In conventional BiLSTM-based methods, word representations and/or sentence representations are used as independent components. The proposed method takes account of both representations to extract the latent semantics, which cannot be captured by previous methods. The objective evaluation results show that the proposed method obtains an absolute improvement of 3.2 points for the F1 score compared with BiLSTM-based conventional methods using linguistic features. Moreover, the perceptual listening test results verify that a TTS system that applied our proposed method achieved a mean opinion score of 4.39 in prosody naturalness, which is highly competitive with the score of 4.37 for synthesized speech with ground-truth phrase breaks.
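The described architecture, implicit BERT features concatenated with explicit linguistic features and fed to a BiLSTM tagger, is easy to outline. A hedged sketch with invented dimensions (a frozen BERT encoder is assumed to supply `bert_feats`):

```python
import torch
import torch.nn as nn

class PhraseBreakTagger(nn.Module):
    """Concatenate implicit per-token features (from a pre-trained BERT) with
    explicit linguistic features, run a BiLSTM, and tag break / no-break."""
    def __init__(self, bert_dim=768, ling_dim=32, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(bert_dim + ling_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)

    def forward(self, bert_feats, ling_feats):       # (B,T,768), (B,T,32)
        h, _ = self.lstm(torch.cat([bert_feats, ling_feats], dim=-1))
        return self.out(h)                           # (B,T,2) logits

model = PhraseBreakTagger()
print(model(torch.randn(2, 10, 768), torch.randn(2, 10, 32)).shape)  # (2, 10, 2)
```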
[22]:User Preference-aware Fake News Detection
Authors: Yingtong Dou, Kai Shu, Congying Xia, Philip S. Yu, Lichao Sun
Notes: Accepted by SIGIR '21. Code is available at this https URL
Link: https://arxiv.org/abs/2104.12259
Abstract: Disinformation and fake news have posed detrimental effects on individuals and society in recent years, attracting broad attention to fake news detection. The majority of existing fake news detection algorithms focus on mining news content and/or the surrounding exogenous context for discovering deceptive signals, while the endogenous preference of a user when he/she decides to spread a piece of fake news or not is ignored. The confirmation bias theory has indicated that a user is more likely to spread a piece of fake news when it confirms his/her existing beliefs/preferences. Users' historical, social engagements such as posts provide rich information about users' preferences toward news and have great potential to advance fake news detection. However, the work on exploring user preference for fake news detection is somewhat limited. Therefore, in this paper, we study the novel problem of exploiting user preference for fake news detection. We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework. We release our code and data as a benchmark for GNN-based fake news detection: this https URL.
[23]:Language ID Prediction from Speech Using Self-Attentive Pooling and 1D-Convolutions
Authors: Roman Bedyakin, Nikolay Mikhaylovskiy
Notes: Accepted to SIGTYP 2021
Link: https://arxiv.org/abs/2104.11985
Abstract: This memo describes the NTR-TSU submission for the SIGTYP 2021 Shared Task on predicting language IDs from speech.
Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, creating a need for domain- and speaker-invariant language ID systems. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer shows promising results for the language identification task.
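Self-attentive pooling is the one named component that is fully specified by its shape contract: learned attention weights over time collapse frame-level CNN features into a single utterance vector for the language classifier. A minimal sketch with invented dimensions:

```python
import torch
import torch.nn as nn

class SelfAttentivePooling(nn.Module):
    """Collapse a (batch, time, dim) sequence of frame features into one
    utterance vector using learned attention weights over time."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                        # h: (B, T, D)
        w = torch.softmax(self.score(h), dim=1)  # (B, T, 1) attention over frames
        return (w * h).sum(dim=1)                # (B, D)

pool = SelfAttentivePooling(64)
print(pool(torch.randn(8, 100, 64)).shape)  # torch.Size([8, 64])
```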
[24]:MusCaps: Generating Captions for Music Audio
Authors: Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas
Notes: Accepted to IJCNN 2021 for the Special Session on Representation Learning for Audio, Speech, and Music Processing
Link: https://arxiv.org/abs/2104.11984
Abstract: Content-based music information retrieval has seen rapid progress with the adoption of deep learning. Current approaches to high-level music description typically make use of classification models, such as in auto-tagging or genre and mood classification. In this work, we propose to address music description via audio captioning, defined as the task of generating a natural language description of music audio content in a human-like manner. To this end, we present the first music audio captioning model, MusCaps, consisting of an encoder-decoder with temporal attention. Our method combines convolutional and recurrent neural network architectures to jointly process audio-text inputs through a multimodal encoder and leverages pre-training on audio data to obtain representations that effectively capture and summarise musical features in the input. Evaluation of the generated captions through automatic metrics shows that our method outperforms a baseline designed for non-music audio captioning. Through an ablation study, we unveil that this performance boost can be mainly attributed to pre-training of the audio encoder, while other design choices - modality fusion, decoding strategy and the use of attention - contribute only marginally. Our model represents a shift away from classification-based music description and combines tasks requiring both auditory and linguistic understanding to bridge the semantic gap in music information retrieval.
[25]:Playing Lottery Tickets with Vision and Language
Authors: Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu
Link: https://arxiv.org/abs/2104.11832
Abstract: Large-scale transformer-based pre-training has recently revolutionized vision-and-language (V+L) research. Models such as LXMERT, ViLBERT and UNITER have significantly lifted the state of the art over a wide range of V+L tasks. However, the large number of parameters in such models hinders their application in practice. In parallel, work on the lottery ticket hypothesis has shown that deep neural networks contain small matching subnetworks that can achieve on par or even better performance than the dense networks when trained in isolation. In this work, we perform the first empirical study to assess whether such trainable subnetworks also exist in pre-trained V+L models. We use UNITER, one of the best-performing V+L models, as the testbed, and consolidate 7 representative V+L tasks for experiments, including visual question answering, visual commonsense reasoning, visual entailment, referring expression comprehension, image-text retrieval, GQA, and NLVR2. Through comprehensive analysis, we summarize our main findings as follows. (i) It is difficult to find subnetworks (i.e., the tickets) that strictly match the performance of the full UNITER model. However, it is encouraging to confirm that we can find "relaxed" winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy. (ii) Subnetworks found by task-specific pruning transfer reasonably well to the other tasks, while those found on the pre-training tasks at 60%/70% sparsity transfer universally, matching 98%/96% of the full accuracy on average over all the tasks. (iii) Adversarial training can be further used to enhance the performance of the found lottery tickets.
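Ticket-finding rests on magnitude pruning. A one-shot sketch of the mask construction (the paper's actual procedure additionally retrains, typically prunes iteratively, and rewinds surviving weights to the pre-trained UNITER initialization):

```python
import torch

def magnitude_mask(weight, sparsity):
    """Keep the largest-magnitude fraction (1 - sparsity) of weights; the
    surviving subnetwork is a candidate 'winning ticket' to be retrained."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values if k > 0 else -1.0
    return (weight.abs() > threshold).float()

w = torch.randn(256, 256)
mask = magnitude_mask(w, sparsity=0.6)
print(f"kept {mask.mean().item():.0%} of weights")  # ~40%
```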
[26]:DeepCAT: Deep Category Representation for Query Understanding in E-commerce Search
Authors: Ali Ahmadvand, Surya Kallumadi, Faizan Javed, Eugene Agichtein
Link: https://arxiv.org/abs/2104.11760
Abstract: Mapping a search query to a set of relevant categories in the product taxonomy is a significant challenge in e-commerce search for two reasons: 1) training data exhibits a severe class imbalance problem due to biased click behavior, and 2) queries with little customer feedback (e.g., tail queries) are not well-represented in the training set and cause difficulties for query understanding. To address these problems, we propose a deep learning model, DeepCAT, which learns joint word-category representations to enhance the query understanding process. We believe learning category interactions helps to improve the performance of category mapping on minority classes, tail and torso queries. DeepCAT contains a novel word-category representation model that trains the category representations based on word-category co-occurrences in the training set. The category representation is then leveraged to introduce a new loss function to estimate the category-category co-occurrences for refining joint word-category embeddings. To demonstrate our model's effectiveness on minority categories and tail queries, we conduct two sets of experiments. The results show that DeepCAT reaches a 10% improvement on minority classes and a 7.1% improvement on tail queries over a state-of-the-art label embedding model. Our findings suggest a promising direction for improving e-commerce search by semantic modeling of taxonomy hierarchies.