Today's arXiv digest covers 16 papers in cs.CL.
Natural Language Generation (1 paper)
[1]: Contextualized Code Representation Learning for Commit Message Generation
Authors: Lun Yiu Nie, Cuiyun Gao, Zhicong Zhong, Wai Lam, Yang Liu, Zenglin Xu
Link: https://arxiv.org/abs/2007.06934
Abstract: Automatic generation of high-quality commit messages for code commits can substantially facilitate developers' work and coordination. However, the semantic gap between source code and natural language poses a major challenge for the task. Several approaches have been proposed to alleviate the challenge, but none explicitly involves code contextual information during commit message generation. Specifically, existing research adopts static embeddings for code tokens, which map a token to the same vector regardless of its context. In this paper, we propose a novel Contextualized code representation learning method for commit message Generation (CoreGen). CoreGen first learns contextualized code representations that exploit the contextual information behind code commit sequences. The learned representations of code commits, built upon a Transformer, are then transferred for downstream commit message generation. Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over the baseline models, with an improvement of 28.18% in terms of BLEU-4 score. Furthermore, we also highlight future opportunities in training contextualized code representations on larger code corpora as a solution to low-resource settings, and in adapting the pretrained code representations to other downstream code-to-text generation tasks.
Information Extraction (1 paper)
[1]: Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances
Authors: Kundan Krishna, Amy Pavel, Benjamin Schloss, Jeffrey P. Bigham, Zachary C. Lipton
Link: https://arxiv.org/abs/2007.07151
Abstract: Despite diverse efforts to mine various modalities of medical data, the conversations between physicians and patients at the time of care remain an untapped source of insights. In this paper, we leverage this data to extract structured information that might assist physicians with post-visit documentation in electronic health records, potentially lightening the clerical burden. In this exploratory study, we describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels. We focus on the tasks of recognizing relevant diagnoses and abnormalities in the review of organ systems (RoS). One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input. To address this challenge, we extract noteworthy utterances (parts of the conversation likely to be cited as evidence supporting some summary sentence). We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.
Machine Translation (1 paper)
[1]: Modeling Voting for System Combination in Machine Translation
Authors: Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu
Comments: Accepted to the main track of IJCAI 2020; SOLE copyright holder is IJCAI (International Joint Conferences on Artificial Intelligence), all rights reserved. this https URL
Link: https://arxiv.org/abs/2007.06943
Abstract: System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance. Although early statistical approaches to system combination have been proven effective in analyzing the consensus between hypotheses, they suffer from the error propagation problem due to the use of pipelines. While this problem has recently been alleviated by end-to-end training of multi-source sequence-to-sequence models, these neural models do not explicitly analyze the relations between hypotheses and fail to capture their agreement, because the attention to a word in a hypothesis is calculated independently, ignoring the fact that the word might occur in multiple hypotheses. In this work, we propose an approach to modeling voting for system combination in machine translation. The basic idea is to enable words in hypotheses from different systems to vote on words that are representative and should get involved in the generation process. This can be done by quantifying the influence of each voter and its preference for each candidate. Our approach combines the advantages of statistical and neural methods, since it can not only analyze the relations between hypotheses but also allow for end-to-end training. Experiments show that our approach is capable of better taking advantage of the consensus between hypotheses and achieves significant improvements over state-of-the-art baselines on Chinese-English and English-German machine translation tasks.
Sentiment Analysis (2 papers)
[1]: Investigation of Sentiment Controllable Chatbot
Authors: Hung-yi Lee, Cheng-Hao Ho, Chien-Fu Lin, Chiung-Chih Chang, Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen
Comments: arXiv admin note: text overlap with arXiv:1804.02504
Link: https://arxiv.org/abs/2007.07196
Abstract: Conventional seq2seq chatbot models attempt only to find sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences. In this paper, we investigate four models to scale or adjust the sentiment of the chatbot response: a persona-based model, reinforcement learning, a plug-and-play model, and CycleGAN, all based on the seq2seq model. We also develop machine-evaluated metrics to estimate whether the responses are reasonable given the input. These metrics, together with human evaluation, are used to analyze the performance of the four models with respect to different aspects; reinforcement learning and CycleGAN are shown to be very attractive.
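The four models above are not spelled out in this digest, so the following is only a minimal sketch of the general idea of steering a chatbot's output sentiment. It uses post-hoc reranking of decoded candidates with a toy lexicon; the lexicon, candidates, and function names are all hypothetical, and the paper's models adjust sentiment inside the seq2seq network rather than by reranking.

```python
# Illustrative sketch, NOT the paper's method: pick the decoded candidate
# whose lexicon-based sentiment is closest to a requested target polarity.
SENTIMENT_LEXICON = {"great": 1.0, "love": 1.0, "nice": 0.5,
                     "bad": -1.0, "hate": -1.0, "boring": -0.5}

def sentiment_score(response: str) -> float:
    """Average lexicon polarity of the tokens in a response."""
    tokens = response.lower().split()
    scores = [SENTIMENT_LEXICON.get(t, 0.0) for t in tokens]
    return sum(scores) / len(scores) if scores else 0.0

def rerank_by_sentiment(candidates, target: float) -> str:
    """Return the candidate whose sentiment is closest to the target."""
    return min(candidates, key=lambda c: abs(sentiment_score(c) - target))

candidates = ["i hate this movie", "the movie was nice", "i love this great movie"]
positive = rerank_by_sentiment(candidates, target=1.0)
negative = rerank_by_sentiment(candidates, target=-1.0)
```

A real sentiment classifier would replace the lexicon; the control knob (the `target` value) is what makes the response sentiment adjustable.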
[2]: COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes
Authors: Raj Kumar Gupta, Ajay Vishwanath, Yinping Yang
Comments: 20 pages, 5 figures, 9 tables
Link: https://arxiv.org/abs/2007.06954
Abstract: This resource paper describes a large dataset covering over 63 million coronavirus-related Twitter posts from more than 13 million unique users from 28 January to 1 July 2020. As strong concerns and emotions are expressed in the tweets, we analyzed the tweet content using natural language processing techniques and machine-learning based algorithms, and inferred seventeen latent semantic attributes associated with each tweet, including 1) ten attributes indicating the tweet's relevance to ten detected topics, 2) five quantitative attributes indicating the degree of intensity in the valence (i.e., unpleasantness/pleasantness) and the emotional intensities across four primary emotions of fear, anger, sadness and joy, and 3) two qualitative attributes indicating the sentiment category and the most dominant emotion category, respectively. To illustrate how the dataset can be used, we present descriptive statistics around the topics, sentiments and emotions attributes and their temporal distributions, and discuss possible applications in communication, psychology, public health, economics and epidemiology.
Models (1 paper)
[1]: An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models
Authors: Lifu Tu, Garima Lalwani, Spandana Gella, He He
Comments: Accepted to TACL 2020
Link: https://arxiv.org/abs/2007.06778
Abstract: Recent work has shown that pre-trained language models such as BERT improve robustness to spurious correlations in the dataset. Intrigued by these results, we find that the key to their success is generalization from a small number of counterexamples where the spurious correlations do not hold. When such minority examples are scarce, pre-trained models perform as poorly as models trained from scratch. In the case of extreme minority, we propose to use multi-task learning (MTL) to improve generalization. Our experiments on natural language inference and paraphrase identification show that MTL with the right auxiliary tasks significantly improves performance on challenging examples without hurting the in-distribution performance. Further, we show that the gain from MTL mainly comes from improved generalization from the minority examples. Our results highlight the importance of data diversity for overcoming spurious correlations.
Others (10 papers)
[1]: Questionnaire analysis to define the most suitable survey for port-noise investigation
Authors: Andrea Cerniglia, Davide Chiarella, Paola Cutugno, Lucia Marconi, Anna Magrini, Gelsomina Di Feo, Melissa Ferretti
Comments: 8 pages, Proceedings of the 26th International Congress on Sound and Vibration. ISBN 978-1-9991810-0-0, ISSN 2329-3675
Link: https://arxiv.org/abs/2007.06915
Abstract: The high level of noise pollution affecting the areas between ports and logistic platforms represents a problem that can be faced from different points of view. Acoustic monitoring, mapping, short-term measurements, and port and road traffic flow analyses can give useful indications on the strategies to be proposed for better management of the problem. A survey campaign based on questionnaires submitted to the population exposed to noise in the back-port areas will help to better understand the subjective point of view. The paper analyses a sample of questions suitable for the specific research, chosen from the wide database of questionnaires internationally proposed for subjective investigations. The preliminary results of a first data-collection campaign are considered to verify the adequacy of the number and type of questions and the type of sample noise used for the survey. The questionnaire will be optimized for distribution in the TRIPLO project (TRansports and Innovative sustainable connections between Ports and LOgistic platforms). The results of this survey will be the starting point for the linguistic investigation carried out in combination with the acoustic monitoring, to improve understanding of the connections between personal feelings and technical aspects.
[2]: Language, communication and society: a gender based linguistics analysis
Authors: P. Cutugno, D. Chiarella, R. Lucentini, L. Marconi, G. Morgavi
Comments: 7 pages, Mladenov et al., Recent Advances in Communications - Proceedings of the 19th International Conference on Communications (part of the 19th International Conference on Circuits, Systems, Communications and Computers 2015)
Link: https://arxiv.org/abs/2007.06908
Abstract: The purpose of this study is to find evidence supporting the hypothesis that language is the mirror of our thinking, our prejudices and our cultural stereotypes. In this analysis, a questionnaire was administered to 537 people. The answers were analysed to see whether gender stereotypes were present, such as the attribution of psychological and behavioural characteristics. In particular, the aim was to identify, if any, the stereotyped images that emerge in defining the roles of men and women in modern society. Moreover, the results can be a good starting point for understanding whether gender stereotypes, and the expectations they produce, result in penalization or inequality. If so, language and its use would inherently create a gender bias, which influences evaluations both in work settings and in everyday life.
[3]: Our Evaluation Metric Needs an Update to Encourage Generalization
Authors: Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral
Comments: Accepted to ICML UDL 2020
Link: https://arxiv.org/abs/2007.06898
Abstract: Models that surpass human performance on several popular benchmarks display significant degradation in performance on exposure to Out of Distribution (OOD) data. Recent research has shown that models overfit to spurious biases and 'hack' datasets, in lieu of learning generalizable features as humans do. In order to stop the inflation in model performance, and thus the overestimation of AI systems' capabilities, we propose a simple and novel evaluation metric, WOOD Score, that encourages generalization during evaluation.
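The abstract does not give the WOOD Score formula, so the following is only an illustrative stand-in with the same stated goal: weight performance on OOD examples more heavily than in-distribution performance, so that dataset-specific shortcuts stop inflating the score. The function name and the weighting scheme are assumptions, not the paper's definition.

```python
# Hypothetical WOOD-style metric (NOT the paper's formula): a weighted
# mean that counts OOD accuracy more than in-distribution accuracy.
def wood_like_score(in_dist_acc: float, ood_acc: float,
                    ood_weight: float = 2.0) -> float:
    """Weighted mean of in-distribution and OOD accuracy (illustrative)."""
    return (in_dist_acc + ood_weight * ood_acc) / (1.0 + ood_weight)

# A model that aces the benchmark but collapses on OOD data scores worse...
overfit = wood_like_score(0.95, 0.40)
# ...than a model that generalizes, even with a lower benchmark number.
robust = wood_like_score(0.85, 0.80)
```

Under plain accuracy the first model would rank higher; under the OOD-weighted score the ranking flips, which is the behaviour such a metric is meant to encourage.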
[4]: What's in a Name? Are BERT Named Entity Representations just as Good for any other Name?
Authors: Sriram Balasubramanian, Naman Jain, Gaurav Jindal, Abhijeet Awasthi, Sunita Sarawagi
Comments: Accepted at RepL4NLP, ACL 2020
Link: https://arxiv.org/abs/2007.06897
Abstract: We evaluate named entity representations of BERT-based NLP models by investigating their robustness to replacements from the same typed class in the input. We highlight that, while such perturbations are natural, state-of-the-art trained models are surprisingly brittle on several tasks. The brittleness continues even with the recent entity-aware BERT models. We also try to discern the cause of this non-robustness, considering factors such as tokenization and frequency of occurrence. Then we provide a simple method that ensembles predictions from multiple replacements while jointly modeling the uncertainty of type annotations and label predictions. Experiments on three NLP tasks show that our method enhances robustness and increases accuracy on both natural and adversarial datasets.
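A minimal sketch of the ensembling idea, assuming a toy stub in place of a real BERT classifier: swap the entity for several names of the same type, run the model on each variant, and average the predicted label distributions. The stub model, names, and labels are illustrative, and the paper's joint modeling of type-annotation uncertainty is omitted.

```python
# Illustrative stub classifier: brittle because it keys on the surface
# form of the name, the failure mode the paper investigates.
def predict(sentence: str) -> dict:
    if "Paris" in sentence:
        return {"LOC": 0.9, "PER": 0.1}
    return {"LOC": 0.6, "PER": 0.4}

def ensemble_over_replacements(template: str, names) -> dict:
    """Average label distributions over same-type entity replacements."""
    dists = [predict(template.format(name=n)) for n in names]
    labels = dists[0].keys()
    return {lab: sum(d[lab] for d in dists) / len(dists) for lab in labels}

avg = ensemble_over_replacements("She flew to {name} yesterday.",
                                 ["Paris", "Oslo", "Quito"])
```

Averaging over replacements smooths out the model's sensitivity to any one surface form, which is the mechanism behind the robustness gain.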
[5]: Calling Out Bluff: Attacking the Robustness of Automatic Scoring Systems with Simple Adversarial Testing
Authors: Yaman Kumar, Mehar Bhatia, Anubha Kabra, Jessy Junyi Li, Di Jin, Rajiv Ratn Shah
Link: https://arxiv.org/abs/2007.06796
Abstract: Significant progress has been made in deep-learning based Automatic Essay Scoring (AES) systems in the past two decades. Their performance, as commonly measured by standard metrics such as Quadratic Weighted Kappa (QWK) and accuracy, points to the same. However, testing these AES systems on common-sense adversarial examples reveals their lack of natural language understanding capability. Inspired by common student behaviour during examinations, we propose a task-agnostic adversarial evaluation scheme for AES systems to test their natural language understanding capabilities and overall robustness.
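The shape of such a test can be sketched as below: perturb an essay the way a gaming student might (here, padding it with true-but-irrelevant sentences) and check whether the score moves. The length-rewarding `score_essay` stub is a hypothetical brittle scorer, not one of the systems the paper attacks, and the specific perturbation is only one example of a task-agnostic attack.

```python
# Stub scorer that rewards sheer length, a known failure mode of brittle
# AES models; illustrative only.
def score_essay(essay: str) -> float:
    return min(10.0, len(essay.split()) / 20.0)

def bluff_attack(essay: str) -> str:
    """Append true-but-irrelevant padding, in the spirit of 'calling out bluff'."""
    return essay + " The mitochondria is the powerhouse of the cell." * 5

essay = " ".join(["Machines changed how people work."] * 10)
```

A robust scorer should give the padded essay the same or a lower score; a rising score flags the system as gameable.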
[6]: Can neural networks acquire a structural bias from raw linguistic data?
Authors: Alex Warstadt, Samuel R. Bowman
Comments: To appear in Proceedings of the 42nd Annual Meeting of the Cognitive Science Society
Link: https://arxiv.org/abs/2007.06761
Abstract: We evaluate whether BERT, a widely used neural network for sentence processing, acquires an inductive bias towards forming structural generalizations through pretraining on raw data. We conduct four experiments testing its preference for structural vs. linear generalizations in different structure-dependent phenomena. We find that BERT makes a structural generalization in 3 out of 4 empirical domains (subject-auxiliary inversion, reflexive binding, and verb tense detection in embedded clauses), but makes a linear generalization when tested on NPI licensing. We argue that these results are the strongest evidence so far from artificial learners supporting the proposition that a structural bias can be acquired from raw data. If this conclusion is correct, it is tentative evidence that some linguistic universals can be acquired by learners without innate biases. However, the precise implications for human language acquisition are unclear, as humans learn language from significantly less data than BERT.
[7]: BERTERS: Multimodal Representation Learning for Expert Recommendation System with Transformer
Authors: N. Nikzad-Khasmakhi, M. A. Balafar, M. Reza Feizi-Derakhshi, Cina Motamed
Link: https://arxiv.org/abs/2007.07229
Abstract: The objective of an expert recommendation system is to trace a set of candidates' expertise and preferences, recognize their expertise patterns, and identify experts. In this paper, we introduce a multimodal classification approach for an expert recommendation system (BERTERS). In our proposed system, the modalities are derived from text (articles published by candidates) and graph (their co-author connections) information. BERTERS converts text into a vector using Bidirectional Encoder Representations from Transformers (BERT). Also, a graph representation technique called ExEm is used to extract the features of candidates from the co-author network. The final representation of a candidate is the concatenation of these vectors and other features. Eventually, a classifier is built on the concatenated features. This multimodal approach can be used in both the academic community and community question answering. To verify the effectiveness of BERTERS, we analyze its performance on multi-label classification and visualization tasks.
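The fusion step described above can be sketched in a few lines. The toy vectors below stand in for the real encoders (a 768-dimensional BERT vector and an ExEm co-author embedding, neither reproduced here), and the extra feature is a hypothetical example; only the concatenation itself follows the abstract.

```python
# Sketch of the BERTERS fusion step: the candidate representation is the
# concatenation of a text vector, a graph vector, and scalar features.
def fuse_candidate(text_vec, graph_vec, extra_features):
    """Concatenate the modality vectors into one representation."""
    return list(text_vec) + list(graph_vec) + list(extra_features)

text_vec = [0.2, 0.7, 0.1]   # stand-in for a BERT sentence/document vector
graph_vec = [0.5, 0.3]       # stand-in for an ExEm co-author embedding
extra = [12.0]               # hypothetical feature, e.g. article count
rep = fuse_candidate(text_vec, graph_vec, extra)
```

A downstream classifier (multi-label in the paper) is then trained directly on `rep`.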
[8]: Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR
Authors: Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik
Comments: Submitted to Interspeech 2020
Link: https://arxiv.org/abs/2007.06949
Abstract: Recently, Deep Transformer models have proven to be particularly powerful in language modeling tasks for ASR. Their high complexity, however, makes them very difficult to apply in the first (single) pass of an online system. Recent studies showed that a considerable part of the knowledge of neural network Language Models (LM) can be transferred to traditional n-grams by using neural text generation based data augmentation. In our paper, we pre-train a GPT-2 Transformer LM on a general text corpus and fine-tune it on our Hungarian conversational call center ASR task. We show that although data augmentation with Transformer-generated text works well for isolating languages, it causes a vocabulary explosion in a morphologically rich language. Therefore, we propose a new method called subword-based neural text augmentation, where we retokenize the generated text into statistically derived subwords. We show that this method can significantly reduce the WER while greatly reducing vocabulary size and memory requirements. Finally, we also show that subword-based neural text augmentation outperforms the word-based approach not only in terms of overall WER but also in recognition of OOV words.
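The retokenization step can be made concrete with a sketch: each generated word is re-split into units from a subword vocabulary, so rare inflected forms no longer blow up the word vocabulary. Greedy longest-match segmentation is used here as one common scheme; the paper derives its subwords statistically, and this tiny Hungarian-like vocabulary is purely illustrative.

```python
# Toy subword inventory (illustrative, not statistically derived).
SUBWORDS = {"ház", "ak", "ban", "kert", "ek", "ből"}

def retokenize(word: str, vocab=SUBWORDS):
    """Greedy longest-match split of a word into subword units."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:                               # no unit matches: emit one char
            pieces.append(word[i])
            i += 1
    return pieces

tokens = retokenize("házakban")  # 'in houses'
```

Every inflected form of a stem now maps onto a handful of shared units, which is what keeps the n-gram vocabulary and memory footprint small.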
[9]: Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets
Authors: Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan
Link: https://arxiv.org/abs/2007.06877
Abstract: A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are generic for similar images and lack distinctiveness, i.e., they cannot properly describe the uniqueness of each image. In this paper, we aim to improve the distinctiveness of image captions through training with sets of similar images. First, we propose a distinctiveness metric, between-set CIDEr (CIDErBtw), to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric shows that the human annotations of each image are not equivalent in terms of distinctiveness. Thus we propose several new training strategies to encourage the distinctiveness of the generated caption for each image, based on using CIDErBtw in a weighted loss function or as a reinforcement learning reward. Finally, extensive experiments are conducted, showing that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study.
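The training signal can be sketched as follows: reward a caption for scoring high against its own image's references but low against the captions of visually similar images. The `cider` stub below is a token-overlap placeholder for a real CIDEr implementation, and the weight is an illustrative coefficient; only the accuracy-minus-between-set structure follows the abstract.

```python
# Stub similarity: token-overlap (Jaccard) ratio standing in for real CIDEr.
def cider(caption: str, references) -> float:
    cap = set(caption.split())
    overlaps = [len(cap & set(r.split())) / max(len(cap | set(r.split())), 1)
                for r in references]
    return sum(overlaps) / len(overlaps)

def distinctive_reward(caption, own_refs, similar_refs, weight=0.5):
    """Accuracy term minus a weighted between-set (CIDErBtw-style) penalty."""
    return cider(caption, own_refs) - weight * cider(caption, similar_refs)

generic = "a dog on the grass"
specific = "a spotted dalmatian chasing a frisbee"
own = ["a dalmatian chasing a frisbee on the grass"]
similar = ["a dog running on the grass", "a dog on a field"]
```

The generic caption also matches the similar images' captions, so its penalty term is large; the specific caption keeps its accuracy while avoiding the penalty.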
[10]: Sudo rm -rf: Efficient Networks for Universal Audio Source Separation
Authors: Efthymios Tzinis, Zhepei Wang, Paris Smaragdis
Comments: Accepted to MLSP 2020
Link: https://arxiv.org/abs/2007.06833
Abstract: In this paper, we present an efficient neural network for end-to-end general purpose audio source separation. Specifically, the backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRMRF), together with their aggregation, which is performed through simple one-dimensional convolutions. In this way, we are able to obtain high-quality audio source separation with a limited number of floating point operations, low memory requirements, few parameters and low latency. Our experiments on both speech and environmental sound separation datasets show that SuDoRMRF performs comparably to, and even surpasses, various state-of-the-art approaches with significantly higher computational resource requirements.
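The downsample-resample-aggregate flow can be sketched without any learned weights: successively downsample a 1-D signal, resample each scale back to the input resolution, and aggregate by summation. The real network uses learned convolutions for all three steps; plain stride slicing and nearest-neighbour upsampling are used here only to make the multi-resolution data flow concrete.

```python
# Illustrative multi-resolution flow (NOT the learned SuDoRMRF blocks).
def downsample(x, stride=2):
    """Keep every `stride`-th sample."""
    return x[::stride]

def upsample_nearest(x, factor, length):
    """Repeat each sample `factor` times, then crop to `length`."""
    out = [v for v in x for _ in range(factor)]
    return out[:length]

def multi_resolution_sum(x, depth=3):
    """Aggregate features from `depth` successively coarser resolutions."""
    agg = [0.0] * len(x)
    scale = x
    for d in range(depth):
        restored = upsample_nearest(scale, 2 ** d, len(x))
        agg = [a + r for a, r in zip(agg, restored)]
        scale = downsample(scale)
    return agg

features = multi_resolution_sum([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```

Because each level halves the sequence length, the total cost of all scales stays close to the cost of the finest one, which is the source of the efficiency claim.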