The following article is from Sociolinguistics.
Today's cs.CL listing contains 26 papers in total.
Inference Analysis (1 paper)
[1]: Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks
Authors: Shree Charran R, Rahul Kumar Dubey
Link: https://arxiv.org/abs/2012.00633
Abstract: Word representations form the core component of almost all advanced Natural Language Processing (NLP) applications such as text mining, question answering, and text summarization. Over the last two decades, immense research has been conducted to come up with one single model to solve all major NLP tasks. The major problem currently is that there is a plethora of choices for different NLP tasks. Thus, for NLP practitioners, the task of choosing the right model itself becomes a challenge. Combining multiple pre-trained word embeddings to form meta-embeddings has therefore become a viable approach to better tackle NLP tasks. Meta-embedding learning is the process of producing a single word embedding from a given set of pre-trained input word embeddings. In this paper, we propose to use meta-embeddings derived from a few State-of-the-Art (SOTA) models to efficiently tackle mainstream NLP tasks like classification, semantic relatedness, and text similarity. We have compared both ensemble and dynamic variants to identify an efficient approach. The results obtained show that even the best State-of-the-Art models can be bettered, which shows that meta-embeddings can be used for several NLP tasks by harnessing the power of several individual representations.
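A minimal sketch of two common ensemble-style meta-embedding baselines, concatenation and dimension-aligned averaging, over vectors looked up from different pre-trained models; the toy vectors and dimensions are placeholders, not the authors' actual inputs or their dynamic variant.

# Minimal sketch: meta-embeddings by concatenation and by dimension-aligned
# averaging. The toy vectors stand in for lookups from pre-trained models
# (e.g. GloVe, fastText); they are not the paper's actual inputs.
import numpy as np

def concat_meta(embeddings):
    """Concatenate per-source vectors for one word into a single vector."""
    return np.concatenate(embeddings)

def average_meta(embeddings, dim=None):
    """Average per-source vectors after zero-padding to a common size."""
    dim = dim or max(len(e) for e in embeddings)
    aligned = [np.pad(e, (0, dim - len(e))) for e in embeddings]
    return np.mean(aligned, axis=0)

# Toy example: two sources with different dimensionalities for the same word.
glove_vec = np.random.rand(100)     # placeholder for a GloVe lookup
fasttext_vec = np.random.rand(300)  # placeholder for a fastText lookup

meta_cat = concat_meta([glove_vec, fasttext_vec])   # 400-dim
meta_avg = average_meta([glove_vec, fasttext_vec])  # 300-dim
print(meta_cat.shape, meta_avg.shape)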
Natural Language Generation (2 papers)
[1]: An Enhanced Knowledge Injection Model for Commonsense Generation
Authors: Zhihao Fan, Yeyun Gong, Zhongyu Wei, Siyuan Wang, Yameng Huang, Jian Jiao, Xuanjing Huang, Nan Duan, Ruofei Zhang
Notes: Accepted to COLING 2020
Link: https://arxiv.org/abs/2012.00366
Abstract: Commonsense generation aims at generating plausible everyday scenario descriptions based on a set of provided concepts. Digging out the relationships between concepts from scratch is non-trivial; therefore, we retrieve prototypes from external knowledge to assist the understanding of the scenario for better description generation. We integrate two additional modules, namely a position indicator and a scaling module, into the pretrained encoder-decoder model for prototype modeling to enhance the knowledge injection procedure. We conduct experiments on the CommonGen benchmark, and experimental results show that our method significantly improves the performance on all the metrics.
[2]: High Quality Real-Time Structured Debate Generation
Authors: Eric Bolton, Alex Calderwood, Niles Christensen, Jerome Kafrouni, Iddo Drori
Link: https://arxiv.org/abs/2012.00209
Abstract: Automatically generating debates is a challenging task that requires an understanding of arguments and how to negate or support them. In this work we define debate trees and paths for generating debates while enforcing a high-level structure and grammar. We leverage a large corpus of tree-structured debates that have metadata associated with each argument. We develop a framework for generating plausible debates which is agnostic to the sentence embedding model. Our results demonstrate the ability to generate debates in real time on complex topics at a quality that is close to humans, as evaluated by the style, content, and strategy metrics used for judging competitive human debates. In the spirit of reproducible research we make our data, models, and code publicly available.
Text Classification (1 paper)
[1]: Neural language models for text classification in evidence-based medicine
Authors: Andres Carvallo, Denis Parra, Gabriel Rada, Daniel Perez, Juan Ignacio Vasquez, Camilo Vergara
Link: https://arxiv.org/abs/2012.00584
Abstract: COVID-19 has brought about a significant challenge to the whole of humanity, but with a special burden upon the medical community. Clinicians must stay continuously updated about symptoms, diagnoses, and the effectiveness of emergent treatments under a never-ending flood of scientific literature. In this context, the role of evidence-based medicine (EBM) in curating the most substantial evidence to support public health and clinical practice becomes essential, but it is being challenged as never before due to the high volume of research articles published and pre-prints posted daily. Artificial Intelligence can have a crucial role in this situation. In this article, we report the results of an applied research project to classify scientific articles to support Epistemonikos, one of the most active foundations worldwide conducting EBM. We test several methods, and the best one, based on the XLNet neural language model, improves the current approach by 93% on average F1-score, saving valuable time for physicians who volunteer to curate COVID-19 research articles manually.
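A minimal sketch of how an XLNet sequence classifier can be applied to abstract screening with the Hugging Face transformers library; the binary include/exclude labels and the untrained classification head are illustrative assumptions, not the Epistemonikos production pipeline.

# Minimal sketch: scoring an article abstract with an XLNet sequence classifier.
# The two-class setup and the randomly initialised head are assumptions for
# illustration; a real system would fine-tune on labelled screening data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlnet-base-cased", num_labels=2)  # head is untrained here

abstract = "Randomized trial of treatment X for COVID-19 ..."
inputs = tokenizer(abstract, truncation=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print("P(relevant for screening) =", probs[0, 1].item())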
Information Extraction (2 papers)
[1]: Extracting Synonyms from Bilingual Dictionaries
Authors: Mustafa Jarrar, Eman Karajah, Muhammad Khalifa, Khaled Shaalan
Notes: In Proceedings of the 11th International Global Wordnet Conference (GWC2021). Global Wordnet Association (2021)
Link: https://arxiv.org/abs/2012.00600
Abstract: We present our progress in developing a novel algorithm to extract synonyms from bilingual dictionaries. Identification and usage of synonyms play a significant role in improving the performance of information access applications. The idea is to construct a translation graph from translation pairs, then to extract and consolidate cyclic paths to form bilingual sets of synonyms. The initial evaluation of this algorithm illustrates promising results in extracting Arabic-English bilingual synonyms. In the evaluation, we first converted the synsets in the Arabic WordNet into translation pairs (i.e., losing word-sense memberships). Next, we applied our algorithm to rebuild these synsets. We compared the original and extracted synsets, obtaining an F-Measure of 82.3% and 82.1% for Arabic and English synset extraction, respectively.
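A small sketch of the cycle idea: build an undirected translation graph from bilingual pairs and read candidate synonym sets off short cycles. The toy Arabic-English pairs and the cycle-basis criterion are illustrative assumptions; the paper's extraction and consolidation of cyclic paths is more elaborate.

# Minimal sketch: translation graph + cycles as candidate synonym sets.
# The toy pairs are invented; suffixes mark the language of each node.
import networkx as nx

pairs = [
    ("car_en", "sayyara_ar"), ("automobile_en", "sayyara_ar"),
    ("car_en", "markaba_ar"), ("automobile_en", "markaba_ar"),
    ("book_en", "kitab_ar"),
]

g = nx.Graph(pairs)  # undirected translation graph

# Cycles in this bipartite graph alternate between the two languages; the
# same-language words on one cycle are candidate synonyms.
for cycle in nx.cycle_basis(g):
    english = sorted(w for w in cycle if w.endswith("_en"))
    arabic = sorted(w for w in cycle if w.endswith("_ar"))
    if len(english) > 1 or len(arabic) > 1:
        print("candidate synonym sets:", english, arabic)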
[2]: Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation
Authors: Ziye Yang, Shanzheng Guan, Xiao-Lei Zhang
Link: https://arxiv.org/abs/2012.00403
Abstract: Recently, research on ad-hoc microphone arrays with deep learning has drawn much attention, especially in speech enhancement and separation. Because an ad-hoc microphone array may cover such a large area that multiple speakers may be located far apart and talk independently, target-dependent speech separation, which aims to extract a target speaker from mixed speech, is important for extracting and tracing a specific speaker in the ad-hoc array. However, this technique has not been explored yet. In this paper, we propose deep ad-hoc beamforming based on speaker extraction, which is to our knowledge the first work on target-dependent speech separation based on ad-hoc microphone arrays and deep learning. The algorithm contains three components. First, we propose a supervised channel selection framework based on speaker extraction, where the estimated utterance-level SNRs of the target speech are used as the basis for the channel selection. Second, we apply the selected channels to a deep learning based MVDR algorithm, where a single-channel speaker extraction algorithm is applied to each selected channel for estimating the mask of the target speech. We conducted an extensive experiment on the WSJ0-adhoc corpus. Experimental results demonstrate the effectiveness of the proposed method.
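A minimal sketch of the channel-selection idea: rank ad-hoc channels by an estimated utterance-level SNR of the target speaker and keep the top k. The SNR estimator below is a random placeholder standing in for the paper's learned speaker-extraction model.

# Minimal sketch: SNR-based channel selection for an ad-hoc microphone array.
import numpy as np

def estimate_target_snr(channel_waveform):
    # Placeholder: a real system would run a speaker-extraction network
    # conditioned on the target speaker and regress the utterance-level SNR.
    return float(np.random.uniform(-5.0, 20.0))

def select_channels(channels, k=4):
    """Keep the k channels with the highest estimated target-speech SNR."""
    scores = [estimate_target_snr(c) for c in channels]
    ranked = np.argsort(scores)[::-1][:k]
    return ranked, [scores[i] for i in ranked]

# Toy ad-hoc array: 8 channels of 1 second of noise at 16 kHz.
channels = [np.random.randn(16000) for _ in range(8)]
idx, snrs = select_channels(channels, k=4)
print("selected channels:", idx, "estimated SNRs:", np.round(snrs, 1))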
Sentiment Analysis (1 paper)
[1]: BAN-ABSA: An Aspect-Based Sentiment Analysis dataset for Bengali and it's baseline evaluation
Authors: Mahfuz Ahmed Masum, Sheikh Junayed Ahmed, Ayesha Tasnim, Md Saiful Islam
Notes: 11 pages, 2 figures, 8 tables. Included in proceedings of the International Joint Conference on Advances in Computational Intelligence (IJCACI) 2020
Link: https://arxiv.org/abs/2012.00288
Abstract: Due to the breathtaking growth of social media and newspaper user comments and of online product review comments, sentiment analysis (SA) has captured substantial interest from researchers. As the field grows, SA work aims not only to predict the sentiment of a sentence or document but also to give the necessary detail on different aspects of the sentence or document (i.e., aspect-based sentiment analysis). A considerable number of datasets for SA and aspect-based sentiment analysis (ABSA) have been made available for English and other well-known European languages. In this paper, we present a manually annotated Bengali dataset of high quality, BAN-ABSA, which is annotated with aspects and their associated sentiment by 3 native Bengali speakers. The dataset consists of 2,619 positive, 4,721 negative and 1,669 neutral data samples from 9,009 unique comments gathered from some famous Bengali news portals. In addition, we conducted a baseline evaluation with a focus on deep learning models, achieving an accuracy of 78.75% for aspect term extraction and an accuracy of 71.08% for sentiment classification. Experiments on the BAN-ABSA dataset show that the CNN model is better in terms of accuracy, though Bi-LSTM significantly outperforms the CNN model in terms of average F1-score.
Models (2 papers)
[1]: Intrinsic analysis for dual word embedding space models
Authors: Mohit Mayank
Link: https://arxiv.org/abs/2012.00728
Abstract: Recent word embedding techniques represent words in a continuous vector space, moving away from the atomic and sparse representations of the past. Each such technique can further create multiple varieties of embeddings based on different settings of hyper-parameters like embedding dimension size, context window size and training method. One additional variety appears when we consider dual embedding space techniques, which generate not one but two word embeddings as output. This gives rise to an interesting question: "is there one variety, or a combination of the two, which works better for a specific task?". This paper tries to answer this question by considering all of these variations. Herein, we compare two classical embedding methods belonging to two different methodologies - Word2Vec from the window-based family and GloVe from the count-based family. For an extensive evaluation after considering all variations, a total of 84 different models were compared on semantic, association and analogy evaluation tasks which are made up of 9 open-source linguistics datasets. The final Word2Vec results show a preference for non-default models on 2 out of 3 tasks. In the case of GloVe, non-default models outperform in all 3 evaluation tasks.
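The two "dual" spaces in Word2Vec are the input (target) matrix and the output (context) matrix; gensim exposes both, so the variants compared in such studies (input only, output only, or a combination) can be formed directly. A minimal sketch with a tiny toy corpus, purely for illustration and not the paper's evaluation setup:

# Minimal sketch: comparing word similarity in the input, output, and averaged
# embedding spaces of a word2vec model trained with negative sampling.
import numpy as np
from gensim.models import Word2Vec

corpus = [["king", "rules", "the", "kingdom"],
          ["queen", "rules", "the", "kingdom"],
          ["dog", "chases", "the", "cat"]]
model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, sg=1, negative=5)

def vec(word, space):
    idx = model.wv.key_to_index[word]
    if space == "in":        # input / target embedding
        return model.wv.vectors[idx]
    if space == "out":       # output / context embedding (negative sampling)
        return model.syn1neg[idx]
    return (model.wv.vectors[idx] + model.syn1neg[idx]) / 2.0  # combined

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for space in ("in", "out", "avg"):
    print(space, cos(vec("king", space), vec("queen", space)))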
[2]: Modifying Memories in Transformer Models
Authors: Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar
Link: https://arxiv.org/abs/2012.00363
Abstract: Large Transformer models have achieved impressive performance in many natural language tasks. In particular, Transformer based language models have been shown to have great capabilities in encoding factual knowledge in their vast amount of parameters. While the tasks of improving the memorization and generalization of Transformers have been widely studied, it is not well known how to make Transformers forget specific old facts and memorize new ones. In this paper, we propose a new task of explicitly modifying specific factual knowledge in Transformer models while ensuring the model performance does not degrade on the unmodified facts. This task is useful in many scenarios, such as updating stale knowledge, protecting privacy, and eliminating unintended biases stored in the models. We benchmarked several approaches that provide natural baseline performances on this task. This leads to the discovery of key components of a Transformer model that are especially effective for knowledge modifications. The work also provides insights into the role that different training phases (such as pretraining and fine-tuning) play towards memorization and knowledge modification.
Others (17 papers)
[1]: CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims
Authors: Thomas Diggelmann, Jordan Boyd-Graber, Jannis Bulian, Massimiliano Ciaramita, Markus Leippold
Notes: Accepted for the Tackling Climate Change with Machine Learning Workshop at NeurIPS 2020
Link: https://arxiv.org/abs/2012.00614
Abstract: We introduce CLIMATE-FEVER, a new publicly available dataset for verification of climate change-related claims. By providing a dataset for the research community, we aim to facilitate and encourage work on improving algorithms for retrieving evidential support for climate-specific claims, addressing the underlying language understanding challenges, and ultimately helping alleviate the impact of misinformation on climate change. We adapt the methodology of FEVER [1], the largest dataset of artificially designed claims, to real-life claims collected from the Internet. While we could rely on the expertise of renowned climate scientists during this process, it turned out to be no easy task. We discuss the surprising, subtle complexity of modeling real-world climate-related claims within the FEVER framework, which we believe provides a valuable challenge for general natural language understanding. We hope that our work will mark the beginning of an exciting new long-term joint effort by the climate science and AI communities.
[2]: Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers
Authors: Sebastien Montella, Betty Fabre, Tanguy Urvoy, Johannes Heinecke, Lina Rojas-Barahona
Notes: Accepted at WebNLG+: 3rd Workshop on Natural Language Generation from the Semantic Web
Link: https://arxiv.org/abs/2012.00571
Abstract: The task of verbalization of RDF triples has seen a growth in popularity due to the rising ubiquity of Knowledge Bases (KBs). The formalism of RDF triples is a simple and efficient way to store facts at a large scale. However, its abstract representation makes it difficult for humans to interpret. For this purpose, the WebNLG challenge aims at promoting automated RDF-to-text generation. We propose to leverage pre-training on augmented data with the Transformer model using a data augmentation strategy. Our experimental results show minimum relative increases of 3.73%, 126.05% and 88.16% in BLEU score for seen categories, unseen entities and unseen categories respectively over the standard training.
[3]: ClimaText: A Dataset for Climate Change Topic Detection
Authors: Francesco S. Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, Markus Leippold
Notes: Accepted for the Tackling Climate Change with Machine Learning Workshop at NeurIPS 2020
Link: https://arxiv.org/abs/2012.00483
Abstract: Climate change communication in the mass media and other textual sources may affect and shape public perception. Extracting climate change information from these sources is an important task, e.g., for filtering content and e-discovery, sentiment analysis, automatic summarization, question-answering, and fact-checking. However, automating this process is a challenge, as climate change is a complex, fast-moving, and often ambiguous topic with scarce resources for popular text-based AI tasks. In this paper, we introduce ClimaText, a dataset for sentence-based climate change topic detection, which we make publicly available. We explore different approaches to identify the climate change topic in various text sources. We find that popular keyword-based models are not adequate for such a complex and evolving task. Context-based algorithms like BERT (Devlin et al., 2018) can detect, in addition to many trivial cases, a variety of complex and implicit topic patterns. Nevertheless, our analysis reveals a great potential for improvement in several directions, such as capturing the discussion on indirect effects of climate change. Hence, we hope this work can serve as a good starting point for further research on this topic.
[4]: CPM: A Large-scale Generative Chinese Pre-trained Language Model
Authors: Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun
Link: https://arxiv.org/abs/2012.00413
Abstract: Pre-trained Language Models (PLMs) have proven to be beneficial for various downstream NLP tasks. Recently, GPT-3, with 175 billion parameters and 570GB of training data, drew a lot of attention due to its capacity for few-shot (even zero-shot) learning. However, applying GPT-3 to address Chinese NLP tasks is still challenging, as the training corpus of GPT-3 is primarily English, and the parameters are not publicly available. In this technical report, we release the Chinese Pre-trained Language Model (CPM) with generative pre-training on large-scale Chinese training data. To the best of our knowledge, CPM, with 2.6 billion parameters and 100GB of Chinese training data, is the largest Chinese pre-trained language model, which could facilitate several downstream Chinese NLP tasks, such as conversation, essay generation, cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many NLP tasks in few-shot (even zero-shot) settings. The code and parameters are available at this https URL.
[5]: Introducing Inter-Relatedness between Wikipedia Articles in Explicit Semantic Analysis
Authors: Naveen Elango, Pawan Prasad K
Notes: 16 pages
Link: https://arxiv.org/abs/2012.00398
Abstract: Explicit Semantic Analysis (ESA) is a technique used to represent a piece of text as a vector in a space of concepts, such as the articles found in Wikipedia. We propose a methodology to incorporate knowledge of the inter-relatedness between Wikipedia articles into the vectors obtained from ESA, using a technique called retrofitting, to improve the performance of subsequent tasks that use ESA to form vector embeddings. Specifically, we use an undirected graph to represent this knowledge, with nodes as articles and edges as inter-relations between two articles. Here, we also emphasize how the ESA step can be seen as a predominantly bottom-up approach that uses a corpus to come up with vector representations, and how the incorporation of top-down knowledge, i.e. the relations between articles, further improves it. We test our hypothesis on several smaller subsets of the Wikipedia corpus and show that our proposed methodology leads to decent improvements in performance measures, including Spearman's rank correlation coefficient, in most cases.
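A minimal sketch of the generic retrofitting update (in the style of Faruqui et al., 2015) that such an approach applies to ESA vectors with an article-relatedness graph; the toy vectors, the tiny graph, and the uniform neighbour weights are invented for illustration.

# Minimal sketch: retrofit vectors towards their graph neighbours while staying
# close to the original (observed) vectors.
import numpy as np

def retrofit(vectors, graph, alpha=1.0, iters=10):
    """vectors: {item: np.ndarray}; graph: {item: set of related items}."""
    new = {k: v.copy() for k, v in vectors.items()}
    for _ in range(iters):
        for item, neighbours in graph.items():
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue
            neighbour_sum = np.sum([new[n] for n in nbrs], axis=0)
            new[item] = (neighbour_sum + alpha * vectors[item]) / (len(nbrs) + alpha)
    return new

vectors = {"physics": np.array([1.0, 0.0]),
           "chemistry": np.array([0.8, 0.2]),
           "poetry": np.array([0.0, 1.0])}
graph = {"physics": {"chemistry"}, "chemistry": {"physics"}, "poetry": set()}
print(retrofit(vectors, graph)["physics"])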
[6]: Towards a Unified Framework for Emotion Analysis
Authors: Sven Buechel, Luise Modersohn, Udo Hahn
Link: https://arxiv.org/abs/2012.00190
Abstract: We present EmoCoder, a modular encoder-decoder architecture that generalizes emotion analysis over different tasks (sentence-level, word-level, label-to-label mapping), domains (natural languages and their registers), and label formats (e.g., polarity classes, basic emotions, and affective dimensions). Experiments on 14 datasets indicate that EmoCoder learns an interpretable language-independent representation of emotions, allows seamless absorption of state-of-the-art models, and maintains strong prediction quality, even when tested on unseen combinations of domains and label formats.
[7]: Statistical patterns of word frequency suggesting the probabilistic nature of human languages
Authors: Shuiyuan Yu, Chunshan Xu, Haitao Liu
Link: https://arxiv.org/abs/2012.00187
Abstract: Traditional linguistic theories have largely regarded language as a formal system composed of rigid rules. However, their failures in processing real language, the recent successes in statistical natural language processing, and the findings of many psychological experiments suggest that language may be more a probabilistic system than a formal system, and thus cannot be faithfully modeled with the either/or rules of formal linguistic theory. The present study, based on authentic language data, confirms that important linguistic issues, such as linguistic universals, diachronic drift, and language variation, can be translated into probability and frequency patterns in parole. These findings suggest that human languages may well be probabilistic systems by nature, and that statistical patterns may well be inherent properties of human languages.
[8]: Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion
Authors: Vijay Ravi, Yile Gu, Ankur Gandhe, Ariya Rastrow, Linda Liu, Denis Filimonov, Scott Novotney, Ivan Bulyko
Link: https://arxiv.org/abs/2012.00133
Abstract: End-to-end automatic speech recognition (ASR) systems, such as the recurrent neural network transducer (RNN-T), have become popular, but rare words remain a challenge. In this paper, we propose a simple yet effective method called unigram shallow fusion (USF) to improve rare-word recognition for RNN-T. In USF, we extract rare words from the RNN-T training data based on unigram counts, and apply a fixed reward when such a word is encountered during decoding. We show that this simple method can improve performance on rare words by 3.7% WER relative without degradation on the general test set, and the improvement from USF is additive to any additional language model based rescoring. Then, we show that the same USF does not work on a conventional hybrid system. Finally, we reason that USF works by fixing errors in the probability estimates of words due to the Viterbi search used during decoding with subword-based RNN-T.
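The scoring idea is simple enough to sketch: add a fixed reward to a hypothesis whenever it emits a word from a rare-word list selected by unigram count in the training data. The stand-alone rescorer below, with its toy counts and hypotheses, is only an illustration; the actual method applies the reward inside the RNN-T beam search.

# Minimal sketch: unigram shallow fusion as hypothesis rescoring.
from collections import Counter

train_text = "the cat sat on the mat the cat purred zygote".split()
unigram_counts = Counter(train_text)
rare_words = {w for w, c in unigram_counts.items() if c <= 1}  # count threshold

def usf_score(hypothesis, base_logprob, reward=0.5):
    """Add a fixed reward per rare word on top of the model log-probability."""
    bonus = reward * sum(1 for w in hypothesis.split() if w in rare_words)
    return base_logprob + bonus

hyps = [("the cat sat", -3.2), ("the zygote sat", -3.6)]
rescored = sorted(hyps, key=lambda h: usf_score(*h), reverse=True)
print(rescored[0][0])  # the rare-word hypothesis can now win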
[9]: Extreme Model Compression for On-device Natural Language Understanding
Authors: Kanthashree Mysore Sathyendra, Samridhi Choudhary, Leah Nicolich-Henkin
Notes: Long paper at COLING 2020
Link: https://arxiv.org/abs/2012.00124
Abstract: In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system trained on a varied set of intents with huge vocabulary sizes. Our approach outperforms a range of baselines and achieves a compression rate of 97.4% with less than 3.7% degradation in predictive performance. Our analysis indicates that the signal from the downstream task is important for effective compression with minimal degradation in performance.
[10]: Systematically Exploring Redundancy Reduction in Summarizing Long Documents
Authors: Wen Xiao, Giuseppe Carenini
Notes: 13 pages. Accepted at AACL 2020
Link: https://arxiv.org/abs/2012.00052
Abstract: Our analysis of large summarization datasets indicates that redundancy is a very serious problem when summarizing long documents. Yet, redundancy reduction has not been thoroughly investigated in neural summarization. In this work, we systematically explore and compare different ways to deal with redundancy when summarizing long documents. Specifically, we organize the existing methods into categories based on when and how the redundancy is considered. Then, in the context of these categories, we propose three additional methods balancing non-redundancy and importance in a general and flexible way. In a series of experiments, we show that our proposed methods achieve the state of the art with respect to ROUGE scores on two scientific paper datasets, PubMed and arXiv, while reducing redundancy significantly.
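One standard way to balance importance against redundancy at selection time is MMR-style greedy extraction. The generic sketch below, with toy importance scores and a word-overlap redundancy measure, illustrates that trade-off; it is not one of the paper's specific methods.

# Minimal sketch: greedy sentence selection trading importance off against
# redundancy with already-selected sentences (MMR-style).
def overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def select(sentences, importance, k=2, lam=0.6):
    chosen, candidates = [], list(range(len(sentences)))
    while candidates and len(chosen) < k:
        def mmr(i):
            red = max((overlap(sentences[i], sentences[j]) for j in chosen), default=0.0)
            return lam * importance[i] - (1 - lam) * red
        best = max(candidates, key=mmr)
        chosen.append(best)
        candidates.remove(best)
    return [sentences[i] for i in chosen]

sents = ["The model improves ROUGE on arXiv.",
         "The model improves ROUGE scores on the arXiv set.",
         "Redundancy is common in long documents."]
print(select(sents, importance=[0.9, 0.85, 0.6]))  # the near-duplicate is penalised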
[11]: Facilitating the Communication of Politeness through Fine-Grained Paraphrasing
Authors: Liye Fu, Susan R. Fussell, Cristian Danescu-Niculescu-Mizil
Notes: Proceedings of EMNLP 2020, 14 pages. Data and code at this https URL and this https URL
Link: https://arxiv.org/abs/2012.00012
Abstract: Aided by technology, people are increasingly able to communicate across geographical, cultural, and language barriers. This ability also results in new challenges, as interlocutors need to adapt their communication approaches to increasingly diverse circumstances. In this work, we take the first steps towards automatically assisting people in adjusting their language to a specific communication circumstance.
As a case study, we focus on facilitating the accurate transmission of pragmatic intentions and introduce a methodology for suggesting paraphrases that achieve the intended level of politeness under a given communication circumstance. We demonstrate the feasibility of this approach by evaluating our method in two realistic communication scenarios and show that it can reduce the potential for misalignment between the speaker's intentions and the listener's perceptions in both cases.
[12]: UWB at SemEval-2020 Task 1: Lexical Semantic Change Detection
Authors: Ondřej Pražák, Pavel Přibáň, Stephen Taylor, Jakub Sido
Notes: arXiv admin note: substantial text overlap with arXiv:2011.14678
Link: https://arxiv.org/abs/2012.00004
Abstract: In this paper, we describe our method for the detection of lexical semantic change, i.e., word sense changes over time. We examine semantic differences between specific words in two corpora, chosen from different time periods, for English, German, Latin, and Swedish. Our method was created for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. We ranked 1st in Sub-task 1: binary change detection, and 4th in Sub-task 2: ranked change detection. Our method is fully unsupervised and language independent. It consists of preparing a semantic vector space for each corpus, earlier and later; computing a linear transformation between the earlier and later spaces, using Canonical Correlation Analysis and an Orthogonal Transformation; and measuring the cosine between the transformed vector for the target word from the earlier corpus and the vector for the target word in the later corpus.
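A minimal sketch of the alignment-and-cosine step: learn an orthogonal map from the earlier space to the later one over a shared vocabulary (orthogonal Procrustes) and score a target word by the cosine between its mapped earlier vector and its later vector. The random toy spaces are placeholders, and the Canonical Correlation Analysis part of the pipeline is omitted.

# Minimal sketch: orthogonal Procrustes alignment + cosine change score.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["plane", "gay", "stable", "tree"]
earlier = {w: rng.normal(size=8) for w in vocab}   # earlier-corpus vectors
later = {w: rng.normal(size=8) for w in vocab}     # later-corpus vectors

X = np.stack([earlier[w] for w in vocab])
Y = np.stack([later[w] for w in vocab])

# Orthogonal Procrustes: W = argmin ||XW - Y||_F subject to W^T W = I.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

def change_score(word):
    a, b = earlier[word] @ W, later[word]
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos  # higher = more semantic change

print({w: round(change_score(w), 3) for w in vocab})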
[13]: Mutual Information Constraints for Monte-Carlo Objectives
Authors: Gábor Melis, András György, Phil Blunsom
Notes: 32 pages, 29 figures
Link: https://arxiv.org/abs/2012.00708
Abstract: A common failure mode of density models trained as variational autoencoders is to model the data without relying on their latent variables, rendering these variables useless. Two contributing factors, the underspecification of the model and the looseness of the variational lower bound, have been studied separately in the literature. We weave these two strands of research together, specifically the tighter bounds of Monte-Carlo objectives and constraints on the mutual information between the observable and the latent variables. Estimating the mutual information as the average Kullback-Leibler divergence between the easily available variational posterior q(z|x) and the prior does not work with Monte-Carlo objectives because q(z|x) is no longer a direct approximation to the model's true posterior p(z|x). Hence, we construct estimators of the Kullback-Leibler divergence of the true posterior from the prior by recycling samples used in the objective, with which we train models of continuous and discrete latents at much improved rate-distortion and with no posterior collapse. While alleviated, the tradeoff between modelling the data and using the latents still remains, and we urge for evaluating inference methods across a range of mutual information values.
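The baseline quantity the abstract refers to, the "average KL" surrogate for mutual information, is E_x[ KL(q(z|x) || p(z)) ], which upper-bounds I(x; z). The sketch below computes it in closed form for a diagonal-Gaussian encoder and a standard-normal prior; the random encoder outputs are placeholders, and this is the standard surrogate the paper argues against for Monte-Carlo objectives, not the paper's proposed estimator.

# Minimal sketch: average KL between a diagonal-Gaussian posterior and N(0, I).
import numpy as np

def kl_gauss_stdnormal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

batch_mu = np.random.randn(64, 16) * 0.5             # encoder means (placeholder)
batch_logvar = np.random.randn(64, 16) * 0.1 - 1.0   # encoder log-variances (placeholder)

avg_kl = kl_gauss_stdnormal(batch_mu, batch_logvar).mean()
print("average KL (nats), an upper bound on I(x; z):", round(float(avg_kl), 3))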
[14]: SeMantic AnsweR Type prediction task (SMART) at ISWC 2020 Semantic Web Challenge
Authors: Nandana Mihindukulasooriya, Mohnish Dubey, Alfio Gliozzo, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Ricardo Usbeck
Link: https://arxiv.org/abs/2012.00555
Abstract: Each year the International Semantic Web Conference accepts a set of Semantic Web Challenges to establish competitions that will advance state-of-the-art solutions in a given problem domain. The SeMantic AnsweR Type prediction task (SMART) was part of the ISWC 2020 challenges. Question type and answer type prediction can play a key role in knowledge base question answering systems, providing insights that are helpful to generate correct queries or rank the answer candidates. More concretely, given a question in natural language, the task of the SMART challenge is to predict the answer type using a target ontology (e.g., DBpedia or Wikidata).
[15]: A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data
Authors: Weicheng Cai, Ming Li
Link: https://arxiv.org/abs/2012.00486
Abstract: This paper proposes a unified deep speaker embedding framework for modeling speech data with different sampling rates. Considering the narrowband spectrogram as a sub-image of the wideband spectrogram, we tackle the joint modeling problem of mixed-bandwidth data in an image classification manner. From this perspective, we elaborate several mixed-bandwidth joint training strategies under different training and test data scenarios. The proposed systems are able to flexibly handle mixed-bandwidth speech data in a single speaker embedding model without any additional downsampling, upsampling, bandwidth extension, or padding operations. We conduct extensive experimental studies on the VoxCeleb1 dataset. Furthermore, the effectiveness of the proposed approach is validated on the SITW and NIST SRE 2016 datasets.
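The core observation can be illustrated in a few lines: if the FFT size and hop length are scaled with the sample rate, a narrowband spectrogram has exactly the layout of the 0-4 kHz sub-image of the wideband spectrogram. The librosa-based sketch and the specific parameter values below are illustrative assumptions, not the paper's training strategies.

# Minimal sketch: with resolution-matched STFT settings, the 8 kHz spectrogram
# occupies the lower sub-image of the 16 kHz spectrogram (31.25 Hz/bin, 10 ms/frame).
import numpy as np
import librosa

wide = np.random.randn(16000)   # 1 s of 16 kHz ("wideband") audio, placeholder
narrow = np.random.randn(8000)  # 1 s of 8 kHz ("narrowband") audio, placeholder

S_wide = np.abs(librosa.stft(wide, n_fft=512, hop_length=160))     # (257, frames)
S_narrow = np.abs(librosa.stft(narrow, n_fft=256, hop_length=80))  # (129, frames)

low_band_of_wide = S_wide[:S_narrow.shape[0], :]  # the 0-4 kHz sub-image
print(S_narrow.shape, low_band_of_wide.shape)     # identical layouts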
[16]: Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Authors: Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
Notes: 19 pages; 12 figures
Link: https://arxiv.org/abs/2012.00451
Abstract: Modern approaches to visual question answering require large annotated datasets for training. Manual annotation of questions and answers for videos, however, is tedious, expensive and prevents scalability. In this work, we propose to avoid manual annotation and to learn video question answering (VideoQA) from millions of readily-available narrated videos. We propose to automatically generate question-answer pairs from transcribed video narrations leveraging a state-of-the-art text transformer pipeline and obtain a new large-scale VideoQA training dataset. To handle the open vocabulary of diverse answers in this dataset, we propose a training procedure based on a contrastive loss between a video-question multi-modal transformer and an answer embedding. We evaluate our model on the zero-shot VideoQA task and show excellent results, in particular for rare answers. Furthermore, we demonstrate that finetuning our model on target datasets significantly outperforms the state of the art on MSRVTT-QA, MSVD-QA and ActivityNet-QA. Finally, for a detailed evaluation we introduce a new manually annotated VideoQA dataset with reduced language biases and high quality annotations. Our code and datasets will be made publicly available at this https URL.
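A minimal sketch of a contrastive objective between video-question embeddings and answer embeddings: each question is scored against all answers in the batch and trained to pick out its own (an InfoNCE-style in-batch loss). The random tensors and the 0.07 temperature stand in for the paper's multi-modal transformer and answer-encoder outputs.

# Minimal sketch: in-batch contrastive loss between question and answer embeddings.
import torch
import torch.nn.functional as F

batch, dim = 8, 256
q_emb = F.normalize(torch.randn(batch, dim), dim=-1)  # video-question embeddings (placeholder)
a_emb = F.normalize(torch.randn(batch, dim), dim=-1)  # answer embeddings (placeholder)

logits = q_emb @ a_emb.t() / 0.07      # similarity matrix scaled by a temperature
targets = torch.arange(batch)          # the i-th question matches the i-th answer
loss = F.cross_entropy(logits, targets)
print(float(loss))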
[17]: Multi-Modal Detection of Alzheimer's Disease from Speech and Text
Authors: Amish Mittal, Sourav Sahoo, Arnhav Datar, Juned Kadiwala, Hrithwik Shalu, Jimson Mathew
Notes: 17 pages, 4 figures
Link: https://arxiv.org/abs/2012.00096
Abstract: Reliable detection of the prodromal stages of Alzheimer's disease (AD) remains difficult even today because, unlike other neurocognitive impairments, there is no definitive diagnosis of AD in vivo. In this context, existing research has shown that patients often develop language impairment even in mild AD conditions. We propose a multimodal deep learning method that utilizes speech and the corresponding transcript simultaneously to detect AD. For audio signals, the proposed audio-based network, a convolutional neural network (CNN) based model, predicts the diagnosis for multiple speech segments, which are combined for the final prediction. Similarly, we use contextual embeddings extracted from BERT concatenated with a CNN-generated embedding for classifying the transcript. The individual predictions of the two models are then combined to make the final classification. We also perform experiments to analyze the model performance when Automated Speech Recognition (ASR) system generated transcripts are used instead of manual transcription in the text-based model. The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the DementiaBank Pitt corpus.
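A minimal sketch of the final fusion step: segment-level audio probabilities are averaged into one audio score and combined with the text-model score. The equal-weight average, the 0.5 threshold, and the placeholder probabilities are assumptions for illustration, not the paper's exact combination rule.

# Minimal sketch: late fusion of segment-level audio scores and a transcript score.
import numpy as np

segment_probs = np.array([0.62, 0.71, 0.55, 0.68])  # audio CNN, per speech segment (placeholder)
audio_prob = segment_probs.mean()                   # utterance-level audio score

text_prob = 0.58                                    # BERT+CNN transcript score (placeholder)

alpha = 0.5                                         # fusion weight (assumed)
final_prob = alpha * audio_prob + (1 - alpha) * text_prob
print("P(AD) =", round(float(final_prob), 3), "->", "AD" if final_prob >= 0.5 else "control")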









