In this tutorial, we provide comprehensive coverage of both classical and deep learning methods for handling natural language sentiment and affect.
Deep neural networks have recently broken records on a range of natural language tasks (e.g., speech recognition, machine translation). Although long-standing methodological traditions pre-date the modern wave of deep learning approaches, deep architectures have had a similarly positive impact on tasks like sentiment analysis and emotion detection. Owing especially to the availability of large amounts of social data, academic and industrial interest in sentiment and affect processing remains high. The volume of work in these areas makes it especially challenging for both new trainees and established scholars to stay informed about the progress achieved.
In this tutorial, we will provide comprehensive coverage of both classical and deep learning methods for handling natural language sentiment and affect. We will also introduce machine learning methods for multilingual processing of these tasks, covering a wide range of European languages and languages of complex morphology. We will present available resources, as well as existing challenges and emergent methods that cut across text classification tasks. We will conclude with an overview and discussion of legislative and ethical issues.
With the growing role of social media in societies today, searching and processing social data beyond the limiting level of surface words is increasingly critical to business and governmental bodies, as well as to individuals. Sentiment and affect detection are central to social data processing in both academia and industry. The swelling market demand and wide range of practical applications make these areas instrumental for research and engineering. For these reasons, machine learning and natural language processing methods have been developed to carry out these tasks. These methods range from simple scoring of surface input words and the use of manually crafted lexica to more recent deep representations learned with artificial neural networks, and, as we have observed (e.g., in our labs), they can be overwhelming to newcomers seeking relevant training. Additionally, established researchers who lack sufficient experience with deep learning methods, who have worked on one of these tasks but not the other, or who have focused on a single language or language family have often expressed interest in emergent topics and methods. The main purpose of this tutorial is to provide comprehensive coverage of both established and novel approaches to sentiment and affect processing in multilingual natural language settings.
Considering the number of papers accepted to ECMLPKDD 2017 related to social media mining, affective natural language processing, and deep neural networks, we expect the tutorial to be of wide interest. The strategic importance of sentiment and affect models for business intelligence and the practical need for these models within a wide range of fields make the tutorial particularly attractive, especially given the instructors' experience in these areas (we have created the state-of-the-art models for many related tasks and in some cases patented related work). A third reason we expect the tutorial to be of wide interest to the ECML community is our coverage of several European languages (15 languages), for which we expect many of the attendees have funded projects (e.g., via the European Union). Finally, our coverage of ethical and legal considerations for machine learning research on natural language that exploits social media data is very timely, given the recent debates around privacy (e.g., the Facebook and Cambridge Analytica debate and the new European General Data Protection Regulation legislation) and the rapid rise and pervasive use of artificial intelligence applications.
Muhammad Abdul-Mageed is an Assistant Professor at the School of Information, the Department of Linguistics (Associate Member), and the Department of Computer Science (Associate Member), The University of British Columbia. His areas of research interest are Natural Language Processing, Deep Learning of Natural Language, Arabic Natural Language Processing, and Social Media Mining. The goal of his lab is to create intelligent ‘social’ machines that can interact naturally with humans. Dr. Abdul-Mageed has published more than 35 research papers in top academic venues. Before UBC, Dr. Abdul-Mageed was a Visiting Assistant Professor in the School of Informatics and Computing, Indiana University (2015-2016). In 2015, he was a Visiting Scholar in the Department of Computer Science at the George Washington University. Between 2010 and 2012, he was a Visiting Scholar in the Center for Computational Learning Systems, Columbia University. From 2013 to 2015, he was a research scientist at Codeq LLC, where he created patented technologies for text summarization and sentiment analysis. Currently, he is also a Visiting Scholar at the University of Pennsylvania. Dr. Abdul-Mageed regularly serves in scientific programs of major academic and industrial conferences such as ACL, EMNLP, NAACL, LREC, and Southern Data Science. He is currently a member of the standing reviewing committee for Transactions of the Association for Computational Linguistics (TACL).
Technical Background & Intellectual Property. Dr. Abdul-Mageed has developed the current state-of-the-art systems for (i) English emotion detection (ACL 2017), (ii) Arabic emotion detection (NAACL 2018), and (iii) Arabic sentiment analysis (ACL 2011; Computer Speech & Language 2014).
Dr. Abdul-Mageed has given more than 30 invited presentations and guest talks to diverse audiences. These include academic (e.g., Columbia University, Indiana University, The University of British Columbia, The University of Pennsylvania, and Simon Fraser University), industrial (Bloomberg NYC, Southern Data Science, Crowd Analyzer), and public health (Vancouver Coastal Health) venues. He has also taught several graduate courses at Indiana University and the University of British Columbia.
Petra Kralj Novak is a researcher at the Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia. Her research belongs to the wide area of knowledge discovery from databases. Currently, as a postdoctoral researcher, she analyses social and mainstream media, focusing on the mediated sentiment. Her pioneering research on the role of emojis in conveying sentiment was published in P. Kralj Novak, et al. "Sentiment of emojis" and is the main reference for current research in the analysis of emoji. Dr. Kralj Novak publishes research papers and datasets in top academic venues. Her thesis focused on rule induction from class-labeled data, where the induced rules are intended for human interpretation. The main findings of the thesis are published in the Journal of Machine Learning Research and in the Encyclopaedia of Machine Learning. She also designed and implemented GMOtreck, a system for optimization of laboratory-level traceability of genetically modified organisms. Dr. Kralj Novak regularly serves in scientific programs of major academic and industrial conferences such as ICDM, ICML, DS, IDA, and Southern Data Science. From 2006 to 2009, she was secretary and treasurer of SLAIS, the Slovenian Artificial Intelligence Society. She has also actively collaborated in several national and European research projects.
Dr. Petra Kralj Novak has given seminars and invited talks to diverse audiences. These include academic (e.g., Georgia State University, Fudan University, University of Ljubljana) and industrial audiences (Southern Data Science [USA], Career Builder, LLC [USA]). She has also co-taught several courses at the undergraduate and graduate levels at the Jožef Stefan International Postgraduate School and at the University of Nova Gorica, Slovenia.
@ECMLPKDD 2018, Dublin, Ireland.
Friday, 14th September 2018, 14:00 Suite 688/689