WISDOM (Workshop on Issues of Sentiment Discovery and Opinion Mining) aims to explore how the wisdom of the crowds is affecting (and will affect) the evolution of the Web and of businesses gravitating around it. In particular, the workshop series explores two different stages of sentiment analysis: the former focusing on the identification of opinionated text over the Web, the latter focusing on the classification of such text either in terms of polarity detection or emotion recognition.

WISDOM'16 (KDD 2016, August 14th, San Francisco)
WISDOM'15 (KDD 2015, August 10th, Sydney)
WISDOM'14 (ICML 2014, June 25th, Beijing)
WISDOM'13 (KDD 2013, August 11th, Chicago)
WISDOM'12 (KDD 2012, August 12th, Beijing)

The distillation of knowledge from social media is an extremely difficult task as the content of today's Web, while perfectly suitable for human consumption, remains hardly accessible to machines. The opportunity to capture the opinions of the general public about social events, political movements, company strategies, marketing campaigns, and product preferences has raised growing interest both within the scientific community, leading to many exciting open challenges, and in the business world, due to the remarkable benefits to be had from marketing and financial market prediction.

Statistical NLP has been the mainstream NLP research direction since the late 1990s. It relies on language models based on popular machine-learning algorithms such as maximum likelihood, expectation maximization, conditional random fields, and support vector machines. By feeding a large training corpus of annotated texts to a machine-learning algorithm, it is possible for the system not only to learn the valence of keywords, but also to take into account the valence of other arbitrary keywords, punctuation, and word co-occurrence frequencies. However, standard statistical methods are generally semantically weak, as they merely focus on lexical co-occurrence elements that individually have little predictive value.
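As a rough illustration of how a system can learn keyword valence from annotated text, the sketch below trains a tiny naive Bayes classifier with add-one smoothing. Naive Bayes is used here only as a compact stand-in and is not one of the algorithms listed above; the function names and the four-sentence corpus are invented for this example.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_texts):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy annotated corpus, invented for illustration only.
TOY_CORPUS = [
    ("great phone love the screen", "positive"),
    ("love this great camera", "positive"),
    ("terrible battery hate it", "negative"),
    ("hate the terrible screen", "negative"),
]

model = train_naive_bayes(TOY_CORPUS)
print(classify("love the camera", model))  # positive
print(classify("terrible phone", model))   # negative
```

Each word contributes to the score independently, which is exactly the weakness noted above: the model sees co-occurring tokens, not meaning.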

Endogenous NLP, instead, involves the use of machine-learning techniques to perform semantic analysis of a corpus by building structures that approximate concepts from a large set of documents. It does not involve prior semantic understanding of documents; instead, it relies only on the knowledge endogenous to the documents themselves (rather than on external knowledge bases). The advantages of this approach over the knowledge engineering approach are effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. Endogenous NLP includes methods based either on lexical semantics, which focuses on the meanings of individual words (e.g., LSA, LDA, and MapReduce), or compositional semantics, which looks at the meanings of sentences and longer utterances (e.g., HMM, association rule learning, and probabilistic generative models).

WISDOM aims to provide an international forum for researchers in the field of machine learning for opinion mining and sentiment analysis to share information on their latest investigations in social information retrieval and their applications both in academic research areas and industrial sectors. The broader context of the workshop encompasses opinion mining, social media marketing, information retrieval, and natural language processing. Topics of interest include but are not limited to:
• Endogenous NLP for sentiment analysis
• Sentiment learning algorithms
• Semantic multi-dimensional scaling for sentiment analysis
• Big social data analysis
• Opinion retrieval, extraction, classification, tracking and summarization
• Domain adaptation for sentiment classification
• Time evolving sentiment analysis
• Emotion detection
• Concept-level sentiment analysis
• Topic modeling for aspect-based opinion mining
• Multimodal sentiment analysis
• Sentiment pattern mining
• Affective knowledge acquisition for sentiment analysis
• Biologically-inspired opinion mining
• Content-, concept-, and context-based sentiment analysis

WISDOM'16 (KDD 2016, August 14th, San Francisco)


8:00 - 8:10 Opening remarks
8:10 - 9:00 Keynote Talk I: Social Influence and Sentiment Analysis (J. Tang)
9:00 - 9:20 Actionable and Political Text Classification Using Word Embeddings and LSTM (A. Rao)
9:20 - 9:40 Predicting Trust Relations Among Users in a Social Network: On the Role of Influence, Cohesion and Valence (N. Vedula)
9:40 - 10:00 AnonyMine: Mining Anonymous Social Media Posts Using Psycho-lingual and Crowd-sourced Dictionaries (A. Paul)

10:00 - 10:30 Coffee Break

10:30 - 11:30 Keynote Talk II: Obtaining Quality Labeled Data for Opinion Mining in Short Text (V. Markman)
11:30 - 11:50 A Framework for Emotion Recognition from Human Computer Interaction in Natural Setting (L. Constantine)
11:50 - 12:00 Closing remarks

Jie Tang is a tenured associate professor with the Department of Computer Science and Technology at Tsinghua University, and was also a visiting scholar at Cornell University. His interests include social network analysis, data mining, and machine learning. He has published more than 200 journal/conference papers and holds 20 patents. His papers have been cited more than 6,000 times (Google Scholar). He served as PC Co-Chair of CIKM’16, WSDM’15, ASONAM’15, SocInfo’12, KDD-CUP/Poster/Workshop/Local/Publication Co-Chair of KDD’11-15, and Associate Editor-in-Chief of ACM TKDD, Editor of IEEE TKDE/TBD and ACM TIST. He leads the project AMiner.org for academic social network analysis and mining, which has attracted more than 8 million independent IP accesses from 220 countries/regions in the world. He was honored with the Newton Advanced Scholarship Award, CCF Young Scientist Award, and NSFC Excellent Young Scholar.

KEYNOTE I. Social Influence and Sentiment Analysis
Social influence is the behavioral change of a person because of the perceived relationship with other people, organizations and society in general. Social influence has been a widely accepted phenomenon in social networks for decades. Many applications, such as marketing, advertisement and recommendations, have been built around the implicit notion of social influence between people. With the exponential growth of online social network services such as Facebook and Twitter, social influence can for the first time be measured over a large population. In this talk, I will present how we quantify the degree of influence between users and also introduce the application of social influence to sentiment analysis in large social networks.

Vita Markman is a Staff Software Engineer at LinkedIn, where she works on various natural language processing applications such as sentiment analysis on customer feedback, member and job data standardization, and member-job match filtering. Before joining LinkedIn, she was a Staff Research Engineer at Samsung Research America, where, among other projects, she worked on extracting topic-indicative phrases from a stream of closed caption news data in real time and text-mining customer support chat logs for common issues customers experience. Prior to Samsung Research, she was employed as a Computational Linguist at Disney Interactive, where she led a team of linguists and annotators to develop a proprietary patented language-filtering system for ensuring safety and appropriateness of user-generated linguistic content in Disney’s online games for kids. In addition, Vita conducts independent research on mining the language of social media. Her primary interests are focused on extracting topics and sentiment from micro-text – the short, snippet-like pieces of text found on Twitter, Facebook, and in various other social media sources. This includes discovering phrase-based sentiment features in Amazon feedback data, clustering micro-tweets by topic, and mining online product reviews to identify features people rate as positive or negative. Vita’s educational background is in theoretical and computational linguistics (Rutgers, 2005). In addition to computational linguistics, she has a publication record in theoretical syntax and morphology, which was her primary area of research between 2002 and 2008.

KEYNOTE II. Obtaining Quality Labeled Data for Opinion Mining in Short Text
Quality training data is essential for building high performance machine learning models. However, certain types of tasks such as opinion mining are inherently subjective, making it hard to elicit reliable judgements from human annotators. The problem is further exacerbated in situations where opinions are elicited on short text such as Tweets or micro reviews containing only one or two lines. The talk addresses various means of circumventing these challenges via automation of some annotation tasks as well as setting up multiple experiments for collecting human judgements.

• Yongzheng Zhang, LinkedIn Inc. (USA)
• Erik Cambria, Nanyang Technological University (Singapore)
• Bing Liu, University of Illinois at Chicago (USA)
• Yunqing Xia, Microsoft Research Asia (China)


WISDOM'15 (KDD 2015, August 10th, Sydney)

1:30-2:30: Using Support Vector Machine Ensembles for Target Audience Classification on Twitter (Raymond Chiong)
2:30-3:00: Spatial and Temporal Word Spectrum of Social Media (X. Li)

3:00-3:30: Coffee Break

3:30-4:00: SarcasmBot - An open-source sarcasm-generation module for chatbots (A. Joshi)
4:00-4:30: Multilingual Subjectivity Detection using Deep Multiple Kernel Learning (I. Chaturvedi)
4:30-5:00: Smiley Captcha and Smiley Games Solving Complex Problems (S. Aggarwal)
5:00-5:30: Customer Intentions Analysis of Twitter Based on Lexico Semantic Patterns (M. Hamroun)
5:30-6:00: Incremental Active Opinion Learning Over a Stream of Polarized Documents (M. Zimmermann)
6:00: Closing Remarks

Raymond Chiong is a tenured academic staff member with the School of Design, Communication and Information Technology at The University of Newcastle, Australia. He is also a Guest Research Professor of the Center for Modern Information Management at Huazhong University of Science and Technology, China. He obtained his PhD degree from the University of Melbourne, Australia, and his MSc degree from the University of Birmingham, England. He has been actively pursuing research in evolutionary game theory, optimisation (focusing on production and logistics problems), data analytics and modelling of complex adaptive systems. He was the Editor-in-Chief of the Interdisciplinary Journal of Information, Knowledge, and Management from 2011 to 2014. Currently, he is an Editor for the journal Engineering Applications of Artificial Intelligence and an Associate Editor for the IEEE Computational Intelligence Magazine. He has also served as Guest Editor for a number of reputable journals such as the International Journal of Production Economics and European Journal of Operational Research. He is an IEEE Senior Member, and one of the Founding co-Chairs of the IEEE Symposium on Computational Intelligence in Production and Logistics Systems. To date, he has authored or co-authored over 100 refereed publications in the form of books, book chapters, journal articles and conference papers, among others.

The vast amount and diversity of data shared on social media can pose a challenge for any business wanting to use it for identifying potential customers. In our work, we use both unsupervised and supervised learning methods to classify the target audience of a Twitter account owner with minimal annotation efforts. We first identify topic domains automatically using data shared by followers of the account owner using Twitter Latent Dirichlet Allocation (LDA). We then train a Support Vector Machine (SVM) ensemble using data from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that our methods are able to successfully identify the target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling to construct training datasets can achieve a better classifier in the SVM ensemble. This ensemble system can take advantage of data diversity, and its ability to differentiate prospective customers from the general audience may lead to the development of an application for targeted marketing, leading to business advantage in the crowded social media space.
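The bootstrapped over-sampling mentioned in the abstract can be read in several ways; the sketch below shows one plausible reading, in which each ensemble member receives its own class-balanced training set drawn with replacement. It covers only the resampling step (the Twitter LDA and SVM training are omitted), and the function names and toy data are assumptions made for this example, not the speaker's implementation.

```python
import random
from collections import Counter

def bootstrap_oversample(samples, labels, rng):
    """Pad each class up to the majority class size by drawing with replacement."""
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label in sorted(by_label):
        group = by_label[label]
        # Keep the originals, then draw the shortfall with replacement.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels

def make_ensemble_training_sets(samples, labels, n_members, seed=0):
    """Build one balanced, independently resampled training set per member."""
    rng = random.Random(seed)
    return [bootstrap_oversample(samples, labels, rng) for _ in range(n_members)]

# Toy, hypothetical data: four general-audience accounts, one prospective customer.
toy_samples = ["acct_a", "acct_b", "acct_c", "acct_d", "acct_e"]
toy_labels = ["general", "general", "general", "general", "customer"]

training_sets = make_ensemble_training_sets(toy_samples, toy_labels, n_members=3)
for _, resampled_labels in training_sets:
    print(Counter(resampled_labels))
```

Each resampled set would then be used to train one classifier, with the ensemble combining their predictions, for example by majority vote.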

• Yongzheng Zhang, LinkedIn Inc. (USA)
• Erik Cambria, Nanyang Technological University (Singapore)
• Yunqing Xia, Microsoft Research Asia (China)
• Bing Liu, University of Illinois at Chicago (USA)


WISDOM'14 (ICML 2014, June 25th, Beijing)

09:15-10:20: KEYNOTE
Opening Remarks (Yunqing Xia)
Instance-based domain adaptation (Rui Xia)

10:20-10:40: Coffee Break

10:40-12:00: SESSION I
Multi-granularity comparison on Chinese and English reviews: A case study in IT product domain (Qingqing Zhou)
WEMOTE - Word embedding based minority oversampling technique for imbalanced emotion and sentiment classification (Chen Tao)
Disambiguating word sentiment polarity through Bayesian modeling and opinion-level features (Yunqing Xia)

12:00-14:00: Lunch Break

14:00-15:20: SESSION II
Cross-lingual Twitter polarity detection via projection across word-aligned corpora (Mohamed Abdel-Hady)
Sentiment analysis in Turkish social media (Cumali Türkmenoğlu)
A two-Level learning hierarchy of nonnegative matrix factorization based topic modeling for main topic extraction (Hendri Murfi)
Closing Remarks (Yunqing Xia)

Rui Xia is currently an assistant professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. His research interests include machine learning, natural language processing, text mining and sentiment analysis. He received his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2011. He has published several refereed conference papers in the areas of artificial intelligence and natural language processing, including at IJCAI, AAAI, ACL, and COLING. He has served as a program committee member for several international conferences and workshops, including IJCAI, COLING, WWW Workshop on MABSDA, KDD Workshop on WISDOM and ICDM Workshop on SENTIRE. He is a member of ACM, ACL and CCF, and he is an operating committee member of YSSNLP.

On the one hand, most existing domain adaptation studies in the field of NLP follow the feature-based approach, while research on instance-based adaptation is very scarce. On the other hand, due to the explosive growth of online reviews, we can easily collect a large amount of labeled reviews from different domains, but only some of them are beneficial for training a desired target-domain sentiment classifier. Therefore, it is important to identify the samples that are most relevant to the target domain and use them as training data. To address this problem, we propose two instance-based domain adaptation methods for NLP applications. The first comprises PUIS and PUIW, which conduct instance adaptation based on instance selection and instance weighting, respectively, via PU learning. The second is called in-target-domain logistic approximation (ILA), where we conduct instance adaptation by a joint logistic approximation model. Both methods achieve sound performance in high-dimensional NLP tasks such as cross-domain text categorization and sentiment classification.

• Yunqing Xia, Tsinghua University (China)
• Erik Cambria, Nanyang Technological University (Singapore)
• Yongzheng Zhang, LinkedIn Inc. (USA)
• Newton Howard, MIT Media Laboratory (USA)


WISDOM'13 (KDD 2013, August 11th, Chicago)

09:00-10:10: KEYNOTE
Opening Remarks
Statistical Methods for Integration and Analysis of Opinionated Text Data (Chengxiang Zhai)

10:10-10:30: Coffee Break

10:30-12:00: SESSION I
Identifying Purpose Behind Electoral Tweets (Saif Mohammad, Svetlana Kiritchenko, and Joel Martin)
Combining Strengths, Emotions and Polarities for Boosting Twitter Sentiment Analysis (Felipe Bravo-Marquez, Marcelo Mendoza, and Barbara Poblete)
Modelling Political Disaffection from Twitter Data (Corrado Monti, Alessandro Rozza, Giovanni Zappella, Matteo Zignani, Adam Arvidsson, and Elanor Colleoni)

12:00-13:30: Lunch Break

13:30-15:30: SESSION II
Enhancing Sentiment Extraction from Text by Means of Arguments (Lucas Carstens and Francesca Toni)
Evaluation of an Algorithm for Aspect-Based Opinion Mining Using a Lexicon-Based Approach (Florian Wogenstein, Johannes Drescher, Dirk Reinel, Sven Rill, and Jörg Scheidt)
Commonsense-Based Topic Modeling (Dheeraj Rajagopal, Daniel Olsher, Erik Cambria, and Kenneth Kwok)
Online Debate Summarization using Topic Directed Sentiment Analysis (Sarvesh Ranade, Jayant Gupta, Vasudeva Varma, and Radhika Mamidi)

15:30-16:00: Coffee Break

16:00-17:00: SESSION III
RBEM: A Rule Based Approach to Polarity Detection (Erik Tromp and Mykola Pechenizkiy)
Cross-lingual Polarity Detection with Machine Translation (Erkin Demirtas and Mykola Pechenizkiy)
Sentribute: Image Sentiment Analysis from a Mid-level Perspective (Jianbo Yuan, Quanzeng You, Sean Mcdonough, and Jiebo Luo)
Closing Remarks

ChengXiang Zhai is an Associate Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he is also affiliated with the Graduate School of Library and Information Science, Institute for Genomic Biology, and Department of Statistics. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, and biomedical informatics, in which he has published over 150 research papers. He is an Associate Editor of ACM Transactions on Information Systems, and Information Processing and Management, and serves on the editorial board of Information Retrieval Journal. He was a program co-chair of ACM CIKM 2004, NAACL HLT 2007, and ACM SIGIR 2009. He is an ACM Distinguished Scientist and a recipient of multiple best paper awards, the Alfred P. Sloan Research Fellowship, IBM Faculty Award, HP Innovation Research Program Award, and the Presidential Early Career Award for Scientists and Engineers (PECASE).

Opinionated text data such as blogs, forum posts, product reviews and online comments are increasingly available on the Web. They are very useful sources of public opinion about virtually any topic. However, because the opinions are scattered and abundant, it is a significant challenge for users to collect all the opinions about a topic and digest them efficiently. In this talk, I will present a suite of general statistical text mining methods that can help users integrate, summarize and analyze scattered online opinions to obtain actionable knowledge for decision making. Specifically, I will first present approaches to integration of scattered opinions by aligning them to a well-structured article or relevant ontology. Second, I will discuss several techniques for generating a concise opinion summary that can reveal the major sentiments and opinion points buried in large amounts of opinionated text data. Finally, I will present probabilistic generative models for analyzing review data in depth to discover latent aspect ratings and relative weights placed by reviewers on different aspects. These methods are completely general and can thus help users integrate and analyze large amounts of online opinionated text data on any topic in any natural language.

• Erik Cambria, National University of Singapore (Singapore)
• Bing Liu, University of Illinois at Chicago (USA)
• Yongzheng Zhang, eBay Inc. (USA)
• Yunqing Xia, Tsinghua University (China)


WISDOM'12 (KDD 2012, August 12th, Beijing)

09:00-10:15: KEYNOTE
Opening Remarks
Detecting Fake Opinions in Social Media (Bing Liu)

10:15-10:30: Coffee Break

10:30-12:00: SESSION I
A Bayesian Modeling Approach to Multi-Dimensional Sentiment Distributions Prediction (Yulan He)
Transverse Subjectivity Classification (Dinko Lambov and Gaël Dias)
Combining Lexicon and Learning based Approaches for Concept-Level Sentiment Analysis (Andrius Mudinas, Dell Zhang, and Mark Levene)

12:00-13:30: Lunch Break

13:30-15:15: SESSION II
A Unified Graph Model for Chinese Product Review Summarization Using Richer Information (He Huang and Chunping Li)
Retrieval Approach to Extract Opinions about People from Resource Scarce Language News Articles (Aditya Mogadala and Vasudeva Varma)
A Generic Approach to Generate Opinion Lists of Phrases for Opinion Mining Applications (Sven Rill, Johannes Drescher, Dirk Reinel, Joerg Scheidt, Oliver Schuetz, Florian Wogenstein, and Daniel Simon)
Finding Emotion in Image Descriptions (Morgan Ulinski, Victor Soto, and Julia Hirschberg)

15:15-15:30: Coffee Break

15:30-17:00: SESSION III
Predicting Collective Sentiment Dynamics from Time-series Social Media (Le Nguyen, Pang Wu, William Chan, Wei Peng, and Ying Zhang)
Crowdsourcing Recommendations from Social Sentiment (Yusheng Xie, Yu Cheng, Daniel Honbo, Kunpeng Zhang, Ankit Agrawal, and Alok Choudhary)
Fast Learning for Sentiment Analysis on Bullying (Jun-Ming Xu, Xiaojin Zhu, and Amy Bellmore)
Closing Remarks

Bing Liu is a professor of Computer Science at the University of Illinois at Chicago (UIC). He received his PhD in Artificial Intelligence from the University of Edinburgh. Before joining UIC, he was with the National University of Singapore. His current research interests include opinion mining and sentiment analysis, Web mining, and data mining. He has published extensively in leading conferences and journals in these fields. He has also written a textbook titled “Web Data Mining: Exploring Hyperlinks, Contents and Usage Data” published by Springer. The second edition of the book came out in July 2011. In terms of professional service, Liu has served as program chair of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), IEEE International Conference on Data Mining (ICDM), ACM Conference on Web Search and Data Mining (WSDM), SIAM Conference on Data Mining (SDM), ACM Conference on Information and Knowledge Management (CIKM), and Pacific Asia Conference on Data Mining (PAKDD). He has also served as associate editor of IEEE Transactions on Knowledge and Data Engineering (TKDE), Journal of Data Mining and Knowledge Discovery (DMKD), and SIGKDD Explorations, and is on the editorial boards of several other journals.

Opinions from social media are increasingly used by individuals and organizations for making purchase decisions, making choices at elections, and for marketing and product design. Positive opinions often mean profit and fame for businesses and individuals, which, unfortunately, gives people strong incentives to game the system by posting fake opinions or reviews to promote or to discredit some target products, services, organizations, individuals, and even ideas without disclosing their true intentions, or the person or organization that they are secretly working for. Such individuals are called opinion spammers and their activities are called opinion spamming. Opinion spamming about social and political issues can even be frightening, as it can warp opinions and mobilize masses into positions counter to legal or ethical mores. It is safe to say that as opinions in social media are increasingly used in practice, opinion spamming will become more and more rampant and sophisticated, which presents a major challenge for its detection. However, such activities must be detected in order to ensure that social media is a trusted source of public opinions, rather than one full of fake opinions, lies, and deceptions. In this talk, I will introduce this research topic and discuss some state-of-the-art opinion spam detection techniques.

• Erik Cambria, National University of Singapore (Singapore)
• Yongzheng Zhang, eBay Inc. (USA)
• Yunqing Xia, Tsinghua University (China)
• Newton Howard, MIT Media Laboratory (USA)