WISDOM (Workshop on Issues of Sentiment Discovery and Opinion Mining) aims to explore how the wisdom of the crowds is affecting (and will affect) the evolution of the Web and of businesses gravitating around it. In particular, the workshop series explores two different stages of sentiment analysis: the former focusing on the identification of opinionated text over the Web, the latter focusing on the classification of such text either in terms of polarity detection or emotion recognition.

WISDOM'14 (ICML 2014, June 25th, Beijing)
WISDOM'13 (KDD 2013, August 11th, Chicago)
WISDOM'12 (KDD 2012, August 12th, Beijing)

The distillation of knowledge from social media is an extremely difficult task, as the content of today's Web, while perfectly suitable for human consumption, remains hardly accessible to machines. The opportunity to capture the opinions of the general public about social events, political movements, company strategies, marketing campaigns, and product preferences has raised growing interest both within the scientific community, where it poses many exciting open challenges, and in the business world, due to the remarkable benefits to be had from marketing and financial market prediction.

Statistical NLP has been the mainstream NLP research direction since the late 1990s. It relies on language models based on popular machine-learning algorithms such as maximum likelihood, expectation maximization, conditional random fields, and support vector machines. By feeding a large training corpus of annotated texts to a machine-learning algorithm, it is possible for the system not only to learn the valence of keywords, but also to take into account the valence of other arbitrary keywords, punctuation, and word co-occurrence frequencies. However, standard statistical methods are generally semantically weak, as they merely focus on lexical co-occurrence elements that have little predictive value individually.
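As a rough illustration of how a statistical system can learn valence from an annotated corpus, the sketch below estimates a per-token valence as a smoothed log-odds ratio, in the style of a naive Bayes classifier. The toy corpus, the function names (`train_valence`, `classify`), and the whitespace tokenizer are assumptions made for this example; they are not taken from any workshop paper.

```python
import math
from collections import Counter


def train_valence(corpus):
    """Estimate per-token valence as a Laplace-smoothed log-odds ratio.

    corpus: list of (text, label) pairs, label in {"pos", "neg"}.
    Assumes both labels occur at least once.
    Returns (valence, prior): valence maps token -> log P(t|pos) - log P(t|neg).
    """
    counts = {"pos": Counter(), "neg": Counter()}
    docs = Counter()
    for text, label in corpus:
        # Whitespace tokenization keeps punctuation marks as tokens.
        counts[label].update(text.lower().split())
        docs[label] += 1
    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {c: sum(counts[c].values()) + len(vocab) for c in counts}
    valence = {
        t: math.log((counts["pos"][t] + 1) / totals["pos"])
        - math.log((counts["neg"][t] + 1) / totals["neg"])
        for t in vocab
    }
    prior = math.log(docs["pos"] / docs["neg"])
    return valence, prior


def classify(text, valence, prior=0.0):
    """Sum token valences; unseen tokens contribute nothing."""
    score = prior + sum(valence.get(t, 0.0) for t in text.lower().split())
    return "pos" if score >= 0 else "neg"


# Hypothetical annotated corpus, for illustration only.
TOY_CORPUS = [
    ("great phone , love the screen !", "pos"),
    ("love it , great battery", "pos"),
    ("terrible battery , awful screen", "neg"),
    ("awful phone , hate it", "neg"),
]

valence, prior = train_valence(TOY_CORPUS)
print(classify("great screen", valence, prior))
print(classify("awful battery", valence, prior))
```

In this toy setting even the exclamation mark receives a positive valence, because it only occurs in positive reviews. That shows both the appeal of the approach and its semantic weakness: each score reflects co-occurrence with a label, not meaning.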

Endogenous NLP, instead, involves the use of machine-learning techniques to perform semantic analysis of a corpus by building structures that approximate concepts from a large set of documents. It does not involve prior semantic understanding of documents; instead, it relies only on the endogenous knowledge of these (rather than on external knowledge bases). The advantages of this approach over the knowledge engineering approach are effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. Endogenous NLP includes methods based either on lexical semantics, which focuses on the meanings of individual words (e.g., LSA, LDA, and MapReduce), or compositional semantics, which looks at the meanings of sentences and longer utterances (e.g., HMM, association rule learning, and probabilistic generative models).

WISDOM aims to provide an international forum for researchers in the field of machine learning for opinion mining and sentiment analysis to share information on their latest investigations in social information retrieval and their applications in both academic research areas and industrial sectors. The broader context of the workshop encompasses opinion mining, social media marketing, information retrieval, and natural language processing. Topics of interest include but are not limited to:
• Endogenous NLP for sentiment analysis
• Sentiment learning algorithms
• Semantic multi-dimensional scaling for sentiment analysis
• Big social data analysis
• Opinion retrieval, extraction, classification, tracking and summarization
• Domain adaptation for sentiment classification
• Time evolving sentiment analysis
• Emotion detection
• Concept-level sentiment analysis
• Topic modeling for aspect-based opinion mining
• Multimodal sentiment analysis
• Sentiment pattern mining
• Affective knowledge acquisition for sentiment analysis
• Biologically-inspired opinion mining
• Content-, concept-, and context-based sentiment analysis

WISDOM'14 (ICML 2014, June 25th, Beijing)

09:15-10:20: KEYNOTE
Opening Remarks (Yunqing Xia)
Instance-based domain adaptation (Rui Xia)

10:20-10:40: Coffee Break

10:40-12:00: SESSION I
Multi-granularity comparison on Chinese and English reviews: A case study in IT product domain (Qingqing Zhou)
WEMOTE - Word embedding based minority oversampling technique for imbalanced emotion and sentiment classification (Chen Tao)
Disambiguating word sentiment polarity through Bayesian modeling and opinion-level features (Yunqing Xia)

12:00-14:00: Lunch Break

14:00-15:20: SESSION II
Cross-lingual Twitter polarity detection via projection across word-aligned corpora (Mohamed Abdel-Hady)
Sentiment analysis in Turkish social media (Cumali Türkmenoğlu)
A two-level learning hierarchy of nonnegative matrix factorization based topic modeling for main topic extraction (Hendri Murfi)
Closing Remarks (Yunqing Xia)

Rui Xia is currently an assistant professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. His research interests include machine learning, natural language processing, text mining, and sentiment analysis. He received his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, in 2011. He has published several refereed conference papers in the areas of artificial intelligence and natural language processing, at venues including IJCAI, AAAI, ACL, and COLING. He has served as a program committee member of several international conferences and workshops, including IJCAI, COLING, the WWW Workshop on MABSDA, the KDD Workshop on WISDOM, and the ICDM Workshop on SENTIRE. He is a member of ACM, ACL, and CCF, and an operating committee member of YSSNLP.

On the one hand, most existing domain adaptation studies in NLP are feature-based, while research on instance-based adaptation is very scarce. On the other hand, due to the explosive growth of online reviews, we can easily collect a large amount of labeled reviews from different domains, but only some of them are beneficial for training a desired target-domain sentiment classifier. Therefore, it is important to identify the samples that are most relevant to the target domain and use them as training data. To address this problem, we propose two instance-based domain adaptation methods for NLP applications. The first comprises PUIS and PUIW, which conduct instance adaptation based on instance selection and instance weighting, respectively, via PU learning. The second is in-target-domain logistic approximation (ILA), where we conduct instance adaptation with a joint logistic approximation model. Both methods achieve sound performance in high-dimensional NLP tasks such as cross-domain text categorization and sentiment classification.
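The general idea of instance selection and instance weighting can be sketched as follows. This is not the PUIS, PUIW, or ILA algorithm from the talk; it is a simplified illustration that assumes target-domain documents play the role of positive examples and source-domain documents the role of unlabeled examples, and it scores each source review with a smoothed unigram estimate of its in-target-domain probability. All names (`domain_scores`, `select_instances`, `STOPWORDS`) and the toy data are hypothetical.

```python
import math
from collections import Counter

# Small illustrative stopword list (an assumption for this toy example).
STOPWORDS = {"the", "a", "of", "is", "this", "and", "are", "has", "with"}


def tokenize(text):
    return [t for t in text.lower().split() if t not in STOPWORDS]


def domain_scores(target_docs, source_docs):
    """Estimate how target-like each source document looks.

    Target documents act as positives and source documents as unlabeled
    data. Two Laplace-smoothed unigram models give a log-odds per document,
    mapped to (0, 1) with a logistic function, assuming equal priors.
    """
    tgt = Counter(t for d in target_docs for t in tokenize(d))
    src = Counter(t for d in source_docs for t in tokenize(d))
    vocab = set(tgt) | set(src)
    n_tgt = sum(tgt.values()) + len(vocab)
    n_src = sum(src.values()) + len(vocab)
    scores = []
    for doc in source_docs:
        log_odds = sum(
            math.log((tgt[t] + 1) / n_tgt) - math.log((src[t] + 1) / n_src)
            for t in tokenize(doc)
        )
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        scores.append(1.0 / (1.0 + math.exp(-log_odds)))
    return scores


def select_instances(labeled_source, scores, k):
    """Instance selection: keep the k most target-like labeled reviews."""
    ranked = sorted(range(len(labeled_source)),
                    key=lambda i: scores[i], reverse=True)
    return [labeled_source[i] for i in ranked[:k]]


# Hypothetical unlabeled target-domain reviews (electronics).
TARGET_DOCS = [
    "the battery of this phone is great",
    "screen and battery are poor",
    "phone camera is sharp",
]
# Hypothetical labeled source reviews from mixed domains.
SOURCE_REVIEWS = [
    ("this phone has a great screen", "pos"),
    ("the plot of this novel is dull", "neg"),
    ("the phone battery is poor", "neg"),
    ("a moving story with great characters", "pos"),
]

scores = domain_scores(TARGET_DOCS, [text for text, _ in SOURCE_REVIEWS])
selected = select_instances(SOURCE_REVIEWS, scores, k=2)
# Instance weighting: keep every review but attach its score as a weight.
weighted = list(zip(SOURCE_REVIEWS, scores))
```

Selection discards the low-scoring reviews outright, whereas weighting keeps them with a reduced influence on the target-domain classifier. Which of the two works better would depend on the data and is not something this sketch can settle.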

• Yunqing Xia, Tsinghua University (China)
• Erik Cambria, Nanyang Technological University (Singapore)
• Yongzheng Zhang, LinkedIn Inc. (USA)
• Newton Howard, MIT Media Laboratory (USA)


WISDOM'13 (KDD 2013, August 11th, Chicago)

09:00-10:10: KEYNOTE
Opening Remarks
Statistical Methods for Integration and Analysis of Opinionated Text Data (ChengXiang Zhai)

10:10-10:30: Coffee Break

10:30-12:00: SESSION I
Identifying Purpose Behind Electoral Tweets (Saif Mohammad, Svetlana Kiritchenko, and Joel Martin)
Combining Strengths, Emotions and Polarities for Boosting Twitter Sentiment Analysis (Felipe Bravo-Marquez, Marcelo Mendoza, and Barbara Poblete)
Modelling Political Disaffection from Twitter Data (Corrado Monti, Alessandro Rozza, Giovanni Zappella, Matteo Zignani, Adam Arvidsson, and Elanor Colleoni)

12:00-13:30: Lunch Break

13:30-15:30: SESSION II
Enhancing Sentiment Extraction from Text by Means of Arguments (Lucas Carstens and Francesca Toni)
Evaluation of an Algorithm for Aspect-Based Opinion Mining Using a Lexicon-Based Approach (Florian Wogenstein, Johannes Drescher, Dirk Reinel, Sven Rill, and Jörg Scheidt)
Commonsense-Based Topic Modeling (Dheeraj Rajagopal, Daniel Olsher, Erik Cambria, and Kenneth Kwok)
Online Debate Summarization using Topic Directed Sentiment Analysis (Sarvesh Ranade, Jayant Gupta, Vasudeva Varma, and Radhika Mamidi)

15:30-16:00: Coffee Break

16:00-17:00: SESSION III
RBEM: A Rule Based Approach to Polarity Detection (Erik Tromp and Mykola Pechenizkiy)
Cross-lingual Polarity Detection with Machine Translation (Erkin Demirtas and Mykola Pechenizkiy)
Sentribute: Image Sentiment Analysis from a Mid-level Perspective (Jianbo Yuan, Quanzeng You, Sean Mcdonough, and Jiebo Luo)
Closing Remarks

ChengXiang Zhai is an Associate Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he is also affiliated with the Graduate School of Library and Information Science, the Institute for Genomic Biology, and the Department of Statistics. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, and biomedical informatics, in which he has published over 150 research papers. He is an Associate Editor of ACM Transactions on Information Systems and of Information Processing and Management, and serves on the editorial board of Information Retrieval Journal. He was a program co-chair of ACM CIKM 2004, NAACL HLT 2007, and ACM SIGIR 2009. He is an ACM Distinguished Scientist and a recipient of multiple best paper awards, an Alfred P. Sloan Research Fellowship, an IBM Faculty Award, an HP Innovation Research Program Award, and the Presidential Early Career Award for Scientists and Engineers (PECASE).

Opinionated text data such as blogs, forum posts, product reviews, and online comments are increasingly available on the Web. They are very useful sources of public opinion about virtually any topic. However, because the opinions are scattered and abundant, it is a significant challenge for users to collect all the opinions about a topic and digest them efficiently. In this talk, I will present a suite of general statistical text mining methods that can help users integrate, summarize, and analyze scattered online opinions to obtain actionable knowledge for decision making. Specifically, I will first present approaches to the integration of scattered opinions by aligning them to a well-structured article or relevant ontology. Second, I will discuss several techniques for generating a concise opinion summary that can reveal the major sentiments and opinion points buried in large amounts of opinionated text data. Finally, I will present probabilistic generative models for analyzing review data in depth to discover latent aspect ratings and the relative weights placed by reviewers on different aspects. These methods are completely general and can thus help users integrate and analyze large amounts of online opinionated text data on any topic in any natural language.

• Erik Cambria, National University of Singapore (Singapore)
• Bing Liu, University of Illinois at Chicago (USA)
• Yongzheng Zhang, eBay Inc. (USA)
• Yunqing Xia, Tsinghua University (China)


WISDOM'12 (KDD 2012, August 12th, Beijing)

09:00-10:15: KEYNOTE
Opening Remarks
Detecting Fake Opinions in Social Media (Bing Liu)

10:15-10:30: Coffee Break

10:30-12:00: SESSION I
A Bayesian Modeling Approach to Multi-Dimensional Sentiment Distributions Prediction (Yulan He)
Transverse Subjectivity Classification (Dinko Lambov and Gaël Dias)
Combining Lexicon and Learning based Approaches for Concept-Level Sentiment Analysis (Andrius Mudinas, Dell Zhang, and Mark Levene)

12:00-13:30: Lunch Break

13:30-15:15: SESSION II
A Unified Graph Model for Chinese Product Review Summarization Using Richer Information (He Huang and Chunping Li)
Retrieval Approach to Extract Opinions about People from Resource Scarce Language News Articles (Aditya Mogadala and Vasudeva Varma)
A Generic Approach to Generate Opinion Lists of Phrases for Opinion Mining Applications (Sven Rill, Johannes Drescher, Dirk Reinel, Joerg Scheidt, Oliver Schuetz, Florian Wogenstein, and Daniel Simon)
Finding Emotion in Image Descriptions (Morgan Ulinski, Victor Soto, and Julia Hirschberg)

15:15-15:30: Coffee Break

15:30-17:00: SESSION III
Predicting Collective Sentiment Dynamics from Time-series Social Media (Le Nguyen, Pang Wu, William Chan, Wei Peng, and Ying Zhang)
Crowdsourcing Recommendations from Social Sentiment (Yusheng Xie, Yu Cheng, Daniel Honbo, Kunpeng Zhang, Ankit Agrawal, and Alok Choudhary)
Fast Learning for Sentiment Analysis on Bullying (Jun-Ming Xu, Xiaojin Zhu, and Amy Bellmore)
Closing Remarks

Bing Liu is a professor of Computer Science at the University of Illinois at Chicago (UIC). He received his PhD in Artificial Intelligence from the University of Edinburgh. Before joining UIC, he was with the National University of Singapore. His current research interests include opinion mining and sentiment analysis, Web mining, and data mining. He has published extensively in leading conferences and journals in these fields. He has also written a textbook titled “Web Data Mining: Exploring Hyperlinks, Contents and Usage Data”, published by Springer. The second edition of the book came out in July 2011. In terms of professional service, Liu has served as program chair of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), the IEEE International Conference on Data Mining (ICDM), the ACM Conference on Web Search and Data Mining (WSDM), the SIAM Conference on Data Mining (SDM), the ACM Conference on Information and Knowledge Management (CIKM), and the Pacific Asia Conference on Data Mining (PAKDD). He has also served as an associate editor of IEEE Transactions on Knowledge and Data Engineering (TKDE), Journal of Data Mining and Knowledge Discovery (DMKD), and SIGKDD Explorations, and is on the editorial boards of several other journals.

Opinions from social media are increasingly used by individuals and organizations for making purchase decisions, for making choices at elections, and for marketing and product design. Positive opinions often mean profit and fame for businesses and individuals, which, unfortunately, gives people strong incentives to game the system by posting fake opinions or reviews to promote or discredit target products, services, organizations, individuals, and even ideas, without disclosing their true intentions or the person or organization they are secretly working for. Such individuals are called opinion spammers, and their activities are called opinion spamming. Opinion spamming about social and political issues can even be frightening, as it can warp opinions and mobilize masses into positions counter to legal or ethical mores. It is safe to say that as opinions in social media are increasingly used in practice, opinion spamming will become more and more rampant and sophisticated, which presents a major challenge for its detection. However, it must be detected in order to ensure that social media remains a trusted source of public opinion, rather than one full of fake opinions, lies, and deceptions. In this talk, I will introduce this research topic and discuss some state-of-the-art opinion spam detection techniques.

• Erik Cambria, National University of Singapore (Singapore)
• Yongzheng Zhang, eBay Inc. (USA)
• Yunqing Xia, Tsinghua University (China)
• Newton Howard, MIT Media Laboratory (USA)