At SenticNet, we are working on several projects ranging from fundamental knowledge representation problems to applications of commonsense reasoning in contexts such as big social data analysis and human-computer interaction. Each project is led by a single member of the Sentic Team, but all projects are highly interdependent, as we are all driven by the same vision. Some of the current projects include:
• Singlish SenticNet (Lead Investigator: Danyuan Ho)
• Mood of Singapore (Lead Investigator: Frank Xing)
• One Belt, One Road, One Sentiment? (Lead Investigator: Ranjan Satapathy)
We are also working on several fundamental problems of natural language processing, alongside applications of sentiment analysis (some of them are listed below).
KNOWLEDGE REPRESENTATION (Lead Investigator: Rajiv Bajpai)
We harvest affective commonsense knowledge from Open Mind Common Sense (OMCS), WordNet-Affect, and the game engine for commonsense knowledge acquisition (GECKA), and represent it redundantly at three levels: semantic network, matrix, and vector space. In particular, the collected or crowdsourced pieces of knowledge are first integrated into a semantic network as triples of the format <concept-relationship-concept>. Second, the graph is represented as a matrix having concepts as rows and the combination <relationship-concept> as columns. Finally, dimensionality reduction is applied to this matrix in order to create a vector space representation of commonsense knowledge.
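The three levels described above can be illustrated with a minimal sketch. The triples below are hypothetical toy data, and truncated SVD (via NumPy) is assumed as the dimensionality reduction step only for illustration; the function names are not part of any SenticNet API.

```python
import numpy as np

# Hypothetical toy triples in <concept-relationship-concept> form.
triples = [
    ("cake", "IsA", "food"),
    ("pizza", "IsA", "food"),
    ("cake", "Causes", "joy"),
    ("pizza", "Causes", "joy"),
    ("exam", "Causes", "stress"),
    ("deadline", "Causes", "stress"),
]


def build_matrix(triples):
    """Rows are concepts; columns are <relationship-concept> pairs."""
    concepts = sorted({head for head, _, _ in triples})
    features = sorted({(rel, tail) for _, rel, tail in triples})
    row = {c: i for i, c in enumerate(concepts)}
    col = {f: j for j, f in enumerate(features)}
    matrix = np.zeros((len(concepts), len(features)))
    for head, rel, tail in triples:
        matrix[row[head], col[(rel, tail)]] = 1.0
    return concepts, features, matrix


def reduce_dimensions(matrix, k):
    """Keep the top-k singular directions (truncated SVD, an assumption)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two concept vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


concepts, features, matrix = build_matrix(triples)
vectors = reduce_dimensions(matrix, k=2)
space = dict(zip(concepts, vectors))

# Concepts sharing the same features end up close in the vector space.
print(cosine(space["cake"], space["pizza"]))
print(cosine(space["cake"], space["exam"]))
```

In this toy setting, concepts that share <relationship-concept> features are mapped to nearby vectors, which is the property that makes the vector space useful for reasoning by analogy.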
SUBJECTIVITY DETECTION (Lead Investigator: Iti Chaturvedi)
Subjectivity detection can prevent a sentiment classifier from considering irrelevant or potentially misleading text. Since different attributes may correspond to different opinions in the lexicons of different languages, we resort to multiple kernel learning (MKL) to simultaneously optimize the different modalities. Previous approaches to MKL for sentence classifiers are computationally slow and lack any hierarchy when grouping features into different kernels. Instead, we exploit deep recurrent convolutional neural networks to reduce the dimensionality of the problem. In particular, the lower layers of the deep model are abstract, while the higher layers become more detailed, connecting attributes to opinions.
MULTIMODAL SENTIMENT ANALYSIS (Lead Investigator: Soujanya Poria)
A huge number of videos are posted every day on social media platforms such as Facebook and YouTube. This makes the Internet an unlimited source of information. In the coming decades, coping with such information and mining useful knowledge from it will be an increasingly difficult task. We are developing a novel methodology for multimodal sentiment analysis, which consists of harvesting sentiments from Web videos by exploiting a model that uses audio, visual, and textual modalities as sources of information. We use both feature- and decision-level fusion methods to merge affective information extracted from multiple modalities.
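The difference between the two fusion strategies can be sketched as follows. This is a simplified illustration under stated assumptions, not our actual model: the feature values, per-modality scores, and weights are hypothetical, and a weighted average is only one of several possible decision-level combination rules.

```python
def feature_level_fusion(audio, visual, text):
    """Concatenate per-modality feature vectors into a single vector,
    which would then be fed to one classifier."""
    return list(audio) + list(visual) + list(text)


def decision_level_fusion(scores, weights):
    """Combine per-modality polarity scores (each produced by its own
    classifier) with a weighted average."""
    total = sum(weights[m] for m in scores)
    return sum(weights[m] * scores[m] for m in scores) / total


# Hypothetical features extracted from one video segment.
audio_features = [0.12, 0.80]
visual_features = [0.45, 0.05]
text_features = [1.0, 0.0, 0.3]
joint_vector = feature_level_fusion(audio_features, visual_features, text_features)

# Hypothetical polarity scores in [-1, 1] from three unimodal classifiers.
scores = {"audio": 0.2, "visual": -0.4, "text": 0.8}
weights = {"audio": 1.0, "visual": 1.0, "text": 2.0}
fused_score = decision_level_fusion(scores, weights)

print(len(joint_vector), fused_score)
```

Feature-level fusion lets a single classifier learn cross-modal interactions, whereas decision-level fusion keeps the unimodal classifiers independent and only merges their outputs.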
MULTILINGUAL SENTIMENT ANALYSIS (Lead Investigator: Haiyun Peng)
The ability to analyze online user-generated content for sentiment on products or services has become a de facto skill set for many companies. There is, in fact, a need to analyze the unique nature of online social media language, as this often combines the usual formal language with local slang to express the true feelings of the local community. To this end, we are developing a localization toolkit for creating non-English versions of SenticNet in a time- and cost-effective way. This is achieved by exploiting online facilities such as Web dictionaries and translation engines. There are three main challenges: first, a concept in English may be mapped to multiple concepts in the target language; second, the polarity of some concepts in the local language may differ from that of their English counterparts; third, some entries may still be left untranslated because they are out-of-vocabulary concepts.
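The three challenges above can be made concrete with a small sketch. The `localize` function, the Italian target entries, and all polarity values are hypothetical and chosen only for illustration; they are not taken from SenticNet or from the toolkit itself.

```python
def localize(lexicon, translations, polarity_overrides=None):
    """Map an English polarity lexicon onto a target language.

    lexicon: English concept -> polarity
    translations: English concept -> list of target-language concepts
    polarity_overrides: target concept -> corrected local polarity
    """
    overrides = polarity_overrides or {}
    localized, untranslated = {}, []
    for concept, polarity in lexicon.items():
        candidates = translations.get(concept, [])
        if not candidates:
            # Challenge 3: out-of-vocabulary concepts stay untranslated.
            untranslated.append(concept)
            continue
        # Challenge 1: one English concept may map to several targets.
        for target in candidates:
            # Challenge 2: the local polarity may differ from the English one.
            localized[target] = overrides.get(target, polarity)
    return localized, untranslated


# Hypothetical English entries with illustrative polarity values.
english = {"cheap": 0.3, "happy": 0.8, "selfie_stick": 0.1}
# Hypothetical dictionary lookups (Italian as the target language).
translations = {"cheap": ["economico", "scadente"], "happy": ["felice"]}
# Hypothetical manual correction for a translation with a different polarity.
overrides = {"scadente": -0.6}

localized, untranslated = localize(english, translations, overrides)
print(localized)
print(untranslated)
```

Keeping the untranslated entries in a separate list makes it easy to hand them over to a manual or crowdsourced translation step.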
AUTOMATIC SPEECH RECOGNITION (Lead Investigator: Yukun Ma)
Concepts are critical semantic units that capture high-level knowledge of human language. Recognizing the concepts used to represent the underlying meanings of spoken documents is a key step in building speech-centric information systems, e.g., search engines, dialogue systems, and multimodal sentiment analysis engines. We mainly address two issues in recognizing concepts and named entities from speech data: data sparseness, due to the shortage of training data and the informal style of spoken language, and the prevalence of errors propagated from automatic speech recognition.
STOCK MARKET PREDICTION (Lead Investigator: Sandro Cavallari)
Mining microblogging data to forecast stock market behavior is a recent research topic that has shown promising results. We are developing a computational model that accounts for investor sentiment and attention to predict key stock market variables, such as returns, volatility, and volume. Several arguments support this approach. For example, some studies have shown that individuals’ financial decisions are significantly affected by their emotions and mood. Also, the community of users that uses these microblogging services to share information about stock market issues has grown and is potentially more representative of all investors.