Saturday, 5 November 2016

Lexicon-Based Approach to Sentiment Analysis

A lexicon-based sentiment analyzer uses external lexical resources to identify the sentiment (polarity) of a given text.

In general, sentiment analysis, also known as opinion mining, is the process of determining the emotional tone behind a series of words. It is used to understand the attitudes, opinions and emotions that the words express. Sentiment analysis uses natural language processing, text analysis and computational linguistics to identify and extract subjective information from a piece of text or a document.

At the most basic level, sentiment analysis is used to classify the polarity of a given document or text, that is, to determine whether its overall tone is Positive or Negative. There are two methods for sentiment analysis:
  1. Supervised Learning Method [Machine Learning Based] - The supervised learning method uses statistical machine learning techniques to establish a model from a large corpus of documents. A set of sample opinions forms the training data on which the model is built.
  2. Unsupervised Learning Method [Lexicon Based] - This method uses external lexical resources that associate a polarity score with each term. The sentiment of the whole text depends on the sentiment of the terms that compose it.
In this article we focus only on the unsupervised (lexicon-based) method.

Methodology:

To understand the lexicon-based approach, you must first understand what micro-phrases are. A text is broken into micro-phrases wherever a stopping word is encountered; conjunctions, punctuation marks and adverbs are used as stopping words. For example,
I love and care for my family.
Here "and" is treated as a stopping word, so it is removed and the sentence is decomposed into two sub-sentences, i.e. the micro-phrases "I love" and "care for my family".
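The splitting step can be sketched in Python as follows. This is a minimal sketch, assuming a small hypothetical STOPPING_WORDS set and plain whitespace tokenization; a real analyzer would need a much fuller list of conjunctions and adverbs (for example, one derived from a part-of-speech tagger). The function name micro_phrases is also illustrative.

```python
import re

# Hypothetical, deliberately tiny set of stopping words (a few conjunctions
# and adverbs). A real analyzer would use a much larger list.
STOPPING_WORDS = {"and", "but", "or", "because", "so", "however", "very", "really"}


def micro_phrases(text):
    """Split text into micro-phrases at punctuation marks and stopping words."""
    phrases = []
    # Punctuation marks act as stopping points, so split on them first.
    for chunk in re.split(r"[.,;:!?]+", text.lower()):
        current = []
        for word in chunk.split():
            if word in STOPPING_WORDS:
                # A stopping word closes the current micro-phrase and is dropped.
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


print(micro_phrases("I love and care for my family."))
# [['i', 'love'], ['care', 'for', 'my', 'family']]
```

Splitting on punctuation before looking at stopping words keeps the two kinds of boundary separate, so either one alone is enough to end a micro-phrase.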

Next, we use an external lexical resource to obtain the polarity strength of each term. Numerous resources provide lists of terms with their associated polarities. The ones I have used are:
  • SentiWordNet - SentiWordNet is a lexical resource for opinion mining containing roughly 117,000 terms (at the time of writing). It assigns three sentiment scores to each WordNet synset: positivity, negativity and objectivity.
  • AFINN - AFINN is also a lexical resource, containing roughly 2,500 terms (at the time of writing). It associates each term with a signed integer value from -5 (highly negative) to +5 (highly positive).
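Both resources can be reduced to one signed score per term. The sketch below assumes the AFINN word list uses one tab-separated term/score pair per line, and uses a small illustrative AFINN_SAMPLE string (the values are examples for this sketch, not quoted from the real file). For SentiWordNet it applies one common reduction, positivity minus negativity, and ignores objectivity. The helper names load_afinn and swn_polarity are hypothetical.

```python
# Illustrative excerpt in an AFINN-style layout: "term<TAB>score" per line.
# The values are examples for this sketch, not quoted from the real file.
AFINN_SAMPLE = "love\t3\ncare\t2\nhate\t-3\nbad\t-3\n"


def load_afinn(text):
    """Parse AFINN-style lines into a {term: integer score} dictionary."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last tab only, in case a term is a multi-word phrase.
        term, score = line.rsplit("\t", 1)
        lexicon[term] = int(score)
    return lexicon


def swn_polarity(pos_score, neg_score):
    """Collapse SentiWordNet positivity/negativity into one signed value.

    Objectivity (1 - positivity - negativity) carries no polarity,
    so this reduction ignores it.
    """
    return pos_score - neg_score


lexicon = load_afinn(AFINN_SAMPLE)
print(lexicon["love"], swn_polarity(0.625, 0.0))  # 3 0.625
```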
The polarity of a text or document depends on the polarity of the micro-phrases that compose it, and the polarity of each micro-phrase depends on the polarity of the individual terms that compose it. So, to find the polarity of a given text, we have to find the polarity of every term except the stopping words.
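This hierarchy (terms, then micro-phrases, then the whole text) can be sketched as follows. It is a minimal sketch that makes several assumptions: both levels use plain summation, LEXICON is a hypothetical hand-made AFINN-style dictionary, terms missing from the lexicon count as neutral (0), and the input is a list of micro-phrases with the stopping words already removed. The "Neutral" label for a score of exactly 0 is an addition for this sketch.

```python
# Hypothetical AFINN-style lexicon; values chosen for illustration only.
LEXICON = {"love": 3, "care": 2, "hate": -3, "bad": -3}


def term_polarity(term, lexicon):
    # Terms missing from the lexicon are treated as neutral (0).
    return lexicon.get(term, 0)


def phrase_polarity(phrase, lexicon):
    # A micro-phrase's polarity is the sum of its terms' polarities.
    return sum(term_polarity(t, lexicon) for t in phrase)


def text_polarity(phrases, lexicon):
    # The text's polarity is the sum of its micro-phrases' polarities.
    return sum(phrase_polarity(p, lexicon) for p in phrases)


def label(score):
    # Map the signed score to a class (Neutral is this sketch's assumption).
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


phrases = [["i", "love"], ["care", "for", "my", "family"]]
score = text_polarity(phrases, LEXICON)
print(score, label(score))  # 5 Positive
```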
The basic mathematical equation for calculating the polarity of a given text is given below.
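A minimal formulation consistent with the description above, assuming plain summation at both levels (other weightings and normalizations are possible), is the following. Let a text T be split into micro-phrases m_1, ..., m_k, let micro-phrase m_i contain the non-stopping terms t_{i,1}, ..., t_{i,n_i}, and let s(t) be the lexicon score of term t:

$$
S(m_i) = \sum_{j=1}^{n_i} s(t_{i,j}), \qquad S(T) = \sum_{i=1}^{k} S(m_i)
$$

T is then labelled Positive if S(T) > 0 and Negative if S(T) < 0. For "I love and care for my family", with illustrative scores s(love) = 3 and s(care) = 2 and all other terms scoring 0, this gives S(T) = 3 + 2 = 5, so the sentence is Positive.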

Conclusion:

Despite being the easiest technique to implement, the lexicon-based approach to sentiment analysis can still reach an accuracy of up to 70%. Note that the accuracy of a lexicon-based analyzer depends largely on the external resources it uses. To increase accuracy further, it is fairly common to use multiple lexical resources, or to combine the lexicon-based approach with a machine learning approach.
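One simple way to combine multiple lexical resources is sketched below. It assumes hypothetical per-term scores in the AFINN_STYLE and SWN_STYLE dictionaries. The AFINN-style integer is rescaled to [-1, +1] by dividing by 5, the SentiWordNet-style score is positivity minus negativity, and whichever scores are available are averaged. This is only one possible strategy; weighted combinations, or falling back from one resource to the other, are equally reasonable.

```python
# Hypothetical per-term scores from two resources, for illustration only.
AFINN_STYLE = {"love": 3, "hate": -3, "fine": 2}          # integers in [-5, +5]
SWN_STYLE = {"love": (0.625, 0.0), "hate": (0.0, 0.75)}  # (positivity, negativity)


def combined_score(term):
    """Average the available normalized scores; 0.0 if no resource knows the term."""
    scores = []
    if term in AFINN_STYLE:
        scores.append(AFINN_STYLE[term] / 5)  # rescale AFINN to [-1, +1]
    if term in SWN_STYLE:
        pos, neg = SWN_STYLE[term]
        scores.append(pos - neg)  # already within [-1, +1]
    return sum(scores) / len(scores) if scores else 0.0


for term in ["love", "hate", "fine", "table"]:
    print(term, combined_score(term))
```

Normalizing both resources to the same range before averaging keeps AFINN's larger integer scale from dominating the SentiWordNet values.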

Yogesh Mandge

Author & Editor

I am a Software Developer, Software Engineer, Computer Scientist, Coder, Programmer, and/or Computer Nerd, depending on who I am talking to, what I am talking about and what the purpose of conversation is.

