In general, sentiment analysis, also known as opinion mining, is the process of determining the emotional tone behind a series of words. It is used to understand the attitudes, opinions and emotions that a text expresses. Sentiment analysis uses natural language processing, text analysis and computational linguistics to identify and extract subjective information from a piece of text or a document.
At the most basic level, sentiment analysis identifies and classifies the polarity of a given document or text, that is, whether its overall nature is Positive or Negative. There are two broad methods for sentiment analysis:
- Supervised Learning Method [Machine Learning Based] - The supervised learning method uses statistical machine learning techniques to build a model from a large corpus of documents. A set of sample opinions, each labeled with its sentiment, forms the training data on which the model is built.
- Unsupervised Learning Method [Lexicon Based] - The unsupervised method relies on external lexical resources in which each term has an associated polarity score. The sentiment of the whole text depends on the sentiment of each term that composes it.
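As a minimal sketch of the lexicon-based idea in the second bullet, assuming simple regex tokenization and a tiny hypothetical lexicon (the terms and scores below are invented, not taken from SentiWordNet or AFINN), the score of a text can be taken as the sum of its terms' scores and then classified by its sign:

```python
import re

# Hypothetical toy lexicon: terms and scores are invented for illustration
# and do not come from any real lexical resource.
TOY_LEXICON = {"love": 3, "care": 2, "good": 3, "hate": -3, "bad": -3}


def text_polarity(text: str, lexicon: dict[str, int]) -> int:
    """Sum the polarity scores of all tokens found in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Terms missing from the lexicon contribute 0.
    return sum(lexicon.get(token, 0) for token in tokens)


def classify(score: int) -> str:
    """Map a summed score to a label; treating 0 as Neutral is an assumption."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


score = text_polarity("I love and care for my family.", TOY_LEXICON)
print(score, classify(score))  # 5 Positive
```

Summing raw scores tends to give longer texts larger magnitudes; dividing by the number of matched terms is one alternative.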
Methodology:
To understand the lexicon-based approach, you must first understand micro-phrases. Micro-phrases are formed whenever stopping words are encountered in a text; conjunctions, punctuation marks and adverbs are used as stopping words. For example:
I love and care for my family.
Here, "and" is treated as a stopping word, so it is removed and the sentence is decomposed into two sub-sentences, i.e. the micro-phrases "I love" and "care for my family".
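The decomposition above can be sketched as follows. The article names conjunctions, punctuation marks and adverbs as stopping words but does not list them, so the small `STOPPING_WORDS` set here is a hypothetical stand-in; a real system would use a much larger list.

```python
import re

# Hypothetical stop list: the exact stopping words are not given in the
# article, so this small set is an assumption for illustration only.
STOPPING_WORDS = {"and", "but", "or", "because", "so", "very", "really"}
PUNCTUATION = re.compile(r"[.,;:!?]")


def micro_phrases(sentence: str) -> list[str]:
    """Split a sentence into micro-phrases wherever a stopping word occurs."""
    phrases: list[str] = []
    current: list[str] = []
    # Turn each punctuation mark into a standalone "|" boundary token.
    for token in PUNCTUATION.sub(" | ", sentence).split():
        if token == "|" or token.lower() in STOPPING_WORDS:
            # A stopping word closes the current micro-phrase and is dropped.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


print(micro_phrases("I love and care for my family."))
# ['I love', 'care for my family']
```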
Next, we use an external lexical resource to obtain the polarity strength of each term. Numerous resources provide lists of terms with their associated polarities. The ones I have used are:
- SentiWordNet - SentiWordNet is a lexical resource for opinion mining containing roughly 117,000 terms (at the time of writing). It assigns to each synset of WordNet three sentiment scores: positivity, negativity and objectivity.
- AFINN - AFINN is also a lexical resource, containing roughly 2,500 terms (at the time of writing). It associates each term with a signed integer value from -5 (highly negative) to +5 (highly positive).
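To use both resources together, their scores first have to be brought onto a common scale. A minimal sketch, assuming a SentiWordNet triple is collapsed to positivity minus negativity and an AFINN value is divided by 5 (both are common but not the only choices, and the example inputs are invented rather than real lexicon entries):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentiWordNetScores:
    """Positivity, negativity and objectivity of one WordNet synset."""
    pos: float
    neg: float
    obj: float


def swn_polarity(scores: SentiWordNetScores) -> float:
    """Collapse the triple to pos - neg; since pos + neg + obj = 1, this lies in [-1, 1]."""
    return scores.pos - scores.neg


def afinn_polarity(value: int) -> float:
    """Rescale an AFINN integer from [-5, +5] to [-1, 1]."""
    if not -5 <= value <= 5:
        raise ValueError(f"AFINN values lie in [-5, 5], got {value}")
    return value / 5


# Invented example inputs, not real entries from either resource.
print(swn_polarity(SentiWordNetScores(pos=0.625, neg=0.0, obj=0.375)))  # 0.625
print(afinn_polarity(3))  # 0.6
```

SentiWordNet scores synsets rather than words, so a word with several senses still needs a rule for picking one sense or averaging over them; that choice is a separate design decision.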
The basic mathematical equation to calculate the polarity of a given text is given below.
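The equation itself is not reproduced in this copy of the article. As an assumption rather than a reconstruction of the author's formula, one simple lexicon-based formulation sums the polarity score s(t) of every term t over every micro-phrase m in the set M(T) of micro-phrases of a text T:

```latex
P(T) = \sum_{m \in M(T)} \sum_{t \in m} s(t),
\qquad
\mathrm{label}(T) =
\begin{cases}
\text{Positive} & \text{if } P(T) > 0,\\
\text{Negative} & \text{if } P(T) < 0,\\
\text{Neutral}  & \text{if } P(T) = 0 \quad \text{(assumed)}.
\end{cases}
```

Terms absent from the lexicon get s(t) = 0, and stopping words never contribute because they are removed when the micro-phrases are formed.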
Conclusion:
Despite being the easiest technique to implement, the lexicon-based approach to sentiment analysis can still reach accuracy levels of up to 70%. Note that the accuracy of a lexicon-based approach depends heavily on the external resources used. Using multiple lexical resources, and combining the lexicon-based approach with a machine learning approach, is fairly common as a way to further increase the accuracy of a sentiment analyzer.
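One simple way to use several lexical resources together, offered here as a sketch rather than the author's method, is to average a term's normalized score over the lexicons that contain it. Both lexicons below are hypothetical and are assumed to be already rescaled to [-1, 1]:

```python
# Hypothetical lexicons, assumed already rescaled to [-1, 1]; entries are invented.
LEXICON_A = {"love": 0.5, "care": 0.25, "awful": -0.5}
LEXICON_B = {"love": 0.75, "awful": -1.0, "fine": 0.25}


def combined_score(term: str, lexicons: list[dict[str, float]]) -> float:
    """Average the term's score over the lexicons that list it (0.0 if none do)."""
    scores = [lexicon[term] for lexicon in lexicons if term in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def combined_polarity(tokens: list[str], lexicons: list[dict[str, float]]) -> float:
    """Sum the averaged scores of all tokens."""
    return sum(combined_score(token, lexicons) for token in tokens)


lexicons = [LEXICON_A, LEXICON_B]
print(combined_score("love", lexicons))  # 0.625
print(combined_polarity(["i", "love", "care"], lexicons))  # 0.875
```

Averaging only over the lexicons that contain a term keeps it from being pulled toward 0 just because one resource lacks it; weighting each resource by how much you trust it is another option.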