Tuesday, February 12, 2013

Sentiment Analysis with RapidMiner

Sentiment analysis, or opinion mining, is an application of text analytics that identifies and extracts subjective information from source material.

A basic task in sentiment analysis is classifying the opinion expressed in a document, a sentence or an entity feature as positive or negative. This tutorial explains how to do sentiment analysis in RapidMiner. The example presented here takes a list of movies and labels the reviews of each one as Positive or Negative. The model is evaluated using precision and recall. Precision is the probability that a (randomly selected) retrieved document is relevant. Recall is the probability that a (randomly selected) relevant document is retrieved in a search. In other words, high recall means that an algorithm returned most of the relevant results, and high precision means that an algorithm returned more relevant results than irrelevant ones.
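To make the two definitions concrete, here is a minimal Python sketch that computes precision and recall for the Positive class from a list of true labels and a list of predicted labels. The function name `precision_recall` and the sample labels are made up for illustration; they are not taken from the RapidMiner process.

```python
def precision_recall(actual, predicted, positive="Positive"):
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    false_pos = sum(1 for a, p in pairs if a != positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)
    # Guard against division by zero when nothing was predicted or nothing is relevant.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical labels, for illustration only.
actual = ["Positive", "Positive", "Positive", "Negative", "Negative"]
predicted = ["Positive", "Positive", "Negative", "Positive", "Negative"]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here two of the three reviews predicted as Positive really are positive, and two of the three truly positive reviews were found, so both values come out to 2/3.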

First, both positive and negative reviews of a certain movie are collected. All of the words are stemmed to their root forms and stored under their polarity (positive or negative). Both a word vector list (wordlist) and a model are created. Then the required list of movies is given as input. The model compares every word in the reviews of the given movies with the words stored earlier under each polarity. The review is classified according to the polarity under which the majority of its words fall. For example, for Django Unchained, the reviews are compared with the wordlist created at the beginning. Most of the words fall under positive polarity, so the outcome is Positive. The same applies to a Negative outcome.
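The majority idea described above can be sketched in a few lines of Python. This is only an illustration of the intuition, not what RapidMiner runs internally (the process in this tutorial trains a classifier on the word vectors); the helper names `build_wordlists` and `classify` and the sample reviews are hypothetical.

```python
from collections import Counter


def build_wordlists(labeled_reviews):
    """Map each polarity to a Counter of the words seen in its reviews."""
    wordlists = {}
    for polarity, text in labeled_reviews:
        wordlists.setdefault(polarity, Counter()).update(text.lower().split())
    return wordlists


def classify(review, wordlists):
    """Pick the polarity whose wordlist matches the most words of the review."""
    words = review.lower().split()
    scores = {
        polarity: sum(1 for word in words if word in wordlist)
        for polarity, wordlist in wordlists.items()
    }
    return max(scores, key=scores.get), scores


# Made-up training reviews, one per polarity.
training = [
    ("Positive", "brilliant gripping superb acting"),
    ("Negative", "boring dull weak plot"),
]
wordlists = build_wordlists(training)

label, scores = classify("superb acting but dull ending", wordlists)
print(label, scores)
```

The sample review matches two positive words and one negative word, so it is labeled Positive.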
The first step in implementing this analysis is processing the documents, i.e. extracting the positive and negative reviews of a movie and storing them under separate polarities. The process is shown in Figure 1.

Figure 1

In the Process Documents operator, click Edit List on the right. Load the positive and negative reviews under separate class names, "Positive" and "Negative", as shown in Figure 2.

Figure 2

Inside the Process Documents operator, a nested subprocess tokenizes the text, filters out the stop words, stems the words to their roots, and keeps only the tokens between 4 and 25 characters long, as shown in Figure 3.

Figure 3
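For readers who want to see what these four nested steps do to a piece of text, here is a rough Python equivalent. It is a sketch under assumptions: the stop word list is a tiny made-up one, and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer, so the exact tokens will differ from what RapidMiner produces.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much larger).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "of", "in", "it", "this", "that", "with"}


def crude_stem(word):
    """Naive suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, min_len=4, max_len=25):
    """Tokenize, drop stop words, stem, then keep tokens of 4 to 25 characters."""
    tokens = re.findall(r"[a-z]+", text.lower())         # 1. tokenize on non-letters
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 2. filter stop words
    tokens = [crude_stem(t) for t in tokens]              # 3. stem
    return [t for t in tokens if min_len <= len(t) <= max_len]  # 4. filter by length


tokens = preprocess("The acting was amazing and the plot twisted")
print(tokens)
```

Note that the length filter runs after stemming, as in Figure 3, so a word like "acting" is stemmed to "act" and then dropped for being shorter than 4 characters.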

Next, two operators are used: Store and Validation, as shown in Figure 1. The Store operator writes the word vector list to a location of our choosing. The Validation operator (cross-validation) is a standard way to assess the accuracy and validity of a statistical model. The data set is divided into two parts, a training set and a test set. The model is trained on the training set only and its accuracy is evaluated on the test set. This is repeated n times.

Double-click the Validation operator. There are two panels, Training and Testing. In the Training panel, a linear Support Vector Machine (SVM) is used. It is a popular classifier whose decision function is a linear combination of all the input variables. To test the model, we use the ‘Apply Model’ operator to apply the model trained on the training set to our test set. To measure the model's accuracy we use the ‘Performance’ operator. The operations inside Validation are shown in Figure 4.

Figure 4
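The split, train, test and repeat loop that the Validation operator performs can be sketched in plain Python as follows. This is a minimal illustration of k-fold cross-validation, not RapidMiner's implementation: the function names are hypothetical, and a trivial majority-class baseline is plugged in where the tutorial uses a linear SVM, just to keep the example self-contained.

```python
from collections import Counter


def k_fold_indices(n_examples, k):
    """Yield (train_idx, test_idx) pairs; each example is tested exactly once."""
    folds = [list(range(i, n_examples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(examples, labels, train_fn, predict_fn, k=5):
    """Average accuracy of the learner given by train_fn/predict_fn over k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        # Train on the training part only.
        model = train_fn([examples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        # Evaluate on the held-out part.
        correct = sum(predict_fn(model, examples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Stand-in learner (NOT an SVM): always predicts the most common training label.
def train_majority(examples, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, example):
    return model


# Made-up data: 10 placeholder documents, 8 positive and 2 negative.
examples = list(range(10))
labels = ["Positive"] * 8 + ["Negative"] * 2

accuracy = cross_validate(examples, labels, train_majority, predict_majority, k=5)
print(f"mean accuracy over 5 folds: {accuracy:.2f}")
```

Any learner with a train function and a predict function can be dropped in place of the baseline; the point is that each example is used for testing exactly once and never appears in the training data of its own fold.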

Then run the process. The resulting class recall and precision percentages are shown in Figure 5. The model and the wordlist are stored in a repository.

Figure 5

Next, retrieve both the model and the wordlist from the repository where you stored them earlier. Connect the output of the Retrieve operator for the wordlist to the Process Documents operator, as shown in Figure 6. The operations inside Process Documents are the same as those shown in Figure 3.

Figure 6

Then select the Process Documents operator and click Edit List on the right. This time I added reviews of five movies from the Rotten Tomatoes website, stored in a directory. Assign the class name "unlabeled", as shown in Figure 7.

Figure 7

The Apply Model operator takes the model from a Retrieve operator and the unlabeled data from Process Documents as input and outputs the labeled data on the ‘lab’ port, so connect that to the ‘res’ (results) port. The result is shown in Figure 8. For Les Miserables, the model reports 86.4% confidence that the reviews are positive and 13.6% that they are negative, because the reviews match the wordlist under positive polarity more closely than the one under negative polarity.

Figure 8


  1. Hi,

    I could not get the expected output. I took a dataset from cornell site.
    Can you please elaborate the above process with screenshots (diagrams) at each and every step, at each node creation. That would be really helpful. You can also e-mail me a word or pdf document at annaji.sankar at gmail.com

