Tuesday, February 26, 2013

Stop-Word Removal


Stop-words are common words that carry little meaning in a retrieval system. They are a part of natural language that any text miner will encounter. Stop-words should be removed from a text because they make it heavier and obscure what is important for analysts; they are not necessary for the analysis, so eliminating them also gives some data reduction. A query that uses stop-words has a weak ability to categorize the text, because these words return every element of the data set as a result (Adsiz, 2006). In enterprise search, all stop-words, for example common words like "a" and "the", are removed from multiple-word queries to increase search performance.

There is no single master list of stop-words that all tools use. Any group of words can be chosen as the stop-words for a given purpose, depending on their importance and on data reduction needs. For some search engines, these are some of the most common short function words, such as "the", "is", "at", "which" and "on". In this case, stop-words can cause problems when searching for phrases that include them, particularly in names such as 'The Who', 'The The', or 'Take That'. Other search engines remove some of the most common words, including lexical words such as "want", from a query in order to improve performance (Stackoverflow, 2008).
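As a rough illustration of the filtering described above, here is a minimal Python sketch. The stop-word list, the tokenizer, and the function names are assumptions made for this example; they are not the list or method of any particular search engine.

```python
import re

# Illustrative stop-word list (an assumption; every tool ships its own list).
STOP_WORDS = {"a", "the", "is", "at", "which", "on"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the tokens of `text` that are not in `stop_words`."""
    return [token for token in tokenize(text) if token not in stop_words]


# An ordinary query keeps only its content words.
print(remove_stop_words("Which band is playing at the arena on Friday"))
# ['band', 'playing', 'arena', 'friday']

# Names built from stop-words lose most or all of their tokens.
print(remove_stop_words("The Who"))   # ['who']
print(remove_stop_words("The The"))   # []
```

The last two calls show the phrase problem: once the stop-words are stripped, a query for 'The The' is empty and a query for 'The Who' is reduced to a single common word.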



1- Adsiz, A. (2006). Text Mining (dissertation). Ahmet Yesevi University.
2- Stackoverflow. (2008). http://blog.stackoverflow.com/2008/12/podcast-32/
