Monday, January 28, 2013


An Extension of Text Mining: Quantitative Analysis
In a digital world, large volumes of data are always accumulating. Text mining plays a vital role in extracting the desired information from that data. Quantitative analysis only starts to make things easier once we work at a scale where it is impossible for a human reader to hold everything in memory. It requires a larger collection of data, because it needs more context in order to produce anything meaningful.

One of the most important aspects of quantitative analysis is the organization of documents. The categorization can be done in different ways, depending on the application and on how the data is initially presented. The first form is information retrieval, which retrieves the documents that match a query and is used extensively in search engines. The next form is supervised classification, which categorizes text with respect to pre-determined categories. An algorithm can be written to identify a combination of factors or ‘keywords’, such as genre, author and so on, and sort the data into the desired categories.

An unsupervised form of categorization instead subdivides a group of documents along more general lines, rather than into the pre-determined categories used above. This might reveal patterns the user is unaware of, and the outcomes can be favorable as well as unfavorable. For example, this approach is used by Amazon, where the page shows you “things recommended for you” using unsupervised clustering. A user may never specifically search for these products; they are suggested because the system's analysis indicates that the person might be interested in them. The system derives this similarity from the user's earlier search for a different product, matching certain keywords to a more general category.
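To make the idea of matching on keywords a little more concrete, here is a minimal sketch in Python. It is not how any real search engine or Amazon's recommender works; it only assumes that each document is reduced to word counts and compared with cosine similarity. The names `tokenize`, `cosine`, `rank` and the tiny `catalog` are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count Counters (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return the names of documents sharing words with the query, best first."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), name)
        for name, text in documents.items()
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical product descriptions, invented for this example.
catalog = {
    "tent": "lightweight camping tent for hiking trips",
    "stove": "portable camping stove for outdoor cooking",
    "novel": "a historical novel set in victorian london",
}

print(rank("camping gear", catalog))
```

The same `rank` function covers both cases described above: pass a typed query and it behaves like a crude retrieval step, pass the description of a product the user looked at earlier and it returns the items that look most similar.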
These techniques can achieve great results from only crude knowledge about the data. They go a step beyond word counting, hash functions and so on: simple statistical rules, applied to a large enough collection, can produce surprisingly useful results.
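One example of such a simple statistical rule is TF-IDF weighting. I am picking it only as an illustration, since the references do not single it out: a word counts for more when it is frequent in one document but rare across the collection. A rough sketch, with the function name `tf_idf` and the sample sentences being my own assumptions:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(documents):
    """Return one {word: weight} dict per document.

    weight = (share of the document taken up by the word)
             * log(number of documents / documents containing the word)
    """
    tokenized = [tokenize(d) for d in documents]
    n = len(tokenized)

    # Count in how many documents each word appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {w: (c / total) * math.log(n / doc_freq[w]) for w, c in counts.items()}
        )
    return weights


# Invented sample sentences, just to show the effect.
docs = [
    "the whale and the sea",
    "the garden and the house",
    "the whale hunt",
]

for doc_weights in tf_idf(docs):
    print(sorted(doc_weights.items(), key=lambda item: item[1], reverse=True))
```

A word like "the", which appears in every document, ends up with a weight of zero, while a word that appears in only one document rises to the top. That is already more informative than a plain word count.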
This is only one of the many features of quantitative analysis. It can be extended to topic modeling, entity extraction, contrasting vocabulary and many more topics. I hope to write about these as well once I understand them better and see how they can actually be implemented in simple applications. However, quantitative analysis requires some level of programming in order to be applied successfully. Python supports several of these techniques and could serve as a strong tool for quantitative analysis.
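As a small taste of what this looks like in Python, here is a rough sketch of contrasting vocabulary: finding the words that are relatively more common in one body of text than in another. This is only my own simplified take, using a ratio of smoothed word frequencies; the function name `contrast` and the two sample texts are assumptions for illustration, and serious work would use a proper statistical test.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def contrast(corpus_a, corpus_b, top=5):
    """Return the words most over-represented in corpus_a relative to corpus_b."""
    counts_a = Counter(tokenize(corpus_a))
    counts_b = Counter(tokenize(corpus_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    size = len(vocab)

    ratios = {}
    for word in vocab:
        # Add-one smoothing so a word missing from one corpus
        # does not cause a division by zero.
        rate_a = (counts_a[word] + 1) / (total_a + size)
        rate_b = (counts_b[word] + 1) / (total_b + size)
        ratios[word] = rate_a / rate_b

    return sorted(vocab, key=lambda w: ratios[w], reverse=True)[:top]


# Invented sample texts standing in for two collections of documents.
sea_stories = "ship sea ship storm the"
home_stories = "garden house the garden tea"

print(contrast(sea_stories, home_stories, top=3))
```

Words that both collections share at a similar rate, such as "the", get a ratio close to 1 and drop out of the top of the list, which leaves the vocabulary that actually distinguishes one collection from the other.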

References:
1. The Stone and the Shell - Historical questions raised by a quantitative approach to language
2. Text Mining and the ‘structuralist’ theories of culture
