An extension of text mining: Quantitative Analysis
In a digital world, large volumes of data accumulate constantly. Text mining
plays a vital role in extracting the desired information from this data.
Quantitative analysis starts to pay off only when we work at a scale where it
is impossible for a human reader to hold everything in memory. It requires a
large collection of data, because it needs more context in order to work. One
of the most important aspects of quantitative analysis is the organization of
documents. The categorization can be done in different ways, depending on the
application and the way the data is initially presented. Information retrieval
finds documents that match a query and is used extensively in the form of
search engines. The next form is supervised classification, which categorizes
text with respect to certain keywords. An algorithm can be written to identify
a combination of factors or ‘keywords’ that indicate properties such as genre
or author, and to sort the data into the desired categories.
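To make the supervised case a little more concrete, here is a minimal sketch in Python. It assumes a tiny hand-labeled training set and uses a basic Naive Bayes word-count model, which is just one common way to do keyword-driven classification. The genre labels, the sample sentences, and the helper names (`tokenize`, `train`, `classify`) are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count words per label and documents per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest Naive Bayes log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled examples, invented for this sketch.
training = [
    ("the detective found the murder weapon", "mystery"),
    ("a murder suspect and a clever detective", "mystery"),
    ("the spaceship landed on a distant planet", "scifi"),
    ("robots and a spaceship explore the galaxy", "scifi"),
]
model = train(training)
predicted = classify("the detective chased the suspect", *model)
print(predicted)
```

The point is only that the categories are fixed in advance and the program learns which words go with which category. A real application would need far more training text than this.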
An unsupervised form of categorization subdivides a group of documents along
more general lines rather than into pre-determined categories. This can
surface patterns the user was unaware of, with outcomes that may be favorable
or unfavorable. For example, Amazon uses this kind of feature when a page
shows you “things recommended for you” using unsupervised clustering. A user
may never specifically search for these products, but the system suggests them
based on its own analysis of what the person might be interested in. The
system derives this similarity from an earlier search by the user for a
different product, matching certain keywords to a more general category.
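The idea of matching keywords from an earlier search can be sketched roughly as below. This is not how Amazon actually does it, and it is not a full clustering algorithm. It is a simple nearest-neighbour ranking by cosine similarity over word counts, with no labels involved. The catalog, the product descriptions, and the `recommend` function are assumptions made up for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Turn a description into a word -> count mapping."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(searched, catalog, top_n=2):
    """Rank the other products by similarity to the one the user searched for."""
    query = bag_of_words(catalog[searched])
    scores = [
        (cosine_similarity(query, bag_of_words(desc)), name)
        for name, desc in catalog.items()
        if name != searched
    ]
    scores.sort(reverse=True)
    # Keep only products that share at least one word with the searched one.
    return [name for score, name in scores[:top_n] if score > 0]


# Hypothetical product catalog, invented for this sketch.
catalog = {
    "tent": "lightweight camping tent for hiking trips",
    "sleeping bag": "warm sleeping bag for camping and hiking",
    "stove": "portable camping stove",
    "blender": "kitchen blender with glass jar",
}
recommendations = recommend("tent", catalog)
print(recommendations)
```

Nobody told the program that tents and sleeping bags belong together. The grouping falls out of the shared vocabulary, which is also why it can sometimes suggest things that make little sense.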
These techniques can achieve strong results from only crude knowledge about
the data. They go a step beyond word counting, hash functions and so on.
Simple statistical rules can be put to great use when implementing
quantitative analysis.
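As one illustration of a simple statistical rule that goes beyond raw word counting, the sketch below ranks documents against a query using TF-IDF weighting. TF-IDF is my own choice of example here, not something the reference prescribes, and the sample documents and the `tf_idf_search` function are made up. The rule is that a word counts for more when it is frequent in one document but rare across the collection.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_search(query, documents):
    """Return indices of matching documents, best match first."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(tokens)          # term frequency
                idf = math.log(n_docs / doc_freq[term])  # rarity across the collection
                score += tf * idf
        scores.append((score, index))

    ranked = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in ranked if score > 0]


# Hypothetical mini collection, invented for this sketch.
documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell sharply",
]
results = tf_idf_search("the cat", documents)
print(results)
```

Note that "the" appears in every document, so its weight works out to zero and only "cat" affects the ranking. A plain word count would have treated both words the same.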
This is only one of the many features of quantitative analysis. It can be
extended to topic modeling, entity extraction, contrasting vocabulary and many
more topics. I hope to write about these as well once I get a better
understanding and see how they can actually be implemented in simple
applications. However, quantitative analysis requires some level of
programming in order to be carried out successfully. Python supports some of
these techniques and could be a strong tool for quantitative analysis.
References:
1. The Stone and the Shell - Historical questions raised by a quantitative approach to language