Wednesday, February 27, 2013

KNIME Software



KNIME, the Konstanz Information Miner, is an open source data analytics, reporting, and integration platform. It was developed by the Visual Data Mining research group at the University of Konstanz and is built on the Eclipse Rich Client Platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept. Through a graphical user interface, it lets users assemble nodes for data preprocessing (ETL: Extraction, Transformation, Loading), modeling, data analysis, and visualization. KNIME also provides a software development kit so that users can create their own modules.

                  

Figure 1- KNIME screen figure
Figure 2- Data Source Adding Objects

KNIME can be used on every operating system supported by the Eclipse platform. KNIME includes its own JRE (Java Runtime Environment), so Java does not need to be installed separately on the operating system. Since 2006, KNIME has been used mostly in pharmaceutical research, but it is also used in other areas such as CRM (customer data analysis), business intelligence, and financial data analysis.

Data Sources

KNIME can import data from text files (TXT), attribute-relation file format files (ARFF), and TABLE format files.
A useful feature when importing data is that KNIME lets the user define how much data is kept in memory and how much is kept on the hard disk. This reduces the chance of running out of memory when working with large data sets.
Furthermore, it supports importing data using SQL and using the Predictive Model Markup Language (PMML), which is based on XML.
In addition to importing data, KNIME has Data Write components that handle exporting data.
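
The memory/disk split described above can be illustrated with a small Python sketch. This is not KNIME's implementation or API. `SpillingTable` and `max_rows_in_memory` are hypothetical names, and the JSON-lines temp file is just one simple way to park rows on disk.

```python
import json
import tempfile


class SpillingTable:
    """Hypothetical row container (not a KNIME class): keeps a bounded
    number of rows in memory and writes the remaining rows to disk."""

    def __init__(self, max_rows_in_memory):
        self.max_rows_in_memory = max_rows_in_memory
        self.memory_rows = []
        self.disk_rows = 0
        # Temporary text file; it is removed automatically when closed.
        self._spill_file = tempfile.TemporaryFile(mode="w+", encoding="utf-8")

    def append(self, row):
        if len(self.memory_rows) < self.max_rows_in_memory:
            self.memory_rows.append(row)
        else:
            # Always write at the end of the file, one JSON row per line.
            self._spill_file.seek(0, 2)
            self._spill_file.write(json.dumps(row) + "\n")
            self.disk_rows += 1

    def __iter__(self):
        # Rows in memory come first, then the rows read back from disk.
        yield from self.memory_rows
        self._spill_file.flush()
        self._spill_file.seek(0)
        for line in self._spill_file:
            yield json.loads(line)

    def close(self):
        self._spill_file.close()


table = SpillingTable(max_rows_in_memory=2)
for i in range(5):
    table.append([i, i * i])
rows = list(table)  # all five rows, regardless of where they were stored
```

With `max_rows_in_memory=2`, only the first two rows occupy memory and the other three sit in the temporary file until they are read back, so lowering the limit trades speed for a smaller memory footprint.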


Data Preprocessing

KNIME does not have a dedicated component for preprocessing, but some of its algorithms can be used for data preprocessing.

Data Mining Algorithms

KNIME has most of the algorithms found in the data mining literature, such as Support Vector Machines, Bayes, and Multidimensional Scaling. In addition to these advanced algorithms, KNIME also supports statistical methods such as regression, correlation, and correlation filtering in the data stream design.
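
To make the idea of a correlation filter concrete, here is a minimal Python sketch. It is an illustration only and does not reproduce how KNIME's node decides which columns to keep. The greedy keep-the-first-column rule, the `threshold` default, and the function names are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat constant columns as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_filter(columns, threshold=0.9):
    """Greedy filter (an assumed strategy): keep a column only if its
    absolute correlation with every already kept column is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "height_cm": [150, 160, 170, 180],
    "height_mm": [1500, 1600, 1700, 1800],  # same information as height_cm
    "age": [30, 22, 41, 25],
}
selected = correlation_filter(columns)  # drops the redundant height_mm
```

Here `height_mm` is a rescaled copy of `height_cm`, so it is filtered out, while `age` is only weakly correlated with height and is kept.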
 
                                                          

Figure 3- KNIME Panel for the selection of the data mining algorithm
Figure 4- KNIME visualization tools


Data Streaming Design

A design in KNIME is built by dragging objects from the “node repository” panel onto the canvas. To connect two objects, the user clicks one object and then the other, which joins them with a binding line. The data stream diagram is built up by running each node separately. The green light at the bottom of a node comes on if that node runs without any error. After checking the nodes, the next step is configuration set-up, and then the model can be run. Note that if the green light on the previous node is not on, the next node cannot run.
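
The rule that a node can only run once the previous node shows a green light can be sketched in a few lines of Python. This is a toy model, not KNIME's API. The `Node` class, its `status` values, and the `run` method are hypothetical names chosen for illustration.

```python
class Node:
    """Hypothetical stand-in for a workflow node (not a KNIME class)."""

    def __init__(self, name, func, upstream=None):
        self.name = name
        self.func = func          # the work this node performs on its input
        self.upstream = upstream  # the node connected in front of this one
        self.status = "idle"      # "idle", "green" (ran without error), "error"
        self.output = None

    def run(self):
        # A node refuses to run unless the previous node is green.
        if self.upstream is not None and self.upstream.status != "green":
            raise RuntimeError(
                f"{self.name}: previous node '{self.upstream.name}' is not green"
            )
        data = None if self.upstream is None else self.upstream.output
        try:
            self.output = self.func(data)
            self.status = "green"
        except Exception:
            self.status = "error"
            raise
        return self.output


# A two-node stream: a reader feeding a sorter.
reader = Node("reader", lambda _: [3, 1, 2])
sorter = Node("sorter", sorted, upstream=reader)

reader.run()           # the reader turns green
result = sorter.run()  # now the sorter is allowed to run
```

Calling `sorter.run()` before `reader.run()` raises a `RuntimeError` in this sketch, which mirrors the behavior described above.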
                                                      
    
Figure 5- Data Stream Diagram
 
Visualization

KNIME is one of the richest data mining software packages when it comes to visualization. In addition to many visualization tools such as scatter plots, parallel coordinates, box plots, and histograms, it also provides very detailed Java-based visualization tools built on JFreeChart.

       
         
Figure 6- Scatter Plot Graph after running the K-means on KNIME
Figure 7- Result Table after running the K-means on KNIME





 

 
 

 

 
 
