KNIME, the Konstanz Information Miner, is an open source data analytics, reporting and
integration platform. KNIME was developed by the Visual Data Mining research group at the
University of Konstanz and is based on the Eclipse Rich Client Platform. Through its modular
data pipelining concept, KNIME integrates various components for machine learning and data
mining. Its graphical user interface lets users assemble nodes for data preprocessing
(ETL: Extraction, Transformation, Loading), for modeling, and for data analysis and
visualization. KNIME also provides a software development kit that allows users to create
their own modules.
KNIME runs on every operating system supported by the Eclipse platform. KNIME ships with
its own JRE (Java Runtime Environment), so Java does not need to be installed separately on
the operating system. Since 2006, KNIME has mainly been used in pharmaceutical research, but
it is also used in other areas such as CRM (customer data analysis), business intelligence
and financial data analysis.
Data Sources
KNIME can import data from text files (TXT), attribute-relation file format (ARFF) files
and TABLE format files.
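To show what an ARFF file contains, here is a minimal Python sketch of a reader. It is an assumption-laden simplification for illustration: it handles only the @relation, @attribute and @data sections with plain comma-separated values, and it ignores quoting, sparse rows and type conversion. It is not how KNIME's own ARFF reader is implemented.

```python
def read_arff(text):
    """Parse a minimal ARFF document into (relation, attribute names, rows).

    Simplified sketch: no quoted values, no sparse rows, values stay strings.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lowered = line.lower()
        if in_data:
            # Every line after @data is one comma-separated instance.
            rows.append([value.strip() for value in line.split(",")])
        elif lowered.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lowered.startswith("@attribute"):
            # "@attribute name type": keep only the attribute name
            attributes.append(line.split()[1])
        elif lowered.startswith("@data"):
            in_data = True
    return relation, attributes, rows


sample = """% toy weather data
@relation weather
@attribute outlook {sunny, rainy}
@attribute temperature numeric
@data
sunny,85
rainy,70
"""
relation, attributes, rows = read_arff(sample)
print(relation, attributes, rows)
```

The header (@attribute lines) describes the columns, while the @data section holds one instance per line, which is why a tool can build a typed table from an ARFF file without further configuration.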
A useful feature of KNIME's data import is that it lets users define how much data is kept
in memory and how much is kept on the hard disk. This reduces the chance of running out of
memory when working with large data sets.
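The memory/disk trade-off can be illustrated with a small Python sketch. The `SpillingTable` class and its `max_rows_in_memory` parameter are hypothetical names chosen for this example: the table keeps the first rows in a Python list and writes any further rows to a temporary CSV file on disk. KNIME's actual table storage works differently and is controlled through its own settings.

```python
import csv
import os
import tempfile


class SpillingTable:
    """Hypothetical table: the first rows stay in memory, the rest go to disk."""

    def __init__(self, max_rows_in_memory):
        self.max_rows_in_memory = max_rows_in_memory
        self.memory_rows = []
        fd, self.spill_path = tempfile.mkstemp(suffix=".csv")
        self._spill_file = os.fdopen(fd, "w", newline="")
        self._writer = csv.writer(self._spill_file)
        self.spilled_count = 0

    def append(self, row):
        # Values are stored as strings in both places so rows read back uniformly.
        if len(self.memory_rows) < self.max_rows_in_memory:
            self.memory_rows.append([str(value) for value in row])
        else:
            self._writer.writerow(row)
            self.spilled_count += 1

    def __iter__(self):
        # In-memory rows first, then the rows that were spilled to disk.
        yield from self.memory_rows
        self._spill_file.flush()
        with open(self.spill_path, newline="") as f:
            yield from csv.reader(f)

    def close(self):
        self._spill_file.close()
        os.remove(self.spill_path)


table = SpillingTable(max_rows_in_memory=2)
for i in range(5):
    table.append([i, i * i])
print(len(table.memory_rows), table.spilled_count)  # 2 3
rows = list(table)
table.close()
```

A smaller in-memory limit lowers memory use at the cost of slower disk reads, which is the same trade-off the KNIME setting lets users tune.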
Furthermore, it supports importing data from databases using SQL, as well as reading
Predictive Model Markup Language (PMML), an XML-based format for describing predictive models.
In addition to importing data, KNIME has data writer components for exporting results.
Data Preprocessing
KNIME does not have a dedicated preprocessing component, but some of its algorithms can be
used for data preprocessing.
Data Mining
Algorithms
KNIME includes most of the algorithms used in the data mining literature, such as Support
Vector Machines, Bayes and Multidimensional Scaling. In addition to these advanced
algorithms, KNIME also supports statistical methods such as regression, correlation and
correlation filtering within the data streaming design.
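As an illustration of what a correlation filter does, the Python sketch below computes the Pearson correlation between columns and drops any column whose absolute correlation with an already kept column exceeds a threshold. The greedy left-to-right strategy, the 0.9 threshold and the sample data are assumptions for this example; KNIME's correlation filter may decide which columns to keep differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Constant columns (zero variance) are not handled in this sketch.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_filter(columns, threshold=0.9):
    """Keep columns left to right, dropping any column whose |r| with an
    already kept column is above the threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical data: height_in is almost a rescaled copy of height_cm.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [38, 37, 42, 40, 44],
}
print(correlation_filter(data))  # ['height_cm', 'shoe_size']
```

Here `height_in` is removed because it carries almost the same information as `height_cm` (r close to 1), while `shoe_size` (r of about 0.83) stays below the threshold.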
Figure 3- KNIME panel for the selection of the data mining algorithm
Figure 4- KNIME visualization tools
Data Streaming Design
Workflows in KNIME are designed by dragging nodes from the “node repository” panel onto the
canvas. To connect two nodes, the user clicks one node and then the other, linking them with
connection lines. The data stream diagram is built up by running each node separately. The
green light at the bottom of a node turns on when that node has run without any error. After
checking the nodes, the next step is to set up their configuration, after which the model
can be run. Note that if the green light on the preceding node is not on, the next node
cannot run.
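The rule that a node can run only after its predecessors show a green light can be sketched as a tiny dependency-aware executor in Python. The `Node` class, its status strings and its `execute` method are hypothetical names for illustration; they do not correspond to KNIME's Java API.

```python
class Node:
    """Hypothetical workflow node; `status` mimics the node's traffic light."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func            # computes this node's output from its inputs' outputs
        self.inputs = list(inputs)  # predecessor nodes connected to this node
        self.status = "configured"
        self.output = None

    def execute(self):
        # A node may only run when every predecessor has executed successfully.
        if any(node.status != "executed" for node in self.inputs):
            raise RuntimeError(f"{self.name}: a predecessor has not executed")
        try:
            self.output = self.func(*(node.output for node in self.inputs))
            self.status = "executed"  # corresponds to the green light
        except Exception:
            self.status = "error"
            raise


reader = Node("Reader", lambda: [3, 1, 2])
sorter = Node("Sorter", sorted, inputs=[reader])

blocked = None
try:
    sorter.execute()  # fails: the Reader node has not run yet
except RuntimeError as err:
    blocked = str(err)

reader.execute()
sorter.execute()
print(blocked)                       # Sorter: a predecessor has not executed
print(sorter.status, sorter.output)  # executed [1, 2, 3]
```

Keeping the status on each node means a downstream node only has to inspect its direct inputs to decide whether it is allowed to run.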
Figure 5- Data Stream Diagram
Visualization
KNIME offers one of the richest sets of visualization tools among the data mining software
covered in the literature. In addition to standard tools such as scatter plots, parallel
coordinates, box plots and histograms, it also provides very detailed Java-based
visualization tools built on JFreeChart.
Figure 6- Scatter plot graph after running K-means on KNIME
Figure 7- Result table after running K-means on KNIME