Definition of a Cluster: A cluster is a set of two or more server nodes dedicated to keeping application services alive. The nodes communicate with each other through the cluster software (framework), test and probe the health status of the server nodes and their services, and use quorum-based decisions and switchover/failover techniques to keep the application services running on them available. That is, should a node that runs a service unexpectedly lose functionality or connectivity, the other nodes take over and run its services, so that availability is maintained. Providing availability while strictly maintaining a consistent cluster configuration is the main goal of a cluster.
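To make the quorum idea a bit more concrete, here is a minimal Python sketch. It assumes one vote per node and a simple "strict majority of all configured votes" rule; the `Node` and `Cluster` classes are hypothetical and are not the Solaris Cluster API. A group of surviving nodes that holds the majority may take over services from failed nodes, while a group without it must stay passive, which keeps two halves of a split cluster from running the same service at once.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    votes: int = 1      # assumption: every node contributes one quorum vote
    alive: bool = True


@dataclass
class Cluster:
    """Hypothetical model of a failover cluster, not a real cluster API."""
    nodes: list
    # service name -> Node currently running that service
    placement: dict = field(default_factory=dict)

    def total_votes(self) -> int:
        return sum(n.votes for n in self.nodes)

    def has_quorum(self, partition) -> bool:
        # A group of nodes may run services only if it holds a strict
        # majority of all configured votes (this prevents split brain).
        return 2 * sum(n.votes for n in partition) > self.total_votes()

    def fail_over(self) -> None:
        survivors = [n for n in self.nodes if n.alive]
        if not self.has_quorum(survivors):
            raise RuntimeError("no quorum: survivors must not start services")
        for service, node in list(self.placement.items()):
            if not node.alive:
                # simplistic policy: restart the service on the first survivor
                self.placement[service] = survivors[0]


if __name__ == "__main__":
    a, b, c = Node("node-a"), Node("node-b"), Node("node-c")
    cluster = Cluster([a, b, c], {"db": a})
    a.alive = False                      # node-a unexpectedly loses connection
    cluster.fail_over()
    print(cluster.placement["db"].name)  # node-b
```

With only two nodes and one vote each, a single survivor holds 1 of 2 votes, which is not a majority. This is why two-node clusters usually need an extra tie-breaking vote, such as a quorum device.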
At this point we have to add that this definition describes an HA cluster, a High-Availability cluster, where the cluster nodes are planned to run the services in an active-standby, or failover, fashion. An example could be a single-instance database. Some applications can also be run in a distributed or scalable fashion. In that case, instances of the application run actively on separate cluster nodes, serving service requests simultaneously. An example could be a web server that forwards connection requests to many backend servers in a round-robin way, or a database running in an active-active RAC setup.
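The scalable case can be illustrated with a minimal round-robin dispatcher in Python. This is a hypothetical sketch, not how any particular web server or load balancer implements it; the backend names and the `mark_down` health hook are assumptions. Each request goes to the next backend in turn, and backends marked unhealthy are skipped, so the remaining instances keep serving.

```python
class RoundRobinBalancer:
    """Hypothetical sketch: hand each request to the next healthy backend in turn."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._index = 0

    def mark_down(self, backend) -> None:
        # e.g. called after a failed health probe
        self.healthy.discard(backend)

    def pick(self) -> str:
        # try each backend at most once per call, starting where we left off
        for _ in range(len(self.backends)):
            backend = self.backends[self._index % len(self.backends)]
            self._index += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    print([lb.pick() for _ in range(4)])  # ['web-1', 'web-2', 'web-3', 'web-1']
    lb.mark_down("web-2")
    print([lb.pick() for _ in range(3)])  # ['web-3', 'web-1', 'web-3']
```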
Now, what is a cluster made of? Servers, right. These servers (the cluster nodes) need to communicate. This of course happens over the network, usually over dedicated network interfaces interconnecting all the cluster nodes. These connections are called interconnects.
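One thing the nodes typically exchange over the interconnects is heartbeats. The following Python sketch shows a simple timeout-based failure detector, assuming a hypothetical 10-second timeout and a `HeartbeatMonitor` class invented for illustration (real cluster frameworks use their own protocols and tunables). Because a heartbeat arriving on any interconnect refreshes the timestamp, losing one of several redundant interconnects does not make a peer look dead.

```python
import time


class HeartbeatMonitor:
    """Hypothetical failure detector: a peer is considered dead if no
    heartbeat has arrived from it within `timeout` seconds."""

    def __init__(self, peers, timeout=10.0, clock=time.monotonic):
        self.timeout = timeout      # assumed value, not a real product default
        self.clock = clock          # injectable clock makes the logic testable
        now = clock()
        self.last_seen = {peer: now for peer in peers}

    def heartbeat(self, peer) -> None:
        # call whenever a heartbeat from `peer` arrives on ANY interconnect
        self.last_seen[peer] = self.clock()

    def dead_peers(self) -> list:
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


if __name__ == "__main__":
    fake_now = [0.0]
    mon = HeartbeatMonitor(["node-b", "node-c"], clock=lambda: fake_now[0])
    fake_now[0] = 8.0
    mon.heartbeat("node-b")     # node-b is still talking to us
    fake_now[0] = 12.0
    print(mon.dead_peers())     # ['node-c']
```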
How many cluster nodes are in a cluster? There are different cluster topologies. The simplest one is the clustered pair topology, involving only two cluster nodes.
There are several more topologies; see the Solaris Cluster documentation for the full list.
Also, to answer the question: Solaris Cluster allows you to run up to 16 servers in a cluster.
Where should these cluster nodes be placed? A very important question. The right answer is: it depends on what you plan to achieve with the cluster. Do you only want to protect against a server outage? Then you can place the nodes right next to each other in the datacenter. Do you need to survive a datacenter outage? In that case you should place them at least in different fire zones, or in two geographically distant datacenters to protect against disasters like floods, large-scale fires, or power outages. We call this a stretched or campus cluster, with the cluster nodes being several kilometers away from each other. To cover really large distances, you probably need to move to a GeoCluster, which is a different kind of animal.
There are a number of problems with clustering in the data-analysis sense (grouping data items into clusters). Among them:
- current clustering techniques do not address all the requirements adequately (and concurrently);
- dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity;
- the effectiveness of the method depends on the definition of “distance” (for distance-based clustering);
- if an obvious distance measure doesn’t exist, we must “define” it, which is not always easy, especially in multi-dimensional spaces;
- the result of the clustering algorithm (which in many cases can itself be arbitrary) can be interpreted in different ways.
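The point about the definition of distance can be seen in a tiny Python example: the same point is assigned to a different centroid depending on whether Euclidean or Manhattan distance is used. This nearest-centroid assignment is the step a distance-based method such as k-means repeats for every data item. The centroids and the point are made-up values chosen only to show the effect.

```python
import math


def euclidean(p, q) -> float:
    return math.dist(p, q)


def manhattan(p, q) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest_centroid(point, centroids, distance) -> int:
    # index of the centroid closest to `point` under the chosen distance
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


if __name__ == "__main__":
    centroids = [(3.0, 0.0), (2.0, 2.0)]   # illustrative values only
    point = (0.0, 0.0)
    # Euclidean: 3.0 vs ~2.83 -> centroid 1; Manhattan: 3.0 vs 4.0 -> centroid 0
    print(nearest_centroid(point, centroids, euclidean))  # 1
    print(nearest_centroid(point, centroids, manhattan))  # 0
```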