In class, we discussed Principal Component Analysis (PCA) and
Singular Value Decomposition (SVD). While in class I understood (at least for
the most part) the actual linear algebra that we were doing, it was hard for me
to understand what it really had to do with Big Data and Data Mining. I did a
little research to understand what exactly these methods are useful for.
SVD:
SVD is applied in engineering, genetics, and physics. The
method is useful for several things, including computing low-rank
approximations of matrices and determining the range, null space, and rank of a
matrix, which we discussed in class. Singular values can also be used to
estimate how many components to keep. Because the squared singular values tell
us how much of the variance (for centered data) each component captures, we
choose a benchmark percentage of the variance that must be explained and keep
just enough components to reach it.
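To make the benchmark idea concrete for myself, here is a minimal sketch in Python with NumPy. The helper name `choose_rank`, the 90% threshold, and the random data are all my own assumptions for illustration, not something from class. It keeps the smallest number of components whose squared singular values add up to the chosen share of the total, then rebuilds a low-rank approximation of the matrix.

```python
import numpy as np

def choose_rank(singular_values, threshold=0.90):
    """Smallest k whose top-k squared singular values reach `threshold` of the total."""
    energy = singular_values ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    # First index where the cumulative share reaches the threshold, as a count
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(singular_values))

rng = np.random.default_rng(0)

# Made-up data: 100 samples, 5 features, built from only 2 hidden directions plus tiny noise
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 5))

# Center the columns so squared singular values correspond to variance
X_centered = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
k = choose_rank(s, threshold=0.90)

# Rank-k approximation built from the top k singular triplets
X_approx = (U[:, :k] * s[:k]) @ Vt[:k, :]

print("components kept:", k)
print("share of variance explained:", (s[:k] ** 2).sum() / (s ** 2).sum())
```

Since the made-up data only has two real directions in it, the sketch should keep at most two components, even though the matrix has five columns.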
PCA:
PCA is useful for data exploration, visualizing data,
compressing data, and outlier detection. PCA is a mathematical procedure that
uses an orthogonal transformation to re-express a data set in a
new coordinate system whose axes are uncorrelated. The method identifies the
most important gradients, meaning the directions along which the data vary the
most. PCA reduces the dimensionality of the data while
keeping as much of the variation, and so most of the information, as possible.
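Here is a small sketch of how I understand the mechanics, again in Python with NumPy. The function name `pca_via_svd` and the random data are assumptions of mine for illustration. As far as I can tell, one common way to compute PCA is to center the data and take its SVD: the rows of `Vt` give the new coordinate axes, and the squared singular values divided by n - 1 give the variance along each axis. The last few lines cross-check those variances against the eigenvalues of the covariance matrix.

```python
import numpy as np

def pca_via_svd(X, n_components):
    """PCA of the rows of X, computed from the SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]              # orthonormal directions (new axes)
    scores = X_centered @ components.T          # data expressed in the new coordinates
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

rng = np.random.default_rng(1)

# Made-up data: 200 samples with 4 correlated features
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

scores, components, var_svd = pca_via_svd(X, n_components=2)

# Cross-check: the largest eigenvalues of the covariance matrix
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]         # sorted largest first

print("reduced shape:", scores.shape)
print("variances from SVD:       ", var_svd)
print("variances from covariance:", eigvals[:2])
```

The four original columns are reduced to two, and the two sets of variances should agree up to rounding.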
I’ve been trying to look up information about these two methods
for a while now, and I am still having a little trouble understanding them. I
would like to be able to make a direct comparison between SVD and PCA, but it
seems like even the “experts” in this field have trouble making it clear. If
anyone could help me understand these two topics better, I would love to hear
what you have to say about it.
In the meantime, I will put both SVD and PCA in the realm of
“showing off” the most important data in the set, like a sculptor preserving
what is relevant when creating a sculpture. A sculptor has to decide which
details to keep and which to “chip away”. Some information is discarded as
irrelevant. Slowly, an image appears from a massive block of data. The
remaining information will hopefully represent the original data reasonably well.
If you have never watched Numb3rs, I highly recommend it. It
is a crime-fighting show that relies on math to solve the crimes (nerdy,
I know). It is on Netflix and so much of it corresponds to things that we all
know at least a little bit about. I wanted to do a whole blog post relating
different Numb3rs clips to what we have been learning about, but this is the
only one that I could find that was semi-relevant right now. Don’t be surprised
if you see more clips from the show in another one of my posts.
Brianna,
Very interesting post. Thank you for making the reference to Numb3rs as well.
Fadel