Sunday, March 3, 2013

SVD, PCA, and Numb3rs


In class, we discussed Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). While in class I understood (at least for the most part) the actual linear algebra that we were doing, it was hard for me to see what it really had to do with Big Data and Data Mining. I did a little research to understand what exactly these methods are useful for.

SVD:
SVD is applied in engineering, genetics, and physics. It is important for computing approximations of matrices and for determining the range, null space, and rank of a matrix, all of which we discussed in class. The singular values can also be used to estimate how many components to keep: because the squared singular values (of mean-centered data) measure how much of the variance each component explains, we choose a benchmark percentage of the variation that must be explained and keep just enough components to reach it.
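To make these uses a little more concrete for myself, here is a minimal NumPy sketch. The matrix is made up for illustration (its third column is the sum of the first two, so its rank is 2), and the helper names `numerical_rank`, `rank_k_approx`, and `components_for_variance` are my own hypothetical names, not library functions. The 90% threshold is just an example benchmark.

```python
import numpy as np

def numerical_rank(s, shape):
    """Count singular values above a floating-point tolerance
    (the same default rule NumPy's matrix_rank uses)."""
    tol = s.max() * max(shape) * np.finfo(s.dtype).eps
    return int(np.sum(s > tol))

def rank_k_approx(U, s, Vt, k):
    """Keep only the k largest singular values and their singular vectors."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def components_for_variance(s, threshold=0.90):
    """Smallest k whose squared singular values reach `threshold` of the total."""
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))

# Hypothetical example matrix: column 3 = column 1 + column 2, so the rank is 2.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0],
              [2.0, 1.0, 3.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = numerical_rank(s, A.shape)
range_basis = U[:, :r]   # orthonormal basis for the range (column space)
null_basis = Vt[r:].T    # orthonormal basis for the null space

# Best rank-1 approximation; its Frobenius error equals the
# root-sum-square of the discarded singular values (Eckart-Young).
A1 = rank_k_approx(U, s, Vt, 1)
err1 = np.linalg.norm(A - A1)

# For "variance explained", center the columns before taking the SVD.
s_centered = np.linalg.svd(A - A.mean(axis=0), compute_uv=False)
k = components_for_variance(s_centered, 0.90)

print(r)                                            # 2
print(np.allclose(A @ null_basis, 0))               # True
print(np.allclose(rank_k_approx(U, s, Vt, r), A))   # True
print(np.isclose(err1, np.sqrt(np.sum(s[1:]**2))))  # True
```

Keeping all `r` nonzero singular values rebuilds the matrix exactly, while keeping fewer gives a smaller approximation whose error is exactly what was thrown away.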

PCA:
PCA is useful for data exploration, visualizing data, compressing data, and outlier detection. PCA uses an orthogonal transformation to change a set of possibly correlated variables into a new coordinate system whose axes (the principal components) are uncorrelated. The first component points in the direction of greatest variance, the second in the direction of greatest remaining variance, and so on, which is why PCA is considered able to identify the most important gradients in the data. PCA reduces the dimensionality of the data while keeping most of the information and maintaining as much variation as possible.
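Here is a small sketch of how those uses fit together, assuming NumPy and PCA computed through the SVD of mean-centered data (one common way to do it, not the only one). The data are synthetic and made up for illustration: three features where most of the variation lies along one direction, plus one planted outlier in row 0. The function name `pca` is a hypothetical helper.

```python
import numpy as np

def pca(X, k):
    """PCA of X (rows = samples, columns = features) computed via SVD."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # PCA works on mean-centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # orthonormal principal directions
    scores = Xc @ components.T                   # coordinates in the new system
    variance = s**2 / (X.shape[0] - 1)           # variance along each direction
    ratio = variance / variance.sum()            # fraction of total variance
    return mean, components, scores, variance, ratio[:k]

# Synthetic data (assumption for illustration): one dominant direction, tiny noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), 0.1 * rng.normal(size=200)])
X[0] = [0.0, 0.0, 3.0]                           # planted outlier

mean, comps, scores, variance, ratio = pca(X, k=1)

# Compression: store one number per row instead of three, then reconstruct.
X_hat = mean + scores @ comps
recon_error = np.linalg.norm(X - X_hat, axis=1)  # large error -> candidate outlier

# Link between the two methods: squared singular values of the centered data,
# divided by n - 1, are the eigenvalues of the covariance matrix.
cov_eigs = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

print(ratio[0] > 0.9)                  # True: one component explains most variance
print(int(np.argmax(recon_error)))     # 0: the planted outlier
print(np.allclose(variance, cov_eigs)) # True
```

The last line is the closest thing to a direct comparison that I have found: PCA can be computed by running SVD on the centered data, with the right singular vectors serving as the principal directions.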



I’ve been trying to look up information about these two methods for a while now, and I am still having a little trouble understanding them. I would like to be able to make a direct comparison between SVD and PCA, but it seems like even the “experts” in this field have trouble making it clear. If anyone could help me understand these two topics better, I would love to hear what you have to say about it.

In the meantime, I will put both SVD and PCA in the realm of “showing off” the most important data in the set. It is like a sculptor preserving what is relevant when creating a sculpture. A sculptor has to decide which details to keep and which to “chip away at”. Some information is discarded as irrelevant. Slowly, an image appears from a massive block of data. The remaining information will hopefully represent the data to a certain degree.




If you have never watched Numb3rs, I highly recommend it. It is a crime-fighting show that uses math to solve crimes (nerdy, I know). It is on Netflix, and so much of it corresponds to things that we all know at least a little bit about. I wanted to do a whole blog post relating different Numb3rs clips to what we have been learning about, but this is the only one that I could find that was semi-relevant right now. Don’t be surprised if you see more clips from the show in another one of my posts.




1 comment:

  1. Brianna,

    Very interesting post. Thank you for making the reference to Numb3rs as well.

    Fadel
