Thursday, February 14, 2013

Importance of Big Data

Realizing that I had probably procrastinated on blogging long enough, I began hunting for relevant articles. One that caught my eye was “Big Data’s Big Problem: Little Talent,” published in the Wall Street Journal on April 26, 2012.

The article puts the whole big data challenge back into perspective. It acknowledges how powerful big data is but, as the title suggests, points out that there isn’t enough talent to take advantage of it. According to an article from 2011, this talent encompasses expertise in statistics and machine learning, coupled with managers and analysts who can act on the insights learned from the data. Data science is what fills this gap.

Hilary Mason, chief scientist behind, says that a data scientist must possess three skills:
1. The ability to mathematically model the data set and understand those models.
2. The engineering skills required to actually carry out the first.
3. The ability to find the appropriate insights and tell stories with the data.

Asking the right questions and having a deep understanding of the business are what make the last skill the most challenging. Finding hidden nuggets in the data isn’t enough for a data scientist. Turning those nuggets into actions is what matters most, as Donald Rumsfeld would say.

Pat Gelsinger, president and chief operating officer of EMC Corp., helps one realize the scope of big data and its impact: “Thirty years ago we didn’t have computer-science departments; now every quality school on the planet has a CS department. Now nobody has a data-science department; in 30 years every school on the planet will have one.”

Given how ubiquitous computer science is today, imagine the world Gelsinger envisions 30 years from now. His statement rekindled my interest in big data and made me want to be more proactive about it. An article like this, especially for those of us in the field, brings things back to the bigger picture: how immature this new field still is and how broad its scope is. That perspective brings back some of the excitement.

I found this article particularly interesting because I went from avoiding reading and blogging to having multiple tabs on big data open and waiting for me. I hope this post and the article help others out there as they helped me.

  1. The theme behind “Importance of Big Data” is a very important issue now and will remain so in the near future. The skill set needed from a “data scientist” is a unique blend of engineering and computer science backgrounds. GCN* describes “data scientist” as a broad term akin to “doctor,” yet everyone knows there is a great difference between a cardiologist and a neurologist. The network-centric world we have been living in required a specific skill set that differs from the one needed in the data-rich environment of today and the future. There are different types of data, and there is a need to aggregate this data across structured and unstructured data repositories. GCN feels that the biggest threat to productivity will not be a lack of useful data but the difficulty of compiling it across different lines of business, which could include medical and dental histories, among others. A recent study estimated a shortage of approximately 140,000 to 190,000 qualified workers with big data experience by 2018.

    GCN reported that the “Big Data” specialty fields require a wide array of experience with different kinds of data, including structured data, pre-text data, video, and imaging. The growth of big data programs in academia alone is not enough to meet the demand. Organizations in the DoD and the public sector should consider training people internally to help analyze and manage big data sets. Such on-the-job training would prepare employees for their organization’s specific needs, tailored to whatever kind of data analytics it requires.