Sunday, March 24, 2013

Open Source Solutions

By the end of 2012, companies had spent $4.3 billion on Big Data technologies. A TechCrunch article claims that an estimated $232 billion will be spent on these technologies over the next five years.
There are over 250,000 viable open source solutions available. TechCrunch did the research and testing and presents the newest class of tools that people should not overlook.

Storm and Kafka
Storm is a “distributed real-time computation system.” It does for real-time processing what Hadoop did for batch processing. Kafka serves as the foundation for an activity stream and the data processing pipeline behind it. Together, the two deliver the stream in real time and scale linearly.

Benefits of pairing the two:
  • Handles velocities of tens of thousands of messages per second
  • Superior approach to ETL and data integration
  • Great in-memory analytics and real-time decision support
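
To make the pairing concrete, here is a minimal sketch in plain Python of the pattern described above: an append-only message log that a consumer reads by offset, feeding a processing step that keeps running counts. It does not use the real Kafka or Storm APIs; the names Topic, publish, read_from, and count_bolt are hypothetical and exist only for illustration.

```python
from collections import Counter


class Topic:
    """Hypothetical stand-in for a Kafka topic: an append-only log read by offset."""

    def __init__(self):
        self._log = []

    def publish(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset of the message just written

    def read_from(self, offset):
        # Return every message at or after the given offset.
        return self._log[offset:]


def count_bolt(events, counts):
    """Hypothetical Storm-style processing step: update running page counts."""
    for event in events:
        counts[event["page"]] += 1
    return counts


topic = Topic()
for page in ["home", "cart", "home", "checkout", "home"]:
    topic.publish({"page": page})

counts = Counter()
offset = 0

# First poll: process everything published so far, then advance the offset.
batch = topic.read_from(offset)
count_bolt(batch, counts)
offset += len(batch)

# A new event arrives; the next poll only sees messages after the offset.
topic.publish({"page": "cart"})
batch = topic.read_from(offset)
count_bolt(batch, counts)
offset += len(batch)

print(counts.most_common())
```

Because the consumer tracks its own offset, the counts stay current as new events arrive without reprocessing old ones. In a real deployment the log would be partitioned across machines and the processing step would run in parallel, which is where the linear scaling comes from.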

Drill and Dremel
Drill is the open source version of what Google has done with Dremel. Both make large-scale, ad hoc querying of data possible, much like Hadoop. Data scientists speculate that Drill and Dremel may actually be better than Hadoop in the wider sense, and perhaps even a replacement for it.

R
R is an open source statistical programming language that is quickly becoming the new standard for statistics. With its strong community and daily innovation, it has become one of the best places to be in Big Data right now. Pairing it with Hadoop is a good way to future-proof your Big Data program.

Gremlin and Giraph
Gremlin and Giraph, paired with graph databases, empower graph analysis, which approaches data through the connections between records rather than through relational tables. Gremlin and Giraph are open source alternatives to Google’s Pregel.
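
For readers unfamiliar with the Pregel model, the following is a minimal sketch of its vertex-centric idea in plain Python: in each superstep, a vertex reads its incoming messages, updates its value, and sends messages to its neighbours only if something changed. This is not Giraph's or Gremlin's API; the function pregel_max and the sample graph are hypothetical and chosen only to illustrate the pattern.

```python
def pregel_max(graph, values):
    """Spread the largest vertex value through the graph, one superstep at a time.

    graph maps each vertex to a list of its neighbours;
    values maps each vertex to its starting number.
    """
    values = dict(values)

    # Superstep 0: every vertex sends its value to all of its neighbours.
    inbox = {vertex: [] for vertex in graph}
    for vertex, neighbours in graph.items():
        for neighbour in neighbours:
            inbox[neighbour].append(values[vertex])

    # Keep running supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox = {vertex: [] for vertex in graph}
        for vertex, messages in inbox.items():
            if messages and max(messages) > values[vertex]:
                # The vertex learned a larger value: adopt it and tell neighbours.
                values[vertex] = max(messages)
                for neighbour in graph[vertex]:
                    outbox[neighbour].append(values[vertex])
            # Otherwise the vertex stays quiet, which is its vote to halt.
        inbox = outbox

    return values


# A small chain a - b - c - d with arbitrary starting values.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
start = {"a": 3, "b": 6, "c": 2, "d": 1}

result = pregel_max(graph, start)
print(result)
```

Each vertex only ever looks at its own value and its own messages, so the work can be split across many machines. That locality is what sets this style of graph analysis apart from answering the same question with repeated relational joins.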

The article mentions SAP HANA as well, but upon further investigation, it isn’t a true open source solution.



  1. There is no doubt that open source solutions for big data analysis are powerful alternatives to closed source software and can save companies quite a bit of money. However, in certain fields where security is a priority, any software used by employees must undergo rigorous security certification.
    Pro-Open Source:
    A major benefit of open source software is that the code is available for anyone to examine, so the larger number of eyes on it can expose potential security issues earlier. Most software security professionals will tell you that the worst kind of security loophole is one that a hacker knows about but the IT people do not. The increased visibility of the code and the higher number of contributors can potentially decrease the time it takes to patch the software. This considerably narrows the window in which hackers can take advantage of any flaws in the program.
    Pro-Closed Source:
    As opposed to open source software, the code for closed source software is rarely seen by its consumers. This can have both positive and negative effects. On one hand, the fact that the code is hidden from the public can delay the discovery of potential loopholes. On the other hand, there are fewer “honest” eyes examining the code, which can increase the chance of security issues going unnoticed. To counteract this problem, large companies that produce closed source software often have thorough, documented quality processes as well as the cash to back them up.
    Both open and closed source have their pros and cons. Open source code can potentially have many more programmers troubleshooting and improving it, while closed source software can pay for the support of highly talented programmers and bring a trusted reputation to security-minded customers.

  2. Alex and Sam,

    Brianna posted this article a few days earlier. Can you please highlight the value added in terms of this post? I know you are not addressing the same software necessarily, but the idea is similar. It would be very interesting to see more discussion in this area.