By the end of 2012, companies had spent $4.3 billion on Big Data technologies, and the article estimates that another $232 billion will be spent on them over the next five years.
There are over 250,000 viable open source solutions available. TechCrunch did the research and testing and has presented the newest class of tools that people should not overlook.
Storm and Kafka
Storm is a “distributed real-time computation system.” It does for real-time processing what Hadoop did for batch processing. Kafka serves as the foundation of the activity stream and the data-processing pipeline behind it. Together, they deliver the stream in real time at linear scale.
Benefits of pairing the two:
- Handles velocities of tens of thousands of messages per second
- Superior approach to ETL and data integration
- Great in-memory analytics and real-time decision support
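The pairing works because Kafka feeds messages into Storm, which processes them continuously through a "topology" of small processing steps. The sketch below illustrates that idea with Storm's canonical word-count example in pure Python; the names `spout`, `split_bolt`, and `count_bolt` mirror Storm terminology but are illustrative assumptions, not the Storm or Kafka API.

```python
# Minimal sketch of a Storm-style word-count topology, simulated in
# pure Python (no Kafka or Storm dependency).
from collections import Counter
from typing import Iterable, Iterator

def spout(messages: Iterable[str]) -> Iterator[str]:
    """Emits raw messages one at a time, like a Kafka-fed spout."""
    yield from messages

def split_bolt(stream: Iterator[str]) -> Iterator[str]:
    """Splits each message into words, emitting one tuple per word."""
    for message in stream:
        yield from message.lower().split()

def count_bolt(stream: Iterator[str]) -> Counter:
    """Keeps a running count per word, like a fields-grouped bolt."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

messages = ["the quick brown fox", "the lazy dog"]
counts = count_bolt(split_bolt(spout(messages)))
print(counts["the"])  # 2
```

In a real deployment each bolt runs on many workers in parallel, which is what lets the pair scale to tens of thousands of messages per second.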
Drill and Dremel
Drill is the open source version of what Google has done with Dremel. They make large-scale, ad hoc querying of data possible, much like Hadoop. Data scientists speculate that Drill and Dremel may actually be better than Hadoop in the broader sense, perhaps even a replacement.
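What sets the Dremel style apart is interactive, schema-free querying of nested data, without first defining tables or running a batch job. The sketch below simulates that in pure Python over nested JSON records; the records and the `region`/`sales` fields are made-up examples, not a Drill API.

```python
# Sketch of the schema-free, ad hoc query style Drill/Dremel enable,
# simulated over nested JSON in pure Python.
import json

raw = """
[{"region": "west", "order": {"sales": 120}},
 {"region": "east", "order": {"sales": 80}},
 {"region": "west", "order": {"sales": 50}}]
"""
records = json.loads(raw)

# Equivalent in spirit to the SQL Drill would accept directly:
#   SELECT SUM(t.`order`.sales) FROM t WHERE t.region = 'west'
west_total = sum(r["order"]["sales"]
                 for r in records
                 if r["region"] == "west")
print(west_total)  # 170
```

Drill runs queries like this interactively over raw files at large scale, which is where the "better than Hadoop for ad hoc work" speculation comes from.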
R
R is an open source statistical programming language that is quickly becoming the new standard for statistics. With its strong community and daily innovation, R is one of the best places to be in Big Data right now. Pairing it with Hadoop is a good way to future-proof a Big Data program.
Gremlin and Giraph
Gremlin and Giraph, paired with graph databases, enable graph analysis, an approach distinct from the relational model. Both are open source alternatives to Google’s Pregel.
The article also mentions SAP HANA, but on closer inspection it is not a true open source solution.