Thursday, February 21, 2013

Problems Associated with Moving Big Data: Ship It Overnight?



In the world of modern communications and interconnectivity, the amount of data that is collected and stored has grown dramatically. In recent years the cost of large-scale data storage has fallen sharply, and as a result companies are more open to storing large amounts of data for longer periods of time. Additionally, it has become big business to store data on users and then sell the information in aggregate to companies that find trends in the data for marketing and sales organizations.

The following figures were published by DOMO, a company that specializes in data mining, or what it and many in its industry call “business intelligence”. Every minute of every day:

  • YouTube will see 48 hours of new video uploaded

  • Google will process over 2,000,000 search queries

  • Instagram users will share 3,600 new photos

These figures all involve user-generated or social data. But large organizations such as corporations and banks also generate large amounts of data on things like transactions, emails, and employee records. Much of this data must be kept for long periods for regulatory compliance. For these reasons, businesses now have to think about data in terms of terabytes, not just gigabytes.

The problem with keeping and maintaining large stockpiles of data is the movement and availability of that data. A business that needs to move 1TB of data between locations could tie up a large share of its internet bandwidth if it tried to move the files online. Over a T-1 line, the transfer would take about 82 days. Over a T-3 line, it would still take about 3 days.
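The arithmetic behind these estimates can be sketched in a few lines of Python. The 82-day and 3-day figures are reproduced if one assumes the nominal line rates (1.544 Mbps for T-1, 44.736 Mbps for T-3), a terabyte counted as 2^40 bytes, and roughly 80% of the line being usable. Those assumptions are not stated with the figures above, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes, line_bits_per_sec, utilization=0.8):
    """Estimate the days needed to push size_bytes over a network link.

    utilization is the assumed fraction of the line rate that is usable.
    """
    usable_bits_per_sec = line_bits_per_sec * utilization
    seconds = size_bytes * 8 / usable_bits_per_sec
    return seconds / SECONDS_PER_DAY


ONE_TB = 2 ** 40    # assumption: 1 TB counted as 2^40 bytes
T1_BPS = 1.544e6    # nominal T-1 line rate in bits per second
T3_BPS = 44.736e6   # nominal T-3 line rate in bits per second

print(f"T-1: {transfer_days(ONE_TB, T1_BPS):.1f} days")
print(f"T-3: {transfer_days(ONE_TB, T3_BPS):.1f} days")
```

Under these assumptions the estimate comes out at a little over 82 days for the T-1 line and just under 3 days for the T-3 line.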

The answer to this problem is simple: don’t try to move the files over a network connection. Instead, ship the data overnight. It is easier to connect a hard drive to a computer, copy the files to the drive at the drive’s write speed (which is generally very fast), and then ship the drive to the destination. There, plug it into the destination computer and copy the files off.
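For a rough sense of how long the shipped drive takes door to door, here is a minimal sketch. The sustained drive speed of 100 MB/s and the one day in transit are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def shipping_days(size_bytes, drive_bytes_per_sec=100e6, transit_days=1.0):
    """Estimate total days: copy onto the drive, ship it, copy it off again."""
    copy_days = size_bytes / drive_bytes_per_sec / SECONDS_PER_DAY
    # One copy at the source, one at the destination, plus time in transit.
    return copy_days + transit_days + copy_days


ONE_TB = 2 ** 40  # assumption: 1 TB counted as 2^40 bytes
print(f"Shipped drive: {shipping_days(ONE_TB):.2f} days")
```

With these numbers the whole trip takes a little over a day, and most of that time is spent with the courier rather than copying files.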

Banks have been using this method for years because of concerns about moving large amounts of data over vulnerable network connections. They believe it is easier to encrypt a drive and send it by secured courier, which reduces the chance that the information will be stolen.

This method has gained a lot of traction. Amazon Web Services now allows you to use this method to upload information to its servers. Amazon actually recommends that most S3 users consider this option when they need to upload 100GB or more and have only a T-1 connection.

References:



http://www.domo.com/company

3 comments:

  1. Joshua,

    This is very interesting. I actually think that this might be a very good business model for a company. Imagine a branch of UPS or FedEx that specializes in this area.

    It is also important to consider how the data will be encrypted, so that no data is exposed if the shipment is intercepted. The use of hash functions and other data security methods would be taken to a new level.

    Fadel

  2. Another challenge associated with big data is managing it, storing it, and providing access to it, all in a timely manner. Every company and every individual is continually generating new data that must be saved. The problem is that the cost of storage isn't going down as fast as we'd hoped. Solid-state drives are faster, but they also cost too much. A cheaper alternative would be to use hybrid drives to save money within a data center, but even this hardware has limited capabilities. Other companies may outsource data storage to cloud providers, but even that approach has its limitations. The fact is, SSDs and HDDs have limits to what they can accomplish. Infrastructure will just have to keep up with the ever-increasing demand that big data poses. But until we create some new technology to address the problems of big data entirely, storage and analysis will continue to be a challenge.

  3. Reference: http://www.colocationamerica.com/blog/dealing-with-big-data-can-we-keep-up.htm
