Big Data has so much potential to shed light on new ideas and insights that many people have flocked to it as some sort of catch-all. But Big Data has its problems as well, mostly when the people studying it forget that correlation does not always equal causation. For example, a study of Hurricane Sandy-related Twitter and Foursquare data (research paper) produced some expected findings, such as packed grocery stores the night before the storm hit. This collection of data does not fully represent what occurred over that period, though. The majority of the Twitter data came from Manhattan, which is densely populated and has high smartphone ownership. This would make one think that Manhattan was the area most affected by the storm, but that is not true. Flooding caused extended power outages, so people's smartphone batteries died and they could not tweet. This is what happened in some of the harder-hit areas, like Coney Island. This is referred to as a "Signal Problem": no signal comes from certain areas or communities because of particular factors, so those areas are missing from the data.
Another example of this "Signal Problem" is an app used by the City of Boston to fix potholes. The phone app uses accelerometer and GPS data to passively detect potholes around the city. But this data only provides part of the picture. The method cannot detect potholes in areas of the city with low smartphone ownership, such as lower-income areas and areas with a large elderly population. Big Data can tell us so much about many of the problems we face today, but we have to remember that it is not the entire picture. We have to consider which areas are being left out of the data and close these gaps.