Monday, April 8, 2013

Comment on "Use OpenHeatMap to introduce the world to your data" from April 5th



Once again Mr. Shaw has found a simple yet useful program on the internet. OpenHeatMap is free software that displays your data in a visually striking way. If you have data that is tied to geography, such as unemployment rates or election results, this is a great program: it plots your data on an extremely detailed, zoomable world map. It can even be used on data that is not obviously geographic but may reveal interesting spatial patterns, such as where the data comes from.
I decided to gather the top 100 football recruits from the Rivals.com rankings for each of the past three years, and I wanted to create a map showing where these recruits came from. While building this visualization I discovered a few helpful tricks and tools. First, besides location, the program can use the size and color of the markers to display secondary information, so I mapped each player's overall rank to dot size. By default, the higher-numbered ranks (e.g., 95th or 99th) got the largest dots, when in reality the lower-numbered ranks (e.g., 3rd or 5th) are the best players in the country, and I wanted the map to express that. It turns out you can reverse the ordering in the chart-editing section so that rank runs from 100 to 1 instead of 1 to 100, which fixed the problem. Another great feature is the program's ability to show changes over time. I have included a map of each year below.

[Maps of recruit locations for each of the three years]

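The rank-reversal trick above can also be handled when preparing the upload file. Here is a minimal sketch in Python, using a few invented recruits for illustration (the hometowns, ranks, and the exact column names OpenHeatMap expects are assumptions, not taken from the real data set):

```python
import csv

# Hypothetical sample of Rivals.com top-100 recruits; names and values are
# invented for illustration only.
recruits = [
    {"hometown": "Fort Lauderdale, FL", "rank": 1,  "year": 2012},
    {"hometown": "Scottsdale, AZ",      "rank": 5,  "year": 2012},
    {"hometown": "Olney, MD",           "rank": 97, "year": 2012},
]

# A mapping tool that sizes dots by the magnitude of the value column would
# give rank 1 (the best player) the smallest dot. Inverting the rank
# (101 - rank) reproduces the reversed ordering described above.
with open("recruits.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["address", "value", "time"])  # assumed column layout
    for r in recruits:
        writer.writerow([r["hometown"], 101 - r["rank"], r["year"]])
```

With this inversion the top-ranked player carries the largest value, so the best recruits show up as the biggest dots without touching the chart-editing settings.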
I found a few key insights from my visualization. Most of the recruits come from the Southeast, the West Coast, and the mid-Atlantic regions of the United States. The best recruits were spread fairly evenly across these three main regions, and the year did not have a significant impact on the geographical distribution of the recruits.
There are a couple more useful features in this program that I have not mentioned. One nice touch is that mousing over any point on the map displays all the information for that data point. Another is that clicking the play button in the lower-left corner of the display runs through your data chronologically. At first the data played too quickly, but I was able to adjust the speed and slow it down; after that, the visualization ran perfectly.
With more time, more years of data could be gathered to build a better understanding. If there were a dynamic shift in where recruits come from, it could signal a shift in the general population, and being able to see that shift in a visualization created with a program like this would be very powerful.

Big Data on a Smart Grid



Could Big Data lower your power bill?

                As shown throughout this blog, big data analytics is shaping the future of many parts of our society. With healthcare, space exploration, and social media leading the way in advancing the software behind big data analytics, other areas will no doubt reap the benefits as well. The next market for big data to impact is home energy. Opower, a privately held software-as-a-service company, promotes energy efficiency by teaming with utility providers around the world. Opower is known for its optimization reports on home energy consumption, which it delivers by mail, email, and text message to its more than 15 million residential customers. However, the big data demands of the smart grid (https://www.greentechmedia.com/articles/read/versant-nosql-and-the-smart-grid-big-data-challenge) are driving Opower to develop a new software platform built on the open-source big data tool Apache Hadoop (http://www.greentechmedia.com/articles/read/opower-takes-on-big-data-for-home-energy). This platform will support mobile phones, smart thermostats (http://www.greentechmedia.com/articles/read/opower-adds-utility-customers-tests-smart-thermostats), and other in-home devices.

                Opower can now analyze existing home energy data, incoming smart meter data, consumer behavior data, weather data, and many other disparate pieces of information. After compiling these data sets, it offers the homeowner efficiency tips, utility rebate offers, and other suggestions. Opower has also announced a partnership with Cloudera (http://finance.yahoo.com/news/cloudera-energizes-opowers-big-data-165700779.html), a Hadoop software and services provider, which will let Opower expand the analytics it can provide to customers. Opower will now be able to compare thousands of different homes' smart meter reads to find tiny fluctuations that could indicate when a particular home is over-heating or over-cooling at certain thermostat set-points. Other smart grid systems have also been using big data to build their data platforms and better serve the customer (examples: https://www.greentechmedia.com/articles/read/dell-and-osisoft-build-smart-grid-platform-for-synchrophasor-big-data, http://news.cnet.com/8301-13846_3-10393259-62.html, https://www.greentechmedia.com/articles/read/autogrid-universal-big-data-plus-apps-platform-for-the-smart-grid). Opower understands the potential of big data in the home energy arena (https://www.greentechmedia.com/articles/read/10-trends-to-watch-in-the-soft-grid) and has raised roughly $65 million to date.
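To get a feel for the kind of cross-home comparison described above, here is a toy sketch in Python. This is not Opower's actual algorithm, and the meter readings and threshold are invented for illustration; the idea is just to flag homes whose usage is far above the peer median under comparable conditions:

```python
import statistics

# Hypothetical hourly smart-meter reads (kWh) for several homes at the same
# outdoor temperature and thermostat set-point; not real utility data.
meter_reads = {
    "home_a": 1.1,
    "home_b": 1.3,
    "home_c": 1.2,
    "home_d": 3.4,   # a likely over-cooling outlier
    "home_e": 1.0,
}

def flag_outliers(reads, threshold=2.0):
    """Flag homes using more than `threshold` times the peer median,
    a crude stand-in for the cross-home comparison described above."""
    median = statistics.median(reads.values())
    return [home for home, kwh in reads.items() if kwh > threshold * median]

print(flag_outliers(meter_reads))  # singles out home_d
```

A real system would control for weather, home size, and time of day before comparing, but the peer-baseline idea is the same.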

                Large-scale enterprise software-as-a-service is becoming a reality for energy suppliers across the country. Karen Austin, SVP and chief information officer at Pacific Gas & Electric, has discussed investments in fault location, isolation, and service restoration (FLISR), which have helped reduce outages by 70 percent in some areas (http://www.greentechmedia.com/articles/read/flisr-when-an-hour-outage-becomes-two-minutes/). FLISR applications can use decentralized, substation, or control-center intelligence to locate and isolate a fault, then reconfigure the circuit and restore power to its healthy sections (http://www.greentechmedia.com/articles/read/flisr-of-the-future-tiering-reliability-to-meet-consumer-needs/). With utilities investing significant capital in infrastructure, communications, and software for the distribution grid, GTM Research has published an in-depth analysis of the requirements, technologies, and strategies ushering in the new age of distribution automation (DA) (http://www.greentechmedia.com/articles/read/flisr-of-the-future-tiering-reliability-to-meet-consumer-needs/).
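The locate-isolate-restore sequence is easier to picture with a toy model. The sketch below assumes a single radial feeder with a normally-open tie switch to a neighboring feeder at the far end; the section names and the whole topology are invented for illustration, not a real FLISR implementation:

```python
# A toy radial feeder: sections fed in order from the substation, with a
# normally-open tie switch to a neighboring feeder after the last section.
SECTIONS = ["s1", "s2", "s3", "s4"]

def flisr(faulted_section):
    """Isolate the faulted section, then report which healthy sections can
    be restored: upstream ones from the substation, downstream ones by
    closing the tie switch and back-feeding from the neighboring feeder."""
    i = SECTIONS.index(faulted_section)
    return {
        "isolated": faulted_section,
        "restored_from_substation": SECTIONS[:i],
        "restored_via_tie_switch": SECTIONS[i + 1:],
    }

result = flisr("s3")
# s1 and s2 stay on the substation, s4 is back-fed through the tie switch,
# and only s3 remains de-energized while crews make repairs.
```

This is why an hour-long outage for a whole feeder can shrink to minutes for everyone outside the faulted section.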

                As more energy companies and businesses invest in big data analytics, consumers should become more aware of the inefficiencies and waste in their everyday energy consumption. As we use less energy and become more efficient, that should help drive down rates across all of our energy options.



Stream Data


After taking the big data class, we have learned some fairly cool and useful concepts and skills. This blog post brings up a new concept called "stream data". Yes, as you can imagine, stream data refers to data that flows into and out of a system like a stream. Stream data is normally vast in volume, changes dynamically, and contains multi-dimensional features. Typical examples include audio and video recordings of engineering processes, computer network traffic, web click streams, and satellite data feeds. Like anything new, this kind of data brings new challenges: it cannot be handled by traditional database systems, and most systems can only read a data stream in sequential order. This poses great challenges for the effective mining of stream data.

To date, the basic techniques for stream data mining consist of sampling, load shedding, sketching, synopsis data structures, and clustering. Progress has been made on efficient approaches for mining frequent patterns in data streams, multidimensional analysis of stream data (such as construction of stream cubes), stream data classification, stream clustering, stream outlier analysis, rare-event detection, and so on. The basic idea is to build single-scan algorithms that gather information from stream data using tilted time windows and limited aggregation, and by exploring micro-clustering and approximation.
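Sampling is the simplest of these basic techniques, and reservoir sampling (Algorithm R) is the classic single-scan way to do it: it keeps a uniform random sample of k items from a stream of unknown length without ever storing the whole stream. A minimal sketch:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown
    length in a single pass (Algorithm R)."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = random.randint(0, i)     # inclusive on both ends
            if j < k:
                sample[j] = item         # replace with probability k/(i+1)
    return sample

sample = reservoir_sample(range(1_000_000), 10)
```

Each item ends up in the sample with the same probability k/n, yet the algorithm uses only O(k) memory and reads the stream once, in order, which is exactly the constraint stream systems impose.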

Recently, the focus of stream pattern analysis has shifted to approximating frequency counts over unbounded stream data. Algorithms have been developed to count frequencies using tilted time windows, based on the observation that users are most interested in the most recent transactions; to approximate frequency counts incrementally from historical data; and to track the most frequent k items in continuously arriving data.
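One well-known algorithm in this family is Misra-Gries, which tracks candidate frequent items in a single pass using only O(k) counters. The sketch below is a standard textbook version, with a made-up stream for illustration:

```python
def misra_gries(stream, k):
    """Track up to k-1 candidate frequent items in one pass (Misra-Gries).
    Any item occurring more than n/k times in a stream of n items is
    guaranteed to survive as a candidate."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a"]
candidates = misra_gries(stream, k=3)
# "a" appears 5 of 9 times (> n/3), so it must survive as a candidate.
```

The counts it returns are underestimates, so a second pass (or an accepted error bound) is needed to confirm exact frequencies, but the memory footprint never grows with the stream.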


Stream data typically arises in science and engineering applications, where it is essential to mine it with application-specific approaches, e.g., real-time anomaly detection in computer network analysis, electric power grid supervision, weather modeling, engineering, and security surveillance.
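As a concrete taste of real-time anomaly detection, here is a minimal sketch that flags readings far from the running mean, using Welford's one-pass update so no history is stored. The readings and threshold are invented for illustration (they loosely resemble grid-frequency measurements), and real detectors are far more sophisticated:

```python
import math

class StreamingAnomalyDetector:
    """Flag readings more than z_threshold standard deviations from the
    running mean, maintained with Welford's single-pass update."""
    def __init__(self, z_threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0              # running sum of squared deviations
        self.z_threshold = z_threshold

    def update(self, x):
        """Return True if x looks anomalous, then fold it into the stats."""
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.z_threshold:
                anomalous = True
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = StreamingAnomalyDetector()
readings = [60.0, 60.1, 59.9, 60.0, 60.1, 59.9, 72.5]  # last one is a spike
flags = [detector.update(r) for r in readings]
# Only the final spike is flagged.
```

Because the detector keeps just three numbers, it scales to the unbounded, sequential-read streams described above.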
Reference: course textbook