Monday, April 8, 2013

Stream Data


After taking the big data class, we’ve learned some fairly cool and useful concepts and skills. This post introduces a new concept: “stream data”. As the name suggests, stream data is data that flows into and out of a system like a stream. It usually arrives in vast volumes, changes dynamically, and has multi-dimensional features. Typical examples include audio and video recordings of engineering processes, computer network traffic, web click streams, and satellite data feeds. As with any new kind of data, it brings new challenges: traditional database systems cannot handle it, and most systems can only read a data stream once, in sequential order. Finding effective ways to mine stream data is therefore a real challenge.

To date, the basic techniques for stream data mining include sampling, load shedding, sketching, synopsis data structures, and clustering. Progress has been made on efficient approaches for mining frequent patterns in data streams, multidimensional analysis of stream data (such as the construction of stream cubes), stream data classification, stream clustering, stream outlier analysis, rare event detection, and so on. The basic idea is to build single-scan algorithms that gather information from stream data using tilted time windows, limited aggregation, micro-clustering, and approximation.
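To make the idea of a single-scan algorithm concrete, here is a minimal Python sketch of one classic sampling technique, reservoir sampling (Vitter's Algorithm R). It keeps a uniform random sample of k items from a stream of unknown length, reading each item only once and storing only k of them. The function name reservoir_sample and its parameters are my own illustrative choices, not something taken from the textbook.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to k items in a single pass."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1): pick a slot uniformly
            # from 0..i (inclusive); only slots 0..k-1 are in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Five items drawn uniformly from a million, with memory for only five.
    print(reservoir_sample(range(1_000_000), 5, random.Random(42)))
```

The i-th item (counting from 0) replaces a random slot with probability k/(i+1), which is what keeps every item seen so far equally likely to be in the sample, no matter how long the stream turns out to be.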

Recently, the focus of stream pattern analysis has shifted to approximating frequency counts over infinite stream data. Algorithms have been developed that count frequencies using tilted time windows, based on the observation that users are usually more interested in the most recent transactions; that use approximate frequency counts over historical data to compute frequent patterns incrementally; and that track the k most frequent items in continuously arriving data.
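Lossy Counting (Manku and Motwani) is one well-known single-pass algorithm for approximate frequency counting, and a minimal Python sketch of it follows. The stream is split into buckets of width ceil(1/epsilon); each tracked item stores a count f and a maximum possible undercount delta, and at every bucket boundary any item whose f + delta is no larger than the current bucket id is dropped. Every stored count is then at most epsilon * N below the true count, while using far less memory than exact counting. The function names and the epsilon/support parameters are illustrative; this is not necessarily the exact variant presented in the textbook.

```python
from typing import Dict, Hashable, Iterable, List, Tuple
import math


def lossy_count(stream: Iterable[Hashable],
                epsilon: float) -> Tuple[Dict[Hashable, Tuple[int, int]], int]:
    """One pass of Lossy Counting.

    Returns (entries, n): entries maps item -> (f, delta), where f is the
    counted frequency and delta bounds how many earlier occurrences may have
    been missed; n is the number of items read.
    """
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must be in (0, 1)")
    width = math.ceil(1 / epsilon)           # bucket width w = ceil(1/epsilon)
    entries: Dict[Hashable, List[int]] = {}  # item -> [f, delta]
    n = 0
    for item in stream:
        n += 1
        bucket = (n + width - 1) // width    # id of the current bucket
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, bucket - 1]
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            for key in [k for k, (f, d) in entries.items() if f + d <= bucket]:
                del entries[key]
    return {k: (f, d) for k, (f, d) in entries.items()}, n


def frequent_items(stream: Iterable[Hashable], support: float,
                   epsilon: float) -> List[Tuple[Hashable, int]]:
    """Items with counted frequency >= (support - epsilon) * N, most frequent first."""
    if not epsilon < support:
        raise ValueError("epsilon should be smaller than support")
    entries, n = lossy_count(stream, epsilon)
    threshold = (support - epsilon) * n
    hits = [(item, f) for item, (f, _) in entries.items() if f >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Because the result is sorted by count, taking its first k entries gives a rough approximate top-k list, in the spirit of tracking the k most frequent items.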

 

Stream data typically arises in science and engineering applications. In these applications it is essential to mine stream data and to develop application-specific approaches, e.g., real-time anomaly detection in computer network analysis, electric power grid supervision, weather modeling, and engineering and security surveillance.
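As a very small illustration of real-time anomaly detection on a numeric stream, the sketch below keeps a running mean and variance with Welford's online algorithm and flags any reading more than `threshold` standard deviations from the mean seen so far. The class name, the threshold of 3, and the warm-up length are illustrative assumptions; real network or power-grid monitoring uses considerably richer models.

```python
import math


class StreamingZScore:
    """Flag outliers in a numeric stream using a running z-score.

    Mean and variance are maintained incrementally with Welford's algorithm,
    so each reading is processed once in O(1) time and memory.
    """

    def __init__(self, threshold: float = 3.0, warmup: int = 10) -> None:
        if warmup < 2:
            raise ValueError("warmup must be at least 2 to estimate variance")
        self.threshold = threshold  # illustrative choice, not from the source
        self.warmup = warmup        # readings to see before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Score x against the history, then fold it into the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update of the running mean and M2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly
```

Feeding it the readings 10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 10.4, 50.0, 10.1 flags only the 50.0. Every reading, including a flagged one, is folded into the statistics, so a burst of anomalies gradually widens the band; a more careful detector might exclude flagged points or compute statistics over a recent window instead.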
Reference: textbook
