After taking the big data class, we’ve learned a number of
cool and useful concepts and skills. This blog post introduces a new
concept called “stream data”. As we can imagine, stream data refers to
data that flows into and out of a system like a stream. Stream data is
typically vast in volume, changes dynamically and contains multi-dimensional
features. Typical examples include audio and video recordings of engineering
processes, computer network information flows, web click streams, and satellite
data flows. As with anything new, this kind of data brings new challenges: it
cannot be handled by traditional database systems, and most systems can read a
data stream only once, in sequential order. Finding effective ways to mine
stream data is therefore a real challenge.
To date, the basic techniques for stream data
mining consist of sampling, load shedding, sketching techniques, synopsis data
structures and clustering. Progress has been made on efficient
approaches for mining frequent patterns in data streams, multidimensional analysis
of stream data (such as the construction of stream cubes), stream data
classification, stream clustering, stream outlier analysis, rare event
detection, and so on. The basic idea is to build single-scan algorithms that gather
information from stream data in tilted time windows (where recent data are kept
at a fine granularity and older data at a coarser one), using limited aggregation,
micro-clustering and approximation.
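Sampling is the simplest of these single-scan techniques. As an illustration (not taken from the textbook), here is a minimal Python sketch of reservoir sampling (Vitter's Algorithm R), which keeps a uniform random sample of `k` items from a stream whose length is not known in advance while looking at each item exactly once. The function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of k items from a stream, in one pass."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) enters with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, calling `reservoir_sample(clicks, k=1000)` on a hypothetical `clicks` iterator would keep 1,000 clicks for later analysis using memory proportional only to `k`, no matter how long the stream runs.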
Recently, the focus of stream pattern analysis has
shifted to approximating frequency counts over infinite stream data.
Algorithms have been developed that count frequencies using tilted time windows,
based on the fact that users are more interested in the most recent transactions;
that use approximate frequency counting over previous historical data to compute
the frequent patterns incrementally; and that track the k most frequent items in
continuously arriving data.
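One well-known approximate-counting algorithm of this kind is Lossy Counting (Manku and Motwani). The Python sketch below is a simplified, single-item version written for illustration; the class name `LossyCounter`, its methods, and the example `epsilon` are assumptions, not something the textbook prescribes. It splits the stream into buckets of width ⌈1/ε⌉ and, at each bucket boundary, drops items whose counts are too small to matter, so after N items every stored count is at most εN below the true count.

```python
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Simplified Lossy Counting for single items (after Manku & Motwani)."""

    def __init__(self, epsilon: float) -> None:
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = -(-1 // epsilon)  # placeholder, replaced below
        self.width = int(-(-1.0 // epsilon))  # bucket width w = ceil(1 / epsilon)
        self.n = 0  # number of items seen so far
        # item -> [stored count f, maximum possible undercount delta]
        self.entries: Dict[Hashable, List[int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = (self.n + self.width - 1) // self.width  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have appeared in earlier buckets and been pruned,
            # so record (bucket - 1) as its maximum possible undercount.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # At each bucket boundary, drop items that cannot be frequent.
            self.entries = {
                key: fd for key, fd in self.entries.items() if fd[0] + fd[1] > bucket
            }

    def frequent(self, support: float) -> List[Tuple[Hashable, int]]:
        """Items with stored count >= (support - epsilon) * n, most frequent first."""
        threshold = (support - self.epsilon) * self.n
        hits = [(key, fd[0]) for key, fd in self.entries.items() if fd[0] >= threshold]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)

    def top_k(self, k: int) -> List[Tuple[Hashable, int]]:
        """Approximate top-k: the k items with the largest stored counts."""
        ranked = sorted(self.entries.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(key, fd[0]) for key, fd in ranked[:k]]
```

With `support` s, `frequent(s)` reports every item whose true frequency is at least sN and no item whose true frequency is below (s − ε)N, while keeping only a small fraction of the distinct items in memory. The `top_k` method is only a heuristic on top of these approximate counts.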
Stream data typically arises in science and engineering applications. It is
essential to perform stream data mining in these applications and to develop
application-specific approaches, e.g., real-time anomaly detection in computer
network analysis, electric power grid supervision, weather modeling,
engineering and security surveillance.
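Real-time anomaly detection has to work under the same single-pass constraint. As a deliberately simple, hypothetical example (real applications would use domain-specific models), the Python sketch below flags a reading whose z-score against the running mean and standard deviation exceeds a threshold, and it updates those statistics with Welford's one-pass algorithm. The class name `StreamingZScore` and the defaults `threshold=3.0` and `warmup=30` are assumptions.

```python
import math


class StreamingZScore:
    """Flag values far from the running mean, using O(1) memory per stream."""

    def __init__(self, threshold: float = 3.0, warmup: int = 30) -> None:
        self.threshold = threshold  # standard deviations that count as anomalous
        self.warmup = max(warmup, 2)  # values to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Score x against the statistics so far, then fold it in.

        Returns True if x is flagged as anomalous.
        """
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            # A constant history (std == 0) gives no scale, so nothing is flagged.
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update of the running mean and sum of squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly
```

Because flagged values are still folded into the running statistics, a long burst of anomalies will gradually raise the baseline; a production detector might exclude them or compute the statistics over a sliding or tilted time window instead.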
Reference:
textbook