Analytics and Visualization of Big Data: Mining data for discovery of high productivity process characteristics.

A data-driven approach has been widely used to study trends in customer or market behavior in industrial sectors, finance, retail, and services. Recently, mining data warehouses has attracted attention in the biotechnology sector because of the rapid expansion of genomics-based data. The growth of biologics manufacturing also presents an area for data mining that is yet to be explored.

Today’s manufacturing facilities are advanced and highly automated in both operation and data acquisition. Thousands of process parameters are constantly acquired and stored electronically. Fluctuations in process productivity and product quality invariably occur during production. Understanding the root cause of these abnormalities and increasing process robustness would have major economic implications for the product. Mining bioprocess data to identify the parameters that may cause process fluctuations has considerable potential for enhancing productivity and process efficiency.

Past studies have employed many techniques to explore bioprocess data. Principal component analysis (PCA), partial least squares (PLS), and unsupervised clustering have been proposed to analyze and monitor bioprocesses. A decision-tree-based classification approach was proposed to identify the process trends that best differentiate high-productivity runs from low-productivity runs. Artificial neural networks (ANNs) are also a popular tool for modeling non-linear interactions in temporal process data. Despite these attempts, mining huge volumes of production-scale process data and implementing such schemes online remain tedious.

Bioprocess data sets are unique in that the frequency of measurement varies from parameter to parameter. In addition to temporal measurements of viability, cell densities, and consumption and production rates of nutrients and metabolites, a large number of process parameters are commonly recorded. The complexities associated with the vast and unique characteristics of bioprocess data present substantial challenges as well as opportunities for the data mining process. The data mining steps involve applying descriptive and predictive pattern recognition methods to discover significant changes in the data. The identified models can then be interpreted by process experts to gain further insights for process improvement.
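Because measurement frequency differs between parameters, one common first step is to bring every series onto a shared time grid before any pattern recognition is applied. The sketch below is only an illustration of that idea, not the method used in the referenced study: it assumes linear interpolation is acceptable, and the parameter names, time points, and values are made up for the example.

```python
from bisect import bisect_left

def interpolate(times, values, t):
    """Linearly interpolate a series at time t, clamping outside the measured range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_left(times, t)  # first index with times[j] >= t
    t0, t1 = times[j - 1], times[j]
    v0, v1 = values[j - 1], values[j]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def align(series, grid):
    """Resample every (times, values) series onto the same time grid."""
    return {
        name: [interpolate(times, values, t) for t in grid]
        for name, (times, values) in series.items()
    }

# Hypothetical run: cell density sampled daily, pH logged every 12 hours.
series = {
    "viable_cell_density": ([0, 24, 48], [1.0, 3.0, 7.0]),
    "pH": ([0, 12, 24, 36, 48], [7.0, 6.75, 6.5, 6.5, 6.25]),
}
grid = [0, 12, 24, 36, 48]  # hours
aligned = align(series, grid)
print(aligned["viable_cell_density"])
```

After alignment, each run becomes a fixed-length feature vector, which is the form most classifiers expect. Whether interpolation is appropriate depends on the parameter; slowly varying offline measurements tolerate it better than fast online signals.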

Support vector machines (SVMs) are a class of predictive machine learning algorithms grounded in Vapnik-Chervonenkis theory and based on structural risk minimization (SRM). An SVM identifies a linear decision boundary that separates objects from the two classes with the maximum distance, called the margin. Each object is described by a set of features. Non-linear support vector machines can be constructed using kernel transformation functions.
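To make the idea of a linear decision boundary concrete, here is a minimal sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. This is a simplified stand-in for the quadratic-programming solvers that production SVM libraries use, and it does not cover kernels. The two features, the toy data, and the hyperparameters (`lam`, `eta`, `epochs`) are assumptions chosen for illustration; they do not come from the referenced study.

```python
import random

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=500, seed=0):
    """Fit weights w and bias b by subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regularization step: shrinking w favors a wider margin.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge step: only points inside the margin or misclassified push back.
            if y[i] * score < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical standardized features for eight runs:
# +1 = high-productivity run, -1 = low-productivity run.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [3.0, 2.0],
     [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0], [-3.0, -2.0]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

The learned weights indicate which features drive the separation between the two classes, which is the kind of output a process expert can interpret. For real data, an established library implementation would be a more reasonable choice than this hand-rolled loop.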

This model-based data mining is an important step toward establishing data-driven knowledge discovery in bioprocesses. Implementing this methodology on the manufacturing floor can facilitate real-time decision making and hence improve the robustness of large-scale bioprocesses.

Reference: Mining manufacturing data for discovery of high productivity process characteristics.

Thursday, March 28, 2013
