Data-driven approaches have been widely used to study trends in customer or
market behavior in industrial sectors, finance, retail and services. Recently,
mining data warehouses has attracted attention in the biotechnology sector
because of the rapid expansion of genomics-based data. The increase in biologics
manufacturing also presents an area of data mining that is yet to be explored.
Today's manufacturing facilities are advanced and highly automated in their
operation and data acquisition. Thousands of process parameters are constantly
acquired and stored electronically. Fluctuations in process productivity and
product quality invariably occur during production. Understanding the root
cause of these abnormalities and increasing process robustness will have
major economic implications for the product. Mining bioprocess data to
identify the parameters that may cause process fluctuations has great
potential for enhancing productivity and process efficiency.
Many techniques for exploring bioprocess data have been employed in past
studies. Principal component analysis (PCA), partial least squares (PLS) and
unsupervised clustering have been proposed to analyze and monitor
bioprocesses. A decision-tree-based classification approach was proposed to
identify the process trends that best differentiate runs with high and low
productivity. Artificial neural networks (ANNs) are also a popular tool for
modeling non-linear interactions in temporal process data. Despite these
attempts, mining huge volumes of production-scale process data and
implementing such schemes online remain tedious.
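
To make the PCA idea concrete, here is a minimal sketch that finds the first principal component of a small table of process parameters by power iteration on the covariance matrix. It is an illustration only, not the method used in the referenced study; the function name `first_principal_component` and the toy `runs` values are assumptions made for this example.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return the unit vector along which the rows vary the most.

    Hypothetical helper: power iteration on the sample covariance matrix.
    Assumes the data are not constant (otherwise the norm below is zero).
    """
    n, d = len(rows), len(rows[0])
    # Center each parameter (column) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Repeatedly multiply by the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data (made up): two parameters that rise together across four runs,
# the second always twice the first.
runs = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc = first_principal_component(runs)
```

Because the second parameter is always twice the first in this toy table, the component points along the direction (1, 2). With real process data, the largest entries of the component suggest which parameters move together most strongly.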
Bioprocess data sets are unique in that the frequency of measurement varies
from parameter to parameter. In addition to temporal measurements of
viability, cell densities, and consumption and production rates of nutrients
and metabolites, a large number of process parameters are commonly recorded.
The complexities associated with the vast and unique characteristics of
bioprocess data present substantial challenges as well as opportunities for
the data mining process. The data mining step involves applying descriptive
and predictive pattern recognition methods to discover significant changes in
the data. The identified models can be interpreted by process experts to gain
further insights for process improvement.
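
Because parameters are measured at different frequencies, a common first step is to bring them onto one shared time grid before applying any pattern recognition method. The sketch below does this with simple linear interpolation. The helper `interpolate_to_grid`, the sampling times and the cell density values are all hypothetical, and linear interpolation is only one possible choice.

```python
def interpolate_to_grid(times, values, grid):
    """Linearly interpolate a measured series onto a common time grid.

    Assumes `times` is sorted in ascending order with no repeated values.
    Grid points outside the measured range take the nearest measured value.
    """
    result = []
    for t in grid:
        if t <= times[0]:
            result.append(values[0])
        elif t >= times[-1]:
            result.append(values[-1])
        else:
            # Find the pair of measurements that brackets t.
            for k in range(1, len(times)):
                if t <= times[k]:
                    t0, t1 = times[k - 1], times[k]
                    v0, v1 = values[k - 1], values[k]
                    result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
                    break
    return result

# Made-up example: cell density sampled once a day (hours 0, 24, 48),
# aligned to a 12-hour grid shared with more frequently logged parameters.
sample_hours = [0.0, 24.0, 48.0]
cell_density = [1.0, 3.0, 7.0]
grid_hours = [0.0, 12.0, 24.0, 36.0, 48.0]
aligned = interpolate_to_grid(sample_hours, cell_density, grid_hours)
# aligned == [1.0, 2.0, 3.0, 5.0, 7.0]
```

Once every parameter sits on the same grid, each run can be written as one row of features for the methods discussed here.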
Support vector machines (SVMs) are a class of predictive machine learning
algorithms grounded in Vapnik-Chervonenkis theory and based on structural risk
minimization (SRM). A support vector machine identifies a linear decision
boundary that separates objects from the two classes with the maximum
distance, called the margin. Each object is described by a set of features.
Non-linear support vector machines can be constructed by means of kernel
transformation functions.
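
As a rough illustration of the margin idea, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss. This is a simplified stand-in for the quadratic-programming solvers used in practice, and it is not the implementation from the referenced study. The names `train_linear_svm`, `predict` and `rbf_kernel`, the hyperparameters and the toy runs are assumptions for this example.

```python
import math

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit w, b by minimizing lam/2 * ||w||^2 + mean hinge loss.

    Labels must be +1 or -1. Plain batch subgradient descent, chosen for
    readability rather than speed.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # point lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify x by which side of the decision boundary it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def rbf_kernel(a, c, gamma=0.5):
    """Gaussian (RBF) kernel, one common choice for non-linear SVMs."""
    return math.exp(-gamma * sum((ai - ci) ** 2 for ai, ci in zip(a, c)))

# Made-up, already scaled features for six runs:
# +1 = high productivity, -1 = low productivity.
X = [[2.0, 2.0], [2.5, 1.5], [3.0, 2.5],
     [-2.0, -2.0], [-2.5, -1.5], [-3.0, -2.5]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

The `rbf_kernel` function is not used by the linear trainer. It only shows the kind of similarity measure a kernel SVM substitutes for the plain dot product in order to obtain a non-linear boundary.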
This model-based data mining is an important step forward in establishing
process-data-driven knowledge discovery in bioprocesses. Implementing this
methodology on the manufacturing floor can facilitate real-time decision
making and hence improve the robustness of large-scale bioprocesses.
Reference: Mining manufacturing data for discovery of high productivity process characteristics.