This week I will continue to share my research with my classmates.
As I mentioned, we focus on predicting how many years a specific person can live with a donated heart.
To solve this problem, the first and most important step is to determine the factors (variables) that affect the outcome.
Conventionally, researchers have worked with small datasets using conventional statistical techniques that do not take collinearity and nonlinearity into account, as discussed in the previous blog post. They have also used some non-parametric and non-statistical techniques that are computationally expensive and require prior knowledge about the data.
The biggest advantage of today's world is the flood of big data in health informatics, which can be handled with data mining techniques. These techniques can reveal better and more accurate solutions for the survival of organ transplant recipients than the conventional methods used in previous studies.
We started the research by obtaining a very large dataset from UNOS (United Network for Organ Sharing), a tax-exempt medical, scientific, and educational organization that operates the national Organ Procurement and Transplantation Network. The dataset has 443 variables and 43,000 cases, all belonging to heart transplant operations. These variables include socio-demographic and health-related factors of both the donors and the recipients, as well as procedure-related factors.
After preprocessing the data (cleaning it, handling missing values, reorganizing it for the specific studies, etc.), we used variable selection methods to determine the potential predictive factors.
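To make the "dealing with the missing values" step concrete, here is a minimal sketch of one common approach: drop variables that are missing in too many cases, then fill the remaining gaps with each variable's median. The 50% threshold, median imputation, the function name `preprocess`, and the variable names in the example are all hypothetical choices for illustration, not the exact procedure used on the UNOS data.

```python
from statistics import median

# Hypothetical threshold: drop a variable if it is missing in more than half of the cases.
MAX_MISSING_FRACTION = 0.5


def preprocess(records, max_missing=MAX_MISSING_FRACTION):
    """Drop sparse variables, then median-impute the remaining gaps.

    `records` is a list of dicts mapping variable name -> numeric value or None.
    Returns (kept_variables, cleaned_records).
    """
    if not records:
        return [], []
    variables = sorted({name for rec in records for name in rec})
    n = len(records)

    # Step 1: keep only variables whose missing fraction is acceptable.
    kept = []
    for name in variables:
        missing = sum(1 for rec in records if rec.get(name) is None)
        if missing / n <= max_missing:
            kept.append(name)

    # Step 2: fill the remaining gaps with the median of the observed values.
    medians = {}
    for name in kept:
        observed = [rec[name] for rec in records if rec.get(name) is not None]
        medians[name] = median(observed)

    cleaned = [
        {name: rec[name] if rec.get(name) is not None else medians[name] for name in kept}
        for rec in records
    ]
    return kept, cleaned
```

For example, with three hypothetical cases where `pra` is missing in two of them, `pra` is dropped, while a single missing `bmi` value is filled with the median of the observed `bmi` values.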
We then tested whether these potential predictive factors are actually predictive by using data mining algorithms such as Support Vector Machines, Decision Trees, and Artificial Neural Networks.
After cross-tabulation and sensitivity analysis, we observed that all three methods gave fairly satisfactory results.
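As a much simplified illustration of how one of these algorithms checks whether a single variable is predictive, the sketch below finds the best one-threshold split (a "decision stump", the building block of a decision tree) on one numeric variable. It is not the Decision Tree model used in the study; the function name `best_stump` and the toy data are hypothetical.

```python
def best_stump(values, labels):
    """Find the threshold on one numeric variable that best separates two classes.

    values: list of numbers for one predictor; labels: list of 0/1 outcomes.
    Rule: predict 1 when value > threshold, else 0 (or the mirrored rule if `flipped`).
    Returns (threshold, flipped, accuracy).
    """
    n = len(values)
    best = (None, False, -1.0)
    for t in sorted(set(values)):
        preds = [1 if v > t else 0 for v in values]
        correct = sum(p == y for p, y in zip(preds, labels))
        # For binary labels, the mirrored rule gets exactly the other cases right.
        for flipped, hits in ((False, correct), (True, n - correct)):
            acc = hits / n
            if acc > best[2]:
                best = (t, flipped, acc)
    return best
```

A variable whose best split barely beats guessing carries little information on its own; a full decision tree repeats this search over all variables and recursively on each branch.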
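One common variant of sensitivity analysis is permutation-based: shuffle one variable's values across cases and measure how much the model's accuracy drops. The sketch below assumes a trained model is available as a plain function that maps a row of variable values to a 0/1 prediction; the function names and the toy model are hypothetical, and this is not necessarily the exact sensitivity procedure used in this study.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_sensitivity(model, rows, labels, n_features, seed=0):
    """Accuracy drop when each variable's column is shuffled across cases.

    A larger drop suggests the model relies more heavily on that variable.
    """
    rng = random.Random(seed)  # fixed seed so the shuffles are reproducible
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Replace only column j; every other variable keeps its original value.
        shuffled = [r[:j] + [c] + r[j + 1:] for r, c in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return drops


# Toy usage: a hypothetical model that only looks at variable 0.
rows = [[1, 7], [2, 3], [8, 1], [9, 4], [3, 9], [7, 2]]
labels = [0, 0, 1, 1, 0, 1]
model = lambda r: 1 if r[0] > 5 else 0
drops = permutation_sensitivity(model, rows, labels, n_features=2)
```

Because the toy model ignores variable 1, shuffling it leaves accuracy unchanged (a drop of exactly 0.0), while shuffling variable 0 can only lower it.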
For the 3-year survival study, Support Vector Machines gave the best prediction rate, classifying 94.43% of the cases correctly, while Artificial Neural Networks classified 81.18% and Decision Trees 77.65% of the cases correctly.
What do these results mean?
For Support Vector Machines, the accuracy rate is 94.43%, which means that when the Support Vector Machine predicts whether a specific person will live or die after receiving the donated organ, its prediction is correct in 94.43% of the cases. Conversely, it fails to predict correctly in the remaining 5.57% of the cases.
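The accuracy rate comes from the cross-tabulation of actual versus predicted outcomes (a confusion matrix): it is the share of cases on the diagonal, and the error rate is its complement. The sketch below shows that arithmetic; the 1 = survived / 0 = did not survive coding and the function names are assumptions for illustration.

```python
def confusion_matrix(actual, predicted):
    """Cross-tabulate actual vs predicted outcomes (1 = survived, 0 = did not)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1  # predicted survival, patient survived
        elif a == 0 and p == 0:
            counts["tn"] += 1  # predicted death, patient died
        elif a == 0 and p == 1:
            counts["fp"] += 1  # predicted survival, patient died
        else:
            counts["fn"] += 1  # predicted death, patient survived
    return counts


def accuracy_and_error(counts):
    """Accuracy = correct cases / all cases; error rate = 1 - accuracy."""
    total = sum(counts.values())
    acc = (counts["tp"] + counts["tn"]) / total
    return acc, 1 - acc


# Error rate implied by the reported SVM accuracy, in percent.
svm_error_percent = round(100 - 94.43, 2)  # 5.57
```

Looking at the full table rather than accuracy alone also shows whether the errors fall mostly on patients who survived or on those who did not.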
The analysis also tells us which factors play a role in predicting these results.
These accuracy rates have not been reached with conventional statistical techniques, which is very promising for the future success of heart transplants.