Thursday, March 14, 2013

Titanic Competition using Minitab



By Sam Green, Jay Long, Julian Olander, and Justin Willette

The Titanic competition asked participants to predict which passengers survived the sinking of the Titanic by analyzing trends in the associated data. The data included factors such as age, sex, ticket price, and name. Using Minitab, a statistical software package, our team was able to predict who survived with an accuracy of approximately 77%. The above video walks through our process.

1 comment:

  1. Hello all,

    Thank you for providing the details regarding your visualization. There are a couple of issues that I want you to think about:

    1- Over-fitting: It seems that you developed the model using all of the available data. It might be best to partition the data into two sets: a) a training set, used to develop the model, and b) a test set, used to evaluate how well the model works. Typically, 80% of the data is used to develop the model and 20% is used for testing.

    2- Is there a way to look into where the error comes from? There are two things to consider here: a) If your model predicted that everyone will live, then your accuracy rate would equal the survival rate. Obviously, this is an exaggerated example, but it would be interesting to see if there are certain patterns in your error. b) You may want to report your results in matrix form (% correctly picked to survive, % correctly picked to die, % incorrectly picked to survive, and % incorrectly picked to die). See http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html for more details on this concept.
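
The two suggestions above can be sketched in a few lines. This is only an illustration in plain Python, not the Minitab workflow the team used. The rows in `toy_rows` are made up for the example (they are not the competition data), and `predict` is a hypothetical stand-in rule, not the team's model.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(actual, predicted):
    """Count the four outcomes for a binary label: 1 = survived, 0 = died."""
    counts = {
        "correctly_picked_to_survive": 0,
        "correctly_picked_to_die": 0,
        "incorrectly_picked_to_survive": 0,
        "incorrectly_picked_to_die": 0,
    }
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["correctly_picked_to_survive"] += 1
        elif p == 0 and a == 0:
            counts["correctly_picked_to_die"] += 1
        elif p == 1 and a == 0:
            counts["incorrectly_picked_to_survive"] += 1
        else:
            counts["incorrectly_picked_to_die"] += 1
    return counts


def accuracy(cm):
    """Fraction of all predictions that were correct."""
    correct = cm["correctly_picked_to_survive"] + cm["correctly_picked_to_die"]
    return correct / sum(cm.values())


def predict(row):
    """Hypothetical stand-in rule: predict survival for female passengers."""
    sex, _survived = row
    return 1 if sex == "female" else 0


# Made-up (sex, survived) rows for illustration only.
toy_rows = (
    [("female", 1)] * 6 + [("female", 0)] * 2
    + [("male", 1)] * 2 + [("male", 0)] * 10
)

# 80% of the rows would be used to develop the model, 20% held out for testing.
train, test = train_test_split(toy_rows, test_fraction=0.2)

# Evaluate only on the held-out rows.
actual = [survived for _sex, survived in test]
predicted = [predict(row) for row in test]
cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))

# Baseline from point 2a: predicting that everyone lives gives an accuracy
# equal to the survival rate of the rows being scored.
everyone_lives = confusion_matrix(actual, [1] * len(actual))
print(accuracy(everyone_lives), sum(actual) / len(actual))
```

Comparing the model's accuracy against the "everyone lives" baseline, and looking at which of the four cells the errors fall into, shows whether the model is doing better than a trivial guess and where it goes wrong.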

    Overall, a great explanation, and I really like your analysis of why the model makes sense. The above points are for you to think about in future work.

    Best,
    Fadel
