Decision trees are a useful technique for classification, prediction, and fitting data. In this post I demonstrate how to build a basic decision tree model in RapidMiner.
First, make sure your data contains only the attribute and label types that the Decision Tree operator allows. As the figure below shows, the Decision Tree operator accepts only polynomial, numerical, and binomial attributes, and binomial and polynomial labels (target attributes). So if your target is a numeric variable, you can convert it to an accepted type by categorizing it into several intervals and defining a dummy binomial attribute for each interval. I explained this process in my previous post.
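RapidMiner does this conversion through its operators, but the idea is easy to sketch in plain Python. This is a hypothetical illustration, not RapidMiner's internal logic: a numeric column is split at a threshold (here, roughly the median) into the two categories "Min" and "Max" so it can serve as a binomial label.

```python
# Hypothetical example of discretizing a numeric target into two categories
# so it can be used as a binomial label. Values and threshold are made up.
values = [2.1, 7.8, 3.3, 9.4, 5.0]

# Use the middle value of the sorted list as a simple split point.
threshold = sorted(values)[len(values) // 2]

# Everything below the threshold becomes "Min", the rest "Max".
labels = ["Min" if v < threshold else "Max" for v in values]
print(labels)  # → ['Min', 'Max', 'Min', 'Max', 'Max']
```

In practice you would choose the number of intervals and the cut points to suit your data, e.g. with quantiles or domain knowledge, rather than a single median split.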
Once you have prepared your data with the allowed attribute and label types, you are ready to build the model. Add a Read Excel operator and import your training data set into it, then use a Set Role operator to set the target attribute's role to label, and finally add a Decision Tree operator. Connect these operators in the order you added them. Your model should look like the figure below.
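For readers more comfortable with code, the same three-step workflow has a rough analogue in scikit-learn (an assumption on my part; this is not part of RapidMiner): loading the data corresponds to Read Excel, separating the features from the label corresponds to Set Role, and fitting a classifier corresponds to the Decision Tree operator. The toy data and column meanings here are invented.

```python
# Rough scikit-learn analogue of the RapidMiner workflow (a sketch, not
# RapidMiner's implementation). Data and labels are made up for illustration.
from sklearn.tree import DecisionTreeClassifier

# "Read Excel": in a real script you might load a spreadsheet with pandas;
# here a small inline data set stands in for it.
X = [[1.0, 2.0], [1.5, 1.8], [8.0, 9.0], [9.0, 8.5]]

# "Set Role": declare which column plays the label role (binomial: Min/Max).
y = ["Min", "Min", "Max", "Max"]

# "Decision Tree": fit the model on the training data.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[1.2, 2.1], [8.5, 9.2]]))  # → ['Min' 'Max']
```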
The Apply Model operator does not accept a data set that contains the label attribute. So if the test data set contains the target attribute, remove it and let RapidMiner fill it in by itself. As an example, I built a model on a data set containing 5 numeric regular attributes and a binomial target label with two values, Min and Max. The following figures show the results.
As you can see, RapidMiner has created three new attributes, highlighted in pink in the Meta Data View window. Because my target label has two possible outcomes, RapidMiner created one attribute for each outcome and calculated its occurrence probability for every instance. In the third created attribute, RapidMiner predicts the outcome for each instance based on these probabilities: the outcome with the highest probability is the most likely event, so it is reported as the prediction for that instance. Furthermore, in the Tree tab you can view and analyze the decision tree generated for your problem.
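The relationship between the two confidence attributes and the prediction attribute is simply an argmax. A minimal sketch, with made-up probabilities for a single instance:

```python
# Sketch of how the prediction follows from the per-class confidences:
# the class with the highest probability becomes the prediction.
# The probability values here are invented for illustration.
confidences = {"Min": 0.35, "Max": 0.65}

prediction = max(confidences, key=confidences.get)
print(prediction)  # → Max
```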
In my model, the "3hr sum" and "month sum" attributes have the strongest influence, in that order. In the Text view, you can see the tree summary as well as the branch confidences.
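RapidMiner conveys this ranking implicitly through the structure of the tree (the attribute at the root matters most). If you rebuild the model in scikit-learn, as sketched earlier, the same information is available explicitly; again, this is an assumed analogue with invented data, not the RapidMiner model itself.

```python
# Hypothetical illustration of attribute importance with scikit-learn.
# Data is contrived so that only the first column separates the classes.
from sklearn.tree import DecisionTreeClassifier

X = [[1.0, 50.0], [2.0, 10.0], [8.0, 55.0], [9.0, 12.0]]
y = ["Min", "Min", "Max", "Max"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# The first attribute carries all the importance here, because it alone
# perfectly splits the two classes.
print(tree.feature_importances_)
```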