Friday, April 19, 2013

Rapid Miner Data Aggregation Tutorial

I have used the aggregation function in Rapid Miner many times while working on the fantasy football project David and I are building. It is very useful for compiling athletes' weekly statistics. Here is a step-by-step tutorial on how to use it. First, open Rapid Miner and start a new process. Next, import your data. In my case I am importing an Excel sheet that contains weekly NFL QB stats from 2008-2011.
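If you want to double-check Rapid Miner's results outside the GUI, the import step can be sketched in plain Python. This is a minimal sketch, not part of Rapid Miner: it assumes the Excel sheet has been saved as CSV, and the column names (Name, Team, Week, PassYds, TD) and player rows are hypothetical placeholders, not the real data.

```python
import csv
import io

# Hypothetical sample of weekly QB stats (placeholder names and numbers).
SAMPLE_CSV = """Name,Team,Week,PassYds,TD
Player A,AAA,1,320,3
Player A,AAA,2,280,2
Player B,BBB,1,250,1
"""

# Columns assumed to hold whole numbers in this hypothetical layout.
NUMERIC_COLUMNS = {"Week", "PassYds", "TD"}

def load_weekly_stats(text):
    """Parse CSV text into a list of dicts, converting numeric columns to int."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({k: int(v) if k in NUMERIC_COLUMNS else v for k, v in record.items()})
    return rows

stats = load_weekly_stats(SAMPLE_CSV)
print(len(stats), stats[0])
```

With a real export you would read the file with `open(path, newline="")` instead of the inline string.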

After selecting your mode of import, choose the correct sheet or file to import.


Select your sheet and click Next.


The next page (step 3 of the import wizard) asks whether you want to make any annotations. This is not necessary for this data, so I click Next, which brings us to step 4. De-select any columns that you do not want to import. In this case I do not care which teams the QBs play for. You MUST set the role of the column you want to group by to ID. Here the first column, which contains the names, was changed from attribute to ID. After you do that, click Next and save your data.
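Outside Rapid Miner, de-selecting columns and giving one column the ID role roughly amounts to keeping only the wanted fields and treating the name as the row's key. The sketch below is an illustration under that assumption; `select_columns` is a hypothetical helper and the rows are placeholder data.

```python
# Hypothetical rows, shaped like the weekly QB stats after import.
rows = [
    {"Name": "Player A", "Team": "AAA", "Week": 1, "PassYds": 320, "TD": 3},
    {"Name": "Player B", "Team": "BBB", "Week": 1, "PassYds": 250, "TD": 1},
]

def select_columns(rows, id_column, keep):
    """Drop unwanted columns and return (id value, attribute dict) pairs.

    Mirrors the wizard step of de-selecting a column (here, Team)
    and giving the name column the ID role.
    """
    selected = []
    for row in rows:
        attributes = {col: row[col] for col in keep}
        selected.append((row[id_column], attributes))
    return selected

players = select_columns(rows, id_column="Name", keep=["Week", "PassYds", "TD"])
print(players[0])
```

Keeping the ID separate from the attributes makes it clear which column will later drive the grouping.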


Once you are in your main process, drag and drop your data onto the process area. Then look at the left-hand side and click Data Transformation > Aggregation. Drag and drop the aggregate widget onto the process area. Next, connect the out port of the data to the exa port on the left side of the aggregate widget. Then connect the exa port on the right side of the widget to the result port.


After you connect the ports, click Edit List next to aggregation attributes. Here, make an entry for each attribute you want to aggregate and select the function you want to apply to it (for example, sum). When you are done, click OK.
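Each entry in that list pairs one attribute with one function. A rough Python analogue is a list of (attribute, function) pairs applied to each attribute's values. This is only a sketch: the function names (sum, average, maximum, minimum) and the output labels such as `sum(PassYds)` imitate Rapid Miner's naming as I understand it, so treat both as assumptions, and `apply_aggregations` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical aggregation list: one (attribute, function) pair per dialog entry.
AGGREGATIONS = [("PassYds", "sum"), ("PassYds", "average"), ("TD", "sum")]

# Map function names to Python callables over a list of numbers.
FUNCTIONS = {"sum": sum, "average": mean, "maximum": max, "minimum": min}

def apply_aggregations(values_by_attribute, aggregations):
    """Apply each (attribute, function) pair to that attribute's list of values."""
    result = {}
    for attribute, func_name in aggregations:
        values = values_by_attribute[attribute]
        result[f"{func_name}({attribute})"] = FUNCTIONS[func_name](values)
    return result

# Two hypothetical weeks for a single player.
weekly = {"PassYds": [320, 280], "TD": [3, 2]}
print(apply_aggregations(weekly, AGGREGATIONS))
```

The same attribute can appear in several entries, which is how one run can produce both a season total and a weekly average.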


Next, click Select Attributes next to group by attributes. Move your ID column (in this case Name) to the right side by selecting it and clicking the arrow pointing right. Click OK.
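Grouping by the ID column means every row with the same name is collapsed into one result row. The sketch below shows that idea in plain Python using a sum for each attribute; it is an illustration with placeholder data and a hypothetical `aggregate_by` helper, not how Rapid Miner implements the operator.

```python
from collections import defaultdict

# Hypothetical weekly rows; Name plays the role of the group-by (ID) column.
rows = [
    {"Name": "Player A", "Week": 1, "PassYds": 320, "TD": 3},
    {"Name": "Player A", "Week": 2, "PassYds": 280, "TD": 2},
    {"Name": "Player B", "Week": 1, "PassYds": 250, "TD": 1},
]

def aggregate_by(rows, group_by, attributes):
    """Sum each attribute within each group of rows sharing the group_by value."""
    totals = defaultdict(lambda: {attr: 0 for attr in attributes})
    for row in rows:
        group = totals[row[group_by]]
        for attr in attributes:
            group[attr] += row[attr]
    return dict(totals)

season = aggregate_by(rows, group_by="Name", attributes=["PassYds", "TD"])
for name, stats in sorted(season.items()):
    print(name, stats)
```

Week is left out of the attribute list on purpose, since summing week numbers would be meaningless.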


Now just click the play button on the toolbar and you get your results! From here you can export your data however you like, or switch to Plot View or Advanced Charts. I hope this tutorial was helpful.
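If you would rather post-process the aggregated table yourself, writing it to CSV is a simple option. This minimal sketch uses Python's standard `csv` module; the `to_csv` helper, the result values, and the column labels are hypothetical placeholders.

```python
import csv
import io

# Hypothetical aggregated results: one row per quarterback.
season = {
    "Player A": {"sum(PassYds)": 600, "sum(TD)": 5},
    "Player B": {"sum(PassYds)": 250, "sum(TD)": 1},
}

def to_csv(results, id_column="Name"):
    """Write {id: {column: value}} results to CSV text with the ID column first."""
    columns = sorted({col for stats in results.values() for col in stats})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[id_column] + columns)
    writer.writeheader()
    for key, stats in sorted(results.items()):
        writer.writerow({id_column: key, **stats})
    return buffer.getvalue()

print(to_csv(season))
```

To save to disk instead, pass the text to a file opened with `open("season.csv", "w", newline="")`.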
