Saturday, February 16, 2013

Using Loop Operator to process multiple input resources in RapidMiner


Sometimes you may have to import data from multiple resources into your RapidMiner model. One simple way is to import the files one by one and then process them together, but this becomes very tedious when you have to import more than 10 or 20 files. RapidMiner provides some useful operators that assist you in performing this operation automatically. In this post, I am going to share with you two methods that I found to import data from multiple Excel files into RapidMiner.

1-Simple method:
This method helps when only a few resources need to be imported into the model. The first step is to import all files into the model manually, using the appropriate “Read” operators. In the Operators window, go to the Import folder and then to the Data folder. You will see that various operators are available for importing different types of data into the model. In this example, I created 4 different Excel files; each contains a single row with a name and a last name. As indicated in figure 1, these files can be imported into the model with “Read Excel” operators. Note that while this operator is used in the model, no changes can be made to the corresponding Excel file as long as the model is open.
Figure 1
Now, in the Operators window, look for the Append operator. This operator takes several tables as input, merges them, and generates a single output table that contains the rows of all input tables. Connect all Read Excel operators to the Append operator as illustrated in figure 2.
Figure 2
Figure 3 shows the result of running this model. Each row of the output table corresponds to one input file: since each of the 4 input files in this example consists of a single row with a name and a last name, the output consists of 4 rows.
Figure 3
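For readers who think more easily in code, the append step can be sketched in a few lines of Python. This is only an illustration of the idea, not RapidMiner's implementation: the `append_tables` helper, the `(header, rows)` representation and the sample names are all made up for the example. It assumes that every input table has the same columns and stops with an error otherwise.

```python
def append_tables(tables):
    """Stack tables that share the same header into one table.

    Each table is a (header, rows) pair: header is a list of column names,
    rows is a list of lists. Hypothetical helper, not a RapidMiner API.
    """
    if not tables:
        raise ValueError("need at least one table")
    header = list(tables[0][0])
    merged = []
    for index, (columns, rows) in enumerate(tables):
        # Refuse to merge tables whose columns differ from the first one.
        if list(columns) != header:
            raise ValueError(f"table {index} has columns {columns}, expected {header}")
        merged.extend(list(row) for row in rows)
    return header, merged


# Four single-row tables standing in for the four Excel files (made-up names).
tables = [
    (["name", "last name"], [["Ada", "Lovelace"]]),
    (["name", "last name"], [["Alan", "Turing"]]),
    (["name", "last name"], [["Grace", "Hopper"]]),
    (["name", "last name"], [["Edsger", "Dijkstra"]]),
]
header, rows = append_tables(tables)
print(header, len(rows))
```

With four one-row inputs the merged table has four rows, which mirrors the result in figure 3.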

2-Advanced method
Now, consider a situation in which more than 10 input files should be imported into the model. In this case, the above method becomes tedious: importing the data manually takes a tremendous amount of time, and it gets even worse whenever you want to modify the input resources. RapidMiner provides a handy tool to perform this process easily.
Create a new project and then, in the Operators window, look for the Loop operators. There are various loop operators available under the Loop folder in the Process Control folder. These Loop operators are used when we need to repeat a certain process for a predetermined or undetermined number of iterations. The Loop Files operator is the best choice for our problem. It runs its inner operators once for each input file in its directory. So, add it to the model and, in its properties window, use the Directory box to specify the location of the folder that contains the input files. If your input files are not in the same folder, you should move them to the same folder. As illustrated in figure **, make sure that the Iterate over files checkbox is checked.
Figure 4
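The core idea of looping over files, which is to visit every file in one folder and hand it to the same inner step, can be sketched in Python as follows. This is a rough analogy rather than how the operator works internally: the `iter_input_files` helper, the `*.xlsx` pattern and the file names are assumptions made for the example.

```python
import fnmatch
import os
import tempfile


def iter_input_files(directory, pattern="*.xlsx"):
    """Yield the paths of regular files in `directory` whose names match `pattern`."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip sub-folders and files that do not match the pattern.
        if os.path.isfile(path) and fnmatch.fnmatch(name, pattern):
            yield path


# Demo in a throwaway folder with made-up, empty files.
with tempfile.TemporaryDirectory() as folder:
    for name in ["a.xlsx", "b.xlsx", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(path) for path in iter_input_files(folder)]
    print(found)
```

Only the two `.xlsx` files are visited; `notes.txt` is ignored, which is why keeping all input files together in one folder makes the loop simple.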
Now, double-click the operator icon in the process window to enter the Nested process window. Then add one Read Excel operator to the Nested process window. Make sure that the fil port of the process window is connected to the fil port of the operator and that the operator's out port is connected to the out port of the process.
Figure 5
Now, use the Up arrow to go back to the main process window and then add an Append operator to the model. Your model should look like figure 6.
Figure 6
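Put together, the advanced method is "for each file in the folder, read it, then append everything". A minimal Python sketch of that whole flow is shown below. To keep it runnable with the standard library only, it reads CSV files instead of Excel files, which is a substitution on my part; the `read_table` and `loop_and_append` helpers and the sample data are likewise hypothetical and are not part of RapidMiner.

```python
import csv
import os
import tempfile


def read_table(path):
    """Read one CSV file into a (header, rows) pair."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        return header, [row for row in reader]


def loop_and_append(directory, extension=".csv"):
    """Read every matching file in `directory` and stack the rows into one table."""
    header, merged = None, []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(extension):
            continue
        columns, rows = read_table(os.path.join(directory, name))
        if header is None:
            header = columns  # the first file defines the expected columns
        elif columns != header:
            raise ValueError(f"{name}: columns {columns} do not match {header}")
        merged.extend(rows)
    return header, merged


# Demo: three one-row files with made-up names in a throwaway folder.
with tempfile.TemporaryDirectory() as folder:
    people = [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")]
    for number, person in enumerate(people, start=1):
        with open(os.path.join(folder, f"file{number}.csv"), "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["name", "last name"])
            writer.writerow(person)
    header, rows = loop_and_append(folder)
    print(header)
    print(rows)
```

Adding a tenth or a hundredth file to the folder needs no change to the code, which is the same benefit the loop gives over wiring up one Read Excel operator per file.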



Comments:

  1. Shahab,

    Great post. That should be very helpful.

    Fadel

  2. I used the Loop and Append functions, as described by Shahab in his blog. I discovered that the Excel files have to be of the same type. By this, I mean they have to have the same number of columns and the same column headings; otherwise you will receive an error. So, what I did was make multiple copies of the same Excel file in order to test out this process. I got it to work just as Shahab described. I believe that this is extremely useful, especially if you have a large number of Excel files you are trying to read in. I was more interested in what I found out after I had run the process, though. Under the “Plot View”, RapidMiner has a really helpful graphing ability once you have run your process. I took an Excel file from one of my other classes and was very easily able to graph the data. It just took some playing around in the menu, but it wasn’t hard at all. There are a lot of graph types to choose from as well.

    As you can see from the above picture, I was able to make a bar graph with the results from my data. I found this post to be very helpful. I can see how entering Excel files one by one can become tedious, and this function will save a lot of time. I would like to see a post about what you can do with the data once you have processed it. Maybe that is something I should look into…

    (I have a picture, but I can't paste it in. Anyone know how to add images to comments?)

  4. How can I use Loop Files to tokenize the Excel sheets and compare them?

  6. Hi, nice blog. Thanks for sharing how to use the Loop operator to process multiple input resources. It's a very easy and helpful blog post. Also, I found a tool to merge multiple Excel files into one file. Check it here: http://bit.ly/synkronizer
