Sometimes you may have to import data from multiple
sources into your RapidMiner model. One simple way is to import the files into
your model one by one and then process them together, but this method becomes
very tedious when you have to import more than 10 or 20 files. RapidMiner
provides some useful operators that help you perform this operation
automatically. In this post, I am going to share with you two methods that I found
for importing data from multiple Excel files into RapidMiner.
1-Simple method:
This method helps when only a few sources
need to be imported into the model. The first step is to import all the files into the
model manually. This can be done with the appropriate “Read”
operators. In the Operators window, go to the Import folder and then the Data folder. You
will see that various operators are available for importing different types of data into
the model. In this example, I created 4 different Excel files; each contains a
single row with a first name and a last name. As shown in Figure 1, these files can
be imported into the model with “Read Excel” operators. Note that when
this operator is used in a model, the corresponding Excel file cannot be
changed as long as the model is open.
Figure 1
Figure 2
Figure 3
2-Advanced method
Now consider a situation in which more than 10 input files
need to be imported into the model. In this case, the above method becomes tedious:
importing the data manually takes a tremendous amount of time, and it gets
even worse whenever you want to modify the input sources.
RapidMiner provides a handy tool to perform this process easily.
Create a new project
and then look for the Loop operators in the Operators window. Various
loop operators are available under the Loop folder in the Process Control folder. These
Loop operators are used when we need to repeat a certain process for a
predetermined or undetermined number of iterations. The Loop Files operator is the
best choice for our problem: it iterates its inner operators over the set of input
files in its directory. So add it to the model, and in its properties window enter
the location of the folder containing the input files in the Directory box. If
your input files are not all in the same folder, move them into one
folder. As illustrated in the figure below, make sure that the Iterate over files checkbox
is checked.
Figure 4
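Conceptually, Loop Files walks one folder and hands each matching file to its inner operators. The short Python sketch below is not RapidMiner code; it only mimics that walk so the idea is concrete. The function name `list_input_files` and the `*.xlsx` pattern are assumptions for illustration.

```python
from pathlib import Path


def list_input_files(directory, pattern="*.xlsx"):
    """Hypothetical helper: return the files directly inside `directory`
    whose names match `pattern`, sorted by name.

    Only the top level of the folder is scanned; subfolders and files that
    do not match the pattern are skipped. The "*.xlsx" default is an
    illustrative assumption, not a RapidMiner setting.
    """
    folder = Path(directory)
    # glob() can also match folders, so keep regular files only.
    # Sorting makes the iteration order predictable from run to run.
    return sorted(p for p in folder.glob(pattern) if p.is_file())
```

For example, `list_input_files("my_inputs")` (a hypothetical folder name) would return the `.xlsx` files in that folder sorted by name. The sorting is a choice made for the sketch; I have not checked which order RapidMiner itself uses.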
Now double-click the operator icon in the Process
window to enter the nested process window. Then add one Read Excel operator
to the nested process window. Make sure that the fil port of the process window is
connected to the fil port of the Read Excel operator, and that its out port is connected to the
out port of the process.
Figure 5
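Each pass through the nested process receives one file on the fil port and sends whatever Read Excel produces to the out port; as I understand it, Loop Files then gathers the output of every pass. Below is a minimal Python sketch of that loop. It assumes CSV files as a stand-in for Excel so that only the standard library is needed, and `read_table` and `loop_files` are hypothetical names, not RapidMiner operators.

```python
import csv
from pathlib import Path


def read_table(path):
    """Hypothetical stand-in for Read Excel: load one file as (header, rows).

    Assumption: the inputs are CSV files; a real Excel reader would take
    this function's place.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # first row holds the column headings
        rows = list(reader)        # remaining rows are the data
    return header, rows


def loop_files(paths, inner):
    """Hypothetical loop: run `inner` once per file and collect every result.

    Each path plays the role of the fil port input for one iteration, and
    each return value plays the role of what reaches the out port.
    """
    return [inner(Path(path)) for path in paths]
```

Passing `read_table` as the `inner` step keeps the loop independent of the file format, much as the Read Excel operator can be swapped for another Read operator inside the nested process.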
Now use the Up arrow to
go back to the main process window, and then add an Append operator to the model. Your
model should look like Figure 6.
Figure 6
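The Append operator then stacks the tables collected by the loop into a single data set. The sketch below imitates that step in plain Python with a hypothetical `append_tables` function. It assumes each table is a (header, rows) pair and, as a simplification, refuses to combine tables whose column headings differ in any way, including their order. My understanding is that RapidMiner's Append likewise expects its inputs to share the same attributes, which is consistent with the error a reader describes in the comments.

```python
def append_tables(tables):
    """Hypothetical Append step: stack (header, rows) tables into one.

    Simplifying assumption: every table must have exactly the same column
    headings in the same order; otherwise a ValueError is raised.
    """
    if not tables:
        raise ValueError("nothing to append")
    header = tables[0][0]
    combined = []
    for index, (other_header, rows) in enumerate(tables):
        # Reject any table whose headings differ from the first table's.
        if other_header != header:
            raise ValueError(
                f"table {index} has columns {other_header}, expected {header}"
            )
        combined.extend(rows)  # keep the rows in input order
    return header, combined
```

Failing loudly on mismatched headings is a deliberate choice in the sketch: silently padding or dropping columns would hide exactly the kind of input problem that is easy to miss when many files are read at once.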
Shahab,
Great post. That should be very helpful.
Fadel
I used the Loop and Append functions, as described by Shahab in his blog. I discovered that the Excel files have to be of the same type. By this, I mean they have to have the same number of columns with the same headings; otherwise, you will receive an error. So, what I did was to make multiple copies of the same Excel file in order to test out this process. I got it to work just as Shahab described. I believe that this is extremely useful, especially if you have a large number of Excel files you are trying to read in. I was more interested in what I found out after I had run the process, though. Under the “Plot View”, RapidMiner has a really helpful graphing ability once you have run your process. I took an Excel file from one of my other classes and was very easily able to graph the data. It just took some playing around in the menu, but it wasn’t hard at all. There are a lot of graph types to choose from as well.
As you can see from the above picture, I was able to make a bar graph with the results from my data. I found this post to be very helpful. I can see how entering Excel files one by one can become tedious, and this function will save a lot of time. I would like to see a post about what you can do with the data once you have processed it. Maybe that is something I should look into…
(I have a picture, but I can't paste it in. Anyone know how to add images to comments?)
How can I use Loop Files to tokenize the Excel sheets and compare them?