Analytics and Visualization of Big Data: OutWit Hub: Web-scraping made easy

Tuesday, April 23, 2013

OutWit Hub: Web-scraping made easy

I read a blog post earlier this term on web scraping and decided to check it out. I started with the suggested software and quickly realized that there are only a few really good web-scraping tools that are also supported on Mac OS. So, after reading a few reviews, I landed on OutWit Hub.

OutWit Hub has two versions: Basic and Pro. The difference is in the available tools. In Basic, the "words" tool isn't available. That tool lets you see the frequency of any word as it occurs on the page you are currently viewing. Several of the scraping tools are unavailable in Basic as well. I've upgraded to Pro; it's only $60 per year, and I was curious to see what else it can do.
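
If you're curious what a word-frequency count looks like in code, here is a minimal sketch in Python. It is not how OutWit Hub works internally (I don't know that); the function name `word_frequencies` and the sample sentence are made up for illustration, and it assumes you already have the page text as a plain string.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Return the top_n most common words in text as (word, count) pairs."""
    # Lowercase first so "Data" and "data" are counted as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Hypothetical sample text standing in for the page you are viewing.
sample = "Big data is big. Data about data is still data."
print(word_frequencies(sample, top_n=3))
```

Raising `top_n` gives you a longer list; a real page would also need its HTML tags stripped out first.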

I'm not a computer scientist, by a long shot, but I have a general grasp of coding and how computers operate. For this reason, I really like OutWit Hub. The tutorials are incredible. They walk you through examples, and you can interact with the UI while the tutorial is running. Also, a lot of the tools are pretty intuitive to use. If you're not sold on getting the Pro version, I'd encourage you to visit their website and download the free version just to check out the tutorials. They're really great.

I've tried the tool on several examples just to test it. I needed to get all of the emails off of an organization's website, so instead of copy/pasting everything and praying for the best, I used the "email" feature in OutWit, and the names and emails of every member on the page populated an exportable table. #boom
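
For comparison, this is roughly what pulling emails out of a page looks like if you do it by hand in Python. It's only a sketch: the pattern is a simple approximation that will miss unusual addresses, the sample HTML is invented, and `extract_emails` is a name I made up. It says nothing about how OutWit's "email" feature is actually built.

```python
import re

# A deliberately simple pattern; real-world addresses can be more exotic.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_source):
    """Return the unique email addresses in page_source, in first-seen order."""
    found = []
    for address in EMAIL_PATTERN.findall(page_source):
        if address not in found:
            found.append(address)
    return found


# Hypothetical page source with one address repeated.
sample_html = (
    '<a href="mailto:jane@example.org">Jane</a> '
    "<p>bob@example.org, jane@example.org</p>"
)
print(extract_emails(sample_html))
```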

Then, I wanted to see if it could be harnessed for Twitter and Facebook. Using the source-code approach to scraping, I was able to extract text from the loaded parts of my Twitter and Facebook feeds. I ran into two problems. First, I didn't know enough about the code to make the scraper dynamic enough to reach the parts of the feed that hadn't loaded yet. Second, I didn't know how to automate the process and build a larger dataset (i.e., run the scraper continuously over a set amount of time by reloading the page and harvesting the data each time). It's possible; I just didn't figure it out.
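
To make the "reload and harvest over time" idea concrete, here is a small Python sketch of the loop I had in mind. It is an assumption-heavy illustration, not something I got working in OutWit: `harvest_repeatedly`, `fetch_page`, and `extract_items` are hypothetical names, and the fake feed below stands in for a real page reload.

```python
import time


def harvest_repeatedly(fetch_page, extract_items, rounds=3, delay_seconds=0.0):
    """Fetch a page several times and collect each new item only once."""
    collected = []
    seen = set()
    for round_number in range(rounds):
        page_source = fetch_page()  # e.g. reload the feed
        for item in extract_items(page_source):
            if item not in seen:  # skip posts already harvested
                seen.add(item)
                collected.append(item)
        if round_number < rounds - 1:
            time.sleep(delay_seconds)  # wait before the next reload
    return collected


# Hypothetical feed: each "reload" shows one old post and one new post.
fake_pages = iter(["post A|post B", "post B|post C", "post C|post D"])
result = harvest_repeatedly(lambda: next(fake_pages), lambda s: s.split("|"))
print(result)
```

The fetching and extracting are passed in as functions so the loop itself stays the same no matter which page or scraper you plug in.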

So, I've recorded a video tutorial on how to use OutWit Hub Pro's scraper feature to scrape the loaded part of your Facebook news feed. The written instructions are below, and the video at the bottom gives you the visual.

Essentially, you will:
1.) Launch OutWit Hub (presuming you've downloaded it and upgraded to Pro).
2.) Log in to your profile on Facebook.
3.) Take note of whatever text you want to capture as a reference point for when you look in the code. This assumes you don't know how to read HTML. For example, if the first person on your news feed says "Hey check out this video!", then take note of their statement "Hey check out this video!"
4.) Click the "scrapers" item on the left side of the screen.
5.) In the search window, type in the text "Hey check out this video" and observe the indicators in the code that mark the beginning and end of that text.
6.) In the window below the code, click the "New" button.
7.) Type in a name for the scraper.
8.) Click the checkbox in row 1 of the window.
9.) Enter a title/description for the information you're collecting in the first column. Using the same example: "Stuff friends say on FB" or "Text". It really only matters if you're going to be extracting other data from the same page and want to keep it separate.
10.) Under the "Marker Before" column, type in the HTML code that you identified as marking the beginning of the data you want to extract.
11.) Do the same in the next column, using the HTML code that you identified as marking the end of the data.
12.) Click "Execute".
13.) Your data is now available for export in several formats: CSV, Excel, SQL, HTML, TXT.
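
The marker idea in the steps above can be sketched in a few lines of Python. This is my own illustration of "grab whatever sits between a beginning marker and an ending marker", not OutWit's actual implementation. The names `extract_between` and `to_csv`, and the `<span class="msg">` markup, are hypothetical; Facebook's real code will look different.

```python
import csv
import io


def extract_between(source, marker_before, marker_after):
    """Return every piece of text found between the two markers."""
    if not marker_before or not marker_after:
        raise ValueError("both markers must be non-empty")
    results = []
    position = 0
    while True:
        start = source.find(marker_before, position)
        if start == -1:
            break  # no more beginning markers
        start += len(marker_before)
        end = source.find(marker_after, start)
        if end == -1:
            break  # beginning marker without a matching end
        results.append(source[start:end].strip())
        position = end + len(marker_after)
    return results


def to_csv(column_title, values):
    """Write the values as a one-column CSV and return it as a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column_title])
    for value in values:
        writer.writerow([value])
    return buffer.getvalue()


# Hypothetical page source; the span markup is invented for this example.
sample = (
    '<span class="msg">Hey check out this video!</span>'
    '<span class="msg">Lunch today?</span>'
)
comments = extract_between(sample, '<span class="msg">', "</span>")
print(to_csv("Text", comments))
```

Here `"Text"` plays the role of the title from the first column, and the two markers play the roles of the "Marker Before" column and the column after it.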

Here is a YouTube video example of me using it to extract and display comments made by my Facebook friends that appeared on my news feed.

20 comments:

Unknown, April 24, 2013 at 7:04 PM
Thank you so much for posting this! It is such an awesome tutorial. I have attempted web scraping before using RapidMiner (I was even going to post a tutorial about it, assuming I could get it to work), but I was unable to find more than a couple of resources to learn how to do so. My attempt only allowed me to, for example, scrape the first page of search results for a common realtor site. Further investigation into how to scrape the remaining results required me to be proficient in regular expressions, Python, or both (YUCK). OutWit Hub looks like a great option for those like me who may not necessarily be proficient in a particular programming language (like HTML), since it has a GUI that allows us to figure it out rather easily on our own and get our data!
Unknown, October 14, 2013 at 1:46 PM
I have a site where I have:
- A listing page: This page is like a category page in a directory. This page has links to products
- Products page: I have product specification (a particular thing) on this page.

Listing pages have pagination. So there are about 500 listing pages and 20000 product pages.

I am able to get the details manually, but I cannot scroll through 500 pages manually.

How can I do this in OutWit?
Unknown, June 19, 2015 at 2:32 AM
This is a nice article. Many website scraping companies use scraping tools to scrape data from websites, but these tools have some limitations. It acquires data from all websites, including the ones with complex extraction routines and those using AJAX and JavaScript. For more information visit: http://www.loginworks.com/blogs/web-scraping-blogs/why-is-a-custom-web-scraping-service-better-than-scrapercrawler-tools/ and
Unknown, July 14, 2015 at 7:29 AM
Web scraping is a technique for extracting data from any website. Many tools are available for web scraping, such as import.io, cloudscrap, and 80legs, but these have some limitations regarding quantity and format.

There are some website scraping companies that provide custom web scraping services:
Grepsr
Promptcloud
habiledata
Unknown, August 21, 2015 at 12:26 AM
Several of the scraping tools are offline as well. I've upgraded to Pro, it's only $60 per year and I was curious to see what else it can do.
scrape a website
Unknown, November 14, 2015 at 7:29 AM
I have built a custom web scraper for my needs using .NET technology. I also scrape using PHP, Python, etc. Here is my website to look at: http://prowebscraping.com
Macrosoft, November 8, 2016 at 2:44 AM
Good information about web scraping.

Big Data Analytics Services
Big Data Services
Unknown, August 9, 2018 at 3:59 AM
This is a really informative post. Thanks for sharing such useful knowledge.
web scraping services
BotScraper, February 6, 2019 at 5:38 AM
In addition, extracting data with the help of scraping software is not a piece of cake for everyone. You need to get yourself trained before using the software, since it is complex to use.
scraper bot
data extraction services
web crawling services
web scraping services
Smith, May 6, 2019 at 5:07 AM
Best information about software. Thanks for sharing such great information. Hope you keep sharing this kind of information. Web Data Extractor
Statswork, June 4, 2019 at 11:55 PM
This comment has been removed by the author.
Statswork, June 4, 2019 at 11:57 PM
Hello, you have posted such a precious and informative article, which gave me a lot of information. I hope that you will keep it up and that we will have more informative and helpful news from you. Thanks. Data mining services
Unknown, June 13, 2019 at 5:24 AM
Very informative (CLICK HERE)
Data Mining software

Data Mining Service Providers in Bangalore
PoL, September 10, 2019 at 11:47 AM
With TemplateMonster's PowerPoint templates for sale your presentation will look professional without having to spend time on design.
Nawazz, November 26, 2020 at 3:39 AM
If you are not an idiot, then try this app: Mining Inc. Mod Apk
Anonymous, March 6, 2021 at 12:55 AM
Hello! Great article, and thank you for providing such unique and valuable information on the datamam for your readers. I really appreciate it. You can also visit Best Web scraping services provider for more datamam related information and knowledge.
huawei matepad pro 12 6, October 1, 2021 at 5:10 AM
IT outsourcing. I blog regularly and I really appreciate your content. The article has really piqued my interest. I am going to bookmark your website and keep checking for new information.
Rohit, August 17, 2022 at 3:42 AM
Wow, nice, very informative article. Thank you for sharing the valuable content.
Datatera.ai, January 23, 2025 at 3:38 AM
This comment has been removed by the author.
Datatera.ai, January 23, 2025 at 3:42 AM
While Google Spreadsheets and XPath are great for basic web scraping, Datatera.ai offers a more robust solution for transforming data from various formats—like web pages, PDFs, JSON, and images—into structured, analysis-ready records. No coding or complex formulas are needed. Simplify your data workflows with Datatera.ai.