Thursday, April 11, 2013

Outsourcing Analytics: Reducing Initial Capital Investment



With the rise of big data and analytics, companies often struggle to keep pace in building big data departments. They want a big data initiative but often lack the capital to invest in a new department. What is a company to do at this point? That is the niche served by several new startup companies that offer data mining as a service. These startups, such as Mu Sigma, are designed to take a company’s data and, by applying data mining principles and strategies, deliver meaningful information to their customer. Mu Sigma specifically follows a five-step data mining process:
1. Input of data
2. Application of data engineering
3. Extraction using data sciences
4. Informatics using decision sciences
5. Providing information using decision support
(Source 1)
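The five steps above can be sketched as a simple pipeline. This is only an illustrative Python sketch of the general idea; the function names and sample data are my own inventions, not Mu Sigma's actual system.

```python
# Hypothetical five-stage "data mining as a service" pipeline, loosely
# following the Mu Sigma steps listed above. Everything here is invented
# for illustration.

def ingest(raw_records):
    # Step 1: input of data - accept raw client records
    return list(raw_records)

def engineer(records):
    # Step 2: data engineering - clean and normalize fields
    return [{k: str(v).strip().lower() for k, v in r.items()} for r in records]

def extract(records, field):
    # Step 3: extraction - pull out the signal of interest
    return [r[field] for r in records if field in r]

def analyze(values):
    # Step 4: decision sciences - summarize into frequencies
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts

def report(counts):
    # Step 5: decision support - rank findings for the client
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

raw = [{"region": " East "}, {"region": "west"}, {"region": "EAST"}]
summary = report(analyze(extract(engineer(ingest(raw)), "region")))
print(summary)  # [('east', 2), ('west', 1)]
```

The point is less the toy logic than the shape: each stage hands a cleaner, more decision-ready artifact to the next, which is the service these startups sell.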
By outsourcing their data mining, companies are able to reduce their overhead while still gathering meaningful information from the data they have collected. In an interconnected world, it is often a valuable strategy to explore leveraging other companies’ resources and expertise before starting from scratch in house.
Mu Sigma is not the only data-mining-as-a-service startup, but it has received the highest level of investment from venture capitalists. Another such company that has attracted significant outside investment is Opera Solutions. Opera Solutions focuses on defining "signal hubs" for its customers, where the data to be mined is stored. By mining that data, Opera Solutions can then offer its processing products to customers as a subscription service and gain more capital.
Overall, I see data mining as a service as a leading innovation in the data mining field. By providing an alternative to heavy investment in IT infrastructure and employees, these companies are able to serve the majority of businesses in their target market.



Sources:
1. http://www.operasolutions.com/solutions-and-services/

Wednesday, April 10, 2013

Startups in Big Data



As access to big data grows, so does the marketplace for big data startups. One of the leading trends in enterprise is the development of big data startup companies. In an article published last year on datasciencecentral.com, Richard Snee explored five up-and-coming big data startup companies. Looking through the article, one trend that emerged was the key role of visualization. One of the five startups, Visual.ly, focuses on delivering meaningful information from the data entered through powerful visualizations. Part of their key objective is to remain true to their core values. Visual.ly promises that:

  • Data will be accurate and verifiable - Visual.ly will not "lie with statistics."
  • Proper sourcing and attribution - Visual.ly will always give credit where due and will do its own reporting.
  • Best practices in visual representation - Visual.ly will not exploit idiosyncrasies of the human visual system to exaggerate or misrepresent data. (Source 1)

By adhering to these core ethical principles, Visual.ly has emerged as a way to promote a brand via social media. It targets agencies, brands, and organizations in an effort to promote and lend credibility to entities through the use of data. By harnessing the power of big data, startup companies such as Visual.ly are able to capitalize on the growing access to big data.

Sources:
1. http://www.datasciencecentral.com/profiles/blogs/5-big-data-startups-that-matter-platfora-datastax-visual-ly-domo-
2. http://visual.ly/


Big Data in Oil and Gas Financials

As someone who is entering the oil and gas industry after graduation, I have a keen interest in how concepts learned in this class could be applied to my future career. I found this article: http://www.ogfj.com/articles/print/volume-9/issue-07/features/increased-complexity-presents.html detailing how oil companies can potentially use Big Data analytics to manage an ever more complex portfolio of assets and projects. The problem is rooted in the new wave of the oil and gas boom: drilling for shale gas and for oil trapped in harder-to-reach places such as the Canadian oil sands and extreme deep-water fields. These more difficult projects mean that more time, manpower, and capital must be invested before seeing a return. The volume of data integrated oil companies are processing is so vast that a Chevron executive was quoted as saying his company manages as much data as Google. Considering that Google processes over 20 petabytes of data a day, that makes the challenge facing the oil majors all the more daunting.

The challenge on the financial side of these projects is enabling executives to make decisions that are proactive rather than reactive. This means reducing the time between capturing data, analyzing it, and making a final decision. Making it even more difficult is the multitude of factors that must be analyzed in any decision, including the price of oil, expected revenue from production, production decline, and project costs. The decision must be made after seeing how all of these factors interact with one another to calculate the expected ROI of any decision made regarding various projects.

Big data could also be used to predict the time and cost of projects before they begin. Because of the extreme investment needed (according to the article, a single offshore oil platform can cost $1 billion to build and begin operating), companies must be as sure as possible of the return they will get; by using Monte Carlo simulation along with the data they have generated about a location, they can be much more confident in whether a project will be profitable. The predictive analysis was also able to tell the company using these tools where and when problems were most likely to occur in the project timeline, so it could plan ahead and anticipate problems rather than reacting when they happened and falling behind schedule. In the end, being able to accurately predict project timelines also enabled the company to enhance its reputation as a company committed to excellence by delivering projects on their promised timelines and budgets.
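To make the Monte Carlo idea concrete, here is a minimal sketch of simulating project profitability. All of the distributions and numbers are invented for illustration; a real model would be calibrated to actual well and market data.

```python
import random

# Toy Monte Carlo: draw uncertain cost, oil price, and recovery volume many
# times, and estimate the probability the project turns a profit.
# All parameters below are made-up illustrative values.

def simulate_profitability(n_trials=10000, seed=42):
    random.seed(seed)
    profitable = 0
    for _ in range(n_trials):
        capex = random.gauss(1000, 150)    # build/operate cost, $ millions
        oil_price = random.gauss(90, 15)   # $ per barrel
        barrels = random.gauss(20, 4)      # millions of barrels recovered
        revenue = oil_price * barrels      # $ millions
        if revenue - capex > 0:
            profitable += 1
    return profitable / n_trials

print(f"Estimated probability of profit: {simulate_profitability():.2f}")
```

Running many trials like this yields a distribution of outcomes rather than a single point estimate, which is exactly what lets executives attach a confidence level to a billion-dollar decision.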

Big data: About Your Job



Some companies, such as Bank of America, want to track their employees’ behavior, so they ask their workers to wear tiny sensors. The sensors can track and store workers’ movements, and the resulting data can be used to analyze workers’ performance. Now, employers and governments want to use Big Data to predict whether or not an employee is likely to be successful, effective, and efficient.

First, I want to talk about the reasons why employees leave a company. According to Evolv, a leader in this field, there are two major factors that cause employees to leave: the job doesn’t suit the employee’s skill set, or the employee doesn’t fit the culture. The main problems are: 1. HR hires the wrong person because the employee is unequipped. 2. The pay doesn’t match the employee’s performance, meaning a lower-than-expected salary. 3. A bullying supervisor.
Then, what makes a better employee?

1. A better employee doesn’t use the Internet Explorer that comes integrated with the computer. Workers who download Firefox or Chrome tend to "take time to reach informed decisions." Does that mean using IE is a worse choice?

2. Workers who have been out of work for a long time are not necessarily worse workers. They tend to stay in their new job as long as other workers do.

3. A criminal background doesn’t hurt a worker’s performance. Sometimes it is even an asset in certain careers.

4. Honesty matters a lot more than experience. Xerox, for example, gives workers personality tests. One question asks whether the applicant is able to work with computers. A later question then asks, "What does control-V do on a word-processing program?" If applicants answer "yes" to the first question but fail the second, they won’t pass the test.
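The Xerox cross-check described above is essentially a consistency rule: a claimed skill must survive a concrete follow-up question. Here is a hypothetical sketch; the field names and answers are invented, not Xerox's actual test.

```python
# Hypothetical honesty screen: flag applicants whose claimed computer
# skills contradict a concrete follow-up question. Field names and
# expected answers are illustrative inventions.

def passes_honesty_check(application):
    claims_computer_skills = application.get("can_use_computers") == "yes"
    knows_ctrl_v = application.get("ctrl_v_answer", "").lower() == "paste"
    # Claiming the skill but failing the concrete question fails the screen
    if claims_computer_skills and not knows_ctrl_v:
        return False
    return True

print(passes_honesty_check({"can_use_computers": "yes", "ctrl_v_answer": "paste"}))  # True
print(passes_honesty_check({"can_use_computers": "yes", "ctrl_v_answer": "print"}))  # False
```

A real screening system would combine many such rules, but the principle is the same: inconsistent answers are a stronger signal than any single answer.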






Twitter Sentiment Analysis of March Madness

With March Madness officially over (and congrats to the Louisville Cardinals on a great season; thanks for winning my bracket pool), I thought I should pass along this article I came across regarding an application of Twitter sentiment analysis in the world of sports. The article at this link: http://www.vertica.com/2013/03/20/a-method-to-the-march-madness/ details how researchers used Twitter data to perform sentiment analysis on NCAA basketball. Their model and results were presented recently at the MIT Sloan Sports Conference (which I really want to attend some day) to see whether Twitter sentiment could predict the level of success of certain teams. The researchers hypothesized that teams or players with a large number of tweets about them were more likely to be successful. Unfortunately, since the Sloan Conference was held before the actual start of what is commonly known as "March Madness," aka the NCAA Men's Basketball Tournament, the Twitter data used was an approximately one-week sample spanning the end of the regular season and the beginning of the conference tournament games.
 
Michigan's Trey Burke, Source: http://isportsweb.com/2011/11/15/michigan-basketball-trey-burke-leads-wolverines-victory/

The researchers limited the tweets they looked at to teams ranked in the top 25 of the AP poll at the time the data was collected, as well as top scorers from across college basketball. The researchers were able to data-mine almost 500,000 tweets in the week-long period, and the results show some interesting patterns when the sentiment analysis is broken down by team and player. Unsurprisingly, Michigan's Trey Burke was the leader in positive sentiment during the time period. This should not be a ground-breaking revelation because at the conclusion of the season he was named the National Player of the Year by the AP, and anyone who watched him carry Michigan to the national title game the past 2+ weeks knows how good he is. Apparently, being good means getting a lot of Twitter love. Most of the players at the top of the sentiment graphic were well-known stars, although I did notice an absence of players from smaller schools having big years, such as Creighton's Doug McDermott. I guess when you are not on TV every week, the Twitterverse isn't all that interested in you. The other player I found a surprising amount of love for was Kentucky center Nerlens Noel. For those of you who don't know, Noel injured a knee earlier in the year and was not even playing at the time the data was mined, but still got a surprising amount of sentiment. My only guess is that UK fans were collectively whining about how they would have made the tourney had he not been injured.

Sad UK Fan, Source: http://www.crimsoncast.com/2012/03/pre-game-meal-kentucky-wants-revenge/imagescau0mx6c-2/
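The per-player tallies the researchers describe can be approximated with a simple lexicon-based scorer: count positive words, subtract negative ones, and sum over tweets. This toy sketch is my own; the word lists and tweets are invented, and the actual study's model was surely far more sophisticated.

```python
# Toy lexicon-based sentiment scoring over tweets, grouped by player.
# Word lists and sample tweets are invented for illustration.

POSITIVE = {"great", "clutch", "amazing", "love", "best"}
NEGATIVE = {"terrible", "choke", "awful", "hate", "worst"}

def sentiment(tweet):
    # Positive words count +1, negative words count -1
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = {
    "Burke": ["Trey Burke is clutch and amazing", "love watching Burke play"],
    "Noel": ["hate that Noel is hurt", "Noel is the best big man"],
}
scores = {player: sum(sentiment(t) for t in ts) for player, ts in tweets.items()}
print(scores)  # {'Burke': 3, 'Noel': 0}
```

Even this crude approach shows why injured-but-beloved players like Noel can score well: sympathy tweets mix positive and negative language, and volume alone drives attention.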

Data about individual teams is only briefly mentioned in the article, with only a supporting graphic showing the sentiment for the Kansas Jayhawks, but the authors did include a note that traditional powerhouse teams led the way in tweet volume, which was not at all surprising. The final thing that really surprised me was the volume of tweets coming from overseas, particularly London. The United Kingdom is not generally known as a basketball fan country, sticking mostly to soccer, but tweets out of London outpaced many major American cities, which was surprising to say the least. Even more impressive is that most people there are asleep when games are being played over here, meaning the Brits are dedicated enough to check up on happenings around college hoops the next day, then tweet about it retroactively. I got some cool insights out of this article, and it concludes with a challenge to all of us to try to use their HP Vertica platform to generate a model combining Twitter sentiment analysis and statistics to predict the winner. It's a little late to try it out before next year, but I will be keeping a close eye on their blog to see if anyone gave it their best shot.

Tuesday, April 9, 2013

How Big Data is helping people find jobs.

One of the major areas of concern in our country today is the national economy and the current unemployment rate. Well, Big Data may be able to help out. A company called Evolv, which monitors recruitment data, made some interesting discoveries after mining through 3 million data points from 30,000 employers. One thing they found is that potential candidates who fill out online job applications with the browsers Firefox or Chrome are more likely to perform better at their job and change jobs less often. Analysts believe the reason for this is that people who use Firefox or Chrome, which have to be deliberately downloaded and installed, are the kinds of people who take time to reach informed decisions. 60% of Americans work hourly jobs, and of these workers about 50% change jobs every year. Large companies therefore have to process an enormous number of applications every year, so there is a great opportunity for firms to save money and improve their hiring process through big data. Some of the research done by Evolv has found that established practices are not always effective. For example, many firms will not hire people with past criminal records, but the data has shown that for certain positions there is no correlation with job performance. Xerox has found that one of the best predictors that a customer-service employee will stick with their job is whether they live nearby and can get to work easily. This finding has helped Xerox cut attrition in their program by 20%. Big Data can help people find the right job for them, but these algorithms are created by people and therefore can be prone to errors. An example is a company that turned down many good candidates for a position because they did not hold a certain type of previous position, one that did not exist at any other company. Big Data is an extremely powerful and helpful tool, but it is only as useful as the people who interpret and use it.
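The browser finding is, at bottom, a group-and-compare exercise: bucket hires by an attribute and compare retention rates across buckets. Here is a minimal sketch of that computation; the records are made up and are not Evolv's data.

```python
# Sketch of the kind of correlation Evolv reports: group hires by an
# attribute (here, the browser used on the application) and compare the
# one-year retention rate of each group. All records are invented.

hires = [
    {"browser": "chrome", "stayed_1yr": True},
    {"browser": "chrome", "stayed_1yr": True},
    {"browser": "firefox", "stayed_1yr": True},
    {"browser": "ie", "stayed_1yr": False},
    {"browser": "ie", "stayed_1yr": True},
    {"browser": "ie", "stayed_1yr": False},
]

def retention_by(records, key):
    # Tally (total, stayed) per group, then convert to a rate
    groups = {}
    for r in records:
        total, stayed = groups.get(r[key], (0, 0))
        groups[r[key]] = (total + 1, stayed + r["stayed_1yr"])
    return {k: stayed / total for k, (total, stayed) in groups.items()}

print(retention_by(hires, "browser"))
```

Of course, a correlation like this says nothing about causation, which is exactly the caution the post ends on: the algorithm's output is only as good as the people interpreting it.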

Source:

http://www.economist.com/news/business/21575820-how-software-helps-firms-hire-workers-more-efficiently-robot-recruiters

Interesting Data Visualizations

Recently I came across an article that showed many different interesting approaches to data visualization. I enjoyed it because it opened my eyes to more than just Google motion charts or geographical heat maps. It's strange to think how creative you can get with mass quantities of data and how visually appealing you can make it. The first one on the list shows a mind map of the 200 most successful websites on the web, ordered by category, proximity, success, popularity, and perspective, by informationarchitects.jp.
This mind map is very intricate, and it is fun to follow all of the different category paths. Some other visualizations use themes of larger items having more significance, like we have seen before on some TED talks. I have always been intrigued by geographical maps. Below is a map Time Magazine developed to show the population density throughout the United States. They used visual spikes to denote high population densities.
I find this visualization more effective in showing the drastic differences in population density than just a scale of green to purple on a flat map. I think it's safe to say that New Yorkers don't have a whole lot of space to themselves. This article by Smashing Magazine has many other interesting visualizations. I hope these provide some inspiration for anyone who wants to make some cool visualizations. Below is the link to the article.

http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/ 

Retailers can use Big Data!


Small retailers that think big data analysis is only important for larger retailers should think again. The fact is, using big data analytics is key for small businesses that want to compete with larger companies.
And it’s even more critical for helping online retailers interact with their customers in real time, according to an article in Practical eCommerce.
Practical eCommerce offers six ways online retailers can use big data and big data analytics to improve ROI.

1. Personalization. Different customers shop with the same retailer in different ways. That means online retailers have to process the data from these various touch points in real time so they can provide personalized content and promotions to each customer.
“For example, do not treat loyal customers the same as new ones,” says Gagan Mehra, author of the article. “The experience needs to be personalized to reward loyal customers. It should look attractive and ‘sticky’ for new customers.”
2. Dynamic Pricing. Dynamic pricing is a type of “price discrimination that companies use to change prices on the fly based on circumstances and estimated user demand,” according to Money Crashers. Online retailers need dynamic pricing if their products compete on pricing with other sites.
Dynamic pricing means taking data from a number of sources, such as competitor prices, product sales, regional preferences, and customer actions and then figuring out the price needed to close the sale for a particular product. Supporting this functionality will give businesses a big competitive advantage, according to Mehra.
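The dynamic pricing logic described above can be sketched as a simple rule: undercut the cheapest competitor slightly, but never drop below a cost floor. This is a toy illustration; the 1% undercut, the margin, and all prices are invented assumptions, not Mehra's method.

```python
# Toy dynamic-pricing rule: start from a base price, undercut the cheapest
# competitor by 1%, but respect a minimum-margin floor over unit cost.
# All numbers and the specific rules are illustrative inventions.

def dynamic_price(base_price, competitor_prices, unit_cost, min_margin=0.10):
    floor = unit_cost * (1 + min_margin)      # never sell below cost + margin
    price = base_price
    if competitor_prices:
        price = min(price, min(competitor_prices) * 0.99)  # undercut by 1%
    return round(max(price, floor), 2)

print(dynamic_price(50.00, [48.00, 52.50], unit_cost=30.00))  # 47.52
print(dynamic_price(50.00, [33.00], unit_cost=30.00))         # 33.0
```

A production system would feed in the data sources the article lists (competitor prices, sales history, regional preferences, customer actions) and re-run this decision continuously, which is where the "big data" part comes in.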
3. Customer Service. The success of an e-commerce site depends in large part on superior customer service. To excel at customer service, online retailers must use big data analytics to give customer service representatives a 360-degree view of each shopper’s interactions with their companies.
“For example, if a customer has complained via the contact form on your online store and also tweeted about it, it will be good to have this background when he calls customer service,” Mehra says. “This will result in the customer feeling valued, creating a quicker resolution.”
4. Manage Fraud. Online retailers can use big data analytics to check their sales transactions against known patterns of fraud and detect fraud in real time; otherwise, it may be too late to catch the criminals.
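Checking transactions against known fraud patterns often starts as a set of explicit rules. Here is a hypothetical rule-based sketch; the rules, field names, and thresholds are invented for illustration and are far simpler than a real fraud engine.

```python
# Rule-based sketch of screening a transaction against known fraud
# patterns in real time. Rules and thresholds are illustrative inventions.

FRAUD_RULES = [
    ("high_amount", lambda t: t["amount"] > 5000),
    ("country_mismatch", lambda t: t["ship_country"] != t["card_country"]),
    ("rapid_orders", lambda t: t["orders_last_hour"] >= 5),
]

def fraud_flags(transaction):
    # Return the names of every rule this transaction trips
    return [name for name, rule in FRAUD_RULES if rule(transaction)]

txn = {"amount": 6200, "ship_country": "US", "card_country": "RO", "orders_last_hour": 1}
print(fraud_flags(txn))  # ['high_amount', 'country_mismatch']
```

In practice the "known patterns" would come from mining historical fraud cases, and any transaction tripping multiple rules would be held for review before shipment rather than after.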

5. Supply chain visibility. Retailers can use big data analytics to give customers information on the availability, status and location of their orders, Mehra notes. “This will require your commerce, warehousing, and transportation functions to communicate with each other and with any third-party systems in your supply chain,” he says. “This functionality is best implemented by making small changes gradually.”

6. Predictive analytics. Online retailers can use predictive analytics to predict the revenue from certain products in the next quarter. “Knowing this, a merchant can better manage its inventory costs and avoid key out-of-stock products,” Mehra says.
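A very simple form of the revenue prediction in point 6 is fitting a trend line to recent quarters and projecting one step ahead. This sketch uses a basic least-squares line; the figures are invented, and real predictive analytics would use far richer data than four quarterly totals.

```python
# Minimal forecast sketch: fit a least-squares trend line to quarterly
# revenue and project the next quarter. Figures are illustrative.

def forecast_next(history):
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / \
            sum((x - x_mean) ** 2 for x in xs)
    intercept = y_mean - slope * x_mean
    return intercept + slope * n  # project one step beyond the data

quarterly_revenue = [100.0, 110.0, 125.0, 135.0]  # $ thousands, invented
print(round(forecast_next(quarterly_revenue), 1))  # 147.5
```

Even this crude projection supports the inventory use Mehra describes: an expected next-quarter figure gives the merchant something concrete to stock against.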


Bringing Machine Learning to the Masses

There is a movement to bring big data analytics to companies with a less technical user base and to improve ease of use for more technical users. BigML is leading the charge with a completely graphical, web-based interface for applying machine learning to large datasets. BigML’s major selling point is the lowered barrier to entry. Running machine learning on a dataset of one GB and producing an actionable predictive model costs around $50. This may seem like a significant amount compared with doing machine learning on your own, but the real way BigML lowers the barrier to entry is by decreasing the amount of knowledge required to build these models. Users do no programming; everything runs through a graphical interface. The site assists with basic analysis and provides a blog with tutorials and discussions of more in-depth use cases and analysis methods. BigML also provides an open API and multiple open-source bindings for popular programming languages for more advanced users, along with very detailed documentation and getting-started guides for those who wish to become advanced users.


Source: https://bigml.com/features