Analytics and Visualization of Big Data

Monday, April 15, 2013

Predicting the Future with Data from the Past

Historians have long looked to the past for answers about today's world, asking questions such as "Why do civilizations collapse?". Whereas historians look for answers in written language, mathematicians like Peter Turchin, a professor at the University of Connecticut, are now using math to gain further insight. Turchin is the driving force behind a field called “cliodynamics,” in which scientists and mathematicians analyze history in the hope of finding patterns they can use to predict the future. According to Turchin, unless something changes, the U.S. should expect a large amount of violence (terrorist activity, uprisings) in the year 2020. Turchin's summary of these "waves of violence" in the U.S. shows that the spikes occur in roughly 50-year cycles, and that these "secular cycles" occurred in all past agrarian states for which records are available (e.g., Ancient Rome, Medieval England, Dynastic China, Russia).

Although Turchin cannot apply many big data techniques to his analysis because historical data sets are scarce, he notes that building models from these historical data sets was not even possible until recently, when old documents began to be digitized.
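The cycle arithmetic behind the 2020 prediction can be sketched in a few lines of Python. This is a minimal illustration, not Turchin's model: it assumes evenly spaced peaks and extrapolates one cycle forward, and the peak years used (1870, 1920, 1970) are an assumption chosen to match the 50-year spacing described above.

```python
def next_peak(peak_years: list[int]) -> float:
    """Extrapolate the next peak, assuming peaks are evenly spaced.

    The average cycle length is the span between the first and last peaks
    divided by the number of gaps between them (equivalent to the mean of
    the consecutive gaps).
    """
    if len(peak_years) < 2:
        raise ValueError("need at least two peaks to estimate a cycle length")
    years = sorted(peak_years)
    cycle_length = (years[-1] - years[0]) / (len(years) - 1)
    return years[-1] + cycle_length


# Assumed peak years, spaced 50 years apart (illustrative only).
print(next_peak([1870, 1920, 1970]))  # prints 2020.0
```

Turchin's actual models are far richer than a straight-line extrapolation; this only shows why a steady 50-year rhythm points to 2020.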

http://www.wired.com/wiredenterprise/2013/04/cliodynamics-peter-turchin/

Targeting Business Customers vs. Consumers

Indix, a startup now split between two continents, recently raised $4.5 million in financing to work toward its goal of building big-data tools for business managers. The company's CEO, Sanjay Parthasarathy, recently moved from India back to Washington state to expand marketing, sales, and customer service there, leaving the research and development center in India. Despite its small size (35 people) and its new split between continents, the firm is doing big things.

Indix is building a dataset of products, their prices, and the events that affect those prices. It already holds over one billion prices and a large number of products, and it is aiming for more. What do they want to do with all this data? The interesting thing about Indix is that it focuses on business customers rather than consumers. Why? Parthasarathy says he finds the events along the supply chain that affect a product's price very interesting. He likes to think about what businesses will be capable of if their managers are equipped with the information Indix is compiling and analyzing.
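The article doesn't describe Indix's schema, so here is a hypothetical sketch of how price observations and price-moving events could be linked, and how one might measure an event's effect on a product's price. The names `PricePoint`, `PriceEvent`, and `price_change_around`, the 7-day window, and the sample prices are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class PricePoint:
    """One observed price for one product on one day (hypothetical schema)."""
    product_id: str
    day: date
    price: float


@dataclass(frozen=True)
class PriceEvent:
    """Something that may move prices, e.g. a supply disruption (hypothetical)."""
    name: str
    day: date


def price_change_around(points: list[PricePoint], product_id: str,
                        event: PriceEvent, window_days: int = 7) -> Optional[float]:
    """Percent change in a product's mean price after an event vs. before it.

    "Before" is the window_days days preceding the event; "after" is the event
    day plus the following window_days days. Returns None if either side has
    no observations.
    """
    before = [p.price for p in points
              if p.product_id == product_id
              and 0 < (event.day - p.day).days <= window_days]
    after = [p.price for p in points
             if p.product_id == product_id
             and 0 <= (p.day - event.day).days <= window_days]
    if not before or not after:
        return None
    baseline = mean(before)
    return (mean(after) - baseline) * 100 / baseline


points = [
    PricePoint("widget", date(2013, 4, 1), 10.0),
    PricePoint("widget", date(2013, 4, 3), 10.0),
    PricePoint("widget", date(2013, 4, 6), 12.0),
    PricePoint("widget", date(2013, 4, 8), 12.0),
]
event = PriceEvent("supplier shortage", date(2013, 4, 5))
print(price_change_around(points, "widget", event))  # prints 20.0
```

Keeping events as separate records, rather than flags on each price, lets the same event be tested against many products.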

Their goal is to build a platform and a set of tools that will enable both internal and external collaboration for businesses. They want managers to use Indix's data, analytics, visualizations, suggestions, and other tools to improve their businesses, above all by setting the most profitable price. What interests me is that I haven't seen much like this before. Indix seems like a unique company with a unique goal. I've seen a lot about Target and Netflix using Big Data analysis to determine what their customers want. But what about ways to improve a company's business model or supply chain?

This seems like a place where IEs can truly shine. Yes, what consumers want from a store or company is very important and meaningful to us. However, we're trained to improve systems and businesses, and that is what Indix is trying to do with its massive dataset. As Big Data becomes more popular, maybe this is something we as IEs should focus on more.

Source:

http://www.xconomy.com/seattle/2013/04/15/a-billion-prices-and-counting-big-data-ambition-at-startup-indix/

Sunday, April 14, 2013

Big Data and Finance

Big Data emerged in the technology industry by at least 2009 and became an important theme in that circle; by 2011 it had entered Wall Street and caught the attention of financial professionals. With storage, CPU, and bandwidth costs dropping and network access exploding, people face a big problem: knowing which data is useful and which is mismatched or incomprehensible. Big Data emerged to help solve that problem.

The financial industry is one of the leading adopters of Big Data and data integration services, along with healthcare, logistics, manufacturing, the military, data services bureaus, and casinos. There is treasure in the data: the healthcare industry, for example, could save three hundred billion dollars with real-time data integration, which can be considered a form of Big Data. As a result, Big Data opportunities have caught the attention not only of engineers and technology professionals but also of investors trying their best to find the gold in it, and no industry is more eager to profit from Big Data than finance. The secret of the financial industry's prosperity is data: transactions, customer interactions, rate changes, risk assessments, portfolio investments, and so on.

With its rapacious appetite, the financial industry will seek any method of combining structured and unstructured data from different sources to enhance business intelligence, because improved business intelligence enables real-time data analytics, which in turn supports better decisions with market-leading insight. With Big Data, financial firms are trying to develop strong big data architectures that add value to the business and let them analyze data in real time to make a difference. Investors are therefore paying more attention to the Big Data market, focusing on several companies that could bring them big returns, such as IBM, EMC, Teradata, Oracle, NetApp, HP, Hitachi, Qlik Technologies, and Tibco Software. Among these, IBM, EMC, and Teradata could top the list of Big Data opportunities. Big Data helps people organize and analyze large amounts of unstructured data, meaning information outside the realm of traditional databases, such as email, PowerPoint presentations, audio, video, and social media. David Goulden, the CFO of EMC, said at the company's 2012 fourth-quarter conference, “Increasingly, the transformation of business itself is being driven by the ability to harness big data, and business that can quickly use the vast troves of data both inside and outside their companies, can gain competitive advantages.” There is no doubt that Big Data is a developing and growing field: JMP Securities forecasts that the Big Data market could grow from nine billion dollars in 2011 to eighty-six billion dollars in 2021.
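To put the JMP Securities forecast in perspective, growing from $9 billion to $86 billion over ten years implies a compound annual growth rate of roughly 25%. A quick check in Python, using the standard CAGR formula (end / start) ^ (1 / years) - 1:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# JMP Securities forecast: $9 billion (2011) -> $86 billion (2021).
rate = cagr(9, 86, 2021 - 2011)
print(f"{rate:.1%}")  # prints 25.3%
```

In other words, the forecast assumes the market grows by about a quarter every year for a decade.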
IBM has a wide-ranging strategy for capturing the Big Data opportunity. Charles King, principal analyst at research firm Pund-IT, said, “IBM sells standalone Netezza Big Data appliances but is also incorporating them in a broader strategy with its other database, business intelligence and analytics and data management solutions.” Mark Loughridge, the CFO of IBM, pointed out that Netezza revenue increased seventy percent year over year, and said, “We could have done even better on Netezza but we sold out the box, Netezza turned out to be a very, very strong acquisition for us.” Netezza is just one part of IBM's Big Data plan, but it shows that Big Data is helping IBM earn more in its analytics business. EMC is another name at the top of the Big Data stocks list, and no investor can ignore the significance of EMC's Greenplum division: the Greenplum Unified Analytics Platform helps businesses process structured and unstructured data quickly and, at the same time, share the results across departments in an organization. Teradata moved quickly to secure a good position in the Big Data area. Greg McDowell, an analyst at JMP Securities, said, “They have been in the big data business for a long, long time, and they are the leading data warehouse vendor.” To strengthen its position, Teradata bought Aster Data, a software maker whose products are well suited to analyzing unstructured data. Silicon Valley is known for inventing the future, and the financial industry, especially the stock market, knows it too: with Big Data, the financial industry sees the future of its own field.

Big Data is not only a new technology for dealing with the ever more massive data in the world, but also a big opportunity for anyone who wants to benefit from it in the future. Becoming familiar with Big Data is important not only for students, researchers, and engineers, but also for people who make a living in the finance industry and want to get further in their careers.

Sources:

http://www.thestreet.com/story/11496773/4/top-3-big-data-stocks-for-2012.html

http://www.bigdatafs.com/

http://www.csmonitor.com/Business/The-Reformed-Broker/2011/0609/Big-Data-hits-Wall-Street

Saturday, April 13, 2013

MIT Study on Implicit Relationships


I was reading someone’s blog post and watching the video they linked about the UN’s Big Data research when I heard the emcee mention an MIT study that examined the Facebook profiles of MIT users and could predict a user’s sexual orientation based on their friendships. So, I looked into the research.

The paper is titled 'Gaydar: Facebook friendships expose sexual orientation' and was written by Carter Jernigan and Behram F.T. Mistree.

Disclaimer: I didn't read the entire paper. It's really long. I did read the majority of it.

Here are the high points:

Aggregation of personal data – The researchers discuss how, in any environment, whether the State Department or social media, aggregating data poses a great potential risk. Their examples include officials looking up President Obama's passport records and confidential taxpayer information. If the information is available anywhere, people will exploit it.

Sex segregation – As the old saying goes, “birds of a feather flock together,” and this intuitively makes sense. Those of us who enjoy leadership and being on teams gravitate toward campus involvement. Those who are more athletic and have a competitive drive will most likely find each other on sports teams. The same can be said of how the sexes segregate themselves. A study cited in the paper states that men's friends are 65% male and 35% female, while women's friends are 70% female and 30% male. This suggests that any individual of a given sex should have roughly the same split in their friendships, so a significant deviation from it might offer some insight into that person's lifestyle.

Homosexuality and sexual segregation – According to the study, homosexual men and women draw the majority of their friends from the LGB community, whereas bisexual men and women draw their friends from the heterosexual community. So, again, looking at the sex and orientation of a user's friends should begin to reveal insights about that user.

Forming the hypothesis – Because of these observations, we expect gay males to have a higher proportion of gay male friends online.

Methodology – The researchers used web-crawling software (Arachne) to go through all of the Facebook users at MIT and extract each student’s sex and “interested in” information, as well as each user's friendships. The result was that the researchers could use explicit and implicit friendships (see the paper for an explanation) to deduce a user’s sexual orientation from what their friends listed in their “interested in” profile information. For example, for an MIT male user, the algorithm the researchers developed could predict his sexual orientation from the percentage of male and female friends he had and those friends' listed sexual orientations.

Implications – So, the most interesting thing about this, to me, isn't being able to know someone's sexual orientation. That's interesting, but I don't really care. Rather, what's really significant is the notion that people can harvest seemingly harmless information about you and use it to draw implicit conclusions about you. Think about it this way: you're a CIA agent. You live in Atlanta and work for a private equity firm as a cover, so your family and friends think you're normal. Say you're trying to keep up with your kids, so you have a very basic Facebook profile, and every once in a while you tweet a picture of you and the fam on vaca or the new boat you just bought. Obviously, you wouldn't post that you work for the government in your 'About Me' information, or anything else that would link you to a clandestine profession. But what this research suggests is that we can make accurate predictions about a user's lifestyle and personal habits based on information they aren't making readily available. So, perhaps I scan your Facebook friends, and most of them make sense except a few located in D.C. That, coupled with some of the spending habits I've observed on your Twitter and Instagram accounts, suggests that you travel to cities outside your job's requirements and that your standard of living is significantly different from what it should be. Certainly there are other explanations for these observations. However, another insight from this research is that "types" of people act in similar patterns. Sure, your buying history and friendships would look ordinary on their own. But when compared with a test set of other CIA agents' information, they yield a 95% likelihood that you are, in fact, a CIA agent. Cover blown. Your kids and wife are taken hostage and Arnold Schwarzenegger is called in to save you.

This is a bit of a reach, but I think the underlying conversation here is very significant. If Big Data analysis allows businesses, governments, or (God forbid) terrorists to gain useful insights into aspects of our lives that we aren't intentionally sharing... what could that mean?
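The two ideas from the paper summarized above, that a user's friendship mix can deviate measurably from a baseline and that undisclosed traits can be guessed from what friends disclose, can be sketched in Python. This is a simplified illustration, not the researchers' algorithm: the z-score assumes each friendship is an independent draw at the baseline rate from the cited study, and the threshold rule and all sample values are hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def friend_sex_z_score(same_sex_friends: int, total_friends: int,
                       baseline: float) -> float:
    """How many standard deviations a user's same-sex friend count lies
    from the baseline expectation (binomial approximation).

    Assumes each friendship is an independent trial with probability
    `baseline` of being same-sex, e.g. 0.65 for men or 0.70 for women.
    """
    if total_friends <= 0:
        raise ValueError("total_friends must be positive")
    expected = baseline * total_friends
    std_dev = math.sqrt(total_friends * baseline * (1 - baseline))
    return (same_sex_friends - expected) / std_dev


def disclosed_share(friend_labels: list[Optional[str]], target: str) -> float:
    """Share of friends reporting `target`, among friends who report anything.

    None marks a friend who left the field blank.
    """
    disclosed = [label for label in friend_labels if label is not None]
    if not disclosed:
        return 0.0
    return Counter(disclosed)[target] / len(disclosed)


def infer_trait(friend_labels: list[Optional[str]], target: str,
                threshold: float) -> bool:
    """Guess that a user shares `target` if enough friends disclose it.

    The threshold is a hypothetical cut-off; a real model would be fit on
    labeled training data.
    """
    return disclosed_share(friend_labels, target) > threshold


# Hypothetical male user: 90 of 200 friends are male, versus 130 expected.
print(round(friend_sex_z_score(90, 200, baseline=0.65), 2))

# Hypothetical friends' self-reported labels (None = not disclosed).
labels = ["A", "B", "A", None, "A", "B"]
print(disclosed_share(labels, "A"), infer_trait(labels, "A", threshold=0.5))  # prints 0.6 True
```

A z-score around -5.9 means far fewer male friends than the baseline predicts, the kind of "significant deviation" described above. The same logic applied to friends' locations or spending instead of orientation is what makes the CIA-agent scenario plausible.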

Thursday, April 11, 2013

Could Big Data Cripple Facebook?

I read an interesting article that discussed how Big Data could cripple Facebook. It started with a startup called SmogFarm, whose product, KredStreet, applies sentiment analysis to StockTwits messages. It compares what users said in the past against what actually happened in order to measure their accuracy and rank them. This product holds people accountable for what they’ve said and predicted in the past. Imagine what happens when this kind of sentiment analysis is applied to Facebook.
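The article doesn't describe how KredStreet actually scores users. As a minimal sketch of measuring past sentiment against reality, assume each message has been reduced to a bullish or bearish call plus the stock's later price change; a user's score is then the fraction of calls whose direction turned out right, and users are ranked by that score. The data format and scoring rule are assumptions.

```python
from collections import defaultdict


def rank_by_accuracy(calls):
    """Rank users by the share of their directional calls that came true.

    `calls` is an iterable of (user, sentiment, actual_change) tuples, where
    sentiment is "bullish" or "bearish" and actual_change is the stock's later
    price change. Calls followed by no price change are skipped.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for user, sentiment, actual_change in calls:
        if actual_change == 0:
            continue  # nothing to score
        predicted_up = sentiment == "bullish"
        totals[user] += 1
        if (actual_change > 0) == predicted_up:
            hits[user] += 1
    scores = {user: hits[user] / totals[user] for user in totals}
    # Highest accuracy first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


calls = [
    ("ann", "bullish", 2.0),
    ("ann", "bearish", -1.0),
    ("bob", "bullish", -3.0),
    ("bob", "bearish", 1.5),
    ("bob", "bullish", 0.5),
]
print(rank_by_accuracy(calls))
```

Here "ann" ranks first with both calls correct, and "bob" second with one of three.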

Software already exists that can easily extract meaning from seemingly useless data. Some believe that in the near future employers will be able to simply point software at a prospective employee’s Facebook profile and learn a lot about them without spending any time. The things such software could learn about a person include work habits, failures, emotional issues, and attitudes toward authority. There is also research to back this up. Researchers at Cambridge University studied 58,000 people’s “likes” on Facebook and built a model that could predict the following traits (reported accuracy in parentheses): homosexuality in men (88%) and in women (75%), ethnic background (95%), gender (93%), religious affiliation (82%), political affiliation (85%), use of addictive substances (75%), and relationship status (67%).
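The article doesn't detail the Cambridge model. As a rough sketch of the general approach of predicting a yes/no trait from Facebook likes, the code below represents each user as a row of 0/1 values (one per page they could like) and fits a plain logistic regression by gradient descent. The page columns, labels, and training settings are invented toy values, and the researchers' actual pipeline may differ.

```python
import math


def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))


def predict_proba(w: list[float], row: list[int]) -> float:
    """Probability of the trait for one user's like vector."""
    # zip stops at len(row), so the bias (last weight) is added separately.
    return sigmoid(w[-1] + sum(wj * x for wj, x in zip(w, row)))


def train_logistic(X: list[list[int]], y: list[int],
                   lr: float = 0.5, epochs: int = 500) -> list[float]:
    """Fit logistic-regression weights by batch gradient descent.

    X holds one 0/1 row per user (1 = liked that page); y holds the 0/1 trait.
    The returned list has one weight per page followed by a bias term.
    """
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for row, label in zip(X, y):
            error = predict_proba(w, row) - label
            for j, x in enumerate(row):
                grad[j] += error * x
            grad[-1] += error
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Toy data: columns are likes of hypothetical pages A, B, C;
# in this invented sample the trait goes with liking page A.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
y = [1, 1, 0, 0, 1, 0]
weights = train_logistic(X, y)
print(round(predict_proba(weights, [1, 0, 0]), 2),
      round(predict_proba(weights, [0, 1, 1]), 2))
```

With tens of thousands of users and pages, the same idea needs sparse matrices and regularization, but the principle is the same: likes become features, and the trait becomes a probability.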

Once people realize how easily this can be done, they will stop sharing so much information on Facebook and other social media, according to Evans, the author of the TechCrunch article. Zuckerberg’s Law holds that the amount of information people share on the Internet doubles every year. Will this hold true after people realize that potential employers can learn far more about them than they ever thought?

I agree that once people realize the implications of their social media use, some will change how they use it. However, adults tell high school and college kids all the time to be careful what they post on the Internet because they can never take it back. Students are warned that employers can use what they post on social media to make decisions about them as potential employees. Yet, if you look at a lot of students’ Facebook profiles, it appears as if they had never been warned. Teenagers and young adults think bad things won’t happen to them; they feel invincible.

Now, once there actually are repercussions for what people post on Facebook, they might change how they use it. I don’t think people will stop using it, though. According to a recent study, the average American spends around 8 hours a month on Facebook alone. That is a whole workday’s worth of time on one social media site. People aren’t just going to magically stop using it because others might learn more about them than they would like. I think some people will stop or dramatically change how they use it, but the majority will continue to use it as they do now.

Sources:

http://techcrunch.com/2013/03/30/big-data-could-cripple-facebook/

http://memeburn.com/2013/03/when-big-data-gets-scary-using-facebook-likes-to-reveal-everything-about-you/

http://mashable.com/2011/09/30/wasting-time-on-facebook/

Pinterest Analytics Alternatives

As a follow-up to my last post on Pinterest and its new analytics tools, I would like to look at alternatives to the built-in tools. Since Pinterest limits access to its analytics tools to businesses, I wanted to look into several options that individuals could use, or that businesses could use as an alternative to the built-in functionality.

One of the more interesting tools is called PinPuff. PinPuff lets businesses look into the value of each of their pins. Businesses and individual users alike receive a reach score (which measures your following), an activity score (which measures your pinning activity), and a virality score (which gauges how much people like your pins based on the frequency of repins and likes). These measures give a quick look at the health of your Pinterest account.
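PinPuff's formulas aren't given in the sources cited here, so the following is a hypothetical sketch of how scores of this kind could be computed from basic account counts. The `AccountStats` fields and all three formulas are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    """Basic counts for one Pinterest account (hypothetical fields)."""
    followers: int
    pins_last_30_days: int
    total_pins: int
    repins: int
    likes: int


def account_scores(stats: AccountStats) -> dict[str, float]:
    """Hypothetical reach, activity, and virality scores.

    These are NOT PinPuff's real formulas:
    reach    = follower count
    activity = pins per day over the last 30 days
    virality = (repins + likes) per pin
    """
    pins = stats.total_pins or 1  # avoid dividing by zero for a new account
    return {
        "reach": float(stats.followers),
        "activity": stats.pins_last_30_days / 30,
        "virality": (stats.repins + stats.likes) / pins,
    }


print(account_scores(AccountStats(followers=1200, pins_last_30_days=60,
                                  total_pins=100, repins=150, likes=50)))
```

Normalizing virality per pin keeps a prolific pinner from looking viral just because of volume.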

Similar to PinPuff, Pinreach helps you measure your influence within Pinterest. Unlike PinPuff, Pinreach can show your Pinterest activity as a function of time, which is a helpful analytics feature. Pinreach is also much more visual than PinPuff, letting users see pictures of their most popular pins as well as a view of their most influential followers. This could help a business target its marketing strategy by trying to get its influential followers to repin.
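Pinreach's internals aren't documented in the sources below, but the two views described here, activity over time and most influential followers, can be sketched with the standard library. Both the monthly bucketing and the use of each follower's own follower count as a proxy for influence are assumptions.

```python
from collections import Counter
from datetime import date


def monthly_activity(pin_dates: list[date]) -> dict[str, int]:
    """Count pins per calendar month, keyed 'YYYY-MM', in chronological order."""
    counts = Counter(d.strftime("%Y-%m") for d in pin_dates)
    return dict(sorted(counts.items()))


def most_influential(follower_reach: dict[str, int], top_n: int = 3) -> list[str]:
    """Names of the followers who themselves have the most followers.

    Using a follower's own follower count as 'influence' is an assumption.
    """
    return sorted(follower_reach, key=follower_reach.get, reverse=True)[:top_n]


pins = [date(2013, 3, 2), date(2013, 3, 20), date(2013, 4, 1)]
print(monthly_activity(pins))  # prints {'2013-03': 2, '2013-04': 1}
print(most_influential({"ana": 10, "ben": 500, "cal": 42}, top_n=2))  # prints ['ben', 'cal']
```

Zero-padded "YYYY-MM" keys sort chronologically as plain strings, which keeps the time series in order without extra date handling.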

There is also a Google Chrome extension that lets you search Google for images related to a Pinterest image. This search functionality seems to be in line with the expansion of Google’s image search mentioned in the video we watched several weeks ago. It will be interesting to see how the rise of more visually engaging social media (Pinterest, Instagram, Tumblr, etc.) affects the direction of Google’s search development.

Sources:

http://www.hongkiat.com/blog/useful-pinterest-tools/

http://pinpuff.com/

http://www.pinreach.com/

https://chrome.google.com/webstore/detail/pin-search-image-search-o/okiaciimfpgbpdhnfdllhdkicpmdoakm

Limitations on Big Data in Marketing

Big data is often viewed as an instant solution and answer. Playing “devil’s advocate,” I’d like to explore some of the reasons why big data fails us in marketing strategy. In a recent Forbes article, Greg Satell explored why big data has limits in marketing.

The first reason big data limits us in developing marketing strategies is that we assume the real world operates like the simulated world of our big data models. Big data analysis essentially creates and analyzes an alternate world in which our entities behave randomly, yet predictably. Although people do tend to make decisions in trends, this simulated world is not a perfect representation of reality. Also, in a simulated world our risks are negligible and the costs of failure are minimized. This leads us to explore riskier marketing strategies, which could ultimately yield more results at the end of the day.

Another reason big data can fail in marketing strategy is that the life span of industry-leading technology is short and fleeting. By the time one company snags a technology or idea to transform its marketing strategy, another company might be quickly closing in on the same strategy. As more industry players adopt it, the strategy is no longer an industry-leading idea but an industry-standard protocol. This turns what was once innovation into potential extra work and expense for marginal benefit to the company.

The Forbes article also emphasizes the value of human experience in marketing strategy. It points out that data mining is performed by machines, which don’t understand the personal experiences of human life and circumstances. The article closes by emphasizing the difference between efficiency and innovation: efficiency often comes from machines and algorithms, whereas innovation comes from individuals and their personal experiences.

Sources:

http://www.forbes.com/sites/gregsatell/2013/03/06/the-limits-of-big-data-marketing/2/