Thursday, April 4, 2013

Marketing and social network data mining

Almost everyone has a Facebook, Twitter, or other social network account. People share photos, share pages, and talk about things, and all of this activity can expose their opinions about products. As I discussed in my last blog, companies can find valuable customers this way. In this blog, I will talk more about how.

Traditionally, companies build customer profiles from their own customer databases, which usually contain only basic information. Only when a customer buys a lot from them can they build an accurate profile of that customer.

Now that social networks let people share their ideas and expose themselves, companies have more opportunities to analyze their customers and potential customers. For example, social network mining can help predict new trends. The figure below shows a research result from Harvard University.



Application Example 

One application of this is for financial institutions, such as banks and credit card companies.

These companies need to track changes in people's financial situations; based on those changes, they can offer promotions or avoid risk. Here are some examples.


First, when someone updates his or her Facebook status from "in a relationship" to "engaged" or "married," a bank could offer the new couple a promotion for opening a joint account or a credit card offer. It could also contact them, for example by mail, to introduce real estate loans.

Second, when a credit card company learns from a customer's Twitter or Facebook posts that he or she has been laid off, which means less income for some period in the future, the company may raise the APR, reduce the credit limit, or simply watch that account more closely. Friends from the same department may deserve the same attention.

Problems

The biggest problem is matching customers between social networks and the company's own customer database.

First, if a company has a customer's email address, it is easy to find that customer on a social network, because Facebook and other social networking sites let you search for people by email address, and the email address can serve as the index between the two databases.

The other case is harder: if no identifying information is available, matching technologies are needed. Matching methods are sophisticated, and the investment they require is large, so decision makers face a trade-off.
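To make that trade-off a bit more concrete, here is a minimal matching sketch in Python. The table contents, field names, and the 0.85 similarity threshold are all made-up assumptions for illustration; real record linkage systems are far more involved.

import difflib

# Hypothetical records: a CRM table and a list of social-network profiles.
crm_customers = [
    {"id": 1, "name": "Jane Smith", "email": "jane.smith@example.com"},
    {"id": 2, "name": "Robert Jones", "email": None},
]
social_profiles = [
    {"handle": "@jsmith", "name": "Jane Smith", "email": "jane.smith@example.com"},
    {"handle": "@bobby_j", "name": "Rob Jones", "email": None},
]

def match(customer, profiles, threshold=0.85):
    # 1) Exact e-mail match: cheap and reliable when the address is on file.
    if customer["email"]:
        for p in profiles:
            if p["email"] and p["email"].lower() == customer["email"].lower():
                return p, 1.0
    # 2) Fallback: fuzzy name similarity, the expensive and error-prone case.
    best, best_score = None, 0.0
    for p in profiles:
        score = difflib.SequenceMatcher(None, customer["name"].lower(),
                                        p["name"].lower()).ratio()
        if score > best_score:
            best, best_score = p, score
    return (best, best_score) if best_score >= threshold else (None, best_score)

for c in crm_customers:
    profile, score = match(c, social_profiles)
    print(c["name"], "->", profile["handle"] if profile else "no confident match", round(score, 2))

The exact-email path is essentially free, which is why having identifying information on file changes the economics so much; everything after the fallback line is where the cost and the errors live.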

In summary, social networks provide a new source of intelligence for marketing. Mining them is worthwhile: it can increase income and help companies know their customers better.


WEB DATA SOURCES FOR SPORTS (2)


This is continued from the previous one.
Hockey
NHL.com
Professional hockey’s governing body, the National Hockey League, also presents data and statistics to users through its website, nhl.com. This website is as rich in material as baseball’s and basketball’s official websites; it offers the standard complement of leaderboards, team statistics, and historical game comparisons.
Hockey-reference.com
This website offers all of the conventional sortable historical and real-time hockey data based on players, teams, leagues, coaches, etc. A blog application allows users to share ideas and insights with one another.

Soccer
MLSnet.com
MLSnet.com, the website of American Major League Soccer’s governing body, has made game data available in fascinating ways. Aside from the typical spread of historical and real-time player, team, and league data available in sortable form, its GameNav application abstracts video footage of game events into a console-game-like environment.
Soccerbase.com
For fans of English Premier League football, soccerbase.com provides the expected historical data on players, teams, and leagues. This data is sortable, and its unique twist is the ability to compare players or teams.

Other Sport Sources
Stats.com
Stats.com is a multi-sport data repository that contains historical and real-time sport data. This subscription-based service works with official league bodies to provide users, teams, and the media with up-to-the-second multimedia in the form of textual score updates and interactive graphics indicating shot progression and scoring. The amount of data they maintain is exhaustive and the graphics provided are top notch.
Atsdatabase.com
Atsdatabase.com is a subscription-based service that provides line odds and betting advice to bookmakers and individuals alike. The free data provided on the website is fairly basic in its composition and is geared toward enticing users to subscribe to the paid services.

I hope the web sources above provide valuable information for you.

Text mining techniques used in the hiring process


This tutorial covers resume sorting and clustering using text mining techniques. Hiring new people is a crucial function of every company, and the pool of resumes received during recruitment is far larger than the number of positions to fill. Text mining is needed to sort and filter on keywords such as internships, relevant skills, experience, and so on. Based on those keywords, categories can be defined and resumes grouped, ultimately leading to the selection of better candidates. The video posted below shows different techniques used to filter resumes.
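In the spirit of the keyword filtering and clustering described above, here is a rough sketch using scikit-learn; the toy resume snippets, the keyword list, and the two-cluster split are all assumptions for illustration, not a production pipeline.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy resume snippets standing in for parsed resume text.
resumes = [
    "Internship in data analysis, Python, SQL, machine learning projects",
    "Retail experience, customer service, cash handling, scheduling",
    "Software engineering internship, Java, algorithms, databases",
    "Warehouse operations, forklift certified, inventory management",
]

# Turn free text into TF-IDF vectors so similar resumes land near each other.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(resumes)

# Group resumes into two rough categories (technical vs. non-technical here).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Simple keyword screen on top of the clusters.
keywords = {"internship", "python", "sql", "machine"}
for text, label in zip(resumes, labels):
    hits = keywords & set(text.lower().replace(",", "").split())
    print(f"cluster {label} | keyword hits: {sorted(hits)} | {text[:40]}...")

A real system would first extract plain text from PDF or Word files and tune the number of clusters to the job categories being hired for.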


From A Customer Perspective


Consumers are no longer what they used to be. Before they buy, they look around, gather a ton of data about the product, compare prices, and are aware of the market forces. They want to be treated as unique. They are influencers and mini-tyrants who will leave and shop elsewhere in the blink of an eye if an organization does not keep up. An organization needs to outrun them and be very responsive to ever-changing market trends.
The good news is that organizations have all the information needed to understand, catch, and nurture these empowered consumers, because consumers literally leave traces everywhere: colossal masses of data from sources such as Twitter, Facebook, industrial sensors, geo-tracking data, voter registrations, sports statistics, website logs, and much more.
Big Data comes with real predictive insights, actions, and results as well. This blog covers some views on the way markets are changing and how an organization can make informed use of these trends to strengthen its profile. I believe this gives retailers a tremendous opportunity to focus their efforts on building a strong relationship with the consumer and to fully utilize what Big Data has to offer. It is not an easy initiative, but I believe Big Data is something all of us will continue to learn and use to our advantage in the years to come.
1) Fast Customer + Big Data = Big Marketing
The good news is that an organization has all the information needed for a 360° view of its customers. There is a massive amount of external information that can give superior insights into customers' minds and behavior.
2) The customer extends tyranny over the market and can heavily influence market trends. A customer responds when he or she is the focus of attention. Gone are the days of automatic loyalty; buying now happens in a public cycle of purchasing and influencing.
Metaphorically expressed as:
HAPPY CUSTOMERS TELL 9 PEOPLE
DISSATISFIED CUSTOMERS TELL 22 PEOPLE
3) Big Data technology is remolding customer-vendor relationships into something more dynamic, personal, and dialectic. Customers want a far-reaching personal experience and want it to be unique. With all the data needed to understand a customer at hand, the process becomes much simpler. Today it is possible to build a full digital footprint of purchasers by auditing and merging internal and external data feeds, turning those into real insight and making real-time decisions based on individual profiles.
4) Making new customers and keeping them is the motto. McKinsey claims that 55% of the current marketing budget is spent on new customer acquisition and only 12% on customer retention. This split gives a rough idea of where organizations' focus lies. Among the many tools big data offers are warning signs that a customer is on the verge of switching to the competition: Big Data tools tell you when customers are making negative comments about you online, when they have partially switched to a competitor, or when their shopping basket has changed in content or size.
Big Data can also be used to recognize frequent shoppers and thank them for their loyalty which enhances the personal shopping experience.
References:
Analyzing Customer Behavior, Big Data Research Papers, Greenplum.
Peter Hinssen, Predicting What Happens Next.

Archiving Social Networking

Is 20/20 Vision Really Clearer in Hindsight?

Over the last few class periods, it has become apparent that the power of social media is far-reaching, and though we've been raised with, and even grown up alongside, Facebook, Twitter, and Instagram, I do not think we realize that power. In researching what is archived on these sites, I came across an article called "The Inevitability of Archiving Social Networking Data." At first I was almost offended by the title. It makes me nervous to know that there is a plethora of personal information about all of us on the web, and with the power of the search engine continuing to grow, searching for individuals has become easier and easier. I did not think that I would agree with social network archiving, but after reading the perspective of the author, Nathaniel Borenstein, I feel somewhat on the fence as to what researchers should be able to archive.

Flashback to 1985: Borenstein is a member of a team at Carnegie Mellon University. The World Wide Web and the Internet are foreign concepts to most people. E-mail has just been introduced, and Borenstein states that many were skeptical of the concept in general. Borenstein, however, was very optimistic and urged CMU to archive the e-mails sent across campus. He was met with resistance and had no way to store the data he sought to collect. Everyone asked him, "Who would want to look at old e-mails?" Nearly thirty years later, e-mail archiving is huge and required by law in some cases. Borenstein makes an interesting point: what is the difference between e-mail archiving in 1985 and social network archiving in 2013?

Though Borenstein states that many social network posts may seem unimportant, over time social media has the power to indicate responsiveness to social and societal events as well as customer responses to products in the business sector. There is the same resistance today that there was in the 1980s against e-mail archiving, and I must say, for the most part I share it. I cannot imagine that in ten years I will feel the same way about things as I do now, let alone want that information preserved as an indicator of how I felt. However, Borenstein makes a good point that with today's technology, storage capacity, and the progression of data mining, social network archiving is not only acceptable, it's crucial. With Borenstein's hindsight from the e-mail debate of the 1980s, should we truly begin archiving social media?



Article:

http://www.xconomy.com/detroit/2013/03/27/the-inevitability-of-archiving-social-networking-data/

How Big Data changes the game for non-profits


When a sudden and unpredictable disaster places great demands on humanitarian aid supply chains, capacity into the disaster zone can be finite. "Often, the wrong types of goods can be shipped, by organizations that don't have the ability to traverse the critical 'last mile' to the disaster site. The 'coordination' attempt on the ground is nothing more than marshaling incoming goods and trying to get the most needed ones through a constrained pipeline. Examples of where this has happened include Haiti, the tsunami in Japan, and the earthquake in Pakistan."

Orchestrating disaster response depends not only on coordination but on the ability of aid organizations to collaborate with one another, access information coming in from all directions, and derive actionable intelligence from that information. Humanitarian aid organizations like the IFRC, UNICEF, the Red Cross, and others are beginning to meet these challenges with Big Data, which gives them more complete views of how well aid is working and how they can optimize aid efforts. Just as importantly, these organizations are beginning to use Big Data in new ways that can help in non-disaster aid efforts with the power to preempt tragedy, if they act swiftly enough and apply the right solutions.

Improving crop yields and agricultural practices in countries with high starvation and malnutrition rates is one example. For years, non-profit aid organizations have been sending in field workers to advise local farmers on best agricultural practices. These workers file progress reports and keep tabs on agricultural projects to see if crop yields improve.
The difficulty has been in collecting all of these reports, which come in many different forms, and then trying to glean insights from them once they become a monolithic body of unstructured and semi-structured data. By using Big Data collection, grooming, and analytics techniques, humanitarian aid organizations are now able to compile all of these unstructured reports of field farming activity into databases, and then to mine those databases for information about which farming projects are succeeding, which are not, and why.
These Big Data practices allow them to refine their metrics and practices for improved outcomes. They are also tying in weather reports with incidences of malaria and then breaking down malaria outbreaks by age group—again, an example of how Big Data emanating from a variety of collection points can be pulled together into a database and then queried for meaningful aid interventions.
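A tiny sketch of that pull-together-and-query step, using pandas; the districts, months, rainfall figures, and case counts below are invented purely to show the shape of the operation.

import pandas as pd

# Invented field reports: rainfall observations and malaria case counts.
weather = pd.DataFrame({
    "district": ["North", "North", "South", "South"],
    "month": ["2013-01", "2013-02", "2013-01", "2013-02"],
    "rainfall_mm": [180, 40, 220, 60],
})
malaria = pd.DataFrame({
    "district": ["North", "North", "South", "South"],
    "month": ["2013-01", "2013-02", "2013-01", "2013-02"],
    "age_group": ["0-5", "0-5", "0-5", "6-15"],
    "cases": [120, 35, 160, 50],
})

# Pull the two sources into one table, then break outbreaks down by age group.
merged = weather.merge(malaria, on=["district", "month"])
summary = merged.groupby("age_group").agg(
    total_cases=("cases", "sum"),
    avg_rainfall=("rainfall_mm", "mean"),
)
print(summary)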
Then there is Benetech, which processes over 1.3 million downloads of accessibility-friendly books from its online library for people with disabilities such as blindness and severe dyslexia. The organization collects information on over 200,000 program participants, as well as data on which books are most widely read. This information can provide insights into this demographic, how best to serve it, and potentially even intelligence on cognitive and motor skills.

This has resulted in interventions in both disaster and non-disaster scenarios that have yielded more success and relief from suffering, and, just as importantly, less waste in situations that are always resource-critical.

The Human Face of Big Data App

The Human Face of Big Data

We all know that technology is a major factor, if not the factor, when it comes to data mining. Massive amounts of computing power are required to process all of the information available to those interested. However, after recent discussions in class regarding social networking, it became apparent that it is difficult to retrieve all of this data without violating the privacy of users around the world. I for one know that I did not enter the world of Facebook, Twitter, or Instagram with the notion that my personal life would be available to anyone who knows how to use a computer. This got me thinking about volunteering certain information. If I could volunteer myself for observation during a particular amount of time, and then be left alone, I would be more than willing to lend my interests to the marketing tycoons of the world. I am sure that for most of us our cell phones are the best indicator of what we are interested in, whether that comes from browser history or the apps on the phone.

This leads me to the topic of this particular post. I searched for phone apps that indicate personal interests based on your phone activity and stumbled across an app available to Android and iOS users called "The Human Face of Big Data." The app asks you to anonymously agree to submit the data your phone collects on a daily basis. Users may also choose to answer questions. You may then be directed to similar users around the world who use similar data sources, essentially creating a doppelganger based on your data profile. The creators of this particular application were looking to provide more information on how humans live, work, and wind down. Initial data collection took place over one week last year, beginning September 26th and ending October 2nd. I've attached the article I read about the application, and if anyone finds more on this particular app, please let me know.


http://lifehacker.com/5946396/the-human-face-of-big-data-compares-you-to-millions-of-people-around-the-world

Mining Twitter Followers


I recently posted on how you can use the Twitter API developer tools to extract Twitter posts for data mining projects. I kept looking at the tools that Twitter makes available through the API and found that another feature lets you look up a user's followers. This can be a great tool for trying to group users based on the followers they have.

Again, you can use this tool from any web browser with a simple URL:

“https://api.twitter.com/1/followers/ids.json?cursor=-1&screen_name=(insert the screen name of the profile whose followers you want here)”

The limitation is that Twitter will only load 5000 followers at a time. For Twitter users who have more than 5000 followers, you can still get the full list; you just need to use the cursor parameter. In the URL above the cursor is set to -1. When this page loads, the bottom contains the following information:

"next_cursor":1266926330315775567,"next_cursor_str":"1266926330315775567","previous_cursor":-1327399186464802948,"previous_cursor_str":"-1327399186464802948"}”

Just change the cursor from -1 to the next_cursor value and reload the URL, and it will advance to the next 5000 followers. You can keep going like this until next_cursor reads 0, which means you have reached the end of the followers list.
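Here is a rough Python sketch of that cursor loop using the requests library. It assumes the old version-1 followers/ids endpoint still answers plain unauthenticated GET requests as described above, which may well no longer be the case; the screen name in the usage comment is hypothetical.

import requests

def fetch_follower_ids(screen_name):
    """Page through a user's follower IDs 5000 at a time using the cursor."""
    ids, cursor = [], -1          # -1 asks for the first page
    while cursor != 0:            # the API signals the last page with next_cursor == 0
        resp = requests.get(
            "https://api.twitter.com/1/followers/ids.json",
            params={"screen_name": screen_name, "cursor": cursor},
        )
        resp.raise_for_status()
        data = resp.json()
        ids.extend(data.get("ids", []))
        cursor = data.get("next_cursor", 0)
    return ids

# Example (hypothetical account name):
# follower_ids = fetch_follower_ids("some_user")
# print(len(follower_ids), follower_ids[:10])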

What loads from Twitter is a list of user IDs. Here is an example:

“25797169,37792876,22439366,16049921,16564850,22490684,27653864”

You can then use Twitter’s users/lookup function to find out more information about the followers. Here is the URL for users/lookup:



Harvesting Big Data




            Agriculture in the United States is a massive industry that produces food not only for America but for many other countries around the world. The last census of agriculture, in 2007, reported about 2.2 million farms covering approximately 3.73 million square kilometers, with an average size of around 1.69 square kilometers. That is an immense amount of land that is ripe for the picking, and I mean picking through the mountains of data it creates.
            Farmers could really benefit from being able to analyze this data and having open access to the results. My thought would be to create an avenue for them to share this data openly for the mutual benefit of all. If all farmers joined a cooperative that gathered the data each participating member produced and had a team of analysts mine it with data mining techniques, imagine how that could benefit the agricultural industry. They could better predict weather and the likelihood of droughts, freezes, or disease.
            How could this data be collected, you ask? Well, if you wanted to be a participating member of the co-op, you would pay a membership fee, and that would cover the cost of installing small weather stations around your farm. These stations could gather all sorts of data, from the amount of moisture in the soil to the wind speed. The data could then be streamed back to the co-op main office for analysis.
            I think that some sort of data-gathering co-op among farmers is the best way to gather the massive amounts of data that could help predict many devastating events. Being proactive about these events would result in a much smaller impact than reacting to them after the fact.

Resource:
http://en.wikipedia.org/wiki/Agriculture_in_the_United_States

Big Data Helps Farmers Weather Droughts



In recent years, low rainfall and the resulting severe droughts have caused severe damage to crop yields for farmers across the nation. These farmers do have some federal assistance when it comes to recovering lost profits, but there is a clever startup company trying to lend a helping hand to struggling farmers by allowing them to privately insure their farms and protect their potential profits.
            The name of this company is The Climate Corporation. It offers Total Weather Insurance, which "is the only full-season program that enables you to protect potential profits by insuring against bad weather that can cause yield shortfalls." (1) What does this have to do with big data, you may ask? Well, the company uses big data analytics to create models for precipitation, heat, freeze, and drought. These models are then used to set the coverage costs farmers pay for their policies. The Climate Corporation also uses this data to determine if and when a farmer can collect on a policy. This use of micro-predictions and hyper-local weather forecasting is revolutionizing the way farmers can insure themselves against lost profits. In the second reference, The Climate Corporation gives an example of how it used its immense amounts of data to help a farmer in Kay County, north-central Oklahoma. The company used some very sophisticated algorithms to determine that any day over 98°F would qualify as a heat stress day, resulting in a $1 to $2 payout per acre of farmland. If the heat stress days continue for 3 or more consecutive days, the spell qualifies as a heat wave and the payout doubles. Below is a chart developed by the company that illustrates the drought and shows that the farmer endured crop-damaging temperatures for more than half of the year.

The company says that if this type of weather were to befall a 1000-acre farm, the farmer could receive up to $150k this year while paying only $31k in premiums.
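To make the arithmetic concrete, here is a small sketch of the payout rule as I read it from the example; the $1.50-per-acre rate, the week of temperatures, and the exact heat-wave handling are my assumptions, not the company's actual model.

def heat_stress_payout(daily_highs_f, acres, rate_per_acre=1.5,
                       threshold_f=98, heat_wave_len=3):
    """Pay rate_per_acre for each day above the threshold; double it once
    the hot spell reaches heat_wave_len consecutive days (a 'heat wave')."""
    payout, streak = 0.0, 0
    for high in daily_highs_f:
        if high > threshold_f:
            streak += 1
            multiplier = 2 if streak >= heat_wave_len else 1
            payout += rate_per_acre * acres * multiplier
        else:
            streak = 0
    return payout

# Assumed week of daily highs (degrees F) on a 1000-acre farm.
print(heat_stress_payout([97, 99, 101, 102, 100, 96, 99], acres=1000))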
            Farming is a massive industry in the United States, and the profit potential for companies using big data to insure farmers is also massive. Being able to understand and decipher the data and make accurate predictions will allow this company, and I am sure many others in time, to make millions in annual revenue. For more information on the company, please see reference 1. For more details on the example, please see reference 2.
References:

Big Data Scientist: Sexiest Job of the 21st Century

Hey everyone, just wanted to pass along the info that what we learn in this class could make all of us very rich someday! (The caveat being that you have to go through grad school in the field if you want the really big bucks.) I was stumbling through the Yahoo top stories as per my usual daily routine and clicked on this article with the headline "Sexiest Job of the 21st Century": http://finance.yahoo.com/blogs/daily-ticker/sexiest-job-21st-century-122238562.html. The video is part of a twice-daily Yahoo feature on big data titled "Big Data Download," which covers big-data topics and can be found at this link: http://finance.yahoo.com/blogs/big-data-download/. The sexiest job is not doctor or lawyer or any other traditional high-paying job, but data scientist, the experts in the field we are studying in this class. According to the article and video, the average pay of a data scientist straight out of grad school is $225,000, which would put you in the top 1% of earners in the country at 26 or 27 years old (if you go to school straight through). In fact, one CEO described people with the big data scientist skill set as "unicorn-like." The CEO even told the story of a data scientist who was lured away from his firm by Microsoft with a $650,000 salary plus bonus. I have to think I would have left in a similar situation.

This job is so high-paying due to the insane demand for people with the skills vs. the lack of supply of people with the skills. I know this is repeating info from a previous blog post but I will reiterate it: McKinsey estimates that there are 190,000 fewer people with analytics experience than are needed by companies in the United States. The good news for us IE majors is that the article emphasizes that companies are not looking at strictly computer science majors for these jobs. They are looking for anyone in any major who can bring a different way of looking at data. The video tries to claim there is "no school or major you can go to to learn to be a data scientist" but I think that will change in the coming years as schools realize the value of this field of study and the benefits it offers graduates. I only hope that Auburn continues pursuing this field with classes in future years such as the one we are taking and perhaps someday becomes a school which offers Big Data Analytics as a major, helping enhance our university's profile as an academic institution.

Sports Analytics and Big Data

I am a huge fan of fantasy sports and sports analytics, so when I heard of the MIT Sloan Sports Analytics Conference I immediately delved into its history of speakers. This conference hosts numerous panels spanning several sports and facets of sport, as well as the evolution of sport and research papers. One panel that I found most interesting was the fantasy sports analytics panel. It featured Peter Schoenke, Jonah Keri, Joe Bryant, and the man with my dream job, Matthew Berry. Each of these guys' expertise is in fantasy sports. A common theme throughout their session was finding the "new stat" to get a leg up on your competitors. Matthew Berry, fantasy expert for ESPN, mentioned a "Points Per Touch" statistic for each NFL player. Fortunately for Berry, he has the ESPN statistics department at his disposal. This is an interesting approach. Sports statistics are easily attainable and span back many seasons; NFL.com, yahoo.com, ESPN.com, and one of my favorite sites, pro-football-reference.com, provide stats going back many seasons.

Creating these new statistics for further insight into players can be very useful for fantasy drafts. The hardest part is finding which stats and situations will provide the best opportunity for fantasy points. As the panelists mention during their session, football is hard to predict because there are so many factors to consider, including defenses, offensive schemes, weather conditions, and so on, as opposed to bowling, where the conditions are always the same and only a single player is involved at a time. I have come up with an "Opportunity Factor," which is essentially the amount of involvement a player has within his team's offense. For example, a running back's opportunity factor would equal his (number of receiving targets plus number of rushing attempts) divided by (the team's total passing attempts plus rushing attempts); a small sketch of the calculation appears after the link below. I hope to find this a great indicator of useful players in fantasy football and encourage others to find that great new statistic. Below is the link to the video of the panel I referenced. It's a great watch for anyone heavily involved in fantasy sports.

http://www.sloansportsconference.com/?p=6556
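Here is the small sketch of the Opportunity Factor calculation mentioned above; the players, stat lines, and team totals (550 pass plays, 430 rush plays) are made up for illustration.

def opportunity_factor(targets, rush_attempts, team_pass_attempts, team_rush_attempts):
    """Share of the team's offensive plays on which this player got the ball
    thrown his way or carried it."""
    return (targets + rush_attempts) / (team_pass_attempts + team_rush_attempts)

# Made-up running backs on a team that ran 550 pass plays and 430 rush plays.
players = {
    "RB One": {"targets": 70, "rush_attempts": 280},
    "RB Two": {"targets": 25, "rush_attempts": 110},
}
for name, s in players.items():
    of = opportunity_factor(s["targets"], s["rush_attempts"], 550, 430)
    print(f"{name}: opportunity factor {of:.2%}")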




Wednesday, April 3, 2013

Big Data for a Better World






This video is presented by UN Global Pulse Director Robert Kirkpatrick, and in it he explains how Big Data can be used to understand human well-being.

The challenge the organization faces is that the way it tracks what happens to populations around the world is still stuck in the twentieth century: it still relies on household surveys and statistics collected every few years to see what is happening. At the same time, it has begun to ask whether, if the private sector can transform its own operations, monitor emerging market trends in real time, and understand customer feedback in real time, the same could be done for development. The Big Data revolution offers three distinct opportunities. The first is better early warning: earlier detection of anomalies, trends, and events allows earlier response. The second is real-time awareness: a more accurate and up-to-date picture of population needs supports more effective planning and implementation. The third is real-time feedback: understanding sooner where needs are changing, or are not being met, allows for rapid, adaptive course correction. Big Data is also a human rights issue where privacy is concerned: the approach never analyzes personally identifiable information, never analyzes confidential data, and never seeks to re-identify individuals.

The data that mobile phones produce is very interesting. Mobile carriers collect call detail records, such as caller ID, caller tower location, receiver ID, call start time, and call duration. Putting these records into tables, graphs, and maps, you can construct a social graph, see population movements, plot trajectories across a map over time, and understand patterns of consumption. In addition, since most people buy airtime on prepaid plans, carriers also hold airtime purchase records, such as caller ID, caller tower location, amount of purchase, time of purchase, and balance at time of purchase. In their organization, a lot of the work using social media analytics in the health space has worked very well, while the work on predicting elections has not worked so well, so now they want to know whether these kinds of approaches could be used to detect unemployment spikes, and they used SAS to analyze data collected from the United States and Ireland. Global Pulse has been set up in the Secretary-General's office to start experimenting with how these new data sources, new analytical approaches, and new technologies can be useful for understanding what is happening to populations around the world. Their basic model is the notion of digital services as sensor networks.
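As a toy illustration of turning call detail records into a social graph and a movement trail, here is a short sketch; the subscribers, towers, and times are invented, and a real analysis would of course work with anonymized IDs at much larger scale.

from collections import defaultdict

# Invented call detail records: caller, receiver, tower, start time, duration (s).
cdrs = [
    {"caller": "A", "receiver": "B", "tower": "T1", "start": "08:05", "duration": 120},
    {"caller": "A", "receiver": "C", "tower": "T2", "start": "12:30", "duration": 45},
    {"caller": "B", "receiver": "A", "tower": "T3", "start": "18:10", "duration": 300},
]

# Social graph: who talks to whom, and for how long in total.
graph = defaultdict(int)
for r in cdrs:
    edge = tuple(sorted((r["caller"], r["receiver"])))
    graph[edge] += r["duration"]

# Movement: the sequence of towers each caller was seen at, in time order.
trails = defaultdict(list)
for r in sorted(cdrs, key=lambda r: r["start"]):
    trails[r["caller"]].append((r["start"], r["tower"]))

print(dict(graph))
print(dict(trails))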

In this video, Robert uses several examples to show how Big Data could make the world better. Big Data does not only reveal people's preferences for products and services; it can also reflect how populations respond to global stresses and show the effects of development programs.


Brain Mapping - A new initiative

President Obama announced a brain-mapping initiative that will require heavy use of cutting-edge and yet-to-be-invented data processing and imaging technologies. The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) initiative could help scientists unlock the secrets of Alzheimer's and Parkinson's disease, strokes, and even human cognition. More than $100 million will be invested in the initiative, which will involve the National Institutes of Health, DARPA, the National Science Foundation, outside academics, and private companies. The director of the National Institutes of Health, Francis Collins, said that this initiative might eventually require handling yottabytes of data, a yottabyte being a billion petabytes, and that collecting that much data could stretch the limits of modern information science: "There have been some conversations about whether the amount of data, if you are going to collect data from tens of thousands of neurons in real time, can you process and store it. This is generally in the direction of the capability where things are headed." DARPA's project Detection and Computational Analysis of Psychological Signals will be funded under BRAIN and also requires analysis of very large data sets. The White House reported that DARPA will develop a new set of tools to capture and process dynamic neural and synaptic activities. Not only will BRAIN require new computer technologies merely to perform the necessary research, but the results of the research could also be used to develop new information technologies.

Reference - http://www.informationweek.com/government/information-management/obama-brain-mapping-project-tests-big-da/240152129

Clean Up, Clean Up, Everybody Everywhere!


Although the floor of my room may not be a good example of this, I really do enjoy being organized. I feel so much better when I have a schedule and have things put into to-do lists. Organization not only falls into the categories of cleaning up our rooms and making schedules, but also into the category of cleaning up our data.  The cleaning and maintenance of data is a productive and profitable habit to have.

Michael Della Penna of ClickZ Marketing News & Expert Advice claims that a few reasons we should keep data clean are as follows:
  • “Dirty Data” costs US businesses over $600 billion annually.
  • 46% of survey respondents cite data quality as a barrier for adopting BI/analytics products.
  • Poor data or the lack of visibility into data quality is cited as the number one reason for overrunning project costs.
  • Data quality best practices boost revenue by 66%.
  • If the median Fortune 1000 company were to increase the usability of its data by 10%, company revenue would be expected to increase by $2.01 billion.





There are several suggestions for what businesses should do about the problem of dirty data. One is to “Conduct a Data Collection Audit”: businesses should collect only data that is relevant and should know how that data will be used. Another is to “Address Dirty or Neglected Data”: inputting, validating, and cleaning up phone numbers and email addresses, as well as re-engaging phone numbers and email addresses that haven't been touched in a while. Taking time to “Focus on Preferences and Privacy Management” is also important, because in a customer-led, customer-controlled world it is essential that customers know the business wants to learn their preferences; studies show that consumers are more comfortable sharing personal information when it will enhance their experience. Another suggestion is to “Break Down Data Silos”: breaking down data silos (often separate social media programs) by connecting and streamlining all of those efforts into a centralized data mart, a collection that can then be used to improve program performance. Lastly, Penna suggests “Invest in Interaction Management”: presenting the data in a way the customer will enjoy, so that it has value to the customer based on both the behavior and the preferences of the individual.
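As an illustration of the “Address Dirty or Neglected Data” step, here is a minimal sketch that normalizes and validates email addresses and US-style phone numbers; the sample records and the deliberately loose patterns are assumptions for demonstration, not a complete cleansing routine.

import re

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def clean_record(record):
    """Normalize an email and a phone number, flagging values that fail validation."""
    email = record.get("email", "").strip().lower()
    digits = re.sub(r"\D", "", record.get("phone", ""))
    if digits.startswith("1") and len(digits) == 11:
        digits = digits[1:]                       # drop a leading US country code
    return {
        "email": email if EMAIL_RE.match(email) else None,
        "phone": digits if len(digits) == 10 else None,
    }

# Made-up dirty records.
dirty = [
    {"email": "  Jane.Doe@Example.COM ", "phone": "(334) 555-0142"},
    {"email": "not-an-email", "phone": "555-0142"},
]
print([clean_record(r) for r in dirty])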

When considering these suggestions in simple terms, it doesn't seem like it would take all that much to make large data sets more attractive. I know it is not as easy as it seems, but if businesses take one step at a time toward the goal of “Clean Data,” it will be productive. After all, you can't make your bed and clean out your closet at the same time, can you?

Source:
http://www.clickz.com/clickz/column/2258186/spring-cleanup-bad-data-not-big-data-needs-your-attention

How Big Data can be misleading.

Big Data has so much potential to shed light on new concepts, ideas, and insights that many people have flocked to it as some sort of catch-all. But Big Data has its problems as well, mostly when the people studying it forget that correlation does not always equal causation. For example, a study of Hurricane Sandy-related Twitter and Foursquare data (research paper) produced some expected findings, mostly packed grocery stores the night before the storm hit. This collection of data does not fully represent what occurred over that period, though. The majority of the Twitter data came from Manhattan, a densely populated area with high smartphone ownership. That would make one think Manhattan was the center of the area most affected by the storm, but this is not true. As flood water caused extended power outages, people's smartphone batteries died, which kept them from tweeting. This is what happened in some of the harder-hit areas like Coney Island. It is referred to as a "signal problem": no signal comes from certain areas or communities because of particular factors.

Another example of this "signal problem" comes from an app used by the City of Boston to fix potholes. The phone app uses accelerometer and GPS data to passively detect potholes around the city. But, if you think about it, this data provides only part of the picture: the method will not detect potholes in areas of the city with low smartphone ownership, lower incomes, or a large elderly population. As you can see, Big Data can tell us so much about many of the problems we face today, but we have to remember that it is not the entire picture. We have to consider which areas are being left out of the data and close those gaps.

Sources:

http://sm.rutgers.edu/pubs/Grinberg-SMPatterns-ICWSM2013.pdf

http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html