Showing posts with label Big Data. Show all posts

Saturday, April 13, 2013

MIT study on Implicit relationships

I was reading a blog post and watching a video it linked to about the UN’s Big Data research when I heard the emcee reference a study out of MIT. The study looked at the Facebook profiles of MIT users and could predict a user’s sexual orientation based on their friendships. So, I investigated the research.

You can read the entire paper here. It is titled 'Gaydar: Facebook friendships expose sexual orientation', by Carter Jernigan and Behram F.T. Mistree.

Disclaimer: I didn't read the entire paper. It's really long. I did read the majority of it.

Here are the high points:

Aggregation of personal data – The researchers talk about how, in any environment, whether the State Department or social media, the aggregation of data poses great potential for risk. A few examples they use are officials looking up the passport activity of President Obama and the confidential information of taxpayers. If the information is available anywhere, people will exploit it.

Sex segregation – As the old saying goes, “birds of a feather flock together,” and this intuitively makes sense. Those of us who enjoy leadership and being on teams will migrate towards campus involvement. Those who are proficient at sports and have a competitive drive will most likely find each other on sports teams. Well, the same could be said of how the sexes segregate themselves. A study cited in the paper states that men have 65% male friends and 35% female friends, and women have 70% female friends and 30% male friends. This suggests that any given individual of a specific sex should have roughly the same split of friendships. A significant deviation from this might offer some insight into that person's lifestyle.
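To make the "deviation" idea concrete, here is a minimal sketch of the arithmetic. The 65% and 70% baselines are the figures quoted above; the function names and the sample friend list are my own assumptions for illustration, not anything taken from the paper.

```python
# Baseline same-sex friendship shares quoted in the post (from a study cited in the paper).
BASELINE_SAME_SEX_SHARE = {"male": 0.65, "female": 0.70}


def same_sex_share(user_sex, friend_sexes):
    """Return the fraction of friends who share the user's sex."""
    if not friend_sexes:
        raise ValueError("need at least one friend")
    same = sum(1 for sex in friend_sexes if sex == user_sex)
    return same / len(friend_sexes)


def deviation_from_baseline(user_sex, friend_sexes):
    """Observed same-sex share minus the cited baseline for that sex."""
    return same_sex_share(user_sex, friend_sexes) - BASELINE_SAME_SEX_SHARE[user_sex]


# Hypothetical friend list: 8 male and 12 female friends for a male user.
friends = ["male"] * 8 + ["female"] * 12
deviation = deviation_from_baseline("male", friends)
print(round(deviation, 2))  # 40% observed vs. 65% baseline
```

A number like this only says that a profile differs from the average; on its own it explains nothing about why.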

Homosexuality and sexual segregation – According to the study, homosexual men and women draw the majority of their friends from the LGB community, whereas bisexual men and women draw their friends from the heterosexual community. So, again, observing the sex breakdown of an individual user's friends should begin to offer insights about that user.

Forming the hypothesis – Because of these observations, we expect gay males to have a higher proportion of gay male friends online.

Methodology – The researchers used web-crawling software (Arachne) to go through all of the Facebook users at MIT and extract data on each student’s sex and “interested in” information, as well as the user's friendships. The result was that the researchers could use explicit and implicit (see the article for an explanation) friendships to deduce a user’s sexual orientation based on what their friends listed in their “interested in” bio information. For example, given an MIT male, the algorithm developed by the researchers could predict his sexual orientation from the percentage of male/female friends he had and their listed sexual orientations.
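The general mechanism here (guessing something a user did not disclose from what their friends did disclose) can be sketched in a few lines. To be clear, this is not the researchers' algorithm, which the post does not spell out; it is a generic illustration with made-up names and placeholder labels "X" and "Y".

```python
from collections import Counter


def friend_label_shares(friends, disclosed):
    """Share of each disclosed label among the friends who disclose one.

    friends   -- list of friend identifiers for one user
    disclosed -- dict mapping identifier -> label, only for users who list one
    """
    labels = [disclosed[f] for f in friends if f in disclosed]
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Hypothetical data: the user lists nothing, but three of four friends do.
friends_of_user = ["a", "b", "c", "d"]
disclosed = {"a": "X", "b": "X", "c": "Y"}
shares = friend_label_shares(friends_of_user, disclosed)
print(shares)
```

The point of the sketch is the privacy problem: the user in the example shared nothing, yet their friend list alone produces a guess.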

Implications – So, the most interesting thing about this, to me, isn't being able to know someone's sexual orientation. That's interesting, but I don't really care. Rather, what's really significant is the notion that people can harvest seemingly harmless information about you and use it to make implicit assumptions about you. Think about it this way - you're a CIA agent. You live in Atlanta and work for a private equity firm as a cover so your family and friends think you're normal. Say you're trying to keep up with your kids, so you have a very basic Facebook profile, and every once in a while you tweet a picture of you and the fam on vaca or the new boat you just bought. Obviously, you wouldn't post that you work for the government in your 'About Me' information, or anything else that would significantly link you to a clandestine profession. But what this research suggests is that we can make accurate predictions about a user's lifestyle and personal habits based on information that they aren't making readily available. So, perhaps I scan your friends on Facebook and most of them make sense except a few who are located in D.C. That information, coupled with some of your spending habits that I've observed on your Twitter/Instagram accounts, suggests that you travel to cities outside of your job's requirements and that your standard of living is significantly different from what it should be. Certainly there are other explanations for these observations. However, another insight that this research suggests is that "types" of people act in similar patterns. So, sure, all of these insights into your buying history and friendships would ordinarily look normal. However, when compared with a test set of other CIA agents' info, we come up with a 95% likelihood that you are, in fact, a CIA agent. Cover blown. Your kids and wife are taken hostage and Arnold Schwarzenegger is called in to come and save you.

This is a bit of a reach, but I think the underlying conversation here is very significant. If Big Data analysis is allowing businesses, governments, or (God forbid) terrorists to gain useful insights into aspects of our lives that we aren't intentionally sharing... what could that mean?

Saturday, April 6, 2013

Armchair Activism and an Equals Sign

Undoubtedly, if you are a Facebook user, you have witnessed a lot of profile picture changes within the last few weeks, specifically around March 26, when the Human Rights Campaign challenged its followers to change their profile pictures to one of the images below (the far left is the most popular, and the latter two were for giggles).

An article posted on the Fast Company website gives an overview of some of the analysis that the Facebook Data Science Team cooked up. You can find it here. What's especially interesting about this event is how it has given researchers some significant insights into activism and how different demographics respond. The team observed that 120% more users (than the previous Tuesday) changed their profile picture over the course of a day. As you can see below, after applying a time-series model, the data shows a very obvious, positive trend.
The team used the changes to indicate the "stance" of each user on the marriage-equality issue. This gave the team data on the gender of "activists" as well as their age. Even more interestingly, it gave them geographic information in the form of frequency per county (below). Wouldn't you love to have access to their numbers? To read the full breakdown, visit here.
Further, as we've discussed in class, there is a lot that one can learn from the images themselves. I was interested that the team didn't do any data extraction on the actual images. I think one reason might be that as the images were saved and re-saved while passing from user to user, the quality of the picture degraded (as you can see below), so the pixel data may have been skewed. I bring this up because I observed a lot of people I know changing their profile pictures in support of Proposition 8 (the legislation in question), which opposes equal marital rights by defining marriage as between a man and a woman. So, the mere fact that profile pictures were changing doesn't necessarily (to me) fully indicate how much support there was for one side or the other. My observation was that people were changing their pictures (in large part) as a response to what others were doing.
And you also have people (like me) who chose to use the tense climate to recognize things that are TRULY significant... like the fact that April is Mathematics Awareness Month, which coincidentally has a strong association with the symbols being used in this virtual human rights rally.

In closing, I think that the truly significant and telling statistics would be things like:
  • The number of people who changed their profile picture and are registered to vote.
  • Or who have ever written to, called, or even heard of their state elected officials.
  • Or who have taken any other action whatsoever outside of clicking "edit profile picture".
Pardon my cynicism, and let me explain. I have seen microcosms of the same event transpire on campus time and again. We have a very active and vocal student body that has some great things to say about tuition, state appropriations, academic excellence, etc. However, if I were to weigh the number of times I've seen people post an uninformed, aggressive comment on social media against the number of times I've seen those same individuals at a Board of Trustees meeting, SGA Senate meeting, or University Senate meeting... the scale would bottom out. I wholeheartedly believe that for our country to move past this climate of partisanship, we will have to engage in informed, healthy debate. And while social media is an excellent platform for this to take place, policy is ultimately decided by elected officials, so it's our duty to first be an informed electorate, vote for the best candidate, and hold them accountable for their actions by making our voice known to them directly.

Tuesday, March 26, 2013

March Madness Visualization

This year's NCAA Men's Basketball Tournament has made history. For the first time, a 15 seed (Florida Gulf Coast) has made the Sweet 16. It is generally a given that the 1 and 2 seeds will do well in the tournament, even though they are usually playing a 20+ win team. The 64 teams in the first round of the tournament were awarded a bid by the NCAA committee as a result of their successful seasons. So why are teams like Florida Gulf Coast or Harvard, which did not receive much exposure throughout the season, generally expected to be inferior to the top seeds?

To get a better idea of this, we collected data and created a motion chart to determine whether there was any correlation between the number of regular season wins a team has (or its seed in the tournament) and the number of wins it has in the March Madness Tournament. We first compared regular season wins to seeding: does the number of regular season wins affect the seed a team is awarded? We concluded that more regular season wins do not necessarily result in a higher seed, and attributed that to strength of schedule differences.

We then tried to determine whether seeding had an effect on tournament wins by analyzing the motion chart and looking for patterns or similarities. We concluded that, in general, seeding does not reflect consistent success in the tournament. Even though first and second seeds often see multiple combined wins in the tournament, they were not consistent enough to say with confidence how many wins to expect for each seed, beyond 4 combined wins for the first seeds. (The 16 seeds have never won a March Madness game, leading us to conclude that the first seeds will all win their opening round game.)
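For anyone who wants a number alongside the motion chart, the comparisons described above can also be checked with a correlation coefficient. This is only a sketch of one way to do it, not how the visualization was built, and the rows below are invented placeholders rather than our collected data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rows: (seed, regular season wins, tournament wins). Not real data.
teams = [(1, 29, 4), (2, 27, 2), (5, 26, 1), (12, 25, 1), (15, 24, 2), (16, 20, 0)]
seeds = [t[0] for t in teams]
season_wins = [t[1] for t in teams]
tourney_wins = [t[2] for t in teams]

print("season wins vs. seed:", pearson(season_wins, seeds))
print("seed vs. tournament wins:", pearson(seeds, tourney_wins))
```

A value near -1 or 1 would point to a strong relationship and a value near 0 to a weak one. Keep in mind that a lower seed number means a better seed, so a "good" relationship with wins shows up as a negative number.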

Visualization by Carter Astin and Russell Champion
Feel free to comment below

Thursday, March 21, 2013

Simple: Personal Big Data analytics for your finances

Most of the applications of Big Data we've discussed are focused on mass amounts of people over time. Simple, a retail bank optimized for your mobile device, turns its users into the analysts.

As a Simple user, you essentially replace your bank with yourself in many of the traditional capacities a bank operates in.

As you spend, you can tag your expenditures with hashtags ("#") and text to describe where you were, what you were doing, and who you were with. Then, when you look into your history, you can search for a specific hashtag and observe your spending habits on that topic. What's great is that the hashtags can be completely unique to you and don't depend on anyone else being able to understand what they mean, so your spending doesn't need to fit into a category generated by a bank.
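Since I can't see inside Simple, here is only a rough sketch of what tagging and searching by hashtag could look like. The function name, the memo format, and the transactions are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# A hashtag is "#" followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def spending_by_hashtag(transactions):
    """Total the amount spent under each hashtag found in the memo text.

    transactions -- iterable of (amount, memo) pairs
    Tags are lowercased, and a tag repeated in one memo is counted once.
    """
    totals = defaultdict(float)
    for amount, memo in transactions:
        for tag in set(HASHTAG.findall(memo.lower())):
            totals[tag] += amount
    return dict(totals)


# Hypothetical transactions with personal, free-form tags.
transactions = [
    (12.50, "#lunch with Sam #downtown"),
    (40.00, "#gas on the way to the lake"),
    (9.75, "#Lunch at my desk"),
]
totals = spending_by_hashtag(transactions)
print(totals["lunch"])  # 22.25
```

Nothing in the sketch knows what "lunch" or "downtown" means, which is the appeal: the categories belong entirely to the user.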

Another really cool aspect of this business is savings. You can set a goal to do something like "save $400 for a vacation to Aspen" and set a deadline. Simple automatically sets aside money from your account each day so that you can't spend it (unless you really need to, of course). Simple separates itself from other similar entities (like Mint) by its unique approach to data collection:

"Simple receives much richer data than third-party tools. With better data, Simple can help you do things like categorize your transactions, understand your spending in real time, and make all your activity searchable. Our Reports feature gives you detailed analysis of your finances over time. Budgeting is also much easier when it's right in your account." (Simple FAQ -

What's really cool is their Reports feature as mentioned above. You can view buying history based on keywords or hashtags and compare it against other things. So, for you parents out there - you can tell little Tommy that you do in fact spend more money on him than his sister Susie.
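Going back to the savings goal for a second, the daily amount is simple arithmetic. I don't know how Simple actually calculates it, so treat this as a guess at the idea, with a made-up function name and dates.

```python
from datetime import date


def daily_set_aside(goal, saved_so_far, today, deadline):
    """Amount to set aside each day to reach the goal by the deadline."""
    days_left = (deadline - today).days
    if days_left <= 0:
        raise ValueError("deadline must be after today")
    remaining = max(goal - saved_so_far, 0)
    return remaining / days_left


# "Save $400 for a vacation to Aspen" with a hypothetical deadline 100 days out.
per_day = daily_set_aside(400, 0, date(2013, 3, 21), date(2013, 6, 29))
print(per_day)  # 4.0
```

Recomputing this every day from what is left, instead of fixing it once, would let the amount adjust if you dip into the savings along the way.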

I think that this is such a cool idea because it gives a user more freedom to create their own tool. One of the main focuses of lean is to externalize operations to make a system/process improve. But it seems as though, in this case, our desire to be more efficient has really limited our ability. This model really raises the question - what other areas of life could we see this applying to?

The information I have on this business is limited to their website and an article I read because you have to be "invited" to join Simple. Check it out here.  Make sure you watch the video first (it's only like 3 minutes) to get a feel for what this company is all about.

Sunday, March 3, 2013

Big Data in Energy Saving

Big Data applications across all kinds of industries have started to offer a cure for many challenging problems, including energy problems in the US.
President Barack Obama emphasized the problem and set a new goal for America during his State of the Union address 2 weeks ago. He challenged states and municipalities, homeowners and businesses, to do more with less when it comes to energy consumption.
“Let’s cut in half the energy wasted by our homes and businesses over the next 20 years,”
said Obama, adding that states that stepped forward with the best ideas would get financial support from the federal government to make it happen.

According to the article, buildings are particularly ripe for the picking, accounting for well more than 40 per cent of all energy consumed in North America.

Part of the problem, explains Dan Seto, founder and president of Toronto-based CircuitMeter, is that there is a lack of information about how buildings function on a day-to-day, even minute-by-minute basis. He calls commercial buildings “black boxes” – difficult to see inside without the use of expensive energy-monitoring technologies.
“Once you get granularity of information, it opens up the door,” says Seto.
The company has designed a low-cost and relatively easy-to-install device called WebMeter, which can monitor the electricity flowing through up to 36 individual circuits in a building’s circuit board. Readings from these meters are stored on outside computer servers – “the cloud” – and can be accessed and analyzed any time through the Internet. 

The WebMeter does not lower the energy bill instantly, but it gathers a huge amount of data from the black-box building, and a near-infinite number of applications can then be built on top of that data.
“It puts a living, breathing building at your fingertips so you can start figuring out how that building is operating per square foot or employee,” says Seto.
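As a small illustration of the "per square foot or employee" idea in that quote, here is a sketch that rolls per-circuit readings up into building-level figures. The 36-circuit limit comes from the article; the function name, circuit labels, and readings are assumptions, and this is not CircuitMeter's software.

```python
MAX_CIRCUITS = 36  # the article says one device monitors up to 36 circuits


def building_intensity(circuit_kwh, square_feet, employees):
    """Summarize per-circuit energy readings for one building.

    circuit_kwh -- dict mapping a circuit label to kWh used over some period
    """
    if len(circuit_kwh) > MAX_CIRCUITS:
        raise ValueError("more circuits than a single device monitors")
    if square_feet <= 0 or employees <= 0:
        raise ValueError("square_feet and employees must be positive")
    total = sum(circuit_kwh.values())
    return {
        "total_kwh": total,
        "kwh_per_sq_ft": total / square_feet,
        "kwh_per_employee": total / employees,
        "largest_circuit": max(circuit_kwh, key=circuit_kwh.get),
    }


# Hypothetical readings for one day in a small office.
readings = {"lighting": 120.0, "hvac": 300.0, "servers": 80.0}
summary = building_intensity(readings, square_feet=10000, employees=50)
print(summary)
```

Even this crude summary shows why granularity matters: you can only find the largest circuit if the circuits are measured separately.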
So, it seems that future energy consumption will be less of a problem for countries, with the great help of these data-driven smart systems built into both business and residential buildings.

Monday, January 28, 2013

Prototype on Paper - App Development

Thus far, we have been covering differing methods of data mining using applications such as RapidMiner and Orange. We've begun to discuss the framework associated with extracting relevant data and displaying that in an understandable way. Therefore, the next step will be considering the audience that this information will be shared with, our customer.

We must consider that the people making decisions in politics, business, and service industries are not necessarily skilled statisticians. Nor are they skilled in the tools for extracting data, as we are. So, the question becomes: How can we allow the user (who is not a mathematician or statistician) to access relevant information and make decisions based on it without a baby-sitter? Well, in order to answer this question, we must first think like a designer...

First, we need to empathize with the customer/user and understand his/her environment and motivations. Then, focus in on the things that he/she holds as valuable. Next, generate a number of different ideas that vary in order to arrive at a tool that will meet the needs of the customer.

*This idea of design thinking will be something I post about in the near future, but isn't a significant part of the context of what we're discussing. However, it is important to think about if you're considering using this tool to develop a prototype.

So, after we've identified elements of a tool... what next? We have to prototype and make something, right? Well, what if the answer you've arrived at isn't something you know how to make... say an iOS application?

That's where the Prototype on Paper iOS app comes in. This application allows you to literally DRAW out exactly how you see an app being mapped out and make it. Thus, an engineer with next to zero knowledge of app development can show a developer what he/she is thinking and how he/she arrived at the idea. However, this also suggests a new way to look at app development.

Currently, app development is somewhat of a mystical process to those who aren't in the know. A great deal of time is spent on apps so they can be readily available for mass distribution. BUT, what if the market changed from a public focus to an individual one? What if instead of spending months on creating an app for the public, you could make a quick and dirty app that had very few functions, but worked for the small scope that you needed it to? This is a really neat thought and something to definitely talk about more, but for now I'm focusing on the instance where I need to make an app that serves a specific purpose and I want to see how my user will interact with it.

For example, I'm working with the Lee County Emergency Management Agency on how they approach natural disaster relief. One of the specific areas we're analyzing is how social media is considered. On April 27, 2012 there was a series of horrific tornadoes that swept through our state. Because of the devastation that ensued, 911 operators were tied up and those in peril could not contact anyone to let them know their plight. So, being resourceful, these people turned to social media to let anyone and everyone know what was wrong, where they were, and what they needed. This, in effect, created a whole litany of other problems, but the one we'll consider for the sake of this conversation is that the information was not going to the right people. Emergency responders were not notified of the people who needed help and therefore could not coordinate the proper relief efforts. So, people were rushing to help while wearing flip-flops and t-shirts, then stepping on rusty nails and becoming another victim in the picture. This image leaves us with some very distinct needs. The entity that is coordinating needs to have a picture of what information is traveling over local social media channels, and needs a way to manage tasks and send needs and locations to people who can help.

Thus, I developed an app that will allow these things to happen. And here's how I did it:
  1. Download the app "Prototype on Paper" from iTunes
  2. Use some sort of methodology (I used design thinking as defined at Stanford) to develop the "pages" of your app, just like you would for a website.
  3. Launch the app
  4. Touch the "+" in the top left-hand corner of the homescreen after you've gone through the tutorial.
  5. Enter a title for your app (or project as it's defined in the app)
  6. Begin by touching the camera in the bottom left-hand corner of the screen
  7. Take a picture of each of your "pages"
  8. On the project screen (this is where all of your pictured pages sit in rows), select one of your pages.
  9. On the top right-hand corner of the screen, touch the "+" that is inside a box. A red square will appear on your screen.
  10. Touch and drag the red square to any place on your page where you intend for the user to touch to engage a new page. Resize by dragging one of the square corners at a time.
  11. After reaching the desired location and size, touch the prompt "Link To".
  12. On the next page, select the page you want that button to go to when pressed by the user. Note the bottom of the current page has 5 different selections for how the transition from one page to the next can occur.
  13. After selecting, press "Done" in the top right-hand corner of the page.
  14. Repeat this process until you have placed links on all the buttons on your drawn pages.
  15. When you're ready to test your app, select the play button on either the top right (when close up to one of your drawn pages) or bottom center (when on the project's main page).
  16. Navigate through your app and take note of anything you've forgotten.
  17. If you forgot to paste a link, pinch your fingers together on the screen and go back to step 9.
  18. Most important step: keep in mind you just threw together a quick and dirty app in like an hour. Now, give it to your user and see how they interact with it. Receive their criticism as an anthropologist, not an analyst. After all, what's to get upset about? You just spent a minimal amount of time creating this super useful tool, and all you have to do to change it is erase something and draw something new or touch a few buttons.
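One way to think about what the steps above build: every drawn page is a node, and every red "Link To" square is an arrow to another page. The sketch below is not part of Prototype on Paper; it is just a hypothetical check, with made-up page names, for the "forgot to paste a link" problem from steps 16 and 17.

```python
from collections import deque


def unreachable_pages(links, start):
    """Return pages that cannot be reached from `start` by following links.

    links -- dict mapping a page name to the list of pages its hotspots open
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(all_pages - seen)


# Hypothetical pages for the emergency-management prototype.
links = {
    "home": ["feed", "tasks"],
    "feed": ["home"],
    "tasks": ["home"],
    "send_message": ["home"],  # nothing links TO this page yet
}
print(unreachable_pages(links, "home"))  # ['send_message']
```

If a page shows up in that list, no button leads to it, so your user will never see it during testing.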
I've created a video on my YouTube channel to show how this bad-boy works. See below.

I hope you enjoy!