We all know that data mining has many extremely useful applications, and this blog discusses a variety of them. In looking to expand my knowledge of the subject, I seek out data mining topics beyond the ones we discuss in class, one being the use of machine learning techniques to combat online fraud. The article states that most algorithms designed to detect fraud follow anywhere from 175 to 225 questions or rules. Like the rest of the world, those committing fraud are constantly changing and evolving, which is bad news for those trying to prevent it. Ex-Google employees consequently set out to develop a new approach that would detect fraud before it occurs. They built the Sift system, which plugs into live sites and maps millions of connections among fraudulent behaviors. New insights are already emerging from this tool; for example, Yahoo users are five times more likely than Gmail users to create a fake email account.
More effective data mining driven by machine learning will soon, if it does not already, outperform existing agencies trying to detect fraudulent practices. Though traditional techniques have worked in the past, the constant barrage of information uploaded to the web will soon let many criminals fall through the cracks. Teaching a machine to essentially question online users based on their individual activities will revolutionize the detection process and, hopefully, deter hackers from trying to manipulate the internet, decreasing online fraud altogether. This will be especially useful to government agencies as well. It only makes sense that hackers continually change and adapt in order to remain anonymous, and previous systems designed to protect the public are adapting at a pace much slower than the hackers themselves. Consequently, fraud is not going anywhere. The Sift system is a major breakthrough in machine learning because it puts predictive modeling to work in a way that could save banks, the general public, and the United States as a whole hundreds of millions of dollars a year.
Link to article:
http://gcn.com/Articles/2013/03/26/Sift-Science-machine-learning-anti-fraud.aspx?Page=1
I really like this post because it shows the potential of machine learning. It seems like we could catch cases of fraud; however, one of the biggest issues with this idea is the false positive. A false positive wastes significant resources when action is taken against an innocent user, and worse, the machine may learn something false as true, which would corrupt future predictions. A very careful system has to be put in place to avoid these two failure modes.
Like Greg said, the issue with using machine learning to detect online fraud is accurate detection. Too much time and money would be wasted taking action against an innocent person if the software gives a false positive.
I do, however, think this is an issue that can be dealt with in the near future (or possibly already has been). I believe that people can be patterned, and cases of fraud likely have patterns as well, such as certain websites or themes of sites, cities or countries of origin, time of the year, etc. Once these detection systems become more advanced, I think fraud can not only be detected, but predicted.
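The pattern idea above can be sketched as a toy rule-based scorer. Everything here is invented for illustration (the feature names, the weights, and the threshold are my assumptions, not how Sift actually works); a real system would learn the weights from labeled data, and the threshold is where the false-positive trade-off discussed above gets made:

```python
# Hypothetical pattern-based fraud scorer. All signal names and weights
# below are illustrative assumptions, not any real system's model.

def fraud_score(event):
    """Sum the weights of whichever risk signals fired for this event."""
    signals = {
        "new_account": 0.3,       # account created very recently
        "high_risk_geo": 0.25,    # origin region with a history of fraud
        "odd_hours": 0.15,        # activity far outside the user's norm
        "disposable_email": 0.3,  # throwaway email provider
    }
    return sum(weight for name, weight in signals.items() if event.get(name))

# A higher threshold means fewer false positives (less wasted
# investigation) at the cost of letting more real fraud through.
THRESHOLD = 0.5

def flag(event):
    """Return True if the event's score crosses the review threshold."""
    return fraud_score(event) >= THRESHOLD
```

For example, a brand-new account using a disposable email scores 0.6 and gets flagged, while mere odd-hours activity alone (0.15) does not. Tuning that threshold is exactly the false-positive balancing act Greg raised.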
I'm going to be an outlier here and say that hackers and the like can be viewed as a positive as well.
For example: jailbreaking an iPhone gives the user more control over the phone, including installation of third-party apps that do not have to meet Apple's App Store criteria. Jailbreak teams are constantly finding holes in Apple's iOS and exploiting them, while back at headquarters Apple steadily works to patch those holes. So one could say that jailbreaking has pushed Apple to improve its operating system at a faster pace than if nobody were hacking it.
Going back to machine learning and fraud: if no fraud happened, the algorithm would never get any better. In an ideal world, I suppose, if enough fraud occurred and you could predict every case, that would be a different story.
Note: I'm in no way advocating hacking/anarchy.