Friday, February 15, 2013

Closing Tax Loopholes Using Big Data Analysis



With tax season in full swing, I have begun to see more and more articles on the internet about strange tax deductions that people have claimed in the past. Many of these deductions are legal because of loopholes that the Internal Revenue Service may not have known about. If just one person were exploiting a loophole, it wouldn’t make much of a difference, but when thousands or millions of people do, it adds up to a large amount of money.

              
Since the US government is trying to find every dollar it can these days and many cuts are being discussed, I think it is important for all possible savings to be considered. This means fewer people have to lose their jobs, which would be much better for the economy. Since taxes are the major income source for the government, closing loopholes could bring in much more money, which would reduce the number of cuts that would have to be made.
               
The real problem with most loopholes is that they are hard to find. However, finding them could be much easier with the Big Data Analysis techniques we have used so far. This post is meant to show how, with relatively simple word searches in big data analysis software, the Internal Revenue Service could find rare deductions and trace each one to the loophole that allows it. Then they could close that loophole so that the deduction doesn’t occur again.

You might ask, “How would they use big data to find this?” So far, most of the keyword searches that we have done have looked for the words that appear the most, but in this case we would want to look for the ones that appear the least. Since we are just trying to find the loophole and not the person exploiting it, we can cleanse the data to include only the itemized deductions from the tax returns collected during tax season. This will eliminate all of the names and standard information that appear on every tax return and narrow the data down to what you really want to see.
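
As a rough illustration of that cleansing step, here is a minimal Python sketch. The `TaxReturn` record and its fields are hypothetical and invented only for this example; real e-filed returns use a different and much richer layout. The idea is simply to drop the identifying fields and keep the free-text deduction descriptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TaxReturn:
    """Hypothetical, simplified record; not the real e-file schema."""
    name: str
    ssn: str
    # Each itemized deduction is a (description, dollar amount) pair.
    itemized_deductions: List[Tuple[str, float]]


def extract_deduction_descriptions(returns: List[TaxReturn]) -> List[str]:
    """Keep only the deduction descriptions, discarding names and IDs."""
    return [
        description
        for tax_return in returns
        for description, _amount in tax_return.itemized_deductions
    ]
```

The list this returns contains no names or ID numbers, so it could be handed to a text mining tool without exposing who filed what.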


This process would actually become easier each year, both as more loopholes are eliminated and as technology improves. The biggest reason is that more people are e-filing their taxes, so the information will already be in an electronic format. Once it is in electronic format, a program can be used to pull just the itemized deductions portion from each return and insert it into a file. Then a program such as RapidMiner can be used to look for the words that are used the least. Many of these words won’t mean anything, but certain words will be easily spotted by trained Internal Revenue Service professionals.
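
The least-used-word search can be sketched in a few lines of Python. This is not how RapidMiner works internally; it is only a small stand-in under my own assumptions: the input is a plain list of deduction descriptions, the function name `rarest_words` and the tiny stopword list are made up for the example, and each word is counted once per description.

```python
import re
from collections import Counter


def rarest_words(descriptions, n=5, stopwords=frozenset({"and", "for", "of", "the"})):
    """Return the n words that appear in the fewest deduction descriptions."""
    counts = Counter()
    for text in descriptions:
        # Count each word at most once per description.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - stopwords)
    # Sort by ascending count, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:n]


sample = [
    "mortgage interest",
    "mortgage interest",
    "charitable donation",
    "cat food",
    "mortgage interest and charitable donation",
]
print(rarest_words(sample, n=2))  # [('cat', 1), ('food', 1)]
```

On real data the bottom of the list would be mostly misspellings and noise, which is why a trained reviewer still has to look through it.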

This is not the only place the government could use big data to help get us out of the current budget crisis, but it would be a great place to start. Once large savings are realized from taxes alone, big data analysis could be applied in several other areas, which could also lead to large savings.

3 comments:

  1. This may be a dumb question, but can you explain why it makes sense to search for least-used words? You said in the first paragraph that savings only add up if the people using a certain loophole number in the thousands, so how would looking for the least common words find loopholes that many people are using? It would seem to me that this would only find the individual using a certain loophole and not catch the ones many are using. Also, is there a way to search returns that are not e-filed and are only on paper? Perhaps someone has made some sort of image scanning software that could scan a PDF to make out each individual word, convert it to a string, and then mine the file full of the individual strings.

  2. The reason for searching for the low numbers is that people exploit the same loophole using different deductions. For example, one person may count their cat food while another may say dog food. Those would show up as low numbers but would be exploiting the same loophole. (Please note that these aren't legal deductions; I am just using them to illustrate the point.) With legal deductions, businesses can deduct pretty much anything on their taxes, and while many of these deductions will show up on thousands of tax returns, each will still be very low on the deduction reasons list in a word search. When you are talking about millions of returns, a few thousand is a small number of occurrences, but it is backed by millions of dollars in lost tax revenue. As for searching the paper-filed returns: yes, there are several text scanning programs that could be used. Each return would have to be scanned in, but since someone is already evaluating the return, they could scan it in as it is processed. Honestly, this software is probably already being used to scan them in, so the data is probably already there.

  3. OK, thanks for explaining that. I guess I just did not know that much about tax returns. I figured everyone would use the same language for the same loophole and that it wasn't possible for people to use different language for one loophole. Thanks again for clearing that up. I learned something new today.
