Friday, April 5, 2013

Prognostics & Diagnostics with Big Data




                Damage prognosis is the future of structural health monitoring. According to the Philosophical Transactions of The Royal Society (http://rsta.royalsocietypublishing.org/content/365/1851/623.full), damage prognosis (DP) attempts to forecast system performance by assessing the current damage state of the system, estimating the future loading environments for that system, and predicting, through simulation and past experience, the remaining useful life of the system.  To realize its full potential, DP will require further model development and the integration of measurement, processing, and telemetry hardware with deterministic and probabilistic predictive models. While DP is in its infancy, it offers tremendous potential for life-safety and economic benefits.
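For a concrete sense of what a deterministic prognosis model does, here is a minimal sketch that fits a trend to a hypothetical damage index and extrapolates it to an assumed failure threshold. The inspection data, threshold, and units are invented for illustration and are not taken from the cited paper.

```python
# Minimal remaining-useful-life (RUL) sketch: fit a linear trend to a
# hypothetical damage index and extrapolate to an assumed failure threshold.
# All values (hours, damage index, threshold) are invented for illustration.
import numpy as np

inspection_hours = np.array([0, 500, 1000, 1500, 2000], dtype=float)
damage_index = np.array([0.02, 0.05, 0.09, 0.12, 0.16])   # 0 = pristine, 1 = failed
failure_threshold = 0.5                                    # assumed critical damage level

# Deterministic model: fit the damage growth rate and extrapolate.
# A probabilistic model would instead propagate uncertainty in the slope
# and in the future loading environment.
slope, intercept = np.polyfit(inspection_hours, damage_index, 1)
hours_at_failure = (failure_threshold - intercept) / slope
remaining_useful_life = hours_at_failure - inspection_hours[-1]

print(f"Estimated remaining useful life: {remaining_useful_life:.0f} operating hours")
```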

                DP has applications in most engineered structures and mechanical systems, including civil infrastructure, defense hardware, and commercial aerospace systems.  Airframe and jet engine manufacturers are adopting business models that allow the customer to assess damage and predict when it will reach a critical level requiring corrective action, so owners can better plan scheduled maintenance tasks.  Within civil infrastructure, there is a great need for prognosis of large buildings and bridges after a large-scale event such as an earthquake, flood, or tornado. Structural condition assessments will help confidently predict whether these structures can be repaired and returned to a safe condition for public use. The helicopter industry has used vibration testing and data trending for predictive maintenance for several years, an approach that has been shown to increase rotor component life by up to 15%. One factor that allows rotor blades to be tested accurately is that the rotor is typically held at a nominal speed. Other prognostic examples in aircraft include the T55 engine in the CH-47 helicopter. This paper (http://www.impact-tek.com/Resources/TechnicalPublicationPDFs/MaintenanceManagement/Impact_MM_MR_ADEPTFinal_color.pdf) identifies innovative diagnostic, prognostic, and maintenance-reasoning technologies focused on significantly reducing operational and support requirements.

                Another avenue for prognostics and diagnostics is Condition Based Maintenance (http://www.arl.army.mil/www/default.cfm?page=980) (CBM). CBM is increasing the Army's weapon system readiness and is producing substantial maintenance cost savings; the Army can save millions by eliminating maintenance test flights and corrective maintenance tasks. ReliabilityWeb.com expounds on how cloud technology can be used for condition-based monitoring (http://reliabilityweb.com/index.php/articles/cloud-technology_condition_based_monitoring/). Condition-based monitoring is the core of how condition-based maintenance is performed. Vibration analysis, motor current signatures, ultrasound, and oil analysis can be used to help assess the health of machinery and predict future failures. Advantages of condition-based monitoring include improved system reliability, increased production, decreased maintenance costs, and less human intervention, which means less opportunity for human error. Using cloud technology with condition-based monitoring allows anyone with approved access rights to reach the data from anywhere in the world, without having to add expensive infrastructure. It also makes it easier to monitor remote installations such as pipelines, offshore drilling rigs, and small facilities with limited resources. Types of cloud services include Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS); each differs in the support requirements it places on the customer.
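As a rough illustration of the condition-based monitoring idea, the sketch below trends a vibration feature (RMS level) and flags a machine when it crosses an alarm limit. The readings, dates, and limit are assumptions for illustration, not values from the ARL or ReliabilityWeb articles.

```python
# Toy condition-based monitoring sketch: trend the RMS vibration level of a
# machine and flag it for maintenance when the level crosses an alarm limit.
# Readings, dates, and the limit are invented; real limits would come from
# standards (e.g. ISO 10816) or the equipment manufacturer.
import numpy as np

np.random.seed(0)

def vibration_rms(samples):
    """Root-mean-square of a block of vibration samples."""
    samples = np.asarray(samples, dtype=float)
    return float(np.sqrt(np.mean(samples ** 2)))

ALARM_RMS = 2.5  # assumed alarm limit (mm/s)

daily_blocks = {
    "2013-04-01": np.random.normal(0, 1.5, 1024),
    "2013-04-02": np.random.normal(0, 1.7, 1024),
    "2013-04-03": np.random.normal(0, 3.2, 1024),  # hypothetical degrading bearing
}

for day, block in daily_blocks.items():
    rms = vibration_rms(block)
    status = "SCHEDULE MAINTENANCE" if rms > ALARM_RMS else "OK"
    print(f"{day}: RMS = {rms:.2f} mm/s -> {status}")
```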


                As the Army and private-sector businesses with large machines expand the use of Condition-Based Maintenance, smaller businesses will be able to reap the benefits as well. The future is bright for condition-based preventative maintenance of automobiles, lawn mowers, and other small machines. Hopefully big businesses and the fixed preventative maintenance schedules that the automobile industry mandates will not stand in the way. There is a lot of money to be saved by the consumer who implements Condition-Based Maintenance, without having to risk the reliability of the vehicle.


Other Condition Based Maintenance articles - http://alvarestech.com/temp/milton2012/rcm.pdf




A Practical Implementation of Big Data in Retail



     So far, I have seen several examples of where big data can be used and how it can be analyzed to help various businesses.  These are important aspects of big data, but I recently came across a concrete, real-world example of how to implement a big data system.  The article, “How to Implement a Big Data System” by Jean-Pierre Dijcks, lays out the infrastructure for a keynote use case the author calls Smartmall.  While I do not pretend to completely understand the inner workings of the business model, the many illustrations contribute to a high-level understanding of the process.

High Level Smartmall Infrastructure


According to the author, the main goals of Smartmall are:

·  Increase store traffic within the mall.
·  Increase revenue per visit and per transaction.
·  Reduce the non-buy percentage.

     The vehicle that drives this process is the customer's smartphone.  Smartphones incorporate GPS technology that, when tapped into, can be used to locate individual customers.  When coupled with what the author calls “loyalty cards” (which track information such as buying habits, coupon usage, etc.), a customer profile can be indexed.  This index is referenced whenever the customer, and their smartphone, enters or gets near the store.  While this is not completely anonymous, it is much less invasive than many social media sites.  The data collected from this system, which utilizes map-reduce algorithms, can be used to meet customer needs more accurately or to mail coupons to specific customers for items tailored specifically to them.  If done over an entire mall, store locations may be better matched to customer habits.  For example, customers who come around mealtime may prefer to shop near the food court.  While that is only an intuitive idea, in-depth data analysis usually sheds light on less intuitive trends, as well as exposing common misconceptions.
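The article does not publish Smartmall's code, but the map-reduce step it refers to can be illustrated with a toy sketch: group loyalty-card purchase events by customer (the map and shuffle phases), then reduce each group to the category that customer buys most often, which could drive a targeted coupon. The customer IDs and categories below are made up.

```python
# Toy map-reduce sketch (not the Smartmall/Oracle implementation): group
# loyalty-card purchase events by customer, then reduce each group to the
# category that customer buys most often, which could drive a targeted coupon.
from collections import defaultdict

# (customer_id, product_category) events -- invented data
events = [
    ("c1", "coffee"), ("c2", "shoes"), ("c1", "coffee"),
    ("c1", "books"), ("c2", "shoes"), ("c3", "toys"),
]

# Map + shuffle: emit (key, value) pairs and group values by key
grouped = defaultdict(list)
for customer, category in events:
    grouped[customer].append(category)

# Reduce: pick the most frequent category per customer
def top_category(categories):
    return max(set(categories), key=categories.count)

offers = {customer: top_category(cats) for customer, cats in grouped.items()}
print(offers)  # {'c1': 'coffee', 'c2': 'shoes', 'c3': 'toys'}
```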


High Level plus Batching

    This basic idea of using existing cell phone technology to drive a business model looks like a great idea on paper.  However, I cannot claim the entrepreneurial or technical expertise to evaluate it in depth.  Instead, the associated diagrams, which are mostly high level, convey more than anything I have read or can write here. Of course, the entire system is self-promoting in the sense that it uses the Oracle Big Data Appliance (the article appears on the Oracle Technology Network website).  Further research into Oracle's competitors would be advisable to find the best possible company for a specific retailer to contract with.  An understanding of their software techniques may also allow for in-house analysis of the data.  Nevertheless, this does provide a good, concrete example of how to set up a big data network for the retail industry.

Using Map Reduce to Tailor Offers to Customers


Source and Images

 
"How to Implement a Big Data System." Oracle Technology Network. N.p., n.d. Web. 05 Apr. 2013.

Going Beyond the Resume

                While searching for big data information, I found a new way companies are looking at data to hire new people. As I always do before writing one of these blogs, I searched through our group of posted blogs and found two recent ones that contained information about using big data analytics in the hiring process. The first was a tutorial about keyword searches by Anto Jeson Raj titled “Text mining techniques used for hiring process.” The other blog, “How Big Data Has Changed HR and Recruiting” by David Walker, discussed how to establish great recruiting methods so that the people you want apply for the jobs you post. I thought both of these blogs were very informative but wanted to look at using big data analytics for hiring from a different standpoint.

                The article I found discussed using computers to evaluate people. Computers have been used for years to narrow down a stack of resumes by eliminating those that do not contain certain keywords. Now a new way of using big data in the hiring process is emerging. Almost everything people do online is recorded somewhere, and knowing how to look at people's habits can help predict what kind of employee they will be.

                For example, the “Robot recruiters” article I read reported that people who apply for a job using an internet browser that was not originally installed on the computer (in other words, not Internet Explorer) tend to perform better and change jobs less often. According to analysts, this shows that you are willing to go the extra step to make your job better and take the time to make more informed decisions. This isn't the case 100 percent of the time, but it could add an extra data point to help make the final decision between two highly qualified prospects.

                Another interesting example involves companies using surveys to check the honesty of potential employees. They found that honest people tend to stay at their jobs longer and perform better. The downside is that honest people tend to be less effective as salespeople. So depending on the job you have to offer, you may actually want to hire the person who wasn't completely honest with you, even though you can't count on them to be with the company long.

                Another finding that surprised me came from a study done by Xerox. It found that the more social a person is on the internet, the less likely they are to stay in a particular job.  The study compared people who used two or fewer social media sites against those who used four or more. This makes some sense to me, because most people who belong to four or more social media sites always want to stay current with whatever is big at the moment, so they may approach their careers the same way. These people often have great careers because they know at least a little about everything and are always on the cutting edge of the newest processes. If a company is looking for someone to make a quick turnaround, this could be the type of person to hire, but if it wants someone who will work hard and stay for the long haul, it might want to look elsewhere.

                This provides a great new way to make big data analytics useful for hiring people.  It also helps avoid the problems of relying only on keyword searches to screen resumes. One personal instance where the keyword search failed for the group I work for occurred on our last job posting. We posted a listing for a certain type of engineer and had over 800 applicants, but not a single one made it past the computer system, because industry does not use the same job title for these people that we do, so the automated screening concluded that no one met the qualifications. While many people may have been qualified for the job, we never got to see them because the computers rejected them.  This shows that as long as humans are setting up the process, there will be mistakes no matter how much information big data analytics can provide.
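A toy sketch of this failure mode, with made-up job titles, shows how an exact keyword match rejects everyone while a simple synonym-aware match keeps qualified applicants. It is only an illustration of the idea, not the system my group actually uses.

```python
# Toy sketch of the failure mode: exact keyword matching on a job title
# rejects everyone, while a simple synonym-aware match keeps qualified
# applicants. The posted title and synonym list are made up.
POSTED_TITLE = "test engineer"

TITLE_SYNONYMS = {
    "test engineer": {"test engineer", "validation engineer",
                      "quality assurance engineer", "v&v engineer"},
}

def exact_match(resume_title: str) -> bool:
    return resume_title.lower() == POSTED_TITLE

def synonym_match(resume_title: str) -> bool:
    return resume_title.lower() in TITLE_SYNONYMS[POSTED_TITLE]

for title in ["Validation Engineer", "Quality Assurance Engineer", "Chef"]:
    print(f"{title}: exact={exact_match(title)}, synonym-aware={synonym_match(title)}")
```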

If you would like to read the article I did please check out the following link as well as the blogs I discussed in the first paragraph.
"Robot recruiters"
http://www.economist.com/news/business/21575820-how-software-helps-firms-hire-workers-more-efficiently-robot-recruiters
               

               









Big Data and Politics

Using Statistical Analysis in Politics

Last year's election was the first time in my life that I was familiar with, or educated in, statistics during a political storm. In Dr. Megahead's Quality Control class, he asked us to find a poll that was famous for failing. Polls failed for a number of reasons, whether it be the sample population, biased questions, or merely a lack of understanding of sampling in general. After taking Big Data this semester, it is apparent that data mining is a tool that could prove extremely useful in predicting election outcomes. I found an interview with author and journalist Sasha Issenberg, who describes how data mining is emerging in political analytics.

Issenberg explains that there is a plethora of information on each individual, starting with registered voters: your age, gender, location, and in some states race are immediately available to analysts. Now take into consideration the various organizations that may have approached you. That data is also available, so if you turned down an NRA supporter who knocked on your door, somewhere the NRA has that information stored. Issenberg illustrates how data mining in politics works by citing the predictive models behind credit scores. Based on past behaviors, how will you behave in the future? Your ability to pay off a loan or your tendency to default, and whether you charge to a credit card or pay cash, are all indicators of your financial behavior and earn you a credit score. This concept translates directly to politics.
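To make the credit-score analogy concrete, here is a minimal sketch of a turnout score: a few voter-file features combined by a hand-written logistic model into a probability of voting. The features and weights are invented for illustration and are not from Issenberg's interview.

```python
# Minimal turnout-score sketch, analogous to a credit score: a few voter-file
# features combined by a hand-written logistic model into a probability of
# voting. Features and weights are invented, not from Issenberg's interview.
import math

def turnout_probability(age, voted_last_election, contacted_by_campaign):
    # Hypothetical weights; a real model would be fit to historical turnout data.
    score = -3.0 + 0.04 * age + 1.5 * voted_last_election + 0.6 * contacted_by_campaign
    return 1.0 / (1.0 + math.exp(-score))

print(f"{turnout_probability(22, 0, 0):.2f}")  # young, no vote history, not contacted
print(f"{turnout_probability(55, 1, 1):.2f}")  # older, voted last time, contacted
```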

Though Issenberg states that analysts do not know much more than before about what makes voters change their minds, there is more information available to indicate what motivates an individual to cast a ballot. Behavioral psychology has helped provide this information. In an interesting experiment (read the full description in the article, because it is funny), analysts determined that people are concerned with their social identities during election times. Issenberg discusses the first elections of the 21st century, which he credits with the evolution of predictive analytics in the political arena. Below is the link to his video interview and a transcript of the interview.

http://www.pbs.org/newshour/bb/politics/july-dec12/victorylab_09-14.html

Privacy vs. Scientific Integrity



     

        The internet has undoubtedly changed the way information is shared for the better.  Almost any data, technique, or software a researcher needs can be found instantly online.  Long gone are the days of going to the library to find information that is several years old.  Now, anyone can see a scientific journal or conference paper the second it is published on a website.  As a result, the rate of scientific discovery has grown exponentially.  In fact, even everyday citizens are able to increase the rate of scientific discovery through the internet and mobile apps. Throughout history, the sharing of information has likely played the most important role in the advancement of science.

     As much as the internet has aided in the collection, classification, and analysis of data, the results must still be evaluated to ensure the conclusions are accurate.  Many times, data is analyzed by the very company that is hoping for a particular result.  Bias in data analysis is a very real and common occurrence.  Take, for example, cigarette companies and their analysis of their product's health effects.  For years, they amazingly couldn't find any evidence of cigarettes' harmful effects.  I'm sure the fact that their profit came from the sale of cigarettes had no effect on their research.  This is why many conferences and scientific journals require their contributors to provide their data along with their results.  It is very important to eliminate bias, and one way to do this is to allow others to analyze the same data and see if they come to the same conclusion.

     This is where a new problem is emerging in the world of data analysis.  While the internet has allowed scientific research to make unprecedented strides, the advent of social media and search engines has complicated things.  Sites like Facebook and Google collect massive amounts of data on their users.  Information such as geographic location, age, race, buying or browsing trends, and who our friends are provides companies with extremely valuable marketing data.  Unfortunately, this is where user privacy and scientific integrity come into conflict.  Copyright law, government legislation, and fierce competition are just a few of the hurdles researchers encounter when trying to provide data with their research.  This leads to many scenarios where data analysis must be taken at face value without the ability to be validated.  There are many problems with this direction in scientific analysis.  The most obvious problem with private data is that biased research can be presented with little accountability.  When only one entity has access to the data, it can claim any results it wishes.  The prevalence of this emerging problem may best be summarized with this quote:
“A recent review found that 44 of 50 leading scientific journals instructed their authors on sharing data but that fewer than 30 percent of the papers they published fully adhered to the instructions. A 2008 review of sharing requirements for genetics data found that 40 of 70 journals surveyed had policies, and that 17 of those were ‘weak.’”
     At present, there is no easy solution to this problem.  Users will demand that their privacy be respected.  Cell phone records, for example, must remain anonymous to protect people's rights.  Realistically, a company can't publicly provide a list of people's names, addresses, phone numbers, and buying habits without incurring legal problems.  Unfortunately, this also means that any analysis of this data cannot be validated by the scientific community without the burden, financial and otherwise, of collecting the data itself.  It would be infeasible for the scientific community to collect its own data for every piece of research presented.
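One partial compromise is to share records only after replacing direct identifiers with salted hashes, so analyses can be repeated without exposing names or phone numbers. The sketch below illustrates that idea with made-up fields; it is far from a complete defense against re-identification.

```python
# Sketch of one partial compromise: replace direct identifiers with salted
# hashes before sharing records, so analyses can be repeated without exposing
# names or phone numbers. The record fields are made up, and this alone does
# not prevent re-identification from the remaining attributes.
import hashlib
import os

SALT = os.urandom(16)  # kept secret by the data holder

def pseudonymize(identifier: str) -> str:
    return hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()[:12]

record = {"phone": "555-0142", "zip": "37996", "monthly_minutes": 812}
shared = {
    "user": pseudonymize(record["phone"]),  # stable pseudonym, not the phone number
    "zip": record["zip"],
    "monthly_minutes": record["monthly_minutes"],
}
print(shared)
```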
     
     In conclusion, the internet has drastically accelerated scientific research and information sharing, but it may also lead to biased, inaccurate, or even fraudulent results.  Which is more important, privacy or scientific integrity?

Sources

Markoff, John. "Troves of Personal Data, Forbidden to Researchers." The New York Times. The New York Times, 22 May 2012. Web. 05 Apr. 2013.

"Citizen-Research Expands the Rate of Scientific Discovery." The Monroe Institute. N.p., n.d. Web. 05 Apr. 2013.

Big Data and March Madness




                Over the past couple of decades, the NCAA's March Madness tournament has gained immense popularity, and teams are starting to see how technology can help them better understand the game.  Synergy Sports Technology has filmed over 300 men's Division I basketball teams and has film on almost all of the 5,700 games played this past season.
                Most people think that basketball analytics offers an advantage in improving offensive numbers, but it actually provides an edge on the defensive side of the ball.  “Inevitably, points scored are going to go down because of the information coaches have at their disposal.” Not only do coaches have lots of information, but they also receive it quickly, which allows them to make adjustments during a game rather than on a game-by-game basis. The software analyzes live game video and calculates statistical trends, for example, what a given player does on a pick and roll on the left side of the basket.
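The article does not describe Synergy's software, but the kind of tendency report it mentions can be sketched by aggregating logged possessions into points per possession by player and play type. The play-by-play rows below are invented for illustration.

```python
# Sketch of the kind of tendency report described above: aggregate logged
# possessions into points per possession by player and play type. The
# play-by-play rows are invented for illustration.
from collections import defaultdict

# (player, play_type, points_scored) for each logged possession
possessions = [
    ("Smith", "pick_and_roll_left", 2),
    ("Smith", "pick_and_roll_left", 0),
    ("Smith", "spot_up", 3),
    ("Jones", "pick_and_roll_left", 2),
    ("Jones", "post_up", 0),
]

totals = defaultdict(lambda: [0, 0])  # (player, play) -> [points, possessions]
for player, play, points in possessions:
    totals[(player, play)][0] += points
    totals[(player, play)][1] += 1

for (player, play), (points, count) in sorted(totals.items()):
    print(f"{player:6s} {play:20s} {points / count:.2f} pts/possession over {count} plays")
```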
                Many people think that because lots of teams are using this technology, the playing field is once again level. But PC Mag believes that analytics has given early adopters a distinct advantage. Synergy Sports first offered its services to Marquette University. Marquette has since gone to the NCAA tournament seven straight times.  The next two programs that used Synergy were UCLA and Kansas.  UCLA went to the Final Four and Kansas won the National Championship.
                The live video statistics industry is going to become a driving force in how teams strategize.  What else can this technology influence? I believe that gamblers will be the next group to benefit from these technologies.