Saturday, March 16, 2013

CERN's Technology for Information Analysis


Big Data has already been put to work in several large computing applications, and it now underpins the computing for the Large Hadron Collider (LHC). The European Organization for Nuclear Research (CERN) faces a tough challenge in confirming the discovery of a particle consistent with the Higgs boson. CERN has to keep the LHC running for several months to complete this work, and that brings enormous computing needs: researchers must collect a large amount of event data to have a statistical chance of seeing the same outcome enough times to prove that what they have found is real.

Currently, the LHC operates at 4 TeV per beam in each direction, so a collision delivers a combined energy of 8 TeV. The Higgs boson discovery matters for the IT sector as well, because confirming it opens up an enormous amount of data-processing work: analysing the new particle will require intensive computation, and therefore even more data.
When asked about the current data-processing requirements of the LHC experiments, CERN explained:
“When the LHC is working, there are about 600 million collisions per second. But we only record here about one in ten trillion. If you were to digitize all the information from a collision in a detector, it’s about a petabyte a second or a million gigabytes per second.
There is a lot of filtering of the data that occurs within the 25 nanoseconds between each bunch crossing (of protons). Each experiment operates their own trigger farm – each consisting of several thousand machines – that conduct real-time electronics within the LHC.
Out of all of this comes a data stream of some few hundred megabytes to 1 GB per second that actually gets recorded in the CERN data centre, the facility we call ‘Tier Zero’.”
Calculations at this scale are only practical with Big Data techniques, which make it possible to turn the recorded stream into results quickly.
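To put the quoted figures in perspective, here is a rough back-of-envelope sketch of the reduction the trigger and filtering chain achieves, going from roughly a petabyte per second of digitized detector output down to about 1 GB per second recorded at Tier Zero. The figures come from the quote above; everything else is illustrative arithmetic.

```python
# Back-of-envelope sketch of the data reduction described in the quote.
# Rates are taken from the quote; the rest is illustrative arithmetic.

RAW_RATE_BYTES_PER_S = 1e15        # ~1 petabyte/s if every collision were fully digitized
RECORDED_RATE_BYTES_PER_S = 1e9    # upper end of the quoted "few hundred MB to 1 GB per second"

reduction_factor = RAW_RATE_BYTES_PER_S / RECORDED_RATE_BYTES_PER_S
daily_recorded_tb = RECORDED_RATE_BYTES_PER_S * 86_400 / 1e12  # seconds per day -> terabytes

print(f"Trigger/filter reduction factor: ~{reduction_factor:,.0f}x")
print(f"Recorded per day at 1 GB/s:      ~{daily_recorded_tb:,.0f} TB")
```

Even after a reduction factor of around a million, the recorded stream still amounts to tens of terabytes per day, which is why the storage discussion below moves straight to petabytes and exabytes.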
To learn more about the universe with the Large Hadron Collider (LHC), Big Data technology is vital for information analysis, according to CERN Openlab CTO Sverre Jarp. Speaking at the Big Data Warehousing and Business Intelligence 2012 conference in Sydney, Jarp told delegates that physics researchers need to measure electrons and other elementary particles inside the LHC at Geneva, Switzerland.
"These particles fly at practically the speed of light in the LHC so you need several metrics in order to study them," he said. "When these collide, they give tremendous energy to the secondary particles that come out."
CERN Openlab uses a mix of tape and disk technology to store this large amount of research data.
"Today, the evolution of Big Data has been such that we can put one terabyte of data on one physical disk or tape cartridge," he said.
"We are safely in the domain of petabytes and we are moving to exabytes in the future."
When asked why the LHC generates so much data, he explained that each particle detector has millions of sensors but the data they sense is "very unstructured."
"A particle may have passed by a sensor in the LHC and this happens at the incredible speed of 40 megahertz or 40 million times per second."
Despite using a mix of disk and tape data storage technology, CERN Openlab experiences disk failures every day.
"We have a team walking around the centre exchanging bad disks for good ones and hoping the storage technology we use is good enough for keeping everything alive," he said.
Jarp advised CIOs and IT managers to get their unstructured data into a structured form as quickly as possible.
"Big Data management and analytics require a solid organisational structure at all levels," he said.
"A change in corporate culture is also required. Our community started preparing for Big Data more than a decade before real physics data arrived."
Jarp added that he estimates the LHC will run for another 15 to 20 years with exabytes of data to be generated and stored.
