We have seen Big Data put to work in several large-scale computing applications, and it is now serving the computing needs of the Large Hadron Collider (LHC). The European Organization for Nuclear Research (CERN) faces a tough challenge in confirming the discovery of a particle consistent with the Higgs boson. CERN has to keep the LHC running for several months to complete this work, and that brings enormous computing demands: researchers need to collect a great deal of event data to have a statistical chance of seeing the same outcome enough times to prove that what they have found is real.
Currently, the LHC is operating at 4 TeV per beam in each direction, so a collision carries a total energy of 8 TeV. The discovery of the Higgs boson is important for the IT sector as well, since it opens up an enormous amount of data-processing work. Going forward, confirming the new particle will require intense analysis, and hence even more data.
When
asked about the current data processing requirements for the LHC experiments,
CERN reported,
“When the LHC is working,
there are about 600 million collisions per second. But we only record here
about one in ten trillion. If you were to digitize all the information from a
collision in a detector, it’s about a petabyte a second or a million gigabytes
per second.
There is a lot of filtering of the data that occurs within the 25 nanoseconds between each bunch crossing (of protons). Each experiment operates its own trigger farm – each consisting of several thousand machines – that performs real-time event selection within the LHC.
Out of all of this comes a data stream of a few hundred megabytes to 1 GB per second that actually gets recorded in the CERN data centre, the facility we call ‘Tier Zero’.”
Such massive data processing is made possible by Big Data technology, which enables results to be produced quickly.
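To put those quoted figures in perspective, here is a rough back-of-envelope sketch (in Python, purely for illustration) of the reduction the trigger farms have to achieve and of the daily volume that still reaches Tier Zero. The rates are the approximate ones quoted above, not official figures.

```python
# Back-of-envelope arithmetic using only the figures quoted above.
# All values are rough, order-of-magnitude estimates.

RAW_RATE_BYTES_PER_SEC = 1e15       # ~1 petabyte per second if everything were digitized
RECORDED_RATE_BYTES_PER_SEC = 1e9   # upper end of the "few hundred MB to 1 GB/s" written to Tier Zero

# How aggressively the online filtering must cut the data down
reduction_factor = RAW_RATE_BYTES_PER_SEC / RECORDED_RATE_BYTES_PER_SEC
print(f"Trigger reduction factor: ~{reduction_factor:.0e}")   # ~1e+06

# Recorded volume per day of continuous running
seconds_per_day = 24 * 60 * 60
recorded_per_day_tb = RECORDED_RATE_BYTES_PER_SEC * seconds_per_day / 1e12
print(f"Recorded per day at 1 GB/s: ~{recorded_per_day_tb:.0f} TB")  # ~86 TB
```

The point is the scale: the online filtering discards roughly a million times more data than is ever written to storage.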
To learn more about the universe with the Large Hadron Collider (LHC), Big Data technology is vital for information analysis, according to CERN Openlab CTO Sverre Jarp. Speaking at the Big Data Warehousing and Business Intelligence 2012 conference in Sydney this week, Jarp told delegates that physics researchers need to measure electrons and other elementary particles inside the LHC near Geneva, Switzerland.
"These particles fly at
practically the speed of light in the LHC so you need several metrics in order
to study them," he said. "When these collide, they give tremendous
energy to the secondary particles that come out."
CERN Openlab uses a mix
of tape and disk technology to store this large amount of research data.
"Today, the
evolution of Big Data has been such that we can put one terabyte of data on one
physical disk or tape cartridge," he said.
"We are safely in
the domain of petabytes and we are moving to exabytes in the future."
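Taking Jarp's figure of roughly one terabyte per physical disk or tape cartridge at face value, a quick sketch shows why "petabytes moving to exabytes" translates directly into very large media counts (decimal units assumed, purely for illustration):

```python
# Rough scale check based on Jarp's figure of ~1 TB per physical disk or tape cartridge.

TB_PER_CARTRIDGE = 1

petabyte_tb = 1_000       # 1 PB = 1,000 TB (decimal units)
exabyte_tb = 1_000_000    # 1 EB = 1,000,000 TB

print(f"Cartridges per petabyte: ~{petabyte_tb // TB_PER_CARTRIDGE:,}")   # ~1,000
print(f"Cartridges per exabyte:  ~{exabyte_tb // TB_PER_CARTRIDGE:,}")    # ~1,000,000
```

A media count of that order helps put the daily disk failures mentioned below into perspective.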
When asked why the LHC
generates so much data, he explained that each particle detector has millions
of sensors but the data they sense is "very unstructured."
"A particle may
have passed by a sensor in the LHC and this happens at the incredible speed of
40 megahertz or 40 million times per second."
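As a rough consistency check, the 40 MHz crossing rate can be combined with the ~1 PB/s raw figure quoted earlier. The sensor-channel count below is an assumed round number standing in for "millions of sensors", not an official figure:

```python
# Rough consistency check tying together the quoted 40 MHz crossing rate and
# the ~1 PB/s raw data figure. The sensor-channel count is an assumed round
# number for illustration only, not an official CERN figure.

CROSSING_RATE_HZ = 40e6            # 40 million bunch crossings per second
RAW_RATE_BYTES_PER_SEC = 1e15      # ~1 PB/s if every collision were fully digitized
ASSUMED_SENSOR_CHANNELS = 1e7      # "millions of sensors" -- assume ~10 million channels

bytes_per_crossing = RAW_RATE_BYTES_PER_SEC / CROSSING_RATE_HZ
bytes_per_channel = bytes_per_crossing / ASSUMED_SENSOR_CHANNELS

print(f"Raw data per bunch crossing: ~{bytes_per_crossing / 1e6:.0f} MB")            # ~25 MB
print(f"Raw data per channel per crossing: ~{bytes_per_channel:.1f} bytes")          # ~2.5 bytes
```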
Despite using a mix of
disk and tape data storage technology, CERN Openlab experiences disk failures
every day.
"We have a team
walking around the centre exchanging bad disks for good ones and hoping the
storage technology we use is good enough for keeping everything alive," he
said.
Jarp advised CIOs and IT
managers to get their unstructured data into a structured form as quickly as
possible.
"Big Data management and
analytics require a solid organisational structure at all levels," he
said.
"A change in corporate
culture is also required. Our community started preparing for Big Data more
than a decade before real physics data arrived."
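What "getting unstructured data into a structured form" can look like in practice is sketched below; the input format, the field names and the `SensorHit` record are entirely hypothetical, used only to illustrate the idea of imposing a schema as early as possible:

```python
# Minimal sketch of turning unstructured, log-like sensor readings into a
# structured record, in the spirit of Jarp's advice. The input format and the
# field names are hypothetical, chosen only to illustrate the idea.
from dataclasses import dataclass

@dataclass
class SensorHit:
    detector: str
    channel: int
    timestamp_ns: int
    energy_gev: float

def parse_hit(raw_line: str) -> SensorHit:
    """Parse a free-form line like 'ATLAS ch=1042 t=1350000123 E=13.7' into a typed record."""
    detector, channel, timestamp, energy = raw_line.split()
    return SensorHit(
        detector=detector,
        channel=int(channel.split("=")[1]),
        timestamp_ns=int(timestamp.split("=")[1]),
        energy_gev=float(energy.split("=")[1]),
    )

if __name__ == "__main__":
    hit = parse_hit("ATLAS ch=1042 t=1350000123 E=13.7")
    print(hit)  # SensorHit(detector='ATLAS', channel=1042, timestamp_ns=1350000123, energy_gev=13.7)
```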
Jarp added that he estimates the LHC will run for another 15 to 20 years, generating exabytes of data that will need to be stored.