On Processing Extreme Data

Dana Petcu
Gabriel Iuhasz
Daniel Pop
Domenico Talia
Jesus Carretero
Radu Prodan
Thomas Fahringer
Ivan Grasso
Ramon Doallo
Maria J. Martin
Basilio B. Fraguela
Roman Trobec
Matjaz Depolli
Francisco Almeida Rodriguez
Francisco de Sande
Georges Da Costa
Jean-Marc Pierson
Stergios Anastasiadis
Aristides Bartzokas
Christos Lolis
Pedro Goncalves
Fabrice Brito
Nick Brown

Abstract

Extreme Data is an incarnation of the Big Data concept, distinguished by massive amounts of data that must be queried, communicated, and analyzed in near real time using a very large number of memory or storage elements and exascale computing systems. Immediate examples are scientific data produced at a rate of hundreds of gigabits per second that must be stored, filtered, and analyzed; millions of images per day that must be analyzed in parallel; and one billion social data posts queried in real time on an in-memory components database. Today's traditional disks and commercial storage cannot handle the extreme scale of such application data. Motivated by the need to improve current concepts and technologies, this paper focuses on the needs of data-intensive applications running on systems composed of up to millions of computing elements (exascale systems), and proposes a methodology to advance the state of the art. The starting point is the definition of new programming paradigms, APIs, runtime tools, and methodologies for expressing data-intensive tasks on exascale systems. This will pave the way for exploiting massive parallelism over a simplified model of the system architecture, thus promoting high performance and efficiency and offering powerful operations and mechanisms for processing extreme data sources at high speed and/or in real time.

Article Details

Section
Overview Papers