On Processing Extreme Data

Dana Petcu, Gabriel Iuhasz, Daniel Pop, Domenico Talia, Jesus Carretero, Radu Prodan, Thomas Fahringer, Ivan Grasso, Ramon Doallo, Maria J. Martin, Basilio B. Fraguela, Roman Trobec, Matjaz Depolli, Francisco Almeida Rodriguez, Francisco de Sande, Georges Da Costa, Jean-Marc Pierson, Stergios Anastasiadis, Aristides Bartzokas, Christos Lolis, Pedro Goncalves, Fabrice Brito, Nick Brown


Extreme Data is an incarnation of the Big Data concept, distinguished by massive amounts of data that must be queried, communicated and analyzed in near real time using a very large number of memory or storage elements and exascale computing systems. Immediate examples are scientific data produced at rates of hundreds of gigabits per second that must be stored, filtered and analyzed; millions of images per day that must be analyzed in parallel; and one billion social data posts queried in real time on an in-memory components database. Traditional disks and commercial storage cannot currently handle the extreme scale of such application data. Responding to the need to improve current concepts and technologies, we focus in this paper on the needs of data-intensive applications running on systems composed of up to millions of computing elements (exascale systems), and we propose a methodology to advance the state of the art. The starting point is the definition of new programming paradigms, APIs, runtime tools and methodologies for expressing data-intensive tasks on exascale systems. This will pave the way for exploiting massive parallelism over a simplified model of the system architecture, thereby promoting high performance and efficiency and offering powerful operations and mechanisms for processing extreme data sources at high speed and/or in real time.



