NVIDIA GPU Performance Monitoring using an Extension for Dynatrace OneAgent

Authors

  • Tomasz Gajger, Department of Computer Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Poland

DOI:

https://doi.org/10.12694/scpe.v21i4.1807

Keywords:

Application Performance Management, GPU Performance Monitoring, Dynatrace, GPGPU, CUDA, NVML

Abstract

This work presents a Dynatrace OneAgent extension for gathering NVIDIA GPU metrics using the NVIDIA Management Library (NVML). The extension integrates GPU metrics into an industry-leading Application Performance Management platform, extending its ability to monitor important business workloads to GPU-oriented computational nodes. A practical approach to acquiring and processing NVML metrics via Python bindings is described. The work also proposes and discusses the implementation of helper applications for conveniently simulating performance problems in a multi-tier web application. These applications are then used in combination with OneAgent-based monitoring and an appropriate configuration of the Dynatrace platform for web application monitoring. Finally, end-to-end, production-like scenarios are presented that demonstrate the usefulness of the extension in a test setup resembling a real-world implementation. The extension has been released on GitHub under the MIT license.
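
As a rough illustration of what acquiring and processing NVML metrics via Python bindings can look like, the following is a minimal sketch and not the code of the published extension. It assumes the `pynvml` bindings are installed and falls back to an empty result when they, or an NVIDIA driver, are missing. The helper names `sample_gpus` and `summarize`, the dictionary keys, and the choice of metrics (GPU utilization and used memory) are hypothetical and chosen only for illustration.

```python
from statistics import mean

try:
    import pynvml  # NVML Python bindings; assumed to be installed
except ImportError:
    pynvml = None


def sample_gpus():
    """Return one reading per visible GPU, or [] if NVML is unavailable."""
    if pynvml is None:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No driver or no NVML library on this host.
        return []
    readings = []
    try:
        for index in range(pynvml.nvmlDeviceGetCount()):
            try:
                handle = pynvml.nvmlDeviceGetHandleByIndex(index)
                util = pynvml.nvmlDeviceGetUtilizationRates(handle)
                mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            except pynvml.NVMLError:
                # Skip devices that do not support these queries.
                continue
            readings.append({
                "index": index,
                "gpu_util_pct": util.gpu,     # percent of time the GPU was busy
                "mem_used_bytes": mem.used,   # device memory currently in use
            })
    finally:
        pynvml.nvmlShutdown()
    return readings


def summarize(samples):
    """Aggregate readings of one device (hypothetical processing step)."""
    if not samples:
        return None
    return {
        "avg_gpu_util_pct": mean(s["gpu_util_pct"] for s in samples),
        "max_mem_used_bytes": max(s["mem_used_bytes"] for s in samples),
    }
```

A monitoring agent would typically call something like `sample_gpus()` at a fixed interval and report an aggregate such as `summarize(...)` for each device; how the actual extension samples and aggregates is described in the paper itself.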

Published

2020-12-20

Section

Research Papers