Energy Efficient Frequency Scaling on GPUs in Heterogeneous HPC Systems

Kraljic, Karlo; Kerger, Daniel; Schulz, Martin

doi:10.1007/978-3-031-21867-5_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13642)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

With most major corporations and research institutions having pledged to support sustainability goals for High Performance Computing (HPC), energy efficiency is a critical factor when evaluating heterogeneous HPC systems. However, many popular hardware performance and energy measurement frameworks, such as LIKWID, and benchmarks, such as STREAM or hipBone, do not support, or only partially support, execution on heterogeneous systems containing AMD or NVIDIA Graphics Processing Units (GPUs). This leaves a gap in the understanding of the relationship between frequency, performance, and energy. We aim to close this gap by extending the performance measurement framework LIKWID to support both AMD and NVIDIA GPUs. We run the STREAM and hipBone benchmarks on AMD and NVIDIA GPUs at different GPU core frequencies. We show that the minimum period between two measurements for our GPU is at least 100 ms, and that GPUs have a sweet spot with regard to energy consumption at approximately 75% of their maximum frequency, with energy savings of up to 30% at a performance overhead between 0.72% and 3.12%.
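
The relationship between frequency, energy, and performance overhead described in the abstract can be illustrated with a minimal sketch. It assumes that power is sampled at a fixed 100 ms period and that energy is obtained by integrating those samples over time. The `Run` record, the function names, and all numbers below are hypothetical and chosen only for illustration; they are not measurements from the paper and do not reflect the LIKWID API.

```python
from dataclasses import dataclass
from typing import List

# 100 ms: the minimum measurement period stated in the abstract.
SAMPLE_PERIOD_S = 0.1


@dataclass
class Run:
    """One benchmark run at a fixed GPU core frequency (hypothetical record)."""
    freq_fraction: float            # core frequency as a fraction of the maximum
    runtime_s: float                # wall-clock runtime of the benchmark
    power_samples_w: List[float]    # power readings, one per SAMPLE_PERIOD_S


def energy_joules(samples: List[float], period_s: float = SAMPLE_PERIOD_S) -> float:
    """Integrate power over time with the trapezoidal rule."""
    if len(samples) < 2:
        return 0.0
    return sum((a + b) / 2.0 * period_s for a, b in zip(samples, samples[1:]))


def compare(baseline: Run, candidate: Run):
    """Return (energy savings, runtime overhead) of candidate relative to baseline."""
    e_base = energy_joules(baseline.power_samples_w)
    e_cand = energy_joules(candidate.power_samples_w)
    savings = 1.0 - e_cand / e_base
    overhead = candidate.runtime_s / baseline.runtime_s - 1.0
    return savings, overhead


def sweet_spot(runs: List[Run]) -> Run:
    """Pick the run with the lowest total energy."""
    return min(runs, key=lambda r: energy_joules(r.power_samples_w))


# Illustrative, made-up runs: constant power draw at three frequency settings.
runs = [
    Run(1.00, 10.0, [300.0] * 101),
    Run(0.75, 10.2, [210.0] * 103),
    Run(0.50, 14.0, [170.0] * 141),
]

best = sweet_spot(runs)
savings, overhead = compare(runs[0], best)
print(f"lowest energy at {best.freq_fraction:.0%} of maximum frequency")
print(f"savings {savings:.1%}, overhead {overhead:.1%}")
```

In this sketch the lowest frequency does not win: the lower power draw is outweighed by the longer runtime, which is the kind of trade-off that produces a sweet spot below the maximum frequency.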

Author information

Authors and Affiliations

Hewlett Packard Enterprise, Herrenberger Straße 140, 71034, Böblingen, Germany
Karlo Kraljic & Daniel Kerger
Technical University of Munich, Chair of Computer Architecture and Parallel Systems, Boltzmannstraße 3, 85748, Garching, Germany
Karlo Kraljic & Martin Schulz

Authors

Karlo Kraljic
Daniel Kerger
Martin Schulz

Corresponding author

Correspondence to Karlo Kraljic.

Editor information

Editors and Affiliations

Technical University of Munich, Garching, Germany
Martin Schulz
Technical University of Munich, Heilbronn, Germany
Carsten Trinitis
Chalmers University of Technology, Gothenburg, Sweden
Nikela Papadopoulou
Otto-von-Guericke-Universität Magdeburg, Magdeburg, Germany
Thilo Pionteck

About this paper

Cite this paper

Kraljic, K., Kerger, D., Schulz, M. (2022). Energy Efficient Frequency Scaling on GPUs in Heterogeneous HPC Systems. In: Schulz, M., Trinitis, C., Papadopoulou, N., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2022. Lecture Notes in Computer Science, vol 13642. Springer, Cham. https://doi.org/10.1007/978-3-031-21867-5_1

DOI: https://doi.org/10.1007/978-3-031-21867-5_1
Published: 14 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21866-8
Online ISBN: 978-3-031-21867-5
eBook Packages: Computer Science, Computer Science (R0)
