NLR HPC Eagle GPU Node Metrics

Publicly accessible License 

Ganglia node metrics and iLO (Integrated Lights Out) power data captured from six representative Eagle GPU nodes.

The Eagle HPC system operated at NLR from 2019 through 2024. Eagle was a 2,000-node, 8-petaflop system. This dataset is a representative sample of metrics for 6 of its GPU nodes. Each GPU node contained 2 CPUs and 2 GPUs. Data are provided in compressed CSV format.

Ganglia and iLO Power Time Series Fields

ts: Timestamp
dv: Device/node name, encoding rack and unit (for example, r103u17 = rack 103, unit 17)
mt: Metric name (present only in the Ganglia data)
vl: Value. For iLO power, the instantaneous power in watts at the sampling time; for Ganglia, the value of the metric named in mt (see the list below)

Ganglia Metrics

Metric name -- Metric description -- Unit
cpu_aidle -- Percent of time since boot that the CPU was idle -- Percent
cpu_idle -- Percent CPU idle -- Percent
cpu_nice -- Percent CPU time spent on niced (low-priority) user processes -- Percent
cpu_speed -- CPU clock speed -- MHz
cpu_user -- Percent CPU time spent in user mode -- Percent
cpu_wio -- Percent CPU time waiting on I/O -- Percent
gpu0_bar1_memory -- Used GPU BAR1 memory -- MB
gpu0_decoder_util -- GPU decoder utilization -- Percent
gpu0_ecc_db_error -- Total ECC error count for the GPU -- Number
gpu0_encoder_util -- GPU encoder utilization -- Percent
gpu0_fan -- Fan speed -- RPM
gpu0_fb_memory -- Used GPU framebuffer memory -- MB
gpu0_graphics_clock_report -- Current graphics clock speed of the GPU -- MHz
gpu0_mem_total -- Total GPU memory -- MB
gpu0_mem_util -- Memory utilization -- Percent
gpu0_power_usage_report -- Reported GPU power usage -- Watts
gpu0_temp -- GPU 0 temperature -- Celsius
gpu1_bar1_memory -- Used GPU BAR1 memory -- MB
gpu1_decoder_util -- GPU decoder utilization -- Percent
gpu1_ecc_db_error -- Total ECC error count for the GPU -- Number
gpu1_encoder_util -- GPU encoder utilization -- Percent
gpu1_fan -- Fan speed -- RPM
gpu1_fb_memory -- Used GPU framebuffer memory -- MB
gpu1_graphics_clock_report -- Current graphics clock speed of the GPU -- MHz
gpu1_mem_total -- Total GPU memory -- MB
gpu1_mem_util -- Memory utilization -- Percent
gpu1_power_usage_report -- Reported GPU power usage -- Watts
gpu1_temp -- GPU 1 temperature -- Celsius
ipmi_cpu1_temp -- CPU 1 temperature -- Celsius
ipmi_cpu2_temp -- CPU 2 temperature -- Celsius
ipmi_inlet_ambient_temp -- Ambient temperature measured at the air inlet -- Celsius
ipmi_vr_p1_temp -- CPU 1 voltage regulator temperature -- Celsius
ipmi_vr_p2_temp -- CPU 2 voltage regulator temperature -- Celsius
mem_buffers -- Amount of buffered memory -- Bytes
mem_cached -- Amount of cached memory -- Bytes
mem_free -- Amount of available memory -- Bytes
mem_shared -- Amount of shared memory -- Bytes
mem_total -- Total amount of memory -- Bytes
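The dv naming convention can be decoded programmatically, for example to group nodes by rack. The following Python sketch assumes every dv value has exactly the form r<rack>u<unit> (as in r103u17); parse_node_name and NodeLocation are hypothetical helper names, not part of the dataset.

```python
import re
from typing import NamedTuple

# Assumed format: "r", rack number, "u", unit number (for example "r103u17").
_NODE_NAME = re.compile(r"r(\d+)u(\d+)")


class NodeLocation(NamedTuple):
    rack: int
    unit: int


def parse_node_name(dv: str) -> NodeLocation:
    """Decode a dv value such as 'r103u17' into its rack and unit numbers."""
    match = _NODE_NAME.fullmatch(dv.strip())
    if match is None:
        # Anything outside the assumed pattern is rejected rather than guessed at.
        raise ValueError(f"unrecognized node name: {dv!r}")
    return NodeLocation(rack=int(match.group(1)), unit=int(match.group(2)))


if __name__ == "__main__":
    print(parse_node_name("r103u17"))  # prints NodeLocation(rack=103, unit=17)
```

Raising ValueError on unexpected names makes any deviation from the assumed pattern visible instead of silently producing wrong rack or unit numbers.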
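Because each Ganglia row carries a single (ts, dv, mt, vl) reading, the Ganglia data appears to be in long format, one metric per row. The Python sketch below regroups such rows into one dictionary of metrics per node and timestamp. It assumes the files are gzip-compressed, have a header row ts,dv,mt,vl, and contain numeric vl values; none of these details are stated in the description, and pivot_ganglia and read_ganglia_csv are hypothetical names. Timestamps are kept as strings because their format is not documented here.

```python
import csv
import gzip
import io
from collections import defaultdict
from typing import BinaryIO, Dict, Iterable, Mapping, Tuple, Union

# Maps (timestamp string, node name) -> {Ganglia metric name: value}.
WideRows = Dict[Tuple[str, str], Dict[str, float]]


def pivot_ganglia(rows: Iterable[Mapping[str, str]]) -> WideRows:
    """Group long-format rows (ts, dv, mt, vl) into one metric dict per (ts, dv)."""
    wide: WideRows = defaultdict(dict)
    for row in rows:
        # Assumes every vl is numeric; a blank or non-numeric value raises ValueError.
        wide[(row["ts"], row["dv"])][row["mt"]] = float(row["vl"])
    return dict(wide)


def read_ganglia_csv(source: Union[str, BinaryIO]) -> WideRows:
    """Read a gzip-compressed Ganglia CSV (assumed header: ts,dv,mt,vl)."""
    # gzip.open accepts a file path or an already-open binary file object.
    with gzip.open(source, mode="rt", newline="") as handle:
        return pivot_ganglia(csv.DictReader(handle))


if __name__ == "__main__":
    # Made-up rows for illustration only; real timestamps will look different.
    sample = "ts,dv,mt,vl\nT0,r103u17,gpu0_temp,41\nT0,r103u17,gpu1_temp,39\n"
    compressed = io.BytesIO(gzip.compress(sample.encode("utf-8")))
    print(read_ganglia_csv(compressed))
    # prints {('T0', 'r103u17'): {'gpu0_temp': 41.0, 'gpu1_temp': 39.0}}
```

The iLO power files have no mt column, so their vl values can be taken directly as watts for each (ts, dv) pair without pivoting.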

Citation Formats

TY  - DATA
AU  - Clark
DB  - Open Energy Data Initiative (OEDI)
DP  - Open EI | National Laboratory of the Rockies
KW  - HPC
KW  - ESIF
KW  - power
KW  - node usage
KW  - GPU
LA  - English
DA  - 2026/01/29
PY  - 2026
PB  - National Laboratory of the Rockies
T1  - NLR HPC Eagle GPU Node Metrics
UR  - https://data.openei.org/submissions/8617
ER  - 
Clark. NLR HPC Eagle GPU Node Metrics. National Laboratory of the Rockies, 29 January 2026, NREL. https://data.nrel.gov/submissions/301.
Clark. (2026). NLR HPC Eagle GPU Node Metrics. [Data set]. NREL. National Laboratory of the Rockies. https://data.nrel.gov/submissions/301
Clark. NLR HPC Eagle GPU Node Metrics. National Laboratory of the Rockies, January 29, 2026. Distributed by NREL. https://data.nrel.gov/submissions/301
@misc{OEDI_Dataset_8617,
  title = {NLR HPC Eagle GPU Node Metrics},
  author = {Clark},
  url = {https://data.nrel.gov/submissions/301},
  year = {2026},
  howpublished = {NREL, National Laboratory of the Rockies, https://data.nrel.gov/submissions/301},
  note = {Accessed: 2026-03-13}
}

Details

Data from Jan 29, 2026

Last updated Jan 29, 2026

Submitted Jan 29, 2026

Organization

National Laboratory of the Rockies

Contact

Struan Clark

Authors

Clark

National Laboratory of the Rockies

Keywords

HPC, ESIF, power, node usage, GPU

DOE Project Details

Project Number DE-AC36-08GO28308

Submission Downloads