BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting
The BuildingsBench datasets consist of:
- Buildings-900K: A large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
- 7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.
Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. It addresses the lack of time series datasets that are large and diverse enough for studying pretraining and finetuning with scalable machine learning models. Buildings-900K consists of synthetically generated energy consumption time series derived from the NREL End-Use Load Profiles (EULP) dataset (linked further below). The EULP was not originally developed for STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation." Like the EULP, Buildings-900K is a collection of Parquet files, and it follows nearly the same Parquet dataset organization. Because it contains only a single energy consumption time series per building, it is much smaller than the EULP (~110 GB).
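As an illustration of what day-ahead STLF means for a single building's time series, the following minimal sketch slices a load series into (context, target) pairs, where each target is the 24 hours that follow its context. The hourly resolution, the 168-hour context length, the function name `day_ahead_windows`, and the toy data are all assumptions made for this example; they are not taken from the dataset documentation.

```python
from typing import List, Tuple


def day_ahead_windows(
    load: List[float],
    context_hours: int = 168,   # assumed: one week of history
    horizon_hours: int = 24,    # day-ahead target
    stride_hours: int = 24,     # assumed: advance one day per window
) -> List[Tuple[List[float], List[float]]]:
    """Slice one load series into (context, target) pairs."""
    windows = []
    last_start = len(load) - context_hours - horizon_hours
    # If the series is too short, the range is empty and no windows are returned.
    for start in range(0, last_start + 1, stride_hours):
        split = start + context_hours
        windows.append((load[start:split], load[split:split + horizon_hours]))
    return windows


# Toy stand-in for one building: 10 days of made-up hourly values.
toy_load = [float(h % 24) for h in range(10 * 24)]
pairs = day_ahead_windows(toy_load)
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))
```

A model pretrained on such pairs would take the context as input and predict the target; the stride controls how much consecutive windows overlap.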
BuildingsBench also provides an evaluation benchmark: a collection of open source energy consumption datasets from real residential and commercial buildings. The evaluation datasets, which are provided alongside Buildings-900K below, are collections of CSV files containing annual energy consumption. Altogether they are less than 1 GB in size, and they are listed below:
1. ElectricityLoadDiagrams20112014
2. Building Data Genome Project-2
3. Individual household electric power consumption (Sceaux)
4. Borealis
5. SMART
6. IDEAL
7. Low Carbon London
A README file providing details about how the data is stored and describing the organization of the datasets can be found within each data lake version under BuildingsBench.
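The README is the authoritative description of the file layout. Purely as a sketch of how a zero-shot evaluation on one of the CSV files might look, the example below reads a load column from CSV text, forecasts the last day with a persistence baseline (repeat the previous 24 hours), and scores it with a mean-normalized RMSE. The column names `timestamp` and `load_kwh`, the synthetic sample file, the persistence baseline, and the choice of metric are all assumptions for illustration, not the benchmark's actual schema or protocol.

```python
import csv
import io
import math
from typing import List


def make_sample_csv(days: int = 3) -> str:
    """Build a small synthetic CSV (hypothetical layout) with hourly rows."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["timestamp", "load_kwh"])  # hypothetical column names
    for h in range(days * 24):
        writer.writerow([f"day{h // 24}-hour{h % 24:02d}", 1.0 + (h % 24) / 10])
    return buf.getvalue()


def read_load_column(csv_text: str, column: str = "load_kwh") -> List[float]:
    """Parse one numeric column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def persistence_forecast(context: List[float], horizon: int = 24) -> List[float]:
    """Naive baseline: predict that the next day repeats the last day."""
    return context[-horizon:]


def nrmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error divided by the mean of the actual values."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / (sum(actual) / len(actual))


load = read_load_column(make_sample_csv())
context, target = load[:-24], load[-24:]
score = nrmse(target, persistence_forecast(context))
print(f"persistence NRMSE on synthetic data: {score:.3f}")
```

Because the synthetic series repeats exactly every day, the persistence baseline is perfect here; on real building data it would serve only as a simple reference point for a pretrained model.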
Citation Formats
TY - DATA
AU - Emami, Patrick
A2 - Graf, Peter
DB - Open Energy Data Initiative (OEDI)
DP - Open EI | National Renewable Energy Laboratory
DO - 10.25984/1986147
KW - energy
KW - power
KW - short-term
KW - load forecasting
KW - buildings
KW - deep learning
KW - pretraining
KW - transfer learning
KW - benchmark
KW - dataset
KW - EULP
KW - end use load profiles
KW - STLF
KW - residential
KW - commercial
KW - machine learning
KW - processed data
LA - English
DA - 2018/12/31
PY - 2018
PB - National Renewable Energy Laboratory
T1 - BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting
UR - https://doi.org/10.25984/1986147
ER -
Emami, Patrick, and Peter Graf. BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting. National Renewable Energy Laboratory, 31 December 2018, Open Energy Data Initiative (OEDI). https://doi.org/10.25984/1986147.
Emami, P., & Graf, P. (2018). BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting. [Data set]. Open Energy Data Initiative (OEDI). National Renewable Energy Laboratory. https://doi.org/10.25984/1986147
Emami, Patrick and Peter Graf. BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting. National Renewable Energy Laboratory, December 31, 2018. Distributed by Open Energy Data Initiative (OEDI). https://doi.org/10.25984/1986147
@misc{OEDI_Dataset_5859,
title = {BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting},
author = {Emami, Patrick and Graf, Peter},
url = {https://data.openei.org/submissions/5859},
year = {2018},
howpublished = {Open Energy Data Initiative (OEDI), National Renewable Energy Laboratory, https://doi.org/10.25984/1986147},
note = {Accessed: 2025-04-27},
doi = {10.25984/1986147}
}
https://dx.doi.org/10.25984/1986147
Details
Data from Dec 31, 2018
Last updated Jan 11, 2024
Submitted May 30, 2023
Organization
National Renewable Energy Laboratory
Contact
Patrick Emami
904.962.8293
Keywords
energy, power, short-term, load forecasting, buildings, deep learning, pretraining, transfer learning, benchmark, dataset, EULP, end use load profiles, STLF, residential, commercial, machine learning, processed data
DOE Project Details
Project Name: Laboratory Directed Research and Development (LDRD)
Project Number: 08GO28308