Frequently Asked Questions
If your question is not answered here, please contact us.
Accessing Data
While cloud computing costs may vary, downloading data is free for end users. You can download with or without an AWS account, either through the AWS CLI or manually via data-catalog viewers. Additionally, certain datasets are accessible free of charge via Highly Scalable Data Service (HSDS) and Jupyter Notebooks. For more complex operations, use AWS Athena for a small fee, often less than $1 USD. Combine with AWS SageMaker Studio Lab to run machine learning algorithms against the data at no cost to end users.
*Host account must grant users access to nodes through Cognito.
Data can be downloaded free of charge manually or using the AWS CLI. See AWS CLI tutorial.
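As a rough illustration of a download without an AWS account, the sketch below builds an AWS CLI command that uses the CLI's global --no-sign-request option to skip credential signing. The bucket name and prefix are placeholders (assumptions, not taken from this FAQ); substitute the values shown in the data catalog for the dataset you want.

```python
import shlex

# Placeholders only: replace with the bucket and prefix listed for your dataset.
BUCKET = "example-bucket"
PREFIX = "example-dataset/"


def build_download_command(bucket, prefix, dest="./data"):
    """Return an AWS CLI command that copies a prefix without AWS credentials."""
    return [
        "aws", "s3", "cp",
        f"s3://{bucket}/{prefix}",
        dest,
        "--recursive",        # copy everything under the prefix
        "--no-sign-request",  # anonymous access; no AWS account needed
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it in a shell.
    print(shlex.join(build_download_command(BUCKET, PREFIX)))
```

Printing the command rather than executing it keeps the sketch safe to run; paste the output into a terminal once the bucket and prefix are correct.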
Data Submission Requirements
To increase public access to the results of federally funded scientific research, as required by the DOE Public Access Plan (PDF).
The principal investigator (PI). The PI must submit data and metadata for his or her funded projects. The PI may delegate this responsibility to another researcher or analyst, but the contact information for the submission should correspond with the principal organization.
As specified in your Data Management Plan and/or per your agreement with your DOE Project Officer. At minimum, data should be submitted as soon as it is considered the final data for your funded project, based on when you published your results or delivered the data. Be sure to allow additional time at the end of your project for the data submission process and to address any questions or issues that arise as a result of data submission.
What to Submit
The DOE OEDI Data Lake was established to receive, manage and make available high-value energy research data generated by the U.S. Department of Energy's Programs, Offices, and National Laboratories. This includes data from DOE-funded projects associated with any energy research area (e.g. solar power, buildings efficiency, grid modernization, etc.), as well as data produced by DOE-funded research.
Every submission is different. The resources included in your submission should tell the complete story of your data. For example, a complete data package would include the raw data, the final result or polished data product, and a summary document or link to a published paper explaining the results, methods used, and any assumptions or external factors relevant to the creation of the data.
Preferred formats are those that support the best reusability; however, OEDI accepts a variety of file formats and will, in most cases, accept your submission in whatever format you wish to provide it. For data available in multiple formats, please consider the following guideline when choosing which format to submit. The tiers below are arranged in order of increasing inherent reusability:
Tier 1 (Good): unstructured data
- PowerPoint
- image
- etc.
Tier 2 (Better): structured data
- Excel
- CSV
- XML
- etc.
Tier 3 (Best): structured + standardized data
- Standardized Data in Excel, CSV, XML, RDF, JSON, etc.
- Browse data standards
Tier 4 (Best for Large/Complex Data): structured + standardized + cloud-optimized data lakes
- Cloud-optimized Data in HDF5, GeoParquet, etc.
- Browse data lakes
Any personally identifiable information, business proprietary information, or copyrighted material should NOT be submitted.
Personally Identifiable Information (PII) is any piece of information or combination of pieces that could be used to compromise the identity of an individual. A person's name alone is not considered PII, especially in the case of attribution. Personal contact information, such as personal email and home addresses, should not appear in any submitted data. A submitter's contact information is required, but will only be used for questions about the data submission. Contact information for organizations is acceptable, including office email, the office address, coordinates, and phone and fax numbers. Personal information, such as home telephone numbers, personal email and home addresses, and birth dates, is not allowed. Furthermore, private information, such as social security numbers, bank account numbers, and passport and driver's license numbers, is expressly forbidden. All submissions should be purged of PII prior to submission.
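A quick automated pass can help spot obvious PII before a manual review. The sketch below is only a heuristic and not an OEDI tool: the function name and both patterns are illustrative assumptions, it looks only for email addresses and strings shaped like U.S. social security numbers, and every hit still needs human judgment (an office email, for example, is acceptable).

```python
import re

# Illustrative patterns only; they will miss many kinds of PII.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_possible_pii(text):
    """Return (line number, label, matched text) for each suspicious string."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in (("email", EMAIL_RE), ("ssn-like", SSN_RE)):
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group()))
    return hits


if __name__ == "__main__":
    sample = "site,contact\nA,jane@example.com\nB,123-45-6789\n"
    for hit in find_possible_pii(sample):
        print(hit)
```

An empty result does not mean a file is free of PII; it only means these two patterns did not match.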
Business Proprietary Information (limited rights) should also not be included in the data submitted. Data submitted will eventually be made available to the public. Data subject to copyright, business arrangement, publication or purchase agreement, and all data not authorized for eventual public release should not be uploaded.
Copyrighted Material of any kind, including journal articles, should not be uploaded to the catalog. When such material is publicly available and permanently hosted on another site, you can link to it instead using the Add Link button.
How to Submit Data
Data can be submitted as a single, consolidated submission or in multiple submissions. An individual submission can contain an unlimited number of data resources (files and links), but each resource must have a unique name within the submission. Submissions should be grouped into logical sets, associating like data together so that elements necessary for the comprehension of a resource are not in a different submission. If needed, a previous submission may always be linked to from a newer submission as one of its resources.
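Because each resource needs a unique name within a submission, it can be worth checking a planned list of names before filling in the form. The helper below is a minimal sketch with a hypothetical function name; treating names that differ only in letter case or surrounding spaces as duplicates is a cautious assumption, not a documented OEDI rule.

```python
from collections import Counter


def duplicate_resource_names(names):
    """Return the normalized names that appear more than once, sorted."""
    # Assumption: compare names case-insensitively and ignore outer whitespace.
    counts = Counter(name.strip().lower() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    planned = ["Raw Data", "Summary Report", "raw data "]
    print(duplicate_resource_names(planned))
```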
Combining resources by zipping or archiving should only be done when the resources are of little use individually. For example, the zipping of individual shapefile components into a single shapefile resource is strongly encouraged. Zipping is also recommended when submitting large quantities of files which are otherwise unable to be adequately organized. In this case, a separate file describing the structure and contents of the files should be included to allow ease of navigation.
In ways that will allow your colleagues, clients, sponsors, and others to easily find and use your data. Think of your submission as a communication between you and your peers in the greater scientific and research communities. Consider these questions when completing the submission form:
- What is in the data file(s)?
- When, where, why, and how were the data captured/collected?
- Are the units for the data obviously and unambiguously labeled?
- What would someone need to know to use the data properly?
- Are there any assumptions, proprietary software requirements, or other prerequisites to using the data?
Select multiple files from your computer for simultaneous upload:
- Click Add Files to open a window showing files on your computer.
- To select more than one file at a time, hold Control (PC) or Command (Mac) while selecting files with your mouse, or press and hold Shift and use the arrow keys to select multiple contiguous files.
- When you're done selecting, press Open or Choose (actual button will differ depending on your browser and operating system). This will upload the files and allow you to complete the file-specific information in the form.
- Clicking Add Files again will allow you to select and upload more files.
- After adding files and/or links, click on Add info and Add location to enter additional required information.
Simply link to the file using the fields provided after clicking the Add Link button.
The link you submit must be a permanent URI (i.e. a URL that leads directly to a resource and does not pass through a search page or require more than one click to navigate to the data).
The process for creating an archive file (also known as a compressed file) can differ from machine to machine, depending on your operating system and the software you have installed.
The following archive file formats are preferred: .zip, .gz, .tar, .tgz.
Windows: Locate and select the files you would like to archive. Right-click one of the selected files and choose Send to, then click Compressed (zipped) folder.
The new archive (compressed folder) is created in the same location. This is what you will want to upload.
Mac: Locate and select the files you would like to archive. Right-click one of the selected files and choose Compress x Items, where x is the number of items you've selected.
The new archive (compressed folder) is created in the same location. This is what you will want to upload.
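If you prefer to script this step instead of using the file manager, the same kind of .zip archive can be produced with Python's standard zipfile module. This is a minimal sketch; the function name and the file names in the usage example are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_files(paths, archive_path):
    """Write the given files into a single .zip archive and return its path."""
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, paths):
            # Store each file under its base name so the archive has no folder nesting.
            zf.write(path, arcname=path.name)
    return archive_path


if __name__ == "__main__":
    # Hypothetical shapefile components sitting in the current directory.
    parts = ["wells.shp", "wells.shx", "wells.dbf", "wells.prj"]
    existing = [p for p in parts if Path(p).exists()]
    if existing:
        print(zip_files(existing, "wells.zip"))
```

Storing files by base name suits a single shapefile; for large collections where the folder layout matters, keep relative paths instead and include a file describing the structure.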
There are no limits on file size or on the number of files per submission. However, larger files may be difficult to upload over some internet connections, especially shared connections. If you have concerns about your file size or are having trouble uploading a large file, please contact us.
As soon as it has undergone the data curation process, which typically takes less than two weeks. For an update on your data, please contact us.
You may save progress to your submission incrementally using the 'save' feature, which is found in the bottom right corner of the submission page. When you are satisfied with the contents of your submission, click 'submit' and the curation process will be initiated.
Moratoriums apply to entire submissions. All accompanying files will be subject to the moratorium. To expose select files at different times, they must be in separate submissions.
Metadata and Data Curation
Metadata refers to data and information that describe other data. Metadata summarizes your data so that others can easily find and work with them. Many of the metadata fields requested are required to meet data management guidelines from DOE, GSA, and other government agencies. These requirements are designed to promote the discovery of your data, increase their exposure to the scientific community, and enable their proper use.
Data curation is a process we perform to help maintain data submissions for long periods of time to preserve viability, relevance, and usefulness. Curation is a phase of the submission process during which our curators review the metadata provided with each data submission for accuracy, completeness, and relevance to the submitted resources. In some cases, curators may contact submitters to resolve any discrepancies or omissions detected during the review.
Search engine results are impacted by many factors, including the metadata provided to describe your published submission. The attribution of thorough, comprehensive metadata, including detailed descriptions, keywords, author names, and location information, will make a data submission more visible to search engines and other researchers. The keywords and descriptions provided should connect your contributions with a broader audience, including those searching with basic terminology. The metadata provided then follows your submission to Data.gov, the DOE Data Explorer, and many other data-catalogs, providing essential context to anyone interested in your data. Complete metadata helps ensure that your team is properly credited for your work and can lead to new collaboration opportunities.
A Coordinate Reference System (CRS) is a framework used to precisely measure locations on the surface of Earth as coordinates. Maps and other geospatial data files require a CRS to be provided for positional accuracy. CRS information can often be found under Sources within your favorite GIS software.
OEDI currently supports:
- WGS 84 (World Geodetic System 1984)
- EPSG:4326 (WGS 84 Geographic)
- NAD83 (North American Datum 1983)
- NAD27 (North American Datum 1927)
- UTM (Universal Transverse Mercator) Zones
- EPSG:3857 (Web Mercator)
- Mercator Projection
- GDA94 (Geocentric Datum of Australia 1994)
- GDA2020 (Geocentric Datum of Australia 2020)
- ETRS89 (European Terrestrial Reference System 1989)
- ED50 (European Datum 1950)
- GCS_NAD83 (Geographic Coordinate System NAD83)
- GCS_WGS_1984 (Geographic Coordinate System WGS 1984)
- Lambert Conformal Conic
- Albers Equal Area Conic
- Polar Stereographic
- Stereographic Projection
- Other
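When recording the CRS for a submission, it can help to note the EPSG code alongside the name. The lookup below is a small sketch, not part of the OEDI form: the function name is hypothetical, the codes are the commonly used geographic (latitude/longitude) EPSG codes for each datum plus Web Mercator, and projected systems such as UTM zones have their own per-zone codes that are not covered here. Confirm the code against your GIS software before relying on it.

```python
# Commonly used EPSG codes for some of the reference systems listed above.
# Assumption: the geographic 2D form of each datum is the one intended.
EPSG_BY_NAME = {
    "WGS 84": 4326,
    "NAD83": 4269,
    "NAD27": 4267,
    "Web Mercator": 3857,
    "GDA94": 4283,
    "GDA2020": 7844,
    "ETRS89": 4258,
    "ED50": 4230,
}


def epsg_label(name):
    """Return a label such as 'EPSG:4326', or None if the name is not in the table."""
    code = EPSG_BY_NAME.get(name)
    return f"EPSG:{code}" if code is not None else None


if __name__ == "__main__":
    for crs in ("WGS 84", "NAD83", "Polar Stereographic"):
        print(crs, "->", epsg_label(crs))
```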
Updating Existing Data Submissions
Editing of data submissions by users is only allowed during submission and curation. Once a dataset has been published its metadata is shared with OEDI's network of data sharing partners. Any edits after that require republication and must be made by our curation staff in order to preserve data provenance for scientific posterity. Please contact us to make any adjustments after publication.
Once a submission has been published, its metadata has been disseminated to our network of data sharing partners. Any changes require republication. Contact the OEDI curation team and we'll be happy to make the updates for you.
Yes. Our data preservation and provenance model allows for supplemental additions. If your data submission is still in progress or in curation you are free to add resources as you see fit. If it has already been published, you'll need to coordinate with our data curation team.
If your data submission is still in progress or in curation you are free to revise existing resources as you see fit. Once your submission has been published, you will need to reach out to our data curation team. Our data preservation and provenance model prohibits significant changes to the original file, since other users may have already used and cited your data in their own research. We do allow minor changes that do not modify the data itself (e.g., fixing a typo or adding units to a column). If you're unsure of the best solution, contact us, explain the changes you are hoping to make, and we will be happy to help determine the best course of action. Otherwise, we encourage adding an additional file to the submission or creating a new submission. If a new file is added to the existing submission, it will be labeled by our curators as the more recent version. If you create a new submission, include a link to your original submission using the Add Link button in the new submission.
Favoriting Data Submissions
The Favorites feature allows users two options to stay up to date with relevant datasets.
Starring a dataset will add the dataset to your user profile under Data > My Favorites tab. Starring allows users to have quick access to datasets without the need to manually search.
Subscribing to a dataset will add it to your user profile under Data > My Favorites tab. When a user subscribes to a dataset they will receive emails when changes are made to the dataset. Changes can include addition or deletion of resources, changes to the description, or the release of a dataset previously under moratorium.
Starring allows users to save datasets for future retrieval. Each dataset's page includes a button, denoted with a star icon, with the number of total users who have starred the dataset. Users may toggle starred datasets on the dataset's page directly, or via the Favorites dashboard.
To access your starred datasets: open the Data tab on the navigation bar and select My Favorites.
Subscribing allows users to receive dataset updates via email. Each dataset's page includes a button, denoted with a bell icon, with the number of total users who have subscribed to the dataset. Users may toggle subscribed datasets on the dataset's page directly, or via the Favorites dashboard.
To access your subscribed datasets: open the Data tab on the navigation bar and select My Favorites.