Frequently Asked Questions
If your question is not answered here, please contact us.
Accessing Data
While cloud computing costs may vary, downloading data is free for end users. Data can be downloaded with or without an AWS account, either with the AWS CLI or manually via data-catalog viewers. Additionally, certain datasets are accessible free of charge via the Highly Scalable Data Service (HSDS) and Jupyter Notebooks. For more complex operations, use Amazon Athena for a small fee, often less than $1 USD. Combine it with Amazon SageMaker Studio Lab to run machine learning algorithms against the data at no cost to end users.
*Host account must grant users access to nodes through Cognito.
Data can be downloaded free of charge manually or using the AWS CLI. See AWS CLI tutorial.
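As a rough illustration of downloading without an AWS account, the sketch below assembles an `aws s3 cp` command that uses the AWS CLI's `--no-sign-request` option to skip credential signing. The bucket name and object key are placeholders rather than real OEDI locations, and `build_download_command` is a hypothetical helper, not part of any OEDI tooling.

```python
import shlex


def build_download_command(bucket, key, dest=".", anonymous=True):
    """Return the argument list for an `aws s3 cp` download."""
    cmd = ["aws", "s3", "cp", f"s3://{bucket}/{key}", dest]
    if anonymous:
        # --no-sign-request tells the AWS CLI not to sign the request,
        # so no AWS account or credentials are needed for public data.
        cmd.append("--no-sign-request")
    return cmd


# Placeholder bucket and key, for illustration only.
command = build_download_command("example-bucket", "path/to/file.csv")
print(shlex.join(command))
```

The printed command can be pasted into a terminal, or the list can be passed to `subprocess.run(command, check=True)` on a machine where the AWS CLI is installed.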
Data Submission Requirements
To increase public access to the results of federally funded scientific research, as required by the DOE Public Access Plan (PDF).
The principal investigator (PI). The PI must submit data and metadata for his or her funded projects. The PI may delegate this responsibility to another researcher or analyst, but the contact information for the submission should correspond with the principal organization.
As specified in your Data Management Plan and/or per your agreement with your DOE Project Officer. At minimum, data should be submitted as soon as it is considered the final data for your funded project, based on when you published your results or delivered the data. Be sure to allow additional time at the end of your project for the data submission process and to address any questions or issues that arise as a result of data submission.
What to Submit
The DOE OEDI Data Lake was established to receive, manage, and make available all marine and hydrokinetic-relevant data generated from projects funded by the DOE Water Power Programs. This includes data from DOE-funded projects associated with any portion of the marine and hydrokinetic project life cycle (e.g., resource characterization, device development, demonstration), as well as data produced by DOE-funded research.
Every submission is different. The resources included in your submission should tell the complete story of your data. For example, a complete data package would include the raw data, the final result or polished data product, and a summary document or link to a published paper explaining the results, methods used, and any assumptions or external factors relevant to the creation of the data.
Preferred formats are those that support the best reusability; however, OEDI accepts a variety of file formats and will, in most cases, accept your submission in whatever format you wish to provide it. For data available in multiple formats, please consider the following guideline when choosing which format to submit. The tiers below are arranged in order of increasing inherent reusability:
Tier 1 (Good): unstructured data
- PowerPoint
- image
- etc.
Tier 2 (Better): structured data
- Excel
- CSV
- XML
- etc.
Tier 3 (Best): structured + standardized data
- data or content model
- standardized Excel, CSV, XML, RDF, JSON, etc.
Any personally identifiable information, business proprietary information, or copyrighted material should NOT be submitted.
Personally Identifiable Information (PII) is any piece of information or combination of pieces that could be used to compromise the identity of an individual. A person's name alone is not considered PII, especially in the case of attribution. Personal contact information, such as email and home addresses, should not appear in any submitted data. A submitter's contact information is required, but will only be used for questions about the data submission. Contact information for organizations is acceptable, including office email, the office address, coordinates, and phone and fax numbers. Personal information, such as home telephone numbers, email and home addresses, and birth dates, is not allowed. Furthermore, private information, such as social security numbers, bank account numbers, and passport and driver's license numbers, is expressly forbidden. All submissions should be purged of PII prior to submission.
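Before uploading, a quick automated scan of text files can help flag candidates for review. The sketch below is a minimal example, assuming two simple regular expressions (email addresses and numbers formatted like U.S. social security numbers). The patterns and the `find_possible_pii` helper are illustrative assumptions, not an OEDI tool, and they will not catch every form of PII.

```python
import re

# Illustrative patterns only; real PII takes many more forms than these.
PII_PATTERNS = {
    "email address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN-like number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_possible_pii(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


sample = "Contact jane@example.com, SSN 123-45-6789"
for label, value in find_possible_pii(sample):
    print(f"{label}: {value}")
```

Every hit still needs a human decision: an office email address is acceptable, for instance, while a personal one is not.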
Business Proprietary Information (limited rights) should also not be included in the data submitted. Data submitted will eventually be made available to the public. Data subject to copyright, business arrangement, publication or purchase agreement, and all data not authorized for eventual public release should not be uploaded.
Copyrighted Material of any kind, including journal articles, should not be uploaded to the catalog. When such material is publicly available and permanently hosted on another site, it can instead be linked to using the Add Link button.
How to Submit Data
Data can be submitted as a single, consolidated submission or in multiple submissions. An individual submission can contain an unlimited number of data resources (files and links), but each resource must have a unique name within the submission. Submissions should be grouped into logical sets, associating like data together so that elements necessary for the comprehension of a resource are not in a different submission. If needed, a previous submission may always be linked to from a newer submission as one of its resources.
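Because each resource needs a unique name within a submission, it can help to check a planned list of names before filling in the form. The sketch below is one way to do that. The `duplicate_resource_names` helper is hypothetical, and treating names as case-insensitive is an assumption made here to be conservative, not a documented rule of the submission form.

```python
from collections import Counter


def duplicate_resource_names(names):
    """Return the names that occur more than once.

    Names are compared after trimming whitespace and lowercasing,
    which is an assumption chosen to err on the side of caution.
    """
    counts = Counter(name.strip().lower() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical resource names for one submission.
planned = ["Raw Data", "raw data", "Summary Report"]
print(duplicate_resource_names(planned))
```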
Combining resources by zipping or archiving should only be done when the resources are of little use individually. For example, the zipping of individual shapefile components into a single shapefile resource is strongly encouraged. Zipping is also recommended when submitting large quantities of files which are otherwise unable to be adequately organized. In this case, a separate file describing the structure and contents of the files should be included to allow ease of navigation.
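The separate file describing an archive's structure and contents can be started automatically. The sketch below walks a folder and writes a CSV listing each file's relative path and size, leaving a description column to fill in by hand. The file name `MANIFEST.csv`, the column names, and the `write_manifest` helper are illustrative choices, not a required OEDI format.

```python
import csv
from pathlib import Path


def write_manifest(root, manifest_name="MANIFEST.csv"):
    """List every file under root in a CSV saved inside root."""
    root = Path(root)
    manifest_path = root / manifest_name
    rows = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself (if it already exists).
        if path.is_file() and path != manifest_path:
            rows.append((path.relative_to(root).as_posix(), path.stat().st_size))
    with manifest_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["relative_path", "size_bytes", "description"])
        for relative_path, size in rows:
            # The description is left blank for the submitter to complete.
            writer.writerow([relative_path, size, ""])
    return manifest_path
```

Calling `write_manifest("my_data_folder")` on the folder to be zipped (a placeholder name here) produces the listing next to the data, so it is included when the folder is archived.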
In ways that will allow your colleagues, clients, sponsors, and others to easily find and use your data. Think of your submission as a communication between you and your peers in the greater scientific and research communities. Consider these questions when completing the submission form:
- What is in the data file(s)?
- When, where, why, and how was the data captured/collected?
- Are the units for the data obviously and unambiguously labeled?
- What would someone need to know to use the data properly?
- Are there any assumptions, proprietary software requirements, or other prerequisites to using the data?
Select multiple files from your computer for simultaneous upload:
- Click Add Files to open a window showing files on your computer.
- To select more than one file at a time, hold Control (PC) or Command (Mac) while selecting files with your mouse, or press and hold Shift and use the arrow keys to select multiple contiguous files.
- When you're done selecting, press Open or Choose (actual button will differ depending on your browser and operating system). This will upload the file and allow you to complete the file-specific information in the form.
- Clicking Add Files again will allow you to select and upload more files.
- After adding files and/or links, click on Add info and Add location to enter additional required information.
Simply link to the file using the fields provided after clicking the Add Link button.
The link you submit must be a permanent URI (i.e. a URL that leads directly to a resource and does not pass through a search page or require more than one click to navigate to the data).
The process for creating an archive file (also known as a compressed file) can differ from machine to machine, depending on your operating system and the software you have installed.
The following archive file formats are preferred: .zip, .gz, .tar, .tgz.
Windows: Locate and select the files you would like to archive. Right-click one of the selected files and choose Send to, then click Compressed (zipped) folder.
The new archive (compressed folder) is created in the same location. This is what you will want to upload.
Mac: Locate and select the files you would like to archive. Right-click one of the selected files and choose Compress x Items, where x is the number of items you've selected.
The new archive (compressed folder) is created in the same location. This is what you will want to upload.
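If you prefer a scripted approach that works the same way on Windows, Mac, and Linux, Python's standard `zipfile` module can build a .zip archive. The sketch below is one possible approach, not an OEDI requirement. The `make_zip` helper and the shapefile component names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def make_zip(files, archive_path):
    """Bundle the given files into a single compressed .zip archive."""
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            file = Path(file)
            # arcname keeps only the file name, so the archive does not
            # reproduce the folder layout of the machine it was made on.
            archive.write(file, arcname=file.name)
    return archive_path
```

For example, `make_zip(["site.shp", "site.shx", "site.dbf"], "site.zip")` would combine shapefile components into one resource. Because only file names are stored, this sketch is not suitable for files that share a name across different folders.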
There are no limits on file size or the number of files per submission. However, larger files may be difficult to upload over some internet connections, especially shared connections. If you have concerns about your file size or are having trouble uploading a large file, please contact us.
As soon as it has undergone the data curation process, which typically takes less than two weeks. For an update on your data, please contact us.
You may save progress on your submission incrementally using the 'save' feature, which is found in the bottom right corner of the submission page. When you are satisfied with the contents of your submission, click 'submit' and the curation process will be initiated.
Even after your submission is officially 'submitted,' you are able to make changes to your submission up until it is made publicly available. After that, you should create a new submission and reference your original submission in the description of the new submission. Include a link to your original submission using the Add Link button in the new submission.
Moratoriums apply to entire submissions. All accompanying files will be subject to the moratorium. To expose select files at different times, they must be in separate submissions.
Metadata and Data Curation
Metadata refers to data and information that describe other data. Metadata summarizes your data so that others can easily find and work with them. Many of the metadata fields requested are required to meet data management guidelines from DOE, GSA, and other government agencies. These requirements are designed to promote the discovery of your data, increase their exposure to the scientific community, and enable their proper use.
Data curation is a process we perform to help maintain data submissions for long periods of time to preserve viability, relevance, and usefulness. Curation is a phase of the submission process during which our curators review the metadata provided with each data submission for accuracy, completeness, and relevance to the submitted resources. In some cases, curators may contact submitters to resolve any discrepancies or omissions detected during the review.
Search engine results are affected by many factors, including the metadata provided to describe your published submission. Thorough, comprehensive metadata, including detailed descriptions, keywords, author names, and location information, will make a data submission more visible to search engines and other researchers. The keywords and descriptions provided should connect your contributions with a broader audience, including those searching with basic terminology. The metadata provided then follows your submission to PRIMRE, Data.gov, the DOE Data Explorer, and many other data catalogs, providing essential context to anyone interested in your data. Complete metadata helps ensure that your team is properly credited for your work and can lead to new collaboration opportunities.