Dynamic World training dataset for global land use and land cover categorization of satellite imagery

The Dynamic World Training Data is a dataset of over 5 billion pixels of human-labeled ESA Sentinel-2 satellite imagery, distributed over 24,000 tiles collected from around the world. The dataset is designed to train and validate automated land use and land cover mapping algorithms. The 5.1 km by 5.1 km tiles, at 10 m resolution, are densely labeled using a ten-category classification schema of general land use and land cover classes. The dataset was created between 2019-08-01 and 2020-02-28 using satellite imagery observations from 2019, with approximately 10% of observations extending back to 2017 in very cloudy regions of the world. This dataset is a component of the National Geographic Society - Google - World Resources Institute Dynamic World project.

The dataset consists of two file types: GeoTIFF files containing the markup provided by human labelers for 510 x 510 pixel, 10 m resolution satellite image tiles, and Excel (.xlsx) tables of metadata and class statistics for those GeoTIFF files. The data is organized into three main folders. One folder contains training data labeled by a team of 25 expert human labelers recruited by National Geographic Society specifically for this project. A second folder contains training data labeled by a larger group of commissioned labelers provided by a commercial crowd-labeling service. The data in these two folders is organized by hemisphere and by biome number from the RESOLVE Ecoregions2017 biome categories (https://ecoregions2017.appspot.com/). A third folder contains a validation dataset: a held-out set of labeled tiles for assessing model accuracy. None of the validation data is intended to be used in the formulation of the model. Each validation tile was independently labeled by three experts.
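The tile dimensions stated above are internally consistent: 510 pixels at 10 m per pixel gives a 5.1 km tile edge. The minimal Python sketch below makes that arithmetic explicit. The constants are copied from the description and the function names are illustrative only; nothing here reads the actual files. Note that 24,000 full tiles would hold about 6.2 billion raw pixels, while the description quotes "over 5 billion" labeled pixels; the description does not say how the two figures relate.

```python
"""Sanity-check the tile geometry quoted in the dataset description."""

# Constants taken from the description (not read from any file).
TILE_SIDE_PX = 510      # tiles are 510 x 510 pixels
PIXEL_SIZE_M = 10       # 10 m resolution
NUM_TILES = 24_000      # "distributed over 24000 tiles"


def tile_side_km(side_px: int = TILE_SIDE_PX, pixel_m: int = PIXEL_SIZE_M) -> float:
    """Ground length of one tile edge in kilometres."""
    return side_px * pixel_m / 1000


def pixels_per_tile(side_px: int = TILE_SIDE_PX) -> int:
    """Number of pixels in one square tile."""
    return side_px * side_px


def total_pixels(num_tiles: int = NUM_TILES) -> int:
    """Raw pixel count across all tiles, counting every pixel,
    including any left unmarked (class 0)."""
    return num_tiles * pixels_per_tile()


if __name__ == "__main__":
    print(f"tile edge: {tile_side_km():.1f} km")         # tile edge: 5.1 km
    print(f"pixels per tile: {pixels_per_tile():,}")     # pixels per tile: 260,100
    print(f"raw pixels, all tiles: {total_pixels():,}")  # raw pixels, all tiles: 6,242,400,000
```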
The validation set is provided in two versions: the individual markup from each expert labeler, and image composites of the individual markups.

Each GeoTIFF file encodes the locations of landscape feature classes as determined by a given labeler. Classes were labeled by visual examination of true-color (RGB) composites of Sentinel-2 MultiSpectral Instrument Level-2A scenes. The Tier 1 class values used in this phase of the project are as follows:
0 No data (left unmarked)
1 Water
2 Trees
3 Grass
4 Flooded Vegetation
5 Crops
6 Scrub
7 Built Area
8 Bare Ground
9 Snow/Ice
10 Cloud

This dataset does not include the original Sentinel-2 imagery tiles, but metadata on the exact image ID and date is provided. The original Sentinel-2 imagery was obtained via Google Earth Engine.

This data is available under a Creative Commons BY-4.0 license and requires the following attribution: This dataset is produced for the Dynamic World Project by National Geographic Society in partnership with Google and the World Resources Institute. Development of the Dynamic World training data was funded in part by the Gordon and Betty Moore Foundation.
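The Tier 1 class values and the three-way expert labeling of validation tiles lend themselves to a small worked example. The Python sketch below maps class values to names, merges several labelers' markups pixel by pixel, and computes per-class pixel fractions similar in spirit to the class statistics tables. It is illustrative only: the description does not state how the released validation composites were built, so the majority-vote rule here (ignore unmarked pixels, fall back to 0 without a strict majority) is a hypothetical stand-in. The label grids are plain lists of rows, standing in for the band read from a GeoTIFF with a raster library of your choice.

```python
"""Illustrative helpers for Dynamic World Tier 1 label grids."""
from collections import Counter

# Tier 1 class values as listed in the dataset description.
TIER1_CLASSES = {
    0: "No data",
    1: "Water",
    2: "Trees",
    3: "Grass",
    4: "Flooded Vegetation",
    5: "Crops",
    6: "Scrub",
    7: "Built Area",
    8: "Bare Ground",
    9: "Snow/Ice",
    10: "Cloud",
}
NO_DATA = 0


def majority_vote(votes):
    """Consensus class for one pixel from several labelers' values.

    Hypothetical rule, not the documented compositing method: unmarked (0)
    votes are ignored, and the pixel is set to 0 unless one class holds a
    strict majority of the remaining votes.
    """
    counts = Counter(v for v in votes if v != NO_DATA)
    if not counts:
        return NO_DATA
    top_class, top_count = counts.most_common(1)[0]
    return top_class if 2 * top_count > sum(counts.values()) else NO_DATA


def composite(markups):
    """Combine equally sized label grids pixel by pixel with majority_vote."""
    n_rows, n_cols = len(markups[0]), len(markups[0][0])
    return [
        [majority_vote([m[r][c] for m in markups]) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def class_fractions(grid):
    """Fraction of pixels in each Tier 1 class, keyed by class name."""
    counts = Counter(v for row in grid for v in row)
    total = sum(counts.values())
    return {TIER1_CLASSES[k]: counts[k] / total for k in sorted(counts)}


if __name__ == "__main__":
    # Three toy 2 x 2 markups standing in for the three expert labels of one tile.
    a = [[1, 2], [3, 3]]
    b = [[1, 2], [5, 6]]
    c = [[1, 3], [0, 6]]
    merged = composite([a, b, c])
    print(merged)                   # [[1, 2], [0, 6]]
    print(class_fractions(merged))  # {'No data': 0.25, 'Water': 0.25, 'Trees': 0.25, 'Scrub': 0.25}
```

In the toy example, the bottom-left pixel receives votes 3 (Grass), 5 (Crops), and 0 (unmarked); with no strict majority among the marked votes it falls back to 0, which is one conservative choice among several possible tie-handling rules.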

Identifier
DOI https://doi.org/10.1594/PANGAEA.933475
Related Identifier IsSupplementTo https://doi.org/10.1038/s41597-022-01307-4
Related Identifier IsDocumentedBy https://doi.org/10.5281/zenodo.4766508
Metadata Access https://ws.pangaea.de/oai/provider?verb=GetRecord&metadataPrefix=datacite4&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.933475
Provenance
Creator Tait, Alexander M; Brumby, Steven P; Hyde, Samantha Brooks; Mazzariello, Joseph; Corcoran, Melanie
Publisher PANGAEA
Publication Year 2021
Rights Creative Commons Attribution 4.0 International; https://creativecommons.org/licenses/by/4.0/
OpenAccess true
Representation
Resource Type Dataset
Format text/tab-separated-values
Size 10 data points
Discipline Earth System Research
Spatial Coverage (-174.282W, -55.508S, 179.626E, 80.850N)
Temporal Coverage Begin 2017-03-28T00:00:00Z
Temporal Coverage End 2019-12-12T00:00:00Z