This dataset is composed of 702,111 zooplankton individuals, zooplankton pieces, non-living particles and imaging artefacts, ranging from 300 µm to 3.39 mm in Equivalent Spherical Diameter, individually imaged and measured with the ZooCAM (Colas et al., 2018). The objects were sorted into 127 taxonomic and morphological groups. The imaged objects originate from samples collected on the Bay of Biscay continental shelf, in spring, from 2016 to 2019, during the PELGAS ecosystemic surveys (Doray et al., 2018). The samples were collected with a WP2 net (200 µm mesh size) fitted with a Hydrobios mechanical flowmeter (with back-run stop), in vertical hauls at night, generally from 100 m depth to the surface, or from 5 m above the sea floor where the bottom depth was less than 100 m. The samples were imaged on board, live, after collection and subsampling, and then preserved in 4% buffered formaldehyde seawater. Each imaged object is geolocated and associated with a station, a cruise, a year and other metadata that enable the reconstruction of quantitative zooplankton communities for ecological studies (e.g. Grandrémy et al., 2023a). Each object is described by 52 morphological and grey-level-based features (8-bit encoding, 0 = black, 255 = white), including size, automatically extracted from each individual image by the ZooCAM software. Each object was taxonomically identified using the ZooCAM software and the web-based application Ecotaxa, with its built-in random forest and CNN-based semi-automatic sorting tools, followed by expert validation or correction (Picheral et al., 2017). Images from 2016-2017 contain the ROI bounding box limits, metadata at the bottom of each image, and a non-homogenised background within and around the ROI bounding box; images from 2018 contain a non-homogenised background within the ROI bounding box only; images from 2019 have a completely homogeneous and thresholded background around the object. These differences arose from successive ZooCAM software updates, which did not modify the calculation of object features. This dataset is intended to be used for ecological studies as well as for machine learning applied to plankton studies.
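As an illustration of the size range quoted above, the sketch below relates an object's imaged area to an Equivalent Spherical Diameter (ESD). It assumes the common definition of ESD as the diameter of a circle with the same area as the object's projected area; the exact computation used by the ZooCAM software is not described here. The function names and the `pixel_size_um` parameter are hypothetical and are not fields of the dataset.

```python
import math


def esd_um(area_px: float, pixel_size_um: float) -> float:
    """Return the Equivalent Spherical Diameter in micrometres.

    Assumption: ESD = 2 * sqrt(area / pi), with the area converted from
    pixels to square micrometres using a hypothetical pixel size.
    """
    if area_px < 0 or pixel_size_um <= 0:
        raise ValueError("area must be >= 0 and pixel size must be > 0")
    area_um2 = area_px * pixel_size_um ** 2
    return 2.0 * math.sqrt(area_um2 / math.pi)


def in_dataset_size_range(esd: float) -> bool:
    """Check an ESD (in µm) against the 300 µm to 3.39 mm range given above."""
    return 300.0 <= esd <= 3390.0
```

With these definitions, `in_dataset_size_range(esd_um(area_px, pixel_size_um))` indicates whether an object of a given imaged area would fall within the size range of the dataset, under the stated assumptions.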
The archive contains:
- One tab-separated file (PELGAS ZooCAM zooplankton dataset) containing all data and metadata associated with each imaged and identified object. Metadata and features are in columns (n = 72) and objects are in rows (n = 702,111).
- One comma-separated file containing the name, type, definition and unit of each field (column) in the .tsv (dataset descriptor zoocam).
- One comma-separated file containing the taxonomic list of the dataset, with the count and nature of each category, i.e. “T” for a taxonomical category and “M” for a morphological category (taxonomy descriptor zoocam).
- An individual_images directory containing the image of each object, named according to the object identifier (objid) and sorted into subdirectories according to taxonomic identification, across years and sampling stations.
- A map of the sampling station locations over the 2016-2019 period.
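
The tab-separated file described above can be read with standard tools. The following is a minimal sketch, using only the Python standard library, that counts objects per category. The file path, the UTF-8 encoding and the name of the column holding the taxonomic identification are assumptions; the actual field names are listed in the dataset descriptor file.

```python
import csv
from collections import Counter


def count_objects_per_taxon(tsv_path: str, taxon_column: str) -> Counter:
    """Count rows (objects) per value of the given taxonomy column.

    `taxon_column` is a hypothetical name: look up the real field name in
    the dataset descriptor file before use. UTF-8 encoding is assumed.
    """
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or taxon_column not in reader.fieldnames:
            raise KeyError(f"column {taxon_column!r} not found in {tsv_path}")
        for row in reader:
            counts[row[taxon_column]] += 1
    return counts
```

Reading the file row by row keeps memory use low for the 702,111 objects. On the full file, the sum of the returned counts should equal the number of rows, and the per-category counts can be compared with those given in the taxonomy descriptor file.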