We targeted a permanent mesoscale front in the Ligurian Sea (NW Mediterranean), which we sampled repeatedly between January and June 2021 using a SeaExplorer glider equipped with a UVP6, a versatile in situ imager. We aimed to resolve plankton and particle distributions during the spring bloom, and to assess whether the front was a site of increased zooplankton concentration and whether it constrained the distribution of particles. Over the 5 months, the glider completed more than 5,000 dives and the UVP6 collected 1.1 million images.
Images captured by the UVP6 during cruising (n = 785,405) were imported into the Morphocluster application to quickly detect large clusters of similar objects (e.g. marine snow aggregates). In a second step, images collected during back transects (n = 434,129, on which we focused our analyses) were imported into the EcoTaxa web application with their Morphocluster label, to be sorted at a finer scale into taxonomic or morphological groups (marine snow, artefact, badfocus, reflection or unidentifiable) with the help of a supervised machine learning algorithm. As manually sorting all 400k+ images would have required several months of effort, we instead relied on the predictions of a Random Forest (RF) classifier fed with both handcrafted features and deep features generated by a MobileNet V2 feature extractor previously fine-tuned on UVP6 data. We selected an RF classifier for the following reasons: RFs tend to produce good classification probability estimates (Niculescu-Mizil and Caruana 2005), they are faster to train than a full CNN stack and, when trained with deep features, they perform as well as a full CNN.
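The classification step described above can be illustrated with a minimal sketch. It is not the code used to produce this dataset. It assumes scikit-learn and NumPy, and it replaces the real handcrafted and MobileNet V2 features with synthetic arrays. The feature dimensions, number of classes and RF hyperparameters are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): the real features and labels are not reproduced here.
n_objects, n_handcrafted, n_deep, n_classes = 600, 10, 32, 3
labels = rng.integers(0, n_classes, size=n_objects)

# Shift each class's features so that the toy problem is learnable.
handcrafted = rng.normal(size=(n_objects, n_handcrafted)) + labels[:, None]
deep = rng.normal(size=(n_objects, n_deep)) + 0.5 * labels[:, None]

# Concatenate both feature families into a single matrix, one row per object.
features = np.hstack([handcrafted, deep])

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=0
)

# Hyperparameters are illustrative, not those used for the dataset.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Per-class probability estimates, then the most likely class and its score.
proba = clf.predict_proba(X_test)
predicted = clf.classes_[proba.argmax(axis=1)]
confidence = proba.max(axis=1)
accuracy = (predicted == y_test).mean()
```

Here `confidence` holds, for each object, the probability of the predicted class. Scores of this kind are what make probability estimates from an RF useful when predictions are not reviewed individually.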
The dataset thus contains the following elements:
- CTD data, some collected by the glider payload and some by a SMRU
- particle data, exported from EcoPart
- plankton data, exported from EcoTaxa. Validated objects were either individually inspected by an operator or batch validated in the Morphocluster application. Predicted classifications were not reviewed.
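Because validated and predicted classifications differ in reliability, users may want to separate them before analysis. The sketch below shows one way to do so with the Python standard library. The column names (`object_id`, `taxon`, `annotation_status`), the status values and the sample rows are hypothetical and should be adapted to the actual headers of the plankton file.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a plankton export: column names and rows are assumptions.
SAMPLE_TSV = (
    "object_id\ttaxon\tannotation_status\n"
    "obj_001\tCopepoda\tvalidated\n"
    "obj_002\tmarine snow\tpredicted\n"
    "obj_003\tmarine snow\tvalidated\n"
    "obj_004\tCopepoda\tpredicted\n"
    "obj_005\tartefact\tpredicted\n"
)


def split_by_status(tsv_text, status_column="annotation_status"):
    """Group the rows of a tab-separated table by their annotation status."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = {}
    for row in reader:
        groups.setdefault(row[status_column], []).append(row)
    return groups


groups = split_by_status(SAMPLE_TSV)
validated = groups.get("validated", [])  # inspected or batch validated
predicted = groups.get("predicted", [])  # classifier output, not reviewed

# Count validated objects per group, e.g. to check coverage before analysis.
validated_counts = Counter(row["taxon"] for row in validated)
```

With a real file, `io.StringIO(tsv_text)` would be replaced by an open file handle.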