Foraging traits, lengths and 3D movement trajectories of coral reef fishes obtained via stereo-video in Eilat, Gulf of Aqaba, Red Sea


We used remote underwater stereo-video footage and AI-driven object tracking to assess the functional foraging traits and movement trajectories of benthic herbivorous fishes on a degraded model coral reef.

Sampling took place on the reef in front of the Inter-University Institute for Marine Sciences (IUI) (29°30'7.0"N, 34°55'3.7"E) in Eilat (Israel, Gulf of Aqaba) between 8 and 14 March 2018. In preparation for the surveys, calibrated stereo-video setups, each consisting of 2 GoPro cameras (4 x Hero 5 and 2 x Hero 4 in total), were mounted on a total of 3 racks (Neuswanger et al. 2016, doi:10.1139/cjfas-2016-0010). On each sampling day, racks were installed sequentially at a depth of 2 to 3 m and set to record continuously. Each dive served solely to set up the cameras, which minimized disturbance to the site. Sites were chosen on the criterion that a variety of grazable substrata (not just live coral) must be present, as grazable substrate comprises a range of micro-habitats for fishes that require specific categorisation (Green & Bellwood, 2009). Therefore, sites with a heterogeneous mixture of benthic substrate cover, such as live coral and epilithic algal turf (EAT) on standing dead coral, bare rocks, coral rubble and sand, were generally preferred. Because grazing rates in surgeonfishes are highest during midday, the majority of our filming was conducted between 11:00 and 15:00 (Montgomery et al. 1989, doi:10.1016/0022-0981(89)90127-5; Fouda and El-Sayed 1994). The analysable video was accumulated from 15 rack placements and comprised 22.9 h of footage in total.

At the beginning of each recording, we placed a 1 x 1 m PVC quadrat in front of the cameras. We quantified the substrate cover of each quadrat by taking a long shot photograph.
These images were uploaded to the program SketchAndCalc Version 1.1.2 (iCalc Inc), in which the 1 x 1 m quadrat was calibrated so that each transformed image contained roughly the same number of cells. This equated to ~1000 cells per image, each around 5 cm². The images with the canvas imprinted upon them were subsequently exported and annotated, with each form of substratum assigned a corresponding colour. Annotated cells were counted and relative substrate cover (in %) was calculated.

We then measured fish total length (TL, in mm), bite rate (bites per min), and the distance between consecutive bites (bite distance, in mm), only within the delimited quadrat area, throughout the recorded video footage. In total, we recorded 2,386 bites by 23 fish species (from 11 families). We calculated individual fish mass according to the formula mass = a x TL^b, where a and b for each species were taken from FishBase. The initial 15 min of each video were discarded to allow the fishes to resume normal behaviour after the quadrat was removed and the divers had left the site. To standardize against time, only the subsequent 45 min of recording were used for the analysis of feeding traits in all species. Foraging traits in the three most common surgeonfishes were determined across the entire recorded footage after the initial acclimation period. The time from when a single fish entered the quadrat to take bites from the substrate until it exited constituted a feeding event. For each feeding event, all bites were collated and standardized to bites per minute. Further, for each feeding event we averaged the distances between consecutive bites to obtain bite distance. We conducted all measurements in VidSync Version 1.661 (Neuswanger et al. 2016, doi:10.1139/cjfas-2016-0010). For the two surgeonfish species we calculated Manly's feeding ratios (Manly et al.
2002), which illustrate an individual's use of each substrate category (number of bites) in relation to the availability of that substrate type across the entire reef.

We achieved AI-driven fish detection, identification and tracking from stereo-video in several steps. First, we calibrated the system in MATLAB (The MathWorks) using a checkerboard pattern recorded with both cameras. Next, we performed stereo rectification using OpenCV (Open Source Computer Vision Library) to locate pixels in both images and triangulate the depth of the scene. Using this method of calibration, we obtained an overall mean [±SD] absolute reprojection error of 0.9 [±1.9] mm, which corresponds to 0.45% of the true value.

For object detection, we employed the You Only Look Once (YOLO) convolutional neural network (CNN) (Bochkovskiy et al. 2020), which we retrained with background images from the recorded videos to improve its performance. We then used the bounding boxes produced by the detection algorithm as input data for the classifier and for stereo matching. To classify the detected fish species, we used science-grade, location-invariant images of identified fish species from iNaturalist to train the CNN (Van Horn et al. 2018, doi:10.1109/CVPR.2018.00914; Shepley et al. 2021, doi:10.1002/ece3.7344). However, the iNaturalist dataset contained a limited number of images, and we therefore employed transfer learning, using weights computed from a previously recorded dataset from Mayotte as a starting point (Villon et al. 2018, doi:10.1016/j.ecoinf.2018.09.007). Finally, we implemented the Deep SORT framework, an enhanced version of the Simple Online and Realtime Tracking (SORT) algorithm, for multi-object tracking (Wojke et al. 2017, doi:10.48550/arXiv.1703.07402). This framework tracked each bounding box in both the left and right videos. Triangulation was performed to retrieve the 3D coordinates of the fish relative to the left camera, and we applied de-noising to remove any erroneous data points.
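The triangulation step can be illustrated with a minimal sketch. It assumes an ideal rectified pinhole stereo pair and uses the centre of a matched bounding box as the image point. The function name and all camera parameters (focal length, baseline, principal point) are invented for illustration and are not taken from the dataset or from the authors' code.

```python
def triangulate_rectified(x_left, y_left, x_right, f_px, baseline_mm, cx, cy):
    """Return (X, Y, Z) in mm relative to the left camera.

    Assumes a rectified pair: matched points share the same image row,
    so depth follows from the horizontal disparity alone.
    """
    disparity = x_left - x_right          # in pixels
    if disparity <= 0:
        return None                       # no valid depth; treat as noise
    z = f_px * baseline_mm / disparity    # depth along the optical axis
    x = (x_left - cx) * z / f_px          # lateral offset
    y = (y_left - cy) * z / f_px          # vertical offset
    return (x, y, z)


# Hypothetical camera: 1000 px focal length, 100 mm baseline,
# principal point at the centre of a 1920 x 1080 frame.
point = triangulate_rectified(
    x_left=1060.0, y_left=640.0, x_right=960.0,
    f_px=1000.0, baseline_mm=100.0, cx=960.0, cy=540.0,
)
```

Returning `None` for non-positive disparity is one simple way to flag mismatched detections before any further de-noising; the filtering actually used for the dataset is not described in detail here.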
Overall, our approach enabled reliable and automatic object detection and tracking from stereo-video, providing valuable data for studying the behaviour and ecology of the two focal species in their natural habitats.

We extracted XYZ coordinates from a subgroup of 16 Acanthurus nigrofuscus and 23 Zebrasoma xanthurum individuals whose automatically measured lengths fell within the manually determined length frequency distribution. The automatically generated tracks of these individuals were trimmed to exactly 700 frames to ensure a standardized and consistent observation period.
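The feeding metrics described above (bite rate and bite distance per feeding event, mass from total length, and Manly's ratios) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code. The `FeedingEvent` structure, the function names and all numbers are hypothetical, and the selection ratio is written in its common form, the share of bites on a substrate divided by the share of that substrate available.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class FeedingEvent:
    """One visit of a single fish to the quadrat (hypothetical structure)."""
    enter_s: float       # time the fish entered the quadrat, in seconds
    exit_s: float        # time the fish left the quadrat, in seconds
    bites_xyz_mm: list   # 3D positions of consecutive bites, in mm


def bite_rate_per_min(event):
    """Bites taken during the event, standardized to bites per minute."""
    duration_min = (event.exit_s - event.enter_s) / 60.0
    return len(event.bites_xyz_mm) / duration_min


def mean_bite_distance_mm(event):
    """Mean straight-line distance between consecutive bites, in mm."""
    pts = event.bites_xyz_mm
    if len(pts) < 2:
        return None  # undefined for fewer than two bites
    steps = [dist(p, q) for p, q in zip(pts, pts[1:])]
    return sum(steps) / len(steps)


def mass_from_length(total_length, a, b):
    """Length-weight relationship mass = a * TL^b.

    TL must be in the unit the coefficients were fitted for.
    """
    return a * total_length ** b


def manly_selection_ratios(bites, availability):
    """Ratio of used share to available share for each substrate category."""
    total_bites = sum(bites.values())
    total_avail = sum(availability.values())
    return {
        s: (bites[s] / total_bites) / (availability[s] / total_avail)
        for s in bites
    }


# Invented example values, not taken from the dataset.
event = FeedingEvent(
    enter_s=10.0, exit_s=40.0,
    bites_xyz_mm=[(0, 0, 0), (30, 40, 0), (30, 40, 120)],
)
rate = bite_rate_per_min(event)            # 3 bites within 30 s
spacing = mean_bite_distance_mm(event)     # two steps: 50 mm and 120 mm
mass = mass_from_length(10.0, a=0.02, b=3.0)  # made-up coefficients
ratios = manly_selection_ratios(
    {"EAT": 75, "live coral": 25},         # bites per substrate
    {"EAT": 50.0, "live coral": 50.0},     # substrate cover in %
)
```

In this form, a ratio above 1 indicates that a substrate received more bites than its cover alone would predict, and a ratio below 1 indicates the opposite.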

Related Identifier
Metadata Access
Creator Lilkendey, Julian; Zhang, Jingjing; Barrelet, Cyril; Meares, Michael; Larbi, Houssam; Subsol, Gérard; Chaumont, Marc; Sabetian, Armagan
Publisher PANGAEA
Publication Year 2021
Rights Creative Commons Attribution 4.0 International
OpenAccess true
Resource Type Bundled Publication of Datasets; Collection
Format application/zip
Size 3 datasets
Discipline Earth System Research
Spatial Coverage (34.917W, 29.502S, 34.919E, 29.505N); Red Sea/Gulf of Aqaba
Temporal Coverage Begin 2018-03-08T16:30:00Z
Temporal Coverage End 2018-03-14T11:08:00Z