72 Hours Part 2: Fish Detection & Classification Solution

This is the second instalment of our two-part blog on our approach to the Nvidia GPU Hackathon with NOAA Fisheries. Part 1 gave a general introduction and described the data used for the hackathon; this part focuses on the algorithms used and the results obtained.

Example from object detector

Object detection

Ai.Fish used 15,000 bounding box annotations over 1,733 images to fine-tune the model and create a fish detection system. The object detector was then run on new images from the GroundFish dataset, and image chips were created by cropping the area within each inferred bounding box (examples are shown below). This process produced roughly 14,000 image chips, with multiple chips often coming from the same image, which the team at Lynker Analytics used in the active learning process.
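The cropping step can be sketched as follows. This is a minimal illustration and not the code used in the hackathon: the function name crop_chips, the (x_min, y_min, x_max, y_max) pixel box format, and the nested-list image are all assumptions made to keep the example self-contained.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def crop_chips(image: Sequence[Sequence[int]], boxes: Sequence[Box]) -> List[List[List[int]]]:
    """Return one cropped chip (a list of pixel rows) per usable bounding box."""
    height = len(image)
    width = len(image[0]) if height else 0
    chips = []
    for x_min, y_min, x_max, y_max in boxes:
        # Clamp the box to the image so a detection that spills past the edge still crops cleanly.
        x0, y0 = max(0, x_min), max(0, y_min)
        x1, y1 = min(width, x_max), min(height, y_max)
        if x1 <= x0 or y1 <= y0:
            continue  # skip empty or fully out-of-frame boxes
        chips.append([list(row[x0:x1]) for row in image[y0:y1]])
    return chips


# Toy 4x6 "image" whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(6)] for r in range(4)]

# Two hypothetical detections in the same image give two chips.
chips = crop_chips(image, [(1, 0, 3, 2), (4, 2, 9, 4)])
```

Because one image can contain several detections, a single image can yield several chips, which is how 1,733 annotated images and further inference can lead to a much larger number of chips.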

Several images used in the building of the detection and classification system

Active Learning — Classification

In the example of the active learning tool, an image of a rockfish is shown. Below the image are the model’s predicted class and the confidence of the prediction. On the right is the class menu, from which we could select the correct class (also rockfish in this case).

The unclassified_remove class was reserved for species that were not included in our class list at the time of the hackathon.

If an image contained more than one sea animal, it was discarded to avoid training the model on image chips with multiple classes. If an error was made, the undo previous option could be used to drop the last correction from the dataset.

The active learning stage was an iterative process that was run several times over the hackathon: each time the Inception model was retrained, the highest-entropy samples shown in the active learning tool were updated.
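To make "highest entropy" concrete, here is a minimal sketch of entropy-based sample selection. It assumes the classifier outputs a probability for each class and that uncertainty is measured with Shannon entropy; the function names and the three example predictions are hypothetical, and the hackathon tool may have computed this differently.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k chip ids whose predictions have the highest entropy."""
    ranked = sorted(predictions, key=lambda chip_id: entropy(predictions[chip_id]), reverse=True)
    return ranked[:k]


# Hypothetical class probabilities over three classes for three chips.
predictions = {
    "chip_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "chip_b": [0.40, 0.35, 0.25],  # uncertain prediction, high entropy
    "chip_c": [0.70, 0.20, 0.10],  # somewhere in between
}

# The chips to show a human annotator next.
to_label = most_uncertain(predictions, k=2)
```

Sending the most uncertain chips to the annotator first means each correction teaches the model something it did not already know, which matters when labelling time is limited.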

Final Image Classification

Training an EfficientNet model was not essential to complete the project, but because EfficientNet is a higher-performing model than InceptionV3, it was a relatively easy step to improve our final classification accuracy.


Before we get into the results, here is a quick refresher on accuracy, precision and recall.

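What is Accuracy?

Accuracy is the share of all predictions that the model got right: the number of correctly classified images divided by the total number of images.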

What is Precision?

Precision is the share of the model's predictions for a class that were correct. If our model predicted that 100 images were flatfish but only 80 of them actually were, our precision would be 80%.

(the 80 the model got correct)/(the 80 the model got correct + the 20 the model wrongly predicted as flatfish)

What is Recall?

Recall is the share of a class's actual examples that the model found. If there were 100 round fish but our model correctly classified only 50 of them, our recall would be 50%.

(50 it predicted correctly)/(the 50 it predicted correctly + the 50 it missed)
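The two worked examples above can be checked in a few lines of Python. This is only a sketch: the function and variable names are ours, and the counts are the illustrative numbers from the text, not results from the models.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of the model's predictions for a class that were correct."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of a class's actual examples that the model found."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# 100 flatfish predictions, of which 80 were right and 20 were wrong.
flatfish_precision = precision(true_positives=80, false_positives=20)

# 100 actual round fish, of which 50 were found and 50 were missed.
round_fish_recall = recall(true_positives=50, false_negatives=50)

print(f"precision: {flatfish_precision:.0%}, recall: {round_fish_recall:.0%}")
```

The only difference between the two formulas is the denominator: precision divides by everything the model predicted as the class, while recall divides by everything that truly belongs to the class.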

Active Learning System Statistics

Evaluation metrics for the active learning system

The active learning model, which used InceptionV3, reached an accuracy of 78%, with the weighted averages for precision and recall also reaching 78%.

For the Skate class, the low recall and high precision tell us that we found only 56% of skates (recall), but that 83% of everything the model classified as a skate was correct (precision).

For the Urchin class we had a precision of 100%, meaning there were no false positives: everything the model predicted as an Urchin was correct. This suggests that the class is easily differentiated from the others, which is true for the most part, as it is very easy to tell the difference between an Urchin and a fish.

However, the recall for the Urchin class was 92%, meaning 8% of actual Urchins were misclassified. A confusion matrix lets us look into this further.

Confusion matrix for active learning system

The confusion matrix for the active learning system shows that the model performed very well on species that are easily identifiable and have distinct features.

Urchins and Sponges are the best-performing classes: the model saw 50 urchins and misclassified only 4 of them (as flatfish, rockfish and sponge), and saw 39 sponges and misclassified 10 of them (largely as the invertebrate class).

Clipped object detection inferences of Starfish (top) and sponge (bottom)

The model struggled with Shortspine Thornyheads, which it often mistook for rockfish. The cause was fairly clear, as the two species have similar characteristics.

The model saw 79 Shortspine Thornyhead images, classified only 40 of them correctly, and misclassified 37 as rockfish.
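The counts quoted above are enough to recover the recall for each of these classes, since recall is the number of correct predictions in a class's row of the confusion matrix divided by the row total. The sketch below uses only the counts stated in the text; the names row_counts and recall_from_row are ours.

```python
from typing import Dict, Tuple

# (images of the class seen by the model, images classified correctly),
# taken from the counts quoted in the text above.
row_counts: Dict[str, Tuple[int, int]] = {
    "Urchin": (50, 50 - 4),    # 4 of 50 misclassified
    "Sponge": (39, 39 - 10),   # 10 of 39 misclassified
    "Shortspine Thornyhead": (79, 40),
}


def recall_from_row(total: int, correct: int) -> float:
    """Recall for one class: correct predictions divided by the row total."""
    return correct / total if total else 0.0


recalls = {name: recall_from_row(total, correct) for name, (total, correct) in row_counts.items()}

for name, value in recalls.items():
    print(f"{name}: {value:.0%}")
```

The Urchin figure reproduces the 92% recall reported in the evaluation metrics, while the Shortspine Thornyhead row shows that roughly half of those images were missed.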

Clipped object detection inferences of Rockfish (top) and Shortspine Thornyhead (bottom)

Classification System Statistics

Evaluation metrics for classification system

The final classification model, which used EfficientNet-B4, reached an accuracy of 86%, higher than the active learning system built on Inception, highlighting the benefit of using a recent state-of-the-art model. On the same dataset, the EfficientNet architecture yielded an improvement of 8 percentage points.

The precision and recall per class can be seen in the image above. Flatfish predictions were correct 91% of the time (precision), and the model correctly identified 95% of all Flatfish images (recall). Shortspine Thornyhead predictions were also correct 91% of the time, but only 49% of all Shortspine Thornyhead images were correctly identified by the model.

The weighted average precision & recall rose from 78% on the Inception model to 91% and 86% respectively with EfficientNet.

Confusion matrix for classification system on holdout dataset

The confusion matrix shows that the best-performing class is flatfish: the model saw 123 flatfish and correctly classified 117 of them.

Clipped object detection inferences of flatfish

This is a good classification result for this species because, as seen in the images above, flatfish have distinct features and are easily identifiable.

The worst-performing class was the Shortspine Thornyhead: of the 41 images seen by the model, only 20 were correctly identified as Shortspine Thornyhead and 21 were identified as rockfish.
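It may help to see why the weighted average recall (86%) is the same number as the accuracy (86%). Assuming the weighted average weights each class by its number of images, which is how such summaries are commonly reported, the weights cancel and the result is simply the total number of correct predictions divided by the total number of images. The sketch below illustrates this with only the two holdout rows quoted above, so its result is not the 86% for the full set of classes; the function names are ours.

```python
from typing import Dict, Tuple

# (images in the holdout set, images classified correctly) for the two
# classes whose counts are quoted in the text. Other classes are omitted.
holdout_rows: Dict[str, Tuple[int, int]] = {
    "Flatfish": (123, 117),
    "Shortspine Thornyhead": (41, 20),
}


def weighted_recall(rows: Dict[str, Tuple[int, int]]) -> float:
    """Average of per-class recall, weighted by each class's share of the images."""
    total = sum(n for n, _ in rows.values())
    # Each class's recall (correct / n) is weighted by its share (n / total).
    return sum((n / total) * (correct / n) for n, correct in rows.values() if n)


def pooled_accuracy(rows: Dict[str, Tuple[int, int]]) -> float:
    """All correct predictions divided by all images."""
    total = sum(n for n, _ in rows.values())
    return sum(correct for _, correct in rows.values()) / total


two_class_weighted_recall = weighted_recall(holdout_rows)
two_class_accuracy = pooled_accuracy(holdout_rows)
```

Both values come out to 137/164, or about 84% for these two classes alone. Because flatfish has three times as many images as Shortspine Thornyhead, its strong recall dominates the average and partly hides the weaker class, which is why the per-class figures are worth reading alongside the headline number.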


Through both the hackathon and the meet-up presentation that followed, I was able to develop my technical knowledge and my confidence in presenting and communicating to a large audience.

The results obtained are impressive bearing in mind the time scale and the lack of human-annotated data. The active learning system helped build an annotated dataset in a very short time frame, and state-of-the-art model architectures boosted performance to the limits of the current data.

Finally, I would like to thank Lynker Analytics, NOAA Fisheries and Nvidia for giving us the opportunity to partake in such an interesting challenge.

I am a Data Science Researcher at Lynker Analytics, based in Wellington, New Zealand.
