Michael Karnes and Alper Yilmaz
Photogrammetric Computer Vision Lab.
The Ohio State University
Columbus, Ohio 43210
ABSTRACT
Deep neural network (DNN) image classification has grown rapidly as a general pattern detection
tool for an extremely diverse set of applications; yet dataset accessibility remains a major limiting
factor for many applications. This paper presents a novel dynamic learning approach that leverages
pretrained knowledge in novel image spaces, extending the algorithm's knowledge domain
and reducing dataset collection requirements. The proposed Omni-Modeler generates a dynamic
knowledge set by reshaping known concepts to create dynamic representation models of unknown
concepts. The Omni-Modeler embeds images with a pretrained DNN and formulates a compressed
language encoder. The language-encoded feature space is then used to rapidly generate a dynamic
dictionary of concept appearance models. The results of this study demonstrate the Omni-Modeler's
capability to rapidly adapt across a range of image types, enabling dynamically learned
image classification with limited data availability.
KEYWORDS
Dynamic Learning, Few-shot Learning, Generalized Visual Classification
1. Introduction
Visual classification has an extremely wide area of application. It is now used across almost
every industry from inventory management to plant identification to medical diagnostics
to manufacturing quality control to worker safety to UAS visual navigation [1, 2, 3, 4,
5, 6, 7, 8, 9]. Cameras are becoming cheaper, their sensors are becoming more complex,
and wireless connections are becoming more widespread, all generating a growing flow of
visual information. Deep neural network (DNN) visual classification is an effective tool for
handling this river of information.
Each of these applications requires a large training dataset that covers the range of
appearances expected in operation. The more complex the scenes, the larger the
necessary dataset, an often underestimated caveat of these state-of-the-art visual
classification algorithms.
Furthermore, these training requirements limit implementation in real-time dynamic
learning tasks. A rapid visual classification system with low dataset requirements can
be used for a variety of tasks, such as dataset annotation, customized visual classification,
online learning, tracking, re-detection, and in data environments with changing concept
appearances.
Few-shot learning (FSL) provides a more adaptable framework focused on reducing training
sample sizes [10]. This has recently evolved into meta-learning, where the task becomes
learning how to learn [11].
into learning how to learn [11]. Further development of few-shot learning led to the ad
dition of a mutable memory component known as dynamic learning. The Omni-Modeler
DOI: 10.5121/sipij.2023.14501 1
Signal & Image Processing: An International Journal (SIPIJ) Vol.14, No.4/5, October 2023
is a simple, efficient, and generalized few-shot visual classification framework that intro
duces a novel language encoding methodology. The Omni-Modeler language generation
provides a highly flexible approach for distilling the large latent feature space of a DNN
into a compressed discriminative description. The Omni-Modeler's differential advantage
lies in its simplicity of implementation, its generalizability to any DNN, and its flexible
image-to-description encoding.
The objective of the Omni-Modeler is to ease the implementation of deep visual recognition in new areas of application through a generalized, understandable, and dynamic
framework. This work specifically provides the following contributions:
• The novel Omni-Modeler architecture, which:
– Reduces dataset annotation requirements
– Extends DNN operational domains to novel spaces
– Enables a dynamically adaptable knowledge domain
• A user interface to aid in the implementation
• Results demonstrating the Omni-Modeler performance
2. Related Work
The Omni-Modeler is an agnostic few-shot learning (FSL) visual classification algorithm
that can dynamically adapt to variations in visual input. The focus of this paper is to
demonstrate its ability to classify across concept domains with minimal training data. The
remainder of this section will cover general visual classification, FSL image classification,
and dynamic learning.
Classification is the prediction of an object’s class based on its attributes compared to
a known model. DNN based visual classification embeds the image space into a latent
feature space which is then followed by a class separation task projecting the latent feature
space into a predictive metric. With traditional DNN classification, the network is trained
through an iterative regression process: batches of training images are passed through
the network and the network parameters are updated according to a loss function, an
approach first attributed to LeNet in 1989 [12]. This is known as end-to-end classification
and has been widely demonstrated as an effective approach for tasks with sufficient
training data and static concepts.
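The end-to-end pipeline described above can be sketched with a minimal NumPy example: a stand-in embedding (here, a fixed random projection, not a real pretrained backbone) followed by a linear classification head trained by gradient descent on a cross-entropy loss. The layer sizes and toy data are illustrative assumptions, not any particular published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(images):
    # Stand-in for a pretrained DNN backbone: a fixed random projection
    # from the flattened image space into a 16-D latent feature space.
    W_embed = np.random.default_rng(42).normal(size=(images.shape[1], 16))
    return np.tanh(images @ W_embed)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy 3-class "image" data: each class clusters around its own mean.
means = 2.0 * rng.normal(size=(3, 64))
y = np.repeat(np.arange(3), 20)
X = means[y] + 0.5 * rng.normal(size=(60, 64))

feats = embed(X)                      # latent feature space
W = np.zeros((16, 3))                 # linear class-separation head

for _ in range(200):                  # iterative training over the batch
    probs = softmax(feats @ W)        # forward pass
    grad = feats.T @ (probs - np.eye(3)[y]) / len(y)  # cross-entropy gradient
    W -= 0.5 * grad                   # parameter update from the loss

train_acc = (softmax(feats @ W).argmax(axis=1) == y).mean()
```

A real end-to-end system would also update the backbone's parameters; freezing it, as done here for brevity, is closer to the linear-probe setting.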
FSL takes a different approach of generating object appearance models with a minimal
number of shots (a.k.a. sample reference images). This is known as metric learning where
query images are classified by a similarity score to a set of reference examples. There
is a long history of FSL using DNN-learned features, for example the Siamese network
architecture [13], the Matching Network [14], and the Prototypical Network [15]. The current
trend in FSL algorithms follows a line of transforming pretrained DNN-embedded features
using a series of linear transforms, DNN classifiers, and iterative solvers [16, 17, 18]. The
current state-of-the-art FSL algorithm, PT+MAP+SF+SOT, combines power transform
(PT) mapping with discrete cosine transform spatial frequency (SF) features and adds the
Self-Optimal-Transport Feature Transform (SOT), achieving 89.94% on 5-way 1-shot
classification on the CIFAR dataset [19].
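The metric-learning formulation described above can be illustrated with a short sketch in the spirit of the Prototypical Network [15]: embed a few reference shots per class, average them into per-class prototypes, and label a query by its highest similarity score. The embeddings below are synthetic stand-ins for DNN features, not any cited implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical DNN embeddings for a 3-way, 5-shot episode:
# each class's shots cluster around a class-specific direction.
class_dirs = l2_normalize(rng.normal(size=(3, 32)))
shots = l2_normalize(class_dirs[:, None, :] + 0.2 * rng.normal(size=(3, 5, 32)))

# Appearance model per class: the mean ("prototype") of its shot embeddings.
prototypes = l2_normalize(shots.mean(axis=1))

def classify(query_embedding):
    # Cosine similarity score of the query against each reference prototype.
    scores = prototypes @ l2_normalize(query_embedding)
    return int(np.argmax(scores))

# A query drawn near class 2's direction should be labeled class 2.
query = class_dirs[2] + 0.1 * rng.normal(size=32)
pred = classify(query)
```

No gradient updates are needed at classification time, which is what makes this style of appearance model attractive for the low-data, dynamic settings discussed here.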
FSL can also be viewed from a perspective of domain adaptation or meta-learning [10,
11]. Domain adaptation looks to map one latent feature space to another, calculating an
optimal transform. With DNNs, this can take the form of transfer learning or fine-tuning,
both of which retrain a pretrained network on a new dataset using traditional
backpropagation. Meta-learning is a more recent approach that formulates the problem as
learn-to-learn training, focusing on optimizing the feature embedding space.
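The domain-adaptation idea of mapping one latent feature space onto another via an optimal transform can be sketched as a least-squares problem. The paired feature spaces and their dimensions below are synthetic assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)

# Paired embeddings of the same inputs in two hypothetical latent spaces.
source = rng.normal(size=(100, 32))            # source-domain features
true_map = rng.normal(size=(32, 24))           # unknown ground-truth relation
target = source @ true_map + 0.01 * rng.normal(size=(100, 24))

# Optimal linear transform in the least-squares sense:
# W = argmin_W ||source @ W - target||^2, solved via lstsq.
W, *_ = np.linalg.lstsq(source, target, rcond=None)

# Relative residual of the fitted mapping.
residual = np.linalg.norm(source @ W - target) / np.linalg.norm(target)
```

A closed-form linear map is the simplest case; transfer learning and fine-tuning instead solve a nonlinear version of the same mapping problem by retraining network weights.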