As we scale AI and machine learning to work on a broader set of tasks for enterprise and industry applications, it is imperative to learn more from less. Data augmentation is one important tool, especially in situations where there isn’t enough training data, that improves learning by synthesizing new training samples automatically. Such is the case for few-shot learning, where only one or very few samples are available per category. Most prior work on few-shot classification for images investigates the ‘single label’ scenario, where every training image contains only a single object and hence has a single category label. However, a more challenging and realistic scenario is multi-label, few-shot image classification where training data has a small number of samples, and images have more than one label, which has not been explored extensively in prior work.
In order to advance this topic, we investigate multi-label, few-shot image classification in our paperpresented at IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019) in June 2019. The paper, titled “LaSO: Label-Set Operations networks for multi-label few-shot learning,” proposes a new method to train deep neural networks by combining pairs of image samples with certain sets of labels to synthesize new samples with ‘merged’ labels. As an example, consider the two images in Figure 1, one depicting ‘a person walking a sheep and a dog’ and another depicting ‘a person holding a dog and a cat’. The labels of the first image are ‘person,’ ‘sheep,’ and ‘dog,’ and the second are ‘person,’ ‘dog,’ and ‘cat.’ Given these two images, the LaSO networks synthesize novel training samples corresponding to operations that perform union, intersection and subtraction of labels. The ‘union’ produces a sample labeled ‘person,’ ‘dog,’ ’cat,’ and ‘sheep,’ while ‘intersection’ and ‘subtraction’ produce samples labeled ‘person,’ ‘dog,’ and ‘sheep’ alone, respectively. The LaSO networks operate directly in the feature space learned by a deep neural network.
The LaSO networks are trained jointly together, as a single multi-task network, using specific loss functions designed to adapt their operation to the corresponding label set manipulation tasks (Figure 2).
The multi-task network is trained on a large-scale multi-labeled dataset, with multiple labels per image corresponding to the objects appearing on that image. The resulting LaSO networks are tested in different ways in order to evaluate their potential in manipulating multi-label content. The tests include both classification of the resulting examples using a classifier pre-trained on real, held-out multi-label data and testing retrieval from the held-out tests set using feature vectors synthesized by LaSO networks (Figure 3) .
The LaSO networks are designed to operate directly on the image representations, not requiring any additional inputs to control the manipulation. In other words, human intervention is not required to indicate which labels to manipulate. Hence, they can potentially generalize to images containing novel categories that are unseen during training. In this respect, the LaSO networks can be used for challenging multi-label few-shot classification tasks. In these cases, the LaSO networks synthesize new training samples from random pairs of the provided training samples. In our paper, we apply this capability of LaSO networks to a novel benchmark for the multi-label few shot classification, which we hope will inspire more work on this important problem. The result of using LaSO networks for data augmentation on the proposed benchmark indicate a strong potential to generalize to novel categories (Figure 4).
Multi-label few-shot classification is a new, challenging and practical task. We propose the first benchmark for this task. The results of evaluating the LaSO label-set manipulation with neural networks on the proposed benchmark demonstrate that LaSO holds a good potential for this task and possibly for other interesting applications. We hope that this work will inspire more researchers to look into this interesting problem.