Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification

Published in ACM MM-2021, 2021

Abstract

The goal of few-shot fine-grained image classification is to recognize rarely seen fine-grained objects in the query set, given only a few samples of this class in the support set. Previous works focus on learning discriminative image features from a limited number of training samples for distinguishing various fine-grained classes, but ignore one important fact that spatial alignment of the discriminative semantic features between the query image with arbitrary changes and the support image, is also critical for computing the semantic similarity between each support-query pair. In this work, we propose an object-aware long-short-range spatial alignment approach, which is composed of a foreground object feature enhancement (FOE) module, a long-range semantic correspondence (LSC) module and a short-range spatial manipulation (SSM) module. The FOE is developed to weaken background disturbance and encourage higher foreground object response. To address the problem of long-range object feature misalignment between support-query image pairs, the LSC is proposed to learn the transferable long-range semantic correspondence by a designed feature similarity metric. Further, the SSM module is developed to refine the transformed support feature after the long-range step to align short-range misaligned features (or local details) with the query features. Extensive experiments have been conducted on four benchmark datasets, and the results show superior performance over most state-of-the-art methods under both 1-shot and 5-shot classification scenarios.

Motivation

Spatial alignment examples of two different subclasses from CUB dataset are illustrated, where the red and black color boxes denote Blue Jay and American Goldfinch, respectively. The support and query images in each subclass are initially misaligned, due to the arbitrary pose and position variations of birds between the two images. By leveraging the proposed spatial alignment network to transform the original support features, the semantic distribution between the transformed support features and the query features can be better aligned.

Framework

The framework of FOE is illustrated as follows.

Experimental Results

Our experiments are conducted on Stanford Dogs, Stanford Cars and NABirds. The main experimental results:

Conclusion

We have introduced an object-aware long-short-range spatial alignment approach to boost the performance of few-shot fine-grained image classification. A foreground object enhancement module is first developed to strengthen those discriminative object features. Then a long-range semantic correspondence module is proposed to find a transferable feature transform for the enhanced features, to do the global support-query spatial matching. Finally, a short-range spatial manipulation module is developed to further refine the longrange transformed output by aligning more short-range local parts in the object. Experimental results on four benchmarks demonstrate the effectiveness and superiority of our proposed method.

Download paper here