research-article
Open Access

Deep Uncoupled Discrete Hashing via Similarity Matrix Decomposition

Published: 05 January 2023


Abstract

Hashing has been drawing increasing attention in the task of large-scale image retrieval owing to its storage and computation efficiency, especially the recent asymmetric deep hashing methods. These approaches treat the query and database in an asymmetric way and can take full advantage of the whole training data. Though they achieve state-of-the-art performance, asymmetric deep hashing methods still suffer from large quantization error and efficiency problems on large-scale datasets due to the tight coupling between the query and database. In this article, we propose a novel asymmetric hashing method, called Deep Uncoupled Discrete Hashing (DUDH), for large-scale approximate nearest neighbor search. Instead of directly preserving the similarity between the query and database, DUDH first exploits a small similarity-transfer image set to transfer the underlying semantic structures from the database to the query and implicitly preserve the desired similarity. As a result, the large similarity matrix is decomposed into two relatively small ones and the query is decoupled from the database. Then both database codes and similarity-transfer codes are directly learned during optimization. The quantization error of DUDH exists only in the process of preserving similarity between the query and the similarity-transfer set. By uncoupling the query from the database, the training cost of optimizing the CNN model for the query is no longer related to the size of the database. Besides, to further accelerate the training process, we propose to optimize the similarity-transfer codes with a constant-approximation solution, so that the cost of optimizing the similarity-transfer codes becomes almost negligible. Extensive experiments on four widely used image retrieval benchmarks demonstrate that DUDH can achieve state-of-the-art retrieval performance with a remarkable reduction in training cost (30×–50× relative).


1 INTRODUCTION

Hashing aims to encode raw data into short binary codes while preserving data similarity information in the Hamming space. Due to its high storage and computational efficiency, hashing has been widely used in various computer vision tasks, e.g., large-scale image retrieval [14, 15, 18, 42, 51], video retrieval [25, 34], cross-modal retrieval [19, 23, 54], person re-identification [45, 57], and classification [37]. In this article, we focus on incorporating deep neural networks into the learning of hash codes for large-scale image retrieval. Such approaches [24, 30], also named deep hashing methods, have shown better performance than traditional hashing methods with hand-crafted features like Locality-Sensitive Hashing (LSH) [11], Spectral Hashing (SH) [43], and Iterative Quantization [12].

To date in the literature, deep hashing methods generally leverage the strong representation capability of the powerful convolutional neural networks (CNNs) to capture the underlying semantics of images. With the paradigm of simultaneously learning features and hash functions, these methods can take full advantage of the pre-trained CNN, achieving satisfactory results. A large number of approaches have been proposed recently, including both symmetric hashing [1, 5, 6, 10, 20, 28, 32, 39, 40, 53, 61] and asymmetric hashing [3, 18, 44] methods.

Among plentiful deep hashing methods, deep asymmetric hashing methods [3, 18, 36] have demonstrated superior retrieval performance to the conventional symmetric ones. Such methods exploit different hashing functions for the query and database, which are proven to be more effective in preserving similarity information. The state-of-the-art Asymmetric Deep Supervised Hashing (ADSH) [18] and Deep Anchor Graph Hashing (DAGH) [3] are two representative approaches. By learning hash function only for queries while directly obtaining hash codes for the database, they can capitalize on the whole training data and thus achieve promising results. Nevertheless, they still suffer from the large quantization error problem on large-scale datasets, though their database codes are directly optimized. ADSH and DAGH adopt the tanh function to approximate the sign function when learning the hash function for queries, which means the quantization error still exists on the query side. As the query codes are closely bound up with the database codes, the quantization error is directly related to the size of the similarity matrix between the query and database. Apart from the quantization problem, the tight coupling between the database and query also causes an efficiency problem, as the computation cost for optimizing the convolutional neural network is directly related to the size of the database.

The main cause of the quantization and efficiency problems in deep asymmetric hashing methods is the tight coupling between the database and query. In this article, we introduce a novel asymmetric hashing method for learning binary hash codes, named Deep Uncoupled Discrete Hashing (DUDH). The proposed DUDH elaborately designs a similarity-transfer matrix to decouple the query from the database, so that both the quantization error and the training cost are no longer related to the size of the database. Specifically, a small similarity-transfer image set is constructed by sampling the database, whose code matrix is dubbed the similarity-transfer matrix. Instead of preserving the similarity between the query and database directly as in ADSH and DAGH (upper part in Figure 1), DUDH leverages the similarity-transfer set to bridge the gap between them. By keeping its similarity to both the query and the database, the underlying semantic structures are transferred from the database to the query and aligned accordingly, which implicitly preserves the similarity between them (lower part in Figure 1). As such, the large similarity matrix is decomposed into two relatively small ones. As illustrated in Figure 2, both database and similarity-transfer codes are directly learned, and query codes are generated by the CNN model. These three parts are guided by the similarity-transfer hashing loss. More specifically, the similarity between the query and the similarity-transfer set is preserved by the query-transfer loss, and that between the database and the similarity-transfer set is preserved by the database-transfer loss. Note that the quantization error only exists in the query-transfer loss and is no longer related to the size of the database. The main contributions can be summarized as follows:


Fig. 1. Schematic of similarity-transfer matrix. The upper part shows that the query and database in conventional deep asymmetric hashing methods are tightly coupled. The bottom part shows that DUDH adopts a similarity-transfer matrix to decouple the query and database while implicitly preserving their similarities. \( n, t, m \) denote the size of the database, similarity-transfer set, and query set, respectively ( \( t\lt m\ll n \) ). \( c \) denotes the code length.


Fig. 2. Framework of DUDH. Similarity-transfer images are randomly sampled from database images to preserve semantic similarity between query and database images. The binary codes for similarity-transfer and database images are directly learned during the optimization, while the ones for query images are generated by the CNN model. Best viewed in color.

  • A similarity-transfer matrix is elaborately designed to bridge the gap between query and database images. With such a matrix, the quantization error and the cost of optimizing the CNN model can be largely reduced, while the semantic similarity between the query and database is preserved as well. Besides, we further propose to accelerate the optimization of the similarity-transfer matrix with a constant-approximation solution. As a result, the cost of optimizing the similarity-transfer matrix becomes negligible.

  • We devise a similarity-transfer hashing loss function by preserving the similarity between training images. It incorporates a similarity-transfer set to train the hash functions for query images. Simultaneously, the binary hash codes for the database and similarity-transfer set can be directly obtained during the optimization.

  • Extensive experiments demonstrate that the proposed DUDH can significantly decrease the training time while achieving state-of-the-art retrieval accuracy.

The rest of this article is organized as follows: Related works are briefly discussed in Section 2. Section 3 describes DUDH in detail. Section 4 extensively evaluates the proposed method on four widely used image retrieval datasets. Finally, Section 5 concludes this article.


2 RELATED WORKS

Hashing is an important technique for fast approximate similarity search. Generally speaking, hashing methods can be divided into two categories: data-independent methods and data-dependent methods. Data-independent methods randomly generate a set of hash functions without any training. Representative data-independent methods include LSH [11] and its variants [35]. However, it has been proven that LSH needs long codes to meet the accuracy requirement. To generate more compact binary codes, data-dependent methods have been proposed. They try to learn appropriate hash functions that can well separate the training samples. Existing data-dependent hashing methods can be further classified into supervised hashing and unsupervised hashing. Compared with unsupervised hashing methods [12, 43], supervised hashing methods can leverage label information to achieve better retrieval performance. Representative supervised hashing methods include KSH [31], LFH [56], SDH [38], and FDUDH [27].

With the rapid development of deep learning techniques, deep models have been applied to hashing approaches, giving rise to deep hashing methods. In this article, we mainly focus on improving the training efficiency of deep hashing methods. These methods can be summarized into symmetric deep hashing methods and asymmetric ones. The former encode database and query images into binary codes through the same CNN model, while the latter learn two different deep hash functions for query and database images separately.

Symmetric deep hashing methods have shown great advantages over the hand-crafted feature-based hashing methods. Convolutional Neural Network Hashing (CNNH) [46] tries to fit the binary codes computed from the similarity matrix. Deep Supervised Hashing (DSH) [30] and Deep Pairwise Supervised Hashing (DPSH) [26] are pair-wise label based hashing methods. Two-stream DH (TSDH) [8] constructs a two-stream ConvNet architecture and learns hash codes with class-specific centers to minimize the intra-class variation. Deep Supervised Discrete Hashing (DSDH) [24] is the first deep hashing method with discrete optimization. Some methods further consider more complicated semantic similarity between images [58]. Other methods [17, 19, 47, 50] focus on learning unified hash codes for different modalities, such as image, text, and video. The aforementioned deep hashing methods are all supervised; i.e., they cannot be applied in scenarios where label information is unavailable or incomplete. Recently, some deep unsupervised hashing methods [7, 9, 28, 39, 49, 51, 53] were proposed. Deep binary descriptors (DeepBit) [28] treats original images and their corresponding rotated images as similar pairs and attempts to preserve such similarities. DistillHash [51] learns a distilled dataset composed of data pairs that have confident similarity signals. Apart from unsupervised and supervised deep hashing methods, semi-supervised deep hashing methods [48, 55] have also attracted much attention. SSDH [55] tries to utilize the unlabeled images through online graph construction. BGDH [48] constructs a bipartite graph to discover the underlying structure of data, based on which an embedding is generated for each instance.

Asymmetric deep hashing methods have recently shown better retrieval performance than symmetric methods. Deep Asymmetric Pairwise Hashing (DAPH) [36] adopts two different CNN models to generate hash codes for query and database images separately. ADSH [18] learns a CNN model only for query images and directly learns the binary hash codes for database images. DAGH [3] is similar to ADSH, but additionally adopts an anchor graph to benefit the learning of binary codes. ADSH and DAGH perform better than DAPH in most cases since they can efficiently utilize the supervision. However, both still require a large amount of computation during the optimization.

Several hashing works have exploited “anchors” for generating feature representations (e.g., SDH [38], KSH [31], and GCNH [59]), which resemble the proposed similarity-transfer set in that both are subsets of the database. However, our similarity-transfer set is fundamentally different. Anchor-based methods generally employ anchors to learn features of other points by calculating the distances between the points and the anchors. Such a process does not aim to reduce the computation or the quantization error but is more akin to feature enhancement. In contrast, the similarity-transfer set in DUDH is treated as a springboard to transfer the underlying semantic structures from the database to the query, which reduces both computation cost and quantization error. Meanwhile, the similarities between the query and database are also implicitly preserved.

Apart from learning a representation where the intra-class distances are minimized and inter-class distances are maximized, hashing methods need to pay more attention to reducing or avoiding quantization loss during optimization, which is the major difference between deep hashing methods and deep metric learning methods.


3 DEEP UNCOUPLED DISCRETE HASHING

3.1 Problem Definition

Suppose we have \( m \) query images denoted as \( X = \lbrace x_i\rbrace _{i=1}^m \), \( n \) database images denoted as \( Y = \lbrace y_j\rbrace _{j=1}^n \), and \( t \) similarity-transfer images denoted as \( Z = \lbrace z_k\rbrace _{k=1}^t \). The similarity-transfer image set is simply sampled from the database. Normally, the number of query images is much smaller than that of database images but much larger than that of similarity-transfer images; i.e., \( t\lt m\ll n \). \( \widehat{S}\in \lbrace -1,+1\rbrace ^{m\times t} \) denotes the similarity matrix between the query and similarity-transfer set, and \( \widetilde{S}\in \lbrace -1,+1\rbrace ^{n\times t} \) denotes the similarity matrix between the database and similarity-transfer set. \( \widehat{S}_{ij}=1/-1 \) indicates that \( x_i \) and \( z_j \) are similar/dissimilar, and \( \widetilde{S}_{ij}=1/-1 \) indicates that \( y_i \) and \( z_j \) are similar/dissimilar. The goal of DUDH is to learn a hash function \( h(x_q)\in \lbrace -1,+1\rbrace ^c \) to generate the binary hash codes \( U = \lbrace u_i\rbrace _{i=1}^m\in \lbrace -1,+1\rbrace ^{m\times c} \) for query images and directly learn the hash codes \( V = \lbrace v_i\rbrace _{i=1}^n\in \lbrace -1,+1\rbrace ^{n\times c} \), \( W = \lbrace w_i\rbrace _{i=1}^t\in \lbrace -1,+1\rbrace ^{t\times c} \) for the database and similarity-transfer set, respectively, where \( c \) is the code length. The Hamming distance between \( u_i/v_i \) and \( w_j \) should be as small as possible if \( \widehat{S}_{ij}/\widetilde{S}_{ij}=1 \); otherwise, the distance should be as large as possible. The notations and their descriptions are listed in Table 1.

\( m \): the number of query images
\( n \): the number of database images
\( t \): the number of similarity-transfer images
\( c \): the length of binary hash codes
\( X = \lbrace x_i\rbrace _{i=1}^m \): the set of \( m \) query images
\( Y = \lbrace y_j\rbrace _{j=1}^n \): the set of \( n \) database images
\( Z = \lbrace z_k\rbrace _{k=1}^t \): the set of \( t \) similarity-transfer images
\( \widehat{S}\in \lbrace -1,+1\rbrace ^{m\times t} \): the similarity matrix between query and similarity-transfer set
\( \widetilde{S}\in \lbrace -1,+1\rbrace ^{n\times t} \): the similarity matrix between database and similarity-transfer set
\( h(x_q)\in \lbrace -1,+1\rbrace ^c \): the deep hash function
\( U = \lbrace u_i\rbrace _{i=1}^m\in \lbrace -1,+1\rbrace ^{m\times c} \): the binary hash codes of query images
\( V = \lbrace v_i\rbrace _{i=1}^n\in \lbrace -1,+1\rbrace ^{n\times c} \): the binary hash codes of database images
\( W = \lbrace w_i\rbrace _{i=1}^t\in \lbrace -1,+1\rbrace ^{t\times c} \): the binary hash codes of similarity-transfer images

Table 1. Notations and Their Descriptions
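To make the supervision concrete, the similarity matrices \( \widehat{S} \) and \( \widetilde{S} \) can be built directly from (multi-hot) labels. The following is a minimal NumPy sketch with hypothetical toy labels, not the paper's actual data pipeline:

```python
import numpy as np

def similarity_matrix(labels_a, labels_b):
    """S_ij = +1 if sample i of labels_a and sample j of labels_b share
    at least one label, else -1 (the convention used for S-hat and S-tilde)."""
    shared = (labels_a.astype(int) @ labels_b.astype(int).T) > 0
    return np.where(shared, 1, -1)

# toy multi-hot labels: m=3 query images, t=2 similarity-transfer images, 4 classes
X_labels = np.array([[1, 0, 0, 0],
                     [0, 1, 1, 0],
                     [0, 0, 0, 1]])
Z_labels = np.array([[1, 0, 1, 0],
                     [0, 0, 0, 1]])
S_hat = similarity_matrix(X_labels, Z_labels)  # shape (m, t) = (3, 2)
```

For single-label datasets such as CIFAR-10, the same function applies with one-hot label vectors.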

3.2 Framework Overview

As illustrated in Figure 2, DUDH has three inputs, i.e., the query images, the similarity-transfer images, and the database images, respectively. In practice, the query and similarity-transfer set are generally unavailable. Therefore, we sample two subsets of \( Y \) to form \( X \) and \( Z \). The supervision \( \widehat{S} \) and \( \widetilde{S} \) can be easily obtained given the label of the whole dataset. DUDH contains three core parts: hash function learning, database code learning, and similarity-transfer code learning. In the hash function learning part, a deep CNN model is employed to extract representative features for query images. Deep hash functions are then learned on top of the features to produce the hash codes \( U \). Note that this module is only applied to the query set. The database code learning part directly learns the hash codes \( V \) for the database, while the similarity-transfer code learning part learns the codes \( W \) for the similarity-transfer set. To achieve the above goals, a similarity-transfer hashing loss function is devised with \( U, V, W \), which implicitly preserves the similarity between the query and database through the similarity-transfer set. We detail each part in the following sections.

3.3 Deep Hash Functions

We construct the hash functions through a CNN model, the output of which is a \( c \)-dimensional vector, where \( c \) is the code length. Accordingly, our deep hash function is defined as (1) \( \begin{equation} \begin{aligned}\mathbf {u_i}=h\left(\mathbf {x_i};\mathbf {\theta }\right)&=sign\left(f(\mathbf {x_i};\mathbf {\theta })\right)\!,\\ \end{aligned} \end{equation} \)

where \( \theta \) denotes the parameters of CNN model, and \( f\left(\cdot \right) \) denotes the output of the CNN model.
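As a runnable illustration of Equation (1), the sketch below substitutes a random linear map for the CNN \( f(\cdot ;\theta) \); this stand-in is purely an assumption for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
c, d = 8, 16                     # code length and (hypothetical) feature dimension

def f(x, theta):
    # stand-in for the CNN output f(x; theta); here just a linear map
    return x @ theta

theta = rng.standard_normal((d, c))
x = rng.standard_normal((1, d))  # one query image "feature"

u = np.sign(f(x, theta))         # Eq. (1): binary code in {-1,+1}^c
```

In practice \( f \) is a deep network and \( \theta \) is learned, but the binarization step is exactly this element-wise sign.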

3.4 Similarity-Transfer Hashing Loss

DUDH utilizes the similarity-transfer set as a springboard to boost the computation efficiency. It aims to preserve the similarity of the similarity-transfer set with both the query and the database, rather than preserve the similarity between the query and database directly. As such, the underlying semantic structures are transferred from the database to the query and aligned accordingly, which implicitly preserves the similarity between them. In this way, the necessary large similarity matrix between the database and query is “decomposed” into two small ones, thus evading the huge computation overhead. Figure 3 illustrates how similarity-transfer images work. When the images (a) and (c) are both similar to (b) (the similarity-transfer image), the similarity between (a) and (c) can be implicitly preserved (left in the figure). In the same way, the dissimilarity between (a) and (c) can also be preserved through a specific image (b) (right in the figure). The proposed similarity-transfer hashing loss thus follows the above paradigm to separately optimize the hash codes, which consists of two components: database-transfer loss and query-transfer loss.

Fig. 3.

Fig. 3. Examples of how similarity-transfer images work. The left part shows that if the images (a) and (c) are both similar to (b), then the similarity between (a) and (c) can be implicitly preserved. The right part shows that if the image (b) is similar to (a) but dissimilar to (c), then the dissimilarity between (a) and (c) can be implicitly preserved. Best viewed in color.

Database-transfer loss tries to preserve the similarity between database and similarity-transfer images. With the similarity matrix \( \widetilde{S} \), the goal is to reduce/enlarge the Hamming distances between the hash codes of similar/dissimilar pairs. The \( L_2 \)-norm loss is then adopted to minimize the difference between the supervision and the inner product of binary code pairs, which is given by (2) \( \begin{equation} \begin{aligned}&\min \limits _{V,W} J(V,W)=\sum _{i=1}^{n}\sum _{j=1}^{t}(v_i w_{j}^T-c\widetilde{S}_{ij})^2,\\ &s.t.~~~V=\lbrace v_1,v_2,\ldots ,v_n\rbrace \in \lbrace -1,+1\rbrace ^{n\times c},\\ &\qquad W=\lbrace w_1,w_2,\ldots ,w_t\rbrace \in \lbrace -1,+1\rbrace ^{t\times c}. \end{aligned} \end{equation} \)
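Because the double sum in Equation (2) is exactly the squared Frobenius norm \( ||VW^T-c\widetilde{S}||_F^2 \), the loss can be evaluated with one matrix product; a quick NumPy check on random toy codes (sizes are arbitrary):

```python
import numpy as np

def database_transfer_loss(V, W, S_tilde, c):
    # Eq. (2): sum_{i,j} (v_i w_j^T - c * S_tilde_ij)^2 = ||V W^T - c S_tilde||_F^2
    return float(np.sum((V @ W.T - c * S_tilde) ** 2))

rng = np.random.default_rng(4)
n, t, c = 6, 3, 4
V = rng.choice([-1.0, 1.0], size=(n, c))        # database codes
W = rng.choice([-1.0, 1.0], size=(t, c))        # similarity-transfer codes
S_tilde = rng.choice([-1.0, 1.0], size=(n, t))  # supervision

loss = database_transfer_loss(V, W, S_tilde, c)
```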

Query-transfer loss aims at preserving the similarity between query and similarity-transfer images. Similar to the above database-transfer loss, the query-transfer loss can be defined as (3) \( \begin{equation} \begin{aligned}&\min \limits _{U,W} J(U,W)=\sum _{i=1}^{m}\sum _{j=1}^{t}(u_i w_{j}^T-c\widehat{S}_{ij})^2,\\ &s.t.~~~U=\lbrace u_1,u_2,\ldots ,u_m\rbrace \in \lbrace -1,+1\rbrace ^{m\times c},\\ &\qquad W=\lbrace w_1,w_2,\ldots ,w_t\rbrace \in \lbrace -1,+1\rbrace ^{t\times c}. \end{aligned} \end{equation} \)

Note that \( u_i \) is obtained through the CNN model. We integrate Equation (1) into Equation (3) and reformulate the loss function as follows: (4) \( \begin{equation} \begin{aligned}&\min \limits _{\theta ,W} J(\theta ,W)=\sum _{i=1}^{m}\sum _{j=1}^{t}[sign\left(f(x_i;\theta)\right) w_{j}^T-c\widehat{S}_{ij}]^2,\\ &s.t.\qquad W=\lbrace w_1,w_2,\ldots ,w_t\rbrace \in \lbrace -1,+1\rbrace ^{t\times c}. \end{aligned} \end{equation} \)

It should be noted that Equation (4) is a discrete optimization problem that is NP-hard in general, and the function \( sign\left(\cdot \right) \) is not differentiable at 0 (and has zero gradient elsewhere), so it cannot be optimized by back-propagation directly. We adopt a common continuous relaxation that replaces \( sign\left(\cdot \right) \) with the \( tanh\left(\cdot \right) \) function, which brings the new formulation (5) \( \begin{equation} \begin{aligned}&\min \limits _{\theta ,W} J(\theta ,W)=\sum _{i=1}^{m}\sum _{j=1}^{t}[tanh\left(f(x_i;\theta)\right) w_{j}^T-c\widehat{S}_{ij}]^2,\\ &s.t.\qquad W=\lbrace w_1,w_2,\ldots ,w_t\rbrace \in \lbrace -1,+1\rbrace ^{t\times c}. \end{aligned} \end{equation} \)

As aforementioned, both query and similarity-transfer set images are sampled from the database. Let \( \Psi =\lbrace 1,2,3,\ldots ,n\rbrace \) denote the indices of database images, \( \Omega =\lbrace 1,2,3,\ldots ,m\rbrace \) the indices of query images, and \( \Phi =\lbrace 1,2,3,\ldots ,t\rbrace \) the indices of similarity-transfer images. Combining the above database-transfer loss and query-transfer loss, we can finally formulate similarity-transfer hashing loss as (6) \( \begin{equation} \begin{aligned}&\min \limits _{\theta ,W,V} J(\theta ,W,V)=\sum _{i\in \Psi }\sum _{j\in \Phi }[v_i w_{j}^T-c\widetilde{S}_{ij}]^2\\ &+\lambda \sum _{i\in \Omega }\sum _{j\in \Phi }[tanh\left(f(x_i;\theta)\right) w_{j}^T-c\widehat{S}_{ij}]^2\\ &+\gamma \sum _{i\in \Omega }[v_i-tanh\left(f(x_i;\theta)\right)]^2,\\ &s.t.~~~V=\lbrace v_1,v_2,\ldots ,v_n\rbrace \in \lbrace -1,+1\rbrace ^{n\times c},\\ &\qquad W=\lbrace w_1,w_2,\ldots ,w_t\rbrace \in \lbrace -1,+1\rbrace ^{t\times c}, \end{aligned} \end{equation} \)

where \( \lambda \), \( \gamma \) are two hyper-parameters. It is worth noting that we further add a consistency term \( \sum _{i\in \Omega }[v_i-tanh\left(f(x_i;\theta)\right)]^2 \). This is because each image \( x_i \) in the query set possesses two code representations: the learned binary hash code \( v_i \) from the database, and the relaxed output \( tanh\left(f(x_i;\theta)\right) \) from the CNN model. Minimizing the difference between them encourages the two representations to be consistent.
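The three terms of Equation (6) can be evaluated in matrix form. A hedged sketch on random toy tensors, with \( P=tanh(f(X;\theta)) \) standing in for the CNN outputs and the queries assumed to occupy the first \( m \) rows of the database:

```python
import numpy as np

def dudh_objective(P, V, W, S_hat, S_tilde, lam, gamma, c, Omega):
    """Eq. (6): database-transfer + lambda * query-transfer + gamma * consistency.
    P holds the relaxed query outputs tanh(f(x_i; theta)); Omega indexes the
    query images inside the database code matrix V."""
    db_term    = np.sum((V @ W.T - c * S_tilde) ** 2)   # database-transfer loss
    query_term = np.sum((P @ W.T - c * S_hat) ** 2)     # relaxed query-transfer loss
    consist    = np.sum((V[Omega] - P) ** 2)            # consistency term
    return float(db_term + lam * query_term + gamma * consist)

rng = np.random.default_rng(5)
n, m, t, c = 8, 3, 2, 4
V = rng.choice([-1.0, 1.0], size=(n, c))
W = rng.choice([-1.0, 1.0], size=(t, c))
P = np.tanh(rng.standard_normal((m, c)))
S_tilde = rng.choice([-1.0, 1.0], size=(n, t))
S_hat = rng.choice([-1.0, 1.0], size=(m, t))
Omega = np.arange(m)        # assumption: queries are the first m database rows

J = dudh_objective(P, V, W, S_hat, S_tilde, 1.0, 0.5, c, Omega)
```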

3.5 Optimization

We design an alternating optimization strategy to learn the parameters \( \theta \), \( V \), and \( W \) in Equation (6). More specifically, in each iteration we learn one parameter with the other two fixed. The three steps are repeated for several iterations.

3.5.1 θ-step.

When \( V \) and \( W \) are fixed, we use the standard back-propagation algorithm to optimize \( \theta \). For simplicity, we denote \( p_i=tanh(f(x_i;\theta)) \) and \( q_i=f(x_i;\theta) \). The partial derivative of \( J(\theta ,V,W) \) with respect to \( q_i \) is (7) \( \begin{equation} \begin{aligned}&\frac{\partial J}{\partial q_i} = 2\Big (\lambda \sum _{j\in \Phi }[(p_i w_{j}^T-c\widehat{S}_{ij})w_j]\\ &+\gamma (p_i-v_i)\Big)\cdot ({\bf 1}-p_i\cdot p_i), \end{aligned} \end{equation} \)

where \( {\bf 1} \) denotes the all-ones vector, and \( \cdot \) denotes the element-wise multiplication between two vectors. We can then use the chain rule to update \( \theta \).
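The gradient in Equation (7) can be verified numerically. The sketch below checks it against central finite differences for a single query, with toy sizes and no actual CNN (everything downstream of \( q_i=f(x_i;\theta) \)):

```python
import numpy as np

rng = np.random.default_rng(1)
c, t = 4, 3
lam, gamma = 1.0, 0.5
W = rng.choice([-1.0, 1.0], size=(t, c))   # similarity-transfer codes
S = rng.choice([-1.0, 1.0], size=t)        # one row of S-hat
v = rng.choice([-1.0, 1.0], size=c)        # database code paired with this query
q = rng.standard_normal(c)                 # pre-tanh CNN output q_i = f(x_i; theta)

def J(q):
    # the terms of Eq. (6) that depend on this single query
    p = np.tanh(q)
    return lam * np.sum((W @ p - c * S) ** 2) + gamma * np.sum((p - v) ** 2)

p = np.tanh(q)
# Eq. (7): 2 * (lam * sum_j (p w_j^T - c S_j) w_j + gamma * (p - v)) * (1 - p*p)
grad = 2 * (lam * W.T @ (W @ p - c * S) + gamma * (p - v)) * (1 - p * p)
```

The element-wise factor \( (1-p\cdot p) \) is the derivative of \( tanh \), i.e., the chain-rule step from \( p_i \) back to \( q_i \).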

3.5.2 V-step.

When \( \theta \) is fixed, it is not straightforward to update \( V \). We first rewrite Equation (6) into the following matrix form: (8) \( \begin{equation} \begin{aligned}\min \limits _{V} J(V)&=||VW^T-c\widetilde{S}||_F^2+\gamma \left|\left|V_{\Omega }-P\right|\right|_F^2\\ &=||VW^T||_F^2-2ctr(VW^T\widetilde{S}^T)\\ &\qquad -2\gamma tr(V_{\Omega } P^T)+const,\\ &s.t.~~~V\in \lbrace -1,+1\rbrace ^{n\times c},\\ \end{aligned} \end{equation} \)

where \( P=\lbrace p_i|i\in \Omega \rbrace \in [-1,+1]^{m\times c} \) and \( V_{\Omega } \) is the binary hash codes of the database images indexed by \( \Omega \), i.e., \( V_{\Omega }=\lbrace v_{i}|i\in \Omega \rbrace \in \lbrace -1,+1\rbrace ^{m\times c} \). “const” is a constant independent of \( V \). To simplify Equation (8), we newly define \( \widetilde{P}=\lbrace \widetilde{p}_i|i\in \Psi \rbrace \in [-1,+1]^{n\times c} \), where \( \widetilde{p}_i \) is defined as (9) \( \begin{equation} \begin{aligned}&\widetilde{p}_i= {\left\lbrace \begin{array}{ll} ~~p_i,& \text{$i\in \Omega $},\\ ~~0,& \text{$otherwise$}, \end{array}\right.} \end{aligned} \end{equation} \)

and Equation (8) can be rewritten as follows: (10) \( \begin{equation} \begin{aligned}\min \limits _{V} J(V)&=||VW^T||_F^2-2tr(V(cW^T\widetilde{S}^T+\gamma \widetilde{P}^T)),\\ &=||VW^T||_F^2+tr(VQ^T),\\ &s.t.~~~V\in \lbrace -1,+1\rbrace ^{n\times c},\\ \end{aligned} \end{equation} \)

where \( Q=-2c\widetilde{S}W-2\gamma \widetilde{P} \). Note that the constant term in Equation (8) is omitted for simplicity. Then we adopt the discrete cyclic coordinate descent (DCC) algorithm proposed in [38] to solve Equation (10). Specifically, we update \( V \) bit by bit, which means each time we update one column of \( V \) with the other columns fixed. We denote \( V_{*l} \) as the \( l \)th column of \( V \) (\( l=1,\ldots , c \)) and \( \hat{V_l} \) as the matrix of \( V \) excluding \( V_{*l} \). For \( W \), let \( W_{*l} \) be the \( l \)th column of \( W \) and \( \hat{W_l} \) be the matrix of \( W \) excluding \( W_{*l} \). Similarly, let \( Q_{*l} \) denote the \( l \)th column of \( Q \). Therefore, Equation (10) can be transformed to (11) \( \begin{equation} \begin{aligned}\min \limits _{V_{*l}} J(V_{*l})&=||VW^T||_F^2+tr(VQ^T),\\ &=tr(V_{*l}[2W_{*l}^T\hat{W_l}\hat{V_l}^T+Q_{*l}^T])+const,\\ &s.t.~~~V_{*l}\in \lbrace -1,+1\rbrace ^n.\\ \end{aligned} \end{equation} \)

Obviously, when the sign of each bit in \( V_{*l} \) is opposite to that of the corresponding bit in \( 2W_{*l}^T\hat{W_l}\hat{V_l}^T+Q_{*l}^T \), \( J(V_{*l}) \) reaches its minimum value. Therefore, the solution to Equation (11) is as follows: (12) \( \begin{equation} \begin{aligned}V_{*l}&=-sign(2\hat{V_l}\hat{W_l}^TW_{*l}+Q_{*l}).\\ \end{aligned} \end{equation} \)

We then update \( V \) by replacing the \( l \)th column of \( V \) with \( V_{*l} \). Accordingly, all columns of \( V \) are updated sequentially by repeating Equation (12).
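One such DCC sweep never increases the objective in Equation (10): each column update exactly minimizes a function that is linear in \( V_{*l} \), since the quadratic part \( (V_{*l}^TV_{*l})(W_{*l}^TW_{*l})=nt \) is constant for binary codes. A NumPy sketch on random toy matrices, with \( Q \) drawn at random as a stand-in for \( -2c\widetilde{S}W-2\gamma \widetilde{P} \):

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, c = 20, 5, 6
W = rng.choice([-1.0, 1.0], size=(t, c))
Q = rng.standard_normal((n, c))            # stand-in for -2c*S_tilde@W - 2*gamma*P_tilde
V = rng.choice([-1.0, 1.0], size=(n, c))   # random initial database codes

def objective(V):
    # Eq. (10): ||V W^T||_F^2 + tr(V Q^T)
    return float(np.sum((V @ W.T) ** 2) + np.trace(V @ Q.T))

before = objective(V)
for l in range(c):                         # one sweep of DCC
    rest = [k for k in range(c) if k != l]
    V_hat, W_hat = V[:, rest], W[:, rest]  # V and W excluding column l
    # per-column update: V_{*l} = -sign(2 V_hat W_hat^T W_{*l} + Q_{*l})
    V[:, l] = -np.sign(2.0 * V_hat @ (W_hat.T @ W[:, l]) + Q[:, l])
after = objective(V)
```

Repeating the sweep until the codes stop changing yields the V-step used in each outer iteration.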

3.5.3 W-step.

Given the fixed \( \theta \) and \( V \), the objective in Equation (6) can be rewritten into the formulation of (13) \( \begin{equation} \begin{aligned}\min \limits _{W} J_2(W)&=||VW^T-c\widetilde{S}||_F^2+\lambda ||P W^T-c\widehat{S}||_F^2,\\ &s.t.\qquad W\in \lbrace -1,+1\rbrace ^{t\times c}. \end{aligned} \end{equation} \)

Different from \( \theta \) and \( V \), we optimize \( W \) in a discrete way, where the error of the solution is proven to be bounded. Let us first consider the following problem, which changes Equation (13) from the Frobenius norm to the \( L_1 \)-norm: (14) \( \begin{equation} \begin{aligned}\min \limits _{W} J_1(W)&=||VW^T-c\widetilde{S}+\lambda (\overline{P} W^T-c\overline{\widehat{S}})||_1\\ &=||(V+\lambda \overline{P})W^T-c(\widetilde{S}+\lambda \overline{\widehat{S}})||_1,\\ &s.t.\qquad W\in \lbrace -1,+1\rbrace ^{t\times c}, \end{aligned} \end{equation} \)

where \( \overline{P}\in [-1,+1]^{n\times c} \) and \( \overline{\widehat{S}}\in \lbrace -1,0,+1\rbrace ^{n\times t} \) are zero-padded versions of \( P \) and \( \widehat{S} \): the first \( m \) rows of \( \overline{P} \) are the same as \( P \), and the remaining rows are all zeros; similarly, the first \( m \) rows of \( \overline{\widehat{S}} \) are the same as \( \widehat{S} \), and the rest are all zeros.

The solution of Equation (14) can be simply found as (15) \( \begin{equation} \begin{aligned}W=sign((\widetilde{S}+\lambda \overline{\widehat{S}})^T(V+\lambda \overline{P})). \end{aligned} \end{equation} \)

Furthermore, we have the following theorem.

Theorem 3.1.

Suppose that \( J_1(W) \) and \( J_2(W) \) reach their minimum at the points \( W_1^* \) and \( W_2^* \), respectively. We have \( J_2(W_1^*)\le 2cJ_2(W_2^*)+(\frac{1}{2}c+8c^2)\lambda mt \).

The proof of Theorem 3.1 can be found in the appendix. Note that the parameters \( c, \lambda , m, t \) are all constants and usually small. In other words, when we utilize Equation (15) to solve Equation (13), the solution is a constant-approximation solution, which provides an error bound for Equation (13). Finally, the entire learning algorithm of DUDH is summarized in Algorithm 1.
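The constant-approximation W-step of Equation (15) can be sketched in NumPy with zero-padded \( \overline{P} \) and \( \overline{\widehat{S}} \) as above; toy sizes are arbitrary, and the tie \( sign(0) \) is broken to \( +1 \) here so the resulting codes stay strictly binary:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, t, c = 12, 4, 3, 4
lam = 0.5
V = rng.choice([-1.0, 1.0], size=(n, c))    # database codes
P = np.tanh(rng.standard_normal((m, c)))    # relaxed query outputs
S_tilde = rng.choice([-1.0, 1.0], size=(n, t))
S_hat = rng.choice([-1.0, 1.0], size=(m, t))

# zero-pad P and S_hat to n rows (queries occupy the first m database rows)
P_bar = np.vstack([P, np.zeros((n - m, c))])
S_bar = np.vstack([S_hat, np.zeros((n - m, t))])

sgn = lambda X: np.where(X >= 0, 1.0, -1.0)  # sign(.) with sign(0) := +1

# Eq. (15): W = sign((S_tilde + lam * S_bar)^T (V + lam * P_bar))
W = sgn((S_tilde + lam * S_bar).T @ (V + lam * P_bar))
```

This closed form replaces any iterative search over \( W \), which is why the cost of the W-step is negligible in practice.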

3.6 Analysis on Quantization Error and Computational Complexity

The quantization error of DUDH only exists in the query-transfer loss, and its value is no longer related to the size of the database. For ADSH, as the query is tightly coupled with the database, its quantization error is proportional to the size of the database. Compared with ADSH, the quantization error of DUDH can thus be significantly reduced. The computational cost for training DUDH comprises three parts: optimizing \( \theta \), optimizing \( W \), and optimizing \( V \). Specifically, the training complexity is \( \mathcal {O}(T_i T_e mtc) \) for optimizing \( \theta \), \( \mathcal {O}(T_i ntc) \) for \( W \), and \( \mathcal {O}(T_i ntc^2) \) for \( V \), where \( T_i \) is the number of iterations and \( T_e \) is the number of network epochs in each iteration. In contrast, the complexity of ADSH is \( \mathcal {O}(T_i T_e mnc) \) for optimizing \( \theta \) and \( \mathcal {O}(T_i mnc^2) \) for \( V \). Generally, optimizing \( \theta \) and \( V \) costs much more time than optimizing \( W \). Compared with ADSH, the complexity of DUDH for optimizing \( \theta \) is much lower because \( t\ll n \), which indicates that the network optimization cost is simply proportional to the size of the similarity-transfer set. Besides, the complexity of DUDH for optimizing \( V \) is also lower because \( t\lt m \). Therefore, although DUDH introduces an additional optimization step for \( W \), it is still much more efficient than ADSH. The comparison of computational complexity is listed in Table 2.

Table 2.
Methods | \( \theta \)                   | \( W \)                 | \( V \)
ADSH    | \( \mathcal {O}(T_i T_e mnc) \) | -                       | \( \mathcal {O}(T_i mnc^2) \)
DUDH    | \( \mathcal {O}(T_i T_e mtc) \) | \( \mathcal {O}(T_i ntc) \) | \( \mathcal {O}(T_i ntc^2) \)

Table 2. Comparison of Computational Complexity
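To make the asymptotic comparison in Table 2 concrete, the following sketch plugs illustrative sizes (not the paper's exact settings) into the complexity formulas; the cancellation shows that the speedups over ADSH for optimizing \( \theta \) and \( V \) are \( n/t \) and \( m/t \), respectively.

```python
# Plugging illustrative sizes (placeholders, not the paper's exact settings)
# into the complexity formulas of Table 2 to compare ADSH and DUDH.
T_i, T_e = 50, 3                         # outer iterations, epochs per iteration
n, m, t, c = 195_834, 2_000, 1_000, 48   # database/query/transfer sizes, code length

adsh_theta = T_i * T_e * m * n * c       # ADSH: O(T_i T_e m n c), grows with n
adsh_V     = T_i * m * n * c ** 2        # ADSH: O(T_i m n c^2)

dudh_theta = T_i * T_e * m * t * c       # DUDH: O(T_i T_e m t c), independent of n
dudh_W     = T_i * n * t * c             # DUDH: O(T_i n t c), cheap extra step
dudh_V     = T_i * n * t * c ** 2        # DUDH: O(T_i n t c^2)

speedup_theta = adsh_theta / dudh_theta  # constants cancel, leaving n / t
speedup_V     = adsh_V / dudh_V          # constants cancel, leaving m / t
```

With these toy numbers, the network optimization alone is roughly \( n/t \approx 196\times \) cheaper, consistent with the claim that the CNN training cost no longer depends on the database size.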


4 EXPERIMENTS

4.1 Evaluation Setup

4.1.1 Datasets.

We conduct extensive evaluations of our proposed method DUDH on four widely used datasets, CIFAR-10 [22], SVHN [33], NUS-WIDE [4], and MS-COCO [29].

  • CIFAR-10 consists of 60,000 single-label color images in 10 classes. Each class contains 6,000 images of size \( 32 \times 32 \). For CIFAR-10, two images are treated as a similar pair if they share the same label.

  • SVHN consists of 73,257 training images and 26,032 testing images. The images are categorized into 10 classes, each corresponding to a digit. For SVHN, two images are similar if they share the same label.

  • NUS-WIDE contains 269,648 multi-label web images collected from Flickr. The association between images and 81 concepts is manually annotated. Following [30, 60], we only select the images associated with the 21 most frequent concepts (labels), each of which is associated with at least 5,000 images, resulting in a total of 195,834 images. For NUS-WIDE, two images are defined as a similar pair if they share at least one common label.

  • MS-COCO is a multi-label dataset. It contains 82,783 training images and 40,504 validation images, which belong to 80 classes. Two images are similar if they share at least one common label.
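The similarity definition used by all four datasets reduces to a label-overlap test, which can be sketched as follows (the label matrices below are illustrative, not the datasets' real annotations):

```python
import numpy as np

# Two images form a similar pair (+1) iff their multi-hot label vectors share
# at least one common label, and a dissimilar pair (-1) otherwise.
def pairwise_similarity(L_a, L_b):
    # (L_a @ L_b.T)[i, j] counts the labels shared by image i of L_a and
    # image j of L_b; any positive count marks a similar pair.
    shared = L_a @ L_b.T
    return np.where(shared > 0, 1, -1)

L_q = np.array([[1, 0, 1], [0, 1, 0]])                 # two images, three labels
L_d = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])      # three images, three labels
S = pairwise_similarity(L_q, L_d)                      # shape (2, 3), entries in {-1, +1}
```

For single-label datasets such as CIFAR-10 and SVHN, the same rule degenerates to an equality test on the class label.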

For CIFAR-10, we randomly select 1,000 images (100 images per class) as the query set, with the remaining images as database images. For SVHN, we randomly select 1,000 images (100 images per class) from the testing set as the query set and utilize the whole training set as the retrieval set. Similarly, for NUS-WIDE, we randomly choose 2,100 images (also 100 images per class) as the query set, leaving the rest as the database. For MS-COCO, we randomly sample 5,000 images as the query set, and the rest of the images are used as the training and gallery images.
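The class-balanced query sampling described above can be sketched as follows (the labels and random seed are illustrative; this is not the authors' code):

```python
import numpy as np

# Randomly draw `per_class` query images from every class; the remaining
# images form the database, mirroring the CIFAR-10/SVHN/NUS-WIDE protocol.
def split_query_database(labels, per_class=100, seed=0):
    rng = np.random.default_rng(seed)
    query_idx = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        query_idx.extend(rng.choice(cls_idx, size=per_class, replace=False))
    query_idx = np.array(sorted(query_idx))
    database_idx = np.setdiff1d(np.arange(len(labels)), query_idx)
    return query_idx, database_idx

labels = np.repeat(np.arange(10), 6000)   # CIFAR-10-like: 10 classes x 6,000 images
q, db = split_query_database(labels, per_class=100)
```

For MS-COCO, which samples 5,000 query images without the per-class constraint, a single unstratified `rng.choice` over all indices would suffice instead.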

4.1.2 Comparison Methods.

We compare DUDH with both traditional and deep hashing methods. Among traditional approaches, we compare with the unsupervised methods LSH [11] and ITQ [12], and the supervised methods FASTH [27], LFH [56], SDH [38], and COSDISH [21]. The deep hashing methods include DSH [30], DHN [60], DIHN [44], ADSH [18], JLDSH [13], and CSQ [52]. Note that JLDSH, DIHN, and ADSH are asymmetric deep hashing methods, and JLDSH and DIHN are derived from ADSH.

4.1.3 Implementation Details.

Following [16, 18], we adopt the CNN-F model [2] as the basic network architecture for DUDH and all the other deep hashing approaches. This CNN architecture has five convolutional layers and two fully connected layers. For traditional (non-deep) methods, we utilize the 4,096-dimensional deep features extracted from the CNN-F model pre-trained on ImageNet. In addition, we set \( m=2,\!000 \) for ADSH, JLDSH, and DUDH by cross-validation. Considering both computation cost and retrieval accuracy, we set \( t=100 \) for CIFAR-10/SVHN and \( t=1,\!000 \) for NUS-WIDE/MS-COCO. For DIHN, we randomly select three classes as incremental classes and, following [44], adopt ADSH to train the base model. Note that ADSH, DIHN, JLDSH, and DUDH are all trained on the whole database for a fair comparison.

4.1.4 Evaluation Metrics.

We report the Mean Average Precision (MAP) to evaluate the overall retrieval performance of DUDH and the baselines. MAP is widely used in image retrieval evaluation and is computed as the mean of the average precision values over all queries, where each average precision is obtained from the top returned samples. Following [18, 30, 44], the MAP results for NUS-WIDE are calculated based on the top-5\( k \) returned samples. Additionally, we evaluate the performance from other aspects, including the top-\( k \) precision and the precision-recall curve. Note that DUDH also aims to decrease the training cost; therefore, we also compare the training time of several deep hashing methods, including the detailed training time for optimizing \( \theta \), \( W \), and \( V \).
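As a reference, one common way to compute MAP over ranked retrieval lists is sketched below, with average precision normalized by the number of relevant items retrieved in the top \( k \) (conventions vary; this is an illustrative sketch, not the paper's evaluation code):

```python
import numpy as np

# Average precision for one query: `relevant` is a 0/1 array over the ranked
# retrieval list; k=None evaluates the whole list (top-k otherwise).
def average_precision(relevant, k=None):
    relevant = np.asarray(relevant[:k])
    hits = np.cumsum(relevant)                # relevant items seen up to each rank
    ranks = np.arange(1, len(relevant) + 1)
    if hits[-1] == 0:
        return 0.0
    # Precision at each relevant rank, averaged over the retrieved relevant items.
    return float(np.sum(relevant * hits / ranks) / hits[-1])

# Relevant items at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
ap_query_1 = average_precision([1, 0, 1, 0])
ap_query_2 = average_precision([0, 1, 0, 0])
map_score = np.mean([ap_query_1, ap_query_2])  # MAP = mean of per-query APs
```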

4.1.5 Experimental Environment.

Our proposed approach is implemented with the MatConvNet [41] framework. We carry out all experiments on an NVIDIA RTX 2080Ti GPU.

4.2 Accuracy Comparison

4.2.1 Comparison of Retrieval Accuracy.

Table 3 shows the MAP comparisons on three datasets. As shown in the table, the most competitive baselines are the ADSH-based methods (ADSH, DIHN, and JLDSH) and CSQ, while DUDH performs the best in most cases. We also report the top-\( k \) precision and precision-recall curves with 24 bits on three datasets. As shown in Figure 4, DUDH achieves the best performance in all cases on CIFAR-10 and SVHN, which is consistent with the MAP results. For NUS-WIDE, DUDH achieves the best top-\( k \) performance for small \( k \); as \( k \) grows, DUDH becomes slightly inferior to JLDSH. For the precision-recall curves, as shown in Figure 5, it is interesting to observe that the precisions of DUDH, JLDSH, ADSH, and DIHN increase as recall increases on CIFAR-10 and SVHN, which differs from typical precision-recall curves. This indicates that these methods may only fail on some extremely hard negative examples. Similar to the top-\( k \) and MAP results, DUDH achieves the best precision-recall performance on CIFAR-10 and SVHN, while being slightly inferior to ADSH and JLDSH in certain cases on NUS-WIDE. In general, the proposed DUDH achieves the best accuracy in all cases on CIFAR-10 and SVHN and competitive accuracy on NUS-WIDE. Meanwhile, the training cost of DUDH is largely reduced, which will be discussed later.

Fig. 4.

Fig. 4. Performance of top-k precision on three datasets. The code length is 24. Best viewed in color.

Fig. 5.

Fig. 5. Performance of precision-recall curve on three datasets. The code length is 24. Best viewed in color.

Table 3.
Methods      |        CIFAR-10         |        NUS-WIDE         |          SVHN
             | 12b   24b   32b   48b   | 12b   24b   32b   48b   | 12b   24b   32b   48b
LSH [11]     | 0.147 0.173 0.180 0.193 | 0.341 0.351 0.351 0.371 | 0.107 0.108 0.109 0.111
ITQ [12]     | 0.258 0.273 0.283 0.294 | 0.505 0.504 0.503 0.505 | 0.111 0.114 0.115 0.116
FASTH [27]   | 0.620 0.673 0.687 0.716 | 0.741 0.783 0.795 0.809 | 0.251 0.296 0.318 0.344
LFH [56]     | 0.401 0.605 0.657 0.700 | 0.705 0.759 0.778 0.794 | 0.193 0.256 0.284 0.325
SDH [38]     | 0.520 0.646 0.658 0.669 | 0.739 0.762 0.770 0.772 | 0.151 0.300 0.320 0.334
COSDISH [21] | 0.609 0.683 0.696 0.716 | 0.730 0.764 0.787 0.800 | 0.238 0.295 0.320 0.341
DSH [30]     | 0.646 0.749 0.786 0.811 | 0.762 0.794 0.797 0.808 | 0.370 0.480 0.523 0.583
DHN [60]     | 0.673 0.711 0.705 0.714 | 0.790 0.810 0.809 0.818 | 0.380 0.410 0.416 0.430
ADSH [18]    | 0.890 0.928 0.931 0.939 | 0.840 0.878 0.895 0.906 | 0.797 0.890 0.912 0.919
DIHN [44]    | 0.892 0.927 0.933 0.940 | 0.835 0.882 0.900 0.902 | 0.790 0.887 0.913 0.915
CSQ [52]     | 0.937 0.941 0.952 0.947 | 0.840 0.886 0.887 0.886 | 0.903 0.919 0.927 0.935
JLDSH [13]   | 0.877 0.933 0.933 0.942 | 0.840 0.888 0.900 0.910 | 0.796 0.880 0.895 0.886
DUDH (Ours)  | 0.938 0.942 0.943 0.948 | 0.863 0.894 0.900 0.906 | 0.895 0.922 0.929 0.941
  • The best results for MAP are shown in bold.

Table 3. Comparison of MAP w.r.t. Different Number of Bits on Three Datasets


4.2.2 DUDH vs. ADSH.

For CIFAR-10 and SVHN, DUDH outperforms the ADSH-based methods in all cases. The reason lies in the fact that the quantization error of DUDH is much smaller than that of the ADSH-based methods. Due to the tight coupling between the query and database, the quantization errors of ADSH-based methods are directly related to the size of the similarity matrix between the database and query (\( m \times n \)). In contrast, the quantization error of DUDH is only related to the size of the similarity matrix between the query and similarity-transfer set (\( m \times t \)). Meanwhile, DUDH can implicitly preserve the similarity between the database and query through the similarity-transfer set, which is achieved by keeping its similarity to both the database and the query. For NUS-WIDE, DUDH achieves comparable performance with ADSH. The reason is that NUS-WIDE is a multi-label dataset, in which the similarities are more complex; thus, the selection of the similarity-transfer set becomes more demanding.
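The size argument above can be made concrete with a toy sketch of the decomposition (the sizes are illustrative, and this is a sketch of the idea rather than the paper's exact objective):

```python
import numpy as np

# ADSH couples query and database through the full m x n similarity matrix,
# so quantization error touches all m*n pairs. DUDH only quantizes against
# the m x t query-transfer matrix; the t x n transfer-database similarity is
# computed from binary codes that are learned directly (no quantization).
rng = np.random.default_rng(0)
m, n, t, c = 2_000, 20_000, 100, 48          # query/database/transfer sizes, code length

B_db = rng.choice([-1, 1], size=(n, c))      # database codes, learned directly
B_tr = rng.choice([-1, 1], size=(t, c))      # similarity-transfer codes
S_td = (B_tr @ B_db.T) / c                   # t x n: transfers structure to the query side

pairs_coupled   = m * n                      # pairs affected by ADSH's quantization error
pairs_decoupled = m * t                      # pairs affected by DUDH's quantization error
reduction = pairs_coupled / pairs_decoupled  # = n / t
```

Since \( t \ll n \), the number of pairs exposed to quantization error shrinks by a factor of \( n/t \), which matches the empirical gap on CIFAR-10 and SVHN.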

4.2.3 DUDH vs. CSQ.

We further compare DUDH with the Hadamard-matrix-based method CSQ [52] in terms of MAP and training time. For MAP, as listed in Table 3, DUDH is superior to CSQ in most cases. More specifically, DUDH gains obvious advantages on the NUS-WIDE dataset, while being slightly inferior to CSQ in a few cases on CIFAR-10 and SVHN. For training time, as listed in Table 5, DUDH achieves better retrieval performance with much less training time. Table 4 shows the comparison results on MS-COCO, where DUDH delivers much better retrieval performance than CSQ with much higher training efficiency. Besides, as the number of training images decreases, CSQ costs less training time (though still far more than DUDH), but the accuracy gap between the two methods widens as well.

Table 4.
Methods                |   MAP (12b/24b/32b/48b)   | Training Time (\( \theta \) / \( W \) / \( V \) / Total) | MAP (48b)
CSQ [52]               | 0.744 0.844 0.872 0.886   | -    -    -    512.1  | 0.886
CSQ-\( \frac{1}{2} \)  | 0.690 0.787 0.809 0.828   | -    -    -    347.9  | 0.828
CSQ-\( \frac{1}{8} \)  | 0.635 0.729 0.755 0.769   | -    -    -    57.9   | 0.769
CSQ-\( \frac{1}{16} \) | 0.605 0.693 0.724 0.738   | -    -    -    9.6    | 0.738
DUDH (Ours)            | 0.871 0.899 0.901 0.910   | 19.5 0.04 0.06 23.6   | 0.910
  • For the training time, the code length is 48. CSQ-\( \frac{1}{2}/\frac{1}{8}/\frac{1}{16} \) denotes CSQ trained with \( \frac{1}{2}/\frac{1}{8}/\frac{1}{16} \) of the training images.

Table 4. Comparison of MAP and Training Time on MS-COCO


4.2.4 Impact of the Size of Similarity-Transfer Set.

To evaluate the impact of the size of the similarity-transfer set on retrieval accuracy, we plot the MAP results of DUDH with different values of \( t \) in Figure 6. For CIFAR-10 and SVHN, the size of the similarity-transfer set has almost no impact on retrieval performance. More specifically, even when the similarity-transfer set is small, e.g., \( t=100 \), DUDH still achieves competitive retrieval performance. For NUS-WIDE, the MAP value improves as the size of the similarity-transfer set increases at first and then stays stable. Therefore, DUDH requires more similarity-transfer images to preserve the semantic similarity between images in NUS-WIDE than in CIFAR-10 and SVHN. This is because the semantic similarities between images in CIFAR-10 and SVHN are much simpler than those in NUS-WIDE.

Fig. 6.

Fig. 6. Impact of the size of similarity-transfer set on MAP. The code length is 24.

4.3 Time Complexity

4.3.1 Comparison of Training Costs.

Another contribution of ours is to accelerate the hash code learning process. Therefore, we further compare the training costs of DUDH with those of other deep hashing methods. Table 5 displays the detailed training costs of the five methods on three datasets with 48 bits. Note that the training time of DIHN includes both base-model training and incremental learning. Both \( W \) and \( V \) are updated on GPU. In general, DUDH shows the best training efficiency, and ADSH is the second best, which is consistent with the computational complexity analysis in Section 3.6.

Table 5.
Methods     | CIFAR-10 (\( \theta \)/W/V/Total/MAP) | NUS-WIDE (\( \theta \)/W/V/Total/MAP) | SVHN (\( \theta \)/W/V/Total/MAP)
ADSH [18]   | 20.1 -     2.3  23.7 0.939 | 36.2 -    15.3 56.1  0.906 | 21.5 -     3.2  26.5  0.919
DIHN [44]   | 20.2 -     2.4  23.9 0.940 | 36.3 -    16.1 57.3  0.902 | 22.2 -     3.1  27.3  0.915
CSQ [52]    | -    -     -    92.2 0.947 | -    -    -    309.8 0.886 | -    -     -    515.2 0.935
JLDSH [13]  | 20.2 -     -    26.1 0.942 | 36.5 -    -    58.4  0.910 | 21.8 -     3.1  29.0  0.886
DUDH (Ours) | 15.1 0.004 0.06 15.3 0.948 | 16.1 0.07 0.4  24.1  0.906 | 15.3 0.003 0.05 15.7  0.941
  • \( W \) and \( V \) are updated on GPU.

Table 5. Comparison of Training Time (in Minutes) for Different Variables on Three Datasets with 48 Bits


For example, it takes only about 24 minutes for DUDH to achieve promising performance on NUS-WIDE. In contrast, ADSH costs more than twice as much time (56 minutes) to reach similar retrieval accuracy. The main advantage of DUDH lies in optimizing \( \theta \) and \( V \). For example, it takes only 0.4 minutes for DUDH to optimize \( V \), while ADSH needs 15.3 minutes. Besides, the training cost for \( W \) is very small, so the overhead for learning the similarity-transfer codes can almost be ignored. For DUDH, the training time for \( \theta \) is no longer related to the size of the database \( n \), which drastically reduces the training cost, especially for large datasets. In addition, the training time for \( V \) is proportional to the size of the similarity-transfer set \( t \) instead of the size of the query set \( m \), which further reduces the training cost since \( t\lt m \).

To further demonstrate the superiority of DUDH, we compare the training costs of DUDH and the three most competitive methods, i.e., JLDSH, DIHN, and ADSH. As shown in Figure 7, DUDH gains obvious advantages in all cases. The figure clearly shows that the training cost for \( W \) is negligible under all circumstances, which means the optimization cost for \( W \) can be ignored. As expected, the advantages of DUDH become more obvious as the length of the hash codes increases. In summary, the optimization cost for the similarity-transfer codes \( W \) can be ignored, while the costs for \( \theta \) and \( V \) are largely reduced.

Fig. 7.

Fig. 7. Comparison of the training time (in minutes) among asymmetric deep hashing methods on three datasets. The four sub-figures in each row show the training time on the same dataset with different code length. Best viewed in color.

4.3.2 Differences between Optimizing with GPU and CPU.

Table 6 shows the training time for \( W \) and \( V \) with GPU/CPU on three datasets with 48 bits. Note that it is time consuming to optimize \( \theta \) on CPU; therefore, we only discuss \( W \) and \( V \) here. Generally, it takes much more time for JLDSH, ADSH, DIHN, and DUDH to update \( V \) and \( W \) on CPU than on GPU, and the advantages of DUDH become much more obvious when training on CPU. For example, on CIFAR-10, ADSH costs about four times as much time (72.9 minutes) as DUDH to reach similar retrieval accuracy. On SVHN, ADSH costs about five times as much time (90.8 minutes) as DUDH to converge, and its retrieval performance is also worse than that of DUDH. An interesting observation is that training DUDH on CPU costs less time than training ADSH/DIHN on GPU on CIFAR-10 and SVHN. Besides, though it takes additional time for DUDH to optimize \( W \) on CPU, the training cost for \( W \) is still negligible, which further demonstrates the superiority of DUDH.

Table 6.
Methods     | CIFAR-10 (W/V/Total) | NUS-WIDE (W/V/Total) | SVHN (W/V/Total)
ADSH [18]   | -    52.1 72.9 | -   218.9 264.8 | -    67.8  90.8
DIHN [44]   | -    53.2 73.4 | -   220.2 266.3 | -    68.2  91.4
JLDSH [13]  | -    53.8 78.7 | -   221.4 273.3 | -    69.9  97.3
DUDH (Ours) | 0.02 3.2  19.1 | 0.4 103.9 124.6 | 0.02 3.75  19.6

Table 6. Comparison of Training Time (in Minutes) for Different Variables with CPU on Three Datasets with 48 Bits

4.3.3 Impact of the Size of Similarity-Transfer Set.

Figure 8 shows the training time of DUDH when the size of \( W \) varies. In general, with the increasing value of \( t \), the training time for optimizing \( W \), \( V \), and \( \theta \) increases accordingly, which is consistent with the computational complexity listed in Table 2. Specifically, when optimizing \( V \) with CPU, with the increasing value of \( t \), the training time for optimizing \( V \) rises notably. When optimizing \( V \) with GPU, the training cost of \( V \) is dramatically reduced, which is much less than the training cost of \( \theta \). For \( W \), its training cost can almost be ignored no matter whether it is optimized with GPU or CPU.

Fig. 8.

Fig. 8. Comparison of the training time (in minutes) for different variables when \( t \) varies on three datasets. The four sub-figures in each row show the training time on three datasets when optimizing W and V with CPU/GPU. Best viewed in color.

4.3.4 MAP Comparison When Training Time Is Similar.

We further conduct experiments that reduce the number of training or query images for the ADSH-based methods (JLDSH, ADSH, and DIHN) so that their training time is similar to that of DUDH. Table 7 shows the comparison of MAP on three datasets with 48 bits when the training time is made similar by reducing the number of training images. Specifically, we randomly select a subset of database images as the training images. After training, we adopt the CNN model to generate hash codes for the database images, which differs from the original settings. As listed in the table, the training time of the ADSH-based methods drops to a level similar to DUDH when using only a quarter of the database images. However, their performance also drops significantly, and DUDH achieves much better retrieval performance with similar training time. Table 8 shows the comparison of MAP on three datasets with 48 bits when the training time is made similar by reducing the number of query images sampled from the database. Similar to the previous results, DUDH achieves the best retrieval performance while using the least time. However, the performance gap between the ADSH-based methods and DUDH is narrower than in Table 7. The reason is that the database hash codes in the former experiment have to be generated by the CNN model, leading to larger quantization errors, whereas the database hash codes in the latter experiment are directly learned during optimization, so the quantization errors are smaller.

Table 7.
Methods     | CIFAR-10 (sr/time/MAP) | NUS-WIDE (sr/time/MAP) | SVHN (sr/time/MAP)
ADSH [18]   | 0.25 17.5 0.925 | 0.25 24.3 0.809 | 0.25 18.2 0.864
DIHN [44]   | 0.25 17.3 0.922 | 0.25 24.6 0.812 | 0.25 17.9 0.871
JLDSH [13]  | 0.25 18.5 0.935 | 0.25 25.7 0.823 | 0.25 19.7 0.832
DUDH (Ours) | 1    15.3 0.948 | 1    24.1 0.906 | 1    15.7 0.941
  • \( W \) and \( V \) are updated on GPU. sr denotes the sampling rate.

Table 7. Comparison of MAP on Three Datasets with 48 Bits When Training Time Is Similar by Reducing the Number of Training Images


Table 8.
Methods     | CIFAR-10 (m/time/MAP) | NUS-WIDE (m/time/MAP) | SVHN (m/time/MAP)
ADSH [18]   | 1500 17.6 0.930 | 1000 25.1 0.881 | 1200 16.4 0.910
DIHN [44]   | 1500 17.3 0.926 | 1000 25.6 0.878 | 1200 16.5 0.912
JLDSH [13]  | 1500 19.8 0.933 | 1000 27.1 0.889 | 1200 18.5 0.870
DUDH (Ours) | 2000 15.3 0.948 | 2000 24.1 0.906 | 2000 15.7 0.941
  • \( W \) and \( V \) are updated on GPU.

Table 8. Comparison of MAP on Three Datasets with 48 Bits When Training Time Is Similar by Reducing the Number of Query Images


4.4 Parameter Sensitivity

Figure 9 shows the performance of DUDH on three datasets with different parameters. We tune one parameter with the other two fixed. More specifically, we tune \( \lambda \) in the range of [1, 2, 3, 4, 5] by fixing \( \gamma =20 \) and \( t=100 \) on CIFAR-10 and SVHN or \( t=1,\!000 \) on NUS-WIDE. Similarly, we set \( \lambda =5 \) and \( \gamma =20 \) when tuning \( t \). For \( \gamma \), we tune it in the range of [1, 5, 10, 15, 20] by fixing \( \lambda =5 \) and \( t=100 \) (CIFAR-10/SVHN) or \( t=1,\!000 \) (NUS-WIDE). As shown in the figure, DUDH obtains stable performance as the values of \( \gamma \) and \( \lambda \) vary. For \( t \), it has almost no impact on MAP on CIFAR-10 and SVHN, while on NUS-WIDE the MAP value improves as \( t \) increases, which is consistent with the aforementioned results.

Fig. 9.

Fig. 9. MAPs versus the variations of \( \lambda \) , \( \gamma \) , and t on three datasets. The code length is 48.


5 CONCLUSIONS

In this article, we propose a novel asymmetric deep hashing method, namely DUDH, for large-scale image retrieval. Different from the tight coupling between query and database in conventional asymmetric deep hashing methods, DUDH adopts a similarity-transfer matrix to decouple the query from the database. As a result, both the quantization error and the cost of optimizing the CNN model in our DUDH are no longer related to the size of the database and can be largely reduced. Meanwhile, the training cost for optimizing similarity-transfer codes can be ignored with the proposed constant-approximation optimization solution. Extensive experiments demonstrate that DUDH can achieve state-of-the-art performance with much less training cost.


REFERENCES

  [1] Cao Yue, Long Mingsheng, Wang Jianmin, Zhu Han, and Wen Qingfu. 2016. Deep quantization network for efficient image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence. 3457–3463.
  [2] Chatfield Ken, Simonyan Karen, Vedaldi Andrea, and Zisserman Andrew. 2014. Return of the devil in the details: Delving deep into convolutional nets. In Proceedings of the British Machine Vision Conference.
  [3] Chen Yudong, Lai Zhihui, Ding Yujuan, Lin Kaiyi, and Wong Wai Keung. 2019. Deep supervised hashing with anchor graph. In Proceedings of the IEEE International Conference on Computer Vision. 9795–9803.
  [4] Chua Tat-Seng, Tang Jinhui, Hong Richang, Li Haojie, Luo Zhiping, and Zheng Yantao. 2009. NUS-WIDE: A real-world web image database from National University of Singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval.
  [5] Cui Hui, Zhu Lei, Li Jingjing, Yang Yang, and Nie Liqiang. 2019. Scalable deep hashing for large-scale social image retrieval. IEEE Transactions on Image Processing 29 (2019), 1271–1284.
  [6] Deng C., Chen Z., Liu X., Gao X., and Tao D. 2018. Triplet-based deep hashing network for cross-modal retrieval. IEEE Transactions on Image Processing 27, 8 (2018), 3893–3903.
  [7] Deng Cheng, Yang Erkun, Liu Tongliang, Li Jie, Liu Wei, and Tao Dacheng. 2019. Unsupervised semantic-preserving adversarial hashing for image search. IEEE Transactions on Image Processing 28, 8 (2019), 4032–4044.
  [8] Deng Cheng, Yang Erkun, Liu Tongliang, and Tao Dacheng. 2019. Two-stream deep hashing with class-specific centers for supervised image search. IEEE Transactions on Neural Networks and Learning Systems 31, 6 (2019), 2189–2201.
  [9] Dizaji Kamran Ghasedi, Zheng Feng, Sadoughi Najmeh, Yang Yanhua, Deng Cheng, and Huang Heng. 2018. Unsupervised deep generative adversarial hashing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3664–3673.
  [10] Do Thanh Toan, Doan Anh Dzung, and Cheung Ngai Man. 2016. Learning to hash with binary deep neural network. In Proceedings of the European Conference on Computer Vision. 219–234.
  [11] Gionis Aristides, Indyk Piotr, and Motwani Rajeev. 1999. Similarity search in high dimensions via hashing. In Proceedings of the International Conference on Very Large Data Bases. 518–529.
  [12] Gong Yunchao and Lazebnik Svetlana. 2011. Iterative quantization: A procrustean approach to learning binary codes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 817–824.
  [13] Gu Guanghua, Liu Jiangtao, Li Zhuoyi, Huo Wenhua, and Zhao Yao. 2020. Joint learning based deep supervised hashing for large-scale image retrieval. Neurocomputing 385 (2020), 348–357.
  [14] He Tao, Li Yuan-Fang, Gao Lianli, Zhang Dongxiang, and Song Jingkuan. 2019. One network for multi-domains: Domain adaptive hashing with intersectant generative adversarial networks. In Proceedings of the International Joint Conference on Artificial Intelligence. 2477–2483.
  [15] He Xiangyu, Wang Peisong, and Cheng Jian. 2019. K-nearest neighbors hashing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2839–2848.
  [16] Jiang Qing-Yuan, Cui Xue, and Li Wu-Jun. 2018. Deep discrete supervised hashing. IEEE Transactions on Image Processing 27 (2018), 5996–6009.
  [17] Jiang Qing-Yuan and Li Wu-Jun. 2017. Deep cross-modal hashing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3270–3278.
  [18] Jiang Qing-Yuan and Li Wu-Jun. 2018. Asymmetric deep supervised hashing. In Proceedings of the AAAI Conference on Artificial Intelligence. 3342–3349.
  [19] Jiang Qing-Yuan and Li Wu-Jun. 2019. Discrete latent factor model for cross-modal hashing. IEEE Transactions on Image Processing 28, 7 (2019), 3490–3501.
  [20] Jin S., Yao H., Sun X., Zhou S., Zhang L., and Hua X. 2020. Deep saliency hashing for fine-grained retrieval. IEEE Transactions on Image Processing 29 (2020), 5336–5351.
  [21] Kang Wang-Cheng, Li Wu-Jun, and Zhou Zhi-Hua. 2016. Column sampling based discrete supervised hashing. In Proceedings of the AAAI Conference on Artificial Intelligence. 1230–1236.
  [22] Krizhevsky Alex and Hinton Geoffrey. 2009. Learning Multiple Layers of Features from Tiny Images. Technical Report.
  [23] Li Chuanxiang, Yan Ting-Kun, Luo Xin, Nie Liqiang, and Xu Xin-Shun. 2019. Supervised robust discrete multimodal hashing for cross-media retrieval. IEEE Transactions on Multimedia 21 (2019), 2863–2877.
  [24] Li Qi, Sun Zhenan, He Ran, and Tan Tieniu. 2017. Deep supervised discrete hashing. In Proceedings of the Conference and Workshop on Neural Information Processing Systems. 2482–2491.
  [25] Li Shuyan, Chen Zhixiang, Lu Jiwen, Li Xiu, and Zhou Jie. 2019. Neighborhood preserving hashing for scalable video retrieval. In Proceedings of the IEEE International Conference on Computer Vision. 8212–8221.
  [26] Li Wu-Jun, Wang Sheng, and Kang Wang-Cheng. 2016. Feature learning based deep supervised hashing with pairwise labels. In Proceedings of the International Joint Conference on Artificial Intelligence. 1711–1717.
  [27] Lin Guosheng, Shen Chunhua, Shi Qinfeng, Hengel Anton van den, and Suter David. 2014. Fast supervised hashing with decision trees for high-dimensional data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1963–1970.
  [28] Lin Kevin, Lu Jiwen, Chen Chu Song, and Zhou Jie. 2016. Learning compact binary descriptors with unsupervised deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1183–1192.
  [29] Lin Tsung-Yi, Maire Michael, Belongie Serge, Hays James, Perona Pietro, Ramanan Deva, Dollár Piotr, and Zitnick C. Lawrence. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision. Springer, 740–755.
  [30] Liu Haomiao, Wang Ruiping, Shan Shiguang, and Chen Xilin. 2016. Deep supervised hashing for fast image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2064–2072.
  [31] Liu Wei, Wang Jun, Ji Rongrong, Jiang Yu-Gang, and Chang Shih-Fu. 2012. Supervised hashing with kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2074–2081.
  [32] Luo Xin, Zhang Peng Fei, Huang Zi, Nie Liqiang, and Xu Xin Shun. 2019. Discrete hashing with multiple supervision. IEEE Transactions on Image Processing 28 (2019), 2962–2975.
  [33] Netzer Yuval, Wang Tao, Coates Adam, Bissacco Alessandro, Wu Bo, and Ng Andrew Y. 2011. Reading digits in natural images with unsupervised feature learning. In Proceedings of the Conference and Workshop on Neural Information Processing Systems.
  [34] Nie Xiushan, Jing Weizhen, Cui Chaoran, Zhang Jason, Zhu Lei, and Yin Yilong. 2019. Joint multi-view hashing for large-scale near-duplicate video retrieval. IEEE Transactions on Knowledge and Data Engineering 32 (2019), 1951–1965.
  [35] Raginsky Maxim and Lazebnik Svetlana. 2009. Locality-sensitive binary codes from shift-invariant kernels. In Proceedings of the Conference and Workshop on Neural Information Processing Systems. 1509–1517.
  [36] Shen Fumin, Gao Xin, Liu Li, Yang Yang, and Shen Heng Tao. 2017. Deep asymmetric pairwise hashing. In Proceedings of the ACM International Conference on Multimedia. 1522–1530.
  [37] Shen Fumin, Mu Yadong, Yang Yang, Liu Wei, Liu Li, Song Jingkuan, and Shen Heng Tao. 2017. Classification by retrieval: Binarizing data and classifiers. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. 595–604.
  [38] Shen Fumin, Shen Chunhua, Liu Wei, and Shen Heng Tao. 2015. Supervised discrete hashing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 37–45.
  [39] Shen Fumin, Xu Yan, Liu Li, Yang Yang, Huang Zi, and Shen Heng Tao. 2018. Unsupervised deep hashing with similarity-adaptive and discrete optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (2018), 3034–3044.
  [40] Tang Jinhui, Lin Jie, Li Zechao, and Yang Jian. 2018. Discriminative deep quantization hashing for face image retrieval. IEEE Transactions on Neural Networks and Learning Systems 29 (2018), 6154–6162.
  [41] Vedaldi Andrea and Lenc Karel. 2015. MatConvNet: Convolutional neural networks for MATLAB. In Proceedings of the ACM International Conference on Multimedia. 689–692.
  [42] Wang Jingdong, Zhang Ting, Song Jingkuan, Sebe Nicu, and Shen Heng Tao. 2018. A survey on learning to hash. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 4 (2018), 769–790.
  [43] Weiss Yair, Torralba Antonio, and Fergus Rob. 2009. Spectral hashing. In Proceedings of the Conference and Workshop on Neural Information Processing Systems. 1753–1760.
  [44] Wu Dayan, Dai Qi, Liu Jing, Li Bo, and Wang Weiping. 2019. Deep incremental hashing network for efficient image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9069–9077.
  [45] Wu Lin, Wang Yang, Ge Zongyuan, Hu Qichang, and Li Xue. 2018. Structured deep hashing with convolutional neural networks for fast person re-identification. Computer Vision and Image Understanding 167 (2018), 63–73.
  [46] Xia Rongkai, Pan Yan, Lai Hanjiang, Liu Cong, and Yan Shuicheng. 2014. Supervised hashing for image retrieval via image representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence. 2156–2162.
  [47] Xie De, Deng Cheng, Li Chao, Liu Xianglong, and Tao Dacheng. 2020. Multi-task consistency-preserving adversarial hashing for cross-modal retrieval. IEEE Transactions on Image Processing 29 (2020), 3626–3637.
  [48] Yan Xinyu, Zhang Lijun, and Li Wu Jun. 2017. Semi-supervised deep hashing with a bipartite graph. In Proceedings of the International Joint Conference on Artificial Intelligence.
  [49] Yang Erkun, Deng Cheng, Liu Tongliang, Liu Wei, and Tao Dacheng. 2018. Semantic structure-based unsupervised deep hashing. In Proceedings of the International Joint Conference on Artificial Intelligence. 1064–1070.
  [50] Yang Erkun, Deng Cheng, Liu Wei, Liu Xianglong, Tao Dacheng, and Gao Xinbo. 2017. Pairwise relationship guided deep hashing for cross-modal retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31.
  51. [51] Yang Erkun, Liu Tongliang, Deng Cheng, Liu Wei, and Tao Dacheng. 2019. DistillHash: Unsupervised deep hashing by distilling data pairs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 29462955.Google ScholarGoogle ScholarCross RefCross Ref
  52. [52] Yuan Li, Wang Tao, Zhang Xiaopeng, Tay Francis E. H., Jie Zequn, Liu Wei, and Feng Jiashi. 2020. Central similarity quantization for efficient image and video retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 30833092.Google ScholarGoogle ScholarCross RefCross Ref
  53. [53] Zhang Haofeng, Liu Li, Long Yang, and Shao Ling. 2018. Unsupervised deep hashing with pseudo labels for scalable image retrieval. IEEE Transactions on Image Processing 27 (2018), 1626–1638.Google ScholarGoogle Scholar
  54. [54] Zhang Jian and Peng Yuxin. 2019. Multi-pathway generative adversarial hashing for unsupervised cross-modal retrieval. IEEE Transactions on Multimedia 22 (2019), 174–187.Google ScholarGoogle Scholar
  55. [55] Zhang Jian and Peng Yuxin. 2019. SSDH: Semi-supervised deep hashing for large scale image retrieval. IEEE Transactions on Circuits and Systems for Video Technology 29 (2019), 212225.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. [56] Zhang Peichao, Zhang Wei, Li Wu-Jun, and Guo Minyi. 2014. Supervised hashing with latent factor models. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. 173182.Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. [57] Zhang Ruimao, Lin Liang, Zhang Rui, Zuo Wangmeng, and Zhang Lei. 2015. Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE Transactions on Image Processing 24 (2015), 47664779.Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. [58] Zhao Fang, Huang Yongzhen, Wang Liang, and Tan Tieniu. 2015. Deep semantic ranking based hashing for multi-label image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 15561564.Google ScholarGoogle Scholar
  59. [59] Zhou Xiang, Shen Fumin, Liu Li, Liu Wei, Nie Liqiang, Yang Yang, and Shen Heng Tao. 2020. Graph convolutional network hashing. IEEE Transactions on Cybernetics 50 (2020), 14601472.Google ScholarGoogle ScholarCross RefCross Ref
  60. [60] Zhu Han, Long Mingsheng, Wang Jianmin, and Cao Yue. 2016. Deep hashing network for efficient similarity retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence. 24152421.Google ScholarGoogle ScholarCross RefCross Ref
  61. [61] Zhuang Bohan, Lin Guosheng, Shen Chunhua, and Reid Ian. 2016. Fast training of triplet-based deep binary embedding networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 59555964.Google ScholarGoogle ScholarCross RefCross Ref


    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1 (January 2023), 505 pages.
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3572858
      • Editor: Abdulmotaleb El Saddik


      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 5 January 2023
      • Online AM: 12 March 2022
      • Accepted: 2 March 2022
      • Revised: 22 January 2022
      • Received: 15 August 2021


      Qualifiers

      • research-article
      • Refereed