Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method

Neural ranking models (NRMs) and dense retrieval (DR) models have given rise to substantial improvements in overall retrieval performance. In addition to their effectiveness, and motivated by the proven lack of robustness of deep learning-based approaches in other areas, there is growing interest in the robustness of deep learning-based approaches to the core retrieval problem. Adversarial attack methods that have so far been developed mainly focus on attacking NRMs, with very little attention being paid to the robustness of DR models. In this paper, we introduce the adversarial retrieval attack (AREA) task. The AREA task is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model in response to a query. We consider the decision-based black-box adversarial setting, which is realistic in real-world search engines. To address the AREA task, we first employ existing adversarial attack methods designed for NRMs. We find that the promising results previously reported on attacking NRMs do not generalize to DR models: these methods underperform a simple term spamming method. We attribute the observed lack of generalizability to the interaction-focused architecture of NRMs, which emphasizes fine-grained relevance matching. DR models follow a different representation-focused architecture that prioritizes coarse-grained representations. We propose to formalize attacks on DR models as a contrastive learning problem in a multi-view representation space. The core idea is to encourage consistency between each view representation of the target document and its corresponding viewer via view-wise supervision signals. Experimental results demonstrate that the proposed method can significantly outperform existing attack strategies in misleading the DR model with small, indiscernible text perturbations.


INTRODUCTION
Information retrieval (IR) systems typically employ a multi-stage search pipeline, comprising first-stage retrieval and a re-ranking stage [13]. The first-stage retrieval returns an initial set of candidate documents from a large repository, and the re-ranking stage reranks those candidates. Dense retrieval (DR) models [13, 59] and neural ranking models (NRMs) [7, 53] offer substantial performance improvements in the retrieval and re-ranking stages, respectively.
By modifying normal examples with malicious human-imperceptible perturbations, deep learning-based models can be deceived into providing attacker-desired inaccurate predictions [45]. DR models and NRMs are prone to inherit the adversarial vulnerability of general neural networks, emphasizing the need for reliable and robust neural IR systems. Exploring potential adversarial attacks against neural models in IR is an important step towards this goal: such explorations help identify vulnerabilities, serve as a surrogate to evaluate robustness before real-world deployment, and, consequently, aid in devising appropriate countermeasures.
To date, much attention has been devoted to the design of adversarial attacks against NRMs [27, 29, 51]. Given a neural ranking model, the attack aims to promote a low-ranked target document to higher positions via human-imperceptible perturbations. In contrast, little effort has been devoted to investigating how adversarial attacks affect DR models. We believe it is important to address this knowledge gap. First, like NRMs, DR models are increasingly vital in practical IR systems; adversarial attacks can expose their weaknesses and provide insights for developing more robust search engines. Second, within a multi-stage search pipeline, if black-hat search engine optimization practitioners [15] cannot ensure that a target document passes the first-stage retrieval, they have no chance of promoting it in the final ranked list.
Adversarial attacks against DR models. We are the first to develop adversarial attacks against DR models. The first research question is: What is the goal of attacking DR models? Based on adversarial attacks against NRMs and inspired by properties of first-stage retrieval, we propose to define an attack task, the adversarial retrieval attack (AREA) against DR models. As shown in Figure 1, given a DR model, the AREA task is to retrieve a target document outside the initial set of k candidate documents for a given query, by perturbing the document content in a semantic-preserving way. We focus on a practical and challenging decision-based black-box setting, akin to the adversarial attacks against NRMs [27, 29, 51], where the adversary can only query the target DR model without direct access to model information. For consistency with the multi-stage search pipeline in practical IR systems, we simulate a black-box "retrieval and re-ranking" pipeline, wherein the target DR model initially narrows down the candidate set to k documents, followed by an NRM determining the ordering of the final top-k documents. In this way, we query the pipeline and assess the final decision to perform attacks in a black-box manner.
Using NRM attack methods against DR models. To address the AREA task, a second research question arises: Do existing attack methods against NRMs perform as well against DR models as against NRMs? Our results show that these methods lag behind a simple term spamming attack that typically involves query keyword stuffing [15]. Deep neural networks with interaction-focused architectures are usually employed for NRMs, while less complicated models with representation-focused architectures are adopted for DR models [9, 14, 55]. Specifically, when attacking NRMs, the perturbation update relies on modeling fine-grained interactions between attacked documents and queries. In contrast, DR models depend on coarse-grained text representations for effective search in the representation space. This distinction renders the existing attacks against NRMs unsuitable for deceiving DR models.
Attack models tailored for DR models. The analysis we have just summarized leads to our third research question: Can we design an effective adversarial attack method tailored for DR models? As DR conducts retrieval purely in the representation space [13, 59], we introduce a multi-view contrastive learning-based adversarial retrieval attack (MCARA) to generate adversarial examples. Our key idea is to enhance the consistency of semantic representations between the target document and the k retrieved documents in the initial set using view-wise supervision. Specifically, after training a surrogate model to demystify the target DR model, we first obtain different viewers that represent the documents in the initial set via a clustering technique. We produce multi-view representations for the target document through these viewers. Then, a view-wise contrastive loss is applied to draw each view representation of the target document closer to its corresponding viewer in the semantic space, while distancing it from nearest-neighbor documents outside the initial set. In this way, the attacker captures informative and discriminative semantic signals via view-level contrastive supervision. Finally, following [51], we use prior-guided gradients of the view-wise contrastive loss to identify the important words in a document, and adopt projected gradient descent [34] to generate gradient-based adversarial perturbations.
Experiments on two web search benchmark datasets show that MCARA effectively promotes target documents into the candidate set with a high attack success rate and low time cost. According to both automatic and human evaluations, MCARA retains the semantics and fluency of target documents. Moreover, the adversarial examples produced by MCARA can deceive the NRM to some extent.

RELATED WORK
Dense retrieval. Dense retrieval [59] conducts first-stage retrieval [13] in the embedding space and has demonstrated several advantages over sparse retrieval [26]. It typically employs a dual-encoder architecture to embed queries and documents into low-dimensional embeddings [47], using the similarities between these embeddings as estimated relevance scores [13]. By fine-tuning BERT with in-batch negatives [19], DR models have been shown to outperform BM25 [41]. Subsequently, research has explored various pre-training [10, 33] and fine-tuning techniques [21, 39, 57] to enhance DR models, achieving state-of-the-art performance on IR tasks. Besides their high effectiveness, the robustness of DR models, e.g., to out-of-distribution data [28, 46, 56] and query variations [4, 38], has also received attention. Unlike the work listed above, we focus on the adversarial robustness of DR models.
Adversarial attacks in IR. In IR, black-hat search engine optimization (SEO) has been a threat to search systems since the dawn of the world wide web [15]. Black-hat SEO usually aims to increase the exposure of a document owner's pages by maliciously manipulating documents, resulting in a decline in the quality of search results and an inundation of irrelevant pages [1]. Research has shown that neural ranking models (NRMs) inherit the adversarial vulnerabilities of deep neural networks, making them susceptible to small perturbations added to documents [52]. Research into adversarial attacks against NRMs has been growing, with the goal of promoting a target document in the rankings w.r.t. a query via imperceptible perturbations. Prior work investigates the vulnerability of NRMs in white-box [44, 50] or black-box [29, 51] scenarios, using word substitution [51] or trigger injection [27] as document perturbations. Like NRMs, DR models are also likely to inherit the adversarial vulnerabilities of deep neural networks, yet the adversarial vulnerability of DR models remains under-explored.
Multi-view document representations. A single representation vector may not be able to properly model the fine-grained semantics of a document [59]. To tackle this issue, previous work has proposed approaches that explore multiple representations to enhance the semantic interaction in DR. Poly-Encoder [18] learns multiple representations to model the semantics of a text from multiple views. Zhang et al. [58] introduce multiple viewers to produce multi-view representations of documents and enforce them to align with different queries. In this work, we generate multi-view representations of a target document through viewers.
Contrastive learning. Contrastive learning [22] is a branch of self-supervised representation learning that has been widely applied in computer vision [16, 17], natural language processing [11, 43], and social network analysis [24, 25, 48, 49]. The key idea is to contrast pairs of semantically similar and dissimilar data, encouraging the representations of similar pairs to be close and those of dissimilar pairs to be further apart. In the context of dense retrieval, some work has adopted contrastive learning to guide models towards learning more distinguishable representations of documents [31, 54]. Unlike existing work, we aim to obtain an effective attack signal by pulling each view representation of the target document towards its corresponding viewer, while pushing it away from the representations of counter-viewers.

PROBLEM STATEMENT
Given a query q, the aim of first-stage retrieval is to recall a subset of potentially relevant documents from a large corpus C = {d_1, d_2, ..., d_N} with a total of N documents. In general, a first-stage retrieval model produces a relevance score s(q, d) of the query q for each document d in C, and then recalls a set of candidates D by selecting the top-k documents with the highest predicted scores. Here, k denotes the number of candidates in D, which is usually significantly smaller than the corpus size N. For example, if the retrieval model outputs the initial set D = {d_1, d_2, ..., d_k}, then d_k possesses the lowest relevance score within D.
Objective of the adversary. The adversarial retrieval attack (AREA) task is to fool a DR model into retrieving a target document from outside the initial set of k candidates in response to a query, so that it appears among the k initial candidates, by finding an optimized and imperceptible perturbation p. Formally, given a query q and a target document d outside the initial set, the goal is to construct a valid adversarial example d_adv = d ⊕ p that is ranked at or above the k-th position. Specifically, p is crafted to conform to the following requirement: Recall(q, d ⊕ p) ≤ k′ for some k′ ≤ k, where Recall(q, d ⊕ p) denotes the ranking position at which the adversarial example is recalled; a smaller value of Recall denotes a higher ranking. In this case, the rank position of the original d is larger than k, and the size of p is limited by a maximum perturbation upper bound. Ideally, the perturbation p should preserve the semantics of document d and be imperceptible to human judges yet misleading to DR models.
In this work, we use the number of word substitutions and the similarity of the substituted words as restrictions.
Decision-based black-box attacks. Since most real-world search engines are black boxes, we focus on the decision-based black-box attack setting for the AREA task, where the model parameters are inaccessible to the adversary. To align with the multi-stage pipelines of practical IR systems, we simulate a retrieval-and-ranking pipeline by placing a representative NRM after the target DR model, and treat the combination as the black box. We train a surrogate model [37] to imitate the target DR model by querying the pipeline for the final ranking.

OUR METHOD
We first analyze the differences between attacking NRMs and DR models, and then introduce our attack method for the AREA task.

Representation and interaction behavior
To address the AREA task, it is natural to consider existing attack methods designed for NRMs. However, as our experimental results in Section 6.1 show, these methods do not achieve against DR models the promising performance they achieve against NRMs. Below, we investigate the potential reasons from several perspectives.
Different model architectures in DR models and NRMs. During first-stage retrieval, the aim is to discriminate a small set of candidate documents from (potentially) millions of documents in a coarse-grained way [13]. To this end, DR models with their representation-focused architectures (i.e., dual-encoders) are extensively adopted to evaluate relevance based on high-level representations of each input text and to ensure efficiency [59]. In contrast, the re-ranking stage conducts fine-grained relevance matching between a query and a small set of candidate documents [14]. To this end, NRMs with their interaction-focused architectures (i.e., cross-encoders) are widely used to learn directly from interactions rather than from individual representations and to maintain good system performance [55].
Different guidelines when attacking DR models and NRMs.
To promote a target document in the rankings, attacks on NRMs leverage the interaction signals provided by attention across the query and the target document tokens. The adversary captures the signal of inner-document representativeness [32, 39], which guides the computation of the update direction for the adversarial perturbation. In contrast, when attacking DR models, it is important to consider inter-document representativeness [30, 31], since the dual-encoder architecture encodes queries and documents independently. To include the target document in the initial candidate set, the adversary aims to find a minimal perturbation that maximizes the probability that the DR model distinguishes the target document from millions of documents in the embedding space.
In summary, the variation in model architectures and attack supervision signals poses considerable challenges when attempting to deceive DR models using attacks intended for NRMs. Consequently, it is important to develop attack techniques tailored to DR models.

Overview of MCARA
High-quality text representation is the foundation of DR [31]. We propose to formalize the AREA task as a contrastive learning problem [22] in the representation space: (i) push the target document d away from other documents outside the initial set; and (ii) pull d closer to the candidates inside the initial set. However, contrasting against all documents inside and outside the initial set incurs computational overhead and lacks directional control. In this paper, we introduce representative viewers for the k candidates in the initial set, and use the nearest neighbors of the target document in the semantic space as counter-viewers to serve as counterexamples.
Considering the viewers, a simple way to conduct contrastive learning is to directly encourage the representation of the target document and that of each viewer to be closer in the semantic space, while keeping the counter-viewers away. Nevertheless, such simultaneous attraction of a single document representation in multiple directions could lead to information loss. Here, we introduce a novel multi-view contrastive adversarial retrieval attack (MCARA). The key idea is to disentangle the target document embedding into multi-view representations through viewers, and then enhance the consistency between each view representation and the representation of its corresponding viewer. MCARA can be decomposed into three dependent components: (i) surrogate model imitation trains a surrogate retrieval model to prepare for a gradient attack; (ii) multi-view representation learning finds the viewers and counter-viewers, and generates multi-view representations for the target document; and (iii) attack via view-wise contrastive loss generates the embedding-space perturbations by calculating the gradients of the surrogate model via contrast. The overall architecture of MCARA is shown in Figure 2.

Surrogate model imitation
To simulate a realistic scenario, we regard the "retrieval and re-ranking" pipeline as a unified black-box model, where the retrieval model serves as the target DR model for our attack. For each dataset used in this study, we first train the target DR model and then train the NRM based on the retrieved candidates given by the DR model. We use state-of-the-art models [10, 32] as the backbones of the DR model and the NRM in the pipeline, respectively. By sending queries to the black-box pipeline and obtaining the ranked list (given by the NRM), following [3, 51], we leverage the relative relevance information in the ranked list [8] to construct a synthetic dataset for training a surrogate retrieval model.
Given a query q_i from a pre-collected query collection Q that is used to access the black-box pipeline, we obtain the ranking result L_i of n documents returned by the pipeline. We generate pseudo-labels as ground truth by treating the first ℓ ranked documents L_i[:ℓ] as relevant documents D_i^+. Generally, training a well-performing DR model requires combining random negative sampling and hard negative sampling [54]. Therefore, we treat the remaining documents L_i[ℓ+1:n] as hard negative examples D_i^-, and the ranked documents of queries other than q_i are regarded as random negative examples. We initialize the surrogate retrieval model f using vanilla BERT; the relevance score calculated by the surrogate retrieval model is f(·, ·). We train f by optimizing a pairwise loss function, the negative log-likelihood of the relevant documents:

L_sur = - Σ_{L_i ∈ R} log [ exp(f(q_i, d^+)) / ( exp(f(q_i, d^+)) + Σ_{d^- ∈ D_i^- ∪ L_i′} exp(f(q_i, d^-)) ) ],

where d^+ ∈ D_i^+, R denotes the set of ranking results for the query collection, and L_i′ denotes the ranking results of the other queries.
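To make the training signal concrete, the following is a minimal sketch of this pairwise negative log-likelihood objective in plain Python; the function name and toy scores are ours, and a real implementation would operate on batched scores from the BERT dual-encoder.

```python
import math

def surrogate_nll_loss(pos_score, neg_scores):
    """Negative log-likelihood of the pseudo-relevant document.

    pos_score:  f(q, d+) for a top-ranked (pseudo-relevant) document.
    neg_scores: f(q, d-) for hard negatives (ranks l+1..n) and random
                negatives drawn from other queries' ranked lists.
    """
    denom = math.exp(pos_score) + sum(math.exp(s) for s in neg_scores)
    return -math.log(math.exp(pos_score) / denom)

# A surrogate that already scores the pseudo-relevant document far above
# the negatives incurs a smaller loss than an undecided one.
confident = surrogate_nll_loss(8.0, [1.0, 0.5, -2.0])
uncertain = surrogate_nll_loss(1.0, [1.0, 0.5, -2.0])
```

Minimizing this loss drives the surrogate to reproduce the relative orderings observed from the black-box pipeline.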

Multi-view representation learning
Based on the surrogate model f, we first learn multiple viewers from the representations of the k documents within the initial set returned for a query q, and then generate multi-view representations of the target document d through the learned viewers. In addition to the viewers, we derive a set of counter-viewers from the representations of documents outside the initial set to prepare for attacks.
Deriving multiple viewers from the initial set of k candidates.
The key idea is to find several indicative viewers that represent the documents within the initial set and provide guidance for the attack process. Here, a viewer is defined as a cluster of documents sharing the same topic; we leave other ways of finding viewers to future work. Given a query q, we first obtain the initial set D of k candidates from the simulated pipeline. Then, we use the document embedding generated by f as the representation of each document in D. We apply clustering to the representations of the k candidates to obtain V clusters, where V ≪ k, and use the representation of each centroid as a topical viewer.
Specifically, given the k documents in the initial set D, we use the K-Means clustering algorithm [42] to find a set of V viewers: Z = KMeans(E(f, D), V). Here, E(f, D) are the embeddings of all k documents in D with respect to the surrogate model f. In this way, we obtain the representations of the V viewers, denoted as Z = {z_1, z_2, ..., z_V}.
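The clustering step can be sketched with a minimal Lloyd's K-Means over toy document embeddings; we use a deterministic farthest-point initialisation for reproducibility (the helper names are ours, and a practical implementation would use an off-the-shelf library such as scikit-learn).

```python
def dist2(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(embeddings, n_clusters, iters=20):
    """Minimal Lloyd's K-Means over document embeddings (lists of floats).
    The returned centroids play the role of the topical viewers."""
    # Deterministic farthest-point initialisation.
    centroids = [list(embeddings[0])]
    while len(centroids) < n_clusters:
        far = max(embeddings, key=lambda e: min(dist2(e, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        # Assign each embedding to its nearest centroid.
        clusters = [[] for _ in range(n_clusters)]
        for e in embeddings:
            j = min(range(n_clusters), key=lambda c: dist2(e, centroids[c]))
            clusters[j].append(e)
        # Recompute each centroid as the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centroids[c] = [sum(m[i] for m in members) / len(members)
                                for i in range(dim)]
    return centroids

# Two topically separated groups of document embeddings yield two viewers.
docs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
viewers = kmeans(docs, n_clusters=2)
```

Each centroid summarizes one topical cluster of the k candidates, which is exactly the role the viewers play in the attack.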
Generating multi-view representations of the target document through viewers. The key idea is to disentangle the view information of the target document according to the given viewers, enabling us to effectively extract the specific relevance signal within the candidate set. We use a fully-connected layer with a ReLU activation function as a multi-view representation generator. We feed the target document embedding e_d obtained from f and the viewer representations Z into the generator to obtain V multi-view representations M = {m_1, m_2, ..., m_V} aligned with the viewers. Following [5, 6], we encourage each m_j and its corresponding viewer z_j to be similar, while retaining the original information, by minimizing the square loss L_align = Σ_j ‖m_j − z_j‖², where z_j refers to the j-th viewer representation and m_j denotes the j-th disentangled view representation of the target document. We maintain the distinction between the multi-view representations by minimizing the cosine similarity between different views, L_div = Σ_{i≠j} cos(m_i, m_j). Combining the two optimization objectives, the multi-view representations M = {m_1, m_2, ..., m_V} of d are obtained by minimizing L_gen = L_align + λ L_div, where λ is a trade-off parameter.
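One plausible reading of the generator's combined objective, sketched in plain Python (function names are ours): the square-loss term aligns each view with its viewer, while a cosine penalty between different views keeps the disentangled representations from collapsing onto each other.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cos_sim(a, b):
    """Cosine similarity of two nonzero vectors."""
    return dot(a, b) / (dot(a, a) ** 0.5 * dot(b, b) ** 0.5)

def generator_objective(views, viewer_reps, lam):
    """Square-loss alignment pulls each view m_j toward its viewer z_j;
    the diversity term penalises cosine similarity between different
    views so they stay distinct."""
    align = sum(sum((m - z) ** 2 for m, z in zip(m_j, z_j))
                for m_j, z_j in zip(views, viewer_reps))
    diversity = sum(cos_sim(views[i], views[j])
                    for i in range(len(views))
                    for j in range(len(views)) if i != j)
    return align + lam * diversity

viewer_reps = [[1.0, 0.0], [0.0, 1.0]]
# Views matching their viewers (and orthogonal to each other) score lowest.
aligned = generator_objective([[1.0, 0.0], [0.0, 1.0]], viewer_reps, lam=1.0)
collapsed = generator_objective([[0.0, 1.0], [0.0, 1.0]], viewer_reps, lam=1.0)
```

In training, this objective shapes the generator's output; here it is evaluated directly on toy vectors to show which configurations it prefers.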
Obtaining multiple counter-viewers from dynamic surrounding documents. To enable a contrastive learning-based attack, we also find counter-viewers among the documents outside the initial set D, so as to push the target document away from its original position in the representation space. To this end, we use the dynamic surrounding documents of the target document as counter-viewers for contrast. During the attack process, a dynamic surrounding document d_j is a document among the top-V nearest neighbors of the current perturbed document in the semantic space of the surrogate model f. We collect the embedding of each dynamic surrounding document: u_j = f(d_j) for d_j ∈ N(d), where N(·) is a function returning the top-V documents closest to the target document d in the corpus C under the semantic space of f, and f(d_j) is the embedding of d_j. Finally, we obtain the representations of the V counter-viewers, denoted as U = {u_1, u_2, ..., u_V}.
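The nearest-neighbor selection can be sketched as follows (names ours; in practice the neighbor search runs over an approximate nearest-neighbor index rather than an explicit sort of the whole corpus):

```python
def dist2(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def counter_viewer_reps(perturbed_emb, corpus_embs, outside_ids, n):
    """Return the embeddings of the n documents outside the initial
    candidate set that lie closest to the current perturbed document."""
    ranked = sorted(outside_ids,
                    key=lambda i: dist2(perturbed_emb, corpus_embs[i]))
    return [corpus_embs[i] for i in ranked[:n]]

# Toy corpus: documents 0 and 2 surround the target; 1 and 3 are far away.
corpus = {0: [0.1, 0.0], 1: [5.0, 5.0], 2: [0.2, 0.0], 3: [9.0, 9.0]}
nearest = counter_viewer_reps([0.0, 0.0], corpus, outside_ids=[0, 1, 2, 3], n=2)
```

Because the neighbors are recomputed from the *current* perturbed embedding at each attack step, the counter-viewers are dynamic: they track the document as it moves through the semantic space.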

Attack via view-wise contrastive loss
Based on the multi-view representations of the target document, the viewer representations, and the counter-viewer representations, we describe how to carry out the attack using a view-wise contrastive loss.
View-wise contrastive loss. The view-wise contrastive loss aims to pull each view representation of the target document close to its corresponding viewer, and push it away from the representations of all counter-viewers. Given a query, we aim to find the optimal attack direction for the target document under the semantic space of the surrogate model f with a contrastive loss L_con:

L_con = - Σ_{j=1}^{V} log [ exp(sim(m_j, z_j)/τ) / ( exp(sim(m_j, z_j)/τ) + Σ_{u ∈ U} exp(sim(m_j, u)/τ) ) ],

where z_j is a viewer representation in Z, m_j is a view representation in M, u is a counter-viewer representation in U, V is the number of viewers, sim(·, ·) is the dot product, and τ is the temperature hyperparameter.
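This InfoNCE-style objective can be sketched directly (function names ours): each view is attracted to its own viewer as the positive, with every counter-viewer acting as a negative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def view_wise_contrastive_loss(views, viewer_reps, counter_reps, tau=0.1):
    """InfoNCE-style loss: each view m_j is attracted to its own viewer
    z_j (positive) and repelled from every counter-viewer u (negatives),
    using dot-product similarity scaled by the temperature tau."""
    loss = 0.0
    for m, z in zip(views, viewer_reps):
        pos = math.exp(dot(m, z) / tau)
        neg = sum(math.exp(dot(m, u) / tau) for u in counter_reps)
        loss -= math.log(pos / (pos + neg))
    return loss / len(views)

viewer_reps = [[1.0, 0.0]]
counter_reps = [[-1.0, 0.0]]
# A view aligned with its viewer incurs a much smaller loss than one
# that has drifted towards the counter-viewer.
good = view_wise_contrastive_loss([[1.0, 0.0]], viewer_reps, counter_reps)
bad = view_wise_contrastive_loss([[-1.0, 0.0]], viewer_reps, counter_reps)
```

During the attack, it is the gradient of this loss, not its value, that matters: it points the perturbation towards the viewers and away from the counter-viewers.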
Perturbation word selection. As demonstrated in [23, 51], only some important words in the target document act as influential signals for the final attack performance. Therefore, for each token h_i in the target document, we calculate the gradient g_{h_i} of L_con with respect to the embedding vector e_{h_i} of that token in f, where L_con is the view-wise contrastive loss defined above. The word importance I_{h_i} of each token h_i is then calculated as the magnitude of this gradient, I_{h_i} = ‖g_{h_i}‖. We only attack the top-T words with the highest importance in each target document d, i.e., W = {w_1, w_2, ..., w_T}.
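A small sketch of gradient-magnitude word ranking; since a plain-Python example has no autograd, we estimate the per-token gradient with central finite differences as a stand-in for backpropagation through the surrogate model (all names ours).

```python
def token_importance(token_embs, loss_fn, eps=1e-4):
    """Importance of each token as the L2 norm of the loss gradient with
    respect to its embedding, estimated by central finite differences."""
    scores = []
    for t in range(len(token_embs)):
        sq = 0.0
        for i in range(len(token_embs[t])):
            up = [list(e) for e in token_embs]
            dn = [list(e) for e in token_embs]
            up[t][i] += eps
            dn[t][i] -= eps
            g = (loss_fn(up) - loss_fn(dn)) / (2 * eps)  # dLoss/d(emb dim i)
            sq += g * g
        scores.append(sq ** 0.5)
    return scores

# Toy loss that depends only on token 1: token 1 is the one worth attacking.
toy_loss = lambda embs: sum(x * x for x in embs[1])
importance = token_importance([[0.5, 0.5], [2.0, 1.0], [0.3, 0.3]], toy_loss)
```

Sorting tokens by this score and keeping the top-T reproduces the selection step; tokens the loss ignores receive zero importance.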
Embedding perturbation and synonym substitution. We adopt projected gradient descent [34] to generate gradient-based adversarial perturbations in the embedding space. Specifically, at each step t of the attack iterations, we calculate the gradient g_t of L_con with respect to the embeddings of the target document d and update the embeddings of the important words accordingly. After the final iteration, we obtain the perturbed embeddings E_W of all the important words W in d. Then, we substitute the important words with synonyms from a candidate set S. Following [51], we utilize the embedding similarity of counter-fitted word embeddings [35] to determine synonyms and employ the same greedy word replacement strategy, computed between the perturbed important word embeddings E_W and the synonym embeddings. Unlike existing work [29, 51], we select words from the documents in the initial candidate set as the pool of potential synonyms S. To further enforce semantic and fluency constraints on the perturbed sentence, we use a language model perplexity [40] threshold on the sentence containing the replacement word to refine the selection of the synonym set.
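A single projected-gradient-descent step can be sketched as follows (parameter names ours): descend the loss gradient, then project the accumulated perturbation back into an L2 ball around the original embedding so the update stays bounded.

```python
def pgd_step(word_emb, grad, alpha, orig_emb, epsilon):
    """One PGD step on a word embedding: move against the loss gradient,
    then project the perturbation back into an L2 ball of radius epsilon
    around the original embedding."""
    moved = [e - alpha * g for e, g in zip(word_emb, grad)]
    delta = [m - o for m, o in zip(moved, orig_emb)]
    norm = sum(d * d for d in delta) ** 0.5
    if norm > epsilon:
        # Rescale the perturbation onto the epsilon-ball.
        delta = [d * epsilon / norm for d in delta]
    return [o + d for o, d in zip(orig_emb, delta)]

# A large step is clipped back onto the epsilon-ball around the original.
perturbed = pgd_step([0.0, 0.0], [1.0, 0.0],
                     alpha=10.0, orig_emb=[0.0, 0.0], epsilon=1.0)
```

Iterating this step for the configured number of attack iterations yields the perturbed embeddings E_W that drive the synonym substitution.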

EXPERIMENTAL SETTINGS
In this section, we introduce our experimental settings.

Datasets
Benchmark datasets. We conduct experiments on two standard dense retrieval benchmark datasets: the MS MARCO Document Ranking dataset [36] (MS-MARCO Document), a large-scale benchmark for web document retrieval with about 3.21 million documents, and the MS MARCO Passage Ranking dataset [36] (MS-MARCO Passage), another large-scale benchmark for web passage retrieval with about 8.84 million passages. The documents relevant to user queries were obtained using Bing, thereby simulating real-world web search scenarios.
Target queries and documents. Following [3, 51], for each dataset we randomly sample 500 Dev queries as target queries for evaluation. We adopt three types of target documents from outside the initial candidate set, exhibiting different levels of attack difficulty, i.e., Easy, Middle, and Hard. These documents are sampled from the retrieval results of the target DR model. For each target query, we select a total of 30 target documents. Beyond the above three separate sets of target documents, for each query we also randomly sample 10 documents from the original pool of 30 target documents; these documents exhibit a diverse range of attack difficulties and form a Mixture level.

Models
Baselines. We compare our method with several representative attack methods: (i) Term spamming (TS) [15] randomly selects a starting position in the target document and replaces the subsequent words with terms randomly sampled from the target query. (ii) TF-IDF simply replaces the important words in the target document, i.e., those with the highest TF-IDF scores based on the target query, with their synonyms. (iii) PRADA [51] is a decision-based black-box ranking attack method against NRMs via word substitution; we use the pairwise hinge loss between the target document and the documents from the initial candidate set of the DR model to guide the attack. (iv) PAT [27] is an anchor-based ranking attack method against NRMs via trigger generation; we use the pairwise loss between the target document and the anchor (top-1 document) of the DR model to guide the attack.
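The TS baseline is simple enough to sketch directly (names ours): pick a random start position and overwrite a run of tokens with query terms.

```python
import random

def term_spam(doc_tokens, query_tokens, n_sub, seed=0):
    """Term-spamming (TS) baseline: pick a random start position and
    overwrite the next n_sub tokens with terms sampled from the query."""
    rng = random.Random(seed)
    out = list(doc_tokens)
    start = rng.randrange(max(1, len(out) - n_sub))
    for i in range(start, min(start + n_sub, len(out))):
        out[i] = rng.choice(query_tokens)
    return out

doc = [f"w{i}" for i in range(20)]
spammed = term_spam(doc, ["jaguar", "speed"], n_sub=5)
```

The injected run of query terms is exactly what makes TS effective against DR models yet easy for term-spamicity detectors to flag.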
Model variants. We implement two variants of MCARA: (i) MCARA_single removes the multi-view representation learning and directly uses the single document embedding obtained by the surrogate model to contrast with the different viewers. (ii) MCARA_ind contrasts each view representation of the target document with its corresponding viewer independently and then calculates the gradients to find the important words accordingly; in this way, we obtain the intersection of the important words found by the independent gradient perturbations.

Implementation details
For MS-MARCO Document and MS-MARCO Passage, the size k of the initial candidate set is 100 and 1000 [30, 31, 33], respectively. To obtain the target documents, for each sampled query in MS-MARCO Document, the Easy level comprises 10 documents evenly sampled from ranks [101, 200], the Middle level comprises 10 documents evenly sampled from ranks [201, 1000], and the Hard level consists of 10 documents randomly selected from those ranked outside the top 1000. For MS-MARCO Passage, the Easy, Middle, and Hard documents are similarly sampled from the ranking ranges [1000, 2000] and [2000, 10000], and from outside the top 10000, respectively.
For the black-box "retrieval and re-ranking" pipeline, we choose a representative DR model, coCondenser [10], as the retriever and also as our target DR model. Following Zhan et al. [57], we fine-tune the pre-trained coCondenser using a two-stage hard negative sampling strategy on the corresponding dataset. We choose a representative NRM, PROP [32], as the re-ranker and fine-tune the pre-trained PROP using the relevance labels and the retrieval results given by coCondenser. Finally, we use the fine-tuned PROP to re-rank the initial candidate set retrieved by the fine-tuned coCondenser and obtain the final ranked list for guiding the learning of the surrogate model. For surrogate model imitation, we choose vanilla BERT [20] as the backbone of the surrogate DR model with a dual-encoder architecture. For each dataset, we use the Eval queries as the pre-collected query collection Q. We set ℓ to 1, in line with the average number of relevant documents per query.
For multi-view representation learning, the number of viewers and counter-viewers V is set to 5 for MS-MARCO Document and 10 for MS-MARCO Passage. The trade-off parameter λ is 10. We train the multi-view representation generator on our target query-document pairs for 1 epoch with a learning rate of 1e-6. For the attack via the view-wise contrastive loss, we set the temperature hyperparameter τ to 0.1 and the total number of attack iterations to 3. The perplexity threshold is set to 50 to filter out synonyms that are not fluent in the original text. Following [51], the number of substituted words T in MCARA is set to 50 for MS-MARCO Document and 20 for MS-MARCO Passage. For a fair comparison, we maintain the same number of substitutions in all baselines, and the trigger length of PAT is set to 10 for MS-MARCO Document and 5 for MS-MARCO Passage.

Evaluation metrics
Attack performance. We consider two automatic metrics: (i) Success recall rate (SRR@k′, %) evaluates the percentage of after-attack documents d_adv retrieved into the candidate set with k′ ≤ k documents; note that evaluation with k′ < k is stricter than with k′ = k. (ii) Normalized ranking shift rate (NRS@k, %) evaluates the relative ranking improvement of after-attack documents that are successfully recalled into the initial set of k candidates, i.e., NRS@k = (Π_d − Π_{d_adv}) / Π_d × 100%, where Π_d and Π_{d_adv} are the rankings of d and d_adv, respectively, produced by the target DR model. If d_adv is not successfully recalled into the initial set of k candidates, its NRS is set to 0.
Naturalness performance. We consider three automatic metrics: (i) automatic spamicity detection, which identifies whether target pages are spam; following [27], we adopt the utility-based term spamicity method [60] to detect the adversarial examples. (ii) Automatic grammar checking, which computes the average number of errors in the attacked documents; specifically, we use two online grammar checkers, Grammarly [12] and Chegg Writing [2], following the settings in [27, 29]. (iii) Language model perplexity (PPL), which measures fluency using the average perplexity calculated with a pre-trained GPT-2 model [40]. Furthermore, we conduct a human evaluation, which measures the quality of the attacked documents following the criteria in [51].
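The two attack metrics are straightforward to compute; a minimal sketch (function names ours):

```python
def srr_at_k(after_ranks, k):
    """Success recall rate: % of attacked documents recalled into top-k."""
    return sum(r <= k for r in after_ranks) / len(after_ranks) * 100.0

def nrs_at_k(before_rank, after_rank, k):
    """Normalized ranking shift: relative ranking improvement, counted
    only when the attacked document actually enters the top-k set."""
    if after_rank > k:  # attack failed to enter the candidate set
        return 0.0
    return (before_rank - after_rank) / before_rank * 100.0
```

For instance, a document promoted from rank 200 to rank 50 with k = 100 scores an NRS of 75%, while one that only reaches rank 150 scores 0 because it never enters the candidate set.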

EXPERIMENTAL RESULTS
In this section, we discuss the experimental results and findings, as well as the attack effect between the first-stage retrieval and re-ranking stages discussed in earlier sections of the paper. As shown in Table 1: (i) TF-IDF performs poorly, especially on Hard target documents, indicating that this heuristic method cannot effectively find the most vulnerable words that help the target model make judgments. (ii) TS performs moderately well, showing that directly spamming with query terms improves the relevance between the target document and the query; however, spamicity can easily be caught by anti-spamming detection [27, 29], which we discuss further in Section 6.3. (iii) Among the attack methods tailored for NRMs (PRADA and PAT), PRADA performs better than PAT, possibly because PRADA considers more documents in its pairwise loss calculation and thus obtains more comprehensive information about the candidate set. However, both PRADA and PAT perform worse than the simple term spamming method on DR models, presumably because NRMs and DR models have different architectures and behaviors and therefore require different supervision signals to guide the attack process. In general, adversarial attacks against DR models are a non-trivial problem for existing attack methods.

How does MCARA perform on DR models?
Overall performance. The performance of MCARA and its variants in the DR attack scenario is reported in Table 1: (i) MCARA significantly outperforms all baseline methods, illustrating that attacking DR models requires capturing inter-document representativeness in the semantic space. Generally, performance decreases as attack difficulty increases; we will explore more advanced objectives tailored to challenging documents in future work. (ii) Attacks on MS-MARCO Document tend to have a higher success rate than on MS-MARCO Passage. The reason may be that the number of documents addressing a relevant topic is generally smaller than the number of passages extracted from those documents, offering a more focused and concise set of information. (iii) The improvement of MCARA over its single-representation variant suggests that multi-view document representations capture fine-grained semantic information better than a single document representation, and thus facilitate the contrast between the target document and each viewer. (iv) The improvement of MCARA over its one-view-at-a-time variant indicates that optimizing from only one view of the semantic space at a time may lead to an unstable optimization direction when attacking the target document.
Impact of the number of viewers. We examine the impact of an important hyperparameter of MCARA, the number of viewers, on attack performance. The results on the Mixture target documents in MS-MARCO Document are shown in Figure 3 (a), with similar findings on the other target documents. Performance improves when more representative viewers are incorporated into contrastive learning, probably because more viewers help extract richer representative signals for the attack. However, performance gradually decreases once the number of viewers exceeds a certain threshold: too many viewers risk making the clusters less representative and may even introduce noise that harms the contrast. In future work, we will explore other viewer-extraction techniques, such as token embeddings and document-query alignment.
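The intuition that viewers act as cluster representatives of candidate-document embeddings can be illustrated with a tiny k-means sketch. This is our own toy illustration, not the paper's actual viewer-extraction procedure, and the deterministic initialization is an assumption made for simplicity:

```python
def kmeans(vecs, k, iters=10):
    """Tiny k-means over embedding vectors (lists of floats). Viewers could
    then be taken as the centroids of clusters of candidate-document
    embeddings; the actual extraction procedure in the paper may differ."""
    step = len(vecs) // k
    centroids = [list(vecs[i * step]) for i in range(k)]  # simple spread-out init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vecs:
            # assign each vector to the nearest centroid (squared Euclidean)
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])))
            clusters[j].append(v)
        for i, c in enumerate(clusters):
            if c:  # recompute centroid as the cluster mean
                centroids[i] = [sum(col) / len(c) for col in zip(*c)]
    return centroids

cands = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]  # toy 2-D embeddings
viewers = kmeans(cands, k=2)
```

With too many clusters relative to the data, centroids start tracking individual points, which mirrors the observation above that excess viewers become less representative and noisy.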
Impact of the perplexity threshold. We examine the impact of the fluency-constraint hyperparameter, the perplexity threshold, on MCARA's performance. A lower threshold implies tighter fluency constraints on substitutions in the original sentences. The results on the Mixture target documents in MS-MARCO Document are shown in Figure 3 (b), with similar findings on the other target documents. Loosening the fluency constraint improves attack performance, since more disruptive synonyms can be selected; however, it also makes the attack easier to detect, as discussed in Section 6.3. In future work, it will be necessary to investigate more flexible ways to balance attack performance against the imperceptibility of adversarial perturbations.
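The perplexity filter can be sketched as follows. Perplexity is the exponential of the negative mean token log-probability; in practice the log-probabilities would come from a pre-trained GPT-2 model, while here the scoring function is left abstract for illustration:

```python
import math

def perplexity(token_logprobs):
    """PPL = exp(-mean log-probability) over the tokens of a sentence."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def fluent_substitutions(candidates, score_fn, sigma=50.0):
    """Keep only candidate substitutions whose perturbed sentence stays under
    the perplexity threshold sigma; score_fn maps a sentence to the token
    log-probabilities assigned by a language model (e.g., GPT-2)."""
    return [c for c in candidates if perplexity(score_fn(c)) <= sigma]
```

Raising sigma admits more disruptive, less fluent substitutions, which matches the performance/detectability trade-off discussed above.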

Naturalness of adversarial examples
We discuss the naturalness of the generated adversarial examples with respect to the Mixture-level target documents in MS-MARCO Document, with similar findings on MS-MARCO Passage. Here, we also remove the fluency constraints of MCARA in the synonym substitution process, denoted MCARA−, for comparison.
Automatic spamicity detection. Table 2 lists the automatic spamicity detection results. If the spamicity score of an example exceeds a detection threshold, the example is flagged as suspected spam. We observe that: (i) as the threshold decreases, the detector becomes stricter, increasing the detection rate across all methods; (ii) TS is easily detected, as it injects numerous repeated query terms into documents; (iii) PAT and PRADA are harder to detect than TS and TF-IDF, since both introduce naturalness constraints; (iv) MCARA significantly outperforms the baselines (p-value < 0.05), demonstrating the effectiveness of deriving the synonym set from words within the candidate documents and of the fluency constraints.
When applying PRADA to MS-MARCO Document, the rankings of 96.7% of the target documents are improved, with an average boost of 40.1% over the original ranking. However, when these adversarial examples are fed to the target DR model in first-stage retrieval, as shown in Table 5, 38.6% of them are not recalled. Similar results are observed on MS-MARCO Passage. In a "retrieval and re-ranking" pipeline, considering only attacks against NRMs thus risks target documents failing to be recalled, which underlines the importance of devising adversarial documents for first-stage retrieval.
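To convey why term spamming is easy to catch, a crude spamicity signal can be sketched as the density of query terms in a document. This toy detector is our own illustration and is much simpler than the utility-based term spamicity method of [60]:

```python
def query_term_density(doc_tokens, query_terms):
    """Fraction of document tokens that are query terms: a crude spamicity
    proxy that spikes when query terms are repeatedly injected."""
    q = {t.lower() for t in query_terms}
    hits = sum(1 for t in doc_tokens if t.lower() in q)
    return hits / max(len(doc_tokens), 1)

def is_suspected_spam(doc_tokens, query_terms, threshold=0.2):
    """Flag a document when its query-term density exceeds the threshold;
    lowering the threshold makes the detector stricter, as in Table 2."""
    return query_term_density(doc_tokens, query_terms) > threshold
```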

Black-box vs. white-box attack
In addition to the black-box attack setting, exploring the white-box setting is valuable for a deeper understanding of our method. We evaluate the retrieval performance of the surrogate model and the target DR model in the black-box pipeline over all queries in the Dev sets of MS-MARCO Document and MS-MARCO Passage, respectively. On MS-MARCO Document, the MRR@100 of the target DR model and the surrogate model is 41.71 and 38.60, respectively, and the MRR@100 of the whole black-box pipeline is 42.68. On MS-MARCO Passage, the MRR@10 of the target DR model and the surrogate model is 39.82 and 36.94, respectively, and the MRR@10 of the whole black-box pipeline is 41.46. To simulate a white-box scenario, we designate the target DR model as the surrogate model while keeping the other components of MCARA unchanged, denoted MCARA_white. The results on the Mixture target documents are shown in Table 6. MCARA's performance under the black-box scenario is close to that under the white-box scenario, suggesting that our surrogate-model training method for the black-box pipeline can effectively mimic the behavior of the target DR model and mount threatening attacks against it.
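The MRR figures reported here follow the standard definition, scaled by 100. A minimal sketch of the computation (our own helper, with a None rank meaning no relevant document appears in the top-k):

```python
def mrr_at_k(first_rel_ranks, k):
    """MRR@k x 100: first_rel_ranks holds, per query, the 1-based rank of
    the first relevant document (None if it is outside the top-k)."""
    rr = [1.0 / r if r is not None and r <= k else 0.0 for r in first_rel_ranks]
    return 100.0 * sum(rr) / len(first_rel_ranks)
```

For instance, four queries with first-relevant ranks 1, 2, (none), and 4 yield MRR@10 = 100 x (1 + 0.5 + 0 + 0.25) / 4 = 43.75.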

CONCLUSION
In this paper, we proposed the AREA task against DR models, demonstrating that, by adding small indiscernible perturbations, adversarial examples can fool DR models into admitting target documents into the initial retrieval results. We developed a novel attack method, MCARA, which uses view-wise supervision to capture inner-document representativeness information in DR models for an effective attack. The proposed methodology and experimental results reveal potential risks and vulnerabilities of DR models.
In future work, it is important to focus on the practical use of adversarial attacks, specifically against sophisticated real-world search engines that operate with pipelined and ensemble approaches over dynamic corpora. A promising direction is to design a general, unified attack method that caters to different DR models and NRMs across multiple corpora and modalities. Moreover, developing effective detection and defense mechanisms against such attacks is crucial for ensuring the robustness of IR systems.

Figure 1 :
Figure 1: The adversarial retrieval attack (AREA) task.
DR models and NRMs are prone to inherit the adversarial vulnerability of general neural networks, emphasizing the need for reliable and robust neural IR systems. Exploring potential adversarial attacks against neural models in IR is an important step towards this goal: such explorations help identify vulnerabilities, serve as a surrogate to evaluate robustness before real-world deployment, and, consequently, aid in devising appropriate countermeasures. To date, much attention has been devoted to the design of adversarial attacks against NRMs [27, 29, 51]. Given a neural ranking model, the attack aims to promote a low-ranked target document to higher positions via human-imperceptible perturbations. In contrast, little effort has been devoted to investigating how adversarial attacks affect DR models. We believe it is important to address this knowledge gap. First, like NRMs, DR models are increasingly vital in practical IR systems; adversarial attacks can expose their weaknesses and provide insights for developing more robust search engines. Second, within a multi-stage search pipeline, if black-hat search engine optimization practitioners [15] cannot ensure that a target document passes first-stage retrieval, they have no chance to promote it in the final ranked list.

Figure 2 :
Figure 2: The overall architecture of MCARA. After training the surrogate retrieval model: (a) we learn the multi-view representations of the target document by identifying viewers and counter-viewers; (b) during the attack, a view-wise contrast is used to force each view of the target document close to its corresponding viewer, while away from the other counter-viewers.
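The view-wise contrast in (b) can be sketched as an InfoNCE-style loss: each view representation of the target document is pulled toward its corresponding viewer and pushed away from the counter-viewers. This is our own simplified formulation of that idea; the use of cosine similarity and the exact loss form are assumptions:

```python
import math

def cos(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def view_wise_loss(views, viewers, counter_viewers, tau=0.1):
    """Sum over views of -log softmax similarity: view i's viewer is the
    positive, the counter-viewers are the negatives, tau is the temperature."""
    total = 0.0
    for v, w in zip(views, viewers):
        pos = math.exp(cos(v, w) / tau)
        neg = sum(math.exp(cos(v, c) / tau) for c in counter_viewers)
        total += -math.log(pos / (pos + neg))
    return total
```

Minimizing this loss over the perturbation drives each view of the target document into agreement with its viewer, which is the supervision signal Figure 2 (b) depicts.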

Figure 3 :
Figure 3: The impact of the number of viewers (a) and the perplexity threshold (b) on the attack performance of MCARA.

Figure 4 :
Figure 4: The average time cost of generating an adversarial document, and the attack performance of different methods.
We measure the average time taken by each attack method to generate an adversarial document using one Tesla V100 GPU. The results on the Mixture target documents in MS-MARCO Document are shown in Figure 4, with similar findings on the other target documents. PAT, which relies solely on anchor-document information to optimize perturbations, has a lower time cost but poorer attack performance due to insufficient information. PRADA takes longer because it compares against more documents, while MCARA reduces the time overhead and still achieves strong attack results through efficient view-wise supervision.

Table 1 :
Attack performance of MCARA and the baselines; * indicates significant improvements over the best baseline (p ≤ 0.05).

Table 2 :
The detection rate (%) of a representative anti-spamming method on MS-MARCO Document.

Table 3 :
Grammar checker, perplexity, and human evaluation results on MS-MARCO Document.
Table 3 lists the results of the automatic grammar checkers, PPL, and human evaluation, including the annotation consistency tests (the Kappa value and Kendall's Tau coefficient) following [3, 27]. For human evaluation, we recruit five annotators to annotate 32 randomly sampled Mixture-level adversarial examples from each attack method [3]. Following [51], annotators score the fluency of the mixed examples from 1 to 5; higher scores indicate more fluent examples. For imperceptibility, annotators determine whether an example is attacked (labeled 0) or not (labeled 1).
Here, we also explore the ability of the proposed MCARA to fool NRMs. For each dataset, we first obtain the candidate set for each query given by the target DR model, including the successful adversarial examples generated by MCARA. The candidate set size is 100 for MS-MARCO Document and 1000 for MS-MARCO Passage. We then directly feed these candidate sets to the NRM in the black-box pipeline. Avg. rank measures the average ranking of adversarial examples in the final ranked list, while T50% and T10% measure the percentage of adversarial examples entering the top 50% and top 10% of the final ranked list. As shown in Table 4, some adversarial examples are positioned by the NRM among the high-ranked entries in the final list, which suggests that the proposed MCARA method also poses a threat to NRMs.

Table 4 :
The performance of adversarial examples generated by MCARA in attacks against NRMs.
We employ PRADA and PAT to attack NRMs and directly feed the resulting adversarial examples into the target DR model. Here, Drop (%) measures the percentage of adversarial examples whose rankings in first-stage retrieval decrease compared to the original rankings given by the target DR model. NRS (%) measures the relative ranking changes of adversarial examples in first-stage retrieval; here we remove the constraint on the number of candidates. Values below zero indicate an overall decrease in ranking, and values above zero an overall increase. Lost (%) counts the percentage of adversarial examples against NRMs that cannot be recalled in first-stage retrieval.

Table 5 :
The performance of adversarial examples against NRMs on attacking against DR models.

Table 6 :
Attack performance comparisons of MCARA between the black-box and the white-box attack setting.
Considering practical application scenarios, an effective attack method should also be efficient, i.e., it should find the optimal perturbation at minimal time cost. Hence, we measure the average time each attack method takes to generate an adversarial document.