Papers
arxiv:2501.18636

SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model

Published on Jan 28
· Submitted by jimi888 on Feb 4
Authors:
,
,
,
,
,
,
,

Abstract

The indexing-retrieval-generation paradigm of retrieval-augmented generation (RAG) has been highly successful in solving knowledge-intensive tasks by integrating external knowledge into large language models (LLMs). However, the incorporation of external and unverified knowledge increases the vulnerability of LLMs because attackers can perform attack tasks by manipulating knowledge. In this paper, we introduce a benchmark named SafeRAG designed to evaluate the RAG security. First, we classify attack tasks into silver noise, inter-context conflict, soft ad, and white Denial-of-Service. Next, we construct RAG security evaluation dataset (i.e., SafeRAG dataset) primarily manually for each task. We then utilize the SafeRAG dataset to simulate various attack scenarios that RAG may encounter. Experiments conducted on 14 representative RAG components demonstrate that RAG exhibits significant vulnerability to all attack tasks and even the most apparent attack task can easily bypass existing retrievers, filters, or advanced LLMs, resulting in the degradation of RAG service quality. Code is available at: https://github.com/IAAR-Shanghai/SafeRAG.

Community

ACL_8pgs_Page3.png
This paper introduces SafeRAG, a benchmark designed to assess the security vulnerabilities of RAG against data injection attacks. We identified four critical attack surfaces: noise, conflict, toxicity, and DoS, and revealed significant weaknesses across the retriever, filter, and generator components of RAG.
ACL_8pgs_Page1.png
By proposing novel attack strategies such as silver noise, inter-context conflict, soft ad, and white DoS, we exposed critical gaps in existing defenses and demonstrated the susceptibility of RAG systems to subtle yet impactful threats.
ACL_8pgs_Page5.png
ACL_8pgs_Page6.png
ACL_8pgs_Page7.png

Paper submitter

The Retrieval-Augmented Generation (RAG) paradigm significantly enhances the capability of large language models (LLMs) in knowledge-intensive tasks. However, it also introduces new security risks—externally retrieved information may be maliciously tampered with, thereby affecting the reliability of generated content. To address this, we propose SafeRAG, the first Chinese RAG security evaluation benchmark, which comprehensively reveals the risks of data injection attacks.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2501.18636 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2501.18636 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2501.18636 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.