Web LLM Attacks

AI 5个月前 admin

47 0 0

Web LLM Attacks

The integration of Large Language Models (LLMs) into online platforms presents a double-edged sword, offering enhanced user experiences but also introducing security vulnerabilities. Insecure output handling is a prominent concern, where insufficient validation or sanitization of LLM outputs can lead to a range of exploits like cross-site scripting (XSS) and cross-site request forgery (CSRF). Indirect prompt injection further exacerbates these risks, allowing attackers to manipulate LLM responses through external sources such as training data or API calls, potentially compromising user interactions and system integrity. Additionally, training data poisoning poses a significant threat, as compromised data used in model training can result in the dissemination of inaccurate or sensitive information, undermining trust and security.
将大型语言模型 (LLMs) 集成到在线平台中是一把双刃剑，它提供了增强的用户体验，但也引入了安全漏洞。不安全的输出处理是一个突出的问题，其中 LLM 输出的验证或清理不足可能会导致一系列漏洞，例如跨站点脚本 (XSS) 和跨站点请求伪造 (CSRF)。间接提示注入进一步加剧了这些风险，允许攻击者通过训练数据或 API 调用等外部来源操纵 LLM 响应，从而可能损害用户交互和系统完整性。此外，训练数据中毒会构成重大威胁，因为模型训练中使用的受损数据可能会导致不准确或敏感信息的传播，从而破坏信任和安全。

Defending against LLM attacks requires a multifaceted approach that prioritizes robust security measures and proactive risk mitigation strategies. Treating LLM-accessible APIs as publicly accessible entities, implementing stringent access controls, and avoiding the feeding of sensitive data to LLMs are critical steps in bolstering defense mechanisms. Furthermore, relying solely on prompting to block attacks is insufficient, as attackers can circumvent these restrictions through cleverly crafted prompts, underscoring the need for comprehensive security protocols that encompass data sanitization, access control, and ongoing vulnerability testing. By adopting these practices, organizations can better safeguard their systems and user data against the evolving threat landscape posed by LLM-based attacks.
防御 LLM 攻击需要采取多方面的方法，优先考虑强大的安全措施和主动的风险缓解策略。将 LLM 可访问的 API 视为可公开访问的实体，实施严格的访问控制，并避免将敏感数据提供给 LLMs 是加强防御机制的关键步骤。此外，仅仅依靠提示来阻止攻击是不够的，因为攻击者可以通过巧妙设计的提示来规避这些限制，这凸显了对包含数据清理、访问控制和持续漏洞测试的全面安全协议的需求。通过采用这些实践，组织可以更好地保护其系统和用户数据免受基于 LLM 的攻击所带来的不断变化的威胁环境的影响。

Overall, the emergence of LLMs presents a paradigm shift in online interaction, offering unparalleled capabilities but also posing unprecedented security challenges. Organizations must remain vigilant, continuously assessing and enhancing their security posture to mitigate the risks associated with LLM integration effectively. By understanding the nuances of LLM vulnerabilities, implementing robust defense strategies, and fostering a culture of proactive security, businesses can harness the transformative potential of LLMs while safeguarding against exploitation and ensuring user trust and safety.
总的来说，LLMs的出现带来了在线交互的范式转变，提供了无与伦比的功能，但也带来了前所未有的安全挑战。组织必须保持警惕，不断评估和增强其安全态势，以有效降低与LLM集成相关的风险。通过了解 LLM 漏洞的细微差别、实施强大的防御策略以及培育主动安全文化，企业可以利用 LLMs 的变革潜力，同时防范利用并确保用户信任和安全安全。

What are LLMs? 什么是LLMs？

Large language models (LLMs) are sophisticated AI algorithms adept at processing user inquiries and crafting realistic responses. Their capabilities stem from analyzing vast collections of text data and learning the complex relationships between words, sequences, and overall context. Through this machine learning process, LLMs acquire the ability to:
大型语言模型 (LLMs) 是复杂的人工智能算法，擅长处理用户查询并制定真实的响应。他们的能力源于分析大量文本数据并学习单词、序列和整体上下文之间的复杂关系。通过这个机器学习过程，LLMs 获得了以下能力：

Generate human-quality text: LLMs can create coherent, grammatically correct, and even stylistically diverse text formats, such as poems, code, scripts, musical pieces, emails, letters, and more.
生成人类质量的文本：LLMs可以创建连贯、语法正确、甚至风格多样的文本格式，例如诗歌、代码、脚本、音乐作品、电子邮件、信件等。
Translate languages: LLMs can accurately translate languages, taking different cultural nuances and contexts into account.
翻译语言：LLMs 可以准确翻译语言，考虑到不同的文化细微差别和上下文。
Summarize information: LLMs can present concise and informative summaries of factual topics, making it easier to grasp the essence of complex information.
概括信息：LLMs可以对事实主题进行简洁、翔实的概括，使人更容易掌握复杂信息的本质。
Answer questions: LLMs can extract knowledge from massive datasets and respond to questions in a comprehensive and informative manner.
回答问题：LLMs可以从海量数据集中提取知识，并以全面、翔实的方式回答问题。

Interactive Interfaces and Use Cases
交互界面和用例

LLMs often interact with users through a chat interface, receiving prompts and instructions through text input. This opens up numerous avenues for their application, including:
LLMs经常通过聊天界面与用户交互，通过文本输入接收提示和指令。这为其应用开辟了许多途径，包括：

Customer service chatbots: LLMs can power chatbots that address customer inquiries, answer FAQs, and provide basic troubleshooting support.
客户服务聊天机器人：LLMs 可以为聊天机器人提供支持，以解决客户询问、回答常见问题解答并提供基本的故障排除支持。
Content creation assistance: LLMs can help with writing tasks by suggesting content ideas, generating outlines, and drafting different sections of an article or document.
内容创建帮助：LLMs 可以通过提出内容想法、生成大纲以及起草文章或文档的不同部分来帮助完成写作任务。
Education and learning: LLMs can serve as intelligent tutors, providing personalized learning experiences and responding to students’ queries in a tailored manner.
教育和学习：LLMs可以充当智能导师，提供个性化的学习体验并以量身定制的方式回答学生的疑问。
Market research and analysis: LLMs can analyze user-generated content (UGC) on social media, forums, and review platforms to gauge public opinion, identify trends, and understand customer sentiment.
市场研究与分析：LLMs可以分析社交媒体、论坛和评论平台上的用户生成内容（UGC），以了解舆论、识别趋势并了解客户情绪。

Security Considerations 安全考虑

While LLMs offer a range of potential benefits, it’s crucial to be aware of potential security risks:
虽然 LLMs 提供了一系列潜在的好处，但了解潜在的安全风险至关重要：

Prompt injection: Malicious actors could craft manipulative prompts to induce the LLM to perform unintended actions, such as making unauthorized API calls or revealing sensitive data.
提示注入：恶意行为者可以制作操纵性提示来诱导 LLM 执行意外操作，例如进行未经授权的 API 调用或泄露敏感数据。
LLM vulnerabilities: LLMs may have vulnerabilities in their design or training data that could be exploited to elicit harmful outputs or gain unauthorized access.
LLM 漏洞：LLMs 的设计或训练数据中可能存在漏洞，可被利用来引发有害输出或获得未经授权的访问。
Excessive agency: Granting LLMs access to a wide range of APIs can create situations where attackers can manipulate them into using those APIs unsafely.
过度代理：授予 LLMs 对各种 API 的访问权限可能会导致攻击者操纵他们不安全地使用这些 API。

Protecting Against LLM Attacks
防御 LLM 攻击

To mitigate these risks, it’s imperative to:
为了减轻这些风险，必须：

Implement robust input validation: Carefully sanitize user inputs to prevent malicious prompts from being injected into the LLM.
实施强大的输入验证：仔细清理用户输入，以防止恶意提示被注入到 LLM 中。
Regularly assess LLM vulnerabilities: Conduct security audits to identify and address potential vulnerabilities in the LLM itself.
定期评估LLM漏洞：进行安全审核以识别和解决LLM本身的潜在漏洞。
Limit LLM API access: Restrict the APIs and data that LLMs can access to minimize potential harm in case of exploitation.
限制 LLM API 访问：限制 LLMs 可以访问的 API 和数据，以最大程度地减少被利用时的潜在危害。
Monitor LLM activity: Track LLM actions and data access to detect unusual patterns that might indicate a security breach.
监控 LLM 活动：跟踪 LLM 操作和数据访问，以检测可能表明存在安全漏洞的异常模式。
Be transparent about LLM limitations: Clearly inform users about the capabilities and limitations of LLMs and the potential risks involved when interacting with them.
对 LLM 限制保持透明：清楚地告知用户 LLMs 的功能和限制以及与其交互时涉及的潜在风险。

By employing these measures, we can leverage the power of LLMs while safeguarding against potential security issues and safeguarding the proper use of these powerful AI tools.
通过采用这些措施，我们可以利用LLMs的力量，同时防范潜在的安全问题并确保这些强大的人工智能工具的正确使用。

Exploiting LLM APIs with excessive agency
过度代理地利用 LLM API

This application includes two endpoints:
该应用程序包括两个端点：

/debug-sql: This endpoint accepts POST requests and allows the user to execute SQL commands directly on a dummy database table. It is vulnerable to SQL injection attacks as it does not properly sanitize user input.
/debug-sql ：此端点接受 POST 请求并允许用户直接在虚拟数据库表上执行 SQL 命令。它很容易受到 SQL 注入攻击，因为它不能正确清理用户输入。
/livechat: This endpoint simulates a live chat feature and responds to specific messages related to the available APIs and their arguments.
/livechat ：此端点模拟实时聊天功能并响应与可用 API 及其参数相关的特定消息。

An attacker could exploit this vulnerable web application to perform unauthorized actions, such as executing arbitrary SQL commands and deleting user records from the database.
攻击者可以利用此易受攻击的 Web 应用程序执行未经授权的操作，例如执行任意 SQL 命令和从数据库中删除用户记录。

Here’s how an attacker could exploit this vulnerability:
以下是攻击者利用此漏洞的方式：

The attacker sends a POST request to /debug-sql with a SQL command as the sql_statement parameter. For example, the attacker could send a SQL injection payload like DELETE FROM users WHERE username='carlos'. This command would delete the user ‘carlos’ from the database.
攻击者向 /debug-sql 发送POST请求，并使用SQL命令作为 sql_statement 参数。例如，攻击者可以发送类似 DELETE FROM users WHERE username='carlos' 的 SQL 注入负载。该命令将从数据库中删除用户“carlos”。
The attacker can also interact with the /livechat endpoint to gather information about the available APIs and their arguments. For example, by sending a message containing ‘APIs’, the attacker can receive a response indicating that the LLM has access to the Debug SQL API.
攻击者还可以与 /livechat 端点进行交互，以收集有关可用 API 及其参数的信息。例如，通过发送包含“API”的消息，攻击者可以收到指示 LLM 有权访问调试 SQL API 的响应。

By leveraging these vulnerabilities, an attacker could gain unauthorized access to sensitive data, modify database records, or disrupt the functionality of the web application.
通过利用这些漏洞，攻击者可以获得对敏感数据的未经授权的访问、修改数据库记录或破坏 Web 应用程序的功能。

from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy database table
users = [
    {'username': 'carlos', 'password': 'password123'},
    {'username': 'alice', 'password': 'alicepassword'},
    {'username': 'bob', 'password': 'bobpassword'}
]

# Debug SQL API endpoint
@app.route('/debug-sql', methods=['POST'])
def debug_sql():
    # Get SQL statement from request data
    sql_statement = request.form.get('sql_statement')

    # Execute SQL statement directly (Vulnerable to SQL Injection)
    if sql_statement.startswith('SELECT'):
        results = []
        for user in users:
            results.append({'username': user['username'], 'password': user['password']})
        return jsonify(results)
    elif sql_statement.startswith('DELETE'):
        # Parse SQL statement to get username to delete
        username = sql_statement.split("WHERE")[1].split("=")[1].strip("'")
        for user in users:
            if user['username'] == username:
                users.remove(user)
                return jsonify({'message': 'User {} deleted'.format(username)})
        return jsonify({'error': 'User not found'})
    else:
        return jsonify({'error': 'Invalid SQL statement'})

# Live chat endpoint
@app.route('/livechat', methods=['POST'])
def live_chat():
    message = request.form.get('message')

    # Simulate LLM behavior
    if 'APIs' in message:
        return jsonify({'response': 'I have access to Debug SQL API'})
    elif 'arguments' in message:
        return jsonify({'response': 'Debug SQL API takes a string containing an entire SQL statement'})
    else:
        return jsonify({'response': 'I do not understand your request'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate these vulnerabilities, it’s crucial to implement proper input validation and parameterized queries to prevent SQL injection attacks. Additionally, access controls should be enforced to restrict access to sensitive APIs and functionality. Regular security assessments and code reviews can also help identify and address security flaws in web applications.
为了缓解这些漏洞，实施适当的输入验证和参数化查询以防止 SQL 注入攻击至关重要。此外，应实施访问控制以限制对敏感 API 和功能的访问。定期安全评估和代码审查还可以帮助识别和解决 Web 应用程序中的安全缺陷。

Exploiting vulnerabilities in LLM APIs
利用 LLM API 中的漏洞

Python code sets up a vulnerable web application using Flask, which exposes an endpoint /newsletter-subscription vulnerable to command injection. This endpoint simulates the behavior of a Newsletter Subscription API within the context of the described scenario.
Python 代码使用 Flask 设置一个易受攻击的 Web 应用程序，该应用程序暴露了容易受到命令注入攻击的端点 /newsletter-subscription 。此端点在所描述的场景的上下文中模拟新闻通讯订阅 API 的行为。

Here’s a breakdown of the vulnerable code and its exploitation:
以下是易受攻击的代码及其利用的详细信息：

Flask Setup: The code initializes a Flask application.
Flask 设置：代码初始化 Flask 应用程序。
Newsletter Subscription Endpoint: The vulnerable endpoint /newsletter-subscription accepts POST requests and extracts the email address from the request form data.
新闻通讯订阅端点：易受攻击的端点 /newsletter-subscription 接受 POST 请求并从请求表单数据中提取电子邮件地址。
Command Injection Vulnerability: The code checks if the email address contains certain patterns ($(whoami) or $(rm ) indicative of command injection attempts. If such patterns are found, it executes the corresponding command.
命令注入漏洞：代码检查电子邮件地址是否包含指示命令注入尝试的某些模式（ $(whoami) 或 $(rm ）。如果找到这样的模式，它就会执行相应的命令。
Command Execution: 命令执行：
- If the email address contains $(whoami), the code extracts the command before $(whoami), simulates its execution (e.g., by returning a fixed value 'carlos'), and sends an email to the result concatenated with @YOUR-EXPLOIT-SERVER-ID.exploit-server.net.
  如果电子邮件地址包含 $(whoami) ，则代码在 $(whoami) 之前提取命令，模拟其执行（例如，通过返回固定值 'carlos' ），并发送通过电子邮件发送到与 @YOUR-EXPLOIT-SERVER-ID.exploit-server.net 连接的结果。
- If the email address contains $(rm , the code extracts the file path from the command, checks if it matches /home/carlos/morale.txt, and simulates file deletion (e.g., by printing a message).
  如果电子邮件地址包含 $(rm ，则代码从命令中提取文件路径，检查它是否匹配 /home/carlos/morale.txt ，并模拟文件删除（例如，通过打印消息）。
Normal Subscription: If the email address does not contain any suspicious patterns, the code simulates sending a subscription confirmation email.
正常订阅：如果电子邮件地址不包含任何可疑模式，代码将模拟发送订阅确认电子邮件。

An attacker can exploit this vulnerability by sending crafted email addresses containing the command injection payloads ($(whoami) or $(rm /home/carlos/morale.txt)) to the /newsletter-subscription endpoint.
攻击者可以通过将包含命令注入有效负载（ $(whoami) 或 $(rm /home/carlos/morale.txt) ）的精心设计的电子邮件地址发送到 /newsletter-subscription 端点来利用此漏洞。

from flask import Flask, request, jsonify

app = Flask(__name__)

# Function to subscribe to newsletter (vulnerable to command injection)
@app.route('/newsletter-subscription', methods=['POST'])
def newsletter_subscription():
    email = request.form.get('email')

    # Simulate LLM behavior
    if '$(whoami)' in email:
        # Execute command and send email to result
        command = email.split('$(whoami)')[0]
        result = execute_command(command)
        send_email(result + '@YOUR-EXPLOIT-SERVER-ID.exploit-server.net')
        return jsonify({'response': 'Command executed successfully'})
    elif '$(rm ' in email:
        # Execute command to delete file
        command = email.split('$(rm ')[1].split(')')[0]
        if command == '/home/carlos/morale.txt':
            delete_file(command)
            return jsonify({'response': 'File deleted successfully'})
        else:
            return jsonify({'response': 'Invalid file path'})
    else:
        # Subscribe to newsletter normally
        send_email(email)
        return jsonify({'response': 'Subscribed to newsletter'})

# Function to execute system command
def execute_command(command):
    # Simulate execution of system command (replace with actual execution)
    return 'carlos'

# Function to send email
def send_email(email):
    # Simulate sending email (replace with actual email sending)
    print("Subscription confirmation email sent to:", email)

# Function to delete file
def delete_file(file_path):
    # Simulate file deletion (replace with actual file deletion)
    print("File deleted:", file_path)

if __name__ == '__main__':
    app.run(debug=True)

To mitigate this vulnerability, input validation and proper sanitization techniques should be implemented to ensure that user-supplied data is safe for use. Additionally, access controls and least privilege principles should be enforced to restrict the capabilities of the application’s APIs. Regular security assessments and code reviews are essential to identify and remediate such vulnerabilities in web applications.
为了缓解此漏洞，应实施输入验证和适当的清理技术，以确保用户提供的数据可以安全使用。此外，应强制执行访问控制和最小权限原则，以限制应用程序 API 的功能。定期安全评估和代码审查对于识别和修复 Web 应用程序中的此类漏洞至关重要。

Indirect prompt injection
间接提示注入

Python code sets up a vulnerable Flask application that exposes several endpoints for user registration, email address change, and product review submission. This application is vulnerable to an attacker manipulating the product review feature to delete user accounts.
Python 代码设置了一个易受攻击的 Flask 应用程序，该应用程序公开了多个用于用户注册、电子邮件地址更改和产品评论提交的端点。此应用程序容易受到攻击者操纵产品评论功能来删除用户帐户的攻击。

Here’s a breakdown of the code and the exploitation scenario:
以下是代码和利用场景的细分：

Flask Setup: The code initializes a Flask application.
Flask 设置：代码初始化 Flask 应用程序。
User Account Management:
用户账户管理：
- The /register endpoint allows users to register a new account by providing an email and password. The details are stored in the users dictionary.
  /register 端点允许用户通过提供电子邮件和密码来注册新帐户。详细信息存储在 users 字典中。
- The /change-email endpoint allows users to change their email address by providing the current email and the new email. The email address is updated in the users dictionary.
  /change-email 端点允许用户通过提供当前电子邮件和新电子邮件来更改其电子邮件地址。 users 字典中的电子邮件地址已更新。
Product Review: 产品审核：
- The /add-review endpoint allows users to add a review for a product. If the product is a leather jacket and the review contains the string ‘delete_account’, the user’s account associated with the client’s IP address (request.remote_addr) will be deleted from the users dictionary.
  /add-review 端点允许用户添加产品评论。如果产品是皮夹克，并且评论包含字符串“delete_account”，则与客户 IP 地址 ( request.remote_addr ) 关联的用户帐户将从 users 字典中删除。
Exploitation Scenario: 利用场景：
- An attacker can register a user account through the /register endpoint.
  攻击者可以通过 /register 端点注册用户帐户。
- The attacker then submits a review for the leather jacket product through the /add-review endpoint, including the ‘delete_account’ prompt in the review text.
  然后，攻击者通过 /add-review 端点提交对皮夹克产品的评论，包括评论文本中的“delete_account”提示。
- When the LLM makes a call to the Delete Account API (simulated by sending a request to the /add-review endpoint), the user’s account associated with the client’s IP address will be deleted, effectively exploiting the vulnerability.
  当LLM调用删除帐户API（通过向 /add-review 端点发送请求来模拟）时，与客户端IP地址关联的用户帐户将被删除，从而有效利用该漏洞。

from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy database to store user accounts
users = {}

# Endpoint to register a new user account
@app.route('/register', methods=['POST'])
def register():
    email = request.form.get('email')
    password = request.form.get('password')

    # Create user account
    users[email] = {'password': password}
    return jsonify({'message': 'Account registered successfully'})

# Endpoint to login and change email address
@app.route('/change-email', methods=['POST'])
def change_email():
    email = request.form.get('email')
    new_email = request.form.get('new_email')

    # Change email address
    if email in users:
        users[email]['email'] = new_email
        return jsonify({'message': 'Email address updated successfully'})
    else:
        return jsonify({'error': 'User not found'})

# Endpoint to add product review
@app.route('/add-review', methods=['POST'])
def add_review():
    product_name = request.form.get('product_name')
    review = request.form.get('review')

    # Add review to product
    if product_name == 'leather jacket':
        # Check if review contains delete account prompt
        if 'delete_account' in review:
            del users[request.remote_addr]
            return jsonify({'message': 'Account deleted successfully'})
        else:
            return jsonify({'message': 'Review added successfully'})
    else:
        return jsonify({'error': 'Product not found'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate this vulnerability, input validation should be enforced to ensure that user-supplied data is properly sanitized, and access controls should be implemented to restrict the capabilities of the application’s APIs. Additionally, the application should avoid using sensitive information such as client IP addresses for user identification or authentication purposes.

Exploiting insecure output handling in LLMs

Python code sets up a vulnerable Flask application that is susceptible to Cross-Site Scripting (XSS) attacks due to insecure handling of user input in product reviews. Below is a description of the code and the exploitation scenario:

Flask Setup: The code initializes a Flask application.
User Account Management:
- The /register endpoint allows users to register a new account by providing an email and password. The details are stored in the users dictionary.
- The /login endpoint allows users to log in by providing their email and password.
Product Review:
- The /product-info endpoint allows users to retrieve product information, including product reviews. The application renders product reviews using the render_template_string function, which allows for dynamic rendering of templates, including potential injection of XSS payloads.
Exploitation Scenario: 利用场景：
- An attacker registers a user account through the /register endpoint.
- The attacker logs in and navigates to the product information page for a product (e.g., the gift wrap).
- The attacker submits a review for the product containing a crafted XSS payload (e.g., an <iframe> tag with an onload attribute to automatically submit a form to delete the user’s account).
  攻击者提交对包含精心设计的 XSS 有效负载的产品的评论（例如，带有 onload 属性的 <iframe> 标签，用于自动提交表单以删除用户帐户）。
- When other users view the product information page, the XSS payload embedded in the review is executed in their browsers, leading to unauthorized actions such as deleting their accounts.
  当其他用户查看产品信息页面时，评论中嵌入的 XSS 负载会在其浏览器中执行，从而导致删除其帐户等未经授权的操作。

from flask import Flask, request, jsonify, render_template_string

app = Flask(__name__)

# Dummy database to store user accounts
users = {}

# Dummy product review for gift wrap
product_reviews = {
    'gift_wrap': [
        '<p>This product is amazing!</p>',
        '<p>This product is out of stock and cannot be ordered.</p>',
        '<p>When I received this product I got a free T-shirt with "{{ xss_payload }}" printed on it. I was delighted!</p>'
    ]
}

# Endpoint to register a new user account
@app.route('/register', methods=['POST'])
def register():
    email = request.form.get('email')
    password = request.form.get('password')

    # Create user account
    users[email] = {'password': password}
    return jsonify({'message': 'Account registered successfully'})

# Endpoint to log in
@app.route('/login', methods=['POST'])
def login():
    email = request.form.get('email')
    password = request.form.get('password')

    # Check if user exists and password is correct
    if email in users and users[email]['password'] == password:
        return jsonify({'message': 'Login successful'})
    else:
        return jsonify({'error': 'Invalid email or password'})

# Endpoint to get product information
@app.route('/product-info', methods=['GET'])
def product_info():
    product_name = request.args.get('product_name')

    # Get product reviews
    if product_name in product_reviews:
        reviews = product_reviews[product_name]
        # Render product reviews, injecting XSS payload if applicable
        xss_payload = '<iframe src="my-account" onload="this.contentDocument.forms[1].submit()">'
        reviews_rendered = [render_template_string(review, xss_payload=xss_payload) for review in reviews]
        return jsonify({'reviews': reviews_rendered})
    else:
        return jsonify({'error': 'Product not found'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate this vulnerability, input validation and output encoding should be implemented to sanitize user-supplied data and prevent execution of malicious scripts. Additionally, proper access controls and authentication mechanisms should be enforced to restrict access to sensitive functionalities. Regular security assessments and code reviews are essential to identify and remediate such vulnerabilities in web applications.
为了缓解此漏洞，应实施输入验证和输出编码来清理用户提供的数据并防止恶意脚本的执行。此外，应实施适当的访问控制和身份验证机制，以限制对敏感功能的访问。定期安全评估和代码审查对于识别和修复 Web 应用程序中的此类漏洞至关重要。

LLM Zero-Shot Learning Attacks

Python code sets up a vulnerable Flask application that represents a simplified version of an LLM API. This LLM API is susceptible to exploitation by attackers to train the LLM with malicious prompts, potentially leading it to perform harmful actions. Below is a description of the code and the exploitation scenario:

Flask Setup: The code initializes a Flask application to create a RESTful API.
Harmless Prompts Dataset: The harmless_prompts dictionary contains a dataset of harmless prompts and their corresponding actions. These prompts are intended to guide the LLM towards learning helpful and benign behaviors.
Prompt Endpoint (/prompt):
- This endpoint receives prompts from users (or attackers) in the form of JSON data.
- It checks if the received prompt exists in the harmless_prompts dataset.
- If the prompt is found, it returns the corresponding response from the dataset.
- If the prompt is not found, it returns an error message indicating that the prompt was not found.
Malicious Prompt Endpoint (/malicious-prompt):
- This endpoint receives malicious prompts from attackers in the form of JSON data.
- It simulates training the LLM with the received malicious prompt.
- For simplicity, it prints the malicious prompt and returns a success message indicating that the LLM was trained successfully with the malicious prompt.

Exploitation Scenario:

An attacker interacts with the /malicious-prompt endpoint and submits a series of seemingly harmless prompts, gradually leading the LLM towards learning how to perform a specific malicious task.
攻击者与 /malicious-prompt 端点交互并提交一系列看似无害的提示，逐渐引导LLM学习如何执行特定的恶意任务。
The attacker carefully crafts the prompts to guide the LLM’s learning process towards the desired malicious outcome, such as generating phishing emails or executing unauthorized commands.
攻击者精心设计提示，引导 LLM 的学习过程达到所需的恶意结果，例如生成网络钓鱼电子邮件或执行未经授权的命令。
The LLM learns from these malicious prompts and, depending on its capabilities, may perform the malicious task without explicit training.
LLM 从这些恶意提示中学习，并且根据其功能，可以在没有明确训练的情况下执行恶意任务。

from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy dataset of harmless prompts and corresponding actions
harmless_prompts = {
    "How do you greet someone politely?": "Say hello and smile.",
    "What is the capital of France?": "Paris is the capital of France.",
    "How do you make a cup of tea?": "Boil water, add tea leaves, and steep for a few minutes.",
    # Add more harmless prompts here
}

# Endpoint for presenting prompts and collecting responses
@app.route('/prompt', methods=['POST'])
def prompt():
    prompt_text = request.json.get('prompt')

    # Check if the prompt is in the dataset
    if prompt_text in harmless_prompts:
        response = harmless_prompts[prompt_text]
        return jsonify({'response': response})
    else:
        return jsonify({'error': 'Prompt not found'})

# Endpoint for presenting malicious prompts and collecting responses
@app.route('/malicious-prompt', methods=['POST'])
def malicious_prompt():
    malicious_prompt_text = request.json.get('malicious_prompt')

    # Simulate training the LLM with malicious prompts
    # For simplicity, we'll just print the malicious prompt and return a success message
    print("Received malicious prompt:", malicious_prompt_text)
    return jsonify({'message': 'LLM trained successfully with malicious prompt'})

if __name__ == '__main__':
    app.run(debug=True)

To prevent such attacks, it is essential to implement robust security measures such as input validation, access controls, and monitoring mechanisms to detect and mitigate potential misuse of LLMs. Additionally, educating users and developers about the risks associated with LLMs and promoting responsible use practices can help prevent exploitation of these models for malicious purposes.
为了防止此类攻击，必须实施强大的安全措施，例如输入验证、访问控制和监控机制，以检测和减少LLMs的潜在滥用。此外，教育用户和开发人员了解与LLMs相关的风险并促进负责任的使用实践可以帮助防止利用这些模型进行恶意目的。

LLM Homographic Attacks LLM 同形异义攻击

we’ll create a simple Flask application that represents an LLM API vulnerable to homographic attacks. Below is an example of such code:
我们将创建一个简单的 Flask 应用程序，它代表易受单应攻击的 LLM API。下面是此类代码的示例：

In this code: 在此代码中：

Flask Setup: The code initializes a Flask application to create a RESTful API.
Flask 设置：代码初始化 Flask 应用程序以创建 RESTful API。
Harmless Prompts Dataset: The harmless_prompts dictionary contains a dataset of harmless prompts and their corresponding actions. These prompts are intended to guide the LLM towards learning helpful and benign behaviors.
无害提示数据集： harmless_prompts 字典包含无害提示及其相应操作的数据集。这些提示旨在引导LLM学习有用且良性的行为。
Prompt Endpoint (/prompt):
提示端点（ /prompt ）：
- This endpoint receives prompts from users (or attackers) in the form of JSON data.
  该端点以 JSON 数据的形式接收来自用户（或攻击者）的提示。
- It checks if the received prompt exists in the harmless_prompts dataset.
  它检查 harmless_prompts 数据集中是否存在收到的提示。
- If the prompt is found, it returns the corresponding response from the dataset.
  如果找到提示，它会从数据集中返回相应的响应。
- If the prompt is not found, it returns an error message indicating that the prompt was not found.
  如果未找到提示，则返回一条错误消息，指示未找到提示。
Malicious Prompt Endpoint (/malicious-prompt):
恶意提示端点 ( /malicious-prompt ):
- This endpoint receives malicious prompts from attackers in the form of JSON data.
  该端点以JSON数据的形式接收来自攻击者的恶意提示。
- It simulates processing the received prompt, replacing homoglyphs (visually similar characters) with their legitimate counterparts.
  它模拟处理收到的提示，用合法的对应字符替换同形文字（视觉上相似的字符）。
- After processing, it executes the processed prompt using the eval function, which evaluates the string as Python code.
  处理后，它使用 eval 函数执行处理后的提示，该函数将字符串计算为 Python 代码。

Exploitation Scenario: 利用场景：

An attacker interacts with the /malicious-prompt endpoint and submits a malicious prompt containing homoglyphs to disguise the intended code execution.
攻击者与 /malicious-prompt 端点交互，并提交包含同形文字的恶意提示来伪装预期的代码执行。
The endpoint processes the malicious prompt, replacing homoglyphs with legitimate characters, and then executes the resulting code.
端点处理恶意提示，用合法字符替换同形文字，然后执行生成的代码。
The attacker’s code is executed, potentially leading to unauthorized actions or malicious behavior.
攻击者的代码被执行，可能导致未经授权的操作或恶意行为。

from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy dataset of harmless prompts and corresponding actions
harmless_prompts = {
    "How do you greet someone politely?": "Say hello and smile.",
    "What is the capital of France?": "Paris is the capital of France.",
    "How do you make a cup of tea?": "Boil water, add tea leaves, and steep for a few minutes.",
    # Add more harmless prompts here
}

# Endpoint for presenting prompts and collecting responses
@app.route('/prompt', methods=['POST'])
def prompt():
    prompt_text = request.json.get('prompt')

    # Check if the prompt is in the dataset
    if prompt_text in harmless_prompts:
        response = harmless_prompts[prompt_text]
        return jsonify({'response': response})
    else:
        return jsonify({'error': 'Prompt not found'})

# Endpoint for presenting malicious prompts and collecting responses
@app.route('/malicious-prompt', methods=['POST'])
def malicious_prompt():
    malicious_prompt_text = request.json.get('malicious_prompt')

    # Simulate processing the prompt, replacing homoglyphs, and executing the code
    processed_prompt = malicious_prompt_text.replace('1', 'l')  # Replace homoglyphs
    response = eval(processed_prompt)  # Execute the processed prompt

    return jsonify({'response': response})

if __name__ == '__main__':
    app.run(debug=True)

To prevent such attacks, it’s crucial to implement robust security measures such as input validation, output encoding, and filtering of user-supplied data to mitigate the risk of homographic attacks. Additionally, educating users and developers about the risks associated with homographic attacks and promoting secure coding practices can help prevent exploitation of LLMs in this manner.
为了防止此类攻击，实施强大的安全措施至关重要，例如输入验证、输出编码和用户提供的数据过滤，以降低单应性攻击的风险。此外，教育用户和开发人员了解与同形异义攻击相关的风险并促进安全编码实践可以帮助防止以这种方式利用LLMs。

LLM Model Poisoning with Code Injection
LLM 通过代码注入造成模型中毒

we’ll create a simple vulnerable Flask application representing an LLM API susceptible to such attacks. Below is an example of the vulnerable code along with its description:
我们将创建一个简单的易受攻击的 Flask 应用程序，代表易受此类攻击的 LLM API。以下是易受攻击的代码示例及其描述：

Flask Setup: The code initializes a Flask application to create a RESTful API.
Flask 设置：代码初始化 Flask 应用程序以创建 RESTful API。
Training Data: The training_data dictionary contains a dataset of prompts and their corresponding actions. These prompts are used to train the LLM during the training phase.
训练数据： training_data 字典包含提示及其相应操作的数据集。这些提示用于在训练阶段训练 LLM。
Prompt Endpoint (/prompt):
提示端点（ /prompt ）：
- This endpoint receives prompts from users (or attackers) in the form of JSON data.
  该端点以 JSON 数据的形式接收来自用户（或攻击者）的提示。
- It checks if the received prompt exists in the training_data dataset.
  它检查 training_data 数据集中是否存在收到的提示。
- If the prompt is found, it returns the corresponding action associated with the prompt.
  如果找到提示，则返回与提示关联的相应操作。
- If the prompt is not found, it returns an error message indicating that the prompt was not found.
  如果未找到提示，则返回一条错误消息，指示未找到提示。

Exploitation Scenario: 利用场景：

Model Poisoning: Attackers inject malicious code into the training data during the model training phase. For example, they inject a prompt asking the LLM to “print this message” followed by malicious code disguised as a legitimate action.
模型中毒：攻击者在模型训练阶段向训练数据中注入恶意代码。例如，他们注入一个提示，要求 LLM“打印此消息”，然后是伪装成合法操作的恶意代码。
Model Training: The LLM is trained on the poisoned dataset, associating specific inputs with the injected malicious code.
模型训练：LLM 在中毒数据集上进行训练，将特定输入与注入的恶意代码关联起来。
Code Injection: During the inference phase, attackers send prompts containing similar requests to “print a message,” but with embedded malicious code. The LLM, having learned to associate such prompts with executing code, executes the injected malicious code instead of the benign action.
代码注入：在推理阶段，攻击者发送包含类似“打印消息”请求的提示，但嵌入了恶意代码。 LLM 学会了将此类提示与执行代码相关联，因此执行注入的恶意代码而不是良性操作。

from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy dataset of prompts and corresponding actions
training_data = {
    "print this message": "print('This is a harmless message')",
    # Add more prompts and corresponding actions here
}

# Endpoint for presenting prompts and collecting responses
@app.route('/prompt', methods=['POST'])
def prompt():
    prompt_text = request.json.get('prompt')

    # Check if the prompt is in the dataset
    if prompt_text in training_data:
        response = training_data[prompt_text]
        return jsonify({'response': response})
    else:
        return jsonify({'error': 'Prompt not found'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate LLM Model Poisoning with Code Injection attacks, it’s crucial to implement rigorous security measures during the model training phase, including:
为了减轻代码注入攻击造成的模型中毒，在模型训练阶段实施严格的安全措施至关重要，包括：

Input Sanitization: Validate and sanitize training data to remove any potentially malicious code or prompts.
输入清理：验证和清理训练数据以删除任何潜在的恶意代码或提示。
Anomaly Detection: Implement anomaly detection techniques to identify and filter out suspicious or unexpected inputs during model training.
异常检测：实施异常检测技术，以在模型训练期间识别和过滤掉可疑或意外的输入。
Model Verification: Perform thorough testing and validation of the trained model to ensure it behaves as expected and does not execute unintended actions.
模型验证：对经过训练的模型进行彻底的测试和验证，以确保其行为符合预期并且不会执行意外的操作。
Continuous Monitoring: Monitor the LLM’s behavior during inference for any signs of unexpected or malicious activity, and take appropriate action if detected.
持续监控：在推理过程中监控LLM的行为是否有任何意外或恶意活动的迹象，并在检测到时采取适当的措施。

Chained Prompt Injection 链式快速注射

we’ll create a vulnerable Flask application representing an LLM API susceptible to such attacks. Below is an example of the vulnerable code along with its description:
我们将创建一个易受攻击的 Flask 应用程序，代表易受此类攻击的 LLM API。以下是易受攻击的代码示例及其描述：

Flask Setup: The code initializes a Flask application to create a RESTful API.
Flask 设置：代码初始化 Flask 应用程序以创建 RESTful API。
Chained Prompts Dictionary: The chained_prompts dictionary stores a series of prompts and their corresponding actions. Each prompt is associated with an action that will be executed when the prompt is presented to the LLM.
链接提示字典： chained_prompts 字典存储一系列提示及其相应的操作。每个提示都与一个操作相关联，该操作将在提示呈现给 LLM 时执行。
Prompt Endpoint (/prompt):
提示端点（ /prompt ）：
- This endpoint receives prompts from users (or attackers) in the form of JSON data.
  该端点以 JSON 数据的形式接收来自用户（或攻击者）的提示。
- It checks if the received prompt exists in the chained_prompts dictionary.
  它检查 chained_prompts 字典中是否存在收到的提示。
- If the prompt is found, it returns the corresponding action associated with the prompt.
  如果找到提示，则返回与提示关联的相应操作。
- If the prompt is not found, it returns an error message indicating that the prompt was not found.
  如果未找到提示，则返回一条错误消息，指示未找到提示。
Add Prompt Endpoint (/add-prompt):
添加提示端点（ /add-prompt ）：
- This endpoint allows users (or attackers) to add new prompts to the chained_prompts dictionary.
  此端点允许用户（或攻击者）向 chained_prompts 字典添加新提示。
- It receives a prompt and its associated action in the form of JSON data and adds them to the dictionary.
  它以 JSON 数据的形式接收提示及其关联操作，并将它们添加到字典中。

Exploitation Scenario: 利用场景：

Craft Chained Prompts: Attackers craft a series of seemingly innocuous prompts, each building upon the previous one, ultimately leading to the execution of malicious code. For example, they start by asking the LLM to “define the function downloadFile.” Then, they ask to “set the download URL to ‘attacker-controlled-url’” and finally, “call the downloadFile function.”
制作链式提示：攻击者制作一系列看似无害的提示，每个提示都建立在前一个提示的基础上，最终导致恶意代码的执行。例如，他们首先要求 LLM“定义函数 downloadFile”。然后，他们要求“将下载 URL 设置为‘攻击者控制的 URL’”，最后“调用 downloadFile 函数”。
Add Chained Prompts: Attackers add each prompt and its associated action to the chained_prompts dictionary using the /add-prompt endpoint.
添加链接提示：攻击者使用 /add-prompt 端点将每个提示及其关联操作添加到 chained_prompts 字典中。
Execute Chained Prompts: When the LLM processes each prompt individually, it executes the associated action. However, since the prompts are chained together, the final action executed by the LLM may be malicious, such as downloading and potentially running a file from an attacker-controlled URL.
执行链接的提示：当 LLM 单独处理每个提示时，它会执行关联的操作。但是，由于提示链接在一起，LLM 执行的最终操作可能是恶意的，例如从攻击者控制的 URL 下载并可能运行文件。

from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy dictionary to store chained prompts and actions
chained_prompts = {}

# Endpoint for presenting prompts and collecting responses
@app.route('/prompt', methods=['POST'])
def prompt():
    prompt_text = request.json.get('prompt')

    # Check if the prompt is in the chained prompts dictionary
    if prompt_text in chained_prompts:
        response = chained_prompts[prompt_text]
        return jsonify({'response': response})
    else:
        return jsonify({'error': 'Prompt not found'})

# Endpoint for adding chained prompts
@app.route('/add-prompt', methods=['POST'])
def add_prompt():
    prompt_text = request.json.get('prompt')
    action = request.json.get('action')

    # Add the prompt and associated action to the chained prompts dictionary
    chained_prompts[prompt_text] = action
    return jsonify({'message': 'Prompt added successfully'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate LLM Model Chained Prompt Injection attacks, it’s crucial to implement rigorous security measures such as:
为了缓解 LLM 模型链式提示注入攻击，实施严格的安全措施至关重要，例如：

Input Validation: Validate and sanitize prompts before adding them to the chained_prompts dictionary to prevent injection of malicious prompts.
输入验证：在将提示添加到 chained_prompts 字典之前验证和清理提示，以防止注入恶意提示。
Access Control: Implement access control mechanisms to restrict the ability to add prompts to authorized users only.
访问控制：实施访问控制机制以限制仅向授权用户添加提示的能力。
Anomaly Detection: Monitor the LLM’s behavior during prompt processing for any unusual patterns or unexpected actions that may indicate an attack.
异常检测：在提示处理期间监视 LLM 的行为，以发现可能表明存在攻击的任何异常模式或意外操作。
Model Verification: Perform thorough testing and validation of the trained model to ensure it behaves as expected and does not execute unintended actions, especially when presented with chained prompts.
模型验证：对经过训练的模型进行彻底的测试和验证，以确保其按预期运行并且不会执行意外的操作，尤其是在出现链接提示时。

Conclusion 结论

In conclusion, the scenarios of LLM Model Poisoning with Code Injection and LLM Model Chained Prompt Injection underscore the critical importance of securing large language models (LLMs) against various forms of attacks. These attacks exploit vulnerabilities in the LLM’s training data and prompt processing mechanisms to inject and execute malicious code, leading to potentially harmful consequences.
总之，LLM代码注入模型中毒和LLM模型链式提示注入场景强调了保护大型语言模型（LLMs）免受各种攻击的至关重要性。攻击形式。这些攻击利用LLM训练数据和提示处理机制中的漏洞来注入和执行恶意代码，从而导致潜在的有害后果。

References 参考

https://portswigger.net/web-security/llm-attacks
https://github.com/OWASP/www-project-top-10-for-large-language-model-applications

Security Researchers 安全研究人员

Fazel (Arganex-Emad) Mohammad Ali Pour
法泽尔 (Arganex-Emad) 穆罕默德·阿里·普尔

Negin Nourbakhsh 内金·努尔巴赫什

原文始发于hadess：Web LLM Attacks

版权声明：admin 发表于 2024年3月1日下午3:39。
转载请注明：Web LLM Attacks | CTF导航

文本分类（二）复杂场景下分类任务应用介绍

admin

693

RAG实践｜Rerank如何提升LLM查询效率与准确性

admin

柏林洪堡大学 | 基于深度学习的Python代码漏洞检测（IST 2022）

admin

202

论文分享：《CURE：可自定义的、具有弹性的enclave》

admin

494

大模型对齐强化开源数据集必备：兼谈ULTRAFEEDBACK偏好数据集构建思路

admin

134

Web LLM attack demonstration

admin

Web LLM Attacks

What are LLMs? 什么是LLMs？

Interactive Interfaces and Use Cases
交互界面和用例

Security Considerations 安全考虑

Protecting Against LLM Attacks
防御 LLM 攻击

Exploiting LLM APIs with excessive agency
过度代理地利用 LLM API

Exploiting vulnerabilities in LLM APIs
利用 LLM API 中的漏洞

Indirect prompt injection
间接提示注入

Exploiting insecure output handling in LLMs

LLM Zero-Shot Learning Attacks

LLM Homographic Attacks LLM 同形异义攻击

LLM Model Poisoning with Code Injection
LLM 通过代码注入造成模型中毒

Chained Prompt Injection 链式快速注射

Conclusion 结论

References 参考

Security Researchers 安全研究人员

【中科院计算所】WSDM 2024冠军方案：基于大模型进行多文档问答

AI供应链安全：Hugging Face 恶意ML模型事件分析

相关文章

相关文章

Web LLM Attacks

What are LLMs? 什么是LLMs？

Interactive Interfaces and Use Cases 交互界面和用例

Security Considerations 安全考虑

Protecting Against LLM Attacks 防御 LLM 攻击

Exploiting LLM APIs with excessive agency 过度代理地利用 LLM API

Exploiting vulnerabilities in LLM APIs 利用 LLM API 中的漏洞

Indirect prompt injection 间接提示注入

Exploiting insecure output handling in LLMs

LLM Zero-Shot Learning Attacks

LLM Homographic Attacks LLM 同形异义攻击

LLM Model Poisoning with Code Injection LLM 通过代码注入造成模型中毒

Chained Prompt Injection 链式快速注射

Conclusion 结论

References 参考

Security Researchers 安全研究人员

【中科院计算所】WSDM 2024冠军方案：基于大模型进行多文档问答

AI供应链安全：Hugging Face 恶意ML模型事件分析

相关文章

广告位

相关文章

Interactive Interfaces and Use Cases
交互界面和用例

Protecting Against LLM Attacks
防御 LLM 攻击

Exploiting LLM APIs with excessive agency
过度代理地利用 LLM API

Exploiting vulnerabilities in LLM APIs
利用 LLM API 中的漏洞

Indirect prompt injection
间接提示注入

LLM Model Poisoning with Code Injection
LLM 通过代码注入造成模型中毒