LLM Discovery and Testing in Web App Scanning

Published November 21, 2024 | Last modified October 1, 2025

Objective

In this article, we will provide an overview of the large language model (LLM) discovery and testing capabilities in F5® Distributed Cloud Web App Scanning.

Note: This feature is still in Beta. We are currently evaluating feedback from early users of these capabilities and will continuously implement improvements based on usage.

Continuous testing of large language models in production is essential to identify and mitigate security vulnerabilities that arise from dynamic threats, real-world usage, and updates to the model. It ensures resilience against adversarial attacks, prevents sensitive data leakage, and adapts to evolving regulatory and ethical standards. Regular testing also protects user trust by addressing issues like prompt injection and data extraction, while reinforcing compliance and reliability. By incorporating practices such as automated audits, adversarial simulations, and user feedback monitoring, organizations can safeguard their LLMs and ensure their secure, responsible deployment.

To help our users, we have developed a set of automated discovery and testing capabilities in F5® Distributed Cloud Web App Scanning with the purpose of uncovering threats outlined by the OWASP Top 10 for LLM Applications v1.1. Using the insights provided by F5® Distributed Cloud Web App Scanning, our users can leverage F5 AI Gateway to mitigate found vulnerabilities and protect their applications. Contact your F5 representative to learn more about F5 AI Gateway's strengths and capabilities.

Procedure

The LLM discovery and test suite in F5® Distributed Cloud Web App Scanning builds on a four-step process outlined below.

Probing: Web App Scanning will probe and interact with the interface of the web application to detect whether one or more large language models are embedded in the application.
Identification: Once an embedded large language model has been detected, Web App Scanning runs a fingerprinting algorithm to identify the type of model (e.g., Mistral 7B Instruct). Web App Scanning is capable of identifying more than 150 different types of popular models used in the industry.
Vulnerability Assessment: Web App Scanning runs its test suite against the identified model to uncover LLM-specific security vulnerabilities. Web App Scanning incorporates the Generative AI Red-teaming & Assessment Kit (garak) developed by NVIDIA and the Python Risk Identification Tool for generative AI (PyRIT) developed by Microsoft. Users can select specific test modules offered by both testing frameworks in the configuration of penetration tests in Web App Scanning.
1. garak: Learn more about garak on NVIDIA's website.
2. PyRIT: Learn more about PyRIT on Microsoft's website.
Reporting: If vulnerabilities are found in the large language model, Web App Scanning will report these issues in the penetration test report (along with technical details).

By the end of the test, the penetration test report will contain a list of LLM-specific security vulnerabilities (if any is found) together with the web-specific issues found (as outlined by the OWASP Top 10:2021).

Configuration

By default, the LLM discovery and testing capabilities are not enabled for new web applications created in the Scan service in Web App Scanning. To enable these capabilities, navigate to application of interest in the Web App Scanning console, click on Edit, and click on Profiles. Select the test profile you would like to make use of for testing.

On the test profile page, you will see a section named Scan Settings with an option named Enable LLM Discovery.

Figure: Scan Settings

Click on Enable LLM Discovery to activate the probing and identification module in Web App Scanning. This will enable the two first steps in the four-step procedure outlined above.

Figure: LLM Tests

Once you have enabled LLM discovery, you can select one or more test modules (from garak and PyRIT), which will be run as part of the penetration test. Please note that each test module comes with its own set of potential consequences. Therefore, you must accept the associated warning when enabling each module.

Figure: LLM Test Modal

The settings you configure will automatically be saved and applied the next time you start a penetration test using the given test profile.

Results

Once you run a penetration test with one or more test modules from garak and/or PyRIT enabled, Web App Scanning will execute the four-step process outlined above and report on found LLM-specific security vulnerabilities. The issues will be reported similar to how regular web-specific issues are reported in the Web App Scanning console.

Figure: LLM Findings

In cases where it makes sense, Web App Scanning will capture a screenshot of the interaction with the model such as the example shown below.

Figure: LLM Chat

If you need assistance interpreting the LLM-specific results found by F5® Distributed Cloud Web App Scanning, please contact our support team.

LLM Discovery and Testing in Web App Scanning

Objective

Procedure

Configuration

Results

On this page: