Modern black-box web application scanners face persistent scalability barriers caused by complex interaction dependencies, dynamic content generation, and deployment fragmentation in distributed environments. As a result, traditional black-box scanners suffer from a critical coverage deficit: they fail to reach the complex, interaction-dependent pages essential for complete forensic analysis and security validation, a deficit that is further exacerbated in distributed, containerized deployments where hidden attack surfaces reside across decoupled nodes. To address this limitation, we present Hoyen, a semantics-aware black-box fuzzing framework that leverages Large Language Models (LLMs) for intention-driven attack path inference in real-world environments. Hoyen integrates three components designed for deep exploration and precision: (1) LLM-driven user intention prediction for deep state navigation, (2) structural content refinement for efficient vulnerability analysis, and (3) topology-aware access point discovery for identifying and mapping hidden or orphaned attack vectors. Extensive evaluations on 12 open-source web applications against six state-of-the-art baselines show that Hoyen achieves twice the average page coverage and over 90% request accuracy. Crucially, Hoyen uncovered 10 XSS vulnerabilities, including 4 unique ones missed by the other scanners, validating its capability for deep vulnerability discovery and precise attack inference. These results establish Hoyen as a robust and scalable foundation for next-generation intelligent information security analysis systems.
Our prototype implementation comprises over 5,000 lines of code. We use Python with Selenium to drive Chrome, a mainstream web browser. By analyzing the rendering result of each page in the browser, we can verify whether the web application renders as expected and collect sufficient contextual information for the LLM, enabling it to understand and interpret the web application effectively.
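To illustrate, the core browser-control loop can be sketched in a few lines of Python; the snippet below is a simplified illustration rather than the prototype itself, and the target URL is a placeholder:

# Simplified sketch of the browser-control loop (illustrative only;
# the actual prototype adds state tracking and LLM prompting on top).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")   # run Chrome without a UI
driver = webdriver.Chrome(options=options)
driver.get("http://target.example/app")  # placeholder target URL
rendered_html = driver.page_source       # DOM after JavaScript execution
# Verify that the page rendered as expected before building LLM context.
assert "<body" in rendered_html.lower()
llm_context = {"url": driver.current_url,
               "title": driver.title,
               "dom": rendered_html}
driver.quit()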
Coverage:
To evaluate the coverage achieved by each scanner, we gather comprehensive access data on the server side and perform comparative analyses across scanners. The collected data is processed and analyzed consistently, allowing us to compare the success and failure rates of access attempts and thereby assess the testing strategies and request accuracy of the different scanners.
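In practice, this comparison reduces to tallying response status classes per scanner in the server access logs. The following sketch illustrates the idea; it assumes the common Apache/Nginx combined log format, and the log path is hypothetical:

# Sketch: request accuracy per scanner from server-side access logs.
import re
from collections import Counter

LOG_LINE = re.compile(r'"[A-Z]+ \S+ HTTP/[\d.]+" (?P<status>\d{3})')

def request_accuracy(log_path):
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            m = LOG_LINE.search(line)
            if m:
                # Treat 2xx/3xx responses as successful access attempts.
                counts["ok" if m.group("status")[0] in "23" else "fail"] += 1
    total = counts["ok"] + counts["fail"]
    return counts["ok"] / total if total else 0.0

print(request_accuracy("logs/hoyen_access.log"))  # hypothetical path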
Vulnerabilities:
To evaluate the vulnerability detection capabilities of the different scanners, we collect and review the vulnerability reports each scanner produces for the same web application, then reproduce the reported vulnerabilities for verification. This process allows us to measure how effectively each scanner identifies vulnerabilities within the web application. However, there is no universally accepted standard for classifying and counting vulnerabilities, especially for deciding whether multiple similar findings should be treated as a single, unique instance; scanners commonly report several distinct XSS vulnerabilities on a single page, or even within a single function or URL. In our analysis, we therefore treat findings with similar trigger paths, the same type of trigger page, and analogous exploit code as a single, unique vulnerability to ensure consistency. By clustering the vulnerabilities identified in the reports, we can determine the number and characteristics of the unique vulnerabilities.
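Under this convention, deduplication amounts to grouping findings by a normalized key. The sketch below illustrates the clustering step; the record fields are hypothetical:

# Sketch: cluster reported findings into unique vulnerabilities.
# A finding is considered unique up to (trigger path, trigger page type,
# normalized exploit payload).
from collections import defaultdict

def unique_vulnerabilities(findings):
    clusters = defaultdict(list)
    for f in findings:
        key = (f["trigger_path"], f["page_type"], f["payload"].strip().lower())
        clusters[key].append(f)
    return clusters  # one key per unique vulnerability

findings = [
    {"trigger_path": "/guestbook", "page_type": "form",
     "payload": "<script>alert(1)</script>"},
    {"trigger_path": "/guestbook", "page_type": "form",
     "payload": "<SCRIPT>alert(1)</SCRIPT> "},
]
print(len(unique_vulnerabilities(findings)))  # -> 1 after normalization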
Web applications:
To ensure that our scanner can effectively process and analyze a wide variety of web applications, we selected 12 distinct open-source web applications. To reflect real-world scenarios, each of these applications has garnered over 100 stars on GitHub, with an average of more than 15,000 stars across the set, indicating widespread adoption. These web applications were deployed on our servers, each in a separate Docker container to prevent interference and ensure consistent test results (an example deployment command is shown below). The 12 web applications fall into two categories: the first includes known vulnerable applications, such as DVWA and WackoPicko, while the second comprises modern, production-grade applications such as WordPress, osCommerce, PrestaShop, Joomla, Drupal, Pico, Typecho, Emlog, Z-Blog, and Jizhi.
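For example, a target such as DVWA can be brought up in an isolated container with a single command (the image name and host port below are illustrative; analogous commands apply to the other applications):

docker run -d --name dvwa -p 8001:80 vulnerables/web-dvwa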
Scanners:
To benchmark coverage and other relevant performance metrics, we compare our scanner, Hoyen, with the following scanners, which are widely used in both academic research and industry: BlackWidow, WebExplor, w3af, ZAP, BurpSuite, and Xray. Among these, BlackWidow and WebExplor are considered state-of-the-art black-box web scanners; note, however, that WebExplor is primarily designed for testing individual pages and has limited capability to scan entire web applications. Xray relies on a passive proxy for scanning, so Rad, officially supported by Xray, is used as an auxiliary tool for active crawling. w3af and ZAP are well-known open-source scanners, frequently referenced in academic comparisons and commonly employed to scan web applications. BurpSuite, on the other hand, represents a popular commercial scanner. While its closed-source nature makes an exhaustive analysis of its features challenging, including BurpSuite in our evaluation allows for a more comprehensive comparison and provides a broader perspective on the strengths of our approach. Each target web application was scanned in a consistent environment using the default configuration of each scanner.
Because changes to the crawler configuration can significantly affect test results, we also provide the configuration used for each scanner to ensure the accuracy and reproducibility of the results.
BurpSuite: In BurpSuite, we enabled both crawl and audit for scanning and used the Balanced scan configuration.
Hoyen: The following command was used to run Hoyen:
python3 hoyen.py -u [url]
w3af: In w3af, we enabled all XSS-related plugins and used web_spider as the crawler.
WebExplor: In WebExplor, we replaced the URL in top_sites with the target address and ran runWild.sh from the shell to start testing.
Xray: Before running Xray, Rad must be started. Rad acts as a crawler that collects web application data and forwards it to Xray through a proxy. The commands are as follows:
rad -t [URL] -http-proxy 127.0.0.1:17777
xray webscan --listen 127.0.0.1:17777 --html-output result.html
ZAP: In ZAP, we used the automated scan with both the traditional spider and the AJAX spider.