Black-box scanners have played a significant role in detecting vulnerabilities in web applications. A key focus in current black-box scanning is increasing test coverage (i.e., accessing more web pages). However, since many web applications are user-oriented, some deep pages can only be reached through complex user interactions, which are difficult for existing black-box scanners to perform. To fill this gap, our key insight is that web pages contain a wealth of semantic information that can aid in understanding potential user intention. Based on this insight, we propose Hoyen, a black-box scanner that uses a Large Language Model (LLM) to predict user intention and provide guidance for expanding the scanning scope. Hoyen has been rigorously evaluated on 12 popular open-source web applications and compared with 6 representative tools. The results demonstrate that Hoyen performs a comprehensive exploration of web applications, expanding the attack surface while achieving about 2x the coverage of other scanners on average, with high request accuracy. Furthermore, Hoyen directed over 90% of its requests towards the core functionality of the application and detected more vulnerabilities than other scanners, including unique vulnerabilities in well-known web applications.
Our prototype implementation consists of over 5,000 lines of code. We use Python and Selenium to control the mainstream web browser Chrome. By analyzing page rendering results in the browser, we can verify whether the web application renders as expected and obtain sufficient contextual information for the LLM, enabling it to understand and interpret the web application effectively.
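As a rough illustration, the sketch below shows how such rendering context might be collected with Selenium-controlled Chrome; the helper name collect_page_context and the exact fields gathered are illustrative assumptions, not Hoyen's actual code.

# Illustrative sketch (not Hoyen's actual code): collect rendered-page
# context from Selenium-controlled Chrome as input for the LLM.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

def collect_page_context(url: str) -> dict:
    options = Options()
    options.add_argument("--headless=new")  # render without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # page_source returns the DOM after JavaScript execution, i.e. what a
        # real user would see, rather than the raw HTTP response body.
        return {
            "url": driver.current_url,
            "title": driver.title,
            "dom": driver.page_source,
            "forms": len(driver.find_elements(By.TAG_NAME, "form")),
            "links": len(driver.find_elements(By.TAG_NAME, "a")),
        }
    finally:
        driver.quit()

Checking that the page rendered as expected (e.g., a non-empty title and DOM) before prompting the LLM avoids wasting queries on error pages.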
Coverage:
To evaluate the coverage achieved by each scanner, we gather comprehensive access data on the server side and perform comparative analyses across scanners. The collected data is processed and analyzed in a consistent manner, allowing us to compare the success and failure rates of access attempts. This enables us to assess the testing strategies and request accuracy of different scanners.
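The sketch below illustrates one way such server-side access data could be summarized per scanner; it assumes the standard combined access-log format and simple proxies for coverage and accuracy, not our exact analysis pipeline.

# Illustrative sketch (assumed combined log format, not our exact pipeline):
# derive coverage and request-accuracy proxies from a server access log.
import re
from collections import Counter

LOG_RE = re.compile(r'"(?P<method>\w+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def summarize_access_log(path: str) -> dict:
    paths, statuses = set(), Counter()
    with open(path) as log:
        for line in log:
            m = LOG_RE.search(line)
            if not m:
                continue
            paths.add(m.group("path"))
            statuses[m.group("status")] += 1
    total = sum(statuses.values())
    # Treat 2xx/3xx responses as successful access attempts.
    ok = sum(n for code, n in statuses.items() if code.startswith(("2", "3")))
    return {
        "unique_paths": len(paths),                     # coverage proxy
        "requests": total,
        "success_rate": ok / total if total else 0.0,   # accuracy proxy
    }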
Vulnerabilities:
To evaluate the vulnerability detection capabilities of different scanners, we compile and review vulnerability test reports for the same web application from various scanners, then manually reproduce the reported vulnerabilities for verification. This process enables us to measure the effectiveness of each scanner in identifying vulnerabilities within the web application. However, there is no universally accepted standard for classifying and evaluating vulnerabilities, especially when determining whether multiple similar vulnerabilities should be treated as a single, unique instance. It is common for scanners to report several distinct XSS vulnerabilities on a single web application page or even within a specific function or URL. In our analysis, we consider vulnerabilities with similar trigger paths, the same type of trigger page, and analogous exploit code as a single, unique vulnerability to ensure consistency. By clustering the vulnerabilities identified in the reports, we can determine the number and characteristics of unique vulnerabilities.
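The sketch below illustrates this deduplication rule; the report fields (trigger_url, page_type, payload) and the normalization steps are assumed names for illustration, not our actual report schema.

# Simplified sketch of the clustering rule described above (field names are
# assumptions): reports sharing a similar trigger path, the same trigger-page
# type, and analogous exploit code collapse into one unique vulnerability.
from urllib.parse import urlparse

def cluster_key(report: dict) -> tuple:
    # Normalize the trigger URL down to its path, so query-string variants of
    # the same endpoint count as the same trigger path.
    path = urlparse(report["trigger_url"]).path
    # Coarse payload signature: strip digits so randomized markers in
    # otherwise-identical exploit code do not split a cluster.
    payload_sig = "".join(c for c in report["payload"] if not c.isdigit())
    return (path, report["page_type"], payload_sig)

def unique_vulnerabilities(reports: list) -> int:
    return len({cluster_key(r) for r in reports})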
Web applications:
To ensure that our scanner can effectively process and analyze a wide variety of web applications, we selected 12 distinct open-source web applications. To reflect real-world scenarios, each of these applications has garnered over 100 stars on GitHub, with an average of over 15,000 stars across the set, indicating widespread adoption across various web platforms. These web applications were deployed on our servers, each in a separate Docker container to prevent interference and ensure consistent test results. The 12 web applications fall into two categories: the first includes known vulnerable applications, such as DVWA and WackoPicko, while the second comprises modern, production-grade applications: WordPress, OsCommerce, Prestashop, Joomla, Drupal, Pico, Typecho, Emlog, Z-Blog, and Jizhi.
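As an illustration of this isolation, the sketch below starts each target in its own container via the Docker SDK for Python; the image names and port assignments are examples, not our exact deployment.

# Hedged sketch (example images/ports, not our exact deployment): run each
# target application in its own container so scans cannot interfere.
import docker

APPS = {"wordpress": 8081, "drupal": 8082}  # example subset of the 12 targets

client = docker.from_env()
for image, host_port in APPS.items():
    client.containers.run(
        image,
        detach=True,
        name=f"scan-target-{image}",
        ports={"80/tcp": host_port},  # each app isolated on its own host port
    )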
Scanners:
To benchmark coverage and other relevant performance metrics, we compare our scanner, Hoyen, with the following scanners, which are widely used in both academic research and industry: BlackWidow, WebExplor, w3af, ZAP, BurpSuite, and Xray. Among these, BlackWidow and WebExplor are considered state-of-the-art black-box web scanners. However, it is important to note that WebExplor is primarily designed for individual page testing and has limited capability to scan entire web applications. Xray relies on a passive proxy for scanning, so Rad, officially supported by Xray, is used as an auxiliary tool for active crawling. Additionally, w3af and ZAP are well-known open-source scanners, frequently referenced in academic comparisons, and are commonly employed to scan web applications. BurpSuite, on the other hand, represents a popular commercial scanner. While its closed-source nature makes an exhaustive analysis of its features challenging, including BurpSuite in our evaluation allows for a more comprehensive comparison of our scanner's capabilities, and comparing against a commercial scanner provides a broader perspective on the strengths of our approach. Each target web application was scanned in a consistent environment using the default configuration of each scanner.
Since changes to the crawler configuration can significantly affect test results, we also document the configuration of each scanner to ensure the results can be reproduced accurately.
BurpSuite: In BurpSuite, we enabled both crawl and audit for scanning and used the Balanced scan configuration.
Hoyen: The following command was used to run Hoyen:
python3 hoyen.py -u [url]
w3af: In w3af, we enabled all the plugins related to XSS and used web_spider as the crawler.
WebExplor: In WebExplor, we changed the URL in top_sites and ran runWild.sh to start testing.
Xray: Rad must be run before Xray. Rad acts as a crawler that collects web application data and forwards it to Xray through a proxy. The commands are as follows:
rad -t [URL] -http-proxy 127.0.0.1:17777
xray webscan --listen 127.0.0.1:17777 --html-output result.html
ZAP: In ZAP, we used the automated scan with both the traditional spider and the AJAX spider.