
Microsoft’s NLWeb: An Overview
In May 2025, during its Build 2025 conference, Microsoft unveiled NLWeb, short for “Natural Language Web.” The project aims to let AI agents interact intelligently with websites and carry out complex tasks by engaging directly with online services.
Implications of NLWeb
Shopify and TripAdvisor are among the companies that have expressed support for the concept, and Microsoft positions NLWeb as a key component of what it calls the “agentic web.” This vision promises a future in which AI carries out a range of online tasks autonomously, significantly streamlining the user experience.
Discovery of Security Vulnerabilities
However, significant concerns have since emerged about the security of the NLWeb framework. Security researcher Aonan Guan and collaborator Lei Wang identified a path traversal vulnerability in the open-source project while examining the NLWeb GitHub repository, specifically a file named webserver/static_file_handler.py.
Technical Analysis of the Vulnerability
The identified flaw lies within a specific segment of code:
# The vulnerable code snippet
safe_path = os.path.normpath(path.lstrip('/'))
possible_roots = [
    APP_ROOT,
    os.path.join(APP_ROOT, 'site', 'wwwroot'),
    '/home/site/wwwroot',
    os.environ.get('HOME', ''),
]
# Later in the code…
full_path = os.path.join(root, safe_path)
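The first line of this snippet looks innocuous. As the official Python documentation explains, os.path.normpath() normalizes a path by collapsing redundant separators and up-level references, so that A//B, A/B/, A/./B and A/foo/../B all become A/B. It does not, however, confine the result to any directory: in a relative path that begins with ../, the leading up-level references have nothing to collapse against, so they survive normalization.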
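The point about up-level references is easier to see with concrete inputs. This short sketch uses posixpath, the POSIX implementation behind os.path on Linux and macOS, so the results are the same on any platform; the sample paths are illustrative and not taken from NLWeb.

```python
import posixpath

# A ".." that follows a real directory name cancels that name out.
inner = posixpath.normpath("css/../js/app.js")

# A leading ".." in a relative path has nothing to cancel, so it is kept.
leading = posixpath.normpath("../../../etc/passwd")

print(inner)    # js/app.js
print(leading)  # ../../../etc/passwd
```

In other words, normpath() produces a tidier path, not a safer one.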
Impact of the Vulnerability
Normalization, however, is not access control. According to Guan, nothing in this code prevents a requester from using directory traversal sequences such as ../ to climb out of the intended web directory: os.path.join() simply appends the surviving ../ components to each candidate root, and the resulting path points outside it.
To confirm the issue, Guan ran a local server listening on 0.0.0.0:8000. Running curl "http://localhost:8000/static/..%2f..%2f..%2fetc/passwd" returned the contents of /etc/passwd, the file on Unix-like systems that lists user account details.
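Guan's request can be traced through the quoted logic with a small Python reproduction. This is a sketch under stated assumptions, not the actual NLWeb handler: it assumes the server strips the /static/ route prefix and URL-decodes the remainder before the handler sees it, and both APP_ROOT's value and the function name vulnerable_resolve are hypothetical. Encoding the slashes as %2f also keeps the ../ sequences intact on the client side, since by default curl squashes literal /../ segments in a URL path before sending it unless run with --path-as-is.

```python
import posixpath
from urllib.parse import unquote

APP_ROOT = "/srv/nlweb/code"  # hypothetical install directory


def vulnerable_resolve(raw_path: str, root: str = APP_ROOT) -> str:
    """Mimic the quoted snippet: decode, strip leading '/', normalize, join."""
    path = unquote(raw_path)                          # "..%2f" becomes "../"
    safe_path = posixpath.normpath(path.lstrip("/"))  # leading ".." survives
    full_path = posixpath.join(root, safe_path)
    # Normalize once more only to show where the OS would end up looking.
    return posixpath.normpath(full_path)


resolved = vulnerable_resolve("..%2f..%2f..%2fetc/passwd")
print(resolved)  # /etc/passwd
```

Three ../ components are enough to climb from a three-level root such as /srv/nlweb/code to the filesystem root, after which etc/passwd is appended.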
Guan was also able to read other sensitive files, including the project’s .env file, which must stay confidential because it holds credentials such as API keys. He demonstrated this by running curl "http://localhost:8000/static/..%2f..%2f..%2fUsers//NLWeb/code/.env".
Microsoft’s Response and Recommendations
Guan reported the findings on May 28. Microsoft acknowledged the issue the same day and implemented a fix within 48 hours. The fix introduced several critical checks:
- An initial filter that rejects any path containing .. to counter directory traversal attempts.
- Verification that the requested file has an allowed extension, such as .html, .css, or .json.
- Resolution of the absolute path to confirm that it resides within an authorized root directory, preventing unauthorized access.
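The three checks can be combined into a single validation function. This is a minimal sketch of the approach described, not Microsoft's actual patch: the function name is hypothetical, the extension set contains only the examples named above, and path is assumed to have been URL-decoded already. The ".." filter is deliberately crude and will also reject harmless names such as a..b.css.

```python
import os

# Only the extensions named in the fix description; the real list may differ.
ALLOWED_EXTENSIONS = {".html", ".css", ".json"}


def is_safe_static_path(path: str, root: str) -> bool:
    """Return True only if `path` passes all three checks (hypothetical helper)."""
    # Check 1: reject anything that contains "..".
    if ".." in path:
        return False

    # Check 2: only serve allowlisted file types.
    # os.path.splitext(".env") returns (".env", ""), so dotfiles have no extension.
    _, ext = os.path.splitext(path)
    if ext.lower() not in ALLOWED_EXTENSIONS:
        return False

    # Check 3: resolve symlinks and ".." fully, then require the result
    # to stay inside the resolved root directory.
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, path.lstrip("/")))
    return os.path.commonpath([root_real, candidate]) == root_real


print(is_safe_static_path("css/site.css", "/srv/nlweb/static"))         # True
print(is_safe_static_path("../../../etc/passwd", "/srv/nlweb/static"))  # False
print(is_safe_static_path(".env", "/srv/nlweb/static"))                 # False
```

Comparing with os.path.commonpath() rather than str.startswith() matters: commonpath() works on whole path components, so a sibling directory such as /srv/nlweb/static-private is not mistaken for a location inside /srv/nlweb/static.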
In light of this incident, Guan urges anyone running NLWeb to update immediately. He warns that as the agentic web expands, it also opens new attack surfaces: a system driven by natural language may inadvertently act on malicious file paths or commands unless its inputs are handled with the utmost scrutiny.
You can find more information about this issue and the complete vulnerability report on Neowin.