
Copilot Vision has been available for some time, but its recent integration into the Windows 11 Copilot app marks a significant leap forward. Having tested it a few months ago when it was confined to the browser, I found it underwhelming. However, its desktop application version, now accessible to users outside the United States (excluding EU regions), reveals considerable improvements.
Initially, my experience with Vision in the Edge browser was limited: it could only interact with the currently open tab. The notable upgrade in the Windows desktop app is that it is no longer confined to a single browser tab; users can select any open window. This greatly expands its usability across applications, whether that means a command prompt, statistics from an app, or a game window, though I did not test it in a gaming context.

Upon launching the desktop app and selecting the Vision feature, I was presented with a menu of open windows to choose from. For my first test, I opened an article on building a media server. The interaction felt similar to my earlier browser experience, and the AI held a fluid conversation. It fell short, however, when I asked which operating system the article's author used: that detail appeared too far down the page, outside the visible area, for Vision to retrieve it.
One limitation remains: Vision can only access content visible within the selected window. This means it cannot scroll or access additional information on a web page or interact with buttons directly. It can, however, guide users by highlighting relevant buttons with a distinct animation, but the final interaction still requires manual clicking.

Exploring Enhanced Capabilities
In a departure from its earlier limitations, Copilot Vision can now search the web for additional information. When I asked for the author's job title at the publishing company, it initially said it didn't know and asked for permission to search online. Once granted, it correctly supplied the title and further details from the author's page, rephrasing the information reasonably well.

To further assess the AI's capabilities, I showed it the output of a shell script from my DietPi setup. Here the assistant correctly outlined the purpose of each command without needing to consult the web.

Next, I displayed only the commands and requested clarification. Vision accurately described each parameter’s function, suggesting a robust internal knowledge base, as it did not reference online sources.
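To give a sense of what that looks like, here are a few generic commands of the kind in question, with the sort of flag-level explanations Vision produced noted as comments. These are illustrative only, not the actual commands from my DietPi script:

```shell
# Illustrative examples: common diagnostic commands with their flag meanings,
# the kind of parameter-by-parameter breakdown Vision provided.
df -h        # -h: report disk usage in human-readable units (K, M, G)
free -m      # -m: report memory usage in megabytes
uname -a     # -a: print all available system information (kernel, host, arch)
```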

To further validate its accuracy, I tested it on a list of Docker commands I hadn't previously introduced. Vision described the actions of the first four commands effectively, but I had to prompt it again to get it to continue past that point.

Concluding my assessment, the responses were generally accurate, though it remained unclear whether Vision was drawing on online resources or relying solely on its own training data.

This overview of Copilot Vision on Windows 11 underscores how far it has come. If you are comfortable with Copilot's data policies, I encourage you to explore its capabilities: it's seamlessly integrated into the app.