This document proposes a new API aimed at grabbing a screenshot of either the current viewport, the current window or the current screen.

Extending this API beyond the current viewport/window/screen is possible, but we avoid it for the time being. The rationale for this decision is discussed further below.

This is an early draft of a proposal for an API for grabbing screenshots.

Terminology

Definitions for This Document

For the purposes of this document, when the viewport is discussed, we mean the viewport of the top-level document; essentially the entire user-visible part of the user agent tab.

Definitions from Other Documents

This document uses the definitions of {{MediaStream}} and {{MediaStreamTrack}} from [[!MEDIACAPTURE-STREAMS]].

This document uses the definitions of display surfaces from [[!SCREEN-CAPTURE]], including the distinction between browser, window and monitor display surfaces.

The terms fulfilled, rejected and resolved used in the context of Promises are defined in [[ECMA-262]].

Introduction

Goals

We define a "screenshot" as an image representing what the user sees on their screen, or part thereof. More precisely, a screenshot is an image of all or part of the user's current browser, window or monitor display surface.

Non-Goals

We avoid extended definitions for the time being. For the purposes of this document, the following are NOT considered a screenshot:

  1. The "Capture full size screenshot" functionality in various browsers' Developer Console does not match our definition of a screenshot. We avoid this use case due to the security and privacy complexities inherent in capturing more than the user can see: content outside the visible area cannot be visually inspected by the user, leaving no safeguard against over-capture by a malicious application.
  2. An image of a display surface other than the current one does not match our definition of a "screenshot." We avoid this use case due to the API and UX complexities inherent in obtaining an image from an arbitrary "foreign" source.

Necessity of this API

Sample Use Case

Applications often present end users the option to file feedback. This feedback can be used to report bugs about the application itself, or about its interaction with either the user agent or the operating system. When reporting such defects, it is often true that a picture is worth a thousand words.

Insufficiency of Workarounds

Several mechanisms currently exist for an application to grab a screenshot. Each has significant drawbacks, necessitating the definition of a new API.

Workarounds using getDisplayMedia

Applications sometimes use getDisplayMedia, then attempt to grab a single frame from the resulting video track. This workaround is problematic on several counts:

  1. The user is compelled to grant the application permission to grab multiple frames. If the user is not careful to manually revoke that permission, the application could be spying on the user for a long time after the initial permission is granted.
  2. Even if the user or the application stops the capture after a single frame, the user does not really know what that frame contained: the application can quickly flash a cross-origin iframe, record the frame in which that iframe was visible, then immediately hide or unload the iframe and terminate the capture.

Workarounds using Redrawing

Some libraries exist for redrawing part of the DOM onto a canvas.

  1. The main problem with such workarounds is that they do not work well for cross-origin content, leading to a discrepancy between what the user sees on their screen, and what is captured as an image.
  2. Additionally, these solutions are by definition confined to producing a screenshot of the browser display surface, whereas an ideal screenshot-producing solution would allow grabbing an image of the window or monitor.

Workarounds using Extensions

Various browser extensions aim to fill this gap in the web platform. Reliance on extensions suffers from several issues:

  1. Most importantly, the various well-known privacy and security issues associated with compelling the user to install third-party software.
  2. Awkward user flow requiring the installation of what essentially amounts to third-party software.
  3. Specificity to a user agent.

Workarounds using Manual Screenshot Upload

It is possible to direct the user to take a screenshot manually, using whatever tools are available on the user's operating system, and then ask the user to hand the application that file.

  1. This creates a very awkward user journey, and requires that the user either already know how to take a screenshot, or be instructed how to do so on their specific operating system. On many popular modern operating systems, the required keyboard shortcuts are elaborate, excluding less tech-savvy users.
  2. The user's choice is limited to grabbing a screenshot of the entire screen or of a window; the ability to take a screenshot of just the web application is missing.

Workarounds using Developer Tools

Most modern browsers include a Developer Tools module which includes the ability to grab a screenshot. An application can direct the user to use one of these and to then share with the application the resulting image.

It should go without saying that this workaround is highly perilous for the user, as directions to use the Developer Tools module could easily lead to the user being asked to help the application exceed the permissions which it should have.

Reputable applications would never direct the user to make use of the Developer Tools module. Tech-savvy users would find such instruction highly suspect, whereas unsavvy users would often find the instruction difficult to follow.

Conclusion

No good workaround exists, yet the capability is highly desired on the web, as evidenced by the many high-profile applications which provide this functionality using complicated workarounds. An API for accomplishing this is necessary.

API

captureScreenshot

          partial interface MediaDevices {
            Promise<CaptureScreenshotResultType> captureScreenshot(ScreenshotSurfaceType surface);
          };
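A minimal sketch of how a feedback form might call this method, assuming a user agent that implements this proposal. Because this is a draft proposal, the sketch takes a hypothetical `ScreenshotCapable` object in place of `navigator.mediaDevices`, and leaves the image type generic instead of `ImageBitmap`, so that the control flow can be exercised with a stub. The names `ScreenshotCapable`, `FeedbackScreenshot` and `requestFeedbackScreenshot` are illustrative and not part of the proposal.

```typescript
// Mirrors the WebIDL above; TImage stands in for ImageBitmap.
type ScreenshotSurfaceType = "viewport" | "window";

interface CaptureScreenshotResultType<TImage> {
  surface: ScreenshotSurfaceType;
  image: TImage;
}

// Hypothetical stand-in for the partial MediaDevices interface.
interface ScreenshotCapable<TImage> {
  captureScreenshot(
    surface: ScreenshotSurfaceType,
  ): Promise<CaptureScreenshotResultType<TImage>>;
}

// Illustrative outcome reported back to the feedback UI.
type FeedbackScreenshot<TImage> =
  | { kind: "attached"; surface: ScreenshotSurfaceType; image: TImage }
  | { kind: "declined" } // the user rejected the preview (AbortError)
  | { kind: "unavailable"; reason: string };

// Must be called from a user-activation handler (e.g. a click listener);
// otherwise the user agent rejects with InvalidStateError.
async function requestFeedbackScreenshot<TImage>(
  devices: ScreenshotCapable<TImage>,
): Promise<FeedbackScreenshot<TImage>> {
  try {
    const result = await devices.captureScreenshot("window");
    // The user may have approved a smaller surface than "window".
    return { kind: "attached", surface: result.surface, image: result.image };
  } catch (err) {
    const name = (err as { name?: string } | null | undefined)?.name;
    if (name === "AbortError") {
      return { kind: "declined" };
    }
    // NotAllowedError (permissions policy), InvalidStateError (no transient
    // activation or document not fully active), or a heuristic rejection.
    return { kind: "unavailable", reason: name ?? "unknown" };
  }
}
```

Treating `AbortError` separately lets the application distinguish a user who simply declined from a context in which the API can never succeed.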
        
captureScreenshot

Prompts the user for permission to grab a screenshot of either surface or any surface [=surface-size|contained=] within it. The screenshot is presented to the user for approval and editing before it is handed to the application.

When the {{captureScreenshot()}} method is invoked, the user agent MUST run the following steps before proceeding:

  1. If the [=relevant global object=] of [=this=] does not have [=transient activation=], the user agent MUST return a [=rejected=] promise with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
  2. If the current settings object's [=environment settings object/responsible document=] is NOT [=Document/fully active=], the user agent MUST return a [=rejected=] promise with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
  3. If {{captureScreenshot()}} is called from a document lacking the [=screenshot-policy|screenshot=] {{PermissionsPolicy}}, the user agent MUST return a [=rejected=] promise with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{NotAllowedError}}.
  4. The user agent SHOULD inspect the DOM and return a [=rejected=] promise if suspicious behavior is detected, such as:
    • An overlaid cross-origin iframe at an opacity that is likely to escape the user's notice.
    • An overlaid cross-origin iframe displayed inside the viewport at a size that the user agent suspects is intended to escape the user's notice.
  5. The user agent MAY compare the document's URL against a list of suspicious sites and return a [=rejected=] promise if it believes the likelihood of trickery is high. The user agent MAY also degrade surface to a [=surface-size|smaller=] surface based on the same criteria, or otherwise let the URL influence its sensitivity when evaluating suspicious behavior in any of the previous steps.
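The validation steps above can be sketched as a pure function over a hypothetical snapshot of the calling context. The `CallContext` fields, the single `looksSuspicious` flag (standing in for the user-agent-specific heuristics of steps 4 and 5) and the function name are assumptions for illustration; a real user agent would consult its own activation, document-lifecycle and permissions-policy state. The steps above do not name an error for heuristic rejections, so the sketch uses a placeholder.

```typescript
// Hypothetical snapshot of what a user agent inspects when
// captureScreenshot() is invoked.
interface CallContext {
  hasTransientActivation: boolean; // step 1
  documentFullyActive: boolean; // step 2
  screenshotPolicyEnabled: boolean; // step 3: "screenshot" feature
  looksSuspicious: boolean; // steps 4-5: UA-specific heuristics
}

type PrecheckResult =
  | { ok: true }
  | {
      ok: false;
      reason: "InvalidStateError" | "NotAllowedError" | "SuspiciousContent";
    };

// Runs the checks in the order listed above; the first failure wins.
function precheckCaptureScreenshot(ctx: CallContext): PrecheckResult {
  if (!ctx.hasTransientActivation) {
    return { ok: false, reason: "InvalidStateError" };
  }
  if (!ctx.documentFullyActive) {
    return { ok: false, reason: "InvalidStateError" };
  }
  if (!ctx.screenshotPolicyEnabled) {
    return { ok: false, reason: "NotAllowedError" };
  }
  if (ctx.looksSuspicious) {
    // Placeholder only: the proposal leaves this error name open, and
    // "SuspiciousContent" is not a DOMException name.
    return { ok: false, reason: "SuspiciousContent" };
  }
  return { ok: true };
}
```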

If all of these checks pass, the user agent MUST return a new promise p to the application and present the user with a prompt containing a preview of the screenshot.

If surface is [=surface-size|larger=] than {{ScreenshotSurfaceType/viewport}}, the user agent MUST nevertheless offer all [=surface-size|smaller=] surfaces as well, so that the user is not compelled to share more than they would like.
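One way a user agent might compute the choices offered in the prompt, assuming the [=surface-size=] ordering given by the declaration order of {{ScreenshotSurfaceType}} (smallest first). `surfacesToOffer` is a hypothetical helper name.

```typescript
type ScreenshotSurfaceType = "viewport" | "window";

// Smallest to largest, matching the enum's declaration order.
const SURFACE_ORDER: readonly ScreenshotSurfaceType[] = ["viewport", "window"];

// Returns the requested surface together with every smaller surface, so the
// user can always choose to share less than the application asked for.
function surfacesToOffer(requested: ScreenshotSurfaceType): ScreenshotSurfaceType[] {
  const limit = SURFACE_ORDER.indexOf(requested);
  return SURFACE_ORDER.slice(0, limit + 1);
}
```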

This preview MUST be large enough for the user to be able to inspect it reasonably. It is recommended that the user agent provide a mechanism to zoom in or magnify the preview.

The user agent SHOULD provide the user with mechanisms to crop the image or black out parts of it. Although the application could perform such image manipulation itself, and may be better positioned to do so than the user agent, the rationale here is that the user should be able to limit the information shared with the application in the first place.
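A sketch of the two edits mentioned above, applied to a raw RGBA pixel buffer before anything reaches the application. The `RgbaImage` and `Rect` types and the `crop` and `blackOut` functions are hypothetical; a real user agent would operate on its own internal image representation.

```typescript
// Minimal RGBA image: 4 bytes per pixel, rows stored top to bottom.
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

interface Rect { x: number; y: number; width: number; height: number; }

// Clamps a rectangle to the image bounds.
function clampRect(img: RgbaImage, r: Rect): Rect {
  const x0 = Math.max(0, Math.min(img.width, r.x));
  const y0 = Math.max(0, Math.min(img.height, r.y));
  const x1 = Math.max(x0, Math.min(img.width, r.x + r.width));
  const y1 = Math.max(y0, Math.min(img.height, r.y + r.height));
  return { x: x0, y: y0, width: x1 - x0, height: y1 - y0 };
}

// Returns a new image containing only the pixels inside `r`.
function crop(img: RgbaImage, r: Rect): RgbaImage {
  const c = clampRect(img, r);
  const out = new Uint8ClampedArray(c.width * c.height * 4);
  for (let row = 0; row < c.height; row++) {
    const start = ((c.y + row) * img.width + c.x) * 4;
    out.set(img.data.subarray(start, start + c.width * 4), row * c.width * 4);
  }
  return { width: c.width, height: c.height, data: out };
}

// Overwrites the pixels inside `r` with opaque black, in place.
function blackOut(img: RgbaImage, r: Rect): void {
  const c = clampRect(img, r);
  for (let y = c.y; y < c.y + c.height; y++) {
    for (let x = c.x; x < c.x + c.width; x++) {
      const i = (y * img.width + x) * 4;
      img.data[i] = 0;
      img.data[i + 1] = 0;
      img.data[i + 2] = 0;
      img.data[i + 3] = 255;
    }
  }
}
```

Because both operations run inside the user agent, pixels the user removes are never exposed to the page at all.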

The user agent SHOULD add a random delay between when {{captureScreenshot()}} is called and when the screenshot is taken and presented to the user. This makes it harder for a malicious application to flash new content to the screen exactly at the time that the preview is presented to the user, thereby escaping the user's notice and gaining their approval of the screenshot based on the content they saw before the preview was presented.
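A sketch of the randomized delay, assuming a uniformly distributed delay between bounds `minMs` and `maxMs` chosen by the user agent; the proposal does not specify a distribution or range, and both function names are hypothetical. The sleep function and random source are injected so the timing logic can be tested without real timers.

```typescript
// Picks a delay uniformly in [minMs, maxMs) using `random`, which must
// return values in [0, 1) like Math.random.
function pickCaptureDelay(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  if (!(minMs >= 0 && maxMs > minMs)) {
    throw new RangeError("expected 0 <= minMs < maxMs");
  }
  return minMs + random() * (maxMs - minMs);
}

// Waits for the random delay and only then grabs the frame shown in the
// preview, so the page cannot predict exactly when capture happens.
async function captureAfterRandomDelay<TFrame>(
  grabFrame: () => TFrame,
  sleep: (ms: number) => Promise<void>,
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): Promise<TFrame> {
  await sleep(pickCaptureDelay(minMs, maxMs, random));
  return grabFrame();
}
```

In a browser-like environment, `sleep` could be `(ms) => new Promise((r) => setTimeout(r, ms))`.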

If the user does not approve the screenshot, the user agent MUST [=reject=] p with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{AbortError}}.

If the user approves the screenshot, the user agent MUST [=resolve=] p with a {{CaptureScreenshotResultType}} constructed with the {{CaptureScreenshotResultType/surface}} the user ultimately chose and the {{CaptureScreenshotResultType/image}} resulting from any cropping or other editing the user applied to the initial screenshot.
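The two outcomes of the prompt can be sketched as follows, assuming a hypothetical `PromptDecision` value produced by the user agent's preview UI. To keep the sketch runnable outside a browser, it rejects with a small `ScreenshotPromptError` class carrying the same `name` a real {{DOMException}} would have; that class and `settleScreenshotPromise` are illustrative names.

```typescript
type ScreenshotSurfaceType = "viewport" | "window";

interface CaptureScreenshotResultType<TImage> {
  surface: ScreenshotSurfaceType;
  image: TImage;
}

// Hypothetical output of the user agent's preview and editing UI.
type PromptDecision<TImage> =
  | { approved: false }
  | { approved: true; chosenSurface: ScreenshotSurfaceType; editedImage: TImage };

// Stand-in for a DOMException with the given name.
class ScreenshotPromptError extends Error {
  constructor(name: string, message: string) {
    super(message);
    this.name = name;
  }
}

// Settles promise p (via its resolve/reject functions) once the user decides.
function settleScreenshotPromise<TImage>(
  decision: PromptDecision<TImage>,
  resolve: (result: CaptureScreenshotResultType<TImage>) => void,
  reject: (error: Error) => void,
): void {
  if (!decision.approved) {
    reject(new ScreenshotPromptError("AbortError", "The user declined the screenshot."));
    return;
  }
  // The result reflects the user's choices, which may differ from the request.
  resolve({ surface: decision.chosenSurface, image: decision.editedImage });
}
```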

ScreenshotSurfaceType

Describes the different surfaces which can be captured as a screenshot. As input to {{captureScreenshot()}}, it describes which surface the application is interested in capturing. As the {{CaptureScreenshotResultType/surface}} of the result, it describes which surface the user approved capturing.

Note that a value for monitor display surfaces is not currently defined. This is intentional: it avoids, for now, the complicated topic of calling {{captureScreenshot()}} from a document presented in a window that spans multiple monitors.

          enum ScreenshotSurfaceType {
            "viewport",
            "window",
          };
        
Enumeration description
viewport: The viewport associated with the document from which {{captureScreenshot}} was called. Note that this means that an embedded document captures its parent document as well as sibling documents.
window: The window displaying the document from which {{captureScreenshot}} was called.

With respect to surface-size, note that each surface in this enum is contained within the next one, allowing us to define "larger" surfaces as those that appear later in this enum.
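Because the relation is defined purely by declaration order, it can be computed from an index lookup. A minimal sketch, assuming the two enum values above; `rank`, `isLarger` and `contains` are hypothetical helper names.

```typescript
type ScreenshotSurfaceType = "viewport" | "window";

// Declaration order of the enum: each surface is contained within the next.
const SURFACE_ORDER: readonly ScreenshotSurfaceType[] = ["viewport", "window"];

function rank(surface: ScreenshotSurfaceType): number {
  return SURFACE_ORDER.indexOf(surface);
}

// True if `a` is strictly larger than `b`, i.e. appears later in the enum.
function isLarger(a: ScreenshotSurfaceType, b: ScreenshotSurfaceType): boolean {
  return rank(a) > rank(b);
}

// True if `outer` contains `inner`; every surface contains itself.
function contains(outer: ScreenshotSurfaceType, inner: ScreenshotSurfaceType): boolean {
  return rank(outer) >= rank(inner);
}
```

Should a monitor value be added later, appending it after "window" would keep these comparisons valid without further changes.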

CaptureScreenshotResultType

Successful invocations of {{captureScreenshot}}, where the user approves a capture, produce a result of type {{CaptureScreenshotResultType}}.

          [Exposed=Window]
          interface CaptureScreenshotResultType {
            readonly attribute ScreenshotSurfaceType surface;
            readonly attribute ImageBitmap image;
          };
        
surface

The surface which the user approved capturing. This may be a [=surface-size|smaller=] surface than the one requested by the application.

image

The screenshot, after applying any editing that the user might have chosen to perform, such as cropping and/or blacking out parts of the image.
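Because the user may approve a smaller surface than the one requested, an application can compare the two before using the image. A sketch, assuming the result shape from the IDL above with `ImageBitmap` replaced by a generic parameter; `describeGrant` is a hypothetical helper.

```typescript
type ScreenshotSurfaceType = "viewport" | "window";

interface CaptureScreenshotResultType<TImage> {
  surface: ScreenshotSurfaceType;
  image: TImage;
}

// Smallest to largest, matching the enum's declaration order.
const SURFACE_ORDER: readonly ScreenshotSurfaceType[] = ["viewport", "window"];

// Summarizes what the user actually shared relative to the request.
// Note: the image may also have been cropped or partly blacked out;
// the surface value alone does not reveal that.
function describeGrant<TImage>(
  requested: ScreenshotSurfaceType,
  result: CaptureScreenshotResultType<TImage>,
): "as-requested" | "smaller-than-requested" {
  return SURFACE_ORDER.indexOf(result.surface) < SURFACE_ORDER.indexOf(requested)
    ? "smaller-than-requested"
    : "as-requested";
}
```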

Screenshot Permissions Policy

This document defines a [=policy-controlled feature=] identified by the string "screenshot". Its [=default allowlist=] is "self".
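Under the Permissions Policy model, a default allowlist of "self" makes the feature available to top-level documents and to same-origin nested documents, while a cross-origin iframe must be granted it explicitly, for example with `allow="screenshot"`. The sketch below models that inheritance over a hypothetical frame tree. It is a simplification: it ignores `Permissions-Policy` response headers and assumes each iframe's document is still at the origin named by its `src`. The `Frame` type and `isScreenshotEnabled` are illustrative names.

```typescript
// Hypothetical model of a frame tree; origins are serialized strings.
interface Frame {
  origin: string;
  parent?: Frame;
  // True if the <iframe> embedding this frame has allow="screenshot"
  // (whose default allowlist is the iframe's src origin, assumed to match).
  allowScreenshotAttr?: boolean;
}

// Simplified inheritance for a feature whose default allowlist is "self".
function isScreenshotEnabled(frame: Frame): boolean {
  const parent = frame.parent;
  if (parent === undefined) {
    // Top-level document: enabled by default (no response header modeled).
    return true;
  }
  if (!isScreenshotEnabled(parent)) {
    // A feature disabled in the parent cannot be re-enabled below it.
    return false;
  }
  if (frame.allowScreenshotAttr) {
    return true;
  }
  // Without delegation, only same-origin children inherit the feature.
  return frame.origin === parent.origin;
}
```

A document for which this returns false corresponds to step 3 of {{captureScreenshot()}}, which rejects with {{NotAllowedError}}.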