WebGL under the hood

I recently saw a YouTube video highlighting some amazing personal portfolios centered around interactive 3D graphics.

I've always been curious about 3D graphics and animations running in the browser, but never looked into how it all works. That video inspired me to dig in and learn more.

How do you make 3D work on a 2D screen?

Developers can play with light, shadow, size, and position in the same way artists do when they draw or paint. These tricks—like making things smaller when they’re farther away, adding shadows, or overlapping objects—fool your brain into seeing depth and volume, even though everything is technically flat on your screen. But, without a pencil and paper, what tools do we use to create that same illusion on a screen?

WebGL

The most crucial component of 3D rendering in the browser is a technology called WebGL, or Web Graphics Library.

WebGL is a JavaScript API that allows web browsers to render interactive 3D and 2D graphics without plugins. The open standard was initially developed by engineers at Mozilla and is now widely adopted and incorporated into all modern browsers. It's built on top of OpenGL ES (Open Graphics Library for Embedded Systems), a low-level API that interacts directly with the system's GPU to render 2D and 3D vector graphics.

A bit about OpenGL

OpenGL, or Open Graphics Library, was developed in 1992 and provided a unified, standardized way for developers to access graphics hardware (GPUs) on various platforms. Before this open standard, graphics programming was often tied to specific hardware, which made it difficult to build portable applications.

While the CPU is the brain of a computer, responsible for executing instructions and processing data for a wide variety of tasks, it can only focus on a limited number of things at once. It delegates certain tasks, like the data-intensive calculations needed to render images, animations, and 3D graphics, to the GPU, which is built with thousands of cores (the individual "workers" that handle computations) and designed to process many tasks in parallel. This makes it ideal for fast and efficient graphics rendering, where lots of data (like pixels or vertices) needs to be processed at once.

OpenGL ES (Embedded Systems) was first released in 2003 as a subset of the full OpenGL standard, designed specifically for mobile devices, embedded systems, and gaming consoles, where hardware resources are more limited than on desktop systems.

WebGL wraps the OpenGL ES 2.0 API and works on any platform (Windows, macOS, Linux, Android, and iOS) as long as a compatible web browser is installed. Since it runs inside the browser, it's subject to that browser's security sandbox, which helps prevent malicious hardware access but also limits some features compared to native OpenGL ES.

How does WebGL fit into the existing web platform?

3D WebGL graphics are embedded in a webpage using the <canvas> HTML element, which creates a blank space, or drawing board, on the page with a defined width and height. JavaScript accesses the canvas’s rendering context and calls methods to instruct the GPU to handle drawing shapes, images, and animations directly on the canvas.
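
For instance, the drawing surface is just a regular element in the page's markup (the id and dimensions here are arbitrary, but the id matches the snippet in the next section):

<canvas id="glCanvas" width="600" height="400"></canvas>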

How WebGL works, step by step

1) Setting up the WebGL canvas and context:
Because WebGL is built into modern web browsers, it doesn't require any plugins or third-party installations. Within the HTML body, an id is attached to the <canvas> element. In the JavaScript code, here's how to access the canvas element and create a context variable:

// Look up the canvas element by its id and request a WebGL rendering context
const canvas = document.getElementById('glCanvas');
const gl = canvas.getContext('webgl');
if (!gl) { throw new Error('WebGL is not supported in this browser'); }


The gl variable is the WebGL context, acting as the interface between the JavaScript code and the GPU. It provides methods to create and manage shaders, buffers, textures, and handle drawing operations.

Shaders, written in GLSL (OpenGL Shading Language), are small specialized programs designed to handle tasks like positioning shape vertices in 3D space (vertex shaders) and determining the color of each pixel (fragment shaders). With parallel processing, the GPU can compute the position of thousands or millions of vertices simultaneously, rather than one at a time.
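
To make that concrete, here's a minimal pair of shaders written in GLSL and embedded as JavaScript strings, along with the boilerplate that compiles and links them into a program (using the gl context from step 1). This is a sketch: names like a_position, u_modelViewProjection, and u_color are my own placeholders, and real code would also check for compile and link errors.

const vertexShaderSource = `
  // Per-vertex position pulled from a buffer (2D here for simplicity; 3D shapes use vec3)
  attribute vec2 a_position;
  // A transformation matrix supplied from JavaScript; the same value for every vertex
  uniform mat4 u_modelViewProjection;
  void main() {
    // gl_Position is the built-in output: this vertex's position in clip space
    gl_Position = u_modelViewProjection * vec4(a_position, 0.0, 1.0);
  }
`;

const fragmentShaderSource = `
  precision mediump float;
  // A single color supplied from JavaScript and applied to every fragment
  uniform vec4 u_color;
  void main() {
    gl_FragColor = u_color; // built-in output: this fragment's color
  }
`;

// Compile both shaders and link them into a program the GPU can execute
function compileShader(type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  return shader;
}
const program = gl.createProgram();
gl.attachShader(program, compileShader(gl.VERTEX_SHADER, vertexShaderSource));
gl.attachShader(program, compileShader(gl.FRAGMENT_SHADER, fragmentShaderSource));
gl.linkProgram(program);
gl.useProgram(program);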

The browser handles things like memory management, resource allocation, double buffering, frame rate control, and efficient re-drawing to make sure animation rendering is smooth.

2) Sending JavaScript data to GPU buffers

In order for the GPU shaders to work, JavaScript needs to pass down information about the shape and its vertices. This per-vertex data, known as attributes—vertex position (3D [x, y, z] coordinates), color, texture coordinates, and normals—is sent to the GPU and stored in special data structures called buffers. Buffer objects are memory blocks optimized for rapid access and manipulation during rendering—they don't have the higher-level operations of JavaScript arrays or objects; instead, they function more as raw containers for data the GPU uses directly.
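
As a rough sketch (reusing the gl context, the program, and the a_position attribute from the shader snippet above, and using the four corners of the square described just below as the data), creating and filling a vertex buffer looks like this:

// The square's four corner coordinates, flattened into one typed array (x0, y0, x1, y1, ...)
const positions = new Float32Array([0, 1,  1, 1,  0, 0,  1, 0]);

// Create a buffer object on the GPU and make it the currently bound ARRAY_BUFFER
const positionBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);

// Copy the raw vertex data from JavaScript memory into the GPU buffer
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);

// Describe the buffer's layout to the shader: two floats (x, y) per vertex feed a_position
const positionLocation = gl.getAttribLocation(program, 'a_position');
gl.enableVertexAttribArray(positionLocation);
gl.vertexAttribPointer(positionLocation, 2, gl.FLOAT, false, 0, 0);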

For example, you could describe a square using these position coordinates in a vertex buffer:

Vertices: 
(0, 1), // Vertex 0 
(1, 1), // Vertex 1
(0, 0), // Vertex 2
(1, 0) // Vertex 3


Index buffers, another type of buffer, work in conjunction with vertex buffers and tell the GPU the order in which to connect the points specified in the vertex buffer to form shapes. An index buffer for the square specified above might look like this:

Index Buffer: [0, 2, 1, 1, 2, 3]

This means that the index buffer tells the GPU to draw two triangles, the first one by connecting vertices 0, 2, and 1, and the second one by connecting vertices 1, 2, and 3. The index buffer lets you use the same vertices multiple times without re-sending all the position coordinate data, saving memory and making things more efficient.
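
In code, and continuing the sketch from the previous step, the index buffer uses its own binding point and an unsigned-integer typed array:

// Index data is bound to ELEMENT_ARRAY_BUFFER rather than ARRAY_BUFFER
const indexBuffer = gl.createBuffer();
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuffer);
gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, new Uint16Array([0, 2, 1, 1, 2, 3]), gl.STATIC_DRAW);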

3) GPU Rendering Pipeline

Once the vertex and index buffers are created and bound (selected), a step-by-step process happens inside the GPU to get the final image appearing on the web browser canvas.

  • Vertex Processing: The vertices of the 3D shape are processed. Each vertex goes through the vertex shader, where it is positioned, rotated, and scaled, and ultimately transformed into clip space (taking into account its position within the shape, relative to other shapes in the scene, and from the camera's point of view).
    Other vertex attributes like color, texture coordinates, and normals are applied here, as well as uniforms: data defined and passed from JS that applies to the whole shape and doesn't change per vertex—things like overall color, lighting/texture information, or transformation matrices that change the shape in response to user interaction. Uniforms function like global variables and remain constant across all vertices and fragments during a single drawing operation.
  • Primitive Assembly: Vertices are grouped into primitive triangle shapes based on the instructions in the index buffer.

Why triangles?

Triangles are the most commonly used shape in 3D graphics and the preferred foundational shape GPUs use for rendering. The triangle is the simplest polygon, is guaranteed to be a flat surface (any 3 points in 3D space always lie on a single plane), is computationally simple and fast to render, and functions as the building block for all shapes, no matter how complex. Even circles, complex curves, or detailed 3D models are eventually represented as thousands (or millions) of small triangles.

  • Clipping: Only portions of the primitive triangles that are visible from the camera's point of view are processed further. Anything out of view is discarded, reducing GPU load and freeing up processing power.
  • Rasterization: Rasterization is the process of generating fragments, individual points corresponding to pixel locations on the screen that the primitive triangles cover. Potential pixel colors are generated by interpolating the vertex colors and texture coordinates across the triangle's surface. Think of this as the step where the shapes you see in 3D space are turned into a flat 2D image that the monitor can display.

What's the difference between a fragment and a pixel?

They're similar, but not exactly the same. A fragment is better characterized as a potential pixel. Not every fragment will end up as a visible pixel on the screen—after rasterization generates all fragments, depth testing checks which fragments are in front of others from the camera viewpoint. If two fragments cover the same pixel but one is closer to the camera, only that one will be displayed. Fragments represent possible outputs, but need to go through additional processing to determine which ones will be displayed and what their final colors will be based on lighting, textures, and other uniforms.

  • Fragment processing: While rasterization generates the fragments, the actual color calculation and pixel generation happens here. After depth testing identifies which fragments to display, the fragment shader processes them. Pixel colors are then calculated based on several factors: the interpolated color from the rasterization step, lighting effects such as shadows and reflections, texture, transparency, and blending applications, and other uniforms and calculations.
  • Writing to the framebuffer: After processing the fragments, the final color values for each pixel are written to the framebuffer, the memory structure that stores the pixel data that will be displayed on the screen.
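
Strung together in code, kicking off this whole pipeline takes surprisingly few calls. Here's a rough sketch, assuming the shader program and the square's vertex and index buffers from the earlier snippets are already set up:

// Enable depth testing so that closer fragments win when they overlap
gl.enable(gl.DEPTH_TEST);

// Clear the framebuffer's color and depth attachments before drawing the new frame
gl.clearColor(0, 0, 0, 1);
gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

// Set the uniforms declared in the shader sketch: an identity matrix and a flat color
const mvpLocation = gl.getUniformLocation(program, 'u_modelViewProjection');
gl.uniformMatrix4fv(mvpLocation, false, [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1]);
gl.uniform4fv(gl.getUniformLocation(program, 'u_color'), [0.2, 0.6, 0.9, 1.0]);

// Draw the square as two triangles using the 6 indices in the bound index buffer
gl.drawElements(gl.TRIANGLES, 6, gl.UNSIGNED_SHORT, 0);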

4) Handling interactive inputs

Typically, animated 3D graphics re-render in sync with the browser's refresh rate (commonly 60 FPS) via requestAnimationFrame. In addition, WebGL contexts can handle user inputs, such as mouse or keyboard events, to interact with and manipulate the 3D graphics. Event listeners capture user inputs, such as rotating, repositioning, or scaling a shape, or zooming, panning, or orbiting the camera around objects. These inputs modify the JS transformation matrices, which are passed down to the GPU as uniforms, fed into the shaders, and run through the rendering pipeline again.
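
Here's a sketch of that flow, where updateMatrixFromInput is a hypothetical helper standing in for whatever matrix math the app does, and mvpLocation is the uniform location looked up in the previous snippet:

// Start from an identity matrix and update it whenever the user interacts
let modelViewProjection = new Float32Array([1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1]);

canvas.addEventListener('mousemove', (event) => {
  // Hypothetical helper: turns mouse movement into a new transformation matrix
  modelViewProjection = updateMatrixFromInput(event);
});

// Re-render in sync with the browser's refresh rate
function render() {
  gl.uniformMatrix4fv(mvpLocation, false, modelViewProjection); // pass the matrix down as a uniform
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
  gl.drawElements(gl.TRIANGLES, 6, gl.UNSIGNED_SHORT, 0);
  requestAnimationFrame(render);
}
requestAnimationFrame(render);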

That was a lot! Here's the TLDR:

  • WebGL canvas and context setup: A WebGL context is created and attached to an HTML canvas and acts as the interface between the JavaScript code and the GPU. It provides methods to create and manage shaders, buffers, textures, and handle drawing operations.
  • JS -> GPU data flow: Vertex attributes like position and color are sent from JS to the GPU and stored in buffers for efficient access during rendering.
  • Render pipeline: Vertices are transformed, assembled into triangle primitives, clipped, rasterized into fragments, processed into pixels based on color, lighting, and texture input, and written to the framebuffer for display.
  • Interactive inputs: User inputs modify transformation matrices that are passed as uniforms to the GPU shaders, allowing for real-time manipulation of 3D objects.

WebGL visual illusion techniques

I'd like to dig more into some of the specific techniques you can use in the vertex processing and fragment processing stages to create 3D illusions. WebGL holds so much power when it comes to manipulating light, shadows, and reflections, applying texture to surfaces, changing camera perspective, shaping visual and motion perception, and more.

Aside from Z-buffering, WebGL doesn't have built-in methods to drag and drop these techniques into a project; however, it's possible to incorporate all of them manually using vertex shaders, fragment shaders, and transformation matrices. Some of the techniques:

  • Lighting models and techniques like Phong shading (named after Bùi Tường Phong, a Vietnamese computer graphics researcher who introduced the technique) are used to calculate how light reflects off objects based on their angles and materials. Shadows and highlights are laid onto shapes to indicate curves and edges.
  • The depth of objects on a visual plane is tracked with a Z-buffer—objects closer to the virtual camera or point of view obscure objects farther away, helping the brain perceive which objects are in front and which are behind.
  • Bump mapping gives a smooth 3D object the illusion of texture, like bumps or grooves, by manipulating how light interacts with the surface. It uses a bump map, a 2D grayscale image where light areas represent raised bumps and dark areas represent indents. By adjusting the surface normals (the direction perpendicular to the surface at each pixel point) of the 3D object, the surface appears textured with shadows and highlights, without actually changing the object's shape by adding more polygons to the 3D model. This keeps the model efficient while making it look much more detailed and realistic.
  • Within the digital scene, WebGL simulates an adjustable camera that can be moved and rotated around the scene and whose field of view can be widened or narrowed, affecting how distances within the scene are perceived. Being able to move the camera and adjust the field of view mimics how we see the world, making the scene appear more lifelike and 3D.
  • Parallax—the shift in position of an object when you look at it from different angles—is simulated by having objects in the digital scene move at different speeds depending on their distance from the camera or point of view. This simulates how motion is perceived in the real world: objects in the background move slower relative to those in the foreground.

These techniques are powerful, but cumbersome to code manually into WebGL. What if you wanted them provided out-of-the-box? This is where additional layers of abstraction come in.
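
To give a sense of what coding these techniques by hand involves, here's a stripped-down diffuse (Lambertian) lighting calculation inside a fragment shader. It's only a sketch: the varying and uniform names are my own, and a full Phong model would also add ambient and specular terms.

const lightingFragmentShader = `
  precision mediump float;
  // Surface normal, interpolated across the triangle from per-vertex normals
  varying vec3 v_normal;
  // Unit vector pointing from the surface toward the light source
  uniform vec3 u_lightDirection;
  // The object's base color
  uniform vec3 u_baseColor;
  void main() {
    // Lambert's law: brightness falls off as the angle between the normal and the light grows
    float diffuse = max(dot(normalize(v_normal), u_lightDirection), 0.0);
    gl_FragColor = vec4(u_baseColor * diffuse, 1.0);
  }
`;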

3D libraries and frameworks

Three.js is the most popular library for building 3D graphics and animations in the browser, but there are other options out there! To list a few:

  • Three.js: A popular, open-source JS library built on top of WebGL that makes it easier to use. Three.js provides a higher-abstraction interface for creating and displaying 3D content and allows developers to create 3D objects, apply textures, lights, and cameras, and build and animate 3D scenes without writing a lot of complex WebGL code. (A minimal example follows this list.)
  • Babylon.js: A powerful 3D engine also built on top of WebGL. It's similar to Three.js but offers more built-in tools for creating large-scale 3D experiences, including physics engines and advanced materials. It's often used for VR, AR, and game development.
  • PlayCanvas: An open-source 3D WebGL game engine with a cloud-hosted editor and development platform.
  • A-Frame: A framework built on top of Three.js, abstracting away more complexity with an easy, HTML-like declarative syntax.
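
Here's the minimal Three.js example mentioned above: the classic spinning cube, as a rough sketch (the module import is assumed to be available, and the colors and camera values are arbitrary):

import * as THREE from 'three';

// The three core pieces of any Three.js app: a scene, a camera, and a renderer
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(75, window.innerWidth / window.innerHeight, 0.1, 1000);
camera.position.z = 3;

const renderer = new THREE.WebGLRenderer();
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement); // Three.js creates the <canvas> for you

// A mesh is geometry plus material; no raw buffers or shaders to write by hand
const cube = new THREE.Mesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshBasicMaterial({ color: 0x44aa88 }) // unlit material, so no lights needed
);
scene.add(cube);

// Animation loop: rotate the cube a little each frame and re-render
function animate() {
  cube.rotation.x += 0.01;
  cube.rotation.y += 0.01;
  renderer.render(scene, camera);
  requestAnimationFrame(animate);
}
requestAnimationFrame(animate);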

WebGL alternatives

I'm not aware of any current alternatives to WebGL that provide a browser-native API to interact with the GPU and render 3D graphics. If you've got a graphics project that doesn't require intricate 3D work, it's possible to simply use the HTML <canvas> element and create a 2D rendering context (const ctx = canvas.getContext('2d')), which provides methods to draw shapes, images, and animations. SVGs can be used for static or animated 2D graphics. Basic 3D effects on elements can also be generated using just CSS animations.
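
For instance, drawing with the 2D context takes only a few lines (the canvas id and the colors here are placeholders):

const flatCanvas = document.getElementById('flatCanvas'); // hypothetical 2D canvas element
const ctx = flatCanvas.getContext('2d');
ctx.fillStyle = 'steelblue';
ctx.fillRect(20, 20, 120, 80);   // a filled rectangle
ctx.strokeStyle = 'black';
ctx.strokeRect(20, 20, 120, 80); // with an outline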

There's exciting news just around the corner!
WebGPU is an emerging standard that provides the same ability to render 3D graphics in the browser, but with better performance and more control over GPU hardware. It aims to be the successor to WebGL, but is still an experimental technology (a W3C Working Draft was released in 2023) and has limited browser support.