Dev Thicket

How Display Trees Work (and Why They Matter)

Immediate mode vs retained mode rendering. What display trees are, how they work, and why frameworks like PixiJS, Starling, and Willow use them.


If you have built a 2D game or interactive application, you have faced a fundamental architectural decision: how do you organize what gets drawn to the screen? Most 2D renderers fall into one of two camps. Immediate mode rendering asks you to issue draw calls every frame, telling the GPU exactly what to paint and in what order. Retained mode rendering asks you to build a persistent tree of objects; the framework walks that tree each frame and handles drawing for you.

Both approaches work. Both ship real games. But they scale very differently as your project grows. A hundred sprites are easy to manage with raw draw calls. A thousand sprites with layered UI, camera transforms, and particle effects? That is where the choice starts to matter.

This article explains what display trees are, how they work under the hood, and why they have become the dominant architecture for 2D rendering frameworks over the last two decades.

Immediate Mode Rendering

In immediate mode, you tell the renderer what to draw every single frame. There is no persistent scene structure. You call functions like DrawImage() in your render loop, passing in textures, positions, and transforms directly. The renderer executes those calls, flushes the batch, and forgets everything. Next frame, you do it all again.

Here is a minimal example using Go and Ebitengine, a popular immediate mode 2D engine:

[Code listing: Ebitengine Immediate Mode Draw]

This is straightforward and easy to reason about. You can see exactly what gets drawn and in what order. For a small game with a handful of sprites, it works perfectly.

The problems show up as complexity grows. Need to sort enemies by Y position so characters further away render behind closer ones? You write sorting code. Need a camera that scrolls the world but not the UI? You manually apply camera offsets to some draw calls but not others. Need a character made of multiple parts (body, armor, weapon) that all move together? You compute each part's world position from the parent manually. Every one of these concerns ends up as ad-hoc code scattered throughout your Draw() function.
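To make that bookkeeping concrete, here is a minimal sketch in plain Go with no engine dependency. The `sprite` and `drawCmd` types and the `drawFrame` function are hypothetical stand-ins for your game objects, an engine's draw call, and your Draw() method. The point is only that sorting, camera offsets, and layer order are all your code.

```go
package main

import (
	"fmt"
	"sort"
)

// sprite is a hypothetical game object; an engine would pair it with a texture.
type sprite struct {
	name string
	x, y float64
}

// drawCmd stands in for one immediate mode draw call.
type drawCmd struct {
	name string
	x, y float64
}

// drawFrame rebuilds the full list of draw calls for one frame.
// Sorting, camera offsets, and layer order are all handled by hand.
func drawFrame(world, hud []sprite, camX, camY float64) []drawCmd {
	// Copy before sorting so the caller's slice order is left alone.
	sorted := append([]sprite(nil), world...)
	// A lower Y is further away, so it must be drawn first.
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].y < sorted[j].y })

	cmds := make([]drawCmd, 0, len(world)+len(hud))
	for _, s := range sorted {
		// World sprites are shifted by the camera.
		cmds = append(cmds, drawCmd{s.name, s.x - camX, s.y - camY})
	}
	for _, s := range hud {
		// HUD sprites ignore the camera and are drawn last so they sit on top.
		cmds = append(cmds, drawCmd{s.name, s.x, s.y})
	}
	return cmds
}

func main() {
	world := []sprite{{"player", 120, 200}, {"enemy", 300, 150}}
	hud := []sprite{{"healthbar", 10, 10}}
	for _, c := range drawFrame(world, hud, 100, 50) {
		fmt.Printf("%s at (%.0f, %.0f)\n", c.name, c.x, c.y)
	}
}
```

Every new requirement (composite characters, a second camera, fading a whole layer) adds another special case to a function like this one.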

Retained Mode and Display Trees

A display tree (also called a scene graph) flips the model. Instead of issuing draw calls, you build a persistent hierarchy of objects that describes your scene. The framework traverses this tree every frame, computing transforms, determining draw order, culling off-screen objects, and batching draw calls automatically.

Your job shifts from "tell the renderer what to draw" to "keep the tree up to date." Move a character? Set its position property. Add an explosion? Insert a node into the tree. Remove an enemy? Detach its node. The framework handles the rest.

Here is an example tree and the paint order it produces:

Scene Root
  Background (z: 0)
  World (z: 1)
    Player
    Enemy
  HUD (z: 2)

Paint order (back to front):

  1. Background
  2. Player
  3. Enemy
  4. HUD (front of screen)

Containers

Group nodes into layers. A container's transform (position, rotation, scale) applies to all its children. Move the parent, everything inside moves with it.

Painter's Algorithm

Nodes draw back-to-front by Z-index. Background first, then world objects, then HUD on top. No manual draw ordering needed.

Describe, Don't Draw

Add nodes to the tree and set their properties. Willow traverses the tree each frame and produces the draw commands for you. You describe what exists; Willow handles how it renders.

This hierarchy encodes several things at once. Draw order comes from tree position: background draws before player, which draws before particles. Grouping comes from parent-child relationships: moving world moves everything inside it, which is exactly how a camera works. The ui subtree sits outside world, so it stays fixed on screen regardless of camera movement.

The player's equipment (body, armor, weapon) is nested under the player node. When the player moves, all three pieces move with it automatically. No manual offset math required.
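A hierarchy like the one described above can be sketched in a few lines of Go. The `Node` type here is hypothetical and is not any framework's real API, and placing background inside world is an assumption made for the example. Only leaf nodes paint in this sketch; containers exist purely for grouping.

```go
package main

import "fmt"

// Node is a hypothetical display tree node, not any framework's real type.
type Node struct {
	Name     string
	Children []*Node
}

// NewNode creates a node and attaches the given children in draw order.
func NewNode(name string, children ...*Node) *Node {
	return &Node{Name: name, Children: children}
}

// PaintOrder walks the tree depth-first and returns the leaf names
// in the order they would be painted, back to front.
func PaintOrder(n *Node) []string {
	if len(n.Children) == 0 {
		return []string{n.Name}
	}
	var order []string
	for _, c := range n.Children {
		order = append(order, PaintOrder(c)...)
	}
	return order
}

func main() {
	// The player is a composite of three parts that move together.
	player := NewNode("player", NewNode("body"), NewNode("armor"), NewNode("weapon"))
	// Everything under world scrolls with the camera.
	world := NewNode("world", NewNode("background"), player, NewNode("particles"))
	// ui is a sibling of world, so camera movement does not affect it.
	root := NewNode("root", world, NewNode("ui"))

	fmt.Println(PaintOrder(root))
}
```

Reordering the children of a node is all it takes to change what paints on top of what.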

How Display Trees Work Under the Hood

A display tree is not just a convenient organizational tool. It enables several important rendering optimizations and behaviors that would be tedious or error-prone to implement by hand.

Transform Inheritance

Every node in the tree has a local transform (position, rotation, scale) relative to its parent. The framework computes each node's world transform by concatenating its local transform with its parent's world transform. This means moving a parent automatically moves all of its descendants. Scaling a container scales everything inside it. Rotating a group rotates the entire subtree around the group's origin.

This is the mechanism behind cameras, UI scaling, and composite characters. It replaces dozens of lines of manual math with a single property change on the right node.
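As a rough illustration of the concatenation step, the sketch below uses a 2D affine matrix and a hypothetical `Node` type. The names `Matrix`, `Concat`, and `UpdateWorld` are invented for this example, and the scale-rotate-translate order is one common convention rather than a rule every framework follows.

```go
package main

import (
	"fmt"
	"math"
)

// Matrix is a 2D affine transform:
//   x' = A*x + C*y + Tx
//   y' = B*x + D*y + Ty
type Matrix struct{ A, B, C, D, Tx, Ty float64 }

// Identity leaves points unchanged; it acts as the root's parent transform.
var Identity = Matrix{A: 1, D: 1}

// Concat returns the transform that applies local first, then parent.
func Concat(parent, local Matrix) Matrix {
	return Matrix{
		A:  parent.A*local.A + parent.C*local.B,
		B:  parent.B*local.A + parent.D*local.B,
		C:  parent.A*local.C + parent.C*local.D,
		D:  parent.B*local.C + parent.D*local.D,
		Tx: parent.A*local.Tx + parent.C*local.Ty + parent.Tx,
		Ty: parent.B*local.Tx + parent.D*local.Ty + parent.Ty,
	}
}

// Node is a hypothetical display tree node with a local transform.
type Node struct {
	X, Y           float64 // position relative to the parent
	Rotation       float64 // radians
	ScaleX, ScaleY float64
	Children       []*Node
	World          Matrix // filled in by UpdateWorld
}

// NewNode returns a node at (x, y) with no rotation and unit scale.
func NewNode(x, y float64, children ...*Node) *Node {
	return &Node{X: x, Y: y, ScaleX: 1, ScaleY: 1, Children: children}
}

// Local builds the node's own matrix: scale, then rotate, then translate.
func (n *Node) Local() Matrix {
	sin, cos := math.Sincos(n.Rotation)
	return Matrix{
		A: cos * n.ScaleX, B: sin * n.ScaleX,
		C: -sin * n.ScaleY, D: cos * n.ScaleY,
		Tx: n.X, Ty: n.Y,
	}
}

// UpdateWorld resolves world transforms top down through the subtree.
func (n *Node) UpdateWorld(parent Matrix) {
	n.World = Concat(parent, n.Local())
	for _, c := range n.Children {
		c.UpdateWorld(n.World)
	}
}

func main() {
	weapon := NewNode(10, 0)
	player := NewNode(60, 40, weapon)
	world := NewNode(-100, -50, player) // the camera offset lives on the container
	world.ScaleX, world.ScaleY = 2, 2   // zoom applies to the whole subtree

	world.UpdateWorld(Identity)
	fmt.Printf("weapon on screen at (%.0f, %.0f)\n", weapon.World.Tx, weapon.World.Ty)
}
```

The weapon never stores a screen position. It only knows it sits 10 units to the right of the player, and the camera offset and zoom on world reach it through the chain of parents.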

Alpha and Visibility Propagation

Alpha (transparency) propagates down the tree the same way transforms do. A node's effective alpha is its own alpha multiplied by the alpha of each of its ancestors, so setting a container to 50% alpha halves the opacity of everything inside it. Setting a node to invisible hides the entire subtree, and the renderer skips it during traversal entirely.

This gives you fade-in and fade-out effects on entire groups for free. Fade the ui node's alpha from 1 to 0 and every element inside fades together, without touching any of them individually.
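The propagation rule is small enough to sketch directly. The `Node` type and `EffectiveAlpha` function below are hypothetical names for this example; the sketch assumes alpha values multiply down the tree and that an invisible node prunes its whole subtree.

```go
package main

import "fmt"

// Node is a hypothetical display tree node carrying only alpha and visibility.
type Node struct {
	Name     string
	Alpha    float64 // 0 is fully transparent, 1 is fully opaque
	Visible  bool
	Children []*Node
}

// EffectiveAlpha records the final alpha of every node that will be drawn.
// An invisible node is skipped together with its whole subtree.
func EffectiveAlpha(n *Node, parentAlpha float64, out map[string]float64) {
	if !n.Visible {
		return
	}
	alpha := parentAlpha * n.Alpha
	out[n.Name] = alpha
	for _, c := range n.Children {
		EffectiveAlpha(c, alpha, out)
	}
}

func main() {
	button := &Node{Name: "button", Alpha: 0.5, Visible: true}
	label := &Node{Name: "label", Alpha: 1, Visible: true}
	tipText := &Node{Name: "tipText", Alpha: 1, Visible: true}
	tooltip := &Node{Name: "tooltip", Alpha: 1, Visible: false, Children: []*Node{tipText}}
	ui := &Node{Name: "ui", Alpha: 0.5, Visible: true, Children: []*Node{button, label, tooltip}}

	out := map[string]float64{}
	EffectiveAlpha(ui, 1, out)
	fmt.Println(out)
}
```

Here the half-transparent button inside the half-transparent ui container ends up at 25% opacity, and tipText never appears in the result because its parent tooltip is hidden.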

Draw Order from Tree Traversal

The renderer walks the tree depth-first. A node's children draw on top of it, and later siblings draw on top of earlier ones. This gives you predictable layering by default. For cases where you need to override the natural order (a speech bubble that should render above other characters, for example), most frameworks offer a z-index property that adjusts a node's sort position within its parent without moving it in the tree.
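One way to picture the z-index override is a stable sort of each node's children just before they are visited. This is a minimal sketch with a hypothetical `Node` type and `ZIndex` field; real frameworks differ in how and when they sort.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical display tree node with a z-index among its siblings.
type Node struct {
	Name     string
	ZIndex   int
	Children []*Node
}

// PaintOrder appends names in draw order: the node itself, then its children
// sorted by ZIndex. The sort is stable, so ties keep their insertion order.
func PaintOrder(n *Node, out []string) []string {
	out = append(out, n.Name)
	// Sort a copy so the tree structure itself is never reordered.
	sorted := append([]*Node(nil), n.Children...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].ZIndex < sorted[j].ZIndex })
	for _, c := range sorted {
		out = PaintOrder(c, out)
	}
	return out
}

func main() {
	bubble := &Node{Name: "bubble", ZIndex: 10}
	hero := &Node{Name: "hero"}
	villain := &Node{Name: "villain"}
	// bubble was added first, but its z-index pushes it to the end.
	world := &Node{Name: "world", Children: []*Node{bubble, hero, villain}}

	fmt.Println(PaintOrder(world, nil))
}
```

The bubble stays where it is in the tree, so it still inherits transforms from world, but it paints after both characters.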

Dirty Flags and Caching

Recomputing world transforms for every node every frame is wasteful when most nodes are not moving. Display trees use dirty flags to track which nodes have changed. When you set a node's position, it marks itself and all descendants as dirty. During the next render pass, only dirty nodes recompute their world transforms. Everything else reuses cached values from the previous frame.

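This optimization is significant. In a scene with 5,000 nodes where only 50 leaf nodes are moving, the framework skips 99% of the transform math. Moving a container costs more, because every descendant must be recomputed too, but the savings still compound as scene complexity grows.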
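The dirty flag idea can be sketched with positions only, leaving out rotation and scale. All names below (`Node`, `SetPosition`, `Update`) are hypothetical. `Update` returns how many nodes it actually recomputed so the effect is visible; note that this simple version still visits every node to check its flag, which a real framework may avoid.

```go
package main

import "fmt"

// Node is a hypothetical display tree node that caches its world position.
type Node struct {
	x, y           float64 // local position
	worldX, worldY float64 // cached world position
	dirty          bool
	parent         *Node
	children       []*Node
}

// NewNode starts dirty because its world position has never been computed.
func NewNode(x, y float64) *Node { return &Node{x: x, y: y, dirty: true} }

// AddChild attaches c and invalidates it, since its parent has changed.
func (n *Node) AddChild(c *Node) {
	c.parent = n
	n.children = append(n.children, c)
	c.markDirty()
}

// SetPosition changes the local position and invalidates the subtree.
func (n *Node) SetPosition(x, y float64) {
	n.x, n.y = x, y
	n.markDirty()
}

// markDirty flags this node and all of its descendants.
func (n *Node) markDirty() {
	n.dirty = true
	for _, c := range n.children {
		c.markDirty()
	}
}

// Update recomputes world positions for dirty nodes only and returns
// how many nodes were recomputed. Parents are resolved before children.
func (n *Node) Update() int {
	count := 0
	if n.dirty {
		px, py := 0.0, 0.0
		if n.parent != nil {
			px, py = n.parent.worldX, n.parent.worldY
		}
		n.worldX, n.worldY = px+n.x, py+n.y
		n.dirty = false
		count++
	}
	for _, c := range n.children {
		count += c.Update()
	}
	return count
}

func main() {
	root := NewNode(0, 0)
	squad := NewNode(100, 100)
	root.AddChild(squad)
	for i := 0; i < 3; i++ {
		squad.AddChild(NewNode(float64(i)*10, 0))
	}
	for i := 0; i < 95; i++ {
		root.AddChild(NewNode(0, 0)) // static scenery
	}

	fmt.Println("first frame:", root.Update())
	fmt.Println("idle frame:", root.Update())
	squad.SetPosition(150, 100)
	fmt.Println("after moving squad:", root.Update())
}
```

With 100 nodes in the tree, the first frame recomputes all 100, an idle frame recomputes none, and moving the squad recomputes only the squad and its three members.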

Automatic Batching and Culling

Once world transforms are resolved, the renderer can batch draw calls by texture atlas and shader. Nodes that share a texture atlas and are adjacent in draw order can typically be submitted to the GPU in a single draw call, even when they sit in different branches of the tree. The framework can also perform frustum culling, skipping nodes whose bounding boxes fall entirely outside the visible area. Neither of these optimizations requires any effort from the developer.
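Both steps can be sketched over a flat list of items that is already in paint order. The `Item`, `Rect`, and `Batch` types and the `BuildBatches` function are hypothetical. This version takes the conservative route of merging only consecutive items with the same atlas, which preserves draw order; an actual framework may batch more aggressively.

```go
package main

import "fmt"

// Rect is an axis-aligned rectangle in world space.
type Rect struct{ X, Y, W, H float64 }

// Intersects reports whether r and o overlap.
func (r Rect) Intersects(o Rect) bool {
	return r.X < o.X+o.W && o.X < r.X+r.W && r.Y < o.Y+o.H && o.Y < r.Y+r.H
}

// Item is one drawable produced by tree traversal, already in paint order.
type Item struct {
	Name   string
	Atlas  string // hypothetical texture atlas identifier
	Bounds Rect
}

// Batch is a run of items that can be submitted as one draw call.
type Batch struct {
	Atlas string
	Names []string
}

// BuildBatches drops items outside the view, then merges consecutive
// items that share an atlas. Paint order is preserved.
func BuildBatches(items []Item, view Rect) []Batch {
	var batches []Batch
	for _, it := range items {
		if !it.Bounds.Intersects(view) {
			continue // culled: entirely off-screen
		}
		last := len(batches) - 1
		if last >= 0 && batches[last].Atlas == it.Atlas {
			batches[last].Names = append(batches[last].Names, it.Name)
			continue
		}
		batches = append(batches, Batch{Atlas: it.Atlas, Names: []string{it.Name}})
	}
	return batches
}

func main() {
	view := Rect{0, 0, 800, 600}
	items := []Item{
		{"background", "world.png", Rect{0, 0, 800, 600}},
		{"player", "world.png", Rect{100, 100, 32, 32}},
		{"farEnemy", "world.png", Rect{2000, 100, 32, 32}}, // off-screen
		{"label", "ui.png", Rect{10, 10, 100, 20}},
		{"icon", "ui.png", Rect{120, 10, 20, 20}},
	}
	for _, b := range BuildBatches(items, view) {
		fmt.Println(b.Atlas, b.Names)
	}
}
```

Five items become two draw calls: farEnemy is culled, and the remaining items collapse into one batch per atlas.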

A Proven Pattern

Display trees are not a new idea. They are one of the most refined patterns in 2D rendering, with over two decades of real-world validation.

Adobe Flash introduced its display list architecture in 2006 with ActionScript 3. Flash's DisplayObject hierarchy (Stage, Sprite, MovieClip, TextField) gave developers a clear, composable model for building rich interactive content. Millions of games and applications shipped on this architecture.

When Flash's runtime performance became a bottleneck, Starling Framework (2011) re-implemented the same display tree API on top of Stage3D, Flash's GPU-accelerated rendering backend. Same tree model, dramatically better performance. This proved that the display tree pattern was not tied to any particular renderer; it was a general-purpose scene management strategy.

PixiJS (2013) brought the pattern to the web, implementing a Flash-style display tree on WebGL. It remains one of the most widely used 2D rendering libraries in the JavaScript ecosystem. Cocos2d-x, libGDX's scene2d, and numerous other frameworks use the same blueprint.

The display tree pattern keeps reappearing because it solves a real structural problem. It separates scene description from rendering mechanics, giving developers a high-level API while letting the framework optimize draw calls underneath.

How Willow Implements It

Willow is a display tree renderer for Go, built on Ebitengine. It provides the retained mode layer that Ebitengine's immediate mode API does not include out of the box.

Willow's tree is built from a set of node types, each serving a specific purpose:

  • Node is the base type. It has a transform and can hold children, but draws nothing on its own. Use it for grouping and containers.
  • Sprite draws a texture region. This is the workhorse for characters, backgrounds, tiles, and most visual elements.
  • TextNode renders text using bitmap fonts generated by Willow's fontgen tool.
  • TilemapNode renders large tile-based maps efficiently with built-in culling.
  • ParticleEmitter manages particle systems as part of the tree, so they inherit transforms and get culled like any other node.

When you change a property on any node (position, scale, alpha, visibility), Willow marks the affected subtree dirty. On the next frame, it resolves world transforms top down, culls off-screen nodes, sorts by z-index where needed, batches by texture atlas, and submits the minimal set of draw calls to Ebitengine's GPU backend.

Building a scene in Willow looks like this:

[Code listing: Building a Willow Scene]

From here, scrolling the world is a single call to world.SetPosition(-cameraX, -cameraY). Adding enemies means creating sprites and calling world.AddChild(). The framework handles draw order, transform math, batching, and culling.

For a hands-on walkthrough, see Your First Scene.

When Immediate Mode Is Better

Display trees are not always the right choice. There are legitimate cases where immediate mode rendering is the better fit.

  • Very simple scenes. If your entire game is a single screen with a handful of sprites, a display tree adds abstraction without much payoff. A flat list of draw calls is easier to understand and debug.
  • Fully custom render pipelines. If you need precise control over every draw call, custom shaders on a per-object basis, or non-standard blending at each step, a display tree's automatic batching might work against you. Some rendering techniques require explicit control over submission order.
  • Learning how rendering works. If your goal is to understand what happens between your code and the pixels on screen, writing immediate mode draw calls teaches you more than using a framework that handles it for you. Display trees abstract away the details that you are trying to learn.
  • Data-driven visualization. Tools like debug overlays, profiling graphs, or procedurally generated content that changes shape every frame may not benefit from a persistent tree structure. Rebuilding the visual output from scratch each frame can be simpler than maintaining and diffing a tree.

There is no shame in choosing immediate mode for the right project. The question is whether the project will stay simple or grow into something that needs the structure a display tree provides.

Conclusion

Display trees trade a small amount of flexibility for a large amount of productivity. They give you transform inheritance, automatic draw ordering, alpha propagation, dirty flag optimization, batching, and culling as built-in behaviors rather than problems you solve yourself. The pattern has been refined across Flash, Starling, PixiJS, and many other frameworks over roughly two decades. It is battle-tested.

If you are building anything beyond a simple demo, and your scene has layers, cameras, composite objects, or UI overlaid on gameplay, the structure of a display tree pays for itself quickly. You spend less time wrangling draw calls and more time building your game.

Willow brings this architecture to Go. If you are curious about why we built it and what problems it solves, read Why Willow.