The problem with rendering HTML as images
You have a Next.js blog. Every post needs an OG image. You could design each one in Figma — or you could render your existing HTML as an image. This is the "html to image" problem, and it's deceptively hard.
The rendering part is straightforward. The hard part is everything around it: font loading race conditions, viewport sizing that doesn't match what you expect, memory leaks in long-running processes, and the gap between "works on my machine" and "works at 3am when the CI pipeline runs."
This guide walks through the three real approaches — client-side rendering, headless browsers, and rendering APIs — with production code, benchmarks, and the failure modes you'll hit at scale. I've built image generation pipelines that render 50K+ images/month, and most of the lessons here came from things breaking at 2am.
How browsers actually render HTML to pixels
Before picking a tool, you need to understand what's happening under the hood. Every approach to HTML-to-image conversion is essentially asking a browser engine to do its normal job (render pixels) and then intercept the result as a file.
The browser rendering pipeline:
- Layout computes the geometry: where every box goes, how text wraps, what overflows.
- Paint fills in colors, borders, shadows, and images.
- Composite handles layers, transforms, and opacity.
When you "screenshot" a page, you're capturing the output after compositing. The quality of your HTML-to-image tool depends on how faithfully it reproduces this pipeline.
This is where the three approaches diverge:
- html2canvas re-implements Layout + Paint in JavaScript on a <canvas> element. It's a partial reimplementation: fast but inaccurate.
- Puppeteer/Playwright launches an actual Chromium instance and captures the real composited output. Accurate but heavy.
- Rendering APIs run Chromium in managed infrastructure and return the result over HTTP. Accurate and lightweight (for you).
Approach 1: Client-side with html2canvas
How it actually works
html2canvas doesn't take a screenshot. It traverses the DOM, reads computed styles via getComputedStyle(), and manually redraws everything onto a <canvas> element using the Canvas 2D API.
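As a toy model of that idea (not the real html2canvas internals; the node and style shapes here are invented for illustration):

```javascript
// Toy model of the html2canvas approach: walk a DOM-like tree, read each
// node's computed style, and emit Canvas-2D-style draw commands. The node
// and style shapes are invented for illustration; this is not the real source.
function paintNode(node, commands = []) {
  const style = node.computedStyle; // stands in for getComputedStyle(node)
  if (style.backgroundColor && style.backgroundColor !== 'transparent') {
    commands.push(['fillRect', node.x, node.y, node.width, node.height, style.backgroundColor]);
  }
  if (node.text) {
    // This is the font trap: if the family never loaded, a real canvas
    // silently draws with the fallback serif and reports no error.
    commands.push(['fillText', node.text, node.x, node.y, `${style.fontSize} ${style.fontFamily}`]);
  }
  for (const child of node.children || []) {
    // Children painted in DOM order: no stacking-context sorting, hence z-index bugs
    paintNode(child, commands);
  }
  return commands;
}
```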
This is a massive simplification. The actual html2canvas source is ~15,000 lines because CSS is incredibly complex. And even at 15K lines, it doesn't cover everything.
What breaks and why
CSS Grid: html2canvas doesn't implement the CSS Grid layout algorithm. Your grid-template-columns: repeat(3, 1fr) will render as a single column.
Web Fonts: The Canvas API uses ctx.font = "16px Inter" — but if the font hasn't loaded yet (or can't load due to CORS), Canvas silently falls back to the default serif font. There's no error.
Stacking context: z-index, position: fixed, transform, and opacity create stacking contexts that change the paint order. html2canvas handles some of these but not all — elements can appear in the wrong order.
The tl;dr: html2canvas is fine for "save this chart as PNG" in a dashboard. It is not fine for generating consistent, pixel-perfect images at scale.
When to actually use it
The one legitimate use case: you need to capture a DOM element that the user is currently looking at, client-side, without a server round-trip. Example: "Export this chart" button in a data dashboard.
Approach 2: Headless browsers (Puppeteer / Playwright)
The rendering is perfect. Everything else is the problem.
Puppeteer and Playwright launch real Chromium. The rendering fidelity is 100% — if Chrome can display it, Puppeteer can screenshot it. The issues are all operational.
A minimal working example
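A minimal sketch (the function name and options are one reasonable setup; assumes `npm i puppeteer`):

```javascript
// Minimal Puppeteer render: launch, set content, screenshot, close.
async function renderToImage(html, outputPath) {
  const puppeteer = require('puppeteer'); // loaded here so the file parses without Chromium installed
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1200, height: 630 });
    await page.setContent(html, { waitUntil: 'networkidle0' });
    await page.screenshot({ path: outputPath, type: 'png' });
  } finally {
    await browser.close(); // always close, even when rendering throws
  }
}
```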
This works. Ship it to production and you'll discover these problems within a week:
Problem 1: Memory leaks
Each browser.newPage() allocates ~30-50MB. Each navigation loads fonts, images, and stylesheets into memory. If you're rendering 100 images/hour, you're cycling through 3-5GB of allocations.
Chromium's garbage collector is lazy. Memory isn't freed immediately when you close a page. In Node.js, the V8 GC and Chromium's GC don't coordinate — you get sawtooth memory patterns that eventually hit the container limit and OOM-kill.
The fix: browser pool with forced recycling.
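A sketch of that pattern, with the launch function injected so the recycling logic stands on its own (in production you'd pass puppeteer.launch and tune maxUses to your memory budget):

```javascript
// Pool sketch: recycle the browser after maxUses acquisitions to cap
// Chromium's sawtooth memory growth. `launch` is any async factory
// (puppeteer.launch in production); injected so the logic is testable.
class BrowserPool {
  constructor(launch, { maxUses = 50 } = {}) {
    this.launch = launch;
    this.maxUses = maxUses;
    this.browser = null;
    this.uses = 0;
  }

  async acquire() {
    if (!this.browser || this.uses >= this.maxUses) {
      // Force-recycle: closing drops Chromium's accumulated heap
      if (this.browser) await this.browser.close();
      this.browser = await this.launch();
      this.uses = 0;
    }
    this.uses += 1;
    return this.browser;
  }

  async destroy() {
    if (this.browser) await this.browser.close();
    this.browser = null;
  }
}
```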
Problem 2: Font loading races
waitUntil: 'networkidle0' waits until there are no network requests for 500ms. But fonts are loaded asynchronously, and the browser may start painting before the font file arrives. You get a screenshot with the system fallback font.
Even document.fonts.ready isn't bulletproof. If the Google Fonts CDN is slow (it happens), networkidle0 fires before the font file arrives, and document.fonts.ready can resolve with the fallback font already committed to the render tree.
A more robust approach:
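One way to sketch it: trigger the font load explicitly, wait for document.fonts.ready, then verify with document.fonts.check() before screenshotting. The family name and timeout here are placeholders for whatever your template needs:

```javascript
// Sketch: wait until the specific family has actually resolved, instead
// of trusting networkidle0. `page` is a Puppeteer Page.
async function waitForFont(page, family, timeoutMs = 5000) {
  await page.evaluate(async (family, timeoutMs) => {
    const deadline = Date.now() + timeoutMs;
    // Kick off the fetch in case nothing on the page has used the font yet
    await document.fonts.load(`16px "${family}"`);
    await document.fonts.ready;
    // ready can resolve with the fallback committed, so verify explicitly
    while (!document.fonts.check(`16px "${family}"`)) {
      if (Date.now() > deadline) throw new Error(`Font "${family}" never loaded`);
      await new Promise((resolve) => setTimeout(resolve, 50));
    }
  }, family, timeoutMs);
}
```

Call it after page.setContent() and before page.screenshot().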
Problem 3: Viewport vs content sizing
You set page.setViewport({ width: 1200, height: 630 }) and expect a 1200×630 screenshot. By default page.screenshot() captures the viewport, but pass fullPage: true (a common copy-paste habit) and it captures the full scrollable page instead: if your content overflows, the image comes out taller than 630px.
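The fix is to pin the capture size explicitly. Two sketches, assuming a Puppeteer page and an example #og-card selector:

```javascript
// Option A: clip the capture to the viewport rectangle
async function screenshotFixed(page, path) {
  await page.screenshot({ path, clip: { x: 0, y: 0, width: 1200, height: 630 } });
}

// Option B: screenshot the element itself; its bounding box sets the size
async function screenshotElement(page, path) {
  const el = await page.$('#og-card');
  await el.screenshot({ path });
}
```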
Or, ensure your root element has explicit dimensions:
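For example (the id and inline styles are illustrative):

```html
<!-- Fixed-size root: the capture matches these dimensions exactly -->
<div id="og-card" style="width: 1200px; height: 630px; overflow: hidden; box-sizing: border-box;">
  <!-- card content -->
</div>
```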
Problem 4: Zombie processes
If your Node.js process crashes between browser.launch() and browser.close(), you get an orphaned Chromium process consuming 100-300MB of RAM. In a container environment, these accumulate until the container is killed.
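A sketch of defensive cleanup: track every launched browser and close them on shutdown signals and crashes. In containers you'd also run under an init process (e.g. tini) so orphaned Chromium children get reaped:

```javascript
const liveBrowsers = new Set();

// Register a browser so it can be force-closed on shutdown
function track(browser) {
  liveBrowsers.add(browser);
  browser.on?.('disconnected', () => liveBrowsers.delete(browser));
  return browser;
}

// Close everything we know about; tolerate browsers that are already dead
async function killAll() {
  await Promise.allSettled([...liveBrowsers].map((b) => b.close()));
  liveBrowsers.clear();
}

for (const sig of ['SIGINT', 'SIGTERM']) {
  process.on(sig, async () => {
    await killAll();
    process.exit(0);
  });
}

process.on('uncaughtException', async (err) => {
  console.error(err);
  await killAll();
  process.exit(1);
});
```

Usage: const browser = track(await puppeteer.launch()).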
The Docker problem
Puppeteer needs system-level dependencies that vary by distro. A minimal Puppeteer Docker image is 400-900MB.
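One common workaround, as a sketch, is to use the distro's Chromium package instead of Puppeteer's bundled download (package names and env vars vary by Puppeteer version and base image; treat this as a starting point, not a recipe):

```dockerfile
# Sketch: Debian-based image using the distro's Chromium.
FROM node:20-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
      chromium fonts-liberation ca-certificates \
    && rm -rf /var/lib/apt/lists/*
ENV PUPPETEER_SKIP_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]
```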
Every Chromium version bump can break these dependencies. You'll discover this when your CI build fails on a Monday morning.
Approach 3: Rendering API
The API approach is conceptually simple: POST your HTML, GET an image URL. The rendering service runs Chromium, handles the font loading, memory management, and browser pooling for you.
Production-grade Node.js implementation
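A sketch of what that client looks like. The endpoint, auth scheme, request body, and response shape below are placeholders; substitute the values from your provider's API reference. Requires Node 18+ for global fetch:

```javascript
// Rendering-API client sketch. URL, auth header, and response shape
// are ASSUMPTIONS for illustration, not any specific provider's API.
const API_URL = process.env.RENDER_API_URL; // e.g. the provider's image endpoint
const API_KEY = process.env.RENDER_API_KEY;

async function htmlToImage(html, { width = 1200, height = 630 } = {}) {
  const res = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({ html, width, height }),
  });
  if (!res.ok) {
    throw new Error(`Render failed: ${res.status} ${await res.text()}`);
  }
  const { url } = await res.json(); // assumed response: { url: "https://cdn.../image.png" }
  return url;
}
```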
Error handling for production
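Transient failures (429s, provider 5xxs, network blips) are the main production concern. A retry-with-backoff sketch, with the render call injected so the policy stands alone; the retryable flag is a convention assumed for this example:

```javascript
// Exponential backoff for transient render failures.
async function withRetries(render, { attempts = 3, baseDelayMs = 250 } = {}) {
  let lastErr;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await render();
    } catch (err) {
      lastErr = err;
      // Don't retry errors the caller marked permanent (e.g. invalid HTML → 400)
      if (err.retryable === false) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastErr;
}
```

Wrap whatever render call you use: withRetries(() => renderOnce(html), { attempts: 4 }) (renderOnce being your own function).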
Python implementation
Go implementation
When to use which
This isn't a "one-size-fits-all" decision. Here's my actual decision framework after building rendering pipelines across multiple projects:
Use html2canvas if:
- You're capturing a visible DOM element on the client
- CSS fidelity doesn't matter (the output is "good enough")
- You literally cannot make a server call
Use Puppeteer/Playwright if:
- You need to render pages you don't control (scraping, visual regression testing)
- You already have a container orchestration platform (K8s, ECS)
- You have a DevOps team that enjoys maintaining Chromium
Use a rendering API if:
- You're generating images from templates (OG images, social cards, certificates)
- You need consistent output across environments
- You don't want to own the rendering infrastructure
- You're building a product feature, not a science project
Real-world architecture: OG images at build time
Here's a concrete architecture for auto-generating OG images for a Next.js blog — the most common html-to-image use case:
This script runs during your build step. It only regenerates images for posts that changed (hash check). The manifest maps slugs to CDN URLs, which your layout component reads to set <meta og:image>.
Performance benchmarks
I tested all three approaches rendering the same 1200×630 HTML template (Inter font, gradient background, two text blocks, one image) on a 4-core machine:
| Metric | html2canvas | Puppeteer (cold) | Puppeteer (pooled) | Pictify API |
|---|---|---|---|---|
| First render | 180ms | 2,800ms | 2,800ms | 340ms |
| Subsequent (p50) | 120ms | 1,200ms | 380ms | 180ms |
| Subsequent (p99) | 400ms | 4,500ms | 1,100ms | 420ms |
| Memory per render | Client-side | 180MB | 45MB (shared) | 0 (your side) |
| Font accuracy | Wrong font 30% of time | Correct after explicit wait | Correct | Correct |
| CSS Grid | Broken | Correct | Correct | Correct |
The API approach has slightly higher first-render latency than html2canvas (network round-trip) but lower p99 because it doesn't depend on the client's CPU or Chromium's GC pauses.
Debugging checklist
When your HTML-to-image output doesn't look right, work through this:
- Fonts wrong? → Is the font loaded before the screenshot fires? Use document.fonts.ready or the API (handles it automatically).
- Blank image? → Your root element needs explicit width and height. The renderer captures element dimensions, not the viewport.
- Layout broken? → If using html2canvas, check CSS Grid/Flexbox compatibility. Switch to Puppeteer or an API for full CSS support.
- Image too large (file size)? → Switch from PNG to JPG for photos/gradients. Reduce dimensions. Simplify CSS (shadows and blurs increase size).
- Colors off? → JPG compression shifts colors. Use PNG for exact color matching. Check if your gradient has too many color stops.
- External images missing? → The renderer needs network access to fetch external images. If images are behind auth or on a private network, inline them as base64 data URIs.
Next steps
- Try it now: HTML to PNG converter — paste HTML, get an image
- HTML to JPG converter — for photos and social cards
- Get an API key — start building
- API reference — full docs for the image endpoint
Built with Pictify — the image generation API for developers. No Puppeteer, no infra, no headaches.