From Figma to Production Code Using AI Agents / Calum Baines

One of the biggest bottlenecks in frontend teams isn’t building components. It’s translating design into code consistently, accurately, and repeatedly.

I’ve been experimenting with a workflow that takes components directly from Figma and turns them into production-ready code using AI agents. The goal isn’t just speed. It’s creating a repeatable system that enforces quality across design, engineering, and accessibility.

This is how it works.

The idea

Instead of a single AI generating code in one go, the process is broken into a team of specialised agents — each with a clear responsibility:

A lead engineer plans the component and writes a spec
An engineer implements the component
A design reviewer compares the implementation against Figma
An accessibility reviewer validates usability and standards
A Figma designer syncs everything back

Each agent works independently with a single goal. This avoids blurred responsibilities and creates a built-in review loop.

The design system this runs against is Beach Towel, a multi-brand React component library built with Tailwind and design tokens — but the principles apply to any design system.

The governance layer

Before any agent touches code, there’s a human process that makes all of this possible.

Beach Towel is owned by a design system team, but it’s built for multiple product teams. Each product team has a representative who reviews proposals, debates API decisions, and signs off on components before they enter the library.

Every component starts in Figma. Designers publish finalised components to the library. Engineers review those designs, agree on behaviour, and collaboratively write specs — what variants exist, what states need handling, how it connects to the token system.

Figma design finalised
        ↓
Design system owners + team representatives review
        ↓
Specs agreed — variants, states, token mapping, sub-components
        ↓
Implementation begins

Every component is explicitly linked to the tokens it uses and any sub-components it depends on. When a token changes, you know exactly what’s affected.
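One way to picture that link is a small manifest that records, per component, the tokens and sub-components it depends on. The sketch below is an assumption for illustration only: the manifest shape, the IconButton and Toolbar names, and the componentsAffectedBy helper are hypothetical and not Beach Towel’s actual format.

```typescript
// Hypothetical manifest: each component lists the tokens and sub-components it uses.
interface ComponentManifest {
  tokens: string[];
  subComponents: string[];
}

const manifest: Record<string, ComponentManifest> = {
  Button: { tokens: ["--bt-colour-primary", "--bt-gap-400"], subComponents: [] },
  // "IconButton" and "Toolbar" are made-up names for illustration.
  IconButton: { tokens: [], subComponents: ["Button"] },
  Toolbar: { tokens: [], subComponents: ["IconButton"] },
};

// Returns every component affected by a token change,
// either directly or through a sub-component it depends on.
function componentsAffectedBy(
  token: string,
  all: Record<string, ComponentManifest>
): string[] {
  const affected = new Set<string>();
  for (const [name, entry] of Object.entries(all)) {
    if (entry.tokens.includes(token)) affected.add(name);
  }
  // Propagate upwards until a full pass finds no new dependants.
  let changed = true;
  while (changed) {
    changed = false;
    for (const [name, entry] of Object.entries(all)) {
      const dependsOnAffected = entry.subComponents.some((sub) => affected.has(sub));
      if (!affected.has(name) && dependsOnAffected) {
        affected.add(name);
        changed = true;
      }
    }
  }
  return Array.from(affected).sort();
}
```

With this manifest, a change to --bt-colour-primary reports Button, plus IconButton and Toolbar because they build on it.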

This governance layer is what gives the AI step its grounding. Agents aren’t making design decisions. They’re executing decisions that humans have already made.

Step 1

Prompt + context

An engineer opens Claude Code and provides a Figma URL. Claude Code connects to Figma via MCP (Model Context Protocol), giving it direct API access to the design file — no exporting, no copy-pasting specs. The agent reads variants, states, spacing, and colour values directly.

Before any code is written, the orchestrator collects:

Figma design context — exact variants, states, and visual properties from the Figma API
Available design tokens — the agreed token vocabulary (e.g. --bt-colour-primary, --bt-gap-400)
Existing component patterns — real code from the codebase as reference

The Figma URL is non-negotiable. It’s the source of truth for everything that follows.
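As a rough sketch, the collected context can be modelled as a small typed object with a guard that refuses to continue without a Figma URL. The AgentContext shape and the assertContextReady helper are assumptions made for this example, not the orchestrator’s real interface.

```typescript
// Hypothetical shape of the context gathered before planning starts.
interface AgentContext {
  figmaUrl: string;
  tokens: string[]; // e.g. "--bt-colour-primary"
  referenceFiles: string[]; // paths to existing components used as patterns
}

// Refuses to proceed without a valid Figma URL, since it is the source of truth.
function assertContextReady(ctx: Partial<AgentContext>): AgentContext {
  if (!ctx.figmaUrl) {
    throw new Error("A Figma URL is required");
  }
  const url = new URL(ctx.figmaUrl); // throws on malformed URLs
  const isFigmaHost = url.hostname === "figma.com" || url.hostname.endsWith(".figma.com");
  if (!isFigmaHost) {
    throw new Error(`Not a Figma URL: ${ctx.figmaUrl}`);
  }
  return {
    figmaUrl: ctx.figmaUrl,
    tokens: ctx.tokens ?? [],
    referenceFiles: ctx.referenceFiles ?? [],
  };
}
```

Tokens and reference files default to empty lists here, but a stricter version could reject those too.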

Step 2

Planning with a spec

The lead engineer agent creates a structured set of planning documents before any code exists — a proposal, a design, capability specs, and a task list.

openspec/changes/button-component/
├── proposal.md    # Why, what changes, capabilities, impact
├── design.md      # Architecture, token mapping, API decisions
├── specs/         # Requirements and scenarios per capability
└── tasks.md       # Implementation checklist

The proposal captures why the component needs to exist and what it will change. The design records how it will be built, including the token-to-variant mapping and key architectural decisions:

Variant	Background	Text	Hover
Primary	`--bt-colour-primary`	`--bt-colour-tertiary`	`--bt-colour-primary-600`
Secondary	`--bt-colour-secondary`	`--bt-colour-white`	`--bt-colour-secondary-600`
Tertiary	transparent	`--bt-colour-secondary`	`--bt-colour-secondary-50`

Specs are written as testable scenarios:

### Scenario: Disabled button prevents interaction
- WHEN a Button is rendered with `isDisabled={true}`
- THEN it SHALL NOT fire `onClick` events
- AND it SHALL have `aria-disabled="true"` on the element

This isn’t an agent inventing requirements. It’s transcribing decisions the team already made — in a form both humans and agents can act on.
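To show how directly a scenario like the disabled one maps onto behaviour, here is a framework-free sketch. It is not the Beach Towel implementation: resolveButtonBehaviour and its types are hypothetical, and the logic is kept outside React so it runs on its own.

```typescript
type ClickHandler = () => void;

interface ButtonBehaviourInput {
  isDisabled?: boolean;
  onClick?: ClickHandler;
}

interface ButtonBehaviourProps {
  "aria-disabled"?: "true";
  onClick: ClickHandler;
}

// Maps the spec scenario onto props: a disabled button swallows clicks
// and exposes aria-disabled="true"; an enabled one forwards the click.
function resolveButtonBehaviour(input: ButtonBehaviourInput): ButtonBehaviourProps {
  const { isDisabled = false, onClick } = input;
  if (isDisabled) {
    return { "aria-disabled": "true", onClick: () => {} };
  }
  return { onClick: () => onClick?.() };
}
```

Each WHEN/THEN line becomes one assertion against the returned props, which is what makes the scenario format convenient for both agents and tests.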

Step 3

Implementation

Once the spec is complete, the engineer agent works through the task list. This is where Claude Code’s Skills become important.

Skills are reusable instruction sets that encode project-specific rules: where files go, which patterns to follow, which anti-patterns to avoid, what the token naming convention is. They’re what makes output consistent across components — the first Button and the tenth form input follow the same conventions, not because an engineer reviewed both for style, but because the same Skill applied to both.

The output is split across focused files:

atoms/button/
├── button.tsx           # Component
├── button.types.ts      # Props, enums
├── button.constants.ts  # Variant classes, size maps
├── button.stories.tsx   # Storybook stories
├── button.test.tsx      # Behaviour tests
├── button.figma.tsx     # Figma Code Connect
└── index.ts

Variant styling lives in constants, not the component, keeping both readable:

button.constants.ts

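import type { ButtonVariant } from "./button.types";
// `cn` and `focusClasses` are shared helpers imported from elsewhere in the codebase (paths not shown here).

export const variantClasses: Record<ButtonVariant, string> = {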
  primary: cn(
    "bt:bg-primary bt:text-tertiary",
    "bt:hover:bg-primary-600",
    focusClasses
  ),
  secondary: cn(
    "bt:bg-secondary bt:text-white",
    "bt:hover:bg-secondary-600",
    focusClasses
  ),
  tertiary: cn(
    "bt:bg-transparent bt:text-secondary bt:border-2 bt:border-secondary",
    "bt:hover:border-secondary-600",
    focusClasses
  ),
  // ...
};

Step 4

Parallel reviews

As soon as implementation is complete, three agents unblock simultaneously:

Implement ──┬── Push API spec to Figma
            ├── Accessibility review  ──┬── Iterate & finalise
            └── Visual design review ──┘

The design reviewer opens Storybook in a browser, fetches Figma screenshots via the API, and compares them side by side, checking colours, spacing, border radius, and states. Issues go directly to the engineer:

MISMATCH — tertiary-floating hover: border should be grey-300, not grey-200
MISMATCH — disabled link variant: use text-grey-700, not opacity-50
MATCH    — primary variant colours, padding, focus ring
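A report in this shape is easy to turn into structured data, which is useful if the engineer agent should only iterate while mismatches remain. The sketch below assumes the line format shown above (verdict, a dash, then the detail); parseFindings and hasBlockingIssues are hypothetical helper names.

```typescript
type Verdict = "MATCH" | "MISMATCH";

interface ReviewFinding {
  verdict: Verdict;
  detail: string;
}

// Parses report lines of the form "<VERDICT> <dash> <detail>".
// Both an em dash (\u2014) and a plain hyphen are accepted as the separator.
function parseFindings(report: string): ReviewFinding[] {
  const findings: ReviewFinding[] = [];
  for (const line of report.split("\n")) {
    const match = /^(MATCH|MISMATCH)\s*[\u2014-]\s*(.+)$/.exec(line.trim());
    if (match) {
      findings.push({ verdict: match[1] as Verdict, detail: match[2].trim() });
    }
  }
  return findings;
}

// The review loop should continue while at least one mismatch is outstanding.
function hasBlockingIssues(findings: ReviewFinding[]): boolean {
  return findings.some((finding) => finding.verdict === "MISMATCH");
}
```

Lines that do not start with a known verdict are ignored rather than rejected, so free-text notes in the report do not break parsing.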

The accessibility reviewer works through a WCAG 2.1 AA checklist: semantic HTML, keyboard navigation, focus ring, ARIA attributes, icon accessibility, motion preferences. Findings go straight to the engineer with severity and specific fixes.

The Figma designer reads the final component API and pushes a Code Connect mapping back to the Figma file, so anyone inspecting the component in Figma sees an accurate, live code snippet.
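The fan-out itself can be sketched with Promise.all. This is a simplification under stated assumptions: the Agent signature and runPostImplementation are hypothetical, each agent is reduced to an async function returning a summary string, and the iterate-and-finalise step that follows the two reviews is left out.

```typescript
// Hypothetical agent signature: takes a component name, resolves to a summary.
type Agent = (component: string) => Promise<string>;

interface PostImplementationResult {
  codeConnect: string;
  accessibility: string;
  design: string;
}

// Starts all three post-implementation agents together and waits for every result.
async function runPostImplementation(
  component: string,
  agents: {
    figmaDesigner: Agent;
    accessibilityReviewer: Agent;
    designReviewer: Agent;
  }
): Promise<PostImplementationResult> {
  const [codeConnect, accessibility, design] = await Promise.all([
    agents.figmaDesigner(component),
    agents.accessibilityReviewer(component),
    agents.designReviewer(component),
  ]);
  return { codeConnect, accessibility, design };
}
```

Because every agent is invoked before anything is awaited, the total wait is roughly that of the slowest agent rather than the sum of all three.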

Step 5

Storybook

Storybook is both the documentation layer and the testing surface. Every component and design token is documented here.

Stories cover every variant, state, and brand combination. Because all values reference CSS custom properties, brand switching is one attribute change:

button.stories.tsx

export const BrandComparison: Story = {
  render: () => (
    <div className="flex gap-8">
      <div>
        <Button variant="primary">Book now</Button>
      </div>
      <div data-brand="sunshine">
        <Button variant="primary">Book now</Button>
      </div>
    </div>
  ),
};
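The reason one attribute is enough can be modelled in a few lines: a brand only overrides the tokens it changes, and everything else falls through to the base value, much as a CSS rule scoped to data-brand would behave. The sketch below is illustrative only. The hex values are placeholders, not real Beach Towel tokens, and resolveToken is a hypothetical helper.

```typescript
type TokenMap = Record<string, string>;

// Placeholder values for illustration only; real values come from the token build.
const baseTokens: TokenMap = {
  "--bt-colour-primary": "#101010",
  "--bt-colour-tertiary": "#fafafa",
};

// A brand lists only the tokens it overrides.
const brandOverrides: Record<string, TokenMap> = {
  sunshine: { "--bt-colour-primary": "#202020" },
};

// Resolves a token the way the cascade would:
// brand override first, base value otherwise.
function resolveToken(token: string, brand?: string): string {
  const override = brand ? brandOverrides[brand]?.[token] : undefined;
  const value = override ?? baseTokens[token];
  if (value === undefined) {
    throw new Error(`Unknown token: ${token}`);
  }
  return value;
}
```

The component never learns which brand it is rendering in. It asks for --bt-colour-primary and the surrounding scope decides the value.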

Automated testing runs across three dimensions:

Functionality — unit tests verify click handling, disabled state, loading state, and ARIA attributes
Visual regression — snapshot diffs catch unintended changes to tokens or classes before they reach production
Accessibility — automated axe checks run against every story, catching regressions after the initial build

A merged component isn’t reviewed once. It’s re-verified on every subsequent change.

What works well

The biggest shift isn’t speed — it’s structure.

Clear separation of responsibilities means the reviewer doesn’t decide architecture and the engineer doesn’t check visual accuracy. Built-in review loops catch issues that would normally surface in a PR review, or not at all. Parallel execution means the reviews don’t queue behind one another. And because agents are given the token vocabulary upfront and instructed never to use arbitrary values, the output is consistent by construction: bt:bg-primary, not bg-[#FEDC07].
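The no-arbitrary-values rule can also be checked mechanically. The sketch below is a deliberately naive illustration: findArbitraryValues is a hypothetical helper, not part of Tailwind or Beach Towel, and it flags any class containing square brackets, which would also catch legitimate bracketed variants.

```typescript
// Flags classes that use Tailwind's square-bracket arbitrary value syntax,
// such as bg-[#FEDC07] or p-[13px]. Naive: any bracketed class is reported.
function findArbitraryValues(classNames: string): string[] {
  return classNames
    .split(/\s+/)
    .filter((className) => /\[[^\]]+\]/.test(className));
}
```

Run against generated class strings, a non-empty result is a signal that the agent stepped outside the token vocabulary.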

Design tokens are currently updated through a manual AI-assisted process, guided by a Skill that enforces the naming convention and token format. It’s consistent, but still human-triggered — automating it fully is on the roadmap.

Where it still needs work

Token correctness is a prerequisite. The agent can only be as accurate as the token vocabulary it’s given. Gaps surface during implementation and need human resolution.

Behaviour decisions still need humans. Should disabled use aria-disabled or the native attribute? Should the link variant render an <a> tag? These decisions need to be made once, recorded in the spec, and applied consistently. Agents follow decisions; they don’t make them.

Design reviewer access depends on tooling. The visual review works best when the agent can load Storybook in a browser and compare against Figma screenshots. Without that, it falls back to code inspection, which is less reliable for subtle spacing or radius differences.

Realistically, this gets a complex component to around 90% on the first pass. The last 10% — visual polish, edge cases, token gaps — is refinement a human should own anyway.

Phase 1: Where we’re starting

Rather than a full rollout, we’re beginning with a focused integration: a single page in one of our frontend applications. That scoped deployment validates the component library in a real product context while keeping the risk contained to one page.

From there, the plan is incremental:

One page — validate tokens, rendering, and brand theming in production
Remaining pages in the same application — replace ad-hoc components with Beach Towel equivalents
Other applications — extend the library to other frontend surfaces

Each phase stress-tests the library against actual usage — interaction states that weren’t in the Figma spec, browser quirks that unit tests don’t catch.

Where this could go

Triggered from Figma publish. When a designer marks a component ready in Figma, the agent pipeline kicks off automatically. No handoff, no ticket. The component appears in Storybook, reviewed, tested, and Code Connected, without an engineer initiating anything.

Automatic merge request creation. The workflow already produces tested, linted, type-checked code. Wrapping that in an MR, with a description generated from the planning docs, means engineers review rather than build.

Automated design token updates. A script that watches for Figma variable changes, runs the token build, and opens a PR with the diff is well within reach. Combined with visual regression tests, token changes become safe to automate.
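The core of such a script is a diff between two token snapshots. The sketch below covers only that step, assuming tokens are available as flat name-to-value maps. Watching Figma variables, running the token build, and opening the PR are out of scope, and diffTokens and its types are hypothetical names.

```typescript
type TokenValues = Record<string, string>;

interface TokenDiff {
  added: string[];
  removed: string[];
  changed: { token: string; from: string; to: string }[];
}

// Compares two flat token snapshots and reports what was added, removed, or changed.
function diffTokens(previous: TokenValues, next: TokenValues): TokenDiff {
  const added = Object.keys(next)
    .filter((token) => !(token in previous))
    .sort();
  const removed = Object.keys(previous)
    .filter((token) => !(token in next))
    .sort();
  const changed = Object.keys(next)
    .filter((token) => token in previous && previous[token] !== next[token])
    .sort()
    .map((token) => ({ token, from: previous[token], to: next[token] }));
  return { added, removed, changed };
}
```

The resulting diff could serve as the PR description, and the changed list tells the visual regression run which tokens to look at first.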

Native app components. The library currently targets web only. React Native support for iOS and Android is a future phase; the token pipeline already outputs a format suitable for native. The agent workflow would need a native-specific Skill, but the planning and review loop would be identical.

At that point, the role shifts. Less time translating pixels into code. More time defining the rules, reviewing decisions, and handling the cases the system can’t handle on its own.

That’s a better use of the skill.

Beach Towel is a multi-brand React component library built at OTB Group.