MakeMyStats

JSON Schema validation — when and how to use it

A practical guide to JSON Schema: what it solves, how to write schemas for real data, common pitfalls, and a worked example you can try in your browser.

You have an API that accepts JSON. A client sends something unexpected — a missing field, a string where you expected a number, an array with 10,000 items when you expected 5 — and your code throws a cryptic error three function calls deep. You could write a pile of if statements to check every field. Or you could describe what valid data looks like, once, and let a validator do the work.

That's what JSON Schema does. It's a vocabulary for describing the shape of JSON data, plus a set of rules for checking whether a given document matches that shape. The schema itself is JSON, so it's easy to store, version, and share.

When JSON Schema earns its keep

Not every project needs a schema. If you control both the producer and consumer of the data, and the shape is simple, inline checks might be fine. But there are a few situations where schemas pay for themselves quickly.

API request validation. You receive JSON from an external client — a mobile app, a webhook, a third-party integration. You don't control what they send. A schema at the boundary catches malformed payloads before they reach your business logic.

Configuration files. Your app reads a JSON or YAML config at startup. A schema validates the config before anything else runs, so a typo in a key name doesn't surface as a mysterious runtime failure twenty minutes later.

Data interchange. Two teams agree on a data contract. The schema is the contract. Both sides can validate independently, and disagreements become "your output doesn't match the schema" instead of "something broke and nobody knows why."

Form validation. Some UI libraries (like react-jsonschema-form) generate forms directly from schemas. Even without auto-generation, a schema is a useful spec for what your frontend should enforce.

Core keywords, briefly

A JSON Schema is an object with keywords that constrain the data. Here are the ones you'll use 90% of the time.

type — the most basic constraint. Accepts "string", "number", "integer", "boolean", "null", "object", and "array".

{ "type": "string" }

This schema matches "hello" and rejects 42, true, and null.
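
The rules behind type are simple enough to sketch. The helper below, `matches_type`, is a hypothetical illustration (not any library's API) of how a validator might map values produced by Python's `json.loads` onto the seven type names. It assumes JSON objects arrive as dicts and arrays as lists.

```python
import json


def matches_type(value, type_name):
    """Return True if a value parsed by json.loads matches a JSON Schema type name."""
    # bool is a subclass of int in Python, so test for it before the numeric types.
    if isinstance(value, bool):
        return type_name == "boolean"
    if value is None:
        return type_name == "null"
    if isinstance(value, str):
        return type_name == "string"
    if isinstance(value, (int, float)):
        if type_name == "number":
            return True
        # "integer" means no fractional part, so 1.0 qualifies (draft 6 and later).
        return type_name == "integer" and (isinstance(value, int) or value.is_integer())
    if isinstance(value, list):
        return type_name == "array"
    if isinstance(value, dict):
        return type_name == "object"
    return False


schema = {"type": "string"}
for raw in ['"hello"', "42", "true", "null"]:
    print(raw, matches_type(json.loads(raw), schema["type"]))
```

The boolean check comes first on purpose: without it, `true` would also count as an integer, because Python treats `True` as the number 1.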

properties + required — for objects. properties defines the expected keys and their schemas. required lists which of those keys must be present.

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age": { "type": "integer" }
  },
  "required": ["name"]
}

Here, name must exist and be a string. age is optional, but if present, it must be an integer.
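
The division of labor between properties and required is easiest to see in code. This is a minimal sketch with a hypothetical `check_object` helper that handles only those two keywords, plus the two type names this schema uses; a real validator covers far more.

```python
# Only the types this example needs. bool is excluded from "integer" because
# Python treats True and False as ints.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def check_object(instance, schema):
    """Return error messages for the properties and required keywords only."""
    if not isinstance(instance, dict):
        return ["expected an object"]
    errors = []
    # required: each listed key must be present, whatever its value.
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property '{key}'")
    # properties: a key's subschema is applied only when that key is present.
    for key, subschema in schema.get("properties", {}).items():
        if key in instance and not TYPE_CHECKS[subschema["type"]](instance[key]):
            errors.append(f"'{key}' should be {subschema['type']}")
    return errors


PERSON = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}

print(check_object({"name": "Ada"}, PERSON))               # []
print(check_object({"age": 36}, PERSON))                   # ["missing required property 'name'"]
print(check_object({"name": "Ada", "age": "36"}, PERSON))  # ["'age' should be integer"]
```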

items — for arrays. Constrains what each element looks like.

{
  "type": "array",
  "items": { "type": "number" },
  "minItems": 1
}

Matches [1, 2, 3], rejects [] (too few items) and ["a"] (wrong item type).
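
A rough sketch of how items and minItems combine, assuming the items schema is exactly `{"type": "number"}` (the only case this hypothetical `check_number_array` helper handles):

```python
def is_number(value):
    # JSON numbers arrive as int or float; bool is excluded because it subclasses int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_number_array(instance, schema):
    """Check minItems plus an items schema of {"type": "number"} (the only case handled)."""
    if not isinstance(instance, list):
        return ["expected an array"]
    errors = []
    min_items = schema.get("minItems", 0)
    if len(instance) < min_items:
        errors.append(f"expected at least {min_items} item(s), got {len(instance)}")
    # items: the same subschema applies to every element, so errors carry the index.
    for index, item in enumerate(instance):
        if not is_number(item):
            errors.append(f"item {index} is not a number")
    return errors


SCHEMA = {"type": "array", "items": {"type": "number"}, "minItems": 1}
print(check_number_array([1, 2, 3], SCHEMA))  # []
print(check_number_array([], SCHEMA))         # ['expected at least 1 item(s), got 0']
print(check_number_array(["a"], SCHEMA))      # ['item 0 is not a number']
```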

enum — restricts a value to a fixed set.

{ "type": "string", "enum": ["draft", "published", "archived"] }

minimum, maximum, minLength, maxLength, pattern — numeric and string bounds.

{
  "type": "string",
  "minLength": 1,
  "maxLength": 255,
  "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+$"
}

That's a rough email pattern. (Don't use it in production — real email validation is famously subtle.)
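
Here is a small sketch of how a validator might apply the string keywords, using a hypothetical `check_string` helper. One caveat: JSON Schema specifies ECMA-262 (JavaScript) regular expressions, and Python's `re` module is close to that dialect but not identical. For this pattern they agree.

```python
import re

EMAIL_ISH = {
    "type": "string",
    "minLength": 1,
    "maxLength": 255,
    "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+$",
}


def check_string(instance, schema):
    """Apply minLength, maxLength and pattern to a string instance."""
    if not isinstance(instance, str):
        return ["expected a string"]
    errors = []
    # Lengths count characters (code points), which is what len() gives for a str.
    if len(instance) < schema.get("minLength", 0):
        errors.append("too short")
    if "maxLength" in schema and len(instance) > schema["maxLength"]:
        errors.append("too long")
    # pattern succeeds if the regex matches anywhere, hence re.search, not re.fullmatch.
    if "pattern" in schema and re.search(schema["pattern"], instance) is None:
        errors.append("does not match pattern")
    return errors


print(check_string("alex@example.com", EMAIL_ISH))  # []
print(check_string("", EMAIL_ISH))                  # ['too short', 'does not match pattern']
print(check_string("a@b", EMAIL_ISH))               # [] (the pattern really is rough)
```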

A worked example

Say you're building an API for a task manager. Tasks look like this:

{
  "id": "task-001",
  "title": "Write the docs",
  "status": "in_progress",
  "assignee": {
    "name": "Alex",
    "email": "alex@example.com"
  },
  "tags": ["documentation", "v2"],
  "priority": 3,
  "due": "2025-05-01"
}

Here's a schema for it:

{
  "type": "object",
  "required": ["id", "title", "status", "priority"],
  "properties": {
    "id": {
      "type": "string",
      "pattern": "^task-[0-9]+$"
    },
    "title": {
      "type": "string",
      "minLength": 1,
      "maxLength": 200
    },
    "status": {
      "type": "string",
      "enum": ["todo", "in_progress", "done", "blocked"]
    },
    "assignee": {
      "type": "object",
      "required": ["name"],
      "properties": {
        "name": { "type": "string" },
        "email": { "type": "string" }
      }
    },
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "uniqueItems": true
    },
    "priority": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5
    },
    "due": {
      "type": "string",
      "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
    }
  },
  "additionalProperties": false
}

Walk through it piece by piece:

  • id must be task- followed by one or more digits: the pattern accepts task-001 (and task-7) but rejects TASK-001 or 001.
  • title is required and must be 1 to 200 characters long, so empty strings fail.
  • status is an enum with four allowed values. If someone sends "status": "active", it fails validation with a clear error.
  • assignee is optional (it's not in required), but if present, it must be an object with at least a name.
  • tags is an array of unique strings. Duplicates are rejected.
  • priority is an integer from 1 to 5. A value of 0, 6, or 3.5 would fail.
  • due is a string matching a date pattern. This doesn't verify that the date is real (2025-02-31 would pass), but it catches obviously wrong formats.
  • additionalProperties: false rejects any top-level keys not listed in properties. This is strict, but it's exactly what you want at an API boundary: typos in key names get caught immediately. It applies only to the object where it appears, so assignee still accepts unknown keys unless you give it its own additionalProperties: false.

You can paste both the schema and the task JSON into the JSON Schema Validator to see this in action. Try removing a required field, changing priority to 6, or adding an unknown key — the validator shows you the exact path and keyword that failed.
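
To demystify what a validator does with this schema, here is a compact sketch of a recursive checker. `validate` and `type_ok` are hypothetical names; the sketch supports only the keywords the task schema uses, reports each failure as a (path, keyword) pair, and simplifies some rules (for example, it stops checking a value at its first type error). For real use, reach for an established validator such as the jsonschema package for Python or Ajv for JavaScript.

```python
import re

# The task schema from above as a Python dict (JSON true/false become True/False).
TASK_SCHEMA = {
    "type": "object",
    "required": ["id", "title", "status", "priority"],
    "properties": {
        "id": {"type": "string", "pattern": "^task-[0-9]+$"},
        "title": {"type": "string", "minLength": 1, "maxLength": 200},
        "status": {"type": "string", "enum": ["todo", "in_progress", "done", "blocked"]},
        "assignee": {
            "type": "object",
            "required": ["name"],
            "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
        },
        "tags": {"type": "array", "items": {"type": "string"}, "uniqueItems": True},
        "priority": {"type": "integer", "minimum": 1, "maximum": 5},
        "due": {"type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"},
    },
    "additionalProperties": False,
}


def type_ok(value, name):
    """Map a JSON Schema type name onto a value produced by json.loads."""
    if isinstance(value, bool):  # bool subclasses int, so handle it first
        return name == "boolean"
    if name == "integer":
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    if name == "number":
        return isinstance(value, (int, float))
    if name == "null":
        return value is None
    return isinstance(value, {"string": str, "object": dict, "array": list, "boolean": bool}[name])


def validate(instance, schema, path="$"):
    """Return (path, keyword) pairs for failed checks; an empty list means valid."""
    if "type" in schema and not type_ok(instance, schema["type"]):
        return [(path, "type")]
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append((path, "enum"))
    if isinstance(instance, str):
        if len(instance) < schema.get("minLength", 0):
            errors.append((path, "minLength"))
        if len(instance) > schema.get("maxLength", float("inf")):
            errors.append((path, "maxLength"))
        if "pattern" in schema and not re.search(schema["pattern"], instance):
            errors.append((path, "pattern"))
    if isinstance(instance, (int, float)) and not isinstance(instance, bool):
        if instance < schema.get("minimum", float("-inf")):
            errors.append((path, "minimum"))
        if instance > schema.get("maximum", float("inf")):
            errors.append((path, "maximum"))
    if isinstance(instance, list):
        # Pairwise Python equality: fine for strings, though it treats True and 1 as equal.
        if schema.get("uniqueItems") and any(
            a == b for i, a in enumerate(instance) for b in instance[i + 1:]
        ):
            errors.append((path, "uniqueItems"))
        for i, item in enumerate(instance):
            errors += validate(item, schema.get("items", {}), f"{path}[{i}]")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append((f"{path}.{key}", "required"))
        properties = schema.get("properties", {})
        for key, value in instance.items():
            if key in properties:
                errors += validate(value, properties[key], f"{path}.{key}")
            elif schema.get("additionalProperties") is False:
                errors.append((f"{path}.{key}", "additionalProperties"))
    return errors


task = {
    "id": "task-001",
    "title": "Write the docs",
    "status": "in_progress",
    "assignee": {"name": "Alex", "email": "alex@example.com"},
    "tags": ["documentation", "v2"],
    "priority": 3,
    "due": "2025-05-01",
}

print(validate(task, TASK_SCHEMA))                       # []
print(validate({**task, "priority": 6}, TASK_SCHEMA))    # [('$.priority', 'maximum')]
print(validate({**task, "colour": "red"}, TASK_SCHEMA))  # [('$.colour', 'additionalProperties')]
```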

Common pitfalls

Forgetting additionalProperties. By default, JSON Schema allows extra properties on objects. Suppose name is optional in your schema and a client sends {"naem": "Alex"} (note the typo). A schema without additionalProperties: false will happily accept it: name is not required, so its absence is fine, and naem is just an extra property. The document is "valid" but the data is wrong.
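
A minimal sketch of that scenario, using a hypothetical `problems` helper that checks only required and additionalProperties. The three schemas differ only in those two keywords.

```python
def problems(instance, schema):
    """Report missing required keys and, if additionalProperties is false, undeclared keys."""
    found = [f"missing '{key}'" for key in schema.get("required", []) if key not in instance]
    # Undeclared keys are allowed unless the schema explicitly sets additionalProperties: false.
    if schema.get("additionalProperties", True) is False:
        declared = schema.get("properties", {})
        found += [f"unexpected '{key}'" for key in instance if key not in declared]
    return found


# name is optional here, matching the scenario in the text.
LENIENT = {"type": "object", "properties": {"name": {"type": "string"}}}
STRICT = {**LENIENT, "additionalProperties": False}
REQUIRED = {**LENIENT, "required": ["name"]}

typo = {"naem": "Alex"}
print(problems(typo, LENIENT))   # []
print(problems(typo, STRICT))    # ["unexpected 'naem'"]
print(problems(typo, REQUIRED))  # ["missing 'name'"]
```

Making name required would also catch this particular typo, but only additionalProperties: false catches misspellings of keys that are genuinely optional.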

required is on the parent, not the property. This trips up almost everyone on their first schema. You don't write "name": { "type": "string", "required": true }. Instead, required is a sibling of properties on the containing object, and it's an array of key names. The reason: a subschema under properties is applied only when its key is present, so it has no way to complain that the key is missing. Presence can only be checked by the object that contains the key.

Confusing type: "number" and type: "integer". In JSON Schema, "number" matches any numeric value, including 3.14. "integer" only matches numbers with no fractional part (from draft 6 onward, that includes 1.0). If your field is a count, an ID, or an array index, you probably want "integer".

Leaving patterns unanchored. A pattern regex matches a substring by default. If you want the entire string to match, anchor it: "^value$". Without anchors, "pattern": "[0-9]+" matches "abc123def" because it contains a run of digits.
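
The difference is easy to check with a hypothetical `pattern_matches` helper that mimics the pattern keyword with `re.search`. One dialect caveat: in Python, `$` also matches just before a trailing newline, which ECMA-262 (the regex flavor JSON Schema specifies) does not do, so "123\n" would pass the anchored pattern here.

```python
import re


def pattern_matches(pattern, text):
    """Mimic the pattern keyword: succeed if the regex matches anywhere in text."""
    return re.search(pattern, text) is not None


print(pattern_matches("[0-9]+", "abc123def"))    # True: "123" is a matching substring
print(pattern_matches("^[0-9]+$", "abc123def"))  # False: anchors force a whole-string match
print(pattern_matches("^[0-9]+$", "123"))        # True
```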

Ignoring composition. JSON Schema has allOf, anyOf, and oneOf for combining schemas. These are powerful when your data has variant shapes. For example, a notification payload might be oneOf a message notification, a friend request, or a system alert — each with different required fields. Trying to express that without composition leads to overly permissive schemas.
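
To show what oneOf actually demands, here is a sketch with a hypothetical `matches` helper that understands only required and const, and hypothetical notification variants told apart by a "kind" field. oneOf passes only when exactly one subschema matches, which is why each variant pins "kind" to a distinct const value.

```python
def matches(instance, schema):
    """Tiny subset check: required keys present and any "const" property values equal."""
    if not all(key in instance for key in schema.get("required", [])):
        return False
    for key, subschema in schema.get("properties", {}).items():
        if key in instance and "const" in subschema and instance[key] != subschema["const"]:
            return False
    return True


def one_of(instance, schemas):
    """oneOf succeeds only when exactly one subschema matches."""
    return sum(matches(instance, s) for s in schemas) == 1


# Hypothetical notification variants, told apart by a "kind" field.
NOTIFICATION_VARIANTS = [
    {"required": ["kind", "sender", "text"], "properties": {"kind": {"const": "message"}}},
    {"required": ["kind", "from_user"], "properties": {"kind": {"const": "friend_request"}}},
    {"required": ["kind", "severity"], "properties": {"kind": {"const": "system_alert"}}},
]

print(one_of({"kind": "message", "sender": "Alex", "text": "hi"}, NOTIFICATION_VARIANTS))  # True
print(one_of({"kind": "message", "severity": "high"}, NOTIFICATION_VARIANTS))  # False
```

Without the const discriminator, a payload carrying fields from two variants could match both, and oneOf would reject it; anyOf is the more permissive choice when overlap is acceptable.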

Validation strategies

Strict at the boundary, lenient inside. Validate incoming data with additionalProperties: false and tight constraints. Once the data is inside your system and has passed validation, you can trust its shape and skip redundant checks.

Version your schemas. As your data shape evolves, old payloads don't disappear overnight: some clients will keep sending the previous shape. If clients might send v1 or v2 payloads, use oneOf or a discriminator field ("version": 1) to validate each payload against the right schema.
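
A sketch of the discriminator approach, assuming hypothetical v1 and v2 task schemas reduced to their required keys. Dispatching on "version" first means every error refers to a single schema, rather than the less specific "matched none of the oneOf branches".

```python
# Hypothetical v1 and v2 task payloads, reduced to their required keys.
SCHEMAS_BY_VERSION = {
    1: {"required": ["version", "title"]},
    2: {"required": ["version", "title", "status"]},
}


def validate_versioned(payload):
    """Pick the schema named by the "version" field, then check required keys against it."""
    version = payload.get("version")
    # type() rather than isinstance(), so True (which Python treats as 1) isn't read as v1.
    if type(version) is not int or version not in SCHEMAS_BY_VERSION:
        return [f"unsupported version: {version!r}"]
    schema = SCHEMAS_BY_VERSION[version]
    return [f"v{version}: missing '{key}'" for key in schema["required"] if key not in payload]


print(validate_versioned({"version": 1, "title": "Write the docs"}))  # []
print(validate_versioned({"version": 2, "title": "Write the docs"}))  # ["v2: missing 'status'"]
print(validate_versioned({"title": "Write the docs"}))                # ['unsupported version: None']
```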

Use $ref for reuse. When the same structure appears in multiple places (like an "address" object in both a user and a company), $ref lets you define it once, conventionally under $defs (definitions in draft-07 and earlier), and reference it everywhere else. This keeps schemas DRY and makes changes propagate consistently.
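
Under the hood, a local $ref such as "#/$defs/address" is a JSON Pointer into the same document. The hypothetical `resolve_ref` below sketches that lookup; it handles only same-document pointers, not remote URLs, $id-based resolution, or percent-encoded fragments, all of which real validators support.

```python
def resolve_ref(root, ref):
    """Resolve a local reference such as "#/$defs/address" against the root schema."""
    if not ref.startswith("#/"):
        raise ValueError(f"unsupported $ref in this sketch: {ref}")
    node = root
    for token in ref[2:].split("/"):
        # JSON Pointer escapes (RFC 6901): decode "~1" to "/" first, then "~0" to "~".
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


SCHEMA = {
    "$defs": {
        "address": {
            "type": "object",
            "required": ["street", "city"],
            "properties": {"street": {"type": "string"}, "city": {"type": "string"}},
        }
    },
    "type": "object",
    "properties": {
        "user": {"type": "object", "properties": {"home": {"$ref": "#/$defs/address"}}},
        "company": {"type": "object", "properties": {"office": {"$ref": "#/$defs/address"}}},
    },
}

home = resolve_ref(SCHEMA, SCHEMA["properties"]["user"]["properties"]["home"]["$ref"])
office = resolve_ref(SCHEMA, SCHEMA["properties"]["company"]["properties"]["office"]["$ref"])
print(home is office)    # True: both places share one definition
print(home["required"])  # ['street', 'city']
```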

Generate, don't hand-write (when possible). If your types already live in code, tools can generate schemas from that source of truth: typescript-json-schema works from TypeScript types, and zod-to-json-schema converts Zod schemas. Hand-written schemas tend to drift from the code over time; generated ones stay in sync as long as you regenerate them as part of your build.
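
The tools named above are TypeScript-specific. To show the idea itself, here is a rough Python analogue (not how those tools work internally) that derives a schema from a dataclass. `schema_from_dataclass` and `Person` are hypothetical and map only simple scalar annotations; in Python, libraries such as Pydantic can generate JSON Schema for you.

```python
from dataclasses import MISSING, dataclass, fields
from typing import get_type_hints

# Only simple scalar annotations are mapped in this sketch.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_dataclass(cls):
    """Build an object schema from a dataclass: fields without defaults become required."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for field in fields(cls):
        properties[field.name] = {"type": PY_TO_JSON[hints[field.name]]}
        if field.default is MISSING and field.default_factory is MISSING:
            required.append(field.name)
    return {
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }


@dataclass
class Person:  # hypothetical type standing in for your source of truth
    name: str
    age: int = 0  # a default makes the field optional in the generated schema


print(schema_from_dataclass(Person))
```

Adding a field to `Person` changes the generated schema automatically, which is the whole point: there is no second copy of the shape to forget.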

Try it yourself

The fastest way to learn JSON Schema is to experiment with real data. The JSON Schema Validator lets you paste a schema and a document side by side and see validation results instantly — no server, no signup, everything runs in your browser. Start with the example above, break things on purpose, and read the error messages. That feedback loop teaches more than any specification document.

JSON Schema isn't glamorous, but it's one of those tools that quietly prevents entire categories of bugs. Write the schema once, and you stop debugging malformed data by hand.