JSON Schema Demystified: Understanding Schemas, Dialects, Vocabularies, and Metaschemas - Ian Duncan

If you’ve ever tried to dive into JSON Schema, you’ve probably encountered a wall of terminology that makes your head spin: schemas, metaschemas, dialects, vocabularies, keywords, anchors, dynamic references. It feels like the community invented new words for things that already had perfectly good names, just to make the rest of us feel inadequate.

I’ve been working on a Haskell JSON Schema library that’s actually fully spec-compliant, which meant I had to figure all of this out. The problem isn’t that the concepts are inherently difficult. The terminology creates artificial barriers to understanding.

This post will break down the key concepts in JSON Schema in a way that actually makes sense, connecting the dots between all these terms that seem designed to confuse. By the end, you’ll understand not just what these words mean, but how they fit together into a coherent system.

Starting simple

Before we dive into terminology, let’s look at what we’re actually trying to accomplish. JSON Schema is fundamentally about describing the shape and constraints of JSON data. Here’s a simple example:

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age": { "type": "number", "minimum": 0 }
  },
  "required": ["name"]
}

This schema says: “I expect a JSON object with a string name field (required) and an optional numeric age field that must be non-negative.” Simple enough, right?
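To make “validating data against a schema” concrete, here is a minimal sketch in Python. It is not a real validator: the function name is_valid is made up for illustration, and it only understands the four keywords used in the example above.

```python
# Minimal sketch: handles only "type", "properties", "required", and "minimum".
# An unrecognized "type" value raises KeyError; a real validator does far more.

def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    "number": _is_number,
}


def is_valid(schema, value):
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](value):
        return False
    # "minimum" only constrains numbers; other values pass it untouched.
    if "minimum" in schema and _is_number(value) and value < schema["minimum"]:
        return False
    if isinstance(value, dict):
        if any(name not in value for name in schema.get("required", [])):
            return False
        for name, subschema in schema.get("properties", {}).items():
            if name in value and not is_valid(subschema, value[name]):
                return False
    return True


PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number", "minimum": 0},
    },
    "required": ["name"],
}
```

With this sketch, {"name": "Ada"} passes, while {"age": 36} fails (no name) and {"name": "Ada", "age": -1} fails (negative age).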

Now here’s where it gets interesting: this schema is itself valid JSON. And since JSON can describe the structure of JSON documents, we can describe the structure of schemas using more schemas. This recursive property is what gives rise to metaschemas, and where the terminology starts to get confusing.

What’s a schema anyway?

A schema is just a JSON document that describes constraints on other JSON documents. That’s it. The example above is a schema.

Schemas tell you what type a value should be (string, number, object, array), what values are allowed or disallowed, what properties must or may exist on an object, and how many items an array may contain. When you write a schema, you’re essentially writing rules that say “valid JSON documents that I care about look like this.”

{
  "type": "string",
  "minLength": 1,
  "maxLength": 100
}

This schema says: “I want a string between 1 and 100 characters long.” Any JSON validator that understands JSON Schema can take this schema and your data and tell you whether your data follows the rules.

The confusing part is that schemas themselves are JSON documents. So naturally, you might ask: “What describes the structure of a schema?” And that leads us to the next layer.

Metaschemas: schemas all the way down

A metaschema is a schema that describes the structure of other schemas. The “schema of schemas,” if you will.

This sounds abstract and philosophical, but it’s actually quite practical. Remember how our simple schema used keywords like "type", "properties", and "minimum"? The metaschema describes which keywords exist, what values they can take, and how they nest; what they mean is laid out in the specification and implemented by validators.

Here’s a tiny excerpt of what a metaschema might look like:

{
  "$id": "https://json-schema.org/draft/2020-12/schema",
  "type": ["object", "boolean"],
  "properties": {
    "type": {
      "anyOf": [
        { "enum": ["null", "boolean", "object", "array", "number", "integer", "string"] },
        { "type": "array", "items": { "$ref": "#/properties/type/anyOf/0" } }
      ]
    },
    "properties": {
      "type": "object",
      "additionalProperties": { "$dynamicRef": "#meta" }
    }
  }
}

This fragment says things like: “The type keyword can be a single type string or an array of type strings” and “The properties keyword should be an object where each value is itself a schema.”

Why does this matter? Well, you can validate that your schema is well-formed by checking it against the metaschema. If someone writes "type": "stirng" (typo!), the metaschema validation will catch it. The metaschema is also the formal specification of what’s allowed in schemas. Tools that process schemas (validators, code generators, documentation generators) use the metaschema to understand what they’re working with.

The relationship is simple: schemas validate data, metaschemas validate schemas.

Data → validated by → Schema → validated by → Metaschema

Here’s where it gets recursive: since a metaschema is also a schema (JSON describing JSON structure), it can validate itself. The JSON Schema metaschema is designed to be self-describing. This is similar to how a compiler written in its own language can compile itself (bootstrapping).
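The typo check described above can be sketched in a few lines of Python. This is only an illustration of one rule a metaschema encodes (the allowed values of type), not how a real validator applies a metaschema; the function name type_keyword_errors is hypothetical.

```python
# The seven primitive type names recognized by Draft 2020-12.
SIMPLE_TYPES = {"null", "boolean", "object", "array", "number", "integer", "string"}


def type_keyword_errors(schema, path="#"):
    """Collect problems with "type" values, recursing into "properties" only."""
    errors = []
    if not isinstance(schema, dict):
        return errors  # boolean schemas (true/false) carry no keywords
    if "type" in schema:
        declared = schema["type"]
        names = declared if isinstance(declared, list) else [declared]
        for name in names:
            if not isinstance(name, str) or name not in SIMPLE_TYPES:
                errors.append(f"{path}/type: unknown type {name!r}")
    for prop, subschema in schema.get("properties", {}).items():
        errors.extend(type_keyword_errors(subschema, f"{path}/properties/{prop}"))
    return errors


BROKEN_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "stirng"}},
}
```

Running type_keyword_errors(BROKEN_SCHEMA) reports the misspelled type under #/properties/name, which is the kind of mistake metaschema validation is meant to surface before any data is checked.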

Dialects: when versions matter

So we have schemas and metaschemas. But JSON Schema has evolved over time. Different versions have added new keywords, changed behavior, and deprecated old features. How do we keep track of which version of JSON Schema we’re using?

A dialect is a specific version or flavor of JSON Schema, defined by a particular metaschema. When someone says they’re using “Draft 2020-12” or “Draft 7,” they’re referring to specific dialects.

Each dialect has its own metaschema that defines which keywords are available, comes with its own set of behaviors and validation rules, and is identified by a URI (usually something like https://json-schema.org/draft/2020-12/schema).

You declare which dialect your schema uses with the $schema keyword:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "name": { "type": "string" }
  }
}

This tells validators: “Hey, interpret this schema according to the Draft 2020-12 rules.”

Different dialects can have different keywords and different behaviors. Draft 4 didn’t have the const keyword, but Draft 6 added it. Draft 4 spelled the identifier keyword id; Draft 6 renamed it to $id. Draft 2019-09 introduced the concept of vocabularies (we’ll get to that).

If you write a schema using Draft 2020-12 features and someone tries to validate it with a Draft 4 validator, things won’t work correctly. The $schema keyword ensures everyone is on the same page.

Think of dialects like programming language versions. Python 2 and Python 3 are different dialects of Python. Your code needs to declare which one it’s written for, or chaos ensues.
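Here is a rough sketch of how a tool might pick a dialect from $schema. The function name detect_dialect and the choice to fall back to Draft 2020-12 when $schema is missing are assumptions for illustration; real validators differ in how they handle a missing or unknown $schema.

```python
# Metaschema URIs for a few published drafts. Older drafts conventionally
# end in "#", so the trailing "#" is stripped before lookup.
KNOWN_DIALECTS = {
    "https://json-schema.org/draft/2020-12/schema": "Draft 2020-12",
    "https://json-schema.org/draft/2019-09/schema": "Draft 2019-09",
    "http://json-schema.org/draft-07/schema": "Draft 7",
    "http://json-schema.org/draft-06/schema": "Draft 6",
    "http://json-schema.org/draft-04/schema": "Draft 4",
}


def detect_dialect(schema, default="Draft 2020-12"):
    """Return a dialect name for the schema, or raise on an unknown $schema."""
    if not isinstance(schema, dict) or "$schema" not in schema:
        return default  # assumption: fall back rather than fail
    uri = schema["$schema"].rstrip("#")
    if uri not in KNOWN_DIALECTS:
        raise ValueError(f"unsupported dialect: {uri}")
    return KNOWN_DIALECTS[uri]
```

Failing loudly on an unrecognized URI is a deliberate choice in this sketch: silently guessing a dialect is how the “chaos” above tends to start.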

Vocabularies: the modular twist

Here’s where JSON Schema gets really interesting (and where my initial confusion peaked). Starting with Draft 2019-09, JSON Schema introduced the concept of vocabularies.

A vocabulary is a named collection of keywords that work together to provide a specific kind of functionality. Instead of having one monolithic metaschema that defines all possible keywords, you can compose metaschemas from smaller, focused vocabularies.

Think of vocabularies as modules or packages. Each vocabulary provides a set of related keywords. The core vocabulary has fundamental keywords like $id, $schema, $ref, and $defs. The applicator vocabulary has keywords that apply schemas to different parts of the data like properties, items, and additionalProperties. The validation vocabulary has keywords for constraints like minimum, maxLength, pattern, and enum. The metadata vocabulary has keywords for human-readable information like title, description, and examples.
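One way to picture this is as a lookup table from keyword to vocabulary. The sketch below covers only the keywords named in the paragraph above; the table VOCABULARY_OF and the function group_by_vocabulary are illustrative names, not part of any library.

```python
# Partial keyword -> vocabulary table (Draft 2020-12), limited to the
# keywords mentioned above. Anything else is reported as "unknown".
VOCABULARY_OF = {
    "$id": "core", "$schema": "core", "$ref": "core", "$defs": "core",
    "properties": "applicator", "items": "applicator",
    "additionalProperties": "applicator",
    "minimum": "validation", "maxLength": "validation",
    "pattern": "validation", "enum": "validation",
    "title": "meta-data", "description": "meta-data", "examples": "meta-data",
}


def group_by_vocabulary(schema):
    """Group a schema's top-level keywords by the vocabulary that defines them."""
    groups = {}
    for keyword in schema:
        vocabulary = VOCABULARY_OF.get(keyword, "unknown")
        groups.setdefault(vocabulary, []).append(keyword)
    return groups


TAG_SCHEMA = {
    "$id": "https://example.com/tag",
    "title": "Tag",
    "pattern": "^[a-z]+$",
    "maxLength": 20,
    "x-note": "not a standard keyword",
}
```

For TAG_SCHEMA this yields one core keyword, one meta-data keyword, two validation keywords, and one unknown keyword.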

Here’s a schema using keywords from different vocabularies:

{
  "$id": "https://example.com/my-schema",
  "title": "User Name",
  "description": "The user's full name",
  "type": "string",
  "minLength": 5,
  "pattern": "^[A-Z]"
}

The $id comes from the core vocabulary, title and description from the metadata vocabulary, and type, minLength, and pattern from the validation vocabulary.

Why vocabularies? They enable modularity and extensibility. You can pick and choose which vocabularies your dialect supports. Maybe you want validation but not format checking? Just include the vocabularies you need. You can define your own vocabulary with custom keywords specific to your domain. For example, a database schema dialect might add keywords like indexed or foreignKey. Each vocabulary is independently specified, making it easier to understand and implement different parts of JSON Schema.

Here’s how a metaschema declares which vocabularies it uses:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/core": true,
    "https://json-schema.org/draft/2020-12/vocab/applicator": true,
    "https://json-schema.org/draft/2020-12/vocab/validation": true,
    "https://json-schema.org/draft/2020-12/vocab/meta-data": false
  }
}

The true versus false values indicate whether the vocabulary is required or optional. If a validator doesn’t understand a required vocabulary, it should refuse to process the schema. If it doesn’t understand an optional vocabulary, it can safely ignore those keywords.
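The required-versus-optional rule can be sketched as follows. The set SUPPORTED and the function unsupported_optional are hypothetical; they stand in for whatever a given validator actually implements.

```python
# Vocabularies this imaginary validator implements.
SUPPORTED = {
    "https://json-schema.org/draft/2020-12/vocab/core",
    "https://json-schema.org/draft/2020-12/vocab/applicator",
    "https://json-schema.org/draft/2020-12/vocab/validation",
}


def unsupported_optional(metaschema, supported=SUPPORTED):
    """Return optional vocabularies that will be ignored.

    Raises ValueError if a required (true) vocabulary is not supported.
    """
    ignored = []
    for uri, required in metaschema.get("$vocabulary", {}).items():
        if uri in supported:
            continue
        if required:
            raise ValueError(f"required vocabulary not supported: {uri}")
        ignored.append(uri)
    return ignored


EXAMPLE_METASCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": True,
        "https://json-schema.org/draft/2020-12/vocab/applicator": True,
        "https://json-schema.org/draft/2020-12/vocab/validation": True,
        "https://json-schema.org/draft/2020-12/vocab/meta-data": False,
    },
}
```

For EXAMPLE_METASCHEMA the meta-data vocabulary is reported as ignorable, because it is marked false and this imaginary validator doesn’t implement it.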

Extending with your own keywords

Here’s where this all gets practical. Once you understand vocabularies, you realize you can extend JSON Schema with your own domain-specific keywords. This is incredibly powerful.

In fact, you’ve probably already used extended JSON Schema without realizing it. OpenAPI (the spec for describing REST APIs) describes request and response bodies with schemas, and OpenAPI 3.1 does so with its own JSON Schema dialect: Draft 2020-12 plus a small vocabulary of API-oriented keywords such as discriminator, xml, and externalDocs. (Things like operationId, responses, and parameters belong to the surrounding OpenAPI document rather than to the schemas themselves.) And you could extend that dialect further with your own vocabulary for framework-specific behaviors or company-specific conventions.

Say you’re building an API framework and you want to annotate your schemas with HTTP-specific metadata. Standard JSON Schema doesn’t have keywords for things like “this field comes from a query parameter” or “this response uses status code 201.” So you create your own vocabulary.

First, you define your custom keywords in a vocabulary document:

{
  "$id": "https://api.example.com/vocab/http",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/core": true
  },
  "type": "object",
  "properties": {
    "httpSource": {
      "enum": ["query", "path", "header", "body"]
    },
    "httpStatus": {
      "type": "integer",
      "minimum": 100,
      "maximum": 599
    }
  }
}

This document describes the syntax of your custom keywords. To put them to use, you publish a dialect metaschema (assumed here to live at https://api.example.com/schema) whose $vocabulary lists the standard vocabularies you rely on plus https://api.example.com/vocab/http. Ordinary schemas don’t carry $vocabulary themselves, since the keyword only has meaning inside a metaschema; they simply point at the dialect with $schema:

{
  "$schema": "https://api.example.com/schema",
  "type": "object",
  "properties": {
    "userId": {
      "type": "string",
      "httpSource": "path",
      "pattern": "^[0-9]+$"
    },
    "filter": {
      "type": "string",
      "httpSource": "query"
    }
  }
}

Your validator needs to understand what to do with httpSource, of course. When it encounters a schema using your custom vocabulary, it checks whether it supports that vocabulary. If the vocabulary is marked as required and the validator doesn’t support it, validation should fail with an error saying “I don’t understand this vocabulary.” If it’s optional, the validator can safely ignore those keywords.
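What “understanding httpSource” means is up to your tooling. As one hedged illustration, a framework might read the keyword as an annotation and group fields by where they come from. The function group_by_http_source is a made-up name for this sketch.

```python
def group_by_http_source(schema):
    """Map each httpSource value to the property names annotated with it."""
    grouped = {}
    for name, subschema in schema.get("properties", {}).items():
        # Boolean subschemas and properties without the keyword are skipped.
        source = subschema.get("httpSource") if isinstance(subschema, dict) else None
        if source is not None:
            grouped.setdefault(source, []).append(name)
    return grouped


REQUEST_SCHEMA = {
    "$schema": "https://api.example.com/schema",
    "type": "object",
    "properties": {
        "userId": {"type": "string", "httpSource": "path", "pattern": "^[0-9]+$"},
        "filter": {"type": "string", "httpSource": "query"},
    },
}
```

A request handler could then use the result to decide that userId is pulled from the URL path and filter from the query string.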

The beauty of this approach is that your extensions are explicit and discoverable. Someone reading your schema can see exactly which vocabularies it uses. A validator can definitively say whether it supports your schema or not. You’re not just stuffing random properties into schemas and hoping validators ignore them.

You can extend validation rules too. Maybe you’re working with database schemas and want to validate that certain string fields match database identifier conventions. You could define a custom keyword like dbIdentifier:

{
  "type": "string",
  "dbIdentifier": true,
  "description": "Must be a valid PostgreSQL identifier"
}

Your validator would implement the logic to check PostgreSQL identifier rules (no leading numbers, only certain special characters, length limits, etc.). Standard JSON Schema validators would ignore this keyword if you mark the vocabulary as optional, or refuse to process the schema if you mark it as required.
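As a rough sketch of what that logic might look like, the check below covers only unquoted, ASCII-only PostgreSQL identifiers with the default 63-byte limit. Those restrictions are simplifying assumptions, and check_db_identifier is a hypothetical hook name, not an API of any validator library.

```python
import re

# Unquoted identifier: starts with a letter or underscore, followed by
# letters, digits, underscores, or dollar signs (ASCII only in this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")
_MAX_BYTES = 63  # PostgreSQL's default identifier length limit


def check_db_identifier(keyword_value, instance):
    """Return True if the instance satisfies "dbIdentifier": keyword_value."""
    if keyword_value is not True:
        return True  # "dbIdentifier": false imposes no constraint
    if not isinstance(instance, str):
        return True  # like other string keywords, ignore non-strings
    if len(instance.encode("utf-8")) > _MAX_BYTES:
        return False
    return _IDENTIFIER.fullmatch(instance) is not None
```

Letting non-string values pass mirrors how standard keywords such as pattern behave: they constrain only the type they apply to and leave the type check itself to type.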

This extensibility is why JSON Schema has all this vocabulary machinery. It’s not just academic complexity for its own sake. The vocabulary system lets you build domain-specific validation languages on top of JSON Schema’s foundation, while maintaining clear boundaries about what’s standard and what’s custom.

Putting it all together

Let’s connect all the dots. You write a schema that describes your data structure (like a User object). The schema uses keywords like type, properties, and minimum to express constraints. These keywords are defined by vocabularies (the validation vocabulary, applicator vocabulary). The vocabularies are bundled into a dialect (like Draft 2020-12). The dialect is defined by a metaschema that describes which keywords are available and how they work. Your schema declares its dialect using the $schema keyword.

Here’s a visual:

Your Data (JSON)
    ↓ validated by
Your Schema (JSON)
    ↓ uses keywords from
Vocabularies (sets of related keywords)
    ↓ bundled into
Dialect (specific version/flavor)
    ↓ defined by
Metaschema (schema of schemas)

A concrete example showing all the layers:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/user",

  "title": "User",
  "description": "A registered user in the system",

  "type": "object",
  "properties": {
    "username": {
      "type": "string",
      "pattern": "^[a-zA-Z0-9_]+$",
      "minLength": 3,
      "maxLength": 20
    },
    "email": {
      "type": "string",
      "format": "email"
    },
    "age": {
      "type": "integer",
      "minimum": 13,
      "maximum": 120
    }
  },
  "required": ["username", "email"]
}

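Breaking this down: $schema declares we’re using the Draft 2020-12 dialect. $id is a core vocabulary keyword that uniquely identifies this schema. title and description are metadata vocabulary keywords for documentation. properties is an applicator vocabulary keyword that applies subschemas to the object’s fields. type, required, pattern, minLength, maxLength, minimum, and maximum are validation vocabulary keywords that enforce rules. format comes from the format-annotation vocabulary; in Draft 2020-12 it is an annotation by default, so a validator only rejects malformed emails if format assertion is enabled.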

All of these keywords are described by the Draft 2020-12 metaschema, which pins down their syntax; their meaning and behavior are defined by the specification.

Other terms you’ll encounter

While we’ve covered the big four (schema, metaschema, dialect, vocabulary), there are a few other terms worth understanding.

A keyword is a specific property name with defined semantics in a schema. Examples: type, properties, minimum, $ref. Keywords are the building blocks defined by vocabularies, and every standard keyword belongs to one: type to the validation vocabulary, properties to the applicator vocabulary, contentMediaType to the content vocabulary, deprecated to the metadata vocabulary.

There’s also a distinction between annotations and assertions. Assertions are keywords that can make validation fail (like type, minimum, required, pattern). If your data violates an assertion, validation fails. Annotations are keywords that just provide information and never cause validation to fail (like title, description, examples, default). Annotations are useful for documentation and tooling but don’t affect validity. Some keywords do both. For instance, properties applies a subschema to each matching property (so validation fails if any of those subschemas fails) while also annotating which property names it matched.
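The split can be sketched as a function that returns two things: a pass/fail result from the assertions and a bag of annotations. This toy handles only minimum and required as assertions and a handful of metadata keywords as annotations; evaluate is an illustrative name.

```python
ANNOTATION_KEYWORDS = {"title", "description", "examples", "default"}


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def evaluate(schema, value):
    """Return (valid, annotations) for a flat schema (no subschemas)."""
    failed = []
    if "minimum" in schema and _is_number(value) and value < schema["minimum"]:
        failed.append("minimum")
    if "required" in schema and isinstance(value, dict):
        if any(name not in value for name in schema["required"]):
            failed.append("required")
    valid = not failed
    # Annotations never influence `valid`. Following the specification,
    # a schema that fails validation contributes no annotations.
    annotations = (
        {key: schema[key] for key in schema if key in ANNOTATION_KEYWORDS}
        if valid
        else {}
    )
    return valid, annotations


AGE_SCHEMA = {"title": "Age", "description": "Age in years", "minimum": 0}
```

Notice that title and description show up in the result for valid input but play no part in deciding validity.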

Anchors provide named locations within a schema that you can reference. They’re like bookmarks:

{
  "$defs": {
    "address": {
      "$anchor": "addr",
      "type": "object",
      "properties": {
        "street": { "type": "string" }
      }
    }
  },
  "properties": {
    "billingAddress": { "$ref": "#addr" },
    "shippingAddress": { "$ref": "#addr" }
  }
}
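Resolving a reference like #addr amounts to searching the document for the subschema that declares that anchor. The sketch below does a naive depth-first search and, as a simplification, ignores the rule that anchors are scoped to the schema resource they appear in; both function names are made up for illustration.

```python
def find_anchor(node, name):
    """Depth-first search for a subschema declaring "$anchor": name."""
    if isinstance(node, dict):
        if node.get("$anchor") == name:
            return node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_anchor(child, name)
        if found is not None:
            return found
    return None


def resolve_anchor_ref(root, ref):
    """Resolve a same-document reference of the form "#name"."""
    if not ref.startswith("#") or ref.startswith("#/"):
        raise ValueError(f"not a plain-name fragment: {ref}")
    return find_anchor(root, ref[1:])


ADDRESS_DOC = {
    "$defs": {
        "address": {
            "$anchor": "addr",
            "type": "object",
            "properties": {"street": {"type": "string"}},
        }
    },
    "properties": {
        "billingAddress": {"$ref": "#addr"},
        "shippingAddress": {"$ref": "#addr"},
    },
}
```

Both billingAddress and shippingAddress end up pointing at the same subschema object, wherever it happens to live in the document.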

Dynamic anchors ($dynamicAnchor and $dynamicRef) are more advanced. They allow references to be resolved differently depending on the “context” of evaluation. This is mostly useful for extending metaschemas and creating recursive schemas that can be overridden. Honestly, you can probably ignore dynamic anchors until you’re doing very advanced schema composition.

A bundled schema is a single schema document that contains multiple schema resources, usually via $defs. This is handy for distributing related schemas together:

{
  "$id": "https://example.com/schemas/bundle",
  "$defs": {
    "user": {
      "$id": "user",
      "type": "object",
      "properties": { "name": { "type": "string" } }
    },
    "product": {
      "$id": "product",
      "type": "object",
      "properties": { "title": { "type": "string" } }
    }
  }
}

Now you can reference https://example.com/schemas/user and https://example.com/schemas/product from other schemas, even though they’re defined in the same document.
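The reason those URIs work is ordinary relative-URI resolution: each embedded $id is resolved against the enclosing base URI. A small sketch using Python’s urllib.parse.urljoin, with index_resources as a hypothetical helper that only looks inside $defs:

```python
from urllib.parse import urljoin


def index_resources(schema, base_uri=""):
    """Map absolute URIs to the schema resources embedded under $defs."""
    index = {}
    if not isinstance(schema, dict):
        return index  # boolean schemas cannot declare $id
    if "$id" in schema:
        # A relative $id such as "user" is resolved against the parent base.
        base_uri = urljoin(base_uri, schema["$id"])
        index[base_uri] = schema
    for subschema in schema.get("$defs", {}).values():
        index.update(index_resources(subschema, base_uri))
    return index


BUNDLE = {
    "$id": "https://example.com/schemas/bundle",
    "$defs": {
        "user": {
            "$id": "user",
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
        "product": {
            "$id": "product",
            "type": "object",
            "properties": {"title": {"type": "string"}},
        },
    },
}
```

Indexing BUNDLE produces three entries: the bundle itself plus the user and product resources under https://example.com/schemas/.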

Why is the terminology so confusing?

You might be wondering: why did they make this so complicated? The answer is that JSON Schema has evolved significantly over more than a decade, and the terminology evolved with it.

Early versions (Draft 3, Draft 4) had simpler, more monolithic metaschemas. As the specification matured, the community recognized the need for modularity, extensibility, and clearer versioning. That’s when concepts like dialects and vocabularies were formalized.

The terminology can feel academic because it comes from formal specification work. These are precise technical terms designed for specification writers and implementers, not necessarily for end users. Unfortunately, they leaked into the documentation that everyone reads, creating a steep learning curve.

But here’s the thing: you don’t need to think about most of this complexity to use JSON Schema effectively.

What do you actually need to know?

For 95% of JSON Schema usage, you need to understand that schemas describe data structure and constraints, $schema declares which version (dialect) you’re using, and keywords like type, properties, and minimum define your rules.

That’s it. You can write perfectly good schemas for years without ever thinking about metaschemas or vocabularies in depth.

The deeper concepts matter when you’re building tools that process schemas (validators, code generators), extending JSON Schema with custom keywords, working on the specification itself, or debugging complex reference resolution issues. For everyone else, just know that these concepts exist and form a coherent system. If you encounter them in documentation, you’ll know what they mean, but you probably won’t need to think about them day-to-day.

Some practical advice

Always specify $schema to make it explicit which dialect you’re using:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object"
}

This ensures validators interpret your schema correctly.

Start with the latest stable dialect. As of this writing, that’s Draft 2020-12. It has the most features and best tooling support. Don’t worry about older drafts unless you’re maintaining legacy schemas.

Use clear, descriptive metadata. Even though title, description, and examples don’t affect validation, they make your schemas much more useful:

{
  "type": "object",
  "title": "User Account",
  "description": "Represents a user account in the system",
  "properties": {
    "username": {
      "type": "string",
      "description": "Unique username for login (alphanumeric and underscores only)",
      "examples": ["john_doe", "alice123"]
    }
  }
}

For complex schemas, use $defs to break things into reusable pieces:

{
  "$defs": {
    "timestamp": {
      "type": "string",
      "format": "date-time"
    },
    "identifier": {
      "type": "string",
      "pattern": "^[a-z0-9-]+$"
    }
  },
  "type": "object",
  "properties": {
    "id": { "$ref": "#/$defs/identifier" },
    "createdAt": { "$ref": "#/$defs/timestamp" }
  }
}
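A reference like #/$defs/identifier is a JSON Pointer: a path of keys walked from the document root. Here is a minimal sketch of that lookup for same-document references; resolve_pointer is an illustrative name, and remote references are out of scope.

```python
from urllib.parse import unquote


def resolve_pointer(document, ref):
    """Resolve a same-document "$ref" such as "#/$defs/identifier"."""
    if not ref.startswith("#"):
        raise ValueError(f"only same-document references are handled: {ref}")
    pointer = unquote(ref[1:])  # undo any percent-encoding in the fragment
    if pointer == "":
        return document  # "#" refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {ref}")
    node = document
    for token in pointer[1:].split("/"):
        # JSON Pointer escapes: "~1" means "/", "~0" means "~" (in that order).
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


REUSE_DOC = {
    "$defs": {
        "timestamp": {"type": "string", "format": "date-time"},
        "identifier": {"type": "string", "pattern": "^[a-z0-9-]+$"},
    },
    "type": "object",
    "properties": {
        "id": {"$ref": "#/$defs/identifier"},
        "createdAt": {"$ref": "#/$defs/timestamp"},
    },
}
```

Resolving the $ref on id returns the identifier definition, so the pattern is written once and reused wherever it is referenced.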

If you’re ever confused about whether a keyword exists or how it works, check the metaschema. The Draft 2020-12 metaschema lives at: https://json-schema.org/draft/2020-12/schema

Test your schemas with online validators or schema testing tools to ensure they work as expected. One third-party online validator you can try is https://www.jsonschemavalidator.net/

Wrapping up

JSON Schema’s terminology can feel intimidating, but the core ideas are straightforward. Schemas describe data. Metaschemas describe schemas. Dialects are specific versions of JSON Schema. Vocabularies are modular collections of keywords. Keywords are the actual properties you use in schemas.

The terminology exists to support a powerful, extensible system for describing JSON data structures. But for everyday use, you can mostly ignore the academic terminology and focus on writing clear, useful schemas.

The next time you see “metaschema” or “vocabulary” in JSON Schema documentation, don’t panic. You know what these terms mean now, and more importantly, you understand how they fit together. That’s the real goal: building a mental model of how the system works, not memorizing definitions.

Now go forth and write some schemas. And remember: if you find yourself confused by JSON Schema terminology again, you’re not alone. The important thing is that underneath the jargon, there’s a well-designed system for a genuinely useful purpose.
