Authoring schemas
One *.schema.yml file describes one search document — its root table, its fields, and how related tables fold in. This guide is the full reference for that format. For the deployment side (flusso.toml), see Configuration.
Quick reference
Fields are written type-first: - <type>: <name>, with siblings the type allows.
fields:
- keyword: email # scalar; document key `email`
required: true
- has_many: orders # join; folds a related table in
table: orders
foreign_key: user_id
primary_key: id
fields: [ { double: total } ]
- count: orderCount # aggregate; rolls a related table up
table: orders
foreign_key: user_id
| Need | Type keys | Jump to |
|---|---|---|
| A scalar column | text identifier keyword enum uuid boolean short integer long float double decimal date timestamp binary json custom | Types |
| Same-row sub-object | object | Objects |
| Dynamic-key object (translations, …) | map | Maps |
| A geographic point | geo | Geo points |
| Fold in a related table | belongs_to has_one has_many many_to_many | Joins |
| Roll a related table up | count sum avg min max ids | Aggregates |
| A fixed value | constant | Fields |
| Treat rows as deleted | top-level soft_delete | soft_delete |
| Index only a subset of rows | top-level filters | Root filters |
ℹ️ Info — Two JSON Schemas are the machine-readable source of truth for the file formats:
config.schema.jsonandindex.schema.yml. Point an editor at them for completion and inline validation.
*.schema.yml
Each schema file describes one search document: the root table it is built from, the fields it contains, and how related tables fold in.
Top-level keys
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
version | int | yes | — | Schema format version. Only 1 is supported. |
table | Postgres identifier | yes | — | The root table the document is built from. |
schema | Postgres identifier | no | public | The database schema the root table lives in. |
primary_key | Postgres identifier | no | — | The root table’s primary-key column. Used to derive the document id and to resolve which documents a related-row change affects. Relations and reverse-resolution require it. |
doc_id | string | no | — | Not supported yet — setting it is a hard error. The document _id is always derived from primary_key. (A non-pk _id needs the id value at delete time, which the pk-keyed tombstone path can’t supply; tracked as a follow-up.) |
soft_delete | object | no | — | Treat rows as deleted based on a column/field rather than a physical DELETE. See below. |
filters | list | no | — | Root filters: only root rows matching every filter become documents. See below. |
fields | list | yes | — | The document’s fields. See Fields. |
version: 1
table: users
schema: public
primary_key: id
soft_delete
When soft_delete is set, a row matching the soft-delete condition emits a
tombstone (a delete to the sink) instead of an upsert. Key it off either a
mapped field or a raw column, and optionally narrow it with when filters.
# Off a column: users.deleted = true → delete.
soft_delete:
column: deleted
# Off a mapped field, narrowed to a subset of rows.
soft_delete:
field: status
when:
- { column: archived, op: eq, value: true }
| Key | Required | Description |
|---|---|---|
column or field | exactly one | The column (Postgres identifier) or mapped field (field name) signalling deletion. |
when | no | A list of filters; the soft-delete applies only to matching rows. |
Root filters
When only a subset of a table should be an index, a top-level filters list
(same filter forms as joins use) scopes which root rows become
documents:
version: 1
table: item
primary_key: id
filters:
- { column: item_type, op: eq, value: serialized }
- { column: archived_at, op: is_null }
A row outside the set never produces a document; a row that leaves the set (an
UPDATE that stops matching) emits a tombstone on its next rebuild, exactly
like soft_delete — both fold into the document query’s WHERE,
so “no row came back” means “this document should not exist”. A row that enters
the set upserts. Backfill walks the whole root table and lets the same predicate
decide, so filtered-out rows cost a no-op delete during seeding.
Fields
fields is a list. Each item is written type-first: a single type key
whose value is the document key, plus the siblings that type allows.
fields:
- keyword: email # a `keyword` scalar; document key `email`
required: true
- text: bio # analyzed full text
required: false
- integer: age
required: false
The type key is one of:
- a scalar type —
text,identifier,keyword,enum,uuid,boolean,short,integer,long,float,double,decimal,date,timestamp,binary,json(see Types) — orcustom(an explicit Postgres/OpenSearch pair); geo— a geographic point (see Geo points);object— a same-row sub-object (see Objects);map— a dynamic-key object (see Maps);belongs_to/has_one/has_many/many_to_many— a related table folded in (see Joins);count/sum/avg/min/max— a rollup over a related table (see Aggregates);constant— a fixed value.
There is exactly one type key per field. Which siblings a field accepts depends on that type:
| Sibling | Applies to | Description |
|---|---|---|
required | scalar, geo, map, to-one join | true forces a scalar/geo/map leaf non-null (nullable otherwise); on a to-one join (belongs_to/has_one) it maps the object non-null — see Joins. Rejected on to-many joins and aggregates (their nullability is structural). |
column | scalar, geo, belongs_to | The source column — for a belongs_to, this table’s column pointing at the related row. Defaults to the document key when omitted. |
options | types with a mapping | Extra OpenSearch mapping properties merged beside the derived type (e.g. analyzer, format, scaling_factor). |
transforms | scalar | Value transforms to apply. See Transforms. |
default | scalar | Value to coalesce a null column to. Must be a scalar (string, number, bool, or date) — an array/object/binary default is a hard error. |
postgres / opensearch | custom | The Postgres types accepted and the OpenSearch type emitted. |
lat / lon | geo | The two coordinate columns (two-column form). |
values | map | Mandatory — the leaf type shared by every value (a leaf kind: text/keyword/number/date). See Maps. |
fields | object, joins | The nested projection. |
table, primary_key, column/foreign_key/through, order_by, filters, limit | joins | Which key sibling applies depends on the verb. See Joins. |
table, column, value_type, element_type, foreign_key, through, filters | aggregates | See Aggregates. |
value | constant | The fixed value (null/absent renders as JSON null). |
fields:
# column source, renamed + transformed + defaulted
- keyword: email
column: email_address
required: false
transforms: [lowercase, trim]
default: "unknown@example.com"
Objects
An object nests sibling columns of the same row under one document key,
without reading a related table. It renders as an OpenSearch object and is never
null; its members declare their own types.
- object: address
fields:
- keyword: street
column: address_street
required: true
- keyword: city
column: address_city
required: true
- keyword: zip
column: address_zip
required: false
→ { "address": { "street": …, "city": …, "zip": … } }, all from one row.
An object differs from a to-one join (belongs_to/has_one): the
join reads a related table by key, an object stays put on the current row.
Optional options pass extra properties to the object mapping.
Maps
A map is a dynamic-key object over a json/jsonb column: the keys are
runtime-determined, but every value shares one leaf type. The motivating case is
translations — {"en": "…", "it": "…"} with an open-ended language set, where
declaring each language as its own field would defeat the purpose.
- map: title
values: text # the leaf type of every value (required)
required: true
values must be a leaf kind — text/identifier (text), keyword/enum/
uuid (keyword), a numeric, or date/timestamp. boolean, binary, json,
geo, and custom are rejected. The field maps to an OpenSearch object with
dynamic: true (injected automatically — an explicit dynamic in options
wins), so unmapped keys are accepted and stay searchable rather than rejected by
the index’s dynamic: strict. column defaults to the document key.
On the query side a map gets a typed handle (flusso-query): .key("it")
returns a fully-typed leaf of the declared kind, and a text map also offers a
cross-key .search(..) with per-key preference. See Querying.
Types
A scalar field declares its type from a fixed set. Each type bridges a
Postgres column type and an OpenSearch mapping type, so the schema describes the
document fully — flusso derives the index mapping (and validates a config) without
a database. Shorthand fields and columns with no type default to keyword.
type | Postgres | OpenSearch | Notes |
|---|---|---|---|
text | text, varchar | text | Analyzed natural-language full text (descriptions, bios) — the default analyzed type. Plain tokenize + accent/case fold. |
identifier | text, varchar | text | Analyzed identifier-like short text (names, SKUs, codes, statuses) — splits on punctuation/case so C-01234 is found by C01234, c-01234, or 01234. |
keyword | text, varchar | keyword | Exact, aggregatable. |
enum | text, varchar, PG enum | keyword | A closed string set stored as text, indexed exactly. |
uuid | uuid | keyword | |
boolean | boolean | boolean | |
short | smallint / int2 | short | |
integer | integer / int4 | integer | |
long | bigint / int8 | long | |
float | real / float4 | float | |
double | double precision / float8 | double | |
decimal | numeric / money | double | Lossy; use a custom scaled_float when exactness matters. |
date | date | date | |
timestamp | timestamp(tz), time | date | |
binary | bytea | binary | |
json | json, jsonb | object |
(A geographic point is a geo field, not a scalar type — see
Geo points.)
For anything the named types don’t cover, declare a custom field with the
OpenSearch type and the Postgres types it accepts:
- custom: price
postgres: [numeric]
opensearch: scaled_float
required: false
options: { scaling_factor: 100 }
options carries any extra OpenSearch mapping properties (analyzers, formats, …)
merged beside the derived type. Objects, joins, aggregates, and geo points carry
their own type keys rather than a scalar type; their shape is structural.
Production-ready defaults. The OpenSearch sink does not emit your
text/keywordfields bare. By default it attaches a strong analyzer and a set of subfields (keyword,keyword_lowercase,text) so search, exact filtering, and case-insensitive sort all work out of the box — see Index analysis & subfields. Anything inoptionsoverrides the default for that field.
text vs identifier
Both are analyzed (full-text searchable) text fields; they differ only in the
analyzer:
text— natural language (descriptions, bios, comments). The default analyzed type; tokenizes on word boundaries with accent/case folding.identifier— short structured strings (names, SKUs, codes, statuses). Splits on punctuation/case soC-01234is found byC01234,c-01234, or01234.
fields:
- text: bio # natural-language analyzer + default subfields
required: false
- identifier: sku # punctuation/case-splitting analyzer + default subfields
required: false
(Use keyword instead for exact match, sort, or aggregation rather than full-text
search.) Both apply only to scalar column fields, and an explicit analyzer in
options always wins over the type’s default. The analyzers themselves are
documented in
Index analysis & subfields.
Geo points
A geo field is a geographic point → OpenSearch geo_point. Two forms:
Two columns — a latitude and a longitude column assembled into a point. A
missing coordinate makes the whole point null (never {lat: null, lon: null},
which OpenSearch rejects):
- geo: location
lat: latitude
lon: longitude
required: false
Single column — a column already holding a geo_point-shaped value: a
json/jsonb {"lat": …, "lon": …} or [lon, lat], or a text "lat,lon":
- geo: location
column: location_json
required: false
PostGIS geometry and PG-native point aren’t accepted directly (they serialize
as WKB / (x,y), which OpenSearch won’t take); expose a generated jsonb/text
column in one of the shapes above. The two-column form needs no such column —
flusso assembles the point in the document query.
Transforms
A list applied in sequence to a column value before it lands in the document:
| Transform | Effect |
|---|---|
lowercase | Lowercase the string value. |
trim | Strip leading/trailing whitespace. |
- keyword: email
required: false
transforms: [trim, lowercase]
Joins
Fold rows from a related table into the document as nested documents. The join’s relationship verb is its type key, and the verb names which table holds the key.
| Type key | The key lives on… | Reads as | Renders as |
|---|---|---|---|
belongs_to | this table (column) | “my column points at the related row” | object (nullable; required: true → non-null) |
has_one | the related table (foreign_key) | “one related row points back at me” | object (nullable; required: true → non-null) |
has_many | the related table (foreign_key) | “many related rows point back at me” | nested array (never null) |
many_to_many | a junction table (through) | “we connect through a junction” | nested array (never null) |
The fields — the projection from each related row — and the related table’s
primary_key are siblings of the type key. The field reading that primary_key
is marked non-null automatically, like the root primary_key.
# My column points at them: embed the user a `created_by` column references.
# `column` defaults to the field name — here, the FK column IS `created_by`.
- belongs_to: created_by
table: users
primary_key: id
fields:
- keyword: email
required: true
- text: name
required: false
# Their column points at me: fold in the rows holding my key.
- has_many: orders
table: orders
foreign_key: user_id
primary_key: id
order_by:
- { column: created_at, direction: desc }
limit: 5
fields:
- integer: id
required: false
- double: total
required: true
- keyword: status
required: true
| Key | Type | Required | Description |
|---|---|---|---|
| (type key) | field name | yes | belongs_to, has_one, has_many, or many_to_many; its value is the document key. |
table | Postgres identifier | yes | The related table. |
primary_key | Postgres identifier | yes | The related table’s primary key. The projected field reading it is forced non-null. |
column | Postgres identifier | belongs_to only | This table’s column pointing at the related row. Defaults to the field name (so belongs_to: created_by reads the created_by column). |
foreign_key | Postgres identifier | has_one/has_many | The related table’s column pointing back at the parent. |
through | object | many_to_many | A junction table. |
required | bool | to-one only | belongs_to/has_one only. true maps the object non-null; omitted means nullable. Rejected on has_many/many_to_many (always a non-null array). |
filters | list | no | Filters narrowing which related rows are folded in. |
order_by | list | no | Ordering — a list of { column, direction }, where direction is asc (default) or desc. Not allowed on belongs_to (its target is unique); on has_one it picks which row becomes the object. |
limit | int ≥ 1 | has_many/many_to_many only | Cap the number of related rows folded in (the to-one verbs imply their own LIMIT 1). |
fields | list | yes | The fields projected from each related row. |
Key arity rule: a join takes exactly the key sibling its verb implies —
column for belongs_to, foreign_key for has_one/has_many, through for
many_to_many. Anything else is a load-time error naming the right one.
A belongs_to target that changes — or is deleted — re-emits every document
pointing at it: flusso finds the referrers on the parent table itself
(WHERE column = <changed key>), so a deleted target rebuilds those documents with
a null object rather than leaving them stale.
The through object (junction table for many-to-many):
| Key | Required | Description |
|---|---|---|
table | yes | The junction table. |
left_key | yes | Column joining the junction to the parent. |
right_key | yes | Column joining the junction to the related table. |
- many_to_many: tags
table: tags
through:
table: post_tags
left_key: post_id
right_key: tag_id
primary_key: id
fields:
- keyword: name
required: true
Aggregates
Reduce rows from a related table to a single value. The operation is the type
key: count, sum, avg, min, max, or ids.
A count is always a non-null long and an avg a nullable double, so they
take no value_type. A sum/min/max mirrors the aggregated column, so it
must declare a column and a value_type.
- count: orderCount
table: orders
foreign_key: user_id
- sum: lifetimeValue
table: orders
column: total
value_type: decimal
foreign_key: user_id
filters:
- { column: status, op: eq, value: paid }
| Key | Type | Required | Description |
|---|---|---|---|
| (type key) | field name | yes | count/sum/avg/min/max/ids; its value is the document key. |
table | Postgres identifier | yes | The related table. |
column | Postgres identifier | conditional | The column to reduce. Required for sum/avg/min/max; not used by count/ids. |
value_type | type name | conditional | The result type. Required for sum/min/max (it mirrors the column); not used by count/avg/ids. |
element_type | type name | conditional | Required for ids (and only ids): the scalar type of each collected primary key — long or keyword. |
foreign_key | Postgres identifier | conditional | The aggregated table’s column pointing back at the parent (exactly one of foreign_key xor through). |
through | object | conditional | Junction table for aggregating across many-to-many. |
filters | list | no | Filters restricting which rows count. |
ids — a flat array of a related table’s primary keys
ids collects the related table’s primary key into a flat scalar array (it
takes no column — the key is always the related table’s PK). OpenSearch has no
array type, so the field’s mapping type is just the element type
(element_type: long → type: long, keyword → type: keyword); the value is
multi-valued. An empty relation yields [], never null, so the field is non-null
(project it as a bare Vec<…>, not Option<Vec<…>>).
# one-to-many: orders.user_id points back at this row
- ids: orderIds
table: orders
foreign_key: user_id
element_type: long
# many-to-many: collected straight off the junction's right_key
- ids: tagIds
table: tags
through: { table: post_tags, left_key: post_id, right_key: tag_id }
element_type: long
Filters
Filters narrow which related rows a join or aggregate sees, which rows a
soft_delete applies to, and — as the top-level filters key —
which root rows become documents at all. Three forms:
Raw SQL — an escape hatch for conditions the structured forms can’t express:
- { raw: "amount > 0 AND currency = 'USD'" }
Null check — no value operand:
- { column: deleted_at, op: is_null }
- { column: confirmed_at, op: is_not_null }
Value comparison — operator plus a value whose shape matches its arity:
- { column: status, op: eq, value: paid }
- { column: status, op: in, value: [paid, shipped] }
- { column: total, op: between, value: [10, 100] }
op | Value shape |
|---|---|
eq, neq, lt, lte, gt, gte, like, ilike | a single scalar |
in, not_in | a list |
between | a list of exactly two values [lower, upper] |
is_null, is_not_null | (no value) |
A value op with a missing value, a list op given a scalar, or a between with
other than two values is a load-time error.
Conventions
Identifiers
Two distinct identifier rules apply depending on what is being named:
| Rule | Applies to | Pattern | Notes |
|---|---|---|---|
| Postgres identifier | table, column, schema, index, and sink names | ^[a-z_][a-z0-9_]*$, max 63 chars | Lowercased and trimmed on load, matching Postgres’ folding of unquoted identifiers. A name that isn’t a valid identifier this way must be addressed explicitly (e.g. set column:). |
| Field name | the document key a field lands under (field:) | ^[a-zA-Z_][a-zA-Z0-9_]*$, max 63 chars | Case is preserved — field: orderCount stays camelCase in the emitted document. Only trimmed. |
The split is deliberate: the value comes from a Postgres column (lowercase identifier) but lands under a document key you choose (which may be camelCase to suit the search index).
Validation, in one place
Loading enforces — beyond what the file format itself can express — that:
- the schema
versionis supported (only1); - all table/column/schema/index/sink names are valid Postgres identifiers, and field names are valid field-name identifiers;
- each field has exactly one type key, and only the siblings that type allows;
- a join carries exactly the key sibling its verb implies —
columnforbelongs_to(defaulting to the field name),foreign_keyforhas_one/has_many,throughformany_to_many— and the to-one verbs take nolimit(nororder_by, forbelongs_to); - an aggregate specifies exactly one of
foreign_keyorthrough; sum/avg/min/maxaggregates carry acolumn, andsum/min/maxalso declare avalue_type(it mirrors the column);- an
idsaggregate declares anelement_type(a scalar type) and takes nocolumnorvalue_type;element_typeis rejected on every other op; - a
geofield gives eitherlatandlon, or a singlecolumn; - a
betweenfilter has exactly two values, andin/not_inget a list.
A failure at any of these stops the load with a specific error naming the cause.
None of it needs a database. When the source is reachable, flusso check
additionally confirms each declared type and nullability against the live columns
and reports any disagreement.
A complete example
users.schema.yml:
version: 1
table: users
schema: public
primary_key: id
soft_delete:
column: deleted
fields:
- integer: id
required: false
- keyword: email
required: true
transforms: [lowercase, trim]
- text: name
required: false
- has_many: orders
table: orders
foreign_key: user_id
primary_key: id
order_by:
- { column: id, direction: asc }
fields:
- integer: id
required: false
- double: total
required: true
- keyword: status
required: true
- count: orderCount
table: orders
foreign_key: user_id
A change to a users row — or to any of that user’s orders — rebuilds the whole
users document and re-emits it to every sink. Setting users.deleted = true
emits a tombstone instead.