Schema Stage
The Schema stage analyzes a table in the pipeline context and infers a lightweight schema. It samples rows to detect field types, nullability, and uniqueness, then writes a { fields: [...] } schema object back into the context without modifying the original data.
What the stage does
- Source field — Reads from a context field that must be an array of rows (arrays or objects). Errors if missing, not an array, or empty.
- Output field — Writes the inferred schema to a target context path (defaults to {source}_schema when not set).
- Sample size — Uses the first N rows (up to the total number of rows) to infer the schema; this helps keep analysis fast on large tables.
- Field stats — For each column/key, tracks: detected types, whether it can be null/undefined, whether values are unique across the sample, and up to three example values.
- Output shape — Writes { fields: [{ field, types, isNullable, isUnique, examples }] } to the target field in the context (see the sketch after this list).
- No mutation — Does not change the source table; it only adds or updates the schema object at the target path.
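To make the inference behavior concrete, here is a minimal sketch of the same logic, assuming rows are plain objects (the stage also accepts array rows). The function name inferSchema and every implementation detail below are hypothetical, not the tool's actual code.

```ts
// Minimal sketch of schema inference over a sample of object rows.
// inferSchema and the helper logic here are hypothetical illustrations.
interface FieldSchema {
  field: string;
  types: string[];
  isNullable: boolean;
  isUnique: boolean;
  examples: unknown[];
}

function inferSchema(
  rows: Record<string, unknown>[],
  sampleSize: number
): { fields: FieldSchema[] } {
  const sample = rows.slice(0, sampleSize); // only the first N rows are inspected
  const stats = new Map<string, { types: Set<string>; values: unknown[]; nullable: boolean }>();

  for (const row of sample) {
    for (const [key, value] of Object.entries(row)) {
      const s = stats.get(key) ?? { types: new Set<string>(), values: [], nullable: false };
      if (value === null || value === undefined) {
        s.nullable = true;
        s.types.add("null");
      } else {
        s.types.add(typeof value); // "string", "number", "boolean", ...
      }
      s.values.push(value);
      stats.set(key, s);
    }
  }

  return {
    fields: [...stats.entries()].map(([field, s]) => ({
      field,
      types: [...s.types],
      isNullable: s.nullable,
      // Unique when no value repeats across the sampled rows.
      isUnique: new Set(s.values).size === s.values.length,
      // Keep up to three example values.
      examples: s.values.slice(0, 3),
    })),
  };
}
```

The real stage also accepts array rows; this sketch only illustrates the overall shape of the analysis and the output object it produces.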
Configure the Schema stage
- Choose the Source field (table to analyze) from the available context fields.
- Set an Output field (optional). If left blank, the stage will use {source}_schema (for example, orders_schema for a source orders); see the example after this list.
- Adjust the Sample size if needed. Larger samples provide more accurate uniqueness/nullability detection, but take longer to scan.
- Click Run Stage (or Run All) to infer the schema. On success, the pipeline context will include the schema object under the chosen output field.
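As a rough illustration of the default output naming, the stage's settings could be modeled as a small config object. The SchemaStageConfig shape and property names below are assumptions for illustration; the tool itself exposes these settings as form fields rather than code.

```ts
// Hypothetical config shape for the Schema stage; property names are assumptions.
interface SchemaStageConfig {
  sourceField: string;   // context field holding the table to analyze
  outputField?: string;  // target path for the schema; optional
  sampleSize: number;    // number of rows to inspect
}

// When no output field is set, the schema is written to {source}_schema.
function resolveOutputField(config: SchemaStageConfig): string {
  return config.outputField ?? `${config.sourceField}_schema`;
}

const config: SchemaStageConfig = { sourceField: "orders", sampleSize: 100 };
console.log(resolveOutputField(config)); // "orders_schema"
```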
Example: infer customer schema
- Source field: customers
- Sample size: 100
- Output field: customers_schema
After running the stage, the context contains customers_schema with one entry per column, showing detected types (e.g., ["string"] or ["string", "null"]), whether the field is nullable, whether values are unique across the sample, and a few example values. You can then feed this into Validate rules or use it as documentation for downstream consumers.
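A hypothetical result might look like the snippet below. The column names and example values are illustrative only; the overall shape matches the documented { fields: [...] } output.

```ts
// Illustrative only: the columns and values are made up; the shape is the documented one.
const customers_schema = {
  fields: [
    { field: "id",    types: ["number"],         isNullable: false, isUnique: true,  examples: [1, 2, 3] },
    { field: "email", types: ["string"],         isNullable: false, isUnique: true,  examples: ["a@example.com", "b@example.com", "c@example.com"] },
    { field: "notes", types: ["string", "null"], isNullable: true,  isUnique: false, examples: ["VIP", null, "call first"] },
  ],
};
```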
Tips for using schema inference
- Run after cleaning: Use Transform first so types and nulls reflect your normalized data, not raw imports.
- Tune sample size: Increase it when you care about uniqueness or nullability across a large table; decrease it to speed up exploration.
- Pair with validation: Use the inferred schema as a reference for building Validate rules (types, required fields, uniqueness); see the sketch after this list.
- Document pipelines: Keep the schema output in the context so other stages — or collaborators — can inspect how a table is shaped at each step of the pipeline.
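To ground the validation tip, here is a rough sketch of deriving checks from an inferred schema. The ValidationRule shape and the rulesFromSchema helper are assumptions for illustration, not the Validate stage's actual rule format.

```ts
// Hypothetical: derive simple checks from an inferred schema.
// The rule shape below is illustrative, not the Validate stage's real format.
interface FieldSchema {
  field: string;
  types: string[];
  isNullable: boolean;
  isUnique: boolean;
  examples: unknown[];
}

interface ValidationRule {
  field: string;
  check: "type" | "required" | "unique";
  expected?: string[];
}

function rulesFromSchema(schema: { fields: FieldSchema[] }): ValidationRule[] {
  const rules: ValidationRule[] = [];
  for (const f of schema.fields) {
    // Enforce the types observed in the sample.
    rules.push({ field: f.field, check: "type", expected: f.types });
    // Fields never seen as null become required.
    if (!f.isNullable) rules.push({ field: f.field, check: "required" });
    // Fields whose sampled values were all distinct get a uniqueness check.
    if (f.isUnique) rules.push({ field: f.field, check: "unique" });
  }
  return rules;
}
```

Because the nullability and uniqueness flags come from a sample, treat any generated rules as a starting point and review them before enforcing them on full tables.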