Validate Stage
The Validate stage enforces data quality rules on a table in the pipeline context. It checks required fields, types, ranges, regex patterns, enums, cross-field comparisons, and uniqueness, then outputs a report or filtered rows.
What the stage does
- Source / target — Read from a context field; write results to a target (defaults to {source}_validation).
- Rules (per field) — Required, Non-empty string/array, Type (string/number/boolean/date/json), Min/Max (number), Min/Max length, Date min/max (ISO or date), Regex (with presets), Enum allowlist, Not-in blocklist, Compare fields (<, <=, >, >=, =, !=), Unique (composite keys).
- Severity — Each rule can be an Error or Warning.
- Output modes — Validation report, Only valid rows, or Only invalid rows.
- Row annotations — Optionally add validation messages and rule metadata into each output row.
- Presets — Quick schema presets (e.g., email regex, unique ID) to speed setup.
- Input expectations — The source must be an array of rows, where each row is an array or an object. The stage throws an error if the source is missing or is not an array.
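The behaviour listed above can be pictured with a small Python sketch. It is not the stage's actual implementation: the rule dictionaries, the `validate` and `rule_fails` helpers, and the sample `orders` data are all hypothetical, and only three rule types (Required, Min/Max, Compare fields) are shown. It also assumes that only Required flags a missing value.

```python
import operator

# Hypothetical operator table for the Compare rule.
COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "=": operator.eq, "!=": operator.ne}


def get_field(row, field):
    """Read a field by key (object rows) or by index (array rows)."""
    if isinstance(row, dict):
        return row.get(field)
    if isinstance(row, (list, tuple)) and isinstance(field, int):
        return row[field] if 0 <= field < len(row) else None
    return None


def rule_fails(row, rule):
    """Return True when the row violates the rule."""
    value = get_field(row, rule["field"])
    kind = rule["type"]
    if kind == "required":
        return value is None
    if value is None:
        return False  # assumption: only Required flags missing values
    if kind == "range":
        if not isinstance(value, (int, float)):
            return True
        low = rule.get("min", float("-inf"))
        high = rule.get("max", float("inf"))
        return not (low <= value <= high)
    if kind == "compare":
        # Assumes both fields hold values of comparable types.
        other = get_field(row, rule["other"])
        return other is None or not COMPARE[rule["op"]](value, other)
    raise ValueError(f"unknown rule type: {kind}")


def validate(context, source, rules):
    """Collect one issue per failed rule per row."""
    rows = context.get(source)
    if not isinstance(rows, list):
        raise TypeError(f"source '{source}' is missing or not an array")
    issues = []
    for index, row in enumerate(rows):
        for rule in rules:
            if rule_fails(row, rule):
                issues.append({"row": index, "field": rule["field"],
                               "rule": rule["type"],
                               "severity": rule.get("severity", "error")})
    return issues


# Made-up sample data; ISO date strings compare correctly as text.
context = {"orders": [
    {"id": 1, "qty": 5, "start": "2024-01-01", "end": "2024-02-01"},
    {"id": 2, "qty": -1, "start": "2024-03-01", "end": "2024-02-01"},
    {"qty": 3, "start": "2024-01-01", "end": "2024-01-02"},
]}
rules = [
    {"field": "id", "type": "required", "severity": "error"},
    {"field": "qty", "type": "range", "min": 0, "severity": "warning"},
    {"field": "start", "type": "compare", "op": "<=", "other": "end"},
]
issues = validate(context, "orders", rules)
for issue in issues:
    print(issue)
```

Keeping severity on each issue, rather than deciding pass or fail up front, is what lets a later step treat warnings and errors differently.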
Configure the Validate stage
- Choose the Source field (table to check) and set an Output field (defaults to {source}_validation).
- Pick an Output mode:
  - Validation report: outputs rows with rule results.
  - Only valid rows: filters to rows with no errors (warnings allowed).
  - Only invalid rows: filters to rows that have errors.
- Optionally enable Add results to rows to annotate outputs with messages and metadata (field, rule, severity).
- Add rules (field index for arrays, or property for objects):
  - Choose a Rule type (Required, Type, Regex, Enum, Compare, Unique, etc.).
  - Set parameters per rule type (e.g., expected type, regex pattern/preset, allowed/disallowed lists, compare operator and field, min/max values or lengths, date bounds, unique fields).
  - Choose Severity (Error/Warning) and an optional custom message.
- Use Schema presets if you need a quick start (e.g., required column + numeric second column, email format, unique ID + non-empty name).
- Click Run Stage to preview, or Run All to run the pipeline. The stage logs results and writes the chosen output (report/valid/invalid) to the target path.
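To make the three output modes concrete, here is a minimal Python sketch of how they might filter rows. The function name `apply_output_mode`, the mode strings, the `annotate` flag, and the issue dictionaries are assumptions made for illustration; the stage's real option names and output shape may differ.

```python
def apply_output_mode(rows, issues, mode="report", annotate=False):
    """Filter or annotate rows according to a (hypothetical) output mode."""
    if mode not in ("report", "valid", "invalid"):
        raise ValueError(f"unknown output mode: {mode}")

    # Group issues by the index of the row they belong to.
    by_row = {}
    for issue in issues:
        by_row.setdefault(issue["row"], []).append(issue)

    output = []
    for index, row in enumerate(rows):
        row_issues = by_row.get(index, [])
        has_error = any(i["severity"] == "error" for i in row_issues)
        if mode == "valid" and has_error:
            continue  # warnings alone do not exclude a row
        if mode == "invalid" and not has_error:
            continue
        if mode == "report" or annotate:
            output.append({"row": row, "issues": row_issues})
        else:
            output.append(row)
    return output


# Made-up rows and issues: row 1 has a warning, row 2 has an error.
rows = [{"id": 1}, {"id": 2}, {"id": 3}]
issues = [
    {"row": 1, "field": "id", "rule": "range", "severity": "warning"},
    {"row": 2, "field": "id", "rule": "required", "severity": "error"},
]

print(apply_output_mode(rows, issues, "valid"))
print(apply_output_mode(rows, issues, "invalid"))
```

In this sketch the row with only a warning stays in the valid output, which mirrors the "warnings allowed" behaviour of Only valid rows.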
Example: validate customers
- Source field: customers
- Rule 1: Column 0 — Required (Error)
- Rule 2: Column 1 — Regex (Email preset), message: “Must be a valid email”
- Rule 3: Column 2 — Type Number (Warning)
- Output mode: Validation report
- Output field: customers_validation
The stage produces a report with row-level messages. Switch to “Only valid rows” to feed clean data into Merge or Visualize stages.
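As a rough illustration of what such a report could contain, the Python sketch below applies the three example rules to a few made-up customer rows. The sample data, the report shape, and the simplified email pattern are assumptions; the Email preset's actual regex is not shown in this document, and Rule 2 is assumed to have Error severity because the example does not state one.

```python
import re

# Simplified stand-in for the Email preset (assumption, not the real preset).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_number(value):
    """True for ints and floats, but not booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_email(value):
    return isinstance(value, str) and EMAIL_RE.match(value) is not None


# (column, rule, severity, message, passes) for each rule in the example.
RULES = [
    (0, "required", "error", "Value is required", lambda v: v is not None),
    (1, "regex", "error", "Must be a valid email", is_email),
    (2, "type", "warning", "Expected a number", is_number),
]


def build_report(rows):
    """Return one report entry per row, with any rule messages attached."""
    report = []
    for index, row in enumerate(rows):
        messages = []
        for column, rule, severity, message, passes in RULES:
            value = row[column] if column < len(row) else None
            if not passes(value):
                messages.append({"field": column, "rule": rule,
                                 "severity": severity, "message": message})
        report.append({"row": index, "values": row, "messages": messages})
    return report


# Made-up customer rows: [id, email, age].
customers = [
    ["C001", "ann@example.com", 42],
    [None, "bob@example.com", 17],
    ["C003", "not-an-email", "n/a"],
]

customers_validation = build_report(customers)
for entry in customers_validation:
    print(entry["row"], [m["message"] for m in entry["messages"]])
```

With this data, the first row passes every rule, the second fails Required, and the third fails the email check (error) and the number check (warning).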
Tips for reliable validation
- Normalize before validating: Use Transform to coerce types (dates, numbers) before running rules.
- Branch outputs: Keep the validation report separate from cleaned data so you can inspect issues without losing rows.
- Severity strategy: Use warnings for soft checks (e.g., optional formatting) and errors for blocking issues.
- Composite uniqueness: Provide multiple fields for unique constraints when one field isn’t enough.
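The composite uniqueness tip can be illustrated with a short Python sketch. The helper `find_duplicates` and the sample order rows are hypothetical, and the sketch assumes object rows with hashable field values; it flags only the later occurrence of a repeated key, which may not match how the stage reports duplicates.

```python
def find_duplicates(rows, fields):
    """Return (row_index, first_seen_index) pairs for repeated keys."""
    seen = {}
    duplicates = []
    for index, row in enumerate(rows):
        # The composite key is the tuple of all listed field values.
        key = tuple(row.get(field) for field in fields)
        if key in seen:
            duplicates.append((index, seen[key]))
        else:
            seen[key] = index
    return duplicates


# Made-up order rows: a customer may order on several dates.
orders = [
    {"customer_id": 1, "order_date": "2024-01-05"},
    {"customer_id": 1, "order_date": "2024-01-06"},
    {"customer_id": 2, "order_date": "2024-01-05"},
    {"customer_id": 1, "order_date": "2024-01-05"},
]

# A single field flags legitimate repeat customers as duplicates.
print(find_duplicates(orders, ["customer_id"]))

# The composite key flags only the true repeat.
print(find_duplicates(orders, ["customer_id", "order_date"]))
```

Here the single-field check reports two rows, while the two-field key reports only the last row, which repeats both the customer and the date.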