Validate Stage

The Validate stage enforces data quality rules on a table in the pipeline context. It checks required fields, types, ranges, regex patterns, enums, cross-field comparisons, and uniqueness, then outputs a report or filtered rows.

What the stage does

  • Source / target — Read from a context field; write results to a target (defaults to {source}_validation).
  • Rules (per field) — Required, Non-empty string/array, Type (string/number/boolean/date/json), Min/Max (number), Min/Max length, Date min/max (ISO or date), Regex (with presets), Enum allowlist, Not-in blocklist, Compare fields (<, <=, >, >=, =, !=), Unique (composite keys).
  • Severity — Each rule can be an Error or Warning.
  • Output modes — Validation report, Only valid rows, or Only invalid rows.
  • Row annotations — Optionally add validation messages and rule metadata into each output row.
  • Presets — Quick schema presets (e.g., email regex, unique ID) to speed setup.
  • Input expectations — Source must be an array of rows (arrays or objects); throws if missing or not an array.
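
The rule model above can be pictured with a minimal Python sketch. This is not Gridscript's implementation. The rule dictionary shape and the names get_field, passes, and validate are assumptions made for illustration, and only a few rule types are covered.

```python
import operator
import re

# Hypothetical rule format: {"field", "type", "severity", ...parameters}.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq, "!=": operator.ne}


def get_field(row, field):
    """Read a value by property name (object rows) or column index (array rows)."""
    if isinstance(row, dict):
        return row.get(field)
    if isinstance(row, (list, tuple)) and isinstance(field, int) and 0 <= field < len(row):
        return row[field]
    return None


def passes(row, rule):
    """Return True if the row satisfies the rule."""
    value = get_field(row, rule["field"])
    kind = rule["type"]
    if kind == "required":
        return value is not None and value != ""
    if value is None:
        return True  # assumption: only Required flags missing values
    if kind == "regex":
        return isinstance(value, str) and re.fullmatch(rule["pattern"], value) is not None
    if kind == "min":
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_number and value >= rule["value"]
    if kind == "enum":
        return value in rule["allowed"]
    if kind == "compare":
        other = get_field(row, rule["other"])
        if other is None:
            return True
        try:
            return OPS[rule["op"]](value, other)
        except TypeError:
            return False  # values of incomparable types fail the rule
    raise ValueError(f"unknown rule type: {kind}")


def validate(source, rules):
    """Build one report entry per row, listing every failed rule."""
    if not isinstance(source, list):
        raise TypeError("source must be an array of rows")
    report = []
    for index, row in enumerate(source):
        issues = [
            {"field": r["field"], "rule": r["type"],
             "severity": r.get("severity", "error"),
             "message": r.get("message", f"{r['type']} check failed")}
            for r in rules if not passes(row, r)
        ]
        report.append({"row": index, "issues": issues})
    return report
```

In this sketch a rule failure never stops the run. Each failure is recorded with its severity so that a later step can decide which rows to keep.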

Configure the Validate stage

  1. Choose the Source field (table to check) and set an Output field (defaults to {source}_validation).
  2. Pick an Output mode:
    • Validation report: outputs rows with rule results.
    • Only valid rows: filters to rows with no errors (warnings allowed).
    • Only invalid rows: filters to rows that have at least one error.
  3. Optionally enable Add results to rows to annotate outputs with messages and metadata (field, rule, severity).
  4. Add rules (field index for arrays, or property for objects):
    • Choose a Rule type (Required, Type, Regex, Enum, Compare, Unique, etc.).
    • Set parameters per rule type (e.g., expected type, regex pattern/preset, allowed/disallowed lists, compare operator and field, min/max values or lengths, date bounds, unique fields).
    • Choose Severity (Error/Warning) and an optional custom message.
  5. Use Schema presets if you need a quick start (e.g., required column + numeric second column, email format, unique ID + non-empty name).
  6. Click Run Stage to preview, or Run All to run the pipeline. The stage logs results and writes the chosen output (report, valid rows, or invalid rows) to the Output field.
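
As a rough illustration of how the three output modes and the default Output field relate, here is a minimal Python sketch. It is an assumption-laden model, not the stage's actual code: run_validate, the context dictionary, the mode names, and the "_validation" annotation key are all hypothetical, and the per-row issue lists are taken as already computed.

```python
def run_validate(context, source, issues_per_row, mode="report",
                 target=None, annotate=False):
    """Write a report or a filtered table to the target field of the context.

    issues_per_row holds one list of issues per source row; each issue is a
    dict with at least a "severity" of "error" or "warning".
    """
    rows = context.get(source)
    if not isinstance(rows, list):
        raise TypeError(f"{source!r} is missing or not an array")

    output = []
    for row, issues in zip(rows, issues_per_row):
        # Only error-severity issues make a row invalid; warnings are allowed.
        has_error = any(issue["severity"] == "error" for issue in issues)
        if mode == "valid" and has_error:
            continue
        if mode == "invalid" and not has_error:
            continue
        if mode == "report":
            output.append({"row": row, "valid": not has_error, "issues": issues})
        elif annotate and isinstance(row, dict):
            # Hypothetical annotation key; object rows only in this sketch.
            output.append({**row, "_validation": issues})
        else:
            output.append(row)

    # The target defaults to {source}_validation when none is given.
    context[target or f"{source}_validation"] = output
    return context
```

Writing to a separate field leaves the source table untouched, so the same source can feed a report in one stage and a filtered table in another.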

Example: validate customers

Source field: customers

  • Rule 1: Column 0 — Required (Error)
  • Rule 2: Column 1 — Regex (Email preset), message: “Must be a valid email”
  • Rule 3: Column 2 — Type Number (Warning)
  • Output mode: Validation report
  • Output field: customers_validation

The stage produces a report with row-level messages. Switch to “Only valid rows” to feed clean data into Merge or Visualize stages.
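
To make the example concrete, the sketch below applies the same three rules to two made-up rows in plain Python. Everything in it is illustrative: the sample data is invented, the email pattern is a simplified stand-in rather than the actual Email preset, and Rule 2 is treated as an Error only because the example does not state its severity.

```python
import re

# Simplified stand-in pattern; the real Email preset may differ.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Invented sample rows: [id, email, age]
customers = [
    ["C001", "ana@example.com", 42],
    ["", "not-an-email", "n/a"],
]


def validate_customer(row):
    """Return (severity, column, message) tuples for every failed rule."""
    issues = []
    # Rule 1: Column 0 is Required (Error)
    if row[0] is None or row[0] == "":
        issues.append(("error", 0, "Required"))
    # Rule 2: Column 1 matches the email pattern (severity assumed: Error)
    if not (isinstance(row[1], str) and EMAIL.fullmatch(row[1])):
        issues.append(("error", 1, "Must be a valid email"))
    # Rule 3: Column 2 is a number (Warning); booleans do not count as numbers
    if isinstance(row[2], bool) or not isinstance(row[2], (int, float)):
        issues.append(("warning", 2, "Expected a number"))
    return issues


# "Validation report" mode: one entry per row with its messages.
customers_validation = [
    {"row": index, "issues": validate_customer(row)}
    for index, row in enumerate(customers)
]

# "Only valid rows" mode: keep rows with no error-severity issues.
valid_rows = [
    row for row in customers
    if not any(severity == "error" for severity, _, _ in validate_customer(row))
]
```

Here the first row passes every rule, while the second fails all three and is dropped from valid_rows because two of its failures are errors.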

Tips for reliable validation

  • Normalize before validating: Use Transform to coerce types (dates, numbers) before running rules.
  • Branch outputs: Keep the validation report separate from cleaned data so you can inspect issues without losing rows.
  • Severity strategy: Use warnings for soft checks (e.g., optional formatting) and errors for blocking issues.
  • Composite uniqueness: Provide multiple fields for unique constraints when one field isn’t enough.
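
The composite uniqueness tip can be illustrated with a short Python sketch. The function name duplicate_rows and the order data are hypothetical, and the sketch assumes every row that shares a key is flagged, including the first occurrence; the stage itself may report duplicates differently.

```python
from collections import Counter


def duplicate_rows(rows, key_fields):
    """Return indexes of rows whose combined key appears more than once."""
    keys = [tuple(row.get(field) for field in key_fields) for row in rows]
    counts = Counter(keys)
    return [index for index, key in enumerate(keys) if counts[key] > 1]


# Invented order lines: order_id repeats legitimately, (order_id, line_no) should not.
order_lines = [
    {"order_id": "A", "line_no": 1},
    {"order_id": "A", "line_no": 2},
    {"order_id": "B", "line_no": 1},
    {"order_id": "A", "line_no": 2},
]

single_key = duplicate_rows(order_lines, ["order_id"])
composite_key = duplicate_rows(order_lines, ["order_id", "line_no"])
```

With order_id alone, every line of order A is flagged as a duplicate. With the composite key, only the two rows that repeat the same order and line number are flagged.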