Transform Stage
The Transform stage applies one or more operations to a table in your pipeline context. It reads rows from a source path, runs operations in order, and writes the result to a target path (defaulting to the source).
What the stage does
- Source / output paths — Pick a source field from context (table of rows). Write back to the same path or a new one to branch results.
- Row window — Apply operations to a start/end row range (inclusive) before writing output.
- Multiple operations — Run operations in the order listed; remove or reorder as needed.
- Supported operations:
  - Set: Copy a field or set a literal into a target field.
  - Remove field: Drop a column/field from each row.
  - Rename field: Rename a field on each row.
  - Convert type: Convert to string, number, boolean, date, or JSON-parse a string.
  - String op: Uppercase, lowercase, trim, replace (find/replace), split, or join with a delimiter.
  - Formula: Evaluate an expression per row with access to row, context, and Math.
  - Deduplicate rows: Remove duplicates by fields; keep first or last occurrence.
  - Group by: Group rows by fields and add aggregations (sum, average, count, list) with custom output names.
- Array or object rows — Works with row arrays (by index) or row objects (by property name).
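The behavior described above — scope a row window, run each operation in order, write the result back — can be sketched in plain JavaScript. The function and operation names here (applyTransform, lowercaseName, dedupByName) are illustrative assumptions, not the stage's actual API:

```javascript
// Minimal sketch: apply a list of operations, in order, to an inclusive
// start/end row window, leaving rows outside the window untouched.
function applyTransform(rows, ops, startRow = 0, endRow = rows.length - 1) {
  let window = rows.slice(startRow, endRow + 1);
  for (const op of ops) {
    window = op(window); // each operation takes rows and returns rows
  }
  return [...rows.slice(0, startRow), ...window, ...rows.slice(endRow + 1)];
}

// Two example operations: a per-row string op and a deduplicate step.
const lowercaseName = rows =>
  rows.map(r => ({ ...r, name: String(r.name).toLowerCase() }));
const dedupByName = rows => {
  const seen = new Set();
  return rows.filter(r => !seen.has(r.name) && seen.add(r.name)); // keep first
};

const out = applyTransform(
  [{ name: 'Ada' }, { name: 'ADA' }, { name: 'Bob' }],
  [lowercaseName, dedupByName]
);
// out → [{ name: 'ada' }, { name: 'bob' }]
```

Operation order matters: deduplicating before lowercasing would keep both 'Ada' and 'ADA', since they only collide after normalization.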
Configure the Transform stage
- Choose the Source field (context path of the table to transform).
- Set an Output field (defaults to source) to overwrite or branch the result.
- Optionally set Start row and End row (inclusive indexes) to scope the transformation.
- Add operations:
  - For Set/Rename/Remove/Convert/String/Formula: specify the source field (or expression), and a target field where applicable.
  - For String ops: choose the operation; provide a delimiter or search/replace text when needed.
  - For Convert: choose the target type.
  - For Deduplicate: provide the fields to compare (comma-separated; blank = all fields) and choose whether to keep the first or last occurrence.
  - For Group by: set the group fields, then add aggregations (field, operation, and output name).
- Click Run Stage to preview or Run All to run the pipeline. The stage logs progress and writes the updated table to the output path.
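The Group by configuration above — group fields plus a list of aggregations with output names — can be sketched as follows. The groupBy function and the aggregation spec shape ({ field, op, as }) are assumptions for illustration:

```javascript
// Sketch of Group by: bucket rows by the group fields, then compute
// each requested aggregation and write it under its output name.
function groupBy(rows, fields, aggs) {
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(fields.map(f => row[f]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return [...groups.values()].map(members => {
    const out = {};
    for (const f of fields) out[f] = members[0][f];
    for (const { field, op, as } of aggs) {
      const vals = members.map(m => Number(m[field]));
      if (op === 'sum') out[as] = vals.reduce((a, b) => a + b, 0);
      else if (op === 'average') out[as] = vals.reduce((a, b) => a + b, 0) / vals.length;
      else if (op === 'count') out[as] = members.length;
      else if (op === 'list') out[as] = members.map(m => m[field]);
    }
    return out;
  });
}

const sales = [
  { region: 'east', amount: 10 },
  { region: 'east', amount: 20 },
  { region: 'west', amount: 5 },
];
const summary = groupBy(sales, ['region'], [
  { field: 'amount', op: 'sum', as: 'total' },
  { field: 'amount', op: 'count', as: 'orders' },
]);
// summary → [{ region: 'east', total: 30, orders: 2 },
//            { region: 'west', total: 5, orders: 1 }]
```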
Example: normalize customer names
Source field: customers
Operations:
- String op: Lowercase on column 1 (Name) → target 1
- String op: Trim on column 1 (Name) → target 1
- Convert type: column 2 (CreatedAt) → Date
Output field: customers (overwrites). Downstream Filter or Visualize stages can now use normalized values.
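The three operations in this example have the following effect, sketched as plain JavaScript (the field names come from the example; the code itself is illustrative, not the stage's implementation):

```javascript
const customers = [
  { Name: '  ALICE ', CreatedAt: '2024-01-15' },
  { Name: 'Bob',      CreatedAt: '2024-02-01' },
];

const normalized = customers.map(row => ({
  ...row,
  Name: row.Name.toLowerCase().trim(), // String op: Lowercase, then Trim
  CreatedAt: new Date(row.CreatedAt),  // Convert type: string → Date
}));
// normalized[0].Name → 'alice'
```

Because the output field matches the source, the normalized table replaces the original in context, so downstream stages see only the cleaned values.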
Tips for reliable transforms
- Scope rows: Limit Start/End row to avoid accidentally transforming entire large tables when testing.
- Branch results: Write to a new output path to compare transformed vs. raw data in downstream Visualize stages.
- Validate after grouping: Use Validate to ensure grouped outputs have the expected schema.
- Be explicit with formulas: Reference fields via row and use Math helpers; handle missing values in your expression where needed.
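A formula written this way might look like the following sketch: the expression sees row, context, and Math, and guards against missing values explicitly. The context shape (taxRate) and the wrapper function are assumptions for illustration:

```javascript
const context = { taxRate: 0.2 };

// Per-row formula: price plus tax, rounded to 2 decimals.
// (Number(row.price) || 0) guards against missing or null prices.
const formula = (row, context) =>
  Math.round((Number(row.price) || 0) * (1 + context.taxRate) * 100) / 100;

const rows = [{ price: 10 }, { price: null }];
const results = rows.map(r => formula(r, context));
// results → [12, 0]
```

Without the guard, a missing price would propagate NaN into the output column, which is hard to spot in a large table.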