Skip to content

storage: introduce StorageScalarExpr for postgres source casts#36047

Open
antiguru wants to merge 8 commits intoMaterializeInc:mainfrom
antiguru:storage_se
Open

storage: introduce StorageScalarExpr for postgres source casts#36047
antiguru wants to merge 8 commits intoMaterializeInc:mainfrom
antiguru:storage_se

Conversation

@antiguru
Copy link
Copy Markdown
Member

@antiguru antiguru commented Apr 13, 2026

Replace MirScalarExpr in PostgresSourceExportDetails.column_casts with a new closed-enum StorageScalarExpr in mz-storage-types.
Storage needs stable cast evaluation that does not change when the expression framework evolves.

  • Define StorageScalarExpr (4 variants: Column, Literal, CallUnary, ErrorIfNull) and CastFunc (24 cast variants) in mz-storage-types
  • Copy eval implementations from mz_expr CastStringTo* functions, delegating to the same strconv::parse_* functions
  • Rewrite generate_column_casts to build StorageScalarExpr directly from PG type metadata, bypassing the HIR/MIR planning pipeline
  • Update cast_row in storage to use StorageScalarExpr::eval
  • No migration needed — column_casts is never durably persisted, always recomputed at plan time
  • Parity tests assert structural equality of results and errors against MirScalarExpr
  • Error snapshot tests hardcode expected EvalError variants for each cast function

Stability contract

Error variants, error messages, and output Datum types produced by StorageScalarExpr::eval must not change across releases.
The eval implementations delegate to mz_repr::strconv::parse_*, which depend on these external crates whose version bumps may alter parse behavior:

  • chrono / chrono-tz — date, time, timestamp, timestamptz parsing
  • dec (decnumber-sys) — numeric parsing and rescaling
  • serde_json — jsonb parsing
  • uuid — uuid parsing
  • hex — bytea hex decoding
  • regex — float special-value detection (inf, NaN)
  • ordered-float — float Datum representation

Additionally, Materialize-internal crates control behavior for:

  • mz-repr (strconv, adt types) — core parsing logic
  • mz-pgtz — timezone resolution
  • mz-ore — lexing for container types (array, list, map, range)

antiguru and others added 4 commits April 13, 2026 18:04
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy the exact eval implementations from mz_expr's CastStringTo*
functions. Each CastFunc variant delegates to the same
strconv::parse_* functions to ensure bit-for-bit identical behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace MirScalarExpr with StorageScalarExpr in the full column cast
pipeline:

* Change PostgresSourceExportDetails.column_casts field type
* Rewrite generate_column_casts to build StorageScalarExpr directly
  from PG type metadata, bypassing plan_cast/lower_uncorrelated
* Update cast_row and SourceOutputInfo in storage to use the new type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a `#[cfg(test)] mod tests` block to `src/storage-types/src/sources/casts.rs`
covering all categories: simple scalar casts (Bool, Int16, Int32, Int64, Float32,
Float64, Date, Uuid, Bytes, Jsonb), parameterized casts (Numeric with/without scale,
Timestamp with/without precision, Char, VarChar), null propagation, ErrorIfNull
(fires on null, passes through non-null), error cases (invalid input, fail_on_len),
Literal unpacking, and Column extraction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

antiguru and others added 4 commits April 14, 2026 08:59
Handle Json as jsonb (matching old plan_cast behavior) and produce
TableContainsUningestableTypes errors for reg* types with proper
table/column context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 22 parity tests that compare StorageScalarExpr::eval against
MirScalarExpr::eval for identical inputs, asserting structural
equality of both Ok and Err results.

Also fix error handling for uningestable PG types (reg*, json):
produce TableContainsUningestableTypes with table/column context,
and treat json as jsonb.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Assert exact EvalError variants for each CastFunc on invalid input.
These tests lock down error behavior independently of MirScalarExpr —
parity tests can be removed in a follow-up PR while these remain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Error variants, error messages, and output Datum types must not change
across releases. Document external crate dependencies whose version
bumps could alter parse behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@antiguru antiguru marked this pull request as ready for review April 14, 2026 08:08
@antiguru antiguru requested review from a team as code owners April 14, 2026 08:08
@antiguru antiguru requested review from aljoscha and martykulma April 14, 2026 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant