[iceberg-rust] transform_arrow lacks a Utf8 + Truncate branch #124

@rampage644

Filed here because issues are disabled on Embucket/iceberg-rust. The fix lives in that fork.

Summary

`transform_arrow()` in `iceberg-rust/src/arrow/transform.rs` implements `Transform::Truncate(N)` for `Int16`, `Int32`, and `Int64` but not for `DataType::Utf8`. Any write path against a table partitioned by `truncate(N, string_column)` falls through the match to the catch-all arm and fails with:

```
Arrow error: Compute error: Failed to perform transform for datatype
```

The Iceberg spec defines `truncate(W, s)` on strings as truncating to at most W Unicode code points (not W bytes), so the result is always valid UTF-8. This is straightforward to implement using `StringArray::iter()` and slicing on a character boundary, mirroring the existing `Utf8 + Bucket` arm a few lines above.
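One caveat when slicing: the spec's width for strings counts Unicode code points, and in Rust a `&s[..n]` slice panics if `n` lands inside a multi-byte character. A minimal, std-only sketch of a boundary-safe per-value helper follows. The name `truncate_utf8` and the `usize` width are assumptions for illustration, not existing code in the fork.

```rust
/// Hypothetical helper: return the prefix of `s` holding at most `width`
/// Unicode code points, borrowed from the input (no allocation).
fn truncate_utf8(s: &str, width: usize) -> &str {
    // `char_indices().nth(width)` yields the byte offset of the first
    // character that must be dropped; slicing there is always on a boundary.
    match s.char_indices().nth(width) {
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than `width + 1` characters: the string is already short enough.
        None => s,
    }
}

fn main() {
    // ASCII input: code points and bytes coincide.
    assert_eq!(truncate_utf8("iceberg", 3), "ice");
    // "é" is two bytes in UTF-8; a byte-based slice at 2 would panic here.
    assert_eq!(truncate_utf8("héllo", 2), "hé");
    // Width larger than the string leaves it unchanged.
    assert_eq!(truncate_utf8("ab", 10), "ab");
}
```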

Repro

Create any Iceberg table partitioned by `truncate(N, some_string_column)` (e.g. via Athena with `partitioning = ARRAY['truncate(10, id)']` where `id` is `STRING`), then run any INSERT or MERGE through `datafusion_iceberg`. Seen while verifying Embucket/iceberg-rust#57 against a set of Athena-created probe tables on S3 Tables.

Location

`iceberg-rust/src/arrow/transform.rs` — add a `(DataType::Utf8, Transform::Truncate(m))` arm to `transform_arrow()` returning a `StringArray` of truncated values. Should follow the nullable-preserving style of the existing `(DataType::Utf8, Transform::Bucket(m))` arm.
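The nullable-preserving shape of such an arm might look like the sketch below. To stay runnable without the `arrow` crate it works on a plain iterator of `Option<&str>`, which is the item type `StringArray::iter()` yields; in the real arm the result would presumably be collected into a `StringArray` rather than a `Vec`. The function name `truncate_column` is hypothetical, and the width is assumed to count Unicode code points.

```rust
/// Hypothetical column-level helper: truncate every non-null value to at most
/// `width` Unicode code points and pass nulls (`None`) through untouched.
fn truncate_column<'a, I>(values: I, width: usize) -> Vec<Option<String>>
where
    I: IntoIterator<Item = Option<&'a str>>,
{
    values
        .into_iter()
        // `Option::map` keeps `None` as `None`, so nullability is preserved.
        .map(|value| value.map(|s| s.chars().take(width).collect::<String>()))
        .collect()
}

fn main() {
    let column = vec![Some("partition"), None, Some("ab")];
    let truncated = truncate_column(column, 4);
    assert_eq!(
        truncated,
        vec![Some("part".to_string()), None, Some("ab".to_string())]
    );
}
```

Collecting owned `String` values keeps the sketch simple; an implementation that cares about allocations could borrow prefixes instead.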

Related

Unmasked once Embucket/iceberg-rust#57 landed — before that, the projection schema mismatch on partitioned targets short-circuited every MERGE before the transform compute ran.
