Filed here because issues are disabled on Embucket/iceberg-rust. The fix lives in that fork.
Summary
`iceberg-rust/src/arrow/transform.rs::transform_arrow()` implements `Transform::Truncate(N)` for `Int16`, `Int32`, and `Int64` but not for `DataType::Utf8`. Any write path against a table partitioned by `truncate(N, string_column)` falls through the match to the catch-all arm and fails with:
```
Arrow error: Compute error: Failed to perform transform for datatype
```
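The Iceberg spec defines `truncate(W, s)` on strings as truncating to a valid UTF-8 string of at most `W` Unicode code points (the "first `W` bytes" rule applies to binary values), so the cut has to land on a character boundary rather than a raw byte offset. This is straightforward to implement using `StringArray::iter()` and slicing at a `char_indices()` boundary, mirroring the existing `Utf8 + Bucket` arm a few lines above.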
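A minimal sketch of the per-value logic, using only the standard library so it runs on its own. It assumes the string rule counts Unicode code points, not raw bytes. `truncate_str` and `truncate_column` are hypothetical names for illustration, not functions in the fork. In the real arm the `Option<&str>` values would come from `StringArray::iter()` and be collected back into a `StringArray`.

```rust
/// Truncate `s` to at most `width` Unicode code points.
/// Slicing happens at a `char_indices` boundary, so a multi-byte
/// character is never split (a raw byte slice could panic there).
fn truncate_str(s: &str, width: usize) -> &str {
    match s.char_indices().nth(width) {
        // `byte_idx` is the byte offset where the (width + 1)-th character starts.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than `width + 1` characters: the value is returned unchanged.
        None => s,
    }
}

/// Apply the truncation to a nullable column, keeping nulls as nulls.
/// Stand-in for iterating a `StringArray`; the slice type is an assumption
/// made so the example has no external dependencies.
fn truncate_column<'a>(values: &[Option<&'a str>], width: usize) -> Vec<Option<&'a str>> {
    values
        .iter()
        .map(|&v| v.map(|s| truncate_str(s, width)))
        .collect()
}
```

Borrowing sub-slices avoids allocating a new `String` per row; whether the actual arm can do the same depends on how the result array is built.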
Repro
Create any Iceberg table partitioned by `truncate(N, some_string_column)` (e.g. via Athena with `partitioning = ARRAY['truncate(10, id)']` where `id` is `STRING`), then run any INSERT or MERGE through `datafusion_iceberg`. Seen while verifying Embucket/iceberg-rust#57 against a set of Athena-created probe tables on S3 Tables.
Location
`iceberg-rust/src/arrow/transform.rs` — add a `(DataType::Utf8, Transform::Truncate(m))` arm to `transform_arrow()` returning a `StringArray` of truncated values. Should follow the nullable-preserving style of the existing `(DataType::Utf8, Transform::Bucket(m))` arm.
Related
Unmasked once Embucket/iceberg-rust#57 landed — before that, the projection schema mismatch on partitioned targets short-circuited every MERGE before the transform compute ran.