Source: zarr-developers/zarr-extensions#43
Two array-to-array codecs for zarr v3, designed to work together for the common pattern of storing floating-point data as compressed integers.
Type (`scale_offset`): array -> array (does NOT change dtype)
Encode: out = (in - offset) * scale
Decode: out = (in / scale) + offset

Parameters:
- `offset` (optional, float): scalar subtracted during encoding. Default: 0.
- `scale` (optional, float): scalar multiplied during encoding (after offset subtraction). Default: 1.

Notes:
- Arithmetic uses the input array's own data type semantics (no implicit promotion).
- If neither `scale` nor `offset` is given, `configuration` may be omitted (codec is a no-op).
- The fill value is transformed through the codec (encode direction).
- Only valid for real-number data types (int/uint/float families). Complex dtypes are rejected at validation time.

Example: `{"name": "scale_offset", "configuration": {"offset": 5, "scale": 0.1}}`

When both `offset` and `scale` are defaults: `{"name": "scale_offset"}` (no `configuration` key).

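
As an illustration of the encode and decode formulas for this codec, here is a minimal NumPy sketch. The function names `scale_offset_encode` and `scale_offset_decode` are hypothetical and are not the PR's `ScaleOffset` class. The sketch assumes a floating-point input and casts the scalars to the array's dtype so that no promotion takes place.

```python
import numpy as np


def scale_offset_encode(arr, offset=0, scale=1):
    """Encode as (in - offset) * scale, staying in the array's dtype."""
    # Assumption: cast the scalars to the array dtype so NumPy does not promote.
    off = arr.dtype.type(offset)
    sc = arr.dtype.type(scale)
    return (arr - off) * sc


def scale_offset_decode(arr, offset=0, scale=1):
    """Decode as (in / scale) + offset, staying in the array's dtype."""
    off = arr.dtype.type(offset)
    sc = arr.dtype.type(scale)
    return (arr / sc) + off


data = np.array([10.0, 12.5, 35.5], dtype=np.float32)
encoded = scale_offset_encode(data, offset=10, scale=0.5)
decoded = scale_offset_decode(encoded, offset=10, scale=0.5)
```

The values here are chosen so that every intermediate result is exactly representable in float32. With a scale such as 0.1 the round trip would generally only be approximate.
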
Type (`cast_value`): array -> array (CHANGES dtype)
Purpose: Value-convert (not binary-reinterpret) array elements to a new data type.

Parameters:
- `data_type` (required): target zarr v3 data type name (e.g. `"uint8"`, `"float32"`). Internally stored as a `ZDType` instance, resolved via `get_data_type_from_json`.
- `rounding` (optional): how to round when casting float to int. Values: `"nearest-even"` (default), `"towards-zero"`, `"towards-positive"`, `"towards-negative"`, `"nearest-away"`.
- `out_of_range` (optional): what to do when a value is outside the target's range. Values: `"clamp"`, `"wrap"`. If absent, out-of-range values raise an error. `"wrap"` is only valid for integer target types.
- `scalar_map` (optional): explicit value overrides of the form `{"encode": [[input, output], ...], "decode": [[input, output], ...]}`. Applied BEFORE rounding/out_of_range. Each entry's source is deserialized using the source dtype and its target using the target dtype (via `ZDType.from_json_scalar`), preserving full precision for both sides.

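
To make the five `rounding` values concrete, the sketch below maps each one to a NumPy operation. `round_values` is a hypothetical helper, not the PR's `_round_inplace`, and the half-way handling for `"nearest-away"` is one possible formulation rather than the PR's actual code.

```python
import numpy as np


def round_values(arr, mode="nearest-even"):
    """Round a float array according to one of the cast_value rounding names."""
    if mode == "nearest-even":
        return np.rint(arr)  # ties go to the even neighbour
    if mode == "towards-zero":
        return np.trunc(arr)
    if mode == "towards-positive":
        return np.ceil(arr)
    if mode == "towards-negative":
        return np.floor(arr)
    if mode == "nearest-away":
        # Ties go away from zero: step one unit outward when the
        # fractional part is at least one half.
        truncated = np.trunc(arr)
        half_or_more = np.abs(arr - truncated) >= 0.5
        return truncated + np.where(half_or_more, np.sign(arr), 0.0)
    raise ValueError(f"unknown rounding mode: {mode!r}")


samples = np.array([0.5, 1.5, 2.5, -0.5, -2.5])
results = {
    mode: round_values(samples, mode).tolist()
    for mode in (
        "nearest-even",
        "towards-zero",
        "towards-positive",
        "towards-negative",
        "nearest-away",
    )
}
```

The sample values are all exact ties, which is where the modes differ most visibly.
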
Dispatches on (src_type, tgt_type, has_map) where src/tgt are "int" or "float":
| Source | Target | scalar_map | Procedure |
|---|---|---|---|
| any | float | no | arr.astype(target_dtype) |
| int | float | yes | widen to float64, apply map, cast |
| float | float | yes | copy, apply map, cast |
| int | int | no | range check, then astype |
| int | int | yes | widen to int64, apply map, range check |
| float | int | any | widen to float64, apply map (if any), reject NaN/Inf, round, range check |

All casts are wrapped in `np.errstate(over='raise', invalid='raise')` to convert NumPy overflow/invalid warnings to hard errors.
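
The float-to-int row of the table can be sketched as follows, leaving out the `scalar_map` step. `cast_float_to_int` is a hypothetical stand-in for the PR's `_cast_array` path, fixed to nearest-even rounding, and it assumes integer targets of at most 32 bits so that the range limits are exact in float64.

```python
import numpy as np


def cast_float_to_int(arr, target_dtype, out_of_range=None):
    """Widen to float64, reject NaN/Inf, round, range check, then cast."""
    target = np.dtype(target_dtype)
    info = np.iinfo(target)
    work = arr.astype(np.float64)

    if not np.isfinite(work).all():
        raise ValueError("NaN or infinity cannot be cast to an integer type")

    work = np.rint(work)  # nearest-even rounding

    outside = (work < info.min) | (work > info.max)
    if outside.any():
        if out_of_range == "clamp":
            work = np.clip(work, info.min, info.max)
        elif out_of_range == "wrap":
            # Modular arithmetic over the width of the target range.
            span = info.max - info.min + 1
            work = (work - info.min) % span + info.min
        else:
            raise ValueError("value outside the range of the target dtype")

    return work.astype(target)


values = np.array([-5.0, 0.4, 2.5, 300.0])
clamped = cast_float_to_int(values, "uint8", out_of_range="clamp")
wrapped = cast_float_to_int(values, "uint8", out_of_range="wrap")
```

Calling `cast_float_to_int(values, "uint8")` without `out_of_range` raises, which mirrors the error-by-default behaviour described for the codec.
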

- Only integer and floating-point dtypes are allowed (both source and target).
- `out_of_range='wrap'` is rejected for non-integer target types.
- Int-to-float casts are rejected if the float type's mantissa cannot exactly represent the full integer range (e.g. int64 -> float64 is rejected because float64 stores only 52 mantissa bits, giving 53 bits of precision, but int64 has values up to 2^63-1). The same check applies for the float-to-int decode direction.
- NaN: detected dynamically via `isinstance(src, (float, np.floating)) and np.isnan(src)`. NaN-to-integer casts error unless `scalar_map` provides a mapping. Hex-encoded NaN strings (e.g. `"0x7fc00001"`) preserve NaN payloads per the zarr v3 spec.
- `_check_int_range` handles out-of-range integer values with clamp (via `np.clip`) or wrap (via modular arithmetic).
- Fill value: cast using the same `_cast_array` path as array elements, including scalar_map and rounding.
- The fill value cast is done in `resolve_metadata`, which also changes the chunk spec's dtype to the target.

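
The int-to-float precision rule in the list above can be expressed as a comparison of bit counts. The helper `int_fits_in_float` is hypothetical. It counts the implicit leading bit of the float format as usable precision, which is an assumption, since the PR's actual check may count bits differently.

```python
import numpy as np


def int_fits_in_float(int_dtype, float_dtype):
    """True if every value of int_dtype is exactly representable in float_dtype."""
    int_info = np.iinfo(int_dtype)
    float_info = np.finfo(float_dtype)

    # nmant is the number of stored mantissa bits; the implicit leading
    # bit gives one extra bit of precision (assumption of this sketch).
    precision_bits = float_info.nmant + 1

    # Signed types spend one bit on the sign.
    magnitude_bits = int_info.bits - (1 if int_info.min < 0 else 0)

    return magnitude_bits <= precision_bits


checks = {
    "int64 -> float64": int_fits_in_float(np.int64, np.float64),
    "int32 -> float64": int_fits_in_float(np.int32, np.float64),
    "int32 -> float32": int_fits_in_float(np.int32, np.float32),
    "uint8 -> float32": int_fits_in_float(np.uint8, np.float32),
}
```

Under this formulation int64 -> float64 is rejected (63 magnitude bits against 53 bits of precision), while int32 -> float64 and uint8 -> float32 pass.
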
{
"name": "cast_value",
"configuration": {
"data_type": "uint8",
"rounding": "nearest-even",
"out_of_range": "clamp",
"scalar_map": {
"encode": [["NaN", 0], ["+Infinity", 0], ["-Infinity", 0]],
"decode": [[0, "NaN"]]
}
}
}

Only non-default fields are serialized (`rounding` and `out_of_range` are omitted when default).

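
The omission of default fields could be sketched like this. `cast_value_to_dict` is a hypothetical plain function, not the PR's `CastValue.to_dict`, and it takes the data type as an already serialized string for simplicity.

```python
def cast_value_to_dict(
    data_type,
    rounding="nearest-even",
    out_of_range=None,
    scalar_map=None,
):
    """Build the cast_value JSON, leaving out fields that hold their default."""
    configuration = {"data_type": data_type}
    if rounding != "nearest-even":
        configuration["rounding"] = rounding
    if out_of_range is not None:
        configuration["out_of_range"] = out_of_range
    if scalar_map is not None:
        configuration["scalar_map"] = scalar_map
    return {"name": "cast_value", "configuration": configuration}


minimal = cast_value_to_dict("uint8")
explicit = cast_value_to_dict("uint8", rounding="towards-zero", out_of_range="clamp")
```
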
Combined usage:
{
"data_type": "float64",
"fill_value": "NaN",
"codecs": [
{"name": "scale_offset", "configuration": {"offset": -10, "scale": 0.1}},
{"name": "cast_value", "configuration": {
"data_type": "uint8",
"rounding": "nearest-even",
"scalar_map": {"encode": [["NaN", 0]], "decode": [[0, "NaN"]]}
}},
"bytes"
]
}

Files:
- `src/zarr/codecs/scale_offset.py`: `ScaleOffset` class
- `src/zarr/codecs/cast_value.py`: `CastValue` class and casting helpers
- `tests/test_codecs/test_scale_offset.py`: ScaleOffset tests
- `tests/test_codecs/test_cast_value.py`: CastValue tests + combined pipeline tests

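
To trace what the combined `scale_offset` + `cast_value` configuration (offset -10, scale 0.1, target uint8, NaN mapped to 0) does to actual values, here is a self-contained NumPy sketch. The functions `encode` and `decode` are hypothetical and hard-code that one configuration. They do not use the codec classes, and they omit the `bytes` codec.

```python
import numpy as np

OFFSET, SCALE = -10.0, 0.1


def encode(values):
    """float64 -> uint8: scale_offset, then cast_value with NaN -> 0."""
    scaled = (values - OFFSET) * SCALE  # scale_offset encode
    # scalar_map (encode direction) is applied before rounding.
    work = np.where(np.isnan(scaled), 0.0, scaled)
    work = np.rint(work)  # nearest-even
    if ((work < 0) | (work > 255)).any():
        # No out_of_range given, so out-of-range values are an error.
        raise ValueError("value outside the uint8 range")
    return work.astype(np.uint8)


def decode(stored):
    """uint8 -> float64: cast_value with 0 -> NaN, then scale_offset decode."""
    work = stored.astype(np.float64)
    work[stored == 0] = np.nan  # scalar_map (decode direction)
    return (work / SCALE) + OFFSET  # scale_offset decode


values = np.array([np.nan, 0.0, 90.0, 1000.0])
stored = encode(values)
restored = decode(stored)
```

One consequence of this particular configuration is that a real input of -10 also encodes to 0, so it would come back as NaN on decode.
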

`ScaleOffset`:
- `@dataclass(kw_only=True, frozen=True)`, subclasses `ArrayArrayCodec`.
- Uses `ScaleOffsetJSON` (a `NamedConfig` TypedDict) for typed serialization.
- `from_dict` uses `parse_named_configuration(data, "scale_offset", require_configuration=False)`.
- `to_dict` omits the `configuration` key entirely when both offset=0 and scale=1.
- `resolve_metadata`: transforms fill_value via `(fill - offset) * scale`, dtype unchanged.
- `_encode_sync`: `(arr - offset) * scale` using the array's own dtype.
- `_decode_sync`: `(arr / scale) + offset` using the array's own dtype.
- `is_fixed_size = True`, `compute_encoded_size` returns input size unchanged.

`CastValue`:
- `@dataclass(frozen=True)` with custom `__init__` (accepts `data_type: str | ZDType`).
- Stores `dtype: ZDType` (not a string). A string `data_type` is resolved via `get_data_type_from_json`.
- `from_dict` uses `parse_named_configuration(data, "cast_value", require_configuration=True)`.
- `to_dict` serializes dtype via `self.dtype.to_json(zarr_format=3)`, only includes non-default rounding/out_of_range/scalar_map.
- `resolve_metadata`: casts fill value, changes chunk spec dtype to target.
- `_encode_sync` / `_decode_sync`: delegate to `_cast_array`, threading the appropriate scalar_map direction ("encode" or "decode") and the correct src/tgt ZDType pair for scalar map deserialization.
- `compute_encoded_size`: scales by `target_itemsize / source_itemsize`.

Casting helpers:
- `_cast_array`: public entry point, wraps `_cast_array_impl` with `np.errstate`.
- `_cast_array_impl`: match-based dispatch on `(src_type, tgt_type, has_map)`.
- `_check_int_range`: integer range check with clamp/wrap/error.
- `_round_inplace`: rounding dispatch (rint, trunc, ceil, floor, nearest-away).
- `_apply_scalar_map`: in-place value remapping with NaN-aware matching.
- `_parse_map_entries`: deserializes scalar_map JSON using separate src/tgt ZDType instances.
- `_extract_raw_map`: extracts "encode" or "decode" direction from ScalarMapJSON.

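
NaN-aware matching is needed because `NaN == NaN` is false, so a plain equality test never finds NaN entries. The sketch below shows one way to do in-place remapping. `apply_scalar_map` is a hypothetical function, not the PR's `_apply_scalar_map`, and computing all masks before writing is a choice made here so that one entry's output cannot be re-matched by a later entry.

```python
import numpy as np


def apply_scalar_map(arr, entries):
    """Replace each source value with its target, in place."""
    # Compute every mask against the original values first.
    masks = []
    for src, tgt in entries:
        if isinstance(src, (float, np.floating)) and np.isnan(src):
            mask = np.isnan(arr)  # NaN never compares equal, so use isnan
        else:
            mask = arr == src
        masks.append((mask, tgt))

    for mask, tgt in masks:
        arr[mask] = tgt
    return arr


special = np.array([np.nan, np.inf, -np.inf, 1.5])
apply_scalar_map(
    special,
    [(float("nan"), 0.0), (float("inf"), 0.0), (float("-inf"), 0.0)],
)

chained = np.array([1.0, 2.0])
apply_scalar_map(chained, [(1.0, 2.0), (2.0, 3.0)])
```
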

Design decisions:
- Encode = `(in - offset) * scale` (subtract, not add); matches HDF5 and numcodecs.
- No implicit precision promotion: arithmetic stays in the input dtype.
- `out_of_range` defaults to error (not clamp).
- `scalar_map` entries are typed: each side is deserialized with its own ZDType, so int64 scalars don't lose precision through float64 intermediaries.
- Fill value is cast through the same `_cast_array` path as data elements.
- Int-to-float precision loss is caught at validate time (mantissa bit check).
- Runtime overflow/invalid is caught via `np.errstate(over='raise', invalid='raise')`.
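
The effect of `np.errstate(over='raise', invalid='raise')` can be seen in a small standalone example. It uses float32 arithmetic rather than a dtype cast, because which conditions a cast itself reports can vary between NumPy versions. The variable names are illustrative only.

```python
import numpy as np

big = np.array([3.0e38], dtype=np.float32)  # close to the float32 maximum

with np.errstate(over="raise", invalid="raise"):
    try:
        big * np.float32(10)  # overflows float32
        outcome = "no error"
    except FloatingPointError as exc:
        outcome = f"raised: {exc}"

with np.errstate(over="raise", invalid="raise"):
    try:
        np.array([np.inf]) - np.array([np.inf])  # inf - inf is invalid
        invalid_outcome = "no error"
    except FloatingPointError as exc:
        invalid_outcome = f"raised: {exc}"
```

Outside the `errstate` block the same operations would only emit a `RuntimeWarning` and produce `inf` or `NaN`.
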