I've been looking at how record types could be integrated into rust-numpy, and here's an unsorted collection of thoughts for discussion.
Let's look at `Element`:

```rust
pub unsafe trait Element: Clone + Send {
    const DATA_TYPE: DataType;
    fn is_same_type(dtype: &PyArrayDescr) -> bool;
    fn npy_type() -> NPY_TYPES { ... }
    fn get_dtype(py: Python) -> &PyArrayDescr { ... }
}
```
- `npy_type()` is used in `PyArray::new()` and the like. Instead, one should use `PyArray_NewFromDescr()` to make use of the custom descriptor. Should all the places where `npy_type()` is used be split between "simple type, use `New`" and "user type, use `NewFromDescr`"? Or, alternatively, should arrays always be constructed from a descriptor? (In which case `npy_type()` becomes redundant and should be removed; see the small illustration below.)
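A minimal illustration of the "always construct from a descriptor" idea, done at the Python level only since that's easy to show: the simple-type path is just a special case of the descriptor path (the builtin descriptor is always recoverable from the type), so a single `PyArray_NewFromDescr()`-based route inside `PyArray::new()` would lose nothing. The code below is illustrative, not rust-numpy API:

```rust
use pyo3::prelude::*;
use pyo3::types::IntoPyDict;

fn demo(py: Python<'_>) -> PyResult<()> {
    let np = py.import("numpy")?;
    // The builtin descriptor is recoverable from the type itself...
    let descr = np.getattr("dtype")?.call1(("int32",))?;
    // ...and construction can always go through an explicit descriptor.
    let kwargs = [("dtype", descr)].into_py_dict(py);
    let arr = np.getattr("zeros")?.call((3,), Some(kwargs))?;
    assert!(arr.getattr("dtype")?.eq(descr)?);
    Ok(())
}
```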
- Why is `is_same_type()` needed at all? It is only used in `FromPyObject::extract()`, where one could simply use `PyArray_EquivTypes()` (like it's done in pybind11). Isn't it largely redundant? (Or does it exist for optimization purposes? If so, is the difference even noticeable performance-wise?) A sketch of the equivalence-based check follows below.
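A hedged sketch of what the extract-time check could look like without `is_same_type()`; `has_expected_dtype` is a hypothetical helper, and the Python-level dtype `__eq__` used here is implemented by numpy in terms of `PyArray_EquivTypes`, i.e. the same check pybind11 performs via the C API:

```rust
use numpy::Element;
use pyo3::prelude::*;

// Hypothetical helper: compare the array's dtype against T::get_dtype() instead
// of calling T::is_same_type(). dtype __eq__ is equivalence-based in numpy.
fn has_expected_dtype<T: Element>(array: &PyAny) -> PyResult<bool> {
    let py = array.py();
    array.getattr("dtype")?.eq(T::get_dtype(py))
}
```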
- The `DATA_TYPE` constant is really only used in two places, to check whether the type is an object, like this:

  ```rust
  if T::DATA_TYPE != DataType::Object
  ```

  Isn't this redundant as well? Given that one can always do

  ```rust
  T::get_dtype(py).get_datatype() != Some(DataType::Object)
  // or, one could add something like: T::get_dtype(py).is_object()
  ```
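The `is_object()` helper hinted at above could be as small as this (an assumed addition inside rust-numpy's dtype module, where an inherent impl on `PyArrayDescr` is possible):

```rust
// Hypothetical convenience method; relies only on the existing get_datatype().
impl PyArrayDescr {
    pub fn is_object(&self) -> bool {
        self.get_datatype() == Some(DataType::Object)
    }
}
```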
- With all the notes above, `Element` essentially becomes just:

  ```rust
  pub unsafe trait Element: Clone + Send {
      fn get_dtype(py: Python) -> &PyArrayDescr;
  }
  ```
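Under that reduced trait, an impl for a builtin scalar could look roughly like this; a minimal sketch, uncached and built via numpy's Python-level `dtype` constructor purely for illustration (a real implementation would go through the C API and cache the descriptor):

```rust
use numpy::PyArrayDescr;
use pyo3::prelude::*;

// Assumes the reduced Element trait sketched above (get_dtype only).
unsafe impl Element for f64 {
    fn get_dtype(py: Python) -> &PyArrayDescr {
        py.import("numpy").unwrap()
            .getattr("dtype").unwrap()
            .call1(("float64",)).unwrap()
            .downcast::<PyArrayDescr>().unwrap()
    }
}
```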
- For structured types, do we want to stick the type descriptor into `DataType`? E.g.:

  ```rust
  enum DataType { ..., Record(RecordType) }
  ```

  Or, alternatively, just keep it as `DataType::Void`? In which case, how does one recover the record type descriptor? (It can always be done through the numpy C API, of course, via `PyArrayDescr`.)
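If the descriptor does go into `DataType`, a `RecordType` would have to capture roughly the same information as numpy's fields dict. A hypothetical shape (all names made up):

```rust
// One possible layout for a record type descriptor: per-field name, dtype and
// byte offset, plus the total itemsize of the record.
pub struct RecordField {
    pub name: String,
    pub dtype: DataType,
    pub offset: usize,
}

pub struct RecordType {
    pub fields: Vec<RecordField>,
    pub itemsize: usize,
}
```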
- In order to enable user-defined record dtypes, having to return `&PyArrayDescr` would probably require:
  - Maintaining a global static thread-safe registry of registered dtypes (kind of like it's done in pybind11; a sketch follows this list)
  - Initializing this registry somewhere
  - Any other options?
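A minimal sketch of what such a registry could look like, assuming `once_cell` for the static; none of these names exist in rust-numpy:

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::Mutex;

use numpy::PyArrayDescr;
use once_cell::sync::Lazy;
use pyo3::prelude::*;

// Global thread-safe registry of user-defined dtypes, keyed by the Rust type.
static DTYPE_REGISTRY: Lazy<Mutex<HashMap<TypeId, Py<PyArrayDescr>>>> =
    Lazy::new(|| Mutex::new(HashMap::new()));

/// Called once per type (e.g. by the derive machinery or a module initializer).
pub fn register_dtype<T: 'static>(descr: &PyArrayDescr) {
    DTYPE_REGISTRY
        .lock()
        .unwrap()
        .insert(TypeId::of::<T>(), descr.into());
}

/// Looked up from Element::get_dtype(). Returns an owned handle; handing out a
/// &PyArrayDescr bound to the GIL lifetime from behind a Mutex needs more care.
pub fn registered_dtype<T: 'static>(py: Python<'_>) -> Option<Py<PyArrayDescr>> {
    DTYPE_REGISTRY
        .lock()
        .unwrap()
        .get(&TypeId::of::<T>())
        .map(|descr| descr.clone_ref(py))
}
```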
- `Element` should probably be implemented for tuples and fixed-size arrays (a sketch of how a tuple descriptor could be built follows below).
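For instance, an impl for `(i32, f64)` could build its descriptor along these lines; the `f0`/`f1` names follow numpy's convention for unnamed fields, and `tuple_dtype` is purely illustrative (again going through the Python-level constructor):

```rust
use numpy::PyArrayDescr;
use pyo3::prelude::*;

// Hypothetical: what a tuple impl of Element::get_dtype() could delegate to.
fn tuple_dtype<'py>(py: Python<'py>) -> PyResult<&'py PyArrayDescr> {
    let spec = vec![("f0", "<i4"), ("f1", "<f8")];
    let descr = py.import("numpy")?.getattr("dtype")?.call1((spec,))?;
    Ok(descr.downcast::<PyArrayDescr>()?)
}
```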
- In order to implement structured dtypes, we'll inevitably have to resort to proc-macros. A few random thoughts and examples of how it could be done (any suggestions?); a sketch of what such a derive might expand to follows this list:

  ```rust
  #[numpy(record)]
  #[derive(Clone, Copy)]
  #[repr(packed)]
  struct Foo { x: i32, u: Bar } // where Bar is a registered numpy dtype as well
  // dtype = [('x', '<i4'), ('u', ...)]
  ```
- We probably have to require one of `#[repr(C)]`, `#[repr(packed)]`, or `#[repr(transparent)]`.
- If a repr is required, it can be an argument of the macro, e.g. `#[numpy(record, repr = "C")]`. (Or not.)
- Do we also have to require `Copy`? (Or not? Technically, you could have object-type fields inside.)
- For wrapper types, we can allow something like this:

  ```rust
  #[numpy(transparent)]
  #[repr(transparent)]
  struct Wrapper(pub i32);
  // dtype = '<i4'
  ```
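To make the proc-macro idea more concrete, here's a rough sketch of what `#[numpy(record)]` might expand to for `Foo` under the reduced single-method trait. Everything here is hypothetical (no caching, descriptor assembled via the Python-level constructor); the point is only that each field delegates to its own `Element::get_dtype()`:

```rust
use numpy::PyArrayDescr;
use pyo3::prelude::*;
use pyo3::ToPyObject;

// Stand-in for another registered Element type (could itself be a record).
type Bar = f32;

#[derive(Clone, Copy)]
#[repr(packed)]
struct Foo {
    x: i32,
    u: Bar,
}

// Generated impl (sketch, assuming the reduced Element trait): build the
// descriptor [('x', '<i4'), ('u', ...)] by asking each field for its dtype.
unsafe impl Element for Foo {
    fn get_dtype(py: Python) -> &PyArrayDescr {
        let spec = vec![
            ("x", <i32 as Element>::get_dtype(py).to_object(py)),
            ("u", <Bar as Element>::get_dtype(py).to_object(py)),
        ];
        py.import("numpy").unwrap()
            .getattr("dtype").unwrap()
            .call1((spec,)).unwrap()
            .downcast::<PyArrayDescr>().unwrap()
    }
}
```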
- For object types, the current suggestion in the docs is to implement a wrapper type and then impl `Element` for it manually. This seems largely redundant, given that the `DATA_TYPE` will always be `Object`. It would be nice if any `#[pyclass]`-wrapped type could automatically implement `Element`, but that's impossible due to the orphan rule. An alternative would be something like this (a sketch of what it would replace follows below):

  ```rust
  #[pyclass]
  #[numpy] // i.e., #[numpy(object)]
  struct Foo {}
  ```
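For comparison, the manual wrapper impl that this attribute would spare users from writing looks roughly like this under the current trait; a sketch only, assuming `DATA_TYPE` and `is_same_type()` are the required items:

```rust
use numpy::{DataType, Element, PyArrayDescr};
use pyo3::prelude::*;

#[pyclass]
struct Foo {}

// The wrapper stores a Python reference, so the array holds object pointers.
#[derive(Clone)]
struct FooWrapper(Py<Foo>);

unsafe impl Element for FooWrapper {
    const DATA_TYPE: DataType = DataType::Object;

    fn is_same_type(dtype: &PyArrayDescr) -> bool {
        dtype.get_datatype() == Some(DataType::Object)
    }
}
```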
- How does one register dtypes for foreign (remote) types, e.g. `OrderedFloat<f32>`, `Wrapping<u64>`, or some `PyClassFromOtherCrate`? We could try doing something like what serde does for remote types (a strawman sketch follows below).
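Purely as a strawman, the serde-style "remote derive" could translate to a local mirror type that carries the attribute and names the foreign type; nothing below exists, the syntax is made up:

```rust
// Hypothetical syntax, mirroring serde's #[serde(remote = "...")]: the local
// definition supplies the layout, and the generated Element impl would be
// attached to the foreign OrderedFloat<f32> via this mirror type.
#[numpy(transparent, remote = "OrderedFloat<f32>")]
#[repr(transparent)]
struct OrderedF32Def(f32);
// dtype = '<f4'
```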