NumPy `dtype=object` arrays are essentially a Python sequence type with unusual performance characteristics: they're mutable like lists, but appending is expensive, as with tuples. They don't offload work from Python to C the way that non-object dtypes in NumPy do.
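To see the "appending is expensive" point concretely, here's a minimal sketch (pure NumPy and builtins): `np.append` always allocates a new array and copies, whereas `list.append` mutates in place.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=object)
arr2 = np.append(arr, 4)   # O(n): allocates a new array and copies everything
print(arr2 is arr)         # False -- the original is untouched

lst = [1, 2, 3]
lst.append(4)              # amortized O(1): mutates in place
print(lst)                 # [1, 2, 3, 4]
```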

For your problem, the best you can do with `dtype=object` is:

```
>>> import numpy as np
>>> values = np.array([0, 1, 2, 3, 4, 5, 6])
>>> array = np.array(
...     [values[0:2], values[2:3], values[3:4], values[4:7]],
...     dtype=object,
... ).reshape(2, 2)
>>> array
array([[array([0, 1]), array([2])],
       [array([3]), array([4, 5, 6])]], dtype=object)
```

This at least stores the underlying values (`0, 1, 2, 3, 4, 5, 6`) contiguously. The nested arrays are Python objects, and you can see that like this:

```
>>> np.frombuffer(array, dtype=np.intp)
array([140148166124112, 140148166124496, 140148166122096, 140148166122288])
```

These are the `PyObject*` pointers, not the contiguous underlying values. Another way to illustrate this is to use the CPython implementation detail that `id` returns the integer value of the `PyObject*` pointer:

```
>>> [id(col) for row in array for col in row]
[140148166124112, 140148166124496, 140148166122096, 140148166122288]
```

Note that the pointers are not even increasing; they're wherever `malloc` found space to put the 112-byte `np.ndarray` objects:

```
>>> import sys
>>> [sys.getsizeof(col) for row in array for col in row]
[112, 112, 112, 112]
```

That may be fine for you, depending on how your use-case scales. These `dtype=object` arrays are only a little better than the builtin `list` type, but there are plenty of applications in which Python objects are performant enough.

If you are going to consider Awkward Array, here’s how you can do it: your data structure is regular 2×2 on the outside, with a variable-length array as the third dimension. You can build that like this:

```
>>> import awkward as ak
>>> original = ak.to_regular([[[0, 1], [2]], [[3], [4, 5, 6]]], axis=1)
>>> original
<Array [[[0, 1], [2]], [[3], [4, ..., 6]]] type='2 * 2 * var * int64'>
```

The type is `2 * 2 * var * int64`, reflecting the regularity of the first two dimensions and the irregularity of the third. There's no attribute called `shape` on this array. In this particular case, one could imagine calling the shape `(2, 2, None)` or `(2, 2, float("inf"))` or `(2, 2, float("nan"))` or something like that, but you can't make all three items of the tuple integers. Most cases are more complex than this and don't even have an obvious extension of "shape", which is why the attribute doesn't exist at all.

If you have a data source that gives you the underlying `values` (as in my NumPy example above) together with the list lengths, `2, 1, 1, 3`, then you could use the ak.unflatten function to build the array without Python iteration.

This array differs from the NumPy `dtype=object` one in how it's laid out in memory:

```
>>> original.layout
<RegularArray size='2' len='2'>
    <content><ListOffsetArray len='4'>
        <offsets><Index dtype='int64' len='5'>
            [0 2 3 4 7]
        </Index></offsets>
        <content>
            <NumpyArray dtype='int64' len='7'>[0 1 2 3 4 5 6]</NumpyArray>
        </content>
    </ListOffsetArray></content>
</RegularArray>
```

The values `0, 1, 2, 3, 4, 5, 6` are contiguous, as I made them in the NumPy case, but instead of allocating a `PyObject` for each list, the lists are implicitly indicated by the `0, 2, 3, 4, 7` offsets in a second buffer. No third buffer is needed for the outer dimension because it's regular.
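To make the offsets idea concrete, here is a minimal pure-NumPy sketch of the same layout: list `i` is just the slice of the flat buffer between consecutive offsets, a zero-copy view rather than a separately allocated object.

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5, 6])  # one contiguous buffer
offsets = np.array([0, 2, 3, 4, 7])       # boundaries: len(offsets) - 1 lists

# Reconstruct each variable-length list as a view (no copy) into `values`.
lists = [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
print([list(x) for x in lists])  # [[0, 1], [2], [3], [4, 5, 6]]
```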

The computation you described is rather hard. There’s an ak.cartesian function for making Cartesian products of data from *different* arrays, but you want them all to come from different parts of the same array. That Cartesian product step can be performed as

```
>>> ak.cartesian(tuple(ak.flatten(original)), axis=0)
<Array [(0, 2, 3, 4), (...), ..., (1, 2, 3, 6)] type='6 * (int64, int64, in...'>
```

because `tuple` iterates over the lists of the flattened `original`, giving ak.cartesian the distinct arrays that it needs.

```
>>> tuple(ak.flatten(original))
(
    <Array [0, 1] type='2 * int64'>,
    <Array [2] type='1 * int64'>,
    <Array [3] type='1 * int64'>,
    <Array [4, 5, 6] type='3 * int64'>,
)
```

This step gives up on keeping the number of `PyObject`s from scaling with the size of the data, which is what makes NumPy's `dtype=object` bad in the first place. (Each of the `ak.Array`s above is a `PyObject`.) Nevertheless, we can keep going with some munging to get the final result that you want:

```
>>> ak.to_numpy(
...     ak.concatenate(
...         ak.unzip(
...             ak.cartesian(
...                 tuple(ak.flatten(original)), axis=0
...             )[:, np.newaxis]
...         ),
...         axis=1)
... ).reshape(-1, 2, 2)
array([[[0, 2],
        [3, 4]],

       [[0, 2],
        [3, 5]],

       [[0, 2],
        [3, 6]],

       [[1, 2],
        [3, 4]],

       [[1, 2],
        [3, 5]],

       [[1, 2],
        [3, 6]]])
```

The scalability of this depends on whether "2×2" is a realistic size in your use-case. If it's "2000000×2000000", then no. But in that case, I don't see how any Cartesian product would be feasible, either.
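To put a number on that: the output has one row per combination, i.e. the product of all the list lengths. A quick sanity check, using the lengths from the example above:

```python
import math

lengths = [2, 1, 1, 3]       # list lengths from the example
print(math.prod(lengths))    # 6 rows in the Cartesian product

# Even modest per-list lengths explode with the number of lists:
print(math.prod([10] * 8))   # 100000000 combinations for eight lists of 10
```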

Alternatively, you could fill the Cartesian product with Numba, since you can iterate over Awkward Arrays in Numba. That can reduce the memory overhead of intermediate arrays, but nothing can help if the output has to be huge.

```
import numba as nb
import numpy as np

@nb.njit
def compute_cartesian(array):
    num_rows = len(array)
    num_cols = len(array[0])  # careful about len(array) == 0
    # cumprod[k] is the product of the first k list lengths (row-major order).
    cumprod = np.empty(num_rows * num_cols + 1, dtype=np.int32)
    cumprod[0] = 1
    k = 1
    for row in array:
        for col in row:
            cumprod[k] = cumprod[k - 1] * len(col)
            k += 1
    num_combinations = cumprod[-1]
    output = np.empty(
        (num_combinations, num_rows, num_cols),
        dtype=np.int32,
    )
    for i, row in enumerate(array):
        for j, col in enumerate(row):
            index = i * len(row) + j
            repeat = num_combinations // cumprod[index + 1]
            tile = cumprod[index]
            k = 0
            for _ in range(tile):
                for x in col:
                    for _ in range(repeat):
                        output[k, i, j] = x
                        k += 1
    return output
```

and

```
>>> compute_cartesian(original)
array([[[0, 2],
        [3, 4]],

       [[0, 2],
        [3, 5]],

       [[0, 2],
        [3, 6]],

       [[1, 2],
        [3, 4]],

       [[1, 2],
        [3, 5]],

       [[1, 2],
        [3, 6]]], dtype=int32)
```