Seq

class cloudly.util.seq.Seq

Bases: Protocol[Element]

The protocol Seq is simpler and broader than the standard Sequence. The former requires/provides only __len__, __getitem__, and __iter__, whereas the latter adds __contains__, __reversed__, index, and count to these three. Although the extra methods can be implemented using the three basic methods, such implementations can be very inefficient in some cases, including the applications targeted by cloudly.biglist. For example, a membership test built on iteration has to fetch and compare elements one by one. For this reason, the classes defined in this package implement the protocol Seq rather than Sequence, to avoid giving the false impression that methods such as __contains__ are practical to use.
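To see why a derived membership test can be expensive, here is a minimal sketch that implements only the three Seq methods and counts element fetches. The class `CountingSeq` is hypothetical and not part of cloudly. When a class lacks `__contains__`, Python's `in` operator falls back to `__iter__`, so the test scans elements one at a time; if each fetch meant a read from remote storage, that scan would be very costly.

```python
from collections.abc import Iterator


class CountingSeq:
    """Implements only __len__, __getitem__, and __iter__, and counts fetches."""

    def __init__(self, n: int) -> None:
        self._n = n
        self.fetches = 0  # number of elements retrieved so far

    def __len__(self) -> int:
        return self._n

    def __getitem__(self, index: int) -> int:
        if index < 0:
            index += self._n
        if not 0 <= index < self._n:
            raise IndexError(index)
        self.fetches += 1  # stands in for an expensive read, e.g. from cloud storage
        return index

    def __iter__(self) -> Iterator[int]:
        for i in range(self._n):
            yield self[i]


s = CountingSeq(1_000)
print(999 in s)   # True, but only after a linear scan
print(s.fetches)  # 1000: every element was fetched
```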

A class that implements this protocol is sized, iterable, and subscriptable by an int index. These methods are a subset of those provided by Sequence. Consequently, Sequence implements this protocol and is considered a subclass of Seq for type-checking purposes:

>>> from cloudly.util.seq import Seq
>>> from collections.abc import Sequence
>>> issubclass(Sequence, Seq)
True

The built-in list and tuple also implement the Seq protocol.
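The structural check can be reproduced with a stand-in protocol. This sketch assumes that Seq is declared with `typing.runtime_checkable` (the `issubclass` call above suggests so); `SeqLike` is a hypothetical name used only here. Note that a runtime check of this kind only verifies that the three methods exist, not what they mean.

```python
from collections.abc import Iterator, Sequence
from typing import Protocol, TypeVar, runtime_checkable

Element = TypeVar("Element", covariant=True)


@runtime_checkable
class SeqLike(Protocol[Element]):
    """Stand-in mirroring the three methods required by Seq."""

    def __len__(self) -> int: ...

    def __getitem__(self, index: int) -> Element: ...

    def __iter__(self) -> Iterator[Element]: ...


print(issubclass(Sequence, SeqLike))   # True
print(isinstance([1, 2, 3], SeqLike))  # True
print(isinstance((1, 2, 3), SeqLike))  # True
print(isinstance(range(5), SeqLike))   # True
print(isinstance(42, SeqLike))         # False: int has no __len__
```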

The type parameter Element indicates the type of each data element.

__len__() → int

__getitem__(index: int) → Element

__iter__() → Iterator[Element]

__init__(*args, **kwargs)

class cloudly.util.seq.Slicer

Bases: Seq[Element]

The class Slicer takes any Seq and provides __getitem__() that accepts a single index, or a slice, or a list of indices. A single-index access will return the requested element; the other two scenarios return a new Slicer via a zero-copy operation. To get all the elements out of a Slicer, either iterate it or call its method collect().

A Slicer object is “zero-copy”: it holds a reference to the underlying Seq and keeps track of the indices of the selected elements. A Slicer object may be sliced again in a repeated “zoom in” fashion. Actual data elements are retrieved from the underlying Seq only when a single element is accessed or iteration is performed. In other words, until an actual data element needs to be returned, all operations act on the indices only.
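A minimal sketch of the zero-copy idea, assuming the window is stored as a `range` or a list of indices. `MiniSlicer` is a hypothetical class written for illustration; cloudly's actual Slicer may differ in details such as validation and `repr`.

```python
from collections.abc import Iterator, Sequence
from typing import Generic, TypeVar, Union

T = TypeVar("T")


class MiniSlicer(Generic[T]):
    def __init__(self, data: Sequence[T], indices: Union[range, Sequence[int], None] = None) -> None:
        self._data = data  # a reference only; the data is never copied
        self._indices = range(len(data)) if indices is None else indices

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, idx):
        if isinstance(idx, int):
            # Translate a position in the window into a position in the raw data.
            return self._data[self._indices[idx]]
        if isinstance(idx, slice):
            # Slicing a range gives a range, slicing a list gives a list of ints:
            # only indices are computed here.
            return MiniSlicer(self._data, self._indices[idx])
        # Index array: pick positions within the current window.
        return MiniSlicer(self._data, [self._indices[i] for i in idx])

    def __iter__(self) -> Iterator[T]:
        for i in self._indices:
            yield self._data[i]


x = list(range(30))
window = MiniSlicer(x)[[1, 3, 5, 6, 7, 8, 9, 13, 14]][::2]
print(list(window))  # [1, 5, 7, 9, 14]
print(window[-2])    # 9
```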

__init__(list_: Seq[Element], range_: None | range | Seq[int] = None)

This provides a “slice” of, or “window” into, list_.

The selection of elements is represented by the optional range_, which is either a range such as range(3, 8), or a list of indices such as [1, 3, 5, 6]. If range_ is None, the “window” covers the entire list_. A common practice is to create a Slicer object without range_, and then access a slice of it, for example, Slicer(obj)[3:8] rather than Slicer(obj, range(3, 8)).
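One reason repeated slicing can stay cheap, assuming the window is kept as a `range` (one of the forms range_ allows): slicing a `range` yields another `range`, so a chain of slices collapses into a single range of indices without touching any data. This uses plain built-in ranges, not Slicer itself.

```python
full = range(30)           # window covering a 30-element Seq
w1 = full[3:11]            # same indices as range(3, 11)
w2 = w1[::2]               # every other index in that window
print(w1 == range(3, 11))  # True
print(w2)                  # range(3, 11, 2)
print(list(w2))            # [3, 5, 7, 9]
```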

During the use of this object, the underlying list_ must remain unchanged. Otherwise, perplexing and surprising things may happen, because the Slicer stores positions into list_ rather than the elements themselves.
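The hazard can be seen with plain positions. The sketch below uses a `range` of positions as a hypothetical stand-in for a Slicer's internal state; after an insertion at the front of the list, the same positions point to different elements.

```python
data = ["a", "b", "c", "d", "e"]
window = range(1, 4)  # positions 1, 2, 3 of data

print([data[i] for i in window])  # ['b', 'c', 'd']
data.insert(0, "z")               # mutate the underlying list
print([data[i] for i in window])  # ['a', 'b', 'c']: same positions, different elements
```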

__len__() → int

Number of elements in the current window or “slice”.

__getitem__(idx: int | slice | Seq[int])

Element access by a single index, slice, or an index array. Negative index and standard slice syntax work as expected.

Single-index access returns the requested data element. Slice and index-array accesses return a new Slicer object, which, naturally, can be sliced again, for example:

>>> from cloudly.util.seq import Slicer
>>> x = list(range(30))
>>> Slicer(x)[[1, 3, 5, 6, 7, 8, 9, 13, 14]][::2][-2]
9
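The example above can be traced at the level of indices alone. This is a hand-written walk-through with plain lists, not a call into Slicer: each step only selects positions, and the data is read once at the end.

```python
x = list(range(30))
step1 = [1, 3, 5, 6, 7, 8, 9, 13, 14]  # index array into x
step2 = step1[::2]                      # [1, 5, 7, 9, 14]
position = step2[-2]                    # 9, a position in x
print(x[position])                      # 9
```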

__iter__() → Iterator[Element]

Iterate over the elements in the current window or “slice”.

property raw: Seq[Element]

Return the underlying data Seq, that is, the list_ that was passed into __init__().

property range: None | range | Seq[int]

Return the parameter range_ that was provided to __init__(), representing the selection of items in the underlying Seq.

collect() → list[Element]

Return a list containing the elements in the current window. This is equivalent to list(self).

This is often used to materialize a small slice as a list, because a slice is still a Slicer object, which does not directly reveal the data items. For example,

>>> from cloudly.util.seq import Slicer
>>> x = list(range(30))
>>> Slicer(x)[3:11]
<Slicer into 8/30 of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]>
>>> Slicer(x)[3:11].collect()
[3, 4, 5, 6, 7, 8, 9, 10]

(A list is used for illustration. In reality, list supports slicing directly, hence would not need Slicer.)

Warning

Do not call this on “big” data, because it loads every element of the window into memory!
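Besides slicing first and collecting only the small slice (as in `Slicer(x)[3:11].collect()` above), another way to peek at a large Seq is `itertools.islice`, which stops iterating after a given number of elements. In this sketch `range(10**12)` is only a stand-in for a big Seq, and the helper name `peek` is hypothetical.

```python
from collections.abc import Iterable
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def peek(seq: Iterable[T], n: int = 5) -> list[T]:
    """Return at most the first n elements, reading no further."""
    return list(islice(seq, n))


big = range(10**12)  # far too large to collect into a list
print(peek(big))     # [0, 1, 2, 3, 4]
```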

class cloudly.util.seq.Chain

Bases: Seq[Element]

This class tracks a series of Seq objects to provide random element access and iteration on the series as a whole, with zero-copy.
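A minimal sketch of how random access across several Seqs can work, assuming cumulative lengths are precomputed and searched with `bisect`. `MiniChain` is a hypothetical class; cloudly's Chain may be implemented differently.

```python
import bisect
from collections.abc import Iterator, Sequence
from itertools import accumulate
from typing import Generic, TypeVar

T = TypeVar("T")


class MiniChain(Generic[T]):
    def __init__(self, *seqs: Sequence[T]) -> None:
        self._seqs = seqs  # references only; no element is copied
        # _ends[k] is the total length of seqs[0] through seqs[k].
        self._ends = list(accumulate(len(s) for s in seqs))

    def __len__(self) -> int:
        return self._ends[-1] if self._ends else 0

    def __getitem__(self, idx: int) -> T:
        n = len(self)
        if idx < 0:
            idx += n
        if not 0 <= idx < n:
            raise IndexError(f"index out of range: {idx}")
        # Find the first member Seq whose cumulative end exceeds idx.
        k = bisect.bisect_right(self._ends, idx)
        start = self._ends[k - 1] if k > 0 else 0
        return self._seqs[k][idx - start]

    def __iter__(self) -> Iterator[T]:
        for s in self._seqs:
            yield from s


combined = MiniChain(list(range(10)), ["x", "y"])
print(len(combined))  # 12
print(combined[9])    # 9
print(combined[10])   # x
print(combined[-1])   # y
```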

Example

>>> from cloudly.util.seq import Chain
>>> numbers = list(range(10))
>>> car_data  # assumed: an ExternalBiglist created earlier (creation not shown)
<ExternalBiglist at '/tmp/edd9cefb-179b-46d2-8946-7dc8ae1bdc50' with 112 records in 2 data file(s) stored at ['/tmp/a/b/c/e']>
>>> combined = Chain(numbers, car_data)
>>> combined[3]
3
>>> combined[9]
9
>>> combined[10]
{'make': 'ford', 'year': 1960, 'sales': 234}
>>>
>>> car_data[0]
{'make': 'ford', 'year': 1960, 'sales': 234}

This class is in contrast with the standard itertools.chain, which takes iterables and supports only one-pass iteration, not random access by index.
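The difference is easy to check: an `itertools.chain` object is a one-pass iterator, so it supports neither `len()` nor indexing, and it is exhausted after one traversal.

```python
from itertools import chain

it = chain(range(3), ["a", "b"])
print(list(it))  # [0, 1, 2, 'a', 'b']
print(list(it))  # []: the iterator is already exhausted

try:
    chain(range(3), ["a", "b"])[3]
except TypeError as exc:
    print("no indexing:", exc)
```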

__init__(list_: Seq[Element], *lists: Seq[Element])

__len__() → int

__getitem__(idx: int) → Element

__iter__() → Iterator[Element]

property raw: tuple[Seq[Element], ...]

Return the underlying tuple of Seqs.

A member Seq could be a Slicer. This property does not follow a Slicer to its “raw” component, because that component may contain a different set of elements than the Slicer object represents.