Seq
- class cloudly.util.seq.Seq
Bases: Protocol[Element]
The protocol Seq is simpler and broader than the standard Sequence. The former requires/provides only __len__, __getitem__, and __iter__, whereas the latter adds __contains__, __reversed__, index and count to these three. Although the extra methods can be implemented using the three basic methods, they could be massively inefficient in particular cases, and that is the case in the applications targeted by cloudly.biglist. For this reason, the classes defined in this package implement the protocol Seq rather than Sequence, to prevent the illusion that methods __contains__, etc., are usable.
A class that implements this protocol is sized, iterable, and subscriptable by an int index. This is a subset of the methods provided by Sequence. In particular, Sequence implements this protocol, hence is considered a subclass of Seq for type checking purposes:
>>> from cloudly.util.seq import Seq
>>> from collections.abc import Sequence
>>> issubclass(Sequence, Seq)
True
The built-in dict and tuple also implement the Seq protocol.
The type parameter Element indicates the type of each data element.
- __init__(*args, **kwargs)
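To illustrate the structural nature of such a protocol, here is a minimal sketch. It does not import cloudly; SeqLike and Squares are hypothetical names defined only for this example, and SeqLike is an assumed approximation of Seq built from the three methods listed above, not the library's own definition.

```python
from typing import Iterator, Protocol, TypeVar, runtime_checkable

Element = TypeVar("Element", covariant=True)


@runtime_checkable
class SeqLike(Protocol[Element]):
    """Hypothetical stand-in for Seq: sized, iterable, int-subscriptable."""

    def __len__(self) -> int: ...
    def __getitem__(self, idx: int) -> Element: ...
    def __iter__(self) -> Iterator[Element]: ...


class Squares:
    """Implements only the three basic methods; no __contains__, index, or count."""

    def __init__(self, n: int) -> None:
        self._n = n

    def __len__(self) -> int:
        return self._n

    def __getitem__(self, idx: int) -> int:
        if idx < 0:
            idx += self._n
        if not 0 <= idx < self._n:
            raise IndexError(idx)
        return idx * idx

    def __iter__(self) -> Iterator[int]:
        return (i * i for i in range(self._n))


squares = Squares(5)
# Structural check: Squares never inherits from SeqLike.
is_seq = isinstance(squares, SeqLike)
tuple_is_seq = isinstance((1, 2, 3), SeqLike)
# A generator has __iter__ but neither __len__ nor __getitem__.
gen_is_seq = isinstance((i for i in range(3)), SeqLike)
```

Note that a runtime isinstance check of this kind only verifies that the methods exist; it says nothing about their signatures or cost.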
- class cloudly.util.seq.Slicer
Bases: Seq[Element]
The class Slicer takes any Seq and provides __getitem__() that accepts a single index, a slice, or a list of indices. A single-index access returns the requested element; the other two return a new Slicer via a zero-copy operation. To get all the elements out of a Slicer, either iterate over it or call its method collect().
A Slicer object is “zero-copy”: it holds a reference to the underlying Seq and keeps track of the indices of the selected elements. A Slicer object may be sliced again in a repeated “zoom in” fashion. Actual data elements are retrieved from the underlying Seq only when a single element is accessed or iteration is performed. In other words, until an actual data element needs to be returned, all operations act on the indices alone.
- __init__(list_: Seq[Element], range_: None | range | Seq[int] = None)
This provides a “slice” of, or “window” into, list_.
The selection of elements is represented by the optional range_, which is either a range such as range(3, 8), or a list of indices such as [1, 3, 5, 6]. If range_ is None, the “window” covers the entire list_. A common practice is to create a Slicer object without range_, and then access a slice of it, for example, Slicer(obj)[3:8] rather than Slicer(obj, range(3, 8)).
During the use of this object, the underlying list_ must remain unchanged. Otherwise the tracked indices may no longer match the data, and perplexing and surprising things may happen.
- __getitem__(idx: int | slice | Seq[int])
Element access by a single index, slice, or an index array. Negative indices and standard slice syntax work as expected.
Single-index access returns the requested data element. Slice and index-array accesses return a new Slicer object, which, naturally, can be sliced again, like
>>> x = list(range(30))
>>> Slicer(x)[[1, 3, 5, 6, 7, 8, 9, 13, 14]][::2][-2]
9
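The index-tracking idea described above can be sketched as follows. MiniSlicer is a hypothetical, simplified stand-in written for this illustration; it is an assumption about how such a class could work, not the actual implementation of Slicer.

```python
class MiniSlicer:
    """Hypothetical, simplified stand-in for Slicer (not the library's code)."""

    def __init__(self, data, indices=None):
        self._data = data
        # With no selection given, the window is a range over the whole input.
        self._indices = range(len(data)) if indices is None else indices

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, idx):
        if isinstance(idx, int):
            # Only here is an actual element fetched from the underlying data.
            return self._data[self._indices[idx]]
        if isinstance(idx, slice):
            # Slicing a range gives a range; slicing a list gives a list of ints.
            # Either way only indices are touched, never the data.
            return MiniSlicer(self._data, self._indices[idx])
        # Otherwise treat idx as positions within the current window.
        return MiniSlicer(self._data, [self._indices[i] for i in idx])

    def __iter__(self):
        return (self._data[i] for i in self._indices)

    def collect(self):
        return list(self)


x = list(range(30))
# Index-array access, then a slice: both steps only narrow the index set.
window = MiniSlicer(x)[[1, 3, 5, 6, 7, 8, 9, 13, 14]][::2]
picked = window[-2]  # the single-index access finally reads x[9]
```

In this sketch each "zoom in" composes the new selection with the stored indices, so repeated slicing costs time proportional to the size of the selection rather than the size of the data.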
- property raw: Seq[Element]
Return the underlying data Seq, that is, the list_ that was passed into __init__().
- property range: None | range | Seq[int]
Return the parameter range_ that was provided to __init__(), representing the selection of items in the underlying Seq.
- collect() → list[Element]
Return a list containing the elements in the current window. This is equivalent to list(self).
This is often used to materialize a small slice as a list, because a slice is still a Slicer object, which does not directly reveal the data items. For example,
>>> x = list(range(30))
>>> Slicer(x)[3:11]
<Slicer into 8/30 of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]>
>>> Slicer(x)[3:11].collect()
[3, 4, 5, 6, 7, 8, 9, 10]
(A list is used for illustration. In reality, list supports slicing directly, hence would not need Slicer.)
Warning
Do not call this on “big” data! It loads every element of the current window into a single in-memory list.
- class cloudly.util.seq.Chain
Bases: Seq[Element]
This class tracks a series of Seq objects to provide random element access and iteration on the series as a whole, with zero-copy.
Example
>>> from cloudly.util.seq import Chain
>>> numbers = list(range(10))
>>> car_data
<ExternalBiglist at '/tmp/edd9cefb-179b-46d2-8946-7dc8ae1bdc50' with 112 records in 2 data file(s) stored at ['/tmp/a/b/c/e']>
>>> combined = Chain(numbers, car_data)
>>> combined[3]
3
>>> combined[9]
9
>>> combined[10]
{'make': 'ford', 'year': 1960, 'sales': 234}
>>>
>>> car_data[0]
{'make': 'ford', 'year': 1960, 'sales': 234}
This class contrasts with the standard itertools.chain, which takes arbitrary iterables and supports only iteration, not random access by index.
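One way random access over a series of sequences could be implemented is with cumulative lengths and a binary search. The sketch below is an assumption made for illustration; MiniChain is a hypothetical name, not the library's Chain, and a small tuple stands in for a second Seq such as car_data.

```python
from bisect import bisect_right
from itertools import accumulate, chain


class MiniChain:
    """Hypothetical, simplified stand-in for Chain (not the library's code)."""

    def __init__(self, *seqs):
        self._seqs = seqs
        # Cumulative end offsets, e.g. member lengths (10, 3) give [10, 13].
        self._ends = list(accumulate(len(s) for s in seqs))

    def __len__(self):
        return self._ends[-1] if self._ends else 0

    def __getitem__(self, idx):
        n = len(self)
        if idx < 0:
            idx += n
        if not 0 <= idx < n:
            raise IndexError(idx)
        # Locate the first member whose cumulative end exceeds idx.
        k = bisect_right(self._ends, idx)
        start = self._ends[k - 1] if k else 0
        return self._seqs[k][idx - start]

    def __iter__(self):
        # Iteration simply walks the members in order, without copying them.
        return chain.from_iterable(self._seqs)


numbers = list(range(10))
letters = ("a", "b", "c")  # stand-in for a second Seq
combined = MiniChain(numbers, letters)
```

Here combined[9] is served by numbers and combined[10] by letters, mirroring the boundary shown in the example above. The sketch reads member lengths once at construction, so it assumes the members do not change afterwards.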