batch#

This module wraps a subset of GCP Batch functionality that is adequate for typical applications.

User code runs within a Docker container. Running a standalone script is not supported by this wrapper module.

class cloudly.gcp.batch.Job[source]#

Bases: object

classmethod create(name: str, config: JobConfig | dict) → Job[source]#

There are some restrictions on the form of name; see the GCP documentation or cloudly.gcp.compute for details. In addition, the batch job name must be unique within the project and the region.

classmethod list(*, region: str) → list[Job][source]#

__init__(name_or_obj: str | Job, /)[source]#

When name_or_obj is a string, it is the full job name, like ‘projects/<project-id>/locations/<location>/jobs/<name>’. This can also be considered the job “ID” or “URI”. It is available from the object returned by create().

Passing a job object instead of a name is not necessary, because the object can be created when needed. It is accepted in case you already have one. See create(), list().
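The full job name format described above can be split into its parts with a small helper. This is a minimal sketch; parse_job_name and JobNameParts are hypothetical names used for illustration and are not part of cloudly.

```python
import re
from typing import NamedTuple


class JobNameParts(NamedTuple):
    """Hypothetical container for the three parts of a full job name."""

    project_id: str
    location: str
    name: str


# Matches 'projects/<project-id>/locations/<location>/jobs/<name>'.
_JOB_NAME = re.compile(
    r"projects/(?P<project_id>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/jobs/(?P<name>[^/]+)"
)


def parse_job_name(full_name: str) -> JobNameParts:
    """Split a full job name into project ID, location, and short name."""
    match = _JOB_NAME.fullmatch(full_name)
    if match is None:
        raise ValueError(f"not a full job name: {full_name!r}")
    return JobNameParts(**match.groupdict())


parts = parse_job_name("projects/my-project/locations/us-central1/jobs/my-job")
print(parts.location)
```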

property create_time#

property update_time#

property definition: dict#

status() → JobStatus[source]#

The returned JobStatus object has some useful attributes you can access; see state() for an example.

state() → Literal['STATE_UNSPECIFIED', 'QUEUED', 'SCHEDULED', 'RUNNING', 'SUCCEEDED', 'FAILED', 'DELETION_IN_PROGRESS'][source]#
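One common use of state() is to poll until the job stops making progress. The sketch below takes any zero-argument callable that returns a state string, so it can be tried without GCP access. wait_until_done and STOP_STATES are hypothetical names, and treating 'SUCCEEDED', 'FAILED', and 'DELETION_IN_PROGRESS' as the states at which to stop polling is an assumption, not something this module documents.

```python
import time
from typing import Callable

# Assumption: no further progress is expected once a job reaches one of these.
STOP_STATES = frozenset({"SUCCEEDED", "FAILED", "DELETION_IN_PROGRESS"})


def wait_until_done(
    get_state: Callable[[], str],
    *,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Call get_state repeatedly until it returns a stop state or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in STOP_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after timeout")
        time.sleep(poll_seconds)
```

With a Job instance named job, the call would be wait_until_done(job.state), passing the bound method without calling it.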

delete() → None[source]#

class cloudly.gcp.batch.JobConfig[source]#

Bases: object

class BootDisk[source]#

Bases: object

__init__(*, size_gb: int, disk_type: Literal['pd-balanced', 'pd-extreme', 'pd-ssd', 'pd-standard'] = 'pd-balanced', image: str | None = None)[source]#

image: ‘batch-debian’ seems to be a good value for GPUs; otherwise ‘batch-cos’ may also work well. Leave it at None until needed. See https://cloud.google.com/batch/docs/vm-os-environment-overview

property disk: Disk#

class LocalSSD[source]#

Bases: object

Local SSDs are attached to each worker node for use during the lifetime of the tasks. They are not “persistent” storage that lives beyond the batch job.

__init__(*, size_gb: int, device_name: str = 'local-ssd', mount_path: str = '/mnt', mode: Literal['ro', 'rw'] = 'rw')[source]#

size_gb should be a multiple of 375. If not, the next greater multiple of 375 will be used.
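The rounding rule for size_gb can be written out as a short calculation. This is a sketch of the arithmetic only, not the code LocalSSD uses; round_up_local_ssd_size is a hypothetical name, and it assumes a value that is already a multiple of 375 is left unchanged.

```python
LOCAL_SSD_UNIT_GB = 375  # size_gb is rounded to a multiple of this


def round_up_local_ssd_size(size_gb: int) -> int:
    """Round size_gb up to the nearest multiple of 375 (hypothetical helper)."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    # Integer ceiling division, then scale back up to GB.
    units = (size_gb + LOCAL_SSD_UNIT_GB - 1) // LOCAL_SSD_UNIT_GB
    return units * LOCAL_SSD_UNIT_GB


for requested in (100, 375, 376, 1000):
    print(requested, "->", round_up_local_ssd_size(requested))
```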

property disk: AttachedDisk#

property volume: Volume#

class GPU[source]#

Bases: object

__init__(*, gpu_type: str, gpu_count: int)[source]#

Use gcloud compute accelerator-types list to see valid values of gpu_type. Some examples: ‘nvidia-tesla-t4’, ‘nvidia-l4’, ‘nvidia-tesla-a100’, ‘nvidia-tesla-v100’.

property accelerator: Accelerator#

classmethod task_group(*, task_spec: dict, task_count: int = 1, task_count_per_node: int = 1, parallelism: int | None = None, permissive_ssh: bool = True, **kwargs) → TaskGroup[source]#

Parameters:

task_count: Number of tasks to be created.
task_count_per_node: Number of tasks that can be running on a worker node at any time.
parallelism: Number of tasks that can be running across all nodes at any time.

classmethod allocation_policy(*, region: str, labels: dict, network_uri: str | None = None, subnet_uri: str | None = None, machine_type: str, no_external_ip_address: bool = False, provisioning_model: Literal['standard', 'spot', 'preemptible'] = 'standard', boot_disk: dict | None = None, gpu: GPU | None = None, local_ssd_disk: LocalSSD | None = None, install_gpu_drivers: bool | None = None, **kwargs) → batch_v1.AllocationPolicy[source]#

Parameters:

region

Like ‘us-central1’, ‘us-west1’, etc.

network_uri, subnet_uri

If missing and no_external_ip_address is False, the default for the project in the specified region will be used.

If no_external_ip_address is True, then both must be provided.

See https://cloud.google.com/compute/docs/networking/network-overview and google.cloud.batch_v1.types.job.AllocationPolicy.NetworkInterface.

labels, gpu, local_ssd_disk

These are provided by JobConfig.__init__. Users should not provide them directly.

classmethod labels(**kwargs) → dict[str, str][source]#

__init__(*, task_group: dict, allocation_policy: dict, labels: dict[str, str] | None = None, logs_policy: LogsPolicy | None = None, gpu: dict | None = None, local_ssd: dict | None = None, **kwargs)[source]#

property job: Job#

property definition: dict#

property region: str#