batch
This module wraps a subset of GCP Batch functionality that is sufficient for typical applications.
User code runs within a Docker container. Running a standalone script is not supported by this wrapper module.
- class cloudly.gcp.batch.Job
Bases: object
- classmethod create(name: str, config: JobConfig | dict) -> Job
There are some restrictions on the form of name; see the GCP documentation or cloudly.gcp.compute for details. In addition, the batch job name must be unique within the project and the region.
- __init__(name_or_obj: str | Job, /)
name is like ‘projects/<project-id>/locations/<location>/jobs/<name>’. This can also be considered the job “ID” or “URI”. It is available from the object returned by create().
job is not necessary because it can be created if needed. It is accepted in case you already have it. See create(), list().
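The name format above can be illustrated with a small sketch. The helpers build_job_name and parse_job_name are hypothetical and are not part of cloudly; they only assume the ‘projects/<project-id>/locations/<location>/jobs/<name>’ layout described here.

```python
import re

# Pattern for the fully qualified job name layout described above.
_JOB_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/jobs/(?P<name>[^/]+)$"
)


def build_job_name(project: str, location: str, name: str) -> str:
    """Assemble a fully qualified job name (hypothetical helper)."""
    return f"projects/{project}/locations/{location}/jobs/{name}"


def parse_job_name(full_name: str) -> dict:
    """Split a fully qualified job name into its parts (hypothetical helper)."""
    match = _JOB_NAME.match(full_name)
    if match is None:
        raise ValueError(f"not a fully qualified job name: {full_name!r}")
    return match.groupdict()
```

A string in this form is what one would pass to Job(...) as name_or_obj; the short name alone is what create() takes.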
- property create_time
- property update_time
- property definition: dict
- status() -> JobStatus
The returned JobStatus object has some useful attributes you can access; see state() for an example.
- class cloudly.gcp.batch.JobConfig
Bases: object
- class BootDisk
Bases: object
- __init__(*, size_gb: int, disk_type: Literal['pd-balanced', 'pd-extreme', 'pd-ssd', 'pd-standard'] = 'pd-balanced', image: str | None = None)
- image: ‘batch-debian’ seems to be a good value for GPUs; otherwise ‘batch-cos’ may also work well. Leave it at None until needed. See https://cloud.google.com/batch/docs/vm-os-environment-overview
- property disk: Disk
- class LocalSSD
Bases: object
Local SSDs are attached to each worker node for use during the lifetime of the tasks. They are not “persistent” storage that lives beyond the batch job.
- __init__(*, size_gb: int, device_name: str = 'local-ssd', mount_path: str = '/mnt', mode: Literal['ro', 'rw'] = 'rw')
size_gb should be a multiple of 375. If not, the next greater multiple of 375 will be used.
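The rounding rule for size_gb can be sketched as follows. The function name round_up_local_ssd_size is hypothetical and only mirrors the behavior stated above; it is not taken from the module's source.

```python
def round_up_local_ssd_size(size_gb: int, unit_gb: int = 375) -> int:
    """Round size_gb up to the next multiple of unit_gb (hypothetical helper)."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    # Ceiling division without floats, then scale back to GB.
    return -(-size_gb // unit_gb) * unit_gb
```

Under this rule, a request for 400 GB would end up as 750 GB, so it may be worth asking for an exact multiple of 375 to avoid surprises.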
- property disk: AttachedDisk
- property volume: Volume
- class GPU
Bases: object
- __init__(*, gpu_type: str, gpu_count: int)
Use gcloud compute accelerator-types list to see valid values of gpu_type. Some examples: ‘nvidia-tesla-t4’, ‘nvidia-l4’, ‘nvidia-tesla-a100’, ‘nvidia-tesla-v100’
- property accelerator: Accelerator
- classmethod task_group(*, task_spec: dict, task_count: int = 1, task_count_per_node: int = 1, parallelism: int | None = None, permissive_ssh: bool = True, **kwargs) -> TaskGroup
- Parameters:
- task_count
Number of tasks to be created.
- task_count_per_node
Number of tasks that can be running on a worker node at any time.
- parallelism
Number of tasks that can be running across all nodes at any time.
- classmethod allocation_policy(*, region: str, labels: dict, network_uri: str | None = None, subnet_uri: str | None = None, machine_type: str, no_external_ip_address: bool = False, provisioning_model: Literal['standard', 'spot', 'preemptible'] = 'standard', boot_disk: dict | None = None, gpu: GPU | None = None, local_ssd_disk: LocalSSD | None = None, install_gpu_drivers: bool | None = None, **kwargs) -> batch_v1.AllocationPolicy
- Parameters:
- region
Like ‘us-central1’, ‘us-west1’, etc.
- network_uri, subnet_uri
If missing and no_external_ip_address is False, the default for the project in the specified region will be used.
If no_external_ip_address is True, then both must be provided.
See https://cloud.google.com/compute/docs/networking/network-overview and google.cloud.batch_v1.types.job.AllocationPolicy.NetworkInterface.
- labels, gpu, local_ssd_disk
These are provided by JobConfig.__init__. Users should not provide them directly.
- __init__(*, task_group: dict, allocation_policy: dict, labels: dict[str, str] | None = None, logs_policy: LogsPolicy | None = None, gpu: dict | None = None, local_ssd: dict | None = None, **kwargs)
- property job: Job
- property definition: dict
- property region: str