Site Adapter

The site adapters provide interfaces to various Cloud APIs and batch systems in order to allow on-demand provisioning of resources and dynamic orchestration of pre-built VM images and containers.

Sites are generally configured in the Sites configuration block. For each site, one has to specify a site name, the adapter to use and a site quota in units of cores. Negative values for the site quota are interpreted as infinity. Optionally, a minimum lifetime of the Drone can be specified in seconds. It is defined as the time the Drone remains in AvailableState before it is drained. If no value is given, an infinite lifetime is assumed. Multiple sites are supported by using SequenceNodes.

Note

Even if a minimum lifetime is set, the Drone can still be drained before its minimum lifetime is exceeded if the demand for it drops.

Generic Site Adapter Configuration

Available configuration options

Option	Short Description	Requirement
name	Name of the site	Required
adapter	Site adapter to use. The adapter is imported automatically (specify the class name without the Adapter suffix)	Required
quota	Core quota to be used for this site. Negative values are interpreted as infinity	Required
drone_heartbeat_interval	Time in seconds between two consecutive executions of `tardis.resources.drone.run()`. Defaults to 60s.	Optional
drone_minimum_lifetime	Time in seconds the drone will remain in `AvailableState` before it is drained.	Optional
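
The defaults listed in the table can be summarised in a short Python sketch. The helper `normalise_site_entry` and the dictionary it returns are hypothetical and only illustrate the documented semantics (negative quota means unlimited, 60 s default heartbeat, infinite lifetime when no minimum lifetime is given). It is not the way TARDIS itself reads its configuration.

```python
import math


def normalise_site_entry(entry):
    """Apply the documented defaults to one entry of the Sites block.

    Hypothetical helper for illustration only, not part of TARDIS.
    """
    for key in ("name", "adapter", "quota"):
        if key not in entry:
            raise ValueError(f"missing required option: {key}")

    quota = entry["quota"]
    return {
        "name": entry["name"],
        "adapter": entry["adapter"],
        # Negative quotas are interpreted as infinity.
        "quota": math.inf if quota < 0 else quota,
        # Seconds between two consecutive drone runs, 60 unless configured.
        "drone_heartbeat_interval": entry.get("drone_heartbeat_interval", 60),
        # Without a minimum lifetime, an infinite lifetime is assumed.
        "drone_minimum_lifetime": entry.get("drone_minimum_lifetime", math.inf),
    }


site = normalise_site_entry(
    {"name": "MySiteName_2", "adapter": "OtherAdapter2Use", "quota": -1}
)
print(site["quota"], site["drone_heartbeat_interval"])
```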

For each site in the Sites configuration block, a site specific configuration block carrying the site name has to be added to the configuration as well.

The site specific MappingNode contains site adapter specific configuration options that you can find below in the particular site adapter documentation.

In addition, it is required to add the following MappingNodes:

MachineTypes containing a SequenceNode of available machine types to be supported at the given site.
MachineTypeConfiguration containing a MappingNode for each machine type with machine type specific configurations. Details can be found below in the particular site adapter documentation.
MachineMetaData containing a MappingNode for each machine type specifying the amount of Cores, Memory and Disk available.

Note

The amount of memory and disk space is always specified in units of Gigabytes (GB) in TARDIS. The amount of cores is equivalent to the number of single core job slots provided by a machine.
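
A quick consistency check of a site specific block can catch machine types that are listed but not fully described. The following Python sketch works on a plain dictionary, as it would result from loading the YAML configuration. The function `check_site_block` is hypothetical, the node name `MachineMetaData` follows the example configuration, and treating all three of Cores, Memory and Disk as mandatory for every adapter is an assumption.

```python
def check_site_block(site_name, block):
    """Return a list of problems found in a site specific configuration block.

    Hypothetical consistency check for illustration, not part of TARDIS.
    """
    problems = []
    for node in ("MachineTypes", "MachineTypeConfiguration", "MachineMetaData"):
        if node not in block:
            problems.append(f"{site_name}: missing {node}")
    if problems:
        return problems

    for machine_type in block["MachineTypes"]:
        if machine_type not in block["MachineTypeConfiguration"]:
            problems.append(
                f"{site_name}: no MachineTypeConfiguration for {machine_type}"
            )
        meta_data = block["MachineMetaData"].get(machine_type, {})
        for key in ("Cores", "Memory", "Disk"):
            if key not in meta_data:
                problems.append(
                    f"{site_name}: {machine_type} lacks {key} in MachineMetaData"
                )
    return problems


block = {
    "MachineTypes": ["Micro", "Fat"],
    "MachineTypeConfiguration": {"Micro": {}, "Fat": {}},
    # The meta data of the machine type Fat is missing on purpose.
    "MachineMetaData": {"Micro": {"Cores": 1, "Memory": 2, "Disk": 30}},
}
for problem in check_site_block("MySiteName_1", block):
    print(problem)
```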

Example configuration

Sites:
  - name: MySiteName_1
    adapter: MyAdapter2Use
    quota: 123
    drone_heartbeat_interval: 10
    drone_minimum_lifetime: 3600
  - name: MySiteName_2
    adapter: OtherAdapter2Use
    quota: 987

MySiteName_1:
  general_adapter_option: something
  MachineTypes:
    - Micro
    - Fat
  MachineTypeConfiguration:
    Micro:
      machine_type_specific_option_1: 124234-1245-1345-15
      machine_type_specific_option_2: 4583453-3245-345-2345
    Fat:
      machine_type_specific_option_1: 0034532-345-234-2341
      machine_type_specific_option_2: 1345-134-1345-134-1
  MachineMetaData:
    Micro:
      Cores: 1
      Memory: 2
      Disk: 30
    Fat:
      Cores: 32
      Memory: 128
      Disk: 256

MySiteName_2:
  general_adapter_option: something_else
  MachineTypes:
    - XL
  MachineTypeConfiguration:
    XL:
      machine_type_specific_option_1: 9847867-467846-468748BC
  MachineMetaData:
    XL:
      Cores: 128
      Memory: 256
      Disk: 1024

Cloud Stack Site Adapter

The CloudStackAdapter implements an interface to the CloudStack API. The following general adapter configuration options are available.

Available adapter configuration options

Option	Short Description	Requirement
api_key	Your CloudStack API Key to authenticate yourself.	Required
api_secret	Your CloudStack API secret to authenticate yourself.	Required
end_point	The end point of the CloudStack API to contact.	Required

All configuration entries in the MachineTypeConfiguration section of the machine types are directly added as keyword arguments to the CloudStack API deployVirtualMachine call. All available options are described in the CloudStack documentation.

Example configuration

Sites:
  - name: Hexascale
    adapter: CloudStack
    quota: 300

Hexascale:
  api_key: BlaBlubbFooBar123456
  api_secret: AKshflajsdfjnASJFkajsfd
  end_point: https://api.hexascale.com/compute
  MachineTypes:
    - Micro
    - Tiny
  MachineTypeConfiguration:
    Micro:
      templateid: 909ce5b7-2132-4ff0-9bf8-aadbb423f7d9
      serviceofferingid: 71004023-bb72-4a97-b1e9-bc66dfce9470
      zoneid: 35eb7739-d19e-45f7-a581-4687c54d6d02
      securitygroupnames: "secgrp-WN,NFS-access,Squid-access"
      userdata: ini/hexascale.ini
      keypair: MG
      rootdisksize: 70
    Tiny:
      templateid: 909ce5b7-2132-4ff0-9bf8-aadbb423f7d9
      serviceofferingid: b6cd1ff5-3a2f-4e9d-a4d1-8988c1191fe8
      zoneid: 35eb7739-d19e-45f7-a581-4687c54d6d02
      securitygroupnames: "secgrp-WN,NFS-access,Squid-access"
      userdata: ini/hexascale.ini
      keypair: MG
      rootdisksize: 70
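
Since the entries of a machine type are handed over as keyword arguments, their keys have to match the parameter names of the deployVirtualMachine call exactly. The following Python sketch illustrates the mechanism with a stand-in function. `deploy_virtual_machine` is a hypothetical stub that only echoes its parameters; it is neither the CloudStack client nor the TARDIS implementation.

```python
def deploy_virtual_machine(**parameters):
    """Hypothetical stand-in for a client's deployVirtualMachine call.

    It only echoes the parameters that would be sent to the API.
    """
    return {"command": "deployVirtualMachine", **parameters}


# A subset of the Micro machine type from the example configuration.
micro = {
    "templateid": "909ce5b7-2132-4ff0-9bf8-aadbb423f7d9",
    "serviceofferingid": "71004023-bb72-4a97-b1e9-bc66dfce9470",
    "zoneid": "35eb7739-d19e-45f7-a581-4687c54d6d02",
    "rootdisksize": 70,
}

# Every key of the machine type configuration becomes a keyword argument.
request = deploy_virtual_machine(**micro)
print(request["command"], request["rootdisksize"])
```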

HTCondor Site Adapter

The HTCondorAdapter implements an interface to the HTCondor batch system. Regular batch jobs are submitted that start the actual Drone, which then integrates itself into the overlay batch system using the chosen BatchSystemAdapter.

Available adapter configuration options

Option	Short Description	Requirement
max_age	The result of the condor_status call is cached for max_age in minutes.	Required
bulk_size	Maximum number of jobs to handle per bulk invocation of a condor tool. Default: 100	Optional
bulk_delay	Maximum duration in seconds to wait per bulk invocation of a condor tool. Default: 1.0	Optional
executor	The executor used to run submission and further calls to the HTCondor batch system. Default: ShellExecutor is used!	Optional

Available machine type configuration options

Option	Short Description	Requirement
jdl	Path to the templated jdl used to submit drones to the HTCondor batch system	Required
SubmitOptions	Options to add to the condor_submit command. (see example)	Optional

Note

The templated JDL uses the Python template string syntax (see the example HTCondor JDL for details).

Warning

The $(…) used for HTCondor variables needs to be replaced by $$(…) in the templated JDL.
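
The reason for the doubled dollar sign can be seen with Python's `string.Template`, which treats `$$` as an escaped `$`. The sketch below is a minimal illustration; the substituted values are made up and the snippet is not the code TARDIS uses to render the JDL.

```python
from string import Template

# $$(...) is an escaped dollar sign, ${...} is a template placeholder.
jdl_template = Template(
    "output = logs/$$(cluster).$$(process).out\n"
    "request_cpus=${Cores}\n"
    "request_memory=${Memory}\n"
)

# Illustrative values only.
jdl = jdl_template.substitute(Cores=8, Memory=20480)
print(jdl)

# A single $ in front of ( is not a valid placeholder and is rejected.
rejected = False
try:
    Template("log = logs/$(cluster).log").substitute()
except ValueError:
    rejected = True
print("single dollar rejected:", rejected)
```

After substitution, the HTCondor variables appear in their usual `$(cluster)` form, while the placeholders are replaced by the given values.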

Note

In order to properly identify started drones in the overlay batch system and to limit the amount of resources (CPU cores, memory, disk) announced to be available, a set of environment variables needs to be set inside the drone. The preferred way is to use the environment parameter in the HTCondor JDL. However, when using the HTCondor grid universe, the environment is usually dropped by the Grid Compute Element. In that case, we suggest passing the values via the arguments parameter in long option syntax and setting the corresponding environment variables inside the drone itself based on these command line arguments.

Example configuration

Sites:
  - name: TOPAS
    adapter: HTCondor
    quota: 462

TOPAS:
  max_age: 1
  MachineTypes:
    - wholenode
    - remotenode
  MachineTypeConfiguration:
    wholenode:
      jdl: pilot_wholenode.jdl
    remotenode:
      jdl: pilot_remotenode.jdl
      SubmitOptions:
        spool: null
        pool: remote-pool.somewhere.de
  MachineMetaData:
    wholenode:
      Cores: 42
      Memory: 256
      Disk: 840
    remotenode:
      Cores: 8
      Memory: 20
      Disk: 100

Example HTCondor JDL (Vanilla Universe)

executable = start_pilot.sh
transfer_input_files = setup_pilot.sh,grid-mapfile
output = logs/$$(cluster).$$(process).out
error = logs/$$(cluster).$$(process).err
log = logs/cluster.log

accounting_group=tardis
x509userproxy = /home/tardis/proxy

environment=${Environment}

request_cpus=${Cores}
request_memory=${Memory}
request_disk=${Disk}

The Environment contains the following variables: TardisDroneCores, TardisDroneMemory, TardisDroneDisk and TardisDroneUuid.

Example HTCondor JDL (Grid Universe for HTCondor-CE)

universe = grid
executable = start_pilot.sh
arguments = ${Arguments}
transfer_input_files = setup_pilot.sh,grid-mapfile
output = logs/$$(cluster).$$(process).out
error = logs/$$(cluster).$$(process).err
log = logs/cluster.log

accounting_group=tardis
x509userproxy = /home/tardis/proxy

+xcount=${Cores}
+maxMemory=${Memory}
+maxWalltime=1440

The Arguments contains the following command line arguments: --cores, --memory, --disk and --uuid.
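
Inside the drone, the command line arguments have to be turned back into environment variables. The following Python sketch shows one way to do this with `argparse`. The function `drone_environment` is hypothetical, the pairing of each option with the variable of the same name is an assumption based on the two lists above, and the values are made up. In practice this step would typically live in the start-up script of the drone.

```python
import argparse


def drone_environment(argv):
    """Translate long option arguments into drone environment variables.

    Hypothetical helper; the pairing of options and variable names is assumed.
    """
    parser = argparse.ArgumentParser()
    for option in ("cores", "memory", "disk", "uuid"):
        parser.add_argument(f"--{option}", required=True)
    args = parser.parse_args(argv)
    return {
        "TardisDroneCores": args.cores,
        "TardisDroneMemory": args.memory,
        "TardisDroneDisk": args.disk,
        "TardisDroneUuid": args.uuid,
    }


# Illustrative argument list as it could be passed to the executable.
environment = drone_environment(
    ["--cores", "8", "--memory", "20480", "--disk", "102400", "--uuid", "example-drone"]
)
for name, value in environment.items():
    print(f"{name}={value}")
```

The resulting dictionary could then be merged into `os.environ` before the overlay batch system daemons are started.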

Specific information on available options for using the HTCondor grid universe to submit jobs to HTCondor-CE can be found in the HTCondor-CE documentation.

Moab Site Adapter

The MoabAdapter implements an interface to the Moab batch system. Regular batch jobs are submitted that start the actual Drone, which then integrates itself into the overlay batch system using the chosen BatchSystemAdapter.

Available adapter configuration options

Option	Short Description	Requirement
bulk_size	Maximum number of jobs to handle per bulk invocation of the `showq` command. Default: 100	Optional
bulk_delay	Maximum duration in seconds to wait per bulk invocation of the `showq` command. Default: 1.0	Optional
StartupCommand	The command executed in the batch job. (Deprecated: Moved to MachineTypeConfiguration!)	Deprecated
executor	The executor used to run submission and further calls to the Moab batch system. Default: ShellExecutor is used!	Optional
SubmitOptions	Options to add to the msub command. long and short arguments are supported (see example)	Optional

The available options in the MachineTypeConfiguration section include the expected Walltime of the placeholder jobs, the requested NodeType and the StartupCommand executed in the batch job. For details see the Moab documentation.

Example configuration

Sites:
  - name: moab-site
    adapter: Moab
    quota: 2000

moab-site:
  executor: !TardisSSHExecutor
    host: login.dorie.somewherein.de
    username: clown
    client_keys:
      - /opt/tardis/ssh/tardis
  MachineTypes:
    - singularity_d2.large
    - singularity_d1.large
  MachineTypeConfiguration:
    singularity_d2.large:
      Walltime: '02:00:00:00'
      NodeType: '1:ppn=20'
      StartupCommand: startVM.py
      SubmitOptions:
        short:
          M: "someone@somewhere.com"
        long:
          timeout: 60
    singularity_d1.large:
      Walltime: '01:00:00:00'
      NodeType: '1:ppn=20'
      StartupCommand: startVM.py
  MachineMetaData:
    singularity_d2.large:
      Cores: 20
      Memory: 120
      Disk: 196
    singularity_d1.large:
      Cores: 20
      Memory: 120
      Disk: 196

OpenStack Site Adapter

The OpenStackAdapter implements an interface to the OpenStack Cloud API. The following general adapter configuration options are available.

Available adapter configuration options

Option	Short Description	Requirement
auth_url	The end point of the OpenStack API to contact.	Required
username	Your OpenStack API username to authenticate yourself.	Optional
password	Your OpenStack API password to authenticate yourself.	Optional
user_domain_name	The name of the OpenStack user domain.	Optional
project_domain_name	The name of the OpenStack project domain.	Optional
application_credential_id	Your application credential ID to authenticate yourself.	Optional
application_credential_secret	Your application credential secret to authenticate yourself.	Optional

Note

Either username, password, user_domain_name and project_domain_name or application_credential_id and application_credential_secret are mandatory to authenticate against the OpenStack endpoint.
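
The either/or rule of the note can be expressed as a small check on the site specific block. The Python sketch below is a hypothetical helper and not part of TARDIS. Preferring application credentials when both sets are present is an assumption made for this example.

```python
PASSWORD_KEYS = ("username", "password", "user_domain_name", "project_domain_name")
APPLICATION_CREDENTIAL_KEYS = (
    "application_credential_id",
    "application_credential_secret",
)


def authentication_method(site_block):
    """Return which complete set of credentials a site block provides.

    Hypothetical helper; the precedence of application credentials is assumed.
    """
    if all(key in site_block for key in APPLICATION_CREDENTIAL_KEYS):
        return "application_credential"
    if all(key in site_block for key in PASSWORD_KEYS):
        return "password"
    raise ValueError(
        "neither a complete username/password set nor a complete "
        "application credential pair was found"
    )


woohoo = {
    "auth_url": "https://whoowhoo:13000/v3",
    "username": "woohoo",
    "password": "Woohoo123",
    "user_domain_name": "Default",
    "project_domain_name": "Default",
}
print(authentication_method(woohoo))
```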

All configuration entries in the MachineTypeConfiguration section of the machine types are directly added as keyword arguments to the OpenStack API create-server call. All available options are described in the OpenStack documentation.

Example configuration

Sites:
  - name: Woohoo
    adapter: OpenStack
    quota: 10 # CPU core quota

Woohoo:
  auth_url: https://whoowhoo:13000/v3
  username: woohoo
  password: Woohoo123
  project_name: WooHoo
  user_domain_name: Default
  project_domain_name: Default
  MachineTypes:
    - m1.xlarge
  MachineTypeConfiguration:
    m1.xlarge:
      flavorRef: 5 # ID of m1.xlarge
      networks:
        - uuid: fe0317c6-0bed-488b-9108-13726656a0ea
      imageRef: bc613271-6a54-48ca-9222-47e009dc0c29
      key_name: MG
      user_data: tardis/cloudinit/woohoo.ini
  MachineMetaData:
    m1.xlarge:
      Cores: 8
      Memory: 16
      Disk: 160

Slurm Site Adapter

The SlurmAdapter implements an interface to the Slurm batch system. Regular batch jobs are submitted that start the actual Drone, which then integrates itself into the overlay batch system using the chosen BatchSystemAdapter.

Available adapter configuration options

Option	Short Description	Requirement
bulk_size	Maximum number of jobs to handle per bulk invocation of the `squeue` command. Default: 100	Optional
bulk_delay	Maximum duration in seconds to wait per bulk invocation of the `squeue` command. Default: 1.0	Optional
StartupCommand	The command executed in the batch job. (Deprecated: Moved to MachineTypeConfiguration!)	Deprecated
executor	The executor used to run submission and further calls to the Slurm batch system. Default: ShellExecutor is used!	Optional

Available machine type configuration options

Option	Short Description	Requirement
Walltime	Expected walltime of drone	Required
Partition	Name of the Slurm partition to run in	Required
StartupCommand	The command to execute at job start	Required
SubmitOptions	Options to add to the sbatch command. long and short arguments are supported (see example)	Optional
StatusOptions	Options to add to the squeue command. long and short arguments are supported (see example)	Optional
TerminateOptions	Options to add to the scancel command. long and short arguments are supported (see example)	Optional
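
The short and long option mappings translate into command line arguments of the respective Slurm tool. The following Python sketch shows one plausible rendering, with short options written as `-name value` and long options as `--name=value`. The function `format_options` is hypothetical, and the exact formatting applied by TARDIS may differ.

```python
def format_options(options):
    """Render a short/long options mapping as command line arguments.

    Hypothetical sketch; the exact rendering used by TARDIS may differ.
    """
    arguments = []
    for name, value in options.get("short", {}).items():
        # Options without a value are rendered as plain flags.
        arguments.append(f"-{name}" if value is None else f"-{name} {value}")
    for name, value in options.get("long", {}).items():
        arguments.append(f"--{name}" if value is None else f"--{name}={value}")
    return " ".join(arguments)


submit_options = {"short": {"C": "intel"}, "long": {"gres": "gpu:2,mic:1"}}
print("sbatch " + format_options(submit_options))
```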

Example configuration

Sites:
  - name: hpc2000
    adapter: Slurm
    quota: 100

hpc2000:
  executor: !TardisSSHExecutor
    host: hpc2000.hpc.org
    username: billy
    client_keys:
      - /opt/tardis/ssh/tardis
  MachineTypes:
    - one_day
    - twelve_hours
    - six_hours
  MachineTypeConfiguration:
    one_day:
      Walltime: '1440'
      Partition: normal
      StartupCommand: 'pilot_clean.sh'
      SubmitOptions:
        short:
          C: "intel"
        long:
          gres: "gpu:2,mic:1"
    six_hours:
      Walltime: '360'
      Partition: normal
      StartupCommand: 'pilot_clean.sh'
      SubmitOptions:
        long:
          gres: "gpu:2,mic:1"
    twelve_hours:
      Walltime: '720'
      Partition: normal
      StartupCommand: 'pilot_clean.sh'
      StatusOptions:
        long:
          cluster: "cm4"
        short:
          p: "cm4_tiny"
      TerminateOptions:
        long:
          cluster: "cm4"
        short:
          p: "cm4_tiny"
  MachineMetaData:
    one_day:
      Cores: 20
      Memory: 62
      Disk: 480
    twelve_hours:
      Cores: 20
      Memory: 62
      Disk: 480
    six_hours:
      Cores: 20
      Memory: 62
      Disk: 480

Kubernetes Site Adapter

The KubernetesAdapter implements an interface to the Kubernetes API. The following general adapter configuration options are available.

Available adapter configuration options

Option	Short Description	Requirement
host	The end point of the Kubernetes Cluster.	Required
token	Bearer token used to authenticate yourself.	Required

To create a token, refer to the Kubernetes documentation.

Available machine type configuration options

Option	Short Description	Requirement
namespace	Namespace for pods to run in.	Required
image	Image for the pods.	Required
args	Arguments for the containers that run in your pods.	Required
hpa	Set to True/False to enable/disable the Kubernetes horizontal pod autoscaler feature.	Required
min_replicas	Minimum number of pods to scale to. (Only required when hpa is set to True)	Required
max_replicas	Maximum number of pods to scale to. (Only required when hpa is set to True)	Required
cpu_utilization	Average CPU utilization to maintain across pods of a deployment. (Only required when hpa is set to True)	Required
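
The conditional requirements of the table can be checked with a few lines of Python. The function `missing_machine_type_options` is a hypothetical helper and not part of TARDIS, and the autoscaler values in the example are made up.

```python
def missing_machine_type_options(configuration):
    """List required Kubernetes machine type options that are not set.

    Hypothetical helper for illustration, not part of TARDIS.
    """
    required = ["namespace", "image", "args", "hpa"]
    # The autoscaler settings are only required when hpa is enabled.
    if configuration.get("hpa"):
        required += ["min_replicas", "max_replicas", "cpu_utilization"]
    return [key for key in required if key not in configuration]


example = {
    "namespace": "default",
    "image": "busybox:1.26.1",
    "args": ["sleep", "3600"],
    "hpa": True,
    "min_replicas": 1,
    "max_replicas": 2,
}
print(missing_machine_type_options(example))
```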

Example configuration

Sites:
  - name: Kube-site
    adapter: Kubernetes
    quota: 10


Kube-site:
  host: https://127.0.0.1:443
  token: 31ada4fd-adec-460c-809a-9e56ceb75269
  MachineTypes:
    - example
  MachineTypeConfiguration:
    example:
      namespace: default
      image: busybox:1.26.1
      label: busybox
      args: ["sleep", "3600"]
  MachineMetaData:
    example:
      Cores: 2
      Memory: 4

Your favorite site is currently not supported? Please have a look at how to contribute.