CHANGELOG

[Unreleased] - 2024-02-05

Fixed

  • Fix handling of squeue call with invalid jobid

[0.8.1] - 2023-11-29

Changed

  • Add drone_uuid tag to the telegraf record

  • Bump pyauditor version to 0.3.1

  • Enable support for SSH command restrictions in Moab adapter

Fixed

  • Fix type of sshexecutor stdin parameter

[0.8.0] - 2023-10-05

Changed

  • Bump pyauditor version to 0.2.0

Fixed

  • Fix auditor error handling when updating non-existing record

  • Pin TARDIS to use pydantic version 1

  • Fix missing resource_status attribute crashing Prometheus plugin on newly started drones

  • Fix utilization of updated timestamp and potential ignoring of drone minimum lifetime

Deprecated

  • Minimal Python version is 3.8

  • Lancium compute support

[0.7.1] - 2023-05-16

Changed

  • Change pyauditor version to 0.1.0

Fixed

  • Disable change of drone_uuid after resource deployment in Moab adapter

[0.7.0] - 2023-02-24

Added

  • Introduce a TARDIS REST API to query the state of resources from SqlRegistry

  • Ensure python3.10 compatibility

  • Added support for manual draining of drones using the REST API

  • Add support for passing environment variables as executable arguments to support HTCondor grid universe

  • Added support for application credentials in the OpenStack site adapter

  • Added a new site adapter to use Lancium compute as resource provider

Changed

  • Adjust ElasticSearch plugin to support client versions >=7.17,<8.0.0

  • Remove granularity in Standardiser to enable earlier creation of new drones

  • Introduced Bulk Executor and HTCondor Bulk Operations

  • SSHExecutor respects the remote MaxSessions via queueing

  • Remove minimum core limit (Standardiser) from pool factory

  • Change drone state initialisation and notification of plugins

  • REST API cookie authentication and refactoring

  • Adjust Prometheus plugin to the latest aioprometheus version 21.9.0

Fixed

  • Unique constraints in database schema have been fixed to allow same machine_type and remote_resource_uuid on multiple sites

  • Update the remote_resource_uuid in sqlite registry on each update

  • REST API does not suppress KeyboardInterrupt

  • Fix recurrent cancellation of jobs that timed out in Slurm

  • Fixed state transition for stopped workers

[0.6.0] - 2021-08-09

Added

  • Added support for Kubernetes horizontal pod autoscaler

  • Enable support for msub command line options in the Moab site adapter

  • An optional and per site configurable drone heartbeat interval has been added

  • Added support for executors in batch system adapters

  • Added a new site adapter to use Kubernetes clusters as resource provider

  • Added TARDIS docker images to matterminers@dockerhub

Fixed

  • Fixed pypy support of TARDIS

  • Fixes a bug where get_resource_ratios raised a ValueError

  • Fixed installation issues on CentOS 7

  • Fixes a bug where the drone_minimum_lifetime parameter did not work as described in the documentation

  • Fixes a bug in the HTCondor site adapter that led to wrong requirements when using a non-HTCondor OBS

[0.5.0] - 2020-12-09

Added

  • Export tardis environment variable via Slurm site adapter

  • Added support for Slurm overlay batch system

[0.4.0] - 2020-06-03

Added

  • Added an example HTCondor jdl for the HTCondor site adapter

  • Added a Prometheus monitoring plugin

  • Enable support for sbatch command line options in the Slurm site adapter

  • Add SSH connection sharing to SSHExecutor in order to re-use existing connections

Changed

  • Added log channels and adjusted log levels according to the conventions in COBalD documentation

  • The Moab adapter can now be configured to use different startup commands for each machine type

  • The SLURM adapter can now be configured to use different startup commands for each machine type

Fixed

  • Fixed the exception handling of ConnectionResetError in SSHExecutor

  • Fixed the resource status translation of the Moab site adapter

[0.3.0] - 2020-02-26

Added

  • Add support for Python 3.8

  • Register pool factory as COBalD yaml plugin

  • Add support for COBalD legacy object initialisation

  • The machine name has been added as a default tag in the telegraf monitoring plugin; it can be overwritten

  • An optional and per site configurable drone minimum lifetime has been added

  • Add the possibility to use a unified COBalD and TARDIS configuration

Fixed

  • Fix draining of slots having a startd name

  • Fix the translation of cloud init scripts into base64 encoded strings

  • Use utilisation as weight in composite pools

  • Allow removal of booting drones if demand drops to zero

  • The CleanupState is now taking into account the status of the resource for state transitions

  • Improved logging of the HTCondor batch system adapter and the status changes of the drones

  • Fix the handling of the termination of vanished resources

  • Fix state transitions for jobs retried by HTCondor

  • Fix state transitions and refactoring of the SLURM site adapter