src package

This folder contains the implementation of the ExpoCloud framework.

Submodules

src.abstract_engine module

The parent class for all cloud-based compute engines.

class AbstractEngine

Bases: object

The parent class for all cloud-based compute engines. We assume that the engine supports machine images. Child classes should implement two methods: create_instance_native, which creates a compute instance, and kill_instance, which terminates one.

__init__(config: dict)

The constructor.

Parameters:

config – The configuration dictionary with the following keys:

  • prefix - the prefix used for the names of instances.

  • project - the name of the project on the cloud.

  • server_image - the name of the machine image for a server.

  • client_image - the name of the machine image for a client.

  • root_folder - the path to the root folder, e.g. ~/ExpoCloud/.

  • project_folder - the path to the experiment in dot notation, e.g. 'examples.agent_assignment'.
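
As an illustration, a configuration dictionary with these keys might look as follows. This is only a sketch: the two paths repeat the examples given above, and every other value is a hypothetical placeholder rather than a name the framework expects.

```python
# Hypothetical configuration for an engine derived from AbstractEngine.
# Only the keys come from the documentation; the values are placeholders.
config = {
    'prefix': 'expo',                    # assumed prefix for instance names
    'project': 'my-cloud-project',       # assumed name of the cloud project
    'server_image': 'my-server-image',   # assumed machine image for a server
    'client_image': 'my-client-image',   # assumed machine image for a client
    'root_folder': '~/ExpoCloud/',
    'project_folder': 'examples.agent_assignment',
}
```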

is_local() bool

Returns False to signify that this engine is not local.

Returns:

False

Return type:

bool

creation_attempt_allowed() bool

Check whether self.creation_delay seconds have passed since the last attempt of instance creation.

Returns:

True if self.creation_delay seconds have passed since the last attempt of instance creation and False otherwise.

Return type:

bool
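
The check described above can be sketched as follows. This is not the framework's code: the class name, the last_attempt attribute and the rule that the very first attempt is always allowed are assumptions made for the illustration.

```python
import time


class CreationThrottle:
    """Hypothetical stand-in for the engine's bookkeeping of creation attempts."""

    def __init__(self, creation_delay: float):
        self.creation_delay = creation_delay  # seconds required between attempts
        self.last_attempt = None              # assumed: no attempt made yet

    def record_attempt(self, now: float = None):
        """Remember when the latest creation attempt took place."""
        self.last_attempt = time.time() if now is None else now

    def creation_attempt_allowed(self, now: float = None) -> bool:
        """True if creation_delay seconds have passed since the last attempt."""
        now = time.time() if now is None else now
        if self.last_attempt is None:
            return True  # assumed behavior before any attempt
        return now - self.last_attempt >= self.creation_delay
```

Passing now explicitly makes the check easy to exercise without waiting for real time to pass.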

next_instance_name(role: InstanceRole) str

Generate the name for the next instance with the given role.

Parameters:

role (InstanceRole) – The role of the instance.

Returns:

The name for the next instance with the given role.

Return type:

str

image_name(role: InstanceRole) str

Return the name of the machine image for an instance with the given role.

Parameters:

role (InstanceRole) – The role of the instance.

Returns:

The name of the machine image for an instance with the given role.

Return type:

str

create_instance(name: str, role: InstanceRole) str

Create the instance of the given type and return its IP address.

Parameters:
  • name (str) – The name of the instance to be created.

  • role (InstanceRole) – The role of the instance to be created.

Returns:

The IP address of the newly created instance.

Return type:

str

run_instance(name: str, ip: str, role: InstanceRole, server_port: int, max_cpus: Optional[int] = None)

Run the instance.

Parameters:
  • name (str) – The name of the instance to be run.

  • ip (str) – The IP address of the instance to be run.

  • role (InstanceRole) – The role of the instance to be run.

  • server_port (int) – The port for handshaking with the primary server.

  • max_cpus (int, optional) – The maximum number of workers to be used by the client instance, defaults to None, i.e. unlimited.

list_instances() List[tuple]

Return the list of tuples (<name>, <ip>, <status>), one for each instance. Engine subclasses should implement this method.

Returns:

The list of tuples (<name>, <ip>, <status>) for each instance.

Return type:

List[tuple]

kill_instance(name: str) str

Terminate the specified instance. Engine subclasses should implement this method.

Parameters:

name (str) – The name of the instance to terminate.

Returns:

The name of the terminated instance or None in the case of an exception.

Return type:

str

kill_dangling_clients(existing_clients: List[str])

Kill clients whose name has the given prefix, but is not in the provided list of names. The method relies on the particular engine implementing the list_instances method.

Parameters:
  • prefix (str) – The given prefix.

  • existing_clients (List[str]) – The names of instances that are not to be killed.
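
The selection of dangling clients can be sketched as a pure function over the tuples returned by list_instances. The function name and the explicit prefix argument are assumptions made for the illustration; the actual method then terminates each selected instance.

```python
from typing import List, Tuple


def dangling_clients(instances: List[Tuple[str, str, str]],
                     prefix: str,
                     existing_clients: List[str]) -> List[str]:
    """Names that start with prefix but are not among existing_clients.

    instances holds (name, ip, status) tuples, as list_instances returns.
    """
    keep = set(existing_clients)
    return [name for name, _ip, _status in instances
            if name.startswith(prefix) and name not in keep]
```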

src.abstract_task module

Defines parent classes for all ProblemInstance, Algorithm and Task classes.

class AbstractProblemInstance

Bases: object

This is a parent class for ProblemInstance classes, such as examples.agent_assignment.instance.ProblemInstance.

__init__(id: int)

The constructor.

Parameters:

id (int) – Instance id.

class AbstractAlgorithm

Bases: object

This is a parent class for Algorithm classes, such as examples.agent_assignment.bnb.Algorithm.

__init__(options: set, instance: AbstractProblemInstance)

The constructor.

Parameters:
  • options (set) – The options controlling the behavior of the algorithm.

  • instance (AbstractProblemInstance) – The problem instance

class AbstractTask

Bases: object

This is a parent class for Task classes for experiments. See examples.agent_assignment.task.Task for an example.

class Hardness

Bases: object

The default class to represent hardness of a task. It provides default comparators.

__init__(params: tuple)

The constructor.

Parameters:

params (tuple) – The tuple of parameters determining the task’s hardness.

__str__() str

Return the string representation of hardness.

Returns:

The string representation of hardness.

Return type:

str

__repr__() str

Return the string representation of hardness.

Returns:

The string representation of hardness.

Return type:

str

__lt__(other: Hardness) bool

Checks whether this hardness is strictly less than other.

Parameters:

other (Hardness) – The hardness to be compared to.

Returns:

True if hardness is strictly less than other and False otherwise.

Return type:

bool

__le__(other: Hardness) bool

Checks whether this hardness is less than or equal to other.

Parameters:

other (Hardness) – The hardness to be compared to.

Returns:

True if hardness is less than or equal to other and False otherwise.

Return type:

bool
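
A minimal sketch of such comparators is shown below. It assumes that hardness is compared componentwise, so that two hardness values may be incomparable; the documentation above does not state the ordering, so treat this as one plausible reading rather than the framework's definition.

```python
class Hardness:
    """Sketch of a hardness value with componentwise comparison (assumed)."""

    def __init__(self, params: tuple):
        self.params = params

    def __repr__(self) -> str:
        return str(self.params)

    def __le__(self, other: 'Hardness') -> bool:
        # Every parameter must be no larger than its counterpart.
        return all(a <= b for a, b in zip(self.params, other.params))

    def __lt__(self, other: 'Hardness') -> bool:
        # Strictly less: less than or equal, but not identical.
        return self <= other and self.params != other.params
```

Under this ordering, Hardness((2, 1)) and Hardness((1, 2)) are incomparable: neither is less than or equal to the other.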

__init__(algorithm: AbstractAlgorithm, timeout: float)

The constructor.

Parameters:
  • algorithm (AbstractAlgorithm) – The algorithm used to solve the problem instance.

  • timeout (float) – The deadline for the task in seconds.

group_parameter_titles() Tuple[str]

Return the tuple of names of parameters that determine groups for counting the number of non-hard instances.

Returns:

The tuple of names of parameters that determine groups for counting the number of non-hard instances.

Return type:

Tuple[str]

group_parameters() tuple

Return the tuple of parameters that determine groups for counting the number of non-hard instances.

Returns:

The tuple of parameters that determine groups for counting the number of non-hard instances.

Return type:

tuple

parameter_titles() Tuple[str]

Return the tuple of names of parameters that characterize the task.

Returns:

The tuple of names of parameters that characterize the task.

Return type:

Tuple[str]

parameters() tuple

Return the tuple of parameters that characterize the task.

Returns:

The tuple of parameters that characterize the task.

Return type:

tuple

hardness_parameters() tuple

Return the tuple of parameters determining the hardness of the task. This is to be used to initialize the Hardness object.

Returns:

The tuple of parameters determining the hardness of the task.

Return type:

tuple

result_titles() Tuple[str]

Return the tuple of names of output values for the solved task.

Returns:

The tuple of names of output values for the solved task.

Return type:

Tuple[str]

run() tuple

Return the tuple of output values for the solved task.

Returns:

The tuple of output values for the solved task.

Return type:

tuple

__str__() str

Return the string representation of the task.

Returns:

The string representation of the task.

Return type:

str

__repr__() str

Return the string representation of the task.

Returns:

The string representation of the task.

Return type:

str
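
To show how the methods above fit together, here is a toy task. The AbstractTask class in the sketch is a stand-in defined locally so that the example runs on its own, and SumTask with its single parameter n is hypothetical; a real task would derive from src.abstract_task.AbstractTask and wrap an AbstractAlgorithm.

```python
class AbstractTask:
    """Local stand-in for the framework class (assumed to store its arguments)."""

    def __init__(self, algorithm, timeout: float):
        self.algorithm = algorithm
        self.timeout = timeout


class SumTask(AbstractTask):
    """Hypothetical task: add up a list of numbers."""

    def __init__(self, numbers: list, timeout: float):
        super().__init__(algorithm=sum, timeout=timeout)
        self.numbers = numbers

    def parameter_titles(self) -> tuple:
        return ('n',)

    def parameters(self) -> tuple:
        return (len(self.numbers),)

    def hardness_parameters(self) -> tuple:
        # Assumed: longer lists are harder.
        return (len(self.numbers),)

    def result_titles(self) -> tuple:
        return ('total',)

    def run(self) -> tuple:
        return (self.algorithm(self.numbers),)
```

Each titles method returns names that line up, position by position, with the values returned by its counterpart.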

src.client module

The two classes implementing the client - the Worker and the Client.

class Worker

Bases: Process

The class representing a worker charged with a task.

__init__(id: int, task: AbstractTask, queue: Queue)

The constructor.

Parameters:
  • id (int) – The worker’s id.

  • task (AbstractTask) – The task to execute.

  • queue (Queue) – The queue for communicating the result.

my_kill()

Terminate the worker.

run()

Run the worker process executing the task.

class Client

Bases: object

The main client class.

__init__()

The constructor. In particular, it shakes hands with the primary server and creates the queues for communicating with both the primary and the backup servers.

message_to_servers(type: MessageType, body: Any)

Send the message to both primary and backup servers.

Parameters:
  • type (MessageType) – The message type.

  • body (Any) – The body of the message.

event_to_servers(descr: str, worker: Optional[Worker] = None, task: Optional[AbstractTask] = None)

Send an event to servers.

Parameters:
  • descr (str) – The event description.

  • worker (Worker, optional) – The worker at which the event took place or None if the event is not related to any worker; defaults to None.

  • task (AbstractTask, optional) – The task to which the event is relevant or None if the event is not relevant to any task; defaults to None.

process_grant_tasks(tasks: List[AbstractTask])

Process tasks granted to this client by the primary server.

Parameters:

tasks (List[AbstractTask]) – The tasks granted.

apply_domino_effect(hard: tuple)

Apply the domino effect based on the given hardness. Namely,

  • Terminate the workers that execute tasks with hardness greater than or equal to hard.

  • Remove the corresponding tasks from self.tasks.

Parameters:

hard (tuple) – The hardness of the task that timed out as reported by the primary server.
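
The filtering step of the domino effect can be sketched as below. The sketch assumes that hardness tuples are compared componentwise and models the client's tasks as a dictionary from task id to hardness; both are simplifications for illustration, and terminating the workers is left out.

```python
from typing import Dict


def surviving_tasks(tasks: Dict[int, tuple], hard: tuple) -> Dict[int, tuple]:
    """Keep only the tasks that are not at least as hard as hard.

    tasks maps a task id to its hardness tuple (hypothetical layout).
    """
    def at_least_as_hard(h: tuple) -> bool:
        # Assumed ordering: every component is greater than or equal.
        return all(a >= b for a, b in zip(h, hard))

    return {t_id: h for t_id, h in tasks.items() if not at_least_as_hard(h)}
```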

process_no_further_tasks(_body)

Set the flag about having received the NO_FURTHER_TASKS message from the primary server.

process_stop(_body)

Set the flag about having received the STOP message from the primary server.

process_resume(_body)

Unset the flag about having received the STOP message from the primary server and perform the relevant bookkeeping.

process_swap_queues(_body)

Swap the queues for communication with the servers, so that the queues previously used for communication with the backup server are now treated as the queues for communication with the primary server, and vice versa.

process_messages()

Process messages from the servers.

process_worker_started(worker: Worker, _body)

Process the event of the worker having started by sending the corresponding event to the servers.

Parameters:

worker (Worker) – The worker that started.

process_worker_done(worker: Worker, result: tuple)

Process the event of the worker having completed its task by sending both the result and the corresponding event to the servers.

Parameters:
  • worker (Worker) – The worker that finished.

  • result (tuple) – The result for the completed task.

process_worker_messages()

Process messages from workers.

collect_garbage()

Remove objects corresponding to workers that have terminated.

kill_overdue()

Terminate workers whose task has timed out. For each timed out task, send the REPORT_HARD_TASK message to the servers.

process_workers()

Manage the workers by performing three actions:

  • Process messages from the workers.

  • Remove objects corresponding to workers that have terminated.

  • Process workers whose task has timed out.

health_update()

Send health update to the servers.

request_tasks(n: int)

Send a message to the servers requesting n tasks.

Parameters:

n (int) – The number of tasks to request.

occupy_workers()

Create workers for the tasks assigned to this client, charge the new workers with tasks and start them.

run()

The main loop of the client. The loop is over when there are no more tasks to be assigned to the client and all tasks previously assigned to it are completed. Once the loop is over, send the BYE message to the servers, wait for them to process this message, and complete.

src.constants module

class Verbosity

Bases: object

Constants determining which output is to be printed.

all = True

Output not about a specific topic specified by the other constants.

instance_creation_etc = True

Output detailing the creation of cloud instances.

workers = False

Output detailing the operation of the workers at a client.

message_sync = True

Output detailing the synchronization of messages from the primary and the backup servers to the clients.

messages = True

Output detailing the processing of messages.

all_non_health_messages = False

Output all non-health-report messages received by one of the servers.

failure_traceback = False

Output of the traceback record for failure events.

command_lines = True

Output of the command lines to be executed.

class Constants

Bases: object

The constants used throughout the framework. All times are in seconds.

MIN_CREATION_DELAY = 30

The initial delay for instance creation.

INSTANCE_MAX_NON_ACTIVE_TIME = 300

If no handshake for this amount of time, kill the instance.

HEALTH_UPDATE_FREQUENCY = 1

The interval between successive health updates.

HEALTH_UPDATE_LIMIT = 60

An otherwise active instance that has not reported on its health for this amount of time is considered unhealthy.

SSH_RETRY_DELAY = 5

If ssh fails, attempt again after this delay.

CLIENTS_TIME_TO_STOP = 5

Time allowed for clients to reliably stop sending messages after the STOP message has been sent.

OUTPUT_FOLDER = 'output'

The folder for storing the output files.

PICKLED_SERVER_FILE = 'output/pickled'

The file storing the serialized server object.

SERVER_CYCLE_WAIT = 0.1

The delay between two successive iterations of the main server loop.

CLIENT_CYCLE_WAIT = 0.1

The delay between two successive iterations of the main client loop.

CLIENT_WAIT_AFTER_SENDING_BYE = 5

The delay between a client sending the BYE message to the server and completing.

WORKER_WAIT_AFTER_DONE = 1

The delay between a worker reporting the result to the client and completing.

src.instance module

Classes for representing a cloud instance at either the primary or the backup server.

class Instance

Bases: object

The parent class for classes representing a cloud instance at either the primary or the backup server. This class is only for representing a backup server or a client instance. Note that the class for representing a primary server in the backup server does not derive from this class.

__init__(role: InstanceRole, engine: Union[AbstractEngine, LocalEngine])

The constructor.

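Parameters:
  • role (InstanceRole) – The role of the instance.

  • engine (Union[AbstractEngine, LocalEngine]) – The engine being used.
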
create()

Create the cloud instance.

run(server_port, max_cpus=None)

Run the instance.

is_healthy(tasks_remain: bool)

Returns true if the instance is healthy, i.e. either:

  • The instance has no IP address (so that no real instance has been created) and there are tasks remaining.

  • The instance has an IP address, but is not active (i.e. has not shaken hands with the primary server). For a client instance, we also require that there still be tasks remaining, as indicated by tasks_remain. For all types of instances, INSTANCE_MAX_NON_ACTIVE_TIME must not have passed since its creation.

  • The instance is active and HEALTH_UPDATE_LIMIT has not passed since the last health update.

Parameters:

tasks_remain (bool) – True if there are tasks remaining and False otherwise.

Returns:

True if the instance is healthy and False otherwise.

Return type:

bool
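
The three cases can be sketched as a standalone function. The two limits use the values documented for Constants; the argument names, the use of None for a missing value and the choice of <= at the boundary are assumptions made for the illustration.

```python
INSTANCE_MAX_NON_ACTIVE_TIME = 300  # seconds, as documented in Constants
HEALTH_UPDATE_LIMIT = 60            # seconds, as documented in Constants


def is_healthy(ip, creation_ts, active_ts, last_health_ts,
               is_client: bool, tasks_remain: bool, now: float) -> bool:
    """Sketch of the health rule; None stands for 'not available yet'."""
    if ip is None:
        # No real instance has been created so far.
        return tasks_remain
    if active_ts is None:
        # Created, but no handshake with the primary server yet.
        if is_client and not tasks_remain:
            return False
        return now - creation_ts <= INSTANCE_MAX_NON_ACTIVE_TIME
    # Active: judge by the time since the last health update.
    return now - last_health_ts <= HEALTH_UPDATE_LIMIT
```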

shake_hands()

Shake hands with the instance. The common action implemented here is storing the timestamp of when the instance became active, where being active means that the handshake with it has taken place.

class ClientInstance

Bases: Instance

The class for representing a client instance in either the primary or the backup server.

__init__(engine: Union[AbstractEngine, LocalEngine], tasks_from_failed: List[AbstractTask])

The constructor.

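Parameters:
  • engine (Union[AbstractEngine, LocalEngine]) – The engine being used.

  • tasks_from_failed (List[AbstractTask]) – The tasks taken over from failed clients.
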
connect(server_role: InstanceRole)

Connect to the appropriate queues of the server.

Parameters:

server_role (InstanceRole) – The role of the server at which the client instance is being represented.

init_files(parent_dir: str)

Open files for storing events and exceptions from this client.

Parameters:

parent_dir (str) – The folder to contain the files.

shake_hands(server_role: InstanceRole, parent_dir: str)

Shake hands with the client.

Parameters:
  • server_role (InstanceRole) – The role of the server at which the client is being represented.

  • parent_dir (str) – The folder within which the folder with the files related to this client is to be contained.

register_tasks(tasks: List[AbstractTask])

Register tasks assigned to this client.

Parameters:

tasks (List[AbstractTask]) – The newly assigned tasks.

unregister_task(t_id: int)

Unregister a given task from this client.

Parameters:

t_id (int) – The task id.

unregister_domino(tasks: List[AbstractTask], hardness: tuple)

Unregister all tasks that have been previously registered with this client and can now be proven to be hard due to a task with the given hardness having timed out.

Parameters:
  • tasks (List[AbstractTask]) – All the tasks.

  • hardness (tuple) – The hardness of the task that has timed out.

class BackupServerInstance

Bases: Instance

The class for representing the backup server instance in the primary server.

__init__(engine: Union[AbstractEngine, LocalEngine])

The constructor.

Parameters:

engine (Union[AbstractEngine, LocalEngine]) – The engine being used.

shake_hands()

Shake hands with the backup server.

class PrimaryServerInstance

Bases: object

The class for representing the primary server instance in the backup server.

__init__(my_port: int)

The constructor.

Parameters:

my_port (int) – The port for shaking hands with the primary server.

is_healthy()

Returns true if the primary server is healthy, i.e. HEALTH_UPDATE_LIMIT has not passed since last health update.

is_primary(instance: Instance) bool

Determine whether the instance represented by instance is the primary server.

Parameters:

instance (Instance) – The object representing the instance.

Returns:

True if the instance represented by instance is the primary server and False otherwise.

Return type:

bool

is_backup(instance: Instance) bool

Determine whether the instance represented by instance is the backup server.

Parameters:

instance (Instance) – The object representing the instance.

Returns:

True if the instance represented by instance is the backup server and False otherwise.

Return type:

bool

is_client(instance: Instance) bool

Determine whether the instance represented by instance is a client.

Parameters:

instance (Instance) – The object representing the instance.

Returns:

True if the instance represented by instance is a client and False otherwise.

Return type:

bool

src.run_backup module

The script for running the backup server on the backup server instance, which consists of:

  • Deserializing the primary server object.

  • Transforming this object to represent the backup server by invoking the assume_backup_role method.

  • Running the server by invoking the run method.

Note that this script is provided by the framework and is not to be supplied by the user.

src.server module

The class implementing the primary and the backup servers.

class Server

Bases: object

Either the primary or the backup server.

__init__(tasks: List[AbstractTask], engine: Union[LocalEngine, AbstractEngine], backup: bool, max_clients: Optional[int] = None, max_cpus_per_client: Optional[int] = None, min_group_size: int = 0)

The constructor; only ever invoked for building the first primary server.

Parameters:
  • tasks (List[AbstractTask]) – The list of tasks to be executed.

  • engine (Union[LocalEngine, AbstractEngine]) – The compute engine settings.

  • backup (bool) – Whether or not to use a backup server.

  • max_clients (int, optional) – Maximal number of client instances or None for no restriction; defaults to None.

  • max_cpus_per_client (int, optional) – The maximal number of workers to be used at a client instance or None for no restriction; defaults to None

  • min_group_size (int, optional) – minimal size of group as defined by the task’s group_parameter_titles method, defaults to 0

run()

The main loop.

tasks_remain() bool

Check whether there are tasks remaining to be executed.

Returns:

True if there are tasks remaining to be executed and False otherwise.

Return type:

bool

is_hard(hardness: tuple) bool

Check whether hardness is greater than or equal to the hardness of one of the timed out tasks.

Parameters:

hardness (tuple) – The hardness to be checked.

Returns:

True if hardness is greater than or equal to the hardness of one of the timed out tasks and False otherwise.

Return type:

bool

print_results()

Restore the original order of tasks and print results.

is_primary() bool

Check whether the server is the primary server.

Returns:

True if the server is the primary server and False otherwise.

Return type:

bool

is_backup() bool

Check whether the server is the backup server.

Returns:

True if the server is the backup server and False otherwise.

Return type:

bool

assume_backup_role()

Assume the backup server role. This method is called after unpickling the primary server object, so as to convert it to a backup server one.

assume_primary_role()

Assume the primary server role. This method is called at the backup server when primary server failure is detected.

handle_primary_server_failure()

Handle the primary server failure. This method is called at the backup server when primary server failure is detected. The handling consists of assuming the primary server role and sending the MessageType.SWAP_QUEUES message to the clients.

init_handshake_q()

Construct the handshake manager and the queue for handshake requests from the backup server and the clients.

handshake_from_client(name: str, port_primary: int, port_backup: int)

Handle handshake request from a client. This method is invoked only at the primary server.

Parameters:
  • name (str) – The name of the client instance.

  • port_primary (int) – The client’s port for communication with the primary server.

  • port_backup (int) – The client’s port for communication with the backup server.

handshake_from_backup(name: str, port: int)

Handle handshake request from the backup server. This method is invoked only at the primary server.

Parameters:
  • name (str) – The name of the backup server instance.

  • port (int) – The backup server’s port for communication with the primary server.

accept_handshakes()

Handle handshake requests from new backup server and client instances. Uses the methods handshake_from_client and handshake_from_backup to handle the respective kinds of requests. This method is invoked only at the primary server.

n_active_clients() int

Compute the number of currently active client instances. An instance is active if it has shaken hands with the server and has not been terminated since then.

Returns:

The number of currently active client instances.

Return type:

int

get_client(name: str) ClientInstance

Get the object representing the client instance with the given name.

Parameters:

name (str) – The name of the client instance.

Returns:

The object representing the client instance or None if there is no client instance with the given name.

Return type:

ClientInstance

kill_client(name: str)

Terminate the client instance with the given name. If the name does not correspond to any client, do nothing.

Parameters:

name (str) – The name of the client instance.

kill_instance(instance: Instance)

Terminate the instance represented by instance.

Parameters:

instance – The object representing the instance to be terminated.

create_backup_server_instance()

Create and run the backup server.

create_client_instance()

Create and run a new client.

create_instance()

Create and run a new instance. The backup server takes precedence. If it already exists or is not used, then a new client is created.

kill_unhealthy_instances()

Terminate unhealthy instances. See Instance.is_healthy and PrimaryServerInstance.is_healthy for the definition of a healthy instance.

stop_clients()

Send the MessageType.STOP message to all clients and record the timestamp of stopping the clients.

resume_clients()

Send the MessageType.RESUME message to all clients and reset the timestamp previously set by stop_clients.

send_health_update()

Send health update to the other server.

message_to_instance(instance: Instance, type: MessageType, body: Any)

Send a message to an instance.

Parameters:
  • instance (Instance) – The object representing the receiver instance.

  • type (MessageType) – The message type.

  • body (Any) – The body of the message.

messages_waiting(instance: Instance) bool

Check whether a message from the given instance can be read. If the instance is invalid or is not active, returns False. If there is no message in the inbound queue from the given instance, returns False. Otherwise, returns True, unless the method is invoked at the backup server, the instance is a client and the corresponding message has not been forwarded by the primary server.

Parameters:

instance (Instance) – The object representing the instance sending the message.

Returns:

True if a message from the given instance can be read and False otherwise.

Return type:

bool

forward_message(instance: Instance, message_id: int, type: MessageType, body: Any)

Forward a message from a client to the backup server. This method is invoked only at the primary server.

Parameters:
  • instance (Instance) – The object representing the sending instance.

  • message_id (int) – The id of the message.

  • type (MessageType) – The message type.

  • body (Any) – The body of the message.

process_health_update(instance: Instance, _body)

Record the timestamp of the MessageType.HEALTH_UPDATE message sent by the given instance.

Parameters:

instance (Instance) – The object representing the instance sending the message.

process_request_tasks(client: ClientInstance, n: int)

Process the request for tasks sent by the given client. The method sends either the MessageType.GRANT_TASKS message containing the tasks being granted to the client or the MessageType.NO_FURTHER_TASKS message to indicate that there are no tasks to be assigned.

Parameters:
  • client (ClientInstance) – The object representing the client sending the request.

  • n (int) – The number of tasks requested by the client.
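
The choice between the two replies can be sketched as follows. The function name, the plain list of pending tasks and the (message name, body) return value are assumptions made for the illustration; the actual method also has to skip tasks already known to be hard and register the granted tasks with the client.

```python
from typing import Any, List, Tuple


def answer_task_request(pending: List[Any], n: int) -> Tuple[str, Any]:
    """Decide which reply to send for a request of n tasks.

    pending is the list of tasks not yet assigned; granted tasks are
    removed from it in place.
    """
    if not pending:
        return ('NO_FURTHER_TASKS', None)
    granted = pending[:n]   # at most n tasks, fewer if the list runs out
    del pending[:n]
    return ('GRANT_TASKS', granted)
```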

process_log(client: ClientInstance, descr: str)

Log the event related to a task execution by the given client.

Parameters:
  • client (ClientInstance) – The object representing the client sending the event.

  • descr (str) – The description of the event to be logged.

process_exception(client: ClientInstance, descr: str)

Log the exception event sent by the given client.

Parameters:
  • client (ClientInstance) – The object representing the client sending the event.

  • descr (str) – The description of the event to be logged.

process_result(client: ClientInstance, body: tuple)

Process the result of executing a task sent by the given client.

Parameters:
  • client (ClientInstance) – The object representing the client sending the result.

  • body (tuple) – The tuple consisting of the task id and the result.

process_report_hard_task(_client, task_id: int)

If the task specified by task_id is minimally hard, then:

  1. Add the task to self.min_hard.

  2. Send the MessageType.APPLY_DOMINO_EFFECT message to all clients, so they can terminate any task that is as hard or harder than this task.

Parameters:

task_id – The id of the task that timed out.
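
One way to maintain a collection of minimally hard tasks is sketched below. It assumes componentwise comparison of hardness tuples and represents self.min_hard as a plain list of such tuples; pruning entries that a new report makes redundant is also an assumption, not something the documentation states.

```python
from typing import List


def dominates(a: tuple, b: tuple) -> bool:
    """True if a is at least as hard as b in every component (assumed ordering)."""
    return all(x >= y for x, y in zip(a, b))


def report_hard_task(min_hard: List[tuple], hardness: tuple) -> bool:
    """Record a timed-out task; return True if the domino effect should be sent."""
    if any(dominates(hardness, h) for h in min_hard):
        return False  # already implied by an earlier, easier timed-out task
    # Drop recorded entries that the new, easier entry makes redundant.
    min_hard[:] = [h for h in min_hard if not dominates(h, hardness)]
    min_hard.append(hardness)
    return True
```

When the function returns True, the server would send APPLY_DOMINO_EFFECT with the new hardness to all clients.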

process_bye(client: ClientInstance, _body)

Process the MessageType.BYE message sent by the given client by terminating the corresponding client instance.

Parameters:

client (ClientInstance) – The object representing the client sending the message.

process_new_client(_instance, body: tuple)

Construct a client object corresponding to the client that shook hands with the primary server and about which the primary server sent the MessageType.NEW_CLIENT message. After constructing the object, invoke its shake_hands method. The process_new_client method is only invoked at the backup server.

Parameters:

body (tuple) – The information about the new client instance.

process_client_terminated(_instance, client_name: str)

Process client failure reported by the primary server. This method is only invoked at the backup server.

Parameters:

client_name (str) – The name of the terminated client.

process_message(instance: Instance, type: MessageType, body: Any)

Process a message from clients and the other server.

Parameters:
  • instance (Instance) – The object representing the sending instance.

  • type (MessageType) – The type of the message.

  • body (Any) – The body of the message.
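
Since the handler names documented above mirror the message type names (process_bye for BYE, process_request_tasks for REQUEST_TASKS, and so on), dispatching can be sketched with a name lookup. Whether the framework dispatches this way is an assumption; MiniServer and its handled list are hypothetical, and only three of the message types are reproduced.

```python
from enum import Enum
from typing import Any


class MessageType(Enum):
    # A subset of the message types documented for src.util.MessageType.
    HEALTH_UPDATE = 'HEALTH_UPDATE'
    REQUEST_TASKS = 'REQUEST_TASKS'
    BYE = 'BYE'


class MiniServer:
    """Hypothetical server that only records which handler was called."""

    def __init__(self):
        self.handled = []

    def process_health_update(self, instance, _body):
        self.handled.append(('health', instance))

    def process_request_tasks(self, client, n):
        self.handled.append(('request', client, n))

    def process_bye(self, client, _body):
        self.handled.append(('bye', client))

    def process_message(self, instance, type: MessageType, body: Any):
        # REQUEST_TASKS -> process_request_tasks, BYE -> process_bye, ...
        handler = getattr(self, 'process_' + type.name.lower())
        handler(instance, body)
```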

handle_messages()

Handle messages from clients and the other server.

src.util module

Utility enums and functions.

short_timestamp(timestamp: float) str

Generate short string representation of the given timestamp.

Parameters:

timestamp (float) – The timestamp to render.

Returns:

The short string representation of the given timestamp.

Return type:

str

short_now_str() str

Generate short string representation of the current timestamp.

Returns:

The short string representation of the current timestamp.

Return type:

str

my_print(cond: bool, str: str, err_flag: bool = False)

Print the given string if the given condition is True.

Parameters:
  • cond (bool) – The condition to be checked.

  • str (str) – The string to be printed.

  • err_flag (bool, optional) – True to print to stderr and False to print to stdout; defaults to False.

my_eprint(cond: bool, str: str)

Print the given string to stderr if the given condition is True.

Parameters:
  • cond (bool) – The condition to be checked.

  • str (str) – The string to be printed.
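
The two printing helpers can be sketched as below. This is a minimal version written for illustration; the parameter is named text rather than str to avoid shadowing the built-in, and flushing after each line is an assumption.

```python
import sys


def my_print(cond: bool, text: str, err_flag: bool = False):
    """Print text if cond is True, to stderr when err_flag is set."""
    if cond:
        stream = sys.stderr if err_flag else sys.stdout
        print(text, file=stream, flush=True)


def my_eprint(cond: bool, text: str):
    """Print text to stderr if cond is True."""
    my_print(cond, text, err_flag=True)
```

The condition is typically one of the Verbosity constants, e.g. my_print(Verbosity.workers, 'worker started').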

command_arg_ip() str

Get the command line argument representing the ip of the primary server.

Returns:

The command line argument representing the ip of the primary server.

Return type:

str

command_arg_port() int

Get the command line argument representing the port for communicating with the primary server.

Returns:

The command line argument representing the port for communicating with the primary server.

Return type:

int

command_arg_name() str

Get the command line argument representing the name of the current instance.

Returns:

The command line argument representing the name of the current instance.

Return type:

str

command_arg_max_cpus() int

Get the command line argument representing the maximal number of workers that the current client instance is allowed to use. If it is None, return a large positive integer value standing for infinity.

Returns:

The maximal number of workers that the current client instance is allowed to use.

Return type:

int

class InstanceRole

Bases: Enum

The role of instance.

PRIMARY_SERVER = 'PRIMARY_SERVER'

The primary server instance.

BACKUP_SERVER = 'BACKUP_SERVER'

The backup server instance.

CLIENT = 'CLIENT'

A client instance.

class MessageType

Bases: Enum

Type of a message sent to another instance.

HEALTH_UPDATE = 'HEALTH_UPDATE'

Informs the receiver that the sending instance is healthy.

REQUEST_TASKS = 'REQUEST_TASKS'

A client sends this message to the servers to request tasks for execution.

RESULT = 'RESULT'

A client sends this message to the servers to report on a completed task.

REPORT_HARD_TASK = 'REPORT_HARD_TASK'

A client sends this message to the servers to report on a timed out task.

LOG = 'LOG'

A client sends this message to the servers to report an event related to task execution.

EXCEPTION = 'EXCEPTION'

A client sends this message to the servers to report an exception event.

BYE = 'BYE'

A client sends this message to the servers to report that the client script is about to terminate.

NEW_CLIENT = 'NEW_CLIENT'

The primary server sends this message to the backup server to report that a new client has shaken hands with the primary server.

CLIENT_TERMINATED = 'CLIENT_TERMINATED'

The primary server sends this message to the backup server to report that a client has been terminated.

MESSAGE_FROM_CLIENT = 'MESSAGE_FROM_CLIENT'

The primary server sends this message to forward to the backup server a message sent by a client.

GRANT_TASKS = 'GRANT_TASKS'

A server sends this message to a client to assign to the latter tasks for execution.

APPLY_DOMINO_EFFECT = 'APPLY_DOMINO_EFFECT'

A server sends this message to a client to inform the latter of a task that timed out at another client.

NO_FURTHER_TASKS = 'NO_FURTHER_TASKS'

A server sends this message to a client, in response to requesting tasks for execution, to inform the client that there are no more tasks to be assigned.

STOP = 'STOP'

A server sends this message to a client to inform the latter that it should not send to the servers any messages aside from health updates.

RESUME = 'RESUME'

A server sends this message to a client to inform the latter that it may resume sending messages to the servers, thereby cancelling the effect of the previous STOP message.

SWAP_QUEUES = 'SWAP_QUEUES'

The primary server sends this message to a client to inform the latter that it should swap the queues for communicating with the primary and the backup server. This happens when the backup server assumes the primary server role following the failure of the former primary server.

WORKER_STARTED = 'STARTED'

A worker sends this message to its owning client to inform the latter that the worker has started executing the task.

WORKER_DONE = 'DONE'

A worker sends this message to its owning client to inform the latter that the worker has completed executing the task.
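
Note that for the two worker messages the member name and the member value differ. The sketch below copies a subset of the members listed above into a stand-in enum (the class name `MessageTypeSketch` is hypothetical) to show how lookup by value and lookup by name behave with the standard `enum` module.

```python
from enum import Enum


class MessageTypeSketch(Enum):
    # A subset of the documented members, with the same values.
    HEALTH_UPDATE = 'HEALTH_UPDATE'
    WORKER_STARTED = 'STARTED'
    WORKER_DONE = 'DONE'


# Lookup by value uses the string on the right-hand side.
started = MessageTypeSketch('STARTED')

# Lookup by name uses the identifier on the left-hand side.
done = MessageTypeSketch['WORKER_DONE']
```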

get_unused_port() int

Generate a port number that has not yet been used.

Returns:

An unused port number.

Return type:

int
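
One common way to obtain a free port is to let the operating system choose one by binding to port 0. The sketch below shows that technique only; the name `free_port_sketch` is hypothetical, and the documented function may work differently (for example, by keeping a counter).

```python
import socket


def free_port_sketch() -> int:
    """Ask the OS for a currently free TCP port (illustrative only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Port 0 tells the OS to pick any available port.
        s.bind(("", 0))
        return s.getsockname()[1]
```

The port is released when the socket closes, so another process could claim it before the caller binds to it again.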

extended_prefix(role: InstanceRole, prefix: str) str

Compute the prefix for the name for the next cloud instance based on the role.

Parameters:
  • role (InstanceRole) – The role of the new instance.

  • prefix (str) – The prefix to be used in the name.

Returns:

The prefix for the name for the next cloud instance based on the role.

Return type:

str

next_instance_name(role: InstanceRole, prefix: str, instance_id: dict) str

Generate the name for the next cloud instance.

Parameters:
  • role (InstanceRole) – The role of the new instance.

  • prefix (str) – The prefix to be used in the name.

  • instance_id (dict) – The dictionary that maps the role to the id for the next instance.

Returns:

The name for the next cloud instance.

Return type:

str
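
To illustrate how a per-role id dictionary can drive name generation, here is a rough sketch. The name format (`prefix-role-id`), the stand-in enum `RoleSketch`, and the assumption that the id is incremented after each call are all hypothetical; the actual naming scheme is not stated in this reference.

```python
from enum import Enum


class RoleSketch(Enum):
    # Stand-in for InstanceRole, with the documented values.
    PRIMARY_SERVER = 'PRIMARY_SERVER'
    BACKUP_SERVER = 'BACKUP_SERVER'
    CLIENT = 'CLIENT'


def extended_prefix_sketch(role: RoleSketch, prefix: str) -> str:
    # Assumed format: "<prefix>-<short role name>".
    short = 'client' if role is RoleSketch.CLIENT else 'server'
    return f"{prefix}-{short}"


def next_instance_name_sketch(role: RoleSketch, prefix: str, instance_id: dict) -> str:
    name = f"{extended_prefix_sketch(role, prefix)}-{instance_id[role]}"
    # Assumption: advance the id so the next call yields a fresh name.
    instance_id[role] += 1
    return name
```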

get_guest_qs(ip: str, port: int, q_names: List[str]) tuple[queue.Queue]

Get queues owned by another instance, which we call here guest. The caller should handle the exceptions.

Parameters:
  • ip (str) – The IP address of the guest instance.

  • port (int) – The port for connecting to the guest instance.

  • q_names (List[str]) – The names of the queues to be obtained.

Returns:

The requested queues owned by the guest instance.

Return type:

tuple[queue.Queue]

make_manager(q_names: List[str], port: int) SyncManager

Create a manager with the queues with the specified names.

Parameters:
  • port (int) – The port at which to accept connections to the new queues.

  • q_names (List[str]) – The names of the queues to be constructed.

Returns:

The newly created manager.

Return type:

SyncManager
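
The return type suggests the standard `multiprocessing.managers` module. The following sketch shows one way to register named queues on a manager with that module; the function name `make_manager_sketch`, the `authkey` parameter, and the per-call subclass are assumptions, not the documented implementation.

```python
import queue
from multiprocessing.managers import SyncManager
from typing import List


def make_manager_sketch(q_names: List[str], port: int,
                        authkey: bytes = b"sketch") -> SyncManager:
    class QueueManager(SyncManager):
        """Per-call subclass so registrations do not leak between managers."""

    for name in q_names:
        q = queue.Queue()
        # Bind q as a default argument so each name keeps its own queue.
        QueueManager.register(name, callable=lambda q=q: q)

    # The manager is only constructed here; nothing listens until start().
    return QueueManager(address=("", port), authkey=authkey)
```

With this pattern, the owning instance would call `start()` on the returned manager, and a guest would register the same names without a callable on its own manager class and call `connect()` to obtain proxies to the queues. On platforms that spawn rather than fork processes, the locally defined lambdas may not be picklable, so `start()` may fail there.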

handshake(my_role: InstanceRole, my_port1: int, my_port2: Optional[int] = None)

Perform handshake with the primary server.

Parameters:
  • my_role (InstanceRole) – The role of the current instance.

  • my_port1 (int) – The port at which messages from the primary server will be received.

  • my_port2 (int, optional) – The port at which messages from the backup server will be received. This argument will be None when invoked at the backup server; defaults to None.

handle_exception(e: Exception, msg: str, exit_flag: bool = True, to_primary_q: Optional[Queue] = None)

Print the custom error message and the exception, optionally send an exception event to the primary server, and, again optionally, exit.

Parameters:
  • e (Exception) – The exception.

  • msg (str) – The custom error message.

  • exit_flag (bool, optional) – Whether the script should terminate; defaults to True.

  • to_primary_q (queue.Queue, optional) – The queue for communication with the primary server or None; defaults to None.

get_project_root() Path

Return the root folder of ExpoCloud.

Returns:

The root folder of ExpoCloud.

Return type:

Path

my_name() str

Return the name of the current instance.

Returns:

The name of the current instance.

Return type:

str

my_ip() str

Return the IP address of the current instance.

Returns:

The IP address of the current instance.

Return type:

str

output_folder(instance_name: Optional[str] = None) str

Return the output folder for the given instance.

Parameters:

instance_name (str, optional) – The name of the instance or None for the current instance; defaults to None.

Returns:

The output folder for the given instance.

Return type:

str

pickled_file_name(path: str) str

Return the path including the file name for the pickled server object to be located at the given path.

Parameters:

path (str) – The path at which the pickled server object is to be located.

Returns:

The path including the file name for the pickled server object.

Return type:

str

ssh_command(ip: str, command: str) str

Construct the shell command for executing the given command remotely using ssh.

Parameters:
  • ip (str) – The IP address of the instance at which the command is to be executed.

  • command (str) – The command to be executed remotely.

Returns:

The shell command for executing the given command remotely using ssh.

Return type:

str

scp_command(ip: str, source_folder: str, dest_folder: str) str

Construct the shell command for copying the given local folder to the given remote folder.

Parameters:
  • ip (str) – The IP address of the destination instance.

  • source_folder (str) – The local folder.

  • dest_folder (str) – The destination folder.

Returns:

The shell command for copying the given local folder to the given remote folder.

Return type:

str
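
As an illustration of what such command builders might produce, here is a minimal sketch for both `ssh_command` and `scp_command`. The exact flags and quoting used by ExpoCloud are not given in this reference, so the function names and the output format below are assumptions.

```python
import shlex


def ssh_command_sketch(ip: str, command: str) -> str:
    # shlex.quote protects the remote command from local shell expansion.
    return f"ssh {ip} {shlex.quote(command)}"


def scp_command_sketch(ip: str, source_folder: str, dest_folder: str) -> str:
    # -r copies the folder recursively.
    return f"scp -r {shlex.quote(source_folder)} {ip}:{shlex.quote(dest_folder)}"
```

Building the command as a string, rather than running it directly, keeps these helpers free of side effects and lets a retry wrapper decide how to execute them.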

attempt_command(command: str, n_attempts: int = 3) Optional[int]

Perform the given number of attempts at executing the given shell command. As soon as an attempt succeeds (i.e. no exception is thrown), return 0. If no attempt succeeds, return None.

Parameters:
  • command (str) – The shell command to be executed.

  • n_attempts (int, optional) – The number of attempts, defaults to 3.

Returns:

The status, either 0 or None.

Return type:

Union[int, None]
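
The retry behaviour described above can be sketched as below. The name `attempt_command_sketch` is hypothetical, and the use of `subprocess.run` with `check=True` is an assumption about how a failed attempt is detected.

```python
import subprocess
from typing import Optional


def attempt_command_sketch(command: str, n_attempts: int = 3) -> Optional[int]:
    for _ in range(n_attempts):
        try:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(command, shell=True, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            return 0  # the first successful attempt ends the loop
        except Exception:
            continue  # any exception counts as a failed attempt
    return None
```

Because the result is either 0 or None, callers should test it with `is None` rather than by truthiness, since 0 is falsy in Python.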

remote_execute(ip: str, command: str) Optional[int]

Remotely execute the given command. Uses attempt_command to make several attempts.

Parameters:
  • ip (str) – The IP address of the instance at which the command is to be executed.

  • command (str) – The command to be executed remotely.

Returns:

The value returned by attempt_command.

Return type:

Union[int, None]

remote_replace(ip, source_folder, dest_folder) Optional[int]

Copy the given local folder to the given remote folder. Uses attempt_command to make several attempts.

Parameters:
  • ip (str) – The IP address of the destination instance.

  • source_folder (str) – The local folder.

  • dest_folder (str) – The destination folder.

Returns:

The value returned by attempt_command.

Return type:

Union[int, None]

filter_indices(arr: list, cond: Callable) list

Filter elements of the given list, so only indices satisfying the given predicate remain.

Parameters:
  • arr (list) – The list to be filtered.

  • cond (Callable) – The callable taking a list element and returning a Boolean indicating whether the element should be present in the filtered list.

Returns:

The filtered list.

Return type:

list

list2str(assignment: list, sep=';') str

Convert list to string using the given separator.

Parameters:
  • assignment (list) – The list to be converted.

  • sep (str, optional) – The separator, defaults to ';'.

Returns:

The string representation of the list.

Return type:

str

set2str(s: set, sep=';') str

Convert set to string using the given separator. The elements are sorted.

Parameters:
  • s (set) – The set to be converted.

  • sep (str, optional) – The separator, defaults to ';'.

Returns:

The string representation of the set.

Return type:

str
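
A minimal sketch of `list2str` and `set2str` follows, assuming elements are converted with `str` and joined with the separator. The function names are hypothetical and the real implementations may format elements differently.

```python
def list2str_sketch(items: list, sep: str = ';') -> str:
    # Keep the list's own order.
    return sep.join(str(x) for x in items)


def set2str_sketch(s: set, sep: str = ';') -> str:
    # Sort first so the output does not depend on set iteration order.
    return sep.join(str(x) for x in sorted(s))
```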

all_lt(t1: tuple, t2: tuple) bool

Returns True if t1 and t2 are of same length and all elements of t1 are strictly smaller than the corresponding elements of t2.

Parameters:
  • t1 (tuple) – The first tuple being compared.

  • t2 (tuple) – The second tuple being compared.

Returns:

True if t1 and t2 are of same length and all elements of t1 are strictly smaller than the corresponding elements of t2; otherwise False.

Return type:

bool

all_le(t1, t2)

Returns True if t1 and t2 are of the same length and all elements of t1 are smaller than or equal to the corresponding elements of t2.

Parameters:
  • t1 (tuple) – The first tuple being compared.

  • t2 (tuple) – The second tuple being compared.

Returns:

True if t1 and t2 are of the same length and all elements of t1 are smaller than or equal to the corresponding elements of t2; otherwise False.

Return type:

bool
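
Both `all_lt` and `all_le` compare tuples element by element, which differs from Python's built-in tuple comparison (that one is lexicographic). A minimal sketch, with hypothetical function names:

```python
def all_lt_sketch(t1: tuple, t2: tuple) -> bool:
    # Same length, and every element of t1 strictly below its counterpart.
    return len(t1) == len(t2) and all(a < b for a, b in zip(t1, t2))


def all_le_sketch(t1: tuple, t2: tuple) -> bool:
    # Same length, and every element of t1 at most its counterpart.
    return len(t1) == len(t2) and all(a <= b for a, b in zip(t1, t2))
```

For example, `(1, 5) < (2, 3)` is True under the built-in lexicographic order, whereas the element-wise check rejects the pair because 5 is not smaller than 3.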

filter_out(t1: Tuple, t2: Tuple) Tuple

Return the tuple of elements in t1 that are not in t2.

Parameters:
  • t1 (Tuple) – The tuple being filtered.

  • t2 (Tuple) – The tuple being subtracted.

Returns:

The tuple of elements in t1 that are not in t2.

Return type:

Tuple
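
A minimal sketch of the filtering described above, assuming the order and duplicates of the remaining elements of t1 are preserved (the function name is hypothetical):

```python
from typing import Tuple


def filter_out_sketch(t1: Tuple, t2: Tuple) -> Tuple:
    # Keep each element of t1 that does not occur anywhere in t2.
    return tuple(x for x in t1 if x not in t2)
```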

tuple_to_csv(t: Tuple) str

Return the string of comma-separated values based on the given tuple.

Parameters:

t (Tuple) – The given tuple.

Returns:

The string of comma-separated values based on the given tuple.

Return type:

str
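
A minimal sketch of such a conversion, assuming a bare comma with no surrounding space as the separator and `str` for element formatting (the function name is hypothetical, and both choices are assumptions):

```python
from typing import Tuple


def tuple_to_csv_sketch(t: Tuple) -> str:
    # Convert each element to text, then join with commas.
    return ",".join(str(x) for x in t)
```

This sketch does not escape commas or quotes inside elements; the standard `csv` module would be the safer choice if elements may contain them.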