src package¶
This folder contains the implementation of the ExpoCloud framework.
Subpackages¶
Submodules¶
src.abstract_engine module¶
The parent class for all cloud-based compute engines.
- class AbstractEngine¶
Bases:
object
The parent class for all cloud-based compute engines. We assume that the engine supports machine images. The children should implement two methods: create_instance_native for creating the compute instance and kill_instance for terminating a compute instance.
- __init__(config: dict)¶
The constructor.
- Parameters:
config¶ – The configuration dictionary with the following keys:
prefix - the prefix used for the names of instances.
project - the name of the project on the cloud.
server_image - the name of the machine image for a server.
client_image - the name of the machine image for a client.
root_folder - the path to the root folder, e.g. ~/ExpoCloud/.
project_folder - the path to the experiment in dot notation, e.g. 'examples.agent_assignment'.
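As an illustration of the configuration dictionary described above, here is a minimal sketch. Only the key names come from the parameter list; every value is a made-up placeholder, and the `missing_keys` helper is hypothetical rather than part of ExpoCloud.

```python
# Hypothetical values; only the key names come from the documentation above.
config = {
    "prefix": "expo",                               # prefix for instance names
    "project": "my-cloud-project",                  # project name on the cloud
    "server_image": "expocloud-server",             # machine image for a server
    "client_image": "expocloud-client",             # machine image for a client
    "root_folder": "~/ExpoCloud/",                  # path to the root folder
    "project_folder": "examples.agent_assignment",  # experiment, dot notation
}

REQUIRED_KEYS = {
    "prefix", "project", "server_image",
    "client_image", "root_folder", "project_folder",
}


def missing_keys(config: dict) -> set:
    """Return the documented keys that are absent from the given config."""
    return REQUIRED_KEYS - config.keys()
```

A check of this kind before constructing an engine would catch a misspelled key early.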
- is_local() bool ¶
Returns False to signify that this engine is not local.
- Returns:
False
- Return type:
bool
- creation_attempt_allowed() bool ¶
Check whether self.creation_delay seconds have passed since the last attempt of instance creation.
- Returns:
True if self.creation_delay seconds have passed since the last attempt of instance creation and False otherwise.
- Return type:
bool
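The delay check can be sketched as follows. This is an illustration under assumptions, not the framework's code: the `CreationThrottle` class, its `last_attempt` attribute, the `record_attempt` method and the injectable `clock` are all hypothetical; only `creation_delay` and the method name come from the documentation.

```python
import time


class CreationThrottle:
    """Hypothetical helper illustrating the check; not part of ExpoCloud."""

    def __init__(self, creation_delay: float, clock=time.time):
        self.creation_delay = creation_delay  # seconds between attempts
        self.clock = clock                    # injectable for testing
        self.last_attempt = None              # no attempt has been made yet

    def record_attempt(self) -> None:
        """Remember when the latest creation attempt took place."""
        self.last_attempt = self.clock()

    def creation_attempt_allowed(self) -> bool:
        """True if creation_delay seconds have passed since the last attempt."""
        if self.last_attempt is None:
            return True
        return self.clock() - self.last_attempt >= self.creation_delay
```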
- next_instance_name(role: InstanceRole) str ¶
Generate the name for the next instance with the given role.
- Parameters:
role¶ (InstanceRole) – The role of the instance.
- Returns:
The name for the next instance with the given role.
- Return type:
str
- image_name(role: InstanceRole) str ¶
Return the name of the machine image for an instance with the given role.
- Parameters:
role¶ (InstanceRole) – The role of the instance.
- Returns:
The name of the machine image for an instance with the given role.
- Return type:
str
- create_instance(name: str, role: InstanceRole) str ¶
Create an instance with the given role and return its IP address.
- Parameters:
name¶ (str) – The name of the instance to be created.
role¶ (InstanceRole) – The role of the instance to be created.
- Returns:
The IP address of the newly created instance.
- Return type:
str
- run_instance(name: str, ip: str, role: InstanceRole, server_port: int, max_cpus: Optional[int] = None)¶
Run the instance.
- Parameters:
name¶ (str) – The name of the instance to be run.
ip¶ (str) – The IP address of the instance to be run.
role¶ (InstanceRole) – The role of the instance to be run.
server_port¶ (int) – The port for handshaking with the primary server.
max_cpus¶ (int, optional) – The maximum number of workers to be used by the client instance; defaults to None, i.e. unlimited.
- list_instances() List[tuple] ¶
Return the list of tuples (<name>, <ip>, <status>), one for each instance. The particular engine subclasses should implement this method.
- Returns:
The list of tuples (<name>, <ip>, <status>), one for each instance.
- Return type:
List[tuple]
- kill_instance(name: str) str ¶
Terminate the specified instance. The particular engine subclasses should implement this method.
- Parameters:
name¶ (str) – The name of the instance to terminate.
- Returns:
The name of the terminated instance or None in the case of an exception.
- Return type:
str
- kill_dangling_clients(existing_clients: List[str])¶
Kill clients whose name has the given prefix, but is not in the provided list of names. The method relies on the particular engine implementing the list_instances method.
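The selection of dangling clients can be sketched as a pure function over the tuples that list_instances is documented to return. The function name `dangling_clients` and the `client_prefix` argument are assumptions made for this sketch; how the actual method derives the prefix is not shown in this documentation.

```python
from typing import List, Tuple


def dangling_clients(instances: List[Tuple[str, str, str]],
                     client_prefix: str,
                     existing_clients: List[str]) -> List[str]:
    """Names of instances that carry the client prefix but are not known.

    `instances` holds (<name>, <ip>, <status>) tuples, as returned by
    list_instances.
    """
    known = set(existing_clients)
    return [name for name, _ip, _status in instances
            if name.startswith(client_prefix) and name not in known]
```

Each returned name would then be passed to kill_instance.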
src.abstract_task module¶
Defines parent classes for all ProblemInstance, Algorithm and Task classes.
- class AbstractProblemInstance¶
Bases:
object
This is a parent class for ProblemInstance classes, such as examples.agent_assignment.instance.ProblemInstance.
- class AbstractAlgorithm¶
Bases:
object
This is a parent class for Algorithm classes, such as examples.agent_assignment.bnb.Algorithm.
- __init__(options: set, instance: AbstractProblemInstance)¶
The constructor.
- Parameters:
options¶ (set) – The options controlling the behavior of the algorithm.
instance¶ (AbstractProblemInstance) – The problem instance
- class AbstractTask¶
Bases:
object
This is a parent class for Task classes for experiments. See examples.agent_assignment.task.Task for an example.
- class Hardness¶
Bases:
object
The default class to represent hardness of a task. It provides default comparators.
- __init__(params: tuple)¶
The constructor.
- Parameters:
params¶ (tuple) – The tuple of parameters determining the task’s hardness.
- __str__() str ¶
Return the string representation of hardness.
- Returns:
The string representation of hardness.
- Return type:
str
- __repr__() str ¶
Return the string representation of hardness.
- Returns:
The string representation of hardness.
- Return type:
str
- __init__(algorithm: AbstractAlgorithm, timeout: float)¶
The constructor.
- Parameters:
algorithm¶ (AbstractAlgorithm) – The algorithm used to solve the problem instance.
timeout¶ (float) – The deadline for the task in seconds.
- group_parameter_titles() Tuple[str] ¶
Return the tuple of names of parameters that determine groups for counting the number of non-hard instances.
- Returns:
The tuple of names of parameters that determine groups for counting the number of non-hard instances.
- Return type:
Tuple[str]
- group_parameters() tuple ¶
Return the tuple of parameters that determine groups for counting the number of non-hard instances.
- Returns:
The tuple of parameters that determine groups for counting the number of non-hard instances.
- Return type:
tuple
- parameter_titles() Tuple[str] ¶
Return the tuple of names of parameters that characterize the task.
- Returns:
The tuple of names of parameters that characterize the task.
- Return type:
Tuple[str]
- parameters() tuple ¶
Return the tuple of parameters that characterize the task.
- Returns:
The tuple of parameters that characterize the task.
- Return type:
tuple
- hardness_parameters() tuple ¶
Return the tuple of parameters determining the hardness of the task. This is to be used to initialize the Hardness object.
- Returns:
The tuple of parameters determining the hardness of the task.
- Return type:
tuple
- result_titles() Tuple[str] ¶
Return the tuple of names of output values for the solved task.
- Returns:
The tuple of names of output values for the solved task.
- Return type:
Tuple[str]
- run() tuple ¶
Return the tuple of output values for the solved task.
- Returns:
The tuple of output values for the solved task.
- Return type:
tuple
- __str__() str ¶
Return the string representation of the task.
- Returns:
The string representation of the task.
- Return type:
str
- __repr__()¶
Return the string representation of the task.
- Returns:
The string representation of the task.
- Return type:
str
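To show how the methods above fit together, here is a self-contained sketch of a task class. It is hypothetical: `SortTask` and its constructor arguments are invented for illustration, and it does not derive from AbstractTask so that it runs on its own. In an actual experiment the class would derive from AbstractTask and follow its constructor signature.

```python
import random


class SortTask:
    """Hypothetical task: sort n pseudo-random integers."""

    def __init__(self, n: int, seed: int, timeout: float):
        self.n = n              # number of integers to sort (at least 1)
        self.seed = seed        # seed of the pseudo-random generator
        self.timeout = timeout  # deadline for the task in seconds

    def group_parameter_titles(self):
        return ("n",)

    def group_parameters(self):
        return (self.n,)

    def parameter_titles(self):
        return ("n", "seed")

    def parameters(self):
        return (self.n, self.seed)

    def hardness_parameters(self):
        # Assumption of this sketch: a larger input means a harder task.
        return (self.n,)

    def result_titles(self):
        return ("min", "max")

    def run(self):
        rng = random.Random(self.seed)
        data = sorted(rng.randrange(1000) for _ in range(self.n))
        return (data[0], data[-1])
```

Note that each `*_titles` method returns a tuple of the same length as the tuple of values it names.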
src.client module¶
The two classes implementing the client: the Worker and the Client.
- class Worker¶
Bases:
Process
The class representing a worker charged with a task.
- __init__(id: int, task: AbstractTask, queue: Queue)¶
The constructor.
- Parameters:
id¶ (int) – The worker’s id.
task¶ (AbstractTask) – The task to execute.
queue¶ (Queue) – The queue for communicating the result.
- my_kill()¶
Terminate the worker.
- run()¶
Run the worker process executing the task.
- class Client¶
Bases:
object
The main client class.
- __init__()¶
The constructor. In particular, it shakes hands with the primary server and creates the queues for communicating with both the primary and the backup servers.
- message_to_servers(type: MessageType, body: Any)¶
Send the message to both primary and backup servers.
- Parameters:
type¶ (MessageType) – The message type.
body¶ (Any) – The body of the message.
- event_to_servers(descr: str, worker: Optional[Worker] = None, task: Optional[AbstractTask] = None)¶
Send an event to servers.
- Parameters:
descr¶ (str) – The event description.
worker¶ (Worker, optional) – The worker at which the event took place or None if the event is not related to any worker; defaults to None.
task¶ (AbstractTask, optional) – The task to which the event is relevant or None if the event is not relevant to any task; defaults to None.
- process_grant_tasks(tasks: List[AbstractTask])¶
Process tasks granted to this client by the primary server.
- Parameters:
tasks¶ (List[AbstractTask]) – The tasks granted.
- apply_domino_effect(hard: tuple)¶
Apply the domino effect based on the given hardness. Namely:
Terminate the workers that execute tasks with hardness greater than or equal to hard.
Remove the corresponding tasks from self.tasks.
- Parameters:
hard¶ (tuple) – The hardness of the task that timed out as reported by the primary server.
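The filtering step of the domino effect can be sketched as below. Two assumptions are made here: hardness tuples are compared componentwise, and the client's tasks are modelled as a plain dictionary from task id to hardness tuple. The actual Client keeps richer state and also terminates the affected workers.

```python
def at_least_as_hard(h: tuple, hard: tuple) -> bool:
    """Componentwise h >= hard (an assumption about how hardness compares)."""
    return len(h) == len(hard) and all(a >= b for a, b in zip(h, hard))


def apply_domino_effect(tasks: dict, hard: tuple) -> list:
    """Remove the tasks that are at least as hard as `hard`.

    `tasks` maps a task id to its hardness tuple. Returns the removed ids,
    i.e. the tasks whose workers would have to be terminated.
    """
    doomed = [tid for tid, h in tasks.items() if at_least_as_hard(h, hard)]
    for tid in doomed:
        del tasks[tid]
    return doomed
```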
- process_no_further_tasks(_body)¶
Set the flag about having received the NO_FURTHER_TASKS message from the primary server.
- process_stop(_body)¶
Set the flag about having received the STOP message from the primary server.
- process_resume(_body)¶
Unset the flag about having received the STOP message from the primary server and perform the relevant bookkeeping.
- process_swap_queues(_body)¶
Swap the queues for communication with the servers, so that the queues previously used for communication with the backup server are now treated as the queues for communication with the primary server, and vice versa.
- process_messages()¶
Process messages from the servers.
- process_worker_started(worker: Worker, _body)¶
Process the event of the worker having started by sending the corresponding event to the servers.
- process_worker_done(worker: Worker, result: tuple)¶
Process the event of the worker having completed its task by sending both the result and the corresponding event to the servers.
- process_worker_messages()¶
Process messages from workers.
- collect_garbage()¶
Remove objects corresponding to workers that have terminated.
- kill_overdue()¶
Terminate workers whose task has timed out. For each timed out task, send the REPORT_HARD_TASK message to the servers.
- process_workers()¶
Manage the workers by performing three actions:
Process messages from the workers.
Remove objects corresponding to workers that have terminated.
Process workers whose task has timed out.
- health_update()¶
Send health update to the servers.
- request_tasks(n: int)¶
Send a message to the servers requesting n tasks.
- Parameters:
n¶ (int) – The number of tasks to request.
- occupy_workers()¶
Create workers for the tasks assigned to this client, charge the new workers with tasks and start them.
- run()¶
The main loop of the client. The loop is over when there are no more tasks to be assigned to the client and all tasks previously assigned to it are completed. Once the loop is over, send the BYE message to the servers, wait for them to process this message, and complete.
src.constants module¶
- class Verbosity¶
Bases:
object
Constants determining which output is to be printed.
- all = True¶
Output that does not belong to any of the specific topics covered by the other constants.
- instance_creation_etc = True¶
Output detailing the creation of cloud instances.
- workers = False¶
Output detailing the operation of the workers at a client.
- message_sync = True¶
Output detailing the synchronization of messages from the primary and the backup servers to the clients.
- messages = True¶
Output detailing the processing of messages.
- all_non_health_messages = False¶
Output all non-health-report messages received by one of the servers.
- failure_traceback = False¶
Output of the traceback record for failure events.
- command_lines = True¶
Output of the command lines to be executed.
- class Constants¶
Bases:
object
The constants used throughout the framework. All times are in seconds.
- MIN_CREATION_DELAY = 30¶
The initial delay for instance creation.
- INSTANCE_MAX_NON_ACTIVE_TIME = 300¶
If no handshake for this amount of time, kill the instance.
- HEALTH_UPDATE_FREQUENCY = 1¶
Frequency of health updates.
- HEALTH_UPDATE_LIMIT = 60¶
An otherwise active instance that has not reported on its health for this amount of time is considered unhealthy.
- SSH_RETRY_DELAY = 5¶
If ssh fails, attempt again after this delay.
- CLIENTS_TIME_TO_STOP = 5¶
Time for clients to surely stop sending messages following the sending of the STOP event.
- OUTPUT_FOLDER = 'output'¶
The folder for storing the output files.
- PICKLED_SERVER_FILE = 'output/pickled'¶
The file storing the serialized server object.
- SERVER_CYCLE_WAIT = 0.1¶
The delay between two successive iterations of the main server loop.
- CLIENT_CYCLE_WAIT = 0.1¶
The delay between two successive iterations of the main client loop.
- CLIENT_WAIT_AFTER_SENDING_BYE = 5¶
The delay between a client sending the BYE message to the server and completing.
- WORKER_WAIT_AFTER_DONE = 1¶
The delay between a worker reporting the result to the client and completing.
src.instance module¶
Classes for representing a cloud instance at either the primary or the backup server.
- class Instance¶
Bases:
object
The parent class for classes representing a cloud instance at either the primary or the backup server. This class is only for representing a backup server or a client instance. Note that the class for representing a primary server in the backup server does not derive from this class.
- __init__(role: InstanceRole, engine: Union[AbstractEngine, LocalEngine])¶
The constructor.
- Parameters:
role¶ (InstanceRole) – The role of the instance.
engine¶ (Union[AbstractEngine, LocalEngine]) – The engine being used.
- create()¶
Create the cloud instance.
- run(server_port, max_cpus=None)¶
Run the instance.
- is_healthy(tasks_remain: bool)¶
Returns True if the instance is healthy, i.e. one of the following holds:
The instance has no IP address (so that no real instance has been created) and there are tasks remaining.
The instance has an IP address, but is not active (i.e. has not shaken hands with the primary server). For a client instance, we also require that there still be tasks remaining as indicated. For all types of instances, INSTANCE_MAX_NON_ACTIVE_TIME has not passed since its creation.
The instance is active and HEALTH_UPDATE_LIMIT has not passed since the last health update.
- Parameters:
tasks_remain¶ (bool) – True if there are tasks remaining and False otherwise.
- Returns:
True if the instance is healthy and False otherwise.
- Return type:
bool
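The three cases of the health check can be sketched as a standalone function. This is an illustrative reading of the description above, not the framework's implementation: the keyword arguments are hypothetical stand-ins for the state kept by an Instance object, and the two limits are copied from the Constants class.

```python
INSTANCE_MAX_NON_ACTIVE_TIME = 300  # seconds, as listed in Constants
HEALTH_UPDATE_LIMIT = 60            # seconds, as listed in Constants


def is_healthy(*, ip, active, is_client, tasks_remain,
               created_at, last_health_update, now) -> bool:
    """Evaluate the three documented cases; all times are in seconds."""
    if ip is None:
        # Case 1: no real instance has been created yet.
        return tasks_remain
    if not active:
        # Case 2: created, but the handshake has not happened yet.
        if is_client and not tasks_remain:
            return False
        return now - created_at < INSTANCE_MAX_NON_ACTIVE_TIME
    # Case 3: active, so the latest health update must be recent enough.
    return now - last_health_update < HEALTH_UPDATE_LIMIT
```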
- shake_hands()¶
Shake hands with the instance. The common action implemented here is storing the timestamp of when the instance became active, where being active means that the handshake with it has taken place.
- class ClientInstance¶
Bases:
Instance
The class for representing a client instance in either the primary or the backup server.
- __init__(engine: Union[AbstractEngine, LocalEngine], tasks_from_failed: List[AbstractTask])¶
The constructor.
- Parameters:
engine¶ (Union[AbstractEngine, LocalEngine]) – The engine being used.
tasks_from_failed¶ (List[AbstractTask]) – The list in which to store the tasks assigned to the client in the case of failure.
- connect(server_role: InstanceRole)¶
Connect to the appropriate queues of the server.
- Parameters:
server_role¶ (InstanceRole) – The role of the server at which the client instance is being represented.
- init_files(parent_dir: str)¶
Open files for storing events and exceptions from this client.
- Parameters:
parent_dir¶ (str) – The folder to contain the files.
- shake_hands(server_role: InstanceRole, parent_dir: str)¶
Shake hands with the client.
- Parameters:
server_role¶ (InstanceRole) – The role of the server at which the client is being represented.
parent_dir¶ (str) – The folder within which the folder with the files related to this client is to be contained.
- register_tasks(tasks: List[AbstractTask])¶
Register tasks assigned to this client.
- Parameters:
tasks¶ (List[AbstractTask]) – The newly assigned tasks.
- unregister_task(t_id: int)¶
Unregister a given task from this client.
- Parameters:
t_id¶ (int) – The task id.
- unregister_domino(tasks: List[AbstractTask], hardness: tuple)¶
Unregister all tasks that have been previously registered with this client and can now be proven to be hard due to a task with the given hardness having timed out.
- Parameters:
tasks¶ (List[AbstractTask]) – All the tasks.
hardness¶ (tuple) – The hardness of the task that has timed out.
- class BackupServerInstance¶
Bases:
Instance
The class for representing the backup server instance in the primary server.
- __init__(engine: Union[AbstractEngine, LocalEngine])¶
The constructor.
- Parameters:
engine¶ (Union[AbstractEngine, LocalEngine]) – The engine being used.
- shake_hands()¶
Shake hands with the backup server.
- class PrimaryServerInstance¶
Bases:
object
The class for representing the primary server instance in the backup server.
- __init__(my_port: int)¶
The constructor.
- Parameters:
my_port¶ (int) – The port for shaking hands with the primary server.
- is_healthy()¶
Returns True if the primary server is healthy, i.e. HEALTH_UPDATE_LIMIT has not passed since the last health update.
- is_primary(instance: Instance) bool ¶
Determine whether the instance represented by instance is the primary server.
src.run_backup module¶
The script for running the backup server on the backup server instance, which consists of:
Deserializing the primary server object.
Transforming this object to represent the backup server by invoking the assume_backup_role method.
Running the server by invoking the run method.
Note that this script is provided by the framework and is not to be supplied by the user.
src.server module¶
The class implementing the primary and the backup servers.
- class Server¶
Bases:
object
Either the primary or the backup server.
- __init__(tasks: List[AbstractTask], engine: Union[LocalEngine, AbstractEngine], backup: bool, max_clients: Optional[int] = None, max_cpus_per_client: Optional[int] = None, min_group_size: int = 0)¶
The constructor; only ever invoked for building the first primary server.
- Parameters:
tasks¶ (List[AbstractTask]) – The list of tasks to be executed.
engine¶ (Union[LocalEngine, AbstractEngine]) – The compute engine settings.
backup¶ (bool) – Whether or not to use a backup server.
max_clients¶ (int, optional) – The maximal number of client instances or None for no restriction; defaults to None.
max_cpus_per_client¶ (int, optional) – The maximal number of workers to be used at a client instance or None for no restriction; defaults to None.
min_group_size¶ (int, optional) – The minimal size of a group as defined by the task’s group_parameter_titles method; defaults to 0.
- run()¶
The main loop.
- tasks_remain() bool ¶
Check whether there are tasks remaining to be executed.
- Returns:
True if there are tasks remaining to be executed and False otherwise.
- Return type:
bool
- is_hard(hardness: tuple) bool ¶
Check whether hardness is greater than or equal to the hardness of one of the timed out tasks.
- Parameters:
hardness¶ (tuple) – The hardness to be checked.
- Returns:
True if hardness is greater than or equal to the hardness of one of the timed out tasks and False otherwise.
- Return type:
bool
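A minimal sketch of this check, assuming that hardness tuples are compared componentwise in the way all_le from src.util is documented to behave. The `min_hard` argument stands in for the server's collection of hardness tuples of timed out tasks; its exact type in the framework is an assumption.

```python
def all_le(t1: tuple, t2: tuple) -> bool:
    """True if t1 and t2 have the same length and t1[i] <= t2[i] for all i."""
    return len(t1) == len(t2) and all(a <= b for a, b in zip(t1, t2))


def is_hard(hardness: tuple, min_hard: list) -> bool:
    """True if `hardness` is at least as hard as some timed out task."""
    return any(all_le(h, hardness) for h in min_hard)
```

For example, with timed out hardness values (4, 2) and (2, 5), a task of hardness (5, 2) counts as hard, while (3, 3) does not, because it is easier than each recorded value in at least one component.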
- print_results()¶
Restore the original order of tasks and print results.
- is_primary() bool ¶
Check whether the server is the primary server.
- Returns:
True if the server is the primary server and False otherwise.
- Return type:
bool
- is_backup() bool ¶
Check whether the server is the backup server.
- Returns:
True if the server is the backup server and False otherwise.
- Return type:
bool
- assume_backup_role()¶
Assume the backup server role. This method is called after unpickling the primary server object, so as to convert it to a backup server one.
- assume_primary_role()¶
Assume the primary server role. This method is called at the backup server when primary server failure is detected.
- handle_primary_server_failure()¶
Handle the primary server failure. This method is called at the backup server when primary server failure is detected. The handling consists of assuming the primary server role and sending the MessageType.SWAP_QUEUES message to the clients.
- init_handshake_q()¶
Construct the handshake manager and the queue for handshake requests from the backup server and the clients.
- handshake_from_client(name: str, port_primary: int, port_backup: int)¶
Handle handshake request from a client. This method is invoked only at the primary server.
- handshake_from_backup(name: str, port: int)¶
Handle handshake request from the backup server. This method is invoked only at the primary server.
- accept_handshakes()¶
Handle handshake requests from new backup server and client instances. Uses the methods handshake_from_client and handshake_from_backup to handle the respective kinds of requests. This method is invoked only at the primary server.
- n_active_clients() int ¶
Compute the number of currently active client instances. An instance is active if it has shaken hands with the server and has not been terminated since then.
- Returns:
The number of currently active client instances.
- Return type:
int
- get_client(name: str) ClientInstance ¶
Get the object representing the client instance with the given name.
- Parameters:
name¶ (str) – The name of the client instance.
- Returns:
The object representing the client instance or None if there is no client instance with the given name.
- Return type:
ClientInstance
- kill_client(name: str)¶
Terminate the client instance with the given name. If the name does not correspond to any client, do nothing.
- Parameters:
name¶ (str) – The name of the client instance.
- kill_instance(instance: Instance)¶
Terminate the instance represented by instance.
- Parameters:
instance¶ – The object representing the instance to be terminated.
- create_backup_server_instance()¶
Create and run the backup server.
- create_client_instance()¶
Create and run a new client.
- create_instance()¶
Create and run a new instance. The backup server takes precedence: a new client is created only if the backup server already exists or is not used.
- kill_unhealthy_instances()¶
Terminate unhealthy instances. See Instance.is_healthy and PrimaryServerInstance.is_healthy for the definition of a healthy instance.
- stop_clients()¶
Send the MessageType.STOP message to all clients and record the timestamp of stopping the clients.
- resume_clients()¶
Send the MessageType.RESUME message to all clients and reset the timestamp previously set by stop_clients.
- send_health_update()¶
Send health update to the other server.
- message_to_instance(instance: Instance, type: MessageType, body: Any)¶
Send a message to an instance.
- Parameters:
instance¶ (Instance) – The object representing the receiver instance.
type¶ (MessageType) – The message type.
body¶ (Any) – The body of the message.
- messages_waiting(instance: Instance) bool ¶
Check whether a message from the given instance can be read. If the instance is invalid or is not active, return False. If there is no message in the inbound queue from the given instance, return False. Otherwise return True, unless the method is invoked at the backup server, the instance is a client and the corresponding message has not been forwarded by the primary server.
- forward_message(instance: Instance, message_id: int, type: MessageType, body: Any)¶
Forward a message from a client to the backup server. This method is invoked only at the primary server.
- process_health_update(instance: Instance, _body)¶
Record the timestamp of the MessageType.HEALTH_UPDATE message sent by the given instance.
- process_request_tasks(client: ClientInstance, n: int)¶
Process the request for tasks sent by the given client. The method sends either the MessageType.GRANT_TASKS message containing the tasks being granted to the client or the MessageType.NO_FURTHER_TASKS message to indicate that there are no tasks to be assigned.
- Parameters:
client¶ (ClientInstance) – The object representing the client sending the request.
n¶ (int) – The number of tasks requested by the client.
- process_log(client: ClientInstance, descr: str)¶
Log the event related to a task execution by the given client.
- Parameters:
client¶ (ClientInstance) – The object representing the client sending the event.
descr¶ (str) – The description of the event to be logged.
- process_exception(client: ClientInstance, descr: str)¶
Log the exception event sent by the given client.
- Parameters:
client¶ (ClientInstance) – The object representing the client sending the event.
descr¶ (str) – The description of the event to be logged.
- process_result(client: ClientInstance, body: tuple)¶
Process the result of executing a task sent by the given client.
- Parameters:
client¶ (ClientInstance) – The object representing the client sending the result.
body¶ (tuple) – The tuple consisting of the task id and the result.
- process_report_hard_task(_client, task_id: int)¶
If the task specified by task_id is minimally hard, then:
Add the task to self.min_hard.
Send the MessageType.APPLY_DOMINO_EFFECT message to all clients, so they can terminate any task that is as hard or harder than this task.
- Parameters:
task_id¶ – The id of the task that timed out.
- process_bye(client: ClientInstance, _body)¶
Process the MessageType.BYE message sent by the given client by terminating the corresponding client instance.
- Parameters:
client¶ (ClientInstance) – The object representing the client sending the message.
- process_new_client(_instance, body: tuple)¶
Construct a client object corresponding to the client that shook hands with the primary server and about which the primary server sent the MessageType.NEW_CLIENT message. After constructing the object, invoke its shake_hands method. The process_new_client method is only invoked at the backup server.
- Parameters:
body¶ (tuple) – The information about the new client instance.
- process_client_terminated(_instance, client_name: str)¶
Process client failure reported by the primary server. This method is only invoked at the backup server.
- Parameters:
client_name¶ (str) – The name of the terminated client.
- process_message(instance: Instance, type: MessageType, body: Any)¶
Process a message from clients and the other server.
- Parameters:
instance¶ (Instance) – The object representing the sending instance.
type¶ (MessageType) – The type of the message.
body¶ (Any) – The body of the message.
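One plausible way to route a message to the matching process_* method is a dispatch table keyed by message type. The sketch below is an assumption about structure, not the framework's code: `MiniServer` and its `log` and `handlers` attributes are hypothetical, and the enum lists only three of the message types documented in src.util.

```python
from enum import Enum


class MessageType(Enum):
    """A subset of the message types documented in src.util."""
    HEALTH_UPDATE = 'HEALTH_UPDATE'
    REQUEST_TASKS = 'REQUEST_TASKS'
    BYE = 'BYE'


class MiniServer:
    """Hypothetical stand-in for Server that only records what it handled."""

    def __init__(self):
        self.log = []
        # Map each message type to the method that processes it.
        self.handlers = {
            MessageType.HEALTH_UPDATE: self.process_health_update,
            MessageType.REQUEST_TASKS: self.process_request_tasks,
            MessageType.BYE: self.process_bye,
        }

    def process_health_update(self, instance, _body):
        self.log.append(("health", instance))

    def process_request_tasks(self, client, n):
        self.log.append(("request", client, n))

    def process_bye(self, client, _body):
        self.log.append(("bye", client))

    def process_message(self, instance, type, body):
        self.handlers[type](instance, body)
```

All handlers share the (instance, body) calling convention seen in the signatures above, which is what makes a uniform dispatch possible.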
- handle_messages()¶
Handle messages from clients and the other server.
src.util module¶
Utility enums and functions.
- short_timestamp(timestamp: float) str ¶
Generate short string representation of the given timestamp.
- Parameters:
timestamp¶ (float) – The timestamp to render.
- Returns:
The short string representation of the given timestamp.
- Return type:
str
- short_now_str() str ¶
Generate short string representation of the current timestamp.
- Returns:
The short string representation of the current timestamp.
- Return type:
str
- my_print(cond: bool, str: str, err_flag: bool = False)¶
Print the given string if the given condition is True.
- my_eprint(cond: bool, str: str)¶
Print the given string to stderr if the given condition is True.
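The conditional printing helpers pair naturally with the Verbosity constants. The following is a sketch under assumptions: the parameter is named `text` here rather than `str`, the meaning given to `err_flag` (route the output to stderr) is a guess, and the `Verbosity` class reproduces only two of the documented constants.

```python
import sys


class Verbosity:
    """Two of the documented constants, with their documented values."""
    workers = False
    messages = True


def my_print(cond: bool, text: str, err_flag: bool = False) -> None:
    """Print text if cond is True; err_flag is assumed to select stderr."""
    if cond:
        print(text, file=sys.stderr if err_flag else sys.stdout, flush=True)


def my_eprint(cond: bool, text: str) -> None:
    """Print text to stderr if cond is True."""
    my_print(cond, text, err_flag=True)
```

With these definitions, `my_print(Verbosity.messages, "got RESULT")` produces output while `my_print(Verbosity.workers, "worker 3 started")` stays silent.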
- command_arg_ip() str ¶
Get the command line argument representing the ip of the primary server.
- Returns:
The command line argument representing the ip of the primary server.
- Return type:
str
- command_arg_port() int ¶
Get the command line argument representing the port for communicating with the primary server.
- Returns:
The command line argument representing the port for communicating with the primary server.
- Return type:
int
- command_arg_name() str ¶
Get the command line argument representing the name of the current instance.
- Returns:
The command line argument representing the name of the current instance.
- Return type:
str
- command_arg_max_cpus() int ¶
Get the command line argument representing the maximal number of workers that the current client instance is allowed to use. If it is None, return a large positive integer value standing for infinity.
- Returns:
The maximal number of workers that the current client instance is allowed to use.
- Return type:
int
- class InstanceRole¶
Bases:
Enum
The role of instance.
- PRIMARY_SERVER = 'PRIMARY_SERVER'¶
The primary server instance.
- BACKUP_SERVER = 'BACKUP_SERVER'¶
The backup server instance.
- CLIENT = 'CLIENT'¶
A client instance.
- class MessageType¶
Bases:
Enum
Type of a message sent to another instance.
- HEALTH_UPDATE = 'HEALTH_UPDATE'¶
Informs the receiver that the sending instance is healthy.
- REQUEST_TASKS = 'REQUEST_TASKS'¶
A client sends this message to the servers to request tasks for execution.
- RESULT = 'RESULT'¶
A client sends this message to the servers to report on a completed task.
- REPORT_HARD_TASK = 'REPORT_HARD_TASK'¶
A client sends this message to the servers to report on a timed out task.
- LOG = 'LOG'¶
A client sends this message to the servers to report an event related to task execution.
- EXCEPTION = 'EXCEPTION'¶
A client sends this message to the servers to report an exception event.
- BYE = 'BYE'¶
A client sends this message to the servers to report that the client script is about to terminate.
- NEW_CLIENT = 'NEW_CLIENT'¶
The primary server sends this message to the backup server to report that a new client has shaken hands with the primary server.
- CLIENT_TERMINATED = 'CLIENT_TERMINATED'¶
The primary server sends this message to the backup server to report that a client has been terminated.
- MESSAGE_FROM_CLIENT = 'MESSAGE_FROM_CLIENT'¶
The primary server sends this message to forward to the backup server a message sent by a client.
- GRANT_TASKS = 'GRANT_TASKS'¶
A server sends this message to a client to assign to the latter tasks for execution.
- APPLY_DOMINO_EFFECT = 'APPLY_DOMINO_EFFECT'¶
A server sends this message to a client to inform the latter of a task that timed out at another client.
- NO_FURTHER_TASKS = 'NO_FURTHER_TASKS'¶
A server sends this message to a client, in response to requesting tasks for execution, to inform the client that there are no more tasks to be assigned.
- STOP = 'STOP'¶
A server sends this message to a client to inform the latter that it should not send to the servers any messages aside from health updates.
- RESUME = 'RESUME'¶
A server sends this message to a client to inform the latter that it may resume sending messages to the servers, thereby cancelling the effect of the previous STOP message.
- SWAP_QUEUES = 'SWAP_QUEUES'¶
The primary server sends this message to a client to inform the latter that it should swap the queues for communicating with the primary and the backup server. This happens when the backup server assumes the primary server role following the failure of the former primary server.
- WORKER_STARTED = 'STARTED'¶
A worker sends this message to its owning client to inform the latter that the worker has started executing the task.
- WORKER_DONE = 'DONE'¶
A worker sends this message to its owning client to inform the latter that the worker has completed executing the task.
- get_unused_port() int ¶
Generate a yet unused port number.
- Returns:
An unused port number.
- Return type:
int
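A common way to obtain a free port with the standard library is to bind a socket to port 0 and read back the port chosen by the operating system. The sketch below shows that technique; whether the framework's get_unused_port works this way is not stated in this documentation.

```python
import socket


def get_unused_port() -> int:
    """Ask the OS for a currently free TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
        return s.getsockname()[1]  # the port that was actually assigned
```

The port is released when the socket closes, so another process could in principle claim it before it is reused.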
- extended_prefix(role: InstanceRole, prefix: str) str ¶
Compute the prefix for the name for the next cloud instance based on the role.
- Parameters:
role¶ (InstanceRole) – The role of the new instance.
prefix¶ (str) – The prefix to be used in the name.
- Returns:
The prefix for the name for the next cloud instance based on the role.
- Return type:
str
- next_instance_name(role: InstanceRole, prefix: str, instance_id: dict) str ¶
Generate the name for the next cloud instance.
- Parameters:
role¶ (InstanceRole) – The role of the new instance.
prefix¶ (str) – The prefix to be used in the name.
instance_id¶ (dict) – The dictionary that maps the role to the id for the next instance.
- Returns:
The name for the next cloud instance.
- Return type:
str
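The naming helpers can be sketched as follows. The name format (`<prefix>-<role>-<id>`) and the increment of the counter inside the function are assumptions made for illustration; the documentation specifies only the arguments and that a name is returned.

```python
from enum import Enum


class InstanceRole(Enum):
    PRIMARY_SERVER = 'PRIMARY_SERVER'
    BACKUP_SERVER = 'BACKUP_SERVER'
    CLIENT = 'CLIENT'


def extended_prefix(role: InstanceRole, prefix: str) -> str:
    """Hypothetical format: the prefix followed by the role in lower case."""
    return f"{prefix}-{role.value.lower().replace('_', '-')}"


def next_instance_name(role: InstanceRole, prefix: str,
                       instance_id: dict) -> str:
    """Build the next name for the role and advance its counter."""
    name = f"{extended_prefix(role, prefix)}-{instance_id[role]}"
    instance_id[role] += 1  # assumption: the caller's counter is advanced here
    return name
```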
- get_guest_qs(ip: str, port: int, q_names: List[str]) tuple[queue.Queue] ¶
Get queues owned by another instance, which we call here guest. The caller should handle the exceptions.
- make_manager(q_names: List[str], port: int) SyncManager ¶
Create a manager with the queues with the specified names.
- handshake(my_role: InstanceRole, my_port1: int, my_port2: Optional[int] = None)¶
Perform handshake with the primary server.
- Parameters:
my_role¶ (InstanceRole) – The role of the current instance.
my_port1¶ (int) – The port at which messages from the primary server will be received.
my_port2¶ (int, optional) – The port at which messages from the backup server will be received. This argument will be None when invoked at the backup server; defaults to None.
- handle_exception(e: Exception, msg: str, exit_flag: bool = True, to_primary_q: Optional[Queue] = None)¶
Print the custom error message and the exception, optionally send an exception event to the primary server, and, again optionally, exit.
- get_project_root() Path ¶
Return the root folder of ExpoCloud.
- Returns:
The root folder of ExpoCloud.
- Return type:
Path
- my_name() str ¶
Return the name of the current instance.
- Returns:
The name of the current instance.
- Return type:
str
- my_ip() str ¶
Return the IP address of the current instance.
- Returns:
The IP address of the current instance.
- Return type:
str
- output_folder(instance_name: Optional[str] = None) str ¶
Return the output folder for the given instance.
- Parameters:
instance_name¶ (str, optional) – The name of the instance or None for the current instance; defaults to None.
- Returns:
The output folder for the given instance.
- Return type:
str
- pickled_file_name(path: str) str ¶
Return the path including the file name for the pickled server object to be located at the given path.
- Parameters:
path¶ (str) – The path at which the pickled server object is to be located.
- Returns:
The path including the file name for the pickled server object.
- Return type:
str
- ssh_command(ip: str, command: str) str ¶
Construct the shell command for executing the given command remotely using ssh.
- scp_command(ip: str, source_folder: str, dest_folder: str) str ¶
Construct the shell command for copying the given local folder to the given remote folder.
- attempt_command(command: str, n_attempts: int = 3) Optional[int] ¶
Perform the given number of attempts at executing the given shell command. As soon as an attempt succeeds (i.e. no exception is thrown), return 0. If no attempt succeeds, return None.
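The retry behaviour described above can be sketched as a small loop. The `run` parameter and the `run_shell` helper are additions made for this sketch so that the loop can be exercised without a shell; the documented function takes only `command` and `n_attempts`. Any delay between attempts (for example SSH_RETRY_DELAY) is left out.

```python
import subprocess
from typing import Callable, Optional


def run_shell(command: str) -> None:
    """Run the command in a shell; raises CalledProcessError on failure."""
    subprocess.run(command, shell=True, check=True)


def attempt_command(command: str, n_attempts: int = 3,
                    run: Callable[[str], None] = run_shell) -> Optional[int]:
    """Return 0 as soon as an attempt raises no exception, else None."""
    for _ in range(n_attempts):
        try:
            run(command)
            return 0
        except Exception:
            continue  # this attempt failed; try again if attempts remain
    return None
```

Callers such as remote_execute and remote_replace can then treat a None result as a failure of all attempts.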
- remote_execute(ip: str, command: str) Optional[int] ¶
Remotely execute the given command. Uses attempt_command to make several attempts.
- Returns:
The value returned by attempt_command.
- Return type:
Union[int, None]
- remote_replace(ip, source_folder, dest_folder) Optional[int] ¶
Copy the given local folder to the given remote folder. Uses attempt_command to make several attempts.
- Returns:
The value returned by attempt_command.
- Return type:
Union[int, None]
- filter_indices(arr: list, cond: Callable) list ¶
Filter elements of the given list, so only indices satisfying the given predicate remain.
- list2str(assignment: list, sep=';') str ¶
Convert list to string using the given separator.
- set2str(s: set, sep=';') str ¶
Convert set to string using the given separator. The elements are sorted.
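A minimal sketch of the two conversions, assuming that each element is rendered with `str` before joining. The exact rendering used by the framework is not documented here.

```python
def list2str(assignment: list, sep: str = ';') -> str:
    """Join the elements of the list, in order, using the separator."""
    return sep.join(str(x) for x in assignment)


def set2str(s: set, sep: str = ';') -> str:
    """Join the elements of the set in sorted order using the separator."""
    return sep.join(str(x) for x in sorted(s))
```

Sorting makes the output of set2str deterministic, which matters when the string ends up in a results file.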
- all_lt(t1: tuple, t2: tuple) bool ¶
Returns True if t1 and t2 are of the same length and all elements of t1 are strictly smaller than the corresponding elements of t2.
- all_le(t1, t2)¶
Returns True if t1 and t2 are of the same length and all elements of t1 are smaller than or equal to the corresponding elements of t2.
- filter_out(t1: Tuple, t2: Tuple) Tuple ¶
Return the tuple of elements in t1 that are not in t2.
- tuple_to_csv(t: Tuple) str ¶
Return the string of comma-separated values based on the given tuple.
- Parameters:
t¶ (Tuple) – The given tuple.
- Returns:
The string of comma-separated values based on the given tuple.
- Return type:
str
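The last two tuple helpers, filter_out and tuple_to_csv, can be sketched in a few lines. Rendering each value with `str` and joining with a bare comma are assumptions of this sketch; the framework may quote or format values differently.

```python
from typing import Tuple


def filter_out(t1: Tuple, t2: Tuple) -> Tuple:
    """Return the elements of t1 that do not occur in t2, in order."""
    return tuple(x for x in t1 if x not in t2)


def tuple_to_csv(t: Tuple) -> str:
    """Render the tuple as comma-separated values."""
    return ",".join(str(x) for x in t)
```

A typical combination would be to drop some parameter titles and print the rest as a CSV header, e.g. `tuple_to_csv(filter_out(("n", "seed", "id"), ("id",)))`.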