Termination Protocol

In a distributed environment, failures can occur at any time: hardware faults, loss of connectivity between nodes, the unplanned withdrawal of a node for corrective maintenance, and many others. Most of the time, the HSM's replication mechanism recovers automatically and keeps the pool running. Some situations do require operator intervention, but even then the routines implemented by the HSM keep the procedure simple and the administrative cost low. The procedure can be as simple as remotely giving the HSM management console the IP address of a node that has been taken down for maintenance; the node to which the management console is connected receives the information and replicates it to the other nodes, which reduces service downtime. This remote failure notification uses a protocol called the Termination Protocol (TP).

One use case for the Termination Protocol is a failure during the Two-Phase Commit replication protocol that leaves the pool waiting for the recovery of a node that is known not to be coming back.
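
The blocked state described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the HSM's implementation: the Transaction class, its methods, and the example IP addresses are hypothetical names invented for illustration. It shows a two-phase commit round that stays in a waiting state while one participant is silent, and that can reach a decision once that participant is declared down.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Toy two-phase commit round (hypothetical, not the HSM's data model)."""

    participants: set[str]
    votes: dict[str, bool] = field(default_factory=dict)

    def record_vote(self, node_ip: str, commit: bool) -> None:
        # Only nodes that are part of the round may vote.
        if node_ip in self.participants:
            self.votes[node_ip] = commit

    def mark_node_down(self, node_ip: str) -> None:
        # Assumed effect of a Node Down notification: the node leaves the
        # round, so the coordinator stops waiting for its vote.
        self.participants.discard(node_ip)
        self.votes.pop(node_ip, None)

    def decision(self) -> str:
        # The round blocks while any remaining participant has not voted.
        pending = self.participants - set(self.votes)
        if pending:
            return "waiting"
        return "commit" if all(self.votes.values()) else "abort"


tx = Transaction({"10.0.0.1", "10.0.0.2", "10.0.0.3"})
tx.record_vote("10.0.0.1", True)
tx.record_vote("10.0.0.2", True)
state_before = tx.decision()   # 10.0.0.3 never answers, so the round blocks
tx.mark_node_down("10.0.0.3")  # operator declares the node down
state_after = tx.decision()
```

In this model the round cannot leave the waiting state on its own; only the explicit notification that the silent node will not return lets the remaining nodes finish.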

The Termination Protocol service will not accept a Node Down notification for the IP address of a node that is still operational.
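
The rule above, together with the replication of the notification to the remaining nodes, can be sketched in Python as follows. Everything here is an assumption made for illustration: notify_node_down, NodeStillOperational, and the is_reachable callback are hypothetical names, and the way the HSM actually checks whether a node is operational is not described in this section.

```python
from __future__ import annotations

from typing import Callable


class NodeStillOperational(Exception):
    """Raised when a Node Down notification targets a live node."""


def notify_node_down(
    node_ip: str,
    pool: set[str],
    is_reachable: Callable[[str], bool],
) -> list[str]:
    """Validate a Node Down notification and return the peers to inform.

    Hypothetical sketch: `is_reachable` stands in for whatever liveness
    check the service performs before accepting the notification.
    """
    if node_ip not in pool:
        raise KeyError(f"{node_ip} is not a member of the pool")
    if is_reachable(node_ip):
        # Refuse the notification: the node is operational.
        raise NodeStillOperational(node_ip)
    pool.discard(node_ip)
    # The remaining nodes are the ones the notification is replicated to.
    return sorted(pool)


pool = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}


def reachable(ip: str) -> bool:
    # Example liveness check: only 10.0.0.3 is considered down.
    return ip != "10.0.0.3"


peers = notify_node_down("10.0.0.3", pool, reachable)
```

Checking liveness before accepting the notification guards against an operator mistyping an address and removing a healthy node from the pool.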

For scheduled downtime or preventive maintenance, a node can be removed from the pool through the Local Console, which is equally simple and involves no downtime.

Termination Protocol functions, such as the Node Down notification, are performed in the HSM Remote Console.