Termination Protocol

In a distributed environment, failures can occur at any time: hardware faults, loss of connectivity between nodes, the unplanned removal of a node for corrective maintenance, and others. Most of the time, the HSM's replication mechanism recovers automatically and keeps the pool running. Some situations do require operator intervention, but even then the routines implemented by the HSM keep the procedure simple and the administrative cost low. The procedure can be as simple as using the HSM management console remotely to report the IP address of a node that has been taken down for maintenance; the node to which the management console is connected receives the information and replicates it to the other nodes, which reduces service downtime. This remote failure notification uses a protocol called the Termination Protocol (TP).

---
title: Pool blocked by a node failure
---

%%{ init: { 'flowchart': { 'curve': 'basis' } } }%%

flowchart TD

    classDef red_s stroke:#f00
    hsm1[HSM 1]
    hsm2[HSM 2]
    hsm3[HSM 3]
    hsmn[HSM n]
    db[(repl)]:::red_s
    u1((1))

    hsm1 -.read only..- db
    hsm2 -.read only..- db
    hsm3 -.X..- db
    hsmn -.read only..- db

    u1 --Notify node 3 down--> hsm1

    linkStyle 0,1,2,3,4 stroke-width:1px;
    style db stroke:#f66,stroke-width:1px,stroke-dasharray: 2 2
    style hsm3 stroke:#f66,stroke-width:2px
---
title: Termination notification
---

%%{ init: { 'flowchart': { 'curve': 'basis' } } }%%

flowchart TD

    classDef red_s stroke:#f00
    hsm1[HSM 1]
    hsm2[HSM 2]
    hsm3[HSM 3]
    hsmn[HSM n]

    hsm1 --HSM 3 is down--> hsm2
    hsm1 ~~~ hsm3
    hsm1 --HSM 3 is down--> hsmn

    linkStyle 0,2 stroke-width:1px;
    style hsm3 stroke:#f66,stroke-width:2px
---
title: Pool released after a Node Down
---

%%{ init: { 'flowchart': { 'curve': 'basis' } } }%%

flowchart TD

    classDef red_s stroke:#f00
    hsm1[HSM 1]
    hsm2[HSM 2]
    hsmn[HSM n]
    db[(repl)]:::red_s

    hsm1 -..- db
    hsm2 -..- db
    hsmn -..- db

    linkStyle 0,1,2 stroke-width:1px;
    style db stroke:#f66,stroke-width:1px,stroke-dasharray: 2 2

A typical use case for the Termination Protocol is a failure during the Two-Phase Commit replication protocol that leaves the pool waiting for the recovery of a node that is known not to be coming back.
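
The blocked-pool case described above can be illustrated with a toy model. The sketch below is not the HSM's implementation; `Participant`, `Coordinator`, `commit_round`, and `node_down` are hypothetical names chosen for illustration. It only shows why a Two-Phase Commit round stalls while a dead node is still listed in the pool, and why removing that node lets the round complete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    """Hypothetical pool member; `alive` models whether it can answer."""
    name: str
    alive: bool = True

    def prepare(self) -> Optional[bool]:
        # A dead participant never votes; model the missing vote as None.
        return True if self.alive else None


@dataclass
class Coordinator:
    """Toy Two-Phase Commit coordinator (illustrative only)."""
    participants: List[Participant]

    def commit_round(self) -> str:
        votes = [p.prepare() for p in self.participants]
        if any(v is None for v in votes):
            # At least one vote is missing, so the round cannot be decided.
            return "blocked"
        return "committed" if all(votes) else "aborted"

    def node_down(self, name: str) -> None:
        # Assumed effect of a Node Down notification: drop the node
        # from the set of participants the coordinator waits for.
        self.participants = [p for p in self.participants if p.name != name]


pool = Coordinator([
    Participant("hsm1"),
    Participant("hsm2"),
    Participant("hsm3", alive=False),  # the node that will not return
])

before = pool.commit_round()   # hsm3 never votes
pool.node_down("hsm3")         # operator reports the failure
after = pool.commit_round()    # remaining nodes can now decide

print(before, after)
```

In this model the pool stays blocked for as long as the dead node is expected to vote, which is the situation the Node Down notification is meant to resolve.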

The Termination Protocol service will not accept a Node Down notification for an IP that is operational.
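
The acceptance rule above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the HSM's API: `handle_node_down`, `NodeDownResult`, and the injected `is_operational` probe are hypothetical, and the addresses come from the documentation range 192.0.2.0/24. How the real service decides that a node is operational is not described here, so the probe is passed in as a parameter.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class NodeDownResult:
    accepted: bool        # whether the notification was accepted
    pool: List[str]       # pool membership after handling the notification
    notified: List[str]   # peers the receiving node would inform


def handle_node_down(
    pool: List[str],
    receiver_ip: str,
    down_ip: str,
    is_operational: Callable[[str], bool],
) -> NodeDownResult:
    """Hypothetical handling of a Node Down notification on one node."""
    if is_operational(down_ip):
        # The reported node still answers: refuse the notification
        # and leave the pool unchanged.
        return NodeDownResult(False, list(pool), [])
    remaining = [ip for ip in pool if ip != down_ip]
    # The receiving node replicates the information to the other nodes.
    peers = [ip for ip in remaining if ip != receiver_ip]
    return NodeDownResult(True, remaining, peers)


pool_ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
unreachable = {"192.0.2.3"}


def probe(ip: str) -> bool:
    # Stand-in for a real health check (assumption for this example).
    return ip not in unreachable


rejected = handle_node_down(pool_ips, "192.0.2.1", "192.0.2.2", probe)
accepted = handle_node_down(pool_ips, "192.0.2.1", "192.0.2.3", probe)
```

Here the notification for `192.0.2.2` is refused because the probe reports that node as operational, while the one for `192.0.2.3` is accepted and would be passed on to the remaining peer.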

For scheduled downtime or preventive maintenance, a node can instead be removed from the pool through the Local Console, a procedure that is equally simple and causes no downtime.

Termination Protocol functions, such as the Node Down notification, are performed from the HSM Remote Console.