Replication

Info

Replication is available on devices with firmware from version 3 onwards.

The HSM has a system that automatically replicates operations between HSMs, working as a distributed database: all HSMs participating in the replication maintain a unified view of the database, including keys and user partitions.

The mechanism works in an entirely distributed and decentralized way in multi-master mode, i.e. there is no fixed differentiation of roles or attributes between the HSMs. All replication takes place transparently to the application; once the HSMs have been configured, no action on the part of the developer or HSM operator is required. There are also no configuration files or controls outside the HSM, since all the information needed to operate replication is stored in the HSM itself.

The HSMs participating in the replication scheme are called Nodes and the set of HSMs communicating to maintain the replicated database is called the Replication Domain (or pool).

To participate in a Replication Domain, HSMs must use the same Server Master Key (SVMK) and be in the same operating mode, as well as having point-to-point connectivity, i.e. all HSMs must be able to communicate with all other HSMs. The replication service port is the same as the general HSM service (TCP 4433). It is the same SVMK requirement that allows secure communication between HSMs during replication. All communication between HSMs in the replication protocol is encrypted.
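
Since every HSM must reach every other HSM on the service port, a quick TCP reachability check from any host can help validate the topology before configuring replication. This is a generic check, not an HSM tool, and the addresses below are placeholders.

```python
# Generic TCP reachability check for the replication/service port (4433).
# Not an HSM utility; the addresses below are placeholders.
import socket

HSM_NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
SERVICE_PORT = 4433

for host in HSM_NODES:
    try:
        with socket.create_connection((host, SERVICE_PORT), timeout=3):
            print(f"{host}:{SERVICE_PORT} reachable")
    except OSError as exc:
        print(f"{host}:{SERVICE_PORT} NOT reachable ({exc})")
```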

The operations for creating, destroying and updating keys and objects in the HSM base are replicated, as are user partition operations. Read operations and operations with temporary keys and objects are not replicated, as these only exist during the communication session between a client and the HSM; once the session is over, the temporary objects are automatically removed.

The replication mechanism is always triggered from the HSM where the operation is requested, either by a client session or by the operator. The HSM that starts the process is called the Coordinator and the others are called Participants. As the system is decentralized, the Coordinator and Participant roles are assigned for each new operation and can fall to any HSM in the pool for a given operation.

The communication protocol used is Two Phase Commit (2PC). This distributed transaction protocol works in two stages. In the first, the transaction Coordinator asks the Participants to vote on a proposed transaction; each Participant receives and analyzes the transaction and, if it is able to carry it out, replies to the Coordinator that it is ready to commit. Once the Coordinator has received a vote from every Participant, it decides, based on the votes, whether the transaction should be committed or canceled. If all Participants voted to commit, the Coordinator sends a commit message to all the nodes and they all apply the change to their local databases. If any Participant voted negatively, the Coordinator sends a rollback message to the Participants and the transaction is canceled, meaning all nodes keep their databases in the original state they were in before the Coordinator's request for a vote.

```mermaid
---
title: Two Phase Commit protocol
---

%%{ init: { 'flowchart': { 'curve': 'basis' } } }%%

flowchart LR

    classDef red_s stroke:#f00

    Coord[Coordinator]
    Part1[Participant]
    Part2[Participant]
    Part3[Participant]

    Coord -.prepare.-> Part1
    Part1 -.vote.-> Coord
    Coord -.-> Part2
    Part2 -.-> Coord
    Coord -.-> Part3
    Part3 -.-> Coord

    Coord2[Coordinator]
    Part1.2[Participant]
    Part2.2[Participant]
    Part3.2[Participant]

    Coord2 -.commit/rollback.-> Part1.2
    Part1.2 -.ack.-> Coord2
    Coord2 -.-> Part2.2
    Part2.2 -.-> Coord2
    Coord2 -.-> Part3.2
    Part3.2 -.-> Coord2

    linkStyle 0,2,4,6,8,10 stroke:#f33,stroke-width:1px;
    linkStyle 1,3,5,7,9,11 stroke:#77f,stroke-width:1px;
```
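
The decision logic of the protocol can be summarized in a short sketch. This is a generic illustration of 2PC as described above, not the HSM's internal implementation; the Participant class and the two_phase_commit function are hypothetical.

```python
# Minimal sketch of the Two Phase Commit exchange described above.
# Illustrative only: the Participant class and two_phase_commit() are
# hypothetical, not the HSM's internal implementation or API.

class Participant:
    def __init__(self, name):
        self.name = name
        self.db = {}          # committed local database
        self.pending = None   # transaction staged while awaiting the decision

    def prepare(self, txn):
        """Phase 1: analyze the proposed transaction and vote."""
        if self.can_apply(txn):
            self.pending = txn
            return "ready"    # vote to commit
        return "abort"        # vote to cancel

    def can_apply(self, txn):
        return True           # placeholder for local checks (space, conflicts, ...)

    def commit(self):
        """Phase 2, commit: apply the staged change to the local database."""
        self.db.update(self.pending)
        self.pending = None

    def rollback(self):
        """Phase 2, rollback: discard the staged change, keep the original state."""
        self.pending = None


def two_phase_commit(participants, txn):
    """Run by the Coordinator (the HSM where the operation was requested)."""
    # Phase 1: ask every Participant to vote on the proposed transaction.
    votes = [p.prepare(txn) for p in participants]
    # Phase 2: commit only if every vote is "ready"; otherwise roll back everywhere.
    if all(v == "ready" for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled back"


nodes = [Participant(f"HSM-{i}") for i in range(1, 4)]
print(two_phase_commit(nodes, {"key-id": "new key object"}))   # -> committed
```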

Info

The pool of HSMs in a replication domain is by definition always synchronized. Modelled on the CAP theorem (Consistency, Availability, Partition Tolerance), which states that a distributed system can provide at most two of these three characteristics at the same time, HSM replication is characterized as CP (Consistency and Partition Tolerance). To the outside world, the only sign that the pool may be in an inconsistent state is the return of busy to the application, but even in this case consistency is preserved, as all nodes will be in the same busy state. There are no intermediate states between transactions: either the pool carries out the transaction as a whole or not at all.

If a node fails and cannot communicate with the Coordinator in the middle of a transaction, then as soon as the node recovers it needs to learn the Coordinator's decision on the pending transaction. The replication mechanism uses the Presumed Abort optimization: when the Coordinator decides to abort a transaction, it sends a message to the participating nodes, does not wait for a response, and simply deletes the record of that transaction from its database. During the recovery process after a failure, if the participating node consults the Coordinator and finds no record of the pending transaction, the transaction is presumed aborted and the node completes its recovery by aborting it.
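
A minimal sketch of the presumed-abort rule applied during participant recovery follows; the dictionary standing in for the Coordinator's transaction records and the function name are hypothetical, used only to make the rule concrete.

```python
# Presumed Abort during participant recovery (illustrative sketch only).
# coordinator_records stands in for consulting the Coordinator over the network.

def recover_pending(pending_txn_guid: str, coordinator_records: dict) -> str:
    decision = coordinator_records.get(pending_txn_guid)
    if decision == "commit":
        return "apply the staged change and finish recovery"
    # No record found: the Coordinator deletes aborted transactions without
    # waiting for responses, so the absence of a record means the transaction
    # is presumed aborted and recovery completes by discarding the change.
    return "discard the staged change (presumed abort)"

print(recover_pending("guid-42", coordinator_records={}))      # -> presumed abort
print(recover_pending("guid-42", {"guid-42": "commit"}))       # -> commit on recovery
```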

From the point of view of the calling application, the HSMs participating in the Replication pool remain distinct entities, each accessed by its own IP address. Load balancing and session caching functionalities, for example, are not affected by the replication mechanism. What matters to the user or calling application is that, with HSMs operating in replication, once an operation has been carried out in one of the HSMs it is replicated immediately to all the other HSMs in the Domain; when opening a new session in a different HSM, the result of the transaction will be there, just as if the user were opening the new session in the original HSM. For example, if a user creates a key in HSM A, closes the session and then opens a new session in HSM B, the key created in the first session will be available in the new session, i.e. it will exist in both HSM A and HSM B.

All replication transactions in the HSM are given a unique identifier (a GUID, Globally Unique Identifier), and the logs in all HSMs record the same transaction with the same GUID. Each time a replicated transaction is successfully completed, a value called a Sync Point is generated, which takes into account both the GUID of the current transaction and that of the last transaction, taking advantage of a positive avalanche effect. The HSMs in the pool will be synchronized at any given time if all the nodes have the same Sync Point value.
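
The chaining idea can be illustrated with an ordinary hash function; the use of SHA-256 below is an assumption for illustration only, not the HSM's documented derivation.

```python
# Illustration of a chained Sync Point: each value depends on the GUID of the
# current transaction and on the previous value, so a single divergent
# transaction makes all later Sync Points differ (avalanche effect).
# SHA-256 is an assumption here; the HSM's actual derivation is not documented.
import hashlib
import uuid

def next_sync_point(previous_sync_point: bytes, txn_guid: uuid.UUID) -> bytes:
    return hashlib.sha256(previous_sync_point + txn_guid.bytes).digest()

sync_a = sync_b = b"\x00" * 32               # two nodes start synchronized
for _ in range(3):                           # three replicated transactions
    guid = uuid.uuid4()
    sync_a = next_sync_point(sync_a, guid)   # both nodes see the same GUIDs...
    sync_b = next_sync_point(sync_b, guid)

print(sync_a == sync_b)                      # True: the pool is synchronized

sync_b = next_sync_point(sync_b, uuid.uuid4())  # one divergent transaction
print(sync_a == sync_b)                      # False: the node must be resynchronized
```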

If any of the nodes has a different Sync Point, not only will that node be unable to replicate with the others, but the nodes that have it in their node lists will also be unable to replicate, i.e. there is a conflict or database inconsistency and the pool is blocked until the nodes are synchronized again. Most of the time, the replication recovery system will resolve the inconsistency automatically. In cases where a node leaves the pool permanently, for example due to a hardware failure, the operator can resolve the conflict remotely, using the HSM's remote console.

```mermaid
---
title: General HSM replication scheme
---

%%{ init: { 'flowchart': { 'curve': 'basis' } } }%%

flowchart TD

    classDef red_s stroke:#f00
    hsm1[HSM 1]
    hsm2[HSM 2]
    hsm3[HSM 3]
    hsmn[HSM n]
    db[(repl)]:::red_s
    u1((1))
    u2((2))
    u3((3))

    hsm1 -.- db
    hsm2 -.- db
    hsm3 -.- db
    hsmn -.- db

    u1 --> hsm1
    u1 --> hsm2
    u1 --> hsm3

    u2 --> hsm2
    u2 --> hsmn

    u3 --> hsmn

    linkStyle 0,1,2,3,4,5,6,7,8 stroke-width:1px;
    style db stroke:#f66,stroke-width:1px,stroke-dasharray: 2 2
```

Replication has a reflexive property, but not a commutative one. A node always replicates locally with itself, even if it is not part of the Replication Domain. A given node A can replicate its transactions with a node B, which does not necessarily imply that node B replicates its transactions with A; both nodes must have the other configured in the node list.
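
This property can be stated as a simple check over the configured node lists; the dictionary representation and IP addresses below are purely illustrative.

```python
# Replication is reflexive but not commutative: A replicating to B does not
# imply B replicates to A. Illustrative check over configured node lists.

node_lists = {
    "10.0.0.1": ["10.0.0.2", "10.0.0.3"],   # HSM A replicates to B and C
    "10.0.0.2": ["10.0.0.1", "10.0.0.3"],   # HSM B replicates to A and C
    "10.0.0.3": ["10.0.0.1"],               # HSM C replicates only to A
}

def replicates_to(src: str, dst: str) -> bool:
    # Every node always replicates locally with itself (reflexive property).
    return src == dst or dst in node_lists.get(src, [])

print(replicates_to("10.0.0.2", "10.0.0.3"))  # True:  B -> C is configured
print(replicates_to("10.0.0.3", "10.0.0.2"))  # False: C -> B is missing, so the
                                              # relationship must be configured at both ends
```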

Prerequisites

To create a Replication Domain and keep replication running, the HSMs must be activated, configured with the same Server Master Key (SVMK), set to the same operating mode, and have IP connectivity with one another.

Automatic configuration

The HSM uses the Service Location Protocol (SLP, RFCs 2608 and 3224), based on IP multicast, to automatically find replication nodes in the network neighborhood. All nodes, once configured for a Replication Domain, advertise this service, and any node scanning the network can immediately detect which other nodes are in which Domains. This allows very simple and quick configuration of the Replication pool: start a Domain on one of the nodes, and on each of the other nodes simply scan and request inclusion in that Domain, without manual cross-registration of all the IPs on all the nodes. In environments where the use of IP multicast is restricted, configuration can be done manually (see below). The replication operation itself only needs IP connectivity between the HSMs; it does not depend on IP multicast. The SLP protocol is offered as a facilitator for the configuration stage.
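
If multicast is allowed and OpenSLP's slptool is available on a nearby host, SLP discovery can be sanity-checked from that host. The sketch below simply shells out to slptool; the service type string is a placeholder, since the HSM's actual SLP service name is not specified here.

```python
# Sanity-check SLP discovery from a host on the same network segment.
# Requires OpenSLP's slptool; "service:hsm-replication" is a PLACEHOLDER,
# not the HSM's documented service type.
import subprocess

result = subprocess.run(
    ["slptool", "findsrvs", "service:hsm-replication"],
    capture_output=True, text=True, check=False,
)
print(result.stdout or "no services found (or multicast blocked)")
```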

In the first HSM, the Domain must be created. The system will always scan the neighborhood looking for nodes announcing the Replication service; in this first HSM no Domain should be found, so it is necessary to create a new one, simply by entering an identification name for this Domain.

In the following HSMs, after the system scans the neighborhood and finds the Replication Domain that has already been created, it simply joins the HSM to this Domain by choosing the Join option.

Info

When an HSM joins a Domain, its database is overwritten with the database used by the pool at the time of the Join. It is therefore important to choose carefully which HSM in the pool will be used to create the Domain, as this HSM will have its database preserved and then used to overwrite the databases of the other HSMs.

To remove a node from the pool, the reverse process is carried out: first the operator deletes the node from the Domain, and then removes the node from the node lists of the remaining HSMs. As replication is non-commutative, the relationship must be broken at both ends: node A is removed from the Domain and node A is also removed from the lists of the remaining nodes. These activities are carried out locally in the HSM Local Console. In cases where the node to be removed has a problem (such as a communication or hardware failure), it is also possible to use the Remote Console to notify the pool, via any of the nodes, that a given node should be removed. The node that receives the notification takes care of communicating it to all the other nodes on its list, so that they update their node lists and no longer attempt to replicate with the informed node. See the Termination Protocol below.

Attention

A node can be added to or removed from the Replication Domain without requiring pool downtime.

Manual configuration

In scenarios where it is not possible to scan the neighborhood with IP multicast, or where the pool nodes are not found by the scan, the IP addresses of each HSM in the pool must be entered manually, and the online database synchronization operation (Database Live Sync) must also be triggered manually by the operator.

When a node is added to a Domain and successfully performs Database Live Sync, a signal called Sensibilization is sent to all the other nodes in the pool. The role of this signal is to add the address of the new node to the node lists of the HSMs that are already part of the Domain, so that the operator does not have to go back to each HSM after the last node has been configured just to update its node list.

To set up a Domain manually, define the 1st HSM, whose database will be preserved and copied to the other HSMs.

The second HSM needs two steps: add the IP of the 1st HSM and then perform an online database synchronization operation (Database Live Sync). After this, the 1st HSM will be sensitized and will have the IP of the 2nd HSM in its replication list, so the two HSMs will have their databases synchronized and be ready to replicate.

The third HSM requires the same steps: adding the IP of the 1st and 2nd HSMs and then running the online database synchronization (Database Live Sync).

For the fourth and subsequent HSMs, the procedure is the same: add the IPs of all the HSMs already in the Domain and run the database synchronization.

In manual configuration, all the IPs of existing HSMs must be added to the new HSM before base synchronization, otherwise the list of nodes will be incomplete and the pool will become unbalanced, with some nodes not being aware of all the other nodes.
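
The ordering constraint can be expressed as a short sketch; add_node_ip and database_live_sync are hypothetical stand-ins for the operator actions on the Local Console, not real API calls.

```python
# Manual configuration order for a new HSM joining an existing Domain.
# add_node_ip() and database_live_sync() are hypothetical stand-ins for the
# operator actions on the Local Console, not real API calls.

existing_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # HSMs already in the Domain

def add_node_ip(ip: str) -> None:
    print(f"add {ip} to the new HSM's replication node list")

def database_live_sync() -> None:
    print("run Database Live Sync (copies the pool's database to the new HSM "
          "and sensitizes the existing nodes)")

# 1) Register ALL existing nodes first...
for ip in existing_nodes:
    add_node_ip(ip)
# 2) ...and only then synchronize the database; otherwise the node list is
#    incomplete and the pool becomes unbalanced.
database_live_sync()
```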

It is important to note that with each new HSM entering the Domain, only this new HSM needs to be configured; none of the existing HSMs require any operator intervention.

In manual configuration, the name of the Replication Domain is optional; if one is not configured, the status bar of the HSM console (at the bottom of the screen) will show that the HSM has a list of nodes to replicate with, but that no name is defined for the pool.

Removing nodes in the manual configuration is done in the same way as in the automatic configuration (see above).

Client applications

For applications that use the HSM APIs, no changes are required because of replication. In certain scenarios it may be useful for the application to implement specific handling for replication return codes; this could be, for example, resending the request to the HSM.

Due to the distributed nature of replication, during protocol communication between the HSMs the pool can be placed in a blocked (or busy) state for new operations and client requests. This should always last only a very short interval, with the pool released again as soon as the replication protocol ends. This mechanism guarantees that all HSMs, at any given time, will always have the same view of the database, maintaining integrity and coherence for the calling application no matter which node it uses to access the pool.
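
One common pattern is a bounded retry with a short backoff when the pool reports busy. The sketch below is generic: hsm_call and the BUSY value stand in for whatever API call and return code the application actually uses.

```python
# Generic bounded-retry pattern for a "pool busy" return code.
# hsm_call() and BUSY are placeholders for the application's real API call
# and the replication-related return code it receives.
import time

BUSY = "ERR_REPLICATION_BUSY"   # placeholder return code

def call_with_retry(hsm_call, *args, attempts: int = 5, backoff_s: float = 0.2):
    for attempt in range(attempts):
        result = hsm_call(*args)
        if result != BUSY:
            return result
        # The pool is momentarily blocked while the replication protocol runs;
        # wait briefly and resend the request.
        time.sleep(backoff_s * (attempt + 1))
    raise RuntimeError("pool still busy after retries")
```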

Read operations, such as data encryption and decryption, digital signing and verification, are never replicated, so they are not affected by a locked pool condition. As the most common HSM usage scenario involves far more Read operations than Write operations (key generation, for example), the average HSM user is expected to experience few locked pool situations, and very rarely one that requires manual resolution.

Resolution Protocol

An intrinsic feature of a distributed system with a consensus-based protocol (as is the case with Two Phase Commit) is that the system assumes a node never leaves the pool definitively, i.e. any action taken autonomously and internally always assumes that the nodes will return to an operational state. This assumption allows the system to be modeled within certain limits, but it does not hold in every real-world case. Therefore, in certain failure situations, the HSM replication system must offer mechanisms to resolve the failure, not only maintaining the consistency of the pool's database but also providing functions that allow the operator to re-establish the operational level of the pool. One such failure situation is the abrupt departure of a node, for whatever reason. As all the nodes in the pool maintain a list of nodes with which they replicate their transactions, the departure of a node without properly updating the lists of the remaining nodes creates a blocking situation in the pool: any new transaction will return an error because no response is received from the departed node.

To resolve the situation and release the remaining nodes from the outgoing node, which is no longer operational and accessible, the operator must intervene and notify the pool that the node is down and must be deleted from the list of all nodes. The Termination Protocol (TP) allows this notification to be broadcast to the pool using the replication channels themselves, without the need for the operator to go to each HSM Console and update the list. From a Remote Console session, on any of the remaining nodes, the operator notifies this node about the node with the problem (informing it of the IP address) and from then on, the node that received the notification confirms that the IP informed cannot be accessed, and is responsible for transmitting the same notification to the remaining nodes. When each node receives the notification, it updates its list by removing the reported IP and informs the warning node of the result. From then on, the pool is rebuilt, now without the problem node, and without the need for downtime to reconfigure the Replication Domain.