...
- This situation is considered a disaster because all redundant nodes are lost at the same point in time. It requires user intervention.
- Reasons are as follows:
- When the previously active Controller instance is started, it remembers having held this role and asks the JOC Cockpit Cluster Watch for confirmation.
- The newly started JOC Cockpit instance with the Cluster Watch role cannot confirm the Controller's request: it has no memory of the time before the outage and does not know which Controller instance held the active role before the outage occurred. The Cluster Watch cannot confirm the claim of any Controller instance to become active, as this claim can be wrong. For example, a Controller instance may have crashed some days earlier, with fail-over to the then standby Controller instance occurring in between. If both Controller instances are (re)started in this situation, then the JOC Cockpit Cluster Watch can determine the Controller instance with the active role, as it witnessed the respective Controller instance's last crash or shutdown.
- In this scenario, if the machines die at the same time, there is no fail-over between JOC Cockpit instances. This means that the Cluster Watch cannot act as an arbitrator and instead has to ask the user for confirmation. JOC Cockpit shows a red alarm bell to indicate that user intervention is required.
- By confirming, the user consents that the Controller instance suggested by the JOC Cockpit Cluster Watch is considered lost. The remaining Controller instance will take the active role.
- Before confirming, users should check that the Controller instance to be declared lost is in fact shut down.
- If this check is skipped and the Controller instance declared lost is in fact up and running and considers itself to have the active role, then both Controller instances can become active, which can result in double job execution. As a consequence, the Controller Cluster has to be recreated and Agents have to be initialized.
- Users confirm loss of the indicated Controller instance from the Dashboard view like this:
...
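The arbitration reasoning above can be illustrated with a small model. This is a minimal sketch for illustration only, not the JS7 implementation or its API: the class `ClusterWatch`, its methods, the `Decision` values and the Controller IDs `primary` and `secondary` are all hypothetical names. It assumes that a Cluster Watch which witnessed the last role change can arbitrate on its own, while a freshly started Cluster Watch without memory has to wait until the user declares one Controller instance lost.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    CONFIRMED = "confirmed"
    REJECTED = "rejected"
    USER_CONFIRMATION_REQUIRED = "user confirmation required"


@dataclass
class ClusterWatch:
    """Hypothetical model of a Cluster Watch (not the JS7 implementation)."""

    # Controller ID last witnessed as active; None after a start without memory.
    witnessed_active: Optional[str] = None
    # Controller ID that the user has declared lost, if any.
    confirmed_lost: Optional[str] = None

    def witness_failover(self, new_active: str) -> None:
        """Record that the watch observed this instance taking the active role."""
        self.witnessed_active = new_active

    def confirm_node_loss(self, lost: str) -> None:
        """Record the user's confirmation that this instance is lost."""
        self.confirmed_lost = lost

    def decide(self, claimant: str) -> Decision:
        """Decide on a Controller instance's claim to the active role."""
        if self.witnessed_active is not None:
            # The watch witnessed the last role change and can arbitrate.
            if claimant == self.witnessed_active:
                return Decision.CONFIRMED
            return Decision.REJECTED
        if self.confirmed_lost is not None:
            # The user declared one instance lost; the remaining one takes over.
            if claimant != self.confirmed_lost:
                return Decision.CONFIRMED
            return Decision.REJECTED
        # No memory and no user confirmation: any claim could be wrong.
        return Decision.USER_CONFIRMATION_REQUIRED


# Disaster scenario: the watch starts without memory of earlier events.
fresh_watch = ClusterWatch()
before_confirmation = fresh_watch.decide("primary")

# The user checks that "primary" is shut down and declares it lost.
fresh_watch.confirm_node_loss("primary")
after_confirmation = fresh_watch.decide("secondary")
```

In this model the claim is left undecided until the user confirms the loss. The sketch also shows why the preceding check matters: if the instance declared lost is still running and considers itself active, nothing in the model prevents both instances from acting as the active one.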
Resources
...