What is a high availability system?
Availability: "The proportion of time a system can be used for productive work".
High Availability: "A system designed to avoid loss of service by reducing or managing operational failures in addition to minimizing scheduled downtime".
High Availability Computing System: "A computing/IT system which ensures pre-defined levels of operational performance for a defined period of time".
Matching HA to users' needs
A failure affects a system's availability when it causes an unplanned interruption in service that lasts long enough to cause problems for the system's users. Users' tolerance for such a failure depends on the nature of the application: a one-second loss of service on an online gaming site may be inconsequential to its users, whereas the same outage in a real-time military or scientific application may be completely unacceptable and possibly disastrous.
Where is HA needed?
HA is a requirement for all mission-critical systems. A mission-critical system is one in which an interruption in service may result in:
- Loss of life and/or injury
- Financial loss
- Missed opportunities
- Customer dissatisfaction
AM2D HA Services:
- Defining your system HA requirements: analysing your system to determine exactly what is needed to achieve the required level of HA service.
- Modelling and proof-of-concept (POC) studies to evaluate and verify the HA analysis.
- HA architecture design and integration.
- HA software modelling, design and development.
HA does not inherently imply Fault Tolerance (FT): FT is one specific method of achieving HA, but not the only one. HA may combine FT with other techniques, e.g.:
- planning of scheduled downtime
- elimination of human interaction with the system
- comprehensive acceptance tests
- defining operational practices
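Scheduled downtime and unplanned outages both count against availability as defined at the top of this page, so an availability target translates directly into a downtime budget. The sketch below is a minimal illustration, assuming availability is measured as uptime divided by total observed time and a 365-day year; `availability` and `downtime_budget_s` are hypothetical helper names.

```python
# Illustrative sketch: availability arithmetic (assumes a 365-day year).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def availability(uptime_s: float, downtime_s: float) -> float:
    """Proportion of the observed period during which the system was usable."""
    total = uptime_s + downtime_s
    if total <= 0:
        raise ValueError("observation period must be positive")
    return uptime_s / total


def downtime_budget_s(target: float, period_s: float = SECONDS_PER_YEAR) -> float:
    """Maximum total downtime (scheduled + unplanned) allowed in period_s to meet target."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target availability must be between 0 and 1")
    return (1.0 - target) * period_s


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        minutes = downtime_budget_s(target) / 60
        print(f"{target:.3%} availability allows {minutes:.1f} min of downtime per year")
```

For example, a 99.999% ("five nines") target leaves a budget of roughly 5.3 minutes per year for all scheduled and unplanned outages combined.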
The use of fault-tolerant techniques is one method of enhancing software dependability and achieving HA: they enable a system to tolerate software faults that remain in it after development is complete.
When a failure occurs, these techniques provide mechanisms that prevent failure of the system's mission-critical services.
Some of the FT techniques employed at AM2D:
- Data redundancy
- Temporal redundancy
- Backward recovery
- Forward recovery
- Recovery blocks
- Design Diversity (variants):
  - N-version programming
  - Distributed recovery blocks
  - Consensus recovery blocks
  - Acceptance voting
- Data Diversity:
  - Data re-expression
- Result Adjudication:
  - Exact majority voting
  - Median, mean, and consensus voting
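Temporal redundancy repeats an operation over time so that a transient fault (a dropped message, a momentary resource shortage) does not become a service failure. A minimal sketch, assuming the operation is safe to re-execute; `with_temporal_redundancy` and its retry count, delay and retryable exception types are hypothetical parameters that would be tuned per system.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_temporal_redundancy(
    op: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Re-execute op up to `attempts` times, pausing between tries.

    Assumes op is idempotent; a persistent (non-transient) fault still
    surfaces once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i in range(attempts):
        try:
            return op()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(delay_s)  # give a transient condition time to clear
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Hypothetical operation that fails transiently on its first two calls.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    print(with_temporal_redundancy(flaky, attempts=3, delay_s=0.01))
```

Retrying only helps with transient faults; a deterministic software fault will fail identically on every attempt, which is why the diversity-based techniques in the list exist.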
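Recovery blocks combine backward recovery with design diversity: the system saves a checkpoint, runs a primary routine, checks its result with an acceptance test and, if the test fails (or the routine raises), restores the checkpoint and tries an alternate routine. The sketch below assumes the recoverable state fits in a dictionary that can be deep-copied; `recovery_block`, the square-root variants and the acceptance test are illustrative names only.

```python
import copy
import math
from typing import Any, Callable, Dict, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when no variant produces a result that passes the acceptance test."""


def recovery_block(
    state: Dict[str, Any],
    variants: Sequence[Callable[[Dict[str, Any]], Any]],
    acceptance_test: Callable[[Any], bool],
) -> Any:
    """Try each variant in order (primary first) until one passes the acceptance test."""
    checkpoint = copy.deepcopy(state)  # recovery point taken before any variant runs
    for variant in variants:
        try:
            result = variant(state)
            if acceptance_test(result):
                return result
        except Exception:
            pass  # a crash is handled the same way as a failed acceptance test
        # Backward recovery: discard the failed variant's side effects in place.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryBlockFailure("all variants failed the acceptance test")


def primary_sqrt(state: Dict[str, Any]) -> float:
    # Hypothetical primary with a latent fault: it records a side effect
    # and returns the wrong sign.
    state["calls"] = state.get("calls", 0) + 1
    return -math.sqrt(state["x"])


def alternate_sqrt(state: Dict[str, Any]) -> float:
    # Independently written alternate: Newton's iteration for sqrt(x), x > 0.
    x = state["x"]
    r = max(x, 1.0)
    for _ in range(60):
        r = 0.5 * (r + x / r)
    return r


if __name__ == "__main__":
    s = {"x": 2.0}

    def acceptable(r: float) -> bool:
        return r >= 0 and abs(r * r - s["x"]) < 1e-9

    print(recovery_block(s, [primary_sqrt, alternate_sqrt], acceptable), s)
```

Because the state is restored before the alternate runs, the primary's stray `calls` entry does not survive. The cost of this design is the checkpoint copy, which grows with the size of the recoverable state.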
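Data diversity keeps a single implementation but re-runs it on a re-expressed, logically equivalent form of the input when the original input triggers a fault. In the sketch below the hypothetical `faulty_distance` has a latent fault for large coordinates, and the re-expression translates both points so the first sits at the origin, which leaves the true distance unchanged (an exact re-expression). All names and the injected fault are assumptions for illustration.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Inputs = Tuple[Point, Point]


def faulty_distance(p: Point, q: Point) -> float:
    # Hypothetical implementation with a latent fault: it rejects coordinates
    # outside a range its developers assumed would never be exceeded.
    if any(abs(c) > 1000 for c in (*p, *q)):
        raise ValueError("coordinate out of supported range")
    return math.hypot(q[0] - p[0], q[1] - p[1])


def translate_to_origin(inputs: Inputs) -> Inputs:
    """Exact re-expression: distance is invariant under translation."""
    p, q = inputs
    return (0.0, 0.0), (q[0] - p[0], q[1] - p[1])


def retry_with_reexpression(
    algorithm: Callable[[Point, Point], float],
    inputs: Inputs,
    reexpressions: Sequence[Callable[[Inputs], Inputs]],
    acceptance_test: Callable[[float], bool],
) -> float:
    """Run algorithm on the original input, then on each re-expressed form, until a result is accepted."""
    forms: Sequence[Callable[[Inputs], Inputs]] = [lambda x: x, *reexpressions]
    for reexpress in forms:
        try:
            result = algorithm(*reexpress(inputs))
        except Exception:
            continue  # this form of the input triggered the fault; try the next one
        if acceptance_test(result):
            return result
    raise RuntimeError("no re-expressed input produced an acceptable result")


if __name__ == "__main__":
    pts = ((5000.0, 5000.0), (5003.0, 5004.0))
    print(retry_with_reexpression(faulty_distance, pts, [translate_to_origin], lambda d: d >= 0))
```

A re-expression is only useful if it steers the input away from the fault region while preserving the answer; here that holds only when the two points are close together.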
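N-version programming runs several independently developed versions of the same function on the same input and passes their outputs to an adjudicator. A minimal sketch, assuming each version is an ordinary Python callable and that a crashed version simply casts no vote; the helper and voter names are illustrative. Exact majority voting suits outputs that can be compared for equality, while median and mean voting suit numeric outputs whose correct values may differ slightly between versions.

```python
from collections import Counter
from statistics import mean, median
from typing import Any, Callable, Hashable, List, Optional, Sequence


def run_versions(versions: Sequence[Callable[[Any], Any]], x: Any) -> List[Any]:
    """Execute every version on the same input; a version that raises contributes None."""
    results: List[Any] = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            results.append(None)
    return results


def exact_majority_vote(results: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Value agreed on by a strict majority of all versions, or None if there is none."""
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None
    value, count = votes.most_common(1)[0]
    return value if count > len(results) // 2 else None


def median_vote(results: Sequence[Optional[float]]) -> Optional[float]:
    """Median of the numeric results; masks a minority of wildly wrong outputs."""
    valid = [r for r in results if r is not None]
    return median(valid) if valid else None


def mean_vote(results: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the numeric results; simple, but a single outlier shifts the answer."""
    valid = [r for r in results if r is not None]
    return mean(valid) if valid else None


if __name__ == "__main__":
    versions = [
        lambda x: x * x,
        lambda x: x ** 2,
        lambda x: x * x + 1,  # hypothetical faulty version
    ]
    outputs = run_versions(versions, 3)
    print(outputs, exact_majority_vote(outputs), median_vote(outputs))
```

With three versions, exact majority voting masks one faulty version; it returns no decision when the versions disagree without a majority, which the caller must then treat as a detected failure.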