

Special solutions with full redundancy

Approaching continuous availability, in the range of 100%, requires expensive solutions that incorporate full mirroring or redundancy. Redundancy is the technique of improving availability by using duplicate components. To meet stringent availability requirements, these components need to work autonomously in parallel. Such solutions are not restricted to the IT components; they also extend to the IT environments, i.e. data centres, power supplies, air conditioning and telecommunications.
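As a rough illustration of why duplicate components working in parallel raise availability, the sketch below applies the standard reliability formula for parallel redundancy. It assumes the components are identical and fail independently of each other, which real installations only approximate (shared power, shared software faults). The function names are hypothetical and chosen only for this example.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` identical components running in parallel.

    Assumption: failures are independent, so the group is unavailable
    only when every copy is unavailable at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - component_availability) ** copies


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    # One component at 99% versus a mirrored pair of the same component.
    print(parallel_availability(0.99, 1))
    print(parallel_availability(0.99, 2))
    # A mirrored pair still depends on anything it shares, e.g. one network link.
    print(serial_availability([parallel_availability(0.99, 2), 0.999]))
```

The last line of the usage example reflects the point made above: redundancy in the IT components alone is of limited value if the environment they depend on (power, cooling, telecommunications) remains a single point of failure.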

Where new IT services are being developed, it is essential that Availability Management takes an early and participating design role in determining the availability requirements. This enables Availability Management to influence the IT infrastructure design positively, to ensure that it can deliver the level of availability required. The importance of this participation early in the design of the IT infrastructure should not be underestimated. There needs to be a dialogue between IT and the business to determine the balance between the business perception of the cost of unavailability and the exponential cost of delivering higher levels of availability.

As illustrated in Figure 4.17, there is a significant increase in costs when the business requirement is higher than the optimum level of availability that the IT infrastructure can deliver. These increased costs are driven by major redesign of the technology and the changing of requirements for the IT support organization.

It is important that the level of availability designed into the service is appropriate to the business needs, the criticality of the business processes being supported and the available budget. The business should be consulted early in the Service Design lifecycle so that the business availability needs of a new or enhanced IT service can be costed and agreed. This is particularly important where stringent availability requirements may require additional investment in Service Management processes, IT service and System Management tools, high-availability design and special solutions with full redundancy.

It is likely that the business need for IT availability cannot be expressed in technical terms. Availability Management therefore plays an important role in translating the business and user requirements into quantifiable availability targets and conditions. This is an important input into the IT Service Design and provides the basis for assessing the capability of the IT design and IT support organization to meet the availability requirements of the business.

The business requirements for IT availability should contain at least:

  • A definition of the VBFs supported by the IT service
  • A definition of IT service downtime, i.e. the conditions under which the business considers the IT service to be unavailable
  • The business impact caused by loss of service, together with the associated risk
  • Quantitative availability requirements, i.e. the extent to which the business tolerates IT service downtime or degraded service
  • The required service hours, i.e. when the service is to be provided
  • An assessment of the relative importance of different working periods
  • Specific security requirements
  • The service backup and recovery capability.
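To make the idea of "quantifiable availability targets" concrete, the following sketch records a small subset of the requirements listed above and converts a percentage target into a downtime budget. It uses the common availability calculation (agreed service time minus downtime, divided by agreed service time). The class name, field names and figures are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityRequirement:
    """Hypothetical record of a quantified business availability requirement."""
    service_name: str
    agreed_service_hours_per_week: float  # the required service hours
    target_availability_pct: float        # e.g. 99.5 means 99.5%

    def allowed_downtime_hours_per_week(self) -> float:
        """Downtime the business tolerates within the agreed service hours."""
        unavailable_fraction = 1.0 - self.target_availability_pct / 100.0
        return self.agreed_service_hours_per_week * unavailable_fraction


def measured_availability_pct(agreed_service_hours: float, downtime_hours: float) -> float:
    """Availability achieved over a period, as a percentage."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100.0


if __name__ == "__main__":
    requirement = AvailabilityRequirement("order entry", 60.0, 99.5)
    print(requirement.allowed_downtime_hours_per_week())
    print(measured_availability_pct(60.0, 1.5))
```

Expressing the target as hours of tolerated downtime within the agreed service hours is often easier for the business to judge than a bare percentage, because it depends on the definition of downtime and the service hours agreed in the list above.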

Once the IT technology design and IT support organization are determined, the service provider organization is in a position to confirm whether the availability requirements can be met. Where shortfalls are identified, dialogue with the business is required to present the cost options that exist to enhance the proposed design to meet the availability requirements. This enables the business to reassess whether lower or higher levels of availability are required, and to understand the impact and costs associated with its decision.

Determining the availability requirements is likely to be an iterative process, particularly where there is a need to balance the business availability requirement against the associated costs. The necessary steps are:

  • Determine the business impact caused by loss of service
  • From the business requirements, specify the availability, reliability and maintainability requirements for the IT service and components supported by the IT support organization
  • For IT services and components provided externally, identify the serviceability requirements
  • Estimate the costs involved in meeting the availability, reliability, maintainability and serviceability requirements
  • Determine, with the business, if the costs identified in meeting the availability requirements are justified
  • Determine, from the business, the costs likely to be incurred from loss or degradation of service
  • Where these are seen as cost-justified, define the availability, reliability, maintainability and serviceability requirements in agreements and negotiate into contracts.

If costs are seen as prohibitive, either:

  • Reassess the IT infrastructure design and provide options for reducing costs and assess the consequences on availability; or
  • Reassess the business use and reliance on the IT service and renegotiate the availability targets within the SLA.
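The cost-justification steps above can be sketched as a simple comparison between the cost of each design option and the expected cost of the downtime it leaves behind. This is a minimal illustration, not a prescribed method: it assumes a single flat cost per hour of downtime and a known expected availability per option, both of which are simplifications. All names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignOption:
    """Hypothetical design option with its yearly cost and expected availability."""
    name: str
    annual_cost: float
    expected_availability: float  # fraction between 0 and 1


def total_annual_cost(option: DesignOption,
                      downtime_cost_per_hour: float,
                      service_hours_per_year: float) -> float:
    """Cost of the option plus the expected business cost of its downtime."""
    expected_downtime_hours = service_hours_per_year * (1.0 - option.expected_availability)
    return option.annual_cost + expected_downtime_hours * downtime_cost_per_hour


def cheapest_option(options: list[DesignOption],
                    downtime_cost_per_hour: float,
                    service_hours_per_year: float) -> DesignOption:
    """Pick the option with the lowest combined cost."""
    if not options:
        raise ValueError("at least one design option is required")
    return min(
        options,
        key=lambda o: total_annual_cost(o, downtime_cost_per_hour, service_hours_per_year),
    )


if __name__ == "__main__":
    options = [
        DesignOption("base product", 10_000, 0.99),
        DesignOption("high-availability design", 50_000, 0.999),
        DesignOption("full redundancy", 500_000, 0.9999),
    ]
    # The preferred option changes with the business cost of an hour of downtime.
    print(cheapest_option(options, 100, 8760).name)
    print(cheapest_option(options, 1_000, 8760).name)
```

Running the comparison with different downtime costs shows why the process is iterative: the same design can be cost-justified for one business process and prohibitive for another.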

The SLM process is normally responsible for communicating with the business on how its availability requirements for IT services are to be met and negotiating the SLR/SLA for the IT Service Design process. Availability Management therefore provides important support and input to both the SLM and design processes during this period. While higher levels of availability can often be provided by investment in tools and technology, there is no justification for providing a higher level of availability than that needed and afforded by the business. The reality is that satisfying availability requirements is always a balance between cost and quality. This is where Availability Management can play a key role in optimizing the availability of the IT Service Design to meet increasing availability demands while deferring an increase in costs.

Designing service for availability is a key activity driven by Availability Management. This ensures that the required level of availability for an IT service can be met. Availability Management needs to ensure that the design activity for availability looks at the task from two related, but distinct, perspectives:

  • Designing for availability: this activity relates to the technical design of the IT service and the alignment of the internal and external suppliers required to meet the availability requirements of the business. It needs to cover all aspects of technology, including infrastructure, environment, data and applications.
  • Designing for recovery: this activity relates to the design points required to ensure that in the event of an IT service failure, the service and its supporting components can be reinstated to enable normal business operations to resume as quickly as possible. This again needs to cover all aspects of technology.

Additionally, the ability to recover quickly may be a crucial factor. In simple terms, it may not be possible or cost-justified to build a design that is highly resilient to failure(s). The ability to meet the availability requirements within the cost parameters may rely on the ability to recover consistently in a timely and effective manner. All aspects of availability should be considered in the Service Design process, across all stages of the Service Lifecycle.

The contribution of Availability Management within the design activities is to provide:

  • The specification of the availability requirements for all components of the service
  • The requirements for availability measurement points (instrumentation)
  • The requirements for new/enhanced systems and Service Management
  • Assistance with the IT infrastructure design
  • The specification of the reliability, maintainability and serviceability requirements for components supplied by internal and external suppliers
  • Validation of the final design to meet the minimum levels of availability required by the business for the IT service.

If the availability requirements cannot be met, the next task is to re-evaluate the Service Design and identify cost-justified design changes. Improvements in design to meet the availability requirements can be achieved by reviewing the capability of the technology to be deployed in the proposed IT design. For example:

  • The exploitation of fault-tolerant technology to mask the impact of planned or unplanned component downtime
  • Duplexing, or the provision of alternative IT infrastructure components to allow one component to take over the work of another component
  • Improving component reliability by enhancing testing regimes
  • Improved software design and development
  • Improved processes and procedures
  • Systems management enhancements/exploitation
  • Improved externally supplied services, contracts or agreements
  • Developing the capability of the people with more training.

Consider documenting the availability design requirements and considerations for new IT services and making them available to the design and implementation functions. In the longer term, seek to mandate these requirements and integrate them within the appropriate governance mechanisms that cover the introduction of new IT services.

Part of the activity of designing for availability must ensure that all business, data and information security requirements are incorporated within the Service Design. The overall aim of IT security is ‘balanced security in depth’, with justifiable controls implemented to ensure that the Information Security Policy is enforced and that IT services continue to operate within secure parameters (i.e. confidentiality, integrity and availability). During the gathering of availability requirements for new IT services, it is important that requirements that cover IT security are defined. These requirements need to be applied within the design phase for the supporting technology. For many organizations, the approach taken to IT security is covered by an Information Security Policy owned and maintained by Information Security Management. Availability Management plays an important role in executing this policy for new IT services.

Where the business operation has a high dependency on IT service availability, and the cost of failure or loss of business reputation is considered not acceptable, the business may define stringent availability requirements. These factors may be sufficient for the business to justify the additional costs required to meet these more demanding levels of availability. Achieving agreed levels of availability begins with the design, procurement and/or development of good-quality products and components. However, these in isolation are unlikely to deliver the sustained levels of availability required. To achieve a consistent and sustained level of availability requires investment in and deployment of effective Service Management processes, systems management tools, high-availability design and ultimately special solutions with full mirroring or redundancy.

Designing for availability is a key activity, driven by Availability Management, which ensures that the stated availability requirements for an IT service can be met. However, Availability Management should also ensure that within this design activity there is focus on the design elements required to ensure that when IT services fail, the service can be reinstated to enable normal business operations to resume as quickly as possible. ‘Designing for recovery’ may at first sound negative. Clearly good availability design is about avoiding failures and delivering, where possible, a fault-tolerant IT infrastructure. However, with this focus, is too much reliance placed on technology, and has as much emphasis been placed on the fault-tolerance aspects of the IT infrastructure? The reality is that failures will occur. The way the IT organization manages failure situations can have a positive effect on the perception of the business, customers and users of the IT services.

Every failure is an important ‘moment of truth’ – an opportunity to make or break your reputation with the business.

Focusing on the ‘designing for recovery’ aspects of the overall availability design can ensure that every failure is an opportunity to maintain and even enhance business and user satisfaction. To provide an effective ‘design for recovery’, it is important to recognize that both the business and the IT organization have needs that must be satisfied to enable an effective recovery from IT failure. The needs of the business are informational: the information it requires to manage the impact of failure on its operations and to set expectations within the business, the user community and its business customers. The needs of the IT organization are the skills, knowledge, processes, procedures and tools required to enable the technical recovery to be completed in an optimal time.

Consider documenting the recovery design requirements and considerations for new IT services and make them available to the areas responsible for design and implementation. In the longer term, seek to mandate these requirements and integrate them within the appropriate governance mechanisms that cover the introduction of new IT services.

A key aim is to prevent minor incidents from becoming major incidents by ensuring that the right people are involved early enough to avoid mistakes being made, and that the appropriate business and technical recovery procedures are invoked at the earliest opportunity. The instigation of these activities is the responsibility of the Incident Management process and a role of the Service Desk. To ensure that business needs are met during major IT service failures, and to ensure optimal recovery, the Incident Management process and Service Desk need to define and execute effective procedures for assessing and managing all incidents.

The above are not the responsibilities of Availability Management. However, the effectiveness of the Incident Management process and Service Desk can strongly influence the overall recovery period. The use of Availability Management methods and techniques to further optimize IT recovery may be the stimulus for subsequent continual improvement activities to the Incident Management process and the Service Desk.

In order to remain effective, the maintainability of IT services and components should be monitored, and their impact on the ‘expanded incident lifecycle’ understood, managed and improved.
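One way to monitor maintainability is to record how long each stage of the expanded incident lifecycle takes and to average the results across incidents. The sketch below assumes the stages are detection, diagnosis, repair, recovery and restoration, and that durations are logged in hours per incident. The stage names, data layout and function names are assumptions made for this example.

```python
from statistics import mean

# Assumed stages of the expanded incident lifecycle, in order.
STAGES = ("detection", "diagnosis", "repair", "recovery", "restoration")


def mean_stage_durations(incidents: list[dict[str, float]]) -> dict[str, float]:
    """Average time (hours) spent in each lifecycle stage across incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    return {stage: mean(incident[stage] for incident in incidents) for stage in STAGES}


def mean_time_to_restore(incidents: list[dict[str, float]]) -> float:
    """Average total downtime per incident, summed over all stages."""
    if not incidents:
        raise ValueError("at least one incident is required")
    return mean(sum(incident[stage] for stage in STAGES) for incident in incidents)


if __name__ == "__main__":
    # Hypothetical incident records; values are hours spent in each stage.
    incidents = [
        {"detection": 0.5, "diagnosis": 1.0, "repair": 2.0, "recovery": 0.5, "restoration": 0.0},
        {"detection": 1.5, "diagnosis": 3.0, "repair": 1.0, "recovery": 0.5, "restoration": 1.0},
    ]
    print(mean_stage_durations(incidents))
    print(mean_time_to_restore(incidents))
```

Breaking the total down by stage shows where recovery time is actually spent, for example in detection rather than repair, which indicates where improvement effort is likely to pay off.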
