ITIL (IT Infrastructure Library) is the most widely accepted approach to IT Service Management in the world. ITIL provides a cohesive set of best practice, drawn from the public and private sectors internationally. It is supported by a comprehensive qualification scheme, accredited training organizations, and implementation and assessment tools.
ITIL is the only consistent and comprehensive documentation of best practice for IT Service Management. It is used by many hundreds of organizations around the world, and a whole ITIL philosophy has grown up around the guidance contained within the ITIL books.
ITIL consists of a series of books giving guidance on the provision of quality IT services, and on the accommodation and environmental facilities needed to support IT. ITIL has been developed in recognition of organizations' growing dependency on IT and embodies best practices for IT Service Management.
The ethos behind the development of ITIL is the recognition that organizations are becoming increasingly dependent on IT in order to satisfy their corporate aims and meet their business needs. This leads to an increased requirement for high quality IT services.
ITIL provides the foundation for quality IT Service Management. The widespread adoption of the ITIL guidance has encouraged organizations worldwide, both commercial and non-proprietary, to develop supporting products as part of a shared 'ITIL Philosophy'.
Service Support is the practice of those disciplines that enable IT Services to be provided effectively.
Service Delivery is the management of the IT services themselves, and involves a number of management practices to ensure that IT services are provided.
Service Support
Configuration Management
Configuration Management is the implementation of a database (the Configuration Management Database, or CMDB) that contains details of the organization’s elements that are used in the provision and management of its IT services. This is more than just an ‘asset register’, as it will contain information that relates to the maintenance, movement, and problems experienced with the Configuration Items.
The CMDB also holds a much wider range of information about items that the organization's IT Services are dependent upon. This range of information includes:
- Hardware
- Software
- Documentation
- Personnel
Configuration Management essentially consists of four tasks:
- Identification: the specification and identification of all IT components and their inclusion in the CMDB.
- Control: the management of each Configuration Item, specifying who is authorized to ‘change’ it.
- Status: the recording of the status of all Configuration Items in the CMDB, and the maintenance of this information.
- Verification: reviews and audits to ensure the information contained in the CMDB is accurate.
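The four tasks can be illustrated with a small in-memory model. This is only a sketch: the class names, field names, and status values are hypothetical and are not taken from ITIL, and a real CMDB would hold far more detail about each Configuration Item.

```python
from dataclasses import dataclass, field
from enum import Enum


class CIType(Enum):
    """The kinds of item listed above (illustrative names)."""
    HARDWARE = "hardware"
    SOFTWARE = "software"
    DOCUMENTATION = "documentation"
    PERSONNEL = "personnel"


@dataclass
class ConfigurationItem:
    ci_id: str
    name: str
    ci_type: CIType
    status: str = "registered"  # hypothetical status value
    authorized_changers: set = field(default_factory=set)


class CMDB:
    """Toy in-memory CMDB covering the four tasks described above."""

    def __init__(self):
        self._items = {}

    def identify(self, item):
        # Identification: register a CI under a unique id.
        if item.ci_id in self._items:
            raise ValueError(f"duplicate CI id: {item.ci_id}")
        self._items[item.ci_id] = item

    def may_change(self, ci_id, person):
        # Control: only authorized people may change a CI.
        return person in self._items[ci_id].authorized_changers

    def set_status(self, ci_id, status, changed_by):
        # Status: record the current state of a CI, subject to control.
        if not self.may_change(ci_id, changed_by):
            raise PermissionError(f"{changed_by} may not change {ci_id}")
        self._items[ci_id].status = status

    def verify(self, observed_ids):
        # Verification: compare the CMDB with what an audit actually found.
        recorded = set(self._items)
        observed = set(observed_ids)
        return {
            "missing_from_audit": sorted(recorded - observed),
            "unrecorded": sorted(observed - recorded),
        }


cmdb = CMDB()
cmdb.identify(ConfigurationItem("srv-01", "Mail server", CIType.HARDWARE,
                                authorized_changers={"alice"}))
cmdb.set_status("srv-01", "live", changed_by="alice")
audit = cmdb.verify(["srv-01", "srv-02"])  # srv-02 was found but never recorded
```

The audit result separates items that are recorded but were not found from items that were found but never recorded, which are the two kinds of discrepancy a verification review would need to follow up.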
Incident & Problem Management
Incident/Problem Management is the resolution and prevention of incidents that affect the normal running of an organization’s IT services. This includes ensuring that faults are corrected, preventing any recurrence of these faults, and the application of preventative maintenance to reduce the likelihood of these faults occurring in the first instance.
Incident/Problem Management and IT Security
The effective practice of both Incident and Problem Management will ensure that the availability of IT services is maximized, and could also protect the integrity and confidentiality of information by identifying the root cause of a problem.
Change Management
Change Management is the practice of ensuring all changes to Configuration Items are carried out in a planned and authorized manner. This includes ensuring that there is a business reason behind each change, identifying the specific Configuration Items and IT Services affected by the change, planning the change, testing the change, and having a back-out plan should the change result in an unexpected state of the Configuration Item.
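The conditions listed above can be read as a checklist that a change must pass before it goes ahead. The following is a minimal sketch of such a check; the ChangeRequest fields and the example data are hypothetical, chosen only to mirror the wording of the paragraph.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    summary: str
    business_reason: str = ""
    affected_cis: list = field(default_factory=list)
    affected_services: list = field(default_factory=list)
    plan: str = ""
    tested: bool = False
    back_out_plan: str = ""
    authorized_by: str = ""


def missing_prerequisites(change):
    """Return the names of the checks that the change does not yet satisfy."""
    checks = {
        "business reason": bool(change.business_reason.strip()),
        "affected Configuration Items": bool(change.affected_cis),
        "affected IT Services": bool(change.affected_services),
        "change plan": bool(change.plan.strip()),
        "testing": change.tested,
        "back-out plan": bool(change.back_out_plan.strip()),
        "authorization": bool(change.authorized_by.strip()),
    }
    return [name for name, ok in checks.items() if not ok]


request = ChangeRequest(
    summary="Upgrade mail server firmware",
    business_reason="Vendor support ends for the current version",
    affected_cis=["srv-01"],
    affected_services=["E-mail"],
    plan="Apply during the Sunday maintenance window",
    tested=True,
)
gaps = missing_prerequisites(request)  # no back-out plan or authorization yet
```

A change would be allowed to proceed only when the returned list is empty.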
Change Management and IT Security
IT Security must be embedded into the change management process to ensure that all changes have been assessed for risks. This will include assessing the potential business impacts should the change produce undesired results.
If Change Management procedures are not effective, this may result in unauthorized changes to IT Services, which could have major impacts on the business, including financial loss, customer loss, market loss, litigation, and in the worst-case scenario, even collapse of the business that the IT Services are there to support.
Release Management
This discipline of IT Service Management is the management of all software configuration items within the organization. It is responsible for the management of software development, installation and support of an organization's software products.
Because software is intangible, it is often not regarded as an asset, and as a result it is not effectively controlled. There can be several versions of the same software within the organization, and there can also be unlicensed and illegal copies of externally provided software.
The practice of effective Software Control & Distribution (SC&D) involves the creation of a Definitive Software Library (DSL), in which the master copies of all software are stored and from which their control and release are managed. The DSL consists of a physical store and a logical store. The physical store is where the master copies of all software media are stored. This tends to be software that has been provided from an external source. The logical store is the index of all software and releases, versions, etc., highlighting where the physical media can be located. The logical store may also be used for the storage of software developed within the organization.
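The relationship between the logical store and the physical store can be sketched as an index keyed by software name and version. The class names, the version tuples, and the storage locations below are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    name: str
    version: tuple        # for example (1, 2, 0)
    media_location: str   # where the master copy is physically held
    licensed: bool = True


class LogicalStore:
    """Index of releases that points at the physical store."""

    def __init__(self):
        self._index = {}

    def register(self, release):
        key = (release.name, release.version)
        if key in self._index:
            raise ValueError(f"{release.name} {release.version} already registered")
        self._index[key] = release

    def locate(self, name, version):
        # Look up where the master media for one specific version is kept.
        return self._index[(name, version)].media_location

    def latest(self, name):
        # Tuples compare element by element, so max() picks the newest version.
        candidates = [r for (n, _), r in self._index.items() if n == name]
        if not candidates:
            raise KeyError(name)
        return max(candidates, key=lambda r: r.version)


store = LogicalStore()
store.register(Release("payroll", (1, 0, 0), "Safe A, shelf 2"))
store.register(Release("payroll", (1, 2, 0), "Safe A, shelf 3"))
newest = store.latest("payroll")
```

Refusing to register the same name and version twice is one simple way to keep a single master copy per release.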
SC&D procedures include the management of the software Configuration Items and their distribution and implementation into a production environment. This will involve the definition of a release program suitable for the organization, the definition of how version control will be implemented, and the procedures surrounding how software will be built, released and audited.
Software Control & Distribution and IT Security
All three of the key areas of IT Security (Availability, Confidentiality, and Integrity) can be exposed as a direct result of inadequate software control and distribution. If software changes are badly managed and not fully tested, they can make services unavailable once they reach the production environment. In addition, unauthorized software modifications can lead to fraud, viruses, and malicious damage to data files.
For these and other reasons, it is important that SC&D procedures are fully reviewed by a security assessment to ensure that appropriate countermeasures are in place to reduce the threats described above.
Service Desk
The Service/Help Desk plays an important part in the provision of IT Services. It is very often the first contact the business users have in their use of IT Services when something does not work as expected. The Service/Help Desk is a single point of contact for end users who need help. Without this, an organization could certainly face losses due to inefficiencies.
The two main focuses of the Service Desk are Incident Control and Communication.
Service/Help Desk Activities
Other than for the Call Center, Service or Help Desks tend to embrace the following: receive all calls and e-mails on incidents; incident recording; incident prioritization, classification and escalation; search for a 'work around'; update the end user on progress; handle communication; report to management on service desk performance.
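Incident recording, prioritization, escalation, and progress updates can be sketched in a few lines. The three-level impact and urgency scale, the priority formula, and the escalation threshold below are assumptions made for illustration; the text above does not prescribe a particular scheme.

```python
from dataclasses import dataclass, field

# Assumed three-level scale used for both impact and urgency.
LEVELS = ("high", "medium", "low")


def prioritize(impact, urgency):
    """Return a priority from 1 (most urgent) to 5 (least urgent)."""
    return LEVELS.index(impact) + LEVELS.index(urgency) + 1


@dataclass
class Incident:
    reference: int
    caller: str
    description: str
    impact: str
    urgency: str
    updates: list = field(default_factory=list)

    @property
    def priority(self):
        return prioritize(self.impact, self.urgency)

    def needs_escalation(self):
        # Assumed rule: priorities 1 and 2 are escalated.
        return self.priority <= 2

    def add_update(self, note):
        # Progress notes to pass back to the end user.
        self.updates.append(note)


class ServiceDesk:
    def __init__(self):
        self._incidents = []

    def record(self, caller, description, impact, urgency):
        # Incident recording: every call or e-mail gets a reference number.
        incident = Incident(len(self._incidents) + 1, caller, description,
                            impact, urgency)
        self._incidents.append(incident)
        return incident

    def queue(self):
        # Most urgent first; older incidents first within the same priority.
        return sorted(self._incidents, key=lambda i: (i.priority, i.reference))


desk = ServiceDesk()
printing = desk.record("pat", "Cannot print", impact="low", urgency="medium")
mail = desk.record("sam", "E-mail down for all staff", impact="high", urgency="high")
mail.add_update("Workaround: use webmail while the server is restored")
ordered = desk.queue()
```

The length of the queue and the spread of priorities are the kind of figures that could feed the management report on Service Desk performance.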
Service Desk and IT Security
As the Service/Help Desk is generally the first contact a business user has when reporting something out of the ordinary, the skill and assiduity of the Help Desk staff can often prevent recurrence of incidents, and instigate measures that will limit the impact of any breaches in IT Security.
Service Delivery
Service Level Management
Service Level Management is the primary management of IT services, ensuring that agreed services are delivered when and where they are supposed to be delivered. The Service Level Manager is dependent upon all the other areas of Service Delivery providing the necessary support that ensures the agreed services are provided in a secure, efficient and cost effective manner.
There are a number of business processes that form part of Service Level Management. These are:
- Reviewing existing services
- Negotiating with the Customers
- Reviewing the underpinning contracts of third-party service providers
- Producing and monitoring the Service Level Agreement (SLA)
- Implementation of Service Improvement policy and processes
- Establishing priorities
- Planning for service growth
- Involvement in the Accounting process to cost services and recover these costs
Service Level Agreements
A critical part of service level management pertains to service level agreements.
Service Level Management and IT Security
IT Security is an integral part of Service Delivery, and as Service Level Management is the key discipline in providing Service Delivery, this process is also ultimately responsible for ensuring that IT Services are provided in a secure manner, and the availability of the services is maximized within cost and efficiency constraints. Contingency Planning also forms part of Service Delivery to ensure that services can be recovered/maintained in the event of a serious incident.
Capacity Management
Capacity Management is the discipline that ensures that IT infrastructure is provided at the right time, in the right volume, and at the right price, and that IT is used in the most efficient manner.
This involves input from many areas of the business to identify what services are (or will be) required, what IT infrastructure is required to support these services, what level of Contingency will be needed, and what the cost of this infrastructure will be.
These are inputs into the following Capacity Management processes:
- Performance monitoring
- Workload monitoring
- Application sizing
- Resource forecasting
- Demand forecasting
- Modeling
The outputs of these processes are the results of capacity management: the capacity plan itself, forecasts, tuning data, and Service Level Management guidelines.
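As one illustration of resource forecasting, the sketch below fits a straight line to past usage figures and estimates how many periods remain before a resource is full. A linear trend is an assumption made here for simplicity, not a method prescribed by the text, and the usage figures are invented.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values observed at periods 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until_full(values, capacity):
    """Periods after the last observation until the trend line reaches capacity.

    Returns None when usage is flat or falling, because the trend never
    reaches the capacity limit.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    period_full = (capacity - intercept) / slope
    return max(0.0, period_full - (len(values) - 1))


# Invented monthly disk usage, in percent of the installed capacity.
monthly_usage = [40, 45, 50, 55]
remaining = periods_until_full(monthly_usage, capacity=100)
```

With these figures usage grows by five points a month, so the estimate is about nine months of headroom after the last observation, which is the sort of result that would go into the capacity plan.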
Capacity Management and IT Security
A risk assessment of the capacity planning function will help ensure that the process is carried out effectively, and that its findings are acted upon.
Continuity Management
Continuity Management / Disaster Recovery / Business Continuity
Continuity management is the process by which plans are put in place and managed to ensure that IT Services can recover and continue should a serious incident occur. It is not just about reactive measures, but also about proactive measures - reducing the risk of a disaster in the first instance.
Continuity management is so important that many organizations will not do business with IT service providers if contingency planning is not practiced within the service provider’s organization. It is also widely reported that many organizations whose contingency plan failed during a disaster ceased trading within 18 months of the disaster.
Continuity management is regarded as the recovery of the IT infrastructure used to deliver IT Services, but many businesses these days practice the much further reaching process of Business Continuity Planning (BCP), to ensure that the whole end-to-end business process can continue should a serious incident occur.
Continuity management involves the following basic steps:
- Prioritizing the businesses to be recovered by conducting a Business Impact Analysis (BIA)
- Performing a Risk Assessment (aka Risk Analysis) for each of the IT Services to identify the assets, threats, vulnerabilities and countermeasures for each service.
- Evaluating the options for recovery
- Producing the Contingency Plan
- Testing, reviewing, and revising the plan on a regular basis
Continuity Management and IT Security
Continuity Management (and contingency planning, business continuity and disaster recovery) is an integral part of IT security and risk analysis. Inadequate contingency planning is regarded as a risk to the business, and is often overlooked until it is too late, when a security or other breach results in the loss of supporting IT systems.
This is a complex area, but fortunately a methodology and a tool have evolved to greatly simplify it. The COBRA system emerged to counter the problems encountered through the use of older, less dynamic systems and approaches. It greatly reduces reliance upon external expertise, being equipped with significant knowledge within its 'knowledge bases.'
Availability Management
Availability Management is the practice of identifying levels of IT Service availability for use in Service Level Reviews with Customers.
All areas of a service must be measurable and defined within the Service Level Agreement (SLA).
To measure service availability the following areas are usually included in the SLA:
- Agreement statistics: for example, what is included within the agreed service.
- Availability: agreed service times, response times, etc.
- Help Desk Calls: number of incidents raised, response times, resolution times.
- Contingency: agreed contingency details, location of documentation, contingency site, third-party involvement, etc.
- Capacity: performance timings for online transactions, report production, numbers of users, etc.
- Costing Details: charges for the service, and any penalties should service levels not be met.
Availability is usually calculated based on a model involving the Availability Ratio and techniques such as Fault Tree Analysis, and includes the following elements:
- Serviceability: where a service is provided by a third-party organization, this is the expected availability of a component.
- Reliability: the time for which a component can be expected to perform under specific conditions without failure.
- Recoverability: the time it should take to restore a component back to its operational state after a failure.
- Maintainability: the ease with which a component can be maintained, which can be both remedial and preventative.
- Resilience: the ability to withstand failure.
- Security: the ability of components to withstand breaches of security.
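The text does not define the Availability Ratio. A commonly used form, assumed here, is the agreed service time minus downtime, divided by the agreed service time and expressed as a percentage. The sketch below also shows how the availability of several components in series combines, on the further assumption that the service needs every component and that failures are independent. The function names and figures are illustrative.

```python
def availability_percent(agreed_service_hours, downtime_hours):
    """Availability ratio as a percentage of the agreed service time (assumed formula)."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


def serial_availability(component_availabilities):
    """Availability of a service that needs every component to be working.

    Each value is a fraction between 0 and 1; failures are assumed independent.
    """
    result = 1.0
    for availability in component_availabilities:
        result *= availability
    return result


# Invented example: 200 agreed service hours in the month, 3 hours of downtime.
monthly = availability_percent(200, 3)

# Invented example: a server at 99% and a network link at 98%.
end_to_end = serial_availability([0.99, 0.98])
```

The first example comes to roughly 98.5 percent. The second shows why end-to-end availability is lower than that of any single component: 0.99 multiplied by 0.98 is about 0.97.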
Service Level Agreements
Service level agreements are clearly of fundamental importance with respect to availability management.
Availability Management and IT Security
IT Security is an integral part of Availability Management, whose primary focus is ensuring that IT infrastructure continues to be available for the provision of IT Services.
Some of the above elements are really the outcome of performing a risk analysis to identify any resilience measures to be put in place, identifying just how reliable elements are and how many problems have been caused as a result of system failure.
The risk analysis also recommends controls to improve availability of IT infrastructure such as development standards, testing, physical security, the right skills in the right place at the right time, etc.
IT Financial Management
IT Financial Management is the discipline of ensuring IT infrastructure is obtained at the most effective price (which does not necessarily mean cheapest), and calculating the cost of providing IT services so that an organization can understand the costs of its IT services. These costs may then be recovered from the Customer of the service.
Costs are divided into costing units:
- Equipment
- Software
- Organization (staff, overtime)
- Accommodation
- Transfer (costs of third-party service providers)
The costs are divided into Direct and Indirect costs, and can be Capital or Ongoing.
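The classification above can be sketched as a small cost model. The class, the field names, and every figure are hypothetical; the unit names follow the list above.

```python
from collections import defaultdict
from dataclasses import dataclass

COST_UNITS = ("equipment", "software", "organization", "accommodation", "transfer")


@dataclass(frozen=True)
class CostItem:
    description: str
    unit: str       # one of COST_UNITS
    amount: float
    direct: bool    # True for a Direct cost, False for an Indirect cost
    capital: bool   # True for a Capital cost, False for an Ongoing cost

    def __post_init__(self):
        if self.unit not in COST_UNITS:
            raise ValueError(f"unknown costing unit: {self.unit}")


def totals_by_unit(items):
    """Sum the amounts for each costing unit that appears in items."""
    totals = defaultdict(float)
    for item in items:
        totals[item.unit] += item.amount
    return dict(totals)


def split_direct_indirect(items):
    """Return (total direct cost, total indirect cost)."""
    direct = sum(i.amount for i in items if i.direct)
    indirect = sum(i.amount for i in items if not i.direct)
    return direct, indirect


# Invented figures for a single IT service.
items = [
    CostItem("Database server", "equipment", 12000.0, direct=True, capital=True),
    CostItem("Database licenses", "software", 4000.0, direct=True, capital=False),
    CostItem("Shared floor space", "accommodation", 3000.0, direct=False, capital=False),
    CostItem("Outsourced backup service", "transfer", 1500.0, direct=False, capital=False),
]
by_unit = totals_by_unit(items)
direct_total, indirect_total = split_direct_indirect(items)
capital_total = sum(i.amount for i in items if i.capital)
```

Totals of this kind give a basis for deciding what to recover from the Customer of the service.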
IT Financial Management and IT Security
The practice of IT financial management enables the Service Manager to identify the amount being spent on security countermeasures in the provision of the IT services. The amount being spent on these countermeasures needs to be balanced with the risks and the potential losses that the service could incur as identified during a Business Impact Assessment and Risk Assessment. Management of these costs will ultimately reflect on the cost of providing the IT services, and potentially what is charged in the recovery of those costs.
The Service Level Agreement
A Service Level Agreement, or SLA, is fundamental to service provision, from the perspective of both the supplier and the recipient. It essentially documents and defines the parameters of the relationship itself.
The quality of the service level agreement is therefore a critical matter. It is not an area that can be left to chance, and must command careful attention.
The SLA itself must be of sufficient detail and scope for the service covered. Typical SLA sections include: Introduction, Scope of Work, Performance, Tracking and Reporting, Problem Management, Compensation, Customer Duties and Responsibilities, Warranties and Remedies, Security, Intellectual Property Rights and Confidential Information, Legal Compliance and Resolution of Disputes, Termination and Signatures. Other sections of course may be applicable.
Each of these sections must be carefully crafted to ensure that the agreement properly defines the service to be delivered. This is certainly not a trivial task.