
Wednesday, March 16, 2011

Designing in Supportability

This post is a bit of a rant.


I'm no longer amazed by what I'm about to describe, because I have seen it too often.  What I find hard to understand is why corporations continue to fall into the same trap.


And what is the trap I'm talking about?


It's failing to cost for the non-functional or operational features of a solution.  Most IT projects are either building a solution, or customising a product, to meet some business requirements. Up front, someone has prepared some form of Business Case that has demonstrated that there will be a positive return on investment - maybe even fully costed it over, say, 5 years and calculated that the NPV (Net Present Value) is greater than zero.


The trouble is, most of these business cases tend to not adequately cost in the non-functional aspects of operating the solution once it goes live.
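
To make the point concrete, here is a minimal sketch of how leaving out run costs can flip a business case. Every figure below (discount rate, build cost, annual benefit, annual run cost) is a made-up assumption for illustration, not data from a real project.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs now (year 0), then one per year."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

# All figures are hypothetical, for illustration only.
DISCOUNT_RATE = 0.08
BUILD_COST = -500_000       # up-front project cost
ANNUAL_BENEFIT = 180_000    # benefit claimed in the business case
ANNUAL_RUN_COST = 70_000    # housekeeping, monitoring, help desk, capacity...
YEARS = 5

# Business case that only counts the build and the benefits.
functional_only = [BUILD_COST] + [ANNUAL_BENEFIT] * YEARS
# Same case with the operational costs included each year.
with_run_costs = [BUILD_COST] + [ANNUAL_BENEFIT - ANNUAL_RUN_COST] * YEARS

npv_functional = npv(DISCOUNT_RATE, functional_only)
npv_full = npv(DISCOUNT_RATE, with_run_costs)

print(f"NPV without run costs: {npv_functional:,.0f}")
print(f"NPV with run costs:    {npv_full:,.0f}")
```

With these assumed numbers the first NPV is positive and the second is negative, which is exactly the trap: the project looks worthwhile only because the cost of operating it was never counted.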


Here are some of the things that are quite often underdone or totally overlooked:


General Housekeeping
This includes the cost of operators, backups (& restores), database reorganisations, patch management, upgrades (both software & hardware), data management (like archiving), etc.


Monitoring
The level of monitoring depends on how critical the solution is to the business.  A minimum level would be checking that the solution is running and producing the desired results.  This may be a simple checklist performed by someone in the company at a set interval.  More critical solutions may have multiple real-time monitoring points to ensure that users are receiving the service experience desired by the business.


Help Desk
Larger organisations may have a Help Desk or a Customer Service Desk whose goal is to receive calls about service anomalies and, where possible, either fix them during the call or ensure that they gather information and pass it on to the relevant area that will resolve the issue.  In order to perform this function well, the personnel on the Help Desk need adequate training and even a list of known problems and how they can be overcome.


Capacity, Performance, Availability
Like a stool needs three legs to provide a stable platform - so does an IT solution!


These three aspects of an IT solution are interlinked and need to be planned.  ITIL has the term "Patterns of Business Activity" or PBA.  This is where you capture the way in which the business intends to use the solution.  Different businesses will have different patterns such as:

  • daily usage profiles (peak and off-peak)
  • weekly usage profiles (weekday and weekend)
  • monthly usage profiles (start/end of month, special day of the month)
  • quarterly, half-yearly & yearly (end of calendar/tax year)

Apart from the above, you need to understand how the solution will be accessed by the intended users - local LAN, over the Internet, Mobile, etc.  What are the volumes by access method?  What response times do I require?  What are the expected transaction volume peaks?
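
One way to capture a usage profile so that these questions can be answered with numbers is sketched below. The `UsageProfile` class and the hourly volumes are hypothetical; ITIL does not prescribe any particular format for recording a PBA.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UsageProfile:
    """A pattern of business activity as transaction volumes per period."""
    name: str
    volumes: List[float]

    def peak(self) -> float:
        return max(self.volumes)

    def average(self) -> float:
        return sum(self.volumes) / len(self.volumes)

    def peak_to_average(self) -> float:
        # How much headroom the peak needs compared with "typical" load.
        return self.peak() / self.average()

# Hypothetical daily profile in 24 hourly buckets (transactions per hour):
# quiet overnight, busy during the day, a sharp peak around midday.
daily = UsageProfile(
    name="daily",
    volumes=[50] * 8 + [400] * 4 + [900] * 2 + [400] * 4 + [50] * 6,
)

print(f"{daily.name}: peak={daily.peak()}, average={daily.average():.1f}, "
      f"peak/average={daily.peak_to_average():.2f}")
```

The same structure could hold a weekly, monthly or yearly profile; sizing for the average rather than the peak is the usual way these numbers get ignored.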


Sometimes, the above information is not exact.  Whether or not it is known, it is advisable to put the solution through a "break test" before launching it for general use.  This means you push transactions through the solution until it breaks.  Monitoring should be in place during a break test to determine where the solution will fail and what the lead indicators are.  This information is extremely useful in determining what needs to be monitored to ensure the solution is operational.
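
The shape of a break test can be sketched as a simple ramp: keep stepping the load up, record the first point where a lead indicator trips, and stop when the solution fails. The code below runs against a toy simulated system, not a real load tool; `simulated_response_time`, its capacity figure and the 150 ms warning threshold are all assumptions made for the example.

```python
def simulated_response_time(tps, capacity_tps=500.0, base_ms=20.0):
    """Toy stand-in for the system under test.

    Latency climbs sharply as load approaches capacity; returning None
    means the system has broken at this load.
    """
    if tps >= capacity_tps:
        return None
    return base_ms / (1 - tps / capacity_tps)

def break_test(send_load, start_tps=50, step_tps=50, max_tps=2000, slow_ms=150.0):
    """Ramp the load until send_load reports failure (None).

    Returns the load at which the system broke and the load at which the
    lead indicator (latency above slow_ms) first appeared.
    """
    first_warning = None
    tps = start_tps
    while tps <= max_tps:
        latency = send_load(tps)
        if latency is None:
            return {"break_tps": tps, "warning_tps": first_warning}
        if first_warning is None and latency > slow_ms:
            first_warning = tps
        tps += step_tps
    return {"break_tps": None, "warning_tps": first_warning}

result = break_test(simulated_response_time)
print(result)
```

The gap between the warning point and the break point is what tells you which metric to alert on in production and how much notice it is likely to give.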


With Capacity there are 3 main aspects to consider:

  1. BAU (Business As Usual) growth - meaning how the solution will grow in its use of IT resources without major change.  This should incorporate PBA information.
  2. IT Project growth - this may mean a once-off step change, or it may alter the BAU growth profile.
  3. Business Project growth - this is where the business runs a project without any of the IT components being changed.  A good example of such a project is where the business decides to run a marketing campaign to increase the number of applications for new accounts.
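
A rough sketch of how the three kinds of growth might be combined into one projection follows. It assumes BAU growth compounds monthly, an IT project adds a permanent step, and a business campaign adds a temporary uplift in the month it runs; these modelling choices, the function name and all the numbers are illustrative only.

```python
def project_usage(current, months, bau_monthly_rate,
                  step_changes=None, campaign_uplifts=None):
    """Project resource usage month by month.

    step_changes:     {month: units added permanently}  (IT project growth)
    campaign_uplifts: {month: units added that month}   (business project growth)
    """
    step_changes = step_changes or {}
    campaign_uplifts = campaign_uplifts or {}
    baseline = current
    projection = []
    for month in range(1, months + 1):
        baseline *= 1 + bau_monthly_rate           # BAU growth compounds
        baseline += step_changes.get(month, 0.0)   # step change stays in the baseline
        projection.append(baseline + campaign_uplifts.get(month, 0.0))
    return projection

# Hypothetical inputs: 2% BAU growth per month, an IT release in month 3,
# and a marketing campaign in month 5.
projection = project_usage(
    current=1000.0,
    months=6,
    bau_monthly_rate=0.02,
    step_changes={3: 200.0},
    campaign_uplifts={5: 300.0},
)

for month, usage in enumerate(projection, start=1):
    print(f"month {month}: {usage:,.0f}")
```

The campaign month is the one that catches people out: nothing in IT changed, yet it is the highest point on the curve.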

The solution's usage of the underlying IT infrastructure must be monitored and tracked, with predictive analysis ensuring that adequate resources will be in place.  The diagram below shows a simplistic view of how to calculate the predictive analysis horizon for a given resource.
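
As a worked illustration of a predictive horizon, the sketch below fits a straight line to past usage and estimates how many months remain before usage reaches a chosen fraction of capacity. The linear-growth assumption, the 80% threshold, the procurement lead time and the sample history are all assumptions for the example, not necessarily the calculation the diagram describes.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def months_until_threshold(usage_history, capacity, threshold=0.8):
    """Months from the latest observation until usage hits threshold * capacity.

    Returns None when the fitted trend is flat or falling.
    """
    slope, intercept = fit_line(usage_history)
    if slope <= 0:
        return None
    limit = threshold * capacity
    latest_x = len(usage_history) - 1
    crossing_x = (limit - intercept) / slope
    return max(0.0, crossing_x - latest_x)

# Hypothetical monthly usage of one resource, in the same units as capacity.
history = [400, 420, 440, 460, 480, 500]
horizon = months_until_threshold(history, capacity=1000)

# If buying and installing more capacity takes this long, compare it with
# the horizon to decide whether the order has to go in now.
PROCUREMENT_LEAD_MONTHS = 6
must_act_now = horizon is not None and horizon <= PROCUREMENT_LEAD_MONTHS

print(f"horizon: {horizon:.1f} months, act now: {must_act_now}")
```

A straight-line fit is the crudest possible predictor; it ignores step changes and campaigns, so in practice the trend would need to be re-run whenever one of those is planned.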




There, I feel a bit better now - perhaps I'll add to this or write a supporting blog entry a bit later...