Saturday, March 21, 2009

Maturity Model for SaaS Service Operations

Given the state in which most SaaS companies are, and the fact that within a very short time span, ISVs will be equated with SaaS, I believe it is time to offer a Maturity Model for SaaS Service Operations.

A methodological approach would be to create a table of practices and mark a numeric value, in order to quantify the maturity state of a company. But that is too simplistic and doesn’t take into account that various practices will exist in some form at each maturity level.

The following is a proposal (a first draft), and I invite all to comment and help zero in on the right model.
Note: Release management is not covered in this model. It is arguably a role shared by the Product, Engineering and Ops groups.

Level 1:
This is probably where most SaaS startups are right now. The Ops team is either nonexistent or consists of a sysadmin, help from Engineering, and a cat. None of the ITSM processes are defined, and there is no orderly asset management in place. Customer support may be handled by a small dedicated team, or even by Engineering.
In the latter case, 24x7 support consists of the cell phones of the CEO and the VP Engineering.
Event management is at a very basic level, reporting only whether a server is up or down.
Perhaps a daily backup of the database is in place.

Level 2:
A small operations team is in place, probably run by a Manager or Director level person. A network engineer and a sysadmin make up the team, with some help from a part-time DBA.
Asset Management consists of a number of Excel sheets, not necessarily up to date or all-inclusive.
A half-baked Change Management process is defined, but not really adhered to. Engineers still have access to the production system. A customer support team is in place, but it is not yet a 24x7 operation. Incident Management consists of people running around like chickens with their heads cut off, but the incident is recorded in the CRM, or perhaps in a ticketing system. Event Management is implemented through a tool like Nagios or Cacti (freebies, of course), and email alerts are sent on threshold breach. There may be thousands of email alerts sent a day, so the real alerts drown in the flood.
A full daily backup of the database is in place and an hourly differential backup is taken.
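To make the Level 2 backup scheme concrete, here is a minimal sketch of which backups a restore would need. It assumes, purely for illustration, that the full backup runs at midnight, that differentials run at the top of each hour, that every backup completes instantly, and that a differential holds all changes since the last full backup. The function name restore_chain is hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def restore_chain(failure_time: datetime) -> List[Tuple[str, datetime]]:
    """Return the backups needed to restore as close to failure_time as possible.

    Assumptions (not from the post): full backup at 00:00, differential
    backups on the hour, and each differential contains every change made
    since the most recent full backup.
    """
    last_full = failure_time.replace(hour=0, minute=0, second=0, microsecond=0)
    last_hour = failure_time.replace(minute=0, second=0, microsecond=0)

    chain = [("full", last_full)]
    # Only the latest differential is needed, because it is cumulative
    # relative to the full backup.
    if last_hour > last_full:
        chain.append(("differential", last_hour))
    return chain


if __name__ == "__main__":
    for kind, taken_at in restore_chain(datetime(2009, 3, 21, 14, 37)):
        print(kind, taken_at.isoformat())
```

Under these assumptions a restore never needs more than two backup sets, and anything written after the last differential (up to an hour of data) is lost.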

Level 3:
A VP Service Operations runs the Customer Support and Operations teams. Change Management and Incident Management are defined and implemented. There is an Asset Management DB which is linked to Change and Incident. A Change Window is defined and a Change Calendar is used. A semi-automated notification process is in place (internal and external notifications). A staging environment is in place, although it does not fully reflect the production environment. Event Management is better controlled, noise is filtered out of the alerts, and some application level instrumentation is incorporated.
Getting individual customers’ SLA is still a manual process, though the information should be available.
A seed of a disaster recovery site is in place. It may take many hours to get it up and running (including transferring the data), but an alternative site with the basic functionality is available.
SAS 70 Type I should be in place at this stage, or you should at least have a good story about how your vendors are all SAS 70 Type II. (Mind you, I am not an advocate of SAS 70, but it seems like the industry is pushing for this, or at least a bunch of compliance consultants are.)
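The alert noise filtering mentioned for Event Management at this level could look something like the sketch below. It shows one simple technique, suppressing repeat notifications for the same host and check within a time window. The class name AlertFilter and the 30 minute window are hypothetical choices, not something a particular monitoring tool prescribes.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple


class AlertFilter:
    """Suppress repeat alerts for the same (host, check) within a window."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        # Time of the last notification actually sent, per (host, check).
        self._last_sent: Dict[Tuple[str, str], datetime] = {}

    def should_notify(self, host: str, check: str, at: datetime) -> bool:
        key = (host, check)
        last = self._last_sent.get(key)
        if last is not None and at - last < self.window:
            return False  # duplicate inside the window: treat as noise
        self._last_sent[key] = at
        return True


if __name__ == "__main__":
    alerts = AlertFilter(window=timedelta(minutes=30))
    start = datetime(2009, 3, 21, 9, 0)
    for minutes in (0, 5, 10, 31):
        sent = alerts.should_notify("web1", "cpu", start + timedelta(minutes=minutes))
        print(minutes, sent)
```

A filter like this would not replace proper threshold tuning, but it keeps one flapping check from producing thousands of emails a day.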

Level 4:
Event Management is fully implemented: Application level monitoring is in place. Synthetic transactions are generated from multiple global locations. Alerts have context-sensitive pointers to the knowledge base. A 24x7 NOC is implemented with a dashboard of all event feeds.
An Incident DB is implemented, which is used to generate SLA reports, incident analysis and availability analysis. A Change DB is in place and is used for Change analysis and for Incident Management. A CAB (Change Advisory Board) is defined and regular meetings are scheduled. A Service Status page is in place with up-to-the-minute status reports on the services. Customer and internal notifications are automated, and a full Incident Management closure process is implemented. Management reports are available for service status, trailing N months, SLAs across customers, and availability across customers and production systems.
Customer/Component mapping is in place.
A Staging environment exists, fully mimicking the production functionality (not necessarily the network/server setup).
A secondary site is up and running with full functionality and a synchronized database. Switching between sites should take less than one hour.
SAS 70 Type II compliance is in place.
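As an illustration of the availability analysis that the Incident DB at this level is meant to support, here is a minimal sketch that turns a list of outage records into an availability percentage for a reporting period. The record shape (start and end timestamps) and the function name availability_percent are assumptions, and the sketch assumes that outages do not overlap.

```python
from datetime import datetime
from typing import Iterable, Tuple


def availability_percent(
    outages: Iterable[Tuple[datetime, datetime]],
    period_start: datetime,
    period_end: datetime,
) -> float:
    """Percentage of the period during which the service was up.

    Assumption: outage intervals do not overlap one another.
    """
    total = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Count only the part of the outage that falls inside the period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (total - down) / total


if __name__ == "__main__":
    march = (datetime(2009, 3, 1), datetime(2009, 4, 1))
    outages = [(datetime(2009, 3, 10, 0, 0), datetime(2009, 3, 10, 7, 26, 24))]
    print(round(availability_percent(outages, *march), 3))
```

With Customer/Component mapping in place, the same calculation could be run per customer by passing in only the outages of the components that customer depends on.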

Level 5:
Bliss. ITIL practices are implemented across the board. (I am not advocating ITIL proper, but I am using its vocabulary to describe the practices.) A functioning, up-to-date CMDB is the heart of the system (yeah, dream on). Application management automation is in place. A full Staging environment is in place, representing the production environment 1:1.
Quality of service takes a leading role and continuous improvements are sought.
Transparency and customer communication are at the highest level. Executive management has full visibility into every aspect of the service operations. All practices are linked and managed through a comprehensive ITSM management suite.
A complete disaster recovery site is up and running with the ability to switch between sites on the fly.

There. Step one in defining the Maturity Model for SaaS Service Operations is complete.
I hope to get feedback to validate the model.


d said...

It all sounds good... but... I've seen and heard of companies trying to do ITIL, spending millions on training and nice little green notebooks, and it never seems to catch. So much of this is part of the company's fabric. I have known some awful companies that manage to the letter, and some very successful companies that are complete anarchy.

Maybe we should approach this as a 12 step model ;-)

Dani Shomron said...

I agree with you that ITIL is not a foolproof solution and that it doesn't promise success, but SaaS companies are usually so undisciplined that adherence to any set of practices would be a huge improvement. Perhaps there are successful companies that are in a state of anarchy, but they will never be able to scale.

Raul Palacios said...

I'm not so sure about the linkage you describe between the Maturity Model and ITIL. Even though ITIL v3 is much more focused on services, the idea of a separate Service Operation for SaaS seems strange to me from the very beginning. Should there be much difference from common IT Ops? Probably not, except in Billing and Provisioning. Maybe what I'm missing in the maturity process is the business value linkage, so that it becomes more than just a not-so-different IT service (at lower cost), but a business enabler.

Dani Shomron said...

As I have mentioned, I am using ITIL as a framework for practices, not as a target in itself.
SaaS Ops and IT Ops have much in common, though there are different emphases.
I believe that discipline and ITIL practices are even more important in SaaS than in the IT shop, as IT will be around even if everyone hates it and complains about it. The SaaS company will cease to exist if it does not deliver the availability, performance and customer satisfaction.

Aryeh said...

As a newbie, I have looked up 'ITIL', 'Criticisms of ITIL' and 'ITIL alternatives'.

Do you think the following citation refers to what could be a serious challenger to ITIL?

"The safest and most cost effective alternative to ITIL to date has been identified as good old fashion common sense"

How do you certify an enterprise's common sense?

Dani Shomron said...

Since the Common Sense Scale has not yet been perfected, we will have to make do with what we have.
ITIL is all about common sense, but putting a framework around it - at least this is my view.
There may be companies out there that are spending loads of money on implementation and not getting valuable results because they are focusing on the structure instead of the intent. Perhaps. In my SaaS world, most companies are small enough to be able to implement within weeks and reap the benefits immediately.

Anonymous said...

I am not that techie... but I think everyone is missing the point that ITIL is not the main discussion point here... it is an example of a 'top level' structured model. What do we think of the first draft of the maturity model? Really good first draft. Very real, and starting to define fairly accurately a typical progression for SaaS companies...