Saturday, March 21, 2009

Maturity Model for SaaS Service Operations

Given the state most SaaS companies are in, and the fact that within a very short time ISVs will be equated with SaaS, I believe it is time to offer a Maturity Model for SaaS Service Operations.

A methodical approach would be to create a table of practices and assign each a numeric value in order to quantify a company's maturity. But that is too simplistic, and it doesn't take into account that various practices will exist in some form at each maturity level.

The following is a proposal (a first draft), and I invite all to comment and help zero in on the right model.
Note: Release management is not covered in this model. It is arguably a role shared by the Product, Engineering and Ops groups.

Level 1:
This is probably where most SaaS startups are right now. The Ops team is either nonexistent or consists of a sysadmin, help from Engineering, and a cat. None of the ITSM processes are defined, and there is no orderly asset management in place. Customer support may be handled by a small dedicated team, or even by Engineering.
In the latter case, 24x7 support consists of the cell phones of the CEO and the VP of Engineering.
Event management is at a very basic level, reporting whether a server is up or down.
Perhaps a daily backup of the database is in place.

Level 2:
A small operations team is in place, probably run by a manager- or director-level person. A network engineer and a sysadmin make up the team, with some help from a part-time DBA.
Asset Management consists of a number of Excel sheets, not necessarily up to date or all-inclusive.
A half-baked Change Management process is defined, but not really adhered to; engineers still have access to the production system. A customer support team is in place, but operations are not yet 24x7. Incident Management consists of people running around like chickens with their heads cut off, but the incident is at least recorded in the CRM, or perhaps in a ticketing system. Event Management is implemented through a tool like Nagios or Cacti (freebies, of course), and email alerts are sent whenever a threshold is breached. There may be thousands of email alerts a day, so the real alerts drown in the flood.
A full daily backup of the database is in place, and an hourly differential backup is taken.
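The restore arithmetic behind that schedule can be sketched in a few lines. A differential backup holds everything changed since the last full backup, so a restore needs only the most recent full plus the newest differential taken after it, and with hourly differentials the worst-case data loss is just under an hour. This is a minimal sketch, assuming backups are tracked simply as timestamps; `restore_chain` is a hypothetical helper, not part of any backup tool.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def restore_chain(fulls: List[datetime], diffs: List[datetime],
                  target: datetime) -> Tuple[datetime, Optional[datetime]]:
    """Hypothetical helper: pick the backups needed to restore as close to `target` as possible.

    A differential contains all changes since the most recent full backup,
    so only the last full and the newest later differential are required.
    """
    fulls_before = [f for f in fulls if f <= target]
    if not fulls_before:
        raise ValueError("no full backup exists before the target time")
    full = max(fulls_before)
    # Differentials taken after that full backup but not after the target
    usable = [d for d in diffs if full < d <= target]
    return full, (max(usable) if usable else None)


# Assumed schedule: full backup at midnight, differentials every hour after it
day = datetime(2009, 3, 21)
fulls = [day - timedelta(days=1), day]
diffs = [day + timedelta(hours=h) for h in range(1, 24)]

target = datetime(2009, 3, 21, 14, 30)
full, diff = restore_chain(fulls, diffs, target)
print(full, diff)     # 2009-03-21 00:00:00 2009-03-21 14:00:00
print(target - diff)  # 0:30:00 (changes made after 14:00 are lost)
```

Note that a restore shortly after midnight falls back to the full backup alone, since no differential has been taken yet.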

Level 3:
A VP of Service Operations runs the Customer Support and Operations teams. Change Management and Incident Management are defined and implemented. There is an Asset Management DB that is linked to Change and Incident. A Change Window is defined and a Change Calendar is used. A semi-automated notification process is in place (internal and external notifications). A staging environment is in place, although it does not fully reflect the production environment. Event Management is better controlled: noise is filtered out of the alerts, and some application-level instrumentation is incorporated.
Producing individual customers’ SLA reports is still a manual process, though the information should be available.
A seed of a disaster recovery site is in place. It may take many hours to get it up and running (including transferring the data), but an alternative site with basic functionality is available.
SAS 70 Type I should be in place at this stage, or at least you should have a good story about how your vendors are all SAS 70 Type II. (Mind you, I am not an advocate of SAS 70, but it seems the industry is pushing for it, or at least a bunch of compliance consultants are.)
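One common way to filter alert noise, similar in spirit to the soft/hard states in Nagios, is to notify only after a threshold has been breached on several consecutive checks, and to send a single notification per episode rather than one per check. The sketch below assumes a simple numeric threshold per check; `AlertFilter` and its `confirm` parameter are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class AlertFilter:
    """Hypothetical noise filter for threshold alerts.

    Notifies only after `confirm` consecutive breaches of `threshold`,
    then stays quiet until the check recovers (one alert per episode).
    """
    threshold: float
    confirm: int = 3
    _streak: Dict[str, int] = field(default_factory=dict, init=False)
    _alerting: Set[str] = field(default_factory=set, init=False)

    def observe(self, check: str, value: float) -> Optional[str]:
        if value <= self.threshold:
            # Back under the threshold: reset the streak, report recovery once
            self._streak[check] = 0
            if check in self._alerting:
                self._alerting.discard(check)
                return f"RECOVERED {check}"
            return None
        self._streak[check] = self._streak.get(check, 0) + 1
        # Raise a single alert once the breach is confirmed
        if self._streak[check] >= self.confirm and check not in self._alerting:
            self._alerting.add(check)
            return f"ALERT {check} value={value}"
        return None


f = AlertFilter(threshold=90.0, confirm=3)
samples = [95, 97, 99, 98, 96, 50, 95]  # e.g. CPU % on one server
messages = [m for v in samples if (m := f.observe("db01.cpu", v))]
print(messages)  # ['ALERT db01.cpu value=99', 'RECOVERED db01.cpu']
```

Here six breaching samples produce one alert and one recovery notice, where a per-breach email setup would have sent six emails.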

Level 4:
Event Management is fully implemented: application-level monitoring is in place, synthetic transactions are generated from multiple global locations, and alerts have context-sensitive pointers to the knowledge base. A 24x7 NOC is implemented, with a dashboard of all event feeds.
An Incident DB is implemented and is used to generate SLA reports, incident analysis, and availability analysis. A Change DB is in place and is used for Change analysis and for Incident Management. A CAB (Change Advisory Board) is defined and regular meetings are scheduled. A Service Status page is in place with up-to-the-moment status reports on the services. Customer and internal notifications are automated, and a full Incident Management closure process is implemented. Management reports are available for service status over the trailing N months, SLAs across customers, and availability across customers and production systems.
Customer/Component mapping is in place.
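With an Incident DB and a Customer/Component mapping in place, per-customer availability becomes a query rather than a manual exercise. A minimal sketch, assuming each incident row records the affected component with its outage start and end, and that a customer counts as down whenever any component it depends on is down; `availability` and the sample data are hypothetical. Overlapping outages are merged so that concurrent failures are not counted twice.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

Incident = Tuple[str, datetime, datetime]  # (component, outage start, outage end)


def availability(customer: str, mapping: Dict[str, Set[str]],
                 incidents: List[Incident], start: datetime, end: datetime) -> float:
    """Hypothetical report: percent of [start, end) with all of the customer's components up."""
    components = mapping[customer]
    # Keep outages on this customer's components, clipped to the reporting period
    spans = sorted(
        (max(s, start), min(e, end))
        for comp, s, e in incidents
        if comp in components and s < end and e > start
    )
    downtime = timedelta(0)
    cur_start = cur_end = None
    for s, e in spans:
        if cur_end is None or s > cur_end:
            # Disjoint from the current window: bank it and start a new one
            if cur_end is not None:
                downtime += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping outage: extend the current window
            cur_end = max(cur_end, e)
    if cur_end is not None:
        downtime += cur_end - cur_start
    return 100.0 * (1 - downtime / (end - start))


mapping = {"acme": {"web", "db"}, "globex": {"web"}}
incidents = [
    ("db", datetime(2009, 3, 3, 2), datetime(2009, 3, 3, 4)),
    ("web", datetime(2009, 3, 3, 3), datetime(2009, 3, 3, 5)),
]
march = (datetime(2009, 3, 1), datetime(2009, 4, 1))
for customer in mapping:
    print(customer, f"{availability(customer, mapping, incidents, *march):.2f}%")
# acme 99.60%    (3 hours down out of 744)
# globex 99.73%  (2 hours down out of 744)
```

The same outage data yields different numbers for different customers, which is exactly why the mapping matters for SLA reporting.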
A Staging environment exists, fully mimicking the production functionality (not necessarily the network/server setup).
A secondary site is up and running with full functionality and a synced database. Switching between sites should take less than one hour.
SAS 70 Type II compliance is in place.

Level 5:
Bliss. ITIL practices are implemented across the board. (I am not advocating ITIL proper, but I am using its vocabulary to describe the practices.) A functioning, up-to-date CMDB is the heart of the system (yeah, dream on). Application management automation is in place. A full Staging environment is in place, mirroring the production environment 1:1.
Quality of service takes a leading role and continuous improvements are sought.
Transparency and customer communication are at the highest level. Executive management has full visibility into every aspect of service operations. All practices are linked and managed through a comprehensive ITSM suite.
A complete disaster recovery site is up and running with the ability to switch between sites on the fly.

There. Step one in defining the Maturity Model for SaaS Service Operations is complete.
I hope to get feedback to validate the model.

Tuesday, March 17, 2009

SaaS and Automated Application Management

A quick blog this time.

I have been asked by Nolio, a great new company, to write a few posts for their new blog site.

Nolio automates all key processes needed to service and manage applications across your data center, improving application uptime and quality, while streamlining operations for immediate productivity gains.

I have seen their product and was impressed to the point that I am hooking them up with a number of SaaS companies.

Please read the two blog posts. A third is on the way.