Showing posts with label SaaS and SLA. Show all posts

Sunday, August 21, 2011

SaaS and SLA - State of the Art

"You can get assent to almost any proposition so long as you are not going to do anything about it." (Chapman, John Jay)

Lately, I have been approached by a number of frustrated CIOs, asking me about what can be expected from a typical SLA in the industry and which provider offers an SLA with some beef.

A Typical SLA
Let’s see what a basic SaaS SLA should look like:
Service Availability 
System Response Time 
Customer Service Response Time 
Customer Service Availability 
Service Outage Resolution Time 
Failover Window For Disaster Recovery 
Reclaiming Customer Data 
Maintenance Notification 
Proactive Service Outage Notification 
RFO (Reason for Outage) 

Nice. Now let’s see what a typical SLA in the SaaS industry looks like:
Service Availability 

Is that it? Yeah, that’s about it.  (Sometimes you may find Customer Support response time as well, the Lord be praised). The standard SLA in the industry only discusses ‘uptime’ and even that is usually very iffy, with mostly zero or negligible penalties.

Recently I have been meeting with CXOs of successful SaaS companies and asking them what their SLAs offer. Not surprisingly, their answers were reflective of the typical SLA above. Some did not even offer an SLA and one said, half jokingly, that they (the customers) should say ‘thank you’ for even having the service available. When asked about the future of SLAs in the industry, the collective answer was that nothing will probably change, customers will not demand better SLOs (Service Level Objectives) and that the whole issue was quite irrelevant. One CEO suggested that the only concern of the CIO is ease of integration.

Is that so? Or are these guys burying their heads in the sand? When I asked how many of them dealt with CIOs (as compared to business units), only one said that he did, and that it was an unpleasant experience.

How would you explain this discrepancy between what CIOs want and what SaaS CXOs would offer? And why is the state of SLAs in the industry so pitiful?

A Quick Historical Review
I think the answers lie in the history of SaaS and how it penetrated the market. Around 12 years ago we started seeing the first SaaS applications (although no one came up with the name until a few years later). SaaS mostly targeted the SMBs, who had no access to the enterprise software that was available to the larger companies. Whether from a cost, complexity or support point of view, the on-premise applications were out of reach for the smaller companies. When these applications started becoming available over the Web, the SMBs were so delighted to even have a solution that they were not going to bitch about the service levels being offered in the contracts. They were just happy that the apps were available. So, SaaS companies offered a 99% uptime, which seemed pretty good (except that it translated into almost four days of downtime a year!). Nobody could talk about performance, as the dependency on the customers’ own network and on their ISPs allowed the providers an easy escape from accountability.
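The arithmetic behind that downtime figure is easy to check. Here is a minimal sketch, assuming a 365-day year and an uptime commitment measured annually (the function name is only illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime that an uptime commitment still permits over a period."""
    return period_hours * (100.0 - uptime_percent) / 100.0


for target in (99.0, 99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows {hours:.2f} hours "
          f"({hours / 24:.2f} days) of downtime per year")
```

At 99% the budget is 87.6 hours, or roughly three and a half days a year, which is why a number that sounds impressive can still hide a lot of outage time.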

The Corporate Business Unit
Even though SaaS initially targeted the SMB, the big breakthrough came from the business units that found freedom in circumventing IT and getting their needs answered quickly (and in the process, flipping a bird to IT). The heads of the business units were mostly concerned with features and did not care much about SLAs. Even if they did, they lacked the experience and knowledge that IT has accumulated over the years about what to demand, how to verify that service levels are met, and so on.

The New IT Manager
More than ten years have passed with SaaS slowly establishing itself as mainstream, and conquering more and more territories. Old habits die hard and the sad state of SLAs remained where it had been a decade ago. Now, SaaS is finally entering the enterprise through the front door. There is a new generation of CIOs that are not threatened by SaaS and understand the freedom it offers them. They want to get back into the driver’s seat, clean up the mess that a decentralized SaaS policy created and control what is entering their domain.

As for the CIOs with the old-timer’s attitude, the Cloud hype has forced them to pay attention. When the CEOs caught on (hey, we can save a lot of money here) the pressure was on the CIOs to start acquiring Cloud Applications – SaaS. And, like it or not, there are numerous integration issues that demand that IT be in the picture.

Slowly, we are seeing a shift in the market. More and more CIOs and IT managers are in the picture. And when they see the lack of real certification or the famished SLAs offered by the vendors, they are probably baffled, at best, if not furious.

I believe that gradually, as more CIOs enter the picture, the SaaS providers will have to prove themselves as more mature, attentive and accountable vendors. I think that the IT customers will step up the pressure and changes will occur. SaaS providers will succumb and provide a serious document with real numbers and repercussions.

In short, the differentiator is no longer the fact that a vendor offers SaaS, nor the feature set, nor the pricing. To distinguish oneself, a SaaS vendor will have to excel in every aspect of the service and provide the assurances for the service levels that CIOs are expecting.

Sunday, January 31, 2010

SLA Consequences to Service Operations

"What, me worry?" (Alfred E. Neuman)

In my previous post I discussed some basic concepts about SLAs, SLOs and penalties. As promised, I am addressing the ‘who cares?’ question.

From a Service Operations point of view, you may shrug your shoulders and claim that these are issues for the Legal and Finance departments. Most likely you were brought on board later in the game and never viewed an SLA until you were forced to do so.

As the person responsible for keeping all the services up and running, it may be best to keep the SLA to a minimum. After all, a document containing vague language, with little commitment and liability would be hard to wave in front of your face when the service levels drop.

I will argue that vagueness will play against you. A tough SLA will require the company to adhere to the high service levels they are committed to, and yes, pay the penalties for breaching these agreements. Keep in mind that if your service level drops one time too many, the legalese you will be hiding behind will not save your butt when customers drop from the service or simply do not renew.

I would take it even one step further. I advocate that Service Ops managers’ bonuses be tied to achieving those SLOs that will keep a smile on the customers’ faces (typically uptime and response time, but in some cases there are other objectives that are crucial to the customers). The carrot and the stick should work nicely to assure that you are doing the utmost to live up to the agreements.

Commitments
Another issue that concerns you (Service Operations) is that Sales is making commitments that you are supposed to keep, usually without you ever knowing about it. Operations needs to initiate a fact-finding effort to learn what is there. You need to know what you are capable of providing. Everybody likes to state that they are five nines (99.999% uptime) but how many companies out there really are? You need to monitor and test your service over a substantial period of time before you commit to those numbers.
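To make "monitor before you commit" concrete, here is a rough sketch of how measured uptime could be computed from an outage log and compared with the number Sales wants to promise. The observation window, the outage timestamps and the function name are all made up for illustration, and the sketch assumes the logged outages do not overlap:

```python
from datetime import datetime, timedelta


def measured_uptime_percent(window_start, window_end, outages):
    """Uptime over a window, given (start, end) outage intervals.

    Assumes the intervals do not overlap and all fall inside the window.
    """
    total = (window_end - window_start).total_seconds()
    down = sum((end - start).total_seconds() for start, end in outages)
    return 100.0 * (total - down) / total


# Hypothetical 90-day observation window with two recorded outages.
start = datetime(2010, 1, 1)
end = start + timedelta(days=90)
outages = [
    (datetime(2010, 1, 14, 3, 0), datetime(2010, 1, 14, 3, 45)),     # 45 minutes
    (datetime(2010, 2, 20, 22, 10), datetime(2010, 2, 20, 23, 40)),  # 90 minutes
]

uptime = measured_uptime_percent(start, end, outages)
print(f"Measured uptime: {uptime:.3f}%")
print("Safe to promise five nines:", uptime >= 99.999)
```

In this made-up log, a little over two hours of downtime in a quarter is already enough to miss not just five nines but three.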

Another point in favor of having a good grasp of your SLA is that you, as a consumer of services, would be conscious of your requirements vis-à-vis your service providers.

That will include the hosting services, your ISP, your communications provider, and whatever cloud services you are using. In my past positions as VP Service Operations I have been appalled by the contracts that my predecessors had signed with service providers. Some of them had no consequences for service level degradation. Others had ridiculous clauses such as 'for every hour of downtime, the credit would be for one hour prorated service cost', which meant that there was no real penalty. Another contract stated that we could get out of the agreement if for three months in a row(!) the service provided was available for less than 75% of the time.
We would have been out of business by then.
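To see why the "one hour of credit per hour of downtime" clause carries no real penalty, run the numbers. This is a quick sketch with a hypothetical monthly fee and a 30-day month, not figures from any actual contract:

```python
def prorated_credit(monthly_fee, downtime_hours, hours_in_month=30 * 24):
    """Credit when each hour of downtime earns one hour of prorated fees."""
    return monthly_fee * downtime_hours / hours_in_month


monthly_fee = 10_000.00  # hypothetical subscription fee
credit = prorated_credit(monthly_fee, downtime_hours=4)
print(f"Credit for a 4-hour outage: ${credit:.2f} "
      f"({100 * credit / monthly_fee:.2f}% of the monthly fee)")
```

Under these assumptions a four-hour outage costs the provider about half a percent of one month's fee, which amounts to a refund for the undelivered service and nothing more.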

Where are those damn SLAs?
As we have seen, SLAs that are broad and meaningful will be complex. Add to that various service levels such as Standard, Gold and Platinum and the fact that some customers have negotiated special terms for themselves, and you are dealing with a mean, slimy problem.

To compound that problem, nine times out of ten, these documents are sitting on someone’s laptop in a PDF format with perhaps a hard copy in a dusty folder, in the cabinet below the espresso machine.

Imagine the exercise of figuring out if an SLA was breached for a particular customer, and if that breach carries a penalty.

I have painfully gone through that exercise too many times, and believe you me - I had much better things to attend to following a service outage. The process was extremely slow, finding the various documents, looking up the terms and comparing the events with them.

Then a calculation was needed as to how much credit was due. And all this was done for a single customer. Multiply that by the number of customers that may have been affected and you have just wasted many good hours of Solitaire.

SLA Management Tools
There are multiple tools out there (some are offered as SaaS) to manage your SLAs. Many of them provide a full cycle of defining SLOs, creating SLAs, generating the documents, monitoring performance against obligations, computing compensation and generating reports. I have not used any of them (although I used to work at an SLM ISV), so I am not about to promote any single one, but there are very slick solutions available.

If you are at an early stage, it would be a hard sell to justify to management that you need to start paying for a service that possibly no one in the company comprehends.

Typically, when a SaaS company starts out, its SLAs are very simple, non-binding and fixed, so there is very little attention paid to this aspect of the business.

As with any aspect of SaaS Service Operations, scalability issues hit you when you least expect them.

As most (all?) SaaS companies do not start with Service Level Management software, by the time it becomes a burden they will have many dozens, or hundreds of such SLAs. The effort of converting them to an automated system is daunting.

Therefore, you can start by structuring your existing and future SLAs in a simple Excel spreadsheet or database, so that they are easily accessible and comparable.

An example of a typical SLA would be stored in a table such as below.
The values for the various SLOs in the table were automatically populated from the definitions in the pre-defined Platinum and Gold tables (which state the default values for these SLAs). They may be overridden by specific values, following negotiations for a particular customer.


Cust.                       Calia           Google
Cust ID                     123             213
SLA                         Gold            Platinum
Uptime                      99.9%           99.99%
Response time               under 6 sec     under 4 sec
Support Response time       2 hours         30 min
Support Avail.              12X6            24X7
Major Outage Resolution     1 hour          30 min
Partial Outage Resolution   4 hours         2 hours
Minor Outage Resolution     12 hours        6 hours
Maint. Notification         10 days         2 weeks
FTP                         12 hours        6 hours
Outage Notif.               Email, 1 hour   Email + call, 30 min
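The "tier defaults plus negotiated overrides" idea behind the table can be sketched in a few lines. This is only an illustration: the field names are invented, only three of the SLOs are shown, and customer 999 with its negotiated uptime is entirely hypothetical:

```python
# Default SLO values per service level (a subset of the table above).
TIER_DEFAULTS = {
    "Gold": {
        "uptime_percent": 99.9,
        "support_response_minutes": 120,
        "major_outage_resolution_minutes": 60,
    },
    "Platinum": {
        "uptime_percent": 99.99,
        "support_response_minutes": 30,
        "major_outage_resolution_minutes": 30,
    },
}


def effective_sla(tier, overrides=None):
    """Start from the tier defaults, then apply customer-specific terms."""
    merged = dict(TIER_DEFAULTS[tier])  # copy, so the defaults stay untouched
    merged.update(overrides or {})
    return merged


customers = {
    123: {"name": "Calia", "tier": "Gold", "overrides": {}},
    213: {"name": "Google", "tier": "Platinum", "overrides": {}},
    # Hypothetical customer that negotiated a better uptime on a Gold plan.
    999: {"name": "ExampleCo", "tier": "Gold", "overrides": {"uptime_percent": 99.95}},
}

for cust_id, cust in customers.items():
    sla = effective_sla(cust["tier"], cust["overrides"])
    print(cust_id, cust["name"], cust["tier"], sla)
```

Keeping the overrides separate from the defaults means a change to the Gold package propagates to every Gold customer, while the specially negotiated terms stay visible as exceptions.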


In the book I will elaborate on the structures and the tools and how to automate the compensation computations.

Thursday, November 12, 2009

SLA Management for SaaS

“God does not ask about our ability, but our availability.” (Source unknown)

(Yet another chapter in the book - keep the feedback coming!)

As the second ‘S’ of SaaS indicates, the on-demand company is all about providing a service and therefore one would expect Service Level Agreements to be well defined and understood in this industry, but the facts tell another story. Few SaaS companies pay much attention to their SLAs, few really invest in them and most customers are quite clueless about them as well.

SLAs are tricky. Every SaaS provider is supposed to adhere to its service level commitments but on the whole, the SLA is a document that most providers tend to keep out of the limelight and out of the conversation with customers. Judging from my experience, many SaaS companies use a single, non-binding, standard SLA for all customers, keeping their commitments and consequences to a minimum.

An SLA, as its name suggests, is an agreement between the service provider and the consumers, consisting of sections regarding the various commitments to service levels that will be matched or exceeded.
Each section is defined as a Service Level Objective (SLO).

A typical SaaS SLA should have the following SLOs:
  • Service Availability – define the availability of the service represented in percentage (e.g. 99.95% uptime)
  • System Response Time – define response time of various transactions represented in seconds. (e.g. login should not take more than 9 seconds)
  • Customer Service Response Time – a response on customer enquiries should take no more than an allotted time for various services (e.g. enabling a service for a new group should take less than two business days)
  • Customer Service Availability – hours of availability of customer service represented in an ‘hours per day by days per week’ notation. (e.g. 11X5 for regular customers, 24X7 for platinum customers)
  • Service Outage Resolution Time – the time it takes to restore a service after an outage has been reported. Represented in minutes and hours (e.g. 30 minutes for a full system outage)
  • Failover Window For Disaster Recovery - how long will it take to restore the service in a disaster recovery site, if disaster disables the main datacenter.
  • Reclaiming Customer Data – a commitment to transfer all (agreed) data in an agreed format in case the customer leaves the service.
  • Maintenance Notification – the advance notice that the provider gives customers of planned service outages, represented in days. (e.g. a planned downtime that will take more than one hour requires 10 business days notification)
  • Proactive Service Outage Notification - the time it takes for the provider to inform the customer that there are service issues, represented in minutes.
  • RFO (Reason for Outage) – a report to customers following a service outage explaining the circumstance, the incident and steps taken to remedy the problem. (For more information see the chapter on Incident Management). Some customers require an RFO automatically; in some SLAs it is written that an RFO will be generated only following a specific customer request. Usually the company commits to three business days following the service disruption.
Note the emphasis on ‘should’ when referring to the SLOs of the document. The SLA provided by most on-demand companies consists of two or three paragraphs at most, regarding uptime, customer service availability and perhaps another one of the items above.
Many providers have additional services such as daily reports, daily data aggregations, or FTP services. Each one of these services merits an SLO that should be part of the document.

Some SLOs override others. In the example of a service outage, the Availability SLO takes precedence over the Response Time SLO, as you would not expect the performance of the system to be up to par when the system is down. On the other hand, the outage will kick-start other SLOs such as Outage Notification, Resolution Time and Support Response Time.
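One way to picture this precedence is as a small rule that decides which SLOs are being measured at any moment. The sketch below is a deliberate simplification that covers only the SLOs named in this paragraph; the set and function names are invented:

```python
ALWAYS_MEASURED = {"Availability"}
SUSPENDED_DURING_OUTAGE = {"Response Time"}
TRIGGERED_BY_OUTAGE = {"Outage Notification", "Resolution Time", "Support Response Time"}


def slos_in_force(service_is_down):
    """Return the SLOs to evaluate, given whether the service is currently down."""
    if service_is_down:
        # Availability overrides Response Time, and the outage clocks start ticking.
        return ALWAYS_MEASURED | TRIGGERED_BY_OUTAGE
    return ALWAYS_MEASURED | SUSPENDED_DURING_OUTAGE


print("Normal operation:", sorted(slos_in_force(False)))
print("During an outage:", sorted(slos_in_force(True)))
```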

Customer Expectations
Not all SaaS companies are created equal. They will vary by maturity, by the vertical they are serving, by the company size they cater for and, of course, by the type of application.
Some applications are core and some are peripheral. Some applications are used around the clock, like metering or call centers and the customers have zero tolerance for downtime. Other applications are rarely used outside of office hours, (e.g. payroll, talent management) and if the system is down, the price is a handful of irritated end-users that will need to take a coffee break earlier than they planned.
Larger customers tend to have more rigorous demands while lower paying customers will usually be more tolerant of the system’s performance and support availability.
Therefore, your SLA should reflect the relative position of your service along the following three vectors:
  1. Customer size (reflecting subscription [potential] size)
  2. Core vs. periphery
  3. Downtime tolerance
So if you are providing a mission critical application to a large customer, whose downtime will cost the customer real dollars, your SLA should be taken very seriously.

Service Level Breaches and Penalties
We have seen the promises that come with the SLAs, but many of these agreements fail to state the consequences to the provider of not meeting the terms.
Each SLO should also define the penalties for breaching the service level commitment.
Penalties are typically specified as a prorated credit for the following month’s subscription fees.
From the customers’ point of view, the penalties should not be flat rated but increase as the service deteriorates, so that the second outage will carry a heavier penalty than the first outage. It is rare that customers insist on this point but those that do will need to negotiate these terms separately.

There is typically a maximum penalty. It is unusual that accumulated penalties will top the monthly subscription costs. There is a catch here. As an extreme example, if your service was down for the duration of the whole month, the customer will be exempt from paying a full month’s service fee – but this is ridiculous of course. The damage to your customers is typically orders of magnitude higher than the subscription costs.
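An escalating penalty with a cap might be computed along these lines. Treat it as a sketch under stated assumptions: the 5% base rate and the linear escalation (the n-th outage in a month costs n times the base) are invented for illustration, and real contracts will define their own schedule:

```python
def monthly_credit(monthly_fee, outage_count, base_percent=5.0, cap_percent=100.0):
    """Credit for a month in which the n-th outage costs n times the base percentage.

    The total is capped, by default at the full monthly subscription fee.
    """
    percent = sum(base_percent * n for n in range(1, outage_count + 1))
    return monthly_fee * min(percent, cap_percent) / 100.0


fee = 2000.00  # hypothetical monthly subscription fee
for outages in (1, 2, 3, 10):
    print(f"{outages} outage(s): credit ${monthly_credit(fee, outages):.2f}")
```

With these made-up numbers the credit grows from $100 to $300 to $600 for the first three outages, and ten outages hit the cap of one month's fee, which is exactly the ceiling described above.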
Many SaaS customers commit up front to a year or more of service, for a reduced subscription price. A good SLA will include a section that allows the customer to breach the extended commitment if the provider failed to adhere to the service levels for, say, three consecutive months.
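Checking whether such an exit clause has been triggered is straightforward once monthly uptime figures are on record. Here is a minimal sketch, assuming the SLO in question is uptime and that a month counts as a breach when measured uptime falls below the committed value (the function name and sample figures are hypothetical):

```python
def may_terminate(monthly_uptime, committed_uptime, consecutive_months=3):
    """True if the commitment was missed for the required number of months in a row."""
    streak = 0
    for uptime in monthly_uptime:
        streak = streak + 1 if uptime < committed_uptime else 0
        if streak >= consecutive_months:
            return True
    return False


# Hypothetical monthly uptime history against a 99.9% commitment.
print(may_terminate([99.95, 99.5, 99.7, 99.8], committed_uptime=99.9))  # three misses in a row
print(may_terminate([99.5, 99.95, 99.5, 99.5], committed_uptime=99.9))  # streak was interrupted
```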

The next chapter will outline what all of this means to the Service Operations group and why you should care about issues that initially seem to be in the domain of Sales, Legal and Finance.