Wednesday, May 20, 2009

Questions that SaaS executives must be able to answer - KPIs that matter.

“There is much pleasure to be gained from useless knowledge” (Bertrand Russell)

It has been my experience that SaaS executives have trouble answering the most basic questions about their service operations, and mind you, this is what the business is all about.


Again and again, I keep coming back to the conclusion that the state of SaaS Service Operations is so dire because on-demand companies are built on the first ‘S’ (Software) and not the second ‘S’ (Service).

SaaS entrepreneurs are, in general, bright, creative, out-of-the-box thinkers. They are software developers and have no clue about IT practices and disciplines.


The age-old premise “if you can't measure it you can't manage it” somehow escapes SaaS companies across the globe, until it becomes a huge problem.

Have you gone through the numbing process of presenting a specific customer with their real SLA adherence? I have. On average, it would take me a few hours of going through multiple sources of data to come up with (sometimes) accurate data.


Following are a number of questions that every SaaS executive should be able to answer in her sleep, or at least with a click of a button.

1. Availability management
  • What are your real uptime numbers?
  • What do the trailing twelve months (TTM) look like?
  • Are we better than we were six months ago?
  • How many outages have you had in the last M months?
  • What is the breakdown, based on severity?
  • What is the breakdown, based on downtime causes?
  • How many service disruption incidents were repeated?
  • How quickly do you recover from outages?
  • How many days have gone by without a critical, major outage?
2. SLA Management
  • How does your availability match up to your customer commitments?
  • Which customers were affected most (even if they do not complain)?
3. Change Management
  • How often are changes made to the production environment?
  • What is the breakdown of changes by category?
  • What percent of changes did you have to roll back?
4. Asset management
  • What is the status of your inventory? What box is located where?
  • What function or customer would be impacted by a loss of a certain box?
  • When do your support/software contracts expire and what might it affect?
5. Churn Management
  • How many customers have you lost in the past 6, 12, 24 months?
  • Is your customer retention improving over time?
  • What percent is your customer churn out of your customer base?
  • What is the average retention time of your customers?
  • What is the breakdown, based on reasons for churn?
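Several of the availability questions above come down to simple arithmetic over a list of outage records. The following is a minimal sketch, not a prescribed tool: the `Outage` record, its field names and the severity/cause labels are all assumptions made for illustration, and it assumes outages do not overlap in time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    start: datetime
    end: datetime
    severity: str  # hypothetical labels: "critical", "major", "minor"
    cause: str     # hypothetical labels: "change", "hardware", ...


def availability_pct(outages, window_start, window_end):
    """Percent of the window the service was up (assumes outages do not overlap)."""
    window = (window_end - window_start).total_seconds()
    down = sum(
        # Clip each outage to the reporting window before measuring it.
        (min(o.end, window_end) - max(o.start, window_start)).total_seconds()
        for o in outages
        if o.end > window_start and o.start < window_end
    )
    return 100.0 * (1 - down / window)


def mean_time_to_recover(outages):
    """Average outage duration: how quickly do you recover?"""
    if not outages:
        return timedelta(0)
    return sum((o.end - o.start for o in outages), timedelta(0)) / len(outages)


def downtime_minutes_by_cause(outages):
    """Breakdown of downtime, in minutes, by cause."""
    totals = defaultdict(float)
    for o in outages:
        totals[o.cause] += (o.end - o.start).total_seconds() / 60
    return dict(totals)


def days_since_last(outages, severities, now):
    """Whole days since the last outage of the given severities ended, or None."""
    ends = [o.end for o in outages if o.severity in severities]
    return (now - max(ends)).days if ends else None
```

Calling `availability_pct` with a twelve-month window gives the TTM figure; calling it for two consecutive six-month windows answers "are we better than we were six months ago?".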

I am well aware of the fact that there are no integrated solutions for the SMB supporting a database for these crucial KPIs, but every company should have some form of repository capturing at least some of the data and an easy way of extracting it.
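Even a very small repository goes a long way. As a rough sketch for the churn questions, assume one record per customer with a sign-up date and, if applicable, a churn date and reason. The `Customer` record and its fields are hypothetical, and the churn rate here uses one common definition (customers lost during a period divided by customers active at its start); your finance team may define it differently.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Customer:
    name: str
    signed: date
    churned: Optional[date] = None      # None means still a customer
    churn_reason: Optional[str] = None  # hypothetical free-form label


def churn_rate_pct(customers, period_start, period_end):
    """Customers lost in [period_start, period_end) as a percent of those active at period_start."""
    base = [
        c for c in customers
        if c.signed < period_start and (c.churned is None or c.churned >= period_start)
    ]
    lost = [c for c in base if c.churned is not None and c.churned < period_end]
    return 100.0 * len(lost) / len(base) if base else 0.0


def churn_reasons(customers):
    """Breakdown of churned customers by recorded reason."""
    return Counter(c.churn_reason for c in customers if c.churned is not None)


def average_retention_days(customers):
    """Average lifetime, in days, of the customers that have churned."""
    lifetimes = [(c.churned - c.signed).days for c in customers if c.churned is not None]
    return sum(lifetimes) / len(lifetimes) if lifetimes else None
```

Running `churn_rate_pct` over the past 6, 12 and 24 months, side by side, shows whether retention is improving over time.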

The important issue here is that SaaS companies should be aware of these KPIs and start asking these questions, even if they do not yet have all the answers.

Saturday, March 21, 2009

Maturity Model for SaaS Service Operations

Given the state in which most SaaS companies are, and the fact that within a very short time span, ISVs will be equated with SaaS, I believe it is time to offer a Maturity Model for SaaS Service Operations.

A methodological approach would be to create a table of practices and mark a numeric value, in order to quantify the maturity state of a company. But that is too simplistic and doesn’t take into account that various practices will exist in some form at each maturity level.

The following is a proposal (a first draft), and I invite all to comment and help zero in on the right model.
Note: Release management is not covered in this model. It is arguably a role shared by the Product, Engineering and Ops groups.

Level 1:
This is probably where most of the SaaS startups are right now. The Ops team is either nonexistent or consists of a sys admin, help from engineering and a cat. None of the ITSM processes are defined, and there is no orderly asset management in place. Customer support may be handled by a small dedicated team, or even by Engineering.
In the latter case, 24X7 support consists of the cell phones of the CEO and VP Engineering.
Event management is at a very basic level, reporting whether a server is up or down.
Perhaps, a daily backup of the database is in place.

Level 2:
A small operations team is in place, probably run by a manager- or Director-level person. A network engineer and a sys-admin make up the team, with some help from a part-time DBA.
Asset Management consists of a number of excel sheets, not necessarily up-to-date or all inclusive.
A half-baked Change Management process is defined, but not really adhered to. Engineers still have access to the production system. A customer support team is in place, but it is not yet a 24x7 operation. Incident Management consists of people running around like chickens with their heads cut off, but there is a recording of the incident in the CRM, or perhaps a ticketing system. Event Management is implemented through a tool like Nagios or Cacti (freebies, of course) and email alerts are sent on threshold breach. There may be thousands of email alerts sent a day, so that real alerts drown in the flood.
A full daily backup of the database is in place and an hourly differential backup is taken.

Level 3:
A VP Service Operations runs the team of Customer Support and Operations. Change Management and Incident Management are defined and implemented. There is an Asset Management DB which is linked to Change and Incident. Change Window is defined and a Change Calendar is used. A semi automated notification process is in place (internal and external notifications). A staging environment is in place, although it does not fully reflect the production environment. Event Management is better controlled, noises are filtered out from the alerts, and some application level instrumentation is incorporated.
Getting individual customers’ SLA is still a manual process, though the information should be available.
A seed of a disaster recovery site is in place. It may take many hours to get it up and running (including transferring the data), but an alternative site with the basic functionality is available.
SAS 70 Type I should be in place at this stage, or at least have a good story about how your vendors are all SAS 70 Type II. (Mind you, I am not an advocate of SAS 70, but it seems like the industry is pushing for this, or at least a bunch of compliance consultants are.)



Level 4:
Event Management is fully implemented: Application level monitoring is in place. Synthetic transactions are generated from multiple global locations. Alerts have context sensitive pointers to knowledgebase. A 24x7 NOC is implemented with a dashboard of all event feeds.
An Incident DB is implemented, which is used to generate SLA reports, incident analysis and availability analysis. A Change DB is in place, used for Change analysis and for Incident Management. A CAB (Change Advisory Board) is defined and regular meetings are scheduled. A Service Status page is in place with up-to-the-moment status reports on the services. Customer and Internal notifications are automated and a full Incident Management closure process is implemented. Management reports are available for service status, trailing N months, SLAs across customers, availability across customers and production systems.
Customer/Component mapping is in place.
A Staging environment exists, fully mimicking the production functionality (not necessarily the network/server setup).
A secondary site is up and running with full functionality and a synched database. Switching between sites should take less than one hour.
SAS 70 Type II compliance is in place.
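
To illustrate how an Incident DB and Customer/Component mapping combine into per-customer SLA reports at this level, here is a minimal sketch. The `Incident` record, the component names and the shape of the mapping are assumptions for illustration only, and the arithmetic assumes that incidents affecting the same customer do not overlap in time.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    component: str           # hypothetical component name, e.g. "db-1"
    downtime_minutes: float


def customer_availability(incidents, customer_components, period_minutes):
    """Per-customer availability %, counting only incidents on components that serve the customer."""
    report = {}
    for customer, components in customer_components.items():
        down = sum(i.downtime_minutes for i in incidents if i.component in components)
        report[customer] = 100.0 * (1 - down / period_minutes)
    return report


def sla_breaches(report, commitments):
    """Customers whose measured availability fell below the committed percentage."""
    return sorted(c for c, pct in report.items() if pct < commitments[c])
```

With a mapping such as `{"acme": {"db-1", "web-1"}, "globex": {"db-2", "web-2"}}`, an outage on `db-1` counts against acme only, which is exactly the "which customers were affected most" view that is so painful to assemble by hand.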


Level 5:
Bliss. ITIL practices are implemented across the board. (I am not advocating ITIL proper, but I am using the vocabulary to describe the practices). A functioning, up-to-date CMDB is the heart of the system (yeah, dream on). Application management automation is in place. A full Staging environment is in place, fully representing the production environment 1:1.
Quality of service takes a leading role and continuous improvements are sought.
Transparency and customer communication are at the highest level. Executive management has full visibility into every aspect of the service operations. All practices are linked and managed through a comprehensive ITSM management suite.
A complete disaster recovery site is up and running with the ability to switch between sites on the fly.

There. Step one in defining the Maturity Model for SaaS Service Operations is complete.
I hope to get feedback to validate the model.

Tuesday, March 17, 2009

SaaS and Automated Application Management

A quick blog this time.

I have been asked by a great new company, Nolio, to write a few blog posts for their new blog site.

Nolio automates all key processes needed to service and manage applications across your data center, improving application uptime and quality, while streamlining operations for immediate productivity gains.

I have seen their product and was impressed to the point that I am hooking them up with a number of SaaS companies.

Please read the two blog posts. A third is on the way.

Saturday, January 24, 2009

Of Dinosaurs and Men – Why Traditional ISVs Will Fail On SaaS

Dinosaurs were an extremely successful model. They roamed the earth and multiplied and were the indisputable rulers of this planet for hundreds of millions of years.

Then an unexpected event occurred (some scientists believe a meteor hit the earth creating a nuclear winter while others claim it was fast, cheap internet and tightening IT budgets) and soon all dinosaurs went extinct. Well, not all. Two groups were not annihilated. There were those alligators that stayed in their swamp (niche) and are doing pretty well, thank you, still today. The other group consisted of small reptiles that were driven to grow wings since the larger crawlers ate up all the easily available resources. These birds were lucky to be able to adapt quickly to the changing environment and survive the downfall of their relatives.

The fast-moving, warm-blooded mammals were better equipped to deal with the brave new world and many have grown to become true behemoths.

In a previous post I revealed the fact that I have mostly stopped advising the traditional on-premise, enterprise, perpetual, software vendor on the transition to on-demand, subscription model, i.e. SaaS.

This is not because I do not believe that it is a smart move, or that the ISVs would not benefit from the transition. Far from it! I envision a world, not too far in the future, where on-premise software would be the exception, not the rule, and even that exception would point to a dwindling model that would survive in niche markets only (swamps).

My experience, which is supported by many famous (SAP and Avaya for starters) and less famous companies, has been that most traditional, on-premise, enterprise ISVs will fail miserably in the transition to SaaS. I have advised companies that started out with great enthusiasm that dwindled to a silent death. They simply do not have the DNA for it.

I am talking about the right STUFF that is inherently lacking in established enterprise ISVs that will allow them to make the successful transition. This is not a comment about these companies’ value or success. It is usually inversely proportionate. The more successful the company is, the more entrenched it is likely to be in doing things the ‘right way’ – right, as far as the traditional model dictates.

These ISVs have a product view, not a service view. Their emphasis is on features not serviceability. There is a lot of push back from every silo in the organization, for change, in general, and the SaaS change in particular. It requires a paradigm shift in the organization, and the bigger, more established that organization is, the more difficult it is to bring about that change. (See Impact on the ISV Organization July 02, for a detailed account)


Until a couple of years ago, one could say that most ISVs just don’t get it. But that is no longer the case.

Many traditional ISVs saw their market share being cannibalized by these fast moving SaaS companies. Many heard their customers ask about an on-demand offering and many understand that it is vital that they have a “me too” offering. One cannot ignore the changes in the market and shrug it off as a fad. SaaS used to be a way to work around IT; now CIOs are building on-demand strategies for their business and even starting to use on-demand tools in IT.

So, there is a much deeper understanding of the need to offer an on demand service, but very few ISVs understand that it means a total commitment from the executive level and down.

Not that it is impossible. I have worked with a company whose board made the decision to go Services. They replaced the CEO, who in turn replaced all the senior staff, save the VP engineering. The new VP Sales brought in a fresh new sales force. Then they went through the process of rewriting most of the application from scratch. This process took about a year. They are now a successful SaaS vendor, but they got as close to re-encoding their DNA as possible.

And, of course, there are known successful enterprises such as Oracle on demand, HP SaaS (formerly Mercury Managed Services) and others that have successfully launched their on-demand services, but they are the exception to the rule.


Dinosaurs were magnificent creatures and it is sad that we don't have them around any longer (except on isolated islands in the Pacific), but their only fault was that they were too successful for the 'old world' model. I wonder how many software alligators will still be around a decade from now.

Thursday, August 14, 2008

Quick response to a silly blog

Normally, I have my own plans for what I want to post on my blog and I leave the fencing for the US women's Olympic team.
BUT, this morning I received an email from a friend who is also a CEO of a SaaS company, pointing out a new post with the outrageous title "Why You Should Steer Your Customer Away from SaaS, for Now" and asking me to comment on it in my blog.

Frank's argument was that since Google's Gmail had an outage, "just mention Google and Amazon's problems and a shadow of doubt can be place over the whole hosted applications market." He goes on to say that cloud computing (SaaS) "is just a pipe dream" and that we should resort to good old reliable client server technology.

I told my friendly CEO that it is such a non-issue that I don't think I should waste a good posting on it, but I did add a comment to the blog.
As the day went by and my comment was neither posted nor acknowledged, I decided to use this forum to respond.

So, my comment went something like this:
  1. I can't believe we are even having this discussion. This is not 2003 when people were discussing the merits of this novel delivery system. As Dave Rosenberg pointed out yesterday in his Negative Approach blog, "Software-as-a-Service is so common it's actually boring at this point."
  2. Frank's argument is as valid as steering customers away from motor cars back to horse drawn carts, because accidents happen.
  3. SaaS companies make a living out of providing these services. Although I do not have the statistics, I know from numerous interactions with many companies that your average SaaS uptime figures are far better than your average IT department's.
  4. Part of the success of SaaS has to do with the fact that IT was simply not delivering the goods, not in performance nor availability, so the business units went out looking for someone who could deliver a better service, NOW.
There. I don't think I need to splash in the mud much more. I must say though that I feel like I'm playing out a scene from Back to the Future IV.

Thursday, August 07, 2008

Your Typical SaaS Operations

Personal note: For those avid fan(s) who have wondered why I disappeared for such a long time: I took a long, forced vacation, having been sucked into what is not commonly known as the SaaS Operations Black Hole. I went under the radar as VP Operations and Services in a SaaS company, and although it did not leave me time to write blogs or visit the bathroom, I have collected an arsenal of good stuff from hands-on experience, both good and bad, which I intend to share, time allowing.
(One of the reasons I changed my vocation as a consultant to on-premise companies who were contemplating going on-demand, was my realization that it was mostly a futile attempt. It is the equivalent of turning slow moving, cold-blooded dinosaurs into fast, warm-blooded mammals without the benefit of a few hundred million years of evolution. But more on that in a future blog.)

Very well, back to the subject at hand. I have been visiting, talking to, sharing with and advising many SaaS startups in the SF Bay Area in the past year, and a clear (actually murky) state of affairs seems to emerge.

Company Profile:

Name: YTSC (Your Typical SaaS Company)

Age: 3-4 years in the making.

Staff: between 20-40 people, possibly a small dev team offshore in Southeast Asia or one of the former Soviet Union republics.

Technology: Being relatively fresh, YTSC's technology is multi-tenant, customer-centric, with (hopefully) an automatic customer on-boarding mechanism, and they surely have an integration with Salesforce.com (or perhaps NetSuite, SugarCRM, MSDynamics, etc.). Some fancy configuration capabilities should be built into the product and Web Services integration options are available.

Platform: most probably a LAMP shop. Let’s start with all the free stuff and hope to reach profitability before loading the heavy guns. And, hey, we’re big advocates of open-source.

Sales force Compensation: Hmm, we read all the papers, attended the webinars. We think we’re getting it right, but why does the Sales department feel like Grand Central Station?

Customers: A lot of mom & pop shops, a bunch of WEB 2.0 companies with flamboyant logos, a number of departmental customers with big names that we flash on our web site.

Profitability: Surely by next year.

YTSC is now poised for accelerated growth. The customers seem to like the service and the price, and it looks like the numbers will grow rapidly; at least this is what YTSC’s newly acquired VP of Sales has projected.

So how is YTSC prepared for this rapid growth? Do they have the People, Practices and Programs (P-cube) in place? Are they ready to scale from dozens of customers to hundreds and, hopefully, thousands?

My guess is NO. Let me think about that for a moment… Naw.

So why does it look unpromising? Being typical, YTSC Operations has the following traits:

· Operations is under the auspices of Engineering. There is no VP of Operations; there is no Operations group. A Sys Admin is managing the production servers and probably doing office IT on the side.

· The CTO is responsible for uptime, availability and performance. Does the CTO have an Operations background? I'll bet my lunch money that he doesn't. Is there a Staging platform? Probably not. Can the engineers log into production servers and modify configurations? Yeah. Actually we just fixed that nasty bug during lunch break.

· There is no application-level monitoring in place, or trend analysis.

· Is Customer Churn being tracked and analyzed? (What? What was that?)

· There is no 24X7 support, although YTSC claims it is a 24X7 shop.

· Are the following crucial practices defined and followed?

Change management – over 60% of downtime is caused by well-intentioned modifications to the platform. Is there a proper process in place? Is there an RFC (Request For Change) form and procedure? A change committee?

Incident Management – are Support, Operations and Engineering aligned in a well rehearsed routine; roles and responsibilities defined? Is there an Incident management system in place? How about a knowledgebase?

Configuration Management – are hundreds of moving parts accounted for? Are they linked into the Change Management process – actually, we don’t have a Change Management practice.

Availability Management – how do you analyze unavailability? How do you “budget” downtime? Do you know where to invest your next Dollar to ensure optimal availability? It should be all tied into an Incident management system. But, wait we don’t have one.

Release Management – how, when, how often, naming conventions. How does it tie into Change Management and Configuration Management?

SLA Management – Are we providing what we promised? Are we tracking effect of incidents on customers? Are we compensating them according to our contractual commitments? Is it tied into our (hosted) CRM solution? Hard to do without an Incident Management system.

Are we any better than we were last month, last year? Can anybody tell?
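
For the Change Management questions, even a bare-bones change log is enough to start answering them. The sketch below is only an illustration: the `ChangeRecord` fields, the RFC identifiers and the category labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    rfc_id: str                # hypothetical identifier, e.g. "RFC-101"
    category: str              # hypothetical labels: "config", "release", "hardware"
    approved: bool             # did it go through the change committee?
    rolled_back: bool = False


def rollback_pct(changes):
    """Percent of production changes that had to be rolled back."""
    if not changes:
        return 0.0
    return 100.0 * sum(c.rolled_back for c in changes) / len(changes)


def changes_by_category(changes):
    """Breakdown of changes by category."""
    return Counter(c.category for c in changes)


def unapproved_changes(changes):
    """RFC ids of changes that bypassed approval (the lunch-break bug fixes)."""
    return [c.rfc_id for c in changes if not c.approved]
```

Comparing `rollback_pct` for this month's log against last month's is one concrete way to tell whether things are getting better.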

No doubt, parts of these practices have been in place with less fancy names. Otherwise YTSC would not have survived this far. But Excel and Notepad will not suffice for a large scale operation.

Most companies understand that (or maybe that is wishful thinking), but when having to choose between investing the next Dollar in great features that customers have been begging for, or that ugly, boring, misunderstood, 800 pound gorilla, they will opt for the former. Pay now or pay later.

I will cover some of these practices in future posts, Google willing.

Thursday, December 07, 2006

The Central Role of Operations

"Computers are useless. They can only give you answers." (Pablo Picasso)

The Operations group is an odd duck for the traditional, on-premise, enterprise ISV. Those ISVs that are transitioning to the SaaS model are typically not familiar with this group, its role and perhaps its reason for being, and in some cases you might find Operations reporting to the CFO as a ‘cost center’.

But in a SaaS shop, the Ops group is the hub of all activity. Its crucial and main job, of course, is to ‘keep the lights on’ and do that in a highly available, quality performance fashion. Maintaining a scalable, fail proof service is a task that the Ops group should, in time, perfect to the notion of ‘auto-pilot’, implementing the Automate and Delegate principles (see Reducing SaaS Operational Costs).
But that is not where the job ends; indeed it is only the beginning.

In some early stage SaaS operations (either a pure SaaS player or an ISV in transition), R&D and IT provide that function. IT is usually incapable of running, scaling and maintaining the application; its tool set, capacity and pace are so removed from an application level, 24X7 operation. R&D is in shock and awe: “you mean we have to use the damn product!!??” – they are usually the least capable of understanding how the application should work or the value to their customers.
Whereas R&D used to have dozens, hundreds or thousands of customers, Operations is now the only customer (or in a hybrid solution, the largest). Any and all feedback to product marketing would come from the Ops group. It must develop a keen understanding of the application, not just the infrastructure supporting it, and it has to be in constant contact with its customers – the SaaS consumers – to gather feedback, compile it in an orderly and prioritized manner and be able to communicate it to R&D.
Since an Operations Support System (OSS) will be lacking in most early SaaS implementations, the Ops group will be the one presenting the technical solutions either through building its own tools, buying those apps or through cooperation with Engineering to provide the solutions. In any case, Operations will be the authority on the architectural needs, security, storage, the OSS, service-ready features and the application in general. Therefore Operations should be highly involved in the product roadmap.

In some organizations, ownership of the customer success may be in a separate silo that does not include Operations. But one must keep in mind that the Ops group works on a daily basis with the Network Operations Center (NOC) which doubles as the customer support center. Even if Ops and Customer support are not part of the same organization (which I believe they should be) the daily interaction between the groups means that Ops owns the customer success in many ways and deals with the customer directly as a Tier 2 support.

Operations needs an in-depth understanding of its infra and application performance issues and of principles of performance testing/monitoring. They need to work closely with the QA group to test and resolve load issues. In each rollout of a new version, Ops has the ownership of the project, the dates, process definition and should work in conjunction with R&D, QA, Marketing and customer support.

If the Service offered through this model allows for project based business, Operations will tend to be involved in defining offerings, help with the pricing and participate in scoping projects.
And based on my own experience, the Ops team, as the owner of the application and service, worked closely with a team of Expert Service engineers who provided the end consumers with domain-level expertise to drive more value from the application. (See IPaaS) Ops engineers also participated in user forums with the customers to provide best practices and tips n' tricks.

"OK! Fine!" I hear you shouting from the back rows "We are convinced of the central role of Operations. So what?"
So just keep in mind when you are about to launch this endeavor that you will need to assemble a good team of professionals to play in this game. Not just seasoned systems engineers, a network manager and a good DBA, but operations engineers who ideally have an engineering background and are innovative, customer oriented and business savvy. Nothing short of the Fantastic Four.