Monday, November 02, 2009

Introduction to the book on SaaS Service Operations

"You can't handle the truth!" (Col. Jessep in 'A Few Good Men')

As I have mentioned in a previous post, I am working my way through writing a book on SaaS Service Operations. Using the web as a collaborative tool, I have decided to share my work bit by bit (three chapters so far) to test it within the community and get live feedback from those who matter: the people who might read and recommend it.
Following is the (draft) introduction chapter. I would dearly appreciate your feedback on content, style, typos, grammar and whether you might find such a book an interesting read.
I am not fishing for compliments - it would defeat the purpose, and yes, I can handle the truth.
Many thanx,
Dani

Introduction – or Why I Am Writing This Book

Well, someone has to write it. Much has been written over the years on SaaS matters, but I have seen very little about SaaS Service Operations, and I am not aware of any books on the subject.

As SaaS becomes mainstream, it has also become the most visible and mature service in the Cloud stack. Consumer expectations have risen: customers now demand fast response times and a service that delivers on the availability slogan of ‘anytime-anywhere’. These expectations apply not only to the application but also to customer and professional services. SaaS companies often excel at the first ‘S’ of SaaS, i.e. Software, but fare quite poorly with regard to the second ‘S’ – Service.

What started as an experiment of the few and the brave will soon become the major force in the software market, and what will differentiate one company from the rest is no longer the on-demand allure or the feature set, but the level of service it provides.

I am a war veteran in this respect and have many scars to parade. There are probably very few mistakes that I have not made. Being a descendant of Homo Sapiens Sapiens, I like to think of myself as one who has learned from his mistakes and taken steps to remedy them.

‘Operational Fatigue’ is a term I coined after the umpteenth time I was awoken in the wee hours of the morning to handle an outage that had occurred yet again, after we had seemingly fixed the problem two weeks earlier. I could just as well have coined the phrase after a two-hour scheduled downtime to upgrade the service. The upgrade turned into a nine-hour nightmare that was finally resolved by some engineering heroics, a couple of minutes before our major customers started their workday. As always, these were followed by heart-wrenching phone calls to our customers’ CEOs to explain what had gone wrong (again) and why it would not recur.
No wonder I grind my teeth at night.

Throughout my years of practice in this space I have discovered a number of traits across the industry:
  • Most SaaS companies are structured and behave in a similar fashion
  • Most SaaS companies lack the discipline, the tools and the practices to provide an efficient and effective service operation
  • Most SaaS companies, therefore, end up paying the price of not meeting their SLAs, which leads to customer dissatisfaction, customer churn and ‘Operational Fatigue’
The intended audience for this book is whoever is responsible for the quality of customer service. That includes the CEO, the CTO, VP Engineering, VP-Director-Manager of Operations and VP-Director-Manager of customer service. All of these functions must work in unison to ensure a smooth operation both outwardly and internally.

This book is divided into four sections:
  1. The first section introduces concepts about SaaS, the evolution of the market and why the model is here to stay. Enough has been written about the subject so I will stick to some of my observations without going into a long dissertation.
  2. The second section contains insights on service operations in a SaaS company. It includes various posts published on my blog (‘Dani’s Perspective on SaaS’) over the past year. It discusses typical SaaS operations, discipline, transparency, outsourcing in the Cloud, metrics, etc.
  3. The third section covers Operational Support Systems that might or might not be supported by the product. They include: Billing, On-Boarding, De-provisioning, Integration, Retention Policy and Communication.
  4. The final section is instructional and lays out the principles of my adaptation of ITIL for SaaS Service Operations™. It explains what ITIL is and why I chose ITIL as a basis for defining the practices of running an efficient and effective service operation. It covers six practices that I have developed and refined throughout the years at various companies with whom I worked either as an employee or as a consultant.
By adopting the practices, following the workflows and deploying the tools outlined in this book, SaaS companies can instill the discipline needed to reap the benefits in a surprisingly short time.

It is not complicated, it is not expensive, nor is there sorcery involved - it only requires awareness and leadership.


Sunday, October 25, 2009

Cloud IaaS: Sorry, not very Interesting

“There is an incessant influx of novelty into the world, and yet we tolerate incredible dullness” – Henry David Thoreau

Don’t get me wrong. Infrastructure-as-a-Service is a wonderful, useful and logical development. I do not need to sing its praises here. I believe in it and I am sure that it will provide a growing, significant percentage of computing needs around the globe.

But it is just not very interesting, even though it is all the rage in IT circles and among hype generators. The enabling technologies are basically high-speed bandwidth, virtualization and sophisticated management software. Now, I do not belittle these technologies. They are the product of years of development by ingenious engineers, and of some fast-acting companies that were able to put two and two together and come up with the offering. And kudos to Amazon Web Services for leadership, ideas and execution.

Still, I believe that it is the domain of the few. Although every datacenter and ISP out there is starting to offer a ‘cloud’ solution, the end result will be a few very large companies that are big enough to invest in a model that makes economic sense and sophisticated enough to pull it off.
So what does that mean for technology companies that are thinking of providing IaaS-enabling software or hardware? Only a handful of them will survive, since they will be competing in such a small market.

So why all the hype, and why is it spreading like a bushfire in the Kalahari savannah, when it took almost a decade for SaaS to become mainstream? Because the idea of IaaS is very simple and straightforward. IT gets it. Any old CIO can understand the concept, because hardware is a commodity and has been for a long time, and because many enterprises have been hosting in co-los for decades, treating that hardware as if it were in their own datacenter.
Once you get over the fear of losing control and get through the blah-blah of security, the idea of IaaS is very simple, and therefore, not interesting.

SaaS, on the other hand, is all about applications. And applications are not perceived as a commodity (although many non-core applications are beginning to assume that role, and that’s a good thing). Therefore, once the hype has run its course and the dust Clouds have settled, IaaS will become mainstream. Every enterprise will choose how much of its infrastructure will lie outside its firewalls and to what extent it will use the flexibility of the solution. SaaS will remain the interesting item, since every ISV will offer an on-demand solution, and the competition will continue to generate innovation and breakthroughs.

Tuesday, September 08, 2009

SaaS 70 – Nextgen Certification for On-demand companies

“A certified lunatic is certified nonetheless” (Dani, 2009).

I was asked by one of my readers (note the plural) to include a chapter on SAS 70 in my upcoming book on SaaS Service Operations. I must admit that I was not sure whether he was advocating SAS 70 or wanted me to discuss certification for SaaS, since I am not a fan of the former but a promoter of the latter.

Confusion is defining the SaaS market when it comes to certification.
Enterprise IT personnel certainly do not know what questions to ask, so they generate long RFPs that are very similar to on-premise RFPs, and then slap on top security questions, laden with acronyms that may or may not be relevant, that make their CSO feel important. Most on-demand ISVs wouldn’t know how to define a ‘certified’ SaaS either.

The good news is that the customer base is demanding assurances. A few years back, the concerns were mostly about security, and mostly in comparison to on-premise solutions; now the market is maturing, and there are myriad on-demand solutions for every vertical or horizontal application area.
So how does an IT professional distinguish between good and better solutions? How can she judge whether the SaaS provider will live up to its SLAs, whether the data is secure, and whether operational procedures exist and are followed?

SAS 70
The truth is, there are no authoritative answers to these questions nowadays. With a glaring lack of SaaS certification, the only default out there is SAS 70.

Statement on Auditing Standards No.70 (SAS 70) is an internationally recognized auditing standard developed by the American Institute of Certified Public Accountants (AICPA) in 1992. It is used to report on the "processing of transactions by service organizations", which can be done by completing either a Type I or a Type II audit. A SAS 70 Type I is known as "reporting on controls placed in operation", while a SAS 70 Type II is known as "reporting on controls placed in operation" and "tests of operating effectiveness" (http://www.sas70.us.com/what-is/definition-of-sas70.php)

(Disclosure: I have not undergone a SAS 70 audit in the companies I worked for. My knowledge is based on reading and sharing other companies’ experiences)

What’s good about SAS 70
The fact that SaaS companies want to take the extra (expensive) step to distinguish themselves from the rest of the pack shows a level of maturity and seriousness about their business. SAS 70 requires that you have a set of practices and that you are following them.
This in itself is a big step forward for most SaaS companies – they actually have a set of defined practices.
Sorry, only two short paragraphs on the benefits.

The shortcomings of SAS 70
This audit was not defined for SaaS. It was developed in 1992, years before even ASPs were in vogue. It is a general audit for service organizations and covers a wide range of businesses, from credit processing, to medical insurance and data processing.
There are no specifics for an on-demand software company. Heck, there are no specifics for a software company either.

Please note the language “A SAS 70 audit helps companies meet regulatory compliance…”, and “a SAS 70 audit provides an additional layer of accountability…”
Nowhere does it state that it certifies the company at any level, other than the fact that the audit was done.
It reminds me of cosmetics advertising, “makes your skin feel younger” – how very scientific.
There are no recommendations, no standards to meet, no right or wrong. It merely states that you have practices (good or bad) in place, and that you are following them.

As mentioned, the mere fact that there are defined practices exhibits a level of maturity, so I do not belittle the exercise, but there are no provisions in SAS 70 to avoid documenting your bad practices and following them through.

SaaS 70
There is a dire need for a certification program for SaaS companies as the domain matures and SaaS becomes a major component of IT.
IT wants to know that you are a competent service operator, that you are running a tight shop and that the service will be around next Thanksgiving.
I am suggesting a certification program, currently named SaaS 70 (to demonstrate my famous wit), which includes three elements:
  • Service Operational Maturity – Has the company defined and implemented practices and procedures for running a robust operation, to ensure that SLAs are met? This would include Change Mgmt, Release Mgmt, Incident Mgmt, Event Mgmt, Availability Mgmt, On-boarding, De-provisioning, Integration, Data Retention, etc.
  • Security – covering all aspects of password policies, data separation, vulnerability testing, virus protection, privacy, etc.
  • Service Continuity – examining the financial viability of the company and what plans are in place to continue providing the service even if the ISV goes belly-up.
Within each component, the company would receive a maturity score, rather than a simple pass/fail, based on coverage, depth, documentation and tools. And, of course, the report would include recommendations for improvement and for climbing the maturity ladder.
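To make the idea a little more concrete, here is a hypothetical Python sketch of how such a scorecard might be represented. The three components and the four dimensions come from the description above. The 0-4 scale, the weakest-link rule (a component's level is its lowest dimension score) and the sample scores are assumptions for illustration only. This is not a defined SaaS 70 method.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four dimensions come from the post; the 0-4 scale and the
# weakest-link aggregation below are illustrative assumptions.
DIMENSIONS = ("coverage", "depth", "documentation", "tools")

@dataclass
class ComponentAssessment:
    name: str
    scores: Dict[str, int]  # dimension -> score on a hypothetical 0-4 scale

    def maturity_level(self) -> int:
        # A component is only as mature as its weakest dimension.
        return min(self.scores[d] for d in DIMENSIONS)

    def recommendations(self, target: int) -> List[str]:
        # One recommendation per dimension that falls short of the target level.
        return [f"{self.name}: raise {d} from {self.scores[d]} to {target}"
                for d in DIMENSIONS if self.scores[d] < target]

# Made-up sample assessment for one company.
report = [
    ComponentAssessment("Service Operational Maturity",
                        {"coverage": 3, "depth": 2, "documentation": 1, "tools": 2}),
    ComponentAssessment("Security",
                        {"coverage": 4, "depth": 3, "documentation": 3, "tools": 3}),
    ComponentAssessment("Service Continuity",
                        {"coverage": 2, "depth": 2, "documentation": 2, "tools": 1}),
]

for component in report:
    print(component.name, "level", component.maturity_level())
    for rec in component.recommendations(target=3):
        print("  -", rec)
```

Taking the minimum rather than an average is one possible design choice. It prevents a company from masking, say, missing documentation behind strong tooling.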

Only with such a specific, SaaS-centric, verifiable and accountable program will the consumers of these on-demand services know whether a company can meet their expectations.

Thursday, August 20, 2009

Discipline (or lack thereof) and Operational Fatigue

“Half of life is luck; the other half is discipline - and that’s the important half, for without discipline you wouldn’t know what to do with luck” - Carl Zuckmayer

Creative and nonconformist
SaaS companies are mostly composed of a group of highly capable software engineers. These techies are, by nature, creative, imaginative, out-of-the-box engineers, inventing new ideas or new ways of achieving better results. They tend to adopt the latest and greatest technologies and are always looking forward to the next best thing.
Naturally, these engineers are nonconformists and not inclined to follow rules or to stick to routine.
They almost never come from an enterprise IT environment, where rules and regulations are stricter and operational practices are followed almost religiously.
Given how young SaaS is, engineers who do have prior experience mostly gained it at on-premise product companies that emphasize features, versatility and usability.
They rarely had to deal with customers, and bugs that were found were handled according to their priority to be fixed in the next release (which could be months away).
Therefore, typical SaaS engineers lack the necessary discipline to run a 24X7 service, and are usually hostile to restrictions imposed on them.

What, me worry?
The lack of discipline manifests itself mainly in Change Management, and consequently in Asset Management and, in turn, in Incident Management.
This concerns which changes are allowed and when (‘hey, just to let you guys know, I installed the new patch during lunch break’), how they are approved and communicated (‘yeah, no prob, I tested the code on my laptop – it is foolproof, just a small change in the parsing engine’), and how they are recorded and rolled back if necessary (‘don’t worry, I keep all changes in a dedicated notepad on my machine’).
There are usually no rules about touching production. Typically, every engineer has full sudo access to all servers in the data center through a single shared super-user login, so activities cannot be traced to any specific person.
One-offs can be installed on a particular server and not be documented. Months later when a new version is installed or a server replaced, things fail to work and it may take hours for someone to remember that a special component is not functioning any more.
Lack of a fully functional staging environment may cause an engineer to ‘temporarily test’ some feature on a production machine that either causes service disruption or is forgotten until the fan turns brown.
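The contrast with a disciplined shop can be shown with a minimal Python sketch of a change record. Each change carries an individual login, an approver and a rollback plan, and it can be checked against policy. The field names, the list of shared accounts and the sample entries (including the engineers "alice" and "bob") are hypothetical. The rollback rate is the same Change Management KPI that shows up in the questions further below.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Hypothetical names of shared logins that hide who actually made a change.
SHARED_ACCOUNTS = {"root", "admin"}

@dataclass
class ChangeRecord:
    change_id: str
    engineer: str                  # the login that made the change
    server: str
    description: str
    approved_by: Optional[str]     # None means nobody approved it
    rollback_plan: Optional[str]   # None means there is no documented way back
    applied_at: datetime
    rolled_back: bool = False

def policy_violations(change: ChangeRecord) -> List[str]:
    """List the ways a change breaks this (hypothetical) change policy."""
    problems = []
    if change.engineer in SHARED_ACCOUNTS:
        problems.append("made under a shared account, so it cannot be traced to a person")
    if not change.approved_by:
        problems.append("not approved")
    if not change.rollback_plan:
        problems.append("no rollback plan")
    return problems

def rollback_rate(changes: List[ChangeRecord]) -> float:
    """Percentage of changes that had to be rolled back."""
    if not changes:
        return 0.0
    return 100.0 * sum(c.rolled_back for c in changes) / len(changes)

log = [
    ChangeRecord("CHG-1", "root", "web-03", "patch installed during lunch break",
                 None, None, datetime(2009, 8, 20, 12, 30)),
    ChangeRecord("CHG-2", "alice", "db-01", "new index on the orders table",
                 "bob", "drop the index", datetime(2009, 8, 21, 2, 0), rolled_back=True),
]
for change in log:
    print(change.change_id, policy_violations(change) or "ok")
print(f"rollback rate: {rollback_rate(log):.0f}%")
```

In practice the record would live in a ticketing or change management tool rather than in code. The point is that every field the ‘notepad on my machine’ approach leaves out becomes something that can be checked.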


Operational Fatigue
Operational Fatigue is a term I coined after years in the trenches, of waking up at 3:00 AM to deal with the same problem that hit us three weeks ago; of the stress of dealing with an incident at peak time when Management is hysterical, when Sales are complaining, when Support is overwhelmed with frustrated customers; of making the calls to the high profile customers, explaining, apologizing, promising; of having to explain to the Board why we lost so many customers this quarter.
It gets to you. You discover new gray hair and develop a fear of answering the phone.

The point is – it is avoidable. Instilling the practices and discipline can make life so much easier and allow the ops team to plan and improve instead of fighting fires all the time.

Educating the young
Like toddlers, engineers crave guidance and discipline, but as most parents would testify, they will make every attempt to break the rules and push the envelope to test the boundaries of their environment. Experienced parents will tell you that young children feel much more secure when they know the rules and when the rules are enforced. In my experience, whenever I introduced a new set of regulations, such as in Change Management, there was always initial push-back, mumbling about bureaucracy and attempts to circumvent the rules. But I have always seen quick adoption of the new regulations, followed by a realization that life would be so much better if we only stuck to the rules – these guys are smart, you know. Many a disaster was avoided by playing by the new rules, and I was struck by how quickly the engineers embraced the discipline and started devising ways to improve and automate the processes.

Just do it!
I recently participated in a round table hosted by HP on the subject of Change Management. Most of the participants were from large IT shops and were talking about adopting new Change Management processes over a period of six to twelve months. I was astonished. I concede that my background has been with much smaller groups, and I had the full backing of the executive management, but twelve months? Jeez!

The process in my experience was:
· Prepare the documents, templates and workflows.
· Make a compelling PowerPoint presentation.
· Present to the Engineering, Ops and Support groups.
· Emphasize the consequences of not following the practice (genitalia hanging at high altitude)
And voilà, it works! A few weeks later you have a spiritual following of admirers, because the fruits of the labor are so obvious in a very short time.

Thursday, August 13, 2009

Transparency in SaaS Service Operations

“Life is filigree work. What is written clearly is not worth much, it's the transparency that counts.” - Louis-Ferdinand Celine

Companies like to boast about their transparency, but in practice, information dissemination is highly controlled. At an on-demand company, hiding the backstage operations seems like a smart thing to do. As long as you are servicing the customers, and as long as they do not complain, why wash your dirty laundry in public?
So what about SLAs? The guiding principle seems to be ‘Don’t worry about them if your customers do not demand them’. And even when they do, there are SLAs and then there are SLAs. There are so many ways to interpret these elusive numbers (assuming you even know the real ones) that most companies will portray better results than those that reflect reality.

Varying degrees
There are different modes of transparency in communications: from the nonexistent to the reactive, the proactive and full disclosure.

The reactive type is the common case: there is a service disruption and customers call in to complain. In this case you decide how much information to divulge. This could be done with a customer call, an RFO (Reason for Outage) sent to particular customers, or a message on the corporate site.

A proactive approach would have a Service Status Page depicting the current service availability of the various production systems.

A full disclosure mode provides customers with a historical view of production system availability and response time, such as Salesforce’s Trust or SAManage’s Status Page.
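The arithmetic behind such a historical view is simple. Here is a minimal Python sketch that turns a list of outage intervals into a per-day availability percentage. It assumes naive timestamps in a single time zone and outages that do not overlap one another (merge them first if they can). The function name and the sample outage are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List, Tuple

Outage = Tuple[datetime, datetime]  # (start, end)

def daily_availability(outages: List[Outage], first_day: date, days: int) -> Dict[date, float]:
    """Percent availability for each calendar day, for a status-page history.

    Assumes outages do not overlap each other.
    """
    history = {}
    for i in range(days):
        day_start = datetime.combine(first_day + timedelta(days=i), datetime.min.time())
        day_end = day_start + timedelta(days=1)
        down = timedelta()
        for start, end in outages:
            # Only the part of the outage that falls inside this day counts.
            overlap = min(end, day_end) - max(start, day_start)
            if overlap > timedelta():
                down += overlap
        history[day_start.date()] = 100.0 * (1 - down / timedelta(days=1))
    return history

# A made-up two-hour outage that straddles midnight.
outages = [(datetime(2009, 8, 13, 23, 0), datetime(2009, 8, 14, 1, 0))]
for day, pct in daily_availability(outages, date(2009, 8, 13), 3).items():
    print(day, f"{pct:.2f}%")
```

Clipping each outage to the day boundaries charges an incident that crosses midnight to both days, which is how a customer looking at the history would experience it.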

Advantages of Transparency
My experience has been that the more transparent you are with your customers, the better relationship you will foster with them and the more forgiving they will be when things turn sour. And things do turn sour; it is unavoidable.
Your customers are not dumb (in general, that is – I can relate many amusing stories of individuals who should not have been allowed to graduate fourth grade, but that is another story). The people on the other end generally understand that you are dealing with a complex environment with many factors that are not always under your control. They will accept that scheisse happens, but they must also know that you are ready to take responsibility and learn from these events. There should be a closure process for each event, including Incident Recording, Post Mortem and RFO communication (more on that in Incident Management).
Of course, nothing beats a good, reliable, available and responsive service. If you are not able to provide that, you will end up losing your customers regardless of how much camouflage and finger pointing are used to cover the smell.


How transparent should you be?
I am not advocating that you have to run out and tell the guys every time you messed up or that you should bombard the customers with a technical exposition as part of the RFO document.
Striking the balance is an art that comes with practice and common sense. If an incident occurred that did not disrupt services, you must still go through the full Incident life-cycle practice to ensure that lessons are learned and the incident does not repeat. But you do not necessarily have to go and boast about it.
As for the RFO, in my day I have been asked to put my signature on many customer-facing documents that carried a bland, general, canned message that meant nothing to the reader (“service was lost due to a system failure”). I realized that customers do not trust such messaging: they either ignore it while snorting in disgust, or have a techie call in and start drilling the poor customer service rep for technical details that are hard to provide.
I have also seen RFOs that ran to multiple pages and read like a PhD dissertation in electronic engineering. I do not know who approved these RFOs, or whether the purpose was to wear down the suffering reader so that further RFOs would never be requested.

Company Culture
And finally, keep in mind that if the company’s culture tolerates half-truths and spin when facing the customer, you run the risk of it percolating through the company’s internal activities and reports. Don’t you expect your employees to be truthful, accountable and not shy away from reporting mistakes, even if it makes them look not too great? Your customers should be able to expect the same from your company. And if truthful reporting costs you a customer, then something was probably wrong with the relationship to begin with, and the customer may have been looking for an excuse to break away.

Monday, July 20, 2009

Can SaaS Companies Go Back to Basics?

"Change is a bouncing ball on the circumference of a circle"


I was recently at an AWS conference and met a substantial number of SaaS companies that run their full production on Amazon EC2 and S3, albeit all relatively early-stage, smaller companies. I spoke with half a dozen VPs of Ops or their equivalents, and all stated that they were satisfied with the service and the uptime, and had not experienced major outages.
I have also recently met with a number of successful SaaS companies that were under 20 people in total, including R&D, Sales & Marketing and the staff running the 24X7 operations.

So it got me thinking: if SaaS companies can do well while dealing with hardly any aspect of the infrastructure, we may be coming full circle.

History 101

Since the dawn of time (January 1, 1970 – Unix time, that is) there were software companies. If they were successful, they excelled at writing software and testing it, and with time developed good professional services capabilities. And of course they needed to know how to market themselves and sell, and partner – but that was true of any company out there, whether they were manufacturing rubber gaskets or CAD software.

Fast forward to the post-boom, post-ASP era, and a new breed of software vendors appeared on the scene. As they were pioneering the new on-demand model, they all owned their infrastructure, and some of them were probably even hosting the hardware in their own back office.
These new SaaS vendors had to have expertise in their domain and their software, of course, but also in operations, 24X7 customer support, servers, power, storage, DBs, networking, security, performance and load testing, on top of mastering the model of selling services rather than software.

As the market rapidly expanded, SaaS enablement companies grew around these new vendors, and started offering hosting at first (real estate and power), then networking capabilities, and then basic network monitoring services.

Two trends developed. On one hand, Managed Services companies (e.g. IP-Soft) offered to take over all the routine operations of managing the infrastructure up to the application level. That included monitoring and maintaining the network gear, servers, storage, DBs, Web servers and their respective operating systems.
On the other hand, we saw the rise of Managed Hosting companies (e.g. Rackspace) that rented out the hardware itself on top of the real estate and offered ever-growing services around the hardware.

And, there are companies (such as OpSource) that offer everything from hosting, to servers, storage, application management, 1st and 2nd tier helpdesk, as well as reading you bedtime stories.

Now we are seeing companies that offer QA services (especially, but not limited to, performance testing), security services, integration services (AKA Professional Services), 24X7 answering services and tucking you into bed.

It is too early to tell how successful this ecosystem will turn out to be, and what percent of SaaS companies will subscribe to this model, but the emerging trend is clear – SaaS companies are offered the opportunity to go back and do what they do best – write software.

Nobody Does it Better

The question is where you draw the line. Keep in mind that the success of a SaaS company relies mostly on the second S (Service) and less on the first S (Software); put another way, it depends on execution more than on the quality of the software.

I see a number of areas that must be directly managed by the SaaS staff:
  • Product development
  • Customer relationships
  • Application management
As for functional testing, performance testing, security testing – they may all be outsourced, but never relinquish control of these processes.
Ditto for professional/integration services – you may hire an outsourcer/partner to perform these functions, but ultimately, the customer success lies at your door.

My friend and SaaS networking expert, Gil, says that you should never outsource the infra or the management of it, because nobody will take care of your baby as well as you will.

I recently had a conversation with a managed services account manager who confided in me that managing the infra of SaaS companies is far more difficult than managing that of the average enterprise IT shop, since SaaS companies are far more sophisticated, have a deeper technological understanding and have higher availability and response requirements.

Another question is at what point do you want to take back ownership? Does a certain size and complexity of the service and business justify bringing in your own teams of experts to handle those tasks listed above? The cost of doing it yourself will probably start going down as you grow, but the company’s values might dictate sticking to the core competencies – Hey, isn’t that what the SaaS offering is all about?


Wednesday, May 20, 2009

Questions that SaaS executives must be able to answer - KPIs that matter.

“There is much pleasure to be gained from useless knowledge” (Bertrand Russell)

It has been my experience that SaaS executives have trouble answering the most basic questions about their service operations, and mind you, this is what the business is all about.


Again and again, I keep coming back to the conclusion that the state of SaaS Service Operations is so dire because on-demand companies are built on the first ‘S’ (Software) and not the second ‘S’ (Service).

SaaS entrepreneurs are, in general, bright, creative, out-of-the-box thinkers. They are software developers and have no clue about IT practices and disciplines.


The age old premise “if you can't measure it you can't manage it” somehow escapes SaaS companies across the globe, until it becomes a huge problem.

Have you gone through the numbing process of presenting a specific customer with their real SLA adherence? I have. On average, it would take me a few hours of going through multiple sources of data to come up with (sometimes) accurate data.
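To make the pain concrete, here is a minimal Python sketch of what that calculation involves once the data sits in one place. It assumes that each outage record lists the customers it affected. The customer names, commitments and outage times are made-up illustrations. The sketch also ignores scheduled maintenance windows, which many SLAs exclude.

```python
from datetime import datetime, timedelta

# Hypothetical inputs: each outage lists the customers it affected.
outages = [
    {"start": datetime(2009, 5, 3, 2, 0), "end": datetime(2009, 5, 3, 3, 30),
     "customers": {"acme", "globex"}},
    {"start": datetime(2009, 5, 17, 14, 0), "end": datetime(2009, 5, 17, 14, 45),
     "customers": {"acme"}},
]
commitments = {"acme": 99.9, "globex": 99.5, "initech": 99.9}  # committed uptime, percent

def sla_report(outages, commitments, period: timedelta):
    """Per-customer uptime over the period and whether the commitment was met."""
    report = {}
    for customer, committed in commitments.items():
        downtime = sum(
            (o["end"] - o["start"] for o in outages if customer in o["customers"]),
            timedelta(),
        )
        uptime = 100.0 * (1 - downtime / period)
        report[customer] = (round(uptime, 3), uptime >= committed)
    return report

for customer, (uptime, met) in sla_report(outages, commitments, timedelta(days=31)).items():
    print(f"{customer}: {uptime}% {'met' if met else 'MISSED'}")
```

Tracking which customers each outage touched is what makes the per-customer view possible. A single service-wide uptime figure would hide the fact that one customer absorbed most of the pain.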


Following are a number of questions that every SaaS executive should be able to answer in her sleep, or at least with a click of a button.

1. Availability management
  • What are your real uptime numbers?
  • What do the trailing twelve months (TTM) look like?
  • Are we better than we were six months ago?
  • How many outages have you had in the last M months?
  • What is the breakdown, based on severity?
  • What is the breakdown, based on downtime causes?
  • How many service disruption incidents were repeated?
  • How quickly do you recover from outages?
  • How many days have gone by without a critical, major outage?
2. SLA Management
  • How does your availability match up to your customer commitments?
  • Which customers were affected most (even if they do not complain)?
3. Change Management
  • How often are changes made to the production environment?
  • What is the breakdown of changes by category?
  • What percent of changes did you have to roll back?
4. Asset management
  • What is the status of your inventory? What box is located where?
  • What function or customer would be impacted by a loss of a certain box?
  • When do your support/software contracts expire and what might it affect?
5. Churn Management
  • How many customers have you lost in the past 6, 12, 24 months?
  • Is your customer retention improving over time?
  • What is your customer churn as a percentage of your customer base?
  • What is the average retention time of your customers?
  • What is the breakdown, based on reasons for churn?
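Several of the churn questions reduce to simple arithmetic over a customer list that records sign-up dates, churn dates and churn reasons. The following Python sketch is a minimal example under those assumptions. The churn rate uses one common definition: customers lost during a period as a percentage of those active at its start. The customer records are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Customer:
    name: str
    signed_up: date
    churned: Optional[date] = None       # None while the customer is still active
    churn_reason: Optional[str] = None

def churn_rate(customers: List[Customer], start: date, end: date) -> float:
    """Customers lost in [start, end] as a percentage of those active at `start`."""
    active = [c for c in customers
              if c.signed_up <= start and (c.churned is None or c.churned > start)]
    lost = [c for c in active if c.churned is not None and c.churned <= end]
    return 100.0 * len(lost) / len(active) if active else 0.0

def average_retention_days(customers: List[Customer], as_of: date) -> float:
    """Average customer lifetime in days; active customers are counted up to `as_of`."""
    lifetimes = [((c.churned or as_of) - c.signed_up).days for c in customers]
    return sum(lifetimes) / len(lifetimes) if lifetimes else 0.0

def churn_by_reason(customers: List[Customer]) -> Counter:
    """Breakdown of churned customers by their recorded reason."""
    return Counter(c.churn_reason or "unknown" for c in customers if c.churned)

customers = [
    Customer("a", date(2008, 1, 1)),
    Customer("b", date(2008, 3, 1), date(2008, 9, 1), "price"),
    Customer("c", date(2008, 6, 1), date(2009, 2, 1), "outages"),
    Customer("d", date(2008, 7, 1), date(2009, 4, 1), "outages"),
]
print(churn_rate(customers, date(2008, 7, 1), date(2009, 6, 30)))
print(average_retention_days(customers, date(2009, 7, 1)))
print(churn_by_reason(customers))
```

Running the same functions over consecutive windows answers "Is your customer retention improving over time?". The reason breakdown is only as good as the discipline of recording a reason for every lost customer.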

I am well aware that there are no integrated solutions for the SMB that maintain a database of these crucial KPIs, but every company should have some form of repository capturing at least some of the data, and an easy way of extracting it.
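Such a repository does not have to be elaborate. As a minimal sketch using nothing more than Python's built-in sqlite3 module, a single incidents table can answer several of the availability questions above with one query each. The column names, severity labels and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would make the repository persistent
conn.execute("""
    CREATE TABLE incidents (
        id INTEGER PRIMARY KEY,
        started TEXT NOT NULL,      -- 'YYYY-MM-DD HH:MM'
        ended TEXT NOT NULL,
        severity TEXT NOT NULL,     -- e.g. 'critical', 'major', 'minor'
        cause TEXT NOT NULL,        -- e.g. 'change', 'hardware', 'capacity'
        repeat_of INTEGER REFERENCES incidents(id)  -- set when the same problem recurs
    )
""")
conn.executemany(
    "INSERT INTO incidents (started, ended, severity, cause, repeat_of) VALUES (?, ?, ?, ?, ?)",
    [
        ("2009-04-02 03:10", "2009-04-02 04:40", "critical", "change", None),
        ("2009-04-23 02:55", "2009-04-23 03:25", "critical", "change", 1),
        ("2009-05-11 14:00", "2009-05-11 14:20", "minor", "capacity", None),
    ],
)

# Outages broken down by severity, with total downtime in whole minutes.
by_severity = conn.execute("""
    SELECT severity, COUNT(*),
           CAST(ROUND(SUM((julianday(ended) - julianday(started)) * 24 * 60)) AS INTEGER)
    FROM incidents GROUP BY severity ORDER BY severity
""").fetchall()

# How many service disruptions were repeats of an earlier incident?
repeats = conn.execute(
    "SELECT COUNT(*) FROM incidents WHERE repeat_of IS NOT NULL").fetchone()[0]

# Whole days since the last critical outage ended, as of a given date.
days_since_critical = conn.execute("""
    SELECT CAST(julianday(?) - julianday(MAX(ended)) AS INTEGER)
    FROM incidents WHERE severity = 'critical'
""", ("2009-05-20",)).fetchone()[0]

print(by_severity, repeats, days_since_critical)
```

Swapping `GROUP BY severity` for `GROUP BY cause` gives the breakdown by downtime cause. The `repeat_of` link is what allows repeated incidents to be counted rather than remembered.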

The important issue here is that SaaS companies should be aware of these KPIs and start asking these questions, even if they do not yet have all the answers.