"The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency" - Bill Gates
This is an article which I posted at the Nolio blog site as a guest blogger back in 2009. Not only is it relevant today, but perhaps even more so...
The Next Killer App
If I had a great idea for the next killer app (I have, actually) and if I had unlimited funds (I don’t, actually) I would have built the software as an on-demand offering.
I would have spent half my funds on building the operational support systems – provisioning, billing, retention policy, self-service, report generator, etc. The other half would be invested in building instrumentation, redundancy, automation, integration, application level monitoring, silent upgrades, customer notifications, and so on.
The rest of the money (you may wonder about my math, but hey, I’ve got unlimited funds) would go towards building the actual application.
Most SaaS vendors out there (and they are growing fast) have chosen the predictable path of building the application first, and worrying about serviceability later. This is the fastest way of getting to market with low costs. The next step is choosing some viable hosting solution and off we go, offering the world our ever better CRM.
Growth
Many months and dozens of customers later, reality hits with all the issues of servicing the software, rapid growth and dealing with labor intensive tasks that are the humdrum of daily life in a SaaS operation. Provisioning/de-provisioning, configuration changes, customized reports, and the most dreaded – upgrades, task the team as a whole, especially when the product is successful and the number of customers is growing daily.
It is not that SaaS executives, architects and engineers are lacking in any way. On the contrary, they are mostly smart, inventive, and creative and have a deep understanding of their customers’ needs in the specific domain. The problem is that they are product people, not service people. Practically none of them come from IT and cannot envision the life of a service operations engineer.
At this point, automation becomes crucial to the survival of the business.
Whether it is built into the next version (many architectures make this quite difficult) or done externally, automation is needed to reduce costs, physical labor, frustration and mainly, error-prone manual procedures. Repeatability, which is a derivative of automation, is also crucial.
Automation is needed across the board. Be it in setting up a new server, or building a new application instance. It could be a manual procedure regarding provisioning of application resources, or building a seamless upgrade procedure.
Outages happen. How quickly can you recover from a service disruption and ensure that the recovery does not create its own problems? Automation not only provides the routines for quick recovery, but instills a discipline of thinking out the necessary steps, discovering dependencies and planning ahead. An added benefit of automation is that it documents the process so you can go back and review the best and worst of your procedures.
In my next post I will take a closer look at the SaaS Upgrade Nightmare.
Software-as-Service as a disruptive trend and how it affects the traditional, ISVs and IT moving to the Cloud. Considerations in the transition to the new model and expertise on SaaS Service Operations - STORM™ and DevOps
Friday, December 24, 2010
Thursday, December 16, 2010
2 x E-cube = S-cube - Simple math for SaaS Scalability Success
"The circumstances of human society are too complicated to be submitted to the rigor of mathematical calculation" (Marquis De Custine)
Recently I posted a discussion on the link between success with SaaS and scalability and how, therefore, the Service Operations needs to be geared to handle scale and deal with fast growth.
On a recent consultation engagement, I gave a presentation to the board members of a SaaS company that decided to penetrate the SMB, following their hardships in securing big deals in the enterprise market.
As part of the thought process and brainstorming session I came up with a marketing catchphrase to make a point. At the time I did not think much about it, but as I was working on the board presentation, I realized that it was much deeper than I had originally thought.
I would like to share this with you.
The First Cube
What I drew on the white-board was roughly this artistic creation:
Announcing: “E-cube is the winning formula!”
- Easy to Buy
- Easy to Implement
- Easy to Use
Easy to Implement - Their product is very easy to implement, and it is easy to on-board a new organization (although it is a complex product). There is a technical issue that involves an architectural change, so that became part of the plan.
Easy to Use - the product has a great user interface and intuitive flow. They need to add tutorial videos.
Easy to Buy – That was lacking and that was what we decided to focus our efforts on. It included a lead generation program, a no-touch free trial and an improved landing site.
The Second Cube
As I was working on what it would take from the company’s end to deliver scalability, I realized there was another E-cube involved:
- Easy to Sell
- Easy to Scale
- Easy to Maintain
It is all nice and well if you can bring thousands of leads to your site with the E-cube formula, but if you cannot convert these leads into paying customers and then give them the best service on a controlled budget, you will not have achieved your scalability goals.
So, Easy to Sell means - a simple pricing model and a top-notch inside-sales team (not easy to come by though) backed up by funnel management software.
Easy to Scale means - a solid, configurable, multi-tenant architecture, self service features and automated procedures.
Easy to Maintain means – a full featured Operations Support System, Service Operations practices and the discipline to enforce them.
So with two E-cube guidelines one can achieve SaaS Scalability Success.
Ergo: 2 x E³ = S³
Q.E.D.
Thursday, December 09, 2010
The SaaS Consumer’s Point of View – Negotiating an Agreement
“The food was superb, the atmosphere was great, the service was outstanding; it was those goddamned customers that had to ruin it all”. (Morris Green, restaurateur, NYC, 1987)
This short article is an introduction to an interesting blog post that was published recently by Derek Singleton from ERP Software Advice, but before I let you go, I want to take this opportunity to talk about the customer’s perspective.
My writings, presentations and webinars have been mostly dedicated to the point of view of the software provider, whether SaaS or SaaS-to-be. I would like to present a different point of view.
Throughout the years of being on the software provider’s side I have learned a thing or two about what makes a happy customer, and that is the guiding light I have been trying to follow for years.
I believe that if we understand the customer’s perspective we have a better chance of providing a good service and creating a happy, loyal customer base.
Who is the customer?
SaaS adoption has been mostly done in a haphazard fashion throughout the years. Many of the early adopters were business managers at the department level in the enterprise.
Even within smaller companies, the decision to consume SaaS was usually a point solution for a particular issue, and not part of a well-thought-out process and methodology.
In many cases the IT department and CIOs were kept out of the loop in defining needs, selecting the service, negotiating the deals and the process of provisioning and de-provisioning. In the extreme, IT found out about their company consuming a web application only when a user would call the help desk and ask for support.
As more CIOs are ‘getting it’, and as more IT departments are becoming cloud-oriented, they are becoming the target customers, rather than the end users. They are usually better equipped (once their fears are neutralized) to judge the provider, the application, the integration and to negotiate a better deal for the organization.
These IT professionals should be planning a Roadmap for SaaS so that the consumed applications become part of a coherent plan rather than something the cat dragged in.
We should start focusing on this new generation of SaaS customers.
An interesting article covering the negotiations with a provider is therefore presented.
Saturday, November 13, 2010
SaaS, Scalability and the Three Little Pigs
“We have the wolf by the ears, and we can neither hold him, nor safely let him go” - Thomas Jefferson
A few days ago I gave a talk at a SaaS Business Challenges conference and I would like to share the main theme with my readers.
It’s the Service, Dummy!
When SaaS burst upon the scene about ten years ago, the first consumers were of two types: Those that used on-demand as an ideology, having foreseen the cloud revolution early on, and those that had no choice, because the other option was an expensive, time consuming and complex solution.
Back then, there were few SaaS application or service choices, so customers had to be very forgiving about the service levels and were willing to put up with reduced functionality, outages and slow response times. SLAs, if they even existed, were non-binding and lacked both depth and breadth.
These days, however, when there are dozens of SaaS applications for every need, the differentiator is no longer the functionality. Most SaaS applications offer a similar set of capabilities, and as applications change on a bi-weekly or monthly basis, features are added on an on-going basis.
Even if one comes up with a revolutionary solution and provides the only SaaS offering of its kind, it is safe to say that within a year, three new SaaS companies will offer the same, or an improved set of capabilities.
So what differentiates one SaaS offering from another? It’s the Service, dummy!
Features don’t make a loyal customer – outstanding service does.
Scalability
There aren’t many SaaS companies out there that service a few dozen Fortune 200 companies while keeping the profitability high. The general rule is that margins are small and that profitability is achieved through hundreds or thousands of customers.
Giving great service to 20 or 30 customers is a no-brainer. Just throw more bodies at the problem and you will achieve a highly satisfied customer base.
But what happens when these numbers multiply rapidly? You’ll soon find out that what worked for a few dozen might collapse at the next order of magnitude.
When I approach SaaS companies at the stage when they already have a growing customer base and warn them about the perils of scaling up, the usual response I get is:
“I wish I will have to deal with that problem”, meaning that they would love to have 200 customers that cause strain on the infrastructure and operations and deal with those ‘good’ problems then. It is only human to postpone these issues when they are not burning your behind.
To those CEOs I would say: “Pack your stuff, return the money to your investors and go look for another job elsewhere”.
Your investors did not give you their hard earned dollars for a proof of concept.
They invested in you because they believed that you will bring in thousands of customers.
My best clients are SaaS companies that come to me when they are in pain. They start losing customers because they did not build an operation capable of handling the scale which they had wished for.
As I have stated time and again – SaaS companies usually consist of outstanding, creative developers that build great technology, but they don’t have the IT and/or operational experience. They lack the know-how and especially the methodology for building a successful, scalable operation.
Building Operations for Scalability
I have written much about these issues and they can all be found in previous blog postings.
Suffice it to say that the setup you need to build for a scalable service operation includes:
· Methodology – the framework of practices, templates, workflows and tools
· Operations Support Systems – everything else your engineering team left out of the product.
· Executive buy-in and awareness – define the metrics, and provide tools to capture and analyze those metrics.
Can a Huff and Puff Blow Your House Down?
So what has all this to do with the Three Little Pigs?
The first little pig built a house of straw, since he didn't have time to invest in operational infrastructure and wanted to make a quick exit. He ended his career as wolf poop.
The second little pig invested in a better infrastructure but did not pay attention to the practices and processes. He lasted longer in his wooden structure but was huffed and puffed and blown away by the competition.
The third little pig took his time and invested executive attention in doing it right from the start. He lived happily ever after.
So even if you cannot afford a brick house from day one, you should have the blueprints ready and the determination to add the bricks when they become available.
By the time the Scalability Wolf arrives, you should have a sturdy enough structure and react quickly to threats, to survive and prosper.
Thursday, June 10, 2010
Private Clouds – What’s in a Name?
“Happiness is like a cloud, if you stare at it long enough, it evaporates” - Sarah McLachlan
Remember when all of our secretaries and stewardesses turned into office managers and flight attendants overnight? Remember when all the co-los and server-hosting companies became cloud providers overnight?
(To alleviate all suspicions – as my readers know, I am an ardent advocate for Cloud computing and SaaS in particular, so this post is not about arguing the merits of this constructive and disruptive trend).
Cloud means different things to different people – but mainly it means “a cool way to market my same old, tired stuff”. I worked with a co-lo provider a couple of years back. A few months ago I went to their site to check prices and lo and behold: They had become a Cloud Provider! They had clouds splashed all over their site and every solution they sold was a Cloud solution. I called up a sales person and asked about their elasticity and time units. Turns out they were elastic in one direction – you could always order more servers – and you only had to commit to one year in advance. On a geological time scale that is quite flexible. When I laughed and asked what was ‘cloudy’ about their offering, the guy got confused and said that his manager would get back to me.
So what is all this newspeak about Private Clouds?
Guess what is happening to the good ol’ data center? As David Linthicum aptly puts it in his latest blog post: “the reality is that ‘private cloud’ is just another term for on-premise systems”.
Point number one is that I find the term an Oxymoron:
Cloud means that it is ‘somewhere out there’; location is transparent. Cloud means sharing; resources are transparent. The physical server I used this morning might be used by someone else this afternoon.
Private means it is in my back yard and only I get to play in the sandbox.
Point number two is that even if we apply the Cloud concepts to the enterprise, it will be relevant to a very small number of very big players: those enterprises that are truly global and distributed. They could take advantage of the peaks and troughs, of the "follow the sun" model, of the large numbers and justify the investment of a Cloud player.
And what about the rest of us mortals? Let’s assume that a certain percentage of our IT services will not transition to the cloud – be it regulation, compliance, perceived loss of control, or the illusion of maintaining job security. We should be using Cloud-enabling technologies, to make a smarter use of our resources and data centers. That means virtualization, automation, orchestration and auto-provisioning technologies. We only get the silver lining – not the cloud.
So the Private Cloud, in essence, is a wonderful opportunity for the big vendors out there to sell to the enterprises - new equipment, new systems and new services.
Friday, May 21, 2010
The SaaS VP Operations as Product Manager
SaaS Operations Support Systems
“'Tis not enough to help the feeble up, but to support them after” - William Shakespeare
Every SaaS company needs to deal with numerous functions that are not necessarily part of the technological stack that originally came with the application, namely the Operations Support Systems.
If a SaaS start-up had unlimited time and funds to plan and build the perfect solution, they would probably all be in the Caribbean islands doing what people with unlimited time and funds do.
Just Do It!
Because of the nature of monetizing SaaS, companies try to get to market as soon as possible, getting subscription fees streaming in. Sometimes, they even launch with a half-baked solution that will provide added value to the customers at the expense of future operational headaches.
The logic behind this is that dealing with a scalability problem is a good thing. In other words - who doesn’t want to reach the stage when too many customers are taxing the team? We’ll deal with that when it becomes a problem.
These days, numerous PaaS offerings or SaaS frameworks offer built-in operational support systems functionality, but many of the necessary features are not supported.
Most of the SaaS applications in the market were not built with these frameworks for various reasons. The most prevalent are that they were not available a few years ago, and that engineers have a tendency to build everything from scratch, or to use frameworks (e.g. LAMP, Java, .Net, RoR) that they are familiar with.
When the typical SaaS service is launched, it lacks functionality that would support the scalability of the service operation. It is left to the Service Operations team to deal with all the 'maturity' functionality that the product lacks.
Let us examine some of the Operations Support Systems functions that are typically not handled by the application.
On-boarding new customers
One would expect most SaaS systems to have automatic provisioning. While that is true in many cases, a lot of the systems allow the customer to define users, but creating a new customer entity is left to the Support or Operations team. That may require generating a new database or schema, or setting up storage, etc. If the company is adding one customer a week, it may be manageable, but at a higher rate this is a taxing job and error prone.
De-provisioning
While it is expected that some level of automatic provisioning is provided with the product, very few SaaS applications provide a simple (never mind automatic) mechanism for removing existing customers. Very few SaaS developers design with the prospect of losing a customer in mind. Beyond the task of removing dependencies from the database, there are issues of releasing resources, and removing customizations.
Billing
Such a basic function (one would think, for a company that lives or dies by subscriptions) is typically lacking from most SaaS infrastructures. While the situation is improving dramatically with the advent of PaaS and SaaS development frameworks, many systems still start out with Excel sheets and a lot of manual work. Customer data must be extracted by some ad-hoc solution from the application database or is duplicated in the CRM, which causes endless synchronization errors. Ad-hoc solutions are implemented (either in-house or SaaS billing solutions) as the billing becomes ever more complex, but until that occurs the brunt of the work falls upon the Operations team.
Metering is a whole new layer of complexity, if the billing is more complex than a fixed amount per seat per month.
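To make the extra layer concrete, here is a minimal sketch of a metered charge: a fixed per-seat fee plus an overage fee for storage used beyond an included allowance. The function name, the parameters and the storage-overage model are all assumptions chosen for illustration; real pricing models vary widely.

```python
from decimal import Decimal


def monthly_charge(seats, seat_price, used_gb, included_gb, overage_price_per_gb):
    """Return a hypothetical monthly charge: per-seat fee plus storage overage.

    All monetary and usage values are Decimal to avoid float rounding on invoices.
    """
    base = seat_price * seats
    # Only usage above the included allowance is billed; never a negative amount.
    overage_gb = max(used_gb - included_gb, Decimal("0"))
    return base + overage_gb * overage_price_per_gb


if __name__ == "__main__":
    charge = monthly_charge(
        seats=10,
        seat_price=Decimal("25"),
        used_gb=Decimal("120"),
        included_gb=Decimal("100"),
        overage_price_per_gb=Decimal("0.50"),
    )
    print(charge)
```

Even this toy version needs a trustworthy usage figure per customer per month, which is exactly the data that is usually scattered between the application database and the CRM.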
Retention Policy
Some SaaS companies plan ahead for resource consumption by their customers, but many start dealing with the issue only when they begin running out of storage, or when their storage costs are starting to hurt. Most customers want to retain their data forever on the provider’s disks, but that is impossible. So a retention policy must be defined and followed (such as delete files that are older than X months or larger than Y gigabytes).
The problem is that the application does not support that, so manual work is required or ad-hoc solutions have to be built around the product.
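As an illustration of the kind of ad-hoc tool that ends up being built, the following sketch selects files that violate a policy of "older than X or larger than Y". The StoredFile record, the function name and the use of days rather than months are assumptions made for the example; it only selects candidates and deletes nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

GIB = 1024 ** 3  # bytes in one gibibyte


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical record describing one customer file."""
    name: str
    size_bytes: int
    modified: datetime


def files_to_purge(files, now, max_age_days, max_size_bytes):
    """Return files older than max_age_days OR larger than max_size_bytes."""
    cutoff = now - timedelta(days=max_age_days)
    return [f for f in files if f.modified < cutoff or f.size_bytes > max_size_bytes]


if __name__ == "__main__":
    inventory = [
        StoredFile("old.log", 10, datetime(2010, 1, 1)),
        StoredFile("big.bin", 5 * GIB, datetime(2010, 5, 1)),
        StoredFile("ok.txt", 10, datetime(2010, 5, 1)),
    ]
    for f in files_to_purge(inventory, datetime(2010, 5, 21), 90, 2 * GIB):
        print(f.name)
```

Keeping selection separate from deletion lets the Operations team review the list, or notify the customer, before anything is actually removed.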
Failover and Backup
Automatic recovery and failover are rarely built into the initial solution and are usually managed by the Operations team via building complex solutions with networking boxes.
This is true for backup and recovery mechanisms as well. The Operations team frequently has to build mechanisms around the production to support backup and recovery and most often they require much manual labor.
Application Monitoring
Well designed software is packed with instrumentation that is easy to turn on and off and easy to monitor and interpret. Not all SaaS systems have that built-in capability and the Operations team has to create an application-specific, monitoring infrastructure that can detect and react quickly to service degradation.
Seamless Upgrade
Upgrades are the recurring nightmare of any Operations team, especially on applications that have a demanding uptime SLA. Few SaaS systems are designed with that goal in mind. It usually takes a level of maturity and a sizeable customer base to get Engineering to consider revising the code to allow some sub-systems a no-downtime upgrade.
The problem is that it usually requires major revisions of the code if the system was not designed a-priori to handle a silent upgrade.
SLA Management
Whether you are using automated SLM or not (chances are you are not), you need to take into consideration the various aspects of your service that need to be metered, tracked and compared against a set of Service Level Objectives.
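A minimal sketch of what "compared against a set of Service Level Objectives" can look like in practice follows. The metric names, the ("min" or "max", target) encoding of an objective and the sample numbers are assumptions for illustration, not a description of any particular SLM product.

```python
def availability_pct(period_minutes, downtime_minutes):
    """Availability over a period, as a percentage."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def slo_breaches(measured, objectives):
    """Return the sorted names of metrics that miss their objective.

    objectives maps a metric name to ("min", target) when the measured value
    must be at least the target, or ("max", target) when it must not exceed it.
    Metrics with no measurement are reported as breaches.
    """
    breached = []
    for metric, (kind, target) in objectives.items():
        value = measured.get(metric)
        if value is None:
            breached.append(metric)
        elif kind == "min" and value < target:
            breached.append(metric)
        elif kind == "max" and value > target:
            breached.append(metric)
    return sorted(breached)


if __name__ == "__main__":
    objectives = {
        "availability_pct": ("min", 99.9),
        "p95_response_ms": ("max", 1000),
    }
    measured = {
        # 30-day month with 216 minutes of downtime
        "availability_pct": availability_pct(43200, 216),
        "p95_response_ms": 800,
    }
    print(slo_breaches(measured, objectives))
```

Treating a missing measurement as a breach is a deliberate choice here: a metric that is not being captured cannot be shown to meet its objective.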
End User Broadcasting
Sometimes it is necessary to communicate with all of your users, or a sub-group of them, in real time. What better option is there than to pop up a message on the end-user’s browser that you can compose on the spot or pull from a list of canned messages? The operations team needs a solution integrated with the product to be able to do that.
Operations Console
And to tie it all up, a separate application that controls all the operational aspects of the service is needed. Included should be: provisioning/de-provisioning customers, password management, real-time login view of current users, real-time view of application usage, customer communication console, production environment control, etc.
There are many other operational features that are typically not found in a SaaS application as it leaves the factory floor. Just to mention a few: Integration, Reporting engine and reporting Database, Security, Scale-up and Scale-out capabilities, Sandbox, Status Page.
Development and Product management Experience Required
The VP Operations (or whatever title the job carries) is required to be a product manager of sorts, and a background in software development is almost a must. The Operations manager should be highly involved in the product roadmap and insist on having a say in defining future releases. There will be a contention between investing in Serviceability versus Functionality. Since the paying customers require more functionality, it is usually an uphill battle to gain service upgrades to the product.
While building an organic set of solutions into the product may practically take years, Operations cannot continue to throw bodies at solving scalability and downtime issues. So beyond influencing the product group on the direction in which the application should be developed, the Operations manager needs to build a set of tools addressing the Operations Support Systems needs as stated above. One option is to nurture a relationship with VP Engineering and get resources from her group to build well defined solutions that could each be completed in a couple of weeks of work. This is especially true in early stage companies where the Operations team is small and the engineers take the brunt of many of the operations’ tasks. The VP Engineering would appreciate the need as members of her team are feeling the pain as well.
In a more established company, the VP Operations must make sure that there are coding/scripting capabilities in the team, so simple projects and tools could be developed within the team, with minimal aid from the Engineering group.
“'Tis not enough to help the feeble up, but to support them after” - William Shakespeare
Every SaaS company needs to deal with numerous functions that are not necessarily part of the technological stack that originally came with the application, namely the Operations Support Systems.
If a SaaS start-up had unlimited time and funds to plan and build the perfect solution, they would probably all be in the Caribbean islands doing what people with unlimited time and funds do.
Just Do It!
Because of the nature of monetizing SaaS, companies try to get to market as soon as possible, getting subscriptions fees streaming in. Sometimes, they even launch with a half baked solution that will provide added value to the customers at the expense of future operational headaches.
The logic behind this is that dealing with a scalability problem, is a good thing. In other words - who doesn’t want to reach the stage when too many customers are taxing the team? We’ll deal with that when it becomes a problem.
These days, numerous PaaS offerings or SaaS frameworks offer built-in operational support systems functionality, but many of the necessary features are not supported.
Most of the SaaS application in the market were not built with these frameworks for various reasons. The most prevalent are that they were not available a few years ago, and that engineers have a tendency to build everything from scratch, or using frameworks (e.g. LAMP, Java, .Net, RoR) that they are familiar with.
When the typical SaaS service is launched, it lacks functionality that would support the scalability of the service operation. It is left to the Service Operations team to deal with all the 'maturity' functionality that the product lacks.
Let us examine some of the Operations Support Systems functions that are typically not handled by the application.
On-boarding new customers
One would expect most SaaS systems to have automatic provisioning. While that is true in many cases, a lot of the systems allow the customer to define users, but creating a new customer entity is left to the Support or Operations team. That may require generating a new database or schema, or setting up storage, etc. If the company is adding one customer a week, it may be manageable, but at a higher rate this is a taxing job and error prone.
De-provisioning
While it is expected that some level of automatic provisioning is provided with the product, very few SaaS applications provide a simple (never mind automatic) mechanism for removing existing customers. Very few SaaS developers design with of the prospect of loosing a customer in mind. Beyond the task of removing dependencies from the database, there are issues of releasing resources, and removing customizations.
Billing
Such a basic function, (one would think, for a company that lives or dies by subscriptions) is typically lacking from most SaaS infrastructures. While the situation is improving dramatically with the advent of PaaS and SaaS development frameworks, many systems still start out with excel sheets and a lot of manual work. Customer data must be extracted by some ad-hoc solution form the application database or is duplicated in the CRM, which causes endless synchronization errors. Ad-hoc solutions are implemented (either in-house or SaaS billing solutions) as the billing becomes ever more complex but until that occurs the brunt of the work falls upon the Operations team.
Metering is a whole new layer of complexity, if the billing is more complex than a fixed amount per seat per month.
Retention Policy
Some SaaS companies plan ahead for resource consumption by their customers, bur many start dealing with the issue only when they begin running out of storage, or when their storage costs are starting to hurt. Most customers want to retain their data forever on the provider’s disks, but that is impossible. So a retention policy must be defined and followed (such as delete files that are older than X months or larger than Y gigabytes).
The problem is that the application does not support that, so manual work is requried or ad-hoc solutions have to be built around the product.
Failover and Backup
Automatic recovery and failover are rarely built into the initial solution and are usually managed by the Operations team via building complex solutions with networking boxes.
This is true for backup and recovery mechanisms as well. The Operations team frequently has to build mechanisms around the production to support backup and recovery and most often they require much manual labor.
Application Monitoring
Well designed software is packed with instrumentation that is easy to turn on and off and easy to monitor and interpret. Not all SaaS systems have that built-in capability and the Operations team has to create an application-specific, monitoring infrastructure that can detect and react quickly to service degradation.
Seamless Upgrade
Upgrades are the recurring nightmare of any Operations team, especially on applications that have a demanding uptime SLA. Few SaaS systems are designed with that goal in mind. It usually takes a level of maturity and a sizeable customer base to get Engineering to consider revising the code to allow some sub-systems a no-downtime upgrade.
The problem is that it usually requires major revisions of the code if the system was not designed a priori to handle a silent upgrade.
SLA Management
Whether you are using automated SLM or not (chances are you are not), you need to take into consideration the various aspects of your service that need to be metered, tracked and compared against a set of Service Level Objectives.
End User Broadcasting
Sometimes it is necessary to communicate with all of your users, or a subgroup of them, in real time. What better option is there than to pop up a message on the end-user’s browser that you can compose on the spot or pull from a list of canned messages? The Operations team needs a solution integrated with the product to be able to do that.
Operations Console
And to tie it all up, a separate application that controls all the operational aspects of the service is needed. Included should be: provisioning/de-provisioning customers, password management, real-time login view of current users, real-time view of application usage, customer communication console, production environment control, etc.
There are many other operational features that are typically not found in a SaaS application as it leaves the factory floor. Just to mention a few: Integration, Reporting engine and reporting Database, Security, Scale-up and Scale-out capabilities, Sandbox, Status Page.
Development and Product Management Experience Required
The VP Operations (or whatever title the job carries) is required to be a product manager of sorts, and a background in software development is almost a must. The Operations manager should be highly involved in the product roadmap and insist on having a say in defining future releases. There will be a contention between investing in Serviceability versus Functionality. Since the paying customers require more functionality, it is usually an uphill battle to gain service upgrades to the product.
While building an organic set of solutions into the product may practically take years, Operations cannot continue to throw bodies at solving scalability and downtime issues. So beyond influencing the product group on the direction in which the application should be developed, the Operations manager needs to build a set of tools addressing the Operations Support Systems needs as stated above. One option is to nurture a relationship with VP Engineering and get resources from her group to build well defined solutions that could each be completed in a couple of weeks of work. This is especially true in early stage companies where the Operations team is small and the engineers take the brunt of many of the operations’ tasks. The VP Engineering would appreciate the need as members of her team are feeling the pain as well.
In a more established company, the VP Operations must make sure that there are coding/scripting capabilities in the team, so simple projects and tools could be developed within the team, with minimal aid from the Engineering group.
Thursday, March 18, 2010
Change Management and the Sanctity of Production
“Most people are afraid of change. We love it!” (Sign of a beggar on a street in San Francisco, 2004)
(Note: This article is part of the STORM™ methodology)
Change is the greatest cause of service interruption in any IT operation. Period. Stop. Exclamation mark.
In the extreme, one might argue that change is the cause of every service interruption if one counts hardware or software malfunctions as a change as well.
SaaS operations tend to suffer more from a lack of proper change management for two reasons: First, the consequences of a service outage for a company whose entire existence depends on its service are dire. Second, SaaS engineers, as I have written in numerous posts, lack the discipline that is more inherent in IT departments.
And yet, my experience has been that in most SaaS companies, changes are unsupervised, undocumented, unauthorized, unplanned, (sometimes unnecessary), underestimated and un_____ (fill in the blanks).
The importance of Change Management cannot be overstated and it is the first practice that I have implemented at companies that I worked at (or for). I wince when I recall the casualness which I have witnessed at various SaaS companies about making changes in the production system. I can quote my former boss, Mansur Salame, CEO of Contactual, saying that “production should be treated as sacred, with utmost respect appropriate to holy places” (or something of the sort). And, boy, were we sacrilegious back in those days!
In the chapter on Change Management in the upcoming book, I will present my STORM™ adaptation in much detail.
In this post I will outline some guidelines for sane change management.
Sixty Seconds on Change Management
Below are listed the objects that comprise a comprehensive Change Management practice.
RFC – Request for Change document. It must initiate the process, regardless of how small or major the changes are. It should include the what, the why, the when, the risk, the potential impact (on customers or components), and a checklist of notifications and tests that should/should not be done.
A Change Window must be defined, clearly denoting which types of changes to which subsystems are allowed on which days and during what hours.
Change Calendar – may be implemented in a number of static or automatic formats; it must represent the ‘Change Window’, depict all planned changes by the company, service providers and customers, and be part of the RFC process.
Change Advisory Board or CAB is the pre-determined group of people who scrutinize and approve the RFC. The CAB may be large or small but it should include at least one person who is not involved in the request and planning process. The CAB may meet on a recurring schedule or as needed.
A Change Record is a record describing a change that occurred in production or the eco-system. It could be implemented as a database, an Excel sheet, or within a ticketing system. It should include the what, when and impact (on customers or components). Much important information pertaining to the Incident and Availability Management practices could be derived from this data store.
The Maintenance Plan is a detailed document defining the pre-requisite tasks, the maintenance tasks, rollback tasks and post-maintenance tasks. Each task should have a description, an owner, a time and duration. In most cases the plan must be scrutinized to the lowest detail level, and practiced in a Pre-Production environment that should mimic the production environment as much as possible.
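As one small illustration of how these objects could be automated, the Change Window lends itself to a simple check that an RFC's planned time is permitted. The sketch below is an assumption-laden example, not part of the STORM™ methodology: `ChangeWindow`, `is_allowed` and the sample window are hypothetical, and windows are assumed not to cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeWindow:
    """Hypothetical change window for one subsystem."""
    subsystem: str
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    start_hour: int      # inclusive, local time
    end_hour: int        # exclusive; assumed not to cross midnight


def is_allowed(windows, subsystem, when):
    """True if `when` falls inside any change window defined for the subsystem."""
    return any(
        w.subsystem == subsystem
        and when.weekday() in w.weekdays
        and w.start_hour <= when.hour < w.end_hour
        for w in windows
    )


if __name__ == "__main__":
    # Assumed policy: database changes only on weekends between 01:00 and 05:00.
    windows = [ChangeWindow("database", frozenset({5, 6}), 1, 5)]
    planned = datetime(2010, 3, 20, 2, 30)  # a Saturday
    print(is_allowed(windows, "database", planned))
```

A check like this could run when the RFC is submitted, so that a request scheduled outside the window is flagged before it ever reaches the CAB.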
The sixty seconds are over. This was just a teaser. Obviously there are templates, workflows and a slew of details that tie all of these objects into a well-oiled practice. The book will expand on the details and include the workflows, the templates and methods for automation.
To summarize: as the market matures and competition thrives, the big differentiator will be the second ‘S’ in SaaS, and customers will become less and less forgiving. If a SaaS company does not maintain a robust Change Management practice, it will end up paying in frustrated staff and customer churn.
Sunday, January 31, 2010
SLA Consequences to Service Operations
"What, me worry?" (Alfred E. Neuman)
In my previous post I discussed some basic concepts about SLAs, SLOs and penalties. As promised, I am addressing the ‘who cares?’ question.
From a Service Operations point of view, you may shrug your shoulders and claim that these are issues for the Legal and Finance departments. Most likely you were brought on board later in the game and never viewed an SLA until you were forced to do so.
As the person responsible for keeping all the services up and running, it may be best to keep the SLA to a minimum. After all, a document containing vague language, with little commitment and liability would be hard to wave in front of your face when the service levels drop.
I will argue that vagueness will play against you. A tough SLA will require the company to adhere to the high service levels they are committed to, and yes, pay the penalties for breaching these agreements. Keep in mind that if your service level drops one time too many, the legalese you will be hiding behind will not save your butt when customers drop from the service or simply do not renew.
I would take it even one step further. I advocate tying the Service Ops manager’s bonus to achieving those SLOs that will keep a smile on the customers’ faces (typically uptime and response time, but in some cases there are other objectives that are crucial to the customers). The carrot and the stick should work nicely to assure that you are doing the utmost to live up to the agreements.
Commitments
Another issue that concerns you (Service Operations) is that Sales is making commitments that you are supposed to keep, usually without you ever knowing about it. Operations needs to initiate a fact-finding effort to learn what has been promised. You need to know what you are capable of providing. Everybody likes to state that they are five nines (99.999% uptime), but how many companies out there really are? You need to monitor and test your service over a substantial period of time before you commit to those numbers.
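To put the uptime figures in perspective, it helps to convert a percentage into a downtime budget. The helper below is a minimal sketch (the function name and the 365-day year are assumptions): five nines leaves roughly 5.3 minutes of downtime per year, while 99.9% leaves about 8.8 hours.

```python
def allowed_downtime_minutes(uptime_percent, period_days=365):
    """Minutes of downtime permitted per period at a given uptime percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Comparing your measured downtime over several months against this budget is a quick sanity check before anyone puts a number in a contract.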
Another point in favor of having a good grasp of your SLA is that you, as a consumer of services, would be conscious of your requirements vis-à-vis your service providers.
That will include the hosting services, your ISP, your communications provider, and whatever cloud services you are using. In my past positions as VP Service Operations I have been appalled by the contracts that my predecessors had signed with service providers. Some of them had no consequences for service level degradation. Others had ridiculous clauses such as 'for every hour of downtime, the credit would be for one hour prorated service cost', which meant that there was no real penalty. Another contract stated that we could get out of the agreement if for three months in a row(!) the service provided was available for less than 75% of the time.
We would have been out of business by then.
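The arithmetic behind those two clauses is worth spelling out. The sketch below uses made-up numbers (a $10,000 monthly fee and a 730-hour month are assumptions, as are the function names): under the prorated-hour clause an eight-hour outage earns back about $110, roughly 1% of the fee, and the exit clause only triggers after more than 182 hours of downtime in each of three consecutive months.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def prorated_credit(monthly_fee, downtime_hours, hours_in_month=HOURS_PER_MONTH):
    """Credit under a 'one prorated hour per hour of downtime' clause."""
    return monthly_fee * downtime_hours / hours_in_month


def downtime_hours_at(availability_percent, hours_in_month=HOURS_PER_MONTH):
    """Hours of downtime in a month at the given availability percentage."""
    return hours_in_month * (100 - availability_percent) / 100


if __name__ == "__main__":
    fee = 10_000  # hypothetical monthly service cost in dollars
    credit = prorated_credit(fee, downtime_hours=8)
    print(f"Credit for an 8-hour outage: ${credit:.2f}")
    print(f"Downtime at 75% availability: {downtime_hours_at(75)} hours per month")
```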
Where are those damn SLAs?
As we have seen, SLAs that are broad and meaningful will be complex. Add to that various service levels such as Standard, Gold and Platinum and the fact that some customers have negotiated special terms for themselves, and you are dealing with a mean, slimy problem.
To compound that problem, nine times out of ten, these documents are sitting on someone’s laptop in a PDF format with perhaps a hard copy in a dusty folder, in the cabinet below the espresso machine.
Imagine the exercise of figuring out if an SLA was breached for a particular customer, and if that breach carries a penalty.
I have painfully gone through that exercise too many times, and believe you me - I had much better things to attend to following a service outage. The process was extremely slow, finding the various documents, looking up the terms and comparing the events with them.
Then a calculation was needed as to how much credit was due. And all this was done for a single customer. Multiply that by the number of customers that may have been affected and you have just wasted many good hours of Solitaire.
SLA Management Tools
There are multiple tools out there (some are offered as SaaS) to manage your SLAs. Many of them provide a full cycle of defining SLOs, creating SLAs, generating the documents, monitoring performance against obligations, computing compensation and generating reports. I have not used any of them (although I used to work at an SLM ISV), so I am not about to promote any single one, but there are very slick solutions available.
If you are at an early stage, it would be a hard sell to justify to management that you need to start paying for a service that possibly no one in the company comprehends.
Typically, when a SaaS company starts out there are very simple, non-binding, fixed SLAs, so there is very little attention paid to this aspect of the business.
As with any aspect of SaaS Service Operations, scalability issues hit you when you least expect them.
As most (all?) SaaS companies do not start with Service Level Management software, by the time it becomes a burden they will have many dozens, or hundreds of such SLAs. The effort of converting them to an automated system is daunting.
Therefore, you can start by structuring your existing and future SLAs in a simple Excel sheet or database so that they are easily accessible and comparable.
A typical SLA would be stored in a table such as the one below.
The values for the various SLOs in the table were automatically populated from the definitions in the pre-defined Platinum and Gold tables (which state the default values for these SLAs). They may be overridden by specific values, following negotiations for a particular customer.
Cust. | Calia | … | Google
Cust ID | 123 | … | 213
SLA | Gold | … | Platinum
Uptime (%) | 99.9 | … | 99.99
Response time | under 6 sec | … | under 4 sec
Support Response time | 2 hours | … | 30 min
Support Avail. | 12x6 | … | 24x7
Major Outage Resolution | 1 hour | … | 30 min
Partial Outage Resolution | 4 hours | … | 2 hours
Minor Outage Resolution | 12 hours | … | 6 hours
Maint. Notification | 10 days | … | 2 weeks
FTP | 12 hours | … | 6 hours
Outage Notif. | Email, 1 hour | … | Email + call, 30 min
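The idea of tier defaults with per-customer overrides can be sketched as follows. This is a simplified illustration rather than the structure used in the book: the field names, the `effective_sla` and `breached_slos` functions and the sample override are hypothetical, only three of the SLOs from the table are modeled, and treating a measurement exactly equal to the target as acceptable is an assumption.

```python
# Default SLO values per service level, taken from the Gold and Platinum
# columns of the table; the key names are made up for this example.
TIER_DEFAULTS = {
    "Gold": {"uptime_pct": 99.9, "response_time_sec": 6, "support_response_min": 120},
    "Platinum": {"uptime_pct": 99.99, "response_time_sec": 4, "support_response_min": 30},
}


def effective_sla(tier, overrides=None):
    """Start from the tier defaults, then apply customer-specific overrides."""
    sla = dict(TIER_DEFAULTS[tier])
    sla.update(overrides or {})
    return sla


def breached_slos(sla, measured):
    """Return the names of SLOs whose measured value misses the target.

    Uptime must be at least the target; the time-based SLOs must be at most
    the target. Metrics missing from `measured` are skipped.
    """
    breaches = []
    for name, target in sla.items():
        if name not in measured:
            continue
        value = measured[name]
        ok = value >= target if name == "uptime_pct" else value <= target
        if not ok:
            breaches.append(name)
    return breaches


if __name__ == "__main__":
    # Hypothetical negotiated override: a faster support response for one customer.
    calia = effective_sla("Gold", {"support_response_min": 60})
    last_month = {"uptime_pct": 99.85, "response_time_sec": 5}
    print(breached_slos(calia, last_month))
```

With the terms held as data rather than PDFs, checking every affected customer after an outage becomes a loop instead of a paper chase.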
In the book I will elaborate on the structures and the tools and how to automate the compensation computations.