Private Clouds Need a Share of an SRE

Big cloud companies run their IT estates differently to enterprise IT organizations in a lot of ways. One of the significant differences is that cloud vendors have developers operate their infrastructure; this is the Site Reliability Engineer (SRE) role that Google talks about. The SRE role is roughly equivalent to the IT operations function in Enterprise IT, but at a scale that enterprises never experience. In Enterprise IT, when something is broken an Operations engineer will fix the problem, then move on to the next task. In an SRE role, the problem is not simply fixed; the SRE will make software changes to the underlying platform so that the problem does not recur. This difference in approach is one of the essential elements in moving from IT as pets to IT as cattle (or probably poultry, since a cow, sheep, or goat is a pretty expensive and non-disposable asset).

My original Platform 9 Meme, thanks for the reminder Sirish

The SRE skill set is very different to the IT operations skill set. It is much more like the software developer skill set, but with a focus on infrastructure and platform software rather than business process or data management. There is a lot of overlap with IT Ops: long-term and systems thinking, and rapid response to a crisis. But an SRE will be familiar with the programming language of the platform so that they can make changes to that platform, and they will use source management tools to a far higher degree than most IT Ops teams. There is high demand for SREs among the cloud vendors and big infrastructure development businesses, so it can be hard to recruit an SRE to an enterprise organization. It can also be hard to justify hiring an SRE. Facebook has around 20,000 physical servers per SRE; an enterprise private cloud with 5,000 physical servers is probably only just large enough to need one SRE. By contrast, managing even 500 physical servers with traditional enterprise IT Operations techniques will require a team, not a single engineer. It seems like sharing an SRE team over several enterprise private clouds might be a good idea.
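
To put rough numbers on that, here is a small back-of-the-envelope sketch in Python. It takes the 20,000 servers per SRE figure quoted above as the ratio and works out what share of one SRE a private cloud of a given size would occupy. The function names are my own, and the ratio is a hyperscale number that an enterprise is unlikely to match, so treat the output as an illustration of the argument rather than a staffing model.

```python
# Illustrative only: the ratio is the Facebook figure quoted in the text,
# and the function names are made up for this sketch.
SERVERS_PER_SRE = 20_000


def sre_share(servers: int, servers_per_sre: int = SERVERS_PER_SRE) -> float:
    """Fraction of one SRE that a cloud of `servers` machines would occupy."""
    return servers / servers_per_sre


def clouds_per_sre(servers_per_cloud: int, servers_per_sre: int = SERVERS_PER_SRE) -> int:
    """How many equally sized private clouds one SRE could cover at that ratio."""
    return servers_per_sre // servers_per_cloud


if __name__ == "__main__":
    for size in (500, 5_000):
        print(f"{size} servers: {sre_share(size):.3f} of an SRE, "
              f"{clouds_per_sre(size)} such clouds per SRE")
```

At that ratio a 5,000 server private cloud fills only a quarter of one SRE, and a 500 server cloud a fortieth, which is the arithmetic behind sharing an SRE team across several enterprises.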

One way to look at what Platform 9 offers is that they have a team of SREs who operate a Cloud Management Platform (CMP) on behalf of Platform 9’s customers. The original product from Platform 9 is a CMP for your on-premises physical infrastructure, which is delivered as a cloud service. The SREs at Platform 9 develop and maintain the CMP that clients use to manage over 300 private clouds. All of the customer clouds run on-premises, on hardware that is owned by the client and in many cases shared with traditional enterprise IT workloads.

Platform 9 has added two services to the existing IaaS services: Kubernetes for container application management, and Fission, which is an open-source functions as a service (Serverless) platform. The Kubernetes and Fission workloads do not need to run on-premises. The same Platform 9 interface can manage container and serverless applications on-premises or on a variety of public cloud providers such as AWS and Google Cloud Platform (GCP). In other words, Platform 9 provides a cloud management platform to manage Kubernetes and Serverless in a hybrid cloud model. This way they are addressing the need for application portability across the private and public cloud. With Platform 9’s products, an enterprise can have the same CMP managing both on-premises and public cloud-based applications. The enterprise gains the benefit of Platform 9’s SREs, all without needing to recruit and retain any SREs of its own.

© 2018, Alastair. All rights reserved.

About Alastair

I am a professional geek, working in IT Infrastructure. Mostly I help to communicate and educate around the use of current technology and the direction of future technologies.