- How do I run multiple interconnected Kubernetes clusters for high availability? The number one question you will hear from people starting to build a solution on top of Kubernetes is how they will make it resilient for customers. Unfortunately, federation in Kubernetes today is in the same realm as "Kubernetes on Kubernetes" and "OpenStack on Kubernetes." It is merely an engineering fantasy without any tether to the reality of customer demands. What are most people asking for with regard to federation? They have trouble verbalizing it, but I'd break it down into three things:
- a) Kubernetes Cluster Management - This isn't the same as a Kubernetes Dashboard of Dashboards. Users want to manage the clusters themselves, not the hosted objects, regardless of cluster location, lifespan or intended use. They want a way to easily spin those clusters up anywhere, interconnect them if they must, and manage their entire lifecycle. Most importantly, they want these clusters to be largely identical across environments.
- b) Granular Monitoring and Aggregated Logging - Users want robust monitoring both of the hosted Kubernetes objects, via their application performance management (APM) solution, and of the underlying Kubernetes infrastructure itself. There are third-party tools like Datadog and Sysdig, and the open source Prometheus from the CNCF, that accomplish this goal.
- c) Deployment Workflow to Many Environments - Customers are asking for a workflow for loosely coupled, eventually consistent, homogeneous clusters instead of tight orchestration among clusters in a distributed system. The difference is you won't have "bursting" or a single cluster spanning multiple data centers. Users want to know that if an availability zone or environment goes down, their higher-level service will still be up and running because it is also hosted on an identical cluster somewhere else. Their customers won't experience an outage. They also want to know that deployments will work across clusters, and these clusters must be homogeneous for that to work.
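The workflow in (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any federation tool or DC/OS works. The context names, the manifest file name, and both helper functions are hypothetical. The only real interface used is the standard `kubectl --context <name> apply -f <file>` command line. The sketch builds one independent apply command per cluster and flags clusters whose reported version differs from the majority, since the workflow depends on homogeneous clusters.

```python
from collections import Counter
from typing import Dict, List


def build_apply_commands(contexts: List[str], manifest: str) -> List[List[str]]:
    """Return one `kubectl apply` argument list per kubeconfig context.

    Each cluster gets the same manifest independently; nothing here
    coordinates the clusters with each other.
    """
    return [
        ["kubectl", "--context", ctx, "apply", "-f", manifest]
        for ctx in contexts
    ]


def find_version_drift(versions: Dict[str, str]) -> Dict[str, str]:
    """Return the clusters whose version differs from the most common one.

    `versions` maps a cluster name to a version string gathered by the
    caller; how it is gathered is outside this sketch.
    """
    if not versions:
        return {}
    baseline = Counter(versions.values()).most_common(1)[0][0]
    return {name: v for name, v in versions.items() if v != baseline}


if __name__ == "__main__":
    # Hypothetical context names and manifest path, for illustration only.
    for cmd in build_apply_commands(["us-east", "eu-west"], "app.yaml"):
        print(" ".join(cmd))
```

Each command could be handed to `subprocess.run` in turn. A failure in one cluster then leaves the others untouched, which matches the eventually consistent model described above.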
- Currently, Kubernetes federation is complex and doesn't really get to the heart of most of the issues listed above. For Kubernetes federation right now, you host some of the Kubernetes components - API Server, Scheduler and Controller Manager - on one of the Kubernetes clusters. Even with the new OSS tool (kubefed) to help, that's not easy. Once you have it working, many of the kubectl commands are not supported and the entire architecture is fragile. The current federation approach provides some of the deployment workflow, when it works, but addressing only part of the above concerns is not enough.
- How do I use Kubernetes as a component for my single application? Probably one of the most surprising things you hear from users is that they are using Kubernetes to support a single application and not as a generalized container service. Around 80% of the users I have talked to fall into this bucket. Many of the case studies I see from large organizations describe many siloed teams working on different services using Kubernetes. That is very different from the original intent of using Kubernetes to run the entire data center, as Google does.
- How do I integrate my load balancer and identity management? The single most cited reason people I have talked to gave for using a particular Kubernetes distribution, over simply using the vanilla version, is out-of-the-box load balancer and identity management integrations. All the other features, such as single-vendor user interfaces, additional storage features and the like, never come up. There needs to be better integration with load balancers and identity management solutions if customers are going to use the pure upstream version that is compatible across clouds.
- Create New Clusters Anywhere - DC/OS allows users to spin up identical Kubernetes clusters, and supporting data service components, as needed in any environment. Each team can have its own isolated cluster or a single team can spin up multiple clusters for different life cycles. Clustered services can be updated to newer versions with zero downtime while capacity can be added as needed.
- Connected Service Components - Modern applications rely on an ever-growing list of open source data services, for example distributed databases like Cassandra and MongoDB, and stream processing like Kafka and Spark. Getting the most out of these services often requires machine learning from additional frameworks like TensorFlow and PyTorch. DC/OS supports a variety of frameworks that can be spun up, connected, and accessed immediately. The service catalog is enabled by a powerful SDK that lets OSS communities, customers, and DC/OS users integrate their solutions on the platform with minimal code and deploy them with a single click. A full list of DC/OS packages can be found in the service catalog: https://universe.dcos.io/#/packages.
- Load Balancing for Thousands of Nodes - Securely exposing services in a containerized distributed environment is difficult due to its dynamic nature. Recently, Mesosphere released Edge-LB, which gives many siloed teams access to a load balancer that was built with scalable microservices and containers in mind. With Edge-LB, teams don't need to worry about collisions due to identically named services or about scaling the load balancer itself, because Edge-LB is hosted on DC/OS and is as scalable as every other framework. It conforms to and expands the Container Network Interface (CNI) standard so plugins like Calico can utilize it.