This morning I had the opportunity to chat with software engineers and data scientists at the AI Dev World Conference on a topic I just happen to be v...
Developers can get passionate when it comes to their toolkit because the tools they use can have a significant impact on their productivity and quality of work. Our team at Pinpoint is no different. We love trying out new dev tools that help improve our software, process, and security. We're all encouraged to research and pilot new tools to introduce to the rest of the org. With so many options coming onto the scene every day, it can be overwhelming to decide what may work best for your situation.
Kubernetes has no shortage of Open Source and Open Core projects that help you solve any problem you might have. The list is so long that it gets a little daunting to pick the right one for the right job. Here are some of the projects we wanted to highlight that have helped us solve some of our problems recently.
What is it? Velero is an open source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes.
Problem we were solving: Our platform operations team spent a weekend trying to move our persistent storage from one cluster to another. Normally re-running deployments for a new cluster is as simple as changing the server endpoint, but migrating hundreds of volumes so that the new pods pick them up with no data loss is time-consuming and error-prone.
How we're using it: Using Velero to migrate our clusters was simple. We define our storage provider for backups (in our case, S3), and three commands in a terminal later we have snapshots of all of our resources, CRDs, and persistent volumes and claims cloned into our new cluster.
By running this automatically as a nightly pipeline, we are able to reduce our Recovery Point Objective (RPO) to 24 hours and our Recovery Time Objective (RTO) to 10-15 minutes. The footprint in S3 is minimal and the Disaster Recovery benefits are massive.
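As a rough illustration of what a nightly pipeline step could look like, here is a minimal sketch that builds the Velero CLI invocations for a dated backup and the matching restore. The function name, the backup naming scheme, the namespaces, and the TTL are all hypothetical and are not taken from our actual pipeline; the sketch only assembles argument lists and does not run anything.

```python
from datetime import date


def nightly_backup_commands(day: date, namespaces: list[str], ttl_hours: int = 720) -> dict[str, list[str]]:
    """Build (but do not run) Velero CLI commands for one nightly backup.

    Hypothetical helper: the "nightly-YYYYMMDD" naming scheme and the
    default TTL are assumptions made for illustration.
    """
    name = f"nightly-{day:%Y%m%d}"
    backup = [
        "velero", "backup", "create", name,
        "--include-namespaces", ",".join(namespaces),
        "--ttl", f"{ttl_hours}h",  # how long Velero keeps this backup
    ]
    # The restore would be run with kubectl pointed at the target cluster.
    restore = ["velero", "restore", "create", "--from-backup", name]
    return {"backup": backup, "restore": restore}


if __name__ == "__main__":
    commands = nightly_backup_commands(date(2021, 3, 4), ["prod", "data"])
    for step, argv in commands.items():
        print(step, "->", " ".join(argv))
```

A pipeline could hand each list to `subprocess.run` once per night, which is what bounds the RPO at roughly one backup interval.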
What is it? A Kubernetes-based event-driven autoscaler. KEDA allows us to scale our Kubernetes Deployments based on a wide variety of metrics.
Problem we were solving: We needed an easy way to scale based on the number of events in our event queues.
How we're using it: KEDA gives us an interface to define custom autoscaling metrics more easily than the default Horizontal Pod Autoscaler. Most of our system is event driven and most of those events are delivered via RabbitMQ through an abstraction layer called Event API. KEDA allows us to easily scale on the size of the event queue without the need for a complicated Prometheus query.
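To make the idea of scaling on queue size concrete, here is a simplified model of the arithmetic: divide the number of waiting messages by a per-replica target and clamp the result. This is a sketch under stated assumptions, not KEDA's implementation; the function name and the default bounds are hypothetical, and real autoscaling also involves tolerances, stabilization windows, and scale-to-zero behavior that are ignored here.

```python
import math


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Simplified queue-length scaling rule (illustrative only)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # One replica per `target_per_replica` queued messages, rounded up.
    wanted = math.ceil(queue_length / target_per_replica)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for backlog in (0, 95, 1000):
        print(backlog, "messages ->", desired_replicas(backlog, 20), "replicas")
```

With a target of 20 messages per replica, a backlog of 95 messages asks for 5 replicas, an empty queue falls back to the minimum, and a very large backlog is capped at the maximum.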
What is it? Pulumi is an open source infrastructure-as-code SDK. Because it uses familiar programming languages, it enables sharing and reuse of common patterns.
Problem we were solving: Manual processes for managing infrastructure are often long, complicated, and error-prone. Pulumi allows us to source control our infrastructure and have one place to store all of our configuration.
How we're using it: Pulumi allows us to create stacks and re-use those stacks across teams and Kubernetes clusters. All of our configuration and open source tools are defined and managed by Pulumi. This cuts down on configuration drift and lets us preview any changes to our environments before we apply them.
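The value of a preview is easier to see with a toy example. The sketch below is not Pulumi's API or its engine; it is a plain-Python illustration of the underlying idea, which is to compare the desired state declared in code with the current state and report what would be created, updated, or deleted. The resource names and fields are made up for the example.

```python
def preview(current: dict[str, dict], desired: dict[str, dict]) -> dict[str, list[str]]:
    """Report the changes needed to move from `current` to `desired` state.

    Toy model of an infrastructure preview; keys are resource names and
    values are their settings.
    """
    create = sorted(name for name in desired if name not in current)
    delete = sorted(name for name in current if name not in desired)
    update = sorted(
        name for name in desired
        if name in current and current[name] != desired[name]
    )
    return {"create": create, "update": update, "delete": delete}


if __name__ == "__main__":
    # Hypothetical resources: someone changed a replica count by hand (drift).
    current = {"event-api": {"replicas": 5}, "old-worker": {"replicas": 1}}
    desired = {"event-api": {"replicas": 3}, "keda": {"version": "2"}}
    print(preview(current, desired))
```

Because the desired state lives in source control, any drift in the running environment shows up as an update in the preview rather than going unnoticed.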
What is it? Kubecost is a tool to monitor and manage cost and capacity in Kubernetes environments.
Problem we were solving: Kubernetes is often a black box for costs. We pay for the EC2 instances that make up our cluster, but are we paying too much? Could we scale down any of our resources and use fewer nodes? Kubecost provides an intuitive dashboard, suggestions on how to save money, and even lets you set alerts and alarms when a team goes over its allotted budget.
How we're using it: Right now we’re still working on getting our deployments tagged and our scalers working the way we want them. Even so, right out of the box Kubecost helps us establish resource requests and limits that would otherwise just be guesses.
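To show why observed usage beats guessing, here is a minimal right-sizing sketch: take a high percentile of recent CPU usage and add some headroom. This is not Kubecost's algorithm; the function name, the nearest-rank percentile, the 20% headroom, and the sample data are all assumptions chosen for illustration.

```python
import math


def recommend_cpu_request(usage_millicores: list[int],
                          percentile: int = 95, headroom_pct: int = 20) -> int:
    """Suggest a CPU request (in millicores) from observed usage samples."""
    if not usage_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_millicores)
    # Nearest-rank percentile: the smallest sample covering `percentile`% of the data.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    baseline = ordered[rank - 1]
    # Add headroom so short bursts do not immediately hit the request.
    return math.ceil(baseline * (100 + headroom_pct) / 100)


if __name__ == "__main__":
    samples = [100, 120, 110, 300, 130, 125, 115, 105, 140, 135]  # made-up data
    print("p95 request:", recommend_cpu_request(samples))
    print("p50 request:", recommend_cpu_request(samples, percentile=50))
```

The choice of percentile is the real design decision here: a high percentile protects against spikes at the cost of reserving more capacity, while a lower one packs nodes more tightly.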
These are just a subset of the dev tools we use internally to ensure our infrastructure can support the delivery of new features on an almost daily basis. Visit our changelog to see what features the team has been releasing to help fellow developers build their own software better.