In conjunction of the recent release of, The Rise of Skywalker, the final movie in the Skywalker saga, I figured an image of stormtroopers was in order. I mean, what could better represent ephemeral Jenkins build agents than the massive unending population of stormtroopers who lead ephemeral, short-lived lives (especially when the Jedi are around)?
Here’s the problem: Many organizations have a single Jenkins master and a limited number of build agents with no ability to auto-scale Jenkins to meet delivery team demands. The DevOps team is left guessing as to the number of build agents needed. As more teams onboard onto the pipeline, this scalability issue creates delivery bottlenecks. It’s a serious and all too common challenge for enterprise organizations. In this post, I’ll review the Jenkins problem in more detail and then propose a container-based CI/CD solution for using ephemeral Jenkins masters with Kubernetes.
The Jenkins Problem
In many organizations, Jenkins is implemented in one of two approaches: (1) centralized management (top-down effort) and (2) individual team management (grassroots effort).
This first approach involves deploying a single Jenkins master with multiple Jenkins agents that teams use to run their build workloads (typically on physical hardware or virtual machines) completely managed by a centralized team. Essentially, this centralized team attempts to solve and support all permutations each individual delivery team will need from a pipeline (e.g., build/deploy resources and plugins).
In reality, this centrally managed approach doesn’t scale well as the demands for CI/CD grow in the organization. Building, configuring, and maintaining Jenkins agents for all of the organization’s CI/CD pipelines becomes difficult and risky to maintain. In addition, having a centralized Jenkins master can become a liability for the enterprise because it introduces a single point of failure -- if the single Jenkins master goes down, no team will be able to build, test, and deploy their applications. Limited scalability and downtime for the CI/CD orchestration are very costly. Just as importantly, having a central management team inhibits product teams’ ability to innovate and experiment with new Jenkins plugins or processes without first receiving change or update approval. The central team, in turn, often struggles to keep up with requests and update demands from the organization.
The second approach involves each team managing their Jenkins master and agents on physical/virtual hardware. Individual team management solves a lot of the issues associated with configuring plugins and understanding what each team needs to build and deploy software. However, the problem of scalability is still very present. The success of this approach also requires each team to have the time and skills needed to manage a Jenkins master, Jenkinsfiles, and the underlying infrastructure of the CI pipeline. With variable CI/CD pipelines, consistency also becomes a major challenge. In other words, multiple variations of deployed/released code become extremely difficult to manage in a streamlined manner across an enterprise.
Regardless which approach is used, organizations that use physical hosts or VMs may either overprovision resources or underutilize hardware, incurring unnecessary costs as a result of pre-purchasing compute power to build the physical hosts/VMs. It’s a guessing game: either you have more than enough compute resources, or your Jenkins builds will queue. CloudBees, a licensed version of Jenkins, attempts to solve the Jenkins scaling problem associated with both approaches. However, it doesn’t enable burstable scaling of Jenkins masters/agents. It also has licensing costs and is not open source.
The Jenkins Solution
While organizations have multiple products to choose as their CI/CD platform orchestrator, Liatrio’s point of view is that open source Jenkins is one of the most robust CI/CD orchestration platform for ensuring that an organization can successfully build and deploy software across multiple platforms. It’s also our point of view that Jenkins needs to be able to dynamically scale to support an organization's needs.
In addition, we believe organizations should run their Jenkins infrastructure using a container-based approach on Kubernetes. Containers increase the speed of delivery by offering developers a mechanism for defining their infrastructure as code with high fidelity between development and production environments, which supports efficient CD pipelines. Workload availability increases through the ability to rapidly respond to additional load or failure by quickly scaling through the launching of additional lightweight containers. Finally, operational costs are decreased through higher density of workload deployments with multi-tenant container orchestration platforms such as AKS, EKS, GKE, and OpenShift.
Below, I’ll discuss how enterprises can leverage containers with Kubernetes to create a stable, scalable, and empowering CI/CD environment. In particular, I’ll focus on how to use ephemeral Jenkins masters with Kubernetes to support the needs of large enterprise teams.
Ephemeral Jenkins Implementation Overview
I’ll start with some assumptions:
- Teams are already familiar with creating and editing Jenkinsfiles.
- Recipients of the Jenkins master (a business unit, DevOps enablement team, or delivery team) understand and accept the responsibility of maintaining a Jenkins master.
- Teams will have limited ability to manage the master - plugins and configuration changes will be handled via InnerSource.
- The Jenkins configuration as code plugin will be leveraged to configure the master.
- All Jenkins logs will be persisted in a logging system such as ElkStack.
- Pipeline stage executions aren’t memory intensive. For example, a Jenkins system based on Kubernetes can’t support automation runs that require more than 4GB of memory.
- A managed Kubernetes service will host the client implementation and is already in place.
Here are some key recommendations:
- The implementation strategy depends on the abilities and maturity of each enterprise. For example, early in the DevOps transformation a delivery team may not be ready to manage a Jenkins master. In such cases, we recommend that a higher-level business unit manage Jenkins masters. However, if the enterprise is further along in its transformation and is less dependent on a centralized team, then Jenkins masters can be managed at the business unit or team level.
- All Jenkins masters should start with the same base image - the base masters will grow via InnerSourcing, and changes should be persisted only via pull request to the master branch.
- Ephemeral environments exacerbate the challenge with secrets. As a result, an API-based key and secret management solution should be required to avoid exposing secrets in code. Organizations can leverage Hashicorp Vault to store all of the secrets for the cluster and the namespaces within the cluster.
- If teams are not ready to manage their own Jenkins master, then a default template Jenkinsfile should be provided for the team, thus enabling them to learn, experiment, and migrate their product at their own pace without delivery disruptions. Liatrio’s approach to team enablement is the Dojo-based “teach a team to fish” approach, which fosters organizational transformation and a new way of working on a Delivery Teams existing product and backlog.
- We strongly suggest defining your Kubernetes infrastructure as well as your Jenkins infrastructure as code. Defining all the components in your Jenkins infrastructure as code helps to facilitate configuration changes and upgrades all of the Jenkins master as the number of Jenkins instances in your infrastructure grows. To manage this infrastructure as code, we recommend using Terraform. The key benefit of using Terraform is that the state of your managed infrastructure along with all of the Jenkins instances for each team is saved and can be compared before rolling out new changes. This allows for easier configuration changes and updates to Jenkins when rolling out updates.
When an Enterprise Should Consider Using Ephemeral Jenkins Masters
An enterprise should consider using ephemeral Jenkins masters in the following two scenarios:
- Scenario One - Start with a centralized solution and then organically scale out to multiple masters. DevOps is usually centralized early in a transformation until the community is robust enough to stand on their own. The initial centralized system should be configured via automation to ensure a smooth transition to the ephemeral Jenkins masters.
- Scenario Two - With an implementation vehicle in place such as a Dojo, delivery teams can learn how to administer their own master.
We recommend implementing ephemeral Jenkins masters/agents for each product/business unit using Kubernetes. Engineering teams will have more autonomy and control over how Jenkins instances are configured in terms of plugins and availability while also maintaining a base level of standardization regarding how Jenkins is configured. The current DevOps tools team will also be able to better scale Jenkins across the enterprise.
Masters and agents will run in containers on Kubernetes. Here are some guidelines to follow:
- Provide declarative pipelines as code in an Innersourcing hub for the technologies that are in use today, with the ability to expand to other technologies in use across the enterprise.
- Reduce or eliminate the need to use Jenkins plugins by replacing them with function-based builder images. (Note: Some plugins may still be necessary.)
- Ensure any shared libs are open to the entire enterprise in favor of community-based reusable code, essentially crowd-sourcing the pipeline capabilities across the enterprise. (Shared libraries are excellent candidates for Innersourcing.)
- Discourage manual configuration and the use of customization plugins. Changes should be declared as source code via the configuration as code plugin.
- Integrate security scans into pipelines (e.g., container scanning, SAST, DAST, and IAST) using security scanning tools such as JFrog Xray, Twistlock, and WhiteHat Scans.
- Execute Jenkins stages in technology-based containers (e.g., Maven and NodeJS) to avoid issues with tool installation on slaves and reduce the use of plugins as much as possible.
- Employ a base Jenkins master container to deploy Jenkins masters across the enterprise. This base master should be centrally managed or owned by a given team (we recommend that it be the DevOps tools team).
- Ensure the DevOps tools team manages base technology containers (e.g., Maven and NodeJS).
- Implement a solution to enable delivery teams to build and run their applications in local Docker containers on their machines. This is another systemic enterprise problem that we often encounter. We will address our thoughts and recommendations on this issue in a future blog post.
At the end of the CI phase, the pipeline produces two artifacts, which are versioned and pushed to Artifactory: (1) the product artifact such as a .jar file, and (2) the product artifact installed on a container and stored as a container image.
Here are some Kubernetes implementation guidelines to follow:
- Ensure Jenkins pipelines execute on containers in Kubernetes.
- Source control Kubernetes managed service configuration and deployment scripting for the Jenkins containers.
- Ensure each product team has their own namespace in the Kubernetes cluster.
- Ensure Vault/Consul live in their own namespace in the cluster.
- Create a system namespace in the cluster for any shared resources.
- Build in managed Kubernetes services (e.g., AKS, EKS, GKE, and OpenShift).
Jenkins Pipeline Implementation Results
- Engineering teams will gain more autonomy and control over how their Jenkins instances are configured today in terms of plugins and availability. Teams will be able to make decisions about Jenkins implementation and test new plugins or changes before adding them.
- Pipelines will be declarative by technology types to ensure teams consistently apply the pipelines. Application build/deploy steps will be consistent across all teams, and teams will be able to clearly see what steps are taking place in the pipeline and implement their products more easily.
- Teams will be able to run automated performance/regression test suites at any time, leading to earlier detection of issues.
- Security scanning requirements will be a shared library within the pipeline and run on every build to ensure compliance. As a result, teams will be aware of the security compliance status of their applications at all times. The security organization, in turn, will be able to update and control the policies and checks enforced in the shared library and act as the overall owner of the security testing shared library.
Kubernetes/Jenkins Implementation Results
- Teams will have a scalable solution that provides the build instances they need to run builds, automated tests, and deployments at any time. This scalable solution will also provide on-demand scaling for Jenkins build agents.
- The base configuration of the Jenkins master will be controlled by a centralized tools team, standardizing the Jenkins master across the organization. Product teams will be able to provide updates via Innersource pull requests, giving them some autonomy over their Jenkins masters. As a result, delivery teams will have a more scalable Jenkins solution, a replicable process, and greater availability of Jenkins instances, as well as decrease the blast radius of downtime and outages of Jenkins due to distributed master nodes.
- The centralized DevOps tools team will be able to better scale Jenkins throughout the enterprise.
In my next post in this series, I’ll have much more to say about implementing Ephemeral Jenkins masters with Kubernetes. Until then, reach out if you’d like to learn more!
Liatrio is a collaborative, end-to-end Enterprise Delivery Acceleration consulting firm that helps enterprises transform the way they work. We work as boots-on-the-ground change agents, helping our clients improve their development practices, react more quickly to market shifts, and get better at delivering value from conception to deployment.