Some applications seem as though they were designed explicitly to make automation hard. When working with legacy software that has, over time, become a core component of the business infrastructure, it’s not always possible to “just rewrite” the software or “replace it with something modern.” While the results aren’t always pretty (and are rarely ideal), in most cases automation around these legacy systems can result in improvements. Case in point: Watson Explorer.
Watson Explorer Automation
Watson Explorer is search software that scrapes and indexes content (it is a different product from Watson). Its proprietary configuration is a combination of XML and property files that a custom JAR processes into a configuration package. A total of 71 servers ran this software, supported by two people who were constantly swamped.
Based on our research and discussions with the vendor, we realized that there was no precedent for this type of automation. As in most enterprises, applications like this one require a lot of care and feeding. We knew we would need to solve a number of key challenges in order to implement an automation-first approach.
Below, we’ll describe the automation approach we took.
Jenkins Pipeline Implementation
Here are the manual steps that the two-person team performed for each update of the application’s configuration:
- Check out code from SVN, make changes, and commit back to SVN. (This process often involved tedious manual changes.)
- Log in to a jump server that can access the application servers.
- Check out code from SVN to the jump server.
- Collect files into the correct directories.
- Run the preprocessing script.
- Run the main processing JAR.
- Perform cleanup.
After moving the application code over to Git, we implemented the following Jenkins pipeline. This pipeline condenses all the manual steps following code commit into one flow, making the process repeatable and reducing the likelihood of errors.
The team can deploy to numerous environments, so the pipeline takes a parameter that lets the team choose the environment they want to work with. Because there isn’t necessarily a progression from one environment to another, we can’t set up a dev -> qa -> prod type of pipeline.
Occasionally the team needs to deploy an older version for testing, so the version to deploy can also be specified.
No proper linting tool is available to validate that the generated configuration matches what the application expects, so the pipeline defaults to a “dry run” that lets the team manually review the generated configuration.
This stage checks out the specified tag unless “latest” is requested.
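As a rough illustration of that decision, the sketch below maps the requested version to a Git ref. The function name `git_ref_for` and the default branch name `main` are assumptions made for this example; they do not come from the actual pipeline.

```python
def git_ref_for(version: str, default_branch: str = "main") -> str:
    """Return the Git ref to check out for the requested version.

    "latest" (or an empty value) means the head of the default branch;
    anything else is treated as a tag name. Both the branch name and the
    function itself are hypothetical.
    """
    requested = version.strip()
    if not requested or requested.lower() == "latest":
        return default_branch
    return f"refs/tags/{requested}"


if __name__ == "__main__":
    print(git_ref_for("latest"))
    print(git_ref_for("release-1.4"))
```

The returned ref could then be passed to `git checkout`, which is how an older tagged version would be selected for testing.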
This stage copies all relevant files into one directory for processing.
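A minimal sketch of such a collection step follows. It assumes the relevant files are the XML and property files mentioned earlier and that they can be matched by extension; the function name `collect_files` and the directory layout are hypothetical.

```python
import shutil
from pathlib import Path


def collect_files(sources, staging_dir, patterns=("*.xml", "*.properties")):
    """Copy matching files from each source directory into one staging directory.

    The patterns are an assumption based on the file types the article
    mentions. A name clash raises an error instead of silently overwriting.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sources:
        for pattern in patterns:
            for path in sorted(Path(source).glob(pattern)):
                target = staging / path.name
                if target.exists():
                    raise FileExistsError(f"{path.name} already staged")
                shutil.copy2(path, target)
                copied.append(target)
    return copied
```

Failing on a duplicate file name is a deliberate choice in this sketch: when several directories are flattened into one, a silent overwrite would be hard to spot during a manual review.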
This stage is meant to reduce manual edits. The application is expecting a property file with sequentially numbered properties, so adding a property into the middle of a few hundred other properties means all the following properties need to be renumbered.
The solution was to replace the numbers in the property files with placeholders and use this stage to convert the placeholders to sequential numbers.
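The idea can be sketched in a few lines of Python. The placeholder token `@N@` and the property names in the example are assumptions made for illustration; the real property file format is not shown here, and this sketch simply gives every placeholder occurrence the next number.

```python
import itertools
import re

# Hypothetical placeholder token; the real files may use a different marker.
PLACEHOLDER = re.compile(r"@N@")


def number_placeholders(text: str, start: int = 1) -> str:
    """Replace each placeholder occurrence with the next sequential integer."""
    counter = itertools.count(start)
    return PLACEHOLDER.sub(lambda _match: str(next(counter)), text)


if __name__ == "__main__":
    template = "source.@N@=alpha\nsource.@N@=beta\nsource.@N@=gamma\n"
    print(number_placeholders(template), end="")
```

With this approach, inserting a new property in the middle of the file only requires adding one line with a placeholder; the numbering is regenerated on every run.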
“Dry run” vs. “actually apply” is determined from an entry in a property file. This stage flips that property flag if needed.
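Flipping such a flag could look like the sketch below. The property name `dry.run` and its `true`/`false` values are hypothetical, since the actual key used by the application is not given here.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return the property file content with `key` set to `value`.

    Only lines of the form `key=...` are rewritten; comments and other
    properties are left untouched. A missing key raises KeyError so that
    a typo cannot silently leave the old mode in place.
    """
    lines = []
    found = False
    for line in text.splitlines():
        name, separator, _old_value = line.partition("=")
        if separator and name.strip() == key:
            lines.append(f"{key}={value}")
            found = True
        else:
            lines.append(line)
    if not found:
        raise KeyError(key)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    properties = "environment=qa\ndry.run=true\n"
    # "dry.run" is an assumed property name used only for this example.
    print(set_property(properties, "dry.run", "false"), end="")
```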
Some environments need additional tweaks to the configuration before the main processing. Those tweaks are stored in a single script file that this stage executes.
This stage runs the main processing JAR, which also deploys the configuration to the specified environment if the appropriate flag is set. An SSH key is needed to connect to the application servers; this key is provided via the Jenkins Credentials Plugin.
This cleanup step attempts to remove the key again in case an earlier error prevented it from being deleted.
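The same "always clean up the key" guarantee can be expressed in Python with a context manager, shown here as a sketch rather than as the pipeline's actual mechanism. The helper name `temporary_key_file` and the file prefix are assumptions for this example.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_key_file(key_material: str):
    """Write key material to a private temp file and always delete it afterwards."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="deploy-key-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(key_material)
        yield path
    finally:
        # Runs on success and on failure; tolerate the file already being gone.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
```

Code that needs the key would run inside `with temporary_key_file(...) as key_path:`, so the file is removed even if the deployment step raises an error.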
This stage pauses the pipeline so the team can check the files that were created and validate that the configuration looks good.
It’s important to periodically step back and think about our work from a high level to see where we can optimize and eliminate manual steps.
Often, small optimizations such as the Watson Explorer changes above can have a big impact on the time spent on individual tasks. To be clear, the pipeline described above isn’t overly complex; it essentially automates the manual steps, making the entire process much faster and more reliable.