Navigating complex Puppet setups - part 1

December 16, 2014

The downside to ‘Puppetizing’ everything is that you usually end up with huge amounts of code. In this series of blog posts I will explain how not to get lost in complex Puppet setups.

(Looking for part 2? You can find it here.)

How to get a complex Puppet setup


About a year ago one of my clients, a software development company called Avisi, asked me to help them design and build new infrastructure for their company. They of course had a few requirements regarding security, scalability, performance, flexibility, and so on. But a few requirements stood out:

  • no more ‘sudo/root’ permissions for developers
  • eliminating manual changes
  • up-to-date documentation (or better: self-documenting)
  • self-service for developers

From the start, it was clear that automation was going to be a key part of our infrastructure. As we had prior experience with Puppet, we decided to ‘Puppet all the things’. That was easier said than done, though.

We had to deal with several development teams that had very different requirements: support for different Linux distributions, different versions of Java, Oracle, PHP, MySQL, and so on. Then there were all the infrastructure services (DNS, LDAP, backup, syslog, monitoring), development services (SCM, build, test, deploy, repository, QA tooling), collaboration services (issue tracking, wiki), some websites, and the usual ‘one-offs’, all of which needed to be ‘Puppetized’.

Mission: successful?

One year down the road we seem to have accomplished everything we set out to do. We have eliminated root permissions for developers and manual changes, and deploying new servers takes just minutes, including fully automated configuration of monitoring and backup. Most importantly, the developers are actively using Puppet to deploy their applications on the development infrastructure.

This has resulted in a fairly large codebase. Some numbers:

  • total lines of code: about 380k
  • Puppet modules developed in-house: 91
  • Upstream Puppet modules used: 49
  • Downtime related to config errors: none in the past 6 months

The problem

While the above may seem pretty successful, our fairly complex Puppet setup did introduce a few new challenges:

  • some tasks that took just a few seconds before now take up to a few hours because of code reviews.
  • in the past, anyone with rudimentary sysadmin skills could perform basic sysadmin tasks. Now, basic Puppet knowledge gets you nowhere.
  • When you first start working with the Puppet codebase, it can be quite intimidating.
  • When you want to change something, it is sometimes difficult to pinpoint exactly which code needs to be changed.
  • Puppet doesn’t guide you in how to handle includes/inheritance.

Solving the problem

There is no single solution to this problem, but there are a few guidelines that can help you steer clear of the common pitfalls of complex Puppet setups.

Separate data from code

Separating data from code has a few advantages. First, it allows for easy re-use of code. Second, it forces you to think ahead while writing code, make your modules highly configurable, and decide on sane defaults. Third, it allows you to expose the ‘data-part’, or node-classification separately, so the actual configuration of your nodes doesn’t necessarily require any programming skills.
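As a sketch of what this looks like in practice (the module and parameter names below are hypothetical), a module written with this separation in mind exposes all its tunables as class parameters with sane defaults, so the actual values can come from your data layer:

```puppet
# Hypothetical in-house module: everything configurable is a parameter
# with a sane default, so node-specific data can live outside the code.
class mycompany_ntp (
  $servers     = ['0.pool.ntp.org', '1.pool.ntp.org'],
  $config_file = '/etc/ntp.conf',
) {
  package { 'ntp':
    ensure => installed,
  }

  file { $config_file:
    ensure  => file,
    content => template('mycompany_ntp/ntp.conf.erb'),
    require => Package['ntp'],
  }

  service { 'ntp':
    ensure    => running,
    enable    => true,
    subscribe => File[$config_file],
  }
}
```

Because nothing company- or node-specific is hardcoded, the same module works unchanged across teams; only the data differs.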

When separating data from code, you obviously need a place to store your data. The most obvious choice currently is Hiera, which is built into Puppet. Hiera is a key/value lookup tool that uses a configurable hierarchy and supports multiple backends.
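A minimal Hiera configuration might look like the following; the hierarchy levels and paths are illustrative, and the `role` fact is something you would have to provide yourself:

```yaml
# /etc/puppet/hiera.yaml -- example hierarchy, most specific first.
# Lookups fall through from per-node data to role data to common defaults.
---
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "nodes/%{::fqdn}"
  - "roles/%{::role}"
  - common
```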

Other options are ENCs (External Node Classifiers) like the Puppet Enterprise Dashboard or The Foreman.

Depending on the node classifier you choose, you can configure nodes using YAML, JSON or web interfaces.
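For example, with Hiera a node's entire classification can be a small YAML file (the file name, class names and values below are made up for illustration):

```yaml
# hieradata/nodes/web01.example.com.yaml -- illustrative node data.
# The 'classes' key can be picked up with hiera_include('classes')
# in site.pp, so assigning a class requires no Puppet code at all.
---
classes:
  - role::webserver
mycompany_ntp::servers:
  - ntp1.example.com
  - ntp2.example.com
```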

Separate upstream code from in-house code

There is nothing wrong with using good Puppet modules from the Puppet Forge. However, separating those modules from the modules you developed in-house has a few advantages:

  • you can use a Puppetfile and R10K to automatically deploy your upstream modules, as well as specify the exact versions you need.
  • it makes it obvious which modules are ‘static’ and should not be modified. (if modifications are needed, you simply fork the module and move it to your in-house code)
  • because of the generic nature of the modules, a separate location makes for an easily searchable ‘library’ of generic functionality.
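A Puppetfile for R10K could look like this (module names and version numbers are examples, not a recommendation):

```ruby
# Puppetfile -- upstream modules pinned to exact versions.
forge 'https://forgeapi.puppetlabs.com'

mod 'puppetlabs/stdlib', '4.5.1'
mod 'puppetlabs/apache', '1.2.0'

# A module we had to fork: tracked from our own Git server instead,
# so it clearly no longer counts as 'static' upstream code.
mod 'mysql',
  :git => 'git@git.example.com:puppet/mysql.git',
  :ref => '1.0.0'
```

Running `r10k puppetfile install` then deploys exactly these versions, keeping the upstream ‘library’ reproducible and clearly separated from in-house code.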

Do not make spaghetti

It’s easy to build a Puppet module that includes or inherits another. And another. Which includes another. And so on. As long as no ‘inheritance loop’ is created, Puppet will not complain.

But your developers will, because you are making spaghetti.

This is not to say that inheriting or including other modules is bad, because it isn’t. You just need to decide on certain rules.

Confused? Read on.

Classifying modules

It can be a very good idea to classify modules. For instance:

  1. generic level: usually upstream modules.
  2. company level: defining a company standard for certain functionality.
  3. project level: implementing the company level module but with certain project-specific settings.
  4. role level: creating a role which consists of a set combination of modules.

We could easily use the classification levels defined above to avoid inheritance ‘spaghetti’:

Any module can only implement or inherit modules that are less specific than the module itself.
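In Puppet code, a role-level module following this rule might look like the sketch below (all class names are hypothetical); note that it only reaches ‘down’ to project- and company-level modules, never sideways to another role:

```puppet
# Hypothetical role-level module. Per the rule above, it may only
# include modules from less specific levels (project, company, generic).
class role::buildserver {
  include company::base        # company level: baseline for every node
  include company::java        # company level: our standard Java setup
  include project::ci::jenkins # project level: project-specific settings
}
```

A node then gets exactly one role, which makes the include graph a shallow tree instead of spaghetti.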

What’s next?

In the next part of this series I will dive a little deeper into module classification, managing upstream modules, and setting up a local development environment.