Domain Expert vs Generalist

When should you use a blunt generalist tool, and when should you use a sharper domain-specific tool?

I posted a question on Serverfault recently, and received a relevant answer that wasn’t quite what I was looking for:

Systemd – How do I automatically reload a unit, when another oneshot service is fired by timer?

My reply to the answer thanked him for it, but mentioned that I think systemd is the right place to do this “sort of thing”. In reply to my reply, he told me that systemd is “absolutely the wrong place” to do this sort of thing, which is pretty strong language!

I think we’re approaching this from different perspectives here, so let’s break the problem down in general terms.

The Problem

I have two services. It doesn’t really matter what they are, but one of them is a daemon (running all the time), and the other is a background process. The daemon relies on the output of the background process. Let’s call the daemon “web” and the background process “updater”.

Web is the primary service, which needs to run all the time. Updater is a background maintenance task that runs periodically on a schedule, but it needs to signal (reload) the web service whenever it performs an action. This is a pretty common pattern in systems administration.

The Domain Expert Approach

Updater has a configuration file, and the ability to add hooks. So when an update happens, I can run a shell command to reload the web service.

On the face of it, the obvious thing is to use this functionality to reload the web service. This is basically the answer I was given; it’s simple, effective, and I probably should have preempted it. It is the domain-specific solution, in that it uses the configuration of the service itself to achieve what you want.

But in the real world, things aren’t often this simple. I’m looking for a more general solution, mostly as an intellectual curiosity, but also because there are often good reasons for keeping service relationships in one place.

A Generalist Solution

The service manager (systemd) is the source of truth when it comes to the state of your services, and one of its key jobs is to manage the relationships between them. It is designed to declare, in one set of configuration, “service X requires Y to be running”, with the implication that Y would be started before X, and X would be shut down before Y. Signalling a service on completion is a similar kind of relationship.

What should updater do if web isn’t running? Perhaps it shouldn’t be run? Or perhaps we should run updater, but not trigger the reload. The service manager has this information, but it’s impossible to encode this logic from the updater without an error-prone shell script.

When updater runs successfully (exits 0), I want to reload web. For the sake of argument, let’s say the systemd option PropagatesReloadTo reloads the target unit whenever the source service is started. (To be clear, it doesn’t: it only propagates a reload command that you send to the source unit.)

The solution would then be to run systemctl edit updater and add the line PropagatesReloadTo=web.service, and web would be reloaded whenever updater runs successfully.

NB: The actual working solution is ExecStartPost=systemctl reload web, but let’s pretend we have a native declarative solution here, which is what the question was about.
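
For reference, here is a rough sketch of what that real-world workaround could look like as a drop-in. The unit names updater.service and web.service are the placeholder names used in this post, and the /usr/bin/systemctl path is an assumption that varies by distribution. Note that the directive systemd provides is spelled ExecStartPost=.

```ini
# /etc/systemd/system/updater.service.d/override.conf
# (the file that "systemctl edit updater.service" creates)
# Unit names are the placeholders from this post; the systemctl path is an assumption.
[Service]
# For a Type=oneshot unit, this runs only after the ExecStart= command has exited successfully.
ExecStartPost=/usr/bin/systemctl reload web.service
```

One caveat with this sketch: a plain reload fails if web isn’t running, and that failure would mark the updater run as failed. If that matters, systemctl try-reload-or-restart web.service is worth a look, since it does nothing for units that aren’t running.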

Tradeoffs of my generalist approach

This hypothetical systemd approach is a blunt, generalist instrument from the perspective of the updater, because the service manager only knows the outcome of the job in binary terms: whether it succeeded or failed. There are more nuanced outcomes; for example the job may run, but perhaps it didn’t have any work to do, so no reload would be required. It’ll exit zero whether it did work or not, and the reload will always be signalled. So that’s one clear advantage to using the service’s own logic.

On maintainability Michael argues:

Systemd is absolutely the wrong place to do this sort of thing. You may need different deploy hooks for each DNS name, for instance, (e.g. getting certs for both nginx and postfix, each with different names) and that gets completely unmaintainable if you try to cram it into a systemd unit or override.

Perhaps we are talking at cross purposes with the ill-defined “sort of thing”, but I don’t agree.

I don’t need separate hooks for each DNS name. There is one certbot service, and one nginx service. Adding more DNS names doesn’t require more certbot services, and adding more service dependencies doesn’t either. Systemd overrides are declared as overlays in /etc/systemd/system/$name.service.d/override.conf, but you don’t even have to know this, you can just run systemctl edit.

It’s one file, with a single line for each service that’s signalled. One overlay for each source node in the relationship and one line in it for each target? That’s about as maintainable as it could be really.
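
To make the “one overlay per source, one line per target” idea concrete, here is a sketch for the example from the quote, using the ExecStartPost= directive that systemd actually provides. The unit names certbot.service, nginx.service and postfix.service, and the systemctl path, are assumptions; they differ between distributions.

```ini
# /etc/systemd/system/certbot.service.d/override.conf
# Assumed unit names and systemctl path; adjust for your distribution.
[Service]
# One line per service that should be signalled after a successful run.
ExecStartPost=/usr/bin/systemctl reload nginx.service
ExecStartPost=/usr/bin/systemctl reload postfix.service
```

Adding another DNS name changes nothing here; adding another dependent service adds one line.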

The Advantages

In my view, the generalist approach is more maintainable as the number of services grows, particularly if they have their own unique configuration paths and syntax. It’s also more consistent and easily discoverable.

When you consider more complex systems with lots of dependencies (microservices), doing systemctl list-dependencies web and seeing all the units it depends on at a glance, along with their status, is a powerful tool. I can also do this without any prior knowledge of the system, other than the Linux distribution. This last point is particularly important when it comes to systems that multiple people are going to interact with, and with systems that have in-house software that would not be encountered anywhere else. A standardised way to declare these relationships is extremely useful, and much easier to discover with common knowledge.

For a domain-expert solution, I first need to know that the updater exists; the web service logs are not going to tell me where the reload signal came from.

Conclusion

Circling back to the start, I think Michael and I were approaching the problem from different perspectives. I wanted a general solution that could be more broadly applied; he was indicating the best way to fix a specific problem within the domain of letsencrypt, when it was really just an example.

As for which approach to take? It depends, but I prefer common general approaches when they meet the requirement.

With lots of small services that interact, there’s a lot of logic in the relationships, and a common approach to integration is essential. If you’re writing configuration management code, stick with generalist tools and common integration points or you’ll end up with lots of tightly-coupled modules and circular dependencies.

Otherwise, if the logic on whether to signal depends on state within the domain, stick to the domain. If the logic depends on external state? Go generalist.

But with just a few simple services on one web server? Take your pick. There are no absolutely right or wrong answers here.
