Not all watchers are created equal (or how to make yak shaving useful)

14 Aug 2018

Clojure  Tools 

Originally posted at https://tech.labs.oliverwyman.com/blog/2018/08/14/not-all-watchers/

I’ve been hacking around with a ClojureScript project recently, and it resulted in a certain amount of yak shaving when I found the watcher system I was using was eating a lot of CPU. On the one hand, yak shaving is bad, because you’re doing other things that aren’t the core task you’d originally meant to do, but there are yaks and there are yaks. In my particular case, I try to make sure my yak shaving results in some improvements to the open source projects I’m using, and I’d like to encourage you to do the same (which technically makes this the 3rd in my series of “you should be contributing to open source” posts).

Some background first: ‘watchers’ are programs designed to ‘watch’ your source files and do something when they change, e.g. run linters/test suites, reload source files in a server, hot-reload into a web browser, etc. They’re ubiquitous in a great many settings, especially for web work, but the linter/test suite case is useful for pretty much everything. There are versions of them for most languages/frameworks out there, or they might even be built into your existing build tools, and I’d recommend their use for all development work these days.

There are two core ways to build one: polling or event-based. Polling works by repeatedly checking modified times on files. It is easier to implement and more portable, but can end up being more CPU/IO-intensive, especially given the fun trade-off between poll interval and response time: you want the interval short enough to spot changes quickly, but long enough to keep the overhead reasonable. Event-based options, on the other hand, work by asking the operating system to tell you about new changes. This is lightweight and fast, but needs OS-specific support. Most OSes have at least one option for this (inotify on Linux; ReadDirectoryChangesW on Windows; FSEvents on macOS, etc.), but their models of what you can and can’t watch vary, and the APIs have a tendency to get deprecated or replaced in later versions of an operating system.
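To make the polling side concrete, here’s a minimal sketch in Java. It is not lein-auto’s actual code: the class name `PollingWatcher` and its `poll` method are made up for illustration, and it assumes that comparing last-modified times is a good enough change signal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical polling watcher: compares modified times between snapshots.
class PollingWatcher {
    private final Path root;
    private Map<Path, FileTime> lastSeen;

    PollingWatcher(Path root) throws IOException {
        this.root = root;
        this.lastSeen = snapshot();
    }

    // Walk the whole tree and record each regular file's modified time.
    // This full walk is the cost that is paid on every single poll.
    private Map<Path, FileTime> snapshot() throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<Path, FileTime> times = new HashMap<>();
        for (Path file : files) {
            times.put(file, Files.getLastModifiedTime(file));
        }
        return times;
    }

    // One poll: returns files created, modified or deleted since the last poll.
    List<Path> poll() throws IOException {
        Map<Path, FileTime> now = snapshot();
        List<Path> changed = new ArrayList<>();
        for (Map.Entry<Path, FileTime> entry : now.entrySet()) {
            // A new file has no previous entry, so equals(null) is false.
            if (!entry.getValue().equals(lastSeen.get(entry.getKey()))) {
                changed.add(entry.getKey());
            }
        }
        for (Path old : lastSeen.keySet()) {
            if (!now.containsKey(old)) {
                changed.add(old);
            }
        }
        lastSeen = now;
        return changed;
    }
}
```

A caller would invoke `poll()` in a loop with a `Thread.sleep` of the chosen interval between calls. Every call walks the whole tree and stats every file, so the shorter the interval and the slower the file system, the more this costs.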

I was using lein-auto to make some updates to my project (along with Figwheel, which is another watcher in its own right), and this was fine until I Dockerised the project. Suddenly something was eating all my CPU even when nothing much was really happening. I eventually managed to narrow down the problem to lein-auto, which was rather surprising, until I read the source code and found out that it was using polling with a default 50ms interval. Apparently on a Mac, the Docker implementation and VM bits with a mounted volume mean that file system operations take a little bit longer than usual, and so this was hammering my system.

Ok, let’s rewrite it with an event-based implementation. Early work with this still appeared quite slow, and some searching later found me “Is Java 7 WatchService Slow for Anyone Else?” and a 6+ year old JDK bug basically just noting “polling is good enough for Macs, but we should look at this at some point for JDK 9” (for those keeping track, the latest is JDK 10). Has anyone done better versions of this? Well, there’s the BarbaryWatchService, which is an implementation of a very similar interface to the WatchService, but with proper FSEvents support. Sadly, this had bitrotted rather badly in the three years since its last commit, so I went through and fixed all of that, and hopefully the project will wake up at some point.

Of course, that wasn’t directly relevant since I was going to be running this process under Docker, and not directly running on a Mac at all, so I could just use the WatchService directly for my lein-auto changes. However, I’ve now (in the course of writing this) found a library called hawk that abstracts over both WatchService and BarbaryWatchService, although that has its own issues I might now have to have a look at as well.
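For comparison with polling, here’s a rough sketch of what using the JDK’s `WatchService` directly looks like. This is not the actual lein-auto change: the class name `EventWatcher` and its `awaitChanges` method are hypothetical, and it only watches a single directory, since a `WatchService` registration is not recursive.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

// Hypothetical event-based watcher for one directory (not recursive).
class EventWatcher implements AutoCloseable {
    private final Path dir;
    private final WatchService service;

    EventWatcher(Path dir) throws IOException {
        this.dir = dir;
        this.service = dir.getFileSystem().newWatchService();
        // Ask to be told about created, modified and deleted entries.
        dir.register(service, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
    }

    // Wait up to timeoutMillis for events; returns an empty list on timeout.
    List<Path> awaitChanges(long timeoutMillis) throws InterruptedException {
        List<Path> changed = new ArrayList<>();
        WatchKey key = service.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return changed;
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == OVERFLOW) {
                continue; // events were lost; a real tool would rescan here
            }
            // The event context is the entry's path relative to the directory.
            changed.add(dir.resolve((Path) event.context()));
        }
        key.reset(); // re-arm the key so later events are delivered
        return changed;
    }

    @Override
    public void close() throws IOException {
        service.close();
    }
}
```

The waiting thread sleeps inside `service.poll` until something is reported, rather than walking the tree itself. How quickly events arrive depends on how the JDK implements `WatchService` on the platform in question.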
