Status update, 17/04/2024

In which I meet QA testers, bang my head against the GNOME OS initial setup process, and travel overland from Scotland to Spain in 48 hours.

Linux QA meetup

Several companies and communities work on QA testing for Linux distros, and we mostly don’t talk to each other. GUADEC 2023 was a rare occasion where several of us were physically collocated for a short time, and a few folk proposed a “GNOME + openQA hackfest” to try and consolidate what we’re all working on.

Over time, we realized ongoing lines of communication are more useful than an expensive one-off meetup, and the idea turned into a recurring monthly call. This month we finally held the first call. In terms of connecting different teams it was a success – we had folk from Canonical/Ubuntu, Codethink, Debian, GNOME, Red Hat/Fedora and SUSE, and there are some additional people already interested in the next one. Everyone who attended this round is using openQA and we will use the openqa:opensuse.org chat to organise future events – but the call is not specific to openQA, nor to GNOME: anything Linux-related and QA-related is in scope.

If you want to be involved in the next one, make sure you’re in the openQA chat room, or follow this thread on GNOME Discourse. The schedule is documented here and the next call should be at 08:00 UTC on Thursday 2nd May.

GNOME OS tests

On the topic of QA, the testsuite for GNOME OS is feeling pretty unloved at the moment. Tests still don’t pass reliably and haven’t done for months. Besides the existing issue with initial setup where GNOME Shell doesn’t start, there is a new regression that breaks the systemd user session and causes missing sound devices. Investigating these issues is a slow and boring process which you can read about in great detail on the linked issues.

Fun fact: most of GNOME OS works fine without a systemd user session – there is still a D-Bus session bus after all; systemd user sessions are quite new and we still (mostly) support non-systemd setups.
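
As an aside, a quick way to see which of the two a session actually has, from a terminal inside the VM, is something like this (a rough sketch; both commands are standard systemd/D-Bus tools):

# Fails / prints an error when there is no systemd user instance:
systemctl --user is-system-running || echo "no systemd user session"

# ...but a D-Bus session bus can be present regardless:
echo "$DBUS_SESSION_BUS_ADDRESS"
busctl --user list | head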

One thing is clear: we still need a lot of work on tooling and docs around GNOME OS and the tests if we hope to get more people involved. I’m trying my best in the odd hours I have available, greatly helped by Valentin David and other folk in the #gnome-os channel, but it still feels like wading through treacle.

We could particularly do with documentation on how the early boot and initial setup process is intended to work – it’s very hard to visualize just from looking at systemd unit files. Or maybe systemd itself can generate a graph of what should be happening.
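
On that last point, systemd can already dump its dependency and ordering information, so something along these lines might be a starting point (a sketch to run inside the VM – graphical.target is just an example unit, and the second command needs Graphviz installed):

# Which units gate the startup of a given unit, and how long each took:
systemd-analyze critical-chain graphical.target

# Dump the dependency graph for units matching a pattern and render it:
systemd-analyze dot 'gnome-*' | dot -Tsvg > gnome-units.svg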

Magic in the ssam_openqa tool

Debugging OS boot failures isn’t my favourite thing. I just want reliable tests. Writing support tooling in Rust is fun though, and it feels like magic to be able to control and debug VMs from a simple CLI tool, and play with them over VNC while the test suite runs.

Using a VNC connection to run shell commands is annoying at times: it’s a terminal in a desktop in a VNC viewer, with plenty of rendering glitches, and no copy/paste integration with the host. I recently noticed that while openQA tests are running, a virtio terminal is exposed on the host as a pair of in/out FIFOs, and you can control this terminal using cat and echo. This feels like actual magic.
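
For the curious, the trick looks roughly like this (a sketch – the FIFO names and the pool directory are assumptions from memory and may well differ on your setup):

# While a test is running (or paused), from the worker's pool directory:
cd /var/lib/openqa/pool/1

# Stream whatever the guest writes to its virtio console...
cat virtio_console.out &

# ...and type commands into it.
echo 'uname -a' > virtio_console.in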

I added a new option to ssam_openqa, available whenever the test runner is paused, to open a terminal connection to this virtio console, and now I can debug directly from the terminal on my host. I learned a few things about line buffering and terminal control codes along the way. (I didn’t get ANSI control codes like cursor movement to work yet – not sure if my code is sending them wrong, or some TERM config is needed on the VM side – but with backspace, tab and enter working it’s already fairly usable.)

Here’s a quick demo of the feature:

Available in the 1.2.0-rc1 release. Happy debugging!

Cross country travel

Most of this month I was on holiday. If you’re a fan of overland travel, how about this: Aberdeen to Santiago in 48 hours; via night train to Crewe, camper van to Plymouth, ferry to Santander and then more driving across to Galicia. A fun trip, although I got pretty seasick on the boat until I’d had a good night’s sleep. (This wasn’t a particularly low-carbon trip though despite going overland, as the train, ferry and van are all powered by big diesel engines.)

And now back to regular work – “Moving files from one machine to another using Python and YAML”, etc.

Status update, 19/03/2024 – GNOME OS and openQA

Looking back at this month, it’s been very busy indeed. In fact, so busy that I’m going to split this status update into two parts.

Possibly this is the month where I worked the whole time yet did almost no actual software development. I guess it’s part of getting old that you spend more time organising people and sharing knowledge than actually making new things yourself. But I did still manage one hack I’m very proud of, which I go into below.

GNOME OS & openQA testing

This month we wrapped up the Outreachy internship on the GNOME OS end-to-end tests. I published a writeup here on the Codethink blog. You can also read first hand from Dorothy and Tanju. We had a nice end-of-internship party on meet.gnome.org last week, and we are trying to arrange US visas & travel sponsorship so we can all meet up at GUADEC in Denver this summer.

5 people in a video call

There are still a few loose ends from the internship. Firstly, if you need people for Linux QA testing work, or anything else, please contact Dorothy and Tanju who are both now looking for jobs, remote or otherwise.

Secondly, the GNOME OS openQA tests currently fail about 50% of the time, so they’re not very useful. This is due to a race condition that causes the initial setup process to fail so the machine doesn’t finish booting.

It’s the sort of problem that takes days rather than hours to diagnose, so I haven’t made much of a dent in it so far. The good news is, as announced on Friday, there is now a team from Codethink working full time on GNOME OS, funded partly by the STF grant and partly by Codethink, and this issue is high on the list of priorities to fix.

I’m not directly involved in this work, as I am tied up on a multi-year client project that has nothing to do with open source desktops, but of course I am helping where I can, and hopefully the end-to-end tests will be back on top form soon.

ssam_openqa

If you’ve looked at the GNOME openQA tests you’ll see that I wrote a small CLI tool to drive the openQA test runner and egotistically named it after myself. (The name also makes it clear that it’s not something built or supported by the main openQA project).

I used Rust to write ssam_openqa so it’s already pretty reliable, but until now it lacked proper integration tests. The blocker was this: openQA is for testing whole operating systems, which are large and slow to deploy. We need to use the real openQA test runner in the ssam_openqa tests, otherwise the tests don’t really prove that things are working, but how can we get a suitably minimal openQA scenario?

During some downtime on my last trip to Manchester I got all the pieces in place. First, a Buildroot project to build a minimal Linux ISO image, with just four components:

The output is 6MB – small enough to commit straight to Git. (By the way, using the default GNU libc the image size increases to 15MB!)

The next step was to make a minimal test suite. This is harder than it sounds because there isn’t documentation on writing openQA test suites “from the ground up”. By copying liberally from the openSUSE tests I got it down to the following pieces:

That’s it – a fully working test that can boot Linux and run commands.

The whole point is that we don’t need the openQA web UI in this workflow, so it’s much more comfortable for commandline-driven development. You can of course run the same testsuite with the web UI when you want.

The final piece for ssam_openqa is some integration tests that trigger these tests and assert that they run and that we get suitable messages from the frontend; for now this is implemented in tests/real.rs.

I’m happy that ssam_openqa is less likely to break now when I hack on it, but I’m actually more excited about having figured out how to make a super minimal openQA test.

The openQA test runner, isotovideo, does something similar in its test suite using TinyCore Linux. I didn’t reuse this; firstly because it uses the Linux framebuffer console instead of the virtio terminal, which makes it impossible to read the output of the commands in tests; and secondly because it’s got a lot of pieces that I’m not interested in. You can see it here anyway.

Have you used ssam_openqa yet? Let me know if you have! It’s fun to be able to interact with and debug an entire GNOME OS VM using this tool, and I have a few more feature ideas to make this even more fun in future.

Status update, 16/02/2024

Some months you work very hard and there is little to show for any of it… so far this is one of those very draining months. I’m looking forward to spring, and spending less time online and at work.

Rather than ranting I want to share a couple of things from elsewhere in the software world to show what we should be aiming towards in the world of GNOME and free operating systems.

Firstly, from “Is something bugging you?”:

Before we even started writing the database, we first wrote a fully-deterministic event-based network simulation that our database could plug into. … if one particular simulation run found a bug in our application logic, we could run it over and over again with the same random seed, and the exact same series of events would happen in the exact same order. That meant that even for the weirdest and rarest bugs, we got infinity “tries” at figuring it out, and could add logging, or do whatever else we needed to do to track it down.

The text is hyperbolic, and for some reason they think it’s ok to work with Palantir, but it’s an inspiring read.

Secondly, a 15 minute video which you should watch in its entirety, and then consider how the game development world got so far ahead of everyone else. Here’s a link to the video.

This is the world that’s possible for operating systems, if we focus on developer experience and integration testing across the whole OS. And GNOME OS + openQA is where I see some of the most promising work happening right now.

Of course we’re a looooong way from this promised land, despite the progress we’ve made in the last 10+ years. Automated testing of the OS is great, but we don’t always have logs (bug), the image randomly fails to start (bug), the image takes hours to build, we can’t test merge requests (bug), testing takes 15+ minutes to run, etc. Some of these issues seem intractable when occasional volunteer effort is all we have.

Imagine a world where you can develop and live-deploy changes to your phone and your laptop OS, exhaustively test them in CI, step backwards with a debugger when problems arise – this is what we should be building in the tech industry. A few teams are chipping away at this vision – in the Linux world, GNOME Builder and the Fedora Atomic project spring to mind and I’m sure there are more.

Anyway, what happened last month?

Outreachy

This is the final month of the Outreachy internship that I’m running around end-to-end testing of GNOME. We have already had some wins: there are now 5 separate testsuites running against GNOME OS, although unfortunately they are rather useless at present due to random startup failures.

I spent a *lot* of time working with Tanju on a way to usefully test GNOME on Mobile. I haven’t been able to follow this effort closely beyond seeing a few demos and old blogs. This week was something of a crash course on what there is. Along the way I got pretty confused about scaling in GNOME Shell – it turns out there’s currently a hardcoded minimum screen size, and upstream Mutter will refuse to scale the display below a certain size. In fact upstream GNOME Shell doesn’t have any of the necessary adaptations for use in a mobile form factor. We really need a “GNOME OS Mobile” VM image – here’s an open issue – it’s unlikely to be done within the last 2 weeks of the current internship, though. The best we can do for now is test the apps on a regular desktop screen, but with the window resized to 360×720 pixels.

On the positive side, hopefully this has been a useful journey for Tanju and Dorothy into the inner workings of GNOME. On a separate note, we submitted a workshop on openQA testing to GUADEC in Denver, and if all goes well with travel sponsorship and US visa applications, we hope to actually meet in person there in July.

FOSDEM

I went to FOSDEM 2024 and had a great time – it was one of my favourite FOSDEM trips. I managed to avoid the ’flu – I think wearing a mask on the plane is the secret. From Codethink we were 41 people this year – probably a new record.

I went a day early to join in the end of the GTK hackfest, and did a little work on the Tiny SPARQL database, formerly known as Tracker SPARQL. Together with Carlos we fixed breakage in the CLI, improved HTTP support and prototyped a potential internship project to add a web based query editor.

My main goal at FOSDEM was to make contact with other openQA users and developers, and we had some success there. Since then I’ve hashed out a wishlist for openQA for GNOME’s use cases, and we’re aiming to set up an open, monthly call where different QA teams can get together and collaborate on a realistic roadmap.

I saw some great talks too; the “Outreachy 1000 Interns” talk and the Fairphone “Sustainable and Longlasting Phones” talk were particular highlights. I went to the Bytenight event for the first time and found an incredible 1970s Wurlitzer transistor organ in the “smoking area” of the HSBXL hackspace, and beat Javi, Bart and Adam at SuperTuxKart several times.

Status update, 16/01/2024

Happy new year everyone! For better or worse we’re now well into the 2020s.

Here are some things I’ve been doing recently.

A post on the GNOME openQA tests: Looking back to 2023 and forwards to 2024. I won’t repeat the post here, go read it and see what I hope to see in 2024 with regards to QA testing.

I’m still mentoring Dorothy and Tanjuate to work on the tests and we’re about at the halfway point of the Outreachy internship. Our initial goal was to test GNOME’s accessibility features and this work is nearly done; hopefully we’ll be showing it off this week. Here’s a sneak preview of the new gnome_accessibility testsuite:

Screenshot of openqa.gnome.org showing tests: a11y_seeing, a11y_hearing, a11y_text_to_speech, a11y_zoom, a11y_screen_keyboard

These features are sometimes forgotten (at least by me) and I hope this testsuite can serve as a good demo of what accessibility features GNOME has, in addition to guarding against regressions.


It’s hard work to remotely mentor 2 interns on top of 35hrs/wk customer work and various internal work duties, so it’s great that Codethink are covering my time as a mentor, and it also helps that it’s winter and the outside world is hidden behind clouds most of the time. If only it was a face-to-face internship and I could hang out in Uganda or Cameroon for a while. Tanju and Dorothy are great to work with, if you’re looking to hire QA engineers then I can give a good reference for both.

I expect my open source side project for 2024 will still be the GNOME openQA tests, it ties in with a lot of activity around GNOME OS and it feels like we are pushing through useful workflow improvements for all of GNOME. Not that I’ve lost interest in other important stuff like desktop search, music recommendations and playing OpenTTD, but there is only so much you can do with a couple of hours a week.

What else? As usual I’ve been reviewing various speedups and SPARQL improvements from Carlos Garnacho in the Tracker search engine. I’m making plans to go to FOSDEM, so hopefully will see you there (especially if you want to talk about openQA). And I listed out my top 5 albums of 2023 which you should go listen to before doing anything else.

Status update, 18/12/23

The Outreachy internship for end-to-end testing of GNOME OS started this month, and we are already in week 3 of the internship. Our two interns Dorothy and Tanjuate are working hard on the first phase, which is testing accessibility-related features. Here’s what we’ve achieved so far:

  • Add a gnome_accessibility testsuite so all a11y related tests appear in one place
  • Extend the app_settings test in gnome_apps testsuite so it switches on dark mode
    • This caught a crash in libjxl via GNOME Settings
  • Create tests for all “Seeing” a11y features in GNOME Settings
    • These enable the settings one by one and test the Settings app looks as expected with the new settings
  • Create test for screen reader
    • The test works but the speech output is garbled
    • We are adding some finer grained tests to ensure audio and Speech Dispatcher work separately to Orca
  • Create test for “Overamplification” option in “Hearing” settings
Screenshot of openQA including `a11y_screenreader` test that captures audio
The work-in-progress gnome_accessibility testsuite

Outreachy recommend blog post topics for the interns, and this week’s is “Everyone Struggles.” There are still a whole load of technical struggles when I work with the openQA tests, which test my own patience. Here are some of them:

  • On some CI runners, tests never start. It seems to be network related – the openQA worker log says something about missing WORKER_HOSTNAME. The next step will be to add some debugging to .gitlab-ci.yml to try and diagnose what’s different on the two workers that don’t work (see the sketch just after this list).
  • GNOME Shell doesn’t always start when gnome-initial-setup should run. There’s an error about read-only GVFS metadata. This doesn’t reproduce very often, so I can’t easily investigate. Valentin came up with a possible solution, but it seems it didn’t solve the issue and we need to keep trying.
  • The CI pipeline that runs the tests takes about 15 minutes, and almost 5 minutes are just downloading large OS artifacts from S3. The artifacts are already compressed (in fact another minute is spent running unxz). I don’t know how we can improve this.
  • The end-to-end tests can’t run on gnome-build-meta merge requests, only on the master branch, due to an infrastructure issue that is stuck.
  • Creating needles is too difficult. I documented a manual approach, which is useful for learning but too fiddly for ongoing use. The openQA web UI has a reasonable editor, but the interns don’t have GNOME LDAP accounts so they can’t access it, and the “developer mode” can’t be used anyway. We’re currently trying to run a demo instance of the webUI locally using Podman; there’s more work to do before this is a fully usable workflow for test development, though.
  • GNOME OS is migrating from OSTree to systemd-sysupdate, we need to switch to using the new images in the tests, but we need to update os.gnome.org to make the new images downloadable first.
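
For the first item, the kind of debugging I have in mind is just a few extra commands in the job’s script section, along these lines (a sketch – the exact variable and config file names on the openQA workers are assumptions):

echo "WORKER_HOSTNAME=${WORKER_HOSTNAME:-unset}"
hostname --fqdn || true
ip addr show || true
cat /etc/openqa/workers.ini || true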

I am happy that we’re working through these issues; that was partly why I was excited to run an internship around the end-to-end tests. Hopefully this gives you some idea of how hard we are working on QA for GNOME!

Status update, 16/05/2023

I am volunteering a lot of time to work on testing at the moment. When you start out as a developer this seems like the most boring kind of open source contribution that you can do. Once you become responsible for maintaining existing codebases though it becomes very interesting as a way to automate some of the work involved in reviewing merge requests and responding to issue reports.

Most of the effort has been on the GNOME OpenQA tests. We started with a single gnomeos test suite, which started from the installer ISO and ran every test available, taking around 8 minutes to complete. We now have two test suites: gnome_install and gnome_apps.

The gnome_install testsuite now only runs the install process, and takes about 3 minutes. The gnome_apps testsuite starts from a disk image, so while it still needs to run through gnome-initial-setup before starting any apps, we save a couple of minutes of execution. And now the door is open to expand the set of OpenQA tests much more, because during development we can choose to run only one of the testsuites and keep the developer cycle time to a minimum.

Big thanks to Jordan Petridis for helping me to land this change (I didn’t exactly get it right the first time), and to the rest of the crew in the #gnome-os chat.

Partial screenshot of the gnome_apps OpenQA testsuite

I don’t plan on adding many more testsuites myself. The next step is to teach the hardworking team of GNOME module maintainers how to extend the openQA test suites with tests that are useful to them, without — and this is very important — just causing frustration when we make big theming or design changes. (See a report from last month when Adwaita header bars changed colour).

Hopefully I can spread the word effectively at this year’s GUADEC in Riga, Latvia. I will be speaking there on Thursday 27th July. The talk is scheduled at the same time as a very interesting GTK status update so I suppose the talk itself will be for a few dedicated cats. But there should be plenty of time to repeat the material in the bar afterwards to anyone who will listen.

My next steps will be around adding an OpenQA test suite for desktop search – something we’ve never properly integration tested, which works as well as it does only because of hard working maintainers and helpful downstream bug reports. I have started collecting some example desktop content which we can load and index during the search tests. I’m on the lookout for more; in fact I recently read about the LibreOffice “bug documents” collection and am currently fetching that to see if we can reuse some of it.

One more cool thing before you go – we now have a commandline tool for checking openQA test status and generating useful bug reports. And we have a new project-wide label, “9. Integration test failure”, to track GNOME bugs that are detected by the OpenQA integration tests.

〉utils/pipeline_report.py --earlier=1                                   05/16/2023 05:24:30 pm
Latest gnome/gnome-build-meta pipeline on default branch is 528262. Pipeline status: success
Pipeline 1 steps earlier than 528262 is 528172. Pipeline status: success

Project:

  * Repo: gnome/gnome-build-meta
  * Commit: dc71a6791591616edf3da6c757d736df2651e0dc
  * Commit date: 2023-05-16T13:20:48.000+02:00
  * Commit title: openqa: Fix up casedir

Integration tests status (Gitlab):

  * Pipeline: https://gitlab.gnome.org/gnome/gnome-build-meta/-/pipelines/528172
  * test-s3-image job: https://gitlab.gnome.org/gnome/gnome-build-meta/-/jobs/2815294
  * test-s3-image job status: failed
  * test-s3-image job finished at: 2023-05-16T14:46:54.519Z

Integration tests status (OpenQA):

  * gnome_apps testsuite - job URL: https://openqa.gnome.org/tests/1012
    * 3/31 tests passed
    * Failed: gnome_desktop
  * gnome_install testsuite - job URL: https://openqa.gnome.org/tests/1013
    * 4/6 tests passed
    * Failed: gnome_desktop

The format can be improved here, but it already makes it quicker for me to turn a test failure into a bug report against whatever component probably caused the failure. If you introduce a critical bug in your module maybe you’ll get to see one soon 😉

Trying out systemd’s Portable Services

I recently acquired a monome grid, a set of futuristic flashing buttons which can be used for controlling software, making music, and/or playing the popular ’90s game Lights Out.

Lights Out game

There’s no sound from the device itself; all it outputs is a USB serial connection. Software instruments connect to the grid to receive button presses and control the lights via the widely-supported Open Sound Control protocol. I am using monome-rs to convert the grid signals into MIDI, send them to Bitwig Studio and make interesting noises, which I am excited to share with you in the future, but first we need to talk about software packaging.

Monome provide a system service named serialosc, which connects to the grid hardware (over USB-serial) and provides the Open Sound Control endpoint. This program is not packaged by Linux distributions and that is fine; it’s rather niche hardware and distro maintainers shouldn’t have to support every last weird device. On the other hand, it’s rather crude to build it from source myself, install it into /usr/local, add a system service, etc. etc. Is there a better way?

First I tried bundling serialosc along with my control app using Flatpak, which is semi-possible – you can build and run the service, but it can’t see the device unless you set the super-insecure “--device=all” mode, and it still can’t detect the device because udev is not available, so you would have to hardcode the device name /dev/ttyACM0 and hotplug no longer works … basically this is not a good option.

Then I read about systemd’s Portable Services. This is a way to ship system services in containers, which sounds like the correct way to treat something like serialosc. So I followed the portable walkthrough and within a couple of hours the job was done: here’s a PR that could add Portable Services packaging upstream: https://github.com/monome/serialosc/pull/62

I really like the concept here, it has some pretty clear advantages as a way to distribute system services:

  • Upstreams can deliver Linux binaries of services in a (relatively) distro-independent way
  • It works on immutable-/usr systems like Fedora Silverblue and Endless OS
  • It encourages containerized builds, which can lower the barrier for developing local improvements.

This is quite a new technology and I have mixed feelings about the current implementation. Firstly, when I went to screenshot the service for this blog post, I discovered it had broken:

screenshot of terminal showing "Failed at step EXEC - permission denied"

Fixing the error was a matter of – disabling SELinux. On my Fedora 35 machine the SELinux profile seems totally unready for Portable Services, as evidenced in this bug I reported, and this similar bug someone else reported a year ago which was closed without fixing. I accept that you get a great OS like Fedora for free in return for being a beta tester of SELinux, but this suggests portable services are not yet ready for widespread use.

That’s better:

I used the recommended mkosi build to create the serialosc container, which worked as documented and was pretty quick. All mkosi operations have to run as root. This is unfortunate and also interacts badly with the Git safe.directory feature (the source trees are owned by me, not root, so Git’s default config raises an error).

It’s a little surprising the standard OCI container format isn’t supported, only disk images or filesystem trees. Initially I built a 27MB squashfs, but this didn’t work as the container rootfs has to be read/write (another surprise) so I deployed a tree of files in the end. The service container is Fedora-based and comes out at 76MB across 5,200 files – that’s significant bloat around a service implemented in a few thousand lines of C code. If mkosi supported Alpine Linux as base OS we could likely reduce this overhead significantly.

The build/test workflow could be optimised but is already workable; the following steps take about 30 seconds with a warm mkosi.cache directory:

  • sudo mkosi -t directory -o /opt/serialosc --force
  • sudo portablectl reattach --enable --now /opt/serialosc --profile trusted

We are using the “trusted” profile because the service needs access to udev and /dev, by the way.
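
Once attached, the service behaves like any other systemd unit, so checking on it looks something like this (a sketch – the serialosc.service unit name is my assumption based on the upstream packaging):

portablectl list                      # shows attached portable images
systemctl status serialosc.service    # assumed unit name
journalctl -u serialosc.service -f    # follow its log output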

All in all, the core pieces are already in place for a very promising new technology that should make it easier for 3rd parties to provide Linux system-level software in a safe and convenient way – well done to the systemd team for a well-executed concept. All it lacks is some polish around the tooling and integration.

Automated point and click

I have been watching GNOME’s testing story get better and better for a long time. The progress we made since we first started discussing the GNOME OS initiative is really impressive, even when you realize that GUADEC in A Coruña took place nine years ago. We got nightly OS images, Gitlab CI and the gnome-build-meta BuildStream project, not to mention Flatpak and Flathub.

Now we have another step forwards with the introduction of OpenQA testing for the GNOME OS images. Take a look at the announcement on GNOME Discourse to find out more about it.

Automated testing is quite tedious and time consuming to set up, and there is significant effort behind this – from chasing regressions that broke the installer build and debugging VM boot failures, to creating a set of simple, reliable tests and integrating OpenQA with GitLab CI. A big thanks to Codethink for sponsoring the time we are spending on setting this up. It is part of a wider story aiming to facilitate better cooperation upstream between companies and open source projects, which I wrote about in this Codethink article “Higher quality of FOSS”.

It’s going to take time to figure out how to get the most out of OpenQA, but I’m sure it’s going to bring GNOME some big benefits.

How Tracker is tested in 2019

I became interested in the Tracker project in 2011. I was looking at media file scanning and was happy to discover an active project that was focused on the same thing. I wanted to contribute, but I found it very hard to test my changes; and since Tracker runs as a daemon I really didn’t want to introduce any crazy regressions.

In those days Tracker already had a set of tests written in Python that tested the Tracker daemons as a whole, but they were a bit unfinished and unreliable. I focused some spare-time effort on improving those. Surprisingly enough it’s taken eight years to get to the point where I’m happy with how they work.

The two biggest improvements parallel changes in many other GNOME projects. Last year Tracker stopped using GNU Autotools in favour of Meson, after a long incubation period. I probably don’t need to go into detail of how much better this is for developers. Also, we set up GitLab CI to automatically run the test suite, where previously developers and maintainers were required to run the test suite manually before merging anything. Together, these changes have made it about 100000% easier to review patches for Tracker, so if you were considering contributing code to the project I can safely say that there has never been a better time!

The Tracker project is now divided into two parts, the ‘core’ (tracker.git) and the ‘miners’ (tracker-miners.git). The core project contains the database and the application interface libraries, while the miners project contains the daemons that scan your filesystem and extract metadata from your interesting files.

Let’s look at what happens automatically when you submit a merge request on GNOME GitLab for the tracker-miners project:

  1. The .gitlab-ci.yml file specifies a Docker image to be used for running tests. The Docker images are built automatically from this project and are based on Fedora.
  2. The script in .gitlab-ci.yml clones the ‘master’ version of Tracker core.
  3. The tracker and tracker-miners projects are configured and built, using Meson. There is a special build option in tracker-miners that makes it include Tracker core as a Meson subproject, instead of building against the system-provided version. (It still depends on a few files from the host at the time of writing).
  4. The script starts a private D-Bus session using dbus-run-session, sets a fixed en_US.UTF8 locale, and runs the test suite for tracker-miners using meson test (roughly as sketched just after this list).
  5. Meson runs the tests that are defined in meson.build files. It tries to run them in parallel with one test per CPU core.
  6. The libtracker-miners-common tests exercise some utility code, which is duplicated from libtracker-common in Tracker core.
  7. The libtracker-extract tests exercise libtracker-extract, which is a private library with helper code for accessing file metadata. It mainly focuses on standard metadata formats like XMP and EXIF.
  8. The functional-300-miner-basic-ops and functional-301-resource-removal tests check the operation of the tracker-miner-fs daemon, mostly by copying files in and out of a specific path and then waiting for the corresponding changes to the Tracker database to take effect.
  9. The functional-310-fts-basic test tries some full-text search operations on a text file. There are a couple of other FTS tests too.
  10. The functional/extract/* tests effectively run tracker extract on a set of real media files, and test that the expected metadata is extracted. The tests are defined by JSON files such as this one.
  11. The functional-500-writeback tests exercise the tracker-writeback daemon (which allows updating things like MP3 tags following changes in the Tracker database). These tests are not particularly thorough. The writeback feature of Tracker is not widely used, to my knowledge.
  12. Finally, the functional-600-* tests simulate the behaviour of some MeeGo phone applications. Yes, that’s how old this code is 🙂
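
Steps 3 and 4 boil down to something like this when run by hand (a sketch – the tracker_core option name is from memory and may not match the real build options):

meson build -Dtracker_core=subproject     # assumed option name
ninja -C build
env LANG=en_US.UTF-8 dbus-run-session -- meson test -C build --print-errorlogs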

There is plenty of room for more testing of course, but this list is very comprehensive when compared to the total lack of automated testing that the project had just a year ago!

How BuildStream uses OSTree

Note: In version 1.2, BuildStream stopped using OSTree to cache artifacts. It now uses a generic “Content Addressable Storage” system, implemented internally but designed to be compatible with Bazel and any other tool which supports the Remote Execution API. I’ve updated this article accordingly.

I’ve been asked a few times about the relationship between BuildStream and OSTree. The answer is a bit complicated so I decided to answer the question here.

OSTree is a content-addressed object store, inspired in many ways by Git but optimized for storing trees of binary files rather than trees of text files.

BuildStream is an integration tool which deals with trees of binary files.

I’m deliberately using the abstract term “trees of binary files” here because neither BuildStream nor OSTree limits itself to a particular use case. BuildStream itself uses the term “artifact” to describe the output of a build job and in practice this could be the set of development headers and documentation for a library, a package file such as a .deb or .rpm, a filesystem for a whole operating system, a bootable VM disk image, or whatever else.

Anyway let’s get to the point! There are actually two ways that BuildStream uses OSTree.

The `ostree` source plugin

The `ostree` source plugin allows pulling arbitrary data from a remote OSTree repository. It is normally used with an `import` element as a way of importing prebuilt binaries into a build pipeline. For example BuildStream’s integration tests currently run on top of the Freedesktop SDK binaries (which were originally intended for use with Flatpak applications but are equally useful as a generic platform runtime). The gnome-build-meta project uses this mechanism to import a prebuilt Debian base image, which is currently manually pushed to an OSTree repo (this is a temporary measure, in future we want to base gnome-build-meta on top of the upcoming Freedesktop SDK 1.8 instead).

It’s also possible to import binaries using the `tar` and `local` source types of course, and you can even use the `git` or `bzr` plugins for this if you really get off on using the wrong tools for the wrong job.

In future we will likely add other source plugins for importing binaries, for example from the Docker Registry and perhaps using casync.

Storing artifacts locally

Once a build has completed, BuildStream needs to store the results somewhere locally. The results go in the exciting-sounding “local artifact cache”, which is usually located inside your home directory at ​~/.cache/buildstream/artifacts.

BuildStream 1.0 used OSTree to store artifacts. BuildStream 1.2 and later use a generic Content Addressed Storage implementation.

Storing artifacts remotely

As a way of saving everyone from building the same things, BuildStream supports downloading prebuilt artifacts from a remote cache.

BuildStream 1.0 used OSTree for remote storage. BuildStream 1.2 and later use the same CAS service that is used for local storage.

Pushing and pulling artifacts

BuildStream 1.2 and later use the CAS protocol from the Remote Execution API to transfer artifacts. This protocol is implemented using GRPC.

Indirect uses of OSTree

It may be that you also end up deploying stuff into an OSTree repository somewhere. BuildStream itself is only interested in building and integrating your project — once that is done you run `bst checkout` and are rewarded with a tree of files on your local machine. What if, let’s say, your project aims to build a Flatpak application?

Flatpak actually uses OSTree as well and so your deployment step may involve committing those files into yet another OSTree repo ready for Flatpak to run them. (This can be a bit long-winded at present so there will likely be some better integration appearing here at some point).

So, is anywhere safe from the rise of OSTree or is it going to take over completely? Something you might not know about me is that I grew up outside a town in north Shropshire called Oswestry. Is that a coincidence? I can’t say.

Oswestry
Oswestry, from Wikipedia.

Using BuildStream through Docker

BuildStream isn’t packaged in any distributions yet, and it’s not entirely trivial to install it yourself from source. BuildStream itself is just Python, but it depends on a modern version of OSTree (2017.8 or newer at time of writing), with the GObject introspection bindings, which is a little annoying to have to build yourself [1].

So we have put some work into making it convenient to run BuildStream inside a Docker container. We publish an image to the Docker hub with the necessary dependencies, and we provide a helper script named bst-here that sets up a container with the current working directory mounted at /src and then runs a BuildStream command or an interactive shell inside it. Just download the script, read it through and run it: all going well you’ll be rewarded with an interactive Bash session where you can run the bst command. This allows users on any distro that supports Docker to run BuildStream builds in a pretty transparent way and without any major performance limitations. It even works on Mac OS X!
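
Usage ends up looking roughly like this (a sketch – the exact invocation is described in the script itself):

# Assuming you have downloaded bst-here into your project directory:
chmod +x bst-here

./bst-here                   # interactive Bash shell with 'bst' available
./bst-here bst --version     # or run a single BuildStream command directly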

In order to run builds inside a sandbox, BuildStream uses Bubblewrap. This requires certain kernel features, in particular CONFIG_USER_NS which right now is not enabled by default in Arch and possibly in other distros. Docker containers run against the kernel of the host OS so it doesn’t help with this issue.

The Docker images we provide are based off Fedora and are built by GitLab CI from this repo. After a commit to that repo’s ‘master’ branch, a new image wends its way across hyperspace from GitLab to the Docker hub. These images are then pulled when the bst-here wrapper script calls docker run. (We also use these images for running the BuildStream CI pipelines).

More importantly, we have a mascot now! Let me introduce the BuildStream beaver:

Beavers, of course, are known for building things in streams, and are also native to Canada. This isn’t going to be the final logo, he’s just been brought in as we got tired of the project being represented by a capital letter B in a grey circle. If anyone can contribute a better one then please get in touch!

So what can you build with BuildStream now that you have it running in Docker? As recently announced, you can build GNOME! Follow this modified version of the newcomer’s guide to get started. Soon you will also be able to build Flatpak runtimes using the rewritten Freedesktop SDK; or build VM images using Baserock; and of course you can create pipelines for your own projects (although if you only have a few dependencies, using Meson subprojects might be quicker).

After one year of development, we are just a few blockers away from releasing BuildStream 1.0. So it is a great time to get involved in the project!

[1]. Installing modern OSTree from source is not impossible — my advice if you want to avoid Docker and your distro doesn’t provide a new enough OSTree would be to build the latest tagged release of OSTree from Git, and configure it to install into /opt/ostree. Then put something like export GI_TYPELIB_PATH=/opt/ostree/lib/girepository-1.0/ in your shell’s startup file. Make sure you have all the necessary build dependencies installed first.

BuildStream and host tools

It’s been a while since I had to build a whole operating system from source. I’ve mostly been working on compilers so far this year at Codethink in fact, but my new project is to bring up some odd target systems that aren’t supported by any mainstream distros.

We did something similar about 4 years ago using Baserock and it worked well; this time we are using the Baserock OS definitions again but with BuildStream as a build tool. I’ve not had any chance to get involved in BuildStream up until now (beyond observing it) so this will be good.

The first thing I’m getting my head around is the “no host tools” policy. The design of BuildStream is that every build is run in a sandbox that’s isolated from the host. Older Baserock tools took a similar approach too and it makes a lot of sense: it’s a lot easier to maintain build instructions if you limit the set of environments in which they can run, and you are much more likely to be able to reproduce them later or on other people’s machines.

However your sandbox is going to need a compiler and a shell environment in there if it’s going to be able to build anything, and BuildStream leaves open the question of where those come from. It’s simple to find a prebuilt toolchain at least for mainstream architectures — pretty much every Linux distro can provide one so the only question is which one to use and how to get it into BuildStream’s sandbox?

GNOME and Freedesktop base runtime and SDK

The Flatpak project has a similar need for a controlled runtime and build environment, and is producing a GNOME SDK, and a lower level Freedesktop SDK. These are at present built on top of Yocto.

Up to date versions of these are made available in an OSTree repo at http://sdk.gnome.org/repo. This makes it easy to import them into BuildStream using an ‘import’ element and the ‘ostree’ source:

kind: import
description: Import the base freedesktop SDK
config:
  source: files
  target: usr
host-arches:
  x86_64:
    sources:
      - kind: ostree
        url: gnomesdk:repo/
        track: runtime/org.freedesktop.BaseSdk/x86_64/1.4
        gpg-key: keys/gnome-sdk.gpg
        ref: 0d9d255d56b08aeaaffb1c820eef85266eb730cb5667e50681185ccf5cd7c882
  i386:
    sources:
      - kind: ostree
        url: gnomesdk:repo/
        track: runtime/org.freedesktop.BaseSdk/i386/1.4
        gpg-key: keys/gnome-sdk.gpg
        ref: 16036b747c1ec8e7fe291f5b1f667cb942f0267d08fcad962e9b7627d6cf1981

The main downside to using these is that they are pretty large — the GNOME 3.18 SDK weighs in at 1.5 GB uncompressed and around 63,000 files. Creating a hardlink tree using `ostree checkout` takes up to a minute on my (admittedly rather old) laptop. The Freedesktop SDK is smaller but still not ideal. They are also only built for a small set of architectures — I think just some x86 and ARM families at the moment.
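
For reference, pulling and checking out one of those runtimes by hand looks roughly like this (a sketch – it assumes a fresh local repo and skips GPG verification for brevity):

ostree --repo=sdk-repo init --mode=archive-z2
ostree --repo=sdk-repo remote add --no-gpg-verify gnomesdk http://sdk.gnome.org/repo
ostree --repo=sdk-repo pull gnomesdk runtime/org.freedesktop.BaseSdk/x86_64/1.4
ostree --repo=sdk-repo checkout runtime/org.freedesktop.BaseSdk/x86_64/1.4 sdk-sysroot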

Debian in OSTree

As part of building GNOME’s jhbuild modulesets inside BuildStream, Tristan created a script to produce Debian chroots for various architectures and commit them to an OSTree repo. The GNOME components are then built on top of these base Debian images, with the idea that in future they can be tested on top of a whole variety of distros in addition to Debian, so that we catch platform-specific regressions more quickly.

The script, which uses the awesome Multistrap tool to do most of the heavy lifting, lives here and pushes its results to a repo that is temporarily housed at https://gnome7.codethink.co.uk/repo/ and signed with this key.

The resulting sysroot is 2.7 GB in size with 105,320 different files. This again takes up to a minute to check out on my laptop. Like the GNOME SDK, this sysroot contains every external dependency of GNOME, which adds up to a lot of stuff.

Alpine Linux Toolchain

I want a lighter weight set of host tools to put in my build sandbox. Baserock’s OS images can be built with just a C++ toolchain and a minimal shell environment, so there’s no need to start copying gigabytes of dependencies around.

Ultimately the Baserock project could build its own set of host tools, but to save faff while prototyping things I decided to try Alpine Linux, which is a minimal distribution.

Alpine Linux provide “mini root filesystem” tarballs. These can’t be used directly as they contain device nodes (so require privileges to extract) and don’t contain a toolchain.

Here’s how I produced a workable host tools sysroot. I’m using Bubblewrap (the same tool used by BuildStream to create build sandboxes) as a simple container driver to run the `apk` package tool as root without needing special host privileges. This won’t work on every OS; you can use something like Docker or plain old `chroot` instead if needed.

wget https://nl.alpinelinux.org/alpine/v3.6/releases/x86_64/alpine-minirootfs-3.6.1-x86_64.tar.gz
mkdir -p sysroot
tar -x -f alpine-minirootfs-3.6.1-x86_64.tar.gz -C sysroot --exclude=./dev

alias alpine_exec='bwrap --unshare-all --share-net --setenv PATH /usr/bin:/bin:/usr/sbin:/sbin  --bind ./sysroot / --ro-bind /etc/resolv.conf /etc/resolv.conf --uid 0 --gid 0'
alpine_exec apk update
alpine_exec apk add bash bc gcc g++ musl-dev make gawk gettext-dev gzip linux-headers perl e2fsprogs mtools

tar -z -c -f alpine-host-tools-3.6.1-x86_64.tar.gz -C sysroot .

This produces a 219MB host tools sysroot containing 11,636 files. This is not as minimal as you can go with a GNU C/C++ toolchain but it’s around the right order of magnitude and it checks out from BuildStream’s artifact store into the build directory in a matter of seconds.

We include gawk as it is needed during the GCC build (BusyBox awk is not enough), and gettext-dev is needed by GLIBC (at least, libintl.h is needed and in Alpine only gettext provides that header). Bash is needed by scripts/config from linux.git, and bc, GNU gzip, linux-headers and Perl are also needed for building Linux. The e2fsprogs and mtools are useful for creating disk images.

I’ve integrated this into my builds in a pretty lazy way for now:

kind: import
description: Import an Alpine Linux C/C++ toolchain
host-arches:
  x86_64:
    sources:
    - kind: tar
      url: file:///home/sam/src/buildstream-bootstrap/alpine-host-tools-3.6.1-x86_64.tar.gz
      base-dir: .
      ref: e01d76ef2c7e3e105778e2aa849a42d38dc3163f8c15f5b2de8f64cd5543cf29

This element is obviously not something I can share with others — I’d need to upload the tarball somewhere or set up a public OSTree repo that others could pull from, and then have the element reference that.

However, this is just the first step towards some much deeper work which will result in me needing to move beyond Alpine in any case. In future I hope that it’ll be pretty straightforward to obtain a minimal toolchain as a sysroot that can be pulled into a sandbox using OSTree. The work required to produce such a thing is simple enough to automate but it requires a server to host the binaries which then requires ongoing maintenance for security updates, so I’m not yet going to commit to doing it …

Tracker 💙 Meson

A long time ago I started looking at rewriting Tracker’s build system using Meson. Today those build instructions landed in the master branch in Git!

Meson is becoming pretty popular now so I probably don’t need to explain why it’s such a big improvement over Autotools. Here are some key benefits:

  • It takes 2m37s for me to build from a clean Git tree with Autotools,  but only 1m08s with Meson.
  • There are 2573 lines of meson.build files, vs. 5013 lines of Makefile.am, a 2898 line configure.ac file, and various other bits of debris needed for Autotools
  • Only compile warnings are written to stdout by default, so they’re easy to spot
  • Out of tree builds actually work

Tracker is quite a challenging project to build, and I hit a number of issues in Meson along the way plus a few traps for the unwary.

We have a huge number of external dependencies — Meson handles this pretty neatly, although autodetection of backends requires a bit of boilerplate.

There’s a complex mix of Vala and C code in Tracker, including some libraries that are written in both. The Meson developers have put a lot of work into supporting Vala, which is much appreciated considering it’s a fairly niche language. In fact the only major problem we have left is something that’s just as broken with Autotools: failing to generate a single introspection repo for a combined C + Vala library.

Tracker also has a bunch of interdependent libraries. This caused continual problems because Meson does very little deduplication in the commandlines it generates, and so I’d get combinatorial explosions hitting fairly ridiculous errors like “commandline too long” (the limit is 262KB) or “too many open files” inside the ld process. This is a known issue. For now I work around it by manually specifying some dependencies for individual targets instead of relying on them getting pulled in as transitive dependencies of a declare_dependency target.

A related issue was that if the same .vapi file ends up on the valac commandline more than once it would trigger an error. This required some trickery to avoid. New versions of Meson work around this issue anyway.

One pretty annoying issue is that generated files in the source tree cause Meson builds to fail. Out of tree builds seem to not work with our Autotools build system — something to do with the Vala integration — with the result that you need to make clean before running a Meson build even if the Meson build is in a separate build dir. If you see errors about conflicting types or duplicate definitions, that’s probably the issue. While developing the Meson build instructions I had a related problem of forgetting about certain files that needed to be generated because the Autotools build system had already generated them. Be careful!
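
If you hit that, the quickest fix I know is to let Git remove everything it doesn’t track (a sketch – careful, this deletes untracked and ignored files, so do the dry run first):

git clean -xdn    # dry run: list what would be removed
git clean -xdf    # actually remove untracked and ignored files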

Meson users need to be aware that the rpath is not set automatically for you. If you previously used Libtool you probably didn’t need to care what an rpath was, but with Meson you have to manually set install_rpath for every program that depends on a library that you have installed into a non-standard location (such as a subdirectory of /usr/lib). I think rpaths are a bit of a hack anyway — if you want relocatable binary packages you need to avoid them — so I like that Meson is bringing this implementation detail to the surface.

There are a few other small issues: for example we have a Gtk-Doc build that depends on the output of a program, which Meson’s gtk-doc module currently doesn’t handle so we have to rebuild that documentation on every build as a workaround. There are also some workarounds in the current Tracker Meson build instructions that are no longer needed — for example installing generated Vala headers used to require a custom install script, but now it’s supported more cleanly.

Tracker’s Meson build rules aren’t quite ready for prime time: some tests fail when run under Meson that pass when run under Autotools, and we have to work out how best to create release tarballs. But it’s pretty close!

All in all this took a lot longer to achieve than I originally hoped (about 9 months of part-time effort), but in the process I’ve found some bugs in both Tracker and Meson, fixed a few of them, and hopefully made a small improvement to the long process of turning GNU/Linux users into GNU/Linux developers.

Meson has come a long way in that time and I’m optimistic for its future. It’s a difficult job to design and implement a new general purpose build system (plus project configuration tool, test runner, test infrastructure, documentation, etc. etc), and the Meson project have done so in 5 years without any large corporate backing that I know of. Maintaining open source projects is often hard and thankless. Ten thumbs up to the Meson team!

Night Bus: simple SSH-based build automation


My current project at Codethink has involved testing and packaging GCC on several architectures. As part of this I wanted nightly builds of ‘master’ and the GCC 7 release branch, which called for some kind of automated build system.

What I wanted was a simple CI system that could run some shell commands on different machines, check if they failed, and save a log somewhere that can be shared publicly. Some of the build targets are obsolete proprietary OSes where modern software doesn’t work out of the box so simplicity is key. I considered using GitLab CI, for example, but it requires a runner written in Go, which is not something I can just install on AIX. And I really didn’t have time to maintain a Jenkins instance.

So I started by trying to use Ansible as a CI system, and it kind of worked but the issue is that there’s no way to get the command output streamed back to you in real time. GCC builds take hours and its test suite can take a full day to run on an old machine so it’s essential to be able to see how things are progressing without waiting a full day for the command to complete. If you can’t see the output, the build could be hanging somewhere and you’d not realise. I discovered that Ansible isn’t going to support this use case and so I ended up writing a new tool: Night Bus.

Night Bus is written in Python 3 and runs tasks across different machines, similarly to Ansible but with the use case of doing nightly builds and tests as opposed to configuration management. It provides:

  • remote task execution via SSH (using the Parallel-SSH library)
  • live logging of output to a specified directory
  • an overall report written once all tasks are done, which can contain status messages from the tasks
  • parametrization of the tasks (to e.g. build 3 branches of the same thing)
  • a support library of helper functions to make your task scripts more readable

Scheduled execution can be set up using cron or systemd. You can set up a webserver (I’m using lighttpd) on the machine that runs Night Bus to make the log output available over HTTP.

You control it by creating two YAML (or JSON) files:

  • hosts describes the SSH configuration for each machine
  • tasks lists the sequence of tasks to run

Here’s an example hosts file:

host1:
  user: automation
  private_key: ssh/automation.key

host2:
  proxy_host: 86.75.30.9
  proxy_user: jenny
  proxy_private_key: ssh/jenny.key

Here’s an example tasks file:

tasks:
- name: counting-example
  commands: |
    echo "Counting to 20."
    for i in `seq 1 20`; do
      echo "Hello $i"
      sleep 1
    done

You might wonder why I didn’t just write a shell script to automate my builds as many thousands of hackers have done in the past. Basically I find maintaining shell scripts over about 10 lines to be a hateful experience. Shell is great as a “live” programming environment because it’s very flexible and quick to type. But those strengths turn into weaknesses when you’re trying to write maintainable software. Every CI system ultimately ends up with you writing shell scripts (or if you’re really unlucky, some XML equivalent) so I don’t see any point hiding the commands that are being run under layers of other stuff, but at the same time I want a clear separation between the tasks themselves and the support aspects like remote system access, task ordering, and logging.

Night Bus is released as a random GitHub project that may never get much in the way of updates. My aim is for it to fall into the category of software that doesn’t need much ongoing work or maintenance because it doesn’t try to do anything special. If it saves one person from having to maintain a Jenkins instance then the week I spent writing it will have been worthwhile.

CMake: dependencies between targets and files and custom commands

As I said in my last post about CMake, targets are everything in CMake. Unfortunately, not everything is a target though!

If you’ve tried to do anything non-trivial in CMake using the add_custom_command() command, you may have got stuck in this horrible swamp of confusion. If you want to generate some kind of file at build time, in some manner other than compiling C or C++ code, then you need to use a custom command to generate the file. But files aren’t targets, and they have all sorts of exciting limitations to make you forget everything you ever knew about dependency management.

What makes it so hard is that there’s not one limitation, but several. Here is a hopefully complete list of things you might want to do in CMake that involve custom commands and custom targets depending on each other, and some explanations as to why things don’t work the way that you might expect.

1. Dependencies between targets

point1-vertical.png

This is CMake at its simplest (and best).

cmake_minimum_required(VERSION 3.2)

add_library(foo foo.c)

add_executable(bar bar.c)
target_link_libraries(bar foo)

You have a library, and a program that depends on it. When you build the project, both of them get built. Ideal! This is great!

What is “all”, in the dependency graph to the left? It’s a built-in target, and it’s the default target. There are also “install” and “test” targets built in (but no “clean” target).
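For completeness, here’s roughly what driving that build looks like from a shell, as an out-of-source build (paths are illustrative):

mkdir build && cd build
cmake /path/to/source
make          # builds the default "all" target: foo, then bar
make bar      # builds just bar (and foo first, since bar links against it)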

2. Custom targets

If your project is a good one then maybe you use a documentation tool like GTK-Doc or Doxygen to generate documentation from the code.

This is where add_custom_command() enters your life. You may live to regret ever letting it in.

cmake_minimum_required(VERSION 3.2)

add_custom_command(
    OUTPUT
        docs/doxygen.stamp
    DEPENDS
        docs/Doxyfile
    COMMAND
        doxygen docs/Doxyfile
    COMMAND
        cmake -E touch docs/doxygen.stamp
    COMMENT
        "Generating API documentation with Doxygen"
    VERBATIM
    )

We have to create a ‘stamp’ file because Doxygen generates lots of different files, and we can’t really tell CMake what to expect. But actually, here’s what to expect: nothing! If you build this, you get no output. Nothing depends on the documentation, so it isn’t built.

So we need to add a dependency between docs/doxygen.stamp and the “all” target. How about using add_dependencies()? No, you can’t use that with any of the built-in targets. But as a special case, you can pass ALL to add_custom_target() to create a new target attached to the “all” target:

add_custom_target(
    docs ALL
    DEPENDS docs/doxygen.stamp
    )

point2-horizontal.png

In practice, you might also want to make the custom command depend on all your source code, so it gets regenerated every time you change the code. Or, you might want to remove the ALL from your custom target, so that you have to explicitly run make docs to generate the documentation.

This is also discussed here.
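To illustrate the first variation, here’s a sketch that makes the stamp file depend on the source tree as well, assuming the sources live under src/ (the glob pattern is made up for the example):

file(GLOB doc_inputs ${CMAKE_SOURCE_DIR}/src/*.c ${CMAKE_SOURCE_DIR}/src/*.h)

add_custom_command(
    OUTPUT
        docs/doxygen.stamp
    DEPENDS
        docs/Doxyfile ${doc_inputs}
    COMMAND
        doxygen docs/Doxyfile
    COMMAND
        cmake -E touch docs/doxygen.stamp
    COMMENT
        "Generating API documentation with Doxygen"
    VERBATIM
    )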

3. Custom commands in different directories

Another use case for add_custom_command() is generating source code files using 3rd party tools.

### Toplevel CMakeLists.txt
cmake_minimum_required(VERSION 3.2)

add_subdirectory(src)
add_subdirectory(tests)


### src/CMakeLists.txt
add_custom_command(
    OUTPUT
        ${CMAKE_CURRENT_BINARY_DIR}/foo.c
    COMMAND
        cmake -E echo "Generate my C code" > foo.c
    VERBATIM
    )


### tests/CMakeLists.txt
add_executable(
    test-foo
        test-foo.c ${CMAKE_CURRENT_BINARY_DIR}/../src/foo.c
    )

add_test(
    NAME test-foo
    COMMAND test-foo
    )

How does this work? Actually it doesn’t! You’ll see the following error when you run CMake:

CMake Error at tests/CMakeLists.txt:1 (add_executable):
  Cannot find source file:

    /home/sam/point3/build/src/foo.c

  Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
  .hxx .in .txx
CMake Error: CMake can not determine linker language for target: test-foo
CMake Error: Cannot determine link language for target "test-foo".

Congratulations, you’ve hit bug 14633! The fun thing here is that generated files don’t behave anything like targets. Actually they can only be referenced in the file that contains the corresponding add_custom_command() call. So when we refer to the generated foo.c in tests/CMakeLists.txt, CMake actually has no idea where it could come from, so it raises an error.

point3-1.png

As the corresponding FAQ entry describes, there are two things you need to do to work around this limitation.

The first is to wrap your custom command in a custom target. Are you noticing a pattern yet? Most of the workarounds here are going to involve wrapping custom commands in custom targets. In src/CMakeLists.txt, you do this:

add_custom_target(
    generate-foo
    DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/../src/foo.c
    )

Then, in tests/CMakeLists.txt, you can add a dependency between “test-foo” and “generate-foo”:

add_dependencies(test-foo generate-foo)

That’s enough to ensure that foo.c now gets generated before the build of test-foo begins, which is obviously important. However, if you run CMake now you’ll still hit the same error, because CMake has no idea where that generated foo.c file might come from. The workaround here is to manually set the GENERATED source file property:

set_source_files_properties(
    ${CMAKE_CURRENT_BINARY_DIR}/../src/foo.c
    PROPERTIES GENERATED TRUE
    )

point3-2.png

Note that this is a bit of a contrived example. In most cases, the correct solution is to do this:

### src/CMakeLists.txt
add_library(foo foo.c)

### tests/CMakeLists.txt
target_link_libraries(test-foo foo)

Then you don’t have to worry about any of the nonsense above, because libraries are proper targets, and you can use them anywhere.

Even if it’s not practical to make a library containing ‘foo.c’, there must be some other target in the directory where it is generated that compiles or links it. So instead of creating a “generate-foo” target, you can make “test-foo” depend on that target instead.

4. Custom commands and parallel make

I ran into this issue while doing something pretty unusual with CMake: wrapping a series of Buildroot builds. Imagine my delight at discovering that, when parallel make was used, my CMake-generated Makefile was running the same Buildroot build multiple times at the same time! That is not what I wanted!

It turns out this is a pretty common issue. The crux of it is that with the “Unix Makefiles” backend, multiple toplevel targets run as independent, parallel make processes. Files aren’t targets, and unless something is a target it doesn’t get propagated around like you would expect.

Here is the test case:

cmake_minimum_required(VERSION 3.2)

add_custom_command(
    OUTPUT gen
    COMMAND sleep 1
    COMMAND cmake -E echo Hello > gen
    )

add_custom_target(
    my-all-1 ALL DEPENDS gen
    )

add_custom_target(
    my-all-2 ALL DEPENDS gen
    )

If you generate a Makefile from this and run make -j 2, you’ll see the following:

Scanning dependencies of target my-all-2
Scanning dependencies of target my-all-1
[ 50%] Generating gen
[100%] Generating gen
[100%] Built target my-all-2
[100%] Built target my-all-1

If creating ‘gen’ takes a long time, then you really don’t want it to happen multiple times! It may even cause disasters; for example, running make twice at once in the same Buildroot build tree is not pretty at all.

point4-1.png

As explained in bug 10082, the solution is (guess what!) to wrap the custom command in a custom target!

add_custom_target(make-gen DEPENDS gen)

point4-2.png

Then you change the custom targets to depend on “make-gen”, instead of the file ‘gen’. Except! Be careful when doing that — because there is another trap waiting for you!
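For reference, here’s a sketch of the fixed test case, using add_dependencies() as one way to express the target-level link to the wrapper:

add_custom_command(
    OUTPUT gen
    COMMAND sleep 1
    COMMAND cmake -E echo Hello > gen
    )

# Exactly one target owns the custom command's output...
add_custom_target(make-gen DEPENDS gen)

# ...and everything else depends on that target, not on the file.
add_custom_target(my-all-1 ALL)
add_dependencies(my-all-1 make-gen)

add_custom_target(my-all-2 ALL)
add_dependencies(my-all-2 make-gen)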

5. File-level dependencies of custom targets are not propagated

If you read the documentation of add_custom_command() closely, and you look at the DEPENDS keyword argument, you’ll see this text:

If DEPENDS specifies any target (created by the add_custom_target(), add_executable(), or add_library() command) a target-level dependency is created to make sure the target is built before any target using this custom command. Additionally, if the target is an executable or library a file-level dependency is created to cause the custom command to re-run whenever the target is recompiled.

This sounds quite nice, like more or less what you would expect. But the important bit of information here is what CMake doesn’t do: when a custom target depends on another custom target, all the file-level dependencies are completely ignored.

Here’s your final example for the evening:

cmake_minimum_required(VERSION 3.2)

set(SPECIAL_TEXT foo)

add_custom_command(
    OUTPUT gen1
    COMMAND cmake -E echo ${SPECIAL_TEXT} > gen1
    )

add_custom_target(
    gen1-wrapper
    DEPENDS gen1
    )

add_custom_command(
    OUTPUT gen2
    DEPENDS gen1-wrapper
    COMMAND cmake -E copy gen1 gen2
    )

add_custom_target(
    all-generated ALL
    DEPENDS gen2
    )

This is subtly wrong, even though you did what you were told, and wrapped the custom command in a custom target.

The first time you build it:

Scanning dependencies of target gen1-wrapper
[ 50%] Generating gen1
[ 50%] Built target gen1-wrapper
Scanning dependencies of target all-generated
[100%] Generating gen2
[100%] Built target all-generated

But then touch the file ‘gen1’, or overwrite it with some other text, or change the value of SPECIAL_TEXT in CMakeLists.txt to something else, and you will see this:

[ 50%] Generating gen1
[ 50%] Built target gen1-wrapper
[100%] Built target all-generated

There’s no file-level dependency created between ‘gen2’ and ‘gen1’, so ‘gen2’ never gets updated, and things get all weird.

point5-1-horizontal.png

You can’t just depend on gen1 instead of gen1-wrapper, because it may end up being built multiple times! See the previous point. Instead, you need to depend on the “gen1-wrapper” target and the file itself:

add_custom_command(
    OUTPUT gen2
    DEPENDS gen1-wrapper gen1
    COMMAND cmake -E copy gen1 gen2
    )

As the documentation says, this only applies to targets wrapping add_custom_command() output. If ‘gen1’ was a library created with add_library, things would work how you expect.

point5-2-horizontal.png

Conclusion

Maybe I just have a blunt head, but I found all of this quite difficult to work out. I can understand why CMake works this way, but I think there is plenty of room for improvement in the documentation where this is explained. Hopefully this guide has gone some way to making things clearer.

If you have any other dependency-related traps in CMake that you’ve hit, please comment and I’ll add them to this list…

Some CMake tips

Sketch of John Barber's gas turbine, from his patent

I spent the past few weeks converting a bunch of Make and Autotools-based modules to use CMake instead. This was my first major outing with CMake. Maybe there will be a few blog posts on that subject!

In general I think CMake has a sound design and I quite want to like it. It seems like many of its warts are due to its long history and the need for backwards compatibility, not anything fundamentally broken. To keep a project going for 16 years is impressive and it is pretty widely used now. This is a quick list of things I found in CMake that confused me to start with but ultimately I think are good things.

1. Targets are everything

CMake is pretty similar to normal make in that all the things that you care about are ‘targets’. Libraries are targets, programs are targets, subdirectories are targets and custom commands create files which are considered targets. You can also create custom targets which run commands if executed. You need to use the custom targets feature if you want a custom command’s output to be tied to the default target, which is a little confusing but works OK.

Targets have properties, which are useful.
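For example, here’s a small sketch of what that looks like in practice; the include directory, definition and C standard are made up for illustration:

add_library(foo foo.c)

# Each of these records information as properties on the 'foo' target:
target_include_directories(foo PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include)
target_compile_definitions(foo PRIVATE FOO_INTERNAL=1)
set_target_properties(foo PROPERTIES C_STANDARD 99)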

2. Absolute paths to shared libraries

Traditionally you link to libfoo by passing -lfoo to the linker. Then, if libfoo is in a non-standard location, you pass -L/path/to/foo -lfoo. I don’t think pkg-config actually enforces this pattern but pretty much all the .pc files I have installed use the -L/path -lname pattern.

CMake makes this quite awkward to do, because it makes every effort to forget about the linker paths. Library ‘targets’ in CMake keep track of associated include paths, dependent libraries, compile flags, and even extra source files, using ‘target properties’. There’s no target property for LINK_DIRECTORIES, though, so outside of the current CMakeLists.txt file they won’t be tracked. There is a global LINK_DIRECTORIES property, confusingly, but it’s specifically marked as “for debugging purposes.”

So the recommended way to link to libraries is with the absolute path. Which makes sense! Why say with two commandline arguments what you can say with one?
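One way to get hold of an absolute path is find_library(); here’s a sketch with a made-up library name and search path:

# Find libfoo and record its absolute path in FOO_LIBRARY.
find_library(FOO_LIBRARY foo PATHS /opt/foo/lib)

add_executable(bar bar.c)
target_link_libraries(bar ${FOO_LIBRARY})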

At least, this will be fine once CMake’s pkg-config integration returns absolute paths to libraries.

3. Semicolon safety instead of whitespace safety

CMake has a ‘list’ type which is actually a string with ; (semicolon) used to delimit entities. Spaces are used as an argument separator, but converted to semicolons during argument parsing, I think. Crucially, they seem to be converted before variable expansion is done, which means that filenames with spaces don’t need any special treatment. I like this more than shell code where I have to quote literally every variable (or else Richard Maw shouts at me).

For example:

cmake_minimum_required(VERSION 3.2)
set(path "filename with spaces")
set(command ls ${path})
foreach(item ${command})
     message(item: ${item})
endforeach() 

Output:

item:ls
item:filename with spaces

On the other hand:

cmake_minimum_required(VERSION 3.2)
set(path "filename;with\;semicolons") 
set(command ls ${path}) 
foreach(item ${command})
     message(item: ${item})
endforeach()

Output:

item:ls
item:filename
item:with
item:semicolons

Semicolons occur less often in file names, I guess. Most of us are trained to avoid spaces, partly because we know how broken most (all?) shell-based build systems are in those cases. CMake hasn’t actually solved this but just punted the special character to a less often used one, as far as I can see. I guess that’s an improvement? Maybe?

The semicolon separator can bite you in other ways. For example, when specifying CMAKE_PREFIX_PATH (the library and header search path) you might expect this to work:

cmake . -DCMAKE_PREFIX_PATH=/opt/path1:/opt/path2

However, that won’t work (unless you did actually mean that to be one item). Instead, you need to pass this:

cmake . -DCMAKE_PREFIX_PATH=/opt/path1\;/opt/path2

Of course, ; is a special character in UNIX shells so must be escaped.

4. Ninja instead of Make

CMake supports multiple backends, and Ninja is often faster than GNU Make, so give the Ninja backend a try: cmake -G Ninja.
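A minimal out-of-source invocation looks something like this (paths are illustrative):

mkdir build && cd build
cmake -G Ninja /path/to/source
ninja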

5. Policies

The CMake developers seem pretty good at backwards compatibility. To this end they have introduced the rather obtuse policies framework. The great thing about the policies framework is that you can completely ignore it, as long as you have cmake_minimum_required(VERSION 3.3) at the top of your toplevel CMakeLists.txt. You’ll only need it once you have a massive bank of existing CMakeLists.txt files and you are getting started on porting them to a newer version of CMake.

Quite a lot of CMake error messages are worded to make you think that you might need to care about policies, but don’t be fooled. Mostly these errors are for situations where there didn’t use to be an error, I think, and so the policy exists to bring back the ‘old’ behaviour, if you need it.
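If you ever do need the old behaviour of a particular policy, cmake_policy() lets you ask for it explicitly. A sketch, using CMP0026 (the policy about the old LOCATION target property) purely as an example; whether you actually need it depends on your code:

cmake_minimum_required(VERSION 3.2)

# Explicitly request the old behaviour of one policy, if this CMake knows about it.
if(POLICY CMP0026)
    cmake_policy(SET CMP0026 OLD)
endif()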

If a tool is weird but internally consistent, I can get on with it. Hopefully, CMake is getting there. I can see there have been a lot of good improvements since CMake 2.x, at least. And at no point so far has it made me more angry than GNU Autotools. It’s not crashed at all (impressive given it’s entirely C++ code!). And it is significantly faster and more widely applicable than Autotools or artisanal craft Makefiles. So I’ll be considering it in future. But I can’t help wishing for a build system that I actually liked.

Edit: you might also be interested in a list of common CMake antipatterns.

My first Ansible modules

This week I had my first go at writing modules for Ansible. I’ve been a fan of Ansible for a while now; I find it really practical and I think the developers made some very sensible compromises when making it. Writing a module for it has been almost painless, which is a very rare thing in the programming world. The only real surprise was that it was so easy.

What I have written are some Ansible modules to administer a Gerrit instance. It seems that while there are a few Ansible roles online already for deploying a Gerrit instance, there are none that automate the step after deployment, when you need to configure your groups, projects, and special accounts.

I owe several beers to the developers of the Gerrit REST API which is powerful enough to do everything I needed to do, and has complete and accurate documentation, as well as to the developers of Ansible. The repo so far contains less than 800 lines of code, but is hopefully complete enough to let me define the initial configuration of the Baserock Gerrit instance with just a few files in a Git repo.

Developing with Packer and Docker

I spent last Friday looking at setting up an OpenID provider for the Baserock project. This is kind of a prerequisite to having our own Storyboard instance, which we’d quite like to use for issue and task tracking.

I decided to start by using some existing tools that have nothing to do with Baserock. Later we can move the infrastructure over to using Baserock, to see what it adds to the process. We have spent quite a lot of time “eating our own dogfood” and while it’s a good idea in general, a balanced diet contains more than dogfood.

The Baserock project has an OpenStack tenancy at DataCentred which can host our public infrastructure. The goal is to deploy my OpenID provider system there. However, for prototyping it’s much easier to use a container on my laptop, because I don’t need to be working across an internet connection.

The morph deploy command in Baserock allows deploying systems in multiple ways and I think it’s pretty cool. Another tool which seems to do this is Packer.

Differences between `morph deploy` and Packer

Firstly, I notice Packer takes JSON as input, whereas Morph uses YAML. So with Packer I have to get the commas and brackets right and can’t use comments. That’s a bit mean!

A bigger difference is that Morph treats ‘building and configuring’ the image as a separate step from ‘writing the image somewhere.’ By contrast, Packer ties ‘builder’ and ‘type of hosting infrastructure’ together. In my case I need to use the Docker builder for my prototype and the OpenStack builder for the production system. There can be asymmetry here: in Docker my Fedora 20 base comes from the Docker registry, but for OpenStack I created my own image from the Fedora 20 cloud image. I could have created my own Docker image too if I didn’t trust the semi-official Fedora image, though.

The downside to the `morph deploy` approach is that it can be less efficient. To deploy to OpenStack, Morph first unpacks a tarball locally, then runs various ‘configuration’ extensions on it, then converts it to a disk image and uploads it to OpenStack. Packer starts by creating the OpenStack instance and then configures it ‘live’ via SSH. This means it doesn’t need to unpack and repack the system image, which can be slow if the system being deployed is large.

Packer has separate types of ‘provisioner’ for different configuration management frameworks. This is handy, but the basic ‘shell’ and ‘file’ provisioners actually give you all you need to implement the others. The `morph deploy` command doesn’t have any special helpers for different configuration management tools, but that doesn’t prevent one from using them.

Building Packer

Packer doesn’t seem to be packaged for Fedora 20, which I use as my desktop system, so I had a go at building it from source:

    sudo yum install golang
    mkdir gopath && cd gopath
    export GOPATH=`pwd`
    go get -u github.com/mitchellh/gox
    git clone https://github.com/mitchellh/packer \
src/github.com/mitchellh/packer
    cd src/github.com/mitchellh/packer    
    make updatedeps
    make dev

There’s no `make install`, but you can run the tool as `$GOPATH/bin/packer`. Note there’s also a program called `packer` provided by the CrackLib package in /usr/sbin on Fedora 20, which can cause a bit of confusion!

Prototyping with Docker

I’m keen on using Docker for prototyping and for managing containers in general. I’m not really sold on the idea of the Dockerfile as it seems like basically a less portable incarnation of shell scripting. I do think it’s cool that Docker takes a snapshot after each line of the file is executed, but I’ve no idea if this is useful in practice. I’d much prefer to use a configuration management system like Ansible. And rather than installing packages every time I deploy a system, it’d be nice to just build the right one to begin with, like I can do with Morph in Baserock.

Using Packer’s Docker builder doesn’t require me to write a Dockerfile, and as the Packer documentation points out, all of the configuration that can be described in a Dockerfile can also be specified as arguments to the docker run command.

As a Fedora desktop user it makes sense to use Fedora for now as my Docker base image. So I started with this template.json file:

    {
        "builders": [
            {
                "type": "docker",
                "image": "fedora:20",
                "commit": true
            }
        ],
        "post-processors": [
            {
                "type": "docker-tag",
                "repository": "baserock/openid-provider",
                "tag": "latest",
            }
        ]
    }

I ran packer build template.json and (after some confusion with /usr/sbin/packer) waited for a build. Creating my container took less than a minute including downloading the Fedora base image from the Docker hub. Nice!

I could then enter my image with docker run -i -t and check out my new generic Fedora 20 system.

Initially I thought I’d use Vagrant, which is a companion tool to Packer, to get a development environment set up. That would have required me to use VirtualBox rather than Docker for my development deployment, though, which would be much slower and more memory-hungry than a container. I realised that all I really wanted was the ability to share the Git repo I was developing things in between my desktop and my test deployments anyway, which could be achieved with a Docker volume just as easily.

Edit: I have since found out that there are about four different ways to use Vagrant with Docker, but I’m going to stick with my single docker run command for now.

So I ended up with the following commandline to enter my development environment:

    docker run -i -t --rm \
        --volume=`pwd`:/src/test-baserock-infrastructure \
        baserock/openid-provider

The only issue is that because I’m running as ‘root’ inside the container, files from the development Git repo that I edit inside the container become owned by root in my home directory. It’s no problem to always edit and run Git from my desktop, though (and since the container system lacks both vim and git, it’s easy to remember!).

Running a web service

I knew of two OpenID providers I wanted to try out. Since Baserock is mostly a Python shop I thought the first one to try should be the Python-based Django OpenID Provider (the alternative being Masq, which is for Ruby on Rails).

Fedora 20 doesn’t ship with Django so the first step was to install it in my system. The easiest (though far from the best) way is using the Packer ‘shell’ provisioner and running the following:

    yum install python-pip
    pip install django
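In template.json terms that ends up as a “provisioners” section alongside the builders, roughly like this (the commands are the ones above; everything else is just the Packer shell provisioner’s standard shape):

        "provisioners": [
            {
                "type": "shell",
                "inline": [
                    "yum install -y python-pip",
                    "pip install django"
                ]
            }
        ]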

Next step was to follow the Django tutorial and get a demo webserver running. Port forwarding is nobody’s friend, and it took me a bit of time to be able to talk to the webserver (which was in a container) from my desktop. The Django tutorial advises running the server on 127.0.0.1:80 but this doesn’t make so much sense in a container. Instead, I ran the server with the following:

    python ./manage.py runserver 0.0.0.0:80

And I ran the container as follows:

    docker run -i -t --rm \
        --publish=127.0.0.1:80:80 \
        --volume=`pwd`:/src/test-baserock-infrastructure \
        baserock/openid-provider

So inside the container the web server listens on all interfaces, but it’s forwarded only to localhost on my desktop, so that other computers can’t connect to my rather insecure test webserver.

I then spent a while learning Django and setting up Django OpenID Provider in my Django project. It was actually quite easy and fun! Eventually I got to the point where I wanted to put my demo OpenID server on the internet, so I could test it against a real OpenID consumer.

Deploying to OpenStack

Deploying to OpenStack with Packer proved a bit more tricky than deploying to Docker. It turns out that fields like ‘source_image’, ‘flavor’ and ‘networks’ take IDs rather than names, which is a pain (although I understand the reason). I had to give my instance a floating IP, too, as Packer needs to contact it via SSH and I’m deploying from my laptop, which isn’t on the cloud’s internal network. It took a while to get a successful deployment but we got there.

I found that using the “files” provisioner to copy in the code of the Django application didn’t work: the files were there in the resulting system, but corrupted. It may be better to try and deploy this from Git anyway, but I’m a little confused what went wrong there.

Edit: I found that there was quite a bit of corruption in the files that were added during provisioning, and while I didn’t figure out the cause, I did find that adding a call to ‘sleep 10’ as the first step in provisioning made the issue go away. Messy.

I’m fairly happy with what I managed to get done in a single day: we now have a way of developing and deploying infrastructure which requires minimal fluff in the Git repository and a pretty quick turnaround time. As for the Django OpenID provider, I’ve not yet managed to get it to serve me an OpenID — I guess next Friday I shall start debugging it.

The code is temporarily available at http://github.com/ssssam/test-baserock-infrastructure. If we make use of this in Baserock it will no doubt move to git.baserock.org.

The Fundamentals of OSTree

I’ve had some time at work to get my head around the OSTree tool. I summarised the tool in a previous post. This post is an example of its basic functionality (as of January 2014).

The Repository

OSTree repos can be in one of multiple modes. It seems ‘bare’ is for use on deployed systems (as the actual deployment uses hardlinks so disk space usage is kept to a minimum), while ‘archive-z2’ is for server use. The default is ‘bare’.
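For a server-side repository you pass the mode when creating it, with something like:

$ ostree --repo=repo init --mode=archive-z2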

Every command requires a --repo=… argument, so it makes sense to create an alias to save typing.

$ cd ostree-demo-1
$ mkdir repo
$ alias ost="ostree --repo=`pwd`/repo"

$ ost init
$ ls repo
config  objects  refs  remote-cache  tmp
$ cat repo/config
[core]
repo_version=1
mode=bare

Committing A Filesystem Tree

Create some test data and commit the tree to ‘repo’. Note that none of the commit SHA256s here will match the ones you see when running these commands, unless you are in some kind of time vortex.

$ mkdir tree
$ cd tree
$ echo "x" > 1
$ echo "y" > 2
$ mkdir dir
$ cp /usr/share/dict/words words

$ ost commit --branch=my-branch \
    --subject="Initial commit" \
    --body="This is the first commit."
ce19c41036cc45e49b0cecf6b157523c2105c4de1ce30101def1f759daafcc3e

$ ost ls my-branch
d00755 1002 1002      0 /
-00644 1002 1002      2 /1
-00644 1002 1002      2 /2
-00644 1002 1002 4953680 /words
d00755 1002 1002      0 /dir

Let’s make a second commit.

$ tac words > words.tmp && mv words.tmp words

$ ost commit --branch=my-branch \
    --subject="Reverse 'words'" \
    --body="This is the second commit."
67e382b11d213a402a5313e61cbc69dfd5ab93cb07fbb8b71c2e84f79fa5d7dc

$ ost log my-branch
commit 67e382b11d213a402a5313e61cbc69dfd5ab93cb07fbb8b71c2e84f79fa5d7dc
Date:  2014-01-14 12:27:05 +0000

    Reverse 'words'

    This is the second commit.

commit ce19c41036cc45e49b0cecf6b157523c2105c4de1ce30101def1f759daafcc3e
Date:  2014-01-14 12:24:19 +0000

    Initial commit

    This is the first commit.

Now you can see two versions of ‘words’:

$ ost cat my-branch words | head -n 3
ZZZ
zZt
Zz
error: Error writing to file descriptor: Broken pipe

$ ost cat my-branch^ words | head -n 3
1080
10-point
10th
error: Error writing to file descriptor: Broken pipe

OSTree lacks most of the ref convenience tools that you might be used to from Git. For example, when providing a SHA256 you cannot abbreviate it: you must give the full 64-character string. There is an ostree rev-parse command which takes a branch name and returns the commit that the branch ref currently points to, and you may use ^ to refer to the parent of a commit, as seen above.
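For example, with the branch from above you’d expect something like:

$ ost rev-parse my-branch
67e382b11d213a402a5313e61cbc69dfd5ab93cb07fbb8b71c2e84f79fa5d7dc
$ ost rev-parse my-branch^
ce19c41036cc45e49b0cecf6b157523c2105c4de1ce30101def1f759daafcc3e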

Checking Out Different Versions of the Tree

The last command we’ll look at is ostree checkout. This command expects you to be checking out the tree into an empty directory, and if you give it only a branch or commit name it will create a new directory with the same name as that branch or commit:

$ ost checkout my-branch
$ ls my-branch/
1  2  dir  words
$ head -n 3 my-branch/words
ZZZ
zZt
Zz

By default, OSTree will refuse to check out into a directory that already exists:

$ ost checkout my-branch^ my-branch
error: File exists

However, you can also use ‘union mode’, which merges the tree into the existing directory: files already in the directory that the tree being checked out doesn’t touch are kept, while existing files that are also present in the tree are overwritten:

$ ost checkout --union my-branch^ my-branch
$ ls my-branch/
1  2  dir  words
$ head -n 3 my-branch/words 
1080
10-point
10th

One situation where this is useful is if you are constructing a buildroot that is a combination of several other trees. This is how the Gnome Continuous build system works, as I understand it.
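A sketch of that pattern, with made-up branch names:

$ ost checkout base-runtime buildroot
$ ost checkout --union devel-tools buildroot
$ ost checkout --union my-app buildroot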

Disk Space

In a bare repository OSTree stores each file that changes in a tree as a new copy of that file, so our two copies of ‘words’ make the repo 9.6MB in total:

$ du -sh my-branch/
4.8M    my-branch/
$ du -sh repo/
9.6M    repo/

It’s clear that repositories will get pretty big when there is a full operating system tree in there and there is no functionality right now that allows you to expire commits that are older than a certain threshold. So if you are looking for somewhere to start hacking …

In Use

OSTree contains a higher layer of commands which operate on an entire sysroot rather than just a repository. These commands are documented in the manual, and a good way to try them out is to download the latest Gnome Continuous virtual machine images.

OS-level version control

At the time of writing (January 2014), the area of OS-level version control is at the “scary jungle of new-looking projects” stage. I’m pretty sure we are at the point where most people would answer the question “Is it useful to be able to snapshot and rollback your operating system?” with “yes”. Faced with the next question, “How do you recommend that we do it?” the answer becomes a bit more difficult.

I have not tried out all of the projects below but hopefully this will be a useful summary.

Snapper

The openSUSE project have created Snapper, which has a specific goal of providing snapshot, rollback and diff of openSUSE installations.

It’s not tied to a specific implementation method; instead there are multiple backends, currently for btrfs, LVM, or ext4 using the ‘next4’ branch. Btrfs snapshots seem to be the preferred implementation.

The tool is implemented as a daemon which exposes a D-Bus API. There is a plugin for the Zypper package manager and the YaST config tool which automatically creates a snapshot before each operation. There is a cron job which creates a snapshot every hour, and a PAM module which creates a snapshot on every login. There is also a command-line client which allows you to create and manage snapshots manually (of course, you can also call the D-Bus API directly).
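A rough sketch of the command-line workflow (check the Snapper documentation for the exact option names in your version):

$ sudo snapper create --description "before fiddling with the config"
$ sudo snapper list
$ sudo snapper diff 1..2
$ sudo snapper undochange 1..2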

Creating a snapshot every hour might sound expensive but remember that Btrfs is a copy-on-write filesystem, so if nothing has changed on disk then the snapshot comes virtually free. Even so, there is also a cron job set up which cleans up stale snapshots in a configurable way.

The openSUSE user manual contains this rather understated quote:

Since Linux is a multitasking system, processes other than YaST or zypper may modify data in the timeframe between the pre- and the post-snapshot. If this is the case, completely reverting to the pre-snapshot will also undo these changes by other processes. In most cases this would be unwanted — therefore it is strongly recommended to closely review the changes between two snapshots before starting the rollback. If there are changes from other processes you want to keep, select which files to roll back.

OSTree

GNOME/Red Hat developer Colin Walters is behind OSTree, which is notable primarily because it provides snapshotting without depending on any underlying filesystem features. Instead, it implements its own version control system for binary files.

In brief, you can create snapshots and go back to them later on, and compare differences between them. There doesn’t seem to be a way to delete them, yet — unlike Snapper, which is being developed with a strong focus on being usable today, OSTree is being developed bottom-up starting from the “Git for binaries” layer. There is work on the higher level in progress; Colin recently announced a prototype of RPM integration for OSTree.

Beyond allowing local management of filesystem trees, OSTree also has push and pull commands, which allow you to share branches between machines. Users of the GNOME Continuous continuous integration system can use this today to obtain a ready-built GNOME OS image from the build server, and then keep it up to date with the latest version using binary deltas.

OSTree operates atomically, and the root tree is mounted read-only so that other processes cannot write data there. This is a surefire way to avoid losing data, but it does require major rethinking of how Linux-based OSes work. OSTree provides an init ramdisk and a GRUB module, which allow you to choose at boot time which branch of the filesystem to load.

There are various trade-offs between OSTree’s approach versus depending on features in a specific filesystem. OSTree’s manual discusses these tradeoffs in some detail. An additional consideration is that both OSTree and Btrfs are still under active development and therefore you may encounter bugs that you need to triage and fix. OSTree is roughly 28k lines of C, running entirely in userland, while the kernel portion of Btrfs alone is roughly 91k lines of C.

Docker

Docker is a tool for managing Linux container images. As well as providing a helpful wrapper around LXC for running containers, it allows taking snapshots of the state of a container. Until recently it implemented this using the AUFS union file system, which requires patching your kernel to use and is unlikely to make it into mainline Linux any time soon, but as of version 0.7 Docker allows use of multiple storage backends. Alex Larsson (coincidentally another GNOME/Red Hat developer) implemented a practical device-mapper storage backend which works on all Linux-based OSes. There is talk of a Btrfs storage backend too, but I have not seen concrete details since this prototype from May 2013.

It can be kind of confusing to understand Docker’s terminology at first so I’ll run through my understanding of it. The machine you are actually running has a root filesystem and some configuration info, and that’s a container. You create containers by calling docker run <image>, where an image is also a root filesystem plus some configuration info, but in stored form rather than running. You can have multiple versions of the same image, which are distinguished by tags. For example, Docker provide a base image named ‘ubuntu’ which has a tag for each available version. The rule seems to be that if you’re using it, it’s a container; if it’s a snapshot or something you want to use as a base for something else, it’s an image. You’ll have to train yourself to avoid ever using the term “container image” again.

Docker’s version control functionality is quite primitive. You can call docker commit to create a new image or a new tag of an existing image from a container. You can call docker diff to show an svn status-style list of the differences between a container and the image that it is based on. You can also delete images and containers.
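A sketch of that workflow, with a placeholder where the container ID goes:

$ docker run -i -t ubuntu /bin/bash      # work inside the container, then exit
$ docker ps -a                           # find the container's ID
$ docker diff <container-id>             # list files that changed vs. the image
$ docker commit <container-id> my-image  # snapshot the container as a new image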

You can also push and pull images from repositories such as the public Docker index. This allows you to get a container going from scratch pretty quickly. However, you can also create an image from a rootfs tarball using docker import, so you can base your containers off any operating system you like.

On top of all this, Docker provides a simple automation system for creating containers via calls to a package manager or a more advanced provisioning system like Puppet (or just arbitrary shell commands).

Linux Containers (LXC)

Docker is implemented on top of Linux Containers. It’s possible to use these commands directly, too (although unlike Docker, where ignoring the manual and using the default settings will get you to a working container, using LXC commands without having read the manual first is likely to lead to crashing your kernel). LXC provides a snapshotting mechanism too via the lxc-clone and lxc-snap commands. It uses the snapshotting features of either Btrfs or LVM, so it requires that your containers (not just the rootfs, but the configuration files and all) are stored on either a Btrfs or LVM filesystem.

Baserock

The design of Baserock includes atomic snapshot and rollback using Btrfs. Not much time has been spent on this so far, partly because, despite originally being aimed at embedded systems, Baserock is seeing quite a lot of use as a server OS.

The component that does exist is tb-diff, which is a tool for comparing two snapshots (two arbitrary filesystems, really) and constructing a binary delta. This is useful for providing upgrades, and could even be used to rebase in-development systems on top of a new base image. While all the above projects provide a ‘diff’ command, Snapper’s simply defers to the UNIX diff command (which can only display differences in text files, not binaries), and Docker’s doesn’t display contents at all, just the names of the files that have changed.

Addendum: Nix

It was remiss of me I think not to have mentioned Nix, which aims to remove the need for snapshots altogether by implementing rollback (and atomic package management transactions) at the package level using hardlinks. Nix is a complex enough beast that you wouldn’t want to use it in a non-developer-focused OS as-is, but you could easily implement a snapshot and rollback mechanism on top.