Manchester GNOME 3.22 Release Party – Friday 23rd Sept. @ MADLab

We are hosting a party for the new GNOME release this Friday (23rd September).

The venue is MADLab in Manchester city centre (here’s a map). We will be there between 18:00 and 21:00. There will be some free refreshments, an overview of the new features in 3.22, advice on how to install a free desktop OS on your computer, and advice on how to contribute to GNOME or a related Free Software project.

Everyone is welcome, including users of rival desktop environments & operating systems 🙂



What’s coming in Tracker 1.10

Tracker 1.9.1 was released last month, and it comes with some work we did to improve the various extract modules (the code which looks at files on disk and extracts what we think is useful information). The extract modules are no longer hardcoded to generate SPARQL commands, instead they now use the new TrackerResource API which is a simple way of describing resources programmatically.

TrackerResource is hopefully fairly self-explanatory; you can read the unstable documentation for it here. Basically a TrackerResource object represents a resource (for example a Video) and the API lets you describe properties of that resource (such as its title, who directed it, etc.). When you’re done you can serialize it to some kind of interchange format. The RDF working committee have invented many of these over the years (most of which are absurd); Tracker generally uses Turtle, which is both efficient and human-friendly, so TrackerResource lets you serialize information about one or more resources either to Turtle or to a series of SPARQL commands that you can use to update a database (such as the Tracker-store!) with the new information.

What’s the point? One highlight of this work was removing 2,000 lines of code from the extract modules, and making them (in my opinion) a whole lot more readable and maintainable in the process … but more interesting to me is that the code in the extract modules should be useful to a lot more people. Scanning files and then outputting a series of SPARQL INSERT statements is really only useful to folk who want to use a SPARQL database to track file metadata, and most people (at least in the desktop & embedded world) are not asking for that at all. Data serialized as Turtle can be easily parsed by other programs, inserted into tables in an SQL database, and/or converted into any other serialization format you want; the most interesting one to me is JSON-LD, which gets you the convenience of JSON and (optionally) the rigour of Linked Data too. In fact I’m hoping TrackerResource will be able to serialize to JSON-LD directly in future; the work is partly done already.

I’ve often felt like there’s a lot of solid, well-tested code sat in tracker.git that would be useful to a lot more people if it wasn’t quite so Tracker-specific. Hopefully this will help to “open up” all that code that parses different file formats. I should note that there’s nothing really Tracker-specific in the TrackerResource code; it could be moved out into a generic “RDF for GObject folk” library if there was any demand for such a thing. Likewise, the Tracker extract modules could be moved into their own “generic metadata extraction” library if it looked like the extra maintenance burden would be worthwhile (splitting up Tracker has been under discussion for some time).

Unless someone speaks up, the TrackerSparqlBuilder API will be deprecated in due course in favour of TrackerResource. The TrackerSparqlBuilder API was never flexible enough to build the kind of advanced queries that SPARQL is really designed for; it was basically created so we could generate correct INSERT statements in the extractors, and it’s no longer needed there.

This is quite a big change internally, although functionality-wise nothing should have changed, so please test out Tracker 1.9.1 if you can and report any issues you see to GNOME Bugzilla.

Example usage of TrackerResource

Object Resource Mapping (the resource-centric equivalent of Object-Relational Mapping in the SQL database world) is not a new idea… Adrien Bustany had a go at an ORM for Tracker, named Hormiga, many years ago. The Hormiga approach was to generate code to deal with each new type of resource, which is fiddly in practice, although it does mean validation code can be generated automatically.

The TrackerResource approach was inspired by the Python RDFLib library which has a similar rdflib.Resource class. It’s simple and effective, and suits the “open world”, “there are no absolute truths except that everything is a resource” philosophy of RDF better.

To describe something with TrackerResource, you might do this:

    #include <libtracker-sparql/tracker-sparql.h>


    TrackerResource *video = tracker_resource_new ("file:///home/sam/cat-on-stairs.avi");
    tracker_resource_set_string (video,
                                 "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#title",
                                 "My cat walking downstairs");

Properties are themselves resources so they have URIs, which you can follow to find a description of said property… but it’s quite painful reading and writing long URIs everywhere, so you should of course take advantage of Compact URIs and do this instead:

    TrackerResource *video = tracker_resource_new ("file:///home/sam/cat-on-stairs.avi");
    tracker_resource_set_string (video, "nie:title", "My cat walking downstairs");

You can now serialize this information to Turtle or to SPARQL, ready to send it or save it somewhere. For it to make sense to the receiver, the compact URI prefixes need to be kept with the data. The TrackerNamespaceManager class tracks the mappings between prefixes and the corresponding full URIs, and you can call tracker_namespace_manager_get_default() to receive a TrackerNamespaceManager that already knows about Tracker’s built-in prefixes. So to generate some Turtle you’d do this:

    char *text;
    text = tracker_resource_print_turtle (video,
                                          tracker_namespace_manager_get_default ());
    g_print ("%s", text);
    g_free (text);

The output looks like this:

    @prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .

    <file:///home/sam/cat-on-stairs.avi> nie:title "My cat walking downstairs" .

The equivalent JSON-LD output could be this:

    "@context": {
      "title": ""
    "@id": "file:///home/sam/cat-on-stairs.avi",
    "title": "My cat walking downstairs"

Here’s the equivalent SPARQL update query. Note that it removes any existing values for nie:title.

    DELETE {
      <file:///home/sam/cat-on-stairs.avi> nie:title ?nie_title
    }
    WHERE {
      <file:///home/sam/cat-on-stairs.avi> nie:title ?nie_title
    }
    INSERT {
      <file:///home/sam/cat-on-stairs.avi> nie:title "My cat walking downstairs" .
    }

If you want to add a 2nd value to a property, rather than removing existing values, you can do that too. The number of values that a property can have is called its “cardinality” for some reason. If a property has a cardinality of more than 1, you can *add* values, e.g. maybe you want to associate some new artwork you found:

    /* The property and artwork URI here are illustrative (the original example
     * elided them); nie:relatedTo, unlike nie:title, can hold multiple values. */
    tracker_resource_add_uri (video,
                              "nie:relatedTo",
                              "file:///home/sam/cat-artwork.png");

The generated SPARQL for this would be something like:

    INSERT {
      <file:///home/sam/cat-on-stairs.avi> nie:title "My cat walking downstairs" ;
          nie:relatedTo <file:///home/sam/cat-artwork.png> .
    }

There’s no DELETE statement first because we are *adding* a value to the property, not *setting* the property.

By the way, it’s up to the Tracker Store to make sure that the data you are inserting is valid. TrackerResource doesn’t have any way right now to check that the data you give it makes sense according to the ontologies (database schemas). So you can give the video 15 different titles, but if you try to execute the resulting INSERT statement, the Tracker Store will raise an error, because the nie:title property has a cardinality of 1: it can only have one value.

Right now TrackerResource doesn’t support *querying* resources. It’s never going to gain any kind of advanced querying features that try to hide SPARQL from developers; the best way to write advanced data queries is using a query language, and that’s really what SPARQL is best at. It is possible we could add a way to read information about known resources in the Tracker Store into TrackerResource objects, and if that sounds useful to you please get hacking 🙂
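
By the way, if you want to experiment with the TrackerResource calls shown above, code using them should build against libtracker-sparql with something like the following; the source filename is mine, and the pkg-config module name is the Tracker 1.x one, so adjust for your version:

    $ gcc -o resource-demo resource-demo.c \
        $(pkg-config --cflags --libs tracker-sparql-1.0)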

Example usage of the new Tracker extractors

It’s always been possible to run `tracker-extract` on a single file, by running `/usr/libexec/tracker-extract --file` explicitly. Try it:

    $ /usr/libexec/tracker-extract --file ~/Downloads/Best\ Coast\ -\ The\ Only\ Place.mp3  --verbosity=0
    Locale 'TRACKER_LOCALE_LANGUAGE' was set to 'en_GB.utf8'
    Locale 'TRACKER_LOCALE_TIME' was set to 'en_GB.utf8'
    Locale 'TRACKER_LOCALE_COLLATE' was set to 'en_GB.utf8'
    Locale 'TRACKER_LOCALE_NUMERIC' was set to 'en_GB.utf8'
    Locale 'TRACKER_LOCALE_MONETARY' was set to 'en_GB.utf8'
    Setting priority nice level to 19
    Loading extractor rules... (/usr/share/tracker/extract-rules)
      Loaded rule '10-abw.rule'
      Loaded rule '10-bmp.rule'
      Loaded rule '10-comics.rule'
      Loaded rule '10-dvi.rule'
      Loaded rule '10-ebooks.rule'
      Loaded rule '10-epub.rule'
      Loaded rule '10-flac.rule'
      Loaded rule '10-gif.rule'
      Loaded rule '10-html.rule'
      Loaded rule '10-ico.rule'
      Loaded rule '10-jpeg.rule'
      Loaded rule '10-msoffice.rule'
      Loaded rule '10-oasis.rule'
      Loaded rule '10-pdf.rule'
      Loaded rule '10-png.rule'
      Loaded rule '10-ps.rule'
      Loaded rule '10-svg.rule'
      Loaded rule '10-tiff.rule'
      Loaded rule '10-vorbis.rule'
      Loaded rule '10-xmp.rule'
      Loaded rule '10-xps.rule'
      Loaded rule '11-iso.rule'
      Loaded rule '11-msoffice-xml.rule'
      Loaded rule '15-gstreamer-guess.rule'
      Loaded rule '15-playlist.rule'
      Loaded rule '15-source-code.rule'
      Loaded rule '90-gstreamer-audio-generic.rule'
      Loaded rule '90-gstreamer-image-generic.rule'
      Loaded rule '90-gstreamer-video-generic.rule'
      Loaded rule '90-text-generic.rule'
    Extractor rules loaded
    Initializing media art processing requirements...
    Found '37 GB Volume' mounted on path '/media/TV'
      Found mount with volume and drive which can be mounted: Assuming it's  removable, if wrong report a bug!
      Adding mount point with UUID: '7EFAE0646A3F8E6E', removable: yes, optical: no, path: '/media/TV'
    Found 'Summer 2012' mounted on path '/media/Music'
      Found mount with volume and drive which can be mounted: Assuming it's  removable, if wrong report a bug!
      Adding mount point with UUID: 'c94b153c-754a-48d4-af67-698bb8972ee2', removable: yes, optical: no, path: '/media/Music'
    MIME type guessed as 'audio/mpeg' (from GIO)
    Using /usr/lib64/tracker-1.0/extract-modules/
    GStreamer backend in use:
    Retrieving geolocation metadata...
    Processing media art: artist:'Best Coast', title:'The Only Place', type:'album', uri:'file:///home/sam/Downloads/Best%20Coast%20-%20The%20Only%20Place.mp3', flags:0x00000000
    Album art already exists for uri:'file:///home/sam/Downloads/Best%20Coast%20-%20The%20Only%20Place.mp3' as '/home/sam/.cache/media-art/album-165d0788d1c055902e028da9ea6db92a-b93cfc4ae665a6ebccfb820445adec56.jpeg'
    Done (14 objects added)

    SPARQL pre-update:
    INSERT {
    <urn:artist:Best%20Coast> a nmm:Artist ;
         nmm:artistName "Best Coast" .
    }
    INSERT {
    <urn:album:The%20Only%20Place:Best%20Coast> a nmm:MusicAlbum ;
         nmm:albumTitle "The Only Place" ;
         nmm:albumArtist <urn:artist:Best%20Coast> .
    }
    DELETE {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> nmm:setNumber ?unknown .
    }
    WHERE {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> nmm:setNumber ?unknown .
    }
    DELETE {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> nmm:albumDiscAlbum ?unknown .
    }
    WHERE {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> nmm:albumDiscAlbum ?unknown .  }
    INSERT {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> a nmm:MusicAlbumDisc ;
         nmm:setNumber 1 ;
         nmm:albumDiscAlbum <urn:album:The%20Only%20Place:Best%20Coast> .
    }

    SPARQL item:
     a nfo:Audio , nmm:MusicPiece ;
         nie:title "The Only Place" ;
         nie:comment "Free download from and" ;
         nmm:trackNumber 1 ;
         nfo:codec "MPEG-1 Layer 3 (MP3)" ;
         nfo:gain 0 ;
         nfo:peakGain 0 ;
         nmm:performer <urn:artist:Best%20Coast> ;
         nmm:musicAlbum <urn:album:The%20Only%20Place:Best%20Coast> ;
         nmm:musicAlbumDisc <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> ;
         nfo:channels 2 ;
         nfo:sampleRate 44100 ;
         nfo:duration 164 .

    SPARQL where clause:

    SPARQL post-update:

All the info is there, plus way more logging output that you didn’t ask for; but it’s really hard to make use of programmatically, and this feature was really only intended for testing.

In the Tracker 1.9.1 unstable release things have improved. The `tracker` commandline tool has an `extract` command which you can use to see the metadata of any file that has a corresponding extract module, and the output is readable both by other programs and by humans (at least, by developers):

    $ tracker extract ~/Downloads/Best\ Coast\ -\ The\ Only\ Place.mp3
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix nmm: <http://www.tracker-project.org/temp/nmm#> .
    @prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
    @prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .

    <urn:artist:Best%20Coast> nmm:artistName "Best Coast" ;
      a nmm:Artist .

    <urn:album:The%20Only%20Place> nmm:albumTitle "The Only Place" ;
      a nmm:MusicAlbum .

    <urn:album-disc:The%20Only%20Place:Disc1> nmm:setNumber 1 ;
      nmm:albumDiscAlbum <urn:album:The%20Only%20Place> ;
      a nmm:MusicAlbumDisc .

    <file:///media/home-backup/Best%20Coast%20-%20The%20Only%20Place.mp3> nie:comment "Free download from and" ;
      nmm:trackNumber 1 ;
      nmm:performer <urn:artist:Best%20Coast> ;
      nfo:averageBitrate 128000 ;
      nmm:musicAlbum <urn:album:The%20Only%20Place> ;
      nfo:channels 2 ;
      nmm:dlnaProfile "MP3" ;
      nmm:musicAlbumDisc <urn:album-disc:The%20Only%20Place:Disc1> ;
      a nmm:MusicPiece , nfo:Audio ;
      nfo:duration 164 ;
      nfo:codec "MPEG" ;
      nmm:dlnaMime "audio/mpeg" ;
      nfo:sampleRate 44100 ;
      nie:title "The Only Place" .

Turtle is still a bit of an esoteric format; I think this will be more fun when it can output JSON, but you can already pipe this into a tool like `rapper` or `rdfpipe` and convert it to whatever format you like.
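
For example, here’s a minimal sketch of that conversion using rdfpipe (reading Turtle on stdin via “-”; the json-ld output format assumes the rdflib JSON-LD plugin is installed):

    $ tracker extract ~/Downloads/Best\ Coast\ -\ The\ Only\ Place.mp3 \
        | rdfpipe -i turtle -o json-ld -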

Other news

I didn’t make it to GUADEC this year because I was busy raising money for charity by driving a car right across Europe, Russia and Mongolia… but I was excited to find out that our bid to host it in Manchester in 2017 has been accepted. Any UK folk wishing to get involved in the organisation, please get in touch!


Leaving the EU

I've never voted for Christmas before, but we all need to accept savage cuts.

In a few weeks the UK has a referendum over whether we should remain a member of the EU.

It’s completely impossible to make a fully informed decision on whether leaving the EU now would be ultimately beneficial, unless you can actually see ten years into the future.

I’ll be voting to remain, for a few reasons:

1. I’m proud to be from a part of Europe, and I like having the freedom to travel and work anywhere in Europe.

2. Most of the “Leave” campaign’s arguments boil down to xenophobia, fear-mongering and “economics”. Sure, we’re in the midst of a global population crisis, but leaving the EU will hardly solve that.

3. The “Leave” campaign is fronted by various awful human beings who make me want to do the opposite of whatever they say (although, comically, the “Remain” campaign is fronted by someone who doesn’t like the EU at all — that’s the return on a Faustian bargain he made a decade ago so that he could become leader of the Conservative party & Prime Minister)

4. Taking power from the EU means giving more power to the Conservative party, the same delusional, incompetent, hypocritical arms-dealing racist election cheats who have been steadily running the country into the ground for the last 6 years; ruining education, welfare, healthcare, the police, the economy, the universities, and anything else they can get their hands on.

5. A restricted border between Northern Ireland and the Republic of Ireland would be a massive step backwards for people there.

6. If we did leave, there would be no going back.

Actually I don’t care much about what we end up voting for. I became fully disillusioned with British politics five years ago, when we were given the once-in-a-lifetime opportunity to change from our hopelessly unfair voting system to one that’s slightly better. We voted 67% in favour of the most unfair voting system. There has been no UK government in my entire lifetime that I could be at all proud of. If there’s one thing you can count on, it’s the British public voting to shaft ourselves!

I would like to put on an international software conference in England next year. The upside of leaving the EU would be that presumably the Pound will be at about the lowest it could possibly be, so GUADEC would be quite cheap for everyone! The downside: maybe a lot more people would have to suffer Britain’s complex, offensive and arbitrary visa application process.

We’ll have to wait and see what happens on 23rd June. I would be surprised if we vote to leave, because money tends to control politics and there would obviously be financial losses stemming from the uncertainty that would follow a “Leave” vote. On the other hand, the world-class bastards who run our lying, scheming, racist, hate-filled newspapers are mostly anti-EU. Never underestimate the power of the British public to completely ruin things for ourselves.

The only good thing that can really come out of this referendum is a climax of infighting in the Conservative government, so grab some popcorn for that.

Image at the top from


Enormous Git Repositories

If you had a 100GB Subversion repository, where a full checkout came to about 10GB of source files, how would you go about migrating it to Git?

One thing you probably wouldn’t do is import the whole thing into a single Git repo; it’s pretty well known that Git isn’t designed for that. But, you know, Git does have some tools that let you pretend it’s a centralised version control system, and huge monolithic repos are cool, and it works in Mercurial… evidence is worth more than hearsay, so I decided to create a Git repo with 10GB of text files to see what happened. I did get told in #git on Freenode that Git will not cope with a repo that’s larger than available RAM, but I was a little suspicious given the number of multi-gigabyte Git repos in existence.

I adapted a Bash script from here to create random filenames, and used the csmith program to fill those files with nonsense C++ code, until I had 7GB of such gibberish. (I had aimed for 10GB but, having used du -s instead of du --apparent-size -s to check the size of my test data, I ended up with only 7GB of content that was using 10GB of disk space.)
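
For the curious, here’s a sketch of roughly what that generation loop could look like; the directory layout, name lengths and file extension are my guesses rather than the original script:

    # Fill a directory with randomly-named files of csmith gibberish until
    # the apparent size reaches ~7GB (du -s --apparent-size reports KiB).
    mkdir -p huge-repo && cd huge-repo
    while [ "$(du -s --apparent-size . | cut -f1)" -lt $((7 * 1024 * 1024)) ]; do
        dir=$(tr -dc 'a-z' < /dev/urandom | head -c 2)
        name=$(tr -dc 'a-z0-9' < /dev/urandom | head -c 12)
        mkdir -p "$dir"
        csmith > "$dir/$name.cpp"
    done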

The test machine was an x86 virtual machine with 2GB of RAM, 1 CPU and no swap. The repo was on a 100GB ext4 volume. Doing a performance benchmark on a virtual machine on shared infrastructure is a bad idea, but I’m testing a bad idea, so whatever. The machine ran Git version 2.5.0.


Generating the initial data: this took all night, perhaps because I included a call to du inside the loop that generated the data, which would take an increasing amount of time on each iteration.

Creating an initial 7GB commit: 95 minutes

$ time git add .
real    90m0.219s
user    84m57.117s
sys     1m6.932s

$ time git status
real    1m15.992s
user    0m4.071s
sys     0m20.728s

$ time git commit -m "Initial commit"
real    4m22.397s
user    0m27.168s
sys     1m5.815s

The git log command is pretty much instant; a git show of this commit takes a minute the first time I run it, and about 5 seconds if I run it again.

Doing git add and git rm to create a second commit is really quick; git status is still slow, but git commit is quick:

$ time git status
real    1m19.937s
user    0m5.063s
sys     0m16.678s

$ time git commit -m "Put all z files in same directory"
real    0m11.317s
user    0m1.639s
sys     0m5.306s

Furthermore, git show of this second commit is quick too.

Next I used git daemon to serve the repo over the git:// protocol:

$ git daemon --verbose --export-all --base-path=`pwd`

Doing a full clone from a different machine (with Git 2.4.3, over the intranet): 22 minutes

$ time git clone git://
Cloning into 'huge-repo'...
remote: Counting objects: 339412, done.
remote: Compressing objects: 100% (33351/33351), done.
remote: Total 339412 (delta 5436), reused 0 (delta 0)
Receiving objects: 100% (339412/339412), 752.12 MiB | 2.53 MiB/s, done.
Resolving deltas: 100% (5436/5436), done.
Checking connectivity... done.
Checking out files: 100% (46345/46345), done.

real    22m17.734s
user    2m12.606s
sys     0m54.603s

Doing a sparse checkout of a few files: 15 minutes

$ mkdir sparse-checkout
$ cd sparse-checkout
$ git init .
$ git config core.sparsecheckout true
$ echo z-files/ >> .git/info/sparse-checkout

$ time git pull  git:// master
remote: Counting objects: 339412, done.
remote: Compressing objects: 100% (33351/33351), done.
remote: Total 339412 (delta 5436), reused 0 (delta 0)
Receiving objects: 100% (339412/339412), 752.12 MiB | 2.58 MiB/s, done.
Resolving deltas: 100% (5436/5436), done.
From git://
 * branch            master     -> FETCH_HEAD

real    14m26.032s
user    1m9.133s
sys     0m22.683s

This is rather unimpressive. I only pulled a 55MB subset of the repo, a single directory, but it still took nearly 15 minutes. Pulling the same subset again from the same git-daemon process took a similar time. The .git directory of the sparse clone is the same size as with a full clone.

I think these numbers are interesting. They show that the sky doesn’t fall if you put a huge amount of code into Git. At the same time, the ‘sparse checkouts’ feature doesn’t really let you pretend that Git is a centralised version control system, so you can’t actually avoid the consequences of having such a huge repo.

Also, I learned that if you are profiling file size, you should use du --apparent-size to measure that, because file size != disk usage!
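
A quick way to see the difference for yourself is with a sparse file (the output below is what ext4 typically shows; exact figures may vary on your system):

    # A file with 1GB apparent size but no allocated blocks
    $ truncate -s 1G sparse.img
    $ du -sh sparse.img
    0       sparse.img
    $ du -sh --apparent-size sparse.img
    1.0G    sparse.img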

Disclaimer: there are better ways to spend your time than trying to use a tool for things that it’s not designed for (sometimes).


Codethink is hiring!

We are looking for people who can write code, who match one of these job descriptions at least slightly, and who are willing to relocate to Manchester, UK (so you must either be an EU resident, or able to get a work permit for the UK). Manchester is number 8 in Lonely Planet’s Best in Travel list for 2016, so really you’d be doing yourself a favour by moving here. Remote working is possible if you have lots of contributions to public software projects that demonstrate your amazingness.

There is a nice symmetry to this blog post: I remember reading a similar one quite a few years ago, which led to me applying for a job at Codethink, and I’ve been here ever since, with various trips to exotic countries in between.

If you’re interested, send a CV & cover letter to


CMake: dependencies between targets and files and custom commands

As I said in my last post about CMake, targets are everything in CMake. Unfortunately, not everything is a target though!

If you’ve tried to do anything non-trivial in CMake using the add_custom_command() command, you may have got stuck in this horrible swamp of confusion. If you want to generate some kind of file at build time, in some manner other than compiling C or C++ code, then you need to use a custom command to generate the file. But files aren’t targets, and they have all sorts of exciting limitations to make you forget everything you ever knew about dependency management.

What makes it so hard is that there’s not one limitation, but several. Here is a hopefully complete list of things you might want to do in CMake that involve custom commands and custom targets depending on each other, and some explanations as to why things don’t work the way that you might expect.

1. Dependencies between targets

This is CMake at its simplest (and best).

cmake_minimum_required(VERSION 3.2)

add_library(foo foo.c)

add_executable(bar bar.c)
target_link_libraries(bar foo)

You have a library, and a program that depends on it. When you run the build, both of them get built. Ideal! This is great!

What is “all” in the dependency graph? It’s a built-in target, and it’s the default target. There are also “install” and “test” targets built in (but no “clean” target).

2. Custom targets

If your project is a good one then maybe you use a documentation tool like GTK-Doc or Doxygen to generate documentation from the code.

This is where add_custom_command() enters your life. You may live to regret ever letting it in.

cmake_minimum_required(VERSION 3.2)

add_custom_command(
    OUTPUT docs/doxygen.stamp
    COMMAND doxygen docs/Doxyfile
    COMMAND cmake -E touch docs/doxygen.stamp
    COMMENT "Generating API documentation with Doxygen"
)

We have to create a ‘stamp’ file because Doxygen generates lots of different files, and we can’t really tell CMake what to expect. But actually, here’s what to expect: nothing! If you build this, you get no output. Nothing depends on the documentation, so it isn’t built.

So we need to add a dependency between docs/doxygen.stamp and the “all” target. How about using add_dependencies()? No, you can’t use that with any of the built-in targets. But as a special case, you can pass ALL to add_custom_target() to create a new target attached to the “all” target:

add_custom_target(
    docs ALL
    DEPENDS docs/doxygen.stamp
)


In practice, you might also want to make the custom command depend on all your source code, so it gets regenerated every time you change the code. Or, you might want to remove the ALL from your custom target, so that you have to explicitly run make docs to generate the documentation.

This is also discussed here.

3. Custom commands in different directories

Another use case for add_custom_command() is generating source code files using 3rd party tools.

### Toplevel CMakeLists.txt
cmake_minimum_required(VERSION 3.2)
add_subdirectory(src)
add_subdirectory(tests)

### src/CMakeLists.txt
add_custom_command(
    OUTPUT foo.c
    COMMAND cmake -E echo "Generate my C code" > foo.c
)

### tests/CMakeLists.txt
add_executable(
    test-foo
    test-foo.c ${CMAKE_CURRENT_BINARY_DIR}/../src/foo.c
)
add_test(
    NAME test-foo
    COMMAND test-foo
)

How does this work? Actually it doesn’t! You’ll see the following error when you run CMake:

CMake Error at tests/CMakeLists.txt:1 (add_executable):
  Cannot find source file:


  Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
  .hxx .in .txx
CMake Error: CMake can not determine linker language for target: test-foo
CMake Error: Cannot determine link language for target "test-foo".

Congratulations, you’ve hit bug 14633! The fun thing here is that generated files don’t behave anything like targets: they can only be referenced in the CMakeLists.txt file that contains the corresponding add_custom_command() call. So when we refer to the generated foo.c in tests/CMakeLists.txt, CMake actually has no idea where it could come from, so it raises an error.


As the corresponding FAQ entry describes, there are two things you need to do to work around this limitation.

The first is to wrap your custom command in a custom target. Are you noticing a pattern yet? Most of the workarounds here are going to involve wrapping custom commands in custom targets. In src/CMakeLists.txt, you do this:

add_custom_target(generate-foo DEPENDS foo.c)

Then, in tests/CMakeLists.txt, you can add a dependency between “test-foo” and “generate-foo”:

add_dependencies(test-foo generate-foo)

That’s enough to ensure that foo.c now gets generated before the build of test-foo begins, which is obviously important. If you try to run CMake now, you’ll hit the same error, because CMake still has no idea where that generated foo.c file might come from. The workaround here is to manually set the GENERATED property on the source file:

set_source_files_properties(
    ${CMAKE_CURRENT_BINARY_DIR}/../src/foo.c
    PROPERTIES GENERATED 1
)

Note that this is a bit of a contrived example. In most cases, the correct solution is to do this:

### src/CMakeLists.txt
add_library(foo foo.c)

### tests/CMakeLists.txt
target_link_libraries(test-foo foo)

Then you don’t have to worry about any of the nonsense above, because libraries are proper targets, and you can use them anywhere.

Even if it’s not practical to make a library containing ‘foo.c’, there must be some other target in the directory where it is generated that links against it. So instead of creating a “generate-foo” target, you can make “test-foo” depend on whatever other target links to “foo.c”.

4. Custom commands and parallel make

I came into this issue while doing something pretty unusual with CMake: wrapping a series of Buildroot builds. Imagine my delight at discovering that, when parallel make was used, my CMake-generated Makefile was running the same Buildroot build multiple times at the same time! That is not what I wanted!

It turns out this is a pretty common issue. The crux of it is that with the “Unix Makefiles” backend, multiple toplevel targets run as independent, parallel make processes. Files aren’t targets, and unless something is a target it doesn’t get propagated around like you would expect.

Here is the test case:

cmake_minimum_required(VERSION 3.2)

add_custom_command(
    OUTPUT gen
    COMMAND sleep 1
    COMMAND cmake -E echo Hello > gen
)

add_custom_target(
    my-all-1 ALL DEPENDS gen
)

add_custom_target(
    my-all-2 ALL DEPENDS gen
)

If you generate a Makefile from this and run make -j 2, you’ll see the following:

Scanning dependencies of target my-all-2
Scanning dependencies of target my-all-1
[ 50%] Generating gen
[100%] Generating gen
[100%] Built target my-all-2
[100%] Built target my-all-1

If creating ‘gen’ takes a long time, then you really don’t want it to happen multiple times! It may even cause disasters, for example running make twice at once in the same Buildroot build tree is not pretty at all.


As explained in bug 10082, the solution is (guess what!) to wrap the custom command in a custom target!

add_custom_target(make-gen DEPENDS gen)


Then you change the custom targets to depend on “make-gen”, instead of the file ‘gen’. Except! Be careful when doing that — because there is another trap waiting for you!

5. File-level dependencies of custom targets are not propagated

If you read the documentation of add_custom_command() closely, and you look at the DEPENDS keyword argument, you’ll see this text:

If DEPENDS specifies any target (created by the add_custom_target(), add_executable(), or add_library() command) a target-level dependency is created to make sure the target is built before any target using this custom command. Additionally, if the target is an executable or library a file-level dependency is created to cause the custom command to re-run whenever the target is recompiled.

This sounds quite nice, like more or less what you would expect. But the important bit of information here is what CMake doesn’t do: when a custom target depends on another custom target, all the file level dependencies are completely ignored.

Here’s your final example for the evening:

cmake_minimum_required(VERSION 3.2)

# The actual value is illustrative; the original example elided it.
set(SPECIAL_TEXT "Hello")

add_custom_command(
    OUTPUT gen1
    COMMAND cmake -E echo ${SPECIAL_TEXT} > gen1
)

add_custom_target(
    gen1-wrapper
    DEPENDS gen1
)

add_custom_command(
    OUTPUT gen2
    DEPENDS gen1-wrapper
    COMMAND cmake -E copy gen1 gen2
)

add_custom_target(
    all-generated ALL
    DEPENDS gen2
)

This is subtly wrong, even though you did what you were told, and wrapped the custom command in a custom target.

The first time you build it:

Scanning dependencies of target gen1-wrapper
[ 50%] Generating gen1
[ 50%] Built target gen1-wrapper
Scanning dependencies of target all-generated
[100%] Generating gen2
[100%] Built target all-generated

But then touch the file ‘gen1’, or overwrite it with some other text, or change the value of SPECIAL_TEXT in CMakeLists.txt, and you will see this:

[ 50%] Generating gen1
[ 50%] Built target gen1-wrapper
[100%] Built target all-generated

There’s no file-level dependency created between ‘gen2’ and ‘gen1’, so ‘gen2’ never gets updated, and things get all weird.


You can’t just depend on gen1 instead of gen1-wrapper, because it may end up being built multiple times! See the previous point. Instead, you need to depend on the “gen1-wrapper” target and the file itself:

add_custom_command(
    OUTPUT gen2
    DEPENDS gen1-wrapper gen1
    COMMAND cmake -E copy gen1 gen2
)

As the documentation says, this only applies to targets wrapping add_custom_command() output. If ‘gen1’ were a library created with add_library(), things would work how you expect.



Maybe I just have a blunt head, but I found all of this quite difficult to work out. I can understand why CMake works this way, but I think there is plenty of room for improvement in the documentation where this is explained. Hopefully this guide has gone some way to making things clearer.

If you have any other dependency-related traps in CMake that you’ve hit, please comment and I’ll add them to this list…


Some CMake tips

[Image: sketch of John Barber’s gas turbine, from his patent]

I spent the past few weeks converting a bunch of Make and Autotools-based modules to use CMake instead. This was my first major outing with CMake. Maybe there will be a few blog posts on that subject!

In general I think CMake has a sound design and I quite want to like it. It seems like many of its warts are due to its long history and the need for backwards compatibility, not anything fundamentally broken. To keep a project going for 16 years is impressive and it is pretty widely used now. This is a quick list of things I found in CMake that confused me to start with but ultimately I think are good things.

  1. Targets are everything

    CMake is pretty similar to normal make in that all the things that you care about are ‘targets’. Libraries are targets, programs are targets, subdirectories are targets, and custom commands create files which are considered targets. You can also create custom targets which run commands when executed. You need to use the custom targets feature if you want a custom command to be tied to the default target, which is a little confusing but works OK.

    Targets have properties, which are useful.

  2. Absolute paths to shared libraries

    Traditionally you link to libfoo by passing -lfoo to the linker. Then, if libfoo is in a non-standard location, you pass -L/path/to/foo -lfoo. I don’t think pkg-config actually enforces this pattern, but pretty much all the .pc files I have installed use the -L/path -lname pattern (there’s an illustration at the end of this item).

    CMake makes this quite awkward to do, because it makes every effort to forget about the linker paths. Library ‘targets’ in CMake keep track of associated include paths, dependent libraries, compile flags, and even extra source files, using ‘target properties’. There’s no target property for LINK_DIRECTORIES, though, so outside of the current CMakeLists.txt file they won’t be tracked. There is a global LINK_DIRECTORIES property, confusingly, but it’s specifically marked as “for debugging purposes.”

    So the recommended way to link to libraries is with the absolute path. Which makes sense! Why say with two commandline arguments what you can say with one?

    At least, this will be fine once CMake’s pkg-config integration returns absolute paths to libraries.
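
    For illustration, here’s roughly what pkg-config emits in each case (the /opt prefix is made up; the exact flags depend on where the library is installed):

    $ pkg-config --libs glib-2.0
    -lglib-2.0
    $ PKG_CONFIG_PATH=/opt/glib/lib/pkgconfig pkg-config --libs glib-2.0
    -L/opt/glib/lib -lglib-2.0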

  3. Semicolon safety instead of whitespace safety

    CMake has a ‘list’ type, which is actually a string with ; (semicolon) used to delimit entries. Spaces are used as an argument separator, but converted to semicolons during argument parsing, I think. Crucially, they seem to be converted before variable expansion is done, which means that filenames with spaces don’t need any special treatment. I like this more than shell code, where I have to quote literally every variable (or else Richard Maw shouts at me).

    For example:

    cmake_minimum_required(VERSION 3.2)
    set(path "filename with spaces")
    set(command ls ${path})
    foreach(item ${command})
        message(item: ${item})
    endforeach()

    The output:

    item:ls
    item:filename with spaces

    On the other hand:

    cmake_minimum_required(VERSION 3.2)
    set(path "filename;with\;semicolons")
    set(command ls ${path})
    foreach(item ${command})
        message(item: ${item})
    endforeach()

    This time the filename doesn’t survive: the list expansion splits it apart at the semicolons.

    Semicolons occur less often in file names, I guess. Most of us are trained to avoid spaces, partly because we know how broken most (all?) shell-based build systems are in those cases. CMake hasn’t actually solved this but just punted the special character to a less often used one, as far as I can see. I guess that’s an improvement? Maybe?

    The semicolon separator can bite you in other ways. For example, when specifying CMAKE_PREFIX_PATH (the library and header search path) you might expect this to work:

    cmake . -DCMAKE_PREFIX_PATH=/opt/path1:/opt/path2

    However, that won’t work (unless you did actually mean that to be one item). Instead, you need to pass this:

    cmake . -DCMAKE_PREFIX_PATH=/opt/path1\;/opt/path2

    Of course, ; is a special character in UNIX shells so must be escaped.

  4. Ninja instead of Make

    CMake supports multiple backends, and Ninja is often faster than GNU Make, so give the Ninja backend a try: cmake -G Ninja.
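
    For example, a minimal out-of-source build with the Ninja generator (the build directory name is just a convention):

    $ mkdir build && cd build
    $ cmake -G Ninja ..
    $ ninja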

  5. Policies

    The CMake developers seem pretty good at backwards compatibility. To this end they have introduced the rather obtuse policies framework. The great thing about the policies framework is that you can completely ignore it, as long as you have cmake_minimum_required(VERSION 3.3) at the top of your toplevel CMakeLists.txt. You’ll only need it once you have a massive bank of existing CMakeLists.txt files and you are getting started on porting them to a newer version of CMake.

    Quite a lot of CMake error messages are worded to make you think that you might need to care about policies, but don’t be fooled. Mostly these errors are for situations where there didn’t use to be an error, I think, and so the policy exists to bring back the ‘old’ behaviour, if you need it.

If a tool is weird but internally consistent, I can get on with it. Hopefully, CMake is getting there. I can see there have been a lot of good improvements since CMake 2.x, at least. And at no point so far has it made me more angry than GNU Autotools. It’s not crashed at all (impressive given it’s entirely C++ code!). And it is significantly faster and more widely applicable than Autotools or artisanal craft Makefiles. So I’ll be considering it in future. But I can’t help wishing for a build system that I actually liked.

Edit: you might also be interested in a list of common CMake antipatterns.
