Tracker documentation improvements

Word Cloud of Tracker ontology documentationIt’s cool storing stuff in a database, but what if you shared the database schema so other tools can work with the data? That’s the basic idea of Linked Data which Tracker tries to follow when indexing your content.

In a closed music database, you might see a “Music” table with a “name” column. What does that mean? Is it the name of a song, an artist, an album, … ? You will have to do some digging to find out.

When Tracker indexes your music, it will create a table called nmm:MusicAlbum. What does that mean? You can click the link to find out, because the database schema is self-documenting. The abbreviation nmm:MusicAlbum expands to a URL, which clearly identifies the type of data being stored.

By formalising the database schema, we create a shared vocabulary for talking about the data. This is very powerful – have you seen GMail Highlights, where a button appears in your email inbox to checkin for a flight and such things? These are powered by the https://schema.org/ shared vocabulary. Google don’t manually add support to GMail for each airline in the world. Instead, the airlines embed a https://schema.org/FlightReservation resource in the confirmation email which GMail uses to show the information. The vocabulary is an open standard, so other email providers can use the same data and even propose improvements. Everyone wins!

Recent improvements

Tracker began 5 years before the creation of schema.org, and we use an older vocabulary from a project called Nepomuk. Tracker may now be the only user of the Nepomuk vocabularies, but to avoid a huge porting effort we have opted to keep using them for 3.0.

Inspired by schema.org documentation, I changed the formatting of Tracker’s schema documentation trying to pack the important information more densely. Compare the 2.x documentation to the 3.x documentation to see what has changed – I think it’s a lot more readable now.

We have also stopped using broken or incorrect URLs. The https://tracker.api.gnome.org/ namespace was recently set up by the incredibly efficient GNOME sysadmins and we can trust it not to disappear at random, unlike the http://tracker-project.org/ and https://www.semanticdesktop.org/ontologies/ namespaces we were using before.

One thing you will notice if you followed the nmm:MusicAlbum link above is that the contents of the documentation still requires some improvement. I hope to see incremental improvements here; if you think you can make it better, please send us a merge request !

CLI documentation

We maintain documentation for the tracker CLI tool in the form of man pages. These were a bit neglected. We now publish the man pages online making it easier to read them and harder to forget they exist.

Internally this is done using Asciidoc and xmlto, plus a small Python script to post-process the output.

User documentation

There is a well-written and quite outdated set of documentation at https://wiki.gnome.org/Projects/Tracker. It’s mostly aimed at setting up Tracker on systems where it doesn’t come ready-integrated – which is a use case we don’t really want to support. I’m a bit stuck as I don’t want to delete what is quite good content, but I also don’t want to maintain documention for things that nobody should need to do…

Documentation hosting

This is a periodic reminder that the library-web script that manages developer.gnome.org needs a major reworking, such as the one proposed here. All the Tracker documentation on http://developer.gnome.org/ is years out of date, because we switched to Meson which requires us to do extra effort on each release to post the documentation. Much kudos is awaiting the people who can resolve this.

Stay tuned

Work is proceeding nicely on Tracker 3.0 and we hope to have the first beta release ready within the next couple of weeks. At that point, there will be opportunities to help with testing app ports and making sure performance is good – I will keep you posted here!

Tracker developer experience improvements

There have been lots of blog posts since I suggested we write more blog posts. Great! I’m going to write about what I’ve done this month.

I’m excited that work started on Tracker 3.0, after we talked about it at GUADEC 2019. We merged Carlos’ enourmous branch to modernize the Tracker store database. This has broken some tests in tracker-miners, and the next step will be to track down and fix these regressions.

I’ve continued looking at the developer experience of Tracker. Recently we modernized the README.md file (as several GNOME projects have done recently). I want the README to document a simple “build and test Tracker from git” workflow, and that led into work making it simpler to run Tracker from the build tree, and also a bunch of improvements to the test suite.

The design of Tracker has always meant that it’s a pain in the ass to build and test, because to do anything useful you need to have 3 different daemons running and talking to each other over D-Bus, reading and writing data in the same location, and communicating with the CLI or an app. We had a method for running Tracker from the build tree for use by automated tests, whose code was duplicated in tracker.git and tracker-miners.git, and then we had a separate script for developers to test things manually, but you still had to install Tracker to use that one. It was a bit of a mess.

The first thing I fixed was the code duplication. Now we have a Python module named trackertestutils. We install it, so we don’t need to duplicate code between tracker.git and tracker-miners.git any more. Thanks to Marco Trevisan we also install a pkgconfig file.

Then I added a ./run-uninstalled script to tracker-miners.git. The improvement in developer experience I think is huge. Now you can do this to try out the latest Tracker code:

    git clone tracker-miners.git
    cd tracker-miners && mkdir build && cd build
    meson .. && ninja
    ./run-uninstalled --wait-for-miner=Files --wait-for-miner=Extract -- tracker index --file ~/Documents
    ./run-uninstalled -- tracker search "Hello"

The script is a small wrapper around trackertestutils, which takes care of spawning a private D-Bus daemon, collecting and filtering logs, and setting up the environment so that the Tracker cache is written to `/tmp/tracker-data`. (At the time of writing, there are some bug still and ./run-installed actually still requires you to install Tracker first.)

I also improved logging for Tracker’s functional-test suite. Since a year ago we’ve been running these tests in CI, but there have been some intermittent failures, which were hard to debug because log output from the tests was very messy. When you run a private D-Bus session, all kinds of daemons spawn and dump stuff to stdout. Now we set G_MESSAGE_PREFIXED in the environment, so the test harness can separate the messages that come from Tracker processes. It’s already allowed me to track down some of these annoying intermittent failures, and to increase the default log verbosity in CI.

Another neat thing about installing trackertestutils is that downstream projects can use it too. Rishi mentioned at GUADEC that gnome-photos has a test which starts the photos app and ends up displaying the actual photo collection of the user who is running the test. Automated tests should really be isolated from the real user data. And using trackertestutils, it’s now simple to do that: here’s a proof of concept for gnome-photos.

And I made a new tune!