Tracker 3.0: What’s New?

This is part 2 of a series. Part 1 is here. Come back next week to find out more about the history and future of Tracker.

It was only a single line in the release notes. There weren’t any new graphics to show in the video. We leave cool UIs to others.

So what do we have to show, after a year of focused effort and a series of disruptive changes to Tracker and GNOME?

A complete redesign

Where earlier efforts failed, Tracker has made full-text search a first-class feature in GNOME.

However, the shortcomings of the 2000s-era design have been clear for a while.

Back in 2005, around the time your grandparents first met each other on Myspace, it seemed a great idea to aggregate all the metadata we could find into a single database. The old tracker-store database from Tracker 2.x includes the search index created by Tracker Miner FS right next to user data stored by apps like Notes, Photos and Contacts. This was going to allow cool features like tagging people in your photo collection with their phone number and online status (before the surveillance-advertising industry showed how creepy that actually is). Ivan Frade’s decade-old talk “Semantic social desktop & mobile devices” is a great insight into the thinking of the time.

I’m going to dig into Tracker’s origins in a future article, but for now — note that “Security on graphs” is listed in Ivan’s presentation as a “to-do” item.

Security

In a world of untrusted Flatpak apps, “to-do” isn’t good enough. Any app that uses the system search service, or even just stores RDF data with Tracker, requests a Flatpak D-Bus permission for org.freedesktop.Tracker1. This gives access to the entire tracker-store database, right down to the search terms indexed from the ‘Documents’ folder. Imagine your documents as a savoury snack, stored in a big monolithic building. You accidentally install and run a malicious app, represented here as a hungry seagull…

To solve this, we had to make access control more granular. It didn’t make sense to retrofit this to tracker-store. During 2019, Carlos casually eliminated the monolithic tracker-store altogether and in its place implemented a desktop-wide distributed database, taking Tracker from a “public-by-default” model to a “private-by-default” one.

The new libtracker-sparql-3 API lets apps store SPARQL data anywhere they like. You can keep it private, if you just want a lightweight database. Nautilus and Notes are already doing this, to store starred files and note data respectively.

If and when you want to publish data, it’s done by creating a TrackerEndpoint on DBus. Using a SPARQL federated query, one Tracker SPARQL database can pull data from multiple others in a single query. This, for example, allows Photos to merge photo metadata from the search index with album metadata stored in its own database. (I wrote more about this back in March).
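
To make this concrete, here is a rough Python sketch of opening a private database and then publishing it, using the GObject introspection bindings. The database path is a made-up example and error handling is left out; treat the exact calls as an illustration rather than a copy-paste recipe.

import gi
gi.require_version('Tracker', '3.0')
from gi.repository import Gio, GLib, Tracker

# Open (or create) a private SPARQL database for this app.
store = Gio.File.new_for_path('/home/me/.local/share/my-app/db')
conn = Tracker.SparqlConnection.new(
    Tracker.SparqlConnectionFlags.NONE, store,
    Tracker.sparql_get_ontology_nepomuk(), None)

# Publish it on the session bus so other Tracker SPARQL clients can reach it
# with SERVICE queries. (None for the object path and graph arguments keeps
# the defaults.)
bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)
endpoint = Tracker.EndpointDBus.new(conn, bus, None, None, None)

GLib.MainLoop().run()

Another process can then run federated queries against this endpoint by referring to the publishing app’s D-Bus name.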

The search index created by Tracker Miner FS is published at org.freedesktop.Tracker3.Miner.Files, but we don’t let Flatpak apps access this directly. A new Flatpak portal gates access to search based on content type. You can now install a music player app and let it search ONLY your music collection, where previously your options were “let the app search everything” or “break it”.

A clearer architecture

“I’m finally starting to ‘get’ tracker 3. And it’s like an epiphany.”

Antonio

If someone asked “What actually is Tracker?” I used to find it tricky to answer. We narrowed the focus down to two things: a lightweight database, and a search engine.

For the last three years we have worked on separating these two concerns, and as of Tracker 3.0 we are done. Were we starting from scratch, we would find clearer names for the two parts than ‘tracker’ and ‘tracker-miners’, but we kept the repo and package names the same to avoid making the 2.x to 3.x transition harder for distributors.

The name “Tracker” refers to the overall project. You can use “GNOME Tracker” for clarity where needed. The project maintains two code repositories:

  • Tracker SPARQL: a distributed database, provided as a GObject C library and implementing the full SPARQL 1.1 query standard.
  • Tracker Miners: a content indexer for the desktop, providing the Tracker Miner FS system service and its companion Tracker Extract.

The tracker3 commandline tool can operate on any Tracker SPARQL database, and it has some extensions for searching and managing the Tracker Miner FS indexer.

Decentralisation

The headline feature is that there’s one less reason to claim “Flatpak’s sandbox is a lie!”. Decentralisation brings more benefits too:

  • You can back up app data by running tracker3 export on the app’s SPARQL database. Useful for Notes, Photos and more.
  • Apps can bundle Tracker Miners inside Flatpak, allowing them to run on platforms that don’t ship a suitable version of Tracker Miners in the base OS.
  • Apps are no longer limited to the Nepomuk data model when storing data. Tracker Miner FS still uses the Nepomuk ontologies, but apps can write their own. Distributed queries work even across different data models.
  • Tracker’s test suite now sets up a private database for testing using public API, avoiding some hideous hacks.
  • App test suites can also set up private databases and even a private instance of the indexer (a rough sketch follows this list). GNOME’s search and content apps have rather low test coverage at present; I suspect this is partly because the old design of Tracker made it hard to write good tests.
  • A distributed database is fundamentally a cool thing that you definitely need.
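
As a rough illustration of that testing point, an app test could create a throwaway database like this (assuming the GObject introspection bindings and a pytest-style tmp_path fixture; none of this touches the real index):

import gi
gi.require_version('Tracker', '3.0')
from gi.repository import Gio, Tracker

def test_private_database(tmp_path):
    # Every test gets its own empty store under a temporary directory.
    conn = Tracker.SparqlConnection.new(
        Tracker.SparqlConnectionFlags.NONE,
        Gio.File.new_for_path(str(tmp_path / 'db')),
        Tracker.sparql_get_ontology_nepomuk(), None)
    # The built-in Nepomuk ontology is loaded into the fresh database, so
    # even an empty store can answer simple queries about its own classes.
    cursor = conn.query('SELECT ?c { ?c a rdfs:Class } LIMIT 1', None)
    assert cursor.next()
    conn.close()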

Stability

A system service, like a Victorian child, should be “seen and not heard”. Nobody wants the indexer to drain the battery, burn out the fan or lock up the desktop.

We prioritize any issue which reports that the Tracker daemons have been behaving badly. In collaboration with many helpful bug reporters, we removed two codepaths in 3.0 that could trigger high CPU usage. One major change is that we no longer index all plain text files, only those with an allowed extension. If you unpack Linux kernel tarballs in your Music folder, this is for you! (Remember that Tracker isn’t designed to index source code.) We also dropped a buggy and pointless codepath that tried, and mostly failed, to extract metadata from random image/* type files using GStreamer’s Discoverer API.

Tracker Extract is designed for robustness but it also needs to report errors. If extraction of foo.flac fails, it may indicate a bug in Tracker, GStreamer or libflac, or (more likely) that the file is corrupt or mis-labelled. In Tracker Miners 3 we have improved how extraction errors are reported — instead of using the journal, we log errors to disk (at ~/.cache/tracker3/files/errors/). This prevents any ‘spamming’ of the journal when many errors are detected. You can check for errors by running tracker3 status. Perhaps Nautilus could make these errors visible in future too.

Since Tracker Miners 3.0.0 was released, distro beta testers found two issues that could cause high CPU usage. These are fixed in the 3.0.1 release. If you see any issues with Tracker Miners 3.0.1, please report them on GitLab!

Here I also want to mention Benjamin’s excellent work to improve resource management for system services. Tracker Miner FS tries to avoid heavy resource use, but filesystem IO is infinitely complicated and we cannot defend against every possible situation. Strange filesystems or bugs in dependencies can cause high CPU or IO consumption. If the kernel’s scheduler is not smart, it may focus on these tasks at the expense of the important shell and app processes, leaving the desktop effectively locked. Benjamin’s work lets the kernel know to prioritize a responsive desktop above system services like Tracker Miner FS. Mac OS X has been able to do this since 2013.

Whatever this decade brings, it should be free from desktop lockups!

Standards

The 2.x to 3.x transition was difficult partly because Tracker 2.x was missing some big pieces of the SPARQL standard. Implementing a 3.x-to-2.x translation layer was out of the question — we had no motivation to re-implement the quirks of 2.x just so apps could avoid porting to 3.x.

I don’t see another major version break in Tracker’s near future, but we are now prepared. Tracker implements almost all of the SPARQL 1.1 standard.

SPARQL is not without its drawbacks — more on that in a future article — but aside from a few simple C and DBus interfaces, all of Tracker’s functionality is accessible through this W3C standard query language. Better to reuse standards than to make our own.

…and more

We have a new website and improved documentation. The tracker3 commandline tool saw loads of cleanups and improvements. Files are automatically re-processed when the relevant tracker-extract module changes — a ten-year-old feature request. Debugging is nicer, as a keyword-enabled TRACKER_DEBUG variable replaces the old TRACKER_VERBOSITY. Deprecated APIs and dependencies are gone, including the venerable intltool. The core and Nepomuk ontologies are slimmed down and better organised. We measure test coverage, and it is higher than ever. We enabled Coverity static analysis too, which has found some obscure bugs. I’m no doubt forgetting some things.

A few of these changes impact everyone, but mostly the improvements benefit power users, app developers, and ourselves as maintainers. It’s crucial that a volunteer-driven project like Tracker is easy and fun to maintain, otherwise it can only fail. I think we have paved the way for a bright future.

Come back next week to find out more about the background and future of Tracker.

Tracker 3.0: It’s Here!

This is part 1 of a series. Come back next week to find out more about the changes in Tracker 3.0.

It’s too early to say “Job done”. But we’ve passed the biggest milestone on the project we announced last year: version 3.0 of Tracker is released and the rollout has begun!

We wanted to port all the core GNOME apps in a single release, and we almost achieved this ambitious goal. Nautilus, Boxes, Music, Rygel and Totem all now use Tracker 3. Photos will require 2.x until the next release. Outside of GNOME core, some apps are ported and some are not, so we are currently in a transitional period.

The important thing is that only Tracker Miner FS 3 needs to run by default. Tracker Miner FS is the filesystem indexer which allows apps to do instant search and content discovery.

Since Photos 3.38 still uses Tracker 2.x we have modified it to start Tracker Miner FS 2 along with the app. This means the filesystem index in the central Tracker 2 database is kept up-to-date while Photos is running. This will increase resource usage, but only while you are using Photos. Other apps which are not yet ported may want to use the same method while they finish porting to Tracker 3 — see Photos merge request 142 for how it’s done.

Flatpak apps can safely use Tracker Miner FS 3 on the host, via Tracker’s new portal which guards access to your data based on the type of content. It’s up to the app developer whether they use the system Tracker Miner service, or whether they run another instance inside the sandbox. There are upsides and downsides to both approaches.

We published some guidance for distributors in this thread on discourse.gnome.org.

Gratitude

We all owe thanks to Carlos for his huge effort re-thinking and re-implementing the core of Tracker. We should also thank Red Hat for sponsoring some of this work.

I also want to thank all the maintainers who collaborated with us. Marinus and Jean were early adopters in GNOME Music and gave valuable feedback including coming to the regular meetings, along with Jens who also ported Rygel early in the cycle. Bastien dug into reviewing the tracker3 grilo plugin, and made some big improvements for building Tracker Miners inside a Flatpak. In Nautilus, Ondrej and Antonio did some heroic last minute review of my branch and together we reworked the Starred Files feature to fix some long standing issues.

The new GNOME VM images were really useful for testing and catching issues early. The chat room is very responsive and friendly: Abderrahim, Jordan and Valentin all helped me a lot to get a working VM with Tracker 3.

GNOME’s release team were also responsive and helpful, right up to the last minute freeze break request which was crucial to avoiding a “Tracker Miner FS 2 and 3 running in parallel” scenario.

Thanks also to GNOME’s translation teams for keeping up with all the string changes in the CLI tool, and to distro packagers who are now working to make Tracker 3 available to you.

Coming soon to your distro.

It takes time for a new GNOME release to reach users, because most distros have their own testing phase.

We can use Repology to see where Tracker 3 is available. Note that some distros package it in a new tracker3 package while others update the existing tracker package.

Let’s see both:

[Repology packaging status badges for the tracker and tracker3 packages]

Coming up…

I have a lot more to write about following the Tracker 3.0 release. I’ll be publishing a series of blog posts over the next month. Make sure you subscribe to my blog or to Planet GNOME to see them all!

Tracker at GUADEC 2020

GNOME’s conference is online this year, for obvious reasons. I spent the last 3 months teaching online classes, so hopefully I’m prepared! I’m sad that there’s no Euro-trip this year and we can’t hang out in the pub, but nice that we’re saving hundreds of plane journeys.

There will be two talks related to Tracker: Carlos and I speaking about Tracker 3 (Friday 23rd July, 16.45 UTC), and myself on how to deal with the challenges of working on GNOME’s session-wide daemons (Thursday 22nd July, 16.45 UTC). There are plenty of other fascinating talks, including, inevitably, one scheduled at the same time as ours which you should, of course, watch as a replay during the break 🙂

Self-contained Tracker 3 apps

Let’s go back one year. The plan for Tracker 3 emerged when I spoke to Carlos Garnacho at GUADEC 2019 in Thessaloniki, probably over a Freddo coffee like this one…

5 people drinking coffee in Thessaloniki

We had lots of improvements we wanted to make, but we knew we were at the limit of what we could do to Tracker while keeping compatibility with the 10+ year old API. Changing a system service isn’t easy though (hence the talk). I’m a fan of the ‘Flatpak model’ of app deployment, and one benefit is that it can allow the latest apps to run on older LTS distributions. But there’s no magic there – this only works if the system and session-wide services follow strict compatibility rules.

Anything that wants to be running as a system service in combination with any kind of sandboxing system must have a protocol that is ABI stable and backwards compatible. (From https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1001#note_370588157)

Tracker 3.0 adds important features for apps and users, but these changes require apps to use a new D-Bus API which won’t be available on older operating systems such as Ubuntu 20.04.

We’re considering various ways around this, and one that I prototyped recently is to bundle Tracker3 inside the sandbox. The downside is that some folders will be double indexed on systems where we can’t use the host’s Tracker, but the upside is the app actually works on all systems.

I created a branch of gnome-music demoing this approach. GNOME’s CI is so cool now that you can just go to that page, click ‘View exposed artifact’, then download and install a Flatpak bundle of gnome-music using Tracker 3! If you do, please comment on the MR about whether it works for you 🙂 Next on my list is GNOME Photos, but this is more complex for various reasons.

Blocklists and Allowlists

The world needs major changes to stamp out racism, and renaming variables in code isn’t a major change. That said, the terms ‘blacklist’ and ‘whitelist’ rely on and reinforce an association of ‘black bad, white good’. I’m happy to see a trend of replacing these terms at Google, in Linux, at the IETF, and more.

It was simple to switch Tracker 3 to use the more accurate terms ‘blocklist’ and ‘allowlist’. I also learned something about stable releases — I merged a change to the 2.3 branch, but I didn’t realise that we consider the stable branch to be in ‘string freeze’ forever. (It’s obvious in hindsight 🙂) We’ve now reverted that, but a few translation teams already updated their translations, so to the Spanish, Brazilian Portuguese and Romanian translators – sorry for creating extra work for you!

Acknowledging merge requests

I’ve noticed while working on app porting that some GNOME projects are quite unresponsive to merge requests. I’ve been volunteering my time as a GNOME contributor for longer than I want to remember, but it still impacts my motivation if I send a merge request and nobody comments. Part of the fun of contributing to GNOME is being part of such a huge and talented community. How many potential contributors have we lost simply by ignoring their contributions?

Video of paper aeroplanes falling to the street

This started me thinking about how to improve the situation. Being a GNOME maintainer is not easy and is in most cases unpaid, so it’s not constructive to simply complain about the situation. Better if we can mobilise people with free time to look at whatever uncommented merge requests need attention! In many cases you can give useful feedback even if you don’t know the details of the project in question – if there’s a problem, it doesn’t take a maintainer to point it out.

So my idea, which I intend to raise somewhere other than my blog when I have the time, is that we could have a bot that posts to discourse.gnome.org every Friday with a list of merge requests that are over a week old and haven’t received any comments. If you’re bored on a Friday afternoon or during the weekend you’ll be able to pick a merge request from the list and give some feedback to the contributor – a simple “Thanks for the patch, it looks fine to me” is better than silence.

Let me know what you think of the idea! Can you think of a better way to make sure we have speedy responses to merge requests?

Badge: I'm presenting at GUADEC 2020

See you there!

Tracker in Summer

Lots of effort is going into Tracker at the moment. I was waiting for a convenient time to blog about it all, but there isn’t a convenient moment on a project like this, just lots of interesting tasks all blocked on different things.


App porting

With the API changes mostly nailed down, our focus moved to making initial Tracker 3 ports of the libraries and apps that use Tracker. This is a crucial step to prove that the new design works as we expect, and has helped us to find and fix loads of rough edges. We want to work with the maintainers of each app to finish off these ports.

If you want to help, or just follow along with the app porting, the process is being tracked in this GNOME Initiatives issue.

The biggest success story so far is GNOME Music. The maintainers Jean and Marinus are regular collaborators in #tracker and in our video meetings, and we’ve already got a (mostly) working port to Tracker 3. You can download a Flatpak build from that merge request, but note that it requires tracker-miners 3.0 installed on your host.

We’re hoping we can work around the host dependency in most cases, but I got excited and made unofficial Fedora packages of Tracker 3 which allowed me to try it out on my laptop.

We are also happy that GTK can be built against Tracker 3 already, and excited about the work in progress on Rygel. At the time of writing, the other apps with Tracker 3 work in progress are Boxes, Files, Notes, Photos and Videos. Some of these use the new tracker3 Grilo plugin, which we hope a Grilo maintainer will be able to review and merge soon. All help with finishing these branches and the remaining apps will be very welcome.

Release strategy

We have been putting thought into how to release Tracker 3. We need collaboration on two sides: we need app maintainers to volunteer their time and energy to review, test and merge the Tracker 3 changes in their apps, and we need distros to volunteer their time to package the new version and release it.

We have some tricky puzzles to solve, the main one being how an app might switch to Tracker 3 without breaking on Ubuntu 20.04 and other distros that are unlikely to include Tracker 3, but are likely to host the latest Flatpak apps.

We are hoping to find a path forward that satisfies everyone; again, you can follow the discussion in Initiative issue #17.

As you can see, we are volunteering a lot of our time at the moment to make sure this complicated project is a success.

Data exporting

We made it more convenient to export data from Tracker databases, with the tracker export command. It’s nice to have a quick way to see exactly what is stored there. This feature will also be crucial for exporting app data such as photo albums and starred files from the centralized Tracker 2 database.

Hardware testing with umockdev

The removable device support in Tracker goes largely untested, because you need to actually plug and unplug a real USB device to exercise it. As always, for a volunteer-driven project like Tracker it’s vital that testing the code is as easy as possible.

I recently discovered umockdev and decided to give it a spin. I started with the power management code because it’s super simple – on a low battery notification, we stop the indexer. I’m happy with the test code but unfortunately it fails on GNOME’s CI runners with an error from umockdev:

sendmsg_one: cannot connect to client's event socket: Permission denied

I’m not sure when I’ll be motivated to dig into why this fails, since the problem only reproduces on the CI runners, so if anyone has a pointer on what’s wrong then please comment on the MR.

GUADEC

Due to the COVID-19 pandemic, GUADEC will be an online event but Tracker will be covered in two talks, “Tracker: The Future is Present” on the Friday, and my talk “Move Fast and Break Things” on Thursday.

The pandemic also means I’m likely to be spending the whole summer here in Galicia which can hardly be seen as bad luck. Here’s a photo of a beautiful spot I discovered recently about 30km from where I live:

Next steps

Carlos is working on some final API tweaks before we make another Tracker 2.99 beta release, after which the API should be fully stable. The Flatpak portal is also nearly ready.

We hope to see progress with app ports. This depends more and more on when app developers can volunteer their time to collaborate with us. Progress in the next few weeks will decide whether we target GNOME 3.38 (September 2020) or GNOME 3.40 (March 2021) for switching everything over to Tracker 3.

Unlike GTK 4, I can’t show any cool screenshots. I do have some ideas about how to demonstrate the improvements over Tracker 2, however … watch this space!

As always, we are available on IRC/Matrix in #tracker and you are welcome to join our online meetings.

Tracker documentation improvements

Word cloud of the Tracker ontology documentation

It’s cool storing stuff in a database, but what if you shared the database schema so other tools can work with the data? That’s the basic idea of Linked Data, which Tracker tries to follow when indexing your content.

In a closed music database, you might see a “Music” table with a “name” column. What does that mean? Is it the name of a song, an artist, an album, … ? You will have to do some digging to find out.

When Tracker indexes your music, it will create a table called nmm:MusicAlbum. What does that mean? You can click the link to find out, because the database schema is self-documenting. The abbreviation nmm:MusicAlbum expands to a URL, which clearly identifies the type of data being stored.

By formalising the database schema, we create a shared vocabulary for talking about the data. This is very powerful – have you seen GMail Highlights, where a button appears in your email inbox to check in for a flight and such things? These are powered by the https://schema.org/ shared vocabulary. Google don’t manually add support to GMail for each airline in the world. Instead, the airlines embed a https://schema.org/FlightReservation resource in the confirmation email, which GMail uses to show the information. The vocabulary is an open standard, so other email providers can use the same data and even propose improvements. Everyone wins!

Recent improvements

Tracker began 5 years before the creation of schema.org, and we use an older vocabulary from a project called Nepomuk. Tracker may now be the only user of the Nepomuk vocabularies, but to avoid a huge porting effort we have opted to keep using them for 3.0.

Inspired by schema.org documentation, I changed the formatting of Tracker’s schema documentation trying to pack the important information more densely. Compare the 2.x documentation to the 3.x documentation to see what has changed – I think it’s a lot more readable now.

We have also stopped using broken or incorrect URLs. The https://tracker.api.gnome.org/ namespace was recently set up by the incredibly efficient GNOME sysadmins and we can trust it not to disappear at random, unlike the http://tracker-project.org/ and https://www.semanticdesktop.org/ontologies/ namespaces we were using before.

One thing you will notice if you followed the nmm:MusicAlbum link above is that the contents of the documentation still require some improvement. I hope to see incremental improvements here; if you think you can make it better, please send us a merge request!

CLI documentation

We maintain documentation for the tracker CLI tool in the form of man pages. These were a bit neglected. We now publish the man pages online making it easier to read them and harder to forget they exist.

Internally this is done using Asciidoc and xmlto, plus a small Python script to post-process the output.

User documentation

There is a well-written but quite outdated set of documentation at https://wiki.gnome.org/Projects/Tracker. It’s mostly aimed at setting up Tracker on systems where it doesn’t come ready-integrated – which is a use case we don’t really want to support. I’m a bit stuck, as I don’t want to delete what is quite good content, but I also don’t want to maintain documentation for things that nobody should need to do…

Documentation hosting

This is a periodic reminder that the library-web script that manages developer.gnome.org needs a major reworking, such as the one proposed here. All the Tracker documentation on http://developer.gnome.org/ is years out of date, because we switched to Meson, which requires extra effort from us on each release to publish the documentation. Much kudos is awaiting the people who can resolve this.

Stay tuned

Work is proceeding nicely on Tracker 3.0 and we hope to have the first beta release ready within the next couple of weeks. At that point, there will be opportunities to help with testing app ports and making sure performance is good – I will keep you posted here!

API changes in Tracker 3.0

Ifton Meadows

This article has been updated to correct a misunderstanding I had about the CONSTRAINT feature. Apps will not need to explicitly add this to their queries; it will be added implicitly by the xdg-tracker-portal process.

Lots has happened in the 2 months since my last post, most notably the global coronavirus pandemic … in Spain we’re in week 3 of quarantine lockdown already and no one knows when it is going to end.

Let’s take our mind off the pandemic and talk about Tracker 3.0. At the start of the year Carlos worked on some key API changes which are now merged. It’s a good opportunity to recap what’s really changing in the new version.

I made the developer documentation for Tracker 3.0 available online. Thanks to GitLab, this can be updated every time we merge a change in Git. The documentation is a work in progress and we’d appreciate any help you can give us to improve it.

The documentation contains a migration guide, but let’s have a broader look at some common use cases.

Tracker 3.0 is still in development and things may change! We very much welcome feedback from app developers who are going to use this API.

Browsing and searching

The big news in Tracker 3.0 is decentralization. Each app can now manage its own private database! There’s no single “Tracker store” any longer.

Tracker 3.0 will index content from the filesystem to facilitate searching and browsing, as it does now. The filesystem miner will keep this in its own database, and Flatpak apps will access this database through a portal (currently in development).

Apps access this data using a TrackerSparqlConnection just like now, but when we create the connection we need to specify that we want to connect to the filesystem miner’s database.

Here’s a Python example of listing all the music files in the user’s ~/Music directory:

from gi.repository import Tracker

conn = Tracker.SparqlConnection.bus_new(
    "org.freedesktop.Tracker3.Miner.Files", None, None)
cursor = conn.query(
    'SELECT ?url { ?r a nmm:MusicPiece ; nie:url ?url }', None)
print("Found music files:\n")
while cursor.next():
    print(cursor.get_string(0)[0])

Running a full text search will be similar. Here’s how you’d look for “bananas” in every file in the user’s ~/Documents folder:

cursor = conn.query(
    'SELECT ?url fts:snippet(?r) { '
    '    ?r a nfo:Document ; '
    '        nie:url ?url ; '
    '        fts:match "Bananas" '
    '}', None)
print("Found document files:\n")
while cursor.next():
    print("   url: {}".format(cursor.get_string(0)[0]))
    print("   snippet: {}".format(cursor.get_string(1)[0]))

If you are running inside a Flatpak sandbox then there will be a portal between you and the org.freedesktop.Tracker3.Miner.Files database. The read-only /.flatpak-info file inside the sandbox, whose contents are set when the Flatpak is built, will declare which graphs your app can access. The xdg-tracker-portal will add that information into the SPARQL query using a Tracker-specific CONSTRAINT GRAPH syntax, and the database will enforce the constraint, ensuring that your app really does only see the graphs that it has requested access to.

Storing your own data

Tracker can be used as a data store by applications. One principle behind the design of Tracker 1.x was that by using a centralized store and a common vocabulary, different apps could easily share data. For example, when you create an album in GNOME Photos, it’s stored in the Tracker database using the standard nfo:DataContainer class. Any other app, perhaps a file manager, or a photos app from a different platform, can show and edit albums stored in this way without having to know specifics about GNOME Photos. Playlists in GNOME Music and starred files in Nautilus are also stored this way.

This approach had some downsides. Having all data in a single database creates a single point of failure. It’s hard to backup the valuable user data without backing up the search and indexing data too – but since the index can be recreated from the filesystem, it’s a waste of resources to include that in a backup. Apps were also forced to share a single database schema which was maintained in the tracker.git repository.

In Tracker 3.0, each app creates a private database for storing its own data. It can use the ontology (database schema) from Tracker, or it can provide its own version. Here’s how a photos app written in Python could store photo albums:

from gi.repository import Gio, GLib, Tracker
import pathlib

def app_database_dir():
    data_dir = pathlib.Path(GLib.get_user_data_dir())
    return data_dir.joinpath('my-photos-app/db')

# Use the Nepomuk ontology that ships with Tracker.
location = Gio.File.new_for_path(str(app_database_dir()))
conn = Tracker.SparqlConnection.new(
    Tracker.SparqlConnectionFlags.NONE, location,
    Tracker.sparql_get_ontology_nepomuk(), None)

conn.update(
    'INSERT { <album:MyAlbum> a nfo:DataContainer, nie:DataObject ; '
    '            nie:title "My Album" }',
    0, None)

Now let’s insert a photo into this album. Remember that the user’s photos are indexed by the filesystem miner. We can use the SERVICE statement to connect the filesystem miner’s database to our app’s private database, like this:

conn.update(
    'INSERT { '
    '    ?photo nie:isPartOf <album:MyAlbum> '
    '} WHERE { '
    '    SERVICE <dbus:org.freedesktop.Tracker3.Miner.Files> { '
    '        ?photo nie:isStoredAs <file:///home/me/Photos/my-photo.jpg> '
    '    } '
    '}',
    0, None)

Now let’s display the contents of the album:

cursor = conn.query(
    'SELECT ?url { '
    '    { SELECT ?photo ?url { '
    '        SERVICE <dbus:org.freedesktop.Tracker3.Miner.Files> { '
    '            ?photo a nmm:Photo ; nie:isStoredAs ?url . '
    '        } '
    '    } } '
    '    ?photo nie:isPartOf <album:MyAlbum> . '
    '}', None)
while cursor.next():
    print(cursor.get_string(0)[0])

Notice again that the app has to request permission to access the Photos graph. If our example app is running in Flatpak, this will require a special permission.

It’s still possible for one app to share data with another, but it will require coordination at the app level. Using the example of photo albums, GNOME Photos can opt to make its database available to other apps. If a different app wants to see the user’s photo albums, they’ll need to connect to the org.gnome.Photos database over D-Bus. As usual, Flatpak apps would need permission to do this.
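
As a rough sketch (assuming Photos exposes a Tracker SPARQL endpoint under its own bus name, and with a made-up query for illustration), reading those albums from Python could look like this, reusing the imports from the examples above:

# Connect to another app's published database over D-Bus.
photos_conn = Tracker.SparqlConnection.bus_new('org.gnome.Photos', None, None)
cursor = photos_conn.query(
    'SELECT ?album ?title { ?album a nfo:DataContainer ; nie:title ?title }',
    None)
while cursor.next():
    print(cursor.get_string(0)[0], cursor.get_string(1)[0])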

Is it a good time to port my app to Tracker 3.0?

It’s a good time to start porting your app. You will definitely be able to help us with testing and stabilising the library and the documentation if you start now.

There are some API changes still unmerged at the time of writing, primarily the Flatpak portal and the CONSTRAINT feature, and also the details of how you specify which ontology to use.

Some functionality is no longer exposed in C libraries, due to the privatization of libtracker-control and libtracker-miner. As far as we know libtracker-miner is unused outside Tracker, but some apps are currently using libtracker-control to display status updates for the Tracker daemons and trigger indexing of removable devices. We have an open issue about improving the story for on-demand removable device indexing. For status monitoring you may use the underlying DBus signals, and I’m also hoping to make these more useful.

Ideally I’d like to add a new helper library for Tracker 3.0 which would conveniently wrap the high level features that apps use. My volunteer time is limited though. I can share ideas for this if you are looking for a way to contribute!

What about a hackfest?

At some point we need to finish the Tracker 3.0 work and make sure that apps that use Tracker are all ported and working. The best case is that we do this in time for the upcoming GNOME 3.38 release. We discussed holding a hackfest at some point between now and GNOME 3.38 to make sure things are settled; it may now be that an in-person hackfest won’t be feasible in light of the coronavirus pandemic, but a series of online meetings would be a good alternative. We can only wait, and see!

Sculpting Tracker 3.0

Photo by The Digital Marketing Collaboration on Unsplash

We’re in the second phase of work to create version 3.0 of the Tracker desktop search engine.

Tracker’s database is now up to date with the latest SPARQL 1.1 standards, including the magical SERVICE statement that lets you combine results from multiple databases in a single query. Now we’re converting the database from a service into a library, and turning the previously monolithic architecture into something more flexible.

Carlos has already done most of this work and the code is pushed as #172 (tracker.git) and  #136 (tracker-miners.git). At times it feels like we’re carving a big block of stone into a sculpture — just look at the diffstats:

    tracker.git: +4214 -10234
    tracker-miners.git: +375 -718

Read merge request #172 for full details, but the highlights are that there’s no more tracker-store daemon, and the libtracker-sparql library which was previously only used for querying and inserting data can now be used to create and manage your own database. You can keep the database private, or you can expose it over D-Bus.

The code in tracker.git is now only about managing data. We may rename it to tracker-sparql in due course, or even to SPARQLite if this is okayed by the developers of SQLite. There’s perhaps a niche for a desktop-scale database that supports SPARQL queries, and it’s a niche that Tracker’s database fits in nicely.

All the code related to desktop indexing and search is now in tracker-miners.git. The tracker-miner-fs daemon will maintain the index in its own database, which you’ll be able to query by connecting over D-Bus just like you used to connect to tracker-store in Tracker 2.0. However, apps running inside Flatpak will not be able to talk directly to the tracker-miner-fs daemon — communication will go through a new portal that Carlos is currently working on, allowing us to implement per-app access controls to your data for the first time.

A Tracker 2.3.2 bugfix release is still pending too! This month Victor Gal solved an issue that was causing photo geolocation metadata to be ignored. Rasmus Thomsen also added Alpine Linux to our CI, and the GNOME translation teams have been hard at work too.

If you want to help out by testing, developing, documenting Tracker – get in touch on  GNOME Discourse (use the ‘tracker’ tag) or irc.gnome.org #tracker.

Last month in Tracker

Here’s an incomplete report of some work done on Tracker during the last month!

Bugs

Jean Felder fixed a thorny issue that was causing wrong track durations for MP3s.

Rasmus Thomsen has been testing on Alpine Linux, fixing one issue and finding several more. Alpine Linux uses musl libc instead of the more common GNU libc, which triggers bugs that we don’t usually see. Finding and fixing these issues could be a great learning experience for someone who wants to dig deep into the platform!

There’s an ongoing issue reported by many Ubuntu users which seems to be due to SQLite database corruption. SQLite is rather a black box to me, so I don’t know how or when we might get to the bottom of why this corruption is happening.

Ubuntu CI

We now test each commit on Ubuntu as well as Fedora. This is a nice step forward. It’s also triggering more intermittent failures in the CI — we’ve made huge progress in the last few years on bringing the CI up from zero, but there are some latent issues like these which we need to get rid of.

Tracker 3.0

Carlos has done more architectural work in the ‘master’ branch, working towards having a generic SPARQL store in tracker.git, and all GNOME/desktop/filesystem related code in tracker-miners.git.

As part of this, the tracker CLI tool is now split between tracker.git and tracker-miners.git (MR1, MR2).

We also moved the libtracker-control and libtracker-miner libraries into tracker-miners.git, and made the libtracker-control API private. As far as I know, the libtracker-control library is only being used by GNOME Photos to manage indexing of removable devices. We want to keep track of which apps need porting to 3.0, so please let me know if this is going to affect anything else.

New website

Tracker is famous enough that it merits a real website, not just an outdated set of wiki pages. So I made a real Tracker website, aiming to collect links to relevant user and developer documentation and to have a minimal overview and FAQ section. We can build and deploy this straight from the tracker.git repo, so whereas the wiki is easily forgotten, the new website lives in the same repo as the source code. The next step will be to merge this and then tidy up most of the old wiki pages.


Into the Pyramid

November 2019 wasn’t an easy month, for various reasons, and it also rained every single day of the month. But there were some highlights!

LibreTeo

At the bus stop one day I saw a poster for a local Free Software related event called LibreTeo. Of course I went, and saw some interesting talks related to technology and culture and also a useful workshop on improving your clown skills. Actually the clown workshop was a highlight. It was a small event but very friendly, I met several local Free Software heads, and we were even invited for lunch with the volunteers who organized it.

Purr Data on Flathub

I want to do my part to increase the number of apps that are easy to install on Linux. I asked developers to Flatpak your app today last year, and this month I took the opportunity to package Purr Data on Flathub.

Here’s a quick demo video, showing one of the PD examples which generates an ‘audible illusion’ of a tone that descends forever, known as a Shepard Tone.

As always the motivation is a selfish one. I own an Organelle synth – it’s a hackable Linux-based device that generates sound using Pure Data, and I want to be able to edit the patches!

Pure Data is a very powerful open source tool for audio programming, but it’s never had much commercial interest (unlike its proprietary sibling Max/MSP) and that’s probably why the default UI is still implemented in TCL/TK in 2019. The Purr Data fork has made a lot of progress on an alternative HTML5/JavaScript UI, so I decided this would be more suitable for a Flathub package.

I was particularly motivated by the ongoing Pipewire project which is aiming to unify pro and consumer audio APIs on Linux in a Flatpak-friendly way. Christian Schaller mentioned this recently:

There is also a plan to have a core set of ProAudio applications available as Flatpaks for Fedora Workstation 32 tested and verified to work perfectly with Pipewire.

The Purr Data app will benefit a lot from this work. It currently has to use the OSS backend inside the sandbox and doesn’t seem to successfully communicate over MIDI either — so it’s rather a “tech preview” at this stage.

The developers of Purr Data are happy about the Flatpak packaging, although they aren’t interested in sharing the maintenance effort right now. If anyone reading this would like to help me with improving and maintaining the Purr Data Flatpak, please get in touch! I expect the effort required to be minimal, but I’d like to have a bus factor > 1.

Tracker bug fixes

This month we fixed a couple of issues in Tracker which were causing system lockups for some people. It was very encouraging to see people volunteering their time to help track down the issue, both in Gitlab issue 95 and in #tracker on IRC, and everyone involved in the discussion stayed really positive even though it’s obviously quite annoying when your computer keeps freezing.

In the end there were several things that came together to cause system lockups:

  • Tracker has a ‘generic image extraction’ rule that tries to find metadata for any image/* MIME type that isn’t a .bmp, .jpg, .gif, or .png. This codepath uses the GstDiscoverer API, the same as for video and audio files, in the hope that a GStreamer plugin on the system can give us useful info about the image. (A standalone sketch of this call pattern follows this list.)
  • The GstDiscoverer instance is created with a timeout of 5 seconds. (This seems quite high — the gst-typefind utility that ships with GStreamer uses a timeout of 1 second).
  • GStreamer’s GstDiscoverer API feeds any file where the type is unknown into an MPEG decoder, which is effectively an unwanted fuzz test and can trigger periods of high CPU and memory usage.
  • 5 seconds of processing non-MPEG data with an MPEG decoder is somehow enough to cause Linux’s scheduler to lock up the entire system.
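
For illustration, here is a standalone Python sketch of that GstDiscoverer call pattern with the 5-second timeout. This is not Tracker’s actual code, and the file path is made up.

import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstPbutils', '1.0')
from gi.repository import Gst, GstPbutils

Gst.init(None)

# The timeout is a GstClockTime, i.e. nanoseconds.
discoverer = GstPbutils.Discoverer.new(5 * Gst.SECOND)
try:
    info = discoverer.discover_uri('file:///home/me/Pictures/mystery-file')
    print('duration (ns):', info.get_duration())
except Exception as error:
    print('discovery failed:', error)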

We fixed this in the stable branches by blocking certain problematic MIME types. In the next major release of Tracker we will probably remove this codepath completely as the risks seem to outweigh the benefits.

Other bits

I also did some work on a pet project of mine called Calliope, related to music recommendations and playlist generation. More on this in a separate blog post.

And I finally installed Fedora on my partner’s laptop. It was nice to see that GNOME Shell works out-of-the-box on 12 year old consumer hardware. The fan, which was spinning 100% of the time under Windows 8, is virtually silent now – I had actually thought this problem was due to dust buildup or a hardware issue, but once again the cause was actually low-quality proprietary software.

Tracker developer experience improvements

There have been lots of blog posts since I suggested we write more blog posts. Great! I’m going to write about what I’ve done this month.

I’m excited that work started on Tracker 3.0, after we talked about it at GUADEC 2019. We merged Carlos’ enormous branch to modernize the Tracker store database. This has broken some tests in tracker-miners, and the next step will be to track down and fix these regressions.

I’ve continued looking at the developer experience of Tracker. Recently we modernized the README.md file (as several GNOME projects have done recently). I want the README to document a simple “build and test Tracker from git” workflow, and that led into work making it simpler to run Tracker from the build tree, and also a bunch of improvements to the test suite.

The design of Tracker has always meant that it’s a pain in the ass to build and test, because to do anything useful you need to have 3 different daemons running and talking to each other over D-Bus, reading and writing data in the same location, and communicating with the CLI or an app. We had a method for running Tracker from the build tree for use by automated tests, whose code was duplicated in tracker.git and tracker-miners.git, and then we had a separate script for developers to test things manually, but you still had to install Tracker to use that one. It was a bit of a mess.

The first thing I fixed was the code duplication. Now we have a Python module named trackertestutils. We install it, so we don’t need to duplicate code between tracker.git and tracker-miners.git any more. Thanks to Marco Trevisan we also install a pkgconfig file.

Then I added a ./run-uninstalled script to tracker-miners.git. The improvement in developer experience I think is huge. Now you can do this to try out the latest Tracker code:

    git clone tracker-miners.git
    cd tracker-miners && mkdir build && cd build
    meson .. && ninja
    ./run-uninstalled --wait-for-miner=Files --wait-for-miner=Extract -- tracker index --file ~/Documents
    ./run-uninstalled -- tracker search "Hello"

The script is a small wrapper around trackertestutils, which takes care of spawning a private D-Bus daemon, collecting and filtering logs, and setting up the environment so that the Tracker cache is written to `/tmp/tracker-data`. (At the time of writing there are still some bugs, and ./run-uninstalled actually still requires you to install Tracker first.)

I also improved logging for Tracker’s functional-test suite. Since a year ago we’ve been running these tests in CI, but there have been some intermittent failures, which were hard to debug because log output from the tests was very messy. When you run a private D-Bus session, all kinds of daemons spawn and dump stuff to stdout. Now we set G_MESSAGES_PREFIXED in the environment, so the test harness can separate the messages that come from Tracker processes. It’s already allowed me to track down some of these annoying intermittent failures, and to increase the default log verbosity in CI.

Another neat thing about installing trackertestutils is that downstream projects can use it too. Rishi mentioned at GUADEC that gnome-photos has a test which starts the photos app and ends up displaying the actual photo collection of the user who is running the test. Automated tests should really be isolated from the real user data. And using trackertestutils, it’s now simple to do that: here’s a proof of concept for gnome-photos.

And I made a new tune!

Blog about what you do!

Am I the first to blog from GUADEC 2019? It has been a great conference: huge respect to the organization team for volunteering significant time and energy to make it all run smoothly.

The most interesting thing at GUADEC is talking to community members old and new. I discovered that I don’t know much about what people are doing in GNOME. I discovered Antonio is doing user support / bug triage and more in Nautilus. I discovered that Bastian is posting GNOME-related questions and answers on StackOverflow. I discovered Britt is promoting us on Twitter and moderating discussions on Reddit. I discovered Felipe is starting to do direct user support for Boxes. I wouldn’t know any of this if I hadn’t been to GUADEC.

So here’s my plea — if you contribute to GNOME, please blog about it! If everyone reading this wrote just one blog post a year… I’d have a much better idea of what you’re all doing!

Don’t forget: Planet GNOME is not only for announcing cool new projects and features – it’s “a window into the world, work and lives of GNOME hackers and contributors.” Blog about anything GNOME related, and be yourself — we’re not a corporation, we’re an underground network with a global, diverse, free thinking membership and that’s our strength.

Remember that there’s much more to GNOME than software development — read this long list of skillsets that you’re probably using. Write about translations, user support, testing, documentation, packaging, outreach, foundation work, event organization, bug triage, product management, release management, design, infrastructure operations. Write about why you enjoy contributing to GNOME, write about why it’s important to you. Write about what you did yesterday, or what you did last month. Write about your friends in GNOME. Make some graphs about your project to show how much work you do. Write short posts, write them quickly. Don’t worry about minor errors — it’s a blog, not a magazine article. Don’t be scared that readers won’t be interested — we are! We’re a distributed team and we need to keep each other posted about what we’re doing. Show links, screenshots, discussions, photos, graphs, anything. Don’t write reports, write stories.

If you contribute to GNOME but don’t have a blog… please start one! Write some nice posts about what you do. Become a Foundation member if you haven’t already*, and ask to join Planet GNOME.

And even if you forget all that, remember this: positive feedback for contributions encourages more contributions. Writing a blog post, like any other form of contribution, can sometimes feel like shouting into an abyss. If you read an interesting post, leave a positive comment & thank the author for taking the time to write it.

* The people using and reviewing your contributions will be happy to vouch for you, don’t worry about that!

Inspire me, Nautilus!

When I have some free time I like to be creative but sometimes I need a push of inspiration to take me in the right direction.

Interior designers and people who are about to get married like to create inspiration boards by gluing magazine cutouts to the wall.

‘Mood board for a Tuscan Style Interior’ by Design Folly on Flickr

I find a lot of inspiration online, so I want a digital equivalent. I looked for one, and I found various apps for iOS and Mac which act like digital inspiration boards, but I didn’t find anything I can use with GNOME. So I began planning an elaborate new GTK+ app, but then I remembered that I get tired of such projects before they actually become useful. In fact, there’s already a program that lets you manage a collection of images and text! It’s known as Files (Nautilus), and for me it only lacks the ability to store web links amongst the other content.

Then, I discovered that you can create .desktop files that point to web locations, the equivalent of .url files on Microsoft Windows. Would a folder full of URL links serve my needs? I think so!
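
For example, here is a minimal Python sketch of writing such a shortcut; the folder name and URL are made up, and a Type=Link desktop entry is just an INI-style text file:

from pathlib import Path

# Hypothetical folder for the inspiration board.
board = Path.home() / 'InspirationBoard'
board.mkdir(exist_ok=True)

# A minimal .desktop shortcut of Type=Link, pointing at a web location.
(board / 'gnome.desktop').write_text(
    '[Desktop Entry]\n'
    'Type=Link\n'
    'Name=GNOME\n'
    'URL=https://www.gnome.org/\n'
)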

Nautilus had some crufty code paths to deal with these shortcut files, which were removed in 2018. Firefox understands them directly, so if you set Firefox as the default application for the application/x-desktop file type then they work nicely: click on a shortcut and it opens in Firefox.

There is no convenient way to create these .desktop files: dragging and dropping a tab from Epiphany will create a text file containing the URL, which is tantalisingly close to what I want, but the resulting file can’t be easily opened in a browser. So, I ended up writing a simple extension that adds a ‘Create web link…’ dialog to Nautilus, accessed from the right-click menu.

Now I can use Nautilus to easily manage collections of links and I can mix in (or link to) any local content easily too. Here’s me beginning my ‘inspiration board’ for recipes …

[Screenshot: the beginnings of a ‘recipes’ inspiration board in Nautilus]


How Tracker is tested in 2019

I became interested in the Tracker project in 2011. I was looking at media file scanning and was happy to discover an active project that was focused on the same thing. I wanted to contribute, but I found it very hard to test my changes; and since Tracker runs as a daemon I really didn’t want to introduce any crazy regressions.

In those days Tracker already had a set of tests written in Python that tested the Tracker daemons as a whole, but they were a bit unfinished and unreliable. I focused some spare-time effort on improving those. Surprisingly enough it has taken eight years to get to the point where I’m happy with how they work.

The two biggest improvements parallel changes in many other GNOME projects. Last year Tracker stopped using GNU Autotools in favour of Meson, after a long incubation period. I probably don’t need to go into detail of how much better this is for developers. Also, we set up GitLab CI to automatically run the test suite, where previously developers and maintainers were required to run the test suite manually before merging anything. Together, these changes have made it about 100000% easier to review patches for Tracker, so if you were considering contributing code to the project I can safely say that there has never been a better time!

The Tracker project is now divided into two parts, the ‘core’ (tracker.git) and the ‘miners’ (tracker-miners.git). The core project contains the database and the application interface libraries, while the miners project contains the daemons that scan your filesystem and extract metadata from your interesting files.

Let’s look at what happens automatically when you submit a merge request on GNOME GitLab for the tracker-miners project:

  1. The .gitlab-ci.yml file specifies a Docker image to be used for running tests. The Docker images are built automatically from this project and are based on Fedora.
  2. The script in .gitlab-ci.yml clones the ‘master’ version of Tracker core.
  3. The tracker and tracker-miners projects are configured and built, using Meson. There is a special build option in tracker-miners that makes it include Tracker core as a Meson subproject, instead of building against the system-provided version. (It still depends on a few files from host at the time of writing).
  4. The script starts a private D-Bus session using dbus-run-session, sets a fixed en_US.UTF8 locale, and runs the test suite for tracker-miners using meson test.
  5. Meson runs the tests that are defined in meson.build files. It tries to run them in parallel with one test per CPU core.
  6. The libtracker-miners-common tests exercise some utility code, which is duplicated from libtracker-common in Tracker core.
  7. The libtracker-extract tests exercise libtracker-extract, which is a private library with helper code for accessing file metadata. It mainly focuses on standard metadata formats like XMP and EXIF.
  8. The functional-300-miner-basic-ops and functional-301-resource-removal tests check the operation of the tracker-miner-fs daemon, mostly by copying files in and out of a specific path and then waiting for the corresponding changes to the Tracker database to take effect.
  9. The functional-310-fts-basic test tries some full-text search operations on a text file. There are a couple of other FTS tests too.
  10. The functional/extract/* tests effectively run tracker extract on a set of real media files, and test that the expected metadata is extracted. The tests are defined by JSON files such as this one.
  11. The functional-500-writeback tests exercise the tracker-writeback daemon (which allows updating things like MP3 tags following changes in the Tracker database). These tests are not particularly thorough. The writeback feature of Tracker is not widely used, to my knowledge.
  12. Finally, the functional-600-* tests simulate the behaviour of some MeeGo phone applications. Yes, that’s how old this code is 🙂

There is plenty of room for more testing of course, but this list is very comprehensive when compared to the total lack of automated testing that the project had just a year ago!

GUADEC 2018 Videos: All Done

All the editing & uploading for the GUADEC videos is now finished. The videos were all uploaded to YouTube some time ago, and they are all now available on http://videos.guadec.org/2018 as well.

Thanks to everyone who helped with the editing: Alexis Diavatis, Bin Li, Garrett LeSage, Alexandre Franke (who also did a lot of the work of uploading to YouTube), and Hubert Figuiere (who managed to edit so many that I’m suspicious he might be some kind of robot in disguise).

edit: If you are hungry for more videos to edit, some footage from GUADEC 2002 has been unearthed. It’d be great to have some of this history from fifteen years ago up on YouTube! If you’re interested, reply to the mail or speak up in #guadec on GIMPnet and we can coordinate efforts.

GUADEC 2018 Videos: Help Wanted

At this year’s GUADEC in Almería we had a team of volunteers recording the talks in the second room. This was organized at the last minute, as initially the University were going to do it, but thanks to various efforts (thanks in particular to Adrien Plazas and Bin Li) we managed to record nearly all the talks. There were some issues with sound on both the Friday and Saturday, which Britt Yazel has done his best to overcome using science, and we are now ready to edit and upload the 19 talks that took place in the 2nd room.

To bring you the videos from last year we had a team of 5 volunteers from the local team who spent our whole weekend in the Codethink offices. (None of us had much prior video editing experience, so the morning of the first day was largely spent trying out different video editors to see which had the features we needed and could run without crashing too often… and the afternoon was mostly spent figuring out how transitions worked in Kdenlive.)

This year we don’t have such a resource, so we are looking to distribute the editing. If you can, please get involved so we can share the videos as soon as possible!

The list of videos and a step-by-step guide on how to edit them is available at https://wiki.gnome.org/GUADEC/2018/Video. The guide is written for people who have never done video editing before and recommends that you use Kdenlive; if you’re already familiar with a different tool then of course feel free to use that instead and just use the process as a guideline. The first video is already up, so you can also use this as a guide to follow.

If you want to know more, get in touch on the GUADEC mailing list, or the #guadec IRC channel.

Tagcloud

The way we organize content on computers hasn’t really evolved since the arrival of navigational file managers in the late 1980s. We have been organizing files into directories for decades. Perhaps the biggest change anyone has managed since then is that we now call directories “folders” instead, and that we now obscure the full directory tree, pointing users instead towards certain entry points such as the “Music”, “Downloads” and “Videos” folders inside their home directory.

It’s 2018 already. Surely there must be a better way to find content than groping around in a partially obscured tree of files and folders?

GNOME has been innovating in this area for a while, and one of the results is the Tracker search and indexing tool, which creates a database of all the content it finds on the user’s computer and allows you to run arbitrary queries over it. In principle this is quite cool: you can, for example, search for all photos taken within a given time period, all songs by a specific artist, all videos above a certain resolution ordered by title, or whatever else you can think of (where the necessary metadata is available). However, the caveat is that for this to be at all useful you currently have to enjoy writing SPARQL queries on the commandline: Tracker itself is a “plumbing” component, and the only interface it provides is the tracker commandline tool.
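
To give a flavour of what that means, here is roughly what “all songs by a specific artist” looks like at the commandline. The ontology terms are written from memory, so treat this as a sketch rather than a copy-paste recipe:

# Query the search index for songs by one artist (artist name is a placeholder)
tracker sparql --query '
  SELECT nie:url(?song) WHERE {
    ?song a nmm:MusicPiece ;
          nmm:performer ?artist .
    ?artist nmm:artistName "Some Artist" .
  }'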

There is ongoing work on content-specific user interfaces that can work with Tracker to access local content, so for photos, for example, you can use GNOME Photos to view and organize your whole photo collection. However, there isn’t a content-agnostic tool available that might let you view and organize all the content on your computer… other than Nautilus, which is limited to files and folders.

I’m interested in organizing content using tags, which are nothing but freeform textual category labels. On the web, tags are a very common way of categorizing content. (The name hashtags is probably more widely understood than tags among web users, but hashtag has connotations of social media and sharing which don’t necessarily apply when talking about desktop content, so I will call them tags here.) Despite their popularity on the web, desktop support is low: Tagspaces seems to be the only option, and the free edition is very limited in what it can do. Within GNOME, we have had support for storing tags in the Tracker database for many years, but I don’t know of any applications that allow viewing or editing file tags.
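
(For completeness: the existing support is exposed through the tracker commandline tool. The flags below are written from memory, so check tracker tag --help for the real syntax.)

# List the tags currently stored in the Tracker database
tracker tag --list

# Attach a tag to a file (the tag name and path are hypothetical)
tracker tag --add=holiday-2017 ~/Pictures/beach.jpg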

Around the time of GUADEC 2017 I read Alexandru’s blog post about tags in Nautilus, in which he announced that Nautilus wasn’t going to get support for organizing files using tags because it would conflict too much with the existing organization principle in Nautilus of putting files into folders. I agree with that logic, but it leaves open a question: when will GNOME get an interface that allows me to organize files using tags?

As it happened I had a bit of free time after GUADEC 2017 was finished and I started sketching out an application designed specifically for organizing content using tags.

The result so far looks like this:

This is really just a prototype; there are lots more features I’d like to add or improve if I get the time, but it does support the basic use case of “add tags to my files” at this point, and so I’ve started a stable release branch. The app is named Tagcloud and you can get it as a Flatpak .bundle of the 0.2.1 release from here. Note that it won’t autoupdate, as this isn’t a proper Flatpak repo, just a bundle file.
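
In case you haven’t installed a bundle file before, it’s a one-liner. The filename here is just a stand-in for whatever the downloaded bundle is called:

flatpak install ./tagcloud-0.2.1.flatpak   # older flatpak versions may want 'flatpak install --bundle'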

Tagcloud is written using Python and PyGObject, and of course GTK+. I encountered several G-I bindings issues during development which mean that Tagcloud currently requires very new versions of GLib and GTK+, but the good news is that by using the Flatpak bundle you don’t need to care about any of that. Tagcloud uses Tracker internally and I’ve been thinking a lot about how to make Tracker work better for application developers; these thoughts are quite lengthy and not really complete yet, so I will save them for a separate blog post.

One of the key principles of Tagcloud is that it should recognize any type of content, so for example you can group together photos, documents and videos related to a specific project. In future I would also like to see GNOME’s content-specific applications such as Photos and Documents recognize tags; this shouldn’t require too much plumbing work since everything seems to be tending towards using Tracker as a backend, but it would of course affect the user interfaces of those apps.

I haven’t yet mentioned in this blog that a couple of months ago I quit my job at Codethink and right now I’m training to be a language teacher. So I imagine that I will have very little time available to work on Tagcloud for a while, but please do send issue reports and patches to https://gitlab.com/samthursfield/tagcloud if you like. I will be at GUADEC 2018 and hopefully we can have lots of exciting discussions about applying tags to things. And for the future … while I would like Tagcloud to become a fully fledged application, I will also be happy if it serves simply as a prototype and as a way of driving improvements in Tracker which will then benefit all of GNOME’s content apps.

2017 in review

I began this year in a hedge in Mexico City and immediately had to set off on a 2-day aeroplane trek back to Manchester to make a very tired return to work on the 3rd of January. From there things calmed down somewhat and I was geared up for a fairly mundane year, but in fact there have been many highlights!

The single biggest event was certainly bringing GUADEC 2017 to Manchester. I had various goals for this, such as ensuring we got a GUADEC 2017, showing my colleagues at Codethink that GNOME is a great community, and being in the top 10 page authors on wiki.gnome.org for the year. The run-up to the event from about January to July took up many evenings and it was sometimes hard to trade it off with my work at Codethink; it was great working with Allan, Alberto, Lene and Javier though, and once the conference actually arrived there was a mass positive force from all involved that made sure it went well. The strangest moment was definitely walking into Kro Bar slightly before the preregistration event was due to start to find half the GNOME community already crammed into the tiny bar area waiting for something to happen. Obviously my experience of organizing music events (where you can expect people to arrive about 2 hours after you want them somewhere) didn’t help here.

Codethink provides engineers with a travel budget and a little bit of extra leave for attending conferences; obviously, what with GUADEC being in Manchester, I didn’t make huge use of that this year, but I did make it to FOSDEM and also to PyConES, which took place in the beautiful city of Cáceres. My friend Pedro was part of the organizing team and it was great to watch him running round fighting fires all day while I relaxed and watched the talks (which were mostly all trying to explain machine learning in 30 minutes, with varying degrees of success).

Work-wise, I spent most of my year looking at compilers and build tools; perhaps not my dream job, but it’s an enjoyable area to work in because (at least in terms of build tools) the state of the art is comically bad. In 10 years we will look back at GNU Autotools the way we look at a car that needs to be started with a hand crank, and perhaps the next generation of distro packagers will think back in wonder at how their forebears had to individually maintain dependency and configuration info in their different incompatible formats.

BuildStream is in a good state and is about to hit 1.0; it’s beginning to get battle-tested in a couple of places (one of these being GNOME), which is no doubt going to be a rough ride — I already have a wide selection of performance bottlenecks to look at in the new year. But it’s already looking like a healthy community and I want to say thanks to everyone who has got behind the project.

It also seems to have been a great year for Meson; something that has been a long time coming but seems to be finally bringing Free Software build systems into the 21st century. Last year I ported Tracker to build with Meson, and have been doing various ongoing fixes to the new build system — we’re not yet able to fully switch away from Autotools, primarily because of issue #2166, and also because of some Tracker test suite failures that seem to only show up with Meson and that we haven’t yet dug into fully.

With GUADEC out of the way I managed to spend some time prototyping something I named Tagcloud. This is the next iteration of a concept that I’ve wanted since more or less forever: being able to apply arbitrary tags to different local and online resources in a nice way. On the web this is a widespread concept, but for some reason the desktop world doesn’t seem to buy into it. Tracker is a key part of this puzzle, as it can deal with many types of content and can actually already handle tags if you don’t mind using the commandline, so part of my work on Tagcloud has been making Tracker easy to embed as a subproject. This means I can try new stuff without messing up any session-wide Tracker setup, and it builds on some great work Carlos has been doing to modernize Tracker as well. I’ve been developing the app in Python, which has required me to fix issues in Tracker’s introspection bindings (and GLib’s, and GTK+’s … on the whole I find the PyGObject experience pretty good and it’s obviously been a massive effort to get this far, but at the same time these teething issues are quite demotivating). Anyway, I will post more about Tagcloud in the new year once some of the ideas are a bit further worked out; and of course it may end up going nowhere at all, but it’s been nice to actually write a GTK+ app for the first time in ages, and to make use of Flatpak for the first time.

It’s also been a great year for the Flatpak project; and to be honest if it wasn’t for Flatpak I would probably have talked myself out of writing a new app before I’d even started. Previously the story for getting a new app to end users was that you must either be involved or know someone involved in a distro or two so that you can have 2+ year old versions of your app installable through a package manager; or your users have to know how to drive Git and a buildsystem from the commandline. Now I can build a flatpak bundle every time I push to master and link people straight to that. What a world! And did I mention GitLab? I don’t know how I ever lived without GitLab CI and I think that GNOME’s migration to GitLab is going to be *hugely* beneficial for the project.

Looking back it seems I’ve done more programming stuff than I thought I had; perhaps a good sign that you can achieve stuff without sacrificing too much of your spare time.

It’s also been a good year music-wise; Manchester continues to have a fantastic music scene, which has only got better with the addition of the Old Abbey Taphouse, where I have in fact spent the last 4 Saturdays in a row. Last Saturday we put on Babar Luck; I saw a great gig of his 10 years ago and have managed to keep missing him ever since, but things finally worked out this time. Other highlights have been Paddy Steer, Baghdaddies and a very cold gig we did with the Rubber Duck Orchestra on the outdoor stage on a snowy December evening.

I caught a few gigs by Henge who only get better with time and who will hopefully break way out of Manchester next year. And in September I had the privilege of watching Jeffrey Lewis supported by The Burning Hell in a little wooden hut outside Lochcarron in Scotland, that was certainly a highlight despite being ill and wearing cold shoes.

I didn’t actually know much of Scotland until taking the van up there this year; I was amazed that such a beautiful place has been there the whole time, just waiting 400 miles north. This expedition was originally planned to be a bike trip but ended up being a road trip, and having now seen the roads, that is probably for the best. However, we did manage a great bike trip around the Netherlands and Belgium, the first time I’ve done a week-long bike trip and hopefully the beginning of a new tradition! Last year I did a lot of travel to crazily distant places; it’s a privilege to be able to do so, but one that I prefer to use sparingly, so it was nice to get around closer to home this year.

All in all a pretty successful year, not straightforward at times but one with several steps in the directions I wanted to head. Let’s see what next year holds 🙂

GUADEC 2017: timeline

After the statistics perhaps you are interested in reading a timeline of GUADEC 2017! In particular you can compare it to the burn down chart from the GUADEC HowTo and see how that interacts with reality.

Of course lots of details are excised from this overview, but it gives a general sense of the timings. In some follow-up posts I’ll go into more detail about what I think went well and what didn’t. We also welcome your feedback on the event (if you can still remember it 🙂)

Summer 2014: At some point during GUADEC 2014 I start going on about doing a Manchester edition.

August 2015: Alberto and Allan both float the idea of doing a Manchester bid with me; it seems like there’s just about enough of a team to go for it. I was already planning to be away in summer 2016 at this point so we decided to target 2017.

Alberto has a friend working at MIDAS who gives us a good start and we end up meeting with the Marketing Manchester conference bureau, the University of Manchester and Manchester Metropolitan University.

The meeting with University of Manchester was discouraging (to be honest, they seemed to be geared up only for corporate conferences rather than volunteer-driven events) but Manchester Metropolitan were much more promising.

Winter 2015: We lost touch with MMU for a few months (presumably as the University started back up), but we eventually got a proper contact in the conferences department and started moving forwards with the bid.

Spring 2016: Our bid is produced, with Marketing Manchester doing most of the content and layout (as you might be able to tell). Normally I would worry to see only one GUADEC bid on the table, but having been thinking about our bid for almost a year already, I was also glad that it looked like we’d be the main option.

Summer 2016: GUADEC 2016 in Karlsruhe; Manchester is selected as the location for 2017. Much rejoicing (although I am on a 9000 mile road trip at the time).

August 2016: Talks begin with the venue about drawing up contracts for the venue and accommodation. The venue was reasonably painless to sort out, but we spent lots of time figuring out accommodation; the University townhouses required final numbers and payment 6 months in advance of the event, so we spent a lot of time looking into other options (but ended up deciding that the townhouses would be best, even though we would inevitably lose a bit of money on them).

September 2016: We begin holding monthly-ish meetings with myself, Alberto, Allan and Javier present. Work begins on the sponsorship brochure (which is complicated by needing to coordinate with GNOME.Asia and potentially LAS); talks continue with the venue.

December 2016: Contracts finally signed for venue and accommodation (4 months later!), conference dates finalized. We apply for a UK bank account as an “unincorporated association”. Discussion begins about the website; we decide to hold off on announcing the dates until we have some kind of website in place.

January 2017: Basic website finished, dates announced. Lots of work on getting the registration system ready. We begin meeting each week on a Monday evening. Initial logo made by Jakub and Allan.

February 2017: Trip to FOSDEM, where we put up a few GUADEC posters. Summer still seems a long way off. Codethink sponsorship confirmed. We start thinking about keynote speakers. Javier and Lene look into social event venues, including somewhere for the 20th birthday party (with hearts already set on MOSI). The search for a new Executive Director for GNOME finally comes to a close with Neil McGovern being hired, and he soon starts joining the GUADEC calls and helping out (in particular with the search for sponsors, which up til now has been nearly all Alberto’s work).

March 2017: After 4 months of bureaucracy, our bank account is finally approved. After much hacking and design work, we can finally open registration and the call for papers. We have to finalize room numbers at the University already, although most rooms are still unbooked. Investigation into getting GNOME Beer brewed (which ended up going nowhere, sadly). Requests for visa invites begin to arrive.

April 2017: Lots of planning for social events, the talk days and the unconference days. PIA sponsorship confirmed. Posters being designed. Call for papers closes, voting begins and Kat starts putting together the talks schedule.

May 2017: Birthday planning with help from the engagement team (in particular Nuritzi). The University temporarily decide that we’ll have to pay staff costs of £500 per day to have the canteen open; we do a bunch of research into alternatives but then we go back to the previous agreement of having the canteen open with just a minimum spend. Planning of video recording and design. Schedule and social events planning.

June and July 2017: Continual planning and discussion of everything. More sponsors confirmed. Allan does prodigious amounts of graphic design and organizing printing. Travel sponsorship finally confirmed and lots of visa invitation requests start to arrive. Accommodation bookings continue to come in, along with an increasing amount of queries, changes and cancellations that become quite time-consuming to keep track of and respond to. Evening events being booked and finalized, including more planning of the birthday party with Nuritzi. Discussions of how to make sure the conference is inclusive to newcomers. Water bottles, cake and T-shirts ordered. Registrations keep coming in until we actually hit and go over 200 registrations. We contact volunteers and come up with a timetable.

Finally, the day before GUADEC we collect the last of the printing, bring everything to the venue and hole up in a room on the 2nd floor, ready to pre-print names on badges and stuff the lanyard pouches with gift bags. We discover two major issues: firstly, the ink on the badges gets completely smudged when we run them through the printer to print names on them; and secondly, the emergency telephone number that we’ve printed on the badges has actually been recycled, as the SIM card was inactive for a while, and now goes through to some poor unsuspecting 3rd party.

We lay out all the badges to try and dry the ink out, but 3 hours later the smudging is still happening. We realise that the names will just have to be drawn on with marker pens. As for the emergency telephone… if you look closely at a GUADEC 2017 badge you’ll notice that there’s a sticky label with the correct number covering up the old number on the badge. Each one of these was printed onto stickyback paper and lovingly chopped out and stuck on by hand. You’re welcome! (Nobody actually called the emergency phone during the event).

Javier pointed out that we should be at the registration event at least an hour early (it started at 18:00). I said this was nonsense because most people wouldn’t get there til later anyway. How wrong I was!!! I’m used to organizing music events where people arrive about an hour after you tell them to, but we got to Kro Bar at about 17:45 and it was already full to bursting with eager GNOME contributors, many of whom of course hadn’t seen each other for months. This was not the ideal environment to try and set up a registration desk for the first time, and I mostly just stood around looking at boxes feeling confused and occasionally moving things around. Thankfully Kat and Benjamin soon arrived and made registration a reality, leaving me free to drink a beer and remain confused.

And the rest is history!

GUADEC 2017 by numbers

I’m finally getting around to doing a bit of a post-mortem for the 2017 edition of GUADEC that we held in Manchester this year. Let’s start with some statistics!

GUADEC 2017 had…

  • 264 registrations (up from 186 last year)
  • 209 attendees (up from 160 last year)
  • 72 people staying at the University (30 of whom had sponsorship awarded by the travel committee)
  • 7 people who were sadly unable to attend because their visa application was refused at the last minute

We put four optional questions on the registration form asking for your country of residence, your age, your gender identity and how you first heard about GUADEC. The full set of responses (anonymous, of course) is available here.

I don’t plan to do much data mining of this, but here are some interesting stats:

  • 61 attendees said they are resident in the UK, roughly 32%.
  • The most common age of attendees was 35 (the full age range was between 11 years and 65 years)
  • 14 attendees said they heard about the conference through working at Codethink

We asked for an optional, “pay as you feel” donation towards the costs of the conference at registration time and we suggested payments of £15/€15 for students, £40/€40 for hobbyists and £150/€150 for professionals.

  • 47 attendees (22%) chose to donate nothing
  • 29 attendees (13%) chose 1-15
  • 75 attendees (36%) chose 16-40
  • 51 attendees (24%) chose >40
  • 7 attendees somehow chose “NULL” (I think these were on-site registrations, which followed a different process)

Note that we told Codethink staff that they shouldn’t feel required to donate from their company-provided conference budget as Codethink was already sponsoring at Platinum level, which should account for 15 or more of the people who chose to donate nothing with their registration.

The financial side of things is tricky for me to summarize, as the sponsor money and registration donations mostly went straight to the Foundation’s bank account, which I don’t have access to. The fluctuation of GBP against the US dollar makes my own budget spreadsheet even less reliable, but I estimate that we raised around $10,000 USD for the GNOME Foundation from GUADEC 2017. This is of course only possible due to the generosity of our sponsors, and through the great work that Alberto and Neil did in this area.

My van did 94 miles around Manchester during the week of GUADEC. My house is only 4 miles from the centre so this is surprisingly high!


BuildStream and host tools

It’s been a while since I had to build a whole operating system from source. I’ve mostly been working on compilers so far this year at Codethink in fact, but my new project is to bring up some odd target systems that aren’t supported by any mainstream distros.

We did something similar about 4 years ago using Baserock and it worked well; this time we are using the Baserock OS definitions again but with BuildStream as a build tool. I’ve not had any chance to get involved in BuildStream up til now (beyond observing it) so this will be good.

The first thing I’m getting my head around is the “no host tools” policy. The design of BuildStream is that every build is run in a sandbox that’s isolated from the host. Older Baserock tools took a similar approach too and it makes a lot of sense: it’s a lot easier to maintain build instructions if you limit the set of environments in which they can run, and you are much more likely to be able to reproduce them later or on other people’s machines.

However, your sandbox is going to need a compiler and a shell environment in there if it’s going to be able to build anything, and BuildStream leaves open the question of where those come from. It’s simple to find a prebuilt toolchain, at least for mainstream architectures (pretty much every Linux distro can provide one), so the only questions are which one to use, and how to get it into BuildStream’s sandbox.

GNOME and Freedesktop base runtime and SDK

The Flatpak project has a similar need for a controlled runtime and build environment, and is producing a GNOME SDK and a lower-level Freedesktop SDK. These are at present built on top of Yocto.

Up-to-date versions of these are made available in an OSTree repo at http://sdk.gnome.org/repo. This makes it easy to import them into BuildStream using an ‘import’ element and the ‘ostree’ source:

kind: import
description: Import the base freedesktop SDK
config:
  source: files
  target: usr
host-arches:
  x86_64:
    sources:
      - kind: ostree
        url: gnomesdk:repo/
        track: runtime/org.freedesktop.BaseSdk/x86_64/1.4
        gpg-key: keys/gnome-sdk.gpg
        ref: 0d9d255d56b08aeaaffb1c820eef85266eb730cb5667e50681185ccf5cd7c882
  i386:
    sources:
      - kind: ostree
        url: gnomesdk:repo/
        track: runtime/org.freedesktop.BaseSdk/i386/1.4
        gpg-key: keys/gnome-sdk.gpg
        ref: 16036b747c1ec8e7fe291f5b1f667cb942f0267d08fcad962e9b7627d6cf1981

The main downside to using these is that they are pretty large — the GNOME 3.18 SDK weighs in at 1.5 GB uncompressed and around 63,000 files. Creating a hardlink tree using `ostree checkout` takes up to a minute on my (admittedly rather old) laptop. The Freedesktop SDK is smaller but still not ideal. They are also only built for a small set of architectures — I think just some x86 and ARM families at the moment.
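
For reference, checking one of these runtimes out of a local OSTree repo looks something like this (the repo path and destination directory are placeholders):

# Check the runtime out as a hardlink tree; the ref matches the ‘track’
# value used in the element above.
ostree --repo=repo checkout runtime/org.freedesktop.BaseSdk/x86_64/1.4 base-sdk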

Debian in OSTree

As part of building GNOME’s jhbuild modulesets inside BuildStream, Tristan created a script to produce Debian chroots for various architectures and commit them to an OSTree repo. The GNOME components are then built on top of these base Debian images, with the idea that in future they can be tested on top of a whole variety of distros in addition to Debian, to help us catch platform-specific regressions more quickly.

The script, which uses the awesome Multistrap tool to do most of the heavy lifting, lives here and pushes its results to a repo that is temporarily housed at https://gnome7.codethink.co.uk/repo/ and signed with this key.

The resulting sysroot is 2.7 GB in size, with 105,320 different files. This again takes up to a minute to check out on my laptop. Like the GNOME SDK, this sysroot contains every external dependency of GNOME, which adds up to a lot of stuff.

Alpine Linux Toolchain

I want a lighter weight set of host tools to put in my build sandbox. Baserock’s OS images can be built with just a C++ toolchain and a minimal shell environment, so there’s no need to start copying gigabytes of dependencies around.

Ultimately the Baserock project could build its own set of host tools, but to save faff while prototyping things I decided to try Alpine Linux, which is a minimal distribution.

Alpine Linux provides “mini root filesystem” tarballs. These can’t be used directly, as they contain device nodes (so require privileges to extract) and don’t contain a toolchain.

Here’s how I produced a workable host tools sysroot. I’m using Bubblewrap (the same tool used by BuildStream to create build sandboxes) as a simple container driver to run the `apk` package tool as root without needing special host privileges. This won’t work on every OS; you can use something like Docker or plain old `chroot` instead if needed.

wget https://nl.alpinelinux.org/alpine/v3.6/releases/x86_64/alpine-minirootfs-3.6.1-x86_64.tar.gz
mkdir -p sysroot
tar -x -f alpine-minirootfs-3.6.1-x86_64.tar.gz -C sysroot --exclude=./dev

alias alpine_exec='bwrap --unshare-all --share-net --setenv PATH /usr/bin:/bin:/usr/sbin:/sbin  --bind ./sysroot / --ro-bind /etc/resolv.conf /etc/resolv.conf --uid 0 --gid 0'
alpine_exec apk update
alpine_exec apk add bash bc gcc g++ musl-dev make gawk gettext-dev gzip linux-headers perl e2fsprogs mtools

tar -z -c -f alpine-host-tools-3.6.1-x86_64.tar.gz -C sysroot .

This produces a 219MB host tools sysroot containing 11,636 files. This is not as minimal as you can go with a GNU C/C++ toolchain but it’s around the right order of magnitude and it checks out from BuildStream’s artifact store into the build directory in a matter of seconds.

We include gawk as it is needed during the GCC build (BusyBox awk is not enough), and gettext-dev is needed by GLIBC (at least, libintl.h is needed and in Alpine only gettext provides that header). Bash is needed by scripts/config from linux.git, and bc, GNU gzip, linux-headers and Perl are also needed for building Linux. The e2fsprogs and mtools are useful for creating disk images.

I’ve integrated this into my builds in a pretty lazy way for now:

kind: import
description: Import an Alpine Linux C/C++ toolchain
host-arches:
  x86_64:
    sources:
    - kind: tar
      url: file:///home/sam/src/buildstream-bootstrap/alpine-host-tools-3.6.1-x86_64.tar.gz
      base-dir: .
      ref: e01d76ef2c7e3e105778e2aa849a42d38dc3163f8c15f5b2de8f64cd5543cf29

This element is obviously not something I can share with others — I’d need to upload the tarball somewhere or set up a public OSTree repo that others could pull from, and then have the element reference that.
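
For the record, the OSTree side of publishing it would look roughly like this; the repo path and branch name are hypothetical, and the repo would still need to be hosted somewhere:

# Create an archive-mode repo that can be served over plain HTTP
ostree init --repo=host-tools-repo --mode=archive-z2

# Commit the sysroot contents as a branch that an element could track
ostree commit --repo=host-tools-repo \
    --branch=sysroots/alpine-host-tools/x86_64 \
    --subject="Alpine host tools 3.6.1" sysroot/

The ‘tar’ source in the element would then become an ‘ostree’ source pointing at wherever that repo ends up being hosted.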

However, this is just the first step towards some much deeper work which will result in me needing to move beyond Alpine in any case. In future I hope that it’ll be pretty straightforward to obtain a minimal toolchain as a sysroot that can be pulled into a sandbox using OSTree. The work required to produce such a thing is simple enough to automate but it requires a server to host the binaries which then requires ongoing maintenance for security updates, so I’m not yet going to commit to doing it …