This is part 2 of a series. Part 1 is here. Come back next week to find out more about the history and future of Tracker.
So what do we have to show, after a year of focused effort and a series of disruptive change to Tracker and GNOME?
A complete redesign
However, the shortcomings of the 2000’s era design have been clear for a while.
Back in 2005, around the time your grandparents first met each other on Myspace, it seemed a great idea to aggregate all the metadata we could find into a single database. The old
tracker-store database from Tracker 2.x includes the search index created by Tracker Miner FS right next to user data stored by apps like Notes, Photos and Contacts. This was going to allow cool features like tagging people in your photo collection with their phone number and online status (before the surveillence-advertising industry showed how creepy that actually is). Ivan Frade’s decade-old talk “Semantic social desktop& mobile devices” is a great insight into the thinking of the time.
I’m going to dig into Tracker’s origins in a future article, but for now — note that “Security on graphs” is listed in Ivan’s presentation as a “to-do” item.
In a world of untrusted Flatpak apps, “to-do” isn’t good enough. Any app that uses the system search service, even any app that stores RDF data with Tracker, requests a Flatpak D-Bus permission for
org.freedesktop.Tracker1. This gives access to the entire
tracker-store database, right down to the search terms indexed from ‘Documents’ folder. Imagine your documents as a savoury snack, stored in a big monolithic building. You accidentally install and run a malicious app, represented here as a hungry seagull…
To solve this, we had to make access control more granular. It didn’t make sense to retrofit this to
tracker-store. During 2019, Carlos casually eliminated the monolithic
tracker-store altogether and in its place implemented a desktop-wide distributed database, taking Tracker from a “public-by-default” model to “private-by-default”
libtracker-sparql-3 API lets apps store SPARQL data anywhere they like. You can keep it private, if you just want a lightweight database. Nautilus and Notes are already doing this, to store starred files and note data respectively.
If and when you want to publish data, it’s done by creating a TrackerEndpoint on DBus. Using a SPARQL federated query, one Tracker SPARQL database can pull data from multiple others in a single query. This, for example, allows Photos to merge photo metadata from the search index with album metadata stored in its own database. (I wrote more about this back in March).
The search index created by Tracker Miner FS is published at
org.freedesktop.Tracker3.Miner.Files, but we don’t let Flatpak apps access this directly. A new Flatpak portal gates access to search based on content type. You can now install a music player app and let it search ONLY your music collection, where previously your options were “let the app search everything” or “break it”.
A clearer architecture
“I’m finally starting to “get” tracker 3. And it’s like an epiphany. “
If someone asked “What actually is Tracker?” I used to find it tricky to answer. We narrowed the focus down to two things: a lightweight database, and a search engine.
For the last 3 years we worked on separating these two concerns, and as of Tracker 3.0 we are done. Were we starting from search, we could find clearer names for the two parts than ‘tracker’ and ‘tracker-miners’, but we kept the repos and package names the same to avoid making the 2.x to 3.x transition harder for distributors.
The name “Tracker” refers to the overall project. You can use “GNOME Tracker” for clarity where needed. The project maintains two code repositories:
- Tracker SPARQL: a distributed database, provided as a GObject C library and implementing the full SPARQL 1.1 query standard.
- Tracker Miners: a content indexer for the desktop, providing the
Tracker Miner FS system service and its companion Tracker Extract.
tracker3 commandline tool can operate on any Tracker SPARQL database, and it has some extensions for searching and managing the Tracker Miner FS indexer.
The headline feature is there’s one less reason to claim “Flatpak’s sandbox is a lie!”. Decentralisation brings more benefits too:
- You can backup app data by running
tracker3 exporton the app’s SPARQL database. Useful for Notes, Photos and more.
- Apps can bundle Tracker Miners inside Flatpak, allowing them to run on platforms that don’t ship a suitable version of Tracker Miners in the base OS.
- Apps are no longer limited to the Nepomuk data model when storing data. Tracker Miner FS still uses the Nepomuk ontologies, but apps can write their own. Distributed queries work even across different data models.
- Tracker’s test suite now sets up a private database for testing using public API, avoiding some hideous hacks.
- Apps test suites can also set up private databases and even a private instance of the indexer. GNOME’s search and content apps have rather low test coverage at present. I suspect this is partly because the old design of Tracker made it hard to write good tests.
- A distributed database is fundamentally a cool thing that you definitely need.
A system service, like a Victorian child, should be “seen and not heard”. Nobody wants the indexer to drain the battery, burn out the fan or lock up the desktop.
We prioritize any issue which reports the Tracker daemons have been behaving badly. In collaboration with many helpful bug reporters, we removed two codepaths in 3.0 that could trigger high CPU usage. One major change is we no longer index all plain text files, only those with an allowed extension. If you unpack Linux kernel tarballs in your Music folder, this is for you! (Remember Tracker isn’t designed to index source code). We also dropped a buggy and pointless codepath that tried and mostly failed to extract metadata from random
image/* type files using GStreamer’s Discoverer API.
Tracker Extract is designed for robustness but it also needs to report errors. If extraction of
foo.flac fails it may indicate a bug in Tracker, or in GStreamer, or libflac, or (more likely) the file is corrupt or mis-labelled. In Tracker Miners 3 we have improved how extraction errors are reported — instead of using the journal, we log errors to disk (at
~/.cache/tracker3/files/errors/). This prevents any ‘spamming’ of the journal when many errors are detected. You can check for errors by running
tracker3 status. Perhaps Nautilus could make these errors visible in future too.
Since Tracker Miners 3.0.0 was released, distro beta testers found two issues that could cause high CPU usage. These are fixed in the 3.0.1 release. If you see any issues with Tracker Miners 3.0.1, please report them on GitLab!
Here I also want to mention Benjamin’s excellent work to improve resource management for system services. Tracker Miner FS tries to avoid heavy resource use but filesystem IO is infinitely complicated and we cannot defend against every possible situation. Strange filesystems or bugs in dependencies can cause high CPU or IO consumption. If the kernel’s scheduler is not smart, it may focus on these tasks at the expense of the important shell and app processes, leaving the desktop effectively locked. Benjamin’s work lets the kernel know to prioritize a responsive desktop above any system services like Tracker Miner FS. Mac OS X could do this since 2013.
Whatever this decade brings, it should be free from desktop lockups!
The 2.x to 3.x transition was difficult partly because Tracker missed some big pieces of the SPARQL standard. Implementing a 3.x-to-2.x translation layer was out of the question — we had no motivation to re-implement the quirks of 2.x just so apps could avoid porting to 3.x.
I don’t see another major version break in Tracker’s near future, but we are now prepared. Tracker implements almost all of the SPARQL 1.1 standard.
SPARQL is not without its drawbacks — more on that in a future article — but aside from a few simple C and DBus interfaces, all of Tracker’s functionality is accessible through this W3C standard query language. Better to reuse standards than to make our own.
We have a new website, improved documentation. The
tracker3 commandline tool saw loads of cleanups and improvements. Files are automatically re-processed when the relevant tracker-extract module changes — a ten year old feature request. Debugging is nicer as a keyword-enabled
TRACKER_DEBUG variable replaces the old
TRACKER_VERBOSITY. Deprecated APIs and dependencies are gone, including the venerable intltool. The core and Nepomuk ontologies are slimmed down and better organised. We measure test coverage, and test coverage is higher than ever. We enabled Coverity static analysis too which has found some obscure bugs. I’m no doubt forgetting some things.
A few of these changes impact everyone, but mostly the improvements benefit power users, app developers, and ourselves as maintainers. It’s crucial that a volunteer-driven project like Tracker is easy and fun to maintain, otherwise it can only fail. I think we have paved the way for a bright future.
Come back next week to find out more about the background and future of Tracker.