API changes in Tracker 3.0

 

This article has been updated to correct a misunderstanding I had about the CONSTRAINT feature. Apps will not need to explicitly add this to their queries, it will be added implicitly by the xdg-tracker-portal process..

Lots has happened in the 2 months since my last post, most notably the global coronavirus pandemic … in Spain we’re in week 3 of quarantine lockdown already and noone knows when it is going to end.

Let’s take our mind off the pandemic and talk about Tracker 3.0. At the start of the year Carlos worked on some key API changes which are now merged. It’s a good opportunity to recap what’s really changing in the new version.

I made the developer documentation for Tracker 3.0 available online. Thanks to GitLab, this can be updated every time we merge a change in Git. The documentation a work in progress and we appreciate if you can help us to improve it.

The documentation contains a migration guide, but let’s have a broader look at some common use cases.

Tracker 3.0 is still in development and things may change! We very much welcome feedback from app developers who are going to use this API.

Browsing and searching

The big news in Tracker 3.0 is decentralization. Each app can now manage its own private database! There’s no single “Tracker store” any longer.

Tracker 3.0 will index content from the filesystem to facilitate searching and browsing, as it does now. The filesystem miner will keep this in its own database, and Flatpak apps will access this database through a portal (currently in development).

Apps access this data using a TrackerSparqlConnection just like now, but when we create the connection we need to specify that we want to connect to the filesystem miner’s database.

Here’s a Python example of listing all the music files in the user’s ~/Music directory:

from gi.repository import Tracker

conn = Tracker.SparqlConnection.bus_new(
    "org.freedesktop.Tracker3.Miner.Files", None, None)
cursor = conn.query(
    'SELECT ?url { ?r a nmm:MusicPiece ; nie:url ?url }')
print("Found music files:\n")
while cursor.next():
    print(cursor.get_string()[0][0])

Running a full text search will be similar. Here’s how you’d look for “bananas” in every file in the users ~/Documents folder:

cursor = conn.query(
    'SELECT ?url fts:snippet(?r) { '
    '    ?r a nfo:Document ; '
    '        nie:url ?url ; '
    '        fts:match "Bananas" '
'}')
print("Found document files:\n")
while cursor.next():
    print("   url: {}".format(cursor.get_string()[0][0]))
    print("   snippet: {}".format(cursor.get_string()[0][0]))

If you are running inside a Flatpak sandbox then there will be a portal between you and the org.freedesktop.Tracker3.Miner.Files database. The read-only /.flatpak-info file inside the sandbox, which is created when building the Flatpak, will declare what graphs your app can access. The xdg-tracker-portal will add that information into the SPARQL query, using a Tracker-specific syntax like this: CONSTRAINT GRAPH , and the database will enforce the constraint ensuring that your app really does only see the graphs that it’s requested access to.

Storing your own data

Tracker can be used as a data store by applications. One principle behind the design of Tracker 1.x was that by using a centralized store and a common vocabulary, different apps could easily share data. For example, when you create an album in GNOME Photos, it’s stored in the Tracker database using the standard nfo:DataContainer class. Any other app, perhaps a file manager, or a photos app from a different platform, can show and edit albums stored in this way without having to know specifics about GNOME Photos. Playlists in GNOME Music and starred files in Nautilus are also stored this way.

This approach had some downsides. Having all data in a single database creates a single point of failure. It’s hard to backup the valuable user data without backing up the search and indexing data too – but since the index can be recreated from the filesystem, it’s a waste of resources to include that in a backup. Apps were also forced to share a single database schema which was maintained in the tracker.git repository.

Tracker 3.0, each app creates a private database for storing its own data. It can use the ontology (database schema) from Tracker, or it can provide its own version. Here’s how a photos app written in Python could store photo albums:

from gi.repository import Gio, GLib, Tracker
import pathlib

def app_database_dir():
    data_dir = pathlib.Path(GLib.get_user_data_dir())
    return data_dir.joinpath('my-photos-app/db')

location = Gio.File.new_for_path(app_database_dir())
conn = Tracker.SparqlConnection.new(
    Tracker.SparqlConnectionFlags.NONE, location, None)

conn.update(
    'INSERT {  a nfo:DataContainer, nie:DataObject ; '
    '           nie:title "My Album" }',
    0, None)

Now let’s insert a photo into this album. Remember that the user’s photos are indexed by the filesystem miner. We can use the SERVICE statement to connect the filesystem miner’s database to our app’s private database, like this:

conn.update(
    'CONSTRAINT GRAPH  '
    'INSERT { '
        '   SELECT ?photo { '
        '       SERVICE  { '
        '           ?photo nie:url  '
        '       } '
        '   }, '
        '   ?photo nie:isPartOf  . ',
    '}',
    0, None)

Now let’s display the contents of the album:

cursor = conn.query(
    'CONSTRAINT GRAPH  '
    'SELECT ?url { '
    '    SELECT ?photo ?url { '
    '        SERVICE  { '
    '            ?photo a nmm:Photo ; nie:url ?url . '
    '        } '
    '    } '
    '    ?photo nie:isPartOf . '
    '}')
while cursor.next():
    print(cursor.get_string(0)[0])

Notice again that the app has to request permission to access the Photos graph. If our example app is running in Flatpak, this will require a special permission.

It’s still possible for one app to share data with another, but it will require coordination at the app level. Using the example of photo albums, GNOME Photos can opt to make its database available to other apps. If a different app wants to see the user’s photo albums, they’ll need to connect to the org.gnome.Photos database over D-Bus. As usual, Flatpak apps would need permission to do this.

Is it a good time to port my app to Tracker 3.0?

It’s a good time to start porting your app. You will definitely be able to help us with testing and stabilising the library and the documentation if you start now.

There are some API changes still unmerged at time of writing, primarily the Flatpak portal and the CONSTRAINT feature, also the details of how you specify which ontology to use.

Some functionality is no longer exposed in C libraries, due to the privitization of libtracker-control and libtracker-miner. As far as we know libtracker-miner is unused outside Tracker, but some apps are currenly using libtracker-control to display status updates for the Tracker daemons and trigger indexing of removable devices. We have an open issue about improving the story for on-demand removable device indexing. For status monitoring you may use the underlying DBus signals, and I’m also hoping to make these more useful.

Ideally I’d like to add a new helper library for Tracker 3.0 which would conveniently wrap the high level features that apps use. My volunteer time is limited though. I can share ideas for this if you are looking for a way to contribute!

What about a hackfest?

At some point we need to finish the Tracker 3.0 work and make sure that apps that use Tracker are all ported and working. The best case is that we do this in time for the upcoming GNOME 3.38 release. We discussed about a hackfest some point between now and GNOME 3.38 to make sure things are settled; it now may be that an in-person hackfest won’t be feasible in light of the Coronavirus pandemic but a series of online meetings would be a good alternative. We can only wait, and see!

Posted in Uncategorized | 3 Comments

Sculpting Tracker 3.0

We’re in the second phase of work to create version 3.0 of the Tracker desktop search engine.

Tracker’s database is now up to date with the latest SPARQL 1.1 standards, including the magical SERVICE statement that lets you combine results from multiple databases in a single query. Now we’re converting the database from a service into a library, and turning the previously monolithic architecture into something more flexible.

Carlos has already done most of this work and the code is pushed as #172 (tracker.git) and  #136 (tracker-miners.git). At times it feels like we’re carving a big block of stone into a sculpture — just look at the diffstats:

    tracker.git: +4214 -10234
    tracker-miners.git: +375 -718

Read merge request #172 for full details, but the highlights are that there’s no more tracker-store daemon, and the libtracker-sparql library which was previously only used for querying and inserting data can now be used to create and manage your own database. You can keep the database private, or you can expose it over D-Bus.

The code in tracker.git is now only about managing data. We may rename it to tracker-sparql in due course, or even to SPARQLite if this is okayed by the developers of SQLite. There’s perhaps a niche for a desktop-scale database that supports SPARQL queries, and it’s a niche that Tracker’s database fits in nicely.

All the code related to desktop indexing and search is now in tracker-miners.git. The tracker-miner-fs daemon will maintain the index in its own database, which you’ll be able to query by connecting over D-Bus just like you used to connect to tracker-store in Tracker 2.0. However, apps running inside Flatpak will not be able to talk directly to the tracker-miner-fs daemon — communication will go through a new portal that Carlos is currently working on, allowing us to implement per-app access controls to your data for the first time.

We are still pending a Tracker 2.3.2 bugfix release too! This month Victor Gal solved an issue that was causing photo geolocation metadata to be ignored. Rasmus Thomsen also added Alpine Linux to our CI, and the GNOME translation teams have been hard at work too.

If you want to help out by testing, developing, documenting Tracker – get in touch on  GNOME Discourse (use the ‘tracker’ tag) or irc.gnome.org #tracker.

Posted in Uncategorized | 1 Comment

Last month in Tracker

Here’s an incomplete report of some work done on Tracker during the last month!

Bugs

Jean Felder fixed a thorny issue that was causing wrong track durations for MP3s.

Rasmus Thomsen has been testing on Alpine Linux, fixing one issue and finding several more. Alpine Linux uses musl libc instead of the more common GNU libc, which triggers bugs that we don’t usually see. Finding and fixing these issues could be a great learning experience for someone who wants to dig deep into the platform!

There’s an ongoing issue reported by many Ubuntu users which seems to be due to SQLite database corruption. SQLite is rather a black box to me, so I don’t know how or when we might get to the bottom of why this corruption is happening.

Ubuntu CI

We now test each commit on Ubuntu as well as Fedora. This a nice step forwards. It’s also triggering more intermittent failures in the CI — we’ve made huge progress in the last few years on bringing the CI up from zero, but there are some latent issues like these which we need to get rid of.

Tracker 3.0

Carlos has done more architectural work in the ‘master’ branch, working towards having a generic SPARQL store in tracker.git, and all GNOME/desktop/filesystem related code in tracker-miners.git.

As part of this, the tracker CLI tool is now split between tracker.git and tracker-miners.git (MR1, MR2).

We also moved the libtracker-control and libtracker-miner libraries into tracker-miners.git, and made the libtracker-control API private. As far as I know, the libtracker-control library is only being used by GNOME Photos to manage indexing of removable devices. We want to keep track of which apps need porting to 3.0, so please let me know if this is going to affect anything else.

New website

Tracker is famous enough that it merits a real website, not just an outdated set of wiki pages. So I made a real Tracker website, aiming to collect links to relevant user and developer documentation and to have a minimal overview and FAQ section. We can build and deploy this straight from the tracker.git repo, so whereas the wiki is easily forgotten, the new website lives in the same repo as the sourcecode. The next step will be to merge this and then tidy up most of the old wiki pages

 

Posted in Uncategorized | 1 Comment

Into the Pyramid

November 2019 wasn’t an easy month, for various reasons, and it also rained every single day of the month. But there were some highlights!

LibreTeo

At the bus stop one day I saw a poster for a local Free Software related event called LibreTeo. Of course I went, and saw some interesting talks related to technology and culture and also a useful workshop on improving your clown skills. Actually the clown workshop was a highlight. It was a small event but very friendly, I met several local Free Software heads, and we were even invited for lunch with the volunteers who organized it.

Purr Data on Flathub

I want to do my part for increasing the amount of apps that are easy to install Linux. I asked developers to Flatpak your app today last year, and this month I took the opportunity to package Purr Data on Flathub.

Here’s a quick demo video, showing one of the PD examples which generates an ‘audible illusion’ of a tone that descends forever, known as a Shepard Tone.

As always the motivation is a selfish one. I own an Organelle synth – it’s a hackable Linux-based device that generates sound using Pure Data, and I want to be able to edit the patches!

Pure Data is a very powerful open source tool for audio programming, but it’s never had much commercial interest (unlike its proprietary sibling Max/MSP) and that’s probably why the default UI is still implemented in TCL/TK in 2019. The Purr Data fork has made a lot of progress on an alternative HTML5/JavaScript UI, so I decided this would be more suitable for a Flathub package.

I was particularly motivated by the ongoing Pipewire project which is aiming to unify pro and consumer audio APIs on Linux in a Flatpak-friendly way. Christian Schaller mentioned this recently:

There is also a plan to have a core set of ProAudio applications available as Flatpaks for Fedora Workstation 32 tested and verified to work perfectly with Pipewire.

The Purr Data app will benefit a lot from this work. It currently has to use the OSS backend inside the sandbox and doesn’t seem to successfully communicate over MIDI either — so it’s rather a “tech preview” at this stage.

The developers of Purr Data are happy about the Flatpak packaging, although they aren’t interested in sharing the maintenance effort right now. If anyone reading this would like to help me with improving and maintaining the Purr Data Flatpak, please get in touch! I expect the effort required to be minimal, but I’d like to have a bus factor > 1.

Tracker bug fixes

This month we fixed a couple of issues in Tracker which were causing system lockups for some people. It was very encouraging to see people volunteering their time to help track down the issue, both in Gitlab issue 95 and in #tracker on IRC, and everyone involved in the discussion stayed really positive even though it’s obviously quite annoying when your computer keeps freezing.

In the end there were several things that come together to cause system lockups:

  • Tracker has a ‘generic image extraction’ rule that tries to find metadata for any image/* MIME type that isn’t a .bmp, .jpg, .gif, or .png. This codepath uses the GstDiscoverer API, the same as for video and audio files, in the hope that a GStreamer plugin on the system can give us useful info about the image.
  • The GstDiscoverer instance is created with a timeout of 5 seconds. (This seems quite high — the gst-typefind utility that ships with GStreamer uses a timeout of 1 second).
  • GStreamer’s GstDiscoverer API feeds any file where the type is unknown into an MPEG decoder, which is effectively an unwanted fuzz test and can trigger periods of high CPU and memory usage.
  • 5 seconds of processing non-MPEG data with an MPEG decoder is somehow enough to cause Linux’s scheduler to lock up the entire system.

We fixed this in the stable branches by blocking certain problematic MIME types. In the next major release of Tracker we will probably remove this codepath completely as the risks seem to outweigh the benefits.

Other bits

I also did some work on a pet project of mine called Calliope, related with music recommendations and playlist generation. More on this in a separate blog post.

And I finally installed Fedora on my partner’s laptop. It was nice to see that Gnome Shell works out-of-the-box on 12 year old consumer hardware. The fan, which was spinning 100% of the time under Windows 8, is virtually silent now – I had actually thought this problem was due to dust buildup or a hardware issue, but once again the cause was actually low-quality proprietary software.

Posted in Uncategorized | 2 Comments

What I did in October

October in Galicia has a weather surprise for every week. I like it because every time the sun appears you feel like you gotta enjoy it – there might be no more until March.

I didn’t do much work on Tracker this month, beside bug triage and a small amount of prep for the 2.3.1 stable release. The next step for Tracker 3.0 is still to fix a few regressions causing tests to fail in tracker-miners.git. Follow the Tracker 3.0 milestone for more information!

Planalyzer

In September I began teaching English classes again after the summer, and so I’ve been polishing the tool that I wrote to index old lesson plans.

It looks a little cooler than before:

Screenshot of Planalyzer app

I’m still quite happy with the hybrid GTK+/webapp approach that I’m taking. I began this way because the app really needs to be available in a browser: you can’t rely on running a custom desktop app on a classroom PC. However, for my own use running it as a webapp is inconvenient, so I added a simple GTK+/WebKit wrapper. It’s kind of experimental and a few weird things come out of it, like how clipboard selections contain some unwanted style info that WebKit injects, but it’s been pretty quick and fun to build the app this way.

I see some developers using Electron these days. In some ways it’s good: apps have strong portabilility to Linux, and are usually easy to hack on too due to being mostly JavaScript. But having multiple 150MB binary builds of Chromium dotted about my machine makes me sad. In the Planalyzer app I use WebKitGTK+, which is already part of GNOME and it works very well. It would be cool if Electron could make use of this in future 🙂

Hydra

I was always interested in making cool visuals, since I first learned about the PC demoscene back in the 1990s, but i was never very good at it. I once made a rather lame plasma demo using an algorithm i copied from somewhere else.

And then, while reading the Create Digital Music blog earlier this year, I discovered Hydra. I was immediately attracted by the simple, obvious interface: you chain JavaScript functions together and visuals appear right behind the code. You can try it here right away in your browser. I’ve been out of touch with the 3D graphics world forever, so I was impressed just to see that WebGL now exists and works.

I’ve been very much in touch with the world of audio synthesizers, so Hydra’s model of chaining together GL shaders as if it was a signal chain feels very natural to me. I still couldn’t write a fragment or a vertex shader myself, but now I don’t need to, I can skip to the creative part!

So far I’ve only made this rather basic webcam mashup but you can see a lot more Hydra examples in the @hydra_patterns Twitter account.

I also had a go at making online documentation, and added a few features that make it more suitable to non-live coding, such as loading prerecorded audio tracks and videos, and allowing you to record a .webm video of the output. I’m not sure this stuff will make it upstream, as the tool is intended for live coding use, but we’ll see. It’s been a lot of fun hacking on a project that’s so simple and yet so powerful, and hopefully you’ll see some cool music videos from me in the future!

Posted in Uncategorized | 2 Comments

Tracker developer experience improvements

There have been lots of blog posts since I suggested we write more blog posts. Great! I’m going to write about what I’ve done this month.

I’m excited that work started on Tracker 3.0, after we talked about it at GUADEC 2019. We merged Carlos’ enourmous branch to modernize the Tracker store database. This has broken some tests in tracker-miners, and the next step will be to track down and fix these regressions.

I’ve continued looking at the developer experience of Tracker. Recently we modernized the README.md file (as several GNOME projects have done recently). I want the README to document a simple “build and test Tracker from git” workflow, and that led into work making it simpler to run Tracker from the build tree, and also a bunch of improvements to the test suite.

The design of Tracker has always meant that it’s a pain in the ass to build and test, because to do anything useful you need to have 3 different daemons running and talking to each other over D-Bus, reading and writing data in the same location, and communicating with the CLI or an app. We had a method for running Tracker from the build tree for use by automated tests, whose code was duplicated in tracker.git and tracker-miners.git, and then we had a separate script for developers to test things manually, but you still had to install Tracker to use that one. It was a bit of a mess.

The first thing I fixed was the code duplication. Now we have a Python module named trackertestutils. We install it, so we don’t need to duplicate code between tracker.git and tracker-miners.git any more. Thanks to Marco Trevisan we also install a pkgconfig file.

Then I added a ./run-uninstalled script to tracker-miners.git. The improvement in developer experience I think is huge. Now you can do this to try out the latest Tracker code:

    git clone tracker-miners.git
    cd tracker-miners && mkdir build && cd build
    meson .. && ninja
    ./run-uninstalled --wait-for-miner=Files --wait-for-miner=Extract -- tracker index --file ~/Documents
    ./run-uninstalled -- tracker search "Hello"

The script is a small wrapper around trackertestutils, which takes care of spawning a private D-Bus daemon, collecting and filtering logs, and setting up the environment so that the Tracker cache is written to `/tmp/tracker-data`. (At the time of writing, there are some bug still and ./run-installed actually still requires you to install Tracker first.)

I also improved logging for Tracker’s functional-test suite. Since a year ago we’ve been running these tests in CI, but there have been some intermittent failures, which were hard to debug because log output from the tests was very messy. When you run a private D-Bus session, all kinds of daemons spawn and dump stuff to stdout. Now we set G_MESSAGE_PREFIXED in the environment, so the test harness can separate the messages that come from Tracker processes. It’s already allowed me to track down some of these annoying intermittent failures, and to increase the default log verbosity in CI.

Another neat thing about installing trackertestutils is that downstream projects can use it too. Rishi mentioned at GUADEC that gnome-photos has a test which starts the photos app and ends up displaying the actual photo collection of the user who is running the test. Automated tests should really be isolated from the real user data. And using trackertestutils, it’s now simple to do that: here’s a proof of concept for gnome-photos.

And I made a new tune!

Posted in Uncategorized | 1 Comment

Blog about what you do!

Am I the first to blog from GUADEC 2019? It has been a great conference: huge respect to the organization team for volunteering significant time and energy to make it all run smoothly.

The most interesting thing at GUADEC is talking to community members old and new. I discovered that I don’t know much about what people are doing in GNOME. I discovered Antonio is doing user support / bug triage and more in Nautilus. I discovered that Bastian is posting GNOME-related questions and answers on StackOverflow. I discovered Britt is promoting us on Twitter and moderating discussions on Reddit. I discovered Felipe is starting to do direct user support for Boxes. I wouldn’t know any of this if I hadn’t been to GUADEC.

So here’s my plea — if you contribute to GNOME, please blog about it! If everyone reading this wrote just one blog post a year… I’d have a much better idea of what you’re all doing!

Don’t forget: Planet GNOME is not only for announcing cool new projects and features – it’s “a window into the world, work and lives of GNOME hackers and contributors.” Blog about anything GNOME related, and be yourself — we’re not a corporation, we’re an underground network with a global, diverse, free thinking membership and that’s our strength.

Remember that there’s much more to GNOME than software development — read this long list of skillsets that you’re probably using. Write about translations, user support, testing, documentation, packaging, outreach, foundation work, event organization, bug triage, product management, release management, design, infrastructure operations. Write about why you enjoy contributing to GNOME, write about why it’s important to you. Write about what you did yesterday, or what you did last month. Write about your friends in GNOME. Make some graphs about your project to show how much work you do. Write short posts, write them quickly. Don’t worry about minor errors — it’s a blog, not a magazine article. Don’t be scared that readers won’t be interested — we are! We’re a distributed team and we need to keep each other posted about what we’re doing. Show links, screenshots, discussions, photos, graphs, anything. Don’t write reports, write stories.

If you contribute to GNOME but don’t have a blog… please start one! Write some nice posts about what you do. Become a Foundation member if you haven’t already*, and ask to join Planet GNOME.

And even if you forget all that, remember this: positive feedback for contributions encourages more contributions. Writing a blog post, like any other form of contribution, can sometimes feel shouting into an abyss. If you read an interesting post, leave a positive comment & thank the author for taking the time to write it.

* The people using and reviewing your contributions will be happy to vouch for you, don’t worry about that!

Posted in Uncategorized | 5 Comments