Calliope 10.0: creating music playlists using Tracker Miner FS

I just published version 10.0 of the open source playlist generation toolkit, Calliope. This fixes a couple of long standing issues I wanted to tackle.

SQLite Concurrency

The first of these only manifest itself as intermittent Gitlab CI failures when you submitted pull requests. Calliope uses SQLite to cache data, and a cache may be used by multiple concurrent process. SQLite has a “Write-Ahead Log” journalling mode that should excel at concurrency but somehow I kept seeing “database is locked” errors from a test that verified the cache with multiple writers. Well – make sure to explicitly *close* database connections in your Python threads.

Content resolution with Tracker Miner FS

The second issue was content resolution using Tracker Miner FS, which worked nicely but very slowly. Some background here: “content resolution” involves finding a playable URL for a piece of music, given metadata such as the artist name and track name. Calliope can resolve content against remote services such as Spotify, and can also resolve against a local music collection using the Tracker Miner FS index. The “special mix” example, which generates nightly playlists of excellent music, takes a large list of songs taken from Listenbrainz and tries to resolve each one locally, to check it’s available and get the duration. Until now this took over 30 minutes at 100% CPU.

Why so slow? The short answer is: cpe tracker resolve was not using the Tracker FTS (Full Text Search) engine. Why? Because there are some limitations in Tracker FTS that means we couldn’t use it in all cases.

About Tracker FTS

The full-text search engine in Tracker uses the SQLite FTS5 module. Any resource type marked with tracker:fullTextIndexed can be queried using a special fts:match predicate. This is how Nautilus search and the tracker3 search command work internally. Try running this command to search your music collection locally for the word “Baby”:

tracker3 sparql --dbus-service org.freedesktop.Tracker3.Miner.Files \
-q 'SELECT ?title { ?track a nmm:MusicPiece; nie:title ?title; fts:match "Baby" }'

This feature is great for desktop search, but it’s not quite right for resolving music content based on metadata.

Firstly, it is doing a substring match. So if I search for the artist named “Eagles”, it will also match “Eagles of Death Metal” and any other artist that contains the word “Eagles”.

Secondly, symbol matching is very complicated, and the current Tracker FTS code doesn’t always return the results I want. There are at least two open issues, 400 and 171 about bugs. It is tricky to get this right: is ' (Unicode +0027) the same as ʽ (Unicode +02BD)? What about ՚ (Unicode +055A, the “Armenian Apostrophe”)? This might require some time+money investment in Tracker SPARQL before we get a fully polished implementation.

My solution the meantime is as follows:

  1. Strip all words with symbols from the “artist name” and “track name” fields
  2. If one of the fields is now empty, run the slow lookup_track_by_name query which uses FILTER to do string matching against every track in the database.
  3. Otherwise, run the faster lookup_track_by_name_fts query. This uses both FTS *and* FILTER string matching. If FTS returns extra results, the FILTER query still picks the right one, but we are only doing string matching aginst the FTS results rather than the whole database.

Some unscientific profiling shows the special_mix script took 7 minutes to run last night, down from 35 minutes the night before. Success! And it’d be faster still if people can stop writing songs with punctuation marks in the titles.

Screenshot of a Special Mix playlist
Yesterday’s Special Mix.

Standalone SPARQL queries

You might think Tracker SPARQL and Tracker Miners have stood still since the Tracker 3.0 release in 2020. Not so. Carlos Garnacho has done huge amounts of work all over the two codebases bringing performance improvements, better security and better developer experience. At some point we need to do a review of all this stuff.

Anyway, precompiled queries are one area that improved, and it’s now practical to store all of an apps queries in separate files. Today most Tracker SPARQL users still use string concatenation to build queries, so the query is hidden away in Python or C code in string fragments, and can’t easily be tested or verified independently. That’s not necessary any more. In Tracker Miners we already migrated to using standalone query files (here and here). I took the opportunity to do the same in Calliope.

The advantages are clear:

  • no danger of “SPARQL injection” attacks, nor bugs caused by concatenation mistakes
  • a query is compiled to SQLite bytecode just once per process, instead of happening on every execution
  • you can check and lint the queries at build time (to do: actually write a SPARQL linter)
  • you can run and test the queries independently of the app, using tracker3 sparql --file. (Support for setting query parameters due to land Real Soon).

The only catch is some apps have logic in C or Python that affects the query functionality, which will need to be implemented in SPARQL instead. It’s usually possible but not entirely obvious. I got ChatGPT to generate ideas for how to change the SPARQL. Take the easy option! (But don’t trust anything it generates).


Next steps for Calliope

Version 10.0 is a nice milestone for the project. I have some ideas for more playlist generators but I am not sure when I’ll get more time to experiment. In fact I only got time for the work above because I was stuck on the sofa with a head-cold. In a way this is what success looks like.

One thought on “Calliope 10.0: creating music playlists using Tracker Miner FS

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.