Status update 18/10/2022

The most important news this week is that my musical collaborator Vladimir Chicken just released a new song about Manchester’s most famous elephant. It comes with a weird B-side about a “Baboon on the Moon”; I am not sure what he was thinking with that one.

I posted on discourse.gnome.org already about GNOME OpenQA testing. Now that the tests are up to date, I’m aiming to keep an eye on them for a full release cycle and see how much ongoing maintenance effort they need. Hopefully at next year’s GUADEC we’ll be able to talk about moving this beyond an “alpha” service. We’ll soon have something like GNOME Continuous back in action after “only” 6 years of downtime.

Other exciting things in this area: Abderrahim Kitouni and Jordan Petridis have updated gnome-build-meta to track exact refs in its Git history; there are some details to work out so that it still provides quick CI feedback but this was basically necessary to ensure build reproducibility. And Tristan Van Berkom already blogged about research to use Recc inside BuildStream, with the eventual goal of unlocking fast incremental builds within the reproducibility guarantees that BuildStream already provides.

There is no direct link between these projects but I think we share the common vision that Colin Walters already laid out 10 years ago when describing Continuous: GNOME contributors need to be able to develop and test system-level changes involving GNOME, using a reliable & documented process with modest hardware requirements. Many issues and bug reports go beyond a single component, and in many cases right down to the kernel. As an example, when a background indexing task causes lagging in the desktop shell, folk blame the background indexer process, but the indexer is not in control of its own scheduling and such an issue can’t be fully reproduced if we don’t control exactly which kernel is running. Hopefully when these streams of work come to fruition, these kinds of bugs will finally become “shallow”.

Outside of volunteer efforts, I’ve been working on a new client project that is essentially a complex database migration. I don’t get to do much database work at Codethink. It’s nice to have absolutely no legacy Makefiles to deal with for once, and it’s been a good opportunity to try out Nushell in a bit more depth. My research so far is mostly setting up Python scripts to run database queries and output CSV, then using Nushell to filter and sort the output. When I tried Nushell a few years ago it still lacked some important features (it didn’t even have a way to set variables at that point), but now it’s prepared for anything you can throw at it and I look forward to doing more data processing with it.
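To give a flavour of the "query to CSV" half of that workflow, here is a minimal sketch. It is not the client project's code: the in-memory SQLite database, the `customers` table and the `query_to_csv` helper are all stand-ins I made up for illustration, and a real migration would connect to whatever database is actually involved.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql, out):
    """Run `sql` on `conn` and write the rows to `out` as CSV with a header row."""
    cursor = conn.execute(sql)
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one tuple per column; item 0 is the column name.
    writer.writerow(column[0] for column in cursor.description)
    writer.writerows(cursor)


# Hypothetical stand-in data: an in-memory SQLite table named `customers`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(2, "Bob"), (1, "Alice")])

buffer = io.StringIO()
query_to_csv(conn, "SELECT id, name FROM customers ORDER BY id", buffer)
csv_text = buffer.getvalue()
print(csv_text, end="")
```

Writing to a file instead of a buffer leaves a CSV that a shell like Nushell can then filter and sort, which keeps the Python side down to "run the query, dump the rows".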

I’m not yet ready to switch completely from Fish to Nushell, but … who knows? Maybe it’s coming.

Status update 21/09/22

Last week I attended OSSEU 2022 in Dublin, gave a talk about BuildStream 2.0 and the REAPI, and saw some new and old faces. Good times apart from the common cold I picked up on the way — I was glad that the event mandated face-masks for everyone so I could cover my own face without being the “odd one out”. (And so that we were safer from the 3+ COVID-19 cases reported at the event).

Being in the same room as Javier allowed some progress on our slightly “skunkworks” project to bring OpenQA testing to upstream GNOME. There was enough time to fix the big regressions that had halted testing completely since last year, one being an expired API key and the other, removal of virtio VGA support in upstream’s openqa_worker container. We prefer using the upstream container over maintaining our own fork, in the hope that our limited available time can go on maintaining tests instead, but the containers are provided on a “best effort” basis and since our tests are different to those on openqa.opensuse.org, regressions like this are to be expected.

I am also hoping to move the tests out of gnome-build-meta into a separate openqa-tests repo. We initially put them in gnome-build-meta because ultimately we’d like to be able to do pre-merge testing of gnome-build-meta branches, but since it takes hours to produce an ISO image from a given commit, it is painfully slow to create and update the OpenQA tests themselves. Now that GitLab supports child pipelines, we can hopefully satisfy both use cases: one pipeline that quickly runs tests against the prebuilt “s3-image” from os.gnome.org, and a second that is triggered for a specific gnome-build-meta build pipeline and validates that.

First though, we need to update all the existing tests for the visual changes that occurred in the meantime, which are mostly due to gnome-initial-setup now using GTK4. That’s still a slow process as there are many existing needles (screenshots), and each time the tests are run, the Web UI allows updating only the first one to fail. That’s something else we’ll need to figure out before this could be called “production ready”, as any non-trivial style change to Adwaita would imply rerunning this whole update process.

All in all, for now openqa.gnome.org remains an interesting experiment. Perhaps by GUADEC next year there may be something more useful to report.

Team Codethink in the OSSEU 2022 lobby

My main fascination this month besides work has been exploring “AI” image generation. It’s amazing how quickly this technology has spread – it seems we had a big appetite for generative digital images.

I am really interested in the discussion about whether such things are “art”, because I think this discussion is soon going to encompass music as well. We know that both OpenAI and Spotify are researching machine-generated music, and it’s particularly convenient for Spotify if they can continue to charge you £10 a month while progressively serving you more music that they generated in-house – and therefore reducing their royalty payments to record labels.

There are two related questions: whether AI-generated content is art, and whether something generated by an AI has the same monetary value as something a human made “by hand”. In my mind the answer is clear, but at the same time not quantifiable. Art is a form of human communication. Whether you use a neural network, a synthesizer, a microphone or a wax cylinder to produce that art is not relevant. Whether you use DALL-E 2 or a paintbrush is not relevant. Whether your art is any good depends on how it makes people feel.

I’ve been using Stable Diffusion to try and illustrate some of the sound worlds from my songs, and my favourite results so far are for Don’t Go Into The Zone:

And finally, a teaser for an upcoming song release…

An elephant with a yellow map background

Calliope, slowly building steam

I wrote in December about Calliope, a small toolkit for building music recommendations. It can also be used for some automation tasks.

I added a bandcamp module which lists albums in your Bandcamp collection. I sometimes buy albums and then don’t download them because maybe I forgot or I wasn’t at home when I bought them. So I want to compare my Bandcamp collection against my local music collection and check if something is missing. Here’s how I did it:

# Albums in your online collection that are missing from your local collection.

ONLINE_ALBUMS="cpe bandcamp --user ssssam collection"
LOCAL_ALBUMS="cpe tracker albums"
#LOCAL_ALBUMS="cpe beets albums"

cpe diff --scope=album <($ONLINE_ALBUMS | cpe musicbrainz resolve-ids -) <($LOCAL_ALBUMS) 


Like all things in Calliope, this outputs a playlist as a JSON stream. In this case, it is a list of all the albums I need to download:

{
  "album": "Take Her Up To Monto",
  "bandcamp.album_id": 2723242634,
  "location": "https://roisinmurphy.bandcamp.com/album/take-her-up-to-monto",
  "creator": "Róisín Murphy",
  "bandcamp.artist_id": "423189696",
  "musicbrainz.artist_id": "4c56405d-ba8e-4283-99c3-1dc95bdd50e7",
  "musicbrainz.release_id": "0a79f6ee-1978-4a4e-878b-09dfe6eac3f5",
  "musicbrainz.release_group_id": "d94fb84a-2f38-4fbb-971d-895183744064"
}
{
  "album": "LA OLA INTERIOR Spanish Ambient & Acid Exoticism 1983-1990",
  "bandcamp.album_id": 3275122274,
  "location": "https://lesdisquesbongojoe.bandcamp.com/album/la-ola-interior-spanish-ambient-acid-exoticism-1983-1990",
  "creator": "Various Artists",
  "bandcamp.artist_id": "3856789729",
  "meta.warnings": [
    "musicbrainz: Unable to find release on musicbrainz"
  ]
}
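If you want to consume a stream like this from your own script, one way to do it in Python is sketched below. This is not part of Calliope: `iter_json_stream` is a hypothetical helper, and I am assuming the stream is simply a sequence of JSON objects separated by whitespace, as in the output above. The sample data is a trimmed copy of that output.

```python
import json


def iter_json_stream(text):
    """Yield each top-level JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        item, pos = decoder.raw_decode(text, pos)
        yield item


sample = """
{
  "album": "Take Her Up To Monto",
  "creator": "Róisín Murphy"
}
{
  "album": "LA OLA INTERIOR Spanish Ambient & Acid Exoticism 1983-1990",
  "creator": "Various Artists",
  "meta.warnings": ["musicbrainz: Unable to find release on musicbrainz"]
}
"""

albums = list(iter_json_stream(sample))
# Albums that could not be resolved carry a "meta.warnings" entry.
unresolved = [a["album"] for a in albums if "meta.warnings" in a]
print(unresolved)
```

Filtering on `meta.warnings` is a quick way to see which albums will need checking by hand.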

There are some interesting complexities to this, and in 12 hours of hacking I didn’t solve them all. Firstly, Bandcamp artist and album names are not normalized. Some artist names have spurious “The”, some album names have “(EP)” or “(single)” appended, so they don’t match your tags. These details are of interest only to librarians, but how can software tell the difference?

The simplest approach is to use Musicbrainz, specifically cpe musicbrainz resolve-ids. By comparing ids where possible we get mostly good results. There are many albums not on Musicbrainz, though, which for now turn up as false positives. Resolving Musicbrainz IDs is a tricky process, too: how do we distinguish Multi-Love (album) from Multi-Love (single) if we only have an album name?
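To make the matching problem concrete, here is a rough sketch of "compare ids where possible, otherwise fall back to tidied-up names". It is an illustration only and not how cpe diff is implemented. The helper names, the choice of `musicbrainz.release_group_id` as the id to compare, and the normalization rules are all my assumptions, and the first album in the example data is made up.

```python
import re

# Assumed clean-up rules: drop a trailing "(EP)" or "(single)" and a leading "The".
_SUFFIX = re.compile(r"\s*\((?:ep|single)\)\s*$", re.IGNORECASE)
_ARTICLE = re.compile(r"^the\s+", re.IGNORECASE)


def normalize_album(name):
    return _SUFFIX.sub("", name).strip().casefold()


def normalize_artist(name):
    return _ARTICLE.sub("", name.strip()).casefold()


def album_keys(item):
    """Return every key this album can be matched on."""
    keys = {("name", normalize_artist(item["creator"]), normalize_album(item["album"]))}
    mbid = item.get("musicbrainz.release_group_id")
    if mbid:
        keys.add(("mbid", mbid))
    return keys


def missing_albums(online, local):
    """Return online albums that share no key with any local album."""
    local_keys = set()
    for item in local:
        local_keys |= album_keys(item)
    return [item for item in online if not (album_keys(item) & local_keys)]


online = [
    {"creator": "The Example Band", "album": "First Songs (EP)"},  # made-up album
    {"creator": "Róisín Murphy", "album": "Take Her Up To Monto",
     "musicbrainz.release_group_id": "d94fb84a-2f38-4fbb-971d-895183744064"},
    {"creator": "Various Artists",
     "album": "LA OLA INTERIOR Spanish Ambient & Acid Exoticism 1983-1990"},
]
local = [
    {"creator": "Example Band", "album": "First Songs"},
    {"creator": "Roisin Murphy", "album": "Take Her up to Monto",
     "musicbrainz.release_group_id": "d94fb84a-2f38-4fbb-971d-895183744064"},
]

missing = [item["album"] for item in missing_albums(online, local)]
print(missing)
```

Note the trade-off: stripping "(single)" is exactly what makes an album and a single with the same title indistinguishable by name, so the fallback trades false positives for false negatives.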

If you want to try it out, great! It’s still aimed at hackers — you’ll have to install from source with Meson and probably fix some bugs along the way. Please share the fixes!

Calliope: Music recommendations for hackers

I started thinking about playlist generation software about 15 years ago. In that time, so much happened that I can’t possibly summarize it all here. I’ll just mention two things. Firstly, Spotify appeared, and proceeded to hire or buy most of the world’s music recommendation experts and make automatic playlists into a commodity. Secondly, I spent a lot of time iterating on a music tool I call Calliope.

Spotify or not?

Spotify’s discovery features can be a great way to find new music, but I’ve always felt like something was missing. The recommendations are opaque. We know broadly how they work, but there’s no way to know why it’s suggesting I listen to ska punk all day, or I try a podcast titled ‘Tu Inglés’, or play some 80’s alternative classics I’m already familiar with. It gets repetitive.

Some of the most original new music isn’t even available on Spotify. Most folk don’t realise that small artists have to pay a distributor to get their music to appear on streaming services like Spotify and Apple Music, a dubious investment when the return for the artist might be a cheque for $0.10 and a little exposure. No wonder that some artists use music purchase sites like Bandcamp exclusively. Of course, this means they’ll never appear in your Discover Weekly playlist.

Algorithms decide which social media posts I see, whether I can get a credit card, and how much I would pay to insure a car. Spotify’s recommendation system is another closed system like the others. But unlike credit agencies and big social networks, the world of music has some very successful repositories of open data. I’ve been saving my listen history to Last.fm since 2006. Shouldn’t I do something with it?

Introducing Calliope

Calliope is an open source tool for hackers who want to generate playlists. Its primary goals are to be a fun side project for me and to produce interesting playlists from my digital music collection. Recently it has begun fulfilling both of those goals so I decided it’s time to share some details.

Querying my music collection with Calliope

The project consists of a set of commandline tools which operate on playlist data. You use a shell pipeline to define the data pipeline. Your local music collection is queried from Tracker or Beets. You can mix in data from Last.fm, Musicbrainz and Spotify. You can output the results as XSPF playlists in your music player. The implementation is Python, but the commandline focus means it can interact with tools in any language that parses JSON.

The goal is not to replace Spotify here. The goal is to make recommendations open and transparent. That means you’re going to see the details of how they work. My dream would be that this becomes an educational tool to help us understand more about what “algorithms” (used in the journalistic sense) actually do.

I’m developing a series of example playlist generation scripts. I’m particularly enjoying “Music I haven’t listened to in over a year” — that one requires over a year of listen history data to be useful, of course. But even the “One hour random shuffle” playlist is fun.
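The idea behind the "haven't listened to in over a year" playlist can be sketched in a few lines of plain Python. This is not the actual example script, which is built from Calliope's commandline tools. The `not_heard_since` function, the (track, timestamp) listen tuples and the decision to skip tracks with no listen history at all are assumptions for illustration.

```python
from datetime import datetime, timedelta


def not_heard_since(collection, listens, now, max_age=timedelta(days=365)):
    """Return tracks whose most recent listen is older than `max_age`.

    Tracks with no recorded listen are skipped (an assumption of this sketch).
    """
    last_listen = {}
    for track, when in listens:
        if track not in last_listen or when > last_listen[track]:
            last_listen[track] = when
    cutoff = now - max_age
    return [t for t in collection if t in last_listen and last_listen[t] < cutoff]


# Hypothetical collection and listen history.
collection = ["Song A", "Song B", "Song C", "Song D"]
listens = [
    ("Song A", datetime(2019, 1, 10)),
    ("Song A", datetime(2020, 5, 20)),
    ("Song B", datetime(2019, 3, 1)),
    ("Song C", datetime(2018, 12, 25)),
]

forgotten = not_heard_since(collection, listens, now=datetime(2020, 6, 1))
print(forgotten)
```

This also shows why the playlist needs over a year of history: with less data than that, no track can have a last listen older than the cutoff.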

A breakthrough this month was the start of a constraints-based approach for selecting songs. I found a useful model in a paper from 2006 titled “Fast Generation of Optimal Music Playlists using Local Search”, and implemented a subset using the Python simpleai library. Simple things can produce great results. I’m only scratching the surface of what’s possible with this model, using constraints on the duration property to ensure songs and playlists are a suitable length. I expect to show off some more sophisticated examples in future.
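For a feel of what "constraints plus local search" means here, below is a toy hill-climbing sketch in plain Python. It is not Calliope's implementation, it does not use the simpleai API, and it is far simpler than the model in the paper. The penalty function, the parameter names and the generated song collection are all assumptions made for the example.

```python
import random


def penalty(playlist, target_seconds, min_song, max_song):
    """Sum of constraint violations in seconds; 0 means every constraint holds."""
    total = sum(song["duration"] for song in playlist)
    cost = abs(total - target_seconds)
    for song in playlist:
        if song["duration"] < min_song:
            cost += min_song - song["duration"]
        elif song["duration"] > max_song:
            cost += song["duration"] - max_song
    return cost


def local_search(collection, length, target_seconds, min_song, max_song,
                 steps=2000, rng=None):
    """Hill climbing: swap one song at a time, keeping swaps that do not raise the penalty."""
    rng = rng or random.Random()
    playlist = rng.sample(collection, length)
    best = penalty(playlist, target_seconds, min_song, max_song)
    for _ in range(steps):
        if best == 0:
            break
        replacement = rng.choice(collection)
        if replacement in playlist:
            continue  # keep songs unique within the playlist
        candidate = list(playlist)
        candidate[rng.randrange(length)] = replacement
        cost = penalty(candidate, target_seconds, min_song, max_song)
        if cost <= best:
            playlist, best = candidate, cost
    return playlist, best


# Hypothetical collection: 40 songs lasting between 90 and 675 seconds.
collection = [{"title": f"Song {i}", "duration": 90 + 15 * i} for i in range(40)]

playlist, cost = local_search(collection, length=12, target_seconds=3600,
                              min_song=120, max_song=480, rng=random.Random(1))
print(cost, sum(song["duration"] for song in playlist))
```

Each constraint adds to a single penalty score, so adding a new kind of constraint means adding another term to `penalty` while the search loop stays the same.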

I’m not going to talk much more about it here. If it sounds interesting, read the documentation which I’ve recently been working on, clone the source code, and ask me if you have any questions. I’m keen to hear what ideas you have.

Why I love Bandcamp

The Coronavirus quarantine would be much harder if we didn’t have great music to listen to. But making an income from live music is very difficult in a pandemic. What’s a good way to support the artists who are helping us through?

One ethical way is to buy music on Bandcamp. The idea of Bandcamp is that you browse music (and merch), and if you like something you buy a real download¹. You get unlimited web streaming of everything you bought too². Their business model is clear and upfront:

Our share is 15% on digital items, and 10% on physical goods. Payment processor fees are separate and vary depending on the size of the transaction, but for an average size purchase, amount to an additional 4-7%. The remainder, usually 80-85%, goes directly to the artist, and we pay out daily.

On Friday 1st May 2020, which is tomorrow, or today, or some point in the past, Bandcamp are waiving their 10-15% share of sales. It’s a great time to buy some music!

Here are some recommendations taken from the recent social media challenge of posting album covers that have a big effect on your music taste, with no other context. (My social media posts are mostly of music recommendations with no context anyway, so this wasn’t much of a challenge).

Orange Whip by Honeyfeet

Widow City by The Fiery Furnaces

at Version City by Victor Rice

Unknown Mortal Orchestra by Unknown Mortal Orchestra

Sonido Amazonico by Chicha Libre

When you’ve listened to those, it’s time to dive into the enormous list of curated recommendations (curated by real humans, not by robots). The best metal, the best hip-hop, the best contemporary Chinese post-punk, the best Theremin music of the last 100 years, etc. etc. You can also follow me if you want 🙂

In the parallel universe of unethical music services, I read that Spotify have insultingly added a virtual “tip jar”. It can’t make amends for the deeply unfair business relationship that many streaming sites have with artists.

Listen to the T-shirt:


blackdogtee

Have fun & make sure to spend your music money ethically!

1: You can even download in Ogg Vorbis format if you like.
2: In practice, you get unlimited streaming of all the music on Bandcamp. Artists can choose to put a nag screen up after a certain number of listens. Some artists would prefer the site to be more restrictive in this regard.