Last month in Tracker

Here’s an incomplete report of some work done on Tracker during the last month!

Bugs

Jean Felder fixed a thorny issue that was causing wrong track durations for MP3s.

Rasmus Thomsen has been testing on Alpine Linux, fixing one issue and finding several more. Alpine Linux uses musl libc instead of the more common GNU libc, which triggers bugs that we don’t usually see. Finding and fixing these issues could be a great learning experience for someone who wants to dig deep into the platform!

There’s an ongoing issue reported by many Ubuntu users which seems to be due to SQLite database corruption. SQLite is rather a black box to me, so I don’t know how or when we might get to the bottom of why this corruption is happening.

Ubuntu CI

We now test each commit on Ubuntu as well as Fedora. This is a nice step forward. It’s also triggering more intermittent failures in the CI — we’ve made huge progress in the last few years on bringing the CI up from zero, but there are some latent issues like these which we need to get rid of.

Tracker 3.0

Carlos has done more architectural work in the ‘master’ branch, working towards having a generic SPARQL store in tracker.git, and all GNOME/desktop/filesystem related code in tracker-miners.git.

As part of this, the tracker CLI tool is now split between tracker.git and tracker-miners.git (MR1, MR2).

We also moved the libtracker-control and libtracker-miner libraries into tracker-miners.git, and made the libtracker-control API private. As far as I know, the libtracker-control library is only being used by GNOME Photos to manage indexing of removable devices. We want to keep track of which apps need porting to 3.0, so please let me know if this is going to affect anything else.

New website

Tracker is famous enough that it merits a real website, not just an outdated set of wiki pages. So I made a real Tracker website, aiming to collect links to relevant user and developer documentation and to have a minimal overview and FAQ section. We can build and deploy this straight from the tracker.git repo, so whereas the wiki is easily forgotten, the new website lives in the same repo as the source code. The next step will be to merge this and then tidy up most of the old wiki pages.

Into the Pyramid

November 2019 wasn’t an easy month, for various reasons, and it also rained every single day of the month. But there were some highlights!

LibreTeo

At the bus stop one day I saw a poster for a local Free Software related event called LibreTeo. Of course I went, and saw some interesting talks related to technology and culture and also a useful workshop on improving your clown skills. Actually the clown workshop was a highlight. It was a small event but very friendly, I met several local Free Software heads, and we were even invited for lunch with the volunteers who organized it.

Purr Data on Flathub

I want to do my part to increase the number of apps that are easy to install on Linux. Last year I asked developers to “Flatpak your app today”, and this month I took the opportunity to package Purr Data on Flathub.

Here’s a quick demo video, showing one of the PD examples which generates an ‘auditory illusion’ of a tone that descends forever, known as a Shepard tone.

As always the motivation is a selfish one. I own an Organelle synth – it’s a hackable Linux-based device that generates sound using Pure Data, and I want to be able to edit the patches!

Pure Data is a very powerful open source tool for audio programming, but it’s never had much commercial interest (unlike its proprietary sibling Max/MSP), and that’s probably why the default UI is still implemented in Tcl/Tk in 2019. The Purr Data fork has made a lot of progress on an alternative HTML5/JavaScript UI, so I decided this would be more suitable for a Flathub package.

I was particularly motivated by the ongoing PipeWire project, which is aiming to unify pro and consumer audio APIs on Linux in a Flatpak-friendly way. Christian Schaller mentioned this recently:

There is also a plan to have a core set of ProAudio applications available as Flatpaks for Fedora Workstation 32 tested and verified to work perfectly with Pipewire.

The Purr Data app will benefit a lot from this work. It currently has to use the OSS backend inside the sandbox and doesn’t seem to successfully communicate over MIDI either — so it’s rather a “tech preview” at this stage.

The developers of Purr Data are happy about the Flatpak packaging, although they aren’t interested in sharing the maintenance effort right now. If anyone reading this would like to help me with improving and maintaining the Purr Data Flatpak, please get in touch! I expect the effort required to be minimal, but I’d like to have a bus factor > 1.

Tracker bug fixes

This month we fixed a couple of issues in Tracker which were causing system lockups for some people. It was very encouraging to see people volunteering their time to help track down the issue, both in GitLab issue 95 and in #tracker on IRC, and everyone involved in the discussion stayed really positive even though it’s obviously quite annoying when your computer keeps freezing.

In the end there were several things that came together to cause system lockups:

  • Tracker has a ‘generic image extraction’ rule that tries to find metadata for any image/* MIME type that isn’t a .bmp, .jpg, .gif, or .png. This codepath uses the GstDiscoverer API, the same as for video and audio files, in the hope that a GStreamer plugin on the system can give us useful info about the image (there’s a sketch of this API after the list).
  • The GstDiscoverer instance is created with a timeout of 5 seconds. (This seems quite high — the gst-typefind utility that ships with GStreamer uses a timeout of 1 second).
  • GStreamer’s GstDiscoverer API feeds any file where the type is unknown into an MPEG decoder, which is effectively an unwanted fuzz test and can trigger periods of high CPU and memory usage.
  • 5 seconds of processing non-MPEG data with an MPEG decoder is somehow enough to cause Linux’s scheduler to lock up the entire system.
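
To make that first step concrete, here’s a minimal sketch of how the GstDiscoverer API is driven with a 5 second timeout. This is illustrative Python, not Tracker’s actual extractor code, and the file path is made up:

    import gi
    gi.require_version('Gst', '1.0')
    gi.require_version('GstPbutils', '1.0')
    from gi.repository import Gst, GstPbutils

    Gst.init(None)

    # The timeout is in nanoseconds; Tracker used 5 seconds here,
    # while gst-typefind uses 1 second.
    discoverer = GstPbutils.Discoverer.new(5 * Gst.SECOND)

    # Whatever plugin claims the stream gets to chew on it for up to the
    # full timeout: this is where unknown data met the MPEG decoder.
    info = discoverer.discover_uri('file:///home/me/Pictures/mystery.img')
    print(info.get_result())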

We fixed this in the stable branches by blocking certain problematic MIME types. In the next major release of Tracker we will probably remove this codepath completely as the risks seem to outweigh the benefits.

Other bits

I also did some work on a pet project of mine called Calliope, related to music recommendations and playlist generation. More on this in a separate blog post.

And I finally installed Fedora on my partner’s laptop. It was nice to see that GNOME Shell works out of the box on 12-year-old consumer hardware. The fan, which was spinning 100% of the time under Windows 8, is virtually silent now – I had thought this problem was due to dust buildup or a hardware issue, but once again the cause was actually low-quality proprietary software.

What I did in October

October in Galicia has a weather surprise for every week. I like it because every time the sun appears you feel like you gotta enjoy it – there might be no more until March.

I didn’t do much work on Tracker this month, besides bug triage and a small amount of prep for the 2.3.1 stable release. The next step for Tracker 3.0 is still to fix a few regressions causing tests to fail in tracker-miners.git. Follow the Tracker 3.0 milestone for more information!

Planalyzer

In September I began teaching English classes again after the summer, and so I’ve been polishing the tool that I wrote to index old lesson plans.

It looks a little cooler than before:

Screenshot of Planalyzer app

I’m still quite happy with the hybrid GTK+/webapp approach that I’m taking. I began this way because the app really needs to be available in a browser: you can’t rely on running a custom desktop app on a classroom PC. However, for my own use running it as a webapp is inconvenient, so I added a simple GTK+/WebKit wrapper. It’s kind of experimental and a few weird things come out of it, like how clipboard selections contain some unwanted style info that WebKit injects, but it’s been pretty quick and fun to build the app this way.

I see some developers using Electron these days. In some ways it’s good: apps have strong portability to Linux, and are usually easy to hack on too due to being mostly JavaScript. But having multiple 150MB binary builds of Chromium dotted about my machine makes me sad. In the Planalyzer app I use WebKitGTK+, which is already part of GNOME and works very well. It would be cool if Electron could make use of this in future 🙂

Hydra

I was always interested in making cool visuals, since I first learned about the PC demoscene back in the 1990s, but I was never very good at it. I once made a rather lame plasma demo using an algorithm I copied from somewhere else.

And then, while reading the Create Digital Music blog earlier this year, I discovered Hydra. I was immediately attracted by the simple, obvious interface: you chain JavaScript functions together and visuals appear right behind the code. You can try it here right away in your browser. I’ve been out of touch with the 3D graphics world forever, so I was impressed just to see that WebGL now exists and works.

I’ve been very much in touch with the world of audio synthesizers, so Hydra’s model of chaining together GL shaders as if it was a signal chain feels very natural to me. I still couldn’t write a fragment or a vertex shader myself, but now I don’t need to: I can skip straight to the creative part!

So far I’ve only made this rather basic webcam mashup but you can see a lot more Hydra examples in the @hydra_patterns Twitter account.

I also had a go at making online documentation, and added a few features that make it more suitable for non-live coding, such as loading prerecorded audio tracks and videos, and allowing you to record a .webm video of the output. I’m not sure this stuff will make it upstream, as the tool is intended for live coding use, but we’ll see. It’s been a lot of fun hacking on a project that’s so simple and yet so powerful, and hopefully you’ll see some cool music videos from me in the future!

Tracker developer experience improvements

There have been lots of blog posts since I suggested we write more blog posts. Great! I’m going to write about what I’ve done this month.

I’m excited that work started on Tracker 3.0, after we talked about it at GUADEC 2019. We merged Carlos’ enormous branch to modernize the Tracker store database. This has broken some tests in tracker-miners, and the next step will be to track down and fix these regressions.

I’ve continued looking at the developer experience of Tracker. We recently modernized the README.md file (as several other GNOME projects have done). I want the README to document a simple “build and test Tracker from git” workflow, and that led into work making it simpler to run Tracker from the build tree, and also a bunch of improvements to the test suite.

The design of Tracker has always meant that it’s a pain in the ass to build and test, because to do anything useful you need to have 3 different daemons running and talking to each other over D-Bus, reading and writing data in the same location, and communicating with the CLI or an app. We had a method for running Tracker from the build tree for use by automated tests, whose code was duplicated in tracker.git and tracker-miners.git, and then we had a separate script for developers to test things manually, but you still had to install Tracker to use that one. It was a bit of a mess.

The first thing I fixed was the code duplication. Now we have a Python module named trackertestutils. We install it, so we don’t need to duplicate code between tracker.git and tracker-miners.git any more. Thanks to Marco Trevisan we also install a pkgconfig file.

Then I added a ./run-uninstalled script to tracker-miners.git. The improvement in developer experience I think is huge. Now you can do this to try out the latest Tracker code:

    git clone tracker-miners.git
    cd tracker-miners && mkdir build && cd build
    meson .. && ninja
    ./run-uninstalled --wait-for-miner=Files --wait-for-miner=Extract -- tracker index --file ~/Documents
    ./run-uninstalled -- tracker search "Hello"

The script is a small wrapper around trackertestutils, which takes care of spawning a private D-Bus daemon, collecting and filtering logs, and setting up the environment so that the Tracker cache is written to `/tmp/tracker-data`. (At the time of writing there are still some bugs, and ./run-uninstalled actually still requires you to install Tracker first.)

I also improved logging for Tracker’s functional-test suite. We’ve been running these tests in CI for a year now, but there have been some intermittent failures, which were hard to debug because log output from the tests was very messy. When you run a private D-Bus session, all kinds of daemons spawn and dump stuff to stdout. Now we set G_MESSAGES_PREFIXED in the environment, so the test harness can separate the messages that come from Tracker processes. It’s already allowed me to track down some of these annoying intermittent failures, and to increase the default log verbosity in CI.

Another neat thing about installing trackertestutils is that downstream projects can use it too. Rishi mentioned at GUADEC that gnome-photos has a test which starts the photos app and ends up displaying the actual photo collection of the user who is running the test. Automated tests should really be isolated from the real user data. And using trackertestutils, it’s now simple to do that: here’s a proof of concept for gnome-photos.

And I made a new tune!

Blog about what you do!

Am I the first to blog from GUADEC 2019? It has been a great conference: huge respect to the organization team for volunteering significant time and energy to make it all run smoothly.

The most interesting thing at GUADEC is talking to community members old and new. I discovered that I don’t know much about what people are doing in GNOME. I discovered Antonio is doing user support / bug triage and more in Nautilus. I discovered that Bastian is posting GNOME-related questions and answers on StackOverflow. I discovered Britt is promoting us on Twitter and moderating discussions on Reddit. I discovered Felipe is starting to do direct user support for Boxes. I wouldn’t know any of this if I hadn’t been to GUADEC.

So here’s my plea — if you contribute to GNOME, please blog about it! If everyone reading this wrote just one blog post a year… I’d have a much better idea of what you’re all doing!

Don’t forget: Planet GNOME is not only for announcing cool new projects and features – it’s “a window into the world, work and lives of GNOME hackers and contributors.” Blog about anything GNOME related, and be yourself — we’re not a corporation, we’re an underground network with a global, diverse, free thinking membership and that’s our strength.

Remember that there’s much more to GNOME than software development — read this long list of skillsets that you’re probably using. Write about translations, user support, testing, documentation, packaging, outreach, foundation work, event organization, bug triage, product management, release management, design, infrastructure operations. Write about why you enjoy contributing to GNOME, write about why it’s important to you. Write about what you did yesterday, or what you did last month. Write about your friends in GNOME. Make some graphs about your project to show how much work you do. Write short posts, write them quickly. Don’t worry about minor errors — it’s a blog, not a magazine article. Don’t be scared that readers won’t be interested — we are! We’re a distributed team and we need to keep each other posted about what we’re doing. Show links, screenshots, discussions, photos, graphs, anything. Don’t write reports, write stories.

If you contribute to GNOME but don’t have a blog… please start one! Write some nice posts about what you do. Become a Foundation member if you haven’t already*, and ask to join Planet GNOME.

And even if you forget all that, remember this: positive feedback for contributions encourages more contributions. Writing a blog post, like any other form of contribution, can sometimes feel like shouting into an abyss. If you read an interesting post, leave a positive comment & thank the author for taking the time to write it.

* The people using and reviewing your contributions will be happy to vouch for you, don’t worry about that!

Twitter without Infinite Scroll

I like reading stuff on twitter.com because a lot of interesting people write things there which they don’t write anywhere else.

But Twitter is designed to be addictive, and a key mechanism they use is the “infinite scroll” design. Infinite scroll has been called the Web’s slot machine because of the way it exploits our minds to make us keep reading. It’s an unethical design.

In an essay entitled “If the internet is addictive, why don’t we regulate it?”, the writer Michael Schulson says:

… infinite scroll has no clear benefit for users. It exists almost entirely to circumvent self-control.

Hopefully Twitter will one day consider the ethics of their design. Until then, I made a Firefox extension to remove the infinite scroll feature and replace it with a ‘Load older tweets’ link at the bottom of the page, like this:

Screenshot of the ‘Load older tweets’ link

The Firefox extension is called Twitter Without Infinite Scroll. It works by injecting some JavaScript code into the Twitter website which disconnects the ‘uiNearTheBottom’ event that would otherwise automatically fetch new data.

Quoting Michael Schulson’s article again:

Giving users a chance to pause and make a choice at the end of each discrete page or session tips the balance of power back in the individual’s direction.

So, if you are a Twitter user, enjoy your new-found power!

Tools I like for creating web apps

I used to dislike creating websites. CSS confused me and JavaScript annoyed me.

In the last year I’ve grown to like web development again! The developer tools available in the web browsers of 2019 are incredible to work with. The desktop world is catching up, but the browser world is ahead. I rarely have to write CSS at all. JavaScript has gained a lot of the features that it was inexplicably missing.

Here’s a list of the technologies I’ve recently used and liked.

First, Bootstrap. It’s really a set of CSS classes which turn HTML into something that “works out of the box” for creating decently styled webapps. To me it feels like a well-designed widget toolkit, a kind of web counterpart to GTK. Once you know what Bootstrap looks like, you realize that everyone else is already using it. Thanks to Bootstrap, I don’t really need to understand CSS at all anymore. Once you get bored of the default theme, you can try some others.

Then, jQuery. I guess everyone in the world uses jQuery. It provides powerful methods to access JavaScript functionality that is otherwise horrible to use. One of its main features is the ability to select elements from an HTML document using CSS selectors. Normally jQuery provides a function named $, so for example you could get the text of every paragraph in a document like this: $('p').text(). Open up your browser’s inspector and try it now! Now, try to do the same thing without jQuery — you’ll need at least 6 times more code.1

After that, Riot.js. Riot is a UI library which lets you create a web page using building blocks which they call custom tags. Each custom tag is a snippet of HTML. You can attach JavaScript properties and methods to a custom tag as well, and you can refer to them in the snippet of HTML, giving you a powerful client-side template framework.

There are lots of “frameworks” which provide similar functionality to Riot.js. I find frameworks a bit overwhelming, and I’m suspicious of tools like create-react-app that need to generate code for me before I can even get started. I like that Riot can run completely in the browser without any special tooling required, and that it has one, specific purpose. Riot isn’t perfect; in particular I find the documentation quite hard to understand at times, but so far I’m enjoying using it.

Finally, lunr.js. Lunr provides a powerful full-text search engine, implemented completely as JavaScript that runs in your users’ web browsers. “Isn’t that a terrible idea?” you think. For large data sets, Lunr is not at all appropriate (you might consider its larger sibling Solr). For a small webapp or prototype, Lunr can work well and can save you from having to write and run a more complex backend service.

If, like me, you used to think web development was super annoying due to bad tooling, give it another try! It’s (kind of) fun now!

1. Here’s how it looks without jQuery: Array.from(document.getElementsByTagName('p'), function(e) { return e.textContent; })

The Lesson Planalyzer

I’ve now been working as a teacher for 8 months. There are a lot of things I like about the job. One thing I like is that every day brings a new deadline. That sounds bad, right? It’s not: one day I prepare a class, the next day I deliver the class one or more times and I get instant feedback on it right there and then from the students. I’ve seen enough of the software industry, and the music industry, to know that such a quick feedback loop is a real privilege!

Creating a lesson plan can be a slow and sometimes frustrating process, but the more plans I write the more I can draw on things I’ve done before. I’ve planned and delivered over 175 different lessons already. It’s sometimes hard to know if I’m repeating myself or not, or if I could be reusing an activity from a past lesson, so I’ve been looking for easy ways to look back at all my old lesson plans.

Search

GNOME’s Tracker search engine provides a good starting point for searching a set of lesson plans: I can put the plans in my ~/Documents folder, open the folder in Nautilus, and then I type a term like "present perfect" into the search bar.

Screenshot of Nautilus showing search results

The results aren’t as helpful as they could be, though. I can only see a short snippet of the text in each document, when I really need to see the whole paragraph for the result to be directly useful. Also, the search returns anything where the words present and perfect appear, so we could be talking about tenses, or birthdays, or presentation skills.  I wanted a better approach.

Reading .docx files

My lesson plans have a fairly regular structure. An all-purpose search tool doesn’t know anything about my personal approach to writing lesson plans, though. I decided to try writing my own tool to extract more structured information from the documents. The plans are in .docx format1 which is remarkably easy to parse — you just need the Python ‘zipfile’ and ‘xml’ modules, and some guesswork to figure out what the XML elements mean. I was surprised not to find a Python library that already did this for me, but in the end I wrote a very basic .docx helper module, and I used this to create a tool that read my existing lesson plans and dumped the data as a JSON document.
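
For the curious, here’s roughly what that parsing boils down to: a minimal sketch using only the standard library (the tag names come from the WordprocessingML schema, and the filename is hypothetical):

    import zipfile
    import xml.etree.ElementTree as ET

    W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

    def docx_paragraphs(path):
        # A .docx file is a zip archive; the main text lives in word/document.xml.
        with zipfile.ZipFile(path) as docx:
            tree = ET.fromstring(docx.read('word/document.xml'))
        for para in tree.iter(W + 'p'):
            # Each w:p paragraph contains runs (w:r) holding text (w:t).
            yield ''.join(t.text or '' for t in para.iter(W + 't'))

    for line in docx_paragraphs('lesson-plan.docx'):
        print(line)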

It works reliably! In a few cases I chose to update documents rather than add code to the tool to deal with formatting inconsistencies. Also, the tool currently throws away all formatting information, but I barely notice.

Web and desktop apps

From there, of course, things got out of control and I started writing a simple web application to display and search the lesson plans. Two months of sporadic effort later, and I just made a prototype release of The Lesson Planalyzer. It remains to be seen how useful it is for anyone, including me, but it’s very satisfying to have gone from an idea to a prototype application in such a short time. Here’s an ugly screenshot, which displays a couple of example lesson plans that I found online.

The user interface is HTML5, made using Bootstrap and a couple of other cool JavaScript libraries (which I might mention in a separate blog post). I’ve wrapped that up in a basic GTK application, which runs a tiny HTTP server and uses a WebKitWebView to display its output. The desktop application has a couple of features that can’t be implemented inside a browser: one is the ability to open plan documents directly in LibreOffice, and the other is a dedicated entry in the alt+tab menu.
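
The wrapper really is basic. Here’s a sketch of the approach, assuming a hypothetical local port for the internal server (this isn’t the exact Planalyzer code):

    import gi
    gi.require_version('Gtk', '3.0')
    gi.require_version('WebKit2', '4.0')
    from gi.repository import Gtk, WebKit2

    window = Gtk.Window(title='Planalyzer')
    window.set_default_size(1024, 768)
    window.connect('destroy', Gtk.main_quit)

    # The app itself is served over HTTP, so the 'desktop app' is just
    # a WebKit view pointed at the local server.
    webview = WebKit2.WebView()
    webview.load_uri('http://127.0.0.1:8000/')
    window.add(webview)

    window.show_all()
    Gtk.main()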

If you’re curious, you can see the source at https://gitlab.com/samthursfield/planalyzer/. Let me know if you think it might be useful for you!

1. I need to be able to print the documents on computers which don’t have LibreOffice available, so they are all in .docx format.

Inspire me, Nautilus!

When I have some free time I like to be creative but sometimes I need a push of inspiration to take me in the right direction.

Interior designers and people who are about to get married like to create inspiration boards by gluing magazine cutouts to the wall.

‘Mood board for a Tuscan Style Interior’ by Design Folly on Flickr

I find a lot of inspiration online, so I want a digital equivalent. I looked for one, and I found various apps for iOS and Mac which act like digital inspiration boards, but I didn’t find anything I can use with GNOME. So I began planning an elaborate new GTK+ app, but then I remembered that I get tired of such projects before they actually become useful. In fact, there’s already a program that lets you manage a collection of images and text! It’s known as Files (Nautilus), and for me it only lacks the ability to store web links amongst the other content.

Then, I discovered that you can create .desktop files that point to web locations, the equivalent of .url files on Microsoft Windows. Would a folder full of URL links serve my needs? I think so!
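
Such a shortcut is just a few lines of text. A minimal example, with a placeholder name and URL:

    [Desktop Entry]
    Type=Link
    Name=GNOME
    URL=https://www.gnome.org/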

Nautilus had some crufty code paths to deal with these shortcut files, which were removed in 2018. Firefox understands them directly, so if you set Firefox as the default application for the application/x-desktop file type then they work nicely: click on a shortcut and it opens in Firefox.

There is no convenient way to create these .desktop files: dragging and dropping a tab from Epiphany will create a text file containing the URL, which is tantalisingly close to what I want, but the resulting file can’t be easily opened in a browser. So, I ended up writing a simple extension that adds a ‘Create web link…’ dialog to Nautilus, accessed from the right-click menu.

Now I can use Nautilus to easily manage collections of links and I can mix in (or link to) any local content easily too. Here’s me beginning my ‘inspiration board’ for recipes …

Paying money for things

Sometimes it’s hard to make money from software. How do you make money from something that can be copied infinitely?

Right now there are 3 software tools that I pay for. Each one is supplied by a small company, and each one charges a monthly or annual fee. I prefer software with this business model because it creates an incentive for careful, ongoing maintenance and improvement. The alternative (pay a large fee, once) encourages a model that is more like “add many new features, sell the new version and then move onto something else”.

The 3 tools are:

  1. Feedbin, which is a tool that collects new content from many different blogs and shows them all in a single interface. This is done with a standard called RSS. The tool is a pleasure to use, and best of all, it’s Free Software released under a permissive license.
  2. Pinboard, a bookmarking and archival tool. The interface doesn’t spark joy and the search tool leaves a lot to be desired too. However, Pinboard carefully archives a copy of every single website that I bookmark, just at the time that I bookmark it. Since the Web is changing all the time and interesting content comes and goes, I find this very valuable. I don’t know if I’ll actually use this archive of content for much, as I don’t particularly enjoy writing articles, but I use the existence of the archive as a way to convince myself to close browser tabs.
  3. Checkvist, a “to do list” tool that supports nesting items, filtering by tags, styling with Markdown, and keyboard-only operation. I use this not as a to-do list but as a way of categorising activities and resources that I use when teaching. To be honest, the “free” tier of this tool is generous enough that I don’t really need to pay, but I like to support the project.

Music can also be copied infinitely, and historically I’ve not been keen to buy it because I didn’t like the very shady operations of many record companies. Now I use Bandcamp, which has an incredible library of music with rich, manually curated recommendations, and a clear, sustainable business model.

What digital goods do you pay for on a regular basis?

How Tracker is tested in 2019

I became interested in the Tracker project in 2011. I was looking at media file scanning and was happy to discover an active project that was focused on the same thing. I wanted to contribute, but I found it very hard to test my changes; and since Tracker runs as a daemon I really didn’t want to introduce any crazy regressions.

In those days Tracker already had a set of tests written in Python that tested the Tracker daemons as a whole, but they were a bit unfinished and unreliable. I focused some spare-time effort on improving those. Surprisingly enough it’s taken eight years to get to the point where I’m happy with how they work.

The two biggest improvements parallel changes in many other GNOME projects. Last year Tracker stopped using GNU Autotools in favour of Meson, after a long incubation period. I probably don’t need to go into detail of how much better this is for developers. Also, we set up GitLab CI to automatically run the test suite, where previously developers and maintainers were required to run the test suite manually before merging anything. Together, these changes have made it about 100000% easier to review patches for Tracker, so if you were considering contributing code to the project I can safely say that there has never been a better time!

The Tracker project is now divided into two parts, the ‘core’ (tracker.git) and the ‘miners’ (tracker-miners.git). The core project contains the database and the application interface libraries, while the miners project contains the daemons that scan your filesystem and extract metadata from your interesting files.

Let’s look at what happens automatically when you submit a merge request on GNOME GitLab for the tracker-miners project:

  1. The .gitlab-ci.yml file specifies a Docker image to be used for running tests. The Docker images are built automatically from this project and are based on Fedora.
  2. The script in .gitlab-ci.yml clones the ‘master’ version of Tracker core.
  3. The tracker and tracker-miners projects are configured and built, using Meson. There is a special build option in tracker-miners that makes it include Tracker core as a Meson subproject, instead of building against the system-provided version. (It still depends on a few files from host at the time of writing).
  4. The script starts a private D-Bus session using dbus-run-session, sets a fixed en_US.UTF8 locale, and runs the test suite for tracker-miners using meson test (see the sketch after this list).
  5. Meson runs the tests that are defined in meson.build files. It tries to run them in parallel with one test per CPU core.
  6. The libtracker-miners-common tests exercise some utility code, which is duplicated from libtracker-common in Tracker core.
  7. The libtracker-extract tests exercise libtracker-extract, which is a private library with helper code for accessing file metadata. It mainly focuses on standard metadata formats like XMP and EXIF.
  8. The functional-300-miner-basic-ops and functional-301-resource-removal tests check the operation of the tracker-miner-fs daemon, mostly by copying files in and out of a specific path and then waiting for the corresponding changes to the Tracker database to take effect.
  9. The functional-310-fts-basic test tries some full-text search operations on a text file. There are a couple of other FTS tests too.
  10. The functional/extract/* tests effectively run tracker extract on a set of real media files, and test that the expected metadata is extracted. The tests are defined by JSON files such as this one.
  11. The functional-500-writeback tests exercise the tracker-writeback daemon (which allows updating things like MP3 tags following changes in the Tracker database). These tests are not particularly thorough. The writeback feature of Tracker is not widely used, to my knowledge.
  12. Finally, the functional-600-* tests simulate the behaviour of some MeeGo phone applications. Yes, that’s how old this code is 🙂
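
If you want to reproduce step 4 locally, it boils down to something like this (a sketch, not the literal CI script):

    dbus-run-session -- env LANG=en_US.UTF8 meson test -C build --print-errorlogs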

There is plenty of room for more testing of course, but this list is very comprehensive when compared to the total lack of automated testing that the project had just a year ago!

Writing well

We rely on written language to develop software. I used to joke that I worked as a professional email writer rather than a computer programmer (and it wasn’t really a joke). So if you want to be a better engineer, I recommend that you focus some time on improving your written English.

I recently bought 100 Ways to Improve Your Writing by Gary Provost, which is a compact and rewarding book full of simple and widely applicable guidelines to writers. My advice is to buy a copy!

You can also find plenty of resources online. Start by improving your commit messages. Since we love to automate things, try these shell scripts that catch common writing mistakes. And every time you write a paragraph simply ask yourself: what is the purpose of this paragraph? Is it serving that purpose?

Native speakers and non-native speakers will both find useful advice in Gary Provost’s book. In the UK school system we aren’t taught this stuff particularly well. Many English-as-a-second-language courses don’t teach how to write on a “macro” level either, which is sad because there are many differences from language to language that non-natives need to be aware of. I have seen “Business English” courses that focus on clear and convincing communication, so you may want to look into one of those if you want more than just a book.

Code gets read more than it gets written, so it’s worth taking extra time so that it’s easy for future developers to read. The same is true of emails that you write to project mailing lists. If you want to make a positive change to development of your project, don’t just focus on the code — see if you can find 3 ways to improve the clarity of your writing.

GUADEC 2018 Videos: All Done

All the editing & uploading for the GUADEC videos is now finished. The videos were all uploaded to YouTube some time ago, and they are all now available on http://videos.guadec.org/2018 as well.

Thanks to everyone who helped with the editing: Alexis Diavatis, Bin Li, Garrett LeSage, Alexandre Franke (who also did a lot of the work of uploading to YouTube), and Hubert Figuiere (who managed to edit so many that I’m suspicious he might be some kind of robot in disguise).

edit: If you are hungry for more videos to edit, some footage from GUADEC 2002 has been unearthed. It’d be great to have some of this history from fifteen years ago up on YouTube! If you’re interested, reply to the mail or speak up in #guadec on GIMPnet and we can coordinate efforts.

Natural Language Processing

This month I have been thinking about good English sentence and paragraph structure. Non-native English speakers who are learning to write in English will often think of what they want to say in their first language and then translate it. This generally results in a mess. The precise structure of the mess will depend on the rules of the student’s first language. The important thing is to teach the conventions of good English writing; but how?

Visualizing a problem helps to solve it. However, there doesn’t seem to be a tool available today that can clearly visualize the various concerns writers have to deal with. A paragraph might contain 100 words, each of which relates to the others in some way. How do you visualize that clearly… not like this, anyway.

I did find some useful resources though. I discovered the Paramedic Method, through this blog post from helpscout.net. The Paramedic Method was devised by Richard Lanham and consists of these 6 steps:

  1. Highlight the prepositions.
  2. Highlight the “is” verb forms.
  3. Find the action. (Who is kicking whom?)
  4. Change the action into a simple active verb.
  5. Start fast—no slow windups.
  6. Read the passage out loud with emphasis and feeling.

This is good advice for anyone writing English. It’ll be particularly helpful in my classes in Spain where we need to clean up long strings of relative clauses. (For example, a sentence such as “On the way I met one of the workers from the company where I was going to do the interview that my friend got for me”. I would rewrite this as: “On the way I met a person from Company X, where my friend had recently got me an interview.”)

I found a tool called Write Music which I like a lot. The idea is simple: to illustrate and visualize the rule that varying sentence length is important when writing. The creator of the tool, Titus Wormer, seems to be doing some interesting and well documented research.

I looked at a variety of open source tools for natural language processing. These provide good ways to tokenize a text and to identify the “part of speech” (noun, verb, adjective, adverb, etc.) but I didn’t yet find one that could analyze the types of clauses that are used, which is a shame. My understanding of this area of English grammar is still quite weak, and I was hoping my laptop might be able to teach me by example, but it seems not.

I found some surprisingly polished libraries that I’m keen to use for … something. One day I’ll know what. The compromise library for JavaScript can do all kinds of parsing and wordplay and is refreshingly honest about its limitations, and spaCy for Python also looks exciting. People like to interact with a computer through text. We hide the UNIX commandline. But one of the most popular user interfaces in the world is the Google search engine, which is a text box that accepts any kind of natural language and gives the impression of understanding it. In many cases this works brilliantly — I check spellings and convert measurements all the time using this “search engine” interface. Did you realize GNOME Shell can also do unit conversions? Try typing “50lb in kg” into the GNOME Shell search box and look at the result. Very useful! More apps should do helpful things like this.
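
As a taste of the tokenizing and part-of-speech tagging these libraries are good at, here’s spaCy in a few lines (assuming you’ve fetched the small English model with `python -m spacy download en_core_web_sm`):

    import spacy

    nlp = spacy.load('en_core_web_sm')
    # Print each token alongside its part-of-speech tag.
    for token in nlp('On the way I met a person from Company X.'):
        print(token.text, token.pos_)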

I found some other amazing natural language technologies too. Inform 7 continues to blow my mind whenever I look at it. Commercial services like IBM Watson can promise incredible things like analysing the sentiments and emotions expressed in a text, and even the relationships expressed between the subjects and objects. It’s been an interesting day of research!

GUADEC 2018 Videos: Help Wanted

At this year’s GUADEC in Almería we had a team of volunteers recording the talks in the second room. This was organized very last minute as initially the University were going to do this, but thanks to various efforts (thanks in particular to Adrien Plazas and Bin Li) we managed to record nearly all the talks. There were some issues with sound on both the Friday and Saturday, which Britt Yazel has done his best to overcome using science, and we are now ready to edit and upload the 19 talks that took place in the 2nd room.

To bring you the videos from last year we had a team of 5 local volunteers who spent a whole weekend in the Codethink offices. (None of us had much prior video editing experience, so the morning of the first day was largely spent trying out different video editors to see which had the features we needed and could run without crashing too often… and the afternoon was mostly figuring out how transitions worked in Kdenlive.)

This year, we don’t have such a resource, so we are looking to distribute the editing. If you can, please get involved so we can share the videos as soon as possible!

The list of videos and a step-by-step guide on how to edit them is available at https://wiki.gnome.org/GUADEC/2018/Video. The guide is written for people who have never done video editing before and recommends that you use Kdenlive; if you’re already familiar with a different tool then of course feel free to use that instead and just use the process as a guideline. The first video is already up, so you can also use this as a guide to follow.

If you want to know more, get in touch on the GUADEC mailing list, or the #guadec IRC channel.

Tagcloud

The way we organize content on computers hasn’t really evolved since the arrival of navigational file managers in the late 1980s. We have been organizing files into directories for decades. Perhaps the biggest change anyone has managed since then is that we now call directories “folders”, and that we obscure the full directory tree, pointing users instead towards certain entry points such as the “Music”, “Downloads” and “Videos” folders inside their home directory.

It’s 2018 already. There must be a better way to find content than to grope around in a partially obscured tree of files and folders?

GNOME has been innovating in this area for a while, and one of the results is the Tracker search and indexing tool which creates a database of all the content it finds on the user’s computer and allows you to run arbitrary queries over it. In principle this is quite cool as you can, for example, search for all photos taken within a given time period, all songs by a specific artist, all videos above a certain resolution ordered by title, or whatever else you can think of (where the necessary metadata is available). However, the caveat is that for this to be at all useful you currently have to enjoy writing SPARQL queries on the commandline: Tracker itself is a “plumbing” component, and the only interface it provides is the tracker commandline tool.
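
To give a flavour of that interface, here’s the sort of query you’d run. The nmm: and nie: prefixes come from Tracker’s Nepomuk-based ontologies, and the exact vocabulary varies between versions:

    tracker sparql --query '
        SELECT ?song ?title WHERE {
            ?song a nmm:MusicPiece ;
                  nie:title ?title .
        }'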

There is ongoing work on content-specific user interfaces that can work with Tracker to access local content, so for photos for example you can use GNOME Photos to view and organize your whole photo collection. However, there isn’t a content-agnostic tool available that might let you view and organize all the content on your computer… other than Nautilus, which is limited to files and folders.

I’m interested in organizing content using tags, which are nothing but freeform textual category labels. On the web, tags are a very common way of categorizing content. (The name hashtags is probably more widely understood than tags among web users, but hashtag has connotations to social media and sharing which don’t necessarily apply when talking about desktop content so I will call them tags here.) Despite the popularity on the web, desktop support is low: Tagspaces seems to be the only option and the free edition is very limited in what it can do. Within GNOME, we have had support for storing tags in the Tracker database for many years but I don’t know of any applications that allow viewing or editing file tags.

Around the time of GUADEC 2017 I read Alexandru’s blog post about tags in Nautilus, in which he announced that Nautilus wasn’t going to get support for organizing files using tags because it would conflict too much with the existing organization principle in Nautilus of putting files into folders. I agree with the logic there, but it leaves open a question: when will GNOME get an interface that allows me to organize files using tags?

As it happened I had a bit of free time after GUADEC 2017 was finished and I started sketching out an application designed specifically for organizing content using tags.

The result so far looks like this:

This is really just a prototype; there are lots more features I’d like to add or improve if I get the time, but it does support the basic use case of “add tags to my files” at this point, so I’ve started a stable release branch. The app is named Tagcloud and you can get it as a Flatpak .bundle of the 0.2.1 release from here. Note that it won’t autoupdate as this isn’t a proper Flatpak repo, just a bundle file.
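
Installing a bundle file is a one-liner (the filename here is hypothetical):

    flatpak install --bundle tagcloud-0.2.1.bundle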

Tagcloud is written using Python and PyGObject, and of course GTK+. I encountered several G-I bindings issues during development which mean that Tagcloud currently requires very new versions of GLib and GTK+ but the good news is that by using the Flatpak bundle you don’t need to care about any of that. Tagcloud uses Tracker internally and I’ve been thinking a lot about how to make Tracker work better for application developers; these thoughts are quite lengthy and not really complete yet so I will save them for a separate blog post.

One of the key principles of Tagcloud is that it should recognize any type of content, so for example you can group together photos, documents and videos related to a specific project. In future I would also like to see GNOME’s content-specific applications such as Photos and Documents recognize tags; this shouldn’t require too much plumbing work since everything seems to be tending towards using Tracker as a backend, but it would of course affect the user interfaces of those apps.

I haven’t yet mentioned on this blog that a couple of months ago I quit my job at Codethink, and right now I’m training to be a language teacher. So I imagine that I will have very little time available to work on Tagcloud for a while, but please do send issue reports and patches to https://gitlab.com/samthursfield/tagcloud if you like. I will be at GUADEC 2018 and hopefully we can have lots of exciting discussions about applying tags to things. And for the future… while I would like Tagcloud to become a fully fledged application, I will also be happy if it serves simply as a prototype and as a way of driving improvements in Tracker which will then benefit all of GNOME’s content apps.

How BuildStream uses OSTree

Note: In version 1.2, BuildStream stopped using OSTree to cache artifacts. It now uses a generic “Content Addressable Storage” system, implemented internally but designed to be compatible with Bazel and any other tool which supports the Remote Execution API. I’ve updated this article accordingly.

I’ve been asked a few times about the relationship between BuildStream and OSTree. The answer is a bit complicated so I decided to answer the question here.

OSTree is a content-addressed object store, inspired in many ways by Git but optimized for storing trees of binary files rather than trees of text files.

BuildStream is an integration tool which deals with trees of binary files.

I’m deliberately using the abstract term “trees of binary files” here because neither BuildStream nor OSTree limits itself to a particular use case. BuildStream itself uses the term “artifact” to describe the output of a build job, and in practice this could be the set of development headers and documentation for a library, a package file such as a .deb or .rpm, a filesystem for a whole operating system, a bootable VM disk image, or whatever else.

Anyway, let’s get to the point! There are actually a few ways that BuildStream uses, or has used, OSTree.

The `ostree` source plugin

The `ostree` source plugin allows pulling arbitrary data from a remote OSTree repository. It is normally used with an `import` element as a way of importing prebuilt binaries into a build pipeline. For example BuildStream’s integration tests currently run on top of the Freedesktop SDK binaries (which were originally intended for use with Flatpak applications but are equally useful as a generic platform runtime). The gnome-build-meta project uses this mechanism to import a prebuilt Debian base image, which is currently manually pushed to an OSTree repo (this is a temporary measure, in future we want to base gnome-build-meta on top of the upcoming Freedesktop SDK 1.8 instead).
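
For illustration, an import element using this source looks roughly like the following. The URL and branch name here are placeholders, so check the project you’re importing from for real values:

    kind: import
    sources:
    - kind: ostree
      url: https://sdk.freedesktop.org/repo/
      track: runtime/org.freedesktop.BasePlatform/x86_64/1.6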

It’s also possible to import binaries using the `tar` and `local` source types of course, and you can even use the `git` or `bzr` plugins for this if you really get off on using the wrong tool for the job.

In future we will likely add other source plugins for importing binaries, for example from the Docker Registry and perhaps using casync.

Storing artifacts locally

Once a build has completed, BuildStream needs to store the results somewhere locally. The results go in the exciting-sounding “local artifact cache”, which is usually located inside your home directory at ~/.cache/buildstream/artifacts.

BuildStream 1.0 used OSTree to store artifacts. BuildStream 1.2 and later use a generic Content Addressed Storage implementation.

Storing artifacts remotely

As a way of saving everyone from building the same things, BuildStream supports downloading prebuilt artifacts from a remote cache.

BuildStream 1.0 used OSTree for remote storage. BuildStream 1.2 and later use the same CAS service that is used for local storage.

Pushing and pulling artifacts

BuildStream 1.2 and later use the CAS protocol from the Remote Execution API to transfer artifacts. This protocol is implemented using gRPC.

Indirect uses of OSTree

It may be that you also end up deploying stuff into an OSTree repository somewhere. BuildStream itself is only concerned with building and integrating your project — once that is done, you run `bst checkout` and are rewarded with a tree of files on your local machine. What if, let’s say, your project aims to build a Flatpak application?

Flatpak actually uses OSTree as well, and so your deployment step may involve committing those files into yet another OSTree repo, ready for Flatpak to run them. (This can be a bit long-winded at present, so there will likely be some better integration appearing here at some point.)

So, is anywhere safe from the rise of OSTree or is it going to take over completely? Something you might not know about me is that I grew up outside a town in north Shropshire called Oswestry. Is that a coincidence? I can’t say.

Oswestry, from Wikipedia.

2017 in review

I began this year in a hedge in Mexico City and immediately had to set off on a 2 day aeroplane trek back to Manchester to make a very tired return to work on the 3rd January. From there things calmed down somewhat and I was geared up for a fairly mundane year but in fact there have been many highlights!

The single biggest event was certainly bringing GUADEC 2017 to Manchester. I had various goals for this such as ensuring we got a GUADEC 2017, showing my colleagues at Codethink that GNOME is a great community, and being in the top 10 page authors on wiki.gnome.org for the year. The run up to the event from about January to July took up many evenings and it was sometimes hard to trade it off with my work at Codethink; it was great working with Allan, Alberto, Lene and Javier though and once the conference actually arrived there was a mass positive force from all involved that made sure it went well. The strangest moment was definitely walking into Kro Bar slightly before the preregistration event was due to start to find half the GNOME community already crammed into the tiny bar area waiting for something to happen. Obviously my experience of organizing music events (where you can expect people to arrive about 2 hours after you want them somewhere) didn’t help here.

Codethink provides engineers with a travel budget and a little bit of extra leave for attending conferences; obviously, what with GUADEC being in Manchester, I didn’t make huge use of that this year, but I did make it to FOSDEM and also to PyConES, which took place in the beautiful city of Cáceres. My friend Pedro was part of the organizing team and it was great to watch him running round fighting fires all day while I relaxed and watched the talks (which were mostly all trying to explain machine learning in 30 minutes, with varying degrees of success).

Stream powered carriage

Work-wise, I spent most of my year looking at compilers and build tools. Perhaps not my dream job, but it’s an enjoyable area to work in because (at least in terms of build tools) the state of the art is comically bad. In 10 years we will look back at GNU Autotools in the way we look at a car that needs to be started with a hand crank, and perhaps the next generation of distro packagers will think back in wonder at how their forebears had to individually maintain dependency and configuration info in their different incompatible formats.

BuildStream is in a good state and is about to hit 1.0; it’s beginning to get battle tested in a couple of places (one of these being GNOME), which is no doubt going to be a rough ride — I already have a wide selection of performance bottlenecks to be looking at in the new year. But it’s already looking like a healthy community and I want to thank everyone who has got behind the project.

It also seems to have been a great year for Meson; something that has been a long time coming but seems to be finally bringing Free Software build systems into the 21st century. Last year I ported Tracker to build with Meson, and I have been doing various ongoing fixes to the new build system — we’re not yet able to fully switch away from Autotools, primarily because of issue #2166, and also because of some Tracker test suite failures that seem to only show up with Meson which we haven’t yet dug into fully.

With GUADEC out of the way I managed to spend some time prototyping something I named Tagcloud. This is the next iteration of a concept that I’ve wanted since more or less forever, that of being able to apply arbitrary tags to different local and online resources in a nice way. On the web this is a widespread concept but for some reason the desktop world doesn’t seem to buy into it. Tracker is a key part of this puzzle, as it can deal with many types of content and can actually already handle tags if you don’t mind using the commandline, so part of my work on Tagcloud has been making Tracker easy to embed as a subproject. This means I can try new stuff without messing up any session-wide Tracker setup, and it builds on some great work Carlos has been doing to modernize Tracker as well. I’ve been developing the app in Python, which has required me to fix issues in Tracker’s introspection bindings (and GLib’s, and GTK+’s … on the whole I find the PyGObject experience pretty good and it’s obviously been a massive effort to get this far, but at the same time these teething issues are quite demotivating.) Anyway I will post more about Tagcloud in the new year once some of the ideas are a bit further worked out; and of course it may end up going nowhere at all but it’s been nice to actually write a GTK+ app for the first time in ages, and to make use of Flatpak for the first time.

It’s also been a great year for the Flatpak project; and to be honest if it wasn’t for Flatpak I would probably have talked myself out of writing a new app before I’d even started. Previously the story for getting a new app to end users was that you must either be involved or know someone involved in a distro or two so that you can have 2+ year old versions of your app installable through a package manager; or your users have to know how to drive Git and a buildsystem from the commandline. Now I can build a flatpak bundle every time I push to master and link people straight to that. What a world! And did I mention GitLab? I don’t know how I ever lived without GitLab CI and I think that GNOME’s migration to GitLab is going to be *hugely* beneficial for the project.

Looking back it seems I’ve done more programming stuff than I thought I had; perhaps a good sign that you can achieve stuff without sacrificing too much of your spare time.

It’s also been a good year music-wise; Manchester continues to have a fantastic music scene, which has only got better with the addition of the Old Abbey Taphouse, where I in fact spent the last 4 Saturdays in a row. Last Saturday we put on Babar Luck; I saw a great gig of his 10 years ago and have managed to keep missing him ever since, but things finally worked out this time. Other highlights have been Paddy Steer, Baghdaddies and a very cold gig we did with the Rubber Duck Orchestra on the outdoor stage on a snowy December evening.

I caught a few gigs by Henge who only get better with time and who will hopefully break way out of Manchester next year. And in September I had the privilege of watching Jeffrey Lewis supported by The Burning Hell in a little wooden hut outside Lochcarron in Scotland, that was certainly a highlight despite being ill and wearing cold shoes.

Lochcarron Treehouse

I didn’t actually know much of Scotland until taking the van up there this year; I was amazed that such a beautiful place has been there the whole time, just waiting 400 miles north. This expedition was originally planned as a bike trip but ended up being a road trip, and having now seen the roads, that is probably for the best. However, we did manage a great bike trip around the Netherlands and Belgium, the first time I’ve done a week-long bike trip and hopefully the beginning of a new tradition! Last year I did a lot of travel to crazily distant places; it’s a privilege to be able to do so, but one that I prefer to use sparingly, so it was nice to get around closer to home this year.

All in all a pretty successful year, not straightforward at times but one with several steps in the directions I wanted to head. Let’s see what next year holds 🙂

Using BuildStream through Docker

BuildStream isn’t packaged in any distributions yet, and it’s not entirely trivial to install it yourself from source. BuildStream itself is just Python, but it depends on a modern version of OSTree (2017.8 or newer at the time of writing), with the GObject introspection bindings, which is a little annoying to have to build yourself1.

So we have put some work into making it convenient to run BuildStream inside a Docker container. We publish an image to the Docker hub with the necessary dependencies, and we provide a helper script named bst-here that sets up a container with the current working directory mounted at /src and then runs a BuildStream command or an interactive shell inside it. Just download the script, read it through and run it: all going well, you’ll be rewarded with an interactive Bash session where you can run the bst command. This allows users on any distro that supports Docker to run BuildStream builds in a pretty transparent way and without any major performance limitations. It even works on Mac OS X!
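
Usage is along these lines once you’ve downloaded the script (a sketch; see the script’s own help text for the real options):

    chmod +x bst-here
    ./bst-here                    # start an interactive shell with bst available
    ./bst-here bst --version      # or run a single BuildStream command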

In order to run builds inside a sandbox, BuildStream uses Bubblewrap. This requires certain kernel features, in particular CONFIG_USER_NS, which right now is not enabled by default in Arch and possibly in other distros. Docker containers run against the kernel of the host OS, so Docker doesn’t help with this issue.

The Docker images we provide are based off Fedora and are built by GitLab CI from this repo. After a commit to that repo’s ‘master’ branch, a new image wends its way across hyperspace from GitLab to the Docker hub. These images are then pulled when the bst-here wrapper script calls docker run. (We also use these images for running the BuildStream CI pipelines).

More importantly, we have a mascot now! Let me introduce the BuildStream beaver:

Beavers, of course, are known for building things in streams, and are also native to Canada. This isn’t going to be the final logo; he’s just been brought in because we got tired of the project being represented by a capital letter B in a grey circle. If anyone can contribute a better one then please get in touch!

So what can you build with BuildStream now that you have it running in Docker? As recently announced, you can build GNOME! Follow this modified version of the newcomer’s guide to get started. Soon you will also be able to build Flatpak runtimes using the rewritten Freedesktop SDK; or build VM images using Baserock; and of course you can create pipelines for your own projects (although if you only have a few dependencies, using Meson subprojects might be quicker).

After one year of development, we are just a few blockers away from releasing BuildStream 1.0. So it is a great time to get involved in the project!

[1]. Installing modern OSTree from source is not impossible — my advice if you want to avoid Docker and your distro doesn’t provide a new enough OSTree would be to build the latest tagged release of OSTree from Git, and configure it to install into /opt/ostree. Then put something like export GI_TYPELIB_PATH=/opt/ostree/lib/girepository-1.0/ in your shell’s startup file. Make sure you have all the necessary build dependencies installed first.

Manchester transport consultation

Manchester council is running a transport survey at the moment.

There’s nothing I like more than ranting into pointless online forms, and here’s an extract from my response:

This lane (from Mad Cycle Lanes of Manchester) is a favourite.

Manchester is not the worst city to cycle in due to having largely quite
wide roads, but cycle infrastructure is almost always an afterthought
with many issues. Most cycle lanes pop in and out of existence, forcing
cyclists into the road at dangerous points. For example London Road
going south just past the A57(m) overpass has a cycle lane which
suddenly ends forcing cyclists to pull out in front of fast-moving
traffic that has just come off a motorway. Cycle lanes sometimes
coexist with tram lines, which is very dangerous as mountain bike wheels
are just wide enough to get stuck in the tram lines. When wet, tram
lines are also very slippery which makes them dangerous to cross on a
bike. I have been injured twice as a result of cycle lanes that cross
tram lines. There are also cycle lanes which go onto narrow stretches of
pavement, for example the corner by London Road Fire Station and
Munroe’s Hotel which is painted to look like a cycle lane but is also
clearly a walkway and is too narrow to function as both. In some places
there are cycle lanes which are rendered useless by cars parking in them
or next to them.

Walking should be the main mode of transport to get around the city. At
the moment walking around takes longer than it should because so much
time is spent waiting at traffic lights.

I actually don’t mind cycling round the city too much, and the traffic jams are actually great for cycle safety because cars generally don’t hit you if they aren’t going anywhere. But the absurdity of our existing cycle infrastructure needs to be recognised.

There are more great examples from round the country here and here.