Tracker at GUADEC 2020

GNOME’s conference is online this year, for obvious reasons. I spent the last 3 months teaching online classes so hopefully I’m prepared! I’m sad that there’s no Euro-trip this year and we can’t hang out in the pub, but it’s nice that we’re saving hundreds of plane journeys.

There will be two talks related to Tracker: Carlos and I speaking about Tracker 3 (Friday 23rd July, 16.45 UTC), and myself on how to deal with the challenges of working on GNOME’s session-wide daemons (Thursday 22nd July, 16.45 UTC). There are plenty of other fascinating talks, including inevitably one scheduled at the same time as ours which you should, of course, watch as a replay during the break 🙂

Self-contained Tracker 3 apps

Let’s go back one year. The plan for Tracker 3 emerged when I spoke to Carlos Garnacho at GUADEC 2019 in Thessaloniki probably over a Freddo coffee like this one…

5 people drinking coffee in Thessaloniki

We had lots of improvements we wanted to make, but we knew we were at the limit of what we could do to Tracker while keeping compatibility with the 10+ year old API. Changing a system service isn’t easy though (hence the talk). I’m a fan of the ‘Flatpak model’ of app deployment, and one benefit is that it can allow the latest apps to run on older LTS distributions. But there’s no magic there – this only works if the system and session-wide services follow strict compatibility rules.

Anything that wants to be running as a system service in combination with any kind of sandboxing system must have a protocol that is ABI stable and backwards compatible. (From https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1001#note_370588157)

Tracker 3.0 adds important features for apps and users, but these changes require apps to use a new D-Bus API which won’t be available on older operating systems such as Ubuntu 20.04.

We’re considering various ways around this, and one that I prototyped recently is to bundle Tracker3 inside the sandbox. The downside is that some folders will be double indexed on systems where we can’t use the host’s Tracker, but the upside is the app actually works on all systems.

I created a branch of gnome-music demoing this approach. GNOME’s CI is so cool now that you can just go to that page, click ‘View exposed artifact’, then download and install a Flatpak bundle of gnome-music using Tracker 3! If you do, please comment on the MR about whether it works for you 🙂 Next on my list is GNOME Photos, but this is more complex for various reasons.

Blocklists and Allowlists

The world needs major changes to stamp out racism, and renaming variables in code isn’t a major change. That said, the terms ‘blacklist’ and ‘whitelist’ rely on and reinforce an association of ‘black bad, white good’. I’m happy to see a trend to replace these terms, including at Google, in Linux, at the IETF, and more.

It was simple to switch Tracker 3 to use the more accurate terms ‘blocklist’ and ‘allowlist’. I also learned something about stable releases: I merged a change to the 2.3 branch, but I didn’t realise that we consider the stable branch to be in ‘string freeze’ forever. (It’s obvious in hindsight 🙂) We’ve now reverted that, but a few translation teams had already updated their translations, so to the Spanish, Brazilian Portuguese and Romanian translators – sorry for creating extra work for you!

Acknowledging merge requests

I’ve noticed while working on app porting that some GNOME projects are quite unresponsive to merge requests. I’ve been volunteering my time as a GNOME contributor for longer than I want to remember, but it still impacts my motivation if I send a merge request and nobody comments. Part of the fun of contributing to GNOME is being part of such a huge and talented community. How many potential contributors have we lost simply by ignoring their contributions?

Video of paper aeroplanes falling to the street

This started me thinking about how to improve the situation. Being a GNOME maintainer is not easy and is in most cases unpaid, so it’s not constructive to simply complain about the situation. Better if we can mobilise people with free time to look at whatever uncommented merge requests need attention! In many cases you can give useful feedback even if you don’t know the details of the project in question – if there’s a problem then it doesn’t need a maintainer to raise it.

So my idea, which I intend to raise somewhere other than my blog when I have the time, is we could have a bot that posts to discourse.gnome.org every Friday with a list of merge requests that are over a week old and haven’t received any comments. If you’re bored on a Friday afternoon or during the weekend you’ll be able to pick a merge request from the list and give some feedback to the contributor – a simple “Thanks for the patch, it looks fine to me” is better than silence.
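
The selection step of such a bot could be sketched roughly like this. It is only a sketch under assumptions: the function names are made up, the sample data is invented, and the 'created_at', 'user_notes_count' and 'web_url' fields are assumed to match what GitLab’s merge request API returns. Fetching the data and posting to Discourse are left out.

```python
from datetime import datetime, timedelta, timezone


def parse_time(stamp):
    """Parse an ISO 8601 timestamp; older Pythons reject a trailing 'Z'."""
    return datetime.fromisoformat(stamp.replace('Z', '+00:00'))


def stale_merge_requests(merge_requests, now, max_age=timedelta(days=7)):
    """Return merge requests older than max_age that have no comments.

    Each merge request is a dict with 'created_at', 'user_notes_count'
    and 'web_url' keys (field names are an assumption, see above).
    """
    stale = [
        mr for mr in merge_requests
        if now - parse_time(mr['created_at']) > max_age
        and mr['user_notes_count'] == 0
    ]
    # Oldest first, so the longest-ignored contribution tops the list.
    return sorted(stale, key=lambda mr: parse_time(mr['created_at']))


# Hypothetical sample data, standing in for an API response.
sample = [
    {'web_url': 'https://gitlab.example/app/-/merge_requests/1',
     'created_at': '2020-07-01T10:00:00Z', 'user_notes_count': 0},
    {'web_url': 'https://gitlab.example/app/-/merge_requests/2',
     'created_at': '2020-07-15T10:00:00Z', 'user_notes_count': 0},
    {'web_url': 'https://gitlab.example/app/-/merge_requests/3',
     'created_at': '2020-06-20T09:00:00Z', 'user_notes_count': 3},
    {'web_url': 'https://gitlab.example/app/-/merge_requests/4',
     'created_at': '2020-06-25T09:00:00Z', 'user_notes_count': 0},
]

now = datetime(2020, 7, 17, 12, 0, tzinfo=timezone.utc)
for mr in stale_merge_requests(sample, now):
    print(mr['web_url'])
```

Sorting oldest first is a design choice: the person browsing the list on a Friday afternoon sees the contributor who has waited longest at the top.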

Let me know what you think of the idea! Can you think of a better way to make sure we have speedy responses to merge requests?

Badge: I'm presenting at GUADEC 2020

See you there!

Posted in Uncategorized | Leave a comment

Tracker in Summer

Lots of effort is going into Tracker at the moment. I was waiting for a convenient time to blog about it all, but there isn’t a convenient moment on a project like this, just lots of interesting tasks all blocked on different things.

App porting

With the API changes mostly nailed down, our focus moved to making initial Tracker 3 ports of the libraries and apps that use Tracker. This is a crucial step to prove that the new design works as we expect, and has helped us to find and fix loads of rough edges. We want to work with the maintainers of each app to finish off these ports.

If you want to help, or just follow along with the app porting, the process is being tracked in this GNOME Initiatives issue.

The biggest success story so far is GNOME Music. The maintainers Jean and Marinus are regular collaborators in #tracker and in our video meetings, and we’ve already got a (mostly) working port to Tracker 3. You can download a Flatpak build from that merge request, but note that it requires tracker-miners 3.0 installed on your host.

We’re hoping we can work around the host dependency in most cases, but I got excited and made unofficial Fedora packages of Tracker 3 which allowed me to try it out on my laptop.

We are also happy that GTK can be built against Tracker 3 already, and excited for the work in progress on Rygel. At the time of writing, the other apps with Tracker 3 work in progress are Boxes, Files, Notes, Photos and Videos. Some of these use the new tracker3 Grilo plugin which we hope a Grilo maintainer will be able to review and merge soon. All help with finishing these branches and the remaining apps will be very welcome.

Release strategy

We have been putting thought into how to release Tracker 3. We need collaboration on two sides: from app maintainers who we need to volunteer their time and energy to review, test and merge the Tracker 3 changes in their apps, and from distros who we need to volunteer their time to package the new version and release it.

We have some tricky puzzles to solve, the main one being how an app might switch to Tracker 3 without breaking on Ubuntu 20.04 and other distros that are unlikely to include Tracker 3, but are likely to host the latest Flatpak apps.

We are hoping to find a path forward that satisfies everyone. Again, you can follow the discussion in Initiative issue #17.

As you can see, we are volunteering a lot of our time at the moment to make sure this complicated project is a success.

Data exporting

We made it more convenient to export data from Tracker databases, with the tracker export command. It’s nice to have a quick way to see exactly what is stored there. This feature will also be crucial for exporting app data such as photo albums and starred files from the centralized Tracker 2 database.

Hardware testing with umockdev

The removable device support in Tracker goes largely untested, because you need to actually plug and unplug a real USB device to exercise it. As always, for a volunteer-driven project like Tracker it’s vital that testing the code is as easy as possible.

I recently discovered umockdev and decided to give it a spin. I started with the power management code because it’s super simple – on a low battery notification, we stop the indexer. I’m happy with the test code but unfortunately it fails on GNOME’s CI runners with an error from umockdev:

sendmsg_one: cannot connect to client's event socket: Permission denied

I’m not sure when I’ll be motivated to dig into why this fails, since the problem only reproduces on the CI runners, so if anyone has a pointer on what’s wrong then please comment on the MR.

GUADEC

Due to the COVID-19 pandemic, GUADEC will be an online event but Tracker will be covered in two talks, “Tracker: The Future is Present” on the Friday, and my talk “Move Fast and Break Things” on Thursday.

The pandemic also means I’m likely to be spending the whole summer here in Galicia which can hardly be seen as bad luck. Here’s a photo of a beautiful spot I discovered recently about 30km from where I live:

Next steps

Carlos is working on some final API tweaks before we make another Tracker 2.99 beta release, after which the API should be fully stable. The Flatpak portal is also nearly ready.

We hope to see progress with app ports. This depends more and more on when app developers can volunteer their time to collaborate with us. Progress in the next few weeks will decide whether we target GNOME 3.38 (September 2020) or GNOME 3.40 (March 2021) for switching everything over to Tracker 3.

Unlike GTK 4, I can’t show any cool screenshots. I do have some ideas about how to demonstrate the improvements over Tracker 2, however … watch this space!

As always, we are available on IRC/Matrix in #tracker and you are welcome to join our online meetings.

Why I love Bandcamp

The Coronavirus quarantine would be much harder if we didn’t have great music to listen to. But making an income from live music is very difficult in a pandemic. What’s a good way to support the artists who are helping us through?

One ethical way is to buy music on Bandcamp. The idea of Bandcamp is that you browse music (and merch), and if you like something you buy a real download [1]. You get unlimited web streaming of everything you bought too [2]. Their business model is clear and upfront:

Our share is 15% on digital items, and 10% on physical goods. Payment processor fees are separate and vary depending on the size of the transaction, but for an average size purchase, amount to an additional 4-7%. The remainder, usually 80-85%, goes directly to the artist, and we pay out daily.

On Friday 1st May 2020, which is tomorrow, or today, or some point in the past, Bandcamp are waiving their 10-15% share of sales. It’s a great time to buy some music!

Here are some recommendations taken from the recent social media challenge of posting album covers that have a big effect on your music taste, with no other context. (My social media posts are mostly of music recommendations with no context anyway, so this wasn’t much of a challenge).

Orange Whip by Honeyfeet

Widow City by The Fiery Furnaces

at Version City by Victor Rice

Unknown Mortal Orchestra by Unknown Mortal Orchestra

Sonido Amazonico by Chicha Libre

When you’ve listened to those, it’s time to dive into the enormous list of curated recommendations (curated by real humans, not by robots). The best metal, the best hip-hop, the best contemporary Chinese post-punk, the best Theremin music of the last 100 years, etc. etc. You can also follow me if you want 🙂

In the parallel universe of unethical music services, I read that Spotify have insultingly added a virtual “tip jar”. It can’t make amends for the deeply unfair business relationship that many streaming sites have with artists.

Listen to the T-shirt:


blackdogtee

Have fun & make sure to spend your music money ethically!

1: You can even download in Ogg Vorbis format if you like.
2: In practice, you get unlimited streaming of all the music on Bandcamp. Artists can choose to put a nag screen up after a certain number of listens. Some artists would prefer the site to be more restrictive in this regard.

Tracker documentation improvements

Word Cloud of Tracker ontology documentation

It’s cool storing stuff in a database, but what if you shared the database schema so other tools can work with the data? That’s the basic idea of Linked Data which Tracker tries to follow when indexing your content.

In a closed music database, you might see a “Music” table with a “name” column. What does that mean? Is it the name of a song, an artist, an album, … ? You will have to do some digging to find out.

When Tracker indexes your music, it will create a table called nmm:MusicAlbum. What does that mean? You can click the link to find out, because the database schema is self-documenting. The abbreviation nmm:MusicAlbum expands to a URL, which clearly identifies the type of data being stored.
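
To make the expansion concrete, here is a minimal sketch of how a prefixed name turns into a full URL. The helper name and the prefix table are made up for illustration, and the nmm namespace URL is an assumption based on the tracker.api.gnome.org namespace; check the ontology documentation for the real value.

```python
# Prefix table. The namespace URL is an assumption, not copied from Tracker.
PREFIXES = {
    'nmm': 'http://tracker.api.gnome.org/ontology/v3/nmm#',
}


def expand(compact_uri, prefixes=PREFIXES):
    """Expand 'prefix:LocalName' into a full URL using a prefix table."""
    prefix, separator, local_name = compact_uri.partition(':')
    if not separator or prefix not in prefixes:
        raise ValueError('unknown prefix in {!r}'.format(compact_uri))
    # The namespace and the local name are simply concatenated.
    return prefixes[prefix] + local_name


print(expand('nmm:MusicAlbum'))
```

The point is that the short name is only a convenience: two databases that agree on the full URL are talking about the same kind of thing, whatever prefix each one uses.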

By formalising the database schema, we create a shared vocabulary for talking about the data. This is very powerful – have you seen GMail Highlights, where a button appears in your email inbox to check in for a flight and such things? These are powered by the https://schema.org/ shared vocabulary. Google don’t manually add support to GMail for each airline in the world. Instead, the airlines embed a https://schema.org/FlightReservation resource in the confirmation email which GMail uses to show the information. The vocabulary is an open standard, so other email providers can use the same data and even propose improvements. Everyone wins!
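
As a rough illustration of what an airline might embed, the sketch below builds a FlightReservation resource as JSON-LD. All the values and the helper name are invented, the property names are my reading of the schema.org FlightReservation and Flight types, and wrapping the data in a script element is an assumption about how such markup is usually carried in an HTML email.

```python
import json

# All values are made up; only the vocabulary comes from schema.org.
reservation = {
    '@context': 'https://schema.org',
    '@type': 'FlightReservation',
    'reservationNumber': 'ABC123',
    'underName': {'@type': 'Person', 'name': 'Example Traveller'},
    'reservationFor': {
        '@type': 'Flight',
        'flightNumber': '110',
        'airline': {'@type': 'Airline', 'name': 'Example Air'},
        'departureAirport': {'@type': 'Airport', 'iataCode': 'SCQ'},
        'arrivalAirport': {'@type': 'Airport', 'iataCode': 'SKG'},
        'departureTime': '2020-07-21T09:30:00+02:00',
    },
}


def as_email_markup(resource):
    """Wrap a resource as a JSON-LD script element for an HTML email body."""
    return '<script type="application/ld+json">\n{}\n</script>'.format(
        json.dumps(resource, indent=2))


print(as_email_markup(reservation))
```

Any mail client that understands the vocabulary can read the '@type' and decide what to show, without knowing anything about the airline that sent it.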

Recent improvements

Tracker began 5 years before the creation of schema.org, and we use an older vocabulary from a project called Nepomuk. Tracker may now be the only user of the Nepomuk vocabularies, but to avoid a huge porting effort we have opted to keep using them for 3.0.

Inspired by schema.org documentation, I changed the formatting of Tracker’s schema documentation trying to pack the important information more densely. Compare the 2.x documentation to the 3.x documentation to see what has changed – I think it’s a lot more readable now.

We have also stopped using broken or incorrect URLs. The https://tracker.api.gnome.org/ namespace was recently set up by the incredibly efficient GNOME sysadmins and we can trust it not to disappear at random, unlike the http://tracker-project.org/ and https://www.semanticdesktop.org/ontologies/ namespaces we were using before.

One thing you will notice if you followed the nmm:MusicAlbum link above is that the contents of the documentation still require some improvement. I hope to see incremental improvements here; if you think you can make it better, please send us a merge request!

CLI documentation

We maintain documentation for the tracker CLI tool in the form of man pages. These were a bit neglected. We now publish the man pages online making it easier to read them and harder to forget they exist.

Internally this is done using Asciidoc and xmlto, plus a small Python script to post-process the output.

User documentation

There is a well-written and quite outdated set of documentation at https://wiki.gnome.org/Projects/Tracker. It’s mostly aimed at setting up Tracker on systems where it doesn’t come ready-integrated – which is a use case we don’t really want to support. I’m a bit stuck as I don’t want to delete what is quite good content, but I also don’t want to maintain documentation for things that nobody should need to do…

Documentation hosting

This is a periodic reminder that the library-web script that manages developer.gnome.org needs a major reworking, such as the one proposed here. All the Tracker documentation on http://developer.gnome.org/ is years out of date, because we switched to Meson, which requires us to make an extra effort on each release to publish the documentation. Much kudos is awaiting the people who can resolve this.

Stay tuned

Work is proceeding nicely on Tracker 3.0 and we hope to have the first beta release ready within the next couple of weeks. At that point, there will be opportunities to help with testing app ports and making sure performance is good – I will keep you posted here!

API changes in Tracker 3.0

This article has been updated to correct a misunderstanding I had about the CONSTRAINT feature. Apps will not need to explicitly add this to their queries; it will be added implicitly by the xdg-tracker-portal process.

Lots has happened in the 2 months since my last post, most notably the global coronavirus pandemic … in Spain we’re in week 3 of quarantine lockdown already and no one knows when it is going to end.

Let’s take our mind off the pandemic and talk about Tracker 3.0. At the start of the year Carlos worked on some key API changes which are now merged. It’s a good opportunity to recap what’s really changing in the new version.

I made the developer documentation for Tracker 3.0 available online. Thanks to GitLab, this can be updated every time we merge a change in Git. The documentation is a work in progress and we would appreciate it if you can help us to improve it.

The documentation contains a migration guide, but let’s have a broader look at some common use cases.

Tracker 3.0 is still in development and things may change! We very much welcome feedback from app developers who are going to use this API.

Browsing and searching

The big news in Tracker 3.0 is decentralization. Each app can now manage its own private database! There’s no single “Tracker store” any longer.

Tracker 3.0 will index content from the filesystem to facilitate searching and browsing, as it does now. The filesystem miner will keep this in its own database, and Flatpak apps will access this database through a portal (currently in development).

Apps access this data using a TrackerSparqlConnection just like now, but when we create the connection we need to specify that we want to connect to the filesystem miner’s database.

Here’s a Python example of listing all the music files in the user’s ~/Music directory:

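import gi
gi.require_version('Tracker', '3.0')
from gi.repository import Tracker

# Connect to the filesystem miner's database over D-Bus.
conn = Tracker.SparqlConnection.bus_new(
    "org.freedesktop.Tracker3.Miner.Files", None, None)
# query() and next() each take a cancellable; None means "not cancellable".
cursor = conn.query(
    'SELECT ?url { ?r a nmm:MusicPiece ; nie:url ?url }', None)
print("Found music files:\n")
while cursor.next(None):
    # get_string() takes a column index and returns a (value, length) pair.
    print(cursor.get_string(0)[0])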

Running a full text search will be similar. Here’s how you’d look for “bananas” in every file in the user’s ~/Documents folder:

cursor = conn.query(
    'SELECT ?url fts:snippet(?r) { '
    '    ?r a nfo:Document ; '
    '        nie:url ?url ; '
    '        fts:match "Bananas" '
    '}', None)
print("Found document files:\n")
while cursor.next(None):
    # Column 0 is the URL, column 1 is the matching snippet.
    print("   url: {}".format(cursor.get_string(0)[0]))
    print("   snippet: {}".format(cursor.get_string(1)[0]))

If you are running inside a Flatpak sandbox then there will be a portal between you and the org.freedesktop.Tracker3.Miner.Files database. The read-only /.flatpak-info file inside the sandbox, which is created when building the Flatpak, will declare which graphs your app can access. The xdg-tracker-portal will add that information to the SPARQL query using a Tracker-specific CONSTRAINT GRAPH clause, and the database will enforce the constraint, ensuring that your app really does only see the graphs that it has requested access to.

Storing your own data

Tracker can be used as a data store by applications. One principle behind the design of Tracker 1.x was that by using a centralized store and a common vocabulary, different apps could easily share data. For example, when you create an album in GNOME Photos, it’s stored in the Tracker database using the standard nfo:DataContainer class. Any other app, perhaps a file manager, or a photos app from a different platform, can show and edit albums stored in this way without having to know specifics about GNOME Photos. Playlists in GNOME Music and starred files in Nautilus are also stored this way.

This approach had some downsides. Having all data in a single database creates a single point of failure. It’s hard to back up the valuable user data without backing up the search and indexing data too – but since the index can be recreated from the filesystem, it’s a waste of resources to include that in a backup. Apps were also forced to share a single database schema which was maintained in the tracker.git repository.

In Tracker 3.0, each app creates a private database for storing its own data. It can use the ontology (database schema) from Tracker, or it can provide its own version. Here’s how a photos app written in Python could store photo albums:

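import gi
gi.require_version('Tracker', '3.0')
from gi.repository import Gio, GLib, Tracker
import pathlib

def app_database_dir():
    data_dir = pathlib.Path(GLib.get_user_data_dir())
    return data_dir.joinpath('my-photos-app/db')

# Gio.File.new_for_path() expects a string, not a pathlib.Path.
location = Gio.File.new_for_path(str(app_database_dir()))
# Argument order (flags, store, ontology, cancellable) follows the released
# 3.0 API, where the stock Nepomuk ontology is requested explicitly.
conn = Tracker.SparqlConnection.new(
    Tracker.SparqlConnectionFlags.NONE, location,
    Tracker.sparql_get_ontology_nepomuk(), None)

# Create the album resource in the app's private database.
# update() takes the SPARQL string and a cancellable.
conn.update(
    'INSERT { <album:MyAlbum> a nfo:DataContainer, nie:DataObject ; '
    '           nie:title "My Album" }',
    None)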

Now let’s insert a photo into this album. Remember that the user’s photos are indexed by the filesystem miner. We can use the SERVICE statement to connect the filesystem miner’s database to our app’s private database, like this:

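# The SERVICE block runs against the filesystem miner's database, while the
# INSERT applies to our own. The app writes no CONSTRAINT GRAPH clause here.
conn.update(
    'INSERT { '
    '    ?photo nie:isPartOf <album:MyAlbum> . '
    '} WHERE { '
    '    SERVICE <dbus:org.freedesktop.Tracker3.Miner.Files> { '
    '        ?photo nie:isStoredAs <file:///home/me/Photos/my-photo.jpg> '
    '    } '
    '}',
    None)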

Now let’s display the contents of the album:

# Join photos known to the filesystem miner with album membership stored in
# our private database. The app writes no CONSTRAINT GRAPH clause here.
cursor = conn.query(
    'SELECT ?url { '
    '    SERVICE <dbus:org.freedesktop.Tracker3.Miner.Files> { '
    '        ?photo a nmm:Photo ; nie:isStoredAs ?url . '
    '    } '
    '    ?photo nie:isPartOf <album:MyAlbum> . '
    '}', None)
while cursor.next(None):
    print(cursor.get_string(0)[0])

Note again that access to the Photos graph has to be requested: if our example app is running in Flatpak, this will require a special permission.

It’s still possible for one app to share data with another, but it will require coordination at the app level. Using the example of photo albums, GNOME Photos can opt to make its database available to other apps. If a different app wants to see the user’s photo albums, they’ll need to connect to the org.gnome.Photos database over D-Bus. As usual, Flatpak apps would need permission to do this.

Is it a good time to port my app to Tracker 3.0?

It’s a good time to start porting your app. You will definitely be able to help us with testing and stabilising the library and the documentation if you start now.

There are some API changes still unmerged at time of writing, primarily the Flatpak portal and the CONSTRAINT feature, also the details of how you specify which ontology to use.

Some functionality is no longer exposed in C libraries, due to the privatization of libtracker-control and libtracker-miner. As far as we know libtracker-miner is unused outside Tracker, but some apps are currently using libtracker-control to display status updates for the Tracker daemons and trigger indexing of removable devices. We have an open issue about improving the story for on-demand removable device indexing. For status monitoring you may use the underlying D-Bus signals, and I’m also hoping to make these more useful.

Ideally I’d like to add a new helper library for Tracker 3.0 which would conveniently wrap the high level features that apps use. My volunteer time is limited though. I can share ideas for this if you are looking for a way to contribute!

What about a hackfest?

At some point we need to finish the Tracker 3.0 work and make sure that apps that use Tracker are all ported and working. The best case is that we do this in time for the upcoming GNOME 3.38 release. We discussed holding a hackfest at some point between now and GNOME 3.38 to make sure things are settled; an in-person hackfest may no longer be feasible in light of the Coronavirus pandemic, but a series of online meetings would be a good alternative. We can only wait and see!

Sculpting Tracker 3.0

We’re in the second phase of work to create version 3.0 of the Tracker desktop search engine.

Tracker’s database is now up to date with the latest SPARQL 1.1 standards, including the magical SERVICE statement that lets you combine results from multiple databases in a single query. Now we’re converting the database from a service into a library, and turning the previously monolithic architecture into something more flexible.

Carlos has already done most of this work and the code is pushed as #172 (tracker.git) and  #136 (tracker-miners.git). At times it feels like we’re carving a big block of stone into a sculpture — just look at the diffstats:

    tracker.git: +4214 -10234
    tracker-miners.git: +375 -718

Read merge request #172 for full details, but the highlights are that there’s no more tracker-store daemon, and the libtracker-sparql library which was previously only used for querying and inserting data can now be used to create and manage your own database. You can keep the database private, or you can expose it over D-Bus.

The code in tracker.git is now only about managing data. We may rename it to tracker-sparql in due course, or even to SPARQLite if this is okayed by the developers of SQLite. There’s perhaps a niche for a desktop-scale database that supports SPARQL queries, and it’s a niche that Tracker’s database fits in nicely.

All the code related to desktop indexing and search is now in tracker-miners.git. The tracker-miner-fs daemon will maintain the index in its own database, which you’ll be able to query by connecting over D-Bus just like you used to connect to tracker-store in Tracker 2.0. However, apps running inside Flatpak will not be able to talk directly to the tracker-miner-fs daemon — communication will go through a new portal that Carlos is currently working on, allowing us to implement per-app access controls to your data for the first time.

We are still pending a Tracker 2.3.2 bugfix release too! This month Victor Gal solved an issue that was causing photo geolocation metadata to be ignored. Rasmus Thomsen also added Alpine Linux to our CI, and the GNOME translation teams have been hard at work too.

If you want to help out by testing, developing or documenting Tracker, get in touch on GNOME Discourse (use the ‘tracker’ tag) or irc.gnome.org #tracker.

Last month in Tracker

Here’s an incomplete report of some work done on Tracker during the last month!

Bugs

Jean Felder fixed a thorny issue that was causing wrong track durations for MP3s.

Rasmus Thomsen has been testing on Alpine Linux, fixing one issue and finding several more. Alpine Linux uses musl libc instead of the more common GNU libc, which triggers bugs that we don’t usually see. Finding and fixing these issues could be a great learning experience for someone who wants to dig deep into the platform!

There’s an ongoing issue reported by many Ubuntu users which seems to be due to SQLite database corruption. SQLite is rather a black box to me, so I don’t know how or when we might get to the bottom of why this corruption is happening.

Ubuntu CI

We now test each commit on Ubuntu as well as Fedora. This is a nice step forwards. It’s also triggering more intermittent failures in the CI: we’ve made huge progress in the last few years on bringing the CI up from zero, but there are some latent issues like these which we need to get rid of.

Tracker 3.0

Carlos has done more architectural work in the ‘master’ branch, working towards having a generic SPARQL store in tracker.git, and all GNOME/desktop/filesystem related code in tracker-miners.git.

As part of this, the tracker CLI tool is now split between tracker.git and tracker-miners.git (MR1, MR2).

We also moved the libtracker-control and libtracker-miner libraries into tracker-miners.git, and made the libtracker-control API private. As far as I know, the libtracker-control library is only being used by GNOME Photos to manage indexing of removable devices. We want to keep track of which apps need porting to 3.0, so please let me know if this is going to affect anything else.

New website

Tracker is famous enough that it merits a real website, not just an outdated set of wiki pages. So I made a real Tracker website, aiming to collect links to relevant user and developer documentation and to have a minimal overview and FAQ section. We can build and deploy this straight from the tracker.git repo, so whereas the wiki is easily forgotten, the new website lives in the same repo as the source code. The next step will be to merge this and then tidy up most of the old wiki pages.

Into the Pyramid

November 2019 wasn’t an easy month, for various reasons, and it also rained every single day of the month. But there were some highlights!

LibreTeo

At the bus stop one day I saw a poster for a local Free Software related event called LibreTeo. Of course I went, and saw some interesting talks related to technology and culture and also a useful workshop on improving your clown skills. Actually the clown workshop was a highlight. It was a small event but very friendly, I met several local Free Software heads, and we were even invited for lunch with the volunteers who organized it.

Purr Data on Flathub

I want to do my part for increasing the number of apps that are easy to install on Linux. I asked developers to “Flatpak your app today” last year, and this month I took the opportunity to package Purr Data on Flathub.

Here’s a quick demo video, showing one of the PD examples which generates an ‘audible illusion’ of a tone that descends forever, known as a Shepard Tone.

As always the motivation is a selfish one. I own an Organelle synth – it’s a hackable Linux-based device that generates sound using Pure Data, and I want to be able to edit the patches!

Pure Data is a very powerful open source tool for audio programming, but it’s never had much commercial interest (unlike its proprietary sibling Max/MSP) and that’s probably why the default UI is still implemented in Tcl/Tk in 2019. The Purr Data fork has made a lot of progress on an alternative HTML5/JavaScript UI, so I decided this would be more suitable for a Flathub package.

I was particularly motivated by the ongoing Pipewire project which is aiming to unify pro and consumer audio APIs on Linux in a Flatpak-friendly way. Christian Schaller mentioned this recently:

There is also a plan to have a core set of ProAudio applications available as Flatpaks for Fedora Workstation 32 tested and verified to work perfectly with Pipewire.

The Purr Data app will benefit a lot from this work. It currently has to use the OSS backend inside the sandbox and doesn’t seem to successfully communicate over MIDI either — so it’s rather a “tech preview” at this stage.

The developers of Purr Data are happy about the Flatpak packaging, although they aren’t interested in sharing the maintenance effort right now. If anyone reading this would like to help me with improving and maintaining the Purr Data Flatpak, please get in touch! I expect the effort required to be minimal, but I’d like to have a bus factor > 1.

Tracker bug fixes

This month we fixed a couple of issues in Tracker which were causing system lockups for some people. It was very encouraging to see people volunteering their time to help track down the issue, both in GitLab issue 95 and in #tracker on IRC, and everyone involved in the discussion stayed really positive even though it’s obviously quite annoying when your computer keeps freezing.

In the end there were several things that came together to cause system lockups:

  • Tracker has a ‘generic image extraction’ rule that tries to find metadata for any image/* MIME type that isn’t a .bmp, .jpg, .gif, or .png. This codepath uses the GstDiscoverer API, the same as for video and audio files, in the hope that a GStreamer plugin on the system can give us useful info about the image.
  • The GstDiscoverer instance is created with a timeout of 5 seconds. (This seems quite high — the gst-typefind utility that ships with GStreamer uses a timeout of 1 second).
  • GStreamer’s GstDiscoverer API feeds any file where the type is unknown into an MPEG decoder, which is effectively an unwanted fuzz test and can trigger periods of high CPU and memory usage.
  • 5 seconds of processing non-MPEG data with an MPEG decoder is somehow enough to cause Linux’s scheduler to lock up the entire system.

We fixed this in the stable branches by blocking certain problematic MIME types. In the next major release of Tracker we will probably remove this codepath completely as the risks seem to outweigh the benefits.
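
As a rough illustration of the rule and the fix described above, here is a minimal sketch. It is not Tracker's actual code: the function name is made up, and the blocklist entries are placeholders, not the MIME types that were really blocked.

```python
# Image types that the generic rule skips (taken from the list above).
EXCLUDED_IMAGE_TYPES = {"image/bmp", "image/jpeg", "image/gif", "image/png"}

# Hypothetical blocklist: placeholder entries only, not the types that were
# actually blocked in Tracker's stable branches.
BLOCKED_MIME_TYPES = {"image/x-example-raw", "image/x-example-icon"}


def uses_generic_image_rule(mime_type):
    """Return True if a file of this type would reach the GstDiscoverer path."""
    if not mime_type.startswith("image/"):
        return False
    if mime_type in EXCLUDED_IMAGE_TYPES:
        return False
    return mime_type not in BLOCKED_MIME_TYPES
```

Blocking by MIME type keeps the generic path available for everything else, which is why it suits a stable branch better than removing the codepath outright.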

Other bits

I also did some work on a pet project of mine called Calliope, related to music recommendations and playlist generation. More on this in a separate blog post.

And I finally installed Fedora on my partner’s laptop. It was nice to see that GNOME Shell works out-of-the-box on 12-year-old consumer hardware. The fan, which was spinning 100% of the time under Windows 8, is virtually silent now – I had actually thought this problem was due to dust buildup or a hardware issue, but once again the cause was actually low-quality proprietary software.

What I did in October

October in Galicia has a weather surprise for every week. I like it because every time the sun appears you feel like you gotta enjoy it – there might be no more until March.

I didn’t do much work on Tracker this month, besides bug triage and a small amount of prep for the 2.3.1 stable release. The next step for Tracker 3.0 is still to fix a few regressions causing tests to fail in tracker-miners.git. Follow the Tracker 3.0 milestone for more information!

Planalyzer

In September I began teaching English classes again after the summer, and so I’ve been polishing the tool that I wrote to index old lesson plans.

It looks a little cooler than before:

Screenshot of Planalyzer app

I’m still quite happy with the hybrid GTK+/webapp approach that I’m taking. I began this way because the app really needs to be available in a browser: you can’t rely on running a custom desktop app on a classroom PC. However, for my own use running it as a webapp is inconvenient, so I added a simple GTK+/WebKit wrapper. It’s kind of experimental and a few weird things come out of it, like how clipboard selections contain some unwanted style info that WebKit injects, but it’s been pretty quick and fun to build the app this way.

I see some developers using Electron these days. In some ways it’s good: apps have strong portability to Linux, and are usually easy to hack on too due to being mostly JavaScript. But having multiple 150MB binary builds of Chromium dotted about my machine makes me sad. In the Planalyzer app I use WebKitGTK+, which is already part of GNOME and it works very well. It would be cool if Electron could make use of this in future 🙂

Hydra

I was always interested in making cool visuals, since I first learned about the PC demoscene back in the 1990s, but I was never very good at it. I once made a rather lame plasma demo using an algorithm I copied from somewhere else.

And then, while reading the Create Digital Music blog earlier this year, I discovered Hydra. I was immediately attracted by the simple, obvious interface: you chain JavaScript functions together and visuals appear right behind the code. You can try it here right away in your browser. I’ve been out of touch with the 3D graphics world forever, so I was impressed just to see that WebGL now exists and works.

I’ve been very much in touch with the world of audio synthesizers, so Hydra’s model of chaining together GL shaders as if it was a signal chain feels very natural to me. I still couldn’t write a fragment or a vertex shader myself, but now I don’t need to, I can skip to the creative part!

So far I’ve only made this rather basic webcam mashup but you can see a lot more Hydra examples in the @hydra_patterns Twitter account.

I also had a go at making online documentation, and added a few features that make it more suitable to non-live coding, such as loading prerecorded audio tracks and videos, and allowing you to record a .webm video of the output. I’m not sure this stuff will make it upstream, as the tool is intended for live coding use, but we’ll see. It’s been a lot of fun hacking on a project that’s so simple and yet so powerful, and hopefully you’ll see some cool music videos from me in the future!

Tracker developer experience improvements

There have been lots of blog posts since I suggested we write more blog posts. Great! I’m going to write about what I’ve done this month.

I’m excited that work started on Tracker 3.0, after we talked about it at GUADEC 2019. We merged Carlos’ enormous branch to modernize the Tracker store database. This has broken some tests in tracker-miners, and the next step will be to track down and fix these regressions.

I’ve continued looking at the developer experience of Tracker. Recently we modernized the README.md file (as several GNOME projects have done recently). I want the README to document a simple “build and test Tracker from git” workflow, and that led into work making it simpler to run Tracker from the build tree, and also a bunch of improvements to the test suite.

The design of Tracker has always meant that it’s a pain in the ass to build and test, because to do anything useful you need to have 3 different daemons running and talking to each other over D-Bus, reading and writing data in the same location, and communicating with the CLI or an app. We had a method for running Tracker from the build tree for use by automated tests, whose code was duplicated in tracker.git and tracker-miners.git, and then we had a separate script for developers to test things manually, but you still had to install Tracker to use that one. It was a bit of a mess.

The first thing I fixed was the code duplication. Now we have a Python module named trackertestutils. We install it, so we don’t need to duplicate code between tracker.git and tracker-miners.git any more. Thanks to Marco Trevisan we also install a pkgconfig file.

Then I added a ./run-uninstalled script to tracker-miners.git. The improvement in developer experience I think is huge. Now you can do this to try out the latest Tracker code:

    git clone tracker-miners.git
    cd tracker-miners && mkdir build && cd build
    meson .. && ninja
    ./run-uninstalled --wait-for-miner=Files --wait-for-miner=Extract -- tracker index --file ~/Documents
    ./run-uninstalled -- tracker search "Hello"

The script is a small wrapper around trackertestutils, which takes care of spawning a private D-Bus daemon, collecting and filtering logs, and setting up the environment so that the Tracker cache is written to `/tmp/tracker-data`. (At the time of writing, there are still some bugs and ./run-uninstalled actually still requires you to install Tracker first.)
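
To give an idea of what such a wrapper has to do, here is a minimal sketch. It is not the real script: `sandbox_command` is a made-up helper, using `dbus-run-session` to get the private bus is one possible approach, and the choice of XDG variables is my assumption about how the cache location could be redirected.

```python
import os


def sandbox_command(command, data_dir="/tmp/tracker-data", base_env=None):
    """Build an argv and environment for running `command` in isolation.

    The XDG_* variables are an assumption about how data could be redirected;
    the real script delegates this work to trackertestutils.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["XDG_CACHE_HOME"] = f"{data_dir}/cache"
    env["XDG_DATA_HOME"] = f"{data_dir}/data"
    # dbus-run-session starts a private session bus that lasts only as long
    # as the wrapped command.
    argv = ["dbus-run-session", "--"] + list(command)
    return argv, env
```

The result can be passed to `subprocess.run(argv, env=env)`. Returning the command and environment instead of running them directly keeps the helper easy to test.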

I also improved logging for Tracker’s functional-test suite. For a year now we’ve been running these tests in CI, but there have been some intermittent failures, which were hard to debug because log output from the tests was very messy. When you run a private D-Bus session, all kinds of daemons spawn and dump stuff to stdout. Now we set G_MESSAGES_PREFIXED in the environment, so the test harness can separate the messages that come from Tracker processes. It’s already allowed me to track down some of these annoying intermittent failures, and to increase the default log verbosity in CI.
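
Here is a rough sketch of the kind of filtering this makes possible. It is not the code from the test harness: the line shape in the comment is my assumption about what prefixed GLib messages look like, and `split_log` is a made-up name.

```python
import re

# Assumed line shape when messages carry a program-name/PID prefix:
#   (tracker-miner-fs:1234): Tracker-DEBUG: message text
PREFIX_RE = re.compile(r"^\((?P<program>[^:()]+):(?P<pid>\d+)\): (?P<rest>.*)$")


def split_log(lines, program_prefix="tracker"):
    """Separate lines from Tracker processes from everything else."""
    ours, others = [], []
    for line in lines:
        match = PREFIX_RE.match(line)
        if match and match.group("program").startswith(program_prefix):
            ours.append(line)
        else:
            others.append(line)
    return ours, others
```

Once the noise from unrelated daemons is in a separate bucket, raising the verbosity of the Tracker messages costs much less.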

Another neat thing about installing trackertestutils is that downstream projects can use it too. Rishi mentioned at GUADEC that gnome-photos has a test which starts the photos app and ends up displaying the actual photo collection of the user who is running the test. Automated tests should really be isolated from the real user data. And using trackertestutils, it’s now simple to do that: here’s a proof of concept for gnome-photos.

And I made a new tune!

Blog about what you do!

Am I the first to blog from GUADEC 2019? It has been a great conference: huge respect to the organization team for volunteering significant time and energy to make it all run smoothly.

The most interesting thing at GUADEC is talking to community members old and new. I discovered that I don’t know much about what people are doing in GNOME. I discovered Antonio is doing user support / bug triage and more in Nautilus. I discovered that Bastian is posting GNOME-related questions and answers on StackOverflow. I discovered Britt is promoting us on Twitter and moderating discussions on Reddit. I discovered Felipe is starting to do direct user support for Boxes. I wouldn’t know any of this if I hadn’t been to GUADEC.

So here’s my plea — if you contribute to GNOME, please blog about it! If everyone reading this wrote just one blog post a year… I’d have a much better idea of what you’re all doing!

Don’t forget: Planet GNOME is not only for announcing cool new projects and features – it’s “a window into the world, work and lives of GNOME hackers and contributors.” Blog about anything GNOME related, and be yourself — we’re not a corporation, we’re an underground network with a global, diverse, free thinking membership and that’s our strength.

Remember that there’s much more to GNOME than software development — read this long list of skillsets that you’re probably using. Write about translations, user support, testing, documentation, packaging, outreach, foundation work, event organization, bug triage, product management, release management, design, infrastructure operations. Write about why you enjoy contributing to GNOME, write about why it’s important to you. Write about what you did yesterday, or what you did last month. Write about your friends in GNOME. Make some graphs about your project to show how much work you do. Write short posts, write them quickly. Don’t worry about minor errors — it’s a blog, not a magazine article. Don’t be scared that readers won’t be interested — we are! We’re a distributed team and we need to keep each other posted about what we’re doing. Show links, screenshots, discussions, photos, graphs, anything. Don’t write reports, write stories.

If you contribute to GNOME but don’t have a blog… please start one! Write some nice posts about what you do. Become a Foundation member if you haven’t already*, and ask to join Planet GNOME.

And even if you forget all that, remember this: positive feedback for contributions encourages more contributions. Writing a blog post, like any other form of contribution, can sometimes feel like shouting into an abyss. If you read an interesting post, leave a positive comment & thank the author for taking the time to write it.

* The people using and reviewing your contributions will be happy to vouch for you, don’t worry about that!

Twitter without Infinite Scroll

I like reading stuff on twitter.com because a lot of interesting people write things there which they don’t write anywhere else.

But Twitter is designed to be addictive, and a key mechanism they use is the “infinite scroll” design. Infinite scroll has been called the Web’s slot machine because of the way it exploits our minds to make us keep reading. It’s an unethical design.

In an essay entitled “If the internet is addictive, why don’t we regulate it?”, the writer Michael Schulson says:

… infinite scroll has no clear benefit for users. It exists almost entirely to circumvent self-control.

Hopefully Twitter will one day consider the ethics of their design. Until then, I made a Firefox extension to remove the infinite scroll feature and replace it with a ‘Load older tweets’ link at the bottom of the page, like this:

example

The Firefox extension is called Twitter Without Infinite Scroll. It works by injecting some JavaScript code into the Twitter website which disconnects the ‘uiNearTheBottom’ event that would otherwise automatically fetch new data.

Quoting Michael Schulson’s article again:

Giving users a chance to pause and make a choice at the end of each discrete page or session tips the balance of power back in the individual’s direction.

So, if you are a Twitter user, enjoy your new-found power!

Tools I like for creating web apps

I used to dislike creating websites. CSS confused me and JavaScript annoyed me.

In the last year I’ve grown to like web development again! The developer tools available in the web browsers of 2019 are incredible to work with. The desktop world is catching up, but the browser world is ahead. I rarely have to write CSS at all. JavaScript has gained a lot of the features that it was inexplicably missing.

Here’s a list of the technologies I’ve recently used and liked.

First, Bootstrap. It’s really a set of CSS classes which turn HTML into something that “works out of the box” for creating decently styled webapps. To me it feels like a well-designed widget toolkit, a kind of web counterpart to GTK. Once you know what Bootstrap looks like, you realize that everyone else is already using it. Thanks to Bootstrap, I don’t really need to understand CSS at all anymore. Once you get bored of the default theme, you can try some others.

Then, jQuery. I guess everyone in the world uses jQuery. It provides powerful methods to access JavaScript functionality that is otherwise horrible to use. One of its main features is the ability to select elements from a HTML document using CSS selectors. Normally jQuery provides a function named $, so for example you could get the text of every paragraph in a document like this: $('p').text().  Open up your browser’s inspector and try it now! Now, try to do the same thing without jQuery — you’ll need at least 6 times more code.1

After that, Riot.js. Riot is a UI library which lets you create a web page using building blocks which they call custom tags. Each custom tag is a snippet of HTML. You can attach JavaScript properties and methods to a custom tag as well, and you can refer to them in the snippet of HTML giving you a powerful client-side template framework.

There are lots of “frameworks” which provide similar functionality to Riot.js. I find frameworks a bit overwhelming, and I’m suspicious of tools like create-react-app that need to generate code for me before I can even get started. I like that Riot can run completely in the browser without any special tooling required, and that it has one, specific purpose. Riot isn’t perfect; in particular I find the documentation quite hard to understand at times, but so far I’m enjoying using it.

Finally, lunr.js. Lunr provides a powerful full-text search engine, implemented completely as JavaScript that runs in your users’ web browsers. “Isn’t that a terrible idea?” you think. For large data sets, Lunr is not at all appropriate (you might consider its larger sibling Solr). For a small webapp or prototype, Lunr can work well and can save you from having to write and run a more complex backend service.

If, like I did, you think web development is super annoying due to bad tooling, give it another try! It’s (kind of) fun now!

1. Here’s how it looks without jQuery: Array.from(document.getElementsByTagName('p'), function(e) { return e.textContent; })

The Lesson Planalyzer

I’ve now been working as a teacher for 8 months. There are a lot of things I like about the job. One thing I like is that every day brings a new deadline. That sounds bad right? It’s not: one day I prepare a class, the next day I deliver the class one or more times and I get instant feedback on it right there and then from the students. I’ve seen enough of the software industry, and the music industry, to know that such a quick feedback loop is a real privilege!

Creating a lesson plan can be a slow and sometimes frustrating process, but the more plans I write the more I can draw on things I’ve done before. I’ve planned and delivered over 175 different lessons already. It’s sometimes hard to know if I’m repeating myself or not, or if I could be reusing an activity from a past lesson, so I’ve been looking for easy ways to look back at all my old lesson plans.

Search

GNOME’s Tracker search engine provides a good starting point for searching a set of lesson plans: I can put the plans in my ~/Documents folder, open the folder in Nautilus, and then I type a term like "present perfect" into the search bar.

Screenshot of Nautilus showing search results

The results aren’t as helpful as they could be, though. I can only see a short snippet of the text in each document, when I really need to see the whole paragraph for the result to be directly useful. Also, the search returns anything where the words present and perfect appear, so we could be talking about tenses, or birthdays, or presentation skills.  I wanted a better approach.

Reading .docx files

My lesson plans have a fairly regular structure. An all-purpose search tool doesn’t know anything about my personal approach to writing lesson plans, though. I decided to try writing my own tool to extract more structured information from the documents. The plans are in .docx format1 which is remarkably easy to parse: you just need the Python ‘zipfile’ and ‘xml’ modules, and some guesswork to figure out what the XML elements mean. I was surprised not to find a Python library that already did this for me, but in the end I wrote a very basic .docx helper module, and I used this to create a tool that read my existing lesson plans and dumped the data as a JSON document.
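
To show how little is needed, here is a minimal sketch of the idea using only the standard library. It is not the helper module from the Planalyzer, and `docx_paragraphs` is a name made up for this example. It relies on the document body living in `word/document.xml` inside the zip archive, with paragraphs as `w:p` elements and runs of text as `w:t` elements.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the plain text of each non-empty paragraph in a .docx file.

    `source` is a path or a binary file object. Formatting is discarded.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text of every run.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Dumping the result with `json.dumps(docx_paragraphs("plan.docx"))` would give a very crude version of the JSON export, where `plan.docx` stands for any lesson plan file.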

It works reliably! In a few cases I chose to update documents rather than add code to the tool to deal with formatting inconsistencies. Also, the tool currently throws away all formatting information, but I barely notice.

Web and desktop apps

From there, of course, things got out of control and I started writing a simple web application to display and search the lesson plans. Two months of sporadic effort later, and I just made a prototype release of The Lesson Planalyzer. It remains to be seen how useful it is for anyone, including me, but it’s very satisfying to have gone from an idea to a prototype application in such a short time. Here’s an ugly screenshot, which displays a couple of example lesson plans that I found online.

The user interface is HTML5, made using Bootstrap and a couple of other cool JavaScript libraries (which I might mention in a separate blog post). I’ve wrapped that up in a basic GTK application, which runs a tiny HTTP server and uses a WebKitWebView to display its output. The desktop application has a couple of features that can’t be implemented inside a browser: one is the ability to open plan documents directly in LibreOffice, and the other is a dedicated entry in the alt+tab menu.

If you’re curious, you can see the source at https://gitlab.com/samthursfield/planalyzer/. Let me know if you think it might be useful for you!

1. I need to be able to print the documents on computers which don’t have LibreOffice available, so they are all in .docx format.

Inspire me, Nautilus!

When I have some free time I like to be creative but sometimes I need a push of inspiration to take me in the right direction.

Interior designers and people who are about to get married like to create inspiration boards by gluing magazine cutouts to the wall.

‘Mood board for a Tuscan Style Interior’ by Design Folly on Flickr

I find a lot of inspiration online, so I want a digital equivalent. I looked for one, and I found various apps for iOS and Mac which act like digital inspiration boards, but I didn’t find anything I can use with GNOME. So I began planning an elaborate new GTK+ app, but then I remembered that I get tired of such projects before they actually become useful. In fact, there’s already a program that lets you manage a collection of images and text! It’s known as Files (Nautilus), and for me it only lacks the ability to store web links amongst the other content.

Then, I discovered that you can create .desktop files that point to web locations, the equivalent of .url files on Microsoft Windows. Would a folder full of URL links serve my needs? I think so!

Nautilus had some crufty code paths to deal with these shortcut files, which were removed in 2018. Firefox understands them directly, so if you set Firefox as the default application for the application/x-desktop file type then they work nicely: click on a shortcut and it opens in Firefox.

There is no convenient way to create these .desktop files: dragging and dropping a tab from Epiphany will create a text file containing the URL, which is tantalisingly close to what I want, but the resulting file can’t be easily opened in a browser. So, I ended up writing a simple extension that adds a ‘Create web link…’ dialog to Nautilus, accessed from the right-click menu.
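
For reference, the contents of such a file are tiny. The sketch below is not the code of my extension; `make_web_link` is a made-up helper that builds a link-type desktop entry with the `Type=Link` and `URL` keys.

```python
def make_web_link(name, url):
    """Return the contents of a .desktop file that points at a web location."""
    # Each value must stay on a single line, so reject embedded newlines.
    for value in (name, url):
        if "\n" in value or "\r" in value:
            raise ValueError("name and url must not contain newlines")
    return (
        "[Desktop Entry]\n"
        "Type=Link\n"
        f"Name={name}\n"
        f"URL={url}\n"
    )
```

Writing the returned text to a file such as `Recipes.desktop` in the folder you are browsing is all it takes to add a link.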

Now I can use Nautilus to easily manage collections of links and I can mix in (or link to) any local content easily too. Here’s me beginning my ‘inspiration board’ for recipes …

Screenshot from 2019-03-04 22-05-13.png

Paying money for things

Sometimes it’s hard to make money from software. How do you make money from something that can be copied infinitely?

Right now there are 3 software tools that I pay for. Each one is supplied by a small company, and each one charges a monthly or annual fee. I prefer software with this business model because it creates an incentive for careful, ongoing maintenance and improvement. The alternative (pay a large fee, once) encourages a model that is more like “add many new features, sell the new version and then move onto something else”.

The 3 tools are:

  1. Feedbin, which is a tool that collects new content from many different blogs and shows them all in a single interface. This is done with a standard called RSS. The tool is a pleasure to use, and best of all, it’s Free Software released under a permissive license.
  2. Pinboard, a bookmarking and archival tool. The interface doesn’t spark joy and the search tool leaves a lot to be desired too. However, Pinboard carefully archives a copy of every single website that I bookmark, just at the time that I bookmark it. Since the Web is changing all the time and interesting content comes and goes, I find this very valuable. I don’t know if I’ll actually use this archive of content for much as I don’t actually enjoy writing articles particularly, but I use the existence of the archive as a way to convince myself to close browser tabs.
  3. Checkvist, a “to do list” tool that supports nesting items, filtering by tags, styling with Markdown, and keyboard-only operation. I use this not as a to-do list but as a way of categorising activities and resources that I use when teaching. To be honest, the “free” tier of this tool is generous enough that I don’t really need to pay, but I like to support the project.

Music can also be copied infinitely, and historically I’ve not been keen to buy it because I didn’t like the very shady operations of many record companies. Now I use Bandcamp, which has an incredible library of music with rich, manually curated recommendations, and a clear, sustainable business model.

What digital goods do you pay for on a regular basis?

 

How Tracker is tested in 2019

I became interested in the Tracker project in 2011. I was looking at media file scanning and was happy to discover an active project that was focused on the same thing. I wanted to contribute, but I found it very hard to test my changes; and since Tracker runs as a daemon I really didn’t want to introduce any crazy regressions.

In those days Tracker already had a set of tests written in Python that tested the Tracker daemons as a whole, but they were a bit unfinished and unreliable. I focused some spare-time effort on improving those. Surprisingly enough it’s taken eight years to get to the point where I’m happy with how they work.

The two biggest improvements parallel changes in many other GNOME projects. Last year Tracker stopped using GNU Autotools in favour of Meson, after a long incubation period. I probably don’t need to go into detail of how much better this is for developers. Also, we set up GitLab CI to automatically run the test suite, where previously developers and maintainers were required to run the test suite manually before merging anything. Together, these changes have made it about 100000% easier to review patches for Tracker, so if you were considering contributing code to the project I can safely say that there has never been a better time!

The Tracker project is now divided into two parts, the ‘core’ (tracker.git) and the ‘miners’ (tracker-miners.git). The core project contains the database and the application interface libraries, while the miners project contains the daemons that scan your filesystem and extract metadata from your interesting files.

Let’s look at what happens automatically when you submit a merge request on GNOME GitLab for the tracker-miners project:

  1. The .gitlab-ci.yml file specifies a Docker image to be used for running tests. The Docker images are built automatically from this project and are based on Fedora.
  2. The script in .gitlab-ci.yml clones the ‘master’ version of Tracker core.
  3. The tracker and tracker-miners projects are configured and built, using Meson. There is a special build option in tracker-miners that makes it include Tracker core as a Meson subproject, instead of building against the system-provided version. (It still depends on a few files from the host at the time of writing).
  4. The script starts a private D-Bus session using dbus-run-session, sets a fixed en_US.UTF8 locale, and runs the test suite for tracker-miners using meson test.
  5. Meson runs the tests that are defined in meson.build files. It tries to run them in parallel with one test per CPU core.
  6. The libtracker-miners-common tests exercise some utility code, which is duplicated from libtracker-common in Tracker core.
  7. The libtracker-extract tests exercise libtracker-extract, which is a private library with helper code for accessing file metadata. It mainly focuses on standard metadata formats like XMP and EXIF.
  8. The functional-300-miner-basic-ops and functional-301-resource-removal tests check the operation of the tracker-miner-fs daemon, mostly by copying files in and out of a specific path and then waiting for the corresponding changes to the Tracker database to take effect.
  9. The functional-310-fts-basic test tries some full-text search operations on a text file. There are a couple of other FTS tests too.
  10. The functional/extract/* tests effectively run tracker extract on a set of real media files, and test that the expected metadata is extracted. The tests are defined by JSON files such as this one.
  11. The functional-500-writeback tests exercise the tracker-writeback daemon (which allows updating things like MP3 tags following changes in the Tracker database). These tests are not particularly thorough. The writeback feature of Tracker is not widely used, to my knowledge.
  12. Finally, the functional-600-* tests simulate the behaviour of some MeeGo phone applications. Yes, that’s how old this code is 🙂
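
The data-driven approach in step 10 could be sketched roughly like this. It is not the real test code: the `"metadata"` key is a schema invented for illustration, and `check_expected_metadata` is a hypothetical name.

```python
import json


def check_expected_metadata(spec_json, extracted):
    """Compare extracted metadata against the expectations in a JSON spec.

    Returns a list of human-readable mismatches; an empty list means a pass.
    The "metadata" key is a made-up schema for illustration.
    """
    spec = json.loads(spec_json)
    problems = []
    for key, expected in spec["metadata"].items():
        actual = extracted.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, got {actual!r}")
    return problems
```

Only the keys named in the spec are checked, so an extractor can start returning extra properties without breaking existing tests.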

There is plenty of room for more testing of course, but this list is very comprehensive when compared to the total lack of automated testing that the project had just a year ago!

Writing well

We rely on written language to develop software. I used to joke that I worked as a professional email writer rather than a computer programmer (and it wasn’t really a joke). So if you want to be a better engineer, I recommend that you focus some time on improving your written English.

I recently bought 100 Ways to Improve Your Writing by Gary Provost, which is a compact and rewarding book full of simple and widely applicable guidelines for writers. My advice is to buy a copy!

You can also find plenty of resources online. Start by improving your commit messages. Since we love to automate things, try these shell scripts that catch common writing mistakes. And every time you write a paragraph simply ask yourself: what is the purpose of this paragraph? Is it serving that purpose?

Native speakers and non-native speakers will both find useful advice in Gary Provost’s book. In the UK school system we aren’t taught this stuff particularly well. Many English-as-a-second-language courses don’t teach how to write on a “macro” level either, which is sad because there are many differences from language to language that non-natives need to be aware of. I have seen “Business English” courses that focus on clear and convincing communication, so you may want to look into one of those if you want more than just a book.

Code gets read more than it gets written, so it’s worth taking extra time so that it’s easy for future developers to read. The same is true of emails that you write to project mailing lists. If you want to make a positive change to development of your project, don’t just focus on the code — see if you can find 3 ways to improve the clarity of your writing.

GUADEC 2018 Videos: All Done

All the editing & uploading for the GUADEC videos is now finished. The videos were all uploaded to YouTube some time ago, and they are all now available on http://videos.guadec.org/2018 as well.

Thanks to everyone who helped with the editing: Alexis Diavatis, Bin Li, Garrett LeSage, Alexandre Franke (who also did a lot of the work of uploading to YouTube), and Hubert Figuiere (who managed to edit so many that I’m suspicious he might be some kind of robot in disguise).

edit: If you are hungry for more videos to edit, some footage from GUADEC 2002 has been unearthed. It’d be great to have some of this history from fifteen years ago up on YouTube! If you’re interested, reply to the mail or speak up in #guadec on GIMPnet and we can coordinate efforts.

Natural Language Processing

This month I have been thinking about good English sentence and paragraph structure. Non-native English speakers who are learning to write in English will often think of what they want to say in their first language and then translate it. This generally results in a mess. The precise structure of the mess will depend on the rules of the student’s first language. The important thing is to teach the conventions of good English writing; but how?

Visualizing a problem helps to solve it. However there doesn’t seem to be a tool available today that can clearly visualize the various concerns writers have to deal with. A paragraph might contain 100 words, each of which relates to the others in some way. How do you visualize that clearly… not like this, anyway.

I did find some useful resources though. I discovered the Paramedic Method, through this blog post from helpscout.net. The Paramedic Method was devised by Richard Lanham and consists of these 6 steps:

  1. Highlight the prepositions.
  2. Highlight the “is” verb forms.
  3. Find the action. (Who is kicking whom?)
  4. Change the action into a simple active verb.
  5. Start fast—no slow windups.
  6. Read the passage out loud with emphasis and feeling.
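
The first two steps are mechanical enough to try in code. This is only a toy sketch: the word lists are deliberately incomplete, and a plain word list cannot tell a preposition from the “to” of an infinitive.

```python
import re

# A small, deliberately incomplete set of common English prepositions.
PREPOSITIONS = {"about", "at", "by", "for", "from", "in", "of", "on", "to", "with"}
# Forms of the verb "to be".
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def highlight(text):
    """Mark prepositions as [word] and "to be" forms as {word}."""
    def mark(match):
        word = match.group(0)
        lower = word.lower()
        if lower in PREPOSITIONS:
            return f"[{word}]"
        if lower in BE_FORMS:
            return "{" + word + "}"
        return word
    return re.sub(r"[A-Za-z']+", mark, text)
```

A sentence that comes back full of brackets and braces is a good candidate for steps 3 and 4.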

This is good advice for anyone writing English. It’ll be particularly helpful in my classes in Spain where we need to clean up long strings of relative clauses. For example, take a sentence such as “On the way I met one of the workers from the company where I was going to do the interview that my friend got for me”. I would rewrite this as: “On the way I met a person from Company X, where my friend had recently got me an interview.”

I found a tool called Write Music which I like a lot. The idea is simple: to illustrate and visualize the rule that varying sentence length is important when writing. The creator of the tool, Titus Wormer, seems to be doing some interesting and well documented research.
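The underlying measurement is easy to sketch. The Python below is not how Write Music works internally; it is a rough approximation that assumes sentences end with “.”, “!” or “?” followed by whitespace, and `sentence_lengths` is a hypothetical helper name. It prints one bar per sentence so that monotonous runs of similar length stand out.

```python
import re


def sentence_lengths(text):
    """Return the number of words in each sentence of the text."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s]


sample = (
    "Short sentences feel abrupt. They can also feel flat. "
    "A longer sentence, placed after a few short ones, gives the reader "
    "room to breathe and changes the rhythm of the paragraph."
)

# One row of '#' per sentence, followed by the word count.
for count in sentence_lengths(sample):
    print("#" * count, count)
```

Abbreviations such as “e.g.” will fool the splitter, which is one reason to reach for a proper NLP library for anything beyond a quick look.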

I looked at a variety of open source tools for natural language processing. These provide good ways to tokenize a text and to identify the “part of speech” (noun, verb, adjective, adverb, etc.) but I didn’t yet find one that could analyze the types of clauses that are used. Which is a shame. My understanding of this area of English grammar is still quite weak and I was hoping my laptop might be able to teach me by example, but it seems not.

I found some surprisingly polished libraries that I’m keen to use for … something. One day I’ll know what. The compromise library for JavaScript can do all kinds of parsing and wordplay and is refreshingly honest about its limitations, and spaCy for Python also looks exciting.

People like to interact with a computer through text. We hide the UNIX command line, but one of the most popular user interfaces in the world is the Google search engine, which is a text box that accepts any kind of natural language and gives the impression of understanding it. In many cases this works brilliantly: I check spellings and convert measurements all the time using this “search engine” interface. Did you realize GNOME Shell can also do unit conversions? Try typing “50lb in kg” into the GNOME Shell search box and look at the result. Very useful! More apps should do helpful things like this.
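To give a feel for how little is needed to handle a query like “50lb in kg”, here is a minimal Python sketch. It is not how GNOME Shell or Google implement the feature; the `TO_KG` table, the `QUERY` pattern and the `convert` function are all hypothetical, and the sketch only knows a handful of mass units.

```python
import re

# Assumption: only a few mass units, each expressed in kilograms.
TO_KG = {
    "kg": 1.0,
    "g": 0.001,
    "lb": 0.45359237,       # international avoirdupois pound
    "oz": 0.028349523125,   # one sixteenth of a pound
}

# Matches "<number><unit> in <unit>", e.g. "50lb in kg" or "2.5 kg in lb".
QUERY = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s+in\s+([a-z]+)\s*$",
    re.IGNORECASE,
)


def convert(query):
    """Return the converted amount, or None if the query is not understood."""
    match = QUERY.match(query)
    if match is None:
        return None
    amount = float(match.group(1))
    source = match.group(2).lower()
    target = match.group(3).lower()
    if source not in TO_KG or target not in TO_KG:
        return None
    # Convert to kilograms first, then to the target unit.
    return amount * TO_KG[source] / TO_KG[target]


print(convert("50lb in kg"))
```

Returning `None` for anything unrecognised lets a search box fall through to its other providers instead of showing a wrong answer.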

I found some other amazing natural language technologies too. Inform 7 continues to blow my mind whenever I look at it. Commercial services like IBM Watson can promise incredible things like analysing the sentiments and emotions expressed in a text, and even the relationships expressed between the subjects and objects. It’s been an interesting day of research!
