Status update 21/09/22

Last week I attended OSSEU 2022 in Dublin, gave a talk about BuildStream 2.0 and the REAPI, and saw some new and old faces. Good times apart from the common cold I picked up on the way — I was glad that the event mandated face-masks for everyone so I could cover my own face without being the “odd one out”. (And so that we were safer from the 3+ COVID-19 cases reported at the event).

Being in the same room as Javier allowed some progress on our slightly “skunkworks” project to bring OpenQA testing to upstream GNOME. There was enough time to fix the big regressions that had halted testing completely since last year, one being an expired API key and the other, removal of virtio VGA support in upstream’s openqa_worker container. We prefer using the upstream container over maintaining our own fork, in the hope that our limited available time can go on maintaining tests instead, but the containers are provided on a “best effort” basis and since our tests are different to openqa.opensuse.org, regressions like this are to be expected.

I am also hoping to move the tests out of gnome-build-meta into a separate openqa-tests repo. We initially put them in gnome-build-meta because ultimately we’d like to be able to do pre-merge testing of gnome-build-meta branches, but since it takes hours to produce an ISO image from a given commit, it is painfully slow to create and update the OpenQA tests themselves. Now that Gitlab supports child pipelines, we can hopefully satisfy both use cases: one pipeline that quickly runs tests against the prebuilt “s3-image” from os.gnome.org, and a second that is triggered for a specific gnome-build-meta build pipeline and validates that.

First though, we need to update all the existing tests for the visual changes that occurred in the meantime, which are mostly due to gnome-initial-setup now using GTK4. That’s still a slow process as there are many existing needles (screenshots), and each time the tests are run, the Web UI allows updating only the first one to fail. That’s something else we’ll need to figure out before this could be called “production ready”, as any non-trivial style change to Adwaita would imply rerunning this whole update process.

All in all, for now openqa.gnome.org remains an interesting experiment. Perhaps by GUADEC next year there may be something more useful to report.

Team Codethink in the OSSEU 2022 lobby

My main fascination this month besides work has been exploring “AI” image generation. It’s amazing how quickly this technology has spread – it seems we had a big appetite for generative digital images.

I am really interested in the discussion about whether such things are “art”, because I think this discussion is soon going to encompass music as well. We know that both OpenAI and Spotify are researching machine-generated music, and it’s particularly convenient for Spotify if they can continue to charge you £10 a month while progressively serving you more music that they generated in-house – and therefore reducing their royalty payments to record labels.

There are two related questions: whether AI-generated content is art, and whether something generated by an AI has the same monetary value as something a human made “by hand”. In my mind the answer is clear, but at the same time not quantifiable. Art is a form of human communication. Whether you use a neural network, a synthesizer, a microphone or a wax cylinder to produce that art is not relevant. Whether you use DALL-E 2 or a paintbrush is not relevant. Whether your art is any good depends on how it makes people feel.

I’ve been using Stable Diffusion to try and illustrate some of the sound worlds from my songs, and my favourite results so far are for Don’t Go Into The Zone:

And finally, a teaser for an upcoming song release…

An elephant with a yellow map background

Status update, 16/08/2022

Building Wheels

For the first time this year I got to spend a little paid time on open source work, in this case putting some icing on the delicious and nourishing cake that we call BuildStream 2.

If you’ve tried the 1.9x pre-releases you’ll have seen it depends on a set of C++ tools under the name BuildBox. Some hot codepaths that were part of the Python core are now outsourced to helper tools, specifically data storage (buildbox-casd, buildbox-fuse) and container creation (buildbox-run-bubblewrap). These tools implement remote-apis standards and are useful to other build tools; the only catch is that they are not yet widely available in distros, and neither are the BuildStream 2 pre-releases.

Separately, BuildStream 2 has some other hot codepaths written with Cython. If you’ve ever tried to install a Python package from PyPI and wondered why a package manager is running GCC, the answer is usually that it’s installing a source package and has to build some Cython code with your system’s C compiler.

The way to avoid requiring GCC in this case is to ship prebuilt binary packages known as wheels. So that’s what we implemented for BuildStream 2 – and as a bonus, we can bundle prebuilt BuildBox binaries into these packages. The wheels have a platform compatibility tag of “manylinux_2_28_x86_64”, so they should work on any x86_64 host with GLIBC 2.28 or later.
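As a quick way to check whether a given host will accept one of these wheels, you can ask the packaging library, which pip relies on for the same decision. A minimal sketch, assuming packaging is installed:

# Check whether this interpreter/host combination accepts manylinux_2_28 x86_64 wheels.
from packaging.tags import sys_tags

compatible = any(tag.platform == "manylinux_2_28_x86_64" for tag in sys_tags())
print("manylinux_2_28_x86_64 wheels:", "supported" if compatible else "not supported")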

Connecting Gnomes

I didn’t participate in GUADEC this year for various reasons, but I’m very glad to see it was a success. I was surprised to see six writeups of the Berlin satellite event, and only two of the main event in Mexico (plus one talk transcript and some excellent coverage in LWN) – are Europeans better at blogging? 🙂

I again saw folk mention that connections *between* the local, online and satellite events are lacking – I felt this at LAS this year – and I still think we should take inspiration from 2007 Twitter and put up a few TVs in the venue with chat and microblog windows open.

The story is that Twitter launched in 2006 to nobody, and it became a hit only after putting up screens at a US music festival in 2007 displaying their site, where folk could live-blog their activities.

Hey old people, remember this?! Via webdesignmuseum.org

I’d love to see something similar at conferences next year where online participants can “write on the walls” of the venue (avoiding the ethically dubious website that Twitter has become).

On that note, I just released a new version of my Twitter Without Infinite Scroll extension for Firefox.

Riding Trains

I made it from Galicia to the UK overland, actually for the first time. (I did do the reverse journey already by boat + van in 2018). It was about 27 hours of travel spread across 5 days, including a slow train from Barcelona into the Pyrenees and a night train onwards to Paris, and I guess it cost around 350€. The trip went fine, in contrast to the plane I had booked for the return, which was cancelled by the airline without notification, so the return flight+bus ended up costing a similar amount and taking nearly 15 hours. No further comment on that, but I can recommend the train ride!

High speed main line through the French Pyrenees

Status update 18/07/2022

Summer is here!

All my creative energy has gone into wrapping up a difficult project at Codethink, and the rest of the time I’ve been enjoying sunshine and festivals. I was able to dedicate some time to learning the basics of async Rust but I don’t have much to share from the last month. Instead, let me focus on some projects I’m keeping an eye on.

Firstly, in the Tracker search engine, Carlos Garnacho has landed some important features and refactors, the main one being stream-based serializers and deserializers.

This allows backing up and importing data in and out of the tracker-store more easily, and cleaning up some cruft like multiple different implementations of Turtle. It seems ideal to have a totally stream-based codec so you can process an effectively infinite amount of data, but there is a tradeoff if you serialize data triple-by-triple – the serialized output is much less human-readable and in some cases larger than if you do some buffering and group related statements together. For this reason we didn’t yet land the JSON-LD support.

Carlos also rewrote the last piece of Vala in libtracker-sparql into C. Vala makes some compromises that aren’t helpful for making a long-term ABI-stable C library. There are more lines of code now and we particularly miss Vala’s async support, but this makes maintenance easier as we can now be sure that any misbehaving C code was generated by ourselves and not by the Vala compiler.

There are some other fixes for issues reported by various contributors, thanks to everyone that got involved in the 3.3.0 cycle so far 🙂

Meanwhile, in the world of BuildStream there is a lot of activity for the final push towards a BuildStream 2.0 release. I’m only a bystander in the process and to me things look promising. The 2.0 API is finalized and frozen, there’s a small list of blockers remaining – any help to resolve these is welcome! – and then the door is open to a 2.0 release. That’s most likely to be preceded by one or more “1.97” release candidates to allow wider testing. Whatever happens, the hope is that the upcoming Freedesktop SDK 22.08 release can be rolled with BuildStream “2.0”.

I’ve been using Bst 1.x for a while and finding it a very helpful tool. The REAPI support is particularly cool as it allows organisations to manage a single distributed build infrastructure that multiple tools can make use of – I hope it gains more traction, and I think it’s already helping to “sell” BuildStream to organisations that are looking at doing distributed builds.

Finally, Kate Bush’s music is super popular again, which is great, and I want to share this tidbit from my own youth: a band from early-2000s Sunderland doing an excellent Hounds of Love cover:

Status update, 17/06/2022

I am currently in the UK – visiting folk, working, and enjoying the nice weather. So my successful travel plans continue for the moment… (corporate mismanagement has led to various transport crises in the UK so we’ll see if I can leave as successfully as I arrived).

I started the Calliope playlist toolkit back in 2016. The goal is to bring open data together and allow making DIY music recommenders, but it’s rather abstract to explain this via the medium of JSON documents. Coupled with a desire to play with GTK4, which I’ve had no opportunity to do yet, and inspired by a throwaway comment in the MusicBrainz IRC room, I prototyped a graphical app that shows what kind of open data is available for playlist generation.

This “calliope popup” app can watch MPRIS notifications, or page through an existing playlist. In future it could also page through your Listenbrainz listen history. So far it just shows one type of data:

This screenshot shows MusicBrainz metadata for my test playlist’s first track, which happens to be the song “Under Pressure”. (This is a great test because it is credited to two artists :-). The idea is to flesh out the app with metadata from various different providers, making it easier to see what data is available and detect bad/missing info.
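The MusicBrainz lookup itself is the easy part. As a flavour of it, here is a standalone sketch using the musicbrainzngs library (not the app’s actual code, and the user agent values are placeholders):

import musicbrainzngs

# MusicBrainz asks clients for a meaningful user agent; these values are placeholders.
musicbrainzngs.set_useragent("calliope-popup-example", "0.1", "someone@example.com")

result = musicbrainzngs.search_recordings(recording="Under Pressure",
                                          artist="Queen", limit=1)
recording = result["recording-list"][0]
print(recording["title"])

# The artist-credit list interleaves artist dicts with join phrases like " & ",
# which is why a dual-credited song makes such a good test case.
for credit in recording.get("artist-credit", []):
    if isinstance(credit, dict):
        print("  credited to:", credit["artist"]["name"])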

The majority of time spent on this so far has been (re-)learning GTK and figuring out how to represent the data on screen. There was also some work involved in making Calliope itself return data more usefully.

Some nice discoveries since I last did anything in GTK are the Blueprint UI language, and the Workbench app. It’s also very nice having the GTK Inspector available everywhere, and being able to specify styling via a CSS file. (I’ve probably done more web sites than GTK apps in the last 10 years, so being able to use the same mental model for both is a win for me.) The separation of Libadwaita from GTK also makes sense and helps GTK4 feel more focused, avoiding (mostly) having 2 or 3 widgets for one purpose.

Apart from that, I’ve been editing and mixing new Vladimir Chicken music – I can strongly recommend that you never try to make an eight minute song. This may be the first and last 8 minute song from VC 🙂

Status update, 15/04/2022

As I mentioned last month, I bought one of these Norns audio-computers and a grey grid device to go with it. So now I have this lap-size electronic music apparatus.

It’s very fun to develop for – the truth is I’ve never got on well with the “standard” tools of Max/MSP and Pure Data, and I suspect flow-based programming just isn’t for me. Within an hour of opening the web-based Lua editor on the Norns device I already had a pretty cool prototype of something I hadn’t even intended to make. More on that when I can devote more time to it.

There is some work to make the open-source Norns software run on the Linux desktop in a container. It seems the status is “working but awkward.” I thought to myself – “How hard could it be to package this as a Flatpak?”

As soon as I saw use of Golang and Node.js I knew the answer – very hard.

In the world of Flatpak apps we are rigorous about doing reproducible builds from controlled inputs. The flatpak-builder tool requires that all the Go and JavaScript dependencies are specified ahead of time, and build tools are prevented from connecting to the internet at build time. This is a good thing, as the results are otherwise unpredictable. The packaging tools in the Go and Node.js ecosystems make this kind of attention to detail difficult to achieve.

There is a solution, which would be to build the components using BuildStream and develop plugins to integrate with the Node.js and Go worlds, so it can fetch deps ahead of time. There’s an open issue for a Go plugin. For Node.js I am not sure if anything exists.

This led me to thinking – how come the open-source, community-developed world of Flatpaks is so much better at doing reliable, reproducible builds than the corporate environments that I see in my day job?

One last cool thing – thanks to instructions here by @infinitedigits I was able to produce this amazing music video!

Status update, 16/02/2022

January 2022 was the sunniest January I’ve ever experienced. So I spent its precious weekends mostly climbing around in the outside world, and the weekdays preparing for the enormous Python 3 migration that one of Codethink’s clients is embarking on.

Ever since I discovered Listenbrainz, I have wanted to integrate it with Calliope, with two main goals. The first, to use an open platform to share and store listen history rather than the proprietary Last.fm. And the second, to have an open, neutral place to share playlists rather than pushing them to a private platform like Spotify or Youtube. Over the last couple of months I found time to start that work, and you can now sync listen history and playlists with two new cpe listenbrainz-history and cpe listenbrainz commands. So far playlists can only be exported *from* Listenbrainz, and the necessary changes to the pylistenbrainz binding are still in review, but it’s a nice start.
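For the curious, the history sync is a thin layer over the pylistenbrainz binding; fetching raw listens yourself looks roughly like this (a sketch with a placeholder username):

import pylistenbrainz

client = pylistenbrainz.ListenBrainz()
# Reading public listen history needs no auth token; replace the placeholder username.
for listen in client.get_listens(username="example_user", count=10):
    print(listen.listened_at, listen.artist_name, "-", listen.track_name)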

Status update, 17/01/2022

Happy 2022 everyone! Hope it was a safe one. I managed to travel a bit and visit my family while somehow dodging Omicron each step of the way. I guess you can’t ask for much more than that.

I am keeping busy at work integrating BuildStream with a rather large, tricky set of components, Makefiles and a custom dependency management system. I am appreciating how much flexibility BuildStream provides. As an example, some internal tools expect builds to happen at a certain path. BuildStream makes it easy to customize the build path by adding this stanza to the .bst file:

variables:
    build-root: /magic/path

I am also experimenting with committing whole build trees as artifacts, as a way to distribute tests which are designed to run within a build tree. I think this will be easier in Bst 2.x, but it’s not impossible in Bst 1.6 either.

Besides that I have been mostly making music, stay tuned for a new Vladimir Chicken EP in the near future.

Status update, November 2021

I am impressed with the well-deserved rise of Sourcehut, a minimalist and open source alternative to Github and Gitlab. I like their unbiased performance comparison with other JavaScript-heavy Git forges. I am impressed by their substantial contributions to Free Software. And I like that the main developers, Drew DeVault and Simon Ser, both post monthly status update blog posts on their respective blogs.

I miss blog posts.

So I am unashamedly copying the format. I am mostly not paid to work on Free Software, but sometimes I am, so the length of the report will vary wildly.

This month I got a little more Codethink time to work on openqa.gnome.org (shout out to Javier Jardón for getting me that time). Status report here.


I spoke at the first ever PackagingCon to spread the good word about Freedesktop SDK and BuildStream.

As always, I did a little review and issue triage for Tracker and Tracker Miners.

And I have been spending time working on an audio effect. More about that in another post.

Beginning Rust

I have the privilege of some free time this December, and I was unexpectedly inspired to do the first few days of the Advent of Code challenge by a number of inspiring people including Philip Chimento, Daniel Silverstone and Ed Cragg.

The challenge can be completed in any language, but it’s a great excuse to learn something new. I have read a lot about Rust but had never used it until a few days ago.

Most of my recent experience is with Python and C, and Rust feels like it has many of the best bits of both languages. I didn’t get on well with Haskell, but the things I liked about that language are also there in Rust. It’s done very well at taking the good parts of these languages and leaving out the bad parts. There’s no camelCaseBullshit, in particular.

As a C programmer, it’s a pleasure to see all of C’s invisible traps made explicit and visible in the code. Even integer overflow is checked: overflowing a constant is a compile-time error, and in debug builds overflow at runtime triggers a panic. As a Python programmer, I’m used to writing long chains of operations on iterables, and Rust allows me to do pretty much the same thing. Easy!

Rust does invent some new, unique bad parts. I wanted to use Ed’s cool Advent of Code helper crate, but somehow installing this tiny library using Cargo took up almost 900MB of disk space. This appears to be a known problem. It makes me sad that I can’t simply use Meson to build my code. I understand that Cargo’s design brings some cool features, but these are big tradeoffs to make. Still, for now I can simply avoid using 3rd party crates which is anyway a good motivation to learn to work with Rust’s standard library.

I also spent a lot of time figuring out compiler errors. Rust’s compiler errors are very good. If you compare them to C++ compiler errors, then there’s really no comparison at all. In fact, they’re so good that my expectations have increased, and paradoxically this makes me more critical! (Sometimes you have to measure success by how many complaints you get). When the compiler tells me ‘you forgot this semicolon’, part of me thinks “Well you know what I meant — you add the semicolon!”. And while some errors clearly tell you what to fix, others are still pretty cryptic. Here’s an example:

error[E0599]: no method named `product` found for struct `Vec<i64>` in the current scope
   --> day3.rs:82:34
    |
82  |       let count: i64 = tree_counts.product();
    |                                    ^^^^^^^ method not found in `Vec<i64>`
    |
    = note: the method `product` exists but the following trait bounds were not satisfied:
            `Vec<i64>: Iterator`
            which is required by `&mut Vec<i64>: Iterator`
            `[i64]: Iterator`
            which is required by `&mut [i64]: Iterator`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0599`.

What’s the problem here? If you know Rust, maybe it’s obvious that my tree_counts variable is a Vec (list), and I need to call .iter() to produce an iterator. If you’re a beginner, this isn’t a huge help. You might be tempted to call rustc --explain E0599, which will tell you that you might, for example, need to implement the .chocolate() method on your Mouth struct. This doesn’t get you any closer to knowing why you can’t iterate across a list, which is something that you’d expect to be iterable.

Like I said, Rust is lightyears ahead of other compilers in terms of helpful error messages. However, if it’s a goal that “Rust is for students”, then there is still lots of work to do to improve them.

I know enough about software development to know that the existence of Rust is nothing short of a miracle. The Rust community are clearly amazing and deserve ongoing congratulations. I’m also impressed with Advent of Code. December is a busy time, which is why I’ve never got involved before, but if you are looking for something to do then I can recommend it!

You can see my Advent of Code repo here. It may, or may not proceed beyond day 4. It’s useful to check your completed code against some kind of ‘model’, and I’ve been using Daniel’s repo for that. Who else has some code to show?

Writing well

We rely on written language to develop software. I used to joke that I worked as a professional email writer rather than a computer programmer (and it wasn’t really a joke). So if you want to be a better engineer, I recommend that you focus some time on improving your written English.

I recently bought 100 Ways to Improve Your Writing by Gary Provost, which is a compact and rewarding book full of simple and widely applicable guidelines to writers. My advice is to buy a copy!

You can also find plenty of resources online. Start by improving your commit messages. Since we love to automate things, try these shell scripts that catch common writing mistakes. And every time you write a paragraph simply ask yourself: what is the purpose of this paragraph? Is it serving that purpose?

Native speakers and non-native speakers will both find useful advice in Gary Provost’s book. In the UK school system we aren’t taught this stuff particularly well. Many English-as-a-second-language courses don’t teach how to write on a “macro” level either, which is sad because there are many differences from language to language that non-natives need to be aware of. I have seen “Business English” courses that focus on clear and convincing communication, so you may want to look into one of those if you want more than just a book.

Code gets read more than it gets written, so it’s worth taking extra time so that it’s easy for future developers to read. The same is true of emails that you write to project mailing lists. If you want to make a positive change to development of your project, don’t just focus on the code — see if you can find 3 ways to improve the clarity of your writing.

Natural Language Processing

This month I have been thinking about good English sentence and paragraph structure. Non-native English speakers who are learning to write in English will often think of what they want to say in their first language and then translate it. This generally results in a mess. The precise structure of the mess will depend on the rules of the student’s first language. The important thing is to teach the conventions of good English writing; but how?

Visualizing a problem helps to solve it. However there doesn’t seem to be a tool available today that can clearly visualize the various concerns writers have to deal with. A paragraph might contain 100 words, each of which relate to each other in some way. How do you visualize that clearly… not like this, anyway.

I did find some useful resources though. I discovered the Paramedic Method, through this blog post from helpscout.net. The Paramedic Method was devised by Richard Lanham and consists of these 6 steps:

  1. Highlight the prepositions.
  2. Highlight the “is” verb forms.
  3. Find the action. (Who is kicking whom?)
  4. Change the action into a simple active verb.
  5. Start fast—no slow windups.
  6. Read the passage out loud with emphasis and feeling.

This is good advice for anyone writing English. It’ll be particularly helpful in my classes in Spain where we need to clean up long strings of relative clauses. (For example, a sentence such as “On the way I met one of the workers from the company where I was going to do the interview that my friend got for me”. I would rewrite this as: “On the way I met a person from Company X, where my friend had recently got me an interview.”)

I found a tool called Write Music which I like a lot. The idea is simple: to illustrate and visualize the rule that varying sentence length is important when writing. The creator of the tool, Titus Wormer, seems to be doing some interesting and well documented research.
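The underlying trick is simple enough to approximate in a few lines of Python. Here is a rough sketch (not Titus Wormer’s code) that prints one bar per sentence, so you can see the rhythm of a paragraph at a glance:

import re

TEXT = ("I have several things to say. Some of them are short. Others, because "
        "the idea needs more room to breathe, stretch out over many more words "
        "than their neighbours do. See the difference?")

# A crude sentence split is fine for getting a feel for the rhythm.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", TEXT) if s.strip()]
for sentence in sentences:
    words = len(sentence.split())
    print(f"{words:3d} | {'#' * words}")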

I looked at a variety of open source tools for natural language processing. These provide good ways to tokenize a text and to identify the “part of speech” (noun, verb, adjective, adverb, etc.), but I didn’t yet find one that could analyze the types of clauses that are used, which is a shame. My understanding of this area of English grammar is still quite weak, and I was hoping my laptop might be able to teach me by example, but it seems not.

I found some surprisingly polished libraries that I’m keen to use for … something. One day I’ll know what. The compromise library for JavaScript can do all kinds of parsing and wordplay and is refreshingly honest about its limitations, and spaCy for Python also looks exciting. People like to interact with a computer through text. We hide the UNIX commandline. But one of the most popular user interfaces in the world is the Google search engine, which is a text box that accepts any kind of natural language and gives the impression of understanding it. In many cases this works brilliantly — I check spellings and convert measurements all the time using this “search engine” interface. Did you realize GNOME Shell can also do unit conversions? Try typing “50lb in kg” into the GNOME Shell search box and look at the result. Very useful! More apps should do helpful things like this.
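Coming back to the parsing libraries, here is a flavour of spaCy’s part-of-speech tagging. This is a minimal sketch; it assumes you have already downloaded the small English model with python -m spacy download en_core_web_sm:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("On the way I met a person from Company X, where my friend had "
          "recently got me an interview.")

# Each token gets a coarse part-of-speech tag and a dependency label.
for token in doc:
    print(f"{token.text:12s} {token.pos_:6s} {token.dep_}")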

I found some other amazing natural language technologies too. Inform 7 continues to blow my mind whenever I look at it. Commercial services like IBM Watson can promise incredible things like analysing the sentiments and emotions expressed in a text, and even the relationships expressed between the subjects and objects. It’s been an interesting day of research!

Enormous Git Repositories

If you had a 100GB Subversion repository, where a full checkout came to about 10GB of source files, how would you go about migrating it to Git?

One thing you probably wouldn’t do is import the whole thing into a single Git repo; it’s pretty well known that Git isn’t designed for that. But, you know, Git does have some tools that let you pretend it’s a centralised version control system, and huge monolithic repos are cool, and it works in Mercurial… Evidence is worth more than hearsay, so I decided to create a Git repo with 10GB of text files to see what happened. I did get told in #git on Freenode that Git will not cope with a repo that’s larger than available RAM, but I was a little suspicious given the number of multi-gigabyte Git repos in existence.

I adapted a Bash script from here to create random filenames, and used the csmith program to fill those files with nonsense C++ code, until I had 7GB of such gibberish. (I realised later that, having used du -s instead of du --apparent-size -s to check the size of my test data, I only had 7GB of content, which was using 10GB of disk space.)
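If you want to reproduce the experiment, a rough Python equivalent of that generation step looks like this. I used Bash and csmith at the time, and the file sizes and names here are arbitrary:

import os
import random
import string

TARGET_BYTES = 7 * 1024 ** 3    # roughly 7GB of content
LINES = ["int f%d(void) { return %d; }\n" % (i, i) for i in range(1000)]

def random_name():
    return "".join(random.choices(string.ascii_lowercase, k=12)) + ".cpp"

os.makedirs("huge-repo", exist_ok=True)
written = 0
while written < TARGET_BYTES:
    path = os.path.join("huge-repo", random_name())
    with open(path, "w") as f:
        for _ in range(random.randint(1000, 10000)):
            f.write(random.choice(LINES))
    written += os.path.getsize(path)    # apparent size, not disk usage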

The test machine was an x86 virtual machine with 2GB of RAM and 1CPU, with no swap. The repo was on a 100GB ext4 volume. Doing a performance benchmark on a virtual machine on shared infrastructure is a bad idea, but I’m testing a bad idea, so whatever. The machine ran Git version 2.5.0.

Results

Generating the initial data: this took all night, perhaps because I included a call to du inside the loop that generated the data, which would take an increasing amount of time on each iteration.

Creating an initial 7GB commit: 95 minutes

$ time git add .
real    90m0.219s
user    84m57.117s
sys     1m6.932s

$ time git status
real    1m15.992s
user    0m4.071s
sys     0m20.728s

$ time git commit -m "Initial commit"
real    4m22.397s
user    0m27.168s
sys     1m5.815s

The git log command is pretty much instant; a git show of this commit takes a minute the first time I run it, and about 5 seconds if I run it again.

Doing git add and git rm to create a second commit is really quick; git status is still slow, but git commit is quick:

$ time git status
real    1m19.937s
user    0m5.063s
sys     0m16.678s

$ time git commit -m "Put all z files in same directory"
real    0m11.317s
user    0m1.639s
sys     0m5.306s

Furthermore, git show of this second commit is quick too.

Next I used git daemon to serve the repo over git:// protocol:

$ git daemon --verbose --export-all --base-path=`pwd`

Doing a full clone from a different machine (with Git 2.4.3, over the intranet): 22 minutes

$ time git clone git://172.16.20.95/huge-repo
Cloning into 'huge-repo'...
remote: Counting objects: 339412, done.
remote: Compressing objects: 100% (33351/33351), done.
remote: Total 339412 (delta 5436), reused 0 (delta 0)
Receiving objects: 100% (339412/339412), 752.12 MiB | 2.53 MiB/s, done.
Resolving deltas: 100% (5436/5436), done.
Checking connectivity... done.
Checking out files: 100% (46345/46345), done.

real    22m17.734s
user    2m12.606s
sys     0m54.603s

Doing a sparse checkout of a few files: 15 minutes

$ mkdir sparse-checkout
$ cd sparse-checkout
$ git init .
$ git config core.sparsecheckout true
$ echo z-files/ >> .git/info/sparse-checkout

$ time git pull  git://172.16.20.95/huge-repo master
remote: Counting objects: 339412, done.
remote: Compressing objects: 100% (33351/33351), done.
remote: Total 339412 (delta 5436), reused 0 (delta 0)
Receiving objects: 100% (339412/339412), 752.12 MiB | 2.58 MiB/s, done.
Resolving deltas: 100% (5436/5436), done.
From git://172.16.20.95/huge-repo
 * branch            master     -> FETCH_HEAD

real    14m26.032s
user    1m9.133s
sys     0m22.683s

This is rather unimpressive. I only pull a 55MB subset of the repo, a single directory, but the clone still takes nearly 15 minutes. Cloning the same subset again from the same git-daemon process took a similar time. The .git directory of the sparse clone is the same size as with a full clone.

I think these numbers are interesting. They show that the sky doesn’t fall if you put a huge amount of code into Git. At the same time, the ‘sparse checkouts’ feature doesn’t really let you pretend that Git is a centralised version control system, so you can’t actually avoid the consequences of having such a huge repo.

Also, I learned that if you are profiling file size, you should use du --apparent-size to measure that, because file size != disk usage!
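In Python terms it is the difference between st_size and st_blocks (the filename here is a placeholder):

import os

st = os.stat("some-large-file")
print("apparent size:", st.st_size)          # what du --apparent-size reports
print("disk usage:   ", st.st_blocks * 512)  # what plain du reports; st_blocks is in 512-byte units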

Disclaimer: there are better ways to spend your time than trying to use a tool for things that it’s not designed for (sometimes).

Cleaning up stale Git branches

I get bored looking through dozens and dozens of stale Git branches. If git branch --remote takes up more than a screenful of text then I am unhappy!

Here are some shell hacks that can help you when trying to work out what can be deleted.

This shows you all the remote branches which are already merged, those can probably be deleted right away!

git branch --remote --merged

These are the remote branches that aren’t merged yet.

git branch --remote --no-merged

Best not to delete those straight away. But some of them are probably totally stale. This snippet will loop through each unmerged branch and tell you (a) when the last commit was made, and (b) how many commits it contains which are not merged to ‘origin/master’.

for b in $(git branch --remote --no-merged); do
    echo $b;
    git show $b --pretty="format:  Last commit: %cd" | head -n 1;
    echo -n "  Commits from 'master': ";
    git log --oneline $(git merge-base $b origin/master)..$b | wc -l;
    echo;
done

The output looks like this:

origin/album-art-to-libtracker-extract
  Last commit: Mon Mar 29 17:22:14 2010 +0100
  Commits from 'master': 1

origin/albumart-qt
  Last commit: Thu Oct 21 11:10:25 2010 +0200
  Commits from 'master': 1

origin/api-cleanup
  Last commit: Thu Feb 20 12:16:43 2014 +0100
  Commits from 'master': 18

...

Two of those haven’t been touched for five years, and only contain a single commit! So they are probably good targets for deletion, for example.

You can also get the above info sorted, with the oldest branches first. First you need to generate a list. This outputs each branch and the date of its newest commit (as a number), sorts it numerically, then filters out the number and writes it to a file called ‘unmerged-branches.txt’:

for b in $(git branch --remote --no-merged); do
    git show $b --pretty="format:%ct $b" | head -n 1;
done | sort -n | cut -d ' ' -f 2 > unmerged-branches.txt

Then you can run the formatting command again, but replace the first line with:

for b in $(cat unmerged-branches.txt); do

OK! You have a list of all the unmerged branches and you can send a mail to people saying you’re going to delete all of them older than a certain point unless they beg you not to.

.yml files are an anti-pattern

A lot of people are representing data as YAML these days. That’s good! It’s an improvement over the days when everything seemed to be represented as XML, anyway.

But one thing about the YAML format is that it doesn’t require you to embed any information in the file about how the data should be interpreted. So now we have projects where there are hundreds of different .yml files committed to Git and I have no idea what any of them are for.

YAML is popular because it’s minimal and convenient, so I don’t think that requiring that everyone suddenly creates an ontology for the data in these .yml files would be practical. But I would really like to see a convention that the first line of any .yml file was a comment describing what the file did, e.g.

# This is a BOSH deployment manifest, see http://bosh.io/ for more information
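If you wanted to nudge a project towards this convention, checking for it only takes a few lines of Python. This is a sketch rather than an existing tool:

import sys
from pathlib import Path

# Complain about any .yml file whose first non-blank character is not a comment.
missing = [p for p in Path(".").rglob("*.yml")
           if not p.read_text(encoding="utf-8", errors="replace").lstrip().startswith("#")]

for p in missing:
    print(f"{p}: no leading comment explaining what this file is for")

sys.exit(1 if missing else 0)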

That’s all!

Running Firefox in a cgroup (using systemd)

This blog post is very out of date. As of 2020, you can find up to date information about this topic in the LWN article “Resource management for the desktop”

I’m a long time user of Firefox and it’s a pretty good browser but you know how sometimes it eats all of the memory on your computer and uses lots of CPU so the whole thing becomes completely unusable? That is incredibly annoying!

I’ve been using a build system with a web interface recently and it is really a problem there, because build logs can be quite large (40MB) and Firefox handles them really badly.

Linux has been theoretically able to limit how much CPU, RAM and IO a process can use for some time, with the cgroups mechanism. The default behaviour, at the time of writing, is to let Firefox starve out all other processes so that I am totally unable to kill it and have to force power-off on my computer and restart it. It would make much more sense for Linux’s scheduler to ensure that the user interface always gets some CPU time, so I can kill programs that are going nuts, and also for the out-of-memory killer to actually work properly. There is a proposal to integrate gnome-session with systemd which I hope would solve this problem for me. But in the meantime, here’s a fairly hacky way of making sure that Firefox always runs in a cgroup with a fixed amount of memory, so that it will crash itself when it tries to use too much RAM instead of making your computer completely unusable.

I’m using Fedora 20 right now, but probably any operating system with Linux and systemd will work the same.

First, you need to create a ‘slice’. The documentation for this stuff is quite dense but the concept is simple: your system’s resources get divided up into slices. Slices are hierarchical, and there are some predefined slices that systemd provides including user.slice (for user applications) and system.slice (for system services). So I made a user-firefox.slice:

[Unit]
Description=Firefox Slice
Before=slices.target

[Slice]
MemoryAccounting=true
MemoryLimit=512M
# CPUQuota isn't available in systemd 208 (Fedora 20).
#CPUAccounting=true
#CPUQuota=25%

This should be saved as /etc/systemd/system/user-firefox.slice. Then you can run systemctl daemon-reload && systemctl restart user-firefox.slice and your slice is created with its resource limit!

You can now run a command in this slice using the systemd-run command, as root.

sudo systemd-run --slice user-firefox.slice --scope xterm

The xterm process and anything you run from it will be limited to using 512MB of RAM, and memory allocations will fail for them if more than that is used. Most programs crash when this happens because nobody really checks the result of malloc() (or they do check it, but they never tested the code path that runs if an allocation fails so it probably crashes anyway). If you want to be confident this is working, change the MemoryLimit in user-firefox.slice to 10M and run a desktop application: probably it will crash before it even starts (you need to daemon-reload and restart the .slice after you edit the file for the changes to take effect).

About the --scope argument: a ‘scope’ is basically a way of identifying one or more processes that aren’t being managed directly by systemd. By default, systemd-run would start xterm as a system service, which wouldn’t work because it would be isolated from the X server.

So now you can run Firefox in a cgroup, but it’s a bit shit because you can only do so as the ‘root’ user. You’ll find that if you try to use `sudo` or `su` to become your user again, these create a new systemd user session that is outside the user-firefox.slice cgroup. You can use `systemd-cgls` to show the hierarchy of slices, and you’ll see that the commands run under `sudo` or `su` show up in a new scope called something like session-c3.scope, whereas the scope created by systemd-run that is in the correct slice is called run-1234.scope.

There are various nice ways that we could go about fixing this but today I am not going to be a nice person, instead I just wrote this Python wrapper that becomes my user and then runs Firefox from inside the same scope:

#!/usr/bin/env python3

import os
import pwd


# Switch from root to my normal user; the process stays inside the scope
# (and therefore the slice) that systemd-run created for it.
user_info = pwd.getpwnam('sam')
os.setuid(user_info.pw_uid)

env = os.environ.copy()
env['HOME'] = user_info.pw_dir

# Replace this process with Firefox; the final argument to execle is the
# environment to pass along.
os.execle('/usr/bin/firefox', 'Firefox (tame)', env)

Now I can run:

sudo systemd-run --slice user-firefox.slice --scope ./user-firefox

This is a massive hack and don’t hold me responsible for anything bad that may come of it. Please contribute to Firefox if you can.

Viruses

I just removed a virus for a friend, one of those inventive ‘police warning’ ones. I discovered a couple of useful tricks, since it’s been a while since I had to do this.

http://pogostick.net/~pnh/ntpasswd/ hosts a tool to manipulate a Windows registry from a Linux system (Wine itself doesn’t use the real Windows registry format, so it can’t be used for this as I originally expected).

The virus did some trickery to prevent any .exe programs from running (other than some whitelisted ones), preventing access to cmd.exe, taskmgr.exe, regedit.exe or anything else you might like to use to remove the virus. I forget the mechanism it used to do this, but one simple fix is to rename regedit.exe to regedit.com.

It was then a disappointingly simple search through the registry to the classic HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run key where the file tpl_0_c.exe had done a rather poor job of hiding.

I ran Fedora 17 from a memory stick before going back into Windows, and the transition from GNOME Shell on a netbook to Windows XP on a netbook helped me understand a lot better some of the design decisions that went into the shell. Windows XP is really fiddly to use on such tiny and crappy hardware, whereas the shell felt really comfortable. I’m still not at all convinced that we should be making GNOME run on tablets, but for netbooks it makes perfect sense to maximise things by default and make the buttons bigger. Sadly some of this is coming at the expense of usability on big desktops; I’m ok with having to configure stuff to get a comfortable UI, but the recent thread where side-by-side windows were suggested as a replacement for Nautilus’ split view makes me worried that we’re losing sight of desktop users completely. Side-by-side windows on a 25″ monitor aren’t remotely comparable to a split window, though they might be on a netbook. OS X has always been dreadful on large screens; we shouldn’t take GNOME down the same path.

Little tidbits of wisdom #9117

When you are writing code with GCC on Windows, and you pass a callback to a Windows API function, your callback must have the __stdcall calling convention. Otherwise, something in a random part of your application quite distant from the callback will fail in a weird way and you will spend two days learning lots of things you never wanted to know about the internals of Windows when in fact you just needed to add one word to your callback definition.
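The same distinction shows up if you pass callbacks from Python with ctypes: WINFUNCTYPE declares a __stdcall callback and CFUNCTYPE a cdecl one, and mixing them up gives you the same class of mysterious failure, at least on 32-bit Windows. A minimal sketch (Windows only):

import ctypes
from ctypes import wintypes

# EnumWindows expects a __stdcall callback, so it must be declared with
# WINFUNCTYPE rather than CFUNCTYPE (which would be cdecl).
EnumWindowsProc = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.HWND, wintypes.LPARAM)

def callback(hwnd, lparam):
    print(hwnd)
    return True  # keep enumerating

ctypes.windll.user32.EnumWindows(EnumWindowsProc(callback), 0)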

Here’s a netlabel that puts out mostly free releases of ambient and electroacoustic albums and EPs, so you can relax after a hard day of banging your head against a wall and nerdrage.

Asus motherboards and USB boot

I’m writing this mainly for Google’s benefit. If you’re trying to get an ASUS motherboard, such as the M3N78-VM I have, to boot from a memory stick, it turns out you have to do it in a weird way: turn on the PC with the memory stick plugged in, go into BIOS Setup and the Boot section, and then go into Hard Disk Drives. Delete all the entries except USB (maybe you can just put it at the top, I didn’t try). Now you can go to the normal Boot Device Priority list, and “USB” will be an entry which you can put where you like.

This is all pretty counterintuitive, because the boot device list has “Removable media” as an entry, which is apparently useless – in fact, worse than useless, or I might have worked this out faster. Hopefully writing about it will save others from wasting time.

In other news, since I’m here writing… I finished my degree in music & music technology recently (which is why I finally have time to fix my computer). It’s been a fun ride and I achieved a bunch of things I’ve always wanted to do, like mixing for bad metal bands, writing and recording crazy dub tunes and playing sounds too quiet for anyone to hear in a gallery with some free wine. After a summer getting some programaction done (more on that later), my girlfriend and I are going to South America for a while. We fly in to Buenos Aires in September and out of Lima in January (hopefully later), and so far that is the plan. I’ve never been out of the UK for more than a few weeks before, and I am really looking forward to finally doing some proper travelling in a very beautiful part of the world.

On Cheapness

Both of my IBM Thinkpad power adapters are now working only because of ample solder and insulating tape. I have a third, but that’s disintegrated altogether.

How, after over 100 years of development, can we not manufacture power cables properly? You’d think the Thinkpad, especially, might come with adapters and cables that last more than a few years.