What’s coming in Tracker 1.10

Tracker 1.9.1 was released last month, and it comes with some work we did to improve the various extract modules (the code which looks at files on disk and extracts what we think is useful information). The extract modules are no longer hardcoded to generate SPARQL commands; instead they now use the new TrackerResource API, which is a simple way of describing resources programmatically.

TrackerResource is hopefully fairly self-explanatory; you can read the unstable documentation for it here. Basically a TrackerResource object represents a resource (for example a Video) and the API lets you describe properties of that resource (such as its title, who directed it, etc.). When you’re done you can serialize it to some kind of interchange format. The RDF working committee have invented many of these over the years (most of which are absurd); Tracker generally uses Turtle, which is both efficient and human-friendly, so TrackerResource lets you serialize information about one or more resources either to Turtle or to a series of SPARQL commands that you can use to update a database (such as the Tracker-store!) with the new information.

What’s the point? One highlight of this work was removing 2,000 lines of code from the extract modules, and making them (in my opinion) a whole lot more readable and maintainable in the process … but more interesting to me is that the code in the extract modules should be useful to a lot more people. Scanning files and then outputting a series of SPARQL INSERT statements is really only useful to folk who want to use a SPARQL database to track file metadata, and most people (at least in the desktop & embedded world) are not asking for that at all. Data serialized as Turtle can be easily parsed by other programs, inserted into tables in an SQL database, and/or converted into any other serialization format you want; the most interesting one to me is JSON-LD, which gets you the convenience of JSON and (optionally) the rigour of Linked Data too. In fact I’m hoping TrackerResource will be able to serialize to JSON-LD directly in future; the work is partly done already.

I’ve often felt like there’s a lot of solid, well-tested code sat in tracker.git that would be useful to a lot more people if it wasn’t quite so Tracker-specific. Hopefully this will help to “open up” all that code that parses different file formats. I should note that there’s nothing really Tracker-specific in the TrackerResource code; it could be moved out into a generic “RDF for GObject folk” library if there was any demand for such a thing. Likewise, the Tracker extract modules could be moved into their own “Generic metadata extraction” library if it looked like the extra maintenance burden would be worthwhile (splitting up Tracker has been under discussion for some time).

Unless someone speaks up, the TrackerSparqlBuilder API will be deprecated in due course in favour of TrackerResource. The TrackerSparqlBuilder API was never flexible enough to build the kind of advanced queries that SPARQL is really designed for; it was basically created so we could create correct INSERT statements in the extractors, and it’s no longer needed there.

This is quite a big change internally, although functionality-wise nothing should have changed, so please test out Tracker 1.9.1 if you can and report any issues you see to GNOME Bugzilla.

Example usage of TrackerResource

Object Resource Mapping (the resource-centric equivalent of Object Relation Mapping in the SQL database world) is not a new idea… Adrien Bustany had a go at an ORM for Tracker, named Hormiga, many years ago now. The Hormiga approach was to generate code to deal with each new type of resource, which is fiddly in practice, although it does mean validation code can be generated automatically.

The TrackerResource approach was inspired by the Python RDFLib library which has a similar rdflib.Resource class. It’s simple and effective, and suits the “open world”, “there are no absolute truths except that everything is a resource” philosophy of RDF better.

To describe something with TrackerResource, you might do this:

    #include <libtracker-sparql/tracker-sparql.h>

    ...

    TrackerResource *video = tracker_resource_new ("file:///home/sam/cat-on-stairs.avi");
    tracker_resource_set_string (video,
                                 "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#title",
                                 "My cat walking downstairs");

Properties are themselves resources so they have URIs, which you can follow to find a description of said property… but it’s quite painful reading and writing long URIs everywhere, so you should of course take advantage of Compact URIs and do this instead:

    TrackerResource *video = tracker_resource_new ("file:///home/sam/cat-on-stairs.avi");
    tracker_resource_set_string (video, "nie:title", "My cat walking downstairs");
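To make the compact form a bit more concrete, here is a minimal sketch in plain C of how a compact URI such as nie:title can be expanded back into the full URI. It does not use Tracker at all: the `prefix_map` table and the `expand_compact_uri()` helper are hypothetical names invented for illustration, and only the three prefix URIs are taken from Tracker's real ontologies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical prefix table, for illustration only. */
struct prefix_entry {
    const char *prefix;
    const char *uri;
};

static const struct prefix_entry prefix_map[] = {
    { "nie", "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#" },
    { "nfo", "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#" },
    { "nmm", "http://www.tracker-project.org/temp/nmm#" },
};

/* Expand "prefix:local" into a full URI written to `out`.
 * Returns 0 on success, -1 if there is no colon, the prefix is not in
 * the table, or `out` is too small to hold the result. */
int expand_compact_uri(const char *compact, char *out, size_t out_size)
{
    assert(compact != NULL && out != NULL);

    const char *colon = strchr(compact, ':');
    if (colon == NULL)
        return -1;

    size_t prefix_len = (size_t)(colon - compact);
    for (size_t i = 0; i < sizeof prefix_map / sizeof prefix_map[0]; i++) {
        if (strlen(prefix_map[i].prefix) == prefix_len &&
            strncmp(prefix_map[i].prefix, compact, prefix_len) == 0) {
            /* Full URI = namespace URI followed by the local name. */
            int n = snprintf(out, out_size, "%s%s",
                             prefix_map[i].uri, colon + 1);
            return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
        }
    }
    return -1;
}
```

Note that a full URI like file:///home/sam/cat-on-stairs.avi also contains a colon; in this sketch it simply fails the lookup because "file" is not in the table, so a caller would fall back to treating the string as a full URI.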

You can now serialize this information to Turtle or to SPARQL, ready to send it or save it somewhere. For it to make sense to the receiver, the compact URI prefixes need to be kept with the data. The TrackerNamespaceManager class tracks the mappings between prefixes and the corresponding full URIs, and you can call tracker_namespace_manager_get_default() to receive a TrackerNamespaceManager that already knows about Tracker’s built-in prefixes. So to generate some Turtle you’d do this:

    char *text;
    text = tracker_resource_print_turtle (video,
                                          tracker_namespace_manager_get_default ());
    g_print ("%s", text);  /* never pass data as the format string */
    g_free (text);

The output looks like this:

    @prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .

    <file:///home/sam/cat-on-stairs.avi> nie:title "My cat walking downstairs" .
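One detail that this tidy example hides is quoting: a title containing a double quote or a newline can't be pasted into a Turtle string literal as-is. The sketch below shows the kind of escaping a serializer has to do for a double-quoted Turtle literal. It is plain C and not Tracker's code; `turtle_escape()` is a hypothetical helper, and it only covers the common escapes (backslash, double quote, newline, carriage return, tab).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Escape `in` for use inside a double-quoted Turtle string literal.
 * Returns 0 on success, -1 if `out` is too small for the result. */
int turtle_escape(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);

    size_t o = 0;
    for (; *in != '\0'; in++) {
        char esc = 0;
        switch (*in) {
        case '\\': esc = '\\'; break;
        case '"':  esc = '"';  break;
        case '\n': esc = 'n';  break;
        case '\r': esc = 'r';  break;
        case '\t': esc = 't';  break;
        }
        size_t need = esc ? 2 : 1;
        if (o + need + 1 > out_size)   /* +1 keeps room for the NUL */
            return -1;
        if (esc)
            out[o++] = '\\';
        out[o++] = esc ? esc : *in;
    }
    if (o + 1 > out_size)
        return -1;
    out[o] = '\0';
    return 0;
}
```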

The equivalent JSON-LD output could be this:

    {
      "@context": {
        "title": "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#title"
      },
      "@id": "file:///home/sam/cat-on-stairs.avi",
      "title": "My cat walking downstairs"
    }

Here’s the equivalent SPARQL update query. Note that it removes any existing
values for nie:title.

    DELETE {
      <file:///home/sam/cat-on-stairs.avi>
          nie:title ?nie_title
    }
    WHERE {
      <file:///home/sam/cat-on-stairs.avi>
          nie:title ?nie_title
    }
    INSERT {
    <file:///home/sam/cat-on-stairs.avi> nie:title "My cat walking downstairs" .
    }
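The shape of that update is regular enough to sketch in a few lines of plain C. This is only an illustration of the DELETE / WHERE / INSERT pattern for "setting" a single string property, not how TrackerResource is implemented: `format_set_update()` is a hypothetical helper, the variable name `?old` is my own choice, and the value is not escaped at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a "replace the existing value" update for one string property
 * into `out`. Hypothetical helper: it does no escaping, so it is only
 * safe for values without quotes or backslashes.
 * Returns 0 on success, -1 if `out` is too small. */
int format_set_update(const char *subject, const char *property,
                      const char *value, char *out, size_t out_size)
{
    assert(subject && property && value && out);

    int n = snprintf(out, out_size,
                     "DELETE {\n"
                     "  <%s> %s ?old\n"
                     "}\n"
                     "WHERE {\n"
                     "  <%s> %s ?old\n"
                     "}\n"
                     "INSERT {\n"
                     "  <%s> %s \"%s\" .\n"
                     "}\n",
                     subject, property,
                     subject, property,
                     subject, property, value);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Calling it with the video URI, `"nie:title"` and the title string produces text with the same three parts as the update shown above.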

If you want to add a 2nd value to a property, rather than removing existing
values, you can do that too. The number of values that a property can have is
called its “cardinality” for some reason. If a property has a cardinality of
more than 1, you can *add* values, e.g. maybe you want to associate some new
artwork you found:

    tracker_resource_add_uri (video,
                              "nmm:artwork",
                              "file:///home/sam/cat-on-stairs-front-cover.png");

The generated SPARQL for this would be:

    INSERT {
    <file:///home/sam/cat-on-stairs.avi> nmm:artwork <file:///home/sam/cat-on-stairs-front-cover.png> .
    }

There’s no DELETE statement first because we are *adding* a value to the property, not *setting* the property.
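If the difference between the two operations is hard to picture, here is a simplified in-memory model of one property, in plain C. It is an assumption about how one *might* model this, not a description of TrackerResource's internals: the `struct property` type, the `overwrite` flag and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VALUES 8

/* Hypothetical model of one property of one resource. */
struct property {
    const char *name;
    const char *values[MAX_VALUES];
    size_t n_values;
    int overwrite;  /* 1 if a DELETE should precede the INSERT */
};

/* "Set": drop any values collected so far, keep only the new one, and
 * mark the property so that existing values in the store get removed. */
void property_set(struct property *p, const char *value)
{
    assert(p != NULL && value != NULL);
    p->values[0] = value;
    p->n_values = 1;
    p->overwrite = 1;
}

/* "Add": append one more value and leave the overwrite flag alone.
 * Returns 0 on success, -1 if the fixed-size array is full. */
int property_add(struct property *p, const char *value)
{
    assert(p != NULL && value != NULL);
    if (p->n_values >= MAX_VALUES)
        return -1;
    p->values[p->n_values++] = value;
    return 0;
}
```

In this model a serializer would emit the DELETE / WHERE part only for properties where `overwrite` is set, which matches the two SPARQL examples above.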

By the way, it’s up to the Tracker Store to make sure that the data you are
inserting is valid. TrackerResource doesn’t have any way right now to check that the data you give it makes sense according to the ontologies (database schemas). So you can give the video 15 different titles, but if you try to execute the INSERT statement that results then the Tracker Store will raise an error, because the nie:title
property has a cardinality of 1 — it can only have 1 value.

Right now TrackerResource doesn’t support *querying* resources. It’s never
going to gain any kind of advanced querying features that try to hide SPARQL
from developers; the best way to write advanced data queries is using a query
language, and that’s really what SPARQL is best at. It is possible we could add a way to read information about known resources in the Tracker Store into
TrackerResource objects, and if that sounds useful to you please get hacking 🙂

Example usage of the new Tracker extractors

It’s always been possible to run `tracker-extract` on a single file, by
running `/usr/libexec/tracker-extract --file` explicitly. Try it:

    $ /usr/libexec/tracker-extract --file ~/Downloads/Best\ Coast\ -\ The\ Only\ Place.mp3  --verbosity=0
    Locale 'TRACKER_LOCALE_LANGUAGE' was set to 'en_GB.utf8'
    Locale 'TRACKER_LOCALE_TIME' was set to 'en_GB.utf8'
    Locale 'TRACKER_LOCALE_COLLATE' was set to 'en_GB.utf8'
    Locale 'TRACKER_LOCALE_NUMERIC' was set to 'en_GB.utf8'
    Locale 'TRACKER_LOCALE_MONETARY' was set to 'en_GB.utf8'
    Setting priority nice level to 19
    Loading extractor rules... (/usr/share/tracker/extract-rules)
      Loaded rule '10-abw.rule'
      Loaded rule '10-bmp.rule'
      Loaded rule '10-comics.rule'
      Loaded rule '10-dvi.rule'
      Loaded rule '10-ebooks.rule'
      Loaded rule '10-epub.rule'
      Loaded rule '10-flac.rule'
      Loaded rule '10-gif.rule'
      Loaded rule '10-html.rule'
      Loaded rule '10-ico.rule'
      Loaded rule '10-jpeg.rule'
      Loaded rule '10-msoffice.rule'
      Loaded rule '10-oasis.rule'
      Loaded rule '10-pdf.rule'
      Loaded rule '10-png.rule'
      Loaded rule '10-ps.rule'
      Loaded rule '10-svg.rule'
      Loaded rule '10-tiff.rule'
      Loaded rule '10-vorbis.rule'
      Loaded rule '10-xmp.rule'
      Loaded rule '10-xps.rule'
      Loaded rule '11-iso.rule'
      Loaded rule '11-msoffice-xml.rule'
      Loaded rule '15-gstreamer-guess.rule'
      Loaded rule '15-playlist.rule'
      Loaded rule '15-source-code.rule'
      Loaded rule '90-gstreamer-audio-generic.rule'
      Loaded rule '90-gstreamer-image-generic.rule'
      Loaded rule '90-gstreamer-video-generic.rule'
      Loaded rule '90-text-generic.rule'
    Extractor rules loaded
    Initializing media art processing requirements...
    Found '37 GB Volume' mounted on path '/media/TV'
      Found mount with volume and drive which can be mounted: Assuming it's  removable, if wrong report a bug!
      Adding mount point with UUID: '7EFAE0646A3F8E6E', removable: yes, optical: no, path: '/media/TV'
    Found 'Summer 2012' mounted on path '/media/Music'
      Found mount with volume and drive which can be mounted: Assuming it's  removable, if wrong report a bug!
      Adding mount point with UUID: 'c94b153c-754a-48d4-af67-698bb8972ee2', removable: yes, optical: no, path: '/media/Music'
    MIME type guessed as 'audio/mpeg' (from GIO)
    Using /usr/lib64/tracker-1.0/extract-modules/libextract-gstreamer.so...
    GStreamer backend in use:
      Discoverer/GUPnP-DLNA
    Retrieving geolocation metadata...
    Processing media art: artist:'Best Coast', title:'The Only Place', type:'album', uri:'file:///home/sam/Downloads/Best%20Coast%20-%20The%20Only%20Place.mp3', flags:0x00000000
    Album art already exists for uri:'file:///home/sam/Downloads/Best%20Coast%20-%20The%20Only%20Place.mp3' as '/home/sam/.cache/media-art/album-165d0788d1c055902e028da9ea6db92a-b93cfc4ae665a6ebccfb820445adec56.jpeg'
    Done (14 objects added)


    SPARQL pre-update:
    --
    INSERT {
    <urn:artist:Best%20Coast> a nmm:Artist ;
         nmm:artistName "Best Coast" .
    }
    INSERT {
    <urn:album:The%20Only%20Place:Best%20Coast> a nmm:MusicAlbum ;
         nmm:albumTitle "The Only Place" ;
         nmm:albumArtist <urn:artist:Best%20Coast> .
    }
    DELETE {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> nmm:setNumber ?unknown .
    }
    WHERE {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> nmm:setNumber ?unknown .
    }
    DELETE {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> nmm:albumDiscAlbum ?unknown .
    }
    WHERE {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> nmm:albumDiscAlbum ?unknown .
    }
    INSERT {
    <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> a nmm:MusicAlbumDisc ;
         nmm:setNumber 1 ;
         nmm:albumDiscAlbum <urn:album:The%20Only%20Place:Best%20Coast> .
    }
    --

    SPARQL item:
    --
     a nfo:Audio , nmm:MusicPiece ;
         nie:title "The Only Place" ;
         nie:comment "Free download from http://www.last.fm/music/Best+Coast and http://MP3.com" ;
         nmm:trackNumber 1 ;
         nfo:codec "MPEG-1 Layer 3 (MP3)" ;
         nfo:gain 0 ;
         nfo:peakGain 0 ;
         nmm:performer <urn:artist:Best%20Coast> ;
         nmm:musicAlbum <urn:album:The%20Only%20Place:Best%20Coast> ;
         nmm:musicAlbumDisc <urn:album-disc:The%20Only%20Place:Best%20Coast:Disc1> ;
         nfo:channels 2 ;
         nfo:sampleRate 44100 ;
         nfo:duration 164 .
    --

    SPARQL where clause:
    --
    --

    SPARQL post-update:
    --
    --

All the info is there, plus way more logging output that you didn’t ask for,
but it’s really hard to make use of that and this feature was really only
intended for testing.

In the Tracker 1.9.1 unstable release things have improved. The `tracker`
commandline tool has an `extract` command which you can use to see the metadata of any file that has a corresponding extract module, and the output is readable both for other programs and for humans (at least, for developers):

    $ tracker extract ~/Downloads/Best\ Coast\ -\ The\ Only\ Place.mp3
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix nmm: <http://www.tracker-project.org/temp/nmm#> .
    @prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
    @prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .

    <urn:artist:Best%20Coast> nmm:artistName "Best Coast" ;
      a nmm:Artist .

    <urn:album:The%20Only%20Place> nmm:albumTitle "The Only Place" ;
      a nmm:MusicAlbum .

    <urn:album-disc:The%20Only%20Place:Disc1> nmm:setNumber 1 ;
      nmm:albumDiscAlbum <urn:album:The%20Only%20Place> ;
      a nmm:MusicAlbumDisc .

    <file:///media/home-backup/Best%20Coast%20-%20The%20Only%20Place.mp3> nie:comment "Free download from http://www.last.fm/music/Best+Coast and http://MP3.com" ;
      nmm:trackNumber 1 ;
      nmm:performer <urn:artist:Best%20Coast> ;
      nfo:averageBitrate 128000 ;
      nmm:musicAlbum <urn:album:The%20Only%20Place> ;
      nfo:channels 2 ;
      nmm:dlnaProfile "MP3" ;
      nmm:musicAlbumDisc <urn:album-disc:The%20Only%20Place:Disc1> ;
      a nmm:MusicPiece , nfo:Audio ;
      nfo:duration 164 ;
      nfo:codec "MPEG" ;
      nmm:dlnaMime "audio/mpeg" ;
      nfo:sampleRate 44100 ;
      nie:title "The Only Place" .

Turtle is still a bit of an esoteric format; I think this will be more fun when it can output JSON, but you can already pipe this into a tool like `rapper` or `rdfpipe` and convert it to whatever format you like.

Other news

I didn’t make it to GUADEC this year because I was busy raising money for charity by driving a car right across Europe, Russia and Mongolia… but I was excited to find out that our bid to host it in Manchester in 2017 has been accepted. Any UK folk wishing to get involved in the organisation, please get in touch!

About Sam Thursfield

Who's that kid in the back of the room? He's setting all his papers on fire! Where did he get that crazy smile? We all think he's really weird.