Archive for the ‘desktop search’ Category

Lenses in Unity-5.0. Porting and New Features

Wednesday, January 18th, 2012

Hi there. More gibberish about Unity lenses. You’d think that I don’t experience much else in life, huh? I do, believe you me, but for some reason it is so much easier to blog about technical matters :-)

Now, as some have picked up, the Unity Lenses API has changed slightly in Unity-5.0 (the version that’ll be in Ubuntu 12.04). First of all; sorry! As I’ll outline why we did this you’ll hopefully learn to appreciate it. Flames and pitchforks can go in my general direction if not. And if you still have hate to spare after that you can direct it at Michal ;-) Before we start, do note that the Unity-5.0 API overview on developer.ubuntu.com is already updated, and Michal is updating the wiki documentation.

I’ll take this on in the form of a case study; updating my unity-lens-bliss. This is a Python lens, which makes for a good example – identical changes apply to lenses written in Vala or C. Let’s roll.

Porting unity-lens-bliss to Unity-5.0

We introduced a new signal “search-changed” on the Unity.Scope class. The old property notification (on “notify::active-search” and “notify::active-global-search”) is no longer available, as the properties have been removed. The reason for the new signal was that the property notification scheme was racy in some subtle ways that would require some tricky GObject magic in the scopes to work correctly in all circumstances. The race manifested itself in lenses that dispatched the property change notification to an async handler of some sort. If the scope received another search while the async handler was still running we’d have re-entrancy issues in the async handler. This was the reason why you might have seen some mysterious calls to self.freeze_notify() and self.thaw_notify(). It seemed that no one really understood this, and I think we can all agree that having to know the intricacies of GObject property notifications is not a nice requirement for an API that should be simple.

For unity-lens-bliss, what was before:

self.connect ("notify::active-search", self._on_search_changed)
self.connect ("notify::active-global-search", self._on_global_search_changed)

Should now become:

self.connect ("search-changed", self._on_search_changed)

The callback  _on_search_changed() changes signature from:

def _on_search_changed (self, scope, param_spec):

to

def _on_search_changed (self, scope, search, search_type, cancellable):

The search parameter is a LensSearch instance. The LensSearch class has grown some new public properties. Of particular interest is the “results-model” property. This property will hold the correct model depending on whether it is a global- or in-scope search. You can figure out what kind of search this is by looking at the search_type parameter which is an enumeration Unity.SearchType with values Unity.SearchType.GLOBAL and Unity.SearchType.DEFAULT for global- and in-scope searches respectively.

So the implementation of the _on_search_changed() callback changes from:

def _on_search_changed (self, scope, param_spec):
	search = self.get_search_string()
	results = scope.props.results_model
 
	print "Search changed to: '%s'" % search
 
	self._update_results_model (search, results)
	self.search_finished()

to

def _on_search_changed (self, scope, search, search_type, cancellable):
	search_string = search.props.search_string
	results = search.props.results_model
 
	print "Search changed to: '%s'" % search_string
 
	self._update_results_model (search_string, results)
	search.emit("finished")

And this is all there is to it! Bliss will work fine in Unity-5.0 with these changes.

We haven’t yet mentioned the new parameter cancellable. Unsurprisingly (hopefully ;-) ) this is a Gio.Cancellable instance. If you’re writing a fully synchronous lens (like bliss is) it won’t be of interest to you, but if you’re dispatching to async methods from your “search-changed” handler (like fx. both unity-lens-files and unity-lens-applications do) then read the next section carefully.

Concurrency and Cancellation

Before we get too deep into this, let’s make it clear what I mean by asynchronous. A method being async means that GLib will spin the mainloop while waiting for the method to return. This means that your app/daemon will continue to handle events (in particular requests from Unity to update the search) while your methods are running. Why would one want to use async methods if it is so complicated? Good question. If you’re just writing a simple scope or lens then chances are that it may not be worth it. But if you go beyond “simple” then it may matter.

Let’s imagine a slightly more complex scope. Maybe something that puts webapps inside the applications lens. The listing is done by first asynchronously querying a web service to list the apps and then asynchronously querying Zeitgeist to sort by popularity. If we didn’t do the web- and Zeitgeist queries asynchronously then the scope would block all requests while any queries were running. This would mean slower responses if the user changes the query under you (which is very likely when we’re talking live searching here), and also you’d run the chance of showing “outdated” results and doing work that you’ll discard immediately anyway. What you want is to be told in the middle of everything that “hey, there’s a new query; stop what you’re doing and do this instead!”.

An alternative case where you’d want async searching is if you wrote a scope that was living inside an application. There’s really no reason why this wouldn’t work. And it circumvents some of the intricacies of sharing a datastore between a scope daemon and an app. Anyways, back to the example with the web apps. In simplified form it would look something like this:

def _on_search_changed (self, scope, search, search_type, cancellable):
	# Dispatch an async call with callback _on_web_apps_received()
	self._query_web_service_async (search, self._on_web_apps_received)
 
def _on_web_apps_received (self, search, list_of_webapps):
	# Web apps listed by remote server.
	# Now sort them async with zeitgeist, with callback _on_webapps_sorted_received()
	self._sort_webaps_with_zeitgeist_async (list_of_webapps, search, self._on_webapps_sorted_received)
 
def _on_webapps_sorted_received (self, search, sorted_list_of_webapps):
	# We now have the web apps sorted by popularity,
	# add them to the results model
 
	results_model = search.props.results_model
	results_model.clear ()
 
	for app in sorted_list_of_webapps:
		results_model.append(...)
 
	search.finished ()

If you did something like this in the Unity-4.0 API then you’d have to deal with all the re-entrancy, cancellation, and concurrent search handling yourself. Probably by elaborate application of freeze/thaw_notify() and Gio.Cancellables. Tricky stuff. In Unity-5.0 this is a breeze! Contrary to the old way with “notify::active-search” libunity goes out of its way to make the “search-changed” signal nice to use for scope authors (and no, it wouldn’t be technically possible to do the same with the old property notification system).

Firstly, libunity won’t call you again before you’ve called search.finished(). So we’re re-entrancy safe in the example already. What’s more – libunity will cancel the cancellable parameter when you get a new query. So sprinkling some if cancellable.is_cancelled(): return lines around will make sure that you don’t do work in vain. We could fx. insert one right after we receive the results from the web service. Note that you don’t have to call search.finished() if you have been cancelled (libunity will ignore it if you do):

def _on_search_changed (self, scope, search, search_type, cancellable):
	self._query_web_service_async (search, cancellable, self._on_web_apps_received)
 
def _on_web_apps_received (self, search, cancellable, list_of_webapps):
	# NOTE: The new parameter        ^^^^^^^^^^^
	if (cancellable.is_cancelled()): return
	self._sort_webaps_with_zeitgeist_async (list_of_webapps, search, cancellable, self._on_webapps_sorted_received)
 
def _on_webapps_sorted_received (self, search, cancellable, sorted_list_of_webapps):
	# NOTE: The new parameter              ^^^^^^^^^^^
	if (cancellable.is_cancelled()): return
	...
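
To see the bail-out pattern in isolation, here is a minimal sketch in plain Python 3 that runs without libunity. FakeCancellable is a toy stand-in that only mimics the cancel() and is_cancelled() methods of Gio.Cancellable, and run_search() plus the step functions are hypothetical names made up for this illustration. The "async" chain is simulated as a simple list of steps.

```python
class FakeCancellable:
    """Toy stand-in exposing the two Gio.Cancellable methods used above."""

    def __init__(self):
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    def is_cancelled(self):
        return self._cancelled


def run_search(query, cancellable, steps, log):
    """Run each step in order, bailing out as soon as we are cancelled."""
    for step in steps:
        if cancellable.is_cancelled():
            log.append("%s: cancelled before %s" % (query, step.__name__))
            return False
        step(query, cancellable, log)
    log.append("%s: finished" % query)
    return True


def fetch_web_apps(query, cancellable, log):
    log.append("%s: fetched" % query)


def sort_with_zeitgeist(query, cancellable, log):
    log.append("%s: sorted" % query)


def new_query_arrives(query, cancellable, log):
    # Simulates the old search being cancelled because the user typed more
    cancellable.cancel()


log = []
first = run_search("te", FakeCancellable(),
                   [fetch_web_apps, new_query_arrives, sort_with_zeitgeist], log)
second = run_search("term", FakeCancellable(),
                    [fetch_web_apps, sort_with_zeitgeist], log)
```

The first search never reaches the sorting step because it was cancelled after the fetch, while the second one runs to completion.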

Filters

Bliss doesn’t use filters, so I didn’t touch on that yet. If your scope is using filters, the correct thing is in 99.99% of all scopes to connect the “filters-changed” signal to calling self.queue_search_changed(Unity.SearchType.DEFAULT). In Python:

def __init__ (self):
	...
	self.connect ("filters-changed", self._on_filters_changed)
 
def _on_filters_changed (self, scope):
	self.queue_search_changed(Unity.SearchType.DEFAULT)

Personally I’d probably do it with lambdas:

	...
	self.connect ("filters-changed",
	              lambda scope: self.queue_search_changed(Unity.SearchType.DEFAULT))

 

Out of Band Result Changes

Many scopes feature result sets that can change through external means. Fx. if you are listing the contents of a directory, listing browser bookmarks, listing recent stuff from Zeitgeist, etc. All can change when the user is doing something other than searching the lenses. When the result set should be updated, disregarding whether the search string has changed, you can call self.invalidate_search().

Search String Change Checking

In the previous paragraph I wrote “disregarding whether the search string has changed”. But when has the search string changed? Does appending a white space change the search string? Most lenses strip white space from the search string anyway; so in essence the strings “xyz” and “xyz    “ are identical, seen from the scope. We don’t want to fire off a new search for these kinds of changes. Going further down this road – are “XYZ” and “xYz” the same as well? For most scopes, they will be. The problem is that this is highly dependent on the particular scope.

Doing change checking on search strings was a recurring chunk of similar code in all the default Unity lenses. In order to make this easier for ourselves and everyone else we baked it into libunity by means of the “generate-search-key” signal on the Unity.Scope class. This is a particular kind of signal that has a return value. The signal takes a Unity.LensSearch as input and returns a “normalized” version of the search string. This could typically be lower casing and chopping off white space at the ends. In code:

def __init__ (self):
	...
	self.connect ("generate-search-key", self._generate_search_key)
 
def _generate_search_key (self, scope, search):
	return search.props.search_string.lower().strip()
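
To get a feel for what the normalized key buys you, here is a small stand-alone sketch in plain Python 3. It assumes the key is simply compared with the previous one to decide whether a new search is worth firing; how libunity does this internally is not shown here, and SearchKeyFilter is a hypothetical helper for illustration only.

```python
def generate_search_key(search_string):
    """Same normalization as the handler above: lower case, strip the ends."""
    return search_string.lower().strip()


class SearchKeyFilter:
    """Hypothetical helper: remembers the last key and skips no-op changes."""

    def __init__(self):
        self._last_key = None

    def should_search(self, search_string):
        key = generate_search_key(search_string)
        if key == self._last_key:
            return False  # same key as last time, nothing new to search for
        self._last_key = key
        return True


key_filter = SearchKeyFilter()
decisions = [key_filter.should_search(s) for s in ["xyz", "xyz    ", "XYZ", "xyzz"]]
```

Only the first and the last string trigger a search here; the two in the middle normalize to the same key as “xyz”.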

Cancellation and Transactions

Considering again the example with async searches and cancellation. One could easily imagine a scenario where you had a bunch of async methods, some of which added rows to the results model and then going on to dispatch more async searches before calling search.finished(). If we got cancelled in the middle of all this, the results model would be left in a dirty state with only half the results of the search. Enter Dee.Transaction.

Dee.Transaction is a new class in Dee that implements the Dee.Model interface. You create a new Transaction instance, txn, from your results model, then go on clearing and adding rows to the txn model as you go through your chain of async calls. The real results model will not be updated before you call txn.commit(). So if you’re cancelled somewhere in the middle you just let txn go out of scope (or unref it if you’re writing in C) and it’ll vanish like the Cheshire cat. If you make it all the way to the end you call txn.commit() right before you call search.finished(). So with an example:

def _on_search_changed (self, scope, search, search_type, cancellable):
	txn = Dee.Transaction.new (search.props.results_model)
	self._query_web_service1_async (search, txn, cancellable, self._on_web_apps1_received)
 
def _on_web_apps1_received (self, search, txn, cancellable, list_of_webapps):
	# First set of results retrieved, add them to the transaction
	# and then fetch some more results from another web service
	if cancellable.is_cancelled(): return
 
	txn.clear ()
 
	for app in list_of_webapps:
		txn.append(...)
 
	self._query_web_service2_async (search, txn, cancellable, self._on_web_apps2_received)
 
def _on_web_apps2_received (self, search, txn, cancellable, list_of_webapps):
	# Second batch of results
	if cancellable.is_cancelled(): return
 
	for app in list_of_webapps:
		txn.append(...)
 
	txn.commit ()
	search.finished ()
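
If you want to convince yourself of the commit semantics without a running Dee, the idea can be mimicked in a few lines of plain Python 3. ToyTransaction below is a toy stand-in working on an ordinary list; it is not the Dee.Transaction API and only borrows the method names clear(), append() and commit() from the example above.

```python
class ToyTransaction:
    """Buffers changes against a plain list. NOT the Dee.Transaction API."""

    def __init__(self, target):
        self._target = target
        self._rows = list(target)  # work on a private copy

    def clear(self):
        self._rows = []

    def append(self, row):
        self._rows.append(row)

    def commit(self):
        # Only now does the real model see the changes
        self._target[:] = self._rows


results_model = ["old-app"]

# A search that gets cancelled half way: the transaction is simply dropped
txn = ToyTransaction(results_model)
txn.clear()
txn.append("half-a-result")
del txn
after_cancel = list(results_model)

# A search that makes it all the way to the end
txn = ToyTransaction(results_model)
txn.clear()
txn.append("webapp-1")
txn.append("webapp-2")
txn.commit()
after_commit = list(results_model)
```

The dropped transaction leaves the model untouched, and the committed one replaces its contents in one go.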

Fin!

Wow, you’ve made it to the end of this blog post! You surely are an impressively patient person :-)

Please feel free to ask questions or post corrections in the comments. Or catch me, kamstrup, or Michal, mhr3, on #ayatana on FreeNode if you’re into IRC.

Now as a bonus for your patience you’ll get… A FREE picture of Me Looking At A Webcam!

Mikkel Looking at a Webcam

Hacking the Unity Shell – An Alternative Apps Lens

Friday, November 4th, 2011

(fret not, this is not only a wall of text, there’s a juicy screencast at the end if you make it all the way)

Me being the maintainer of the applications lens in Unity you might wonder why I am now blogging about an alternative apps lens – let alone why I actually wrote the alternative myself! :-)

I am personally quite happy about the current default apps lens in Unity. It doesn’t try to be too smart, but aims more for the simple and intuitive. That’s why we only do prefix matching on the words in the application, eg. if the user types “term” it matches the word “Terminal”, but not “XTerm”. We also want the matching to be consistent with that of the results coming from the Ubuntu Software Center – which also works with prefix matching.

Not all users find prefix matching to be the best thing since sliced bread. I like it, but astonishingly the whole world doesn’t think like me!? Nonetheless I can respect that :-)

Some users want to see substring matching which means that “term” matches both “Terminal” and “XTerm”. More progressive users want a more powerful approach that we can call subpattern matching where the letters in the input string must occur in the same order in the string we test against, eg. “term” matches “Terminal”, “XTerm” and “Television Remote”. This can also be thought of as some sort of “acronym matching”.
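
The three matching styles are easy to compare side by side. This is a minimal sketch in plain Python 3, assuming case-insensitive matching and whitespace-separated words; the function names are made up for this illustration and are not taken from unity-lens-applications or Bliss.

```python
def prefix_match(query, name):
    """The query must be a prefix of one of the words in the name."""
    q = query.lower()
    return any(word.startswith(q) for word in name.lower().split())


def substring_match(query, name):
    """The query must occur somewhere in the name as one contiguous piece."""
    return query.lower() in name.lower()


def subpattern_match(query, name):
    """The letters of the query must occur in the name, in the same order."""
    remaining = iter(name.lower())
    # "ch in remaining" consumes the iterator up to and including the match,
    # so each letter is searched for only after the previous one
    return all(ch in remaining for ch in query.lower())


apps = ["Terminal", "XTerm", "Television Remote", "Calculator"]
matches = {
    "prefix": [a for a in apps if prefix_match("term", a)],
    "substring": [a for a in apps if substring_match("term", a)],
    "subpattern": [a for a in apps if subpattern_match("term", a)],
}
```

Each style is strictly more permissive than the one before it: everything that matches by prefix also matches by substring, and everything that matches by substring also matches by subpattern.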

Matching algorithms aside some users simply hate to search for their apps and don’t like to go digging in the filters we have on the right (the filters are also hidden by default which makes them not so easily discoverable). They want to browse their good olde hierarchical menus.

… some users abhor the Most Used and Downloadable apps categories of the default lens – and some users probably want something completely different!

Had I not been an old fart I would probably gladly have added tonnes of options to the unity-lens-applications codebase trying to make everyone happy. But I am an old fart :-) I want a simple and tight codebase and I don’t want tonnes of options because that makes the code harder to maintain. More tricky maintenance means that the ones that are happy with the defaults will suffer.

Enter the power of Unity! You see; Unity is not only a shell in the user-facing kinda way. It is also a shell in the programmable kind of way :-) The default lenses are not hard coded, you can replace them. So you can replace the apps lens as well if you want.

I’ve aired the idea of writing an alternative apps lens numerous times to the ones requesting changes, but none ever appeared that I know of. So I was thinking that I could maybe kick start that effort if I provide a solid starting point. Hence I whipped up Bliss, https://launchpad.net/unity-lens-bliss.

Bliss is a very simple replacement for the apps lens. It does basic searching with substring matching and it allows you to browse your apps by category. It also contains a good collection of bugs, but I’ve been dogfooding it here for a while now and it’s nothing unbearable :-)

Considering the new focus on power users for the Precise cycle I thought/hoped that I could inspire someone to grab the code and write a production ready app launcher specifically tailored for power users. I made the code so that it should be easy to hack on and extend, so let’s see where it ends up…

Caveat emptor: Bliss is by no means official or anything. It is a quick hack to showcase how you can go about this, mostly intended for developers who want to do their own thing. That is also why you won’t find a PPA for it (not from me at least :-) ).

Intruding Bliss, an Alternative Apps Lens for Unity from Mikkel Kamstrup Erlandsen on Vimeo.

So branch it, hack it, break it, fork it. Knock yourselves out!

(I know of at least two obvious bugs: b1) the back arrow sometimes doesn’t appear as the first item, b2) the More Apps shortcut on the dash home screen breaks when you remove unity-lens-applications)

Zeitgeist Hackfest

Sunday, February 6th, 2011

Zeitgeist logo

Prepping up for the Zeitgeist hackfest which is kicking off tomorrow in Aarhus, Denmark. You’ve probably not heard a lot about this event before this late moment – that’s because it has all happened a bit fast. As we were internally discussing the possibilities of a hackfest a bit back, it quickly became evident that we needed to hold it Very Soon Now (TM) if we wanted all the core maintainers to have a chance of showing up.

I was wincing a bit because we recently expanded our family (can you believe I have three kids now..? I’m not sure I can) and I wasn’t very keen on traveling more than I already do with my work. Seif, being the man of action that he is, didn’t let that put him off and arranged that we could hold the hackfest conveniently close to my home. Not only that, but he pretty much did all of the necessary arrangements for getting a cool venue, accommodation, and not least – getting some sponsors to help us out. Seif – this one is to you – you rock man!

The sponsors are the GNOME Foundation and Collabora, and the venue will be the CS Department at Aarhus University, in the Incuba Science Park. All have been incredibly helpful despite our short notice. Thanks to everyone involved!

I’m gonna have to hold the suspense a bit about what we intend to do with this precious opportunity we’ve been given. I’m just too tired right now – but my plan is to have a short daily log posted on my blog each day. So by the end of this week you should all hopefully have an idea :-) Stay tuned.

Fwd: On Zeitgeist optimization

Friday, November 19th, 2010

I don’t think this has reached Planet Gnome yet – so let me just give some major props to Markus Korn for his awesome Zeitgeist optimizations. Go read it! The post has multicolor graphs and the whole shebang! ;-)

Fascinating Facets!

Wednesday, September 29th, 2010

What is Facetting?

“Facetting” is a word which has a special meaning in search-engine-world. It could be defined as the generalization of “Tagging” which I assume you’re familiar with (from Twitter, Flickr, et al).

So instead of having just one kind of tags we could be creative and have two kinds; “Tags” and “Jags”. To help you organize your stuff your system displays statistics about your Tags and Jags with counts on how many matching items you have. Fx.


Tags

  • pony (3)
  • kitten (27)

Jags

  • ninja (5)
  • samurai (68)

If I now search across my system for anything containing the word “ramen” these statistics would narrow down to show the counts for the search results. Fx:

Searching for “ramen”
Tags

  • kitten (3)

Jags

  • ninja (5)
  • samurai (2)

In any part of my journey I could click on a particular Tag or Jag and narrow my search results down to only match items with that particular attribute. Fx. clicking on the “ninja” Jag:

Searching for “ramen”, restricted to the “ninja” Jag
Tags

  • kitten (1)

Jags

  • ninja (5) X
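
The counting behind such a display is simple to sketch for a small collection. The following plain Python 3 snippet uses a tiny made-up item list (so the numbers differ from the ones above) and hypothetical helpers search() and facet_counts(); it is meant only to show how the counts follow the active query and the active restriction.

```python
from collections import Counter

# Tiny made-up collection, purely for illustration
items = [
    {"text": "ramen with kitten", "tags": ["kitten"], "jags": ["ninja"]},
    {"text": "ramen shop", "tags": [], "jags": ["ninja", "samurai"]},
    {"text": "pony ride", "tags": ["pony"], "jags": ["samurai"]},
    {"text": "spicy ramen", "tags": ["kitten"], "jags": ["samurai"]},
]


def search(items, word, restrict=None):
    """Return items containing the word, optionally restricted by facet values."""
    hits = [item for item in items if word in item["text"]]
    for facet, value in (restrict or {}).items():
        hits = [item for item in hits if value in item[facet]]
    return hits


def facet_counts(hits, facets=("tags", "jags")):
    """Count how many hits carry each value, per facet."""
    return {f: Counter(v for item in hits for v in item[f]) for f in facets}


all_ramen = facet_counts(search(items, "ramen"))
ninja_ramen = facet_counts(search(items, "ramen", {"jags": "ninja"}))
```

Note that the counts are always computed over the full set of hits for the current query, which is exactly what gets expensive on large indexes.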

In real life we don’t deal as much with Tags and Jags, so consider that you could stuff any old metadata attribute in there instead. When searching a library catalogue, very useful facets would be:

Example facets for a library system:
Author
Title
Publisher
Year

To be honest these facets were not exactly grabbed out of thin air :-) I highly encourage you to go play with the real deal on the homepage of the State Library of Denmark.

Technical Aspects of Facetting

If you take indexing libraries like Lucene or Xapian out of the box – you have to do quite a lot of work to get correct facetting. And by correct I mean always getting the counts exactly right and always calculating the entire facet sets for the active query.

A common solution to give the illusion of facets is to simply calculate them on the search engine for the first 100 hits (or 1000, or whatever). This leads to a slow and resource hungry solution that doesn’t provide the right results for large results sets (with more than 100 hits).
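
Why the “first 100 hits” trick gives wrong numbers is easy to show with a contrived example. The sketch below, in plain Python 3, uses a made-up result set of 150 hits where every “samurai” item happens to rank below hit number 100; facet_counts() and its limit parameter are hypothetical names for this illustration.

```python
from collections import Counter

# Made-up result set: 150 hits, all "samurai" items rank after the first 100
hits = [{"jag": "ninja"}] * 100 + [{"jag": "samurai"}] * 50


def facet_counts(hits, limit=None):
    """Count facet values over all hits, or only over the first `limit` hits."""
    considered = hits if limit is None else hits[:limit]
    return Counter(hit["jag"] for hit in considered)


exact = facet_counts(hits)
approx = facet_counts(hits, limit=100)
```

In the truncated version the “samurai” facet is missing altogether, even though a third of the hits carry it.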

Fear not! There are shrink wrapped products like Summa or Solr that can give you correct facets pretty much out of the box. However it’s still not exactly something you are going to run on low end servers (unless you have a very small index).

This is where my true inspiration behind this blog post is revealed! Toke Eskildsen (my awesome former coworker) has been hacking away, trying to get the facetting system from Summa upstreamed into Lucene. Along the way he has been optimizing the internals of Lucene with facetting in mind and providing hooks to make facetting more efficient. Toke’s latest status update certainly heralds a brighter future! :-D

It’s my hope that Toke’s work can help bring facetting more into the mainstream – because it’s truly an awesome way to browse huge datasets.

Facetting on the Open Source Infrastructure

Dreaming on into a world where facetting is ubiquitous I can certainly see Bugzilla, Launchpad, translations sites, wikis, and what not making lives a lot easier for everyone from passers-by to professional developers if they could do facetting across their metadata.

Facetting on the Desktop

Even though Toke’s work is all sorts of awesome, my gut instinct tells me that general facetting still would be too heavy a task for a normal desktop.

That said, it may not be impossible. At the very least I want it to be possible! :-) By really polishing the low level data structures, and maybe cheating just a wee bit, we can get something which is good enough.

A while back I actually configured Summa to harvest my desktop (wiring it up with Tika) configuring Summa to create facets for document titles, uris, and mimetypes. Stuff like that. And when I started browsing my files in Summa I just immediately had one of those Eureka moments:

Files are meant to be browsed through facets!

It just felt so right :-)

(Bonus: Facetting and Zeitgeist?)

Sorry I don’t have a cool demo to show here :-) Just a pipe dream to share.

In theory; it is possible to define a Timeline facet where each entry would correspond to a certain time range (the histogram for Gnome Activity Journal is actually more or less doing this).
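
As a rough idea of what a Timeline facet could look like, here is a sketch in plain Python 3 that buckets event timestamps into one entry per day. It assumes timestamps in milliseconds since the Unix Epoch and uses UTC calendar days as the time ranges; timeline_facet() is a hypothetical helper and has nothing to do with the actual Zeitgeist or Gnome Activity Journal code.

```python
from collections import Counter
from datetime import datetime, timezone


def timeline_facet(timestamps_ms):
    """Count events per UTC calendar day, keyed by ISO date string."""
    counts = Counter()
    for ts in timestamps_ms:
        day = datetime.fromtimestamp(ts / 1000.0, tz=timezone.utc).date()
        counts[day.isoformat()] += 1
    return dict(sorted(counts.items()))


DAY_MS = 24 * 60 * 60 * 1000
# Made-up events: two on the first day, one on the second, three on the third
events = [0, 1000, DAY_MS + 5, 2 * DAY_MS, 2 * DAY_MS + 1, 2 * DAY_MS + 2]
facet = timeline_facet(events)
```

Each entry of the resulting dict is one facet value with its count, just like the Tags and Jags earlier.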

Couple this with the zeitgeist-fts-extension to give you a full text search interface and you have the foundations. Now you “just” need to intersect the searches with some facetting info on the logged metadata and do a heckuwa lot of counting, and presto – magical interface to replace the aging hierarchical file system metaphor :-)

Ok – I may have made that last part sound easier than it’s likely to be… To be honest it’s gonna be darn friggin hard to implement in an efficient and light way. So don’t hold your breath… I’m not.

So… What is it that you’re doing again?

Sunday, August 15th, 2010

It’s now over 4 months since I started at Canonical, so a retrospective blog post might be in order by now :-)

I will try and keep a not-too-technical tone in this blog post as there seems to be quite a lot of non-technical people reading my blog as well. I’m getting a lot of those “So… What is it that you’re doing again? I don’t understand much from your blog posts”. So here’s to you guys and gals! ;-)

As you may know I spend most of my time here hacking on Unity – a new super shiny user interface for netbooks. So if you wanna be cooler than all your friends you will replace Windows on your netbook with Unity running on Ubuntu, and it will look something like this:

Unity

Or view a full screencast I did to demo some of the cool stuff we have been working on (please note that this is the in-development software and not the final product):

Unity Development Demo from Mikkel Kamstrup Erlandsen on Vimeo.

The code I write runs just below all the fancy graphics you see and wires up all the components and data models that end up as nice little icons on your screen.

So a little more detailed than you might be interested in; these “components and data models” are :

  • dee – A system library that enables applications to share small in-memory databases. For tech-savvy people: dee is a library that implements some peer-discovery and peer-to-peer tables over dbus (and lots of nifty helper APIs around this)
  • libzeitgeist – A system library that enables applications to talk to a system service called Zeitgeist. The confusing part here is that Zeitgeist is what I develop in my spare time :-) Zeitgeist is a small magical thing that tracks user activity and enables you to search, sort, and categorize everything you do on your computer.
  • zeitgeist-fts-extension – Also known as the Zeitgeist Full Text Search Extension. This is an extension module to Zeitgeist that allows you to search your history as you briefly see in the screencast above where I search for “zeit”.
  • unity-place-files – A system service that implements all the file searching- and browsing logic in Unity. You can briefly see it in action in the screen that lists all my recent files and where I search for “zeit”. It’s also delivering all the files and folders you see in the topmost screenshot.
  • unity-place-applications – Unsurprisingly much like unity-place-files above, but applies much of the same logic to applications instead of files

Zeitgeist Proceedings

Saturday, June 12th, 2010

As was announced yesterday Zeitgeist 0.4.0 is out. Time to celebrate!

I am quite confident that it’ll be an excellent component in the infrastructure behind the Unity file handling experience. There are some slight issues I want to resolve before I am in development nirvana, but my fellow Zeitgeist developers have already more or less agreed on their solutions so that is looking dandy :-)

Generally it feels great that we are so close to “feature completeness” with regards to what we mapped out at the Zeitgeist hackfest in Bolzano last year. The last major thing to land as I see it is the storage awareness that I blogged about a while ago. We still don’t have anyone being paid to hack on the core engine, so we’re not landing boatloads of code each day, but we are chugging along at a steady pace.

As I also hint in the linked announcement at the top, we have a series of announcements coming up for the various projects related to Zeitgeist – so stay tuned!

Zeitgeist 0.3.0

Tuesday, December 1st, 2009

In between sick children, work, my daughter’s birthday, my son starting in kindergarten, and what has seemed to be an endless stream of bumps in the road which is otherwise known as life, I stole a few moments this afternoon and rolled up our first development release of the new Zeitgeist release series – Zeitgeist 0.3.0.

I am pasting the release announcement here because I know all you busy readers out there won’t have time to click a link to the gnome-announce-list archives [2] :-)

Hi,

On behalf of the Zeitgeist team I am proud to announce our first
development release, Zeitgeist 0.3.0, leading up to what will be our
stable series which will be 0.4. It is our intent to aim for a 1.0
release as soon as we feel good about the stable series, but that is
still a bit in the future. Now that we've crossed the initial hurdle
in the rewrite we expect the release cycle to be much shorter than
this one, although we have not settled on something strict yet.

As many of you know the bulk work on this release was done in the
Zeitgeist hackfest in Bolzano. Since we came back we been busy little
bees polishing it up and fixing bugs - trying not to flame each others
too much when discussing the designs :-)  Working face to face in
Bolzano gave us a unique chance to really discuss things through and
get to the bottom of the details. This will also affect other
developers a bit since...

We were bad boys and decided to change both our internal database
structures as well as our public DBus API. Sorry - but after long
discussions we all agreed that this was for the best. The new design
is leaps and bounds better than the old one. This means that you both
have to give up on your old log database, and accept that there are no
GUI written for the new API just yet. This is being worked on as you
read this though!

Something that might come as a shock to some other developers is that
we decided not to store annotations and bookmarks within Zeitgeist.
This should be done in Tracker or some other semantic metadata
storage[1]. Zeitgeist answers only when and how data was accessed, but
stores no information about the current state of the metadata. We will
be working very closely with Tracker from now on since 0.7 is a
blessed dependency for GNOME 2.30. Congrats to the Tracker Devs.

You can download the release from: https://launchpad.net/zeitgeist/+download

The NEWS entry reads:

First development release leading up to the stable 0.4 series. This
release features:

 - Complete rework of engine and DBus API. Read: apps written against 0.2.*
   will most certainly need an update (see fx.
   http://mail.gnome.org/archives/desktop-devel-list/2009-November/msg00019.html)
 - Public Python client API defined in zeitgeist.datamodel and
   zeitgeist.client modules
 - Documented public API with Sphinx (we'll have an URL for you shortly)
 - Changed Ontology from XESAM to Nepomuk.
 - Removed the Storm backend (obsoleted in 0.2.1).
 - Removed the Querymancer backend.
 - Support for event payloads (binary attachments to events)
 - An extension API for the core engine, allowing extensions direct
   access to the DB. There are already a handful extensions things in
   the works here, you will hear more about this later

There are a few DISCLAIMERS that needs to be attached to this:

 - The event notification/signals are not yet ready in the new DBus API.
   We expect to have that ready for 0.3.1.
 - We plan to support querying only for available items (eg. filtering out
   deleted files, not listing files on detached USB storage, etc.). However this
   feature is not fully supported yet, even though it is exposed in the API.
 - While we are pretty satisfied with the database layout, there may still be
   changes in the ontologies or concrete data extraction methods. This might
   require that users delete their log databases in order to rebuild them
   with the new definitions. Of course this will no longer happen when we
   go stable
 - Much related to the point above our event ontologies are not yet set in stone,
   and minor changes are expected
 - We have only one logger enabled for now. Namely the one monitoring your
   recent files. In coming releases this logger may well be deprecated in favour
   of application specific plugins.
 - And finally. Please note that this is a *development release*. We can not
   guarantee stability of services nor APIs, although we strive hard to keep
   things stable.

Cheers,

Mikkel

[1]: There have been talk about defining (and implementing) a very
simple DBus API for storing semantic annotations (bookmarks, tags,
comments, ratings, etc). In our internal speak such a service is
called a Repository. Tracker or Soprano would expose this API in most
cases, but on platforms where they are not available the simple
Repository implementation would be most handy. This being said, it is
currently not a high priority to implement a Repository, since there are
alternatives ready in Tracker and Soprano.

[2]: Sorry it’s not a direct link to the announcement, because I am still waiting for my mail to get through. If you must see it in a mail archive then check the Zeitgeist list archives on Launchpad.

What We Talk About When We Talk About Zeitgeist

Monday, November 16th, 2009

There is tangible confusion about what Zeitgeist is and what it isn’t; what it can do and what it can’t do. This is partly our own fault because we could have communicated this whole thing better. For instance, we have some very outdated wiki pages lying around that you should probably stay away from until we have updated them. In this post I aim to give a semi-technical rundown of the core Zeitgeist functionality and how we expose it for you to work with. This should hopefully clear up some confusion.

Events

The Zeitgeist daemon (also known as the engine) is a process that exposes an event logging framework as a DBus API. Each event has a block of metadata that describes the event itself (this is known simply as the event metadata) and another block of metadata that describes the subject, or subjects, that this event happened to (this part is known as the subject metadata). The metadata for the event looks like:

  • Timestamp – When did this happen. Milliseconds since the Unix Epoch. Note that we see events as single points in time, meaning that events don’t have a duration.
  • Interpretation – Abstract interpretation of this event; what happened. E.g. “opened”, “saved”, “closed”, “sent”, etc.
  • Manifestation – How the event happened. E.g. “user activity”, “notification”, or “scheduled activity”.
  • Actor – Who triggered it. This will typically point to the .desktop file of the acting application. It will most likely be an application, but it is not required to be so.
  • Payload – A free-form binary blob that you may attach to the event. This is explicitly application-specific and mainly intended to be a “back door” for people to do all sorts of funky hacks.

Each event has one or more subjects associated. For each subject we store:

  • URI – You guessed it! The URI of the subject.
  • Interpretation – Abstract interpretation of the subject. This could be “Document”, “Image”, “Video”, “Email”, “Instant message”, “Contact”, anything.
  • Manifestation – How the subject is stored. This could be something like “File”, “Mailbox”, “Web page”.
  • Origin – A URI pointing to the origin or “patron” of the subject. For files this would be the parent folder. For YouTube videos it would be http://youtube.com
  • Mimetype – The format of the datastream representing the subject. E.g. text/plain, application/xml.
  • Text – Textual information added to the subject. This is not applicable for all types of subjects.
  • Storage – Identifier for the storage medium this subject resides on. We use this to make it possible to run queries that return only events for subjects that are “available now”. E.g. some clients don’t want to show events for files that are stored on your USB pen drive when it is not connected.
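To make the two metadata blocks a bit more concrete, here is a minimal Python sketch of the structure described above. Mind you, this is not the real zeitgeist.datamodel API; the class names, field defaults, the "application://gedit.desktop" actor string and the "local" storage identifier are all assumptions made up for illustration. It also uses plain labels like "Document" where the real thing stores something more formal.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subject:
    """Subject metadata: what the event happened to (illustrative only)."""
    uri: str
    interpretation: str = ""   # e.g. "Document", "Image"
    manifestation: str = ""    # e.g. "File", "Web page"
    origin: str = ""           # parent folder for files
    mimetype: str = ""         # e.g. "text/plain"
    text: str = ""             # optional; not every subject has text
    storage: str = ""          # storage medium identifier


@dataclass
class Event:
    """Event metadata plus one or more subjects (illustrative only)."""
    timestamp: int             # milliseconds since the Unix Epoch
    interpretation: str        # what happened, e.g. "opened"
    manifestation: str         # how it happened, e.g. "user activity"
    actor: str                 # who triggered it
    subjects: List[Subject] = field(default_factory=list)
    payload: bytes = b""       # free-form, application-specific blob


def now_ms() -> int:
    """Current time as milliseconds since the Unix Epoch."""
    return int(time.time() * 1000)


# A single point-in-time event: the user opened a text file.
event = Event(
    timestamp=now_ms(),
    interpretation="opened",
    manifestation="user activity",
    actor="application://gedit.desktop",  # hypothetical actor string
    subjects=[
        Subject(
            uri="file:///home/user/notes.txt",
            interpretation="Document",
            manifestation="File",
            origin="file:///home/user",
            mimetype="text/plain",
            storage="local",  # hypothetical storage identifier
        )
    ],
)
```

Note that there is no duration field anywhere: an event is a single point in time, so "opened" and "closed" would be two separate events.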

Ontology – Or Data Description

In reality the metadata fields we store don’t contain simple strings like “Document” for the subject interpretation. It’s a bit more complex than that – sorry! We store a URI pointing to a formal definition of something categorized as a Document. This formal categorization is called an ontology if you want a word to confuse your friends with. We are fortunate enough that someone already wrote such a spec, namely the Nepomuk Ontology. So instead of just “Document” we store the string “http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#Document”.

Since Tracker also uses the Nepomuk ontologies you may take these formal classification strings and plug them directly into Tracker to find everything that Tracker considers a document.
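If you want to keep the friendly labels around in your own code, a tiny helper can translate between the two forms. This is just a sketch: the helper names nfo() and short_name() are made up, and the only term I can vouch for here is “Document”, so check any other term against the Nepomuk spec before relying on it.

```python
# Namespace prefix taken from the Document URI quoted above.
NFO = "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/"


def nfo(term: str) -> str:
    """Build the full classification URI for a term (hypothetical helper)."""
    return f"{NFO}#{term}"


def short_name(uri: str) -> str:
    """Recover the human-friendly label from a classification URI."""
    return uri.rsplit("#", 1)[-1]


document_uri = nfo("Document")
label = short_name(document_uri)
```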

We will also have an ontology for the event metadata as this is not covered by Nepomuk. We are actively working on this.

Getting Data Out – Querying the Event Log

We employ a template-based query API for searching our log data. You send us a list of event templates you want to look for and how you want the results sorted, and we give you the results. So if you want to find all “open” events on subjects of type “Document”, simply create an Event object, set its interpretation to the “open” event and add a subject to the event template with the interpretation set to “document”. All other fields should be left blank. Send this template to us and we will give you the matches.

The list of event templates is collected into a big OR-query to imbue the consumers with more power.
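The matching idea can be sketched in a few lines of Python. This is a toy in-memory model and not the engine's implementation: the trimmed-down Event and Subject classes, the function names, and the rule that every template subject must match at least one subject of the event are assumptions for illustration. Blank fields act as wildcards, and the templates in the list are OR'ed together.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subject:
    uri: str = ""
    interpretation: str = ""


@dataclass
class Event:
    interpretation: str = ""
    actor: str = ""
    subjects: List[Subject] = field(default_factory=list)


def field_matches(template_value: str, value: str) -> bool:
    """A blank template field matches anything."""
    return template_value == "" or template_value == value


def subject_matches(template: Subject, subject: Subject) -> bool:
    return (field_matches(template.uri, subject.uri)
            and field_matches(template.interpretation, subject.interpretation))


def event_matches(template: Event, event: Event) -> bool:
    if not (field_matches(template.interpretation, event.interpretation)
            and field_matches(template.actor, event.actor)):
        return False
    # Assumption: each template subject must match some subject of the event.
    return all(
        any(subject_matches(ts, s) for s in event.subjects)
        for ts in template.subjects
    )


def find_events(log: List[Event], templates: List[Event]) -> List[Event]:
    """Return events matching ANY of the templates (one big OR-query)."""
    return [e for e in log if any(event_matches(t, e) for t in templates)]


log = [
    Event("open", "app-a", [Subject("file:///a.txt", "document")]),
    Event("save", "app-a", [Subject("file:///a.txt", "document")]),
    Event("open", "app-b", [Subject("file:///b.png", "image")]),
]

# All "open" events on subjects of type "document"; everything else is blank.
open_documents = Event(interpretation="open",
                       subjects=[Subject(interpretation="document")])
matches = find_events(log, [open_documents])

# Two templates in one query: "open documents" OR "any save event".
either = find_events(log, [open_documents, Event(interpretation="save")])
```

Here matches contains only the first event, while either also picks up the "save" event because the second template matches it.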

Getting Data Into Zeitgeist

There are really no limits to what kind of events we could store. If you have a spare mobile with a built-in accelerometer and glue it to your front door, then you could send an event over Bluetooth to your desktop each time your front door opens. Probably there are better use cases?

The point is that the usefulness of Zeitgeist stands and falls with the events that you push into us. We can store anything that you can model using the structures I outlined above. I am pretty certain that people will not agree on the kinds of data they want logged, but we are ready for anything :-)

Normal users would of course not need to think about getting their data into Zeitgeist. What developers need to know is that we have a simple DBus API to insert events (surprisingly called InsertEvents). It is called InsertEvents and not AppendEvents or something like that for a reason, namely that you are allowed to insert events that are in the past. This is useful if you want to import your Firefox history or whatever. If you try to log an event twice the engine will throw an error at you, so no need to worry about dupes.
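To illustrate the insert semantics, here is a toy in-memory log in Python. It is emphatically not the DBus API: the EventLog class, the DuplicateEventError name and the rule that a duplicate is an event identical in every field are assumptions made up for this sketch. What it does show is that inserting an event from the past is fine, and that logging the same event twice is rejected.

```python
import bisect
from typing import List, NamedTuple


class Event(NamedTuple):
    timestamp: int          # milliseconds since the Unix Epoch
    interpretation: str
    actor: str
    subject_uri: str


class DuplicateEventError(Exception):
    """Raised when the exact same event is inserted twice."""


class EventLog:
    def __init__(self) -> None:
        self._seen = set()
        self._events: List[Event] = []

    def insert_events(self, events: List[Event]) -> None:
        for event in events:
            if event in self._seen:
                raise DuplicateEventError(f"already logged: {event}")
            self._seen.add(event)
            # Insert, not append: tuples compare by timestamp first, so the
            # log stays in time order even when an event lies in the past.
            bisect.insort(self._events, event)

    def events(self) -> List[Event]:
        return list(self._events)


log = EventLog()
log.insert_events([Event(2000, "opened", "firefox", "http://example.org/")])
# Importing old browser history: this event happened before the one above.
log.insert_events([Event(1000, "opened", "firefox", "http://example.com/")])

try:
    log.insert_events([Event(1000, "opened", "firefox", "http://example.com/")])
    duplicate_rejected = False
except DuplicateEventError:
    duplicate_rejected = True
```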

Ok. I think that about wraps up what I intended to say for now. Hope it’s useful to at least one person out there! :-)

Zeitgeist Hackfest Reflections

Friday, November 13th, 2009

Sitting here on my final day of the Zeitgeist hackfest. I really wanted to blog more, but I’ve been so busy hacking that I didn’t find the time. I hardly slept or ate either! I will follow up on this post with more technical details from the hackfest.

The stay here in Bolzano has been very nice. Fresh mountain air and very warm conditions compared to my usual Danish habitat. I want to give a warm thanks to our sponsors and organizers for making this possible. The whole arrangement has been run very smoothly and I don’t have a single thing to complain about. So big props go to our sponsors:

And let us not forget the horde of local Bolzanites (what’s the correct term for people living in Bolzano again?) who helped us around the city and helped us at the CTS (our hacking venue). They showed an extraordinary display of patience and hospitality helping the flock of geeks around :-) Thanks guys and girls!

As I mentioned in the first paragraph we’ve been extraordinarily busy hacking. More or less skipping lunch and supper, hacking until we collapsed from fatigue. I was working fully in the engine team and I really really wanted us to have the new engine in a working and unit-tested state after the hackfest (so we can give you a development release during the next week hopefully). I had a moment of despair in the middle of the week where bugs just continued to pop up and we had a hard time coming up with the right architecture. With a relentless effort from the entire engine team we pulled it through in the end though, and I am really happy about our new design and API. Perhaps I am just a tiny bit biased, but I really feel that our new design feels “just right” :-)

The UI team was very busy as well and I saw lots of very cool stuff being hacked out. Let’s hope that they choose to share some of the bling with us :-) I know for a fact that they made huge progress on the Gnome Activity Journal and I can’t wait to have a release of this ready on top of the new engine.

That’s all for now, I intend to follow this post up with a more technical post later. Ciao!