Posts Tagged ‘Gnome’

Zeitgeist Hackfest Day 1

Monday, February 7th, 2011

So. The first day of the Zeitgeist hackfest has ended. The venue at the CS department of Aarhus University worked really well. The wifi worked without a hitch and we all got keys and keycards to access the premises of the Incuba Science Park all 24 hours of the day. Awesome.

We spent the first 2 hours figuring out what items to focus on and who does what, and Seif compiled it into a list of assignments on the wiki.

Seif, Morten, and I talked about what pieces we needed to fit together to make the Zeitgeist and Telepathy integration work perfectly. Got some nice and simple work items nailed down that’ll take us a long way.

While we did that Michal updated the zeitgeist-datahub to listen for the new DBus signal emitted by GDesktopAppInfo when apps are launched. He also worked with Colin Walters on a nifty little patch that adds some extra info to the DBus signal that will make Zeitgeist even more clever when logging your app usage patterns. Uh… And Michal is brewing up something awesome for you as well, but I shall not steal his thunder :-)

My personal little project was to update my “Storage Awareness” branch from a while back and make it ready to merge to trunk. There are some kinks to iron out before it’s ready, but I’m thrilled about the prospect of getting this done.

Oh! And there is a #zeitgeisthackfest hashtag if you wanna bigbrother us on Twitter :-)

Zeitgeist Hackfest

Sunday, February 6th, 2011

Prepping up for the Zeitgeist hackfest which is kicking off tomorrow in Aarhus, Denmark. You’ve probably not heard a lot about this event before this late moment – that’s because it has all happened a bit fast. As we were internally discussing the possibilities of a hackfest a bit back, it quickly became evident that we needed to hold it Very Soon Now (TM) if we wanted all the core maintainers to have a chance of showing up.

I was wincing a bit because we recently expanded our family (can you believe I have three kids now..? I’m not sure I can) and I wasn’t very keen on traveling more than I already do with my work. Seif, being the man of action that he is, didn’t let that put him off and arranged that we could hold the hackfest conveniently close to my home. Not only that, but he pretty much made all of the necessary arrangements for getting a cool venue, accommodation, and not least – getting some sponsors to help us out. Seif – this one is to you – you rock man!

The sponsors are the GNOME Foundation and Collabora, and the venue will be the CS Department at Aarhus University, in the Incuba Science Park. All have been incredibly helpful despite our short notice. Thanks to everyone involved!

I’m gonna have to keep you in suspense a bit about what we intend to do with this precious opportunity we’ve been given. I’m just too tired right now – but my plan is to have a short daily log posted on my blog each day. So by the end of this week you should all hopefully have an idea :-) Stay tuned.

So… What is it that you’re doing again?

Sunday, August 15th, 2010

It’s now over 4 months since I started at Canonical, so a retrospective blog post might be in order by now :-)

I will try and keep a not-too-technical tone in this blog post as there seems to be quite a lot of non-technical people reading my blog as well. I’m getting a lot of those “So… What is it that you’re doing again? I don’t understand much from your blog posts”. So here’s to you guys and gals! ;-)

As you may know I spend most of my time here hacking on Unity – a new super shiny user interface for netbooks. So if you wanna be cooler than all your friends you will replace Windows on your netbook with Unity running on Ubuntu, and it will look something like this:

Unity

Or view a full screencast I did to demo some of the cool stuff we have been working on (please note that this is the in-development software and not the final product):

Unity Development Demo from Mikkel Kamstrup Erlandsen on Vimeo.

The code I write runs just below all the fancy graphics you see and wires up all the components and data models that end up as nice little icons on your screen.

So, in a little more detail than you might be interested in, these “components and data models” are:

  • dee – A system library that enables applications to share small in-memory databases. For tech-savvy people: dee is a library that implements some peer-discovery and peer-to-peer tables over dbus (and lots of nifty helper APIs around this)
  • libzeitgeist – A system library that enables applications to talk to a system service called Zeitgeist. The confusing part here is that Zeitgeist is what I develop in my spare time :-) Zeitgeist is a small magical thing that tracks user activity and enables you to search, sort, and categorize everything you do on your computer.
  • zeitgeist-fts-extension – Also known as the Zeitgeist Full Text Search Extension. This is an extension module to Zeitgeist that allows you to search your history as you briefly see in the screencast above where I search for “zeit”.
  • unity-place-files – A system service that implements all the file searching and browsing logic in Unity. You can briefly see it in action in the screen that lists all my recent files and where I search for “zeit”. It’s also delivering all the files and folders you see in the topmost screenshot.
  • unity-place-applications – Unsurprisingly much like unity-place-files above, but applies much of the same logic to applications instead of files

Zeitgeist Proceedings

Saturday, June 12th, 2010

As was announced yesterday Zeitgeist 0.4.0 is out. Time to celebrate!

I am quite confident that it’ll be an excellent component in the infrastructure behind the Unity file handling experience. There are some slight issues I want to resolve before I am in development nirvana, but my fellow Zeitgeist developers have already more or less agreed on their solutions so that is looking dandy :-)

Generally it feels great that we are so close to “feature completeness” with regards to what we mapped out at the Zeitgeist hackfest in Bolzano last year. The last major thing to land as I see it is the storage awareness that I blogged about a while ago. We still don’t have anyone being paid to hack on the core engine, so we’re not landing boatloads of code each day, but we are chugging along at a steady pace.

As I also hint in the linked announcement at the top, we have a series of announcements coming up for the various projects related to Zeitgeist – so stay tuned!

Zeitgeist Storage Awareness

Sunday, January 17th, 2010

Leading up to our last Zeitgeist release (0.3.1) I hacked up our new Blacklisting and Monitoring APIs, both quite fun work and very useful APIs, if I may say so myself :-) But I regret not blogging about it as I wrote it – we gotta keep them olde hype-wheels a’turnin’. So here we go about the next feature on my plate…

Storage Awareness

So what does the buzz wordy term “Storage Awareness” cover? We had a few requests from application developers like:

  1. “I don’t want to show online resources when there is no network interface up”
  2. “I don’t want to show work related to files on a disconnected USB drive”
  3. Or another one I just came up with: “When I plug in my USB-drive show my recent activity on that device”
  4. Very much related is how deleted files should be handled in the results, but I will not discuss that right now

Since Zeitgeist is a log and not a snapshot of your environment we will keep the information around even if you delete files or detach your storage devices. So you might indeed get data about subjects that are not readily available when you query the log. However, the use cases above seem valid, and applications stat()ing each file:// URL in the result set seems like a very bad idea, so it would be nice if we could help a bit with this. Even though we are “just” a log, that doesn’t mean we can’t provide some nifty API for application developers.

So our query API has a flag that filters events to only those with subjects that are “available right now”. It has not been functional until now, but it will be for 0.3.2. Since we also log which storage medium each event subject uses, one can also ask for recent or popular stuff on a given storage medium.

Storage Identifiers.. Help?

We associate each subject URL with a storage medium via a unique string identifier. For stuff like USB drives we have the UUID readily available from GIO. For online resources we simply use the id “net” and I use NetworkManager to check for network availability (ConnMan should be easy too).
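To make the idea a bit more concrete, here is a minimal Python sketch of how such a filter could work on the client side of the fence. It is not the engine code: the event dictionaries, the function names, the "local" id, and the made-up USB UUID are all assumptions for illustration. Only the "net" id comes from the description above. I also assume an event counts as available only when all of its subjects are on available storage.

```python
# Hypothetical sketch, not the Zeitgeist engine. Events are plain dicts and
# each subject is a (uri, storage_id) pair.
NET_STORAGE_ID = "net"  # the id used for online resources, as described above


def filter_available(events, available_storage):
    """Keep only events whose subjects all live on storage available right now."""
    return [
        event for event in events
        if all(storage in available_storage for _uri, storage in event["subjects"])
    ]


def events_on_storage(events, storage_id):
    """Return the events that touch at least one subject on the given medium."""
    return [
        event for event in events
        if any(storage == storage_id for _uri, storage in event["subjects"])
    ]


# "local" and the USB id below are invented for the example.
events = [
    {"timestamp": 1, "subjects": [("file:///home/me/notes.txt", "local")]},
    {"timestamp": 2, "subjects": [("file:///media/usb/talk.odp", "4f1a-usb-uuid")]},
    {"timestamp": 3, "subjects": [("http://youtube.com", NET_STORAGE_ID)]},
]

# USB drive unplugged, network up: the event on the USB drive is dropped.
visible_now = filter_available(events, {"local", NET_STORAGE_ID})
# Drive just plugged in: show what happened on it recently.
on_usb = events_on_storage(events, "4f1a-usb-uuid")
```

The point is that the set of available storage ids is maintained in one place (fed by GIO and NetworkManager), so nobody has to stat() every URL in a result set.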

So far so good, but I have not been able to handle CDs (both audio and data) and DVDs properly yet. I am not a storage format expert so I don’t know if it’s even possible to obtain a unique identifier for a given data CD (or what have you) – at least I can only get the disk label from GIO and that is not unique (but it might very well be “unique enough” for this to work well in practice). So any help on obtaining real unique ids for CDs and/or DVDs would be appreciated. Note that I would like to use the G* stack and not introduce funky dependencies – and I am also not going to read the first N bytes and checksum those.

The next bump in the road is that it seems I cannot get the disk label from within gio.VolumeMonitor‘s "volume-removed" signal handler (calling volume.get_identifier()). I just get a None whereas I get the right label in the "volume-added" handler. I can probably figure this one out, but any ideas are appreciated.

The Code

Hold on… While you can indeed dig out the code from Launchpad (it’s not a secret), I would recommend that you wait a bit just yet. It’s not ready for testing (and not even wired up in the engine so you have to do that yourself). So no code pointer for you, sorry :-)

What We Talk About When We Talk About Zeitgeist

Monday, November 16th, 2009

There is a tangible confusion around as to what Zeitgeist is and what it isn’t; what it can do and what it can’t do. This is partly our own fault because we could have communicated this whole thing better. For instance we have some very outdated wiki pages lying around that you should probably stay away from until we have updated them. In this post I aim to give a semi-technical rundown of the core Zeitgeist functionality and how we expose it for you to work with. This should hopefully clear up some confusion.

Events

The Zeitgeist daemon (also known as the engine) is a process that exposes an event logging framework as a DBus API. The structure of these events is that they have a block of metadata that describe the event itself (this is known simply as the event metadata) and another block of metadata that describes the subject, or subjects, that this event happened to (this part is known as the subject metadata). The metadata for the event looks like:

  • Timestamp – When did this happen. Milliseconds since the Unix Epoch. Note that we see events as single points in time, meaning that events don’t have a duration
  • Interpretation – Abstract interpretation of this event; what happened. Fx. “opened”, “saved”, “closed”, “sent”, etc.
  • Manifestation – How the event happened. Fx. “user activity”, “notification”, or “scheduled activity”.
  • Actor – Who triggered it. This will typically point to the .desktop file of the acting application. It will most likely be an application, but it is not required to be so.
  • Payload – A free-form binary blob that you may attach to the event. This is specifically application specific and mainly intended to be a “back door” for people to do all sorts of funky hacks.

Each event has one or more subjects associated. For each subject we store:

  • URI – You guessed it! The URI of the subject
  • Interpretation – Abstract interpretation of the subject. This could be “Document”, “Image”, “Video”, “Email”, “Instant message”, “Contact”, anything.
  • Manifestation – How the subject is stored. This could be something like “File”, “Mailbox”, “Web page”.
  • Origin – A URI pointing to the origin or “patron” of the subject. For files this would be the parent folder. For YouTube videos it would be http://youtube.com
  • Mimetype – The format of the datastream representing the subject. Fx. text/plain, application/xml.
  • Text – Textual information added to the subject. This is not applicable for all types of subjects.
  • Storage – Identifier for the storage medium this subject resides on. We use this to make it possible to run queries that return only events for subjects that are “available now”. Fx. some clients don’t want to show events for files that are stored on your USB pen drive when it is not connected.
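If you think better in code than in bullet lists, the two blocks of metadata can be sketched as a pair of Python dataclasses. This is only an illustration of the structure described above, not the actual Zeitgeist classes. The class layout, the defaults, the "application://" actor format, and the "local" storage id are my assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subject:
    uri: str                  # URI of the subject
    interpretation: str = ""  # what the subject is, fx. "Document"
    manifestation: str = ""   # how it is stored, fx. "File"
    origin: str = ""          # fx. the parent folder of a file
    mimetype: str = ""        # fx. "text/plain"
    text: str = ""            # optional textual information
    storage: str = ""         # id of the storage medium


@dataclass
class Event:
    timestamp: int            # milliseconds since the Unix Epoch, no duration
    interpretation: str       # what happened, fx. "opened"
    manifestation: str        # how it happened, fx. "user activity"
    actor: str                # who triggered it, typically a .desktop file
    subjects: List[Subject] = field(default_factory=list)
    payload: bytes = b""      # free-form, application specific blob

    def __post_init__(self):
        # Each event has one or more subjects associated.
        if not self.subjects:
            raise ValueError("an event needs at least one subject")


# The actor format and storage id here are invented for the example.
opened = Event(
    timestamp=1258329600000,
    interpretation="opened",
    manifestation="user activity",
    actor="application://gedit.desktop",
    subjects=[
        Subject(
            uri="file:///home/me/notes.txt",
            interpretation="Document",
            manifestation="File",
            origin="file:///home/me",
            mimetype="text/plain",
            storage="local",
        )
    ],
)
```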

Ontology – Or Data Description

In reality the metadata fields we store don’t contain simple strings like “Document” for the subject interpretation. It’s a bit more complex than that – sorry! We store a URI pointing to a formal definition of something categorized as a Document. This formal categorization is called an ontology if you want a word to confuse your friends with. We are fortunate enough that someone already wrote such a spec, namely the Nepomuk Ontology. So instead of just “Document” we store the string “http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#Document”.

Since Tracker also uses the Nepomuk ontologies you may take these formal classification strings and plug them directly into Tracker to find everything that Tracker considers a document.

We will also have an ontology for the event metadata as this is not covered by Nepomuk. We are actively working on this.

Getting Data Out – Querying the Event Log

We employ a template based query API for searching our log data. You send us a list of event templates you want to look for and how you want the results sorted, and we give you the results. So if you want to find all “open” events on subjects of type “Document”, simply create an Event object, set its interpretation to “open”, and add a subject to the event template with the interpretation set to “Document”. All other fields should be left blank. Send this template to us and we will give you the matches.

The list of event templates is collected into a big OR-query to imbue the consumers with more power.
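Here is a small, self-contained Python sketch of what template matching with OR semantics could look like. It is an illustration, not the engine implementation. Events and templates are plain dicts here, the function names are made up, and I assume that a blank field means "match anything" and that each subject template must match at least one subject of the event.

```python
def subject_matches(template, subject):
    """Blank template fields are wildcards; the rest must match exactly."""
    return all(subject.get(key) == wanted
               for key, wanted in template.items() if wanted)


def event_matches(template, event):
    """Check the event fields first, then every subject template."""
    for key, wanted in template.items():
        if key == "subjects" or not wanted:
            continue
        if event.get(key) != wanted:
            return False
    return all(
        any(subject_matches(subject_template, subject)
            for subject in event["subjects"])
        for subject_template in template.get("subjects", [])
    )


def find_events(templates, log):
    """An event is a hit if it matches ANY of the templates (the big OR)."""
    return [event for event in log
            if any(event_matches(template, event) for template in templates)]


log = [
    {"interpretation": "opened", "actor": "gedit",
     "subjects": [{"uri": "file:///a.txt", "interpretation": "Document"}]},
    {"interpretation": "saved", "actor": "gedit",
     "subjects": [{"uri": "file:///a.txt", "interpretation": "Document"}]},
    {"interpretation": "opened", "actor": "eog",
     "subjects": [{"uri": "file:///b.png", "interpretation": "Image"}]},
]

opened_documents = {"interpretation": "opened",
                    "subjects": [{"interpretation": "Document"}]}
anything_on_images = {"subjects": [{"interpretation": "Image"}]}

only_docs = find_events([opened_documents], log)
docs_or_images = find_events([opened_documents, anything_on_images], log)
```

Within one template all the fields you fill in must match, so a template narrows things down, while adding more templates to the list widens the result.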

Getting Data Into Zeitgeist

There are really no limits to what kind of events we could store. If you have a spare mobile with a built-in accelerometer and glue it to your front door then you could send an event over bluetooth to your desktop each time your front door opens. Probably there are better use cases?

The point is that the usefulness of Zeitgeist stands and falls with the events that you push into us. We can store anything that you can model using the structures I outlined above. I am pretty certain that people will not agree on the kinds of data they want logged, but we are ready for anything :-)

Normal users would of course not need to think about getting their data into Zeitgeist. What developers need to know is that we have a simple DBus API to insert events (surprisingly called InsertEvents). It is called InsertEvents and not AppendEvents or something like that for a reason, namely that you are allowed to insert events that are in the past. This is useful if you want to import your Firefox history or whatever. If you try to log an event twice the engine will throw an error at you, so no need to worry about dupes.
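The two properties above (inserting into the past is fine, inserting the same event twice is an error) can be shown with a toy in-memory log in Python. This is not the DBus API and not how the engine stores things. The class, the exception, and in particular my definition of "the same event" (same timestamp, interpretation, actor, and subject URIs) are assumptions made for the sketch.

```python
import bisect


class DuplicateEventError(Exception):
    """Raised when an identical event is logged a second time."""


class EventLog:
    def __init__(self):
        self._events = []   # event keys, kept sorted by timestamp
        self._seen = set()  # the same keys, for fast duplicate checks

    def insert_events(self, events):
        for event in events:
            key = (event["timestamp"], event["interpretation"],
                   event["actor"], tuple(event["uris"]))
            if key in self._seen:
                raise DuplicateEventError("event already logged: %r" % (key,))
            self._seen.add(key)
            # Insert, not append: an event in the past lands in its proper place.
            bisect.insort(self._events, key)

    def timestamps(self):
        return [key[0] for key in self._events]


event_log = EventLog()
event_log.insert_events([
    {"timestamp": 200, "interpretation": "opened", "actor": "gedit",
     "uris": ["file:///a.txt"]},
])
# Importing old history later on is fine.
event_log.insert_events([
    {"timestamp": 100, "interpretation": "opened", "actor": "firefox",
     "uris": ["http://youtube.com"]},
])
```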

Ok. I think that about wraps up what I intended to say for now. Hope it’s useful to at least one person out there! :-)

Zeitgeist Status Update

Tuesday, November 3rd, 2009

I just posted a long status update on the Zeitgeist project to Gnome’s desktop-devel mailing list. I bring it here in bloggified form to help spread the word past the desktop-devel crowd.

With the 2.30 module deadline passed it seems appropriate that we give a status report from the Zeitgeist team.

Since there has been a good deal of confusion about what Zeitgeist is, and isn’t, I will try to clear this up in this mail as well. I will try to stay low on the buzz word factor and leave some of the more exotic use cases out to avoid too much speculation.

Zeitgeist in 1 sentence

Zeitgeist is an event logging framework used to keep a log of user activity in a structured way.

What new services do we provide for UIs and applications

Zeitgeist provides a DBus API to query and update the activity log. Clients can query on time ranges, the acting applications, mimetypes, and Nepomuk classifications of the subjects and events. Sorting can be done on various criteria such as usage frequency and recent usage.

Concrete examples could be “Get me most used files of mimetypes x,y or z between the months January till March”
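As a rough illustration of what such a query boils down to, here is a Python sketch that ranks URIs by usage count within a time range and a set of mimetypes. It runs over a plain list of dicts and is not the DBus API. The function name, the flat event layout, and the half-open time range are assumptions for the example.

```python
from collections import Counter


def most_used(events, mimetypes, start_ms, end_ms, limit=10):
    """Rank URIs by how many events hit them in [start_ms, end_ms)."""
    counts = Counter(
        event["uri"] for event in events
        if event["mimetype"] in mimetypes
        and start_ms <= event["timestamp"] < end_ms
    )
    return [uri for uri, _count in counts.most_common(limit)]


# Timestamps are milliseconds; the values are made up.
events = [
    {"timestamp": 10, "uri": "file:///report.odt", "mimetype": "application/xml"},
    {"timestamp": 20, "uri": "file:///notes.txt", "mimetype": "text/plain"},
    {"timestamp": 30, "uri": "file:///notes.txt", "mimetype": "text/plain"},
    {"timestamp": 40, "uri": "file:///photo.png", "mimetype": "image/png"},
    {"timestamp": 99, "uri": "file:///report.odt", "mimetype": "application/xml"},
]

top = most_used(events, {"text/plain", "application/xml"}, start_ms=0, end_ms=50)
```

Because every usage event is kept in the log, the ranking can be recomputed for any time range you like.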

One can also query for documents that are used in context with others. As in “Which documents/websites are used with http://youtube.com within the last week”.

It is also possible for the applications to get notified when the log is updated. This is for instance used by the Parental Control application as well as the GNOME Activity Journal.

What Problems can we solve

The straightforward use case is as a GtkRecentManager on drugs. Zeitgeist removes the need for each application to parse a big XML file to retrieve recently used documents. It also removes the need to ever truncate your usage history; our database format is compact and can easily contain years of history. My estimation is that 1M log entries will take up about 80 MB (give or take 20 MB).

Zeitgeist also opens up a range of query capabilities that GtkRecentManager doesn’t provide. Instead of simply storing the most recent usage event on a resource we store all usage events. This way we can not only answer when the most recent use was, but also account for the entire usage history.

One use case that is already in the works is having the most used resources within the last 3 weeks for an app in the context menu in a window list. This is for example done in Docky.

Looking past just logging resource usage we will also start monitoring window and document focus times. This opens up a whole new world of contextual relevancy that I won’t elaborate on here. I am trying to stick to the more down to earth aspects of Zeitgeist.

Which processes/daemons do we run

Zeitgeist itself is a single DBus daemon. Where the picture gets a little more fuzzy is how we collect events. The long term goal is for apps to submit events, maybe hooking directly into GtkRecentManager, or in any case provide a very convenient way for apps to do this. Apps like Pidgin or Empathy would probably need some plugin for logging usage statistics of your contacts.

Right now we resort to less elegant ways of collecting events, like running a separate daemon harvesting Firefox’s history, GtkRecentlyUsed’s and other applications’ history (this daemon is also known as the datahub). The datahub is already on its way to becoming redundant now that a Firefox extension is in the works (and one for Epiphany already exists). It is our intent that the datahub should eventually go away as application support becomes widespread, but it may eventually still prove useful for usage together with online services.

How resource hungry are we

Normal memory usage is around 5-10mb for the core Zeitgeist daemon. The datahub process (and I repeat; we want to get rid of this) is about 12mb.

What dependencies

Right now the daemon depends on SQLite, Python 2.5, python-gobject, python-xdg, and python-dbus. For the datahub we additionally need python-gconf and python-gtk2, but the datahub is optional.

Future plans

We have spent a lot of time planning and designing lately. When we have a stable reference implementation of our design in Python we plan to use that as a template for a C implementation. To be clear – the C version will be log-format and API compatible with the Python version.

We plan to make good use of the upcoming Zeitgeist hackfest and should have a 0.3 development release ready shortly after. If we are happy about the 0.3 series we will rename it to 0.9 and go for a 1.0.

Regarding Gnome 3.0 I think we are in a situation much like Owen Taylor recently outlined for Gnome Shell on the release-team mailing list. If we are desperate for Zeitgeist to be included in a Gnome 3.0 this March I believe it would be doable. It will require that we really bust our backs and cut some corners, but it’s doable. Personally (not speaking for the Zeitgeist team here) I am not sure it would be a very good idea for the same reasons Owen mentions.

Relation to Tracker and Other Semantic Technologies

The very short version of this is that Tracker and Zeitgeist do not depend on each other in any way. The catch however is that either one becomes a whole lot more powerful when working together. To take an example, consider tagging. Zeitgeist is just a log so we don’t manage your tags; we are however fully equipped to understand events concerning your tags. So you manage the tags via Tracker and track their usage in Zeitgeist. The combined power enables one to reason about what tags relate to resources in a temporal manner, even with resources that are not tagged.

In the Zeitgeist world we call an application like Tracker a Repository. Nepomuk or Desktop-CouchDB might work as other Repositories. If there is some confusion in this area it is understandable, since we do have some Repository-like features in our 0.2 series. These are however removed from the 0.3 series. It is still undecided if we want to define a minimal Repository DBus API for Zeitgeist and then ship a reference impl. of this API (which would run in a separate process). Any full fledged Repository would be able to own the Repository service on the bus and Zeitgeist would not run its own. But again let me stress that a Repository is not needed for the Zeitgeist Log daemon to be useful.

!Windows

Tuesday, August 25th, 2009

Popup windows, Utility windows, Modal dialogs, Warning windows, Schmindow window! Arrrgh! Why do we have them in the first place?

Lately, like the past few months, I’ve been ever increasingly annoyed with application windows of all kinds. Why is it that almost all applications pop up tonnes of small windows for all kinds of odd purposes? Talking about elegance, I am pretty sure that we could provide a more elegant solution without a new window popping up in 95% of the cases.

For instance I really like the new GtkInfoBar as seen in GEdit below, one less dialog right there:

I also think that Apple has been ingenious with their Sheet Windows:

For those unfamiliar with sheet windows, they roll out from the top of your window and act as document- or application-modal dialogs, but contrary to dialogs they are not full fledged windows, but appear as “attached” to the application.

I am not saying that we should blindly copy the competition, but I really think that we could give our users a really smooth ride if we don’t pop up new windows unless we absolutely have to.

Window Manager?

I actually think that most modern window managers do a respectable job, but when apps start throwing random windows all over the screen there’s really not much they can do to make it look pretty. But why do the apps create all these windows that I am talking about then? Well, I don’t think they have ever had much of a choice. With the inclusion of GtkInfoBar in Gtk+ this has become a bit better, but we can take it so much further.

Example

Let me stress that I am not picking on Synaptic in particular, it is just a convenient example. In general I like Synaptic a lot. Try counting the number of windows spawned when installing a new package. Counting the root password dialog and a single “find” action into the entire process I counted 7 windows. I can think of no reason (other than toolkit constraints) that there would ever be more than 1 window needed.

The One Window App

I think it would be cool to try and constrain applications to only one window (with the obvious exceptions like file managers etc.). Preference- , warning-, and Utility dialogs could all be integrated into the main window in some manner.

As a first stab one could convert all (modal) dialogs to borderless overlay windows and dim out the background. These overlay windows should be immovable and stick to the parent window when it is moved, much like the sheet windows mentioned above. This might even be testable by hacking up GtkDialog.

More advanced window-internalling techniques such as rollouts and morphing window layouts would likely require bigger changes in the applications.

I am on Crack

Yeah, I know that this is all a pipe dream, and that there are lots of pitfalls and stuff I haven’t considered. All these pesky windows are just getting on my nerves!

I hope that there is at least someone out there who agrees with me on this whole deal… Somebody..?

… anybody?

Zeitgeist API Ramblings

Saturday, August 1st, 2009

I’ve been spending a lot of my brain cycles lately thinking about how to design the Zeitgeist DBus API properly. Let me tell you that it is a tough nut to crack – that, or I am not very good at this stuff :-)

Please note that this is merely a personal brain dump. I have not discussed this with the other Zeitgeist developers yet.

Zeitgeist Object Model

Before I can get to the real problem we have to be on the same page regarding the object model employed by Zeitgeist.

Here’s the deal. First chant with me: “Zeitgeist is an event logging framework”. For Zeitgeist to provide a useful API it needs to know about more things than just Events. It needs to know some minimal stuff about the Items that make up your desktop (files, emails, contacts, online services, etc.). It also needs to consider that these items can have Annotations such as tags, ratings, comments, being bookmarked, or other. Behold my graphical superiority:

Zeitgeist event, item, and annotation relations

As it stands both Annotations and Events are considered subclasses of Item in an Object Oriented sense. This means that we can have Annotations on Annotations and Events on Annotations (or Events on Events!).

I am definitely of the opinion that Annotations should be first class Items in their own right; however, I am not sure about Events. The case about Events being Items or not is mostly technical. From an ideological POV I think it is great if everything inherits from Item (more flexibility – yay!). The catch is that we can cut down on the DB size if Events don’t subclass Item – hence the question marks on the label.

You may be able to grasp the Zeitgeist data model better if you read the database design spec.
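To make the "everything is an Item" idea a bit more concrete, here is a minimal sketch in Python. The class names mirror the terms used above, but every field (uri, kind, value, subject, timestamp, action) is made up for illustration and is not the actual Zeitgeist data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Anything on the desktop: a file, an email, a contact, ..."""
    uri: str
    # Annotations attached to this item (tags, ratings, comments, ...)
    annotations: List["Annotation"] = field(default_factory=list)


@dataclass
class Annotation(Item):
    """An Annotation is itself an Item, so it can be annotated or logged too."""
    kind: str = "tag"      # hypothetical field
    value: str = ""        # hypothetical field


@dataclass
class Event(Item):
    """An Event modelled as an Item too (the open question in the text)."""
    subject: Optional[Item] = None   # what the event happened to
    timestamp: int = 0
    action: str = "visit"


# A tagged document, an event on the document, and an event on the tag itself.
doc = Item(uri="file:///home/me/report.odt")
tag = Annotation(uri="tag://work", kind="tag", value="work")
doc.annotations.append(tag)
opened = Event(uri="event://1", subject=doc, timestamp=100, action="open")
tagged = Event(uri="event://2", subject=tag, timestamp=160, action="create")
```

Because Annotation and Event both inherit from Item, "Events on Annotations" falls out for free. The price is that every Event carries the Item baggage, which is the DB size concern mentioned above.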

The Problem: Querying

So we want to utilize this rich event log to query for interesting relations between Items (and by Item I will also mean any subclass thereof). Listing the most recently used tags would be a simple use case. The same application would probably also be interested in listing all the existing tags – or maybe more advanced, all tags on some specific subset of Items such as all files (and probably also a lot of more complex things).

The current Zeitgeist can do this, no problem. There is however the subtle problem that Zeitgeist is not meant to manage your tags. Zeitgeist is an Event Log – recall?

So assume that Item metadata and Annotations are stored elsewhere, in Tracker, CouchDB, Midgard, Soprano, Ikea or wherever you want. Where they should be. Let me call this sacred silo of user data the Repository. Applications will generally be interested in doing what amounts to an SQL JOIN over the Repository and the Zeitgeist event log. That is, cross referencing both and selecting subsets of data from both based on the relations between them.

If the Repository and Zeitgeist can’t cooperate in some way (e.g. by living in the same process with access to the same resources), these “JOINs” can only be accomplished by fetching broad selections of data from the Repository and having Zeitgeist filter out the relevant parts before it returns the data to you. This will perform like crap.
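A toy illustration of the difference, using the "most recently used tags" use case and Python's sqlite3 module. Both tables live in one in-memory database here, so the table layout and the two function names are assumptions made purely for the sake of the example. The first function lets the database do the JOIN. The second one mimics two separate services by fetching everything from the "Repository" side and cross referencing by hand.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Stand-in for the Repository: which tag sits on which item
    CREATE TABLE tags (item_uri TEXT, tag TEXT);
    -- Stand-in for the Zeitgeist event log
    CREATE TABLE events (item_uri TEXT, timestamp INTEGER);
""")
db.executemany("INSERT INTO tags VALUES (?, ?)", [
    ("file:///a.txt", "work"),
    ("file:///b.txt", "holiday"),
    ("file:///c.txt", "work"),
])
db.executemany("INSERT INTO events VALUES (?, ?)", [
    ("file:///a.txt", 100),
    ("file:///b.txt", 300),
    ("file:///c.txt", 200),
])


def recent_tags_joined(limit):
    """Both data sets in one place: a single JOIN does the work."""
    rows = db.execute("""
        SELECT t.tag, MAX(e.timestamp) AS last_used
        FROM tags t JOIN events e ON e.item_uri = t.item_uri
        GROUP BY t.tag ORDER BY last_used DESC LIMIT ?
    """, (limit,))
    return [tag for tag, _ in rows]


def recent_tags_split(limit):
    """Separate services: fetch a broad selection, then filter by hand."""
    all_tags = db.execute("SELECT item_uri, tag FROM tags").fetchall()
    last_event = dict(db.execute(
        "SELECT item_uri, MAX(timestamp) FROM events GROUP BY item_uri"))
    last_used = {}
    for uri, tag in all_tags:
        if uri in last_event:
            last_used[tag] = max(last_used.get(tag, 0), last_event[uri])
    return sorted(last_used, key=last_used.get, reverse=True)[:limit]


joined = recent_tags_joined(2)
split = recent_tags_split(2)
```

Both return the same answer, but the split version has to ship every (item, tag) pair across the process boundary before it can throw most of them away. With three rows nobody cares; with a real Repository it hurts.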

Raising the Questions

  • Separation – How should we separate the Repository and the event log? That is, if it makes sense to do so at all. To me they are two very different things.
  • API – Should Zeitgeist expose what it knows about the Repository as some “weak Repository API”, essentially acting as a proxy? This could make sense because you would be able to run without a full blown repository, using only the limited Repository functionality that Zeitgeist can provide natively.
  • Query Language vs. API – How does one query the Repository+Log? Currently we have a nifty API utilising some “filters” you pass to the methods to limit what types of data you query. My playing around with this says that it is simply not powerful enough. It will power the ideas we have now, but what about next year? One really needs the power of a query language to reap the fruits of the event log. That obviously raises the question of which query language to expose:
    • SPARQL – Powerful, but a very hard (read: impossible) requirement if we want alternative backends.
    • MQL – The JSON based query language of the Freebase project. The json-glib package should make this easier.
    • Xesam Query Language – Simple (the question is whether it is too simple). Has libs for Python, C/GObject, C++/Qt.
    • SQL – A simple subset of SQL should be fairly easy to shoehorn on top of most backends. Should be easy to parse too.

    For most of these languages we’d only need to support a subset of the full language.

To be honest I hope someone comes up with a brilliant idea that is flexible enough to not make us need a full query language. I hope I am just an Architecture Astronaut.

The Other Problem: Different Backends

There are lots of interested parties in the Zeitgeist universe. This is wonderful, but it also complicates matters a bit because there are lots of different agendas. I already mentioned Tracker, CouchDB, Midgard, and Soprano (Nepomuk). Of course we also have our own backend based on SQLite.

Tracker and Soprano can be queried with SPARQL using the Nepomuk ontologies, but that is also where any similarities between all of the above end. At least from my limited reading.

  • CouchDB uses some predefined “views” that must be known a priori, and it is not very good at accommodating widely varying queries. Maybe some Couch wizards can elaborate on what kind of querying one can do. And don’t get me wrong – I think the way data is queried in Couch is extremely elegant!
  • Midgard appears to expose a query building framework without exposing the actual query language. I like this approach because it gives flexibility to both the client and the server.

Ideas

I have a small collection of ideas that address many of the issues I’ve raised. I am not really happy about any of them, but I give them here anyway.

  • Monolithic Log – The idea here is to store all relevant item data right inside each log statement. That way we will not have to do the “JOIN” with the Repository I lamented earlier. Everything will be right here in the log. The problems this approach drags along are an increased size of the log file and that the application submitting the log entry would have to know all about the subject of the logged event (that or the Zeitgeist engine needs to look it up before it commits to the log).
  • DBus Query Builder – A way to remove the need for a full query language would be to expose some query building framework in the API. This would not be a problem if the query building were done locally by an in-process library, but we are exposing a DBus interface here. Building queries will result in lots of DBus round trips and this is really bad (especially on lighter devices). To remedy this one could provide a “prepared statement” like interface where the apps build query templates before they need them and then submit the parameters to these templates when they perform the actual query. This will bring some bookkeeping for both clients and server. The client side could probably be handled fairly gracefully by a library though.
  • Log File With Index – The idea here is simply to build up a semantic log file without really having a database or some such thing to store it in. Use a GZip compressed text file and append JSON or Turtle to it, much inspired by the Metadata on Removable Devices Spec. This will of course not result in something we can query efficiently (updating such a structure will be very fast and light though). Alternatively this log can be stored in a CouchDB backend and you can have it replicated among all your workstations. To make it “queryable” this log would be indexed by Tracker, Strigi, whatever, or some Zeitgeist native SQLite.
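The “prepared statement” part of the DBus Query Builder idea could look roughly like the sketch below. No actual DBus is involved: QueryServer is a hypothetical in-process stand-in where each method call models one round trip, and the prepare/execute names and the dictionary-based events are assumptions for illustration only, not an existing Zeitgeist API.

```python
import itertools


class QueryServer:
    """Stand-in for the engine side; each method call models one round trip."""

    def __init__(self, events):
        self._events = events
        self._templates = {}
        self._ids = itertools.count(1)
        self.round_trips = 0

    def prepare(self, fields):
        """Register a template: the event fields a later query will filter on."""
        self.round_trips += 1
        template_id = next(self._ids)
        self._templates[template_id] = tuple(fields)
        return template_id

    def execute(self, template_id, values):
        """Run a prepared template with concrete values for its fields."""
        self.round_trips += 1
        fields = self._templates[template_id]
        if len(fields) != len(values):
            raise ValueError("expected %d values" % len(fields))
        wanted = dict(zip(fields, values))
        return [e for e in self._events
                if all(e.get(k) == v for k, v in wanted.items())]


server = QueryServer([
    {"uri": "file:///a.txt", "action": "open", "app": "gedit"},
    {"uri": "file:///b.txt", "action": "open", "app": "eog"},
    {"uri": "file:///a.txt", "action": "save", "app": "gedit"},
])

# Build the template once, e.g. at application startup ...
by_app_and_action = server.prepare(["app", "action"])
# ... then each actual query costs a single round trip.
opened = server.execute(by_app_and_action, ["gedit", "open"])
saved = server.execute(by_app_and_action, ["gedit", "save"])
```

The bookkeeping mentioned above is visible even in this toy: the server has to keep the template table around, and the client has to remember which id belongs to which template.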

These ideas can actually all be combined, and together they solve almost all the problems I’ve discussed. However there are almost certainly better solutions lurking out there on the internet – in the back of the minds of young aspiring hackers (or veterans!). Perhaps you, dear reader?
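As a taste of the Log File With Index idea, here is a minimal sketch using only Python's standard library: events are appended as JSON lines to a GZip file, and a throwaway SQLite index is built from the log when something needs to be queried. The function names, the event fields, and the one-table index layout are all assumptions for illustration.

```python
import gzip
import json
import os
import sqlite3
import tempfile


def append_event(log_path, event):
    """Append one event as a JSON line; each call adds a new gzip member."""
    with gzip.open(log_path, "at", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")


def build_index(log_path):
    """Read the whole log and load it into an in-memory SQLite index."""
    index = sqlite3.connect(":memory:")
    index.execute(
        "CREATE TABLE events (uri TEXT, action TEXT, timestamp INTEGER)")
    with gzip.open(log_path, "rt", encoding="utf-8") as log:
        for line in log:
            event = json.loads(line)
            index.execute("INSERT INTO events VALUES (?, ?, ?)",
                          (event["uri"], event["action"], event["timestamp"]))
    return index


# Log three events, then index them and ask for the most recent first.
log_path = os.path.join(tempfile.mkdtemp(), "events.json.gz")
append_event(log_path, {"uri": "file:///a.txt", "action": "open", "timestamp": 100})
append_event(log_path, {"uri": "file:///b.txt", "action": "open", "timestamp": 300})
append_event(log_path, {"uri": "file:///a.txt", "action": "save", "timestamp": 200})

index = build_index(log_path)
recent = [uri for (uri,) in index.execute(
    "SELECT uri FROM events ORDER BY timestamp DESC")]
```

Appending is cheap, as promised, but note that compressing one tiny line at a time gives poor compression. A real implementation would probably buffer writes, and would update the index incrementally instead of rebuilding it from scratch.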

Or maybe we can actually do fine with what we already have and I just pulled you through a long and elaborate blog post wasting your time?

Fin!

Sorry if this seemed like a long incomprehensible rambling. There are almost certainly loose ends and incomplete explanations. Do ask me to elaborate – I will respond :-)

tla

Sunday, December 14th, 2008

This post has a three letter abbrev. in the title because it is nothing but a shameless attempt to fix the post title lengths at three letters on planet gnome.

Update:

Yeah I admit that I might be reading XKCD too much… For the ones that didn’t catch it: