Posts Tagged ‘zeitgeist’

Zeitgeist Hackfest Reflections

Friday, November 13th, 2009

Sitting here on my final day of the Zeitgeist hackfest. I really wanted to blog more, but I’ve been so busy hacking that I didn’t find the time. I hardly slept or ate either! I will follow up on this post with more technical details from the hackfest.

The stay here in Bolzano has been very nice. Fresh mountain air and very warm conditions compared to my usual Danish habitat. I want to give a warm thanks to our sponsors and organizers for making this possible. The whole arrangement has been run very smoothly and I don’t have a single thing to complain about. So big props go to the sponsors!

And let us not forget the horde of local Bolzanites (what’s the correct term for people living in Bolzano again?) who helped us around the city and helped us at the CTS (our hacking venue). They showed an extraordinary display of patience and hospitality helping the flock of geeks around :-) Thanks guys and girls!

As I mentioned in the first paragraph we’ve been extraordinarily busy hacking. More or less skipping lunch and supper, hacking until we collapsed from fatigue. I was working fully in the engine team and I really really wanted us to have the new engine in a working and unit-tested state after the hackfest (so we can give you a development release during the next week hopefully). I had a moment of despair in the middle of the week where bugs just continued to pop up and we had a hard time coming up with the right architecture. With a relentless effort from the entire engine team we pulled it through in the end though, and I am really happy about our new design and API. Perhaps I am just a tiny bit biased, but I really feel that our new design feels “just right” :-)

The UI team was very busy as well and I saw lots of very cool stuff being hacked out. Let’s hope that they choose to share some of the bling with us :-) I know for a fact that they made huge progress on the Gnome Activity Journal and I can’t wait to have a release of this ready on top of the new engine.

That’s all for now, I intend to follow this post up with a more technical post later. Ciao!

All My Bags Are Packed, I’m Ready to Go

Saturday, November 7th, 2009

I’ve prepped up for the upcoming Zeitgeist hackfest in Bolzano: bags packed, laptop charged, and the kids kissed goodnight an extra time. I have to get up at 3:40 tonight in order to catch my plane.

It’s going to be great to catch up with the other developers and finally meet them face to face. It’s always an odd feeling meeting people in real life for the first time when you talked so much with them on various online mediums.

My personal aim for the hackfest is to do a lot of coding and really push Zeitgeist closer to production readiness. We have done a lot of drafting and discussion lately, so we are really set to get the actual coding done now. I have a lot of travelling time before I get there so I expect to get a head start on the hacking :-)

Huge props to the sponsors and organizers!

Zeitgeist Status Update

Tuesday, November 3rd, 2009

I just posted a long status update on the Zeitgeist project to Gnome’s desktop-devel mailing list. I bring it here in bloggified form to help spread the word past the desktop-devel crowd.

With the 2.30 module deadline passed it seems appropriate that we give a status report from the Zeitgeist team.

Since there has been a good deal of confusion about what Zeitgeist is, and isn’t, about, I will try to clear this up in this mail as well. I will try to stay low on the buzz word factor and leave some of the more exotic use cases out to avoid too much speculation.

Zeitgeist in 1 sentence

Zeitgeist is an event logging framework used to keep a log of user activity in a structured way.

What new services do we provide for UIs and applications

Zeitgeist provides a DBus API to query and update the activity log. Clients can query on time ranges, the acting applications, mimetypes, and Nepomuk classifications of the subjects and events. Sorting can be done on various criteria such as usage frequency and recent usage.

Concrete examples could be “Get me most used files of mimetypes x,y or z between the months January till March”

One can also query for documents that are used in context with others. As in “Which documents/websites are used with http://youtube.com within the last week”.
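
To make the first example query concrete, here is a minimal sketch of what "most used files of mimetypes x, y or z between January and March" boils down to once every usage event is kept in the log. It does not use the Zeitgeist DBus API or its real database schema: the `event` table, its columns, the `ts` and `most_used` helpers, and the sample URIs are all assumptions made up for illustration, and the code is written for a current Python 3 rather than the Python 2.5 the daemon targets.

```python
import sqlite3
from datetime import datetime


def ts(year, month, day):
    """Unix timestamp for midnight (local time) on the given date."""
    return int(datetime(year, month, day).timestamp())


def most_used(conn, mimetypes, start, end, limit=10):
    """Return (uri, uses) for subjects with one of `mimetypes` that were
    used in the half-open range [start, end), most used first."""
    placeholders = ", ".join("?" for _ in mimetypes)
    sql = (
        "SELECT uri, COUNT(*) AS uses FROM event "
        f"WHERE mimetype IN ({placeholders}) "
        "AND timestamp >= ? AND timestamp < ? "
        "GROUP BY uri ORDER BY uses DESC, uri ASC LIMIT ?"
    )
    return conn.execute(sql, [*mimetypes, start, end, limit]).fetchall()


# Hypothetical, simplified log: one row per usage event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (timestamp INTEGER, uri TEXT, mimetype TEXT, actor TEXT)")
ODT = "application/vnd.oasis.opendocument.text"
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", [
    (ts(2009, 1, 10), "file:///report.odt", ODT, "writer"),
    (ts(2009, 2, 3), "file:///report.odt", ODT, "writer"),
    (ts(2009, 2, 20), "file:///notes.txt", "text/plain", "gedit"),
    (ts(2009, 5, 1), "file:///notes.txt", "text/plain", "gedit"),   # outside the range
    (ts(2009, 3, 15), "file:///photo.png", "image/png", "eog"),     # mimetype not asked for
])

result = most_used(conn, [ODT, "text/plain"], ts(2009, 1, 1), ts(2009, 4, 1))
print(result)
```

Because all events are stored rather than just the latest one per resource, the same table can answer both "most used" (count per URI) and "most recent" (maximum timestamp per URI) without any extra bookkeeping.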

It is also possible for the applications to get notified when the log is updated. This is for instance used by the Parental Control application as well as the GNOME Activity Journal.

What Problems can we solve

The straightforward use case is as a GtkRecentManager on drugs. Zeitgeist removes the need for each application to parse a big XML file to retrieve recently used documents. It also removes the need to ever truncate your usage history: our database format is compact and can easily contain years of history. My estimate is that 1M log entries will take up about 80 MB (give or take 20 MB).

Zeitgeist also opens up a range of query capabilities that GtkRecentManager doesn’t provide. Instead of simply storing the most recent usage event on a resource we store all usage events. This way we can not only answer when the most recent use was, but also account for the entire usage history.

One use case that is already in the works is having the most used resources within the last 3 weeks for an app in the context menu in a window list. This is for example done in Docky.

Looking past just logging resource usage we will also start monitoring window and document focus times. This opens up a whole new world of contextual relevancy that I won’t elaborate on here. I am trying to stick to the more down to earth aspects of Zeitgeist.

Which processes/daemons do we run

Zeitgeist itself is a single DBus daemon. Where the picture gets a little more fuzzy is how we collect events. The long term goal is for apps to submit events, maybe hooking directly into GtkRecentManager, or in any case provide a very convenient way for apps to do this. Apps like Pidgin or Empathy would probably need some plugin for logging usage statistics of your contacts.

Right now we resort to less elegant ways of collecting events, like running a separate daemon harvesting Firefox’s history, GtkRecentlyUsed’s and other applications’ history (this daemon is also known as the datahub). The datahub is already on its way to becoming redundant now that a Firefox extension is in the works (and one for Epiphany already exists). It is our intent that the datahub should eventually go away as application support becomes widespread, but it may still prove useful together with online services.

How resource hungry are we

Normal memory usage is around 5-10 MB for the core Zeitgeist daemon. The datahub process (and I repeat: we want to get rid of this) is about 12 MB.

What dependencies

Right now the daemon depends on SQLite, Python 2.5, python-gobject, python-xdg, and python-dbus. For the datahub we additionally need python-gconf and python-gtk2, but the datahub is optional.

Future plans

We have spent a lot of time planning and designing lately. When we have a stable reference implementation of our design in Python we plan to use that as a template for a C implementation. To be clear – the C version will be log-format and API compatible with the Python version.

We plan to make good use of the upcoming Zeitgeist hackfest and should have a 0.3 development release ready shortly after. If we are happy about the 0.3 series we will rename it to 0.9 and go for a 1.0.

Regarding Gnome 3.0 I think we are in a situation much like the one Owen Taylor recently outlined for Gnome Shell on the release-team mailing list. If we are desperate for Zeitgeist to be included in a Gnome 3.0 this March I believe it would be doable. It will require that we really bust our backs and cut some corners, but it’s doable. Personally (not speaking for the Zeitgeist team here) I am not sure it would be a very good idea, for the same reasons Owen mentions.

Relation to Tracker and Other Semantic Technologies

The very short version of this is that Tracker and Zeitgeist do not depend on each other in any way. The catch however is that either one becomes a whole lot more powerful when working together with the other. To take an example, consider tagging. Zeitgeist is just a log so we don’t manage your tags; we are however fully equipped to understand events concerning your tags. So you manage the tags via Tracker and track their usage in Zeitgeist. The combined power enables one to reason about what tags relate to resources in a temporal manner, even with resources that are not tagged.

In the Zeitgeist world we call an application like Tracker a Repository. Nepomuk or Desktop-CouchDB might work as other Repositories. If there is some confusion in this area it is understandable, since we do have some Repository-like features in our 0.2 series. These are however removed from the 0.3 series. It is still undecided if we want to define a minimal Repository DBus API for Zeitgeist and then ship a reference implementation of this API (which would run in a separate process). Any full fledged Repository would be able to own the Repository service on the bus and Zeitgeist would not run its own. But again let me stress that a Repository is not needed for the Zeitgeist Log daemon to be useful.

Zeitgeist API Ramblings

Saturday, August 1st, 2009

I’ve been spending a lot of my brain cycles lately thinking about how to design the Zeitgeist DBus API properly. Let me tell you that it is a tough nut to crack – that, or I am not very good at this stuff :-)

Please note that this is merely a personal brain dump. I have not discussed this with the other Zeitgeist developers yet.

Zeitgeist Object Model

Before I can get to the real problem we have to be on the same page regarding the object model employed by Zeitgeist.

Here’s the deal. First chant with me: “Zeitgeist is an event logging framework”. For Zeitgeist to provide a useful API it needs to know about more things than just Events. It needs to know some minimal stuff about the Items that make up your desktop (files, emails, contacts, online services, etc.). It also needs to consider that these items can have Annotations such as tags, ratings, comments, being bookmarked, or other. Behold my graphical superiority:

Zeitgeist event, item, and annotation relations

As it stands both Annotations and Events are considered subclasses of Item in an Object Oriented sense. This means that we can have Annotations on Annotations and Events on Annotations (or Events on Events!).
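
The relations in the diagram can be sketched in a few lines of Python. This is only an illustration of the "everything is an Item" idea, not the actual Zeitgeist classes: the field names (`uri`, `kind`, `value`, `subject`, `action`) and the example URIs are assumptions invented for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Anything on the desktop: a file, an email, a contact, ..."""
    uri: str
    annotations: List["Annotation"] = field(default_factory=list)


@dataclass
class Annotation(Item):
    """A tag, rating, comment or bookmark; itself an Item."""
    kind: str = "tag"
    value: str = ""


@dataclass
class Event(Item):
    """Something that happened to a subject Item at a point in time."""
    subject: Optional[Item] = None
    timestamp: int = 0
    action: str = "opened"


doc = Item("file:///report.odt")
tag = Annotation("tag://work", kind="tag", value="work")
doc.annotations.append(tag)

# Because Annotation subclasses Item, an Annotation can be annotated ...
note = Annotation("comment://1", kind="comment", value="merge with 'job'?")
tag.annotations.append(note)

# ... and Events can have Annotations or even other Events as their subject.
tagged = Event("event://1", subject=tag, timestamp=1249084800, action="created")
seen = Event("event://2", subject=tagged, timestamp=1249084860, action="viewed")
```

If Events did not subclass Item, the last line would no longer type-check conceptually (an Event could not be the subject of another Event), which is the flexibility being traded against database size.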

I am definitely of the opinion that Annotations should be first class Items in their own right, however I am not sure about Events. The case about Events being Items or not is mostly technical. From an ideological POV I think it is great if everything inherits from Item (more flexibility – yay!). The catch is that we can cut down on the DB size if Events don’t subclass Item – hence the question marks on the label.

You may be able to grasp the Zeitgeist data model better if you read the database design spec.

The Problem: Querying

So we want to utilize this rich event log to query for interesting relations between Items (and by Item I will also mean any subclass hereof). Listing the most recently used tags would be a simple use case. The same application would probably also be interested in listing all the existing tags – or maybe more advanced, all tags on some specific subset of Items such as all files (and probably also a lot of more complex things).

The current Zeitgeist can do this, no problem. There is however the subtle problem that Zeitgeist is not meant to manage your tags. Zeitgeist is an Event Log – recall?

So assume that Item metadata and Annotations are stored elsewhere, in Tracker, CouchDB, Midgard, Soprano, Ikea or wherever you want. Where they should be. Let me call this sacred silo of user data the Repository. Applications will generally be interested in doing what amounts to an SQL JOIN over the Repository and the Zeitgeist event log. That is, cross referencing both and selecting subsets of data from both based on the relations between them.

If the Repository and Zeitgeist can’t cooperate in some way (e.g. by being in the same process with access to the same resources) these “JOINs” can only be accomplished by fetching broad selections of data from the Repository and having Zeitgeist filter out the relevant parts before it returns the data to you. This will perform like crap.
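
Here is a small sketch of the difference, using "most recently used tags" as the query. Two SQLite tables stand in for the Repository and the event log; the table names, columns and both helper functions are hypothetical, and in reality the two stores would live in different processes rather than in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repo_tag (uri TEXT, tag TEXT);            -- stands in for the Repository
    CREATE TABLE log_event (timestamp INTEGER, uri TEXT);  -- stands in for the event log
""")
conn.executemany("INSERT INTO repo_tag VALUES (?, ?)", [
    ("file:///a.odt", "work"),
    ("file:///b.png", "holiday"),
    ("file:///c.txt", "work"),
    ("file:///d.txt", "misc"),      # tagged but never used
])
conn.executemany("INSERT INTO log_event VALUES (?, ?)", [
    (100, "file:///a.odt"),
    (200, "file:///b.png"),
    (300, "file:///c.txt"),
])


def recent_tags_joined(conn):
    """Both stores reachable from one place: a single JOIN does the work."""
    rows = conn.execute(
        "SELECT t.tag, MAX(e.timestamp) AS last_used "
        "FROM repo_tag t JOIN log_event e ON e.uri = t.uri "
        "GROUP BY t.tag ORDER BY last_used DESC"
    )
    return [tag for tag, _ in rows]


def recent_tags_two_step(conn):
    """Stores kept apart: pull every tag out of the Repository, then check
    each tagged URI against the log and rank the survivors by hand."""
    all_tags = conn.execute("SELECT uri, tag FROM repo_tag").fetchall()
    last_used = {}
    for uri, tag in all_tags:
        (latest,) = conn.execute(
            "SELECT MAX(timestamp) FROM log_event WHERE uri = ?", (uri,)
        ).fetchone()
        if latest is not None:
            last_used[tag] = max(last_used.get(tag, 0), latest)
    return sorted(last_used, key=last_used.get, reverse=True)


joined = recent_tags_joined(conn)
two_step = recent_tags_two_step(conn)
```

Both functions give the same answer, but the second one moves every (uri, tag) pair across the boundary and issues one log lookup per pair, which is the cost that grows with the size of the Repository.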

Raising the Questions

  • Separation – How should we separate the Repository and the event log? If this even makes sense to do. To me they are two very different things.
  • API – Should Zeitgeist expose what it knows about the Repository as some “weak Repository API”, essentially acting as a proxy? This could make sense so that you might be able to run without a full blown repository, but only using the limited Repository functionality that Zeitgeist can provide natively.
  • Query Language vs. API – How does one query the Repository+Log? Currently we have a nifty API utilising some “filters” you pass to the methods to limit what types of data you query. My playing around with this says that this is simply not powerful enough. It will power the ideas we have now, but what about next year? One really needs the power of a query language to reap the fruits of the event log. This obviously raises the question of which query language to expose:
    • SPARQL – Powerful but a very hard (read: impossible) requirement if we want alternative backends.
    • MQL – JSON based query language of the Freebase project. The json-glib package should make this easier.
    • Xesam Query Language – Simple (the question is if it is too simple?). Has libs for Python, C/GObject, C++/Qt.
    • SQL – A simple subset of SQL should be fairly easy to shoehorn on top of most backends. Should be easy to parse too.

    For most of these languages we would only need to support a subset of the full language.

To be honest I hope someone comes up with a brilliant idea that is flexible enough to not make us need a full query language. I hope I am just an Architecture Astronaut.

The Other Problem: Different Backends

There are lots of interested parties in the Zeitgeist universe. This is wonderful, but also complicates matters a bit because there are lots of different agendas. I already mentioned Tracker, CouchDB, Midgard, and Soprano(Nepomuk). Of course we also have our own backend based on SQLite.

Tracker and Soprano can be queried with SPARQL using the Nepomuk ontologies, but that is also where any similarities between all of the above end. At least from my limited reading.

  • CouchDB uses some predefined “views” that must be known a priori, and is not very good at accommodating widely varying queries. Maybe some Couch wizards can elaborate on what kind of querying one can do. And don’t get me wrong – I think the way data is queried in Couch is extremely elegant!
  • Midgard appears to expose a query building framework without exposing the actual query language. I like this approach because it gives flexibility to both the client and the server.

Ideas

I have a small collection of ideas that address many of the issues I’ve raised. I am not really happy about any of them, but I give them here anyway.

  • Monolithic Log – The idea here is to store all relevant item data right inside each log statement. That way we will not have to do the “JOIN” with the Repository I lamented about earlier. Everything will be right here in the log. The problems this approach drags along are an increased size of the log file and that the application submitting the log entry would have to know all about the subject of the logged event (that, or the Zeitgeist engine needs to look it up before it commits to the log).
  • DBus Query Builder – A way to remove the need for a full query language would be to expose some query building framework in the API. This would not be a problem if the query building were done locally by an in-process library, but we are exposing a DBus interface here. Building queries will result in lots of DBus round trips and this is really bad (especially on lighter devices). To remedy this one could provide a “prepared statement” like interface where the apps build query templates before they need them and then submit the parameters to these templates when they perform the actual query. This will bring some book keeping for both clients and server. The client side could probably be handled fairly gracefully by a library though.
  • Log File With Index – The idea here is simply to build up a semantic log file without really having a database or similar to store it in. Use a GZip compressed text file, appending JSON or Turtle, much inspired by the Metadata on Removable Devices Spec. This will of course not result in something we can query efficiently (updating such a structure will be very fast and light though). Alternatively this log can be stored in a CouchDB backend and you can have it replicated among all your workstations. To make it “queryable” this log would be indexed by Tracker, Strigi, whatever, or some Zeitgeist native SQLite.
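
To give the Log File With Index idea some shape, here is a rough sketch using only the Python standard library: events are appended as JSON lines to a gzip file, and a throwaway SQLite index is rebuilt by replaying the log. The file name, the event fields and the `append_event` / `build_index` helpers are assumptions made up for this sketch; it follows no existing spec.

```python
import gzip
import json
import os
import sqlite3
import tempfile


def append_event(path, event):
    """Append one event as a JSON line. Each call adds a new gzip member,
    so the existing part of the file is never rewritten."""
    with gzip.open(path, "at", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")


def build_index(path):
    """Replay the whole log into an in-memory SQLite table to make it queryable."""
    index = sqlite3.connect(":memory:")
    index.execute("CREATE TABLE event (timestamp INTEGER, uri TEXT, action TEXT)")
    with gzip.open(path, "rt", encoding="utf-8") as log:
        for line in log:
            e = json.loads(line)
            index.execute(
                "INSERT INTO event VALUES (?, ?, ?)",
                (e["timestamp"], e["uri"], e["action"]),
            )
    return index


log_path = os.path.join(tempfile.mkdtemp(), "activity.log.gz")
append_event(log_path, {"timestamp": 100, "uri": "file:///a.txt", "action": "opened"})
append_event(log_path, {"timestamp": 200, "uri": "file:///b.txt", "action": "opened"})
append_event(log_path, {"timestamp": 300, "uri": "file:///a.txt", "action": "closed"})

index = build_index(log_path)
count = index.execute("SELECT COUNT(*) FROM event").fetchone()[0]
last_opened = index.execute(
    "SELECT uri FROM event WHERE action = 'opened' ORDER BY timestamp DESC LIMIT 1"
).fetchone()[0]
```

Appending really is cheap here, but writing one gzip member per event compresses poorly; a real implementation would probably batch writes. The log stays the source of truth, and the index can be dropped and rebuilt at any time.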

All of these ideas can actually all be combined and solve almost all problems I’ve discussed. However there are almost certainly better solutions lurking out there on the internet – in the back of young aspiring hackers (or veterans!). Perhaps you dear reader?
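
As for the “prepared statement” flavour of the DBus Query Builder idea, the book keeping could look roughly like the sketch below. It is plain in-process Python, not DBus code: `QueryService`, its `prepare` / `execute` methods, the `ALLOWED_FIELDS` whitelist and the `event` table are all hypothetical names chosen for illustration. The point is only that the template is built once and later calls ship nothing but an id and the parameter values.

```python
import sqlite3

# Hypothetical whitelist of columns a client may filter on.
ALLOWED_FIELDS = ("uri", "mimetype", "actor")


class QueryService:
    """Stand-in for the daemon side of a template-based query interface."""

    def __init__(self, conn):
        self._conn = conn
        self._templates = {}
        self._next_id = 1

    def prepare(self, fields):
        """Register a template that filters on `fields` by equality and
        return its id. This is the one-off, chatty part."""
        unknown = [name for name in fields if name not in ALLOWED_FIELDS]
        if unknown:
            raise ValueError(f"cannot filter on: {unknown}")
        where = " AND ".join(f"{name} = ?" for name in fields) or "1"
        sql = f"SELECT uri, timestamp FROM event WHERE {where} ORDER BY timestamp DESC"
        template_id = self._next_id
        self._next_id += 1
        self._templates[template_id] = (len(fields), sql)
        return template_id

    def execute(self, template_id, values):
        """Run a prepared template; only the id and the values are sent."""
        expected, sql = self._templates[template_id]
        if len(values) != expected:
            raise ValueError(f"expected {expected} values, got {len(values)}")
        return self._conn.execute(sql, list(values)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (timestamp INTEGER, uri TEXT, mimetype TEXT, actor TEXT)")
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", [
    (100, "file:///a.txt", "text/plain", "gedit"),
    (200, "file:///b.txt", "text/plain", "gedit"),
    (300, "file:///c.png", "image/png", "eog"),
])

service = QueryService(conn)
by_mimetype = service.prepare(["mimetype"])             # built once, up front
texts = service.execute(by_mimetype, ["text/plain"])    # one round trip per query
images = service.execute(by_mimetype, ["image/png"])
```

The server has to remember templates per client and clean them up when the client goes away, which is the extra book keeping mentioned in the list above.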

Or maybe we can actually do fine with what we already have and I just pulled you through a long and elaborate blog post wasting your time?

Fin!

Sorry if this seemed like a long incomprehensible rambling. There are almost certainly loose ends and incomplete explanations. Do ask me to elaborate – I will respond :-)