
Zeitgeist Status Update

I just posted a long status update on the Zeitgeist project to Gnome’s desktop-devel mailing list. I bring it here in bloggified form to help spread the word past the desktop-devel crowd.

With the 2.30 module deadline passed, it seems appropriate to give a status report from the Zeitgeist team.

Since there has been a good deal of confusion about what Zeitgeist is (and isn’t) about, I will try to clear this up in this mail as well. I will try to keep the buzzword factor low and leave some of the more exotic use cases out to avoid too much speculation.

Zeitgeist in 1 sentence

Zeitgeist is an event logging framework used to keep a log of user activity in a structured way.

What new services do we provide for UIs and applications

Zeitgeist provides a DBus API to query and update the activity log. Clients can query on time ranges, the acting applications, mimetypes, and Nepomuk classifications of the subjects and events. Sorting can be done on various criteria such as usage frequency and recent usage.

A concrete example could be: “Get me the most used files of mimetypes x, y or z between January and March.”
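To make the kind of query above concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It assumes a hypothetical, heavily simplified table with one row per usage event; this is not Zeitgeist’s real schema or its DBus API, and `find_most_used` and `ts` are illustrative names only. It filters on a time range, mimetypes and (optionally) the acting application, then sorts by usage frequency with recency as a tie-breaker.

```python
import sqlite3
from datetime import datetime

# Hypothetical, simplified schema with one row per usage event.
# This is NOT Zeitgeist's real database layout.
SCHEMA = """
CREATE TABLE event (
    timestamp INTEGER NOT NULL,  -- seconds since the epoch
    actor     TEXT NOT NULL,     -- the acting application
    uri       TEXT NOT NULL,     -- the subject that was used
    mimetype  TEXT NOT NULL
)
"""


def ts(year, month, day):
    """Convert a local-time date to seconds since the epoch."""
    return int(datetime(year, month, day).timestamp())


def find_most_used(conn, start, end, mimetypes, actor=None, limit=10):
    """Return (uri, use_count) pairs for subjects with one of the given
    mimetypes used between start and end (inclusive), most used first."""
    placeholders = ",".join("?" for _ in mimetypes)
    sql = (
        "SELECT uri, COUNT(*) AS uses FROM event "
        "WHERE timestamp BETWEEN ? AND ? "
        f"AND mimetype IN ({placeholders}) "
    )
    params = [start, end, *mimetypes]
    if actor is not None:  # optionally restrict to one application
        sql += "AND actor = ? "
        params.append(actor)
    # Sort by usage frequency; break ties by the most recent use.
    sql += "GROUP BY uri ORDER BY uses DESC, MAX(timestamp) DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?, ?)",
    [
        (ts(2010, 1, 5), "gedit.desktop", "file:///notes.txt", "text/plain"),
        (ts(2010, 2, 9), "gedit.desktop", "file:///notes.txt", "text/plain"),
        (ts(2010, 3, 1), "evince.desktop", "file:///paper.pdf", "application/pdf"),
        (ts(2010, 5, 2), "gedit.desktop", "file:///todo.txt", "text/plain"),
    ],
)

# "Most used files of mimetypes x, y or z between January and March"
result = find_most_used(
    conn, ts(2010, 1, 1), ts(2010, 4, 1), ["text/plain", "application/pdf"]
)
print(result)  # [('file:///notes.txt', 2), ('file:///paper.pdf', 1)]
```

Counting rows per subject only works because every usage event is stored, not just the latest one per file.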

One can also query for documents that are used in context with others, as in “Which documents/websites have been used together with http://youtube.com within the last week?”
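The post does not say how “used in context with” is defined. As one possible reading, the sketch below assumes that two subjects are related when they were used within 15 minutes of each other; both the window and the function name `used_together` are hypothetical choices for illustration, not Zeitgeist’s actual algorithm.

```python
from collections import Counter

DAY = 24 * 3600


def used_together(events, target, now, days=7, window=15 * 60):
    """Count how often each subject was used within `window` seconds of a
    use of `target`, looking only at the last `days` days.

    `events` is an iterable of (timestamp, uri) pairs in seconds."""
    since = now - days * DAY
    recent = [(t, uri) for t, uri in events if since <= t <= now]
    target_times = [t for t, uri in recent if uri == target]
    related = Counter()
    for t, uri in recent:
        if uri != target and any(abs(t - tt) <= window for tt in target_times):
            related[uri] += 1
    return related.most_common()


now = 10 * DAY
events = [
    (now - 3600, "http://youtube.com"),
    (now - 3300, "http://wikipedia.org"),   # 5 minutes later
    (now - 3200, "file:///lyrics.txt"),     # about 7 minutes later
    (now - 3000, "file:///lyrics.txt"),     # 10 minutes later
    (now - 18000, "file:///report.odt"),    # hours apart: not related
    (now - 8 * DAY, "http://youtube.com"),  # older than a week
    (now - 8 * DAY + 60, "file:///old.txt"),
]
print(used_together(events, "http://youtube.com", now))
# [('file:///lyrics.txt', 2), ('http://wikipedia.org', 1)]
```

The nested scan is quadratic, which is fine for a sketch; a real log would answer this with an indexed time-range query.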

It is also possible for the applications to get notified when the log is updated. This is for instance used by the Parental Control application as well as the GNOME Activity Journal.
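The notification mechanism can be pictured as a list of monitors, each pairing a template with a callback. The sketch below is a hypothetical in-process stand-in (`Log`, `install_monitor` and `insert_event` are illustrative names, not the real DBus interface): when an event is inserted, every monitor whose template matches is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    timestamp: int
    actor: str
    uri: str
    mimetype: str


Template = Dict[str, str]


def matches(event: Event, template: Template) -> bool:
    # Every field present in the template must equal the event's value;
    # fields left out of the template match anything.
    return all(getattr(event, key) == value for key, value in template.items())


@dataclass
class Log:
    events: List[Event] = field(default_factory=list)
    monitors: List[Tuple[Template, Callable[[Event], None]]] = field(default_factory=list)

    def install_monitor(self, template: Template, callback: Callable[[Event], None]) -> None:
        self.monitors.append((template, callback))

    def insert_event(self, event: Event) -> None:
        self.events.append(event)
        # Notify only the monitors interested in this kind of event.
        for template, callback in self.monitors:
            if matches(event, template):
                callback(event)


log = Log()
seen = []
log.install_monitor({"mimetype": "text/html"}, seen.append)
log.insert_event(Event(1, "firefox.desktop", "http://example.com", "text/html"))
log.insert_event(Event(2, "gedit.desktop", "file:///notes.txt", "text/plain"))
print([e.uri for e in seen])  # ['http://example.com']
```

A parental-control tool, for instance, could install a monitor for web pages, while a journal UI could install an empty template to hear about everything.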

What problems can we solve

The straightforward use case is as a GtkRecentManager on drugs. Zeitgeist removes the need for each application to parse a big XML file to retrieve recently used documents. It also removes the need to ever truncate your usage history: our database format is compact and can easily hold years of history. My estimate is that 1M log entries will take up about 80 MB (give or take 20 MB).
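As a quick sanity check of that estimate, the arithmetic below works out the implied size per event. It assumes decimal megabytes, and the rate of 1,000 events per day is a hypothetical figure chosen only to illustrate what “years of history” could mean.

```python
EVENTS = 1_000_000
MB = 1_000_000  # assumes decimal megabytes


def bytes_per_event(total_mb, events=EVENTS):
    return total_mb * MB / events


for total_mb in (60, 80, 100):  # "80 MB, give or take 20 MB"
    print(f"{total_mb} MB -> {bytes_per_event(total_mb):.0f} bytes per event")

# Hypothetical usage rate, chosen only to illustrate "years of history".
events_per_day = 1_000
years = EVENTS / events_per_day / 365
print(f"{EVENTS:,} events at {events_per_day:,} per day is about {years:.1f} years")
```

So the estimate corresponds to roughly 60 to 100 bytes per logged event, and at the assumed rate a million events covers a little under three years.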

It also opens up a range of query capabilities that GtkRecentManager doesn’t provide. Instead of storing only the most recent usage event for a resource, we store all usage events. This way we can answer not only when a resource was most recently used, but also account for its entire usage history.
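The difference between the two storage strategies can be shown with plain Python data structures. This is an illustrative sketch with hypothetical helper names: a “most recent only” store keeps one timestamp per resource, while a full log keeps every use and can therefore also report things like the first use and the number of uses.

```python
from collections import defaultdict


def latest_only(events):
    """What a 'most recent use' store keeps: one timestamp per resource."""
    last = {}
    for t, uri in events:
        last[uri] = max(t, last.get(uri, t))
    return last


def full_history(events):
    """What a full log keeps: every use of every resource."""
    history = defaultdict(list)
    for t, uri in events:
        history[uri].append(t)
    return history


def summarize(history, uri):
    """First use and use count need the full history; last use does not."""
    times = sorted(history[uri])
    return {"first_use": times[0], "last_use": times[-1], "use_count": len(times)}


events = [(10, "file:///a.txt"), (50, "file:///b.txt"), (30, "file:///a.txt"), (70, "file:///a.txt")]
print(latest_only(events))  # {'file:///a.txt': 70, 'file:///b.txt': 50}
print(summarize(full_history(events), "file:///a.txt"))
# {'first_use': 10, 'last_use': 70, 'use_count': 3}
```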

One use case that is already in the works is showing an application’s most used resources from the last 3 weeks in its context menu in a window list. Docky, for example, does this.
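A window-list menu like that boils down to a small per-application query. The sketch below assumes events are available as (timestamp, actor, uri) tuples; `most_used_for_app` is a hypothetical helper, not Docky’s or Zeitgeist’s code.

```python
from collections import Counter

DAY = 24 * 3600
THREE_WEEKS = 21 * DAY


def most_used_for_app(events, actor, now, limit=5):
    """Return up to `limit` URIs that `actor` used most often during the
    last three weeks. `events` holds (timestamp, actor, uri) tuples."""
    counts = Counter(
        uri
        for t, a, uri in events
        if a == actor and now - THREE_WEEKS <= t <= now
    )
    return [uri for uri, _ in counts.most_common(limit)]


now = 100 * DAY
events = [
    (now - 1 * DAY, "gimp.desktop", "file:///cat.png"),
    (now - 2 * DAY, "gimp.desktop", "file:///cat.png"),
    (now - 3 * DAY, "gimp.desktop", "file:///logo.svg"),
    (now - 30 * DAY, "gimp.desktop", "file:///old.xcf"),   # too old
    (now - 1 * DAY, "gedit.desktop", "file:///notes.txt"),  # another app
]
print(most_used_for_app(events, "gimp.desktop", now))
# ['file:///cat.png', 'file:///logo.svg']
```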

Looking beyond just logging resource usage, we will also start monitoring window and document focus times. This opens up a whole new world of contextual relevancy that I won’t elaborate on here; I am trying to stick to the more down-to-earth aspects of Zeitgeist.
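The post does not describe how focus times would be recorded. Assuming a stream of focus-change events in chronological order, where each resource keeps focus until the next change, total focus time per resource could be derived as in this hypothetical sketch.

```python
from collections import defaultdict


def focus_durations(focus_events, now):
    """focus_events: list of (timestamp, uri) pairs, one per focus change,
    in chronological order. Each resource is assumed to keep focus until
    the next focus change, or until `now` for the last one."""
    totals = defaultdict(int)
    next_changes = focus_events[1:] + [(now, None)]
    for (t, uri), (t_next, _) in zip(focus_events, next_changes):
        totals[uri] += t_next - t
    return dict(totals)


changes = [(0, "file:///a.txt"), (100, "file:///b.txt"), (160, "file:///a.txt")]
print(focus_durations(changes, now=200))
# {'file:///a.txt': 140, 'file:///b.txt': 60}
```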

Which processes/daemons do we run

Zeitgeist itself is a single DBus daemon. The picture gets a little fuzzier when it comes to how we collect events. The long-term goal is for apps to submit events themselves, perhaps by hooking directly into GtkRecentManager, or in any case through a very convenient API for doing so. Apps like Pidgin or Empathy would probably need a plugin to log usage statistics for your contacts.
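One way an app-side hook could look is sketched below. Everything here is hypothetical: `RecentManagerHook` mimics the shape of an “add item” call and forwards an event to a `submit` callable (which in practice would talk to the daemon over DBus). The event fields are loosely inspired by the description above; the text mentions Nepomuk classifications for events, and the plain string `"opened"` is only a placeholder for such a classification.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Subject:
    uri: str
    mimetype: str


@dataclass
class Event:
    timestamp: int        # milliseconds since the epoch (assumed unit)
    interpretation: str   # placeholder for a proper classification
    actor: str            # the application, e.g. "gedit.desktop"
    subjects: List[Subject]


class RecentManagerHook:
    """Hypothetical helper: apps call add_item() much as they would add a
    recent item, and an event is forwarded to `submit`."""

    def __init__(self, actor: str, submit: Callable[[Event], None]):
        self.actor = actor
        self.submit = submit

    def add_item(self, uri: str, mimetype: str, timestamp: Optional[int] = None) -> Event:
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        event = Event(timestamp, "opened", self.actor, [Subject(uri, mimetype)])
        self.submit(event)
        return event


sent = []
hook = RecentManagerHook("gedit.desktop", sent.append)
hook.add_item("file:///notes.txt", "text/plain", timestamp=1000)
print(sent[0])
```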

Right now we resort to less elegant ways of collecting events, such as running a separate daemon that harvests the history of Firefox, GtkRecentlyUsed and other applications (this daemon is also known as the datahub). The datahub is already on its way to becoming redundant now that a Firefox extension is in the works (and one for Epiphany already exists). We intend for the datahub to go away eventually as application support becomes widespread, but it may still prove useful for working with online services.
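To illustrate what “harvesting” an existing history involves, here is a sketch that turns the desktop-bookmark XBEL format used by GtkRecentManager (typically `~/.recently-used.xbel` with GTK 2) into event tuples using the standard `xml.etree.ElementTree` module. The sample document and the element layout are assumptions based on that format, not the datahub’s actual code; check them against a real file before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespaces of the desktop-bookmark (XBEL) format, as assumed here.
NS = {
    "bookmark": "http://www.freedesktop.org/standards/desktop-bookmarks",
    "mime": "http://www.freedesktop.org/standards/shared-mime-info",
}

SAMPLE_XBEL = """<xbel version="1.0"
      xmlns:bookmark="http://www.freedesktop.org/standards/desktop-bookmarks"
      xmlns:mime="http://www.freedesktop.org/standards/shared-mime-info">
  <bookmark href="file:///home/user/notes.txt" added="2009-11-02T10:00:00Z"
            modified="2009-11-03T09:30:00Z" visited="2009-11-03T09:30:00Z">
    <info>
      <metadata owner="http://freedesktop.org">
        <mime:mime-type type="text/plain"/>
        <bookmark:applications>
          <bookmark:application name="gedit" exec="gedit %u"
                                modified="2009-11-03T09:30:00Z" count="2"/>
        </bookmark:applications>
      </metadata>
    </info>
  </bookmark>
</xbel>"""


def harvest(xbel_text):
    """Yield one (timestamp, application, uri, mimetype) tuple per
    application recorded for each bookmark."""
    root = ET.fromstring(xbel_text)
    for bookmark in root.iter("bookmark"):
        uri = bookmark.get("href")
        mime = bookmark.find("info/metadata/mime:mime-type", NS)
        mimetype = mime.get("type") if mime is not None else None
        apps = "info/metadata/bookmark:applications/bookmark:application"
        for app in bookmark.iterfind(apps, NS):
            yield app.get("modified"), app.get("name"), uri, mimetype


for event in harvest(SAMPLE_XBEL):
    print(event)
# ('2009-11-03T09:30:00Z', 'gedit', 'file:///home/user/notes.txt', 'text/plain')
```

Note that this format only keeps the latest modification time and a use count per application, so a harvester can recover far less than a full event log.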

How resource hungry are we

Normal memory usage is around 5-10 MB for the core Zeitgeist daemon. The datahub process (and I repeat: we want to get rid of it) uses about 12 MB.

What dependencies

Right now the daemon depends on SQLite, Python 2.5, python-gobject, python-xdg, and python-dbus. For the datahub we additionally need python-gconf and python-gtk2, but the datahub is optional.
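A small check like the one below can tell you which of these dependencies are importable on a given system. It is a sketch in modern Python 3 using `importlib.util.find_spec`; the package-to-module mapping (python-gobject to `gobject`, python-xdg to `xdg`, python-dbus to `dbus`, python-gconf to `gconf`, python-gtk2 to `gtk`) is an assumption, and those Python 2 era bindings will usually be reported as missing on a current Python 3 install. SQLite is reached through the `sqlite3` module, which has shipped with Python since 2.5.

```python
import importlib.util

# Assumed module names for the packages listed above.
DAEMON_MODULES = ["sqlite3", "gobject", "xdg", "dbus"]
DATAHUB_MODULES = ["gconf", "gtk"]  # only needed for the optional datahub


def missing_modules(names):
    """Return the module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("daemon missing:", missing_modules(DAEMON_MODULES))
print("datahub missing:", missing_modules(DATAHUB_MODULES))
```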

Future plans

We have spent a lot of time planning and designing lately. When we have a stable reference implementation of our design in Python, we plan to use it as a template for a C implementation. To be clear: the C version will be compatible with the Python version in both log format and API.

We plan to make good use of the upcoming Zeitgeist hackfest and should have a 0.3 development release ready shortly after. If we are happy with the 0.3 series, we will rename it to 0.9 and go for a 1.0.

Regarding Gnome 3.0, I think we are in a situation much like the one Owen Taylor recently outlined for Gnome Shell on the release-team mailing list. If we are desperate for Zeitgeist to be included in Gnome 3.0 this March, I believe it would be doable. It will require that we really bust our backs and cut some corners, but it’s doable. Personally (not speaking for the Zeitgeist team here), I am not sure it would be a very good idea, for the same reasons Owen mentions.

Relation to Tracker and Other Semantic Technologies

The very short version is that Tracker and Zeitgeist do not depend on each other in any way. The catch, however, is that each becomes a whole lot more powerful when they work together. Take tagging as an example. Zeitgeist is just a log, so we don’t manage your tags; we are, however, fully equipped to understand events concerning your tags. So you manage the tags via Tracker and track their usage in Zeitgeist. The combination lets you reason about which tags relate to which resources in a temporal manner, even for resources that are not tagged.
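The post leaves open how such temporal reasoning would work. One naive approach, which is our assumption and not Zeitgeist’s algorithm, is to borrow tags from tagged resources that were used close in time to an untagged one. In the sketch below, `tags` stands in for what a repository such as Tracker would provide and `usage` for what the Zeitgeist log would provide; `suggest_tags` and the 30-minute window are hypothetical.

```python
from collections import Counter, defaultdict


def suggest_tags(usage, tags, window=30 * 60):
    """usage: (timestamp, uri) pairs from the activity log.
    tags: uri -> set of tags, as managed by a repository.
    Returns uri -> Counter of tags borrowed from tagged resources used
    within `window` seconds of each untagged resource."""
    suggestions = defaultdict(Counter)
    for t, uri in usage:
        if tags.get(uri):
            continue  # already tagged, nothing to suggest
        for t2, other in usage:
            if other != uri and tags.get(other) and abs(t - t2) <= window:
                suggestions[uri].update(tags[other])
    return dict(suggestions)


tags = {"file:///budget.ods": {"work"}, "file:///holiday.jpg": {"family"}}
usage = [
    (0, "file:///budget.ods"),
    (600, "file:///report.txt"),    # used 10 minutes after the budget
    (10000, "file:///holiday.jpg"),
    (10300, "file:///beach.jpg"),   # used 5 minutes after the holiday photo
]
print(suggest_tags(usage, tags))
```

Neither data source alone can produce these suggestions: the tags live in the repository, while the timing lives in the log.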

In the Zeitgeist world we call an application like Tracker a Repository. Nepomuk or Desktop-CouchDB might work as other Repositories. If there is some confusion in this area it is understandable, since we do have some Repository-like features in our 0.2 series. These have, however, been removed from the 0.3 series. It is still undecided whether we want to define a minimal Repository DBus API for Zeitgeist and then ship a reference implementation of this API (which would run in a separate process). Any full-fledged Repository would be able to own the Repository service on the bus, and Zeitgeist would then not run its own. But again, let me stress that a Repository is not needed for the Zeitgeist Log daemon to be useful.
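Since no Repository API has been defined yet, the following is purely a hypothetical sketch of the arrangement described above: a full Repository that already owns the service is preferred, a reference implementation may be started otherwise, and the log still works with no Repository at all. The `Repository` protocol, its `tags_for` method and the helper names are all illustrative.

```python
from typing import Dict, Optional, Protocol, Set


class Repository(Protocol):
    """Hypothetical minimal Repository interface (no such API exists yet)."""

    def tags_for(self, uri: str) -> Set[str]: ...


class ReferenceRepository:
    """Stand-in for a small reference implementation."""

    def __init__(self, tags: Optional[Dict[str, Set[str]]] = None):
        self._tags = tags or {}

    def tags_for(self, uri: str) -> Set[str]:
        return self._tags.get(uri, set())


class FullRepository(ReferenceRepository):
    """Stand-in for a full-fledged Repository such as Tracker."""


def choose_repository(bus_owner: Optional[Repository], start_reference: bool) -> Optional[Repository]:
    # A full Repository that already owns the service wins; only otherwise
    # might a reference implementation be started.
    if bus_owner is not None:
        return bus_owner
    return ReferenceRepository() if start_reference else None


def describe(uri: str, repository: Optional[Repository]) -> dict:
    # The log is useful either way; a Repository only adds metadata.
    tags = repository.tags_for(uri) if repository is not None else set()
    return {"uri": uri, "tags": sorted(tags)}


tracker = FullRepository({"file:///budget.ods": {"work"}})
print(describe("file:///budget.ods", choose_repository(tracker, True)))
print(describe("file:///budget.ods", choose_repository(None, False)))
```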



10 Responses to “Zeitgeist Status Update”

  1. pvanhoof Says:

    Nepomuk is not an RDF store, it’s a set of ontologies. Nepomuk-KDE is not just a store, it’s KDE’s Nepomuk usage, concept or platform, more or less. Tracker 0.7 uses Nepomuk’s ontologies too. Please keep this clear in explanations about RDF stores.

  2. kamstrup Says:

    @pvanhoof: Right, it would have been more correct to say Soprano. I deliberately used Nepomuk instead because what people tend to think of as Nepomuk is really Soprano. I didn’t want to go into the finer details about this, so I swallowed my pride and just called it Nepomuk. One could argue that I am just adding to the confusion; I plead guilty on that charge :-)

  3. Tom Says:

    Sorry, but this really sounds quite useless to me, certainly not something I would want another daemon for…

  4. frej soya Says:

    Problems to solve (or user stories)
    Recent files:
    Parsing a big XML file is an implementation issue, nothing a user cares about.

    How is the ‘list of recent stuff’ with a complete history helpful? The point of recent files is 7±2 elements that any user can quickly manage instead of searching for files: lookup rather than exploring. If we have 50 recent elements, the user ends up exploring instead of doing a simple lookup.

    I doubt showing 3 weeks of recent files is helpful for more than a few users. It might even confuse a larger group than it helps. If the problem is that the recent manager shows too few, maybe the problem can be fixed somewhere else? My guess is searching; maybe Tracker is better for this problem?

    Contextual relevance:
    If contextual relevance is the only reason for related events, and we don’t yet know how to express or visualize contextual relevance, then aren’t related events quite an implementation cost for a journal viewer? I do think a journal viewer is cool, but it’s not for everyone, and shouldn’t it be doable without related events? Maybe a journal can be supported by a library API without the need for a separate process? It’s not like you need to add a million words a day.

    Relating data always gives a nice fuzzy feeling (for some of us, me included), but sometimes we get stuck in data(bases) and not actual use cases… You end up implementing cool databases, without user justification ;) .

    Please don’t let the above stop your progress in anyway.

  5. kamstrup Says:

    @frej: So the user does not care about massive disk churn and long loading times? Maybe it is just me? ;-) It is an implementation detail, right, but it hits users every day.

    Also, I’ve never said that we intend to show a huge list of recent items where we only show a smaller list now. This is a matter of UI design. I am talking about the capabilities of the daemon. The daemon will provide a highly efficient GtkRecentManager and also enable a range of temporal analysis for use in smart GUIs.

    And we are not only talking about recent files. We are talking about a log of _all_ your recent activity: IM, email, browsing, tagging, etc. I might have given the wrong impression here because I tried really hard to keep it simple and not talk too much about hypothetical use cases.

    The daemon can power new GUI types such as the Gnome Activity Journal or Nemo-like interfaces. [links needed, sorry]

    While Zeitgeist is to some degree cursed with an aura that we are doing all sorts of hand-wavy black magic the fact is that it is really simple stuff powering all of this. This is also why we will prevail in the end! :-D

  6. frej soya Says:

    Ramblings follow…..

    The reason people keep asking is that you put up ‘fake’ examples like replacing GtkRecentManager, which is really only recent files in general (per app it’s actually recent documents, recent videos, etc.). That code also serves a very specific use case; it might be slow, but I’m sure it could be fixed easily (with time).

    I know you’re talking about a GtkRecentManager, not recent files, but as long as you keep saying ‘recent’ I will keep bitching about it. :)
    Recently used items are not about anything temporal; they take advantage of
    * associative memory humans are equipped with
    * the ability to quickly compare and choose among short lists
    * The idea (thoroughly tested) that we tend to use the documents we used the last time. Not yesterday or last week, but whatever we did last time we had an app (task) open.

    However, there are still no actual problems being solved? If you want people to buy in, they need to see that actual problems for actual users are solved, and especially why this implementation is great for that.

    PS: I don’t see any black magic at all. Really. If other people are calling it hand-wavy, it’s a lack of communication on your part ;)
    PPS: I have seen Nemo and the GNOME Activity Journal.
    PPPS: You don’t need to answer; I’m just trying to explain why you need actual users and use cases to begin with.

  7. Seif Lotfy Says:

    @frej: Well, I don’t see it as fake if we can actually replace the recently-used manager with a more efficient service. If it is fake, then why did it exist in the first place?
    How can you ask for your activities during Christmas? Recently used won’t help you there IMHO.
    I agree that Zeitgeist is not for everyone. Keep in mind that it is a service and not a UI. It did start out as a UI for which there was demand, which makes it legitimate in its own right to exist as a daemon, since the recently-used manager did not help out.

    The simplest use case would be:
    Having a “most used” list for documents/applications of any mimetype. Nothing provides this service.
    This can be used to sort search results, or to sort “recently used” by popularity.

    I like your points a lot, and I will be preparing a big overview of apps that are using Zeitgeist and how they use it. It would be cool if you could hang out with us in #zeitgeist on Freenode.

  8. Phil Says:

    Why do you intend to rewrite Zeitgeist in C? It seems to me that Vala is a much better idea: OO, GTK-native, and no controversies like with Mono.
    And are there any plans to integrate Zeitgeist with Nautilus? Because the last time I checked, the Zeitgeist front-end looked like some history view of a file manager. Of course I understand that you track more objects than just files, but files are still a major part of it. So after discovering the file I worked on last Friday, I probably want to copy it, move it, delete it or rename it, and not just open it. So it is definitely duplicating the functionality of a file manager to some extent.

  9. kamstrup Says:

    @Phil: About C: I recently did some test coding with Vala to see what writing a DBus service in it would be like, and my conclusion was that it was not ready. Don’t get me wrong; I am a huge fan of Vala and it would be awesome to use for most other stuff, but not for a DBus service as it stands IMHO.

    About Nautilus integration, I think that it is really just a matter of time… To my knowledge no one has looked at this, but the idea seems so obvious that it is bound to happen…

  10. Tollef Fog Heen Says:

    Thanks for the update!

    What I especially like about your update is that you care to have the “Zeitgeist in one sentence” bit. Far too often I see somebody announcing libsplat or a new version of qmwehj or somesuch and I have no idea what it is or if I should care. (In this particular case, I know what Zeitgeist is, so it doesn’t apply to me here, but it’s a good thing in general.)
