What We Talk About When We Talk About Zeitgeist
There is a good deal of confusion about what Zeitgeist is and what it isn’t; what it can do and what it can’t do. This is partly our own fault because we could have communicated this whole thing better. For instance, we have some very outdated wiki pages lying around that you should probably stay away from until we have updated them. In this post I aim to give a semi-technical rundown of the core Zeitgeist functionality and how we expose it for you to work with. This should hopefully clear up some confusion.
Events
The Zeitgeist daemon (also known as the engine) is a process that exposes an event logging framework as a DBus API. Each event has a block of metadata that describes the event itself (this is known simply as the event metadata) and another block of metadata that describes the subject, or subjects, that the event happened to (this part is known as the subject metadata). The metadata for the event looks like:
- Timestamp – When did this happen. Milliseconds since the Unix Epoch. Note that we see events as single points in time, meaning that events don’t have a duration.
- Interpretation – Abstract interpretation of this event; what happened. E.g. “opened”, “saved”, “closed”, “sent”, etc.
- Manifestation – How the event happened. E.g. “user activity”, “notification”, or “scheduled activity”.
- Actor – Who triggered it. This will typically point to the .desktop file of the acting application. It will most likely be an application, but it is not required to be so.
- Payload – A free-form binary blob that you may attach to the event. This is strictly application specific and mainly intended to be a “back door” for people to do all sorts of funky hacks.
Each event has one or more subjects associated. For each subject we store:
- URI – You guessed it! The URI of the subject
- Interpretation – Abstract interpretation of the subject. This could be “Document”, “Image”, “Video”, “Email”, “Instant message”, “Contact”, anything.
- Manifestation – How the subject is stored. This could be something like “File”, “Mailbox”, “Web page”.
- Origin – A URI pointing to the origin or “patron” of the subject. For files this would be the parent folder. For YouTube videos it would be http://youtube.com
- Mimetype – The format of the datastream representing the subject. E.g. text/plain, application/xml.
- Text – Textual information added to the subject. This is not applicable for all types of subjects.
- Storage – Identifier for the storage medium this subject resides on. We use this to make it possible to run queries that return only events for subjects that are “available now”. E.g. some clients don’t want to show events for files that are stored on your USB pen drive when it is not connected.
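To make the two metadata blocks concrete, here is a minimal sketch of the structure as plain Python dataclasses. This is only an illustration of the fields listed above, not the engine's actual classes or its DBus wire format; the class names, the example URIs, and the actor string are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subject:
    """Subject metadata: what the event happened to."""
    uri: str = ""
    interpretation: str = ""   # e.g. "Document", "Image"
    manifestation: str = ""    # e.g. "File", "Web page"
    origin: str = ""           # e.g. the parent folder of a file
    mimetype: str = ""
    text: str = ""
    storage: str = ""          # identifier of the storage medium


@dataclass
class Event:
    """Event metadata plus one or more subjects."""
    timestamp: int = 0         # milliseconds since the Unix Epoch
    interpretation: str = ""   # e.g. "opened", "saved"
    manifestation: str = ""    # e.g. "user activity"
    actor: str = ""            # typically points at a .desktop file
    payload: bytes = b""       # free-form, application specific
    subjects: List[Subject] = field(default_factory=list)


# Hypothetical example: a text editor opening a file.
example = Event(
    timestamp=1258386420000,
    interpretation="opened",
    manifestation="user activity",
    actor="gedit.desktop",     # illustrative actor reference
    subjects=[
        Subject(
            uri="file:///home/user/notes.txt",
            interpretation="Document",
            manifestation="File",
            origin="file:///home/user",
            mimetype="text/plain",
            storage="local",   # illustrative storage identifier
        )
    ],
)
```

The strings here are simplified. As the next section explains, the interpretation and manifestation fields hold ontology URIs in practice.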
Ontology – Or Data Description
In reality the metadata fields we store don’t contain simple strings like “Document” for the subject interpretation. It’s a bit more complex than that – sorry! We store a URI pointing to a formal definition of something categorized as a Document. This formal categorization is called an ontology if you want a word to confuse your friends with. We are fortunate enough that someone already wrote such a spec, namely the Nepomuk Ontology. So instead of just “Document” we store the string “http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#Document”.
Since Tracker also uses the Nepomuk ontologies you may take these formal classification strings and plug them directly into Tracker to find everything that Tracker considers a document.
We will also have an ontology for the event metadata as this is not covered by Nepomuk. We are actively working on this.
Getting Data Out – Querying the Event Log
We employ a template-based query API for searching our log data. You send us a list of event templates you want to look for and how you want the results sorted, and we give you the results. So if you want to find all “open” events on subjects of type “Document”, simply create an Event object, set its interpretation to “open”, and add a subject to the event template with the interpretation set to “Document”. All other fields should be left blank. Send this template to us and we will give you the matches.
The list of event templates is collected into a big OR-query to imbue the consumers with more power.
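The matching idea can be sketched in a few lines of Python. This is not the engine's implementation or its DBus API, only an in-memory illustration under some assumptions: events are plain dicts, a blank template field acts as a wildcard, and a subject template matches if at least one subject of the event fits it. The function names and sample data are hypothetical.

```python
def subject_matches(template, subject):
    # A blank (empty) template field matches anything.
    return all(not wanted or subject.get(key) == wanted
               for key, wanted in template.items())


def event_matches(template, event):
    # Compare the event-level fields first.
    for key, wanted in template.items():
        if key == "subjects":
            continue
        if wanted and event.get(key) != wanted:
            return False
    # Each subject template must fit at least one subject of the event.
    for subject_template in template.get("subjects", []):
        if not any(subject_matches(subject_template, s)
                   for s in event.get("subjects", [])):
            return False
    return True


def find_events(log, templates):
    # The templates are OR-ed together: one matching template is enough.
    hits = [e for e in log if any(event_matches(t, e) for t in templates)]
    # Newest first; the sort order is a choice made for this sketch.
    return sorted(hits, key=lambda e: e["timestamp"], reverse=True)


log = [
    {"timestamp": 1000, "interpretation": "opened", "actor": "gedit.desktop",
     "subjects": [{"uri": "file:///a.txt", "interpretation": "Document"}]},
    {"timestamp": 2000, "interpretation": "saved", "actor": "gedit.desktop",
     "subjects": [{"uri": "file:///a.txt", "interpretation": "Document"}]},
    {"timestamp": 3000, "interpretation": "opened", "actor": "eog.desktop",
     "subjects": [{"uri": "file:///b.png", "interpretation": "Image"}]},
]

opened_documents = {"interpretation": "opened",
                    "subjects": [{"interpretation": "Document"}]}
any_image = {"subjects": [{"interpretation": "Image"}]}

results = find_events(log, [opened_documents, any_image])
```

With the two templates above, the "saved" event is filtered out while the opened document and the image event both come back, which is the extra power the OR-query gives a consumer.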
Getting Data Into Zeitgeist
There are really no limits to what kind of events we could store. If you have a spare mobile with a built-in accelerometer and glue it to your front door then you could send an event over Bluetooth to your desktop each time your front door opens. Probably there are better use cases?
The point is that the usefulness of Zeitgeist stands and falls with the events that you push into us. We can store anything that you can model using the structures I outlined above. I am pretty certain that people will not agree on the kinds of data they want logged, but we are ready for anything.
Normal users would of course not need to think about getting their data into Zeitgeist. What developers need to know is that we have a simple DBus API to insert events (surprisingly called InsertEvents). It is called InsertEvents and not AppendEvents or something like that for a reason: you are allowed to insert events that are in the past. This is useful if you want to import your Firefox history or whatever. If you try to log an event twice the engine will throw an error at you, so no need to worry about dupes.
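The two properties described here, inserting events in the past and rejecting duplicates, can be illustrated with a small in-memory log. This is a sketch only and does not call the real DBus API: the `EventLog` class, the `DuplicateEventError` exception, and the rule that two events are duplicates when all their fields are equal are assumptions made for the example.

```python
import bisect
from collections import namedtuple

# Simplified event with a single subject URI (an assumption for brevity).
Event = namedtuple("Event",
                   ["timestamp", "interpretation", "actor", "subject_uri"])


class DuplicateEventError(Exception):
    """Raised when an identical event has already been logged."""


class EventLog:
    def __init__(self):
        self._events = []   # kept sorted, timestamp first
        self._seen = set()  # fast duplicate lookup

    def insert_events(self, events):
        for event in events:
            if event in self._seen:
                raise DuplicateEventError(repr(event))
            self._seen.add(event)
            # insort keeps the list ordered, so an event from the past
            # lands in its chronological position instead of at the end.
            bisect.insort(self._events, event)

    def timestamps(self):
        return [e.timestamp for e in self._events]


log = EventLog()
log.insert_events([Event(2000, "opened", "firefox.desktop",
                         "http://example.org/")])
# Importing history: this event is older than the one already stored.
log.insert_events([Event(1000, "opened", "firefox.desktop",
                         "http://example.org/old")])

try:
    log.insert_events([Event(2000, "opened", "firefox.desktop",
                             "http://example.org/")])
    rejected = False
except DuplicateEventError:
    rejected = True
```

An importer written against a log like this can be re-run safely, since events it has already pushed are refused rather than stored twice.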
Ok. I think that about wraps up what I intended to say for now. Hope it’s useful to at least one person out there!
Tags: events, Gnome, Hacking, logging, nepomuk, tracker, zeitgeist
November 16th, 2009 at 4:47 pm
This explains everything. Thank you!
November 16th, 2009 at 4:57 pm
Yep, me. Thanks.
November 16th, 2009 at 11:10 pm
What I want to know is how you can obtain OS share from zeitgeist. Google zeitgeist used to supply a graph of OS share of all Google requests, but quit doing that just when it was getting interesting. The last one I can find is from June 2004.
http://www.google.com/intl/en/press/zeitgeist/zeitgeist-jun04.html
November 17th, 2009 at 12:12 am
@Alan: Of course the exact same question, “OS share”, does not make sense for our Zeitgeist. One could pose similar questions though, like “which app do I most frequently use to open audio files”, and paint a graph of usage frequency over time. You would probably do this by asking for all “open” events on subjects of type “audio” and then, on the client side, doing some bucket counting on the event field containing the ref to the generating application (called “actor” above).
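A rough sketch of that client-side bucket counting, assuming the query results have already been fetched and are available as plain dicts (the sample events and actor names below are made up):

```python
from collections import Counter

MS_PER_DAY = 24 * 60 * 60 * 1000

# Hypothetical result of asking for all "open" events on audio subjects.
audio_open_events = [
    {"timestamp": 0, "actor": "rhythmbox.desktop"},
    {"timestamp": 1000, "actor": "rhythmbox.desktop"},
    {"timestamp": MS_PER_DAY + 5, "actor": "totem.desktop"},
    {"timestamp": MS_PER_DAY + 10, "actor": "rhythmbox.desktop"},
]

# Total number of opens per application.
usage_by_actor = Counter(e["actor"] for e in audio_open_events)

# Opens per application per day (days counted from the Unix Epoch),
# which is the data one would plot as usage frequency over time.
usage_by_day = Counter((e["timestamp"] // MS_PER_DAY, e["actor"])
                       for e in audio_open_events)

favourite_app, open_count = usage_by_actor.most_common(1)[0]
```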
November 17th, 2009 at 2:22 am
I’m surprised you see events as single points in time. All events have a duration. Even opening or closing a file takes time. Having durations for events seems very valuable. Now one is forced to have events like startActivity and endActivity.
November 17th, 2009 at 3:27 am
@Jos: A while back we actually represented events as things with a temporal dimension. This turned out to suck in a lot of ways so we trashed the idea. It is a lot easier to work with point-events for both clients and servers. Another thing we realized after having written a handful of data providers (and a few GUIs as well) was that *none* of these made use of the fact that events had a temporal dimension.
November 17th, 2009 at 6:42 am
@Jos: Think of it this way.
We note “when did you start opening”, where the answer is a single number,
not “when did you open”, where one could understand the time period of the opening event (which is of no use to us).
Events can be understood in two ways: either as a time period in which something happens, or as a single instant in an infinite period of time. We follow the second.
The second also allows us to emulate the first approach if we wanted to, just by adding a close event or end event to the table.
November 17th, 2009 at 9:09 am
> Timestamp – When did this happen. Milliseconds since the Unix Epoch.
Isn’t it asking for trouble to time everything from the epoch?
You have to know about leap seconds ever since 1 Jan 1970. For intervals of more than 24 days or so, you overflow 32-bits-signed.
Would it not be far better to use year/milliseconds since year start?
November 17th, 2009 at 9:36 am
@mikkel “This turned out to suck in a lot of ways” is very unspecific. No idea what the downside is to using periods from that answer. Perhaps the fact that you need a special type of index?
@seif What is “As single instance in an infinite period of time” ?
November 18th, 2009 at 11:10 pm
Still don’t know what Zeitgeist is. What does it do? What is it for?
March 24th, 2010 at 12:22 pm
@Jos In GNOME Activity Journal we are able to do durations for files by using both the open and close event related to that uri. In other words Zeitgeist logs when you open the door and when you close the door.
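A minimal sketch of how point events can be paired into durations on the client side. This is not the GNOME Activity Journal code; the function name, the dict layout, and the choice to ignore a close without a preceding open are assumptions for illustration.

```python
def durations(events):
    """Pair each "opened" event with the next "closed" event for the
    same URI and return (uri, start, duration_ms) tuples."""
    open_at = {}
    spans = []
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        uri = ev["uri"]
        if ev["interpretation"] == "opened":
            open_at[uri] = ev["timestamp"]
        elif ev["interpretation"] == "closed" and uri in open_at:
            start = open_at.pop(uri)
            spans.append((uri, start, ev["timestamp"] - start))
    return spans


events = [
    {"timestamp": 1000, "interpretation": "opened", "uri": "file:///a.txt"},
    {"timestamp": 1500, "interpretation": "opened", "uri": "file:///b.txt"},
    {"timestamp": 4000, "interpretation": "closed", "uri": "file:///a.txt"},
    {"timestamp": 2000, "interpretation": "closed", "uri": "file:///b.txt"},
]

spans = durations(events)
```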