Zeitgeist Storage Awareness

Leading up to our last Zeitgeist release (0.3.1) I hacked up our new Blacklisting and Monitoring APIs. Both were quite fun to work on and are very useful APIs, if I might say so myself :-) But I regret not blogging about them as I wrote them; we gotta keep them olde hype-wheels a’turnin’. So here we go about the next feature on my plate…

Storage Awareness

So what does the buzzwordy term “Storage Awareness” cover? We have had a few requests from application developers, like:

  1. “I don’t want to show online resources when there is no network interface up”
  2. “I don’t want to show work related to files on a disconnected USB drive”
  3. Or another one I just came up with: “When I plug in my USB-drive show my recent activity on that device”
  4. Very much related is how deleted files should be handled in the results, but I will not discuss that right now

Since Zeitgeist is a log and not a snapshot of your environment, we keep the information around even if you delete files or detach your storage devices. So you might indeed get data about subjects that are not readily available when you query the log. However, the use cases above seem valid, and having applications stat() each file:// URL in the result set seems like a very bad idea, so it would be nice if we could help a bit with this. Being “just” a log doesn’t mean we can’t provide some nifty API for application developers.

So our query API has a flag that restricts results to events whose subjects are “available right now”. It has not been functional until now, but it will be for 0.3.2. Since we also log which storage medium each event subject lives on, one can also ask for recent or popular stuff on a given storage medium.
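
To make the idea concrete, here is a minimal sketch of how such an availability filter could work on top of SQLite. It is an illustration only: the table and column names, the helper functions and the made-up storage id "usb-uuid-1234" are all assumptions, not Zeitgeist’s actual schema or API.

```python
import sqlite3


def build_demo_log():
    """Create an in-memory log with hypothetical 'storage' and 'event' tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE storage (id TEXT PRIMARY KEY, available INTEGER NOT NULL);
        CREATE TABLE event (
            timestamp   INTEGER NOT NULL,
            subject_uri TEXT NOT NULL,
            storage_id  TEXT NOT NULL REFERENCES storage(id)
        );
    """)
    db.executemany("INSERT INTO storage VALUES (?, ?)", [
        ("net", 1),            # network is up
        ("usb-uuid-1234", 0),  # made-up id for an unplugged USB drive
    ])
    db.executemany("INSERT INTO event VALUES (?, ?, ?)", [
        (100, "http://example.com/page", "net"),
        (200, "file:///media/usb/report.odt", "usb-uuid-1234"),
        (300, "http://example.com/other", "net"),
    ])
    return db


def set_available(db, storage_id, available):
    """Record that a storage medium appeared or disappeared."""
    db.execute("UPDATE storage SET available = ? WHERE id = ?",
               (int(available), storage_id))


def recent_subjects(db, only_available=False, storage_id=None):
    """Return subject URIs, newest first, optionally filtered by storage state or id."""
    sql = ("SELECT e.subject_uri FROM event e "
           "INNER JOIN storage s ON s.id = e.storage_id")
    clauses, params = [], []
    if only_available:
        clauses.append("s.available = 1")
    if storage_id is not None:
        clauses.append("s.id = ?")
        params.append(storage_id)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY e.timestamp DESC"
    return [row[0] for row in db.execute(sql, params)]
```

Calling recent_subjects(db, only_available=True) skips everything on the unplugged drive, while recent_subjects(db, storage_id="usb-uuid-1234") covers the “show my recent activity on that device” case. The point of the join is that the application never has to touch the file system to find out what is reachable.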

Storage Identifiers.. Help?

We associate each subject URL with a storage medium via a unique string identifier. For stuff like USB drives we have the UUID readily available from GIO. For online resources we simply use the id “net” and I use NetworkManager to check for network availability (ConnMan should be easy too).
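
As a rough sketch of that mapping, assuming the mount points and their UUIDs have already been collected somewhere (in practice they would come from GIO, here they are just a plain dict). The function name, the list of network schemes and the "local" and "unknown" fallback ids are assumptions made for illustration; only the "net" id is taken from the description above.

```python
from urllib.parse import unquote, urlparse

# Assumption: which URL schemes count as "online resources".
NETWORK_SCHEMES = {"http", "https", "ftp", "sftp"}


def storage_id_for_uri(uri, mount_uuids):
    """Pick a storage id for a subject URI.

    mount_uuids maps mount point paths to volume UUIDs (hypothetical input).
    """
    parsed = urlparse(uri)
    if parsed.scheme in NETWORK_SCHEMES:
        return "net"
    if parsed.scheme != "file":
        return "unknown"  # assumption: placeholder for schemes we do not handle

    path = unquote(parsed.path)
    best_prefix, best_uuid = "", None
    for mount_point, uuid in mount_uuids.items():
        root = mount_point.rstrip("/")
        prefix = root + "/"
        # Match on whole path components so /media/usb does not claim /media/usbstick.
        if (path == root or path.startswith(prefix)) and len(prefix) > len(best_prefix):
            best_prefix, best_uuid = prefix, uuid
    # Assumption: files outside any known removable volume get the id "local".
    return best_uuid if best_uuid is not None else "local"
```

For example, with mount_uuids = {"/media/usb": "1234-ABCD"}, a file:///media/usb/doc.txt subject maps to "1234-ABCD" and any http:// subject maps to "net".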

So far so good, but I have not been able to handle CDs (both audio and data) and DVDs properly yet. I am not a storage format expert, so I don’t know if it’s even possible to obtain a unique identifier for a given data CD (or what have you). So far I can only get the disk label from GIO, and that is not unique (but it might very well be “unique enough” for this to work well in practice). So any help on obtaining real unique ids for CDs and/or DVDs would be appreciated. Note that I would like to use the G* stack and not introduce funky dependencies, and I am also not going to read the first N bytes and checksum those.

Next bump on the road is that it seems I cannot get the disk label from within gio.VolumeMonitor’s "volume-removed" signal handler (calling volume.get_identifier()). I just get None, whereas I get the right label in the "volume-added" handler. I can probably figure this one out, but any ideas are appreciated.
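
One possible workaround, sketched here with a stand-in volume object rather than real gio, is to remember the identifier when the volume is added and look it up again on removal. This assumes the monitor hands the same volume object to both handlers, which has not been verified. VolumeIdCache and FakeVolume are hypothetical names, and the "label" kind passed to get_identifier() is likewise an assumption.

```python
class VolumeIdCache:
    """Remember volume identifiers so they survive until 'volume-removed'."""

    def __init__(self):
        self._ids = {}

    def on_volume_added(self, monitor, volume):
        # The identifier is still readable here, so store it keyed by the volume.
        self._ids[volume] = volume.get_identifier("label")

    def on_volume_removed(self, monitor, volume):
        # Do not ask the volume again; return what was stored at add time.
        return self._ids.pop(volume, None)


class FakeVolume:
    """Stand-in that mimics the observed behaviour: no identifier once detached."""

    def __init__(self, label):
        self._label = label
        self.attached = True

    def get_identifier(self, kind):
        return self._label if self.attached else None
```

With real gio the two methods would be connected to the "volume-added" and "volume-removed" signals of the volume monitor; the stand-in class is only there so the caching logic can be tried without any hardware.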

The Code

Hold on… While you can indeed dig out the code from Launchpad (it’s not a secret), I would recommend that you wait a bit. It’s not ready for testing (and not even wired up in the engine, so you would have to do that yourself). So no code pointer for you, sorry :-)

9 Responses to “Zeitgeist Storage Awareness”

  1. Seif Lotfy Says:

    ALL BOW TO THE MIGHTY KAMSTRUP

  2. RubenV Says:

    How does this work, is this something in Zeitgeist’s code, or tracker magic?

  3. kamstrup Says:

    @RubenV: This is purely in Zeitgeist. Each subject URI has an associated storage medium id. We then have a ‘storage’ table listing these ids and their state (available, unavailable). We then have a GIO based VolumeMonitor thingie that tracks your storage volumes and updates the ‘storage’ table accordingly (using UUIDs as ids), and also a NetworkManager based monitor updating the state of the ‘net’ id in ‘storage’. All we have to do then is a simple INNER JOIN on the ‘storage’ table when querying the log.

  4. Wout Says:

    I’m not currently a Zeitgeist user but I do have an opinion, as always. Sadly I’m not a coder so I can’t do the “Talk is cheap, show me code” thingy. ;-)

    I would really like storage awareness in metadata solutions, but I think there might be a better way. I’d want these types of solutions to show me where data is. Not only available data, but related data as well. Wouldn’t it be nice if this system would show me that some documents were edited on this computer but are not available right now? Show a disconnected icon over a document or files.

    Clicking on the icon could instruct the user to insert the media or connect to a specified network.

    Just my two cents

  5. kamstrup Says:

    @Wout: Fear not. This is indeed possible with the current approach. We can figure out which device your file was on when you interacted with it, and we can figure out which files you last interacted with on a given device.

  6. Greg Says:

    I’m not an expert by any means, but afaik CDDB and co use a kind of hash of the TOC to identify disks. This may be a good starting point for more info: http://musicbrainz.org/doc/DiscIDCalculation

    Researching other disk catalog software may turn up solutions better suited for data disks.

    Greg

  7. Benjamin Otte Says:

    I wonder if “net” is good enough. Some resources on “the net” are only available sometimes. And I don’t just mean friends that are disconnected (Telepathy to the rescue?), but more importantly VPN and other private networks.
    Not that I think it’s terribly important, but surely it’s a way to make your APIs more complicated and give you more stuff to think about :p

  8. kamstrup Says:

    @Benjamin Otte: Indeed, I’ve thought a lot about this myself: VPN, SSH tunnels, proxies, authentication… My conclusion was that it is quite a complex problem to solve elegantly, and that going for the simpler “net or no-net” solution would get us to a “good enough” solution quite fast with very little potential for bugs.

  9. Victor Bogado Says:

    Maybe the log should be stored on the device itself? Say I recently edited a file on my office workstation and saved it to the pen drive to continue working at home. I believe the “correct” behaviour would be for Zeitgeist to figure out that I’ve recently worked on that file, even if I didn’t actually work on it on the current computer.

    This would solve the volume id, but it would open another can of worms, since pendrives usually have poor FS (FAT) and also where should those logs be stored, and how to merge them.