When I update my laptop to the next version of Ubuntu (Lucid Lynx or 10.04 this time), I usually have a look at the general direction for some of the “desktop core elements”, like desktop search. I decided to switch from Beagle to Tracker and hopefully have tackled the performance problems it seems to come with.
The Ubuntu community has been shipping Tracker desktop search for some time already. It seemed to often freeze up my computer completely while it was indexing files. Beagle, the best alternative, did not, and also seemed to have a better feature set. For instance, it indexed my chat logs in Pidgin nicely.
But Beagle doesn’t seem to be developed very actively, whereas Tracker, as part of the Gnome desktop, seems to be going actively towards supporting the Nepomuk semantic desktop. Which then paves the way to let other applications use Tracker to retrieve information. And store information!
Adding tags or a description to a photo in one application will then make it available to other applications as well, instead of each photo application building its own application-specific database.
All these applications also periodically want to go through my files and directories to see if there is new content that they can handle. It just adds to potential performance problems. But switching back to Tracker also meant switching back to its performance problems. As soon as some sort of disk-intensive activity started, my whole system froze.
But Ralf Nieuwenhuijsen gave an explanation about the background of the problem in the Ubuntu brainstorm.
“Currently, what happens is that linux saves a last-read-timestamp on every file. So when tracker indexes it, it also has to write it. Hence the trashing. This has become worse over time. Although most of you associate this with tracker, all file-io with lots of small files is horrible at the moment in linux. Nothing tracker-specific about it.”
That led me to explore this “last read timestamp” a bit more: do I need it anyway? Apparently not: a pointer to discussions in the Linux community suggests that it might be switched off by default in the future, and led me to an article by Kushal Koolwal explaining the different options atime, noatime and relatime.
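To see which of these options a filesystem is actually using, something like the following rough Python sketch could help. It assumes the mount table has the usual whitespace-separated layout of /etc/fstab or /proc/mounts (device, mount point, type, options); the function names and the sample line are made up for illustration.

```python
ATIME_OPTIONS = ("noatime", "relatime", "strictatime", "atime")


def options_for(mounts_text, mount_point):
    """Return the option string for mount_point, or None if it is not listed."""
    for line in mounts_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip fstab comment lines
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3]
    return None


def atime_mode(mount_options):
    """Name the access-time option found in a comma-separated option string."""
    opts = set(mount_options.split(","))
    for name in ATIME_OPTIONS:
        if name in opts:
            return name
    return "unspecified"  # nothing explicit, so the kernel default applies


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real system.
    sample = "/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0\n"
    print(atime_mode(options_for(sample, "/")))
```

On a real machine the text could come from open("/proc/mounts").read() instead of the sample string.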
So I edited /etc/fstab, replaced relatime by noatime, and remounted the disk. And started Tracker again. Had Rhythmbox running. Asked Eclipse to compare a project in CVS with its repository. All tasks that read (sometimes a lot of) files on disk. Without any hiccups so far.
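For the record, the fstab change itself is tiny. I made it by hand in an editor, but as a sketch of what happens to the options field, here is a small Python function (the name set_noatime and the sample line with its UUID are hypothetical, not copied from my real fstab):

```python
ATIME_OPTIONS = ("atime", "noatime", "relatime", "strictatime")


def set_noatime(fstab_line):
    """Return fstab_line with any access-time option replaced by noatime.

    Comment lines, blank lines and lines with fewer than four fields are
    returned unchanged. Fields of a rewritten line are joined with single
    spaces, so the original column alignment is not preserved.
    """
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line
    # Drop existing access-time options, keep everything else in order.
    options = [o for o in fields[3].split(",") if o not in ATIME_OPTIONS]
    options.append("noatime")
    fields[3] = ",".join(options)
    return " ".join(fields)


if __name__ == "__main__":
    # Hypothetical entry for the root filesystem.
    line = "UUID=1234-abcd / ext4 relatime,errors=remount-ro 0 1"
    print(set_noatime(line))
```

After changing /etc/fstab, a running system can pick up the new options without a reboot with a remount as root, for example `mount -o remount /` for the root filesystem.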
Let's hope the search results that Tracker delivers are useful too, in practice 🙂