
Indexed folders rely on the index being complete

by Brandon on March 16th, 2008

Let’s say you want to optimize your system by only indexing certain data.  For example, a reader recently e-mailed me and said “I only want to index my media files.”  Seems like a valid choice.  At first glance, it might seem like you could achieve this by telling Windows to only index files with extensions like .mp3 and .avi.  Ultimately, this is a very bad idea.

First, let me tell you why this is a bad idea.  Second, I’ll tell you the right way to achieve what you want.

Let’s begin by looking at how the Windows Vista shell and the indexer work together.

The indexer maintains a list of “start paths” – which are locations in the shell namespace that it cares about.  By default, it is set up to index the x:\Users directory – and thus all of the default Documents / Music / Pictures folders of all user accounts on the system.  When you install Outlook, it sets up a start path for your mail accounts.  OneNote sets one up for your OneNote data.  And so on.  This means that the indexer will try to index all items under those paths*, and ignore everything else.
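To make the start-path idea concrete, here is a minimal Python sketch of the coverage check. It is a hypothetical model, not a Windows Search API: `is_covered` and the list-of-strings representation of start paths are made up for illustration, and it ignores nested inclusion/exclusion rules and attribute-based exclusions.

```python
from pathlib import PureWindowsPath
from typing import Iterable


def is_covered(path: str, start_paths: Iterable[str]) -> bool:
    """Return True if `path` is a start path or lies beneath one.

    Hypothetical simplified model: nested inclusion/exclusion rules and
    attribute-based exclusions are ignored.
    """
    target = PureWindowsPath(path)
    for start in start_paths:
        root = PureWindowsPath(start)
        # Compare whole path components (case-insensitively, as
        # PureWindowsPath does), not raw string prefixes.
        if target == root or root in target.parents:
            return True
    return False


# Hypothetical configuration: just the default Users start path.
start_paths = [r"C:\Users"]

print(is_covered(r"C:\Users\Brandon\Music", start_paths))  # True
print(is_covered(r"C:\Windows", start_paths))              # False
print(is_covered(r"C:\UsersBackup", start_paths))          # False
```

Comparing path components rather than string prefixes matters here: a naive `startswith` check would wrongly treat C:\UsersBackup as covered by C:\Users.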

When you browse to a folder in Explorer, the shell asks the indexer if the current path is covered by the index.  If it is, Explorer will use the index exclusively for search / filter / grouping operations against that location.  It does not ask the index if it covers all the file types in that location.  It assumes the index is the authoritative source for information about that part of the namespace.

On the other hand, if the path is not covered by the index, Explorer walks the entire namespace starting at that location (the current folder and all of its subfolders), enumerates every single item, and performs all operations like filtering / sorting / grouping in memory.  By default, it does not crack open any of the files it enumerates, so all filtering happens only against basic properties like the file name.

You can then click the “Search in File Contents” button (what some of us call the “try harder” button), and it will repeat the operation, this time stopping at every file and cracking it open with the appropriate IFilter and property handlers, doing essentially the same thing that happens when a file is indexed.  It loads the file, extracts all the properties and content, checks whether it matches the current filter, and then decides whether to add that item to the view or ignore it.  If you change the filter, the whole process starts over again.  Needless to say, this is rather slow if you have to do it for more than a few files.  That’s why content search sits behind a button instead of running by default: in most unindexed locations (like C:\Windows) you are probably only searching for a filename.
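The unindexed fallback can be sketched in Python as a plain directory walk. This is a hypothetical model, not how Explorer is actually implemented: `grep_search` matches on file names only by default, and when given a content reader it also opens every file whose name didn't match, a rough stand-in for the “try harder” pass. The `read_plain_text` helper is likewise made up and only understands plain text, whereas the real shell uses IFilters and property handlers.

```python
import os
from typing import Callable, Iterator, Optional


def read_plain_text(path: str) -> str:
    """Hypothetical stand-in for an IFilter: handles plain text files only."""
    try:
        with open(path, encoding="utf-8", errors="ignore") as f:
            return f.read()
    except OSError:
        return ""


def grep_search(root: str, term: str,
                read_contents: Optional[Callable[[str], str]] = None) -> Iterator[str]:
    """Walk `root` and every subfolder, yielding paths that match `term`.

    Without `read_contents`, only file names are checked. With it, each file
    whose name doesn't match is opened and its text checked too (slow).
    """
    needle = term.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if needle in name.lower():
                yield path
            elif read_contents is not None and needle in read_contents(path).lower():
                yield path
```

For example, `grep_search(r"C:\Windows", "notepad")` would check names only, while passing `read_plain_text` as the third argument adds the slow content pass. Every call re-walks the whole tree, which mirrors why changing the filter in an unindexed folder starts the enumeration over.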

Armed with this information, let’s take another look at the original question.  Let’s say you go into the Advanced options for the indexer and tell it not to index .doc files at all.  Then you save a new document called Something.doc in your Documents folder, which is still indexed.  The indexer will be notified that a new file was created there, but since you disabled indexing of that extension, it will ignore it.  Then when you go to your user folder or the Documents folder and search for “something”, you don’t find the document, even though the search term is right there in its name.  The folder said “Hey, I’m indexed” and the file is not in the index, so as far as Explorer is concerned, it doesn’t exist.

A much better approach, if you really don’t care about indexing your .doc files, is to tell Windows not to index the Documents folder (or wherever you keep your .doc files).  That way it will fall back to slow GREP search when you look there, which will at least find what you’re looking for, albeit more slowly.

You can do this from the Indexing Options control panel, and it’s pretty easy.  Only want your music and videos indexed?  Then tell the indexer to only crawl the places where you store those files.  That’s it, mission accomplished.
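The difference between the two approaches can be simulated with a toy model. Everything below is hypothetical (`ToyIndexer`, `explorer_search`, and the paths are invented for illustration, not Windows APIs): a covered folder is answered from the index alone, and an uncovered folder falls back to scanning what is actually on disk.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath


def under(path: PureWindowsPath, root: PureWindowsPath) -> bool:
    """True if `path` is `root` itself or lies somewhere beneath it."""
    return path == root or root in path.parents


@dataclass
class ToyIndexer:
    """Hypothetical stand-in for the indexer; not a real Windows API."""
    start_paths: list
    excluded_extensions: set = field(default_factory=set)
    entries: set = field(default_factory=set)

    def covers(self, folder: PureWindowsPath) -> bool:
        return any(under(folder, PureWindowsPath(s)) for s in self.start_paths)

    def on_file_created(self, path: PureWindowsPath) -> None:
        # The indexer is notified, but silently skips excluded extensions.
        if self.covers(path.parent) and path.suffix.lower() not in self.excluded_extensions:
            self.entries.add(path)


def explorer_search(indexer: ToyIndexer, folder: str, disk: set, term: str) -> list:
    """Covered folder: trust the index alone. Otherwise: scan what is on disk."""
    root = PureWindowsPath(folder)
    source = indexer.entries if indexer.covers(root) else disk
    return sorted(str(p) for p in source
                  if under(p, root) and term.lower() in p.name.lower())


doc = PureWindowsPath(r"C:\Users\Brandon\Documents\Something.doc")
disk = {doc}  # what actually exists on the drive
documents = r"C:\Users\Brandon\Documents"

# Approach 1: Users stays indexed, but the .doc extension is excluded.
by_extension = ToyIndexer([r"C:\Users"], excluded_extensions={".doc"})
by_extension.on_file_created(doc)
print(explorer_search(by_extension, documents, disk, "something"))  # []

# Approach 2: only the media folders are start paths; Documents is not covered.
by_folder = ToyIndexer([r"C:\Users\Brandon\Music", r"C:\Users\Brandon\Videos"])
by_folder.on_file_created(doc)
print(explorer_search(by_folder, documents, disk, "something"))  # finds Something.doc
```

In the first case the folder claims to be indexed, so the missing entry makes the file invisible; in the second the folder is not covered, so the slower scan still finds it.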

The end result is the same.  The indexer isn’t doing any additional work, unless you mix .doc files and media files in the same folder.  And even then, at least you’ll be able to find them.

Another option available to you is to set certain extensions to “Index File Properties only.”  That way you’ll at least be able to find the item by its name.  Why would you want to do that?  I have no idea.  It’s not like indexing files incurs significant overhead on any reasonably modern PC.  The option is mainly there because there are some file types the indexer can’t search inside.  So instead it indexes all the basic stuff that applies to every file (like name, date modified, and size).
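One way to picture “Index File Properties only” is an entry built from file-system metadata alone. The sketch below is hypothetical (the function and field names are made up): it reads only what `os.stat` reports and never opens the file, so it works for any file type, including ones no IFilter understands.

```python
import os
from datetime import datetime, timezone


def properties_only_entry(path: str) -> dict:
    """Hypothetical index entry holding only properties every file has.

    Only file-system metadata is read (via os.stat); the file is never
    opened, so no IFilter or content extraction is involved.
    """
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "date_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "size": info.st_size,
    }
```

An entry like this is enough to match a search on the file name, but a search for words inside the file would never hit it.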

* = It’s actually more complicated than that, as there can be nested inclusion/exclusion rules, files or folders excluded based on attributes, etc.  But that’s not particularly relevant to this discussion.

From → WDS FAQ

One Comment
  1. Thanks for blogging about this issue, Brandon. I am among the people who believe they can index their media files, and not the rest. I read and understood your warnings. I acknowledge that Windows Vista desktop search was first created to index “obvious” locations and start menu items.

    Still, there is something you don’t seem to take fully into account, and that’s archiving. For example, I have ISOs split into many RAR files, and those files are zipped into one file. All the numbered RAR extensions are ticked in the indexer (rar, rar1, rar2 and so on).

    I would like an easier way to disable indexing of those, and leave only the first part of the archive available to the indexer.

    Also, when nearly all file types are ticked, I find this view cluttered, as I don’t immediately see if I forgot to tick one particular file type (for example, .mpc, which is “lost” among all the mp* file types).

    All in all, I think the search indexing features are a strong candidate for a management console add-on, or something similar (the same goes for shadow copy management): something that lets you tick and untick many types in one shot, but also, as you stress, choose the appropriate method of indexing.

    To answer your question, I would also need to index the contents of the extensionless readme files on my hard drive (they were created on a different system, but they are still plain text files).
