The Indexer
At its core, the Windows Search indexer doesn’t really know anything about files, e-mails, or anything like that. In fact, all it really knows is how to do the following things:
The indexer relies on other Windows Search components to handle the specifics, such as converting a URL into data to be indexed. That’s where Protocol Handlers, IFilters, and Property Handlers come in.
Protocol Handlers
A Protocol Handler allows the indexer to crawl a specific kind of data store. For example, the File System Protocol Handler allows the indexer to crawl files stored on your hard drive. Windows Search includes a few Protocol Handlers, including those for the File System, MAPI (i.e., Outlook), and the Client-Side Cache for Offline Files (Vista only). Other examples include the Protocol Handlers for Lotus Notes, the IE History / Cache, and Mozilla Thunderbird.
At a basic level, a Protocol Handler is just a piece of code that takes as input a URL (like “file://C:/Foo/” or “mapi://{USER-SID}/Brandon’s Mailbox/Inbox”) and performs two important tasks:
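To make the first of these tasks, enumerating child URLs, a bit more concrete, here is a toy sketch in Python. It is not the real Windows Search interface (actual Protocol Handlers are COM components); the function names `enumerate_child_urls` and `crawl`, and the trailing-slash convention for folders, are assumptions made up for illustration.

```python
from pathlib import Path

PREFIX = "file://"


def enumerate_child_urls(url: str) -> list[str]:
    """Return the URLs of the items directly under a file:// folder URL."""
    if not url.startswith(PREFIX):
        raise ValueError(f"unsupported scheme in {url!r}")
    folder = Path(url[len(PREFIX):])
    children = []
    for entry in sorted(folder.iterdir()):
        # Assumption: folders keep a trailing slash so the crawler
        # knows it can ask for their children too.
        suffix = "/" if entry.is_dir() else ""
        children.append(PREFIX + entry.as_posix() + suffix)
    return children


def crawl(root_url: str) -> list[str]:
    """Walk a store by repeatedly asking the handler for child URLs."""
    found = []
    pending = [root_url]
    while pending:
        url = pending.pop()
        for child in enumerate_child_urls(url):
            found.append(child)
            if child.endswith("/"):
                pending.append(child)
    return found
```

Calling `crawl("file://C:/Foo/")` would return every file and folder URL beneath that folder. The point is only that the crawler itself never touches the file system directly; it just hands URLs to the handler and gets more URLs back.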
IFilters
An IFilter is responsible for taking an item such as a file (usually in the form of a Stream) and emitting the contents and properties of that item for indexing.
For example, the MS Word IFilter knows how to take the stream from a .DOC or .DOCX file and emit both the contents and useful properties (like the author’s name or the date it was last modified) for the index.
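As a rough illustration of the idea (stream in, contents and properties out), here is a small Python sketch. It does not use the real IFilter COM interface, and the input format is invented: a few `Name: value` header lines, a blank line, then the body text. The `Chunk` type and `filter_stream` function are hypothetical names.

```python
import io
from typing import BinaryIO, Iterator, NamedTuple, Optional


class Chunk(NamedTuple):
    kind: str              # "property" or "text"
    name: Optional[str]    # property name, or None for text chunks
    value: str


def filter_stream(stream: BinaryIO) -> Iterator[Chunk]:
    """Emit the properties and then the text found in a document stream."""
    text = stream.read().decode("utf-8")
    # Assumed toy format: header lines, a blank line, then the body.
    header, _, body = text.partition("\n\n")
    for line in header.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            yield Chunk("property", name.strip(), value.strip())
    for paragraph in body.split("\n\n"):
        if paragraph.strip():
            yield Chunk("text", None, paragraph.strip())


doc = b"Author: Brandon\nModified: 2007-06-20\n\nHello world.\n\nSecond paragraph.\n"
chunks = list(filter_stream(io.BytesIO(doc)))
```

Here `chunks` holds two property chunks (`Author` and `Modified`) followed by two text chunks. The filter never needs to know where the stream came from, which is what lets the same filter serve files, e-mail attachments, and so on.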
Property Handlers
Property Handlers are similar to IFilters, except that they’re designed to simply return properties for items and not complex textual content.
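By contrast with an IFilter, a sketch of something Property Handler-like only has to hand back a set of named values. The Python below is a loose analogy rather than the real Windows API; `get_properties` and the keys `name`, `size`, and `modified` are made-up names for illustration.

```python
import os
from datetime import datetime, timezone


def get_properties(path: str) -> dict[str, object]:
    """Return a few simple properties for a file, with no text content."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size": info.st_size,  # size in bytes
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }
```

Note that nothing here reads or parses the body of the file, which is the main difference from the filter case.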
Hi. I'm Brandon. I'm a geek, and I work on Search technology for Windows at Microsoft. This is my blog.
The views expressed within my blog are my own, and are not in any way indicative of those of the company I work for, Microsoft, or its employees. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.
June 20th, 2007 at 10:53 pm
[...] under the SYSTEM account, and other times runs in the context of the current user. It hosts a Protocol Handler responsible for enumerating items in a specific store (such as the File System, Outlook, UNC [...]
June 21st, 2007 at 9:57 pm
Brandon,
Great article!
I have been working to release DWG IFilter 2007, to enable users to search within AutoCAD DWG files. I am collecting general support information and information on IFilters on the support site htp://www.dwgifilter.com . I am trying to post general specs, registry settings, and all the how and why info around IFilters that may be helpful to users.
Also, at that location you can download a free trial.
Marco
June 28th, 2007 at 4:26 pm
[...] FAQ: How does indexing work? What are IFilters and Protocol Handlers? Little insight into how the indexing on Vista works under the hood. (tags: technology) [...]
July 12th, 2007 at 2:41 pm
Sorry to post this here, but if you are on the WDS team you need to know that
#1) Outlook 2007 bugs me to install WDS
#2) Install WDS and it DOES NOT INDEX MY EMAIL
My .PST file is not in the default place. I moved it 12+ years ago and have kept it there ever since. In my case it is d:\work\email\mymail.pst. I also have an archieve.pst in the same folder which is accessible from Outlook. WDS didn’t index that either.
It’s very frustrating to be constantly badgered by Microsoft Outlook 2007 to install WDS to index my email and then have it not actually work.
July 15th, 2007 at 7:25 am
My Vista search results include many files that have been deleted or renamed. So, when I attempt to archive those files to data DVDs, I get errors. Any comments?
July 24th, 2007 at 11:30 am
Sometimes when you search for a file in the Start menu search, you’ll get a non-existent file, and when you click on it you’ll get a message similar to “searched file not found”. Please add a message like this: “file not found, do you want to remove it from the index? yes/no”.
A “do you want to remove it? yes/no” message is already present for Start menu objects, but not for Windows Search results. Please add it.
January 10th, 2008 at 1:26 am
WDS does not provide a formatted docx or xlsx preview. Is there a workaround for this, or another solution? Thanks.
February 1st, 2008 at 12:29 am
Hi Brandon,
I am trying to build a search for a website hosted on Windows Server 2008.
Could you provide some tips on how to build it?
Currently I am using Indexing Service to perform the Search by creating a Catalog.
Any help is appreciated.
Regards
July 13th, 2008 at 8:56 am
You list the first important responsibility of a protocol handler as “Enumeration of child URLs” but it’s not very clear from the documentation how we are supposed to support that.
I’ve been trying to find out how the mapi: and oneindex: (OneNote) protocols do this. One approach would be to implement IShellFolder, and they do appear to implement this. But if you call EnumObjects, they seem to return E_NOTIMPL. The other way it occurred to me one could enumerate objects for crawling is to have an IFilter return a series of chunks of a PKEY_Search_UrlToIndexWithModificationTime property. But when I ask the OneNote search root for all its chunks, it appears to return just one: a property called ‘ROBOTS’ with a value of ‘NOINDEX’.
Perhaps that’s because the OneNote protocol doesn’t in fact enumerate its contents at all? From the docs it looks like you don’t technically need to make a custom store crawlable, as long as you’re prepared to explicitly notify WDS of every single indexable item you add.
But in any case, it’s quite hard to work out what our options are for making a custom store crawlable. Given that this is one of only two jobs for a protocol handler, I think it’d be helpful to make the documentation a little more clear here. (And in particular, it would be really useful to have more insight into what WDS actually does. I want to write tests for my code, but if it’s not clear what WDS requires of my code, it’s hard to know what I’m supposed to test for.)