Brandon Live!

FAQ: How does indexing work? What are IFilters and Protocol Handlers?

June 20, 2007 at 10:19 pm
Desktop Search, WDS Development, WDS FAQ

The Indexer 

At its core, the Windows Search indexer doesn’t really know anything about files, e-mails, or anything like that.  In fact, all it really knows is how to do the following things:

  1. Index contents and metadata associated with a URL and store it in a row.
  2. Retrieve rows that match a specific query.
  3. Shape the results in interesting ways (sorted, grouped, etc.).
  4. Retrieve properties / metadata associated with a row.
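
To make those four responsibilities concrete, here is a minimal sketch of a toy in-memory index. The class name `ToyIndex`, its method names, and the sample URLs and properties are all made up for illustration; this is not the Windows Search API, just one way to picture "rows keyed by URL".

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory stand-in for the indexer (not the real API)."""

    def __init__(self):
        self.rows = {}  # URL -> {"content": str, "props": dict}

    def index(self, url, content, props):
        # 1. Store the contents and metadata for a URL in a single row.
        self.rows[url] = {"content": content, "props": dict(props)}

    def query(self, term):
        # 2. Retrieve the URLs of rows whose content contains the term.
        term = term.lower()
        return [url for url, row in self.rows.items()
                if term in row["content"].lower()]

    def shape(self, urls, sort_by=None, group_by=None):
        # 3. Sort and/or group a result set by a stored property.
        if sort_by is not None:
            urls = sorted(urls, key=lambda u: self.rows[u]["props"][sort_by])
        if group_by is None:
            return list(urls)
        groups = defaultdict(list)
        for u in urls:
            groups[self.rows[u]["props"][group_by]].append(u)
        return dict(groups)

    def properties(self, url):
        # 4. Retrieve the properties / metadata stored with a row.
        return self.rows[url]["props"]


index = ToyIndex()
index.index("file://C:/Foo/Bar/Taxes.docx", "2006 tax return",
            {"author": "Brandon", "size": 2048})
index.index("file://C:/Foo/Notes.txt", "tax notes",
            {"author": "Brandon", "size": 512})

hits = index.query("tax")
by_size = index.shape(hits, sort_by="size")
by_author = index.shape(hits, group_by="author")
```

Notice that nothing in the sketch knows what a file or an e-mail is. It only sees URLs, text, and properties, which is the point of the list above.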

The indexer relies on other Windows Search components to handle the specifics, such as converting a URL into data to be indexed.  That’s where Protocol Handlers, IFilters, and Property Handlers come in.

Protocol Handlers

A Protocol Handler allows the indexer to crawl a specific kind of data store.  For example, the File System Protocol Handler allows the indexer to crawl files stored on your hard drive.  Windows Search includes a few Protocol Handlers, including those for the File System, MAPI (i.e., Outlook), and the Client-Side Cache for Offline Files (Vista only).  Other examples include the Protocol Handlers for Lotus Notes, the IE History / Cache, and Mozilla Thunderbird.

At a basic level, a Protocol Handler is just a piece of code that takes as input a URL (like “file://C:/Foo/” or “mapi://{USER-SID}/Brandon’s Mailbox/Inbox”) and performs two important tasks:

  1. Enumeration of child URLs (such as “file://C:/Foo/Bar/” or “file://C:/Foo/Bar/Taxes.docx”)
  2. Binding of URLs to either an IFilter, or a Stream (which can be bound to whatever IFilter is registered for its content type)
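
The two tasks can be sketched roughly like this. Real Protocol Handlers are COM components, so treat this purely as an illustration: the `toyfile://` scheme, the `ToyFileProtocolHandler` class, and its method names are invented for the example, and it only covers the "bind to a Stream" half of task 2.

```python
import os
import tempfile

SCHEME = "toyfile://"  # made-up scheme, used only in this sketch


class ToyFileProtocolHandler:
    """Hypothetical handler that maps toyfile:// URLs onto a local folder."""

    def __init__(self, root):
        self.root = root

    def _path(self, url):
        # Translate a URL into a path underneath the root folder.
        parts = [p for p in url[len(SCHEME):].split("/") if p]
        return os.path.join(self.root, *parts)

    def enumerate_children(self, url):
        # Task 1: list child URLs; containers get a trailing slash.
        base = url if url.endswith("/") else url + "/"
        with os.scandir(self._path(url)) as entries:
            ordered = sorted(entries, key=lambda e: e.name)
            return [base + e.name + ("/" if e.is_dir() else "")
                    for e in ordered]

    def bind(self, url):
        # Task 2: bind a URL to a stream that a filter can read later.
        return open(self._path(url), "rb")


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Foo", "Bar"))
    with open(os.path.join(root, "Foo", "Bar", "Taxes.txt"), "w") as f:
        f.write("2006 tax return")

    handler = ToyFileProtocolHandler(root)
    children = handler.enumerate_children(SCHEME + "Foo/")
    grandchildren = handler.enumerate_children(children[0])
    with handler.bind(grandchildren[0]) as stream:
        data = stream.read()
```

A crawl is then just a matter of starting at a root URL, enumerating children, and repeating for every child that is itself a container.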

IFilters

An IFilter is responsible for taking an item such as a file (usually in the form of a Stream) and emitting the contents and properties of that item for indexing.

For example, the MS Word IFilter knows how to take the stream from a .DOC or .DOCX file and emit both its contents and useful properties (like the author’s name or the date it was last modified) for the index.
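
As a rough illustration of "stream in, contents and properties out", here is a sketch of a filter for a made-up text format: a few `Name: value` header lines, a blank line, then the body. The function name `toy_text_filter`, the format, and the tuple shapes it yields are all assumptions for this example and have nothing to do with how the Word IFilter actually parses documents.

```python
import io


def toy_text_filter(stream, chunk_size=16):
    """Hypothetical filter: yields property and content items from a stream."""
    text = stream.read().decode("utf-8")
    header, _, body = text.partition("\n\n")

    # Emit one ("property", name, value) item per header line.
    for line in header.splitlines():
        name, _, value = line.partition(":")
        yield ("property", name.strip().lower(), value.strip())

    # Emit the body as a series of ("content", text) chunks.
    for start in range(0, len(body), chunk_size):
        yield ("content", body[start:start + chunk_size])


stream = io.BytesIO(
    b"Author: Brandon\nModified: 2007-06-20\n\nNotes on how indexing works."
)
emitted = list(toy_text_filter(stream))

properties = {e[1]: e[2] for e in emitted if e[0] == "property"}
content = "".join(e[1] for e in emitted if e[0] == "content")
```

The filter never touches the index itself. It just hands items back to whoever is reading from it, which in this picture would be the indexer.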

Property Handlers

Property Handlers are similar to IFilters, except that they’re designed to simply return properties for items and not complex textual content.
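
A sketch of the difference, under the assumption that file system metadata is a good enough stand-in for "properties": the hypothetical `toy_property_handler` below returns a handful of values about an item and never reads its text. The function and the property names are invented for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone


def toy_property_handler(path):
    """Hypothetical handler: returns properties only, never the item's text."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "Notes.txt")
    with open(path, "w") as f:
        f.write("hello")
    props = toy_property_handler(path)
```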


FAQ: Why does WDS / Windows Vista use so many processes?

at 9:25 pm
Desktop Search, WDS FAQ

The three processes used by the Windows Search service are SearchIndexer.exe, SearchProtocolHost.exe, and SearchFilterHost.exe.  Sometimes you may even see multiple instances of the latter two running simultaneously (especially if multiple users are logged in).

So why are they divided up in this way?  To find out, let’s look at what each of the processes does.

SearchIndexer.exe

This process runs as a system service under the SYSTEM account.  It is responsible for maintaining the index, servicing queries, and deciding what to crawl and when.

SearchProtocolHost.exe

This process sometimes runs under the SYSTEM account, and other times runs in the context of the current user.  It hosts a Protocol Handler responsible for enumerating items in a specific store (such as the File System, Outlook, UNC shares, Lotus Notes, etc.).

Why is it separate?

SearchFilterHost.exe

This process hosts the actual IFilters. These filters are responsible for processing individual items, such as files, in a data store.

Why is it separate?


Hi. I'm Brandon. I'm a geek, and I work on Search technology for Windows at Microsoft. This is my blog.

The views expressed within my blog are my own, and are not in any way indicative of those of the company I work for, Microsoft, or its employees. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.