Brandon Live!


FAQ: How does indexing work? What are IFilters and Protocol Handlers?

June 20, 2007 at 10:19 pm
Desktop Search, WDS Development, WDS FAQ

The Indexer 

At its core, the Windows Search indexer doesn’t really know anything about files, e-mails, or anything like that.  In fact, all it really knows is how to do the following things:

  1. Index contents and metadata associated with a URL and store it in a row.
  2. Retrieve rows that match a specific query.
  3. Shape the results in interesting ways (sorted, grouped, etc.)
  4. Retrieve properties / metadata associated with a row.
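
To make those four jobs concrete, here is a toy model in Python.  It is only a sketch: the real indexer is not written this way, and every name in it (Row, ToyIndex, the "author" and "kind" properties, the Notes.txt item and its author) is made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Row:
    """One indexed item: a URL plus its extracted text and properties."""
    url: str
    text: str
    properties: dict = field(default_factory=dict)


class ToyIndex:
    """Hypothetical in-memory stand-in for the indexer's four jobs."""

    def __init__(self):
        self._rows = {}  # url -> Row

    def index(self, url, text, properties):
        # 1. Store contents and metadata for a URL in a row.
        self._rows[url] = Row(url, text, dict(properties))

    def query(self, term):
        # 2. Retrieve rows whose text contains the term (case-insensitive).
        term = term.lower()
        return [r for r in self._rows.values() if term in r.text.lower()]

    @staticmethod
    def sort_by(rows, prop):
        # 3a. Shape results: sort rows by a (string) property value.
        return sorted(rows, key=lambda r: r.properties.get(prop, ""))

    @staticmethod
    def group_by(rows, prop):
        # 3b. Shape results: group row URLs by a property value.
        groups = defaultdict(list)
        for r in rows:
            groups[r.properties.get(prop)].append(r.url)
        return dict(groups)

    def get_properties(self, url):
        # 4. Retrieve the properties stored with a row.
        return self._rows[url].properties


index = ToyIndex()
index.index("file://C:/Foo/Bar/Taxes.docx", "2006 tax return draft",
            {"author": "Brandon", "kind": "document"})
index.index("file://C:/Foo/Notes.txt", "Remember to file the tax forms",
            {"author": "Alice", "kind": "document"})

hits = index.query("tax")
by_author = [r.url for r in ToyIndex.sort_by(hits, "author")]
```

Notice that nothing in ToyIndex knows what a file or an e-mail is.  It only ever sees a URL, some text, and a bag of properties.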

The indexer relies on other Windows Search components to handle the specifics, such as converting a URL into data to be indexed.  That’s where Protocol Handlers, IFilters, and Property Handlers come in.

Protocol Handlers

A Protocol Handler allows the indexer to crawl a specific kind of data store.  For example, the File System Protocol Handler allows the indexer to crawl files stored on your hard drive.  Windows Search includes a few Protocol Handlers, including those for the File System, MAPI (i.e., Outlook), and the Client-Side Cache for Offline Files (Vista only).  Other examples include the Protocol Handlers for Lotus Notes, the IE History / Cache, and Mozilla Thunderbird.

At a basic level, a Protocol Handler is just a piece of code that takes as input a URL (like “file://C:/Foo/” or “mapi://{USER-SID}/Brandon’s Mailbox/Inbox”) and performs two important tasks:

  1. Enumeration of child URLs (such as “file://C:/Foo/Bar/” or “file://C:/Foo/Bar/Taxes.docx”)
  2. Binding of URLs to either an IFilter, or a Stream (which can be bound to whatever IFilter is registered for its content type)
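
Those two tasks can be sketched in a few lines of Python.  This is a conceptual model only, not the actual COM interfaces a real Protocol Handler implements.  The STORE dictionary, the ToyProtocolHandler class, and the crawl function are hypothetical names, and the sketch assumes a store where containers are lists of child URLs and items are raw bytes.

```python
import io

# Hypothetical store: a list value marks a container, bytes mark an item.
STORE = {
    "file://C:/Foo/": ["file://C:/Foo/Bar/"],
    "file://C:/Foo/Bar/": ["file://C:/Foo/Bar/Taxes.docx"],
    "file://C:/Foo/Bar/Taxes.docx": b"2006 tax return draft",
}


class ToyProtocolHandler:
    """Toy handler that knows how to walk one kind of store."""

    def __init__(self, store):
        self._store = store

    def is_container(self, url):
        return isinstance(self._store[url], list)

    def enumerate_children(self, url):
        """Task 1: list the child URLs of a container URL."""
        if not self.is_container(url):
            return []
        return list(self._store[url])

    def bind_to_stream(self, url):
        """Task 2: open an item URL as a byte stream for a filter to read."""
        if self.is_container(url):
            raise ValueError(f"{url} is a container, not an item")
        return io.BytesIO(self._store[url])


def crawl(handler, root):
    """Walk the store from root, yielding (url, stream) for each item."""
    pending = [root]
    while pending:
        url = pending.pop()
        if handler.is_container(url):
            pending.extend(handler.enumerate_children(url))
        else:
            yield url, handler.bind_to_stream(url)


handler = ToyProtocolHandler(STORE)
items = [(url, stream.read()) for url, stream in crawl(handler, "file://C:/Foo/")]
```

The crawl loop never looks inside the store itself.  Swap in a different handler (say, one that walks mail folders) and the same loop works unchanged, which is the whole point of the abstraction.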

IFilters

An IFilter is responsible for taking an item such as a file (usually in the form of a Stream) and emitting the contents and properties of that item for indexing.

For example, the MS Word IFilter knows how to take the stream from a .DOC or .DOCX file and emit both the contents and useful properties (like the author’s name or the date it was last modified) for the index.
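
As a rough illustration of "stream in, contents and properties out", here is a toy filter in Python.  It is not the real IFilter COM interface, and it assumes a made-up document format ("Name: value" header lines, a blank line, then the body), nothing like the actual Word formats.  The function name toy_filter and the sample data are hypothetical.

```python
import io


def toy_filter(stream):
    """Yield ("property", name, value) and ("text", None, text) chunks.

    Assumes a made-up format: "Name: value" header lines, a blank line,
    then body paragraphs separated by blank lines.
    """
    data = stream.read().decode("utf-8")
    header, _, body = data.partition("\n\n")

    # Header lines become property chunks.
    for line in header.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            yield ("property", name.strip(), value.strip())

    # Each non-empty body paragraph becomes a text chunk.
    for paragraph in body.split("\n\n"):
        if paragraph.strip():
            yield ("text", None, paragraph.strip())


sample = io.BytesIO(
    b"Author: Brandon\nModified: 2007-06-20\n\n"
    b"First paragraph.\n\nSecond paragraph."
)
chunks = list(toy_filter(sample))
```

The filter only needs a stream.  It has no idea whether the bytes came from the file system, a mailbox, or anywhere else.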

Property Handlers

Property Handlers are similar to IFilters, except that they’re designed to simply return properties for items and not complex textual content.
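
For contrast, a property-only handler might look like the Python sketch below.  Again, this is a conceptual model and not the real Property Handler interfaces.  It assumes an invented "toy image" format (a 4-byte magic value "TOYI" followed by a 16-bit width and height), and the name toy_property_handler is hypothetical.

```python
import io
import struct

HEADER = struct.Struct("<4sHH")  # magic, width, height (little-endian)


def toy_property_handler(stream):
    """Return a dict of properties for the item; emits no body text."""
    raw = stream.read(HEADER.size)
    if len(raw) < HEADER.size:
        raise ValueError("stream too short for a toy image header")
    magic, width, height = HEADER.unpack(raw)
    if magic != b"TOYI":
        raise ValueError("not a toy image")
    return {"width": width, "height": height}


# A fake item: the header followed by some pixel bytes we never read.
sample = io.BytesIO(HEADER.pack(b"TOYI", 640, 480) + b"\x00" * 16)
props = toy_property_handler(sample)
```

The handler reads just enough of the item to pull out its properties and never touches the rest, since there is no text in there worth indexing.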






9 Responses to “FAQ: How does indexing work? What are IFilters and Protocol Handlers?”

  1. Brandon Paddock's Blog - Desktop Search and more » Blog Archive » FAQ: Why does WDS / Windows Vista use so many processes? Says:

    [...] under the SYSTEM account, and other times runs in the context of the current user.  It hosts a Protocol Handler responsible for enumerating items in a specific store (such as the File System, Outlook, UNC [...]

  2. marcovanschagen Says:

    Brandon,
    Great article!

    I have been working to release DWG IFilter 2007, to enable users to search within AutoCAD DWG files. I am collecting general support information and information on IFilters on the support site http://www.dwgifilter.com . I am trying to post general specs, registry settings, and all the how-and-why info around IFilters that may be helpful to users.
    Also, at that location you can download a free trial.

    Marco

  3. Jengates Blog » Blog Archive » links for 2007-06-28 Says:

    [...] FAQ: How does indexing work? What are IFilters and Protocol Handlers? Little insight into how the indexing on Vista works under the hood. (tags: technology) [...]

  4. greggman Says:

    Sorry to post this here but if you are on the WDS team you need to know that

    #1) Outlook 2007 bugs me to install WDS
    #2) Install WDS and it DOES NOT INDEX MY EMAIL

    My .PST file is not in the default place. I moved it 12+ years ago and have kept it here ever since. In my case, d:\work\email\mymail.pst. I also have an archieve.pst in the same folder which is accessible from Outlook. WDS didn’t index that either.

    It’s very frustrating to be constantly badgered by Microsoft Outlook 2007 to install WDS to index my email and then have it not actually work.

  5. John Villere Says:

    My Vista search results include many files that have been deleted or renamed. So, when I attempt to archive those files to data DVDs I get errors. Any comments?

  6. suc Says:

    Sometimes when you search for a file in the Start Menu search you’ll get a non-existent file, and when you click on it you’ll get a message similar to this: “searched file not found”. Please add a message like this: “file not found, do you want to remove it from the index? yes/no”.
    A “do you want to remove it? yes/no” message is already present for Start Menu objects, but not for Windows Search results. Please add it.

  7. Sheldon Says:

    WDS does not provide a formatted docx or xlsx preview. Is there a workaround for this or another solution? Thanks.

  8. Shafaat Karim Says:

    Hi Brandon,

    I am trying to build a search for a website hosted on Windows 2008 Server.
    Could you provide some tips on how to build it?

    Currently I am using Indexing Service to perform the search by creating a Catalog.

    Any help is appreciated.

    Regards

  9. Ian Griffiths Says:

    You list the first important responsibility of a protocol handler as “Enumeration of child URLs” but it’s not very clear from the documentation how we are supposed to support that.

    I’ve been trying to find out how the mapi: and oneindex: (OneNote) protocols do this. One approach would be to implement IShellFolder, and they do appear to implement this. But if you call EnumObjects, they seem to return E_NOTIMPL. The other way it occurred to me one could enumerate objects for crawling is to have an IFilter return a series of chunks of a PKEY_Search_UrlToIndexWithModificationTime property. But when I ask the OneNote search root for all its chunks, it just appears to return one: a property called ‘ROBOTS’ with a value of ‘NOINDEX’.

    Perhaps that’s because the OneNote protocol doesn’t in fact enumerate its contents at all? From the docs it looks like you don’t technically need to make a custom store crawlable, as long as you’re prepared to notify WDS explicitly of every single indexable item you add.

    But in any case, it’s quite hard to work out what our options are for making a custom store crawlable. Given that this is one of only two jobs for a protocol handler, I think it’d be helpful to make the documentation a little more clear here. (And in particular, it would be really useful to have more insight into what WDS actually does. I want to write tests for my code, but if it’s not clear what WDS requires of my code, it’s hard to know what I’m supposed to test for.)


Hi. I'm Brandon. I'm a geek, and I work on Search technology for Windows at Microsoft. This is my blog.


The views expressed within my blog are my own and are not in any way indicative of those of the company I work for, Microsoft, or its employees. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.
