Let’s say you want to optimize your system by only indexing certain data. For example, a reader recently e-mailed me and said “I only want to index my media files.” Seems like a valid choice. At first glance, it might seem like you could achieve this by telling Windows to only index files with extensions like .mp3 and .avi. Ultimately, this is a very bad idea.
First, let me tell you why this is a bad idea. Second, I’ll tell you the right way to achieve what you want.
Let’s begin by looking at how the Windows Vista shell and the indexer work together.
The indexer maintains a list of “start paths” - which are locations in the shell namespace that it cares about. By default, it is set up to index the x:\Users directory - and thus all of the default Documents / Music / Pictures folders of all user accounts on the system. When you install Outlook, it sets up a start path for your mail accounts. OneNote sets one up for your OneNote data. And so on. This means that the indexer will try to index all items under that path*, and ignore everything else.
When you browse to a folder in Explorer, the shell asks the indexer if the current path is covered by the index. If it is, Explorer will use the index exclusively for search / filter / grouping operations against that location. It does not ask the index if it covers all the file types in that location. It assumes the index is the authoritative source for information about that part of the namespace.
On the other hand, if the path is not covered by the index, Explorer walks the entire namespace starting at that location (so, the current folder and all subfolders) and enumerates every single item, performing all operations like filtering / sorting / grouping in-memory. By default, it does not crack open any files being enumerated - so all filtering operations happen only against the basic properties like file name. You can then click the “Search in File Contents” button (what some of us call the “try harder” button), and it will repeat the operation - stopping at every file and cracking it open with the appropriate IFilter and property handlers, doing essentially the same thing that happens when a file is indexed. It loads the file, cracks it open, extracts all the properties and content, checks to see if it matches the current filter, and then decides whether or not to add that item to the view or ignore it. If you change the filter, the whole process starts over again. Needless to say, this is rather slow if you have to do it for more than a few files. That’s why the “Search in File Contents” button is there, since in most unindexed locations (like C:\windows) you are probably only searching for a filename.
Armed with this information, let’s take another look at the original question. Let’s say you go into the Advanced options for the indexer and tell it not to index .doc files at all. Then you go save a new document called Something.doc inside of your Documents folder, which is still indexed. The indexer will be notified that a new file was created there, but since you disabled indexing of that extension, it will ignore it. Then when you go to your user folder or the Documents folder and search for “something” - you don’t find the document. Even though it’s right there in the name. The folder said “Hey, I’m indexed” and the file is not in the index, so as far as Explorer is concerned, it doesn’t exist.
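A toy model may make this failure mode concrete. The sketch below is not how the shell or the indexer are implemented; `ToyIndex`, `search`, and the sample paths are all made up for illustration. It only mimics the decision described above: if a folder is covered by a start path, the index is trusted completely, otherwise the real files are enumerated.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath


@dataclass
class ToyIndex:
    """Hypothetical stand-in for the indexer: start paths plus excluded extensions."""
    start_paths: list
    excluded_extensions: set = field(default_factory=set)
    items: set = field(default_factory=set)

    def covers(self, folder: str) -> bool:
        # A folder is "covered" if it is a start path or sits below one.
        target = PureWindowsPath(folder)
        return any(
            target == PureWindowsPath(start) or PureWindowsPath(start) in target.parents
            for start in self.start_paths
        )

    def notify_created(self, path: str) -> None:
        # The indexer hears about the new file but skips excluded extensions.
        p = PureWindowsPath(path)
        if self.covers(str(p.parent)) and p.suffix.lower() not in self.excluded_extensions:
            self.items.add(path)


def search(index: ToyIndex, all_files: set, folder: str, term: str) -> list:
    """Trust the index if it covers the folder, otherwise walk the real files."""
    if index.covers(folder):
        candidates = index.items   # the index is treated as authoritative
    else:
        candidates = all_files     # slow enumeration of what is actually on disk
    root = PureWindowsPath(folder)
    return sorted(
        f for f in candidates
        if root in PureWindowsPath(f).parents
        and term.lower() in PureWindowsPath(f).name.lower()
    )
```

With `C:\Users` as a start path and `.doc` excluded, searching the Documents folder for "something" returns nothing, even though `Something.doc` is on disk. With only the Music folder as a start path, the same search falls through to enumeration and finds the file.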
A much better approach, if you really don’t care about indexing your .doc files, is to tell Windows not to index the Documents folder (or wherever you keep your .doc files). That way it will fall back to slow GREP search when you look there, which will at least find what you’re looking for, albeit more slowly.
You can do this from the Indexing Options control panel, and it’s pretty easy to do. Only want your music and videos indexed? Then tell the indexer to only crawl the places where you store those files. That’s it, mission accomplished.
The end result is the same. The indexer isn’t doing any additional work, unless you mix .doc files and media files in the same folder. And even then, at least you’ll be able to find them.
Another option available to you is to set certain extensions to “Index File Properties only.” That way you’ll at least be able to find the item by its name. Why would you want to do that? I have no idea. It’s not like indexing files incurs a significant overhead on any reasonably modern PC. The option is mainly there because there are some file types the indexer can’t search inside. So instead it indexes all the basic stuff that applies to every file (like name, date modified, and size).
* = It’s actually more complicated than that, as there can be nested inclusion/exclusion rules, files or folders excluded based on attributes, etc. But that’s not particularly relevant to this discussion.
PKEY_Identity (or “System.Identity”) is used to store the identity GUID associated with an Outlook Express or Windows Mail (Vista) account.
Tom Laird-McConnell, who used to be my boss until a few months ago, wrote one of his extremely rare blog posts last week on this subject. To celebrate the occasion, I will link you there for the detailed answer to this question.
Definition of the PKEY_Identity / System.Identity property at Tom’s Handy Dandy Space
If you’re running a 64-bit version of Windows Vista, or a 64-bit version of WDS 3.x on Windows XP/2003, you may notice that the new Office 2007 document formats (.docx, .xlsx, etc) don’t show up when you search using the “Documents” filter in the search UI, or the kind:document Advanced Query Syntax.
This is a known issue with the 64-bit property system, and happens because the 64-bit shell only looks in the 64-bit section of the registry for a set of keys that map file extensions to various “kinds” for filetypes that don’t emit their own “kind” information. Because Office 2007 is a 32-bit application, it registers its kinds in the 32-bit section of the registry, where the shell never sees it.
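As a rough illustration of why the lookup misses, here is a toy model of the two registry views. The dictionaries, their contents, and the `kind_for` helper are invented for this sketch; they are not a dump of a real registry or the shell's actual lookup code.

```python
# Toy model of the two registry views on 64-bit Windows.
# The entries below are illustrative assumptions, not real registry data.
REGISTRY_VIEWS = {
    "64-bit": {".doc": "document", ".txt": "document"},
    "32-bit": {".docx": "document", ".xlsx": "document"},  # written by 32-bit Office 2007
}


def kind_for(extension, views_consulted=("64-bit",)):
    """Return the kind mapped to an extension, looking only in the given views."""
    for view in views_consulted:
        kind = REGISTRY_VIEWS[view].get(extension.lower())
        if kind is not None:
            return kind
    return None  # no mapping visible from the views that were consulted
```

By default `kind_for(".docx")` finds nothing, because only the 64-bit view is consulted. Copying the mapping into the 64-bit view makes the extension visible, and that is the kind of change a fix like the .reg file makes.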
In a future release, the shell / search engine will be updated to better handle this situation. For now, I have uploaded a .reg file which will fix the KindMap for Office 2007 documents on 64-bit machines.
Disclaimer Serious problems might occur if you modify the registry incorrectly by using Registry Editor or by using another method. These problems might require that you reinstall your operating system. I cannot guarantee that these problems can be solved. Modify the registry at your own risk.
If you have installed WDS on Windows XP / 2003 and wish to uninstall it, read the instructions below. Also, please post a comment with your reason for uninstalling - user feedback is very important to all of us.
Standard answer:
If that doesn’t work:
Try looking for the uninstaller, it normally resides in:
C:\WINDOWS\$NtUninstallKB917013$\spuninst\spuninst.exe
If it’s not there
Then you probably deleted it, or some ill-conceived “tweak” or “disk cleaner” did. Unfortunately, this puts you in a tough spot, since it’s really not a supported scenario or something we can design for (“I deleted the uninstaller - now I can’t uninstall! Help!”).
Your best bet in this case might be to look and see if you have a System Restore point that you can revert back to, from before WDS was installed. That should remove everything it added to the registry.
The Indexer
At its core, the Windows Search indexer doesn’t really know anything about files, e-mails, or anything like that. In fact, all it really knows is how to do the following things:
The indexer relies on other Windows Search components to handle the specifics, such as converting a URL into data to be indexed. That’s where Protocol Handlers, IFilters, and Property Handlers come in.
Protocol Handlers
A Protocol Handler allows the indexer to crawl a specific kind of data store. For example, the File System Protocol Handler allows the indexer to crawl files stored on your hard drive. Windows Search includes a few Protocol Handlers, including those for the File System, MAPI (i.e., Outlook), and the Client-Side Cache for Offline Files (Vista only). Other examples include the Protocol Handlers for Lotus Notes, the IE History / Cache, or Mozilla Thunderbird.
At a basic level, a Protocol Handler is just a piece of code that takes as input a URL (like “file://C:/Foo/” or “mapi://{USER-SID}/Brandon’s Mailbox/Inbox”) and performs two important tasks:
IFilters
An IFilter is responsible for taking an item such as a file (usually in the form of a Stream) and emitting the contents and properties of that item for indexing.
For example, the MS Word IFilter knows how to take the stream from a .DOC or .DOCX file and return both the contents and useful properties (like the author’s name or the date it was last modified) into the index.
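To give a feel for the shape of that job, here is a minimal sketch in Python. The real IFilter is a COM interface and this does not reproduce it; `toy_filter` and the "Name: value" file layout are assumptions made up for the example, and real .DOC or .DOCX files look nothing like this.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def toy_filter(stream: BinaryIO) -> Iterator[Tuple[str, str, str]]:
    """Yield ("property", name, value) and ("content", "", text) chunks.

    Assumes a made-up format: "Name: value" header lines, a blank line,
    then the body text.
    """
    text = stream.read().decode("utf-8")
    header, _, body = text.partition("\n\n")
    for line in header.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # only lines that actually contain a colon are properties
            yield ("property", name.strip(), value.strip())
    if body.strip():
        yield ("content", "", body.strip())


# Usage: feed the filter a stream and collect what it emits for indexing.
sample = io.BytesIO(b"Author: Brandon\nTitle: Notes\n\nHello index")
emitted = list(toy_filter(sample))
```

The point is only the contract: a stream goes in, and a sequence of properties and text comes out for the indexer to store.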
Property Handlers
Property Handlers are similar to IFilters, except that they’re designed to simply return properties for items and not complex textual content.
The three processes used by the Windows Search service are SearchIndexer.exe, SearchProtocolHost.exe, and SearchFilterHost.exe. Sometimes you may even see multiple instances of the latter two running simultaneously (especially if multiple users are logged in).
So why are they divided up in this way? To find out, let’s look at what each of the processes does.
SearchIndexer.exe runs as a system service under the SYSTEM account. It is responsible for maintaining the index, servicing queries, and deciding what to crawl and when.
SearchProtocolHost.exe sometimes runs under the SYSTEM account, and other times runs in the context of the current user. It hosts a Protocol Handler responsible for enumerating items in a specific store (such as the File System, Outlook, UNC shares, Lotus, etc).
Why is it separate?
Access - Sometimes it needs to run in the context of the SYSTEM account (e.g., to index the filesystem, even when a user is not logged in). Other times it needs to run in the context of the user, so that it can access data that is ACL’d for that user (network shares, Offline files) or accessed via a program the user is running (Outlook, Thunderbird).
Reliability - If a protocol handler, which may be written by a third-party, crashes - it will not crash the indexer itself. This reduces the risk of index corruption, and ensures that you can still issue queries even if a protocol handler crashes or hangs.
Security - Isolating code that interacts with possibly untrusted data stores can mitigate vulnerabilities in said code.
SearchFilterHost.exe hosts the actual IFilters. These filters are responsible for processing individual items, such as files, in a data store.
Why is it separate?
Security - This process is tightly locked down. For example, it cannot even read the filesystem. It runs with reduced privileges (kind of like Protected Mode IE). Why is this important? Well, think back to the WMF file vulnerability a year or so ago. Google Desktop Search would trigger the vulnerability whenever it indexed one of those files. If you received it as an e-mail attachment, you would have a 0-click attack because they don’t sandbox the indexing process. This wasn’t a problem for WDS users because we have always isolated filtering to a separate locked-down process.
Reliability - Same as with the Protocol Handlers. IFilters are very often third-party code, and may be subjected to corrupted files. Keeping them separated improves robustness to crashes / hangs in third-party code or when dealing with corrupted data.
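The reliability half of that argument is easy to sketch. The snippet below is not how Windows Search hosts filters; `run_filter_isolated` is a hypothetical helper that just runs some "filter" code in a child process so that a crash or hang there cannot take the caller down. It does not attempt the privilege lockdown described above.

```python
import subprocess
import sys


def run_filter_isolated(filter_code: str, data: bytes, timeout: float = 5.0):
    """Run untrusted "filter" code in a child process and survive its failure.

    Returns the child's stdout as text, or None if it crashed or hung.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", filter_code],
            input=data,            # the item is handed over as a byte stream
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # a hung filter is killed; the caller keeps running
    if result.returncode != 0:
        return None  # a crashed filter only takes down its own process
    return result.stdout.decode("utf-8", errors="replace")
```

A well-behaved filter returns its output, while one that exits abnormally or sleeps past the timeout simply yields `None`, and the calling process carries on with the next item.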
Sometimes index corruption or other problems with Windows Search cannot be fixed by the “Rebuild Index” option in the control panel. One troubleshooting option you can try is to reset the indexer to its out-of-the-box default setup.
For WDS 3.0 on Windows XP or Windows Vista:
This will result in a rebuild of your index and will also reset your crawl scopes (folders to be indexed). It may also reset certain Indexing-related settings.
WDS 3.0 and Windows Vista can index uncached Exchange mailboxes, but this functionality is disabled by default. It needs to be enabled via group policy.
If you’re not in a group-policy managed environment (or your administrator does not set this setting), you can enable it by creating a DWORD registry value under:
HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\Windows Search
called “PreventIndexingUncachedExchangeFolders” and setting it to a value of 0.
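If you would rather import the setting than click through Registry Editor, the same change can be expressed as a .reg file. The helper below is only a sketch that builds the text of such a file; `reg_dword_file` is a made-up name, and nothing here touches the registry.

```python
def reg_dword_file(key_path: str, value_name: str, value: int) -> str:
    """Build the text of a .reg file that sets one DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key_path}]\r\n"
        f'"{value_name}"=dword:{value:08x}\r\n'  # DWORDs are written as 8 hex digits
    )


# The key path and value name are the ones given above; 0 enables indexing.
REG_TEXT = reg_dword_file(
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\Windows Search",
    "PreventIndexingUncachedExchangeFolders",
    0,
)
```

Saving `REG_TEXT` to a file with a .reg extension and importing it should have the same effect as creating the value by hand. Writing under HKEY_LOCAL_MACHINE requires administrator rights.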
Hi. I'm Brandon. I'm a geek, and I work on Search technology for Windows at Microsoft. This is my blog.
The views expressed within my blog are my own - and are not in any way indicative of those of the company I work for, Microsoft, or its employees. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.