Efficient multiple searches on the same index

Oct 5, 2011 at 12:59 PM

I have an index defined on a binary (len 32) field in esent. I need to extract multiple rows based on values in that index.

Basically, what I have is a function that goes like this:

 

public IEnumerable<RavenJObject> GetMappedResults(params GetMappedResultsParams[] getMappedResultsParams)
{
	Api.JetSetCurrentIndex(session, MappedResults, "by_reduce_key_and_view_hashed");
	foreach (var item in getMappedResultsParams)
	{
		Api.MakeKey(session, MappedResults, item.ViewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
		if (Api.TrySeek(session, MappedResults, SeekGrbit.SeekEQ) == false)
			continue;

		Api.MakeKey(session, MappedResults, item.ViewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
		Api.JetSetIndexRange(session, MappedResults, SetIndexRangeGrbit.RangeUpperLimit | SetIndexRangeGrbit.RangeInclusive);
		do
		{
			// we need to check that we don't have hash collisions
			var currentReduceKey = Api.RetrieveColumnAsString(session, MappedResults, tableColumnsCache.MappedResultsColumns["reduce_key"]);
			if (currentReduceKey != item.ReduceKey)
				continue;
	
			var currentView = Api.RetrieveColumnAsString(session, MappedResults, tableColumnsCache.MappedResultsColumns["view"]);
			if (currentView != item.View)
				continue;
			
			yield return Api.RetrieveColumn(session, MappedResults, tableColumnsCache.MappedResultsColumns["data"]).ToJObject();
		} while (Api.TryMoveNext(session, MappedResults));
	}
}

 

The problem is that usually pass a few hundreds to a few thousands items to this method, and it takes a LOT of time. Is there any way to make this faster?

I already sorted the incoming values in the same way the index sort them, but it seems to make no difference. I assume that it always starts scanning from the top of the index.

Is there a good way to make this more efficient? 

What I am basically trying to do is an IN query.

Oct 5, 2011 at 4:24 PM

I expect that retrieving the column values is causing problems. A secondary index entry contains a reference to the primary key and retrieving the columns requires doing a second seek into the primary index. That seek will be random, as opposed to the sequential traversal of the secondary index.

If you are lucky the index is structured so the values you want are included in the index. In that case you can use the RetrieveFromIndex option with JetRetrieveColumn (I have no idea why that isn't the default). That will avoid the primary index seek altogether.

You can improve on this, especially on Windows 7 and above, but it is a bit of a pain because you have to break the traversal into two loops:

  1. As you traverse the secondary index do not retrieve the columns, but instead retrieve the bookmark using JetGetSecondaryIndexBookmark. That returns 2 values, we only want the primary key bookmark.
  2. Sort the primary key bookmarks, using memcmp ordering. This will give the most efficient ordering of the primary index traversal.
  3. On Windows 7 and above: use JetPrereadKeys to get the primary index keys read in parallel. This will make the I/O a lot more efficient.

(I'm hoping to encapsulate the code above into a helper class at some point.)