How to store guids?

Aug 31, 2010 at 11:20 PM

I am storing sequential guids as one of my columns, and I need to do range scans on it.

When I am using VistaColtyp.GUID, everything works perfectly. But I want to support XP as well, so I need to set the column as binary 16.

What are the differences in the way Esent compare GUID columns and binary columns? I want to be able to convert my guids to the binary representation that the binary(16) index expects.

Sep 1, 2010 at 12:19 AM

Binary columns are ordered in the way that memcmp does. On the other hand, when we normalize a Guid column we store the bytes in this order:

normalized[0..15] = input[10, 11, 12, 13, 14, 15, 8, 9, 6, 7, 4, 5, 0, 1, 2, 3] 

And then use memcmp order. If you put the bytes in your binary column in that order you will get the same sort order as Vista.

 

 


Sep 1, 2010 at 2:08 AM

For future reference (and I suggest making this part of the FAQ), here is the code that I am using:

public static class GuidExtensions
{
	public static Guid TransfromToGuidWithProperSorting(this byte[] bytes)
	{
		var unnormalized = new byte[16];
		unnormalized[0] = bytes[10];
		unnormalized[1] = bytes[11];
		unnormalized[2] = bytes[12];
		unnormalized[3] = bytes[13];
		unnormalized[4] = bytes[14];
		unnormalized[5] = bytes[15];
		unnormalized[6] = bytes[8];
		unnormalized[7] = bytes[9];
		unnormalized[8] = bytes[6];
		unnormalized[9] = bytes[7];
		unnormalized[10] = bytes[4];
		unnormalized[11] = bytes[5];
		unnormalized[12] = bytes[0];
		unnormalized[13] = bytes[1];
		unnormalized[14] = bytes[2];
		unnormalized[15] = bytes[3];
		return new Guid(unnormalized);
	}

	public static byte[] TransformToValueForEsentSorting(this Guid guid)
	{
		var bytes = guid.ToByteArray();
		var normalized = new byte[16];
		normalized[10] = bytes[0];
		normalized[11] = bytes[1];
		normalized[12] = bytes[2];
		normalized[13] = bytes[3];
		normalized[14] = bytes[4];
		normalized[15] = bytes[5];
		normalized[8] = bytes[6];
		normalized[9] = bytes[7];
		normalized[6] = bytes[8];
		normalized[7] = bytes[9];
		normalized[4] = bytes[10];
		normalized[5] = bytes[11];
		normalized[0] = bytes[12];
		normalized[1] = bytes[13];
		normalized[2] = bytes[14];
		normalized[3] = bytes[15];
		return normalized;
	}
}

Sep 1, 2010 at 8:26 PM
Edited Sep 1, 2010 at 8:35 PM

Edited comment that show me being stupid  

 

Sep 2, 2010 at 6:00 AM

Hm,

I think that I must be doing something wrong, take a look at the following guids:

  • 08cd188e-24d2-a1ce-0000-0000000000c9 - Guid
    • [142,24,205,8,210,36,206,161,0,0,0,0,0,0,0,201] - ToBytes
    • [0,0,0,0,0,201,0,0,206,161,210,36,142,24,205,8] - Normalized
  • 08cd188f-0f0c-40a9-0000-000000000001
    • [143,24,205,8,12,15,169,64,0,0,0,0,0,0,0,1] - ToBytes
    • [0,0,0,0,0,1,0,0,169,64,12,15,143,24,205,8] - Normalized

After normalization, they don't compare the same to the guid. Are you sure about the sort order?  

Sep 2, 2010 at 8:59 PM

This code gets the same sort order on a Binary column as on a Guid column (see the input/transformed variables in the middle). Note that this won't match the order given by Guid.CompareTo -- it is more closely related to the way that SQL Server orders Guids. It uses the transformation described above:

JET_TABLEID tableid;
JET_COLUMNDEF columndef = new JET_COLUMNDEF();
JET_COLUMNID guidColumn, binaryColumn;
Api.JetCreateTable(sesid, dbid, "table", 0, 100, out tableid);
columndef.coltyp = VistaColtyp.GUID;
Api.JetAddColumn(sesid, tableid, "guid", columndef, null, 0, out guidColumn);
columndef.coltyp = JET_coltyp.Binary;
Api.JetAddColumn(sesid, tableid, "guid_as_binary", columndef, null, 0, out binaryColumn);
const string GuidIndexKey = "+guid\0\0";
Api.JetCreateIndex(sesid, tableid, "guid", CreateIndexGrbit.IndexUnique, GuidIndexKey, GuidIndexKey.Length, 100);
const string BinaryIndexKey = "+guid_as_binary\0\0";
Api.JetCreateIndex(sesid, tableid, "binary", CreateIndexGrbit.IndexUnique, BinaryIndexKey, BinaryIndexKey.Length, 100);
Api.JetBeginTransaction(sesid);
foreach (Guid g in from x in Enumerable.Range(0, 200) select Guid.NewGuid())
{
    Api.JetPrepareUpdate(sesid, tableid, JET_prep.Insert);
    Api.SetColumn(sesid, tableid, guidColumn, g);
    byte[] input = g.ToByteArray();
    int[] transform = new[] { 10, 11, 12, 13, 14, 15, 8, 9, 6, 7, 4, 5, 0, 1, 2, 3 };
    byte[] transformed = (from i in transform select input[i]).ToArray();
    Api.SetColumn(sesid, tableid, binaryColumn, transformed);
    Api.JetUpdate(sesid, tableid);
}
Api.JetCommitTransaction(sesid, CommitTransactionGrbit.LazyFlush);
Api.JetSetCurrentIndex(sesid, tableid, "guid");
Guid[] guidOrder = GetGuids(sesid, tableid, guidColumn).ToArray();
Api.JetSetCurrentIndex(sesid, tableid, "binary");
Guid[] binaryOrder = GetGuids(sesid, tableid, guidColumn).ToArray();
CollectionAssert.AreEqual(guidOrder, binaryOrder);
Sep 9, 2010 at 11:36 AM

Hi Ayende, I think sorting or taking ranges on Guids hints at some kind of design flaw. Guids are effectively random, but you seem to want to assign some kind of meaning to them. In the SQL world I only use them as primary keys for things that must be uniquely identified forever, such as User IDs, the IDs of plugin controls, etc.

Greg

Sep 10, 2010 at 9:26 AM

Let me rephrase that, then. I am not talking about Guids in the sense of Guid.NewGuid(), I am talking about 16 bytes sequential number.