This project has moved. For the latest updates, please go here.

Primary index size

Feb 3, 2013 at 2:13 AM
I have a table that has 41,915 rows in it.
It has the following indexes defined:
new JET_INDEXCREATE
{
  szIndexName = "by_id",
  szKey = "+id\0\0",
  grbit = CreateIndexGrbit.IndexPrimary
},
new JET_INDEXCREATE
{
  szIndexName = "by_etag",
  szKey = "+etag\0\0",
  grbit = CreateIndexGrbit.IndexDisallowNull
},
new JET_INDEXCREATE
{
  szIndexName = "by_key",
  szKey = "+key\0\0",
  grbit = CreateIndexGrbit.IndexDisallowNull | CreateIndexGrbit.IndexUnique,
});
I run the following code to get the stats:
Api.JetComputeStats(session, table);
foreach (var index in Api.GetTableIndexes(session, table))
{
    Console.WriteLine("{0}: {1:#,#;;0} kb", index.Name, index.Pages * (SystemParameters.DatabasePageSize) / 1024);
}
And I get:

by_etag: 656 kb
by_id: 34,080 kb
by_key: 1,556 kb

But I don't understand why by_id is so big.
Api.JetAddColumn(session, tableid, "id", new JET_COLUMNDEF
{
    coltyp = JET_coltyp.Long,
    grbit = ColumndefGrbit.ColumnFixed | ColumndefGrbit.ColumnAutoincrement | ColumndefGrbit.ColumnNotNULL
}, null, 0, out columnid);
The id column is just an auto int, nothing more.
The index density is set to 90%.
Feb 13, 2013 at 9:53 PM
Since you haven't gotten a reply, I figured I'd toss out my mostly uninformed but hopefully accurate enough though:
From previous posts, I'm conjecturing that this table of 42000 rows in it has a column that of type of long binary and is storing "a lot" of data in it. Since the "by_id" column is your primary index, it's overriding ESE's default clustered index. And, quoting Wikipedia "Primary indexes are different from secondary indexes in that the index entry is the record itself, and not a logical pointer to the record. Secondary indexes have primary keys at their leaves to logically link to the record in the primary index." Where it possibly gets a bit off is the "separate long value" storage, but I'm guessing that since the "separate long value" is transparent on the read side, ESE is still marking that index as the owner of the data even if it's not stored in the same B+tree.

These MSDN pages also say the same thing as that Wikipedia quote, just in more words:
http://msdn.microsoft.com/en-us/library/windows/desktop/gg294106(v=exchg.10).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/gg269290(v=exchg.10).aspx
Feb 13, 2013 at 10:24 PM
Edited Feb 13, 2013 at 10:25 PM
MichaelWint wrote:
Where it possibly gets a bit off is the "separate long value" storage, but I'm guessing that since the "separate long value" is transparent on the read side, ESE is still marking that index as the owner of the data even if it's not stored in the same B+tree.
AFAIK that separate long value storage does not counts.

However, I'd like to quote the following from MSDN article "Long Value Columns":

By default, long value data is stored in a separate B+ tree if it is larger than 1024 bytes, or if the record does not fit on a single database page when the long value data is stored in the record.

So if there's not much data in each record, it's totally possible that all your data is kept in the primary B-Tree, even though you have a long value column in your table.
Feb 14, 2013 at 5:22 AM
This makes perfect sense, now that you say this.
And I guess it is also responsible for the increasing insert times.
Especially since we use long values a lot there.
Feb 15, 2013 at 1:04 AM
Edited Feb 15, 2013 at 1:04 AM
Const_me wrote:
AFAIK that separate long value storage does not counts.
Thanks for the correction!