Sorting arbitrary non-indexed columns

Nov 13, 2010 at 3:52 PM

Hi!

Let's assume the following simple example:

Users table:

  • User_id (indexed, primary unique)
  • Firstname
  • Lastname
  • Email
  • Phone

I am curious about the different approaches to sorting records on non-indexed columns:

- Using JetOpenTempTable to create an in-memory non-clustered index "by hand", then iterating over the temp table and manually seeking to each row in the "Users"-table

- Using JetOpenTempTable and add all relevant columns, then iterating over the temp table and outputting each sorted row

(JetOpenTempTable can forward-only sort on a text column that is max 127 chars with codepage=unicode)

 

Today we sort non-indexed columns by creating a sort string, and then adding the sort string, row primary key and table name to an array. This array is then quicksorted based on the sort string (in-memory).
Afterwards we sequentially process each item from the sorted array, and perform a primary key seek to each item in the array (similar to a non-clustered index).

Nov 13, 2010 at 6:10 PM

I think there are two issues to consider:

  • Sort all the data, or just sort some columns and join against the main table.
  • Use an in-memory sort or a temporary table.

If the sort key is large relative to the total record size then I suggest sorting all the data to avoid the JetMakeKey/JetSeek/JetRetrieveColumns cost. For the simple schema above that would mean sorting all the data.

If you know that the amount of data you are sorting will always fit in RAM then an in-memory sort like you are using is great. If you start swapping then the performance can become very bad and you will need to use a temporary table. A temp table will give reasonable performance with very large amounts of data.

Nov 13, 2010 at 11:46 PM

Thanks for the answer!

 

After some experimenting it looks like porting/upgrading our old code to .net and using lists/icomparable will suffice for sorting on any imagineable sort string (atleast up to 5000 records in-memory). RAM is cheap :-)
Sorting larger sets will probably go into a temp table to build a non-clustered index there, and then seeking and retrieving each row afterwards.