PersistentDictionary performance

Mar 3, 2013 at 8:56 PM
I'm a bit puzzled by the performance numbers in the PersistentDictionary documentation. It says it can do 137,000 lookups per second when cached, while raw Esent can do 1.1 million reads per second. I don't understand why the dictionary lookup should be so much slower. Can anyone explain?

Also, from what I understand, Esent uses a b-tree index, but a dictionary is normally a variant of a hash table. Does this mean that the PersistentDictionary is using the Esent index?

The reason I'm asking is that I'm looking for a key/value store that
a) is very fast
b) can scale to very large datasets.

Using a b-tree with an O(log N) lookup isn't a good fit for my requirements.
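
To be clear about what I'm comparing: by a "lookup" I just mean a plain read through the dictionary, roughly like the sketch below (the directory name and key are only placeholders I made up):

using System;
using Microsoft.Isam.Esent.Collections.Generic;

class LookupSketch
{
    static void Main()
    {
        // "DictionaryData" is just a placeholder directory for the database files.
        using (var dictionary = new PersistentDictionary<string, string>("DictionaryData"))
        {
            dictionary["somekey"] = "somevalue";

            // A single read like this is what I assume the 137,000/second figure counts.
            string value;
            if (dictionary.TryGetValue("somekey", out value))
            {
                Console.WriteLine(value);
            }
        }
    }
}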
Mar 3, 2013 at 9:46 PM
"Does this mean that the PersistentDictionary is using the Esent index?"
Yes.

"Using a b-tree with an O(log N) lookup isn't a good fit for my requirements."
2 cases here.
  1. The whole dataset fits in RAM. In this case hashmaps theoretically do better. In practice, however, the difference will usually be small, because many tree nodes stay in the CPU cache and are fast to access on subsequent lookups (see the sketch at the end of this post). If you're thinking about memcached-like use cases, the IO will consume much more time anyway.
  2. The dataset doesn't fit in RAM. In this case b-trees perform better, because an HDD is not a random-access device but a block device.
P.S. And even when your dataset fits in RAM, sometimes ESENT is a good choice merely for persistence.
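
If you want a feel for case 1, here's a rough in-memory sketch (no ESENT involved, just .NET's hash-based Dictionary against the tree-based SortedDictionary); the sizes and keys are arbitrary, and the absolute numbers will differ on your machine:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class HashVsTree
{
    static void Main()
    {
        const int count = 1000000;
        var hash = new Dictionary<int, int>();       // hash table: O(1) expected lookups
        var tree = new SortedDictionary<int, int>(); // red-black tree: O(log N) lookups
        for (int i = 0; i < count; i++)
        {
            hash[i] = i;
            tree[i] = i;
        }

        // The same random key sequence for both containers.
        var random = new Random(42);
        var keys = new int[count];
        for (int i = 0; i < count; i++)
        {
            keys[i] = random.Next(count);
        }

        long sum = 0;
        var sw = Stopwatch.StartNew();
        foreach (int key in keys) sum += hash[key];
        Console.WriteLine("hash: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        foreach (int key in keys) sum += tree[key];
        Console.WriteLine("tree: {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine(sum); // keeps the JIT from discarding the loops
    }
}

The tree is slower, but both are orders of magnitude faster than anything that touches the disk, which is why case 2 is where the choice of data structure really matters.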
Mar 4, 2013 at 7:00 AM
@Const_me Thanks for the reply.

I'm considering using it as the storage engine for a graph database, but I would prefer a disk-based hash table. I haven't found one yet, though.

But as you say, Esent might be good enough. Other graph databases also like to keep as much data in memory as possible. OrientDB has a hash table, which ensures a single IO operation per read.

Does anyone have a good explanation of why the PersistentDictionary is so much slower than raw Esent?
Mar 4, 2013 at 11:54 AM
Hi vlangber,

I'm just not sure where you got your performance figures.
Performance varies with many factors: the amount of RAM, the dataset size, hard drive technology (SSD/magnetic) and speed (for magnetic drives this mostly depends on rotation speed and density; SSDs differ as well, e.g. my Intel 520 SSD is significantly faster than an Intel 320), and many ESENT system parameters.
IMO, the comparison only makes sense when you perform both benchmarks, using your real-life data.

BTW, you could also check out my implementation of persistent dictionary, here:
http://esentserialize.codeplex.com/SourceControl/changeset/view/a03b66824099#DictionaryDemo/Dictionary/DemoDictionary.cs
It will be slower for primitive value types; for complex values, however, it'll be faster due to a more compact serialization format (PersistentDictionary uses BinaryFormatter, while my implementation uses the DataContract serializer + binary XML format + a pre-shared XML dictionary).
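
To give an idea of what the more compact format means in practice, here's a rough sketch comparing the two approaches on a trivial [DataContract] type; the type, the member names and the dictionary entries are only placeholders, and the exact byte counts depend on your own classes:

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.Xml;

[Serializable]
[DataContract]
class SampleValue   // placeholder type, stands in for your complex value
{
    [DataMember] public string Name { get; set; }
    [DataMember] public int Count { get; set; }
}

class SerializationSizes
{
    static void Main()
    {
        var value = new SampleValue { Name = "example", Count = 42 };

        // Approach 1: BinaryFormatter (the format PersistentDictionary uses for complex values).
        using (var ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, value);
            Console.WriteLine("BinaryFormatter:           {0} bytes", ms.Length);
        }

        // Approach 2: DataContract serializer + binary XML + pre-shared XML dictionary.
        // The XmlDictionary holds the strings both sides agree on in advance.
        var sharedDictionary = new XmlDictionary();
        sharedDictionary.Add("SampleValue");
        sharedDictionary.Add("Name");
        sharedDictionary.Add("Count");
        var serializer = new DataContractSerializer(typeof(SampleValue));
        using (var ms = new MemoryStream())
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(ms, sharedDictionary))
        {
            serializer.WriteObject(writer, value);
            writer.Flush();
            Console.WriteLine("DataContract + binary XML: {0} bytes", ms.Length);
        }
    }
}

To deserialize you'd pass the same XmlDictionary to XmlDictionaryReader.CreateBinaryReader, so the reader and writer share the same pre-agreed strings.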
Mar 4, 2013 at 5:31 PM
The performance numbers are from the main page of the documentation for managed Esent and the PersistentDictionary.

The main page of the Managed Esent documentation (http://managedesent.codeplex.com/wikipage?title=ManagedEsentDocumentation&referringTitle=Documentation) contains these numbers in its performance section:

Insert records 132,000 records/second
Update one record 157,000 updates/second
Read one record 1,149,000 retrieves/second
Scan all records (sequential) 794,000 records/second
Seek to all records (random order) 266,000 seeks/second

The PersistentDictionary documentation page contains these numbers:

Sequential inserts 32,000 entries/second
Random inserts 17,000 entries/second
Random Updates 36,000 entries/second
Random lookups (database cached in memory) 137,000 entries/second
Linq queries (range of records) 14,000 queries/second
Mar 4, 2013 at 5:43 PM
Managed Esent documentation:
8 byte autoincrement key and 32 bytes of data.
These test results are from a fast development machine with an SSD disk. The database is fully cached.
PersistentDictionary documentation:
string data was 64 bytes in length (i.e. .NET strings need to be constructed, which loads the memory manager).
These measurements were taken on my desktop system (no mention of whether it has an SSD, nor of whether the dataset fits in RAM).

You see? It's completely pointless to compare those numbers directly. If you're interested in how much overhead PersistentDictionary introduces, the only way to get an answer is to measure it yourself.

Based on the PersistentDictionary source code I've seen, it's pretty efficient, i.e. there shouldn't be much difference between the two.
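
A measurement loop doesn't have to be fancy, either. Something along these lines (the directory, entry count and value type are placeholders; substitute your real data), run against both PersistentDictionary and an equivalent raw ESENT loop, will tell you far more than the published figures:

using System;
using System.Diagnostics;
using Microsoft.Isam.Esent.Collections.Generic;

class MeasureLookups
{
    static void Main()
    {
        const int count = 100000;  // placeholder dataset size
        using (var dictionary = new PersistentDictionary<int, string>("PerfTestData"))
        {
            // Populate with dummy data; use your real keys and values instead.
            for (int i = 0; i < count; i++)
            {
                dictionary[i] = "value " + i;
            }

            // Time random lookups over the whole key range.
            var random = new Random(0);
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < count; i++)
            {
                string ignored = dictionary[random.Next(count)];
            }
            sw.Stop();
            Console.WriteLine("{0:N0} lookups/second", count / sw.Elapsed.TotalSeconds);
        }
    }
}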
Mar 4, 2013 at 9:13 PM
I understand that data size and hardware influence the results, but I wouldn't think anyone would use a mechanical hard disk for database benchmarks in 2013. Still, I'm happy to hear that performance should be roughly similar. That means it may be suitable for my needs; at least it warrants a closer look and some experimentation.

I'll take a look at your version of the persistent dictionary. We'll mostly use complex types, so that's a good feature for us.
Mar 6, 2013 at 5:10 PM
Edited Mar 6, 2013 at 9:19 PM
In addition to what Const_me said about the differences between the two datasets, "Read one record" most likely really was "read the same record one million times", whereas the "Random lookups" number for PersistentDictionary is probably closer to a combination of "Seek to all records (random order)" plus a read -- which isn't covered by any of the Esent performance numbers. The program used for the Managed Esent test is probably just a C#ification of this:
http://blogs.msdn.com/b/laurionb/archive/2008/11/07/some-basic-esent-performance-measurements.aspx

edit: SimplePerfTest.cs, RepeatedlyRetrieveOneRecord. http://managedesent.codeplex.com/SourceControl/changeset/view/81201#749529
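
To make the difference in access pattern concrete, here is a rough sketch of the two loops using PersistentDictionary (the real SimplePerfTest.cs works against the low-level API, but the shape of the comparison is the same idea; directory and sizes are placeholders):

using System;
using System.Diagnostics;
using Microsoft.Isam.Esent.Collections.Generic;

class AccessPatterns
{
    static void Main()
    {
        const int count = 100000;  // placeholder dataset size
        using (var dictionary = new PersistentDictionary<int, string>("PatternData"))
        {
            for (int i = 0; i < count; i++)
            {
                dictionary[i] = "value " + i;
            }

            // "Read one record": retrieve the same entry over and over.
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < count; i++)
            {
                string ignored = dictionary[0];
            }
            Console.WriteLine("same record:    {0:N0}/second", count / sw.Elapsed.TotalSeconds);

            // "Random lookups": seek to a different entry every time.
            var random = new Random(0);
            sw.Restart();
            for (int i = 0; i < count; i++)
            {
                string ignored2 = dictionary[random.Next(count)];
            }
            Console.WriteLine("random lookups: {0:N0}/second", count / sw.Elapsed.TotalSeconds);
        }
    }
}

The first loop keeps hitting the same cached b-tree path; the second has to navigate the index for a different key on every iteration, which is why the two published numbers aren't measuring the same thing.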