
Performance of Persistent Dictionary

Aug 27, 2013 at 8:30 PM
Hello everyone,

I wonder if anyone has encountered the kind of really slow performance I'm experiencing with
the persistent dictionary. I have a large file that I need to load into a persistent dictionary. I'm also doing lookups and retrievals.

These are the steps.
  1. Check if a key is in the dictionary (the key is a numeric string, e.g. 12345).
  2. If the key exists, retrieve its value and do a comparison.
  3. If the criteria are met, remove the old record and add a new, modified one.
  4. If the key is not in the dictionary, simply add it.
Initially, speed is decent, or at least tolerable in my case. But as the dictionary size increases you can see a dramatic drop in performance. It loaded 10 million records in 4 hours, and it keeps getting slower and slower! I realize lookups, removes and retrievals can take their toll, but I still expected better performance than this.

Is there anything I can do to improve performance?

Here is the simple code:

if (RecordsDictionary.ContainsKey(fields[0]))
{
    // Key exists: fetch the stored record and compare its file date with the incoming one.
    string demValue;
    RecordsDictionary.TryGetValue(fields[0], out demValue);

    string[] demFields = demValue.Split(_delimiter);
    string[] inDictionaryFileNameParts = demValue.Split(_delimiter).LastOrDefault().Split('_');
    string inDictionaryFileDate = inDictionaryFileNameParts[0];
    if (string.Compare(fileDate, inDictionaryFileDate, true) > 0)
    {
        // Incoming record is newer: replace the stored one.
        RecordsDictionary.Remove(fields[0]);
        RecordsDictionary.Add(fields[0], sRecord);
    }
}
else
{
    RecordsDictionary.Add(fields[0], sRecord);
}
Aug 28, 2013 at 11:42 PM
Edited Aug 28, 2013 at 11:43 PM
Hi,
You could try yourDictionary[12345] = yourValue;.
I most often use it this way; what may be hurting your performance as well is string.Split, which wouldn't be required any more.

But I have to admit I haven't run any metrics on whether this is actually faster - I just guess it's worth a try. So far I've never run into any performance trouble; although performance decreases with dictionary size, I've never come close to the numbers you mentioned.
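
Untested, but the check/update branch could look something like this with the indexer (it keeps your date comparison; RecordsDictionary, fields, fileDate, sRecord and _delimiter are the variables from your snippet):

string demValue;
if (RecordsDictionary.TryGetValue(fields[0], out demValue))
{
    // Key already present: keep the date check, but replace Remove + Add with one assignment.
    string inDictionaryFileDate = demValue.Split(_delimiter).LastOrDefault().Split('_')[0];
    if (string.Compare(fileDate, inDictionaryFileDate, true) > 0)
    {
        RecordsDictionary[fields[0]] = sRecord;   // overwrite in place
    }
}
else
{
    RecordsDictionary[fields[0]] = sRecord;       // insert new key
}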

BR
Aug 29, 2013 at 3:58 PM

Thanks Linky,

I made that change (MyDictionary[234] = "Value") and noticed a slight improvement. Still very slow though.

I also took out the RecordsDictionary.ContainsKey(key) if statement. Of course, if there is a duplicate key it will throw, but I'm handling the exception, checking whether it is a duplicate-key error, and then proceeding with processing.

I did this because there are very few dups in the file, and I thought that rather than performing 24 million checks, handling a few exceptions might be more cost effective.

The file is slightly over 2GB with about 24 million records. It loaded 19.8 million records in 12 hours. That is not good, as I have other, even larger files.

So my new code looks like:

try
{
    DemographicRecordsDictionary.Add(fields[0], sRecord);
}
catch (Exception ex)
{
    if (ex.Message.Contains("An item with this key already exists"))
    {
        string demValue;
        DemographicRecordsDictionary.TryGetValue(fields[0], out demValue);

        string[] demFields = demValue.Split(_delimiter);
        string[] inDictionaryFileNameParts = demValue.Split(_delimiter).LastOrDefault().Split('_');
        string inDictionaryFileDate = inDictionaryFileNameParts[0];
        if (string.Compare(fileDate, inDictionaryFileDate, true) > 0)
        {
            DemographicRecordsDictionary[fields[0]] = sRecord;
        }
    }
}

Aug 29, 2013 at 4:28 PM
You're welcome.
How often do you think string.Compare(fileDate, inDictionaryFileDate, true) > 0 is true? If it's true most of the time, maybe run a test in which you replace your whole add/check/update code with just the single line DemographicRecordsDictionary[fields[0]] = sRecord; (which works for insert as well as update), as you'd get rid of three splits and the comparison.
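
Roughly like this - untested, and only valid if it's OK to overwrite even when the incoming record happens to be older (inputPath is just a placeholder):

// Requires System.IO; _delimiter and the dictionary are the same as in your code.
foreach (string sRecord in File.ReadLines(inputPath))
{
    string[] fields = sRecord.Split(_delimiter);
    // One line per record: inserts new keys and overwrites existing ones,
    // so no ContainsKey, no value Split and no date comparison.
    DemographicRecordsDictionary[fields[0]] = sRecord;
}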

How large are the strings you are storing?
Aug 29, 2013 at 5:34 PM

The code within the Catch block doesn’t get executed that often (as there are very few key duplicates) so the string.Compare(fileDate, inDictionaryFileDate, true) > 0 shouldn’t execute frequently.

The string size varies - I would say it averages 150 characters. Clearly, the performance issue has a lot more to do with the size of the dictionary, since it loads a lot faster initially but deteriorates to a crawling pace over time.

I wonder if the key datatype matters – say, text vs. integer.

Developer
Aug 30, 2013 at 12:49 AM
Hi Kuus,

Is your data access sequential or random?
Assuming it's random, what's likely happening is that the database cache isn't able to cache the entire 2 GB in memory. This means that as you access those keys, we'll have to go to disk more often, and disk access is just so slow compared with memory.
If you are able to make your data access sequential, then it ought to help make things faster. This isn't always possible of course, but it's worth mentioning.
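
For example (just an untested sketch, not measured): pre-sorting the input by key before loading makes the inserts hit the database in roughly key order. A 2 GB file may be too big to sort comfortably in memory, so an offline/external sort of the file itself may be the practical route, but the idea is:

// Sort by key, then load. OrderBy buffers everything in memory, so for very
// large files sort the file offline first and just stream it. Requires
// System.IO and System.Linq; inputPath is a placeholder.
var sortedRecords = File.ReadLines(inputPath)
    .OrderBy(line => line.Split(_delimiter)[0], StringComparer.Ordinal);

foreach (string sRecord in sortedRecords)
{
    string[] fields = sRecord.Split(_delimiter);
    DemographicRecordsDictionary[fields[0]] = sRecord;
}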

Integer keys should be faster than text keys. String data needs to be normalized (at the minimum for case-insensitivity, but it also matters for locales). This does make things slower, and the keys will usually be longer than the 4 bytes or 8 bytes for int/longs. But choosing text or integer will make a difference for every access, and really shouldn't cause things to become slower and slower with data growth. But come to think of it, shrinking the keys may help you fit more data in that 2 GB, which means that the database cache will be more efficient as well. If you can use integer keys, then you should.
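
A rough sketch of what that could look like (untested; it assumes every key in the file parses as a number, long is a supported key type, and dictionaryDirectory is just a placeholder):

// Long keys are smaller than strings and need no normalization.
var dict = new PersistentDictionary<long, string>(dictionaryDirectory);

long key;
if (long.TryParse(fields[0], out key))
{
    dict[key] = sRecord;
}
else
{
    // Handle/log the rare non-numeric key however you prefer.
}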

-martin