11 August 2020

Python provides two different ways to initialize a dictionary, probably the most used data type in Python code: dict() and {}. If you are interested in performance, you will find a lot of posts that clearly state that dict() is slower than simply using {}. Doug Hellmann has written a great explanation of how the two implementations work during initialization.
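One way to see where the difference comes from is to look at the bytecode of both expressions. The following is a minimal sketch using the standard dis module; the helper name opnames is only for illustration, and the exact instruction names depend on the Python version.

```python
import dis


def opnames(expression):
    """Return the bytecode instruction names for a single expression."""
    code = compile(expression, "<sketch>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]


# The literal is compiled to a dedicated "build a dictionary" instruction.
literal_ops = opnames("{}")

# dict() has to look up the name `dict` and then call it like any other callable.
call_ops = opnames("dict()")

print("{}     ->", literal_ops)
print("dict() ->", call_ops)
```

The literal needs no name lookup and no call, which is the usual explanation for why it is cheaper.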

 

Okay, so there are many posts about the performance of the two implementations. Why am I writing a blog post about it too?

Because I don’t want to focus on the initialization of the dictionaries, but on what happens afterwards. Does the choice of initialization have an impact later on in the code? I want to check the performance of the most common use cases: adding <key, value> pairs and updating/merging dictionaries.

 

Initialization

Let us start with the initialization. Yes, you can read about it elsewhere, but I think it is important information to start with, and it is an easy way to check that my settings are right: if my results differed, I would be doing something wrong. I simply use it as a sort of validation of my setup, and as a reference for those of you who don’t know the performance difference yet.


As we can see, dict() is clearly slower than {}. The gap matters most when the dictionary is initialized with many elements: it makes a big difference whether your code needs 0.04 ms or almost 0.08 ms to create your dictionary. Even initializing an empty dictionary is slower: dict() takes roughly 100 nanoseconds longer, which is a factor of 4. In this case, you can weigh whether there are other reasons to use dict() instead of {}. As for me, I still prefer dict(), because it is easier to understand for developers who are new to Python. But there are other factors to think about, e.g. how often you create dictionaries.
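A measurement like this can be reproduced with the standard timeit module. This is a minimal sketch, not the original benchmark setup: the helper time_per_call_ns, the repetition counts and the three-entry test dictionaries are assumptions chosen to keep the run short.

```python
import timeit


def time_per_call_ns(stmt, number=200_000, repeat=5):
    """Return the best-of-`repeat` time for one execution of `stmt`, in nanoseconds."""
    best_total = min(timeit.repeat(stmt, number=number, repeat=repeat))
    return best_total / number * 1e9


# Hypothetical test cases: an empty dictionary and a small one with three entries.
statements = [
    "{}",
    "dict()",
    "{'a': 1, 'b': 2, 'c': 3}",
    "dict(a=1, b=2, c=3)",
]

timings_ns = {stmt: time_per_call_ns(stmt) for stmt in statements}

for stmt, ns in timings_ns.items():
    print(f"{stmt:<26} {ns:8.1f} ns")
```

Taking the minimum over several repetitions reduces the influence of other processes on the machine; the absolute numbers will differ between Python versions and hardware.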

But back to the dictionary with 1’000 entries: who writes a dictionary with that many entries by hand? Chances are high that you will lose track and introduce duplicate keys that silently overwrite each other. I wouldn’t write more than about ten entries before I started to think about an automated way to create the dictionary.

 

Adding Items

For such an automated script, I would first initialize an empty dictionary and then add the <key, value> pairs in a for loop that creates whatever I need. Simplified, it could look like this:

n = 1000  # number of entries to add
a = dict()  # or a = {}
for i in range(n):
    key = 'key_' + str(i)
    a[key] = i

 

What I now want to check is whether the way the dictionary was initialized has any effect on later performance when adding items to it. Based on the information in Doug Hellmann’s post, I assume it will not have an impact: the only difference is during creation, and afterwards they are "the same". Still, I want to make sure, and I did not find any information about it on the web (which doesn’t mean nobody else has tried it, only that I was not able to find it).

Again, the number of newly added elements ranges from 1 to 1’000. If we look at the performance, there is no clear difference. This implies that after the initialization, the two variants behave and perform similarly, if not identically.
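The comparison can be sketched as follows. The helpers fill and time_fill and the chosen sizes are assumptions for illustration; note that each timed run also includes creating the empty dictionary, which is negligible compared to the loop for larger n.

```python
import timeit


def fill(d, n):
    """Add n <key, value> pairs to the dictionary d and return it."""
    for i in range(n):
        d["key_" + str(i)] = i
    return d


def time_fill(factory, n, number=200, repeat=5):
    """Best-of-`repeat` time in seconds for filling a fresh dictionary with n items."""
    best_total = min(
        timeit.repeat(lambda: fill(factory(), n), number=number, repeat=repeat)
    )
    return best_total / number


fill_timings = {}
for n in (1, 10, 100, 1000):
    fill_timings[("{}", n)] = time_fill(lambda: {}, n)
    fill_timings[("dict()", n)] = time_fill(dict, n)
    print(
        f"n={n:>4}  {{}}: {fill_timings[('{}', n)] * 1e6:8.2f} us"
        f"  dict(): {fill_timings[('dict()', n)] * 1e6:8.2f} us"
    )
```

Both factories return a plain dict object of the same type, which fits the observation that the timings do not differ in any systematic way.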

 

Updating Dictionary

How do you feel about double-checking? We looked at the performance of adding a single <key, value> pair at a time. But what about .update()? The setup is simple: two dictionaries, one created with dict() and one with {}, are set up with the same number of elements (x-axis). For the test, each possible combination of update target and update source is run.

In this scenario, the performance is similar as well. Even if we take a closer look at updates with only a few elements, all four combinations perform similarly and there is no clear "fastest" version. Again, this result implies that after the initialization, there is no difference in performance.
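The four combinations could be timed along these lines. This is a sketch under assumptions: the key prefixes, the size n and the helper build are made up, and the target is copied before every update so that each run starts from the same state, which adds the same copy cost to all four combinations.

```python
import itertools
import timeit


def build(factory, n, prefix):
    """Create a dictionary via `factory` and add n entries with the given key prefix."""
    d = factory()
    for i in range(n):
        d[prefix + str(i)] = i
    return d


factories = {"{}": lambda: {}, "dict()": dict}
n = 100          # assumed number of elements per dictionary
number = 2000    # executions per timing run

update_timings = {}
for (target_name, target_factory), (source_name, source_factory) in itertools.product(
    factories.items(), repeat=2
):
    target = build(target_factory, n, "old_")
    source = build(source_factory, n, "new_")
    # Copy first so that `target` itself is never modified between runs.
    best_total = min(
        timeit.repeat(lambda: target.copy().update(source), number=number, repeat=5)
    )
    update_timings[(target_name, source_name)] = best_total / number

for (target_name, source_name), seconds in update_timings.items():
    print(f"{target_name:>6}.update({source_name:<6})  {seconds * 1e6:7.2f} us")
```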

 

Merging Dictionaries

So far, everything implies that you should use {} for initialization if you care about performance, and that afterwards it doesn’t matter anymore. Something else I often do is merge two dictionaries into a new one. I know of three different ways to do it:

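girls = {'Anna': 22, 'Mia': 21}
boys = {'Hans': 23, 'Peter': 24}

# .copy().update(): update() returns None, so the copy needs its own statement
people = girls.copy()
people.update(boys)
# dictionary unpacking
people = {**girls, **boys}
# dict() with keyword unpacking (only works if all keys of boys are strings)
people = dict(girls, **boys)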

 

I was curious whether there is a performance difference between these three approaches. Now we have some interesting results! If we merge two dictionaries with around 100 entries each, for a total of more than 200 elements, the .copy().update() approach is the fastest. When the total number of elements is 1’000, the other two approaches need around 1/3 more time. To be honest, I expected .copy().update() to be the slowest, because the dictionary needs to be copied first. A more detailed look at smaller dictionaries shows that for dictionaries smaller than 15 elements, dict(a, **b) is the fastest approach. For a total of five elements it takes half the time of the others; for a total of 10 elements, the others still take 1/3 more time. Interesting, especially after what we have learned about dictionaries so far! In general, dict(a, **b) is "a little bit" faster than {**a, **b}.
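To try this out yourself, the three approaches can be wrapped in small functions and timed with timeit. This is a minimal sketch with assumed names, sizes and repetition counts; the extra function call adds the same small overhead to every approach.

```python
import timeit


def merge_copy_update(a, b):
    """Copy a, then update the copy in place with b."""
    merged = a.copy()
    merged.update(b)
    return merged


def merge_unpack(a, b):
    """Build a new dictionary by unpacking both inputs."""
    return {**a, **b}


def merge_dict_call(a, b):
    """Pass b as keyword arguments to dict(); requires all keys of b to be strings."""
    return dict(a, **b)


def make_dict(n, prefix):
    """Hypothetical helper: n entries with string keys such as 'a_0', 'a_1', ..."""
    return {prefix + str(i): i for i in range(n)}


approaches = (merge_copy_update, merge_unpack, merge_dict_call)
number = 1000  # executions per timing run

merge_timings = {}
for n in (5, 50, 500):
    a, b = make_dict(n, "a_"), make_dict(n, "b_")
    for merge in approaches:
        best_total = min(timeit.repeat(lambda: merge(a, b), number=number, repeat=5))
        merge_timings[(merge.__name__, n)] = best_total / number
        print(f"n={n:>3}  {merge.__name__:<18} {best_total / number * 1e6:8.2f} us")
```

All three functions leave both input dictionaries unchanged and return the same result, so only the timings differ.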

 

Summary

Let me summarize what we found out with our performance tests:

  • If you focus on performance, initialize dictionaries with {}; afterwards, the way they were created makes no difference
  • When merging dictionaries (without changing either of them), use
    • .copy().update() for dictionaries with more than 15 elements
    • dict(a, **b) for smaller dictionaries