In this first article in a two-part series, we will delve into the fascinating world of dictionaries in Python and uncover some insights that are sure to captivate your interest.
In Python, dictionaries and sets are used to save unique, unordered, mutable (also immutable — frozen sets) and quick retrieval elements. In any Python program there are a number of dicts used even when the program is not using these explicitly; e.g., CPython uses dictionary lookup for any attribute, class or global variable and more. As this plays an important role, it has been highly optimised by Python’s core developers, especially for search, add and delete operations. Hash tables are the key element behind the implementation of dicts and sets, making these data structures efficient. These tables are powerful and are also used in solving problems like indexing of database tables, caching, name lookups and so on.
A dictionary is composed of a series of key-value mapping elements. The keys and values can be of mixed types like string and integer.
Create
Here are a few methods to create a dict:
- a = dict(one=1, two=2, three=3)
- b = {‘one’: 1, ‘two’: 2, ‘three’: 3}
- c = dict(zip([‘one’, ‘two’, ‘three’], [1, 2, 3]))
- d = dict([(‘one’, 1), (‘two’, 2), (‘three’, 3)])
- e = dict({‘one’: 1, ‘two’: 2, ‘three’: 3})
Possible key types are all the immutable types such as str, bytes, and numeric.
Update
The update() method inserts the specified items to the dictionary. These items can be a dictionary or an iterable object with key value pairs. The pop() method removes the specified item from the dictionary. The value of the removed item is the return value of the pop() method.
car = { “brand”: “Ford”, “model”: “Mustang”, “year”: 1964 } print(“Pre Removed dict:”, car) x = car.pop(“model”) print(“Removed value:”, x) print(“Post Removed dict:”, car)
Output: Pre Removed dict: {‘brand’: ‘Ford’, ‘model’: ‘Mustang’, ‘year’: 1964} Removed value: Mustang Post Removed dict: {‘brand’: ‘Ford’, ‘year’: 1964}
Get
This returns the default value or None if the key does not exist, rather than handling KeyError.
The __missing__ method is called by the __getitem__() dictionary method internally if the keys don’t exist. The return value of __missing__() is the value returned when accessing the non-existence key.
Here’s an example:
sample_dict = {‘name’: ‘python’, ‘age’: 20} print(sample_dict[‘location’]) Traceback (most recent call last): File “dict_set_sample.py”, line 40, in <module> print(sample_dict[‘location’]) KeyError: ‘location’ sample_dict = {‘name’: ‘python’, ‘age’: 20} print(sample_dict.get(‘location’)) print(sample_dict.get(‘location’, ‘India’)) Output: None India
Items
The items() method returns a view object. The view object contains the key-value pairs of the dictionary as tuples in a list, and will reflect any changes done to the dictionary.
Note: ‘Returned view object’ was part of version 3+, and the __contains__ methods defined in Python dict class has been introduced to find out if the key exists in the dictionary or sets. |
For example: car = { “brand”: “Ford”, “model”: “Mustang”, “year”: 1964 } x = car.items() print(x) car[“model”] = “Figo” print(x) Output: dict_items([(‘brand’, ‘Ford’), (‘model’, ‘Mustang’), (‘year’, 1964)]) dict_items([(‘brand’, ‘Ford’), (‘model’, ‘Figo’), (‘year’, 1964)])
Keys
The keys() method returns a view object of all the available first level keys in the dictionary. The view object contains the keys of the dictionary as a list and will reflect any changes done to the dictionary. For example:
car = { “brand”: “Ford”, “model”: “Mustang”, “year”: 1964, “details”: { “wheels”: 4, “doors”: 2, } } x = car.keys() print(x) Output: dict_keys([‘brand’, ‘model’, ‘year’, ‘details’])
Values
values() is an inbuilt method in Python that returns a view object. The view object contains the values of the dictionary as a list and will reflect any changes done to the dictionary. Here’s an example:
car = { “brand”: “Ford”, “model”: “Mustang”, “year”: 1964, “details”: { “wheels”: 4, “doors”: 2, } } x = car.values() print(x) Output: dict_values([‘Ford’, ‘Mustang’, 1964, {‘wheels’: 4, ‘doors’: 2}])
Filters
We will just explore some sample examples here, but a very good blog at https://learnpython.com/blog/filter-dictionary-in-python/ details many more possibilities.
To filter dict with only partial matching values from dictionary, use the following code:
cars = { “ford_figo”: 1990, “ford_focus”: 1991, “ford_feista”: 1992 } print(“Before drop of the column”, cars) filter_str = “ford_feista” cars = dict(filter(lambda item: filter_str not in item[0], cars.items())) print(“After drop of the column”, cars) Output: Before drop of the column {‘ford_figo’: 1990, ‘ford_focus’: 1991, ‘ford_feista’: 1992} After drop of the column {‘ford_figo’: 1990, ‘ford_focus’: 1991}
Filter and save empty values in dict as follows:
cars = { “ford_figo”: 1990, “ford_focus”: 1991, “ford_feista”: 1992, “ford_dummy”: 0 } print(“Before drop of the column”, cars) cars = dict(filter(lambda item: not item[1], cars.items())) print(“After drop of the column”, cars) Output: Before drop of the column {‘ford_figo’: 1990, ‘ford_focus’: 1991, ‘ford_feista’: 1992, ‘ford_dummy’: 0} After drop of the column {‘ford_dummy’: 0}
Delete
The following example gives the code to delete the keys and the dictionary.
dict1 = {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5} print(dict1) # Delete a single element del dict1[‘a’] print(“post delete key dict1:”, dict1) # Delete all elements in the dictionary dict1.clear() print(“post clear dict1: “, dict1) dict2 = {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5} # Delete the dictionary del dict2 print(“complete dict delete dict2: “, dict2) Output: {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5} post delete key dict1: {‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5} post clear dict1: {} Traceback (most recent call last): File “dict_sample.py”, line 134, in <module> print(“complete dict delete dict2: “, dict2) NameError: name ‘dict2’ is not defined
Fromkeys
The fromkeys() method returns a dictionary with the specified keys and the specified value. It creates a new dictionary from the given sequence with the specific value, else creates with the None value assigned. For example:
types = (‘flower’, ‘diamond’, ‘heart’) cards = dict.fromkeys(types) print(“Create a dict with None values:”, cards) qty = [10] cards = dict.fromkeys(types, qty) print(“Create a dict with default values:”, cards) qty.append(3) print(“View the created dict with updated list values:”, cards) qty.pop(0) print(“View the created dict with updated list values:”, cards) Output: Create a dict with None values: {‘flower’: None, ‘diamond’: None, ‘heart’: None} Create a dict with default values: {‘flower’: [10], ‘diamond’: [10], ‘heart’: [10]} View the created dict with updated list values: {‘flower’: [10, 3], ‘diamond’: [10, 3], ‘heart’: [10, 3]} View the created dict with updated list values: {‘flower’: [3], ‘diamond’: [3], ‘heart’: [3]}
Setdefault
setdefault() returns the value of a key (if the key is in dictionary). Else, it inserts a key with the default value to the dictionary. It increases speed significantly by avoiding redundant key lookups. For example:
d = {‘a’: 97, ‘b’: 98, ‘c’: 99, ‘d’: 100} ret_value = d.setdefault(‘e’, 120) print(“New key which does not exists in the dict, value: {}, dict: {}”, ret_value, d) ret_value = d.setdefault(‘e’, 140) print(“New key which exists in the dict, value: {}, dict: {}”, ret_value, d) Output: New key which does not exists in the dict, value: {}, dict: {} 120 {‘a’: 97, ‘b’: 98, ‘c’: 99, ‘d’: 100, ‘e’: 120} New key which exists in the dict, value: {}, dict: {} 120 {‘a’: 97, ‘b’: 98, ‘c’: 99, ‘d’: 100, ‘e’: 120}
Defaultdict
A defaultdict is configured to create items on demand whenever a missing key is searched. Internally it works by calling the callable provided to produce a default value.
The callable that produces the default values is held in an instance attribute called default_factory. Here’s an example:
- Create a defaultdict with a list constructor as default_factory:
If the key is not found, an empty list is assigned to the new key.
If no default_factory is provided, the usual KeyError is raised for the missing keys.
Dict comprehensions
You can transform a dictionary from one form to another using dict comprehensions. Dict comprehensions make the code easier to read, avoid loops and enhance performance (https://stackoverflow.com/questions/52542742/why-is-this-loop-faster-than-a-dictionary-comprehension-for-creating-a-dictionary). Here’s an example:
dict1 = {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5} # Double each value in the dictionary double_dict1 = {k:v*2 for (k,v) in dict1.items()} print(double_dict1) Output: {‘a’: 2, ‘b’: 4, ‘c’: 6, ‘d’: 8, ‘e’: 10}
You can log on to https://www.datacamp.com/tutorial/python-dictionary-comprehension for multiple such examples.
Call by reference
Call by reference means that the argument passed to the function is a reference to a variable that already exists in memory rather than an as an independent copy of that variable. Hence any changes made to the variable will be impacted even after the function is executed.
Dictionary passed as argument is call by reference; for example:
dict1 = {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5} def change(dict_val): print(“inside change function before change:”, dict_val) dict_val[‘new’] = 100 print(“inside change function after change:”, dict_val) print(“inside main function before change:”, dict1) change(dict1) print(“inside main function after change:”, dict1) Output: inside main function before change: {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5} inside change function before change: {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5} inside change function after change: {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5, ‘new’: 100} inside main function after change: {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5, ‘new’: 100}
Let’s now look at some variations.
UserDict
Users can implement their own dictionary by extending UserDict, which works like standard dict. It is preferred to use subclass from UserDict rather than from dict, as the built-in has some implementation shortcuts that end up forcing an override, which can be easily inherited from UserDict. For example:
# Python program to demonstrate Userdict from collections import UserDict # Creating a Dictionary where # deletion is not allowed class MyDict(UserDict): # Function to stop deletion # from dictionary def __del__(self): raise RuntimeError(“Deletion not allowed”) # Function to stop pop from # dictionary def pop(self, s = None): raise RuntimeError(“Deletion not allowed”) # Function to stop popitem # from Dictionary def popitem(self, s = None): raise RuntimeError(“Deletion not allowed”) # Driver’s code d = MyDict({‘a’:1, ‘b’: 2, ‘c’: 3}) print(“Original Dictionary”) print(d) d.pop(1) Output: Original Dictionary {‘a’: 1, ‘b’: 2, ‘c’: 3} Traceback (most recent call last): File “dict_sample.py”, line 154, in <module> d.pop(1) File “dict_sample.py.py”, line 139, in pop raise RuntimeError(“Deletion not allowed”) RuntimeError: Deletion not allowed Exception ignored in: <function MyDict.__del__ at 0x10955b8b0> Traceback (most recent call last): File “dict_sample.py.py”, line 134, in __del__ RuntimeError: Deletion not allowed
OrderedDict
This maintains the order of the insertion. The popitem method of an OrderedDict pops the first item by default, but if called by popitem(last=True) will return the last item added.
Ordered dictionary is designed to remember the order of items, which is defined by the insertion order of keys. When the value of a certain key is changed, the position of the key remains unchanged.
The syntax is OrderedDict(). Here’s an example:
from collections import OrderedDict numbers = OrderedDict() numbers[“one”] = 1 numbers[“two”] = 2 numbers[“three”] = 3 print(numbers) # OrderedDict([(‘one’, 1), (‘two’, 2), (‘three’, 3)])
Memory allocation
Just like in a dictionary, the memory is allocated and grows linearly based on the number of keys added. However, in an ordered dictionary, nearly 50% more memory is used to preserve the order of the items. Here’s the code:
from collections import OrderedDict import sys ordd = OrderedDict() d = {} for x in range(15): ordd[x] = x d[x] = x print(“ordered dict:”, sys.getsizeof(ordd), “\t”, “dict:”, sys.getsizeof(d)) ‘’’ Output: ------- ordered dict: 392 dict: 232 ordered dict: 424 dict: 232 ordered dict: 456 dict: 232 ordered dict: 488 dict: 232 ordered dict: 520 dict: 232 ordered dict: 744 dict: 360 ordered dict: 776 dict: 360 ordered dict: 808 dict: 360 ordered dict: 840 dict: 360 ordered dict: 872 dict: 360 ordered dict: 1312 dict: 640 ordered dict: 1344 dict: 640 ordered dict: 1376 dict: 640 ordered dict: 1408 dict: 640 ordered dict: 1440 dict: 640
You can go to https://lerner.co.il/2019/05/12/python-dicts-and-memory-usage/ for further details.
Methods
The move_to_end method is used to move an existing key of the dictionary either to the end or to the beginning. For example, from collections import OrderedDict:
ord_dict = OrderedDict().fromkeys(‘OSFY’) print(“Original Dictionary”) print(ord_dict) # Move the key to end ord_dict.move_to_end(‘O’) print(“\nAfter moving key ‘O’ to end of dictionary :”) print(ord_dict) # Move the key to beginning ord_dict.move_to_end(‘F’, last = False) print(“\nAfter moving Key in the Beginning :”) print(ord_dict) Output: Original Dictionary OrderedDict([(‘O’, None), (‘S’, None), (‘F’, None), (‘Y’, None)]) After moving key ‘G’ to end of dictionary : OrderedDict([(‘S’, None), (‘F’, None), (‘Y’, None), (‘O’, None)]) After moving Key in the Beginning : OrderedDict([(‘F’, None), (‘S’, None), (‘Y’, None), (‘O’, None)])
In the reversed method, reverse the order of the dict keys stored. For example, from collections import OrderedDict:
x = {‘d’:’four’, ‘e’:’five’, ‘a’:’one’} sample_reversed_dict = OrderedDict(reversed(list(x.items()))) print(“The reversed order dictionary : “ + str(sample_reversed_dict)) Output: The reversed order dictionary : OrderedDict([(‘a’, ‘one’), (‘e’, ‘five’), (‘d’, ‘four’)])
Counter
Hold the count for each key. Adding a new key will create a single count and updating the existing key will increase the count. Here’s an example:
from collections import Counter counter_dict_sample = Counter(“helloworld”) print(counter_dict_sample) Output: Counter({‘l’: 3, ‘o’: 2, ‘h’: 1, ‘e’: 1, ‘w’: 1, ‘r’: 1, ‘d’: 1})
ChainMap
ChainMap holds the list of dictionaries, where key search is performed based on the order of the dictionaries stored. For example, interpreters use nested scopes, where each mapping represents a scope context.
pylookup = ChainMap(locals(), globals(), vars(builtins)
MappingProxyType builds a read-only dictionary instance.
Performance
Time complexity for querying is O(1), while time complexity for storing is O(n). Calculating the count is going to slow down a dictionary; it will take O(n) time complexity to calculate the counts for all keys (because each key will have to be hit at least once to append its count to the running count stored in the value).
Space complexity for storing is O(n). (During hash collision, it is more than O(n).)
Let’s look at a sample to see the performance list versus dict. Here’s the code:
import timeit def find_number_in_list(lst, number): if number in lst: return True else: return False short_list = list(range(100)) start = timeit.default_timer() find_number_in_list(short_list, 1) stop = timeit.default_timer() print(“--- short_list %s seconds ---” % (stop - start)) long_list = list(range(10000000)) start = timeit.default_timer() find_number_in_list(long_list, 1) stop = timeit.default_timer() print(“--- long_list %s seconds ---” % (stop - start)) def find_number_in_dict(dct, number): if number in dct.keys(): return True else: return False short_dict = {x:x*5 for x in range(1,100)} start = timeit.default_timer() find_number_in_dict(short_dict, 1) stop = timeit.default_timer() print(“--- short_dict %s seconds ---” % (stop - start)) long_dict = {x:x*5 for x in range(1,10000000)} find_number_in_dict(long_dict, 1) stop = timeit.default_timer() print(“--- long_dict %s seconds ---” % (stop - start))
The output is:
>> python list_vs_dict_time.py --- short_list 1.0299999999990872e-06 seconds --- --- long_list 4.7639999999904425e-06 seconds --- --- short_dict 2.97399999998893e-06 seconds --- --- long_dict 1.768262919 seconds ---
The fastest way to repeatedly lookup data with millions of entries in Python is by using dictionaries. Because dictionaries have built-in mapping in Python, they are highly optimised. However, there is a typical space-time trade-off in dictionaries and lists. Dict still uses more memory than lists, since you need to use space for the keys and the lookup as well, while lists use space only for the values.
In general, O(1) is used to find the element in the dictionary. Though when this is a hash collision, it’s not always O(1).
Dictionaries are a preferable option for mapping key-value elements, as they offer far better performance compared to lists and tuples, especially when it comes to search, add, and delete operations. Sets are similar to dictionaries, but they lack key-value pairing and consist of unique, disordered elements. We will learn more about sets and the underlying concept of hash in the next article in the series.