Engaging With Dictionaries And Sets In Python

0
64
Python

In this first article in a two-part series, we will delve into the fascinating world of dictionaries in Python and uncover some insights that are sure to captivate your interest.

In Python, dictionaries and sets are used to save unique, unordered, mutable (also immutable — frozen sets) and quick retrieval elements. In any Python program there are a number of dicts used even when the program is not using these explicitly; e.g., CPython uses dictionary lookup for any attribute, class or global variable and more. As this plays an important role, it has been highly optimised by Python’s core developers, especially for search, add and delete operations. Hash tables are the key element behind the implementation of dicts and sets, making these data structures efficient. These tables are powerful and are also used in solving problems like indexing of database tables, caching, name lookups and so on.

A dictionary is composed of a series of key-value mapping elements. The keys and values can be of mixed types like string and integer.

Create

Here are a few methods to create a dict:

  • a = dict(one=1, two=2, three=3)
  • b = {‘one’: 1, ‘two’: 2, ‘three’: 3}
  • c = dict(zip([‘one’, ‘two’, ‘three’], [1, 2, 3]))
  • d = dict([(‘one’, 1), (‘two’, 2), (‘three’, 3)])
  • e = dict({‘one’: 1, ‘two’: 2, ‘three’: 3})

Possible key types are all the immutable types such as str, bytes, and numeric.

Update

The update() method inserts the specified items to the dictionary. These items can be a dictionary or an iterable object with key value pairs. The pop() method removes the specified item from the dictionary. The value of the removed item is the return value of the pop() method.

car = {
“brand”: “Ford”,
“model”: “Mustang”,
“year”: 1964
}
print(“Pre Removed dict:”, car)
x = car.pop(“model”)
print(“Removed value:”, x)
print(“Post Removed dict:”, car)
Output:
Pre Removed dict: {‘brand’: ‘Ford’, ‘model’: ‘Mustang’, ‘year’: 1964}
Removed value: Mustang
Post Removed dict: {‘brand’: ‘Ford’, ‘year’: 1964}

Get

This returns the default value or None if the key does not exist, rather than handling KeyError.

The __missing__ method is called by the __getitem__() dictionary method internally if the keys don’t exist. The return value of __missing__() is the value returned when accessing the non-existence key.

Here’s an example:

sample_dict = {‘name’: ‘python’, ‘age’: 20}
print(sample_dict[‘location’])
Traceback (most recent call last):
File “dict_set_sample.py”, line 40, in <module>
print(sample_dict[‘location’])
KeyError: ‘location’
sample_dict = {‘name’: ‘python’, ‘age’: 20}
print(sample_dict.get(‘location’))
print(sample_dict.get(‘location’, ‘India’))

Output:
None
India

Items

The items() method returns a view object. The view object contains the key-value pairs of the dictionary as tuples in a list, and will reflect any changes done to the dictionary.

Note: ‘Returned view object’ was part of version 3+, and the __contains__ methods defined in Python dict class has been introduced to find out if the key exists in the dictionary or sets.
For example:

car = {
“brand”: “Ford”,
“model”: “Mustang”,
“year”: 1964
}
x = car.items()
print(x)
car[“model”] = “Figo”
print(x)
Output:
dict_items([(‘brand’, ‘Ford’), (‘model’, ‘Mustang’), (‘year’, 1964)])
dict_items([(‘brand’, ‘Ford’), (‘model’, ‘Figo’), (‘year’, 1964)])

Keys

The keys() method returns a view object of all the available first level keys in the dictionary. The view object contains the keys of the dictionary as a list and will reflect any changes done to the dictionary. For example:

car = {
“brand”: “Ford”,
“model”: “Mustang”,
“year”: 1964,
“details”: {
“wheels”: 4,
“doors”: 2,
}
}
x = car.keys()
print(x)

Output:
dict_keys([‘brand’, ‘model’, ‘year’, ‘details’])

Values

values() is an inbuilt method in Python that returns a view object. The view object contains the values of the dictionary as a list and will reflect any changes done to the dictionary. Here’s an example:

car = {
“brand”: “Ford”,
“model”: “Mustang”,
“year”: 1964,
“details”: {
“wheels”: 4,
“doors”: 2,
}
}
x = car.values()
print(x)

Output:
dict_values([‘Ford’, ‘Mustang’, 1964, {‘wheels’: 4, ‘doors’: 2}])

Filters

We will just explore some sample examples here, but a very good blog at https://learnpython.com/blog/filter-dictionary-in-python/ details many more possibilities.

To filter dict with only partial matching values from dictionary, use the following code:

cars = {
“ford_figo”: 1990,
“ford_focus”: 1991,
“ford_feista”: 1992
}
print(“Before drop of the column”, cars)
filter_str = “ford_feista”
cars = dict(filter(lambda item: filter_str not in item[0], cars.items()))
print(“After drop of the column”, cars)

Output:
Before drop of the column {‘ford_figo’: 1990, ‘ford_focus’: 1991, ‘ford_feista’: 1992}
After drop of the column {‘ford_figo’: 1990, ‘ford_focus’: 1991}

Filter and save empty values in dict as follows:

cars = {
“ford_figo”: 1990,
“ford_focus”: 1991,
“ford_feista”: 1992,
“ford_dummy”: 0
}
print(“Before drop of the column”, cars)
cars = dict(filter(lambda item: not item[1], cars.items()))
print(“After drop of the column”, cars)

Output:
Before drop of the column {‘ford_figo’: 1990, ‘ford_focus’: 1991, ‘ford_feista’: 1992, ‘ford_dummy’: 0}
After drop of the column {‘ford_dummy’: 0}

Delete

The following example gives the code to delete the keys and the dictionary.

dict1 = {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5}
print(dict1)
# Delete a single element
del dict1[‘a’]
print(“post delete key dict1:”, dict1)
# Delete all elements in the dictionary
dict1.clear()
print(“post clear dict1: “, dict1)
dict2 = {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5}
# Delete the dictionary
del dict2
print(“complete dict delete dict2: “, dict2)

Output:
{‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5}
post delete key dict1: {‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5}
post clear dict1: {}
Traceback (most recent call last):
File “dict_sample.py”, line 134, in <module>
print(“complete dict delete dict2: “, dict2)
NameError: name ‘dict2’ is not defined

Fromkeys

The fromkeys() method returns a dictionary with the specified keys and the specified value. It creates a new dictionary from the given sequence with the specific value, else creates with the None value assigned. For example:

types = (‘flower’, ‘diamond’, ‘heart’)
cards = dict.fromkeys(types)
print(“Create a dict with None values:”, cards)
qty = [10]
cards = dict.fromkeys(types, qty)
print(“Create a dict with default values:”, cards)
qty.append(3)
print(“View the created dict with updated list values:”, cards)
qty.pop(0)
print(“View the created dict with updated list values:”, cards)

Output:
Create a dict with None values: {‘flower’: None, ‘diamond’: None, ‘heart’: None}
Create a dict with default values: {‘flower’: [10], ‘diamond’: [10], ‘heart’: [10]}
View the created dict with updated list values: {‘flower’: [10, 3], ‘diamond’: [10, 3], ‘heart’: [10, 3]}
View the created dict with updated list values: {‘flower’: [3], ‘diamond’: [3], ‘heart’: [3]}

Setdefault

setdefault() returns the value of a key (if the key is in dictionary). Else, it inserts a key with the default value to the dictionary. It increases speed significantly by avoiding redundant key lookups. For example:

d = {‘a’: 97, ‘b’: 98, ‘c’: 99, ‘d’: 100}
ret_value = d.setdefault(‘e’, 120)
print(“New key which does not exists in the dict, value: {}, dict: {}”, ret_value, d)
ret_value = d.setdefault(‘e’, 140)
print(“New key which exists in the dict, value: {}, dict: {}”, ret_value, d)

Output:
New key which does not exists in the dict, value: {}, dict: {} 120 {‘a’: 97, ‘b’: 98, ‘c’: 99, ‘d’: 100, ‘e’: 120}
New key which exists in the dict, value: {}, dict: {} 120 {‘a’: 97, ‘b’: 98, ‘c’: 99, ‘d’: 100, ‘e’: 120}

Defaultdict

A defaultdict is configured to create items on demand whenever a missing key is searched. Internally it works by calling the callable provided to produce a default value.

The callable that produces the default values is held in an instance attribute called default_factory. Here’s an example:

  • Create a defaultdict with a list constructor as default_factory:

If the key is not found, an empty list is assigned to the new key.

If no default_factory is provided, the usual KeyError is raised for the missing keys.

Dict comprehensions

You can transform a dictionary from one form to another using dict comprehensions. Dict comprehensions make the code easier to read, avoid loops and enhance performance (https://stackoverflow.com/questions/52542742/why-is-this-loop-faster-than-a-dictionary-comprehension-for-creating-a-dictionary). Here’s an example:

dict1 = {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5}
# Double each value in the dictionary
double_dict1 = {k:v*2 for (k,v) in dict1.items()}
print(double_dict1)

Output:
{‘a’: 2, ‘b’: 4, ‘c’: 6, ‘d’: 8, ‘e’: 10}

You can log on to https://www.datacamp.com/tutorial/python-dictionary-comprehension for multiple such examples.

Call by reference

Call by reference means that the argument passed to the function is a reference to a variable that already exists in memory rather than an as an independent copy of that variable. Hence any changes made to the variable will be impacted even after the function is executed.

Dictionary passed as argument is call by reference; for example:

dict1 = {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5}

def change(dict_val):
print(“inside change function before change:”, dict_val)
dict_val[‘new’] = 100
print(“inside change function after change:”, dict_val)

print(“inside main function before change:”, dict1)
change(dict1)
print(“inside main function after change:”, dict1)

Output:
inside main function before change: {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5}
inside change function before change: {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5}
inside change function after change: {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5, ‘new’: 100}
inside main function after change: {‘a’: 1, ‘b’: 2, ‘c’: 3, ‘d’: 4, ‘e’: 5, ‘new’: 100}

Let’s now look at some variations.

UserDict

Users can implement their own dictionary by extending UserDict, which works like standard dict. It is preferred to use subclass from UserDict rather than from dict, as the built-in has some implementation shortcuts that end up forcing an override, which can be easily inherited from UserDict. For example:

# Python program to demonstrate Userdict
from collections import UserDict

# Creating a Dictionary where
# deletion is not allowed
class MyDict(UserDict):

# Function to stop deletion
# from dictionary
def __del__(self):
raise RuntimeError(“Deletion not allowed”)

# Function to stop pop from
# dictionary
def pop(self, s = None):
raise RuntimeError(“Deletion not allowed”)

# Function to stop popitem
# from Dictionary
def popitem(self, s = None):
raise RuntimeError(“Deletion not allowed”)

# Driver’s code
d = MyDict({‘a’:1,
‘b’: 2,
‘c’: 3})

print(“Original Dictionary”)
print(d)

d.pop(1)

Output:
Original Dictionary
{‘a’: 1, ‘b’: 2, ‘c’: 3}
Traceback (most recent call last):
File “dict_sample.py”, line 154, in <module>
d.pop(1)
File “dict_sample.py.py”, line 139, in pop
raise RuntimeError(“Deletion not allowed”)
RuntimeError: Deletion not allowed
Exception ignored in: <function MyDict.__del__ at 0x10955b8b0>
Traceback (most recent call last):
File “dict_sample.py.py”, line 134, in __del__
RuntimeError: Deletion not allowed

OrderedDict

This maintains the order of the insertion. The popitem method of an OrderedDict pops the first item by default, but if called by popitem(last=True) will return the last item added.

Ordered dictionary is designed to remember the order of items, which is defined by the insertion order of keys. When the value of a certain key is changed, the position of the key remains unchanged.

The syntax is OrderedDict(). Here’s an example:

from collections import OrderedDict
numbers = OrderedDict()
numbers[“one”] = 1
numbers[“two”] = 2
numbers[“three”] = 3

print(numbers) # OrderedDict([(‘one’, 1), (‘two’, 2), (‘three’, 3)])

Memory allocation

Just like in a dictionary, the memory is allocated and grows linearly based on the number of keys added. However, in an ordered dictionary, nearly 50% more memory is used to preserve the order of the items. Here’s the code:

from collections import OrderedDict
import sys

ordd = OrderedDict()
d = {}

for x in range(15):
ordd[x] = x
d[x] = x
print(“ordered dict:”, sys.getsizeof(ordd), “\t”, “dict:”, sys.getsizeof(d))

‘’’
Output:
-------
ordered dict: 392 dict: 232
ordered dict: 424 dict: 232
ordered dict: 456 dict: 232
ordered dict: 488 dict: 232
ordered dict: 520 dict: 232
ordered dict: 744 dict: 360
ordered dict: 776 dict: 360
ordered dict: 808 dict: 360
ordered dict: 840 dict: 360
ordered dict: 872 dict: 360
ordered dict: 1312 dict: 640
ordered dict: 1344 dict: 640
ordered dict: 1376 dict: 640
ordered dict: 1408 dict: 640
ordered dict: 1440 dict: 640

You can go to https://lerner.co.il/2019/05/12/python-dicts-and-memory-usage/ for further details.

Methods

The move_to_end method is used to move an existing key of the dictionary either to the end or to the beginning. For example, from collections import OrderedDict:

ord_dict = OrderedDict().fromkeys(‘OSFY’)
print(“Original Dictionary”)
print(ord_dict)

# Move the key to end
ord_dict.move_to_end(‘O’)
print(“\nAfter moving key ‘O’ to end of dictionary :”)
print(ord_dict)

# Move the key to beginning
ord_dict.move_to_end(‘F’, last = False)
print(“\nAfter moving Key in the Beginning :”)
print(ord_dict)

Output:
Original Dictionary
OrderedDict([(‘O’, None), (‘S’, None), (‘F’, None), (‘Y’, None)])

After moving key ‘G’ to end of dictionary :
OrderedDict([(‘S’, None), (‘F’, None), (‘Y’, None), (‘O’, None)])

After moving Key in the Beginning :
OrderedDict([(‘F’, None), (‘S’, None), (‘Y’, None), (‘O’, None)])

In the reversed method, reverse the order of the dict keys stored. For example, from collections import OrderedDict:

x = {‘d’:’four’, ‘e’:’five’, ‘a’:’one’}

sample_reversed_dict = OrderedDict(reversed(list(x.items())))
print(“The reversed order dictionary : “ + str(sample_reversed_dict))

Output:
The reversed order dictionary : OrderedDict([(‘a’, ‘one’), (‘e’, ‘five’), (‘d’, ‘four’)])

Counter

Hold the count for each key. Adding a new key will create a single count and updating the existing key will increase the count. Here’s an example:

from collections import Counter

counter_dict_sample = Counter(“helloworld”)
print(counter_dict_sample)

Output:
Counter({‘l’: 3, ‘o’: 2, ‘h’: 1, ‘e’: 1, ‘w’: 1, ‘r’: 1, ‘d’: 1})

ChainMap

ChainMap holds the list of dictionaries, where key search is performed based on the order of the dictionaries stored. For example, interpreters use nested scopes, where each mapping represents a scope context.

pylookup = ChainMap(locals(), globals(), vars(builtins)

MappingProxyType builds a read-only dictionary instance.

Performance

Time complexity for querying is O(1), while time complexity for storing is O(n). Calculating the count is going to slow down a dictionary; it will take O(n) time complexity to calculate the counts for all keys (because each key will have to be hit at least once to append its count to the running count stored in the value).

Space complexity for storing is O(n). (During hash collision, it is more than O(n).)

Let’s look at a sample to see the performance list versus dict. Here’s the code:

import timeit

def find_number_in_list(lst, number):

if number in lst:

return True

else:

return False

short_list = list(range(100))

start = timeit.default_timer()

find_number_in_list(short_list, 1)

stop = timeit.default_timer()

print(“--- short_list %s seconds ---” % (stop - start))

long_list = list(range(10000000))

start = timeit.default_timer()

find_number_in_list(long_list, 1)

stop = timeit.default_timer()

print(“--- long_list %s seconds ---” % (stop - start))

def find_number_in_dict(dct, number):

if number in dct.keys():

return True

else:

return False

short_dict = {x:x*5 for x in range(1,100)}

start = timeit.default_timer()

find_number_in_dict(short_dict, 1)

stop = timeit.default_timer()

print(“--- short_dict %s seconds ---” % (stop - start))

long_dict = {x:x*5 for x in range(1,10000000)}

find_number_in_dict(long_dict, 1)

stop = timeit.default_timer()

print(“--- long_dict %s seconds ---” % (stop - start))

The output is:

>> python list_vs_dict_time.py

--- short_list 1.0299999999990872e-06 seconds ---

--- long_list 4.7639999999904425e-06 seconds ---

--- short_dict 2.97399999998893e-06 seconds ---

--- long_dict 1.768262919 seconds ---

The fastest way to repeatedly lookup data with millions of entries in Python is by using dictionaries. Because dictionaries have built-in mapping in Python, they are highly optimised. However, there is a typical space-time trade-off in dictionaries and lists. Dict still uses more memory than lists, since you need to use space for the keys and the lookup as well, while lists use space only for the values.

In general, O(1) is used to find the element in the dictionary. Though when this is a hash collision, it’s not always O(1).

Dictionaries are a preferable option for mapping key-value elements, as they offer far better performance compared to lists and tuples, especially when it comes to search, add, and delete operations. Sets are similar to dictionaries, but they lack key-value pairing and consist of unique, disordered elements. We will learn more about sets and the underlying concept of hash in the next article in the series.

LEAVE A REPLY

Please enter your comment!
Please enter your name here