If you have just come across sets for the first time and were left confused about what they are for, you’ve come to the right place!
In this article, we will learn what sets are in Python and how and where to use them in our programs. We will also look at the useful methods that come with them.
Sets in a nutshell
Sets are a built-in data structure in Python, useful when we need to group a collection of unique items.
Have a look at the following code:
mySet = {1,2,3,4,5,6}
Notice how the elements are separated by commas and the whole set is enclosed in curly braces. That is all you need to know to create a set!
We can also create a set using the set() constructor as shown below.
mySet = set([1,2,3,4,5,6])
Here we gave a list to the set constructor to produce a set!
The following are some of the properties of sets:
- Unordered – the items of a set don’t have any defined order
- Unindexed – we can’t access the items with [i] as with lists
- Mutable – a set can be modified
- Iterable – we can loop over the items of a set
The main applications of Python sets include the following:
- Removing duplicates
- Membership test: Checking if an item is present in a collection
- Performing mathematical set operations like union, intersection, difference, and symmetric difference
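As a quick illustration of these three uses, here is a minimal sketch. The data and variable names (visits, unique_pages, popular) are made up purely for this example.

```python
# Hypothetical data, used only for illustration
visits = ['home', 'about', 'home', 'contact', 'about']

# 1. Removing duplicates: converting to a set keeps one copy of each item
unique_pages = set(visits)
print(len(unique_pages))            # 3

# 2. Membership test: is an item present in the collection?
print('contact' in unique_pages)    # True

# 3. Mathematical set operation: items common to two sets
popular = {'home', 'pricing'}
common = unique_pages & popular
print(common)                       # {'home'}
```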
If the above explanation raises more questions than answers, don’t worry as the above answer was only meant to be a quick reference for people who are already familiar with the topic.
If this is your first encounter with “Python Sets” I suggest setting aside 15 minutes of your time and reading through the rest of this article with undivided attention. By the time you reach the end of the article, I assure you that you will have mastered this topic, and you will be equipped with enough knowledge to tackle almost any challenge related to Python sets!
A Detailed Look Into Sets
What are Sets?
A set is a data structure in Python. It is usually used to store a collection of unique items in a single object.
Let’s have a look at some of the properties of Sets:
- Unique: One defining property of a set is that it contains no duplicates. All the elements inside it are always unique.
- Mutable: They are also mutable, unlike tuples, meaning you can add or remove elements from a set after its creation.
- Unordered: Sets are always unordered and unindexed, unlike lists. In other words, the items are not stored in any guaranteed order, so you can’t access elements using indices.
- Different data types: The elements of a set can be of different data types. For example, a single set can contain strings, numbers, and tuples, whatever best suits the project at hand. The one restriction is that every element must be hashable, so mutable objects such as lists cannot be elements.
- Mathematical sets: A set in Python is similar to a mathematical set, and operations like intersection, union, symmetric difference, and more can be applied.
- Iterable: When we need to go through the elements of a set, we can loop over it with a for loop.
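The properties above can be checked with a few lines of code. The sketch below uses made-up values; it also shows one rule worth knowing: set elements must be hashable, so a list cannot be stored in a set.

```python
# Different data types can live together in one set
mixed = {1, 'two', 3.0, (4, 5)}

# Unindexed: trying to use [i] raises a TypeError
try:
    mixed[0]
    index_error = None
except TypeError as err:
    index_error = err
print(type(index_error).__name__)   # TypeError

# Elements must be hashable: a list inside a set raises a TypeError
try:
    bad = {[1, 2], 3}
    hash_error = None
except TypeError as err:
    hash_error = err
print(type(hash_error).__name__)    # TypeError

# Iterable: a for loop visits every element once (in no guaranteed order)
total = 0
for n in {10, 20, 30}:
    total += n
print(total)                        # 60
```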
Set Construction
Python provides us with two ways to create a set:
- By using the built-in set() constructor with an iterable object passed in (such as a list, tuple, or string)
- By placing all the items, separated by commas, inside a pair of curly braces {}, the same as in mathematics.
Let’s implement both of these and see how they each work:
# making a set using the first method: utilizing set() constructor
# list
galaxies = ['Andromeda', 'Milky Way', 'Black Eye', 'Sombrero']
# converting it into a set
galaxies_set1 = set(galaxies)
# making a set using mathematical notation: manually placing items inside curly brackets
galaxies_set2 = {'Milky Way', 'Andromeda', 'Sombrero', 'Black Eye'}
print(galaxies_set1)
print(galaxies_set2)
The code above will give the following output.
{'Milky Way', 'Andromeda', 'Sombrero', 'Black Eye'}
{'Milky Way', 'Andromeda', 'Sombrero', 'Black Eye'}
As you can see
- in line-5 we are creating the set galaxies_set1 using the set constructor and
- in line-7 we are creating the set galaxies_set2 using the mathematical notation
When we printed them, we got the same set in both cases (the order in which the items are printed may differ on your machine, since sets are unordered). So in essence these are just 2 ways to do the same thing. Which one to pick? Pick whichever feels more comfortable to you!
If you are from a mathematical background, then use the mathematical notation. If you are from a programming background then use the set() constructor!
We can also call the set() constructor without any argument. In this case, the constructor returns an empty set, to which we can add items later on.
empty_set = set()
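One thing to watch out for: an empty pair of curly braces {} creates an empty dictionary, not an empty set, so set() is the way to go here. The short sketch below checks this, and also shows that set() accepts other iterables such as a string (the variable names are only for illustration).

```python
empty_set = set()
empty_braces = {}

print(type(empty_set).__name__)      # set
print(type(empty_braces).__name__)   # dict

# set() accepts any iterable; a string is split into its unique characters
letters = set('hello')
print(sorted(letters))               # ['e', 'h', 'l', 'o']
```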
Visualizing Sets
Now let us look at some examples with a visual aid to understand sets better. For this, we will be using Venn diagrams!
Venn diagrams are used to visualize sets and the relationships between them.
For example, the following Venn diagram visualizes the sets we created just now:
A Challenge for you!
I’ve included mini-challenges here and there that will help you practice and implement what you are learning.
Challenge
Let me create a set with duplicate items and print it:
# including the letter 'r' two times
letters = {'q','w','e','r','r','t','y'}
print(letters)
Since I have already mentioned the properties of a set, what do you think the output will look like?
I want you to stop reading here and think. This will help condition your mind to think like a programmer!
Ready with the answer? Alright, let’s see if you have got it right!
{'w', 'r', 'q', 't', 'y', 'e'}
If you got it right congratulations!
If you did not, do not feel bad, you just did some gym for your mind! You must be proud of yourself!
We got this output because two properties of a set are in play here:
- The duplicate ‘r’ is removed and only one is kept, as the items of a set are always unique
- The letters come out in a different order than we typed them. This is because the items of a set are unordered, so the order you see on your machine may differ.
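The same behaviour gives us a handy way to remove duplicates from a list. The sketch below uses a made-up list of numbers. Since a set does not keep the original order, it also shows dict.fromkeys() as one possible alternative when the first-seen order matters.

```python
# Hypothetical list with repeated values
readings = [3, 1, 3, 2, 1]

# Convert to a set to drop duplicates, then sort for a predictable order
unique_sorted = sorted(set(readings))
print(unique_sorted)       # [1, 2, 3]

# If the original order matters, dict keys are unique and keep insertion order
unique_in_order = list(dict.fromkeys(readings))
print(unique_in_order)     # [3, 1, 2]
```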
Checking set size
We can use the inbuilt len() function to check the number of elements in a set as shown below.
>>> galaxies_set = {'Milky Way', 'Andromeda', 'Sombrero', 'Black Eye'}
>>> len(galaxies_set)
4
Membership Test
We can also check whether a particular element exists or doesn’t exist in a set using the in or not in operators, respectively.
Let’s now check if the element ‘Milky Way’ exists in our collection:
>>> galaxies_set = {'Milky Way', 'Andromeda', 'Sombrero', 'Black Eye'}
>>> print('Milky Way' in galaxies_set)
True
>>> print('Cartwheel' in galaxies_set)
False
As you can see
- ‘Milky Way’ is part of our collection and hence we get True
- ‘Cartwheel’ is not part of our collection and hence we get False
Let’s try the not in operator now:
>>> galaxies_set = {'Milky Way', 'Andromeda', 'Sombrero', 'Black Eye'}
>>> print('Milky Way' not in galaxies_set)
False
>>> print('Cartwheel' not in galaxies_set)
True
As you can see, the not in operator works in exactly the opposite way to the in operator.
The in and not in operators are pretty straightforward to understand. If you are curious to learn more, we have an entire article dedicated to these operators, which you can find here.
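In real programs, membership tests usually appear inside a condition rather than a print statement. Here is a small sketch, assuming a made-up list of file names and a set of allowed extensions, that sorts the names into accepted and rejected.

```python
# Hypothetical data for illustration
allowed = {'png', 'jpg', 'gif'}
uploads = ['cat.png', 'notes.txt', 'dog.jpg']

# Take the part after the last dot and test it against the set
accepted = [name for name in uploads if name.rsplit('.', 1)[-1] in allowed]
rejected = [name for name in uploads if name.rsplit('.', 1)[-1] not in allowed]

print(accepted)   # ['cat.png', 'dog.jpg']
print(rejected)   # ['notes.txt']
```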
Adding elements to a Set
Let’s now learn how to add elements to an existing set.
To add an element to an existing set, we use the appropriately named add() method, as shown in the following example:
>>> continents = {'North America', 'South America', 'Antarctica', 'Asia', 'Africa', 'Australia'}
>>> continents
{'Asia', 'Africa', 'South America', 'North America', 'Australia', 'Antarctica'}
>>> continents.add('Europe')
>>> continents
{'Asia', 'Africa', 'South America', 'North America', 'Australia', 'Antarctica', 'Europe'}
As you can see
- First, I had a set of all the continents except Europe.
- Then I added Europe to my set using the add() method
As expected Europe got added to our Set!
The syntax used here is
setName.add(element_to_add)
Notice that calling add() does not display anything. This is because it alters the set in place and returns None.
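To see this behaviour for yourself, here is a short sketch with a made-up set of numbers. It also shows that adding an element that is already present changes nothing, and that the related update() method can add several elements at once from any iterable.

```python
primes = {2, 3, 5}

# add() changes the set in place and returns None
result = primes.add(7)
print(result)            # None

# Adding an element that is already present has no effect
primes.add(3)
print(len(primes))       # 4

# update() adds every item of an iterable in one call
primes.update([11, 13])
print(sorted(primes))    # [2, 3, 5, 7, 11, 13]
```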
Removing an element from a set
Often in programs, we need to remove something from our collection. Thankfully, since sets are mutable, we can do that using the following three methods.
- remove() method,
- pop() method, and
- discard() method
Let’s look at them each with different examples:
The remove() method
This method removes the given element from the set in place and returns None.
planets = {'Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto'}
# Let’s remove poor Pluto from the planets set
planets.remove('Pluto')
print(planets)
As you can see in line-3 we have removed Pluto from our planets set using the remove() method. As expected we get the following output without Pluto.
{'Mercury', 'Earth', 'Jupiter', 'Venus', 'Mars', 'Saturn', 'Uranus', 'Neptune'}
The syntax used here is
setName.remove(element_to_remove)
If we try to remove an element that does not exist, we will encounter a KeyError as shown below
>>> planets = {'Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto'}
>>> planets.remove('Sun')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'Sun'
We tried to remove Sun from our planets set, but since Sun is not part of the set, our code crashes with an error!
But what if we are in a situation where we might need to remove an item, but we are not sure if the item we intend to remove is part of the set or not?
As we just saw using the remove() method in such situations ends up crashing our cool code.
Wouldn’t it be better to simply ignore it in case a given item is not in our set? Instead of crashing the code with a scary error message?
For situations such as these, Python gives us the discard() method which we will see next!
The discard() method
The discard() method is another way of removing an element from a set and does the exact same thing as the remove() method. The only difference is that instead of raising a KeyError like the remove() method, the discard() method simply ignores items that are not part of our set.
Let us try removing Sun from the planets set again, this time using the discard() method.
>>> planets = {'Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto'}
>>> planets.discard('Sun')
>>>
As you can see, no error was thrown in this case!
When it sees that Sun is not in the planets set, it simply does nothing instead of raising an error, and the program moves on to the next statement!
The syntax used here is
setName.discard(element_to_remove)
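To compare the two methods side by side, here is a minimal sketch using a smaller, made-up planets set. If you do want to use remove() on an item that might be missing, wrapping it in try/except is one way to keep the program running. The remove_failed flag is just a helper name for this example.

```python
planets = {'Mercury', 'Venus', 'Earth'}

# remove() raises KeyError for a missing item, so we catch it
remove_failed = False
try:
    planets.remove('Sun')
except KeyError:
    remove_failed = True
    print('Sun is not in the set')

# discard() quietly does nothing for a missing item
planets.discard('Sun')

# The set is unchanged in both cases
print(len(planets))   # 3
```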
The pop() method
The pop() method in sets acts a little differently from the previous methods in the sense that it does not take any arguments. In fact, you cannot specify which element you want to remove at all!
This method removes an arbitrary element from the set. Python itself chooses which element to remove.
Furthermore, it also returns this element, unlike the previous methods we saw.
The element that gets removed may look predictable on your machine, but do not be fooled: the order is an implementation detail that is not guaranteed across Python versions or even across runs, so use this method with caution!
Let’s have a look at an example to understand better:
>>> cities = {'Madrid', 'Delhi', 'New York', 'Beijing'}
>>> cities.pop()
'Beijing'
>>> cities
{'Delhi', 'New York', 'Madrid'}
As you can see ‘Beijing’ got popped here and we have the remaining 3 cities in our set at the end.
If you try it yourself, you might be forced to think that there is some order in this chaos, but that is just the way your Python version is implemented. Running the same code in another version might yield a different result!
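A typical use of pop() is to process the elements of a set one by one until none are left, when we do not care about the order. The sketch below does this with the same cities, and also shows that calling pop() on an empty set raises a KeyError. The names popped and pop_failed are just for this example.

```python
cities = {'Madrid', 'Delhi', 'New York', 'Beijing'}

# Keep popping until the set is empty; the order is arbitrary
popped = []
while cities:
    popped.append(cities.pop())

print(sorted(popped))   # ['Beijing', 'Delhi', 'Madrid', 'New York']
print(cities)           # set()

# pop() on an empty set raises KeyError
pop_failed = False
try:
    cities.pop()
except KeyError:
    pop_failed = True
    print('cannot pop from an empty set')
```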
Alright, we now know how to
- create a set
- get the number of elements from a set,
- check if an object is present in a set or not,
- add elements to it, and
- remove elements from it
Now it’s time to do some Math with our sets!
Mathematical Operations on Sets
As mentioned earlier, we can perform mathematical set operations using Python sets.
Let’s look at the following 4 operations with examples to help us understand better.
- Union
- Intersection
- Difference
- Symmetric Difference
Set Union
Simply put, if we have two sets, their union is the set of all elements that are in either set.
So let’s say we had a set named A and another named B. The union contains every element of A together with every element of B, with elements common to both appearing only once. It is denoted by A ∪ B.
In Python, we use the | operator for this.
For example, let’s say I have two sets, Countries and Continents.
Here is the Python equivalent:
# Defining Countries set
Country = {'Canada', 'India', 'Brazil', 'Australia'}
# Defining Continents set
Continents = {'Antarctica', 'Asia', 'Europe', 'Australia'}
# Defining Union - all the elements that are in either set
Union = Country | Continents
print(Country)
print(Continents)
print(Union)
OUTPUT:
{'Australia', 'Brazil', 'India', 'Canada'}
{'Australia', 'Europe', 'Asia', 'Antarctica'}
{'Asia', 'Australia', 'Antarctica', 'Brazil', 'India', 'Europe', 'Canada'}
The following Venn diagram shows the same. The shaded area of the set shows the union of both sets:
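Two more details are worth trying out, shown in the sketch below with made-up sets of numbers: the | operator can be chained across more than two sets and leaves the original sets untouched, while the |= form updates the left-hand set in place.

```python
a = {1, 2, 3}
b = {3, 4}
c = {5}

# Chaining | combines any number of sets into a new set
combined = a | b | c
print(sorted(combined))   # [1, 2, 3, 4, 5]

# The originals are not modified
print(sorted(a))          # [1, 2, 3]

# |= updates the left-hand set in place
a |= b
print(sorted(a))          # [1, 2, 3, 4]
```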
Set Intersection
The intersection is simply the set of elements common to two or more sets. That is, if we had a set named A and another named B, their intersection contains the elements that belong to both set A and set B.
This operation can be performed using the & operator or the intersection() method.
Let’s implement it in the following code:
# Defining Countries set
Country = {'Canada', 'India', 'Brazil', 'Australia'}
# Defining Continents set
Continents = {'Antarctica', 'Asia', 'Europe', 'Australia'}
# Finding intersection- the common elements
intersection = Country & Continents
print(Country)
print(Continents)
print(intersection)
OUTPUT:
{'Canada', 'Australia', 'Brazil', 'India'}
{'Asia', 'Europe', 'Australia', 'Antarctica'}
{'Australia'}
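As a more everyday illustration, the sketch below uses two made-up skill sets to find what two people have in common. It also uses the isdisjoint() method, which returns True when two sets share no elements at all.

```python
# Hypothetical skill sets
alice = {'python', 'sql', 'git'}
bob = {'java', 'git', 'python'}

# Skills that both people have
shared = alice & bob
print(sorted(shared))               # ['git', 'python']

# isdisjoint() is True when the intersection is empty
no_overlap = alice.isdisjoint({'rust'})
print(no_overlap)                   # True
```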
Set Difference
The set difference operation works just like the usual mathematical difference.
Consider two sets again, A and B. The difference between set A and set B is the set of all elements that are in set A but not in set B.
Let’s understand better by looking at the Countries and Continents set.
If I wanted to get the items that are countries only and aren’t continents, this is how I would do so:
# Defining Countries set
Country = {'Canada', 'India', 'Brazil', 'Australia'}
# Defining Continents set
Continents = {'Antarctica', 'Asia', 'Europe', 'Australia'}
# Finding the difference ( item is present in A but not in B)
difference = Country - Continents
print(difference)
OUTPUT:
{'Canada', 'India', 'Brazil'}
Notice how we didn’t get Australia. This is because Australia is subtracted, or in simple words, removed, just as we wanted!
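Unlike union and intersection, the order of the operands matters for difference. The sketch below reuses the same two sets (with lowercase variable names chosen for this example) and swaps the operands to show that the results differ.

```python
country = {'Canada', 'India', 'Brazil', 'Australia'}
continents = {'Antarctica', 'Asia', 'Europe', 'Australia'}

# Items that are countries but not continents
only_countries = country - continents
print(sorted(only_countries))    # ['Brazil', 'Canada', 'India']

# Swapping the operands gives items that are continents but not countries
only_continents = continents - country
print(sorted(only_continents))   # ['Antarctica', 'Asia', 'Europe']
```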
Set Symmetric Difference
The symmetric difference, simply put, is an operation that gives you the elements that are either in the first set or in the second set but not in both. In other words, it is the union of the two sets with their intersection removed. In Python, we use the ^ operator for this.
Using our “Countries and Continents” example again, let’s try Symmetric Differences and see:
# Defining Countries set
Country = {'Canada', 'India', 'Brazil', 'Australia'}
# Defining Continents set
Continents = {'Antarctica', 'Asia', 'Europe', 'Australia'}
sym_difference = Country ^ Continents
print(sym_difference)
OUTPUT:
{'India', 'Asia', 'Europe', 'Canada', 'Brazil', 'Antarctica'}
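One way to convince yourself of what the symmetric difference contains is to rebuild it from the other three operations. The sketch below, using made-up sets of numbers, checks two equivalent ways of writing it.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

sym = a ^ b
print(sorted(sym))                    # [1, 2, 5]

# Same result: everything in either set, minus what they share
via_union = (a | b) - (a & b)
print(sym == via_union)               # True

# Same result: what is only in a, together with what is only in b
via_differences = (a - b) | (b - a)
print(sym == via_differences)         # True
```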
Mathematical operations using methods
If you are not a big fan of symbols like “| & - ^”, you can do the exact same operations using methods instead, as shown in the examples below.
Set Union: union()
Example:
colors1 = {"Black", "Red", "Yellow", "Orange"}
colors2 = {"Black", "Blue", "Purple", "Indigo"}
print(colors1.union(colors2))
OUTPUT:
{'Purple', 'Indigo', 'Orange', 'Black', 'Blue', 'Yellow', 'Red'}
Set Intersection: intersection()
Example:
colors1 = {"Black","Red", "Yellow", "Orange"}
colors2 = {"Black", "Blue", "Purple", "Indigo"}
print(colors1.intersection(colors2))
OUTPUT:
{'Black'}
Set Difference: difference()
Example:
colors1 = {"Black","Red", "Yellow", "Orange"}
colors2 = {"Black", "Blue", "Purple", "Indigo"}
print(colors1.difference(colors2))
OUTPUT:
{'Red', 'Orange', 'Yellow'}
Set Symmetric Difference: symmetric_difference()
Example:
colors1 = {"Black","Red", "Yellow", "Orange"}
colors2 = {"Black", "Blue", "Purple", "Indigo"}
print(colors1.symmetric_difference(colors2))
OUTPUT:
{'Purple', 'Red', 'Blue', 'Indigo', 'Yellow', 'Orange'}
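There is one practical difference between the methods and the operators: the methods accept any iterable (such as a list) as their argument, while the operators require both sides to be sets. A minimal sketch, with made-up data:

```python
colors = {'Black', 'Red'}
more = ['Red', 'Blue']          # a list, not a set

# The method accepts the list directly
merged = colors.union(more)
print(sorted(merged))           # ['Black', 'Blue', 'Red']

# The operator raises a TypeError when the other side is not a set
operator_failed = False
try:
    colors | more
except TypeError:
    operator_failed = True
    print('| needs a set on both sides')
```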
Advanced: frozenset
The built-in frozenset type in Python behaves like a set but is immutable: once it is created, elements cannot be added or removed.
If you’ve ever attempted to use a set as a dictionary key, you know that this doesn’t work, because sets are mutable and thus unhashable.
Luckily, because a frozenset can never change, it is hashable and is accepted as a dictionary key!
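Here is a small sketch of this in action. The labels dictionary and the flag names are made up for illustration.

```python
primary = frozenset({'red', 'green', 'blue'})

# A frozenset works as a dictionary key
labels = {primary: 'primary colors'}
found = labels[frozenset({'blue', 'green', 'red'})]
print(found)                        # primary colors

# A regular set does not: it is unhashable
set_key_failed = False
try:
    {set(['red']): 'oops'}
except TypeError:
    set_key_failed = True

# A frozenset has no add() method, so it cannot be modified
add_failed = False
try:
    primary.add('yellow')
except AttributeError:
    add_failed = True

print(set_key_failed, add_failed)   # True True
```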
A one-line summary of sets would be:
A set is a data structure in Python that is mutable, unordered, and stores only unique elements.
Sets are an incredibly useful data structure, and hats off to you for mastering them!
And with that, I will end this article!
I hope you enjoyed reading it and got some value out of it!
Thanks to Namazi Jamal for his contributions to writing this article!