📓1.2: Data Collections

Data Collections

Python provides 4 built-in data structures that allow you to store collections of data in a single named variable:

Lists
Ordered, mutable sequences, declared with square brackets [ ]. Suitable for storing collections of items that can be changed.
Tuples
Ordered, immutable sequences, declared with parentheses ( ). Useful for storing fixed collections of items that should not be modified.
Sets
Unordered collections of unique, immutable elements, declared with curly braces { }. Ideal for storing distinct items and performing set operations.
Dictionaries
Collections of key-value pairs, declared with curly braces { }. They keep insertion order as of Python 3.7, but items are looked up by key rather than by position. Excellent for storing data with associated labels for quick retrieval.

Lists

Lists are one of the most powerful data types in Python. Generally, they’re container objects used to store related items together.

list cheat sheet

type: list
use: Used for storing similar items, and in cases where items need to be added or removed.
creation: [] or list() for an empty list, or [1, 2, 3] for a list with items.
search methods: my_list.index(item) or item in my_list
search speed: Searching for an item in a large list is slow. Each item must be checked.
common methods: len(my_list), my_list.append(item) to add, my_list.insert(index, item) to insert in the middle, my_list.pop() to remove.
order preserved? Yes. Items can be accessed by index.
mutable? Yes
in-place sortable? Yes. my_list.sort() will sort the list in-place. my_list.sort(reverse=True) will sort the list in-place in descending order. my_list.reverse() will reverse the items in my_list in-place.

An empty list can be created in two ways: by calling the built-in list() constructor or, more commonly, with a pair of empty square brackets [].

empty_list = list()
another_empty_list = []

You can confirm the data type of the list with the built-in type() function.
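The original text names type() but doesn’t show it in use, so here is a minimal sketch of the check. It reuses the two variable names from the snippet above and assumes nothing else.

```python
empty_list = list()
another_empty_list = []

# Both spellings produce the same type of object.
print(type(empty_list))          # <class 'list'>
print(type(another_empty_list))  # <class 'list'>

# Two empty lists also compare as equal.
print(empty_list == another_empty_list)  # True
```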

Let’s create a list with a few items in it. Let’s say we want to keep track of a list of names. We add items to our list as strings, and separate them with commas ,:

names = ["Nina", "Max", "Jane"]

We can check its length with the built-in len() function, like so:

print(len(names))

This prints 3.

Indexes/Indices

Lists retain the order of the items in them. In the next section, you’ll learn about some data structures that don’t.

In order to access items in a list, we’ll need to use an index. (The plural of index is indexes or indices.) The index for the item you want to access is an integer put in square brackets after the list.

Indexes start at 0 in Python and most other programming languages.

Try accessing the individual items in our list:

print(names[0])
print(names[1])
print(names[2])

Updating an item in a list

To update a particular item in a list, use square-bracket notation and assign a new value: my_list[pos] = new_value

names[2] = "Floyd"
print(names)

If you try to access an index that is greater than or equal to (>=) the length of the list, you’ll get an IndexError.

>>> names = ["Nina", "Max", "Jane"]
>>> len(names)
3
>>> names[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Common Mistakes

If you forget to include commas between your items, you’ll get a SyntaxError:

>>> numbers = [1, 2 3]
  File "<stdin>", line 1
    numbers = [1, 2 3]
                    ^
SyntaxError: invalid syntax

If you forget the closing bracket, you’ll see a SyntaxError with a different message. Depending on your Python version, it’ll say SyntaxError: unexpected EOF while parsing, SyntaxError: invalid syntax, or (in Python 3.10 and later) SyntaxError: '[' was never closed. In these cases, you also need to check the line of code before the line reported in the SyntaxError.

Sorting Lists

Sorting sounds complicated, but in practice, it’s just one method call away!

Sorting a copy of your list

If you’d like a brand new sorted copy of your list, instead of modifying the original, use the built-in sorted(my_list) function. It returns a new list, sorted in increasing (ascending) order. Or use sorted(my_list, reverse=True) to create a new list sorted backwards, in decreasing (or descending) order. Neither call modifies the original list.

>>> lottery_numbers = [1, 4, 32423, 2, 45, 11]
>>> sorted(lottery_numbers)
[1, 2, 4, 11, 45, 32423]
>>> lottery_numbers
[1, 4, 32423, 2, 45, 11]
>>> sorted(lottery_numbers, reverse=True)
[32423, 45, 11, 4, 2, 1]
>>> lottery_numbers
[1, 4, 32423, 2, 45, 11]

Sorting in-place

You can call my_list.sort() on your list to sort it in increasing (ascending) order, or my_list.sort(reverse=True) on the list to sort it backwards, in decreasing (or descending) order. This operation will modify the underlying list, and doesn’t return a value.

>>> lottery_numbers = [1, 4, 32423, 2, 45, 11]
>>> lottery_numbers.sort()
>>> lottery_numbers
[1, 2, 4, 11, 45, 32423]

>>> lottery_numbers.sort(reverse=True)
>>> lottery_numbers
[32423, 45, 11, 4, 2, 1]

>>> words = ["Umbrella", "Fox", "Apple"]
>>> words.sort()
>>> words
['Apple', 'Fox', 'Umbrella']
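
Because my_list.sort() doesn’t return a value, a common slip is to assign its result to a variable. The sketch below contrasts it with sorted(); the variable names result and descending are just for illustration.

```python
lottery_numbers = [1, 4, 32423, 2, 45, 11]

# sort() changes the list itself and returns None.
result = lottery_numbers.sort()
print(result)           # None
print(lottery_numbers)  # [1, 2, 4, 11, 45, 32423]

# sorted() leaves its argument alone and returns a new list.
descending = sorted(lottery_numbers, reverse=True)
print(descending)       # [32423, 45, 11, 4, 2, 1]
print(lottery_numbers)  # [1, 2, 4, 11, 45, 32423]
```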

Reverse a list in-place

To reverse the items of a list in-place, call my_list.reverse() on it.

>>> lottery_numbers = [1, 4, 32423, 2, 45, 11]
>>> lottery_numbers.reverse()
>>> lottery_numbers
[11, 45, 2, 32423, 4, 1]

list Operations

action                                         method                             returns        possible errors
check length                                   len(my_list)                       int
add: to the end                                my_list.append(item)               -
insert: at position                            my_list.insert(pos, item)          -
update: at position                            my_list[pos] = item                -              IndexError if pos >= len(my_list)
extend: add items from another list            my_list.extend(other_list)         -
is item in list?                               item in my_list                    True or False
index of item                                  my_list.index(item)                int            ValueError if item is not in my_list
count of item                                  my_list.count(item)                int
remove an item                                 my_list.remove(item)               -              ValueError if item is not in my_list
remove the last item, or an item at an index   my_list.pop() or my_list.pop(pos)  item           IndexError if pos >= len(my_list), or if the list is empty

Checking Length

Before we add or remove items, it’s usually a good idea to check a list’s length. We do that with the built-in len function. We can even use len to check the lengths of other types, like strings.

Let’s see it in action on a names list with two items, and a name string with four characters.

>>> names = ["Nina", "Max"]
>>> len(names)
2
>>> name = "Nina"
>>> len(name)
4

Adding Items

Let’s start with a list of two names.

>>> names = ["Nina", "Max"]

my_list.append(item) adds to the end of my_list

We can use my_list.append(item) to add an additional item to the end of the list.

>>> names.append("John")
>>> names
['Nina', 'Max', 'John']

my_list.insert(pos, item) inserts an item into my_list at the given position

Use my_list.insert(pos, item) to insert items in an arbitrary position in the list. If the position is 0, we’ll insert at the beginning of the list.

>>> names.insert(0, "Rose")
>>> names
['Rose', 'Nina', 'Max', 'John']

my_list.extend(other_list) adds all the contents of other_list to my_list

>>> names = ["Nina", "Max"]
>>> colors = ["Red", "Blue"]
>>> names
['Nina', 'Max']
>>> names.extend(colors)
>>> names
['Nina', 'Max', 'Red', 'Blue']
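
It’s easy to mix up append and extend when the thing being added is itself a list. Here is a small sketch of the difference; the names appended and extended are made up for this example.

```python
names = ["Nina", "Max"]
colors = ["Red", "Blue"]

# append() adds its argument as ONE item, so the list ends up nested.
appended = list(names)  # work on a copy so names stays unchanged
appended.append(colors)
print(appended)       # ['Nina', 'Max', ['Red', 'Blue']]
print(len(appended))  # 3

# extend() adds each item of its argument separately.
extended = list(names)
extended.extend(colors)
print(extended)       # ['Nina', 'Max', 'Red', 'Blue']
print(len(extended))  # 4
```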

Searching for Items

Looking for an item in a large list is slow. Each item may need to be checked in turn until a match is found.

This doesn’t matter much when you’re just getting started, unless your data set is large, or if you’re building high-performance systems. If you want to quickly search for an item, you’ll need to use a set or a dictionary instead.
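
As a rough sketch of that advice: if you need to check membership many times, you can build a set from the list once and search the set instead. The variable name unique_names is an assumption for this example, and sets are covered in more detail later on this page.

```python
names = ["Nina", "Max", "Phillip", "Nina"]

# Build the set once; duplicates collapse into a single entry.
unique_names = set(names)

# Membership tests on the set don't need to scan every item.
print("Max" in unique_names)   # True
print("Rose" in unique_names)  # False
print(len(unique_names))       # 3
```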

There are a few ways to determine if an item is in the list, and at which position. Let’s try this on our list of names.

names = ["Nina", "Max", "Phillip", "Nina"]

Use the in keyword to determine if an item is present or not.

>>> "Nina" in names
True
>>> "Rose" in names
False

Use the my_list.index(item) method to find the first index of a potential match.

Notice that only the first index of the string "Nina" is returned. We’ll learn more about what an index is in the next chapter.

If the item we’re looking for is not in the list, Python will raise a ValueError.

You’ll learn how to deal with exceptions later. For now, you can use the in operator to check if an item is present in the list before finding its index.

>>> names.index("Nina")
0
>>> names.index("Rose")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'Rose' is not in list
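
The “check with in first” approach mentioned above could look something like this. find_position is a hypothetical helper written for this example, not a built-in, and it uses an if statement, which is covered in a later topic.

```python
names = ["Nina", "Max", "Phillip", "Nina"]

def find_position(items, target):
    """Return the first index of target, or None if it is absent."""
    if target in items:
        return items.index(target)
    return None

print(find_position(names, "Nina"))  # 0
print(find_position(names, "Rose"))  # None
```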

Use the my_list.count(item) method to find out how many times an item appears in a list.

>>> names.count("Nina")
2
>>> names.count("Rose")
0

Updating Items

To update an item in a list, put the position of the item you’d like to change in square brackets [] and assign the new value, like so: my_list[pos] = new_item

For example:

>>> names = ["Nina", "Max"]
>>> names[0] = "Rose"
>>> names
['Rose', 'Max']

Or, when used with my_list.index(item):

>>> names = ["Nina", "Max"]
>>> pos = names.index("Max")
>>> names[pos] = "Rose"
>>> names
['Nina', 'Rose']

You’ll see an IndexError: list assignment index out of range if you try to update an item in a position that doesn’t exist, that is, if the position is greater than or equal to (>=) the length of the list.

>>> names = ["Nina", "Max"]
>>> len(names)
2
>>> names[2] = "Rose"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

Removing Items

There are a few ways to remove items from a list.

Use my_list.remove(item) to remove the first instance of the item

Be careful. remove() only removes the first instance of the item from the list, which isn’t always what we want to do.

>>> names = ["Nina", "Max", "Rose"]
>>> names.remove("Nina")
>>> names
['Max', 'Rose']
>>>
>>>
>>> names = ["Nina", "Max", "Nina"]
>>> names.remove("Nina")
>>> names
['Max', 'Nina']
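
If you do want every instance gone, one option (a sketch, not the only way) is to build a new list that leaves the unwanted item out. It uses a list comprehension, which goes a little beyond what this page covers; without_nina is a made-up name.

```python
names = ["Nina", "Max", "Nina"]

# Keep every item that is not "Nina".
without_nina = [name for name in names if name != "Nina"]

print(without_nina)  # ['Max']
print(names)         # ['Nina', 'Max', 'Nina'] (the original is untouched)
```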

If we try to remove an item that’s not in the list, we’ll get a ValueError: list.remove(x): x not in list.

>>> names = ["Nina"]
>>> names.remove("Max")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list

Use my_list.pop() to remove the last item, or my_list.pop(index) to remove the item at that index

Using pop() will also return the item that was in that position. That’s useful if we want to save the item.

>>> names = ["Nina", "Max", "Rose"]
>>> names.pop()
'Rose'
>>> names
['Nina', 'Max']
>>> names.pop(1)
'Max'
>>> names
['Nina']

If we try to pop an item from an index that is greater than or equal to the length of the list, we’ll get an IndexError: pop index out of range.

>>> names = ["Nina"]
>>> len(names)
1
>>> names.pop(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: pop index out of range

Tuples

Tuples are light-weight collections used to keep track of related, but different items. Tuples are immutable, meaning that once a tuple has been created, the items in it can’t change.

You might ask, why tuples when Python already has lists? Tuples are different in a few ways. While lists are generally used to store collections of similar items together, tuples, by contrast, can be used to contain a snapshot of data. They can’t be changed, added to, or removed from the way a list can.

tuple cheat sheet

type: tuple
use: Used for storing a snapshot of related items when we don’t plan on modifying, adding, or removing data.
creation: () or tuple() for an empty tuple. (1, ) for one item, or (1, 2, 3) for a tuple with items.
search methods: my_tuple.index(item) or item in my_tuple
search speed: Searching for an item in a large tuple is slow. Each item must be checked.
common methods: Can’t add or remove from tuples.
order preserved? Yes. Items can be accessed by index.
mutable? No
in-place sortable? No

A good use of a tuple might be for storing the information for a row in a spreadsheet. That data is information only. We don’t necessarily care about updating or manipulating that data. We just want a read-only snapshot.

Tuples are an interesting and powerful datatype, and one of the more unique aspects of Python. Most other programming languages have ways of representing lists and dictionaries, but only a small subset contain tuples. Use them to your advantage.

Tuple Creation

Let’s say we have a spreadsheet of students, and we’d like to represent each row as a tuple.

student = ("Marcy", 8, "History", 3.5)
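
One detail from the cheat sheet is worth seeing in action: a one-item tuple needs a trailing comma, because parentheses on their own only group an expression. The names below are made up for this sketch.

```python
not_a_tuple = ("Marcy")  # just a string in parentheses
one_item = ("Marcy",)    # the trailing comma makes it a tuple
empty = ()

print(type(not_a_tuple))  # <class 'str'>
print(type(one_item))     # <class 'tuple'>
print(len(one_item))      # 1
print(len(empty))         # 0
```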

Tuple Access by index

We can access items in the tuple by index, but we can’t change them.

>>> student = ("Marcy", 8, "History", 3.5)
>>> student[0]
'Marcy'
>>> student[0] = "Bob"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

We’ll see TypeError: 'tuple' object does not support item assignment if we try to change the items in a tuple.

Tuples also don’t have an append or extend method available on them like lists do, because they can’t be changed.

Tuple unpacking

Sounds like a lot of work for not a lot of benefit, right? Not so. Tuples are great when you depend on your data staying unchanged. Because of this guarantee, tuples can be stored in sets and used as dictionary keys, as long as the items inside them are immutable too.

It’s also a great way to quickly consolidate information.
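
Here is a small sketch of tuples inside other containers. The grid and visited names, and the idea of (row, column) coordinates, are assumptions made up for this example. Dictionaries and sets are covered later on this page.

```python
# (row, column) tuples used as dictionary keys.
grid = {(0, 0): "start", (2, 3): "treasure"}
print(grid[(2, 3)])  # treasure

# Tuples stored in a set.
visited = {(0, 0), (0, 1)}
visited.add((0, 1))       # already present, so nothing changes
print(len(visited))       # 2
print((0, 1) in visited)  # True
```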

You can also use tuples for something called unpacking. Let’s see it in action:

>>> student = ("Marcy", 8, "History", 3.5)
>>>
>>> name, age, subject, grade = student
>>> name
'Marcy'
>>> age
8
>>> subject
'History'
>>> grade
3.5
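
Unpacking is also handy when looping over a list of tuples, and for swapping two variables. A quick sketch follows; the second student row is invented sample data, and for loops are covered in a later topic.

```python
students = [
    ("Marcy", 8, "History", 3.5),
    ("Bob", 9, "Math", 3.9),  # invented sample row
]

# Each tuple is unpacked into four names on every pass of the loop.
summaries = []
for name, age, subject, grade in students:
    summaries.append(f"{name} studies {subject}")
print(summaries)  # ['Marcy studies History', 'Bob studies Math']

# Swapping two variables uses the same mechanism.
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1
```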

Sets

Sets are a datatype that allows you to store immutable items in an unordered way. An item can only be contained in a set once; no duplicates are allowed. The benefits of a set are very fast membership testing and powerful set operations, like union, difference, and intersection.

set cheat sheet

type: set
use: Used for storing immutable data types uniquely. Easy to compare the items in sets.
creation: set() for an empty set ({} makes an empty dict) and {1, 2, 3} for a set with items in it
search methods: item in my_set
search speed: Searching for an item in a large set is very fast.
common methods: my_set.add(item), my_set.discard(item) to remove the item if it’s present, my_set.update(other_set)
order preserved? No. Items can’t be accessed by index.
mutable? Yes. Can add to or remove from sets.
in-place sortable? No, because items aren’t ordered.

Creating sets with items

Let’s make a new set with some items in it, and test out important set concepts.

Sets can’t contain duplicate values

>>> names = {"Nina", "Max", "Nina"}
>>> names
{'Max', 'Nina'}
>>> len(names)
2

Sets can’t contain mutable types

Sets can quickly check whether they contain an item because they use a technique called hashing. I won’t cover the details, but a hash is an integer computed from a value’s contents, and the set uses it to work out where the item should be stored. In Python, there’s a built-in hash() function.

Among the built-in types, the hash() function only works on immutable data types. That means data types where the contents can’t be changed after creation. The number returned for a string changes from one run of Python to the next, so your output will differ from the one shown here.

>>> hash("Nina")
3509074130763756174
>>> hash([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

You’ll see a TypeError: unhashable type: 'list' if you try to hash a mutable data type (like a list).

If you try to add a mutable data type (like a list) to a set, you’ll see the same TypeError, complaining about an unhashable type.

>>> {"Nina"}
{'Nina'}
>>> {[]}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
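
If you do need to store a list’s contents in a set, one workaround (a sketch, assuming the items inside are themselves immutable) is to convert the list to a tuple first. The names point and points are made up for this example.

```python
point = [1, 2]
points = set()

# tuple(point) makes an immutable copy, which a set will accept.
points.add(tuple(point))

print(points)            # {(1, 2)}
print((1, 2) in points)  # True
```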

Sets can be used to de-duplicate the items in a list

Tip: If you don’t care about order, you can quickly de-duplicate the items in a list by passing the list into the set constructor.

>>> colors = ["Red", "Yellow", "Red", "Green", "Green", "Green"]
>>> set(colors)
{'Red', 'Green', 'Yellow'}
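
If you do care about order, a commonly used alternative is dict.fromkeys(), which keeps the first occurrence of each item. This sketch assumes Python 3.7 or later, where dictionaries are guaranteed to keep insertion order; unique_in_order is a made-up name.

```python
colors = ["Red", "Yellow", "Red", "Green", "Green", "Green"]

# Dictionary keys are unique and remember the order they were added in.
unique_in_order = list(dict.fromkeys(colors))

print(unique_in_order)  # ['Red', 'Yellow', 'Green']
```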

Sets don’t have an order

Sets don’t have an order. That means that when you print them, the items won’t necessarily be displayed in the order they were added to the set.

>>> my_set = {1, "a", 2, "b", "cat"}
>>> my_set
{1, 2, 'cat', 'a', 'b'}

It also means that you can’t access items in the set by position in subscript [] notation.

>>> my_set = {"Red", "Green", "Blue"}
>>> my_set[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable

You’ll see TypeError: 'set' object is not subscriptable if you try to access the items in a set by index with my_set[pos]. (Older versions of Python word this as 'set' object does not support indexing.)

Tip: If your set contains items of the same type, and you want to sort the items, you’ll need to convert the set to a list first. Or, you can use the built-in sorted(iterable) function, which returns a new sorted list for you.

>>> my_set = {"a", "b", "cat", "dog", "red"}
>>> my_set
{'b', 'red', 'a', 'cat', 'dog'}
>>> sorted(my_set)
['a', 'b', 'cat', 'dog', 'red']

Adding to and removing from sets

Since a set has no order, we can’t add or remove items to it by index. We need to call the operations with the item itself.

Add items to a set with my_set.add(item).

>>> colors = {"Red", "Green", "Blue"}
>>> colors.add("Orange")
>>> colors
{'Orange', 'Green', 'Blue', 'Red'}

Remove items with my_set.discard(item)

You can remove an item from a set if it’s present with my_set.discard(item). If the set doesn’t contain the item, no error occurs.

>>> colors = {"Red", "Green", "Blue"}
>>> colors.discard("Green")
>>> colors
{'Blue', 'Red'}
>>> colors.discard("Green")
>>> colors
{'Blue', 'Red'}

You can also remove items from a set with my_set.remove(item), which will raise a KeyError if the item doesn’t exist.
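
To make the difference concrete, here is a short sketch that calls both methods on a missing item. It uses try/except, which is covered later along with exceptions.

```python
colors = {"Red", "Blue"}

# discard() quietly does nothing when the item is missing.
colors.discard("Green")

# remove() raises a KeyError for the same missing item.
try:
    colors.remove("Green")
except KeyError as error:
    print("KeyError:", error)  # KeyError: 'Green'

print(len(colors))  # 2
```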

Update a set with another iterable using my_set.update(iterable)

You can update a set by passing in another iterable, such as another set, a list, or a tuple.

>>> colors = {"Red", "Green"}
>>> numbers = {1, 3, 5}
>>> colors.update(numbers)
>>> colors
{1, 3, 'Red', 5, 'Green'}

Be careful passing in a string to my_set.update(). That’s because a string is also iterable: it’s a sequence of characters.

>>> numbers = {1, 3, 5}
>>> numbers.update("hello")
>>> numbers
{1, 3, 'h', 5, 'o', 'e', 'l'}

Your set will update with each character of the string, which was probably not your intended result.
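
If the goal is to add the whole string as a single item, either of the following should work. This is a sketch with a made-up words set; the output is passed through sorted() only so that the printed order is predictable.

```python
words = {"hi"}

# add() treats the string as one item.
words.add("hello")
print(sorted(words))  # ['hello', 'hi']

# update() with a list adds each string in the list as one item.
words.update(["bye", "ciao"])
print(sorted(words))  # ['bye', 'ciao', 'hello', 'hi']
```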

set operations

Sets allow quick and easy operations to compare items between two sets:

Method                     Operator  Result
s.union(t)                 s | t     creates a new set with all the items from both s and t
s.intersection(t)          s & t     creates a new set containing only items that are in both s and t
s.symmetric_difference(t)  s ^ t     creates a new set containing items that are in s or in t, but not in both

We have two sets: rainbow_colors, which contains the colors of the rainbow, and favorite_colors, which contains my favorite colors.

>>> rainbow_colors = {"Red", "Orange", "Yellow", "Green", "Blue", "Violet"}
>>> favorite_colors = {"Blue", "Pink", "Black"}

First, let’s combine the sets and create a new set that contains all of the items from rainbow_colors and favorite_colors using the union operation. You can use the my_set.union(other_set) method, or you can just use the symbol for union, |, from the table above.

>>> rainbow_colors | favorite_colors
{'Orange', 'Red', 'Yellow', 'Green', 'Violet', 'Blue', 'Black', 'Pink'}

Next, let’s find the intersection. We’ll create a new set with only the items in both sets.

>>> rainbow_colors & favorite_colors
{'Blue'}

Lastly, we can also find the symmetric difference. We’ll create a new set with the items that are in one set or the other, but not in both. We’ll see that "Blue" is missing from the result.

>>> rainbow_colors ^ favorite_colors
{'Orange', 'Red', 'Yellow', 'Green', 'Violet', 'Black', 'Pink'}
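
Note that the ^ operator is the symmetric difference (s.symmetric_difference(t)). A one-sided difference, meaning the items in one set that are not in the other, is written with - or s.difference(t). A short sketch comparing the two follows; results are passed through sorted() so that the printed order is predictable.

```python
rainbow_colors = {"Red", "Orange", "Yellow", "Green", "Blue", "Violet"}
favorite_colors = {"Blue", "Pink", "Black"}

# One-sided difference: items in the left set that are not in the right set.
only_rainbow = rainbow_colors - favorite_colors
print(sorted(only_rainbow))   # ['Green', 'Orange', 'Red', 'Violet', 'Yellow']

only_favorite = favorite_colors - rainbow_colors
print(sorted(only_favorite))  # ['Black', 'Pink']

# The symmetric difference is the union of the two one-sided differences.
either_not_both = rainbow_colors ^ favorite_colors
print(either_not_both == (only_rainbow | only_favorite))  # True
```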

There are other useful operations available on sets, such as checking if one set is a subset, a superset, and more, but I don’t have time to cover them all. Python also has a frozenset type, if you need the functionality of a set in an immutable package (meaning that the contents can’t be changed after creation).
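
For a quick taste of those, here is a minimal sketch of the subset and superset checks and of frozenset. The primary and palettes names are invented for this example.

```python
rainbow_colors = {"Red", "Orange", "Yellow", "Green", "Blue", "Violet"}
primary = {"Red", "Blue"}

print(primary.issubset(rainbow_colors))    # True
print(rainbow_colors.issuperset(primary))  # True
print(primary <= rainbow_colors)           # True (operator form of issubset)
print(primary.isdisjoint({"Pink"}))        # True (no items in common)

# A frozenset can't be changed, so it can itself be stored in a set.
frozen = frozenset(primary)
palettes = {frozen}
print(frozen in palettes)      # True
print(hasattr(frozen, "add"))  # False
```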

Find out more by reading the documentation, or calling help() on set.

Dictionaries

Dictionaries are a useful type that allow us to store our data in key, value pairs. Dictionaries themselves are mutable, but dictionary keys can only be immutable types.

We use dictionaries when we want to be able to quickly access additional data associated with a particular key.

Looking for a key in a large dictionary is extremely fast. Unlike lists, we don’t have to check every item for a match.

Dictionary cheat sheet

type: dict
use: Used for storing data in key, value pairs. Keys used must be immutable data types.
creation: {} or dict() for an empty dict. {1: "one", 2: "two"} for a dict with items.
search methods: key in my_dict
search speed: Searching for a key in a large dictionary is fast.
common methods: my_dict[key] to get the value by key, raising a KeyError if key is not in the dictionary. my_dict.get(key) to return None instead of raising an error if key is not in my_dict. my_dict.items() for all key, value pairs, my_dict.keys() for all keys, and my_dict.values() for all values.
order preserved? Sort of. A dict keeps its items in insertion order (an implementation detail in CPython 3.6, guaranteed by the language as of Python 3.7). Items can’t be accessed by index, only by key.
mutable? Yes. Can add or remove keys from dicts.
in-place sortable? No. dicts don’t have an index, only keys.

Creating dicts with items

If we want to create dicts with items in them, we need to pass in key, value pairs. A dict is declared with curly braces {} that contain a key and a value, separated with a colon :. Multiple key and value pairs are separated with commas ,.

We can use familiar functions on our dictionary, like finding out how many key / value pairs it contains with the built-in len(my_dict) function.

>>> nums = {1: "one", 2: "two", 3: "three"}

>>> len(nums)
3

Side note: What can be used as keys?

Any type of object, mutable or immutable, can be used as a value but just like sets, dictionaries can only use immutable types as keys. That means you can use int, str, or even tuple as a key, but not a set, list, or other dictionary.

The following is OK:

>>> my_dict = {1: 1}
>>> my_dict = {1: []}

You’ll see a TypeError: unhashable type: 'list' if you try to use a mutable type, like a list, as a dictionary key.

>>> my_dict = {[]: 1}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Accessing Dictionary Items

Our dict contains key, value pairs. Because a dictionary is indexed by key rather than by position, we can’t access the items in it by position. Instead, to access the items in it, we use square-bracket my_dict[key] notation, similar to how we access items in a list with square-bracket notation containing the position.

>>> nums = {1: "one", 2: "two", 3: "three"}
>>> nums[1]
'one'
>>> nums[2]
'two'

We’ll get a KeyError: key if we try to access my_dict[key] with square bracket notation, but key isn’t in the dictionary.

>>> nums = {1: "one", 2: "two", 3: "three"}
>>> nums[4]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 4

One way to get around this is to use the my_dict.get(key) method. Using this method, if the key isn’t present, no error is raised, and None is returned instead.

>>> nums = {1: "one", 2: "two", 3: "three"}
>>> nums.get(4)

>>> result = nums.get(4)
>>> type(result)
<class 'NoneType'>

If we want to provide a default value for when the key is missing, we can also pass an optional second argument to the my_dict.get(key) method, like so: my_dict.get(key, default_val)

>>> nums = {1: "one", 2: "two", 3: "three"}
>>> nums.get(4, "default")
'default'
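
A typical use of the default argument is counting things. The sketch below tallies words with get(word, 0); the words list is invented sample data, and the for loop is covered in a later topic.

```python
words = ["red", "blue", "red", "green", "red"]
counts = {}

for word in words:
    # A missing word counts as 0, so the first sighting becomes 1.
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'red': 3, 'blue': 1, 'green': 1}
```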

Adding & Removing Items

To add a new key value pair to the dictionary, you’ll use square-bracket notation.

If you assign to a key that’s already in the dictionary, you’ll just end up replacing its value. To avoid subtle bugs, you can check if a particular key is in a dictionary with the in keyword. We’ll cover that technique in the Control Statements and Looping topic.

>>> nums = {1: "one", 2: "two", 3: "three"}
>>> nums[8] = "eight"

>>> nums
{1: 'one', 2: 'two', 3: 'three', 8: 'eight'}

>>> nums[8] = "oops, overwritten"
>>> nums
{1: 'one', 2: 'two', 3: 'three', 8: 'oops, overwritten'}
>>> 8 in nums
True
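
As a small preview of that check, this sketch only assigns when the key is absent, so the second assignment is skipped. It uses an if statement, which is covered in a later topic.

```python
nums = {1: "one", 2: "two", 3: "three"}

# The key 8 is missing, so this assignment runs.
if 8 not in nums:
    nums[8] = "eight"

# The key 8 is now present, so this assignment is skipped.
if 8 not in nums:
    nums[8] = "oops, overwritten"

print(nums[8])  # eight
```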

Updating Items

Just like with lists and sets, you can update the items in a dictionary with the items from another dictionary.

>>> colors = {"r": "Red", "g": "Green"}
>>> numbers = {1: "one", 2: "two"}
>>> colors.update(numbers)
>>> colors
{'r': 'Red', 'g': 'Green', 1: 'one', 2: 'two'}
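
One thing to watch for: when both dictionaries share a key, update() replaces the existing value. A short sketch follows, where more_colors is a made-up dictionary.

```python
colors = {"r": "Red", "g": "Green"}
more_colors = {"g": "Gray", "b": "Blue"}

# "g" exists in both, so its value is replaced; "b" is added.
colors.update(more_colors)

print(colors)  # {'r': 'Red', 'g': 'Gray', 'b': 'Blue'}
```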

Complex Dictionaries

One incredibly useful scenario for dictionaries is storing a list or other collection as the value for each key.

>>> colors = {"Green": ["Spinach"]}
>>> colors
{'Green': ['Spinach']}
>>> colors["Green"].append("Apples")
>>> colors
{'Green': ['Spinach', 'Apples']}
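
When building a dictionary like this from scratch, the key may not exist yet the first time you want to append to it. One way to handle that (a sketch, not the only approach) is my_dict.setdefault(key, default), which returns the existing value or stores and returns the default. The foods list is invented sample data.

```python
foods = [("Green", "Spinach"), ("Red", "Apples"), ("Green", "Kale")]
colors = {}

for color, food in foods:
    # Create an empty list the first time a color is seen, then append.
    colors.setdefault(color, []).append(food)

print(colors)  # {'Green': ['Spinach', 'Kale'], 'Red': ['Apples']}
```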

Working with items, keys, & values

There are three useful methods you need to remember about dictionary access:

  1. my_dict.keys()
  2. my_dict.values()
  3. my_dict.items()

1. my_dict.keys(): Getting all the keys in a dictionary

>>> nums = {1: 'one', 2: 'two', 3: 'three', 8: 'eight'}
>>> nums.keys()
dict_keys([1, 2, 3, 8])

2. my_dict.values(): Getting all the values in a dictionary

>>> nums = {1: 'one', 2: 'two', 3: 'three', 8: 'eight'}
>>> nums.values()
dict_values(['one', 'two', 'three', 'eight'])

3. my_dict.items(): Getting all the items (key, value pairs) in a dictionary

Notice that my_dict.items() returns a view object that looks like a list. It contains two-item tuples of the key, value pairs.

>>> nums = {1: 'one', 2: 'two', 3: 'three', 8: 'eight'}
>>> nums.items()
dict_items([(1, 'one'), (2, 'two'), (3, 'three'), (8, 'eight')])
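
These methods are most often used in loops. Here is a minimal sketch that unpacks each key, value tuple from items() and converts the results to real lists; the lines variable is made up, and for loops are covered in a later topic.

```python
nums = {1: "one", 2: "two", 3: "three", 8: "eight"}

# Each item is a (key, value) tuple, so it can be unpacked in the loop.
lines = []
for number, word in nums.items():
    lines.append(f"{number} -> {word}")
print(lines)  # ['1 -> one', '2 -> two', '3 -> three', '8 -> eight']

# Pass a view to list() when you need indexing.
print(list(nums.keys()))      # [1, 2, 3, 8]
print(list(nums.items())[0])  # (1, 'one')
```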

Mutability

Mutability, simply put: the contents of a mutable object can be changed, while the contents of an immutable object cannot be changed.
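
One practical consequence, sketched below: when two names refer to the same mutable object, a change made through one name is visible through the other. Immutable objects like strings never change; their methods return new objects instead. The names alias and shouted are made up for this example.

```python
names = ["Nina", "Max"]
alias = names  # both names now refer to the same list

alias.append("Rose")
print(names)           # ['Nina', 'Max', 'Rose']
print(alias is names)  # True

# Strings are immutable: upper() returns a new string.
greeting = "hello"
shouted = greeting.upper()
print(greeting)  # hello
print(shouted)   # HELLO
```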

Simple Data Types

All of the simple data types we covered first are immutable:

type                 mutable?  use
int, float, decimal  no        store numbers
str                  no        store strings
bool                 no        store True or False

Collection/Container Types

For the mutability of the container types we covered in this chapter, check this helpful list:

container type  mutable?  use
list            yes       ordered group of items, accessible by position
set             yes       mutable unordered group consisting only of immutable items. useful for set operations (membership, intersection, difference, etc)
tuple           no        contain ordered groups of items in an immutable collection
dict            yes       contains key value pairs

Acknowledgement

Content on this page is adapted from LearnPython - Nina Zakharenko.