Chapter 4 — Sets, Dictionaries, and JSON

4.0 Sets

This week we’re introducing two new data structures: sets and dictionaries. Let’s start with sets, as they are the simpler of the two.

A set in Python is just like the sets you learned about in math class way back in the day: a collection of unique items. So a set is really a lot like a list, except that it cannot have duplicates, and it isn’t ordered (meaning you can’t access items by index, like my_animals[2]). The items in a set can be of any immutable (hashable) data type: int, float, string, boolean, tuple, even frozenset. Mutable types like lists and ordinary sets cannot be members of a set.

Creating sets

To declare an empty set and then add items to it in Python:

my_animals = set()

my_animals.add("dog")
my_animals.add("cat")
my_animals.add("lion")
my_animals.add("giraffe")
my_animals.add("giraffe")  # giraffe was added twice, but because it's a set the duplicate is ignored.

If you print a set, it prints out inside curly brackets:

my_animals = {"dog", "cat", "lion", "giraffe"}
print(my_animals)
'''
the above command will generate:
{'dog', 'lion', 'giraffe', 'cat'}
'''

You can also create a set with items already in it, like you can with a list; just use curly brackets instead of square brackets:

my_animals = {"dog", "cat", "lion", "giraffe"}
print(my_animals)
'''
the above command will generate:
{'dog', 'lion', 'giraffe', 'cat'}
'''

Notice that when we print a set, its members are not in the order they were inserted, nor are they in alphabetical order. This is because sets are unordered, and the order you see is an implementation detail that depends on “hashing” (which helps Python check for membership quickly).

Alas, you can’t sort a set in-place, but there is a workaround: you can make a sorted list of the set’s elements using the sorted() function:

my_animals = {"dog", "cat", "lion", "giraffe"}
sorted_animals = sorted(my_animals)
print(sorted_animals)  # ['cat', 'dog', 'giraffe', 'lion']

This trick will come up again in 4.2 Sequence tricks.

Just as we can convert sets to lists, the opposite is also true: you can convert other data structures, like tuples and lists, into sets, thereby removing the duplicates. You do this by passing the list to set().

animal_list = ["lion", "lion", "tiger", "dog", "cat"]
animal_set = set(animal_list)
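
As a quick sketch of the deduplication described above (the names unique_count, has_duplicates, and unique_animals are just illustrative), comparing the length of the list with the length of the set is a handy way to check whether the list contained any duplicates:

```python
animal_list = ["lion", "lion", "tiger", "dog", "cat"]
animal_set = set(animal_list)

# the set keeps one copy of each animal, so it is shorter than the list
unique_count = len(animal_set)
has_duplicates = len(animal_list) != unique_count

print(unique_count)    # 4
print(has_duplicates)  # True

# convert back to a (sorted) list if you need ordering or indexing again
unique_animals = sorted(animal_set)
print(unique_animals)  # ['cat', 'dog', 'lion', 'tiger']
```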

Notice that the set definition looks like a function call: it has parentheses and can be passed an argument. That’s because set() is called just like a function (technically, it is the constructor of the set type). If you don’t provide it with an argument, it creates an empty set by default (so set() is equivalent to set([])). Otherwise, it creates a set out of whatever you give it as input.

Accessing set members

Because sets are not officially ordered, you CANNOT access items in a set like a list, with a number:

my_animals = {"dog", "cat", "lion", "giraffe"}

# Let's try to access the third item in the set
my_animals[2]  # Oops! This raises a TypeError ('set' object is not subscriptable)
# Sets are UNORDERED, so there is no zeroth item, much less a third item!

Instead, the main way to access the elements of a set is to loop over it with a for loop. You can also use the in operator in an if statement to check whether an item is in a set.

my_animals = {"dog", "cat", "lion", "giraffe"}
for item in my_animals:
    print(item)

the_animal = "lion"
if the_animal in my_animals:
    print(the_animal)

Set methods

Sets have a bunch of methods (in addition to .add()), just like lists.

my_animals = set(['dog', 'cat', 'lion', 'giraffe'])
your_animals = {'dog', 'mouse', 'giraffe', 'elephant'}

my_animals.add('bear')  # Adds a single element to the set
my_animals.update({'bear', 'dolphin'})  # Adds every element of another set or list (.add() with a set would raise a TypeError)
my_animals.discard('lion')  # Removes the specified item; does nothing if not in set
my_animals.remove('cat')   # Removes the specified element; raises a KeyError if not in set
my_animals.clear()  # Removes all elements from set

# Resetting my_animals
my_animals = set(['dog', 'cat', 'lion', 'giraffe'])
the_copy = my_animals.copy()  # Gives you a copy of the set

the_diff = my_animals.difference(your_animals)  # Returns a set containing the difference between two or more sets
# result would be the set {'cat', 'lion'}: the items in the set the method was called on that aren't in the set passed in
the_diff = your_animals.difference(my_animals)
# result would be {'mouse', 'elephant'}

our_animals = my_animals.union(your_animals)    # Returns a set containing the union of the two sets
the_intersection = my_animals.intersection(your_animals)  # Returns a set that is the intersection of the two sets
# the result would be the set {'dog', 'giraffe'}
the_sym_diff = my_animals.symmetric_difference(your_animals)   # Returns a set with the items that are in exactly one of the two sets
# the result would be the set {'cat', 'lion', 'mouse', 'elephant'}

is_disjoint = my_animals.isdisjoint(your_animals)   # Returns True if the two sets have no items in common, False otherwise
my_animals.issubset(your_animals)   # Returns T/F, whether another set contains this set or not
my_animals.issuperset(your_animals) #  Returns whether this set contains another set or not
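
Several of the methods above also have operator shorthands. The following is a small sketch of the equivalents, reusing the two sets from this section; the results are wrapped in sorted() only so the printed order is predictable.

```python
my_animals = {"dog", "cat", "lion", "giraffe"}
your_animals = {"dog", "mouse", "giraffe", "elephant"}

union = my_animals | your_animals         # same as my_animals.union(your_animals)
intersection = my_animals & your_animals  # same as .intersection()
difference = my_animals - your_animals    # same as .difference()
sym_diff = my_animals ^ your_animals      # same as .symmetric_difference()

print(sorted(intersection))  # ['dog', 'giraffe']
print(sorted(difference))    # ['cat', 'lion']
print(sorted(sym_diff))      # ['cat', 'elephant', 'lion', 'mouse']

# subset and superset checks have operator forms too
print({"dog", "cat"} <= my_animals)  # True: every element is in my_animals
print(my_animals >= your_animals)    # False: my_animals lacks 'mouse' and 'elephant'
```

One difference worth knowing: the methods accept any iterable (such as a list), while the operators require both sides to be sets.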

Many of these methods can also be run across multiple sets at once. Here are a few examples:

# let's make three sets. First, we'll use the set() function to make a set from a list
my_animals = set(["dog", "cat", "lion", "giraffe"])
your_animals = {"dog", "mouse", "giraffe", "elephant"} # using curly brackets to make a set
their_animals = {"dog", "lion", "tiger", "elephant"}

my_sets = [my_animals, your_animals, their_animals]

# we can use set.union to get all the possible animals from all three sets
# we can do this by passing all the sets as arguments
all_animals = set.union(my_animals, your_animals, their_animals)
# returns the set {'dog', 'cat', 'lion', 'giraffe', 'mouse', 'elephant', 'tiger'}
# though the order you see may differ, since sets are unordered collections

# we can get all the animals held in common with set.intersection
# here we are unpacking the sets with the * operator
common_animals = set.intersection(*my_sets)
# returns {'dog'} since only dog is in all three sets

4.1 Dictionaries

Now let’s learn one of the most important and useful data structures in Python: dictionaries.

So far all of our data structures have been simple collections of individual items:

range: an immutable sequence of numbers
list: a mutable (i.e., changeable) ordered sequence
tuple: an immutable (i.e., unchangeable) ordered sequence
set: a mutable unordered collection without duplicates (you can actually create an immutable set using frozenset() instead of set(), and you can use all the same methods except those that change the set)
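
To make the frozenset() remark concrete, here is a minimal sketch (the names frozen_animals and set_of_sets are just illustrative). It shows that the read-only methods still work, that the modifying methods are simply not there, and that a frozenset can itself be stored inside a set.

```python
frozen_animals = frozenset(["dog", "cat", "lion"])

# methods that only read the set still work
print("dog" in frozen_animals)                    # True
print(sorted(frozen_animals.union({"giraffe"})))  # ['cat', 'dog', 'giraffe', 'lion']

# methods that would change the set do not exist on a frozenset
try:
    frozen_animals.add("tiger")
except AttributeError as error:
    print(f"Cannot modify a frozenset: {error}")

# because a frozenset is immutable, it can be a member of another set
set_of_sets = {frozen_animals, frozenset(["mouse"])}
print(len(set_of_sets))  # 2
```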

Dictionaries are different, in that each element of a dictionary is a pair of things: a key and a value. Dictionaries are like a set of unique keys, where each key points to a value. Think of an actual dictionary. It has a set of words, and each one points to a definition. We can create an actual dictionary with a Python dictionary:

word_meaning_dict = {
    "run": "to move with haste; act quickly",
    "jump": "to spring clear of the ground or other support by a sudden muscular effort",
    "talk": "to consult or confer"
}

To create a dictionary in Python, we use curly brackets just like a set, except that instead of adding single items, we add key-value pairs. Dictionaries can be defined as we did above, with data already in them, or they can be defined empty (with just the curly brackets with nothing inside them, as in word_meaning_dict = {}). In either case, we can add items to a dictionary after defining it like this:

# add a new key-value pair to the dictionary we defined above
word_meaning_dict["eat"] = "to take into the mouth and swallow for nourishment; chew and swallow (food)."

We can then access dictionary items by using the key in the same way we just added one:

print(word_meaning_dict["jump"])
# output would be: "to spring clear of the ground or other support by a sudden muscular effort"

Sometimes it is useful to think of a dictionary like a list, with an important difference. In a list, the organization is the order of a list, so we access each member using a number. In a dictionary, the organization is the key. We access data the same way as with a list (using square brackets), but we provide the key instead of a number.

Dictionaries are mutable, meaning you can change them. The keys of a dictionary must be unique, but the values do not need to be. You could, for example, have a dictionary of people’s names and their birthdays. Every name in the dictionary would need to be unique, but multiple people could have the same birthday.

Data structures you can use in dictionaries

The values of dictionaries can be any kind of data type: strings, numbers, booleans, lists, tuples, sets, even other dictionaries. As we will talk about in later sections, data structures can get pretty complicated: dictionaries of dictionaries, lists of dictionaries, dictionaries of lists of sets. Oh my!

The keys of dictionaries can be any immutable data type: string, int, float, tuple, boolean, frozenset. One way to think about and make sense of this is that you cannot change a key in a dictionary once you’ve created it. You can replace it, but not change it. In the example below, I can change the value of ‘Jon’, but I cannot change the key. If I wanted to do that I would need to delete the entry and create a new entry.

birthday_dict = {'Jon': '11-04', 'Andrew': '10-21', 'Jonathan': '12-03', 'Lin Khern': '10-04'}
birthday_dict['Jon'] = '11-11'

# imagine the name was misspelled
del birthday_dict['Jon'] # deletes the key and value
birthday_dict['John'] = '11-04'

This is the reason you cannot use a mutable data type (like a list or a set) as the key of a dictionary. If you made the key a list of words, and then added a word to the list, then the key would change, and that’s not allowed. But you can use a tuple of words as a key, and if you want to be able to change the keys, then you just have to use the delete and replace method described above.
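
Here is a minimal sketch of that rule in action. The dictionary name city_coordinates and the rounded coordinates are just for illustration.

```python
# tuples are immutable, so they can serve as dictionary keys
city_coordinates = {
    ("Chicago", "IL"): (41.88, -87.62),
    ("St. Louis", "MO"): (38.63, -90.20),
}
print(city_coordinates[("Chicago", "IL")])  # (41.88, -87.62)

# lists are mutable, so Python refuses to use them as keys
try:
    city_coordinates[["Houston", "TX"]] = (29.76, -95.37)
except TypeError as error:
    print(f"Lists cannot be keys: {error}")  # Lists cannot be keys: unhashable type: 'list'
```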

Accessing dictionary data

In addition to accessing data by using the key inside square brackets, there are other ways you can get at dictionary data.

Get a dictionary’s keys, or values, or both. These methods return “view” objects rather than true lists, so wrap them in list() when you need an actual list:

word_meaning_dict = {
    "run": "to move with haste",
    "jump": "to spring clear of the ground or other support by a sudden muscular effort",
    "talk": "to consult or confer",
}
word_list = list(word_meaning_dict.keys())
# ["run", "jump", "talk"]

meaning_list = list(word_meaning_dict.values())
# ["to move with haste", "to spring clear of the ground or other support by a sudden muscular effort", "to consult or confer"]

word_meaning_list = list(word_meaning_dict.items())
# .items() gives you the key-value pairs as tuples
# [("run", "to move with haste"), ("jump", "to spring clear of the ground or other support by a sudden muscular effort"), ("talk", "to consult or confer")]

Use the len() function to see how many entries are in a dictionary:

word_meaning_dict = {
    "run": "to move with haste",
    "jump": "to spring clear of the ground or other support by a sudden muscular effort",
    "talk": "to consult or confer",
}
print(len(word_meaning_dict))  # output is 3

Looping over dictionaries

Dictionaries are iterable (looping over one gives you its keys), so you can also loop over a dictionary just like a list.

word_meaning_dict = {
    "run": "to move with haste; act quickly",
    "jump": "to spring clear of the ground or other support by a sudden muscular effort",
    "talk": "to consult or confer",
}

for word in word_meaning_dict:
    print(f"{word}: {word_meaning_dict[word]}")

You can combine the previous methods to iterate over keys and values at the same time:

word_meaning_dict = {
    "run": "to move with haste; act quickly",
    "jump": "to spring clear of the ground or other support by a sudden muscular effort",
    "talk": "to consult or confer",
}

for word, definition in word_meaning_dict.items():
    print(f"{word}: {definition}")

Remember that .items() gives you back the key-value pairs as tuples, so what’s really happening here is that the loop takes one tuple at a time and unpacks its two values into the two variables provided in the for loop.

Checking to see if a key is in a dictionary

Finally, as with lists and sets, you can check to see if a key is in a dictionary:

word_meaning_dict = {
    "run": "to move with haste; act quickly",
    "jump": "to spring clear of the ground or other support by a sudden muscular effort",
    "talk": "to consult or confer",
}

word = "run"
if word in word_meaning_dict:
    print(f"Found the word {word}. Its meaning is: {word_meaning_dict[word]}")

When you aren’t sure if a key is in a dictionary, you can use .get() to safely access the value stored at that key (if it exists) and return a default value if it doesn’t exist.

word_meaning_dict = {
    "run": "to move with haste; act quickly",
    "jump": "to spring clear of the ground or other support by a sudden muscular effort",
    "talk": "to consult or confer",
}

print(word_meaning_dict.get("run", "Unknown word")) # "to move with haste; act quickly"
print(word_meaning_dict.get("walk", "Unknown word")) # "Unknown word"
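
For comparison, here is a short sketch of what happens without a default. Square brackets raise a KeyError for a missing key, while .get() with no second argument returns None. The variable name missing is just illustrative.

```python
word_meaning_dict = {"run": "to move with haste; act quickly"}

# square brackets raise a KeyError when the key is missing
try:
    print(word_meaning_dict["walk"])
except KeyError as error:
    print(f"No entry for {error}")  # No entry for 'walk'

# .get() with no second argument quietly returns None instead
missing = word_meaning_dict.get("walk")
print(missing)  # None

# .get() never adds the key, so the dictionary is unchanged
print(len(word_meaning_dict))  # 1
```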

Alternatively, if you want your dictionary to have default values baked into it, you can use defaultdict from the collections module.

from collections import defaultdict

word_meaning_dict = defaultdict(lambda: "Unknown word")
word_meaning_dict["run"] = "to move with haste; act quickly"

print(word_meaning_dict["run"]) # "to move with haste; act quickly"
print(word_meaning_dict["walk"]) # "Unknown word"

This is especially useful when counting things:

word_counts = defaultdict(int) # This creates a dictionary with default value 0
# this is because `int()` returns 0: the default value for an int

# Let's suppose we have a list of words and we want to count the occurrences of each word
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
for word in words:
    word_counts[word] += 1

print(word_counts) # defaultdict(<class 'int'>, {'apple': 3, 'banana': 2, 'cherry': 1})

Tip

You can also count the occurrences of each word in a list using the Counter class from the collections module.

from collections import Counter

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
word_counts = Counter(words)
print(word_counts) # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
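
A Counter also comes with a few conveniences of its own. As a brief sketch, .most_common() returns the items ranked by frequency, and looking up an item that was never counted gives 0 rather than an error. The variable name top_two is just illustrative.

```python
from collections import Counter

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
word_counts = Counter(words)

# most_common(n) returns the n most frequent items as (item, count) tuples
top_two = word_counts.most_common(2)
print(top_two)  # [('apple', 3), ('banana', 2)]

# a Counter returns 0 for items it has never seen, rather than raising a KeyError
print(word_counts["durian"])  # 0
```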

Nested dictionaries

Of course, there is no restriction on what a dictionary can hold as a value. This means you can even have dictionaries inside of other dictionaries ad infinitum!

# a nested dictionary of cities and their information
cities = {
    "Chicago": {
        "state": "IL",
        "population": 2_709_364,
        "coordinates": {"lat": 41.881832, "lon": -87.623177},
        "sports": {"MLB": ["Cubs", "White Sox"], "NBA": ["Bulls"]},
    },
    "St. Louis": {
        "state": "MO",
        "population": 279_695,
        "coordinates": {"lat": 38.6270, "lon": -90.1994},
        "sports": {"MLB": ["Cardinals"]},
    },
}

We can then access information from each level of the dictionary using square brackets.

# see the whole dictionary
print(cities)

# get all the information about Chicago
print(cities["Chicago"])

# get the population of Chicago
print(cities["Chicago"]["population"])

# get the latitude and longitude of Chicago
print(cities["Chicago"]["coordinates"]["lat"])
print(cities["Chicago"]["coordinates"]["lon"])

# get the MLB teams in Chicago
print(cities["Chicago"]["sports"]["MLB"])

# get the MLB teams in St. Louis
print(cities["St. Louis"]["sports"]["MLB"])

# increment the population of Chicago by 50_000
cities["Chicago"]["population"] += 50_000
print(cities["Chicago"]["population"])
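
One thing to watch for with nested dictionaries is that not every entry has the same keys (St. Louis has no "NBA" entry, for example). The sketch below uses a trimmed-down copy of the cities dictionary and .get() to handle the missing pieces. The names nba_teams and boston_sports are just illustrative.

```python
cities = {
    "Chicago": {"sports": {"MLB": ["Cubs", "White Sox"], "NBA": ["Bulls"]}},
    "St. Louis": {"sports": {"MLB": ["Cardinals"]}},
}

# cities["St. Louis"]["sports"]["NBA"] would raise a KeyError,
# so ask for the key with .get() and fall back to an empty list
nba_teams = cities["St. Louis"]["sports"].get("NBA", [])
print(nba_teams)  # []

# chaining .get() with an empty-dict default also copes with a missing city
boston_sports = cities.get("Boston", {}).get("sports", {})
print(boston_sports)  # {}
```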

Extending dictionaries

We saw earlier that you can extend a dictionary by adding a new key-value pair to it.

# add a new city to the `cities` dictionary
cities["New York"] = {
    "state": "NY",
    "population": 8_405_837,
    "coordinates": {"lat": 40.7128, "lon": -74.0060},
    "sports": {"MLB": ["Mets"], "NBA": ["Knicks"]},
}

Sometimes, you may find yourself wanting to extend a dictionary with the contents of yet another dictionary. You can accomplish this using the .update() method.

other_cities = {
    "Los Angeles": {
        "state": "CA",
        "population": 3_971_883,
        "coordinates": {"lat": 34.0522, "lon": -118.2437},
        "sports": {"MLB": ["Dodgers"], "NBA": ["Lakers"]},
    },
    "Houston": {
        "state": "TX",
        "population": 2_325_502,
        "coordinates": {"lat": 29.7604, "lon": -95.3698},
        "sports": {"MLB": ["Astros"], "NBA": ["Rockets"]},
    },
}

# extend the `cities` dictionary with the contents of `other_cities`
cities.update(other_cities)
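
One detail of .update() worth knowing: if a key exists in both dictionaries, the value from the dictionary you pass in wins. The sketch below uses two small dictionaries, populations and newer_data, whose names and numbers are made up purely for illustration.

```python
populations = {"Chicago": 100, "St. Louis": 200}
newer_data = {"Chicago": 150, "Houston": 300}

# .update() changes `populations` in place and returns None
result = populations.update(newer_data)
print(result)  # None

# "Chicago" was overwritten, "Houston" was added, "St. Louis" was untouched
print(populations)  # {'Chicago': 150, 'St. Louis': 200, 'Houston': 300}

# the dictionary passed in is left unchanged
print(newer_data)  # {'Chicago': 150, 'Houston': 300}
```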

There is another way to do this, using the ** operator, which unpacks the dictionary into keyword arguments (this only works when every key is a string).

# extend the `cities` dictionary with the contents of `other_cities`
cities.update(**other_cities)

In this case it isn’t any shorter than passing the dictionary directly, but the ** operator comes up fairly often in function definitions (as with the **kwargs argument) and when creating instances of classes from dictionaries (which we will learn about in Chapter 6.0.).

4.2 Sequence tricks

Now that we have covered all of our basic data types in Python, let’s talk about a few more tricks.

Sorting data structures

We’ve already sorted lists. You can use the same trick to sort sets and dictionaries.

Now remember that sets are not ordered, so sorting one gives you back a list.

my_animals = {'dog', 'lion', 'mouse', 'cat'}
sorted_animals = sorted(my_animals)
print(sorted_animals)
# results in ['cat', 'dog', 'lion', 'mouse']

It sorts alphabetically, as you can see above. You can sort in reverse by adding the reverse=True argument.

my_animals = {'dog', 'lion', 'mouse', 'cat'}
sorted_animals = sorted(my_animals, reverse=True)
print(sorted_animals)
# results in ['mouse', 'lion', 'dog', 'cat']

You can sort dictionaries too, but it is a little more complicated since dictionaries contain key-value pairs. But remember that we can use the .keys(), .values(), and .items() methods to get the keys, values, or key-value pairs. We can then sort them the same way we do lists.

animal_populations = {'dog': 471000000, 'cat': 600000000, 'bee': 2000000000000, 'human': 8000000000}
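animals = sorted(animal_populations.keys())  # ['bee', 'cat', 'dog', 'human']
populations = sorted(animal_populations.values())  # [471000000, 600000000, 8000000000, 2000000000000]
animal_population_pairs = sorted(animal_populations.items())  # sorted by key: [('bee', 2000000000000), ('cat', 600000000), ...]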

Remember that .items() creates tuples with the key and then the value, so this will sort the tuples by the key. But if we wanted them sorted by the value we can do that too:

animal_populations = {'dog': 471000000, 'cat': 600000000, 'bee': 2000000000000, 'human': 8000000000}
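sorted_by_population = sorted(animal_populations.items(), key=lambda item: item[1])
# [('dog', 471000000), ('cat', 600000000), ('human', 8000000000), ('bee', 2000000000000)]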
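
Building on this, here is a small sketch that sorts by value from largest to smallest and then turns the result back into a dictionary. It relies on the fact that dictionaries remember insertion order (guaranteed since Python 3.7). The names by_population and sorted_populations are just illustrative.

```python
animal_populations = {'dog': 471000000, 'cat': 600000000, 'bee': 2000000000000, 'human': 8000000000}

# sort the (animal, population) tuples by population, largest first
by_population = sorted(animal_populations.items(), key=lambda item: item[1], reverse=True)
print(by_population[0])  # ('bee', 2000000000000)

# dict() rebuilds a dictionary from the tuples, keeping the sorted order
sorted_populations = dict(by_population)
print(list(sorted_populations))  # ['bee', 'human', 'cat', 'dog']
```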

It’s quite common to sort using lambda functions rather than fully fledged functions. (For a review of lambda functions, see the eponymous section in Chapter 3.0.) Basically, we can use this lambda expression to say what we want to sort the tuples by. Using item[1] sorts by the second slot, the values from the dictionary. Using item[0] would have sorted by the keys. And of course, it doesn’t matter what we call the parameter in the lambda function, so long as we’re consistent. We could have used x instead of item and it would have worked equally well.

Comprehensions

Although we gave a couple of examples of list comprehensions in Chapter 1.0., we didn’t cover them in detail because they require that you understand how loops work first. Now that you’ve learnt how to use loops, we can revisit them and see how they can be used to simplify the creation of collections.

List comprehensions

Let’s begin by way of example. Suppose that you want to create a list of the squares of only the even numbers from 1 to 5. We can accomplish this easily using a for loop:

squares = []
for i in range(1, 6):
    if i % 2 == 0:
        squares.append(i**2)
print(squares) # [4, 16]

This is a perfectly cromulent way to get the squares of the even numbers, and for many other languages, this may even be the best way to do it. However, in Python we can do a little better. Consider the following:

squares = [i**2 for i in range(1, 6) if i % 2 == 0]
print(squares) # [4, 16]

What did we just do? By reorganizing the components of the for loop, we were able to get the same result using a single line of code where it previously took 4. Whenever you see a pattern like the one above, you should immediately think about substituting it with a comprehension.

Let’s make the pattern even clearer. When you see something like this:

empty_collection = [] # make an empty collection
for i in range(start, end): # loop over something in a sequence
    if some_condition_is_met: # (optional) condition to filter the items
        empty_collection.append(i) # add to the collection

You can substitute it with this:

# create and populate the collection in one line
empty_collection = [i for i in range(start, end) if some_condition_is_met]

You can also add conditions to the comprehension to filter the items (as we did in the even squares example above), or even have multiple loops when you need to iterate over multiple sequences at once; the pattern remains the same. Consider the case where you want to set up coordinates in a two-dimensional (2D) grid, with columns denoted by letters (A-D) and rows denoted by numbers (1-4):

coords = []
for column in 'ABCD':
    for row in range(1, 5):
        coords.append((column, row))
print(coords)

# gives us our 2D grid of coordinates, though of course printed on one long line;
# I'm breaking it up here for readability.
# [('A', 1), ('A', 2), ('A', 3), ('A', 4), 
# ('B', 1), ('B', 2), ('B', 3), ('B', 4), 
# ('C', 1), ('C', 2), ('C', 3), ('C', 4),
# ('D', 1), ('D', 2), ('D', 3), ('D', 4)]

You can instead write this as a comprehension:

coords = [(column, row) for column in 'ABCD' for row in range(1, 5)]
print(coords)

# same result as above!
# notice that here we have two loops and no conditional statement

There is no practical limit to how many clauses, loops, and conditionals you can squeeze into a comprehension, but it’s not recommended to go overboard. Think of a single line of code as a single sentence you can express in natural language. You can, of course, say a sentence that is extremely long with several nested clauses, but then it becomes difficult to understand. So too it is with code. Stick to two or three clauses max when using comprehensions.

Comprehensions are a general pattern in the language that can be applied to any type of collection. Below we’ll go over how we can use comprehensions to create dictionaries, sets, and lastly, generator expressions, which operate a little differently from the others.

Dictionary comprehensions

Dictionary comprehensions look very similar to list comprehensions, except that they use curly brackets and a key: value pair in place of a single expression.

squares = {i: i**2 for i in range(1, 6)}
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

even_squares = {i: i**2 for i in range(1, 6) if i % 2 == 0}
print(even_squares)  # {2: 4, 4: 16}
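
Dictionary comprehensions are also handy for building one dictionary out of another. As a sketch, the example below reuses the animal_populations dictionary from the sorting section to filter entries and to swap keys with values. The names over_a_billion and animal_by_population are just illustrative.

```python
animal_populations = {'dog': 471000000, 'cat': 600000000, 'bee': 2000000000000, 'human': 8000000000}

# keep only the animals with more than a billion individuals
over_a_billion = {animal: count for animal, count in animal_populations.items() if count > 1_000_000_000}
print(over_a_billion)  # {'bee': 2000000000000, 'human': 8000000000}

# swap keys and values (safe here because every population is unique)
animal_by_population = {count: animal for animal, count in animal_populations.items()}
print(animal_by_population[600000000])  # cat
```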

Set comprehensions

As you might expect by now, set comprehensions have the same sort of syntax:

even_squares = {i**2 for i in range(1, 6) if i % 2 == 0}
print(even_squares)  # {4, 16}

Generator expressions

Above we saw examples of creating whole collections from scratch using comprehensions. However, sometimes the collections you might want to create are so big that they won’t fit in your computer’s limited memory. When that happens, you wouldn’t want to create the whole collection at once, because it would be too heavy. Rather, you can tell the interpreter, “This is the pattern that the sequence should follow, and I want to access these one at a time, without storing them all at once”. Generator expressions (accomplished with round parentheses) allow us to do that by means of lazy evaluation; the values won’t be computed until they are needed.

squares = (i**2 for i in range(1, 6))
print(squares)  # <generator object <genexpr> at 0x00000247C4A39A40>

even_squares = (i**2 for i in range(1, 6) if i % 2 == 0)
print(even_squares)  # <generator object <genexpr> at 0x00000247C49535E0>
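
To get a rough feel for the memory difference, you can compare the two with sys.getsizeof(). This is only a sketch: the function reports the size of the container object itself (not the numbers inside it), and the exact figures vary by Python version, so none are shown here.

```python
import sys

# a list comprehension builds and stores every value up front
squares_list = [i**2 for i in range(100_000)]

# a generator expression stores only the recipe for producing the values
squares_gen = (i**2 for i in range(100_000))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(f"list: {list_size} bytes")
print(f"generator: {gen_size} bytes")
print(gen_size < list_size)  # True: the generator is far smaller
```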

The tricky part about working with these expressions is that you can’t see inside of them directly. If you try to print them, you’ll see something like the above, where you can see a “generator object” at a particular memory address. You also can’t index into them directly (e.g., squares[4] will raise a TypeError). But we can make use of them in the context of a loop:

squares = (i**2 for i in range(1, 6))
for square in squares:
    print(square) # 1, 4, 9, etc., printed one at a time on separate lines without commas

Or by using the next() function:

squares = (i**2 for i in range(1, 6))
print(next(squares)) # 1
print(next(squares)) # 4
print(next(squares)) # 9
print(next(squares)) # 16
print(next(squares)) # 25

Or by using functions that accept any iterable, like list() or sum():

squares = (i**2 for i in range(1, 6))
print(list(squares)) # [1, 4, 9, 16, 25]

# need to create the generator again here, otherwise it will be empty!
squares = (i**2 for i in range(1, 6))
print(sum(squares)) # 55

Once a generator expression has been exhausted, it’s empty for good: looping over it again produces nothing, and calling next() on it raises a StopIteration exception. This is why I had to create squares twice above! If I had just done it once, then by the second time I tried to use it, there would be nothing left (meaning that the sum would amount to 0 in the above case).
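
Here is a short sketch of what an exhausted generator looks like in practice, including the optional second argument to next(), which supplies a value to return instead of raising StopIteration:

```python
squares = (i**2 for i in range(1, 4))
first_pass = list(squares)
print(first_pass)  # [1, 4, 9]

# the generator is now exhausted: converting or summing it again sees nothing
second_pass = list(squares)
print(second_pass)   # []
print(sum(squares))  # 0

# calling next() directly on an exhausted generator raises StopIteration...
try:
    next(squares)
except StopIteration:
    print("No more values")

# ...unless you give next() a default value to return instead
print(next(squares, None))  # None
```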

To sum up, comprehensions are one of the most distinctive and powerful features of Python, and can be found all over the place. Be aware of the type of brackets you use when working with them: square brackets create lists, curly braces create dictionaries (or sets, depending on the syntax), and round brackets create generator expressions.

4.3 JSON files

Dictionaries are very useful, and used all the time, especially once you start embedding other data structures in them. They can effectively be a database.

animal_dictionary = {
    "leo": {
        "common name": "lion",
        "genus": "panthera",
        "features": ["has a mane", "lives in groups", "has bad stamina"],
        "subspecies": ["leo", "melanochaita"],
    },
    "tigris": {
        "common name": "tiger",
        "genus": "panthera",
        "features": ["has stripes", "lives in Asia", "likes to swim"],
        "subspecies": ["tigris", "sondaica"],
    },
    "pardus": {
        "common name": "leopard",
        "genus": "panthera",
        "features": ["has spots", "likes to climb trees", "is solitary"],
        "subspecies": [
            "pardus",
            "fusca",
            "melas",
            "nimr",
            "tulliana",
            "orientalis",
            "delacouri",
            "kotiya",
        ],
    },
    "onca": {
        "common name": "jaguar",
        "genus": "panthera",
        "features": ["lives in Americas", "has spots", "nocturnal"],
        "subspecies": [],
    },
    "uncia": {
        "common name": "snow leopard",
        "genus": "panthera",
        "features": ["is white", "has spots", "likes the cold"],
        "subspecies": [],
    },
}

This dictionary has species names as its keys, and its values are themselves dictionaries that all share the same keys:

common name (whose value is a string)
genus (whose value is a string)
features (whose value is a list)
subspecies (whose value is a list).

You can imagine that we might often want to write data that we store in dictionaries to a file, but preserve the same format and organization, and then be able to re-import it back into the same dictionary format. Imagine trying to write a program that would do this. It would be pretty complicated!

Luckily, someone has already done that hard work for us. A special file format called JSON was developed for this purpose. JSON is short for “JavaScript Object Notation”, because the format was first developed to solve this exact problem in the JavaScript language. But the JSON file format became popular and is now used much more generally, so most popular programming languages have a way to read and write files in that format.

Creating JSON files

To make this work, what we really want to do is convert a dictionary (and all its parts) into a string, and then write that string to a file. The json module does that first part for us, and then we can just use normal file writing to create the file.

import json

animal_json_string = json.dumps(animal_dictionary)

with open("animal_data.json", "w") as animal_file:
    animal_file.write(animal_json_string)

The json.dumps() function creates a string that includes all the quotes, colons, commas, and brackets (square and squiggly) that identify the keys and values of the dictionary, and the kinds of data structures that were stored in each one. So when we write the file, we get this:

{
  "leo": {
    "common name": "lion",
    "genus": "panthera",
    "features": ["has a mane", "lives in groups", "has bad stamina"],
    "subspecies": ["leo", "melanochaita"]
  },
  "tigris": {
    "common name": "tiger",
    "genus": "panthera",
    "features": ["has stripes", "lives in Asia", "likes to swim"],
    "subspecies": ["tigris", "sondaica"]
  },
  "pardus": {
    "common name": "leopard",
    "genus": "panthera",
    "features": ["has spots", "likes to climb trees", "is solitary"],
    "subspecies": [
      "pardus",
      "fusca",
      "melas",
      "nimr",
      "tulliana",
      "orientalis",
      "delacouri",
      "kotiya"
    ]
  },
  "onca": {
    "common name": "jaguar",
    "genus": "panthera",
    "features": ["lives in Americas", "has spots", "nocturnal"],
    "subspecies": []
  },
  "uncia": {
    "common name": "snow leopard",
    "genus": "panthera",
    "features": ["is white", "has spots", "likes the cold"],
    "subspecies": []
  }
}

In the actual file, this is just one big string with no formatting and no line breaks, other than what you get from automatic text-wrapping (the version above has been broken across lines for readability). If you open a .json file in most code editing software (like Sublime Text, PyCharm, etc.) it will color code it so you can still identify the different parts. If you want the file itself to look nicer and be readable, you can supply an additional parameter:

animal_json_string = json.dumps(animal_dictionary, indent=4)

This will result in a file that looks nice and readable (only showing the first portion below):

{
    "leo": {
        "common name": "lion",
        "genus": "panthera",
        "features": [
            "has a mane",
            "lives in groups",
            "has bad stamina"
        ],
        "subspecies": [
            "leo",
            "melanochaita"
        ]
    },
    "tigris": ...
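
If you don’t need the intermediate string, the json module also provides json.dump() (without the “s”), which writes directly to an open file. The sketch below uses a trimmed-down dictionary and writes into a temporary folder; the file_path variable and the use of tempfile are just choices made for this illustration.

```python
import json
import os
import tempfile

animal_dictionary = {"leo": {"common name": "lion", "genus": "panthera"}}

# write to a temporary folder so this sketch does not clutter your project
file_path = os.path.join(tempfile.mkdtemp(), "animal_data.json")

# json.dump() takes the data and the open file handle, plus optional formatting
with open(file_path, "w") as animal_file:
    json.dump(animal_dictionary, animal_file, indent=4)

# read the raw text back to see what was written
with open(file_path) as animal_file:
    print(animal_file.read())
```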

Reading JSON files

We can easily read in data in this format as well.

import json

with open("animal_data.json") as file_handle:
    animal_json_string = file_handle.read()
animal_dictionary = json.loads(animal_json_string)

We read in the file like we normally would, and store it as a string. Don’t do any processing on it (strips or splits)! Then we just use the json.loads() function to convert that string back to a dictionary.

Alternatively, you can use the json.load() function on the opened file, which will read the file for you and convert it to a dictionary:

import json

animal_dictionary = {} # placeholder; note that open() below still raises FileNotFoundError if the file is missing
with open("animal_data.json", "r") as fh:
    animal_dictionary = json.load(fh)

# can also use `animal_dictionary = json.load(open("animal_data.json"))`
# but this may result in the file not being closed in a timely manner; this can cause issues
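
One caveat: JSON supports fewer data types than Python, so a round trip through JSON does not always give back exactly what you started with. The sketch below (with a made-up dictionary called original) shows a tuple coming back as a list, an integer key coming back as a string, and a set being rejected outright.

```python
import json

original = {
    "coordinates": (41.88, -87.62),  # a tuple
    1: "integer key",                # a non-string key
    "is_city": True,
    "nickname": None,
}

# convert to a JSON string and straight back again
restored = json.loads(json.dumps(original))
print(restored)
# {'coordinates': [41.88, -87.62], '1': 'integer key', 'is_city': True, 'nickname': None}

print(restored == original)  # False: the tuple became a list and the key 1 became "1"

# sets have no JSON equivalent, so convert them (e.g., to a list) before dumping
try:
    json.dumps({"animals": {"dog", "cat"}})
except TypeError as error:
    print(f"Cannot serialize: {error}")
```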

4.4 Lab 4

"""
This lab will be the first time that we directly organize the questions into separate functions,
then call them from the main function. This is to practice the general structure of a
well-organized program. When appropriate, this structure will be followed for the
labs from here on out. Follow the instructions within each function to complete the lab.
"""
def q1():
    print("\n######## Question 1 ########\n")
    """
        Use a print statement inside this function to describe:
        - one way a set is different than a dictionary
        - one way a set is different than a list
        - one way a list is different than a dictionary
    """


def q2():
    print("\n######## Question 2 ########\n")
    """
        For the following two sets, create a print statement that uses set methods to tell us:
        - how many items are in each set
        - what items are unique to each set
        - what items are in both sets
        Return the african animals and asian animals at the end of this function
    """
    african_animals = {"lion", "monkey", "elephant", "zebra", "hippo", "hyena"}
    asian_animals = {"tiger", "panda", "reindeer", "elephant", "monkey", "lion"}

def q3(african_animals, asian_animals):
    print("\n######## Question 3 ########\n")
    """
    - Make sure that your `q2()` function returns the african and asian animal sets
    - Pass those two sets into this function as inputs
    - then create your own list of tuples, containing at least 4 animals from either africa or asia:
    e.g., ("my_cool_animal", "africa") or ("my_other_cool_animal", "asia")
    - then write a loop that iterates over that list of tuples you just defined, adding each animal to the correct set
    - print both sets with a label:
        african animals: lion, monkey, elephant, ...
        asian animals: tiger, panda, reindeer, ...
    """

def q4():
    print("\n######## Question 4 ########\n")
    """
    Use a print statement to explain why the set created by the code below has the elements that it has, and the
    order they are in.
    """
    my_set = set("abcdefghijklmnopqrstuvwxyz")

def q5():
    print("\n######## Question 5 ########\n")
    """
    Write code that checks to see if 'Champaign' is in the dictionary below, and if not, adds
    its population (89189)
    Then print the dictionary
    """
    north_america_city_populations_dict = {
        "Mexico City": 8918653,
        "New York City": 8550405,
        "Los Angeles": 3971883,
    }

def q6_7():
    print("\n######## Question 6-7 ########\n")
    """
    Go to the URL: http://classics.mit.edu/Plato/republic.mb.txt, and copy the text of Plato's Republic
    and save it to a file called "platos_republic.txt". Then:
    - Q6:
        - create an empty "word_frequency_dict"
        - use Python to open the file, read each line, and split it into a list of tokens (space separated strings).
        - loop through that list of tokens, lower case the token, and remove punctuation from the token
        - You are allowed to use `from string import punctuation` at the top of the script as a whole or at the
          top of this function to help with this, rather than manually typing out all possible punctuation characters.
        - check to see if each token is in the word frequency dictionary. If it is not, add it to the dictionary
          with a frequency of 1. If it is in the dictionary, increment its frequency by 1
    - Q7:
        - print the 50 most frequent words in the book, and the frequency of each word, like this (and these are the
            right numbers for the first 10, but you print 50):
            the 7696
            and 5977
            of 5078
            to 3450
            is 2680
            in 2326
            he 2155
            a 2114
            that 1991
            be 1874
    """

def q8_9_10():
    print("\n######## Question 8-9-10 ########\n")
    """
    - Q8: First, copy and paste the data from the animal data json example from the chapter
        into a file called "animal_data.json", then write code that reads the data into a
        dictionary called "animal_data_dict".
        (Note that you can't just hard code the dictionary directly into the code;
        you need to read the data as JSON from the file.)
    - Q9: loop through the "animal_weights" dictionary below
        - check to see if the animal is in the animal_data_dict
        - if it is, create a new key in that animal's dictionary called "weight",
          which points to the corresponding value
    - Q10: loop through the animal_data_dict, and print out each animal's formal name, common name, and weight,
        one animal per line, like this:
        panthera leo, lion, 120 kg
        panthera tigris, tiger, 155 kg
        panthera pardus, leopard, 46 kg
        panthera onca, jaguar, 76 kg
        panthera uncia, snow leopard, 38 kg
    """
    animal_weights = {"leo": 120, "tigris": 155, "pardus": 46, "onca": 76, "uncia": 38}

def main():
    q1()
    african_animals, asian_animals = q2()  # works once q2() returns both sets
    q3(african_animals, asian_animals)
    q4()
    q5()
    q6_7()
    q8_9_10()


if __name__ == "__main__":
    main()