Comprehensions come in many flavors. Using [ ] we get lists. Using { } we can make dictionaries or sets. You might think that ( ) will make a tuple. Instead, you'll get a generator.
A generator is an object that we can iterate through. But unlike a list, tuple or dictionary, its values are not stored up front; each one is computed when the iteration reaches it. A generator holds the rules for creating the sequence of values rather than the values themselves.
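As a small sketch of both points, the snippet below builds one comprehension per bracket style and then shows that a generator does no work until a value is requested. The helper `noisy_double` and the `evaluated` list are made up for this illustration.

```python
# The bracket style decides which kind of object a comprehension builds.
as_list = [n * 2 for n in range(3)]
as_set = {n * 2 for n in range(3)}
as_dict = {n: n * 2 for n in range(3)}
as_gen = (n * 2 for n in range(3))

kinds = [type(obj).__name__ for obj in (as_list, as_set, as_dict, as_gen)]
print(kinds)  # ['list', 'set', 'dict', 'generator']

# Laziness: record every number that actually gets doubled.
evaluated = []


def noisy_double(n):
    """Hypothetical helper that logs each call before doubling."""
    evaluated.append(n)
    return n * 2


lazy = (noisy_double(n) for n in range(3))
print(evaluated)  # [] because nothing has been requested yet

first = next(lazy)
print(first, evaluated)  # 0 [0]
```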
Function Generator
The longer way to create a generator is with a function. A generator function uses yield instead of return. Each time the function reaches yield it hands back the next value in the sequence and pauses, until the sequence is exhausted. Here is an example of a function that creates a generator:
def numbers():
    num_list = [1, 2, 3, 4, 5, 6, 7]
    for num in num_list:
        yield num
We start the generator by calling the function and capturing the returned generator object in a variable.
my_numbers = numbers()
Printing the generator object doesn't do anything special; it just displays what the object is.
> print(my_numbers)
<generator object numbers at 0x7f7fbf9834a0>
What we can do now is iterate over the generator object, and the values will be returned one at a time.
for num in my_numbers:
    print(num)
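One consequence worth seeing in code: a generator object can only be walked once. The sketch below uses a shortened version of the function (three numbers instead of seven) and loops over the same object twice.

```python
def numbers():
    # Shortened version of the generator function, for illustration only.
    for num in [1, 2, 3]:
        yield num


my_numbers = numbers()

first_pass = [num for num in my_numbers]
second_pass = [num for num in my_numbers]  # same object, already exhausted

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []

# Calling the function again gives a fresh generator.
fresh_pass = list(numbers())
print(fresh_pass)   # [1, 2, 3]
```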
Generator Comprehension Version
A comprehension gives us a shortcut for creating the generator.
num_list = [1, 2, 3, 4, 5, 6, 7]
my_numbers = (num for num in num_list)
my_numbers is a generator object that produces the same values as the one created by the generator function.
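A quick way to check this is to build both versions side by side and compare them. This is only a sketch; the variable names `from_function` and `from_comprehension` are chosen for the example.

```python
num_list = [1, 2, 3, 4, 5, 6, 7]


def numbers():
    for num in num_list:
        yield num


from_function = numbers()
from_comprehension = (num for num in num_list)

# Both are generator objects, and both yield the same values.
same_type = type(from_function) is type(from_comprehension)
same_values = list(from_function) == list(from_comprehension)

print(same_type, same_values)  # True True
```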
Differences: List vs Generator Comprehension
The code inside a list comprehension and a generator comprehension is the same, so why would you use one instead of the other?
A generator comprehension doesn't evaluate a value until the next one is requested. This means the object itself can be very small in memory.
import sys
nums_list = [num for num in range(10_000)]
nums_gen = (num for num in range(10_000))
print(sys.getsizeof(nums_list))
print(sys.getsizeof(nums_gen))
Even though these will loop through exactly the same values, the list has a size of 87616 bytes and the generator object is 112 bytes. The exact numbers vary with the Python version and platform.
On the other hand, a List Comprehension is complete so it has a length and can be sorted. A List can also be sliced.
A generator can be converted to a list like this:
new_list = list(nums_gen)
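To make the trade-off concrete, here is a small sketch of what happens when list-style operations are tried on a generator. It uses a shorter range than the example above, and `itertools.islice` stands in for slicing.

```python
import itertools

nums_gen = (num for num in range(10))

# A generator has no length, so len() raises TypeError.
try:
    len(nums_gen)
    len_failed = False
except TypeError:
    len_failed = True

# Slicing syntax is not supported either; islice takes items lazily instead.
first_three = list(itertools.islice(nums_gen, 3))

# sorted() accepts any iterable, but it consumes what is left
# and returns a plain list.
rest_descending = sorted(nums_gen, reverse=True)

print(len_failed)       # True
print(first_three)      # [0, 1, 2]
print(rest_descending)  # [9, 8, 7, 6, 5, 4, 3]
```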
Infinite generator
Generators can draw on an unending iterator and produce one value after another. A list comprehension has to finish building the whole list before moving on, so trying this with a list comprehension would never finish.
Let's create an unending sequence of perfect squares. The itertools module has a count function that provides an unending sequence of integers. We can write a comprehension that only yields a value if it's a perfect square; otherwise it keeps testing until one appears.
import itertools
squares = (num
           for num in itertools.count(start=1)
           if ((int(num ** (1/2)) == num ** (1/2))) and (num != 0)
           )
Now we have an object we can iterate over that will give us an unending sequence. We do need to be careful that we don't create an infinite loop.
for square in squares:
    print(square)
This will continue to output perfect squares until we break out. Or something crashes.
One option is to use the __next__() method, which returns the next value in the sequence. The built-in next(squares) calls the same method and is the more common spelling.
for i in range(10):
    print(squares.__next__())
This loop will output the next 10 values in the sequence.
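Another way to take a fixed number of values is `itertools.islice`, which stops after the requested count. The sketch below is an alternative to the loop, not the approach used above. It also swaps the floating-point square root test for `math.isqrt`, which assumes Python 3.8 or later.

```python
import itertools
import math

# Unending generator of perfect squares, using an integer square root test.
squares = (num
           for num in itertools.count(start=1)
           if math.isqrt(num) ** 2 == num)

# islice pulls only the first 10 values, so this terminates.
first_ten = list(itertools.islice(squares, 10))
print(first_ten)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```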
Another option is to use some trigger to stop the loop. We can check if the square is divisible by 2, 3, 4 and 5.
for square in squares:
    print(square)
    if (square%2 == 0) and (square%3 == 0) and (square%4 == 0) and (square%5 == 0):
        print(square, " is divisible by 2, 3, 4 and 5")
        break
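If only the first matching square is needed, and not the printout of every square along the way, the same search can be sketched with `next()` over a filtered generator. The `math.isqrt` test (Python 3.8 or later) and the name `first_match` are choices made for this example.

```python
import itertools
import math

squares = (num
           for num in itertools.count(start=1)
           if math.isqrt(num) ** 2 == num)

# next() pulls values until the first square passes every divisibility check.
first_match = next(
    square
    for square in squares
    if all(square % divisor == 0 for divisor in (2, 3, 4, 5))
)
print(first_match)  # 900
```

A square divisible by 2, 3 and 5 must be the square of a multiple of 30, which is why the first hit is 30 squared.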
The generator keeps its place in the sequence, even across different loops. For example, if we ran the code above a second time, the sequence would start on the next item. If the sequence needs to be restarted, the comprehension has to be defined again.
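One possible way to make restarting convenient is to wrap the comprehension in a small factory function, so that every call hands back a fresh generator. `make_squares` is a hypothetical helper, and for brevity it produces squares directly instead of filtering for them.

```python
import itertools


def make_squares():
    """Hypothetical factory: returns a new unending generator of squares."""
    return (num * num for num in itertools.count(start=1))


squares = make_squares()
first_batch = list(itertools.islice(squares, 3))
second_batch = list(itertools.islice(squares, 3))  # resumes where we left off

restarted = list(itertools.islice(make_squares(), 3))  # fresh generator

print(first_batch)   # [1, 4, 9]
print(second_batch)  # [16, 25, 36]
print(restarted)     # [1, 4, 9]
```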
any() and all()
The any() and all() built-in functions can be very useful. Each takes an iterable and tells you whether any of its values are true or all of them are true, depending on which you use.
You've most likely seen these used with List Comprehensions.
all_even = all([num % 2 == 0 for num in range(10)])
any_even = any([num % 2 == 0 for num in range(10)])
The list comprehension is completed first, and then all or any is run.
If we remove the [ ], these functions receive a generator comprehension instead. The great thing is that once the function can confirm an answer, it stops. For example, once all() finds a single False it stops and returns False. For any(), the first True produces a response of True, with no need to keep going.
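The early exit can be observed by logging which values actually get tested. In this sketch, `is_even` and the `checked` list are helpers invented for the demonstration.

```python
checked = []


def is_even(num):
    """Hypothetical helper that records every number it is asked about."""
    checked.append(num)
    return num % 2 == 0


# Generator version: all() stops at the first False (num == 1).
gen_result = all(is_even(num) for num in range(10))
gen_checked = list(checked)

# List version: every value is tested before all() even starts.
checked.clear()
list_result = all([is_even(num) for num in range(10)])
list_checked = list(checked)

print(gen_result, gen_checked)          # False [0, 1]
print(list_result, len(list_checked))   # False 10
```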
Earlier we looked at how a non-ending sequence can't work with a List Comprehension. Let's see what would happen here.
any_even = any(num % 2 == 0
               for num in itertools.count(start=1))
all_even = all(num % 2 == 0
               for num in itertools.count(start=1))
In both of these cases the checking continues only until any() or all() can determine the answer. Here that happens within the first two values. With an unending sequence and a condition that never settles the question, it would take forever.
While we can put an unending iterator in these lines, we want to be very careful. all() could never return True if we used num>0 in the line above. We know that everything returned from itertools.count(start=1) is greater than 0, but all() would have to check every number first, so it would never finish.
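If a check like this is still wanted, one defensive option is to cap how many values all() is allowed to see. The sketch below does that with `itertools.islice`. The cap `LIMIT` is an arbitrary number picked for the example, and the result only says something about the values that were checked.

```python
import itertools

LIMIT = 1_000  # arbitrary cap chosen for this sketch

# Only the first LIMIT integers are handed to all(), so it must terminate.
bounded = itertools.islice(itertools.count(start=1), LIMIT)
all_positive_so_far = all(num > 0 for num in bounded)

print(all_positive_so_far)  # True
```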
Conclusion
Use a List Comprehension if you need to
- sort
- determine length
Use a Generator Comprehension if you want to
- save memory
- stop before the entire sequence finishes
Please comment with your favorite comprehension example.