Implementation of Python Generator Functions: A Complete Guide
In this article, we will discuss iterator objects in Python, how to create them with generator functions and expressions, and why they are used and preferred.
Have you ever run into memory issues while working with a very large data set, or with an infinite sequence of data? Processing the data lazily, one item at a time, through an iterator helps here, and Python generator functions make such iterators easy to write. Since generators were first introduced in PEP 255, Python has come to rely on them heavily. Generator functions let you declare a function that behaves like an iterator in an easy and efficient way.
In this article, we will discuss what iterator objects in Python are, how they can be declared using generator functions and generator expressions, and why and where they are used and preferred.
Iterators in Python
An object that can be iterated (looped) over is called an iterable. Strings, lists, and dictionaries are very common examples of iterable objects in Python. An iterator is the object that actually produces the items of an iterable one at a time while keeping track of its position; any container of data can be made to behave this way by implementing the iterator protocol.
For example, a list of prime numbers such as [3, 5, 7, 11, …] can be iterated over with a for loop, which uses an iterator behind the scenes.
An iterator is defined by a class that follows the iterator protocol, which requires the class to implement two methods: __iter__ and __next__.
Let’s go through an example that implements an iterator:
class list_even:
    def __init__(self, max):
        self.n = 2
        self.max = max

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= self.max:
            result = self.n
            self.n += 2
            return result
        else:
            raise StopIteration

even_nos = list_even(15)
print(next(even_nos))
print(next(even_nos))
print(next(even_nos))
Which results in this output:
2
4
6
As discussed, for an object to be an iterator, it must implement the __iter__ and __next__ methods. The __init__ method initializes the object with its starting value and the maximum value possible. The __iter__ method returns the iterator object itself; it is called by iter(), for example when a for loop starts. The __next__ method returns the next item in the sequence and raises the StopIteration exception when there are no more values to return.
This class-based approach is clearly lengthy, and generators come to the rescue by doing the same thing in a much simpler way.
Note: An iterator can be iterated over only once. Once it is exhausted, iterating over it again produces no values; it acts like an empty list. This follows from the fact that each call to next() advances the internal counter (self.n here) and there is no way to go back. This behavior can be customized, however.
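One common way to customize this is to separate the iterable from the iterator: the iterable’s __iter__ returns a brand-new iterator on every call, which is how built-in containers such as lists behave. A minimal sketch, where EvenNumbers is a hypothetical class name and list_even is the iterator defined above (repeated so the snippet runs on its own):

```python
class list_even:
    """The iterator from above: produces 2, 4, 6, ... up to max."""

    def __init__(self, max):
        self.n = 2
        self.max = max

    def __iter__(self):
        return self  # an iterator is its own iterator

    def __next__(self):
        if self.n <= self.max:
            result = self.n
            self.n += 2
            return result
        raise StopIteration


class EvenNumbers:
    """Hypothetical reusable iterable: every iter() call builds a fresh iterator."""

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        return list_even(self.max)


once = list_even(6)
print(list(once))  # [2, 4, 6]
print(list(once))  # [] (the iterator is already exhausted)

many = EvenNumbers(6)
print(list(many))  # [2, 4, 6]
print(list(many))  # [2, 4, 6] (a new list_even iterator each time)
```

The state (self.n) lives in the iterator, so giving each loop its own iterator is what makes the iterable reusable.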
We can also iterate over this object using a for loop:
even_nos = list_even(9)
for number in even_nos:
    print(number)
Which results in this output:
2
4
6
8
Generator Functions
Generator functions let us declare iterators in a simpler and more concise way. Let’s see how.
Generator functions look and behave just like normal functions, with one key difference: instead of returning data with the “return” statement, they produce values with the “yield” keyword. Like return, yield hands a value back to the caller, but the two behave very differently at run time.
Calling a generator function, or evaluating a generator expression, returns a special iterator called a “generator.” You can assign this generator to a variable and advance it with next(); each call runs the function’s code up to the next “yield” statement.
When execution reaches a “yield” statement, Python suspends the generator function and hands the yielded value back to the caller. (In contrast, “return” terminates the execution of the function.) The state of the function is saved while it is suspended.
This saved state includes the instruction pointer, the internal evaluation stack, and any active exception handling, along with the generator’s local variable bindings.
The next time the generator is advanced, for example with next(), execution resumes immediately after the “yield,” with all of that state intact.
Another distinction is that calling a generator function does not execute its body at all; it only creates and returns a generator object. The code inside the function runs only when next() is called on that generator object, either explicitly or by a for loop.
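The following minimal sketch makes this behavior visible. countdown() is a hypothetical generator that records each step of its body in an events list, so we can see exactly when the code runs and that the local variable n survives between yields:

```python
events = []  # records what the generator body does, and when


def countdown(start):
    events.append("body started")
    n = start
    while n > 0:
        events.append(f"yielding {n}")
        yield n  # execution pauses here; n keeps its value
        events.append(f"resumed after {n}")
        n -= 1
    events.append("body finished")


gen = countdown(2)
print(events)       # [] (calling countdown() ran none of its body)
print(next(gen))    # 2
print(events)       # ['body started', 'yielding 2']
print(next(gen))    # 1
print(events[-2:])  # ['resumed after 2', 'yielding 1']
```

A further next(gen) would resume once more, record "body finished", and raise StopIteration because the function body has ended.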
Implementation of Generator Functions
Let’s see how the generator functions can be implemented in Python. We’ll implement the iterator of generating even numbers, as previously discussed, but with the help of the generator function this time:
def list_even_generator():
    n = 0
    n += 2
    yield n
    n += 2
    yield n
    n += 2
    yield n

even_nos = list_even_generator()
print(next(even_nos))
print(next(even_nos))
print(next(even_nos))
Which results in this output:
2
4
6
Here, we define a generator function with three yield statements. Calling this function returns a generator, which is an iterator object, and the next() function is then used to extract elements from it.
The first call to next() runs the function body up to the first yield and returns “2.” The second call resumes right after that yield and returns “4,” and the third returns “6.”
As you can see, the generator function is considerably more straightforward than our class-based iterator.
Note: The above code doesn’t take a “max” value; instead, it is hard-coded to produce exactly three even numbers. It is not infinite: a fourth call to next() would raise StopIteration, because the function body ends after the third yield.
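A quick way to check how many values this generator actually produces is to ask it for one more than it yields. A minimal sketch, reusing list_even_generator() from above:

```python
def list_even_generator():
    n = 0
    n += 2
    yield n
    n += 2
    yield n
    n += 2
    yield n
    # Reaching the end of the body finishes the generator.


even_nos = list_even_generator()
print(next(even_nos), next(even_nos), next(even_nos))  # 2 4 6

try:
    next(even_nos)  # a fourth request: there is no fourth yield
except StopIteration:
    print("StopIteration raised: the generator is exhausted")
```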
Now, let’s change the above code by applying the “Don’t Repeat Yourself” (DRY) principle: we wrap the statements in a while loop and take a max value this time:
def list_even_generator(max):
    n = 2
    while n <= max:
        yield n
        n += 2

even_nos = list_even_generator(5)
print(next(even_nos))
print(next(even_nos))
print(next(even_nos))
Which results in this output:
2
4
The third call to next() then raises a StopIteration exception.
StopIteration is raised because, after “4” is yielded, “n” becomes “6,” which exceeds the maximum value of “5,” so the while loop ends and the generator finishes.
You can see from the code that we never explicitly defined the __iter__ or __next__ methods, nor raised a StopIteration exception, in our generator.
Instead, generators handle these things automatically, which makes the code much simpler to write and understand.
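To see that the protocol methods really are provided for us, we can inspect a generator object directly. A small sketch using the while-loop version of list_even_generator() from above:

```python
def list_even_generator(max):
    n = 2
    while n <= max:
        yield n
        n += 2


gen = list_even_generator(9)

# The generator object supplies both iterator-protocol methods itself.
print(hasattr(gen, "__iter__"), hasattr(gen, "__next__"))  # True True
print(iter(gen) is gen)  # True: a generator is its own iterator

# A for loop calls next() for us and stops quietly on StopIteration.
for number in gen:
    print(number)  # 2, 4, 6, 8 on separate lines
```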
Why Do We Need Generators?
Generators do not compute the value of every item when they are first created; they compute each value only when it is requested. This is what we call lazy evaluation.
Lazy evaluation is helpful when you need to work with a huge data collection: you can start using the first items right away, instead of waiting for the entire data set to be computed and stored.
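The following minimal sketch illustrates this. expensive_square() and lazy_squares() are hypothetical names; the first stands in for real per-item work such as parsing a record, and it logs every input it processes so we can see how much work was actually done:

```python
computed = []  # records every input that was actually processed


def expensive_square(n):
    """Hypothetical stand-in for costly per-item work."""
    computed.append(n)
    return n * n


def lazy_squares(numbers):
    """Yield squares one at a time, computing each only when it is requested."""
    for n in numbers:
        yield expensive_square(n)


# range(10**12) is lazy too, so no trillion-element list is ever built.
squares = lazy_squares(range(10**12))

# Use the data right away: stop at the first square above 50.
for value in squares:
    if value > 50:
        print("First square above 50:", value)  # 64
        break

print("Inputs processed:", len(computed))  # 9 (only 0 through 8)
```

Even though the source range is enormous, only the nine values needed to reach the answer are ever computed.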
For example, suppose we read a CSV file with a helper function called csv_reader() (our own function, not part of the standard library), and we need to count the number of rows in the dataset. The code looks like this:
dataset = csv_reader("dataset.csv")
row_cnt = 0
for row in dataset:
    row_cnt += 1
print("Row count is", row_cnt)
What would the implementation of csv_reader() look like? A naive version opens the dataset, loads its entire content, and splits it into a list with one element per row; the length of that list is the number of rows:
def csv_reader(file_name):
    with open(file_name) as file:
        # Reads the whole file into memory at once, then splits it into lines
        res = file.read().splitlines()
    return res
This implementation works fine if the dataset is small. With large files, however, we may run into a MemoryError. The reason is that file.read() loads the entire dataset into memory at once, and splitting it builds a list holding every line as well, so the system may slow to a crawl or even crash.
The defining characteristic of a generator function is that it saves the state of its variables between yields, so it can hand out one row at a time instead of building a complete list. Reading a large dataset with a generator function therefore avoids the memory error, because the file is never loaded into memory all at once: each row is read and processed before the next one is read.
The code is shown below:
def csv_reader(file_name):
    with open(file_name, "r") as file:
        # Iterating over a file object reads one line at a time
        for row in file:
            yield row
Here, the file is opened and read one line at a time. Each row is yielded to the for loop in the main code, which increments row_cnt and moves on, so no list of rows is ever built. This code should produce output like the following without any memory errors:
Row count is 8769576394
The code first accesses the file, then loops through each line, yielding each row rather than returning it.
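For a runnable version of the row count, the sketch below writes a small, made-up CSV file to a temporary location and counts its rows with the generator-based csv_reader(). It also shows csv.reader() from the standard library, which reads lazily in the same way but additionally splits each line into fields:

```python
import csv
import os
import tempfile


def csv_reader(file_name):
    """Generator version from above: yields one line of the file at a time."""
    with open(file_name, "r") as file:
        for row in file:
            yield row


# Hypothetical sample data, written to a temporary file for this demo.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,alpha\n2,beta\n3,gamma\n")
    path = tmp.name

try:
    row_cnt = 0
    for row in csv_reader(path):  # only one line is held in memory at a time
        row_cnt += 1
    print("Row count is", row_cnt)  # Row count is 4 (header + 3 data rows)

    # csv.reader is also an iterator: it parses one row per next() call.
    with open(path, newline="") as file:
        reader = csv.reader(file)
        header = next(reader)
    print(header)  # ['id', 'name']
finally:
    os.remove(path)
```

For real CSV data with quoted fields or embedded newlines, csv.reader() is the safer choice, since splitting raw lines does not handle quoting.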
A similar use case is generating an infinite sequence, which is common in many applications.
In Python, the built-in range() is used to obtain a finite sequence, as shown in the example below:
a = range(4)
list(a)
Which results in this output:
[0, 1, 2, 3]
To get an infinite sequence, however, range() cannot be used, because it always needs a stop value. A generator function can produce one, and because it computes one value at a time, it never exhausts the computer’s finite memory. The code is:
def inf_seq():
    n = 0
    while True:
        yield n
        n += 1
First, the variable “n” is initialized before the endless loop begins. Inside the loop, “n” is yielded, which suspends the function and saves its state, and is then incremented for the next request. This resembles how range() works, but without an upper bound.
Calling next() on the generator object generates the next element. If we iterate over this infinite sequence with a for loop instead, it keeps printing values until you stop it with a keyboard interrupt (Ctrl+C).
# An infinite loop: prints 0 1 2 3 ... until interrupted
for i in inf_seq():
    print(i, end=" ")

# Using next():
>>> seq = inf_seq()
>>> next(seq)
0
>>> next(seq)
1
>>> next(seq)
2
>>> next(seq)
3
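Because a for loop over inf_seq() never ends on its own, a common pattern is to take only as many items as you need. A minimal sketch using itertools.islice() from the standard library, which also shows itertools.count(), a built-in iterator that produces the same unbounded sequence:

```python
import itertools


def inf_seq():
    n = 0
    while True:
        yield n
        n += 1


# islice stops after 5 items, so this terminates even though inf_seq() never does.
first_five = list(itertools.islice(inf_seq(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# itertools.count() yields 0, 1, 2, ... just like inf_seq().
print(list(itertools.islice(itertools.count(), 5)))  # [0, 1, 2, 3, 4]
```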
What can be done using such an infinite sequence?
For example, you could search for Armstrong numbers (numbers equal to the sum of their digits, each raised to the power of the number of digits, such as 153 = 1³ + 5³ + 3³) or for palindromes (strings or numbers that read the same forward and backward), which some applications may require.
Given below is an example:
for i in inf_seq():
    arms = is_armstrong(i)
    if arms:
        print(i)
Here, is_armstrong() is a function (not shown) that checks whether the given input is an Armstrong number. Like the for loop over inf_seq() above, this loop never ends on its own.
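Since is_armstrong() is not shown, here is one possible implementation, written for this sketch rather than taken from the original, combined with filter() and itertools.islice() so that the search stops after a fixed number of matches:

```python
import itertools


def inf_seq():
    n = 0
    while True:
        yield n
        n += 1


def is_armstrong(number):
    """Hypothetical helper: True if number equals the sum of its digits,
    each raised to the power of the number of digits."""
    digits = str(number)
    power = len(digits)
    return number == sum(int(d) ** power for d in digits)


# filter() is lazy, so it tests one candidate at a time;
# islice() stops the search after the first 14 matches.
first_armstrong = list(itertools.islice(filter(is_armstrong, inf_seq()), 14))
print(first_armstrong)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 153, 370, 371, 407]
```

Every single-digit number qualifies trivially (d¹ = d), which is why the list starts with 0 through 9.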
Let’s go through another example of a generator function: an infinite sequence of Fibonacci numbers, in which each number is the sum of the previous two:
def fibonacci():
    a = 0
    b = 1
    while True:
        yield a
        a, b = b, a + b

seq = fibonacci()
print(next(seq))
print(next(seq))
print(next(seq))
print(next(seq))
print(next(seq))
Which results in this output:
0
1
1
2
3
With a generator, we can keep requesting Fibonacci numbers for as long as we like, because only the current pair of values (a and b) is stored at any time. Had we instead tried to store this infinite series in a list, filling it with a for loop, we would have run out of memory.
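A bounded part of this infinite generator can be taken without a manual loop. A small sketch using itertools.takewhile() from the standard library, which stops consuming the generator at the first value that fails the condition (here, an arbitrary limit of 100):

```python
import itertools


def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# takewhile stops at the first Fibonacci number >= 100 (which is 144).
fib_below_100 = list(itertools.takewhile(lambda x: x < 100, fibonacci()))
print(fib_below_100)
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```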
Generator Expressions
Generator expressions, like list comprehensions, let you create a generator object with very little code. They are useful in the same situations as list comprehensions, with the extra advantage that the complete sequence is never created and stored in memory before iteration.
In other words, a generator expression needs only a small, fixed amount of memory, no matter how many items it produces. Syntactically, the expression is surrounded by “()” instead of the “[]” of a list comprehension. Here’s an illustration that squares some numbers:
>>> squared_lc = [num**2 for num in range(4)]
>>> squared_gc = (num**2 for num in range(4))
>>> squared_lc
[0, 1, 4, 9]
>>> squared_gc
<generator object <genexpr> at 0x016fbc90>
Here, the first line builds a list using square brackets, whereas the second uses parentheses to create a generator expression. Evaluating squared_gc shows a generator object rather than a list of values (the memory address will differ on your machine); its squares are computed only when you iterate over it.
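To make the memory difference concrete, here is a small sketch that compares a list comprehension with the equivalent generator expression using sys.getsizeof(). The exact byte counts depend on the Python version and platform, so only the relative sizes matter; note also that getsizeof() reports the list’s own storage, not the integer objects it refers to:

```python
import sys

n = 1_000_000
squares_list = [num**2 for num in range(n)]  # all one million values are computed now
squares_gen = (num**2 for num in range(n))   # nothing is computed yet

print(sys.getsizeof(squares_list))  # several megabytes for the list's pointer storage
print(sys.getsizeof(squares_gen))   # small, and does not grow with n

# Both can be consumed the same way, for example by sum():
print(sum(squares_list) == sum(squares_gen))  # True

# Like any iterator, a generator can be consumed only once:
print(sum(squares_gen))  # 0 (already exhausted)
```

When a generator expression is the only argument to a function call, its parentheses can be dropped, as in sum(num**2 for num in range(4)), which returns 14.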
Conclusion
In this article, we discussed in detail what generator functions in Python are, how, why, and where they can be used. To summarize:
- Generator functions are used to declare and create iterable objects in a very simple and efficient manner.
- Generator functions are similar to regular functions, except that they use a “yield” statement instead of “return.” The state of the function is saved at each yield and resumed from there the next time the generator is advanced (for example, with next()).
- Calling a generator function returns an iterator known as a generator, which can be stored in a variable and consumed with next() or a for loop.
- Generator functions are very useful for reading very large files or working with infinite sequences, because they don’t load everything into memory at once and therefore avoid memory errors.
- Generator expressions are analogous to list comprehensions but create generators instead of lists. The only difference in the syntax is using “()” instead of “[].”
Opinions expressed by DZone contributors are their own.