Simplifying Access to Db2 Databases in Jupyter Notebook
See how to simplify access to Db2 databases in Jupyter Notebook.
Join the DZone community and get the full member experience.
Join For FreeAre you trying to figure out the best way to access Db2 data from within your Jupyter Notebook? Or perhaps you are already using a technique and are looking for ways to simplify things? If so, did you know that there are three ways of connecting to your existing Db2 data?
- Use native Python Db2 API calls to connect and manipulate the data
- Take advantage of Pandas built-in support of databases
- Install extensions to Jupyter notebooks (Magic commands)
Getting access to Db2 data from within a notebook requires that you use the following command to install the appropriate Db2 drivers either from your notebook or from a shell prompt.
!easy_install ibm_db
Once you have the Db2 drivers installed, you need to import the ibm_db libraries into your notebook.
import ibm_db
import ibm_db_dbi
At this point, you will need to know something about the low-level Db2 Python APIs in order to connect to your database. At a minimum, you will need the database name, the host system that the database resides on (or the IP address), the port to communicate with, the protocol to use, the userid to connect as, and the password for the userid.
Once you've gotten all of the connection information, you need to create a connection string that includes all of the information using the following format:
dsn = "DRIVER={{IBM DB2 ODBC DRIVER}};" + \
"DATABASE=SAMPLE;" + \
"HOSTNAME=localhost;" + \
"PORT=50000;" + \
"PROTOCOL=TCPIP;" + \
"UID=DB2INST1;" + \
"PWD=db2inst1;"
The first connection command that is required for any access to the database is the ibm_db.connect command. This connection handle (hdbc) is required for any communication with the database.
hdbc = ibm_db.connect(dsn, "", "")
An additional connection command is required to enable database access through Pandas.
hdbi = ibm_db_dbi.Connection(hdbc)
After you have successfully connected to the database, you can access Db2 through either Pandas or low-level Db2 Python calls. Details on the Db2 Python API can be found at https://github.com/IBM/db2-python.
Accessing Db2 data through Pandas is relatively straightforward. The first step is to create a string with the SQL that you want to execute:
sql = 'SELECT * FROM EMPLOYEE'
Using the Pandas read_sql function will issue the appropriate call through the ibm_db_dbi driver:
df = pandas.read_sql(sql,hdbi)
The data returned from Db2 is placed into a Python variable (df), which contains the entire answer set. Issuing df.head(5) returns the first five records of the answer set.
In some cases, it may not be practical to return all of the data in a Pandas dataframe. For instance, if you are retrieving a large answer set (millions of rows), it may exceed the capacity of your system. If you need to do any insert or update statements against the data, you will need to use lower-level Db2 Python calls.
The same example above can be written using Db2 Python APIs, but you require substantially more code to make it work.
hdbc = ibm_db.connect(dsn, "", "")
sql = 'SELECT * FROM EMPLOYEE FETCH FIRST ROW ONLY'
resultSet = ibm_db.exec_immediate(hdbc, sql)
row = ibm_db.fetch_tuple(resultSet)
comma = ""
while (row != False):
for column in row:
print(comma,end="")
print(column,end="")
comma = ","
print()
row = ibm_db.fetch_tuple(resultSet)
The results from this code are shown below.
000060,IRVING,F,STERN,D11,6423,2003-09-14,MANAGER ,16,M,1975-07-07,72250.00,500.00,2580.00
This example completely ignores all of the error checking that should be done (you do check for errors, right?) so you would usually require more code than this! Usually you want to use the low-level API calls so that you can control the formatting of the data, or you want to issue updates, inserts, or other SQL calls that aren't always available through Pandas.
The final technique for accessing Db2 involves Jupyter magic command extensions. These extensions give you a combination of Pandas dataframes and low-level access in a single package. The Db2 magic commands are documented in the following GitHub repository (https://github.com/DB2-Samples/db2jupyter) and include detailed instructions on how to install the extensions as well as a Docker file that you can use to build a test system.
There is one file (db2.ipynb) that must be run within a notebook in order to enable the Db2 magic commands (%sql). There is no requirement to import the ibm_db libraries or to issue the low-level connect commands. This is all handled from within the magic command.
%run db2.ipynb
Magic commands are special commands within the Jupyter notebook environment that start with a percent sign (%) and are handled differently in command boxes. Rather than having the text interpreted as Python code, the text is sent to a special process (the Db2 magic code) to be interpreted.
The first example of connecting to a Db2 database is accomplished through the %sql connect statement.
%sql connect to sample host localhost port 50000 \
user db2inst1 using db2inst1
Once the connection is established, retrieving data from the database can be done by issuing the SQL statement directly with the magic command.
%sql select * from employee fetch first 3 rows only
The results are immediately displayed in the notebook.
The %sql magic command already retrieves the data in a Pandas Dataframe, so if you need to manipulate the data further, you can assign the results directly to a Python variable.
df = %sql select * from employee
All of the commands associated with dataframes can now be applied to this variable.
df.tail(5)
The %sql command can also retrieve the data as a standard Python array through the use of the -r flag:
df = %sql -r select firstnme, lastname from employee fetch first 5 rows only
The Python variable contains an array of values representing the rows and columns of the answer set.
df
[['FIRSTNME', 'LASTNAME'],
['IRVING', 'STERN'],
['EVA', 'PULASKI'],
['EILEEN', 'HENDERSON'],
['THEODORE', 'SPENSER'],
['VINCENZO', 'LUCCHESSI']]
The Db2 %sql magic command has extended capabilities which allow the user to:
CREATE and DROP objects
Create complex procedures, triggers, user-defined functions
Issue multiple statements in a single code block
Create cursors for retrieval (rather than extract all of the data at once)
INSERT, UPDATE, and DELETE data
Create COMMIT scopes and change the autocommit behavior
Run stored procedures
The Db2 %sql command simplifies the development of SQL scripts without the user needing to deal with many of the underlying API calls.
More details of how to use the Db2 magic command are found on GitHub (https://github.com/DB2-Samples/db2jupyter) and a hands-on lab is available at https://www.ibm.com/demos/collection/Db2-Advanced-Enterprise-Server-Edition/?lc=en if you don't want to install all of the software yourself. You can try the lab at your own pace with a dedicated virtual machine and find out how easy it is to access Db2 from within your Jupyter notebook projects.
In summary, you can access Db2 from within a Jupyter notebook using Db2 Python API calls, Pandas, or through the use of magic commands. Which one you choose depends a lot on your requirements, but you may want to check out the Db2 magic commands to make your SQL development a little easier.
Opinions expressed by DZone contributors are their own.
Comments