Using GPU in TensorFlow Model
This tutorial explains how to increase our computational workspace by making room for TensorFlow GPU.
In our last TensorFlow tutorial, we studied embeddings in TensorFlow. Today, we will study how to increase our computational workspace by making room for TensorFlow GPU. Moreover, we will look at device placement logging and manual device placement in TensorFlow GPU, and we will discuss optimizing GPU memory. We will also cover using a single GPU in a multi-GPU system and using multiple GPUs in TensorFlow.
Let's begin!
GPU in TensorFlow
Your system may comprise multiple devices for computation, and as you already know, TensorFlow supports both CPUs and GPUs, which it represents as strings. For example:
- The CPU of your machine is addressed as “/cpu:0”.
- TensorFlow GPU strings have an index starting from zero. Therefore, to specify the first GPU, you should write “/device:GPU:0”.
- Similarly, the second GPU is “/device:GPU:1”.
By default, if your system has both a CPU and a GPU, TensorFlow gives priority to the GPU.
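To verify which devices TensorFlow can actually see on your machine, you can list them at runtime. Here is a minimal check, assuming TensorFlow 1.x (device_lib is a commonly used internal helper module, not part of the public API):
from tensorflow.python.client import device_lib
# Print the name and type (CPU or GPU) of every device TensorFlow detects.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)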
Device Placement Logging
You can find out which devices handle particular operations by creating a session with the log_device_placement configuration option set to True.
import tensorflow as tf

# Graph creation.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Create a session that logs which device runs each operation.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operation.
print(sess.run(c))
The output of TensorFlow GPU device placement logging is shown below:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22. 28.]
[ 49. 64.]]
Manual Device Placement
At times, you may want to decide which device your operation should run on. You can do this by creating a context with tf.device, in which you assign the specific device (a CPU or a GPU) that should do the computation, as shown below.
# Graph creation.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
# The matmul below has no explicit device, so TensorFlow is free to place it.
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operation.
print(sess.run(c))
The above code assigns the constants a and b to cpu:0. Because no device is explicitly declared for the matmul operation, TensorFlow places it on a GPU by default if one is available, copying the multi-dimensional arrays between devices as required.
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22. 28.]
[ 49. 64.]]
Optimizing TensorFlow GPU Memory
By default, TensorFlow maps nearly all of the GPU memory visible to the process. This is done to use the GPU's memory resources more efficiently by reducing memory fragmentation.
TensorFlow offers two configuration options to control this behavior when you want the process to allocate only a subset of the available GPU memory, or to grow its allocation only as needed. These TensorFlow GPU optimizations are described below.
allow_growth allocates GPU memory dynamically at runtime: it initially allocates very little memory and then grows the region as sessions run and more GPU memory is needed by the process. The memory is never released, because releasing it could lead to memory fragmentation, which is undesirable. ConfigProto is used for this purpose:
config = tf.ConfigProto()
# Start with a small allocation and grow GPU memory usage as needed.
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
per_process_gpu_memory_fraction is the second option; it determines the fraction of the total visible memory that should be allocated for each GPU the process uses. The example below tells TensorFlow to allocate 40 percent of each GPU's memory:
config = tf.ConfigProto()
# Cap this process at 40 percent of each visible GPU's memory.
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)
Use this option only when you already know the memory requirements of your computation and are sure they will not change during processing.
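To translate a fixed memory budget into a fraction, divide the budget by the card's total memory. A small sketch; the 12 GB figure is an assumption for a Tesla K40-class card like the one in the logs above, so adjust it for your hardware:
# Derive the fraction from a fixed budget (12 GB total is an assumption).
total_gpu_memory_gb = 12.0
budget_gb = 4.8
fraction = budget_gb / total_gpu_memory_gb  # 0.4, matching the example above
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = fraction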
Single GPU in Multi-GPU System
On systems with multiple GPUs, TensorFlow selects the device with the lowest index by default. If that default is not the device you need, you must specify the GPU explicitly:
# Creates a graph.
with tf.device('/device:GPU:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
An InvalidArgumentError is raised when the specified GPU does not exist, as shown below:
InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/device:GPU:2'
[[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
values: 1 2 3...>, _device="/device:GPU:2"]()]]
If you want TensorFlow to fall back to a supported device automatically when the specified device does not exist, set allow_soft_placement in the configuration options when the session is created, as illustrated by the code below.
with tf.device('/device:GPU:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Fall back to a supported device if GPU:2 is unavailable.
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))
# Running the operation.
print(sess.run(c))
Using Multiple GPU in TensorFlow
You may already be familiar with towers in TensorFlow: each tower is a replica of the model's computation, and we can assign each tower to a different GPU, producing a multi-tower model for working with multiple GPUs. Let's see an example:
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    # Sum the per-GPU results on the CPU (renamed to avoid shadowing sum()).
    total = tf.add_n(c)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operations.
print(sess.run(total))
The output of TensorFlow GPU is as follows:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[ 44. 56.]
[ 98. 128.]]
You can test this multiple-GPU model with a simple dataset such as CIFAR-10 to experiment with and understand working with GPUs, as sketched below.
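To go beyond constants, the same tower pattern can drive training by splitting each batch across GPUs and averaging the per-tower gradients. The sketch below is a minimal illustration, not the canonical CIFAR-10 tutorial code; the one-layer model, the layer sizes, and the assumption of two visible GPUs are all placeholders to adapt:
import tensorflow as tf

NUM_GPUS = 2  # assumed number of visible GPUs

def tower_loss(x, y):
    # Hypothetical one-layer model; replace with your real network.
    logits = tf.layers.dense(x, 10)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

x = tf.placeholder(tf.float32, [None, 32 * 32 * 3])  # e.g. flattened CIFAR-10 images
y = tf.placeholder(tf.int64, [None])

opt = tf.train.GradientDescentOptimizer(0.01)
x_splits = tf.split(x, NUM_GPUS)
y_splits = tf.split(y, NUM_GPUS)
tower_grads = []
for i in range(NUM_GPUS):
    with tf.device('/device:GPU:%d' % i):
        # Reuse the same variables in every tower after the first.
        with tf.variable_scope('model', reuse=(i > 0)):
            loss = tower_loss(x_splits[i], y_splits[i])
        tower_grads.append(opt.compute_gradients(loss))

with tf.device('/cpu:0'):
    # Average each variable's gradient across towers, then apply once.
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = tf.stack([g for g, _ in grads_and_vars])
        averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
    train_op = opt.apply_gradients(averaged)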
Conclusion
In this tutorial, we saw how TensorFlow uses GPUs, arrays of parallel processors that, in contrast to CPUs, work together to perform heavy numerical computation. The tutorial also showed how to check where operations are placed, change the default device configurations to suit your needs, and optimize GPU memory for your computation. If you have any questions or thoughts, feel free to comment below.