A Beginner's Guide to High-Performance Computing in Python
The Taichi Programming Language is an attempt to extend the Python programming language with constructs that enable general-purpose, high-performance computing.
Join the DZone community and get the full member experience.
Join For Freeimport this
in a Python console, it will recite a little poem:
Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. The complex is better than complicated. The flat is better than nested. Sparse is better than dense. Readability counts...
Taichi: Best of Both Worlds
Algorithmic Overview
N
by N
grid of point-masses, where adjacent points are linked by springs. The following image, provided by Matthew Fisher, illustrates this structure:
- Gravity
- Internal forces of the springs
- Damping
- Collision with the red ball in the middle
t = 0
. Then, at each step of the simulation, it advances time by a small constant dt
. The program estimates what happens to the system in this small period of time by evaluating the effect of each of the 4 factors above, and updates the position and velocity of each mass point at the end of the timestep. The updated positions of mass points are then used to update the image rendered on the screen.
Getting Started
pip install taichi
.
ti
:
import taichi as ti
ti.init(arch=ti.cuda)
If you don't have a CUDA GPU, Taichi can still interact with your GPU via other graphics APIs, such as ti.metal, ti.vulkan, and ti.opengl. However, Taichi's support for these APIs is not as complete as its CUDA support, so, for now, use the CPU backend: ti.init(arch=ti.cpu)And don't worry, Taichi is blazing fast even if it only runs on the CPU. Having initialized Taichi, we can start declaring the data structures used to describe the mass-spring cloth. We add the following lines of code:
N = 128
x = ti.Vector.field(3, float, (N, N))
v = ti.Vector.field(3, float, (N, N))
These three lines declare x
and v
to be 2D arrays of size N
by N
, where each element of the array is a 3-dimensional vector of floating-point numbers. In Taichi, arrays are referred to as "field"s, and these two fields respectively record the position and velocity of the point masses. Notice that, if you initialized Taichi to run on a CUDA GPU, these fields/arrays will be automatically stored in GPU memory. Apart from the cloth, we also need to define the ball in the middle:
ball_radius = 0.2
ball_center = ti.Vector.field(3, float, (1,))
Here, the ball center is a 1D field of size 1, with its single component being a 3-dimensional floating-point vector. Having declared the fields needed, let's initialize these fields with the corresponding data at t = 0
. We wish to ensure that, for any pair of adjacent points on the same row or column, the distance between them is equal to cell_size = 1.0 / N
. This is ensured by the following initialization routine:
def init_scene():
for i, j in ti.ndrange(N, N):
x[i, j] = ti.Vector([i * cell_size,
j * cell_size / ti.sqrt(2),
(N - j) * cell_size / ti.sqrt(2)])
ball_center[0] = ti.Vector([0.5, -0.5, 0.0])
No need to worry about the meaning behind the value of each x[i,j]
-- it is only chosen so that the cloth falls down at the 45 degrees angle as shown in the gif.
Simulation
@ti.kernel
def step():
for i in ti.grouped(v):
v[i].y -= gravity * dt
There are two things to be noted here. Firstly, for i in ti.grouped(x)
means that the loop will iterate over all elements of x
, regardless of how many dimensions there are in x
. Secondly, and most importantly, the annotation ti.kernel
means that taichi will automatically parallelize any top-level for-loops inside the function. In this case, taichi will update the y
component of each of the N*N
vectors in v
in parallel.
Next up, we will handle the internal forces of the strings. First, notice from the previous illustration that each mass point is connected to at most eight neighbors. These links are represented in our program as follows:
links = [[-1, 0], [1, 0], [0, -1], [0, 1], [-1, -1], [1, -1], [-1, 1], [1, 1]
links = [ti.Vector(v) for v in links]
From a physical perspective, each spring s
in the system is initialized with a rest length, l(s,0)
. At any time t
, if the current length l(s,t)
of s
exceeds l(s,0)
, then the spring will exert a force on its endpoints that pulls them together. Conversely, if l(s,t)
is smaller than l(s,0)
, then the spring will push the endpoints away from each other. The magnitude of these forces is always proportional to the absolute value of l(s,0)-l(s,0)
. This interaction is captured by the following code snippet:
for i in ti.grouped(x):
force = ti.Vector([0.0,0.0,0.0])
for d in ti.static(links):
j = min(max(i + d, 0), [N-1,N-1])
relative_pos = x[j] - x[i]
current_length = relative_pos.norm()
original_length = cell_size * float(i-j).norm()
if original_length != 0:
force += stiffness * relative_pos.normalized() *
(current_length - original_length) /
original_length
v[i] += force * dt
Notice that this for
loop should still be placed as a top-level for
loop in the substep
function, which was annotated with @ti.kernel
. This ensures that the spring forces applied to each mass point are computed in parallel. The variable stiffness
is a constant which controls the extent to which the springs resist change in their length. In our program, we will use stiffness = 1600
.In the real world, when springs oscillate, the energy stored in the springs dissipates into the surrounding environment, and its oscillations eventually stop. To capture this effect, at each timestep, we slightly reduce the magnitude of the velocity of each point:
for i in ti.grouped(x):
v[i] *= ti.exp(-damping * dt)
damping
takes the fixed value of 2
.
if (x[i]-ball_center[0]).norm() <= ball_radius:
v[i] = ti.Vector([0.0, 0.0, 0.0])
And finally, we update the position of each mass point using its velocity:
x[i] += dt * v[i]
Rendering
vertices
, and a field of indices
. The vertices
fields is a 1-dimensional field where each element extract is a 3D vector that represents the position of a vertex, possibly shared by multiple triangles. In our application, every point mass is a triangle vertex, so we can simply copy data from x
into vertices
:
vertices = ti.Vector.field(3, float, N * N)
@ti.kernel
def set_vertices():
for i, j in ti.ndrange(N, N):
vertices[i * N + j] = x[i, j]
Notice that set_vertices
needs to be called every frame, because the vertex positions are constantly being updated by the simulation.
N
by N
grid of mass points, which can also be seen as an N-1
by N-1
grid of small squares. Each of these squares will be rendered as two triangles. Thus, there are a total of (N - 1) * (N - 1) * 2
triangles. Each of these triangles will be represented as 3 integers in the vertices
field, which records the indices of the vertices of the triangle in the vertices
field. The following code snippet captures this structure:
num_triangles = (N - 1) * (N - 1) * 2
indices = ti.field(int, num_triangles * 3)
@ti.kernel
def set_indices():
for i, j in ti.ndrange(N, N):
if i < N - 1 and j < N - 1:
square_id = (i * (N - 1)) + j
# 1st triangle of the square
indices[square_id * 6 + 0] = i * N + j
indices[square_id * 6 + 1] = (i + 1) * N + j
indices[square_id * 6 + 2] = i * N + (j + 1)
# 2nd triangle of the square
indices[square_id * 6 + 3] = (i + 1) * N + j + 1
indices[square_id * 6 + 4] = i * N + (j + 1)
indices[square_id * 6 + 5] = (i + 1) * N + j
Notice that, unlike set_vertices
, the function set_indices
only needs to be called once. This is because the indices of the triangle vertices don't actually change -- it's only the positions that are changing.
ball_center
and ball_radius
variable that we previously defined are all that's needed by GGUI.
Putting Everything Together
init()
set_indices()
window = ti.ui.Window("Cloth", (800, 800), vsync=True)
canvas = window.get_canvas()
scene = ti.ui.Scene()
camera = ti.ui.make_camera()
while window.running:
for i in range(30):
step()
set_vertices()
camera.position(0.5, -0.5, 2)
camera.lookat(0.5, -0.5, 0)
scene.set_camera(camera)
scene.point_light(pos=(0.5, 1, 2), color=(1, 1, 1))
scene.mesh(vertices, indices=indices, color=(0.5, 0.5, 0.5), two_sided = True)
scene.particles(ball_center, radius=ball_radius, color=(0.5, 0, 0))
canvas.scene(scene)
window.show()
One small thing to note is that instead of calling step()
once, we will call it 30 times for each frame in the main program loops. The purpose of this is just so the animation doesn't run too slowly. Putting everything together, the entire program should look something like this:
import taichi as ti
ti.init(arch=ti.cuda) # Alternatively, ti.init(arch=ti.cpu)
N = 128
cell_size = 1.0 / N
gravity = 0.5
stiffness = 1600
damping = 2
dt = 5e-4
ball_radius = 0.2
ball_center = ti.Vector.field(3, float, (1,))
x = ti.Vector.field(3, float, (N, N))
v = ti.Vector.field(3, float, (N, N))
num_triangles = (N - 1) * (N - 1) * 2
indices = ti.field(int, num_triangles * 3)
vertices = ti.Vector.field(3, float, N * N)
def init_scene():
for i, j in ti.ndrange(N, N):
x[i, j] = ti.Vector([i * cell_size ,
j * cell_size / ti.sqrt(2),
(N - j) * cell_size / ti.sqrt(2)])
ball_center[0] = ti.Vector([0.5, -0.5, -0.0])
@ti.kernel
def set_indices():
for i, j in ti.ndrange(N, N):
if i < N - 1 and j < N - 1:
square_id = (i * (N - 1)) + j
# 1st triangle of the square
indices[square_id * 6 + 0] = i * N + j
indices[square_id * 6 + 1] = (i + 1) * N + j
indices[square_id * 6 + 2] = i * N + (j + 1)
# 2nd triangle of the square
indices[square_id * 6 + 3] = (i + 1) * N + j + 1
indices[square_id * 6 + 4] = i * N + (j + 1)
indices[square_id * 6 + 5] = (i + 1) * N + j
links = [[-1, 0], [1, 0], [0, -1], [0, 1], [-1, -1], [1, -1], [-1, 1], [1, 1]]
links = [ti.Vector(v) for v in links]
@ti.kernel
def step():
for i in ti.grouped(x):
v[i].y -= gravity * dt
for i in ti.grouped(x):
force = ti.Vector([0.0,0.0,0.0])
for d in ti.static(links):
j = min(max(i + d, 0), [N-1,N-1])
relative_pos = x[j] - x[i]
current_length = relative_pos.norm()
original_length = cell_size * float(i-j).norm()
if original_length != 0:
force += stiffness * relative_pos.normalized() * (current_length - original_length) / original_length
v[i] += force * dt
for i in ti.grouped(x):
v[i] *= ti.exp(-damping * dt)
if (x[i]-ball_center[0]).norm() <= ball_radius:
v[i] = ti.Vector([0.0, 0.0, 0.0])
x[i] += dt * v[i]
@ti.kernel
def set_vertices():
for i, j in ti.ndrange(N, N):
vertices[i * N + j] = x[i, j]
init_scene()
set_indices()
window = ti.ui.Window("Cloth", (800, 800), vsync=True)
canvas = window.get_canvas()
scene = ti.ui.Scene()
camera = ti.ui.make_camera()
while window.running:
for i in range(30):
step()
set_vertices()
camera.position(0.5, -0.5, 2)
camera.lookat(0.5, -0.5, 0)
scene.set_camera(camera)
scene.point_light(pos=(0.5, 1, 2), color=(1, 1, 1))
scene.mesh(vertices, indices=indices, color=(0.5, 0.5, 0.5), two_sided = True)
scene.particles(ball_center, radius=ball_radius, color=(0.5, 0, 0))
canvas.scene(scene)
window.show()
The total number of lines: 91.
Fun Things to Do
- [Easy] Mess around with the parameters: see how varying the
stiffness
,damping
, anddt
changes the behavior of this program. - [Easy] Search the program text for
vsync=True
, and change it tovsync=False
. This will remove the 60 FPS limit on the program. Observe how fast the program can run on your machine. - [Medium] Implement a slightly more complicated interaction between the cloth and the ball: make it slide down the ball without penetrating it.
- [Medium] Add more balls: make the cloth interact with more than one ball.
- [Hard] Having completed the second challenge, try to implement the same program in another programming language, or in Python but without Taichi. See what's the maximum FPS that you can obtain, and what's the amount of code that you need to write in order to obtain similar performance.
Parting Words
- Simulate a mass-spring system with over ten thousand mass points and around a hundred thousand springs.
- Using the
@ti.kernel
annotation, automatically parallelize the simulation via a CUDA GPU or multi-threading on CPU - Render the result in real-time via a GPU renderer.
To learn more about Taichi, please do visit its Github Page, where you can find detailed documentation, as well as quite a few examples of Taichi programs, all of which are quite amusing. And, finally, if you believe in the mission of making a friendly yet powerful language for parallel computation, you are more than welcome to join Taichi as an open-source contributor.
Published at DZone with permission of Dunfan Lu. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments