3D Matrix Transformations

You may have found yourself wondering how computers are able to draw 3-dimensional objects. Sure, they have 3-dimensional data to work with, but how do they represent it on a flat screen with no depth? This article will attempt to explain the concept of 3D viewing transformations and the mathematics behind it.

To represent a 3D object mathematically, it needs to be defined in terms of 3-dimensional vertices. A 3-dimensional vertex consists of 3 values representing the point's position with respect to three axes (x, y, and z). On a computer, this data can be stored in a 4x1 matrix. Why a 4x1? Because 3D transformations use transformation matrices that are 4x4, and a 4x4 matrix can only be multiplied with a column matrix that has 4 rows. Since a point only ever has 3 coordinate values, the fourth value is always set to 1.
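For example, here is how a single vertex could be stored, using NumPy as a stand-in for a matrix class (the point (2, 3, 5) is just a placeholder value):

```python
import numpy as np

# A 3D vertex (x, y, z) stored as a 4x1 column matrix.
# The fourth value is the one that is always set to 1, which is what
# lets the point be multiplied by 4x4 transformation matrices.
vertex = np.array([[2.0],
                   [3.0],
                   [5.0],
                   [1.0]])

print(vertex.shape)  # prints (4, 1)
```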

A 3D object requires many of these vertices in order to be defined completely, but for now we will focus on a single point as an example. Now that we have our vertex, we need some information about the viewer. For one thing, we need to know where the observer is located with respect to the axes. This is referred to as the vantage point, and it can be located anywhere around the 3D object.

The next piece of information we need is a point called Pr, the point that the viewer is looking at.

The last piece of information we need is the orientation of the viewer. We have already defined the viewer's location, but we also need to know which way is up for the viewer. This is given as a vector, called V.

Now that we have all of our data, we need to do some math. This is pretty simple stuff, involving vector subtraction, normalization, and cross products. The first step is to calculate the vector N, which is found by subtracting the point being looked at from the vantage point. N represents the direction and distance from the point being looked at to the vantage point.

From N, we can find another vector, n, by dividing N by its magnitude. The magnitude of a vector is the square root of the sum of the squares of its components. This operation of dividing a vector by its magnitude is also known as "normalizing" the vector.

Next, we need a vector L, which is the cross product of V and N.

Now we need to normalize L to get l.

Finally, we need one last vector, m. m is the cross-product of n and l.
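Here is a sketch of all of those steps in NumPy, continuing the example above. I am assuming the vantage point is called Pv, the look-at point is Pr, and the up vector is V, matching the descriptions earlier; the coordinate values are just placeholders.

```python
def normalize(vec):
    # Divide a vector by its magnitude (the square root of the
    # sum of the squares of its components).
    return vec / np.linalg.norm(vec)

Pv = np.array([10.0, 10.0, 10.0])  # vantage point (where the viewer is)
Pr = np.array([0.0, 0.0, 0.0])     # point the viewer is looking at
V  = np.array([0.0, 1.0, 0.0])     # which way is "up" for the viewer

N = Pv - Pr          # vector from the look-at point to the vantage point
n = normalize(N)     # N divided by its magnitude
L = np.cross(V, N)   # cross product of V and N
l = normalize(L)     # L divided by its magnitude
m = np.cross(n, l)   # cross product of n and l
```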

That was the hard part! The rest is fairly easy. Now, we need to make our transformation matrices!

The first transformation matrix that I will introduce is the model-to-view transformation matrix, which converts a point's model coordinates into coordinates relative to the viewer. Calm down, all we need to do is put some numbers in there and do three dot products.

The dot product is not hard. Let's say you have two vectors. To find the dot product, you multiply the x values from the two vectors, multiply the y values, and multiply the z values. Note that these are three separate operations, so you end up with three separate terms. Once you have those 3 values, you just add them together and you have your dot product!
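For example, the dot product of (1, 2, 3) and (4, 5, 6) is 1×4 + 2×5 + 3×6 = 4 + 10 + 18 = 32.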

In this case, each dot product is between the vantage point and one of the normalized vectors (l, m, and n), and each result is multiplied by -1 afterwards. The only other values in the matrix are the x, y, and z components of those same vectors. This matrix is the most complicated of the viewing transformation matrices.
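Here is a sketch of how that matrix could be built in code, continuing the NumPy example. The exact row and column layout is my assumption: it places each normalized vector in a row and the negated dot products in the last column, which is the usual arrangement when vertices are 4x1 column matrices.

```python
# Model-to-view transformation matrix.
# Rows hold the x, y, and z components of l, m, and n; the last
# column holds the dot product of the vantage point with each of
# those vectors, multiplied by -1.
model_to_view = np.array([
    [l[0], l[1], l[2], -np.dot(Pv, l)],
    [m[0], m[1], m[2], -np.dot(Pv, m)],
    [n[0], n[1], n[2], -np.dot(Pv, n)],
    [0.0,  0.0,  0.0,   1.0],
])
```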

The next matrix we need is the view-to-device transformation matrix. This transforms the viewing coordinates into device coordinates. All you need to know for this matrix is the width and height of the display area in pixels. In the matrix, these values are denoted dw and dh, respectively.

There is nothing complicated here, just plug in dw divided by two and dh divided by two.
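Continuing the sketch, here is a common form consistent with that description: it scales by dw/2 and dh/2 and also shifts the origin to the middle of the display area. The translation terms in the last column are my assumption.

```python
dw, dh = 640.0, 480.0  # display width and height in pixels (placeholders)

# View-to-device transformation matrix: scale x by dw/2 and y by dh/2,
# then shift the origin to the center of the screen.
view_to_device = np.array([
    [dw / 2, 0.0,    0.0, dw / 2],
    [0.0,    dh / 2, 0.0, dh / 2],
    [0.0,    0.0,    1.0, 0.0],
    [0.0,    0.0,    0.0, 1.0],
])
```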

Those are the only matrices required to transform a 3D coordinate into a 2D coordinate for a screen. However, no modern graphics pipeline stops at only those transformations. There are two more we need to talk about. The first is the scaling matrix, which requires a scale factor. The scaling factor determines how much larger or smaller the object should be than its original size. For example, if you wanted the object to look twice as large, the scaling factor would be 2. For half as large, the factor would be 0.5.

This matrix is not complicated at all; you only need to put the scaling factor in place of the S values in the matrix.
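As a sketch, with the scale factor on the diagonal:

```python
S = 2.0  # scale factor: 2 doubles the size, 0.5 halves it

# Scaling matrix: the S values sit on the diagonal for x, y, and z.
scale = np.array([
    [S,   0.0, 0.0, 0.0],
    [0.0, S,   0.0, 0.0],
    [0.0, 0.0, S,   0.0],
    [0.0, 0.0, 0.0, 1.0],
])
```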

The final matrix is the perspective matrix. This matrix allows you to add perspective effects, which add realism to 3D representations. Perspective causes objects that are farther away to look smaller than ones that are close up; think of a grid of parallel lines that appear to converge as they recede into the distance.

This might seem like something complicated to execute, but it is really easy.

All of the values in the perspective transformation matrix are fixed except for one, which is the negative reciprocal of zf (that is, -1/zf). zf is the focal distance: it determines how far the viewer is from the screen, and it should always be positive. If you want, you can just pick a number, like 25, and use that as the focal distance (most applications do this).
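Here is a sketch of one common form of that matrix, continuing the NumPy example. Where exactly the -1/zf entry sits depends on the convention being used; I am assuming it goes in the bottom row, which is what produces the h value we will divide by in a moment.

```python
zf = 25.0  # focal distance; always positive

# Perspective matrix: identity except for a single entry, the
# negative reciprocal of zf.
perspective = np.array([
    [1.0, 0.0, 0.0,       0.0],
    [0.0, 1.0, 0.0,       0.0],
    [0.0, 0.0, 1.0,       0.0],
    [0.0, 0.0, -1.0 / zf, 1.0],
])
```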

OK, now that we have all these matrices, what do we do with them all? Well, we multiply them together to get the mother of all transformation matrices, the model-to-device transformation matrix. Remember that matrix multiplication is not commutative, so the order in which you multiply them matters. It is best to let a calculator or program do the work, although you can do it by hand as a test.
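Continuing the sketch, and assuming one reasonable ordering (scale the model, transform it into viewing coordinates, apply perspective, then map to device coordinates):

```python
# With 4x1 column vertices, the matrix applied first goes on the right,
# so read this product from right to left.
model_to_device = view_to_device @ perspective @ model_to_view @ scale
```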

Now that we have our model-to-device transformation matrix, we can render our 3D object! To transform a 3D model coordinate, you simply multiply the model-to-device transformation matrix by the vertex's 4x1 matrix to get another 4x1 matrix. This new 4x1 matrix contains the data we need.

The x and y values in the result are not pure screen coordinates; they are still multiplied by a remnant of the perspective transformation, so the matrix actually holds xh, yh, and, as its bottom value, h. So what do we do? Simply divide the xh and yh values by h, and you have your 2D screen coordinates.
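Putting it all together with the sketch above, the placeholder vertex from the beginning ends up on screen like this:

```python
# Transform the model-space vertex; the result is another 4x1 matrix
# holding xh, yh, zh, and h (the bottom value).
result = model_to_device @ vertex

h = result[3, 0]
screen_x = result[0, 0] / h  # divide xh by h
screen_y = result[1, 0] / h  # divide yh by h

print(screen_x, screen_y)    # the 2D point to plot on screen
```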

Wow, that seemed like quite a bit of work to convert 3D to 2D, didn't it? That is why powerful computers are needed to render 3D graphics quickly. You can easily implement this method in a computer program using a matrix class. Luckily, you probably won't ever have to write a program to do this since most of this grunt work is now done by the graphics adapters themselves. Libraries like OpenGL and DirectX simply pass the 3D coordinates to the graphics adapters, which do the math and plot the points on your screen, saving you a lot of programming effort. Neat eh?