My first semester at UMD has generally been a rewarding experience. After not taking any classes during the last year of my master's studies, I am back to being a “student” again, taking courses… Fortunately, all is going well, and the projects I have done this semester have been exciting and have helped me experience new APIs (OpenCL, and some MPI and OpenMP) and new methods, such as the work I will briefly present here, which I generalized as terrain tessellation on GPU for this post…
Previously, I believed that block-based approaches were the way to go for fast processing (rendering) of data on GPU, and I disliked the methods that rely on tricks which slow down performance without helping rendering quality much (unless you are very strict about error bounds and approximation quality). Well, the video shows that you can parallelize such “rich” approaches as well, and they can work nearly as efficiently as block-based methods which generate tessellation with variable/multi-resolution models on GPU. Still, there are a few more places that can be improved to make it even faster, and we have additional ideas to extend the current feature set. I hope I can share more in the future; let me keep them a secret for now :P
And this also marks the first post I share on my new domain :) All the data of the website and blog was transferred to the new domain without any loss or pain… Thanks for reading!
Another course project is over, and here are some screen captures and my comments on the results.
The first idea of the project was to make a character walk on terrain, so it was named “Animating Interactive Adaptive Locomotion Motion on Heightfield” in the project proposal. I wanted the physical simulation to drive the animation, and I directed my literature search and the project accordingly. You will not see an implemented walking motion in the videos or the report, mainly because other things took much of the implementation time (character rigging, a more advanced physics wrapper for ODE along with its integration into OpenREng, physics stability, fixing heightfield collision resolution, the UI and the PD controller concept). The implementation is based on Mandel's code, and the ideas generally follow his approach. I suppose I will continue working on the physics wrapper a little more and upload the code to the OpenREng project, so that others can easily integrate physics into their applications that use OpenREng (I hope it will be the case for some :) )
I believe the results of this project can be a beginning for further studies. Much work has been done on motion capture data, but physical character controllers have not been studied nearly as much, although there are some key studies on related issues (you may find some of those in the references section of the report below). As future work, there can be many controllers which try to add balance to certain motions, or the available controllers may be modified for different motion styles and character properties. Note that physical quantities such as mass directly affect the simulation of the character. PD controllers resemble key-frames integrated into a physical simulation, but the main difference is how you integrate the object. The way you approach the problems in articulated body animation will surely need to change in dynamic simulations. To me, working in terms of torques/forces seems to be much more natural and to produce richer motions than working with simple interpolations and motion graphs.
Given the fast computation time of the torques to be applied, it can give some “life” to characters even when they fall down. Its effectiveness/cost ratio is thus very high, I believe :) If I were to add a rag-doll support to a game, I would surely add simple PD controllers on falling characters, at least :)
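To give an idea of how cheap this computation is, here is a minimal sketch of a PD controller on a single hinge joint. It is not the code from the project or from Mandel's implementation; the function names, the gains `kp` and `kd`, the inertia and the time step are all assumptions picked for illustration.

```python
def pd_torque(target_angle, angle, angular_velocity, kp, kd):
    """Torque that pulls a joint toward target_angle while damping its motion."""
    return kp * (target_angle - angle) - kd * angular_velocity


def simulate_joint(target_angle, kp=50.0, kd=10.0, inertia=1.0, dt=0.01, steps=1000):
    """Integrate one hinge joint that is driven only by the PD torque.

    All default values are illustrative assumptions, not tuned parameters.
    """
    angle, angular_velocity = 0.0, 0.0
    for _ in range(steps):
        torque = pd_torque(target_angle, angle, angular_velocity, kp, kd)
        angular_velocity += (torque / inertia) * dt  # semi-implicit Euler step
        angle += angular_velocity * dt
    return angle


if __name__ == "__main__":
    print(simulate_joint(1.0))  # settles close to the 1.0 rad target
```

The whole controller is one multiply-add per joint, which is why it is so tempting to run it even on characters that are already falling.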
Driving the motion completely through the use of physical controllers (which output torques on joints) is the ultimate goal as I can observe.
For more details, you can find the report here.
Note: The musical score in the video is The Blue Danube, a waltz by Strauss. The recording is in the public domain, and it fits the funny moves of the starring character, the little boxy green man :)
Note2: How hard could it be to make a high resolution video? Well, quite hard, I can say. Screen capture was the first phase, and I finally did it using the ffdshow / xvid codecs at 1280×800 resolution and 60 fps; the captures ran smoothly. For editing, I tried Wax 2.0, which allowed me to select the decoder to be used in the compositing stage, but the video preview was very slow and the UI was not helpful. Editing the recorded clips (even cut/paste) took a lot of time; not identifying the cut frames, but rendering those clips into smaller ones! I had to wait over 10 minutes for a 30 sec clip, only to get a high resolution but very jagged clip, whatever codec I tried. It is an issue with Wax, I believe. Microsoft Movie Maker came to the rescue then. I lost the HD output option there, but I could easily cut the recorded videos, insert titles between clips, apply transitions and watch the output “in real-time”. That’s why quality management makes a difference: the application should “just work”.
The introduction of internal texture types that can store unclamped floating point data was a huge improvement in the OpenGL specifications. Imagine not being able to use any floating point storage in your ‘regular’ programs, and you can guess how heavy a limitation this was for the basic OpenGL data storage, textures. The floating point data can actually come in many sizes, such as 32 bit, 16 bit and even 8 bit. I have used them in my terrain application with some precaution and recently migrated to unsigned integer textures for reasons soon to be explained. I do not mean that they are not useful, only that I had been using a type that did not fit my problem, just because using it was relatively simple. The migration involved updating multiple parts of the code, and I will describe those in the last part of this post. Now, let me move on to the situation I have recently been through…
As has been stated elsewhere on this site, I am mostly working on terrain structures during my M.S. studies (I have heard that terrains are so basic and so well studied that research on them would not be possible, but that turned out not to be the case :) ) I had decided to use unsigned shorts (16 bit) to store height values. 8 bit resolution (only 256 possible height values) is so low that you cannot represent even the simplest terrains; 16 bit seemed just right (and has been used extensively as far as I know).
Next, I should note that I am using GPU textures to render / update these height values as well. I generate / load height data in 16 bit resolution in main memory and send this basic data to the graphics processor. For this purpose, I had used 16 bit floats, until now. The most basic difference between the past and the present is that I have moved to OpenGL 3.0 contexts, which made things much easier and more expressive in almost every way. Firstly, I converted the old “alpha” channel textures to “red” channel textures, a small update. Yet the problem was more fundamental, as you may have guessed: storing fully utilized 16 bit unsigned short data as 16 bit floating point data was not the correct way to go. As a side-effect, I experienced lots of cracks on the heightfields where different textures were sampled at the same location, meaning they stored different height values, although not substantially different ones.
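To see why a 16 bit float cannot hold a fully utilized 16 bit integer range, here is a small sketch that pushes every possible height through an IEEE half-precision float and back. It uses Python's `struct` module with the `e` (half float) format as a stand-in for what the GPU stores; the function names are made up for this example, and the normalization to 0-1 is an assumption about how the data was uploaded.

```python
import struct


def to_half_and_back(value):
    """Round a Python float to IEEE half precision and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def half_roundtrip_stats(max_value=65535):
    """Store every height h as h / max_value in a half float.

    Returns (number of distinct stored values, worst error in height units).
    """
    distinct = set()
    worst_error = 0.0
    for h in range(max_value + 1):
        stored = to_half_and_back(h / max_value)
        distinct.add(stored)
        worst_error = max(worst_error, abs(stored * max_value - h))
    return len(distinct), worst_error


if __name__ == "__main__":
    count, error = half_roundtrip_stats()
    print(count, "distinct values survive out of 65536; worst error:", error)
```

A half float has only 11 bits of significand, so most of the 65536 input heights collapse onto a much smaller set of stored values, and heights in the upper half of the range can be off by several units. That is the kind of mismatch that shows up as cracks when two textures disagree about the same location.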
Using floating point texture formats was relatively simple:
- I converted the 16bit height data to floating points and uploaded it to 2D textures.
- Sampling this texture on the shaders just gave me a floating value in the range 0-1. I just re-scaled the sample value to world-space height when required.
- Attaching floating buffers as a render target was relatively simple. While writing the height as a fragment data, no scaling was required, so I directly passed in the new height value when required.
- When I needed to read-back texture data using glGetTexImage, I specified color channel and requested component type as float, and that was it.
How would you convert this 16 bit floating point pipeline to a 16 bit unsigned short pipeline? If you think changing only the internal texture type is enough, you will soon see it is a bit more involved than that :) To keep it short and concise, here are the required “concept” changes when you want to use non-normalized integer textures instead of floating-point textures:
- You first need to change the internal type of the texture to a format that is stored as a signed/unsigned integer. Some of your options are GL_R32I, GL_R32UI, GL_R16I, GL_R16UI, etc. Noticed the I / UI trend in the defs?
- If you need to upload data to these images (such as height data), then comes the next required change: specify an INTEGER pixel format along with your data pointer. Some of the options are GL_RED_INTEGER and GL_RGBA_INTEGER; you can guess the rest.
- And now, you need to sample this texture to make it useful. And you cannot do it with your mother’s sampler function, texture2D, which can only sample “float” data. A brief look at the latest specs shows you a new direction: the texture() functions!, which can return floating or integer data.
- The change above broke another part: you need to pass a usampler2D value to texture() for it to return uint values. Notice the preceding u, which is for “unsigned” texture sampler? Note that with this update, you can only use that sampler for texture types with unsigned integer values.
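One consequence of the steps above is that the sample is no longer a value in the 0-1 range but the raw integer, so the scaling to world-space height has to be done explicitly. The arithmetic is sketched below in Python for clarity; in practice it would live in the shader. The function names and the `min_height` / `max_height` parameters are hypothetical and stand for whatever height range your terrain uses.

```python
MAX_U16 = 65535  # largest value a 16 bit unsigned height can take


def raw_to_height(raw, min_height, max_height):
    """Map a raw 16 bit sample to a world-space height."""
    return min_height + (raw / MAX_U16) * (max_height - min_height)


def height_to_raw(height, min_height, max_height):
    """Inverse mapping, used when writing a new height back as an integer."""
    normalized = (height - min_height) / (max_height - min_height)
    return round(normalized * MAX_U16)


if __name__ == "__main__":
    h = raw_to_height(32768, -50.0, 150.0)
    print(h, height_to_raw(h, -50.0, 150.0))
```

Since both directions go through the same integer, two textures that store the same raw value always agree on the height.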
The steps above described how you can upload / sample unsigned integer data. Next comes the more interesting part: how do you update these textures using fragment shaders? First, you may naively attach these textures as render targets and try to write data using the old standard gl_FragData[n] fragment shader output, and, if you watch the shader compilation warnings, you may wonder why gl_FragData is noted as deprecated. This is because gl_FragData is a floating point target, but you can also attach textures with an unsigned integer internal type, which implies that you need to output other types of data from a fragment shader. Here are the steps involved to successfully update integer textures in fragment shaders:
- Specify your fragment shader outputs using “out” qualifiers, and do not use gl_FragData and its variants any more. An example statement would be “out uint newHeight”. You can use variables declared as out as you please in your shader statements.
- Since the output is now user-defined, you need to manage which user defined variable writes to which render target. You need glBindFragDataLocation or glGetFragDataLocation for this purpose. The first one allows you to control the assigned render target indexes to outputs, while the latter lets you know the target bound by OpenGL to the variable name you give.
- As a last step, you need to manage the draw buffers (glDrawBuffers), in other words, which output render target index maps to which color attachment of the framebuffer.
- If you need to read back data from an integer texture using glGetTexImage, you also need modifications, which have NOT been carefully stated in the OpenGL 3.1 specifications :) You need to change the “type” parameter to an integer type (GL_UNSIGNED_SHORT in my case), and also the “format” parameter to a color format having _INTEGER as a suffix, such as GL_RED_INTEGER. The requirement to use INTEGER formats was not stated in the specifications. I don’t know if it is some driver bug that causes such behavior, but when you think about it, it seems natural to change the format parameter as well.
Well, this has been a long kind-of tutorial on some of the details involved in using non-normalized integer formats in OpenGL texture objects. As you can clearly see, without performing all the little adjustments and using the new OpenGL features, it can be quite a struggle to achieve what you need :) I hope the information presented above will be helpful to those interested. Comments are highly welcome.
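The pieces that have to agree with each other can be summarized as below. This is only a sketch, not code from my application: the Python names are made up for this summary, the table lists the client type that matches each storage size (OpenGL accepts other integer types too), and the shader identifiers `heightTex`, `uv` and `newHeight` are hypothetical.

```python
# Matching parameters for a few single-channel integer internal formats.
INTEGER_FORMATS = {
    "GL_R16UI": {"format": "GL_RED_INTEGER", "type": "GL_UNSIGNED_SHORT",
                 "sampler": "usampler2D", "shader_output": "uint"},
    "GL_R32UI": {"format": "GL_RED_INTEGER", "type": "GL_UNSIGNED_INT",
                 "sampler": "usampler2D", "shader_output": "uint"},
    "GL_R16I": {"format": "GL_RED_INTEGER", "type": "GL_SHORT",
                "sampler": "isampler2D", "shader_output": "int"},
    "GL_R32I": {"format": "GL_RED_INTEGER", "type": "GL_INT",
                "sampler": "isampler2D", "shader_output": "int"},
}

# GLSL 1.30 fragment shader that reads and writes an unsigned integer texture.
FRAGMENT_SHADER = """#version 130
uniform usampler2D heightTex;  // unsigned integer sampler
in vec2 uv;
out uint newHeight;            // user-defined output for the render target
void main() {
    uint h = texture(heightTex, uv).r;  // raw, non-normalized value
    newHeight = h;
}
"""


def upload_parameters(internal_format):
    """Return the (format, type) pair to use for upload and read-back."""
    entry = INTEGER_FORMATS[internal_format]
    return entry["format"], entry["type"]


if __name__ == "__main__":
    print(upload_parameters("GL_R16UI"))
```

With a shader like this, `newHeight` would be tied to a render target index with glBindFragDataLocation before the program is linked, and the same format / type pair is what goes into both the upload call and glGetTexImage.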
I implemented a skeleton loader / renderer in my 3D playground application for a course project, which was about interpolating animated values. For Euler-angle joint rotations, linear, Catmull-Rom and Cardinal (with a tension parameter) interpolations are supported. For quaternion based joint rotations, normalized linear, spherical and Catmull-Rom interpolations are supported. The translational data is interpolated as usual, using the degree of the selected interpolation scheme.
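For the Euler-angle channels, the Cardinal scheme can be sketched as a cubic Hermite curve whose tangents are scaled by the tension. This is a minimal sketch for one scalar channel, not the code from the project; the function name is made up, and the tension convention used here (0 gives Catmull-Rom, 1 gives zero tangents) is an assumption, since other conventions exist.

```python
def cardinal(p0, p1, p2, p3, t, tension=0.0):
    """Interpolate between p1 and p2 for t in [0, 1] using a Cardinal spline.

    p0 and p3 are the neighbouring keys and only shape the tangents.
    tension=0.0 reduces to Catmull-Rom (assumed convention).
    """
    m1 = (1.0 - tension) * (p2 - p0) / 2.0  # tangent at p1
    m2 = (1.0 - tension) * (p3 - p1) / 2.0  # tangent at p2
    t2 = t * t
    t3 = t2 * t
    h00 = 2.0 * t3 - 3.0 * t2 + 1.0  # cubic Hermite basis functions
    h10 = t3 - 2.0 * t2 + t
    h01 = -2.0 * t3 + 3.0 * t2
    h11 = t3 - t2
    return h00 * p1 + h10 * m1 + h01 * p2 + h11 * m2


if __name__ == "__main__":
    keys = [0.0, 10.0, 30.0, 35.0]  # e.g. one joint angle at four key-frames
    print([cardinal(*keys, t=s / 4.0) for s in range(5)])
```

The curve always passes through the two middle keys; the tension only changes how strongly the neighbours bend it on the way.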
Below, you can find a video that can give an overview of the features / interpolations supported (I know, the video may seem a bit boring at first, but you can hopefully read the text fields and track the mouse clicks :) ). The animations are correctly timed and the ability to playback and sub-sample animations using the 2D GUI was a helpful feature indeed :)
And I want to share an interesting reference I found online about slerp’ing operation on rotational data. Slerp is known as the de-facto way of interpolating such data, but as usual, it should be approached with a bit of precaution…
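For reference, here is a minimal sketch of slerp next to its cheaper cousin, normalized linear interpolation (nlerp). It is not the project's implementation; quaternions are assumed to be unit length and stored as `(w, x, y, z)` tuples, and the 0.9995 threshold for falling back to nlerp is an arbitrary choice.

```python
import math


def _dot(q0, q1):
    return sum(a * b for a, b in zip(q0, q1))


def nlerp(q0, q1, t):
    """Normalized linear interpolation between two unit quaternions."""
    if _dot(q0, q1) < 0.0:  # take the shorter arc
        q1 = tuple(-c for c in q1)
    q = tuple((1.0 - t) * a + t * b for a, b in zip(q0, q1))
    length = math.sqrt(_dot(q, q))
    return tuple(c / length for c in q)


def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = _dot(q0, q1)
    if dot < 0.0:  # take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly parallel: sin(theta) is tiny, nlerp is safer
        return nlerp(q0, q1, t)
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    print(slerp(identity, quarter_turn_z, 0.5))
```

Slerp keeps the angular velocity constant, while nlerp follows the same arc at a slightly varying speed for far fewer trigonometric calls, which is part of why slerp deserves a second look before being used everywhere.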
I gave a small talk in the seminar class in my university related to my thesis topic basics. It just gives an overview of the problems and the methods, no detailed discussions are included because of the time constraints.
You can grab the presentation from this link.
And this marks my first post about my thesis related studies :)
The problem is simple: You want to render a 3D sphere in your application, but you can only render simple triangles, so only an approximation is possible. If you use a low-level graphics library such as OpenGL, no such high-level support exists (for many reasons that are beyond the scope of this post :) ) The GLU (GL utility) library had supporting tessellation functions, but that library has become old, and new OpenGL versions may not even have linkage to it (as with OpenGL ES). So, how can you create individual triangles that resemble a sphere? Moreover, you want control over how detailed your mesh is (a simple approximation which uses little vertex data may be enough for distant objects, an introduction to LoD 101 :) )
One approach uses latitudes and longitudes (slices and stacks, in other words), and it was what GLU supported. The problem with this approach is that the areas of the triangles vary significantly. Consider the area of a slice around the equator and the area of a slice around a pole and you will see what I mean :) I won’t go into the details of this method, and will jump to a better approximation technique, one based on sub-division surfaces and an interesting geometric object, the icosahedron.
Wikipedia describes the geometric properties of an icosahedron, which is basically “a regular polyhedron with 20 identical equilateral triangular faces, 30 edges and 12 vertices.” When viewed from a distance, it might look like a sphere! But what if you need higher resolution? Then comes the sub-division algorithm. The algorithm is simply described in the OpenGL Redbook. But we have a minor problem there: You cannot send vertex data vertex-by-vertex. You need to generate the vertices and pack them into a single array.
You may prefer using indexing rather than in-order triangulation, since a vertex will be used 5 or 6 times in an icosahedron based sphere sub-division. The basic sub-division algorithm is recursive, so managing vertex AND index buffers is a tough job. Generating a vertex only once, book-keeping its index and re-using that index is one hell of a job if you use a recursive algorithm, and I have found that you cannot really do it just by passing index parameters / references to the recursive function. So, I removed the index buffer and inserted a vertex into the buffer whenever it is to be rendered, as stated in the tutorial (so there is some duplicate data in the buffer). A minor detail in this step is that, given a subdivision level (0 is the icosahedron itself), the number of “unique” vertices in the final subdivided shape is (12+10*(4^subdivision-1)), a useful formula which can help you allocate a vertex buffer of sufficient size. Also, remember that for a unit sphere centered at the origin, the normal and the vertex coordinate of a point are the same, so you can pass the same buffer to define both the vertex coordinate and the normal attribute of each rendered vertex, which can make it a little faster at render time :)
Yet, I had a problem with the source code in the OpenGL Redbook. The cause was that the triangle order is specified clockwise! You should convert the index order when appending vertices if you want to follow the regular approach. It cost me some time to understand why the shading had problems with the original implementation. Other icosahedron tutorials you may find online generally use converted index data, so don’t forget to watch out for the triangle face directions in your implementation :)
And I want to note one last thing related to the LoD levels. A subdivision level of 4 generates about 2500 unique vertices, which would generally be sufficient for detailed meshes. Increasing the LoD level improves both the distracting edge contours of the approximation AND the shading. Yet, the shading can also be improved if you use per-pixel lighting instead of per-vertex lighting, where a rough interpolation of normals is available per pixel. So, you can consider using a simple screen-space contour error when you need to save some vertex processing at the cost of fragment processing, or when you need to increase shading quality at a constant geometry resolution.
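If you do want an index buffer, one way around the book-keeping problem is to drop the recursion and subdivide level by level, remembering the midpoint created for each edge. The sketch below is that alternative, not the Redbook code and not what OpenREng does; the function names and the golden-ratio vertex layout are choices made for this example. It also lets you check the unique vertex count formula.

```python
import math


def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def icosphere(subdivisions):
    """Return (vertices, faces) of a unit sphere built from an icosahedron.

    Vertices are unique; faces are triples of indices into the vertex list.
    """
    t = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    vertices = [_normalize(v) for v in [
        (-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
        (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
        (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]]
    faces = [
        (0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
        (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
        (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
        (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]

    midpoint_cache = {}  # edge (low index, high index) -> midpoint index

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_cache:
            a, b = vertices[i], vertices[j]
            # Push the edge midpoint out onto the unit sphere.
            vertices.append(_normalize(tuple((x + y) / 2.0 for x, y in zip(a, b))))
            midpoint_cache[key] = len(vertices) - 1
        return midpoint_cache[key]

    for _ in range(subdivisions):
        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return vertices, faces


if __name__ == "__main__":
    for level in range(5):
        verts, tris = icosphere(level)
        print(level, len(verts), 12 + 10 * (4 ** level - 1), len(tris))
```

Because every edge is shared by exactly two triangles, the cache makes sure each midpoint is generated once, and the vertex list length matches the formula at every level.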
Lastly, this implementation is a part of OpenREng, which will soon be available for public release :)