Archive for November, 2013

An Experiment: OpenGL Class Targets

Friday, November 15th, 2013

According to the documentation, SDL_GL_CONTEXT_MAJOR_VERSION was primarily used to tell SDL which method to use to construct the window on desktop OpenGL. OpenGL 3.x introduced a new way of creating contexts, so this would have been used to suggest which way to create one (old way or new way). Of course, mobile needed a way to handle ES 1.1 vs ES 2.0 (and sort-of ES 3.0 too, but so far the ES 3.0 devices are 2.0 compatible).

Rather than just repeat what everyone seems to say and quote, I’m going to try doing just this.

I call this a “Class 2” OpenGL target, referring to an OpenGL 2.x or OpenGL ES 2.0 target (i.e. GL with Shaders).

No #ifdefs. Same code across Desktop SDL and Mobile SDL. Hypothetically speaking it should just work, and if not, I’m not sure why SDL_GL_CONTEXT_PROFILE_MASK is being anal about its mask bit when there really is only one choice (ES on Mobile, OpenGL on desktop).

The other options would be a “Class 3” OpenGL target, i.e. OpenGL 3.x and OpenGL ES 3.0, as well as “Class 1” (OpenGL 1.x and OpenGL ES 1.1).

First I need to see if “Class 2” works safely across PC and mobile devices (FYI I’m only testing Windows, Linux and Android). Then I’m curious if I can set my class to “Class 3”, and whether that safely Inits on “Class 2” devices.

The point: I’d like to just do this, and have it work everywhere.

Two lines. It’s my responsibility to determine if devices even support 3.x features (like it always has been). As far as I know, no hardware optionally goes into 2.0 mode if 3.0 is available. On desktop GL, you always get the highest version of GL. It’s on mobile GL that we see a distinct ES 1.1 versus ES 2.0 switch.
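Checking what a device actually supports usually starts with the GL_VERSION string. Here is a minimal sketch of turning that string into a “class” number, assuming the usual formats: desktop strings begin with “major.minor”, while ES strings begin with “OpenGL ES” (or “OpenGL ES-CM”/“OpenGL ES-CL” on ES 1.x). The names GLVersion, parse_gl_version and gl_class are made up for illustration, and treating GL 4.x as Class 3 is my assumption, not something the experiment tested.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Parsed pieces of a GL_VERSION string (hypothetical helper type).
struct GLVersion {
    int major = 0;
    int minor = 0;
    bool is_es = false;
};

// Reads the first "major.minor" pair found in the string.
// An unparseable string yields 0.0.
inline GLVersion parse_gl_version(const std::string& s) {
    GLVersion v;
    v.is_es = s.rfind("OpenGL ES", 0) == 0;  // true if s starts with "OpenGL ES"
    std::size_t i = 0;
    // Skip everything before the first digit (e.g. the "OpenGL ES " prefix).
    while (i < s.size() && !std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        v.major = v.major * 10 + (s[i++] - '0');
    if (i < s.size() && s[i] == '.') {
        ++i;
        while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
            v.minor = v.minor * 10 + (s[i++] - '0');
    }
    return v;
}

// Maps a version to Class 1, 2 or 3. Anything newer than 3.x is treated
// as Class 3 (an assumption), and anything unparseable falls to Class 1.
inline int gl_class(const GLVersion& v) {
    return std::max(1, std::min(v.major, 3));
}
```

In a real app the input would come from glGetString(GL_VERSION) once a context exists, so this can only confirm what you got, not predict whether context creation will succeed.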

If I come across any issues, I’ll update the post.

Results: Class 3 will not work on non-ES 3.0 mobile devices

Well this was quick. Tried both 2 and 3 on a 2.0 tablet (Galaxy Tab 3), and the 3 setting failed.

Alright then. So the experiment continues with “2” always set.

Results: Class 2

Platforms tested so far:

Untested: Tegra 2 Development Board, Google G1 (no ES 2.0), Netbooks with Intel GMA (no GL 2.x), PocketTV (MALI 400 MP), Workstation PC with NVidia GPUs (broken), GameStick (MALI 400 MP), AMD Netbook, Zotac w/ NVidia ION, AMD PC, IvyBridge Ultrabook.

The correct way to SDL_Init

Friday, November 15th, 2013

By correct, I mean the way that will work across multiple devices, and give you OpenGL (ES) shaders.

If you don’t need shaders (i.e. OpenGL 1.x, OpenGL ES 1.1), a lone SDL_Init will suffice (default behavior).

If you are using SDL_GL_LoadLibrary, you *MUST* put the calls in the following order.

I had to do it this way (between these 2 calls), otherwise Android devices with PowerVR GPUs would not work (Nexus S, Intel powered devices, Kindle Fire?, etc).

That said, I’m actually not sure of the benefits of SDL_GL_LoadLibrary. Removing it from my code works just fine. The documentation suggests having to manually use SDL_GL_GetProcAddress as a result of SDL_GL_LoadLibrary, which doesn’t sound cool.

I’ve noticed I’ve never had to do this, but I am using an updated version of GLEE (haven’t switched to GLEW yet). Huh.

TL;DR: Don’t use SDL_GL_LoadLibrary.

GPU Debugging on Android Devices

Thursday, November 14th, 2013

Here are some notes on getting the GPU/OpenGL ES debuggers working with devices.

NVidia’s Tegra tool seems the best (even has integrated mesh viewer). Qualcomm’s Adreno tool is also quite good. ARM’s MALI tool is difficult to set up (rooting), but covers the essentials. Intel’s GPA tool is a complete suite, but is only a system usage profiler for Android. Imagination’s PowerVR tools appear good, though I haven’t tested them yet (one tool requires rooting). Vivante provides no tools. Desktop tools not listed.

Before we start

Things you must do (all tools require it):

  1. Add adb.exe to the path. I.e. the Android SDK folder/platform-tools/ will do the trick.
  2. Confirm that your device is connected by doing an “adb devices” from any command prompt or shell.

If it’s not available, and you need a driver, you can hack Google’s USB driver to support your device.

  1. Open up AndroidSDKDir/extras/google/usb_driver/android_winusb.inf in a text editor.
  2. Note the “[Google.NTx86]” and “[Google.NTamd64]” sections. These are where you put stuff for 32bit Windows and 64bit Windows.
  3. Copy an existing entry to make a new ADB definition for the device you are adding. Something like this:

    ;Samsung Galaxy Tab 3
    %SingleAdbInterface% = USB_Install, USB\VID_04E8&PID_6860
    %CompositeAdbInterface% = USB_Install, USB\VID_04E8&PID_6860&MI_03

    ;Lenovo K900
    %SingleAdbInterface% = USB_Install, USB\VID_17EF&PID_75B0
    %CompositeAdbInterface% = USB_Install, USB\VID_17EF&PID_75B0&MI_01

    ;ONDA VX610W
    %SingleAdbInterface% = USB_Install, USB\VID_18D1&PID_0003
    %CompositeAdbInterface% = USB_Install, USB\VID_18D1&PID_0003&MI_01

  4. You get the VIDs and PIDs from the Windows Device Manager.
    1. From the Start Menu, right-click Computer and select Properties. Click Device Manager.
    2. Right-click the ‘broken’ device that doesn’t have a driver, and click Properties.
    3. Click the Details tab, then select Hardware IDs from the dropdown box.
    4. This is where you find your VID, PID, and MI. Copy those values into a layout like the one shown above. If there’s extra data there, you can try it if you want, but what’s important (AFAIK) is the VID, PID, and MI.
  5. Save the file, and now update the driver to use this driver. You should now have an ADB Interface. To test, do an “adb devices” from a command prompt/shell.
  6. If a device is still not working, you may need to add the VID to your adb_usb.ini file. Either open the file (C:\Users\MyName\.android\adb_usb.ini) or do something like the following:
  7. Use the same VID found in the Device Manager. For example, 2836 is the Ouya’s VID.


Qualcomm Adreno Profiler

Both USB and WiFi debugging.

Download Adreno Profiler (not the SDK):…/gaming-graphics-optimization-adreno

Add the following line to your AndroidManifest.xml file:

If you don’t know where, put it somewhere near the bottom of the file.

This is optional (tool does it automatically), but for reference the following config flag must be set. From a shell do:

Which enables profiling.

FYI: Adreno devices only support GPU debugging if a specific file is available on the device.

Do the following to confirm whether the file is on the device.

For me, it’s available on my 2nd gen Nexus 7.

NVidia PerfHud ES Tegra

Supports ADB USB and WiFi debugging.

Download the Tegra Android Development Pack here:

It’s a download installer, so you can deselect the parts you do not want.

Like Adreno, be sure to add the following line to your AndroidManifest.xml file:

The next command has to be run every time you boot/reboot your Android device:

NVidia’s tool, unfortunately, doesn’t do this automatically for you.

Add the following code to your app, somewhere before you create your OpenGL ES context.

The Code above requires the EGL headers in addition to the OpenGL ES headers.

Tested on an original Nexus 7. Haven’t tried the Ouya yet.

ARM MALI Graphics Debugger

NOTE: Requires rooted device.

Supports ADB USB and WiFi Debugging. Tricky setup.

Download is here:

Setup instructions for Android are hidden inside the install folder.

First, browse to the folder above.

Next mount the system folder like so:

Now do the following.

NOTE: My OS lacks a “cp” command. You can alternatively use “cat”. i.e. “cat mgddaemon > /system/bin/mgddaemon”. Don’t forget to redirect!

This installs the daemon and an alternative library that captures GLES messages (forwarding them to the daemon).

The daemon has one more config option, a file “/system/lib/egl/processlist.cfg”. If you put the name of your app in that file (i.e. if using stock SDL2), then that will be the only process traced. Otherwise, omitting the file will cause the daemon and library to trace all running GLES apps (which can confuse the debugger).

Shut down all instances of your app, then decide if you want a USB or WiFi connection.

For USB, do the following:

With that in place, the address you enter in the debugger is the local machine (i.e. localhost), on port 5002.

For WiFi, you’re already ready. Just look up the device’s IP address in either Settings->Networking or About.

Finally, start the daemon.

And you should now be able to punch in the IP address (either the device or localhost) into the Mali Graphics Debugger App.

Tested on an obscure Chinese tablet called the ONDA VX610W (Allwinner A10 SOC) running Android 4.1. I also have a GameStick, but I haven’t tried that yet (didn’t want to root it).

To disable the capture library, copy your backup (egl.cfg.bak) to “egl.cfg”.

You might want to keep a copy of the mgd version around, for ease.

Another tip: I made a typo in one of my filenames (egl.cgf). Running the daemon will allow the Mali Graphics Debugger to connect to the device, but if “egl.cfg” isn’t the correct file (the mgd version) then you will get no trace data.

Imagination (PowerVR) PVRTrace and PVRTune

NOTE: PVRTrace requires a rooted device.

Supports ADB USB and WiFi debugging (depending on tool).

Download here, as part of the PowerVR SDK.

Have only done some initial tests with this. As it turns out, none of my PowerVR GPU devices are working with my SDL2 code (both ARM and x86 CPUs), so I can’t really verify if this is working yet (well I can, but I don’t want to).

Intel Graphics Performance Analyzer (GPA) for Android

Supports ADB USB debugging.

Download is here:

Like others, be sure to add the following line to your AndroidManifest.xml file:

After that, Intel GPA System Analyzer will just work. This is a profiler tool with realtime usage graphs.

Unfortunately, this is the only tool in the Intel GPA suite supported by Android.

Tested on a Lenovo K900, and a Samsung Galaxy Tab 3.

OpenGL Extension GL_KHR_debug

This is an interesting extension available on certain GPUs (MALI 600 series GPUs, all OpenGL 4.3+ drivers). It provides far better debug logging of errors and things in OpenGL and OpenGL ES.

Some useful links:…/easier-opengl-es-debugging-on-arm-mali-gpus-with-gl_khr_debug/

The original extension, available on AMD GPUs.

Squirrel Stack Tracking

Friday, November 8th, 2013

Squirrel native dev lives and dies by the stack, so here are some notes on the effect each function has on the stack.

NOTE: SQUserPointer’s are noted as void*’s (since that’s what they really are).
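To make the “Stack +1 / Stack -N” notation concrete, here is a small sketch of the bookkeeping habit behind it: record the stack height before a sequence of calls and compare it afterwards. MiniStack and StackDelta are hypothetical stand-ins written for this note, not part of the Squirrel API; with the real VM the height would come from sq_gettop(vm).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a VM value stack; NOT the Squirrel API.
struct MiniStack {
    std::vector<int> slots;
    int top() const { return static_cast<int>(slots.size()); }  // plays the role of sq_gettop
    void push(int v) { slots.push_back(v); }                    // Stack +1
    void pop(std::size_t n) {                                   // Stack -N
        assert(n <= slots.size());
        slots.resize(slots.size() - n);
    }
};

// Remembers the stack height at construction so the net effect of a
// sequence of calls can be compared against the documented delta.
class StackDelta {
public:
    explicit StackDelta(const MiniStack& s) : stack_(s), start_(s.top()) {}
    int delta() const { return stack_.top() - start_; }
private:
    const MiniStack& stack_;
    int start_;
};

// Example: push two values, consume both, push one result.
inline int add_top_two(MiniStack& s) {
    StackDelta guard(s);
    s.push(2);
    s.push(3);
    const std::size_t n = s.slots.size();
    const int sum = s.slots[n - 1] + s.slots[n - 2];
    s.pop(2);
    s.push(sum);
    assert(guard.delta() == 1);  // net effect of this function: Stack +1
    return sum;
}
```

The same check at the end of every native function catches a forgotten pop early, before the imbalance shows up somewhere unrelated.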


Virtual Machine

Stack 0

Stack ?? (depends on args: 0, -1, +1, or both)

Stack +1

Stack -1

Pops a value and sets it to the value found on the stack.


If (successful) Stack +1 else Stack 0

Use sq_gettop(vm) to check if it was a success.

Stack Operations

Stack 0

Stack +1

Stack -1

Stack +N or -N

Stack -N

Object Creation and Handling

Stack +1

Notably sq_typeof() also returns values like OT_FLOAT, but specific info can be retrieved from the stack. DO NOT FORGET ABOUT THE STACK! POP IF UNUSED!

Stack ?? (+1, or optionally 0)

Stack -N and +1

Stack 0

SQR – shorthand for SQRESULT (for use with SQ_SUCCESS() and SQ_FAILURE()).

Stack -1 then +1 (effectively 0)

Stack -1

Member names are actual members of classes. For example, “x” in a vec2 class. sq_setbyhandle and sq_getbyhandle are used to read/write data to class members referenced by the Member Handle. Complicated yes.


Stack ?? (-N arguments, 0 or +1 returns)

Stack 0

Stack +1

Stack 0 or +1

Stack -1

Object Manipulation

Stack -1

Stack -2

TODO: Push value first, then key?

Stack -3

TODO: Push value first, then key?

Stack ?? (-1 or 0 if return)

Stack ??

Stack -1 then +1 (effectively 0)

Stack 0

Stack +1

Raw Object Handling

Stack 0

Stack +1

Revised thoughts on Squirrel Math

Friday, November 8th, 2013

Been thinking a bunch about the vector math classes mentioned in the previous post (vec2, vec3, etc). I was ready to try proposing “something” to let you add .anything to a class to access members hidden in (say) userdata. As it turns out, that already works.

The _get and _set metamethods talk about indexes in the documentation, so I mistakenly assumed they were for creating array-like syntax (maybe they are too), but in actuality they do exactly what I want: let you handle .anything

This is, of course, a Squirrel implementation.

A better use would be to write these metamethods in native code. Also instead of a class, use a UserData type to hold the true data (vector parts, a matrix, a quaternion), and attach all functionality using a delegate.

Using native has the added advantage when it comes to the operators (metamethods). Types can be checked far quicker this way (as constant values like OT_FLOAT), so my concerns about wasted time doing checks for each type shouldn’t be much of a problem anymore. I have to give somewhere, and the flexibility Squirrel provides is worthwhile.
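As a rough illustration of why the native route is cheap, here is a sketch of a multiply that branches on a type tag with a single switch. The Tag enum and Value struct are simplified stand-ins I made up for the VM’s object type constants (OT_INTEGER, OT_FLOAT and friends); a real native metamethod would read the tag from the VM stack instead.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical type tags, standing in for constants like OT_INTEGER / OT_FLOAT.
enum class Tag { Integer, Float, Vec2 };

struct Vec2 { float x, y; };

// A minimal tagged value; a real VM object carries much more.
struct Value {
    Tag tag;
    long i;
    float f;
    Vec2 v;
};

inline Value make_int(long i)     { return Value{Tag::Integer, i, 0.0f, {0.0f, 0.0f}}; }
inline Value make_float(float f)  { return Value{Tag::Float, 0, f, {0.0f, 0.0f}}; }
inline Value make_vec2(Vec2 v)    { return Value{Tag::Vec2, 0, 0.0f, v}; }

// Native-style multiply for vec2: one switch on the tag picks the operation.
inline Vec2 vec2_mul(Vec2 lhs, const Value& rhs) {
    switch (rhs.tag) {
        case Tag::Integer: {
            const float s = static_cast<float>(rhs.i);
            return Vec2{lhs.x * s, lhs.y * s};             // scalar multiply
        }
        case Tag::Float:
            return Vec2{lhs.x * rhs.f, lhs.y * rhs.f};     // scalar multiply
        case Tag::Vec2:
            return Vec2{lhs.x * rhs.v.x, lhs.y * rhs.v.y}; // component-wise
    }
    throw std::invalid_argument("unsupported operand type");
}
```

The comparison is an integer test rather than a string compare on a type name, which is where the speed-up over a script-side typeof check would come from.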

Creating instances though is the question.

We could have a delegate called “vec2_delegate”, and a native function called “vec2” (like how JavaScript classes work). Have the function push the UserData structure on to the stack equal to the values passed, then attach the vec2_delegate to it. The delegate has a _typeof method that returns “vec2”. Finally, so long as it’s the only new thing on the stack, the vec2 function says it returned a value, and thus it will be assigned by reference to the caller’s variable.

Copy Constructor implementation will be native, detected the same way as before, but no longer a cloning issue as it’ll be native code.

Just a few things to figure out:

1. How to “throw null” natively (required by _get and _set).

2. How to differentiate between UserData types (vec2, vec3, mat4, etc).

3. How to create and populate UserData.

Unrelated, but sq_getmemberhandle(v,idx,&handle) looks useful for optimizing data reads/writes between the VM and native code, together with sq_getbyhandle and sq_setbyhandle. The SQMEMBERHANDLE type however only contains a bool (_static) and an index (_index), so I’m not entirely sure how we get quick access to data yet. It looks like you may push the value you want to set (sq_pushinteger, sq_pushstring, etc) and then follow it with a call to sq_setbyhandle. Reading the information though, it sounds like Member Handles may be a class-only feature, so this may not be what I’m looking for.

Classes can have UserData associated with them! sq_setclassudsize sets the size of the UserData attached to a class. sq_setinstanceup and sq_getinstanceup are a pair of functions for manipulating a UserPointer associated with a class instance (not UserData). That said, calling sq_setclassudsize will automatically set the class’s internal UserPointer to the location of the allocated data. Following up with a call to sq_getinstanceup will tell you where to put your data.

What’s missing?

The only thing missing is a nice way of handling Floats with new Vector and Matrix types (as userdata).

One option is to have a “scalar” or “real” type that exists for doing math with Vectors and Matrices. MyVec.x *= 10 is going to work fine already, but MyVec *= 10 will not. I could put in similar code as before, a check “if ( vs.type == OT_FLOAT )” then treat it as a scalar, but that doesn’t handle the front case (MyVec = 10 * MyVec). That’s why I’m suggesting a .toscalar() or .toreal() function. toscalar() I believe makes the most sense, as the operations being performed are scalar math ones. In addition, the float can have a .tovec3() or similar to create boring (1,0,0) type conversions.

Vectors will already support using any float as arguments “vec3(0,MyFloat,12)”. So classes like 2×2 or 4×4 matrix should support constructing with equivalent vectors “mat2(vec2(1,0),vec2(0,1))”. If feeling very adventurous, take any combination of float and vector types.

Yeah, the only hold-out is the “NewVec = 10 * OldVec” case.

I don’t really want to disturb the standard Float type by introducing an extra check, “is the previous variable a Matrix?”. Requiring conversion via .toscalar() may not be unreasonable after all, even though all operations like magnitude and normalize are available inside the Float (well, my modified float anyway).

Squirrel Class Notes

Friday, November 8th, 2013

Squirrel features both Delegates and Classes for creating types and providing default values and actions. They are mutually exclusive, meaning you either use a class or a delegate (attached to a table/array).

The following is a collection of notes on Classes.

Sample Class

The class below features a constructor, metamethods (like operator overloading), and a few additional functions.

Create instances with function syntax

Creating an instance is a lot like JavaScript.

The constructor is called with the given arguments.

Not to be confused with the _call metamethod.

Problems to consider: Functions are not overloaded, but replaced

In C++ and GLSL/HLSL, it’s common to have operators overloaded that let you Add, Subtract, and Multiply vectors by other vectors, scalars and matrices. This can (mostly) be done in squirrel by looking at the type of the argument received by your metamethods. Here’s an example that supports scalar multiplication as well as vector multiplication (component wise).

The problem though is this is one way. I can do “MyVec * 10”, but “10 * MyVec” will not work. To make it work, you would have to add a custom _mul metamethod to the number type… except I’m not sure that’s allowed by Squirrel (custom delegates are, but metamethods?).

One solution would be to create a “real” type (similar to GEL), but Squirrel doesn’t exactly provide a nice way to create custom automatic type conversions, so the syntax wouldn’t be ideal (i.e. real(10) * vec2(10,20) everywhere). I do live with this in GEL, but it would be a shame to not have a cleaner syntax here. This also assumes custom math metamethods are not supported by Squirrel (and technically I have the source, so it’s not like I couldn’t add any feature I wanted).

It’s also a shame the operator code ends up being so complex. A function that started as one line (the last line) has become 5 lines to add float and integer support. Add in Matrix multiplication too, and then I start getting scared.

An alternative solution would be to add .tovec3() and .toscalar3() functions to the default float delegate (where tovec3() is (10,0,0) and toscalar3 is (10,10,10)). This would be a cleaner option, but is somewhat wasteful, especially once we get to higher order maths like Matrix multiplication.

One more option would be to add language level vector and matrix support. After all, a table with x,y,z slots can’t exactly be the most efficient thing. This would certainly be the most difficult though.

It’s a shame Squirrel doesn’t have a way of adding shadow members. Like a MyObject.x() without the brackets that could be assigned (MyObject.x = 10).

I shouldn’t be expecting a scripting language to be ideal for high performance math anyway. Like I’ve said, specialization is the key. That is what scripting is for. Leave the low-level to the low-level.
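For reference, the one-way operator problem from this section has a direct C++ analogue: an operator written as a class member only fires when the vector is on the left. The sketch below shows that, plus a hypothetical Real wrapper in the spirit of the real(10) * vec2(10,20) workaround. Vec2 and Real here are illustrative types, not the GEL or Squirrel ones.

```cpp
#include <cassert>

struct Vec2 {
    float x, y;
    // Member operators only apply when the vector is the LEFT operand,
    // much like a _mul metamethod defined on the class.
    Vec2 operator*(float s) const { return Vec2{x * s, y * s}; }             // scalar
    Vec2 operator*(const Vec2& o) const { return Vec2{x * o.x, y * o.y}; }   // component-wise
};

// Hypothetical "real" wrapper: a scalar type that owns its own multiply,
// so the scalar can sit on the left-hand side.
struct Real {
    float value;
    Vec2 operator*(const Vec2& v) const { return Vec2{value * v.x, value * v.y}; }
};

// With only the definitions above, `v * 10.0f` compiles but `10.0f * v`
// does not: a plain float has no operator* that accepts a Vec2.
inline Vec2 scale_from_left(float s, const Vec2& v) {
    return Real{s} * v;
}
```

C++ can also fix this with a free function operator*(float, const Vec2&), which is exactly the hook that a built-in number type in a scripting language may not offer.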

Copy Constructors: meh

This is inconclusive currently. I may have found a bug with what should be the ideal way of doing it. I’ve since posted a question to the forum on it.

What I would like to do:

What I have to do instead.

This works, but isn’t as elegant as cloning.

Weak References

All built-in types have a .weakref() function in their delegate (classes, instances, generators, tables, etc). Bools, floats and integers return the actual type, but everything else returns a weakref type.

The weakref type is exactly that, a type containing a reference. To access the data referenced, you can call the .ref() function.

If the data referenced runs out of references, it is recycled. Any weakrefs pointing to it will thereafter return null.

In addition, Squirrel does some cleverness with weakref types. If you omit the .ref() call, it will still return the data referenced. For the most part, a weakref will act like the real type, except typeof will be weakref instead of the original type. To get the type referenced, you do typeof on the value returned by .ref().
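If you think in C++ terms, the closest analogue is std::weak_ptr: it does not keep the data alive, and once the last strong reference is gone it yields null. This is only an analogy sketched with standard library types (Squirrel’s automatic unwrapping of weakrefs has no direct C++ equivalent), and the helper names are made up.

```cpp
#include <cassert>
#include <memory>

struct Vec2 { float x, y; };

// True while the referenced data still has at least one strong owner,
// comparable to .ref() returning something other than null.
inline bool is_alive(const std::weak_ptr<Vec2>& w) {
    return w.lock() != nullptr;
}

// Reads through the weak reference, or returns a fallback if the
// target has already been recycled.
inline float read_x_or(const std::weak_ptr<Vec2>& w, float fallback) {
    if (auto strong = w.lock()) return strong->x;  // temporarily strong
    return fallback;
}
```

Typical use: hold a std::shared_ptr<Vec2> somewhere, hand out std::weak_ptr<Vec2> copies, and always check the result of lock() before touching the data.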

Strong References

Okay, the reason for the brief exploration of weak references was because of an experiment I was doing with strong references.

Contrary to C++, instead of an assignment operator being called, a strong reference is created. Thus both Pos and Old point to the same data. In this way, Squirrel works like Java/JavaScript/C#, in that everything (other than int/float/bool) is a reference.

Instead, one should use clone to make a copy.

Clone performs a shallow copy, meaning all top-level members are copied. Integers, Floats, and Bools work as expected, but most other types are references, which may not be the behavior you desire.
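The difference between a shallow copy and a deep copy can be sketched in C++ using std::shared_ptr to stand in for a Squirrel reference. This is an analogy under that assumption, with a hypothetical Entity type; it is not how Squirrel implements clone.

```cpp
#include <cassert>
#include <memory>

struct Vec2 { float x, y; };

// An object whose member is itself a reference, like an instance
// that holds another instance.
struct Entity {
    int id;                     // plain value: copied as expected
    std::shared_ptr<Vec2> pos;  // reference: copying it shares the data
};

// Shallow copy: top-level members are copied, so `pos` is shared.
inline Entity shallow_clone(const Entity& e) {
    return e;
}

// Deep copy: the referenced data is duplicated as well.
inline Entity deep_clone(const Entity& e) {
    Entity copy{e.id, nullptr};
    if (e.pos) copy.pos = std::make_shared<Vec2>(*e.pos);
    return copy;
}
```

After a shallow clone, writing through the copy’s pos changes the original’s position too, while changing id does not. That is the behavior a custom _cloned handler is meant to paper over.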

Cloning Classes

In the case of a class, you can write a simple _cloned metamethod to handle the above case.

If a table, you can use a delegate like the following to automatically clone all members.

No General Purpose Copier for Classes

The general purpose copier doesn’t seem to work very well with classes. At least when I was trying to do it, I couldn’t get it working properly. Here’s a snapshot.

_get metamethod: are you even useful?

I have this code in my 3D vector class.

It’ll let you syntactically use the class like an array, but it’s not an array. So this is fine for inside Squirrel code, but if you have native code expecting an array, these will be useless.
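The same gap exists in C++, which makes for a compact illustration: a type with an index operator reads like an array at the call site, yet a function that wants a real array still needs an explicit conversion. Vec3, to_array and sum3 below are hypothetical names for this sketch, not code from my vector class.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

struct Vec3 {
    float x, y, z;

    // Index access, comparable to a _get that accepts 0, 1 and 2.
    float operator[](std::size_t i) const {
        switch (i) {
            case 0: return x;
            case 1: return y;
            case 2: return z;
            default: throw std::out_of_range("Vec3 index");
        }
    }

    // Explicit conversion for code that needs an actual array.
    std::array<float, 3> to_array() const {
        return std::array<float, 3>{{x, y, z}};
    }
};

// Stand-in for native code that expects a genuine array.
inline float sum3(const std::array<float, 3>& a) {
    return a[0] + a[1] + a[2];
}
```

Passing a Vec3 straight to sum3 does not compile; calling sum3(v.to_array()) does, which mirrors having to build a real array before handing data to native code.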