Spifftastic,

 

On The Rusalka Virtual Machine

To get this out of the way real quick, Rusalka is a register-based virtual machine. In short, it loads bytecode written against its instruction set and runs it. Currently, no high-level-ish languages compile to Rusalka bytecode, but Rusalka assembly is available to make playing with it relatively easy. As assembly languages go, it’s probably more usable than most by virtue of being high-level compared to, say, x86 assembly.

Rusalka doesn’t attempt to emulate existing hardware or any existing instruction set or CPU. It has its own instruction set. I feel it’s important to point this out because a lot of programmers I’ve spoken with tend to think of virtual machines only as the kind used to run one OS under another (e.g., VirtualBox). Rusalka is closer in nature to Lua’s VM and possibly the Parrot VM, among other similar virtual machines. This is particularly true of Lua, since Rusalka draws a lot of inspiration from it and from the various publications on Lua’s internals.[1][2] These fall under the category of process virtual machines, where each VM is not an entire system but a process in the host operating system.

At the moment, Rusalka is open source on GitHub and licensed under the Boost Software License, version 1.0. I previously made it available under the GNU GPL 3, but decided this didn’t really fit with how I viewed Rusalka, nor was it really conducive to letting people play with it if they want to. This also makes it easier to reuse bits of Rusalka as need be, since, although the VM will likely never see use, some of its code may prove useful later.

Motivation

As with a lot of projects I work on for fun, people tend to ask why I decided to start working on it. Since it’s a virtual machine, it’s maybe a little more curious than most personal projects, because it’s more like a CS student’s homework and less like a random serialization library or one of the many thousands of JavaScript animation/special-button libraries out there. Unfortunately, nobody accepts my usual explanation of “I felt like it,” so let’s try to break down why I felt like it.

First off, I want to eventually build my own programming language(s), compilers, and other tools. It’s probably on the list of weird romantic dreams programmers have. I just need you to either accept this is normal or smile and nod. I’ve toyed with various projects of this nature over the years, but the issue it’s always come down to is what to target. I already have C++, so I don’t really need something compiling to machine code and that seems pretty unpleasant anyway. JVM bytecode is mostly covered by Kotlin or Scala, and I don’t really like using the JVM if I can help it anyway.

What I’d really like is a scripting language I can embed in my own projects. Lua fits that role, but it seems less permanent and more like a haven’t-found-anything-better situation. I doubt I’ll do better than Lua — I haven’t got the years of experience and I don’t consider myself smarter than the folks working on Lua — but it seems doable to build a scripting language. I don’t want to pull a web dev and do something like CoffeeScript, though, so I wanted a VM. That’s one path to Rusalka.

Second, ever since I used the Cipher engine’s virtual machine[3] and saw that Quake 3 had a virtual machine,[4] I’ve wanted to try my hand at implementing one of my own. These are VMs built to be embedded in the software they’re made for. Although that’s likely fairly impractical these days, when we’ve got Lua and perhaps later mruby, it seems like a fun thing to try. Even if the end result isn’t perfect or hugely successful or whatever “web scale” means, it’s something to learn from, and I enjoy the work involved.

Third, register-based virtual machines are seemingly uncommon. You’ve got Parrot, Lua, Dalvik, and likely a handful of others, but the majority seem to go stack-based. Building a register-based VM instead seemed interesting, especially after past attempts to build stack-based virtual machines.[5]

Fourth, I felt like it.

You Get One Type

The Rusalka VM only supports a single type: doubles (i.e., 64-bit floating point numbers). This is a design choice mostly borrowed from Lua, although Lua technically supports a few more types (numbers, sometimes integers, strings, and tables, among others). This wasn’t always the case for Rusalka. Initially, the VM supported three types: floats and signed and unsigned 32-bit integers.

I liked the original three types, but they proved fairly annoying. Unless I introduced type tags into values and branching into most of the instructions, they resulted in significant instruction bloat. Of those two options, the type tags were more of a concern than the branching, realistically, since branching isn’t a huge issue. A tag just seemed like a poor use of additional bytes: if I was going to increase the size of a value in the VM, I wanted the value itself to use all 64 bits, not whatever was left over after the tag. There are no 56-bit types in C++, much less standardized 56-bit floating point types. Considering that, after some thinking, it seemed easier to use doubles as the universal type in Rusalka.

First, doubles can exactly represent every signed and unsigned integer within ±2^53 (52 bits explicitly stored in the mantissa, plus the implicit leading one), which for the most part is sufficient for me. I’m sure there are some potential uses for the remaining 12 bits you lose to the double’s sign and exponent, but I see this as a fair trade-off. To discourage people from treating a value as a 53-bit integer, however, Rusalka’s bitwise instructions cast their operands to 32-bit integers (unsigned, except for the signed ASHIFT).
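Here’s a small C++ sketch that makes those numbers concrete. The helper names (`to_u32`, `bitwise_and`, `kTwo53`) are hypothetical. Narrowing through `int64_t` to `uint32_t` is one assumed way a bitwise instruction could cast its operands. It isn’t Rusalka’s actual code.

```cpp
#include <cstdint>

// 2^53: every integer with magnitude up to this value is exact in a double.
constexpr double kTwo53 = 9007199254740992.0;

// Hypothetical narrowing helper for bitwise instructions. Assumes |v| < 2^63,
// so the double -> int64_t conversion is well-defined; the following
// int64_t -> uint32_t conversion then wraps modulo 2^32.
inline std::uint32_t to_u32(double v) {
  return static_cast<std::uint32_t>(static_cast<std::int64_t>(v));
}

// AND two doubles as unsigned 32-bit integers, then widen back to double.
inline double bitwise_and(double a, double b) {
  return static_cast<double>(to_u32(a) & to_u32(b));
}
```

With this, `(kTwo53 - 1.0) + 1.0 == kTwo53` holds, and `kTwo53 + 1.0` rounds back down to `kTwo53`. `bitwise_and(4294967296.0 + 5.0, 7.0)` yields 5, because everything above the low 32 bits is dropped before the AND.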

Second, using doubles gains additional precision over the floats I previously used. This could end up being slower in some respects, but it’s a benefit in the long run, at least until someone hits floating point precision issues, and I can’t really help that.

The two questions I tend to get from friends after I talk about this are more or less the same: what about strings / other objects? The simple answer for Rusalka is, “They don’t exist.”

This is a partial lie: strictly, as far as Rusalka is concerned, there are no types other than doubles. This doesn’t mean you can’t return a double and use it as a handle, though. Rusalka does allow allocation of memory and access to it via handles that identify these blocks (these can also be embedded in the bytecode to be loaded as read-only memory blocks). There are a few instructions for dealing with memory, mainly:

  • REALLOC and FREE reallocate and free a memory block, respectively. A zero block handle is reserved for no block, so reallocating a zero block allocates a new block.

  • PEEK and POKE allow reading from and writing to memory. They’re two of the weirder instructions, seeing as they both take five operands: the value’s register, the memory block, the offset into the block, the type being read or written,[6] and the flag operand. Because memory access is intentionally limited by the VM, both instructions are bounds-checked to ensure you’re not doing something bad.

  • MEMLEN, MEMDUP, and MEMMOVE query the length of a block, duplicate a block, and copy data from one block to another, respectively. MEMDUP can effectively be recreated using MEMLEN, REALLOC, PEEK, and POKE, but that’s unpleasant. The same goes for MEMMOVE, though it’s much easier to provide as an instruction since it uses std::memmove and is far safer.[7]
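Here’s a rough sketch of how handle-based blocks like these could work. It isn’t Rusalka’s implementation. The `BlockTable` class, its method names, and the `std::unordered_map` keyed by integer handles are all assumptions. They illustrate zero-handle allocation and bounds-checked PEEK/POKE.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: blocks are keyed by integer handles that travel
// through the VM as doubles. Handle 0 is reserved for "no block".
// Assumes handles passed in are small non-negative integers.
class BlockTable {
 public:
  // REALLOC-like: handle 0 allocates a new block; otherwise resize it.
  double realloc_block(double handle, std::size_t size) {
    std::uint32_t id = static_cast<std::uint32_t>(handle);
    if (id == 0) {
      id = next_id_++;
      blocks_.emplace(id, std::vector<unsigned char>());
    }
    block(id).resize(size);
    return static_cast<double>(id);
  }

  // FREE-like: drop the block. Freeing handle 0 is a no-op.
  void free_block(double handle) {
    blocks_.erase(static_cast<std::uint32_t>(handle));
  }

  // MEMLEN-like: size of the block in bytes.
  double memlen(double handle) {
    return static_cast<double>(block(static_cast<std::uint32_t>(handle)).size());
  }

  // PEEK-like: read a T at a byte offset, bounds-checked, widened to double.
  template <typename T>
  double peek(double handle, double offset) {
    auto &b = block(static_cast<std::uint32_t>(handle));
    const std::size_t off = checked_offset(offset, sizeof(T), b.size());
    T value;
    std::memcpy(&value, b.data() + off, sizeof(T));
    return static_cast<double>(value);
  }

  // POKE-like: narrow a double to T (assumed to fit) and write it.
  template <typename T>
  void poke(double handle, double offset, double value) {
    auto &b = block(static_cast<std::uint32_t>(handle));
    const std::size_t off = checked_offset(offset, sizeof(T), b.size());
    const T narrowed = static_cast<T>(value);
    std::memcpy(b.data() + off, &narrowed, sizeof(T));
  }

 private:
  std::vector<unsigned char> &block(std::uint32_t id) {
    auto it = blocks_.find(id);
    if (it == blocks_.end()) throw std::invalid_argument("invalid block handle");
    return it->second;
  }

  // Reject negative, NaN, or past-the-end accesses before converting.
  static std::size_t checked_offset(double offset, std::size_t width,
                                    std::size_t size) {
    if (!(offset >= 0.0) ||
        offset + static_cast<double>(width) > static_cast<double>(size))
      throw std::out_of_range("memory access out of bounds");
    return static_cast<std::size_t>(offset);
  }

  std::unordered_map<std::uint32_t, std::vector<unsigned char>> blocks_;
  std::uint32_t next_id_ = 1;
};
```

`realloc_block(0, 4)` returns a fresh handle. A `poke<std::uint8_t>` followed by a `peek<std::uint8_t>` at the same offset round-trips a byte. A read at offset 4 of a four-byte block throws `std::out_of_range` rather than touching memory it shouldn’t.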

So although you only get doubles, you can allocate a string as a block of memory and pass around the block’s handle to functions that take strings (and arrays and dictionaries and likely any other conceivable type of object).

There’s one possible trick around this that can yield type-tagged object handles, provided one is okay with 32 bits for a handle and around 16 to 20 bits for a type tag. Unfortunately, it’s also an enormous, horrifying hack: it’s possible to pack these bits into a quiet NaN’s (QNaN) mantissa. Generating a QNaN only requires 12 bits to be set (the 11 exponent bits and the top mantissa bit), and the remaining bits are theoretically inconsequential. That leaves 52 bits to pack data into if you count the sign bit.

There is the question of why one would bother doing this with a QNaN when you can represent 52 bits with a double anyway. The answer is fairly simple: it’s a recognizable bit pattern. The VM can potentially identify QNaN values with certain bits set and give them special treatment. In addition, this should make this class of values somewhat immune to the existing arithmetic instructions, making error checking possible if operating on these. Still, this is only one possible hack, and it would be easier, in the end, to use a tagged union. It’s fun to think about, anyway.
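The QNaN trick is easier to see in code. What follows is a hedged sketch of NaN-boxing, not anything Rusalka does. The layout is an assumption made for this example: the handle goes in the low 32 bits and a 16-bit tag in bits 32 to 47. Tag 0 is reserved so that an ordinary NaN is never mistaken for a handle.

```cpp
#include <cstdint>
#include <cstring>

// A positive quiet NaN: exponent bits all set plus the top mantissa bit.
constexpr std::uint64_t kQNaNBits = 0x7FF8000000000000ULL;
// Sign, exponent, and quiet bit together; compared against kQNaNBits.
constexpr std::uint64_t kQNaNMask = 0xFFF8000000000000ULL;
// Hypothetical tag field: bits 32-47.
constexpr std::uint64_t kTagMask = 0x0000FFFF00000000ULL;

inline std::uint64_t bits_of(double d) {
  std::uint64_t b;
  std::memcpy(&b, &d, sizeof b);  // well-defined type pun
  return b;
}

// Pack a 16-bit tag and a 32-bit handle into a quiet NaN's payload.
inline double box_handle(std::uint16_t tag, std::uint32_t handle) {
  const std::uint64_t b =
      kQNaNBits | (static_cast<std::uint64_t>(tag) << 32) | handle;
  double d;
  std::memcpy(&d, &b, sizeof d);
  return d;
}

// A boxed handle is a positive quiet NaN with a nonzero tag.
inline bool is_boxed(double d) {
  const std::uint64_t b = bits_of(d);
  return (b & kQNaNMask) == kQNaNBits && (b & kTagMask) != 0;
}

inline std::uint16_t tag_of(double d) {
  return static_cast<std::uint16_t>((bits_of(d) & kTagMask) >> 32);
}

inline std::uint32_t handle_of(double d) {
  return static_cast<std::uint32_t>(bits_of(d));  // low 32 bits
}
```

One caveat: hardware generally propagates a NaN operand’s payload through arithmetic. Adding 1.0 to a boxed value therefore tends to give back the same boxed bits rather than a distinct error value, so the VM would still need to check `is_boxed` on its inputs.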

In the end, using doubles instead of a wider range of types is a pragmatic choice with its own pros and cons. On the one hand, it greatly simplifies the instruction set and the code necessary to execute it. It also guarantees that any language built for Rusalka won’t need to worry about multiple integral and floating point types. On the other hand, it inherently limits what the VM can directly provide access to through its value type. Pointers are off limits.[8] You get no special array, string, or dictionary types as of yet. And lastly, without additional support from the host process, you’re limited to what you can do with memory blocks and doubles. That last limit’s pretty loose, but it’s still a limit, and it means any code produced for the VM is going to be moderately lower-level than code for some other VMs.

I think this has the benefit of producing an instruction set and virtual machine that’s restricted to the bare essentials: arithmetic, logic, and moderately safe memory access. Further specialization can be done as the VM evolves as well, meaning it’s in a good position to grow out from where it is. Granted, that’s a nice way of saying it’s not very useful at the moment, but it beats throwing my computer out a window.

Register-Machine

One of the most important points to keep in mind is this: Rusalka is a register-based virtual machine. I mean that in the sense that it has a (currently) finite number of registers that act as operands, inputs, outputs, and so on for instructions. The stack is also available, albeit only as a storage mechanism and as part of the VM’s argument passing mechanism for function calls (which exists mainly for calling host functions, as there’s no requirement to use the stack for in-VM functions). Only three or four instructions manipulate the stack directly, and otherwise all instructions operate on and with registers.

This is in contrast to convention in some sense. Most VMs end up being implemented as stack machines, which I’ve been told is partly because it’s easier to reason about. That is, it’s easy to say you want to push X, push Y, execute an add instruction, then finally pop the result and store it somewhere. I have no single concrete reason for not going this route except that I tend to find it harder to reason about. So whereas someone might see this as clear:

// a = a * b
// We'll assume a and b are local variables or
// something and can be referenced by push and
// some sort of store local instruction.
push a
push b
mul             // push(pop() * pop())
store a         // a = pop()

I find it difficult to think about, though. I suspect this is because I’ve spent so long in high-level languages like C and C++, where I think in terms of functions and their parameters. As a result, it feels natural to me to treat each instruction as a function that performs a small, specific task with its operands and returns a result, or in this case writes the result to a register. There’s no need to keep a mental image of how things on the stack are ordered. I just have to think about the input and output of a function instead of having indirect input and output on a stack.

So, for me, Rusalka assembly feels a bit simpler:

// let a = $0; b = $1
mul $0 $0 $1    // a = a * b

In fairness to stack machines, this is a somewhat unfair comparison. A register machine needs fewer instructions by virtue of not routing operands through a stack, so of course it feels simpler. Despite that, I prefer thinking of register-machine instructions as little functions.

This of course means instructions aren’t as light as those in stack machines, but the benefit is that fewer instructions are needed per operation overall. A combined multiply and add might take about five instructions on a stack machine (three pushes, a multiply, and an add); the same takes only two in Rusalka. This also means less time spent manipulating the stack, which keeps things fairly simple. Depending on how many registers are available, this can be difficult to work with, but that isn’t much of a concern for Rusalka with 256 registers on hand (four reserved, but they’re still registers).
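Here’s a toy comparison of the two models in C++. Both encodings are made up for this example; neither is Rusalka’s bytecode. Both programs compute `a = a * b + c`. The stack version needs three pushes, a multiply, an add, and a final store. The register version needs two instructions that name their operands directly.

```cpp
#include <cstddef>
#include <vector>

// --- Stack machine: operands are implicit, results go back on the stack. ---
enum class SOp { Push, Mul, Add, Store };
struct SInstr { SOp op; std::size_t var; };  // var is used by Push/Store only

// Assumes well-formed code (never pops an empty stack).
inline void run_stack(const std::vector<SInstr> &code, std::vector<double> &vars) {
  std::vector<double> stack;
  for (const SInstr &i : code) {
    switch (i.op) {
      case SOp::Push: stack.push_back(vars[i.var]); break;
      case SOp::Mul:
      case SOp::Add: {
        const double rhs = stack.back(); stack.pop_back();
        const double lhs = stack.back(); stack.pop_back();
        stack.push_back(i.op == SOp::Mul ? lhs * rhs : lhs + rhs);
        break;
      }
      case SOp::Store: vars[i.var] = stack.back(); stack.pop_back(); break;
    }
  }
}

// --- Register machine: each instruction names its output and inputs. ---
enum class ROp { Mul, Add };
struct RInstr { ROp op; std::size_t dst, lhs, rhs; };

inline void run_registers(const std::vector<RInstr> &code, std::vector<double> &regs) {
  for (const RInstr &i : code) {
    const double l = regs[i.lhs], r = regs[i.rhs];
    regs[i.dst] = (i.op == ROp::Mul) ? l * r : l + r;
  }
}

// a = a * b + c, with a, b, c in slots 0, 1, 2.
inline std::vector<SInstr> stack_mul_add() {
  return {{SOp::Push, 0}, {SOp::Push, 1}, {SOp::Mul, 0},
          {SOp::Push, 2}, {SOp::Add, 0},  {SOp::Store, 0}};
}

inline std::vector<RInstr> reg_mul_add() {
  return {{ROp::Mul, 0, 0, 1},   // mul $0 $0 $1
          {ROp::Add, 0, 0, 2}};  // add $0 $0 $2
}
```

With a = 2, b = 3, and c = 4, both leave 10 in slot 0. The stack program is six instructions long and the register program is two.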

So let’s have a quick overview of the registers. Each register is just an index into an array of registers held by Rusalka, since they don’t map directly to hardware registers. These indices are broken up into three ranges:

  • Registers 0 — 3

    Reserved registers, of which there are four: IP, EBP, ESP, and RP. IP is the instruction pointer (or program counter) of the VM and for the most part isn’t directly touched by the bytecode. EBP and ESP are simply the stack base pointer and stack top pointer (in this case, the stack grows up). RP, finally, is just a dedicated volatile register used to store the return value of functions. These are also the only named registers.

    In the case of IP, EBP, ESP, the pointers are just offsets into the instruction data or stack. Why RP ends in a P is a mystery to me at this point, seeing as it’s not a pointer, but the name stuck. I suppose I could call it the return product to justify it now. At any rate, it’s an artifact of an earlier version of the VM and I kind of like having all the named registers ending in ‘P’ now.

  • Registers 4 — 11

    General purpose non-volatile registers, of which there are currently eight. The VM itself saves these when a function is called and restores them when it returns, so a function is free to stomp[9] all over them as much as it pleases.

  • Registers 12 — 255

    General purpose volatile registers, which make up the remainder of the registers. There are currently 256 registers in total, so there are 244 remaining volatile registers. These are not preserved at all by any part of the VM. At the moment, there is no calling convention to determine who — the caller or callee — preserves these registers, though I suspect any language I build on top of Rusalka will put the responsibility on the caller to preserve those registers it needs preserved.

Again, in implementation, these are just members of an array,[10] since all registers are in memory. I have some plans to change this so that functions allocate only as many registers as they need, but this is ongoing and I’m still undecided whether it’s entirely necessary. It should prove to be fun as an experimental branch of Rusalka, at least. That’s particularly true if I can allocate all registers in host stack memory (i.e., basically a VLA), though that could prove frustrating to pull off given that Rusalka’s written in C++11, and standard C++ has no VLAs.
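Here’s a hedged C++ sketch of the register file as described. Registers live in a flat array, and the VM saves and restores registers 4 to 11 around calls. Negative register numbers address the stack, as footnote 10 explains. The struct and constant names are hypothetical. So is the assumption that ESP holds the index one past the top of the stack, which makes register -1 the topmost value.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using value_t = double;

// Register ranges from the list above; the constant names are illustrative.
constexpr std::int32_t REG_IP = 0, REG_EBP = 1, REG_ESP = 2, REG_RP = 3;
constexpr std::size_t NONVOLATILE_FIRST = 4, NONVOLATILE_COUNT = 8;
constexpr std::size_t REGISTER_COUNT = 256;

struct RegisterFile {
  std::array<value_t, REGISTER_COUNT> regs{};
  std::vector<value_t> stack;
  std::vector<std::array<value_t, NONVOLATILE_COUNT>> saved;  // one per call

  void push(value_t v) {
    stack.push_back(v);
    regs[REG_ESP] = static_cast<value_t>(stack.size());  // one past the top
  }

  // Non-negative numbers index the register array; negative numbers
  // address stack[ESP + n] without popping anything.
  value_t &at(std::int32_t n) {
    if (n >= 0) return regs.at(static_cast<std::size_t>(n));
    const auto slot = static_cast<std::int64_t>(regs[REG_ESP]) + n;
    if (slot < 0 || slot >= static_cast<std::int64_t>(stack.size()))
      throw std::out_of_range("stack operand out of range");
    return stack[static_cast<std::size_t>(slot)];
  }

  // The VM, not the callee, preserves registers 4-11 across a call.
  void enter_call() {
    std::array<value_t, NONVOLATILE_COUNT> frame;
    for (std::size_t i = 0; i < NONVOLATILE_COUNT; ++i)
      frame[i] = regs[NONVOLATILE_FIRST + i];
    saved.push_back(frame);
  }

  void leave_call() {
    const std::array<value_t, NONVOLATILE_COUNT> &frame = saved.back();
    for (std::size_t i = 0; i < NONVOLATILE_COUNT; ++i)
      regs[NONVOLATILE_FIRST + i] = frame[i];
    saved.pop_back();
  }
};
```

A callee can overwrite register 5 freely and the caller still sees the old value after `leave_call`. A write to a volatile register such as 20 survives the return, because nothing preserves it.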

One particularly handy upside of registers, especially considering the only type we care about is doubles, is that basic variables map very cleanly onto them. Because Rusalka doesn’t emulate specific hardware (as some register VMs might choose to do), there’s no realistic concern about whether a variable fits in a register: it always does.

The Instruction Set

As an example of the Rusalka instruction set and its current assembly language, this is a ROT13 cipher I quickly implemented in my assembly language. This doesn’t display the entire instruction set, but at least shows how a smallish function might be implemented using it. The following also contains instruction numbers, though these aren’t part of the assembly itself.

 0   function rot13(mem_in, mem_out;
..                  index, char, base, length, length_out) {
 2    memlen length     mem_in
 3    memlen length_out mem_out
 4    if length_out < length
 5        load length length_out
 6    load index 0
 7    for index < length {
 9         peek char mem_in index MEMOP_UINT8
..         
10         le 'a' char 0 ; jump @__rot13_test_upper
12         le char 'z' 0 ; jump @__rot13_test_upper
14         load base 'a'
15         jump @__rot13__apply_rotate
.. 
..         @__rot13_test_upper:
16         le 'A' char 0 ; jump @__rot13__continue
18         le char 'Z' 0 ; jump @__rot13__continue
20         load base 'A'
.. 
..         @__rot13__apply_rotate:
21         sub  char char base
22         add  char char 13
23         imod char char 26
24         add  char char base
.. 
..         @__rot13__continue:
25         poke mem_out char index MEMOP_UINT8
26         add  index index 1
27     }
.. 
28     load rp length
29     return
.. }

It’s fairly ugly, but it should be easy enough to understand. As a quick primer, both a function declaration and a let statement tell the assembler to alias unused registers with the given names. Any identifier prefixed with @, ., or ^ and followed by a colon declares a label. The if and for conditionals are syntactic sugar for compare-and-jump instruction pairs, since it’s easy enough to generate instructions for those patterns. If an instruction has an output/destination operand, it is always the first operand, as in Intel’s x86 syntax, though in practice the syntax is a mix of AT&T and Intel.
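For comparison, here’s roughly the same ROT13 routine in C++. It’s a hedged translation, not code from Rusalka. `std::vector<std::uint8_t>` stands in for the memory blocks, and the return value plays the role of `load rp length`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the assembly: clamp length to the output size, then rotate
// letters by 13 within their case and copy everything else unchanged.
inline std::size_t rot13(const std::vector<std::uint8_t> &mem_in,
                         std::vector<std::uint8_t> &mem_out) {
  // memlen x2, then: if length_out < length { load length length_out }
  const std::size_t length = std::min(mem_in.size(), mem_out.size());
  for (std::size_t index = 0; index < length; ++index) {
    int ch = mem_in[index];  // peek char mem_in index MEMOP_UINT8
    int base = 0;
    if ('a' <= ch && ch <= 'z') base = 'a';
    else if ('A' <= ch && ch <= 'Z') base = 'A';
    if (base != 0) ch = (ch - base + 13) % 26 + base;  // sub, add, imod, add
    mem_out[index] = static_cast<std::uint8_t>(ch);    // poke ... MEMOP_UINT8
  }
  return length;  // load rp length
}
```

Feeding it "Hello!" with a six-byte output block writes "Uryyb!" and returns 6. A three-byte output block stops after three characters.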

Function declarations are broken up into three parts:

- the function name;
- the argument list, which is optional (arguments are automatically aliased to registers, and their values are popped off the stack per the current calling convention);
- a local register list.

Locals work the same as arguments, except their values are not popped off the stack. So the above function has two arguments, mem_in and mem_out, and requests five additional named registers. Named registers are also syntactic sugar in the assembly, and their names aren’t retained in the bytecode. They could be kept in a metadata chunk, but the VM would ignore it. In any case, this ultimately yields the following assembly:

.rot13:
    pop $1 // pop mem_out to general register 1
    pop $0 // pop mem_in to general register 0
    /* ... */
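The aliasing step itself is simple enough to sketch. This is a hypothetical reconstruction, not asm2bc’s code. It assigns consecutive general registers to the arguments and then the locals. It emits pops for the arguments in reverse order, since the last argument pushed sits on top of the stack.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Result of aliasing one function declaration.
struct FunctionAliases {
  std::map<std::string, std::size_t> reg_of;  // name -> general register
  std::vector<std::string> prologue;          // emitted pop instructions
};

inline FunctionAliases alias_function(const std::vector<std::string> &args,
                                      const std::vector<std::string> &locals) {
  FunctionAliases f;
  std::size_t next = 0;
  for (const auto &name : args) f.reg_of[name] = next++;    // arguments first
  for (const auto &name : locals) f.reg_of[name] = next++;  // then locals
  // Pop arguments last-to-first; locals get no pops.
  for (std::size_t i = args.size(); i-- > 0;)
    f.prologue.push_back("pop $" + std::to_string(f.reg_of[args[i]]));
  return f;
}
```

Given the ROT13 declaration, it produces the same `pop $1` / `pop $0` prologue shown above and puts `length_out` in `$6`.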

Similarly, an if statement, like if x < y { /* ... body ... */ }, would yield something like the following:

    // let x and y be two arbitrary registers
    // if ((x < y) != 0) ++ip
    lt x y 0
    jump @end_conditional
    /* ... body ... */
@end_conditional:
    /* ... */

A for statement behaves mostly the same, except it’s a looping conditional statement, so it’ll have an additional jump at the end of the loop back to the initial test. Aside from that, it’s fundamentally the same as an if statement.
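A minimal interpreter makes the compare-and-skip behavior concrete. It’s a sketch under simplifying assumptions, not Rusalka’s dispatch loop:

- all comparison operands are registers;
- `Add1` is an invented stand-in for `add x x 1`;
- IP is advanced before an instruction executes.

The loop program is `for x < y { x += 1 }`, lowered as described: test, exit jump, body, jump back.

```cpp
#include <cstddef>
#include <vector>

// LT U V W follows the table's rule: if ((U < V) != (W != 0)) IP++,
// where W is an inline constant and U, V are register indices here.
enum class Op { Lt, Jump, Add1 };
struct Instr { Op op; std::size_t u, v, w; };

inline void run(const std::vector<Instr> &code, std::vector<double> &regs) {
  std::size_t ip = 0;
  while (ip < code.size()) {
    const Instr &i = code[ip++];  // IP already points past this instruction
    switch (i.op) {
      case Op::Lt:
        if ((regs[i.u] < regs[i.v]) != (i.w != 0)) ++ip;  // skip the next one
        break;
      case Op::Jump: ip = i.u; break;  // absolute, like Rusalka's JUMP
      case Op::Add1: regs[i.u] += 1.0; break;
    }
  }
}

// for x < y { x += 1 } with x in register 0 and y in register 1.
inline std::vector<Instr> count_up_loop() {
  return {
      {Op::Lt, 0, 1, 0},    // 0: if x < y, skip the exit jump
      {Op::Jump, 4, 0, 0},  // 1: jump @end
      {Op::Add1, 0, 0, 0},  // 2: loop body
      {Op::Jump, 0, 0, 0},  // 3: back to the test
  };                        // 4: @end (falls off the program)
}
```

Starting from x = 0 and y = 5, the loop exits with x = 5. Starting from x = 7, the test fails immediately and the body never runs.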

Now that we’ve both got an idea of what I’m working with here, let’s talk about the actual instruction set. Rusalka currently has the following 36 instructions, as of this writing (these are re-ordered for organization here):

R = Register; C = Constant; M = Mixed; F = Flag
          U V W X Y Z   (Operand names)
====( Arithmetic )========================================
ADD       R M M F       U := V + W
SUB       R M M F       U := V - W
DIV       R M M F       U := V / W
IDIV      R M M F       U := i64(V) / i64(W)
MUL       R M M F       U := V * W
POW       R M M F       U := pow(V, W)
MOD       R M M F       U := fmod(V, W)
IMOD      R M M F       U := i64(V) % i64(W)
NEG       R R           U := -V
====( Bitwise ops )=======================================
NOT       R R           U := ~u32(V)
OR        R M M F       U := u32(V) | u32(W)
AND       R M M F       U := u32(V) & u32(W)
XOR       R M M F       U := u32(V) ^ u32(W)
ASHIFT    R M M F       U := i32(V) <<+->> i32(W)
BSHIFT    R M M F       U := u32(V) <<+->> i32(W)
====( Rounding )==========================================
FLOOR     R R           U := floor(V)
CEIL      R R           U := ceil(V)
ROUND     R R           U := nearbyint(FE_TONEAREST, V)
RINT      R R           U := nearbyint(FE_TOWARDZERO, V)
====( Branching )=========================================
EQ        M M C F       if ((U == V) != (W != 0)) IP++
LE        M M C F          ... <= ...
LT        M M C F          ... <  ...
JUMP      M F           IP := U
====( Stack ops )=========================================
PUSH      R             push(U)
POP       R             U := pop()
====( Register assignment )===============================
LOAD      R M F         U := V
====( Function calls )====================================
CALL      M M F         exec_call(U, V)
RETURN                  vm.sequence--  (End frame)
====( Memory manipulation )===============================
REALLOC   R M M F       U := realloc(&V, W)
FREE      R             free(&U)
MEMMOVE   R M M M M F   memmove(&U + V, &W + X, Y)
MEMDUP    R M F         U := memdup(&V)
MEMLEN    R M F         U := memlen(&V)
PEEK      R M M M F     U := *type_ptr<X>(&V + W)
POKE      R M M M F     *type_ptr<X>(&U + W) = V
====( Internal )==========================================
TRAP                    vm.trap++  (Set kill-switch)

In the above list, mixed values are those that are either a constant or a register reference. This is determined by the flag operand, which I’ll get to in a moment. The instruction set above is intended to be sparse but functional enough to provide most of what’s necessary without going overboard. The actual opcode, as encoded in the bytecode, is a 16-bit unsigned integer, so there’s plenty of room for more instructions, but it seems unlikely that I’ll need more than 8 bits’ worth of opcodes.[11]

A few, namely EQ, LE, and LT, are more or less carbon copies of the same instructions in Lua’s VM. That’s mainly because they make sense and, as test instructions go, they’re simultaneously lightweight and should meet most needs. Unlike in some instruction sets, however, the unconditional JUMP is absolute rather than relative. It’s likely a completely useless instruction except for making intent obvious. For relative jumps, there’s always the option of using the normal arithmetic instructions on the IP register. Besides that, the absolute jump is more of an oversight in the VM’s initial design than anything else and may be corrected later. A relative jump would certainly be more useful, since it would make jump relocation completely unnecessary; otherwise, the only things that require relocation are data references and function calls.

Most instructions have a final flag operand that determines whether each mixed input is a register or a constant. Currently, this is a false operand: it isn’t part of an instruction’s operand list in the bytecode. Instead, it immediately follows the opcode as a 16-bit unsigned bitmask. You can specify your own flags as operands to any instruction that takes them. It’s often better to leave them out of the assembly, though, because asm2bc, my hacked-together bytecode assembler, generates the flags for any instruction that doesn’t already have them. There are only two kinds of operands, constants and register indices, so asm2bc does this well enough already. This saves a bit of time when hand-writing Rusalka assembly, since I have nothing better yet.
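Here’s a sketch of how an interpreter might resolve mixed operands using the flag bitmask. The bit assignment (bit N set means operand N is an inline constant) and the `DecodedInstr` layout are assumptions, since the exact encoding isn’t spelled out here. The opcode value used in the test is a placeholder.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using value_t = double;

// Hypothetical decoded form: the 16-bit flag mask follows the opcode.
struct DecodedInstr {
  std::uint16_t opcode;
  std::uint16_t flags;            // bit N set: operand N is a constant
  std::vector<value_t> operands;  // raw operand values
};

// Resolve operand n: use it as-is if flagged constant, else load the register.
inline value_t operand(const DecodedInstr &instr, unsigned n,
                       const std::vector<value_t> &regs) {
  const value_t raw = instr.operands.at(n);
  if (instr.flags & (1u << n)) return raw;
  return regs.at(static_cast<std::size_t>(raw));
}

// ADD U V W: U is always a register; V and W may be either kind.
inline void exec_add(const DecodedInstr &instr, std::vector<value_t> &regs) {
  const value_t result = operand(instr, 1, regs) + operand(instr, 2, regs);
  regs.at(static_cast<std::size_t>(instr.operands.at(0))) = result;
}
```

This branch per operand is the cost mentioned below. A loader could instead specialize each instruction into register-only or constant-only forms ahead of time.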

This instruction set is still rather large, but it’s only about a third the size of early Rusalka that required instructions for each type used. As a result, the past typed version of the instruction set had cast instructions, arithmetic for all three types (though only unsigned bitwise ops aside from shifts). In addition, because the flag operands to instructions weren’t yet present, each instruction that takes an optional constant required both a register- and constant-operand instruction. The cost of the flag operand is that it introduces branches into many of the instructions, but the upside is that the instruction set remains simple. Future iterations of the bytecode loader could convert instructions to more granular branch-less forms later as well, but for now I’m opting for an instruction set I can remember.

Unfortunately, I still need to add some instructions once I’ve decided how to handle global values in the VM. Storing them in read-write static data blocks seemed like one option, but also probably unnecessary. The other problem is how lookup is handled. I imagine the end result will be based on hashed global names, so that a global can be accessed either by name or by hash. That’s one problem I’m still working through, since I’d like to come up with at least a few different ways to handle it and then spend some time thinking them over. Almost all further instruction additions and changes will probably serve whatever language I build on top of Rusalka.
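Here’s one way the hashed-name idea could look in C++. Several pieces are assumptions for illustration:

- 64-bit FNV-1a as the hash;
- the `Globals` class;
- reading an unset global as 0.

A real design would also need to decide how to handle hash collisions, which this ignores.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

using value_t = double;

// 64-bit FNV-1a, used here only as an example name hash.
inline std::uint64_t hash_name(const std::string &name) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char c : name) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Bytecode would carry precomputed hashes; the host can still use names.
class Globals {
 public:
  void set(std::uint64_t key, value_t v) { table_[key] = v; }
  void set(const std::string &name, value_t v) { set(hash_name(name), v); }

  value_t get(std::uint64_t key) const {
    auto it = table_.find(key);
    return it == table_.end() ? 0.0 : it->second;  // unset globals read as 0
  }
  value_t get(const std::string &name) const { return get(hash_name(name)); }

 private:
  std::unordered_map<std::uint64_t, value_t> table_;
};
```

A host could call `set("player_speed", 3.5)`. Bytecode assembled with the same hash function could then read that global back with `get(hash_name("player_speed"))` without ever comparing strings.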

Future Plans

What happens with Rusalka in the future is up in the air since it’s an experimental project, but there are a few things I’d like to work on implementing. Nothing I have planned is explicitly necessary for any one end result here, so they’re just ideas I think’d be neat.

  • Some sort of concurrent execution support. This can be done right now, albeit unsafely, by simply splitting up the existing vm_state type into a vm_process and vm_thread, wherein the process takes ownership of its threads and the threads have their own registers and can execute concurrently.

    I don’t mean to suggest that each thread consumes an actual host thread, as it’s entirely possible that the VM could schedule each thread for X instructions and yield as needed. However, it should be possible to spawn a host thread and have a VM thread running on that as needed. This would require some locks, of course, to ensure safe access to memory. Ideally, the locks should be granular enough that they don’t block VM threads too much (in other words, no one big global interpreter lock as in, say, some implementations of Python or Ruby).

  • Explicit support for classes or objects. This would require coming up with some way of identifying these objects at runtime, potentially increasing the size of the VM’s value type to include a type tag and any other necessary data. Beyond that, it also necessitates deciding how classes work, how method dispatch should work in the VM, and so on.

    This could be done at the bytecode level as well by storing everything in memory blocks (metaclasses and such could be stored in static memory blocks), but doing that could be potentially very ugly and slow. It could also be done through host functions, but this also necessitates some initialization done at the bytecode level (like an exported module init function).

    Considering performance, it’s probably a better idea to make this a part of the VM itself.

  • Global variables. These aren’t actually supported in Rusalka, which is an oversight on my part, but also comes down to me being undecided on how to reference a given global variable. How should a global variable be referenced by name? Using a string to identify a global variable at runtime has some reflection benefits, but it’d mean doing string comparisons for each lookup.

    My current idea is to refer to globals by number, ideally unique integers produced by hashing their names. In that case, it should be easy enough to store globals in a hash table for quick access.[12] After that, the only question is whether to make global variables viable operands for all instructions or to add GETGLOBAL and SETGLOBAL instructions.

  • Some sort of FFI support. Right now, access to host functions goes through callbacks much like those in Ruby’s C extension API.[13] This works well since it provides a generic interface to functions, but it could be better. It’d be much nicer to write something like vm.bind_function<float, float, float>("fmod", &std::fmod) and have it just work. I already have some ideas as to how this could work, but it’s low on the priority list, since the callback API already works and there are other issues to deal with, like the mysterious lack of global variables.
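For a sense of what `bind_function` might involve, here’s a sketch that adapts a typed host function to the callback signature from footnote 13. It’s hypothetical:

- `vm_state_t` is an empty placeholder;
- registration by name is left out;
- the wrapped function must return something convertible to `double`;
- it needs C++14 for `std::index_sequence`, whereas Rusalka itself targets C++11.

Taking `&std::fmod` directly would be ambiguous because `std::fmod` is overloaded, so the example binds a small wrapper instead.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>

using value_t = double;
struct vm_state_t {};  // placeholder for the VM's state type

// Generic callback shape, as in footnote 13.
using vm_callback_t = value_t(vm_state_t &vm, std::int32_t argc, value_t const *argv);

// Unpack argv[0..N) into a typed call; I... indexes the arguments.
template <typename R, typename... Args, std::size_t... I>
value_t call_unpacked(R (*fn)(Args...), value_t const *argv,
                      std::index_sequence<I...>) {
  return static_cast<value_t>(fn(static_cast<Args>(argv[I])...));
}

// Wrap a typed host function in the generic callback signature.
template <typename R, typename... Args>
std::function<vm_callback_t> bind_function(R (*fn)(Args...)) {
  return [fn](vm_state_t &, std::int32_t argc, value_t const *argv) -> value_t {
    if (argc != static_cast<std::int32_t>(sizeof...(Args)))
      throw std::invalid_argument("wrong number of arguments");
    return call_unpacked(fn, argv, std::index_sequence_for<Args...>{});
  };
}

// A non-overloaded wrapper gives bind_function a single address to take.
inline float host_fmod(float x, float y) { return std::fmod(x, y); }
```

Binding `&host_fmod` and calling the result with arguments 7.5 and 2.0 returns 1.5. Calling it with the wrong argument count throws `std::invalid_argument` instead of reading past `argv`.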

Ultimately, Rusalka’s been fun to work on and I plan to continue working on it, though I’ve taken a break recently for various personal reasons and to work on other projects (including learning Swift and slowly picking up development of Snow again). With any luck, I’ll also get around to building the language that should sit on top of Rusalka, but that’s still in the not-quite-sure-what-I-want-it-to-be phase.

In the meantime, back to coding.

  1. The Implementation of Lua 5.0 by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes. <http://www.lua.org/doc/jucs05.pdf>.

  2. The Virtual Machine of Lua 5.0 by Roberto Ierusalimschy. <http://www.inf.puc-rio.br/~roberto/talks/lua-ll3.pdf>.

  3. An old Quake 3-like engine I licensed. I never used it for much, but I spent a lot of time studying its source code. It was pretty well-written and featured a stack-based VM, like Quake 3’s. It’s a shame the engine is entirely defunct and hasn’t been open-sourced.

  4. The QVM, which Phaethon’s put a specification together for over yonder: icculus.org/~phaethon/…/q3vm_specs.html. Cipher’s was likely heavily inspired by Quake 3’s, though I’ve never asked Rik Heywood about this so it’s speculation on my part.

  5. It’s slightly funny to me that when I tried building my first virtual machine years ago, I didn’t think to call it one and didn’t consider that there was anything other than stack-based VMs. In retrospect, I may have actually given myself a harder time by building a stack-machine, since I’ll point out a little later that I find them hard to reason about.

  6. Curiously, I allow reading 64-bit integers here, but they get cast to doubles, which unfortunately limits their range. I’ll probably reconsider allowing 64-bit integer reads, or add untyped 64-bit support (i.e., bitwise-only, with anything else getting cast).

  7. MEMMOVE takes six operands, which is mostly unavoidable (source and destination blocks, source and destination offsets, size, and flags indicating which operands are constants that don’t need to be loaded from registers). They’re all mostly utility instructions, but MEMMOVE might stand out as the weirdest of all instructions.

  8. In theory. In practice, if you need more than about 48 bits of a pointer, I’m not prepared to handle your awe-inspiring amount of memory in Rusalka anyway.

  9. For some reason I’ve convinced myself that “stomp” is used to describe register usage, but I have no evidence for this. It probably came from memory stomping. As such, I’ll just define it as I use it: stomping a register is modifying it without preserving its value or being concerned for how the register is used elsewhere. In other words, your code went in and wrecked the place. For the non-volatile registers, the VM will clean up for you. For the volatile registers, some part of the code needs to clean up after itself.

  10. value_t _registers[REGISTER_COUNT];

    Negative register numbers are also valid, but they don’t reference registers. Accesses through negative register numbers refer to values on the stack, specifically stack[ESP + RegNum] (where RegNum is negative). This makes it possible to use data on the stack as an operand to an instruction, for example to quickly get or set a stack value without popping it.

  11. High probability I’ll eat those words later.

  12. Or, if the name integers can be guaranteed to be sequential, an array also works and could allow the VM to relocate global storage depending on access frequency, but that’s an optimization that could be done after loading the bytecode.

  13. That is, using vm_callback_t = value_t(vm_state_t &vm, int32_t argc, value_t const *argv);
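To put numbers on footnote 6: a double has a 53-bit significand, so a 64-bit integer survives a trip through a double only while its magnitude stays within 2^53. Here is a minimal check in Scala, which uses the same IEEE 754 doubles as C++ on common platforms. The object and method names are illustrative and are not part of Rusalka.

```scala
object DoublePrecision {
  // A double has a 53-bit significand, so every integer with |n| <= 2^53 is exact.
  val MaxExactInt: Long = 1L << 53

  // Mimics a 64-bit integer read that gets cast to a double and back.
  def roundTrip(n: Long): Long = n.toDouble.toLong
}
```

For example, `println(DoublePrecision.roundTrip((1L << 53) + 1))` prints 9007199254740992, not 9007199254740993. The odd value rounds to its even neighbour.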
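Footnote 10's negative register numbers amount to a single resolution step. The sketch below is a hypothetical Scala model, not Rusalka's C++. Values are plain Doubles, and ESP is assumed to be the number of values on the stack, which makes register -1 the top of the stack.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of operand resolution; not Rusalka's actual value_t or stack.
final class RegisterFile(registerCount: Int) {
  val registers: Array[Double] = new Array[Double](registerCount)
  val stack: ArrayBuffer[Double] = ArrayBuffer.empty[Double]

  // Assumed: ESP is the index one past the top of the stack.
  def esp: Int = stack.length

  // Non-negative numbers name registers; negative ones index down from ESP.
  // Out-of-range numbers throw IndexOutOfBoundsException.
  def read(reg: Int): Double =
    if (reg >= 0) registers(reg) else stack(esp + reg)

  def write(reg: Int, value: Double): Unit =
    if (reg >= 0) registers(reg) = value else stack(esp + reg) = value
}
```

With 1.0, 2.0, and 3.0 pushed in that order, operand -1 reads 3.0 and operand -3 reads 1.0, and neither access pops anything.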

My Sublime Text Configuration

James Brooks shared his Sublime Text configuration and suggested that others share theirs, so I figured I’d go ahead and do the same. First off, though, anyone curious about what’s in my Packages directory can check out my packages repository for it over on GitHub. I’ll bet it’s fascinating.

The following configuration is for Sublime Text 3, though it should cover a few bits of Sublime Text 2 as well. It may have changed in minor ways since I wrote this. I regularly toggle options such as align_indent on and off as need be, so keep that in mind.

{
  "align_indent": false,
  "always_show_minimap_viewport": true,
  "auto_complete_commit_on_tab": true,
  "auto_complete_triggers": [],
  "auto_complete_with_fields": false,
  "caret_style": "phase",
  "close_windows_when_empty": false,
  "color_scheme":
    "Packages/Theme - Freesia/Shindo.tmTheme",
  "default_line_ending": "unix",
  "detect_slow_plugins": true,
  "draw_white_space": "selection",
  "enable_telemetry": "disable",
  "find_selected_text": false,
  "fold_buttons": false,
  "font_face": "PragmataPro",
  "font_size": 12,
  "freesia_borderless": true,
  "freesia_small_vscroll": true,
  "highlight_line": true,
  "highlight_modified_tabs": true,
  "ignored_packages":
  [
    "BufferScroll",
    "CoffeeScript",
    "Emmet",
    "Vintage",
    "WordCount"
  ],
  "indent_guide_options": ["draw_active", "draw_normal"],
  "line_padding_bottom": 1,
  "line_padding_top": 2,
  "manpage_sections": ["2", "3", "3G"],
  "manpage_use_apropos": true,
  "margin": 0,
  "overlay_scroll_bars": false,
  "rulers": [72, 80, 100, 120],
  "scroll_past_end": true,
  "scroll_speed": 0,
  "show_full_path": true,
  "show_panel_on_build": true,
  "show_tab_close_buttons": true,
  "tab_size": 2,
  "theme": "Shindo.sublime-theme",
  "translate_tabs_to_spaces": true,
  "trim_trailing_white_space_on_save": true,
  "use_simple_full_screen": false,
  "wide_caret": true,
  "word_wrap": true
}

And this is what my configuration currently looks like:

A screenshot of my Sublime Text 3 configuration

Here is a complete list of the packages I currently have installed, along with notes for any that might need explanation. I won’t include packages that are personal modifications of packages shipping with ST.

  • BBCode1
  • BlitzMax
  • Cowlips2
  • Emmet3
  • ExportHtml
  • GitCommitMsg
  • GitGutter
  • GoSublime
  • NilKit
  • Origami
  • Theme - Freesia4
  • Theme - Nil5
  • WordCount
  • hasher
  • sublime-jsdocs

Unlike almost everyone else I know who uses Sublime Text, I don’t use Package Control. I don’t particularly like the way it works, and I’m pretty sure I don’t like the idea of a plugin downloading things automatically, so I don’t use it. Instead, I keep my ST packages directory in a git repository, and each additional package that isn’t mine is a submodule. That said, Package Control is fairly useful, and I don’t fault anyone who chooses to use it.

The only other thing of note is my keyboard bindings file. I have a lot of two-to-four stroke key bindings, mostly bound to ⌘K and ⌘, because they’re easy enough to reach. Preferences are re-bound from the Mac OS default of ⌘, to ⌘F5 (and ⌘⇧F5 for keybindings) just because it’s not that useful to have it bound to an accessible hotkey. I change my settings a lot, but I do so through key bindings to toggle things on and off. For a nice example of that, see my possibly-absurd list of indentation level keybindings.

My typical workflow at this point doesn’t make much use of autocompletion outside of the built-in word completion in ST. So, if instruction_argv shows up in my code, that’s the sort of thing I get completions for. I don’t find project-wide autocompletion all that useful in my C++ projects, and in Ruby I just operate under the assumption that any use of autocompletion would incur a severe performance hit.

I consider performance pretty important in Sublime Text. More than anything else, I dislike when an editor stutters while I’m typing something because it decided to attempt something intelligent, like autocompletion. In those cases, I have to wait to see if what I typed came out right, and even if the wait time is 500 milliseconds, that’s enough to disrupt my flow of thought. There’s a good reason I don’t use Eclipse for my Android work, and IntelliJ IDEA is barely scraping by at times. IntelliJ usually slows down in places where I’m least affected, though, since the mental overhead it takes to keep track of what’s going on in Java or Scala is far higher than in C++.6

My color schemes at this point are all heavily biased towards picking out significant chunks of source code rather than tiny things like variable names. Some people believe we should highlight variable names with individual colors, but this seems counter-productive since the problems I have are with navigating source code, not with identifying where a variable is used on screen. If I need that, I’ll just move my caret over the variable and hit ⌘D and ⌘⌃G (expand to word and find all instances of selection).

Rather than having a mess of colors all over the place, I prefer significant colors for structural elements. This can also make the minimap in Sublime much more useful if you take advantage of background colors for various selectors (such as having a specific background color for entity.name.function).

So, I’ve tried to optimize my setup for working on large-ish bodies of code spread out across various functions/types. It might not even be optimal for that work yet, but this is all a work in progress anyway. I’m sure I’ll find better ways to do all this over time.

Hopefully this isn’t too boring, though I expect it is. Like going through someone’s pants and finding only jeans. My packages repository might be more useful than simply looking over my list of packages and settings. In any case, have fun using your text editor of choice.

  1. A package I wrote just so I could have BBCode highlighting in ST. I don’t write a lot of forum posts, but I’ve never found a forum with a post text field that was pleasant, so it seemed better to just do this.

  2. Highlighting for my Scheme-like language. Currently in flux since using ST for Lisp kind of sucks. Haven’t found a good way to fix that.

  3. Disabled normally because it screws with my normal key bindings. Occasionally useful, but usually not. I miss Zen coding – it was better.

  4. Anyone using Freesia might look at the screenshot above and think that’s a Freesia theme, and they would be right. They might also think “Shindo.sublime-theme” doesn’t appear anywhere in either the base package or my personal branch, and they would also be right. That’s because I made it today and it’s not official and probably not going in my personal branch right away.

  5. Kept around mostly for maintenance purposes. I don’t actually use it anymore since Freesia replaced it.

  6. Your mileage may vary. Some people, for example, find it very easy to read Python code and think a large project in Python’s easy to navigate. I’m not as fortunate, since Python’s project structure and lack of explicit scoping through markers (i.e., not whitespace) always has me doing a double-take to see if something’s where I thought it was or the indentation’s what I expected. In other words, this is subjective, so please don’t complain to me that you find C++ obviously harder to reason about than language X.

Ascension 2 Development

I released the Ascension 2 Live Wallpaper on Google Play last month, so it’s probably high time I wrote about it. I’ll just run through a few topics and see where that goes.

For those not sure what Ascension is, you can first click the link to the store page, above, and that should give you a good idea of what it looks like. If you don’t know what a live wallpaper does: it displays a wallpaper on the Android home screen (the launcher), and typically the wallpaper animates or provides information or some other feature. It might render a model, maybe a tree, maybe an aquarium. Ascension displays bars that change color over time and react to touch. I mean for Ascension to stay subtle, more so than most live wallpapers, but it still allows users to customize it enough to make it their own.

I originally built Ascension for myself, but I’ll try to explain why I built Ascension 2. I still made it for myself, but its purpose changed a little this time around.

Motivation

Ascension’s actually pretty old now. I released it back in 2010 — when live wallpapers first appeared and various spam-developers released tons of them. Some featured sphere-mapped Android robot models floating in front of a background, every national flag, and other garbage.1 I still used an Android phone at the time and wanted a live wallpaper, so I made the first version of Ascension to appease my desire for a good live wallpaper.

After a while and some updates, I decided I no longer wanted to develop for Android. Android grew fragmented between various OS versions and hardware specs, and to an extent it remains that way today. Not so much OEM skins anymore, but they’re there. I ditched my Android phone for an iPhone and left Android behind while it went through its growing pains with tablets (recall the Motorola Xoom and how it failed to deliver on its features). Google later recognized that performance on Android sucked, so they tried to fix most of that with Android 4.0. Anyway, Android sucked at the time.

Come 2013, the Android platform had settled, tablets had become usable (though they still lack good apps), and the fragmentation issue slowly began to sort itself out once Google stopped making OS upgrades necessary to benefit from app improvements, by shoving a lot of Android into Play and moving their apps to the store.2

Background aside, Android sucked less, I had a Nexus 7, and I figured I should try to re-learn Android development and forget most of what I had learned. Ascension sat in its little corner of the Play Store, horribly out of date, ugly, and generally unpleasant to use.3 With that in mind, I figured I should bring Ascension up to speed with current Android. So, sometime early in December 2012, I started to write Ascension 2. I initially wrote this reboot in Java. Most people will pause and think, “Well, obviously, you use Java to write Android apps.” I think “But wait, there’s more!” sounds appropriate here.

I wrote the first version of Ascension 2 in Java. I never released this version, neither to testers nor the Play store. It worked very well, but I stopped writing it when finals started up. I still had my degree to finish at the time, so I put it on hold. When I returned to the code afterward, I decided I didn’t want to use Java, so I killed that version of Ascension 2, went back to my iPad, and ignored my Nexus 7 except as a curiosity until September of this year. Its battery ran down to nothing several times during that period.

In early September, I still hadn’t found a job, and needed something to keep me busy. I couldn’t let myself just tinker with code and various projects, so I looked at the broken build of Ascension 2 I had on my Nexus 7 and decided I’d finish it. Except not in Java. I’d made my decision there, and I would find another language that didn’t piss me off. I only considered two options for alternative languages: Mirah and Scala.

Mirah looked like Ruby, which as far as I know is intentional. It’s not like JRuby, where Ruby runs on the JVM. Instead, Mirah borrows some of the syntax and compiles to JVM bytecode. I needed to find a language that compiled to JVM bytecode, otherwise it’d be difficult to compile to dex, so Mirah met that requirement. The problem with Mirah is that it’s still a bit iffy right now, and although I want it to become a popular JVM language, it doesn’t feel stable. So, I dropped it as a possible choice soon after I looked at it.

Enter Scala, which for a long time bore the title of “the language that confused me.” In retrospect, I don’t know why I found it confusing, but I’ll assume I just didn’t give it a good look. Scala is also like Ruby in that it tends to result in expressive code without the verbosity of Java, though the similarity ends there. It’s also both imperative and functional, and though nobody should listen to me when it comes to functional programming, I think having it leads to generally better code, assuming you avoid side-effects and otherwise write deterministic functions.

After I toyed with Scala for a while, I decided it had everything I wanted and, at the very least, I could use it as a less-verbose Java.4 In the best case, it would help ensure I wasn’t doing anything horrible. I still do horrible things, of course, but Scala let me get away with writing a lot of code that would cause me pain in Java.5

Scala has its ups and downs, particularly when it comes to Android development. Scala’s major upsides, those that let you write better code, are that it avoids verbosity, provides pattern matching, lets you write both functional and imperative code, and supports anonymous functions and traits (which allow mixin-like composition), among plenty of other features. All of this leads to code that tends to be more expressive and less error-prone. As a short example, it doesn’t take too much work to get to a position where you can write code like this:

import android.app.Fragment
import android.view.{View, ViewGroup, LayoutInflater}
import android.os.Bundle
import android.widget.Toast
import scala.language.implicitConversions
import net.spifftastic.view.util.implicits._

class MainMenuFragment extends Fragment {
  override def onCreateView(inflater: LayoutInflater, root: ViewGroup, state: Bundle): View =
    inflater.inflate(R.layout.main, root, false)

  override def onViewCreated(root: View, state: Bundle): Unit = {
    super.onViewCreated(root, state)

    (root withView R.id.new_game) {
      _ onClick Toast.makeText(getActivity, "Poppy", Toast.LENGTH_LONG).show()
    }
  }
}

One small downside: I ran into trouble when I tried to use Scala’s standard library with Android. In particular, if you include the entire standard library, you will easily exceed the Dalvik VM’s limit of 65,536 method references per dex file (method indices are 16 bits).6 ProGuard solves this, but, as usual, it takes some work to configure. Rather than wrestle with ProGuard yourself, though, I’d recommend that anyone interested use pfn’s Android SDK Plugin for SBT. It’s hard to adjust to SBT, but it’s great once things start to work, and pfn’s plugin handles most of the heavy lifting for projects. If you decide to go with Scala, you’ll more or less need to use SBT anyway, so just get comfortable with it.

After that, I set some goals for Ascension 2. All of the goals were set with the idea to take something that exists and learn to use new APIs to implement it. The goals:

  • Aim for almost complete feature parity with Ascension 1.

    This sounds obvious, but the “almost” there is important. There are features in Ascension 2 that still haven’t made their way into the release build. I’ve delayed the first update for one particular feature, as well. I really want what I’ve planned to make it in early on, but also wanted to get it into users’ hands sooner. As a result, Ascension 2 does not have one setting that Ascension 1 had: you cannot specify a custom bar color. That feature’s coming back in a different form, but more on that when I finish the update. It’s not a small addition.

  • Design the app for tablets first.

    Ascension 2 has no phone-specific UIs, so you see the tablet layout on all devices. There are minor differences that depend on the screen’s width in DIPs, but otherwise Ascension uses the same UI everywhere. All devices, phones and tablets alike, get both a settings pane and a preview pane to view their changes. Only a few settings require you to open a dialog to select something; otherwise, the settings app shows the effect of every setting as you change it.

  • Allow users to save their configurations.

    Users of Ascension 1 requested this often enough, but I designed Ascension 1 in such a way that it would’ve been difficult to implement (meaning Ascension 1 had bad code). Ascension 2 has this, and although I doubt it sees much use, I’m sure someone is grateful that they can save a configuration and load it again later.

  • Implement the renderer with OpenGL ES 2 as a minimum.

    GL ES 2 just lets me move more work off to the GPU, and in doing so I depend less on hacks to implement certain features. For example, I implemented brightness in the shader, whereas previously I had to account for it when the renderer generated bar colors. The code gets smaller and easier to maintain as a result.

  • Target Android 4.x and up.

    I decided to only target Android 4.x because I just don’t want to support older devices. It’s not fun, it leaves me hamstrung if I want to adopt new APIs (since I then have to try to maintain compatibility with older ones), and it just limits what I can do overall. If I had continued to target Android 2.x, I would not have had access to PreferenceFragment, for example, a class crucial to Ascension 2’s design. Rather than limit myself and target older devices, I decided to target what let me make the app I wanted.

None of this should surprise anyone, or at least not developers. Put short, I wanted a way to get back into Android development, I wanted to use a language that didn’t suck, and I wanted to build an app that felt like it fit in on Android 4.x. As such, it seemed right to take an app that I made for myself and build it again, improve it, and do it right this time. The first problem I had came down to how to design something for a tablet.

Tablet UI Layouts

As I mentioned above, when I designed Ascension 2’s user interface, I made it for tablets first. Also, I have no Android phones.7 When I test on a phone, I either borrow family members’ phones or ask my testers to run something on their phones. So it makes sense that I design the UI for what I’ve got on hand.

First off, because Ascension is a live wallpaper, it has only one main user interface to worry about aside from the wallpaper itself: the settings activity. Ascension 1.x’s settings had no preview and required a lot of taps to change things. Users had to poke around in the dark and check how things looked after each change. I decided early on, then, that I needed to fix this.

Ascension 2 displayed on a Nexus 7 in landscape orientation.

Above is a picture of Ascension on a Nexus 7, in landscape. This is the UI you get on all devices. The only difference is portrait orientation, which sees the preview pane placed at the top of the screen.8 The preview pane, in landscape, is always on the left. This is to put it out of the way of your right hand, which — since most people are right handed — means you can scroll, swipe, and otherwise interact with the settings without using your left hand, or even having to do anything other than move your thumb.9

The largest change to the settings UI simplified how users interacted with the settings pane. Early on in development, the activity started off with a list of the settings pages and the choice to save or load a config. In short, you had a list:

  • Bars10
  • Colors
  • Save Config
  • Load Config

You could tap any of these to get to the actual settings or work with configs. This sucked. It meant that, in order to get to another settings page, you had to tap the back button then tap the entry for the other page. This persisted for a while, since I had to get the settings hooked up before I could find a better way to display them. Once I’d had a few things hooked up to see how I wanted them to work (I’ll go into that in a moment), I pulled out the basic list and replaced it with a ViewPager and ActionBar tabs.

This was much simpler than I’d expected and made the settings app easier to navigate. The downside is that there are two ways to get between settings pages, but they’re both easy to use: swipe the page or tap on a tab. No need to press the back button. The only problem I had is that the normal FragmentPagerAdapter generates names for the fragments it provides. This makes it difficult to communicate with specific fragments, so I reimplemented it in Scala. In reality, this isn’t a huge deal and implementing a PagerAdapter shouldn’t take too long, as with most basic adapters.

The preview pane itself doesn’t need too much explanation: it allocates its own renderer and displays everything the same as the live wallpaper normally does. The only difference is that you cannot change its offset by swiping, since it doesn’t have multiple pages of content.

The most important part of the new settings panes, however, was showing changes to settings as they happened. Rather than take the easy route and embed each preference’s controls in a dialog (using DialogPreference, also known as the lazy coder’s preference), you just place the controls in the preferences’ own layouts. It’s not hard to implement, and it makes customizing Ascension’s settings a lot more fun. There’s no tapping back and forth between views to see what’s changed: you just change a setting and see the result in the preview pane.11 For me, that proved that the preview pane worked.

Configurations

As I mentioned above, one of the main goals for Ascension 2 was to let users save their configurations. Each saved configuration is written to a JSON file. They look like this:

{
  "use_touch_color": false,
  "bar_ping_lifespan": 15,
  "shimmer_speed": 0.20000000298023224,
  "use_uniform_height": true,
  "use_bar_pings": true,
  … and so on …
  "bar_count": 100,
  "flip_bar_mode": "even"
}

I had no plans to necessarily make them human-readable, and I wouldn’t say the JSON makes them accessible in that sense, but it does allow you to tweak the files in a text editor if you really wanted to. This also allows you to share configurations, though I don’t expect any users to do this. At any rate, it’s easy to save a config. The challenges all involve how Ascension 2 loads configurations, displays their previews, and deletes configurations.
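The oddly precise shimmer_speed value in the sample above is most likely 0.2 held in a 32-bit float and widened to a double when the JSON was written. That is an inference from the digits, not something the post states about Ascension’s serializer. A quick Scala check:

```scala
object FloatWidening {
  // 0.2 has no exact binary representation; the nearest float is slightly larger.
  val shimmerSpeed: Float = 0.2f

  // Widening to Double keeps the float's exact value, which then prints with more digits.
  val serialized: Double = shimmerSpeed.toDouble
}
```

`println(FloatWidening.serialized)` prints 0.20000000298023224, matching the config file. Storing the setting as a double, or rounding before writing, would produce a plain 0.2.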

It’s simple to load a config, apply its properties, and persist the values, but it’s complicated to refresh all the preferences’ views afterward. You have to set every preference’s current value to the newly persisted value. Preferences do not automatically refresh their views, which is sensible: there’s no reason for most of them to check their values except when initialized or when changes are persisted. Sensible or not, though, you’re required to do it for them, similar to how one must stimulate an abandoned kitten’s bowel movements.

As such, I have to notify all PreferenceFragments in the settings activity to refresh their values. In Scala, this just meant I had to write a new trait and include it with each PreferenceFragment subclass:12

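import android.preference.{ListPreference, PreferenceFragment, TwoStatePreference}
import android.util.Log

// SeekPreference, ColorSelectorPreference, ColorPreference, and Setting are
// Ascension's own classes; TAG is defined in the companion object (not shown).
trait RefreshablePreferenceFragment extends PreferenceFragment {
  import RefreshablePreferenceFragment.TAG

  /** Re-reads every persisted setting and pushes it into its preference's view. */
  def refreshPreferences(): Unit = {
    // Implicit, presumably for helpers such as ColorPreference.getColor below.
    implicit val sharedPreferences = getPreferenceManager.getSharedPreferences

    Setting.values foreach { key =>
      val keyString = key.toString

      findPreference(keyString) match {
        case tsp: TwoStatePreference =>
          tsp.setChecked(sharedPreferences.getBoolean(keyString, tsp.isChecked))
        case sbp: SeekPreference =>
          sbp.setProgress(sharedPreferences.getInt(keyString, sbp.getProgress))
        case lp: ListPreference =>
          lp.setValue(sharedPreferences.getString(keyString, lp.getValue))
        case csp: ColorSelectorPreference =>
          csp.setColor(ColorPreference.getColor(keyString, csp.getColor))
        case _ =>
          // Also reached when findPreference returns null (key not on this page).
          Log.d(TAG, s"Unhandled preference change: $key")
      }
    }
  }
}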

With this, it’s possible to call refreshPreferences on each PreferenceFragment that supports it. In Ascension’s case, that means all of them. The match skips preference types Ascension doesn’t use, but it takes little effort to add more as needed. Only the ColorSelectorPreference type requires special handling, because of how I’ve encoded color preferences’ values, but it’s still not as difficult as it would be otherwise.

Displaying a configuration preview required building a little offscreen renderer that simply took a GLSurfaceView#Renderer, let it draw a frame, and then returned a Bitmap of the result. That’s all easy to do, but doing it asynchronously isn’t as simple, and the settings app generates and caches configuration previews asynchronously.13 Ordinarily, nothing goes wrong: things only fall apart when the activity dies while tasks are still generating previews, or when the device configuration changes (on rotation, for example). In either case, the activity is destroyed, and the fragment along with it.

When the config picker fragment dies, it tries to tear down the offscreen renderer and destroy its context. Normally, this should be pretty easy: synchronize on the renderer, wait for it to be free, and then destroy it. The problem is that this deadlocks, mainly because of how I’d originally designed the renderer and the config preview generation code. One piece would grab a resource and wait on the renderer, while the fragment would grab the renderer and wait on the resource. So if previews were still being generated when the fragment went down, the entire app froze.

In the end, I solved this just by never locking the renderer and instead atomically setting a flag. If the renderer isn’t in use, it’s locked and brought down immediately, which is what usually happens. If it’s in use — still rendering — then the flag is set and when the next render is complete, the renderer is torn down and all further use of it raises an exception (which is caught). By doing that, the renderer can continue to do its thing just long enough to get torn down without the need to throw a wrench in the cogs.
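The flag-based teardown above can be sketched without any locks as a small compare-and-set state machine. This is a Scala sketch under stated assumptions. All names are hypothetical, and only one thread renders at a time. The `destroy` parameter stands in for whatever releases the renderer’s resources. One deliberate difference from the post: the idle path here also uses compare-and-set instead of locking the renderer.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec

/** Hypothetical lock-free deferred teardown for an offscreen renderer. */
final class DeferredTeardown(destroy: () => Unit) {
  import DeferredTeardown._

  private val state = new AtomicInteger(Idle)

  def isDestroyed: Boolean = state.get == Destroyed

  /** Runs one render; throws if the renderer is busy or has been torn down. */
  def render[A](draw: => A): A = {
    if (!state.compareAndSet(Idle, Busy))
      throw new IllegalStateException("renderer is busy or has been torn down")
    try draw
    finally finishRender()
  }

  private def finishRender(): Unit =
    if (!state.compareAndSet(Busy, Idle)) {
      // Only requestTeardown moves the state out of Busy (to TeardownRequested),
      // so the in-flight render finishes the teardown it was asked to do.
      state.set(Destroyed)
      destroy()
    }

  /** Tears down immediately if idle; otherwise flags the in-flight render. */
  @tailrec
  final def requestTeardown(): Unit = state.get match {
    case Idle =>
      if (state.compareAndSet(Idle, Destroyed)) destroy() else requestTeardown()
    case Busy =>
      if (!state.compareAndSet(Busy, TeardownRequested)) requestTeardown() else ()
    case _ => () // teardown already requested, or already destroyed
  }
}

object DeferredTeardown {
  final val Idle = 0
  final val Busy = 1
  final val TeardownRequested = 2
  final val Destroyed = 3
}
```

A preview task would wrap its drawing in `render { ... }` and catch IllegalStateException. The fragment’s teardown path would call `requestTeardown()`. Neither side ever blocks waiting on the other, which removes the lock-ordering cycle behind the deadlock.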

Aside from that, rendering the config previews is fairly straightforward. The only thing that might sound odd is they’re 16-bit BGR bitmaps, but that’s to conserve memory. The previews are small enough that it makes sense to limit their depth.
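As a rough measure of what the 16-bit previews save: packing each pixel into 16 bits halves the memory of the usual 32-bit ARGB format. The sketch below assumes the common 5-6-5 layout with red in the high bits, which is how Android documents Bitmap.Config.RGB_565. The post calls its previews BGR, so Ascension’s channel order may differ. The `Rgb565` object is hypothetical.

```scala
object Rgb565 {
  /** Packs an ARGB_8888 pixel into 16 bits: 5 bits red, 6 green, 5 blue; alpha is dropped. */
  def pack(argb: Int): Short = {
    val r = (argb >>> 16) & 0xff
    val g = (argb >>> 8) & 0xff
    val b = argb & 0xff
    // Keep the high-order bits of each channel and shift them into place.
    (((r >>> 3) << 11) | ((g >>> 2) << 5) | (b >>> 3)).toShort
  }

  /** Bytes needed for a width-by-height image at the given bytes per pixel. */
  def bytes(width: Int, height: Int, bytesPerPixel: Int): Int = width * height * bytesPerPixel
}
```

A hypothetical 128×128 preview takes 32 KiB at 16 bits per pixel, compared with 64 KiB at 32 bits. The savings add up when a whole list of previews sits in a cache.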

The last issue with configs gave me a headache, because it involved Android’s media scanner. A quick overview for anyone who hasn’t suffered it: when you plug in an Android device and grab files off of it, the files you see are those on the external storage that Android already scanned. If an application writes a file to external storage, the user may not immediately see the file in external storage — possibly not for a long time, depending on when the scanner runs next. You can usually force it by rebooting, but you don’t really want to ask a user to reboot to pull a file off the storage. So, to make it visible right away, the app tells the media scanner to go ahead and scan the file.

When Ascension creates a config file, it does this. When Ascension deletes a config file, it also does this, and then it tells Android to remove the file once it’s scanned. I do it this way because, if the app deletes a file, it’s still visible in the storage afterward because Android hasn’t scanned and noticed that it’s gone. To tell Android to remove a file, you need the file’s URI in the media content provider. To get the URI, however, you have to scan the file. So when Ascension deletes a file, it performs these steps:

  1. Scan the file and listen for when the scan completes.
  2. Get the URI once notified that the scan has completed.
  3. Delete the file.
  4. Tell the content provider to remove the URI from its database.
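The four deletion steps can be sketched independently of Android. In this Scala sketch, the MediaIndex trait is a hypothetical stand-in. On a device, `scan` would wrap MediaScannerConnection.scanFile, whose OnScanCompletedListener receives the file’s content URI (or null if the scan failed). `remove` would wrap ContentResolver.delete(uri, null, null).

```scala
/** Hypothetical stand-in for the media scanner and the media content provider. */
trait MediaIndex {
  /** Scans `path`, then calls `onScanned` with the file's content URI, if any. */
  def scan(path: String)(onScanned: Option[String] => Unit): Unit

  /** Removes a URI's entry from the media database. */
  def remove(uri: String): Unit
}

object ConfigDeleter {
  /** Deletes a config file so it also disappears from the media database. */
  def delete(path: String, index: MediaIndex, deleteFile: String => Boolean): Unit =
    // Steps 1 and 2: scan the file, then receive its URI in the callback
    // (on Android, that callback arrives on the scanner's thread).
    index.scan(path) { maybeUri =>
      // Step 3: delete the file itself.
      val deleted = deleteFile(path)
      // Step 4: drop the now-stale database entry, if the scan produced one.
      if (deleted) maybeUri.foreach(uri => index.remove(uri))
    }
}
```

Passing the file deletion in as a function keeps the ordering testable with a fake index. On a device, the function could simply be `p => new java.io.File(p).delete()`.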

This sounds simple, because it is, but it took me a surprisingly long time to do things this way. I figured there has to be a better way, and I assume there still is, but I haven’t found one. As such, the current method to delete one or more config files uses this odd route through the media scanner. It works, I just wish I didn’t need to go through the media scanner for it to work. Still, it keeps things clean on the storage side.

Rendition

At least one person wanted me to discuss rendition in Ascension, but most of it bores me to death because there’s nothing particularly unique about it. Everything is drawn using OpenGL ES 2, and for the most part, the data used by OpenGL rarely changes. The only exception is the colors.

Ascension’s renderer uses two fixed-size buffer objects: a vertex buffer and an index buffer. Both allocate the maximum size possible for their buffers. The index buffer is initialized once and never touched again,14 since there’s no need to ever modify it. The indices for bars never change, only the number of bars and their vertices. The vertex components are stored in separate segments of the vertex buffer (i.e., not interleaved), so that only the components that need updating are updated; in normal usage, this means the renderer only updates the color component.
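Each bar is a quad whose indices depend only on its position in the buffer, so the index buffer can be filled once for the maximum bar count. This Scala sketch rests on assumptions the post doesn’t state: four vertices and two triangles per bar, 16-bit indices, and one contiguous segment per vertex component. With that layout, updating colors alone would be a single GLES20.glBufferSubData call at the color segment’s offset. `BarBuffers` is a hypothetical name.

```scala
object BarBuffers {
  val VerticesPerBar = 4 // assumed: one quad per bar
  val IndicesPerBar = 6  // two triangles per quad

  /** Fills indices for up to `maxBars` bars; built once and never modified. */
  def buildIndices(maxBars: Int): Array[Short] = {
    // GL_UNSIGNED_SHORT indices can address at most 65,536 vertices.
    require(maxBars * VerticesPerBar <= 65536, "too many bars for 16-bit indices")
    val indices = new Array[Short](maxBars * IndicesPerBar)
    for (bar <- 0 until maxBars) {
      // Assumed vertex order: 0 bottom-left, 1 bottom-right, 2 top-left, 3 top-right.
      val v = bar * VerticesPerBar
      val quad = Array(v, v + 1, v + 2, v + 2, v + 1, v + 3)
      // Values above 32767 wrap negative but keep the unsigned bit pattern GL reads.
      for (k <- quad.indices) indices(bar * IndicesPerBar + k) = quad(k).toShort
    }
    indices
  }

  /** Byte offset of each component's segment in a non-interleaved vertex buffer. */
  def segmentOffsets(maxBars: Int, bytesPerVertex: Seq[Int]): Seq[Int] =
    bytesPerVertex.scanLeft(0)((offset, size) => offset + size * maxBars * VerticesPerBar).init
}
```

Suppose each vertex has two float coordinates (8 bytes) and four color bytes, and the maximum is a hypothetical 100 bars. Then positions start at offset 0 and colors at offset 3,200.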

Continuous rendition to the wallpaper engine’s surface is handled by a GLSurfaceView instance, because it works well and lets you queue up events on the thread that handles the EGL context, renderer, and so on. I originally tried to avoid this in favor of using ZeroMQ (via JeroMQ) to send messages to the renderer. The problem there is that I’m not directly in control of the GL thread the view manages, so I ended up with issues where one socket lived on, which left me unable to kill the ZeroMQ context. This caused the service to lock up, and it was just kind of a mess.

Aside from that, Google forbids the use of sockets on the main thread in Android 3.x and up, so you now have to deal with two problems: how you synchronize socket access and how to kill both the context and sockets with AsyncTasks, but in order. You can use a serial executor, but the problem is that one socket is being closed from the GL thread. To ensure order, you have to push the work off to the executor. Meanwhile the service might have already tried to kill the context, causing it to block the executor’s thread. So, the socket is never closed while the context waits for you to close it. Ultimately, I ditched ZeroMQ and used GLSurfaceView’s queueEvent method:

override def onTouchEvent(event: MotionEvent): Unit =
  /* Copy and dispatch before the event gets recycled. */
  if (event != null) {
    /* For some reason, I've received null events, so check for null events. */
    val eventRunnable: TouchEvent = TouchEvent.Cache.allocate()
    eventRunnable.renderer = _renderer
    eventRunnable.event = MotionEvent.obtainNoHistory(event)
    _surfaceView queueEvent eventRunnable
    super.onTouchEvent(event)
  }

Another point to note here: events are cached (see TouchEvent.Cache.allocate()). This avoids triggering the GC in most situations, since the cache eventually saturates (usually at around 12 to 16 objects; the cache in use above has no upper limit, though one can be set) and no more TouchEvent objects are allocated. This might be overkill when it comes to avoiding the garbage collector, especially given that the GC on Android has improved quite a bit (it’s now difficult to get it to cause frame skips). Still, if possible, I prefer to avoid the GC at all costs.
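A cache like TouchEvent.Cache can be as small as a free list. The ObjectCache below is a hypothetical sketch, not Ascension’s class. `allocate` reuses a released object when one is available and creates a new one otherwise, so allocation stops once the number of in-flight events stops growing. The methods are synchronized because, in the flow above, objects are allocated on the main thread and would be released on the GL thread after they run.

```scala
import scala.collection.mutable.ArrayBuffer

/** Hypothetical unbounded object cache in the spirit of TouchEvent.Cache. */
final class ObjectCache[A](create: () => A) {
  private val free = ArrayBuffer.empty[A]
  private var created = 0

  /** Reuses the most recently released object, or creates one if none is free. */
  def allocate(): A = synchronized {
    if (free.isEmpty) { created += 1; create() }
    else free.remove(free.length - 1)
  }

  /** Returns an object to the cache; the caller must not use it afterward. */
  def release(a: A): Unit = synchronized { free += a; () }

  /** Total objects ever created; stops growing once the cache saturates. */
  def totalCreated: Int = synchronized(created)
}
```

A runnable event would call `release(this)` at the end of its `run`, after recycling its copied MotionEvent. The design choice is simply to trade a little bookkeeping for zero steady-state allocation.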

The above example applies to all other events as well: Ascension configuration changes15, offset changes, touch events, pause and resume events, and so on. The renderer handles most events as it receives them; the only exception is the config event. A config event carries the key of the preference that changed. Ascension keeps a set of all changed keys to avoid pulling data out of preferences before it needs to. As such, all config changes are coalesced into a single config delta that is applied at the start of the next frame.16
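Coalescing config changes amounts to collecting keys between frames. Here is a hypothetical Scala sketch. It is touched only from the GL thread, since config events arrive through queueEvent, so it needs no locking. Repeated changes to the same key collapse naturally in the set.

```scala
import scala.collection.mutable

/** Hypothetical per-frame config delta; used only on the GL thread, so unsynchronized. */
final class ConfigDelta {
  // Insertion-ordered; repeated changes to one key collapse into a single entry.
  private val changed = mutable.LinkedHashSet.empty[String]

  /** Called for each config event as the GL thread receives it. */
  def onConfigChanged(key: String): Unit = { changed += key; () }

  /** Called at the start of a frame: hands over the changed keys, if any, then resets. */
  def applyAtFrameStart(apply: Set[String] => Unit): Unit =
    if (changed.nonEmpty) {
      apply(changed.toSet)
      changed.clear()
    }
}
```

For example, three events for bar_count and use_bar_pings arriving within one frame reach the renderer as a single two-key delta. The renderer then reads each setting from preferences once.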

There’s not that much else to say about rendering. Inside the renderer is a single fixed-step logic loop to keep the bar state updated, and a check to delay the frame if not enough time has elapsed since the last frame.17 When the frame is actually drawn, the renderer loads any resources it needs. Then, the bar state is passed to a color generator, which the renderer uses to write the colors for visible bars (plus a few to account for bar overlap) to the vertex buffer. Then, only the visible bars are drawn. Repeat until the engine or service dies.
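Put together with footnote 17’s advice to sleep rather than return early, that structure might look roughly like the Ruby sketch below. The step size, frame interval, and class name are hypothetical, not Ascension’s actual values. The clock and sleeper are injected so the loop can be exercised without real time passing. The real renderer is Scala code running on GLSurfaceView’s GL thread.

```ruby
# Fixed-step logic updates plus a frame-rate cap. All names and rates here
# are hypothetical illustrations.
class FixedStepRenderer
  STEP = 1.0 / 60           # logic always advances in 1/60 s increments
  FRAME_INTERVAL = 1.0 / 30 # draw at most 30 frames per second

  attr_reader :updates, :frames

  # clock: callable returning seconds; sleeper: callable blocking for s seconds.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(s) { sleep(s) })
    @clock = clock
    @sleeper = sleeper
    @last_tick = clock.call
    @last_frame = -Float::INFINITY
    @accumulator = 0.0
    @updates = 0
    @frames = 0
  end

  # Called once per onDrawFrame.
  def draw_frame
    now = @clock.call
    @accumulator += now - @last_tick
    @last_tick = now

    # Run as many fixed logic steps as the elapsed time covers.
    while @accumulator >= STEP
      @updates += 1 # advance the bar state by STEP here
      @accumulator -= STEP
    end

    # Too early for a new frame: sleep instead of returning, because
    # GLSurfaceView swaps buffers even when the renderer draws nothing.
    wait = FRAME_INTERVAL - (now - @last_frame)
    @sleeper.call(wait) if wait > 0

    @last_frame = @clock.call
    @frames += 1 # generate colors and draw the visible bars here
  end
end
```

Keeping logic on a fixed step means the bars move at the same speed whether frames arrive early or late. Any time spent sleeping is simply picked up by the accumulator on the next call.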

An Abrupt End

Since I continue to work on Ascension 2, I’ll have more problems to write about. I’ve probably forgotten a few problems that gave me trouble just because I had to block them out of my memory. The difficulty I had when I started to develop Ascension 2 mostly stemmed from using new tools, a new language, and unfamiliar APIs (for example, prior to this, I had never touched fragments). Still, I’ve had a lot of fun so far, and am happy with how far Android has come. It lacks good apps, but I feel well-equipped enough that I can make the apps I want.

Anyhow, Ascension 2 is on the Play Store now. That’s it for now — I’ve droned on for nearly 5,000 words about it and avoided even one mention of centipedes. I’ve got an update to finish and another app to write, so back to work.18

  1. This situation hasn’t improved much. Ascension 2’s competition today includes “twerk” wallpapers, boob-jiggle wallpapers, the many tons of seasonal- and holiday-themed wallpapers that get crapped out on a regular basis, and so on. I believe there is also at least one almost-furry-porn live wallpaper out there as well. Make of that what you will.

  2. The latter happened fairly early on, though. I remember Gmail getting updates via what was then the Android Market (which I still think is the better name — seriously, who the hell came up with “Google Play”?).

  3. To change the bar count and see what it looked like, you had to tap a preference to open a dialog to tweak a number or SeekBar, then close the settings, see if it was what you wanted, and go back to tweaking settings. Repeat this for most settings and you had a ton of things you could customize — Ascension’s main draw — and a lot of taps to go back and forth between settings and a preview.

  4. At least one Scala user will cringe at that.

  5. For example: tail recursion works in Scala. In fact, you can slap an @tailrec annotation on a function and the compiler will emit an error if the function isn’t tail-recursive. If that doesn’t make you immediately happy, you’re a horrible person.

  6. Android Issue 7147

  7. The apps I use and enjoy all live on the iPhone and iPad, so it makes little sense for me to use an Android phone. Much as they’ve improved the OS and devices, I still enjoy iOS more. There’s also the problem that all Android phones try to top the human head in size. Where apps are concerned, I figure I’ll get what I want on Android as soon as I make them myself, so until then, I’ve stuck with my iPhone and iPad for day-to-day use, and my Nexus 7 — the only Android device I own — sticks around as that thing I grab when I want to see if Android has any “killer apps” yet. It doesn’t.

  8. Since I usually have my thumbs near the bottom of the screen on a tablet, I decided the part that sees more interaction would go on the bottom. In earlier layouts, I’d placed the preview pane at the bottom of the screen to keep the tabs and the settings connected, but this annoyed me after long enough.

  9. I may add a setting specifically to enable a left-handed layout later, though it would only swap the preview and settings panes.

  10. Now the “Appearance” tab.

  11. The exception to this is the multiply blend mode, which unfortunately is both difficult to explain and might confuse people who don’t already know how multiplying two colors works. So, someone might enable multiply blend mode, see everything go black, and just assume it’s broken. In reality, multiply only works with light colors precisely because it multiplies the background and bar colors together. So, it also assumes you know basic arithmetic.

  12. Had I used Java, this would likely be a subclass of PreferenceFragment that sat between PreferenceFragment and each of its subclasses in Ascension. The code itself probably wouldn’t be too complicated, but it would be less pleasant than writing with RefreshablePreferenceFragment.

  13. Using an LruCache, the config picker fragment only stores so many configs in memory before it dumps them (8MB of bitmaps, specifically). They’re never written to storage, so the previews are just re-drawn as needed. I’d change this behavior, but it’s not likely to be an issue, and I’d rather not pollute even temporary storage with a ton of bitmaps.

  14. Except when the app loses the EGL context, either because the configuration or underlying surface changed.

  15. An Ascension configuration change is different from a regular configuration change in that it only affects the current Ascension settings. It’s not the sort that causes an activity to restart.

  16. The config delta is just a bitset, since there’s little need to store actual strings or symbols or objects. This keeps the delta relatively inexpensive and avoids unnecessary allocations on the GL thread. Beforehand, it was just a regular set of Scala Enumeration values, which was fast but resulted in allocations to put objects into the set. A bitset stood out as a good alternative to avoid allocations, since each Enumeration value gets an integer ID, which can be poked into the bitset.

  17. It’s important to not simply return early when doing this, however, since GLSurfaceView will swap buffers even if your renderer does nothing. Instead, sleep for a short amount of time. This is a bad way to do things because it blocks the view’s GL thread, but you will usually only have one active GL view at a time, meaning only one renderer will block the thread. If you have multiple views, you should consider a different way to handle frame limiting.

  18. If anyone wants me to go into more detail about something, shoot me an email and I’ll either reply directly or shove another post up on here. My address is over on the Contact page (see: menu bar).

Spifftastic Design Change

A staple of Spifftastic is that I change the design periodically. A staple of blogs is that you’re obligated to write a post about it, because otherwise how will anyone know? Aside from the visual changes, that is.

Anyway, I rebuilt Spifftastic, and the new theme is called Tracey. I figure that continues my tradition of giving projects terrible names. The main point of Tracey was mobile support, specifically to see if I could use one CSS file for phones, tablets, and desktops. So far, I’d say it worked out. You can see the mobile design by shrinking your browser window’s width to 600 pixels or less.

It’s about as simple as Oak, just more colorful. And whereas there was always a bar at the top listing other pages, that’s now a sidebar on desktops that scrolls with you. Sidebar scrolling is disabled on iOS and Android tablets because I haven’t found a browser there that handles fixed-position elements well.

I borrowed the color palette from Ascension 2, which I’ll write something about later.1 The title’s additional text remains, as do all the other little things I liked about Oak. Also, everything is tinted purple. I find it readable that way.

Anyhow, that’s it for now. I’ll have a post up about Ascension 2 sometime later. Until then, not much new.

  1. I’d like to finish the next update first, as I’ve aimed to put in a feature that I’ve wanted for a long time. That feature should also explain the removal of one particular setting between the two major versions of Ascension.

Ruby OpenGL Experiments

One of my complaints in working with Ruby is its lack of anything for doing graphical work, be it UI toolkits – most of which are now unmaintained – or simply graphics in general, like OpenGL. The title of this post should tell you which direction this is going. I could complain that Ruby’s image has been taken over by the Rails folks or that web developers ruin the fun for everyone else, but neither of these complaints has any objective basis, and both just fuel my personal biases against web developers. So, let’s talk about Ruby, OpenGL, and what I did to make these two unsafe things work together.

OpenGL

The current state of OpenGL in Ruby is decent-ish. There are a handful of gems for Ruby support, some for Ruby 1.9 and at least two for Ruby 2.0. It’s a good situation. Your options are, for the most part, as follows:

  • ruby-opengl2, ruby-opengl, and opengl — Fundamentally the same gems, though for different Ruby versions. They all provide the GL API versions 1.0 through 2.1. Not the best choice, but there are folks out there trying to make sure it still works, and if nothing else, it’s maintained in that respect. If you’re still using 1.8 or 1.9, it might be up your alley.
  • opengl3 — A GL 2.1 - 4.2 gem implemented using FFI, which makes it pretty useful across interpreters (it should work with JRuby, for example). The downside is that it is large and overly complicated, which is strange. Looks like it might’ve been written by someone whose only other language is Java. It’s still being worked on though, so I’m sure it’s not going anywhere for a while.
  • opengl-core — This is my gem. It only supports the OpenGL core profile and extensions, so it’s purely 3.2 and up. Like ffi-opengl (a defunct, three-year-old gem) and opengl3, this uses FFI to load GL. Unlike opengl3, it doesn’t have that much code in it except for the opengl-core/aux features, which aim to wrap some GL functionality in slightly more Ruby-friendly code. By default, it is straight OpenGL with none of the difficult-to-work-with-in-Ruby aspects hidden. The downside to opengl-core is that I’ve only built it for Ruby 2.x, so for those of you stuck on 1.9 for whatever reason, it may not work.

So bindings for OpenGL are available and usable now, which is a far sight better than things were a year ago. From here on, I’ll only talk about opengl-core because I wrote it and therefore use it. In other words, if you want to use one of the other gems, you’ll have to consult someone else. If you’re not using Ruby 2.x, you might not find this very informative (so go install Ruby 2.x and join the future).

opengl-core is, put simply, a very small amount of handwritten code and a very large amount of generated code. Almost all of opengl-core is defined through its gl_commands.rb and gl_enums.rb scripts, which define every GL command and enum up front, even though the functions themselves are only loaded at runtime. That way there’s not much need for eval or define_method or what have you, as the method stubs are already there and forward calls to the appropriate function (loading it first if it hasn’t been loaded yet). So when you call a function for the first time, it’s loaded then and there, unless you told the gem to load everything ahead of time (Gl.load_all_gl_commands! is a thing).
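To illustrate the forwarding-stub idea, here is a small, self-contained Ruby sketch. It is not opengl-core’s code: FAKE_LIBRARY stands in for the FFI symbol lookup, and GlSketch, load_gl_command, load_count, and load_all! are hypothetical names. The point is that the stub methods exist before any native function is resolved, and each function is looked up only once.

```ruby
# Stand-in for the native library lookup (FFI/dlsym in a real binding).
FAKE_LIBRARY = {
  "glClear" => ->(mask) { "cleared #{mask}" },
  "glViewport" => ->(x, y, w, h) { "viewport #{x},#{y},#{w},#{h}" }
}.freeze

module GlSketch
  @loaded = {} # function name => callable, filled in lazily
  @load_count = 0

  class << self
    attr_reader :load_count

    # Resolve a function once and cache it for later calls.
    def load_gl_command(name)
      @loaded[name] ||= begin
        @load_count += 1
        FAKE_LIBRARY.fetch(name)
      end
    end

    # Eagerly resolve everything (compare Gl.load_all_gl_commands!).
    def load_all!
      FAKE_LIBRARY.each_key { |name| load_gl_command(name) }
    end
  end

  # Stubs written out ahead of time, so no eval or define_method is needed.
  def self.glClear(mask)
    load_gl_command("glClear").call(mask)
  end

  def self.glViewport(x, y, width, height)
    load_gl_command("glViewport").call(x, y, width, height)
  end
end

GlSketch.glClear(0x4000)
GlSketch.glClear(0x4000)
p GlSketch.load_count # => 1: the second call reused the cached function
```

Writing the stubs out ahead of time keeps method lookup cheap and the generated file readable. The only per-call work beyond the native call is a hash lookup.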

So why write my own OpenGL gem when opengl3 and ruby-opengl2 et al. exist? Well, I didn’t like the former’s code. Again, it looks like it was written by someone who knows only Java; it’s honestly a really strange piece of code to look at when you consider how simple it is to load a GL function. Very over-engineered. But anyway, its developer likely had much different needs than I, so it’s hard to say why it is the way it is without asking ‘em. Either way, opengl3 doesn’t suit my needs. ruby-opengl2 et al., on the other hand, only work up to GL 2.1, which I don’t code for. Period. Easy decision.

So, if you want opengl-core, you just install it via

$ gem install opengl-core

GLFW 3

Another problem I encountered was that there were no GLFW 3 bindings for Ruby. This won’t surprise anyone: Camilla Berglund (aka elmindreda) only recently released GLFW 3, so it seems obvious that nobody would have gotten around to it in so little time.1 So, there were two options: use another library or write the gem myself.

There were a few other libraries, like SDL, GLUT, and GLFW 2.7. Using SDL is an alright idea, but there were no SDL 2 bindings for Ruby. Bindings for SDL 1.2 are plentiful but not really what I was looking for. GLUT is the spawn of Satan, and the less we have to deal with it, the better. That it somehow perpetuates itself is mysterious, but I suppose there will always be old OpenGL tutorials telling people to use it.

GLFW 2.7 also has bindings, except that its gem, ruby-glfw, doesn’t work on Ruby 2.x, which made it much less appealing. Tom Black had previously asked me whether I would be willing to help write GLFW bindings for Ruby 2.x, but at the time I was mostly concerned about the time needed to do the GLFW and OpenGL gems, since the work needed to do anything in Ruby would be fairly extensive. I think he might have contacted me because I have a fork of ruby-glfw on GitHub and once tried to update it. Either way, I didn’t start on either until quite some time after, and I believe he’d found his own solutions by then.

Writing the bindings myself was then the only option left, short of writing my own window-handling code. That’s something I could’ve done, at least for OS X (it’s remarkably simple), but it’s not a great idea. So, I sat down and wrote the GLFW 3 bindings for Ruby, developing them as a C extension rather than using FFI because that simplified a lot of the Ruby/C compatibility issues. Turns out Ruby’s C API is very nice, but very undocumented and very heavy on the abbreviations. You get used to seeing things like rb_str_new2 and SIZET2NUM (size_t → Numeric).2

For the most part, I tried to make sure the Ruby API for GLFW 3 was similar to its C API without becoming unbearable to use in Ruby. So, Windows and Monitors are their own objects, and GLFW is one big module encompassing them and the constants needed. It’s nice, since you can now create a window and main loop by just writing something like this:

#!/usr/bin/env ruby
require 'glfw3'

Glfw::init

main_window = Glfw::Window.new(800, 600, "Window Title")
raise "Unable to create window" if main_window.nil?

main_window.close_callback = lambda { |window| window.should_close = true }
main_window.make_context_current

until main_window.should_close?
  Glfw::wait_events

  # Do drawing stuff here. Also clear the backbuffer because otherwise
  # you're going to see a lot of junk when swapping buffers.

  main_window.swap_buffers
end

main_window.destroy
Glfw::terminate

So that’s about twelve to fourteen lines, depending on how you space out your lambda. I think that’s pretty good. I actually fixed a small bug in Glfw::terminate that I hadn’t noticed before as a result of writing this post, but otherwise there’s not much to say there. The bindings are easy to use, they’re Ruby-friendly, and they’re available in the glfw3 gem. Unlike my repo names, which don’t correspond to gem names, the gem is just called glfw3, and you can install it via gem using the following:

$ gem install glfw3

The 3D Math Problem

A problem I greatly enjoy complaining about and solving several times over is that, because OpenGL’s meager math functions were deprecated, we’re always in need of code to handle the 3D maths side of things for us.

At first I wondered how it’d work if 3D maths were implemented in Ruby, so I tried doing a simple 4x4 matrix multiplication — work you’ll need to do reasonably often — in both Ruby and via a C extension and ran it through a profiler. For folks who aren’t doing very much, a Ruby implementation of 3D maths is probably sufficient, but it’s also very, very slow even when you try to reduce allocations, method calls, etc., for that operation.

On average, on my system, 4x4 matrix multiplication in Ruby is 7x slower than in a C extension. Both take a small amount of time, though it’s not negligible if you’re doing a lot of it and have only 16 milliseconds to do it in. Keep in mind that, for the C extension, the Ruby values have to be converted to C values, Ruby has to forward all the arguments to the C function, and so on, and it’s still 7 times faster. So, if you need fast math code, always write it in C. If you need accurate but slow maths code, you can probably safely write it in Ruby (in which case, you’ll probably use a class like BigDecimal).
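The post doesn’t show the Ruby code behind that 7x figure, so the following is only a stand-in: a straightforward pure-Ruby 4x4 multiply over flat, column-major 16-element arrays (the layout OpenGL expects). All 64 multiply-adds run through the interpreter, which is the overhead a C extension avoids.

```ruby
# Pure-Ruby 4x4 matrix multiply. Matrices are flat, column-major arrays:
# element (row, col) lives at index col * 4 + row.
def mat4_multiply(a, b)
  out = Array.new(16, 0.0)
  4.times do |col|
    4.times do |row|
      sum = 0.0
      4.times { |k| sum += a[k * 4 + row] * b[col * 4 + k] }
      out[col * 4 + row] = sum
    end
  end
  out
end

IDENTITY = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0].freeze

# Translation by (2, 3, 4); in column-major order the offset is the last column.
translate = [1.0, 0.0, 0.0, 0.0,
             0.0, 1.0, 0.0, 0.0,
             0.0, 0.0, 1.0, 0.0,
             2.0, 3.0, 4.0, 1.0]

p mat4_multiply(IDENTITY, translate) == translate # => true
```

To compare this against a C extension yourself, Ruby’s standard Benchmark module is enough, e.g. `Benchmark.realtime { 100_000.times { mat4_multiply(IDENTITY, translate) } }` after `require 'benchmark'`.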

So, that led me to take my old Snow-Palm 3D maths code, add some bits to it, and write bindings around it for Ruby and call the result snow-math. Now it’s possible, for the most part, to get somewhat fast 3D maths. Plus it all works with GL since the 3D math types are all stored as contiguous blocks of memory, meaning you don’t have to pack them as strings before sending them off to GL (e.g., [1, 2, 3].pack('fff') to convert an array to three floats).
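For contrast, here is the packing step that contiguous storage avoids, along with the float/double size trade-off mentioned in footnote 3. It uses only Ruby’s built-in Array#pack and String#unpack1. The byte sizes assume the usual 4-byte float and 8-byte double.

```ruby
# Packing a plain Ruby array into raw bytes before handing it to GL.
# 'f' packs native-endian 32-bit floats; 'd' packs 64-bit doubles.
position = [1.0, 2.0, 3.0]

as_floats  = position.pack("f3") # same as pack('fff')
as_doubles = position.pack("d3")

p as_floats.bytesize  # => 12
p as_doubles.bytesize # => 24

# Ruby Floats are doubles, so round-tripping through 'f' can lose precision.
p [0.1].pack("f").unpack1("f") == 0.1 # => false
p [0.1].pack("d").unpack1("d") == 0.1 # => true
```

Doing this on every draw call means allocating a new String each time. Types that already live in a contiguous block of memory can skip that copy entirely.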

So, to continue the trend of telling you how to install something, you can install snow-math via gem using the following:

$ gem install snow-math

I’d also recommend checking the README on GitHub because there are some options to customize how the extension gets compiled, like whether to use floats instead of doubles.3 Most common types are covered by the library, namely two-, three-, and four-component vectors, quaternions, and 3x3 and 4x4 matrices. Planes and rays are not covered, unfortunately, though they should be easy to implement using the types provided.


That about sums up everything except one odd experiment of mine, snow-data: a library for defining a layout for blocks of memory and so on, in the vein of the various C-struct libraries for Ruby. I won’t write about it here, though, as it’s not entirely relevant to OpenGL in Ruby (though you could use it to describe vertices and so on).

At any rate, OpenGL in Ruby is better than it’s been in a long time, and it’s now pretty reasonable to write applications, games, demos, tools, and so on using OpenGL in Ruby. Considering the flexibility of the language and what it offers for anyone who wants to worry more about their programs’ functionality than the boilerplate details, it’s a decent language for the job.

The Gems

If you’d like a list of the gems I made again, here you go:

  • opengl-core
  • glfw3
  • snow-math

  1. Especially not when most Ruby people are hideous web developers with no care for doing awesome things with Ruby.

  2. This conversion is missing from JRuby’s C API, but then JRuby has deprecated its C extension support and is still stuck on 1.9, so screw them. In general, JRuby’s C API is a barren wasteland of misery, so I don’t recommend bothering with it. If you need bindings that are JRuby-compatible, always go with FFI. It’ll probably be faster by going through the JVM as well.

  3. snow-math uses doubles by default because Ruby also uses doubles internally. So, it results in the least amount of casting at the cost of allocating twice as much memory per type.

So I Learned To Code

I taught myself to code. Sort of. I taught myself to code after reading part of a book on Perl about 11 years ago and not understanding it. I wanted to make games, and Perl was only useful, in my eyes, for text adventures. I’m still somewhat of the opinion that Perl is only useful for text-based stuff, though not for lack of the ability to do graphical things. At any rate, after I tried to learn Perl using a book, I taught myself to code.

Well, no, I read other people’s code and learned how to code from that. I read books that dissected good code, but mostly I read books on things that weren’t specific to code (e.g., 3D maths used in games). I taught myself to read code and write code like the code I read. I read the Quake 1 source code,1 I read the Quake 3 SDK’s source code, I read Ruby scripts, I read Lua scripts while I worked on a game that used Lua,2 and so on.

I asked myself how others did it and I read how others did it to answer the question. I was curious what code written by people who really knew how to code looked like. So I learned to read code, but I still wrote bad code probably for a long time, and maybe still do, but I leave that up to others to decide. After that, I taught myself to code.

Except really, it’s more like I taught myself to use a debugger before I learned to really code. I couldn’t write code if I didn’t know what it did. So, I wrote code or took other folks’ code, stepped through it, and saw how it worked. I read code while it executed, and this let me know how things worked even if I didn’t (and sometimes still don’t) know how everything works at the lowest level. I know how conditionals work in assembly, for example, but I couldn’t tell you why the assembly worked with the hardware.3

I was curious about what code did and had to know how it worked before I could really code. So, I learned how it worked and what it did, and after that, I taught myself to code.

When I taught myself to code, I never really started small; I just started esoteric or weird, with whatever prodded me just enough to raise questions that seemed out there. My idea of fun wasn’t writing blackjack for a terminal (though I did write blackjack programs); it was always more along the lines of modifying virtual method tables at runtime or building code to automate other programming tasks, like writing code to generate Lua bindings.4 I was curious about how I could handle any given situation and what different ways I could approach it.

The whole key here — to how5 I learned to code — is curiosity. Curiosity killed the cat, sure, but the cat had nine lives, and before it touched the capacitor in the CRT, it got a pretty good look at how things worked. Because of my curiosity, I want and need to solve problems. I see puzzles in the crevices between ideas. Programming is not a job, but a strange search for new problems, one part learning to learn, one part problem solving, and one part borderline-obsessively-curious tinkering. A self-taught programmer must necessarily have all three of those parts to survive as a programmer.

Without those parts, that programmer will never seek knowledge, never fix anything, and never try anything new. He or she will be the programmer you don’t want to know, self-taught or not.

But with them? With them, that programmer will never feel limited, never stay bored, and never grow tired. With these things, that programmer will always enjoy programming. With these things, you’ll be a great programmer.

(Addition 2014: In retrospect, I left out the intervening years where I initially practiced making game art, so there’s a gap between Perl and other stuff where I did game art, but the point of this post is less everything and more that curiosity’s pretty useful.)

  1. For those who have wanted to read the Quake series’ source code but haven’t had the time to sit down and do it, I do recommend reading Fabien Sanglard’s fantastic code reviews. As of this writing, he’s done Another World, Quake 1, Quake 2, Quake 3, Doom, Doom 3, Duke Nukem 3D, and others.

  2. Lua’s a fantastic language for programming fun little experiments in. When I first ended up learning it, I was working as an artist on a small game called Bioscythe. As far as I know, it was never released, and I went my own way after the project seemed to more or less die as everyone on the team went off to do their own things. I learned a lot from seeing how we used Lua, however, and I’ve since remained convinced that it’s the best scripting language for game development.

  3. That is, I know the assembly but not how the mainboard works, and given the complexity of modern computer hardware, I’m prepared to forgive myself for this for a while even if I’ll have to figure it out sometime. On the upside, knowing what a compiler might produce is pretty handy at times. I say “might” because the compiler might optimize things away and so on, so there’s no guarantee that I’d always see the input in the output.

  4. Which actually ended up landing me a small contract to do exactly that. I wrote some experimental code to do at-runtime bindings of Lua code, someone liked it, and they gave me a contract to do it again but without the speed hit of the original experiment.

  5. Why I learned to code is another sort of thing altogether, though from the code I read, you can probably guess what set me on the road to code, as it were. Basically, though, I started out as a game artist, and still make game art, but wanted to make a game. Mods were one avenue but it’s difficult to find a good team making one (and the mod scene is mostly dead now, at least for modern titles, due to the amount of work involved). Finding a programmer willing to follow your lead is even more difficult. So, the solution was to write code myself.

    This altered my career path significantly and made me an anomaly in some respects since I’m both a relatively competent artist and a programmer (debatable on the competency part there as I’m a lousy judge of my programming skills), but it made life a whole lot more fun.