The code/data duality

6 01 2016

One often wants special code to run the first time a function is invoked. In C++ the naïve way to do this is to use a function-static variable.

Consider a function that returns a random number. For nostalgia’s sake we want to use rand (which needs to be seeded via srand before the first use). We may do something like this:

#include <cstdlib>
#include <ctime>

int random(int max) {
    static bool initialized = false;
    if (!initialized) {
        srand(time(nullptr)); // seed the random generator once
        initialized = true;
    }
    return rand() % (max + 1);
}


When I was in university, towards the end of the previous millennium, we had to learn VAX assembly (not the most useful skill I’ve ever picked up). That was when I was introduced to the concept of self-modifying code. The crux of the matter is obvious in hindsight: code and data aren’t different things; in the end, everything in software is bits and bytes. This means that we can modify the code of a program as simply as we modify variables.

I don’t remember my assembly and I’m assuming that anyone reading this either:

  1. Doesn’t know/remember assembly
  2. Doesn’t need me to explain about code re-writing

So I’ll just invent a simple stack-based assembly language, since I can’t be arsed to re-learn assembly.

If we take the random function above and translate it to pseudo-assembly we would get something like:

# randomInitialized
0x000DAF00: 0x0  # initialized to zero at compile time
# random
0x000DAF04: loadAddress randomInitialized
0x000DAF08: branchNotZero mainFlow
0x000DAF0C: load 0 # first run
0x000DAF10: call time
0x000DAF14: load 1
0x000DAF18: storeAddress randomInitialized
0x000DAF1C: call srand
# mainFlow
0x000DAF20: increment # max is currently on the stack
0x000DAF24: call rand
0x000DAF28: modulo # rand() % (max + 1)
0x000DAF2C: return

This is a pretty straightforward mapping of the C[++] code to assembly, and it has the same drawbacks. We perform a comparison (!initialized) every time the function is run, although it almost always has the same outcome. Another deficiency is more pronounced in the assembly version: a lot of code is skipped over on most calls, which works against the instruction cache.

What we really want is for every call except the first to just do what needs to be done. This can be achieved by modifying the code.

We compile the function to start with a jump (aka goto) to some address (outside the function) that calls srand and then replaces the jump with the first instruction we want for subsequent calls. The first instruction we want is increment; for the sake of argument, we’ll say that the opcode for increment is 0xADD1.

# initializeRandom
0x000DAF00: load 0
0x000DAF04: call time
0x000DAF08: call srand
0x000DAF0C: load 0xADD1 # opcode(increment)
0x000DAF10: storeAddress random
# random
0x000DAF14: jump initializeRandom
0x000DAF18: call rand
0x000DAF1C: modulo # rand() % (max + 1)
0x000DAF20: return

During the first run, the first thing we do is jump out to a memory location that precedes the function proper; then we seed the random number generator and modify the beginning of the function from being a jump to being an increment[1]. The initial jump is computed in advance so that we store the increment just after the instruction pointer and then simply fall through into the (now) modified function. Subsequent calls to the function have a lean four-instruction body to execute, with no conditions and no branches.

0x000DAF14: increment # this is now the first opcode
0x000DAF18: call rand
0x000DAF1C: modulo # rand() % (max + 1)
0x000DAF20: return

There’s no need to waste space on a static variable; the code is more cache-friendly and (at least in my made-up assembly) smaller.
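Portable C++ can’t rewrite its own instructions, but the “patch the entry point on first use” idea can be roughly approximated with a function pointer that starts out aimed at an initializer and is overwritten on the first call (loosely similar in spirit to how dynamic linkers lazily resolve symbols by patching a table entry). This is only a sketch under that assumption: the names `random_number`, `random_first_call`, `random_fast` and `random_impl` are hypothetical, every call still pays for an indirect call, and writing to the pointer is no more thread-safe than the static flag.

```cpp
#include <cassert>
#include <cstdlib>
#include <ctime>

// Hypothetical names, for illustration only.
int random_fast(int max);
int random_first_call(int max);

// The patchable entry point: it starts out pointing at the initializer,
// playing the role of the jump at the top of the pseudo-assembly.
int (*random_impl)(int) = random_first_call;

// Fast path used for every call after the first: no flag to test.
int random_fast(int max) {
    assert(max >= 0);
    return std::rand() % (max + 1);
}

// Runs once: seed, "patch" the entry point, then fall through.
int random_first_call(int max) {
    std::srand(static_cast<unsigned>(std::time(nullptr)));
    random_impl = random_fast;  // overwrite the "jump"
    return random_fast(max);    // fall through into the real body
}

// Callers always go through the pointer.
int random_number(int max) { return random_impl(max); }
```

After the first call to `random_number`, `random_impl` points straight at `random_fast`, so the initializer is never reached again.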

So if everything is so good, why isn’t this used in practice? OK, high-level languages don’t give you direct access to the code parts of your program, but the compiler could generate such code, right?

Well, I understand next to nothing about compilers, but I’m pretty sure that multiple cache levels, at the very least, make this impractical (not to mention branch prediction).

The most obvious problem with this example is that it’s horrendously thread-unsafe. I suppose some readers have been tearing their hair out from the get-go: the original C++ function with the static variable was just as unsafe. An unprotected shared variable could be modified by different threads simultaneously, which is a data race and therefore undefined behavior (starting with C++11).

As an aside, C++11 introduced “thread-safe function local static initialization”, so a better implementation of random would be:

#include <cstdlib>
#include <ctime>

int random(int max) {
    static bool unused = ([]{  // define and invoke a lambda that
        srand(time(nullptr));  // seeds the random generator
        return false;
    })();
    return rand() % (max + 1);
}

Here I’m depending on the fact that it’s the compiler’s responsibility to initialize a static variable only once, in a thread-safe way. The static variable here is a bool, but it’s never really used; all we need is the side effect of creating it (does anyone want to submit a proposal to allow static void variables?).
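A quick way to convince yourself that the lambda runs only once is to count its invocations. The sketch below is a variation on the function above, assuming a hypothetical global counter `seed_calls` and a renamed `random_number`; it only exercises a single thread, so it shows the once-only behaviour but not the thread-safety guarantee itself.

```cpp
#include <cassert>
#include <cstdlib>
#include <ctime>

// Hypothetical counter, only here to observe how many times seeding runs.
int seed_calls = 0;

int random_number(int max) {
    assert(max >= 0);
    // Initialized exactly once, on the first call; since C++11 the
    // compiler must make this initialization thread-safe.
    static const bool seeded = [] {
        ++seed_calls;
        std::srand(static_cast<unsigned>(std::time(nullptr)));
        return true;
    }();
    (void)seeded;  // we only want the side effect, not the value
    return std::rand() % (max + 1);
}
```

The `(void)seeded;` line is the closest thing to a “static void” variable: it tells the compiler (and the reader) that the value is deliberately ignored.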

In most mainstream compiled languages, the code doesn’t have access to the generated machine code. Thus most programmers nowadays have a mental divide between code and data (Lisp programmers, feel free to gloat now). However, JavaScript, as a scripting language with a functional orientation, brings the code/data duality back together again.
Since functions (code) are objects, self-mutating code is back in business. As luck would have it, JavaScript is (mostly) single-threaded, which allows code to mutate in a thread-safe way.

Consider, for example, a wrapper around a WebSocket.
After creating a WebSocket, it isn’t usable until the connection is established. Due to the single-threaded nature of JavaScript, this means that you have to relinquish control of the thread before using the object. Say we want to store all outgoing messages until the socket is open and then send them; one way to achieve this would be:

function Socket(address) {
    this.socket = new WebSocket(address);
    var queue = []; // captured by 'send' and 'onopen'
    this.send = function (message) {
        if (this.socket.readyState !== WebSocket.OPEN)
            queue.push(message); // not open yet, keep it for later
        else
            this.socket.send(message);
    };
    this.socket.onopen = function () {
        // send queued messages ('this' is the WebSocket here)
        queue.forEach(msg => this.send(msg));
    };
}

Now let’s see how the same thing could be achieved with self-modifying code:

function Socket(address) {
    this.socket = new WebSocket(address);
    var queue = []; // captured by 'send' and 'onopen'
    this.send = message => queue.push(message);
    var self = this;
    this.socket.onopen = function () {
        // send queued messages ('this' is the WebSocket here)
        queue.forEach(msg => this.send(msg));
        // replace the 'send' function
        self.send = function (message) {
            self.socket.send(message);
        };
    };
}
The send function now has two incarnations: one for before the socket is open and one for after it is open. But wait, what about after the socket is closed? For some reason, sending on a closed socket outputs an error to the console but does not throw an exception. We can change this behaviour like this:

// inside the Socket constructor, alongside onopen
this.socket.onclose = function () {
    // replace the 'send' function yet again
    self.send = message => {
        throw new Error('Sending on closed socket: ' + message);
    };
};
Now we see that code can have more than two different states; it can be a fully fledged state machine. Admittedly, most code is a state machine with only one state: no input to the code modifies the code itself. However, it may be useful to keep in mind that the code itself, in addition to the data and data structures, can model the problem space.
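For comparison, here is a rough C++ analogue of the JavaScript wrapper, offered as a sketch rather than a real WebSocket binding: a `std::function` member plays the role of `send` and is swapped out on each state transition. `FakeTransport`, `on_open` and `on_close` are hypothetical stand-ins for a real socket and its open/close events; the fake asserts that nothing is written to it before it is open.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a real socket: it records what was written.
struct FakeTransport {
    bool open = false;  // would be set by the network in a real system
    std::vector<std::string> sent;
    void write(const std::string& msg) {
        assert(open && "wrote to a transport that is not open yet");
        sent.push_back(msg);
    }
};

// 'send' is replaced on every state transition instead of testing a flag.
class Socket {
public:
    explicit Socket(FakeTransport& transport) : transport_(transport) {
        // Connecting: queue everything.
        send = [this](const std::string& msg) { queue_.push_back(msg); };
    }

    // The lambdas capture 'this', so copies must not be allowed.
    Socket(const Socket&) = delete;
    Socket& operator=(const Socket&) = delete;

    // Would be wired to the transport's "open" event.
    void on_open() {
        for (const auto& msg : queue_) transport_.write(msg);
        queue_.clear();
        // Open: write straight through.
        send = [this](const std::string& msg) { transport_.write(msg); };
    }

    // Would be wired to the transport's "close" event.
    void on_close() {
        // Closed: refuse loudly instead of failing silently.
        send = [](const std::string& msg) {
            throw std::runtime_error("Sending on closed socket: " + msg);
        };
    }

    std::function<void(const std::string&)> send;

private:
    FakeTransport& transport_;
    std::vector<std::string> queue_;
};
```

Each handler installs the behaviour for the next state, so `send` itself never has to ask which state it is in.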

1. In real life, instructions aren’t necessarily all the same length, but you get the idea.



2 responses

21 02 2016

Hi, I saw a question you asked on Stack Overflow about deleting cookies in Chrome extensions.
Can you help me with that? I’m trying to write a program that deletes a specific cookie every time it shows up. Simple as that... and I can’t get it to work.
I wrote a normal manifest with the cookies permission and a background JS file with the chrome.cookies.remove command, but it doesn’t work. How can I make it repeat automatically, or every time I click on the browser action icon?

Thank you,

21 02 2016
Motti Lanzkron

Hi Dean,
I think you can use Chrome’s chrome.cookies.onChanged event to get a notification when a cookie is added.

Also, may I suggest that Stack Overflow is a better place to ask such questions; it will reach many more people.
