Bytecode Format

CrossBasic compiles your AST into a simple, two-part CodeChunk:

struct CodeChunk {
  std::vector<int>    code;      // sequence of opcodes and operands
  std::vector<Value>  constants; // constant pool of literal values, functions, classes, etc.
};

When you invoke your program, the VM runs runVM(vm, mainChunk), treating code as a flat array of int slots:

┌─────────────────────────────────────────────────────────────┐
│ CodeChunk                                                  │
│ ├ constants[0] = <Value nil>                               │
│ ├ constants[1] = <Value 123>                               │
│ ├ constants[2] = <Value "hello">                           │
│ ├ constants[3] = <Value <ObjFunction "foo">>               │
│ └ …                                                         │
│                                                             │
│ code[] = [                                                  │
│   OP_CONSTANT, 2,   // push "hello"                         │
│   OP_PRINT,         // print it                             │
│   OP_CONSTANT, 1,   // push 123                              │
│   OP_RETURN         // return                               │
│ ]                                                           │
└─────────────────────────────────────────────────────────────┘

1. Constant Pool

  • Indexed from 0 up to constants.size()-1.
  • Stores any Value variant:
      • Scalars: nil, int, double, bool, string, Color
      • Heap objects: ObjFunction, ObjClass, ObjInstance, ObjArray, ObjModule, ObjEnum
      • Built-ins: BuiltinFn lambdas, property maps, overload vectors, raw pointers (void*)
  • Accessed by OP_CONSTANT <idx>.
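As a rough sketch of how a compiler might populate the pool and refer to it, the helpers below append a value and emit the matching OP_CONSTANT <idx> pair. The names addConstant and emitConstant, the opcode value, the reduced Value alias, and the choice to reuse an existing slot for equal values are all assumptions; the document does not say whether CrossBasic deduplicates constants.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical subset of the Value variants listed above.
using Value = std::variant<std::monostate, int, double, bool, std::string>;

struct CodeChunk {
  std::vector<int>   code;
  std::vector<Value> constants;
};

constexpr int OP_CONSTANT = 0; // hypothetical opcode value

// Return the pool slot holding `v`, appending it if no equal value exists yet.
// (Deduplication is a design choice of this sketch, not a documented behavior.)
int addConstant(CodeChunk& chunk, const Value& v) {
  for (std::size_t i = 0; i < chunk.constants.size(); ++i) {
    if (chunk.constants[i] == v) return static_cast<int>(i);
  }
  chunk.constants.push_back(v);
  return static_cast<int>(chunk.constants.size() - 1);
}

// Emit the two-word sequence OP_CONSTANT <idx> for a literal.
void emitConstant(CodeChunk& chunk, const Value& v) {
  chunk.code.push_back(OP_CONSTANT);
  chunk.code.push_back(addConstant(chunk, v));
}
```

Because std::variant compares the active alternative as well as the stored value, the int 1 and the double 1.0 land in separate slots.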

2. Instruction Stream

Each instruction in code[] consists of:

  1. A single-word opcode (the integer value of an OpCode enum), followed by
  2. Zero or more operand words (each an int).

Jump operands are absolute positions in code[]; constant operands are indices into the constant pool.
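Since opcodes and operands share one int array, any tool that walks code[] has to know how many operand words each opcode carries. The sketch below shows one way to do that for a small disassembler. The enum numbering and the describe and disassemble helpers are hypothetical; only the opcode names and their operand counts come from this page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical numbering for a handful of opcodes.
enum OpCode : int {
  OP_CONSTANT, OP_NIL, OP_POP, OP_ADD,
  OP_JUMP_IF_FALSE, OP_JUMP, OP_CALL, OP_RETURN
};

struct OpInfo {
  const char* name;
  int operands; // number of int words that follow the opcode
};

// Look up the name and operand count for an opcode word.
OpInfo describe(int op) {
  switch (op) {
    case OP_CONSTANT:      return {"OP_CONSTANT", 1};
    case OP_NIL:           return {"OP_NIL", 0};
    case OP_POP:           return {"OP_POP", 0};
    case OP_ADD:           return {"OP_ADD", 0};
    case OP_JUMP_IF_FALSE: return {"OP_JUMP_IF_FALSE", 1};
    case OP_JUMP:          return {"OP_JUMP", 1};
    case OP_CALL:          return {"OP_CALL", 1};
    case OP_RETURN:        return {"OP_RETURN", 0};
    default:               return {"OP_UNKNOWN", 0};
  }
}

// Produce one "<offset>: <name> <operands...>" line per instruction.
std::vector<std::string> disassemble(const std::vector<int>& code) {
  std::vector<std::string> lines;
  std::size_t ip = 0;
  while (ip < code.size()) {
    OpInfo info = describe(code[ip]);
    std::string line = std::to_string(ip) + ": " + info.name;
    ++ip;
    for (int i = 0; i < info.operands && ip < code.size(); ++i) {
      line += " " + std::to_string(code[ip++]);
    }
    lines.push_back(line);
  }
  return lines;
}
```

The offsets printed on the left are the same absolute positions that jump operands refer to, which makes jump targets easy to check by eye.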

2.1 Common Opcodes

Opcode              Operands           Semantics
OP_CONSTANT         <constIndex>       Push constants[constIndex]
OP_NIL                                 Push nil
OP_POP                                 Pop & discard top of stack
OP_DUP                                 Push a copy of the top of stack

Arithmetic & Logic (pop 1–2 values, compute result, push it)
OP_ADD                                 + (int/double/string)
OP_SUB                                 -
OP_MUL                                 *
OP_DIV                                 /
OP_NEGATE                              unary -
OP_POW                                 exponentiation ^
OP_MOD                                 modulus %
OP_LT / OP_LE                          <, <=
OP_GT / OP_GE                          >, >=
OP_EQ / OP_NE                          =, <>
OP_AND / OP_OR                         logical And / Or

Variables & Globals
OP_DEFINE_GLOBAL    <nameConst>        Pop a value and define it globally under the string constants[nameConst]
OP_GET_GLOBAL       <nameConst>        Push the current value of that global
OP_SET_GLOBAL       <nameConst>        Pop and assign into an existing global

Control Flow
OP_JUMP_IF_FALSE    <targetIp>         Pop condition; if false ⇒ ip = targetIp
OP_JUMP             <targetIp>         ip = targetIp (unconditional)
OP_RETURN                              Pop and return a Value to the caller (runVM returns)

Functions & Calls
OP_CALL             <argCount>         Pop argCount values + callee; invoke (built-in, scripted, overload, bound method, array)
OP_OPTIONAL_CALL    <argCount>         Like OP_CALL, but a nil callee becomes a no-op ⇒ pushes nil

Classes & Objects
OP_CLASS            <nameConst>        Push a fresh, empty ObjClass(name)
OP_METHOD           <methodNameConst>  Pop a function and class; register class method
OP_PROPERTIES       <propsConst>       Pop a PropertiesType; assign to a class on the stack
OP_NEW                                 Pop a class, allocate ObjInstance; push it; handles plugin vs. built-in constructors
OP_CONSTRUCTOR_END                     Pop [instance, ctorResult]; if ctorResult≠nil push it, else re-push instance

Arrays & Props
OP_ARRAY            <elementCount>     Pop that many values, build an ObjArray, push it
OP_GET_PROPERTY     <propNameConst>    Pop an object (instance/module/enum/string/array), push field or bound method or event key
OP_SET_PROPERTY     <propNameConst>    Pop [newValue, object]; set property (instance or plugin), then push object
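The stack discipline in the table can be illustrated with a small evaluator covering a few arithmetic and global-variable opcodes. This is a sketch under several assumptions: the opcode numbering, the reduced Value alias, the evalSketch and makeArithmeticChunk names, and the convention that binary operators pop the right operand first are all illustrative rather than taken from the CrossBasic source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>
#include <vector>

using Value = std::variant<std::monostate, double, std::string>; // hypothetical subset

struct CodeChunk {
  std::vector<int>   code;
  std::vector<Value> constants;
};

// Hypothetical numbering.
enum OpCode : int {
  OP_CONSTANT, OP_ADD, OP_MUL, OP_NEGATE,
  OP_DEFINE_GLOBAL, OP_GET_GLOBAL, OP_RETURN
};

// Pop and return the top of the stack.
Value pop(std::vector<Value>& stack) {
  assert(!stack.empty());
  Value v = std::move(stack.back());
  stack.pop_back();
  return v;
}

Value evalSketch(const CodeChunk& chunk) {
  std::vector<Value> stack;
  std::unordered_map<std::string, Value> globals;
  std::size_t ip = 0;
  while (ip < chunk.code.size()) {
    int op = chunk.code[ip++];
    switch (op) {
      case OP_CONSTANT:
        stack.push_back(chunk.constants.at(chunk.code[ip++]));
        break;
      case OP_ADD:
      case OP_MUL: {
        double b = std::get<double>(pop(stack)); // right operand is on top
        double a = std::get<double>(pop(stack));
        stack.push_back(op == OP_ADD ? a + b : a * b);
        break;
      }
      case OP_NEGATE:
        stack.push_back(-std::get<double>(pop(stack)));
        break;
      case OP_DEFINE_GLOBAL: {
        // The operand names a string constant holding the global's name.
        const std::string& name = std::get<std::string>(chunk.constants.at(chunk.code[ip++]));
        globals[name] = pop(stack);
        break;
      }
      case OP_GET_GLOBAL: {
        const std::string& name = std::get<std::string>(chunk.constants.at(chunk.code[ip++]));
        stack.push_back(globals.at(name));
        break;
      }
      case OP_RETURN:
        return pop(stack);
      default:
        assert(false && "opcode not covered by this sketch");
        return Value{};
    }
  }
  return Value{};
}

// Bytecode for: x = 2 + 3, then return -(x * 4).
CodeChunk makeArithmeticChunk() {
  CodeChunk c;
  c.constants = {Value{2.0}, Value{3.0}, Value{std::string("x")}, Value{4.0}};
  c.code = {OP_CONSTANT, 0, OP_CONSTANT, 1, OP_ADD,
            OP_DEFINE_GLOBAL, 2,
            OP_GET_GLOBAL, 2, OP_CONSTANT, 3, OP_MUL,
            OP_NEGATE, OP_RETURN};
  return c;
}
```

evalSketch(makeArithmeticChunk()) returns the double -20: the sum 5 is stored under x, reloaded, multiplied by 4, and negated.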

3. Jump Offsets

  • Absolute: OP_JUMP and OP_JUMP_IF_FALSE use the index into code[] where execution continues.
  • Compiler fix-ups: Labels record code.size() when declared; gotos emit a placeholder and are patched after all statements compile.
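The placeholder-then-patch idea can be sketched as follows. The example compiles an if/else shape rather than labels and gotos, but the mechanism is the same: emit the jump with a dummy target, remember where the operand lives, and overwrite it once the absolute destination is known. The helper names emitJump, patchJump, and compileIfElse, and the opcode numbering, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical numbering.
enum OpCode : int { OP_CONSTANT, OP_JUMP_IF_FALSE, OP_JUMP, OP_RETURN };

// Emit a jump with a placeholder target; return the index of its operand slot.
std::size_t emitJump(std::vector<int>& code, OpCode op) {
  code.push_back(op);
  code.push_back(-1); // placeholder, patched later
  return code.size() - 1;
}

// Overwrite a previously emitted placeholder with an absolute code[] index.
void patchJump(std::vector<int>& code, std::size_t operandSlot, std::size_t target) {
  assert(operandSlot < code.size());
  code[operandSlot] = static_cast<int>(target);
}

// Emit: if constants[cond] then push constants[thenIdx] else push constants[elseIdx]; return.
std::vector<int> compileIfElse(int cond, int thenIdx, int elseIdx) {
  std::vector<int> code;
  code.push_back(OP_CONSTANT);
  code.push_back(cond);
  std::size_t toElse = emitJump(code, OP_JUMP_IF_FALSE);

  code.push_back(OP_CONSTANT);   // then-branch
  code.push_back(thenIdx);
  std::size_t toEnd = emitJump(code, OP_JUMP);

  patchJump(code, toElse, code.size()); // else-branch starts here
  code.push_back(OP_CONSTANT);
  code.push_back(elseIdx);

  patchJump(code, toEnd, code.size());  // both branches rejoin here
  code.push_back(OP_RETURN);
  return code;
}
```

For compileIfElse(0, 1, 2) the conditional jump is patched to index 8 and the unconditional jump to index 10, both absolute positions in code[].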

4. On-Disk Embedding (Executable Bytecode)

While your VM runs the in-memory CodeChunk, CrossBasic also supports embedding compiled scripts inside the executable:

…[binary exe data]…│"BYTECODE"│<uint32 textLength>│<textData bytes>│
  1. Marker: 8-byte ASCII "BYTECODE".
  2. Length: 4-byte little-endian uint32 giving the size of the embedded data.
  3. Payload: the raw script or serialized chunk of size textLength.
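A minimal sketch of writing and reading such a trailer on an in-memory byte string is shown below. It follows the field order in the list above (marker, length, payload); packTrailer and extractPayload are hypothetical names, not the actual retrieveData implementation, and the offsets would need adjusting if the real file places the payload elsewhere relative to the marker.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

const std::string kMarker = "BYTECODE"; // 8-byte ASCII marker

// Build marker + 4-byte little-endian length + payload.
std::string packTrailer(const std::string& payload) {
  std::string out = kMarker;
  std::uint32_t n = static_cast<std::uint32_t>(payload.size());
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<char>((n >> (8 * i)) & 0xFF)); // least significant byte first
  }
  out += payload;
  return out;
}

// Find the last marker in `image` and return the payload that follows it,
// or std::nullopt if the marker is missing or the length field is inconsistent.
// Caveat: a payload that itself contains "BYTECODE" would confuse this search.
std::optional<std::string> extractPayload(const std::string& image) {
  std::size_t pos = image.rfind(kMarker);
  if (pos == std::string::npos) return std::nullopt;

  std::size_t lenPos = pos + kMarker.size();
  if (lenPos + 4 > image.size()) return std::nullopt;

  std::uint32_t n = 0;
  for (int i = 0; i < 4; ++i) {
    std::uint32_t byte = static_cast<unsigned char>(image[lenPos + i]);
    n |= byte << (8 * i); // reassemble little-endian uint32
  }

  std::size_t dataPos = lenPos + 4;
  if (n > image.size() - dataPos) return std::nullopt;
  return image.substr(dataPos, n);
}
```

Decoding the length byte by byte keeps the sketch independent of the host machine's endianness.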

At startup, retrieveData() scans backwards for "BYTECODE", reads the length, then pulls out the preceding textLength bytes for automatic use (e.g. reloading pre-compiled code or source).


With this reference you have a complete “map” of how the CrossBasic source ends up as a stream of small integer codes plus a shared constant pool—and exactly how the VM decodes and executes each instruction.