-
Python VM
-
Process Virtual Machines
-
Stack vs Register Machines
-
Python’s Stack Machines
-
-
Byte Code Hacks
-
Code Objects
-
Modifying Code Objects
-
Handcrafted Code Objects
-
-
C — compiler converts C source to machine instructions
-
GW-BASIC, the programs were interpreted line by line
-
Python and Java
-
source code is compiled to an intermediate language, called byte code
-
byte codes are then executed by an interpreter
-
-
Java, source code is compiled to byte code
-
to reduce machine dependency
-
to increase portability
-
-
Python, the source code is compiled to byte code
-
to reduce parsing overhead
-
to ease interpretation.
-
Software that executes the byte code is called an abstract virtual machine. These virtual machines are modelled after microprocessors.
-
A microprocessor is a device that reads data from memory, processes it and writes back data to memory, based on instructions
-
Instructions specify the memory location to read operands from, the operation to perform, and the memory location to write the result to
-
Improvised model of microprocessor
-
Instructions are themselves stored in memory
-
A microprocessor is a device that reads data from memory, processes it and writes back data to memory, based on instructions, which are themselves stored in and fetched from memory.
-
Registers: memory locations within microprocessors
-
Operands and results are temporarily stored in registers
-
Modelled after microprocessors. Data is represented as objects stored in the heap
-
Works by manipulating these objects in memory, based on instructions
-
Instructions specify the type of objects to create, which objects to delete, the attribute of an object to be read / modified.
-
Byte code instructions are themselves wrapped in "code objects" and are also stored in the heap
-
Python’s compiler parses functions / modules and converts them to code objects
-
Python interpreter executes the code objects
-
Classification depends on how instructions access their operands and store their results
-
Types of Abstract Virtual Machines
-
Register Machines
-
Stack Machines
-
-
Register machines are more like traditional microprocessors
-
Expression: 1 + 2 + 4
LOADK R1, 1
LOADK R2, 2
ADD R3, R2, R1    # R3 = R2 + R1
LOADK R1, 4
ADD R3, R3, R1    # R3 = R3 + R1
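The register program for 1 + 2 + 4 can be traced with a toy interpreter. This is only a sketch: the tuple encoding of instructions and the `run_register_program` helper are assumptions made for illustration, not the instruction format of the Lua or Dalvik VM.

```python
def run_register_program(program, num_registers=4):
    """Execute a list of (opcode, operands...) tuples on a toy register file."""
    # Index 0 is unused so that R1..R3 map directly to indexes 1..3.
    registers = [None] * num_registers
    for op, *args in program:
        if op == "LOADK":
            # LOADK Rdst, constant: load a constant into a register.
            dst, constant = args
            registers[dst] = constant
        elif op == "ADD":
            # ADD Rdst, Rsrc1, Rsrc2: operands and result are all registers.
            dst, src1, src2 = args
            registers[dst] = registers[src1] + registers[src2]
        else:
            raise ValueError("unknown instruction: %r" % (op,))
    return registers


program = [
    ("LOADK", 1, 1),
    ("LOADK", 2, 2),
    ("ADD", 3, 2, 1),   # R3 = R2 + R1
    ("LOADK", 1, 4),
    ("ADD", 3, 3, 1),   # R3 = R3 + R1
]
registers = run_register_program(program)
print(registers[3])  # 7
```

Every instruction names its source and destination registers explicitly, which is what separates this model from a stack machine.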
-
Examples:
-
Lua VM
-
Dalvik VM
-
-
Objects are always stored in the heap
-
Only the pointer to the object is stored in the stack
-
Operation of the BINARY_ADD instruction
-
Pops off the top two objects, 10 and 20, adds them, and pushes the resulting new object, 30
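The effect of BINARY_ADD on the stack can be mimicked with a plain Python list. This is a sketch of the idea, not the interpreter's actual implementation; the `binary_add` helper is a name invented here.

```python
def binary_add(stack):
    """Pop two operands, add them, and push the result."""
    right = stack.pop()          # top of the stack
    left = stack.pop()           # next element down
    stack.append(left + right)   # the result is a new object


# The list holds references; the integer objects themselves live on the heap.
stack = [10, 20]
binary_add(stack)
print(stack)  # [30]
```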
link:code/hack-1.py[role=include]
The function just divides two constants and returns the value back to the caller.
link:code/hack-2.py[role=include]
-
Code object is accessible from myfunc.__code__.
-
co_consts contains a tuple of constants used by the function.
-
co_code contains the byte code instructions generated by compiling the function.
link:code/hack-2.py[role=include]
-
Code object associated with every function
-
Contains the byte code instructions and associated data to execute the function
-
The dis module has a mapping from opcodes to mnemonics
>>> import dis
>>> print dis.opname[0x64]
-
Followed by a 2-byte integer operand
-
Operand is an index into the co_consts tuple
-
Specifies the constant to be pushed / loaded on to the stack
-
Operand specified in this case, 0x0001, corresponds to the constant 6
-
Instruction can thus be represented in mnemonics as LOAD_CONST 1
co_consts: (None, 6, 2)
co_code:
0: 64
1: 01
2: 00
3: 64 <=
4: 02
5: 00
6: 15
7: 53
-
Second instruction is LOAD_CONST
-
Operand is the index 0x0002, which corresponds to the constant 2
co_consts: (None, 6, 2)
co_code:
0: 64
1: 01
2: 00
3: 64
4: 02
5: 00
6: 15 <=
7: 53
>>> print dis.opname[0x15]
-
Does not require any operands
-
Pops the top two values from the stack, performs a divide, and pushes the result back on to the stack
co_consts: (None, 6, 2)
co_code:
0: 64
1: 01
2: 00
3: 64
4: 02
5: 00
6: 15
7: 53 <=
>>> print dis.opname[0x53]
-
RETURN_VALUE instruction takes the top of the stack and returns the value back to the caller.
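The three-instruction walk-through can be replayed with a toy stack machine. This sketch hard-codes the opcode values from the listing (0x64, 0x15, 0x53) and the little-endian 2-byte operand; the `run` function is a hypothetical helper, and it uses floor division on the assumption that both operands are integers, as in the example.

```python
LOAD_CONST = 0x64
BINARY_DIVIDE = 0x15
RETURN_VALUE = 0x53


def run(co_code, co_consts):
    """Interpret the byte string and return the value left for the caller."""
    stack = []
    pc = 0
    while pc < len(co_code):
        op = co_code[pc]
        if op == LOAD_CONST:
            # Two operand bytes, low byte first: 01 00 -> index 1.
            index = co_code[pc + 1] | (co_code[pc + 2] << 8)
            stack.append(co_consts[index])
            pc += 3
        elif op == BINARY_DIVIDE:
            right = stack.pop()
            left = stack.pop()
            # Assumption: integer operands, so the division floors.
            stack.append(left // right)
            pc += 1
        elif op == RETURN_VALUE:
            return stack.pop()
        else:
            raise ValueError("unsupported opcode 0x%02x" % op)
    raise ValueError("byte code ended without RETURN_VALUE")


co_consts = (None, 6, 2)
co_code = bytes([0x64, 0x01, 0x00, 0x64, 0x02, 0x00, 0x15, 0x53])
result = run(co_code, co_consts)
print(result)  # 3
```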
-
Manual disassembly of code
-
A disassembler runs through the byte code string
-
Prints the opname of each byte code instruction
-
-
Only catch is that some of them take an operand
-
Code to determine if an opcode takes an operand
import dis
# Opcodes greater than or equal to HAVE_ARGUMENT take an operand.
if op >= dis.HAVE_ARGUMENT:
    print "opcode has an operand."
else:
    print "opcode does not have an operand."
-
Get disas.py
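A manual disassembler along these lines might look like the following. It is a sketch, not the contents of disas.py: the `OPNAMES` table covers only the three opcodes seen so far, and `HAVE_ARGUMENT` is hard-coded to 90 (its value in CPython 2.7) so that the example does not depend on the opcode numbering of whichever interpreter runs it.

```python
HAVE_ARGUMENT = 90  # opcodes at or above this value carry a 2-byte operand

# Hypothetical subset of dis.opname, limited to the opcodes in the example.
OPNAMES = {
    0x64: "LOAD_CONST",
    0x15: "BINARY_DIVIDE",
    0x53: "RETURN_VALUE",
}


def disassemble(co_code):
    """Return one 'offset: MNEMONIC [operand]' line per instruction."""
    lines = []
    offset = 0
    while offset < len(co_code):
        op = co_code[offset]
        name = OPNAMES.get(op, "<%d>" % op)
        if op >= HAVE_ARGUMENT:
            # Operand is stored low byte first.
            operand = co_code[offset + 1] | (co_code[offset + 2] << 8)
            lines.append("%d: %s %d" % (offset, name, operand))
            offset += 3
        else:
            lines.append("%d: %s" % (offset, name))
            offset += 1
    return lines


co_code = bytes([0x64, 0x01, 0x00, 0x64, 0x02, 0x00, 0x15, 0x53])
listing = disassemble(co_code)
print("\n".join(listing))
```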
-
Modify the constants so that the tuple is (None, 10, 2) instead of (None, 6, 2)
-
Will that result in the program printing 5?
-
But code objects are immutable
-
Create a new code object with the new value for co_consts
-
Get hack-1.py
link:code/hack-5/hack.py[role=include]
-
The new module has the constructor to create the code objects
-
Takes a huge list of arguments
-
All arguments are specified from the old code object, except for co_consts
-
A new set of constants is specified instead
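On Python 3.8 or later the same idea can be sketched with `CodeType.replace()`, which copies every field from the old code object except the ones named. This is an assumption about a newer interpreter than the one used above, where the `new` module plays this role. The sketch also swaps the numeric constants for a string, because newer compilers may fold or inline small integer constants; `greet` is a hypothetical function.

```python
import types


def greet():
    return "Hello World"


old_code = greet.__code__

# Build a new constants tuple, changing only the string we care about.
new_consts = tuple(
    "Hello Byte Code World!" if const == "Hello World" else const
    for const in old_code.co_consts
)

# Code objects are immutable, so replace() returns a modified copy.
new_code = old_code.replace(co_consts=new_consts)

# Wrap the new code object in a function so it can be called.
patched_greet = types.FunctionType(new_code, greet.__globals__, greet.__name__)
print(patched_greet())
```

The original function is untouched; only the copy sees the new constants.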
-
Modify the byte code string
-
Replace the BINARY_DIVIDE instruction with BINARY_ADD
-
BINARY_ADD corresponds to opcode 0x17
-
BINARY_DIVIDE appears at offset 6 in the byte code string
-
Need to replace it with BINARY_ADD
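Since the byte code string is immutable too, the patch builds a new string around the replaced byte. A minimal sketch on the byte values from the earlier listing:

```python
BINARY_DIVIDE = 0x15
BINARY_ADD = 0x17

co_code = bytes([0x64, 0x01, 0x00, 0x64, 0x02, 0x00, 0x15, 0x53])

# Sanity check that offset 6 really holds the instruction being replaced.
assert co_code[6] == BINARY_DIVIDE

# Slice around offset 6 and splice in the new opcode.
patched = co_code[:6] + bytes([BINARY_ADD]) + co_code[7:]
print(patched.hex())  # 6401006402001753
```

Both instructions take no operand, so the replacement leaves every other offset unchanged.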
-
Create code object for a function, without actually writing the Python function
-
Let’s implement the classic "Hello World"
-
Byte code instructions and the constants tuple for implementing this
Consts: (None, "Hello Byte Code World!")
Byte Code Instructions:
LOAD_CONST 1
PRINT_ITEM
PRINT_NEWLINE
LOAD_CONST 0
RETURN_VALUE
-
PRINT_ITEM pops the object from the top of the stack and prints it
-
PRINT_NEWLINE prints a newline character
-
Code returns None to the caller, as required by Python
Get hack-3.py
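The mnemonics above have to be turned into a byte string before they can go into a code object. The following is a sketch of such an assembler, not the contents of hack-3.py: the `assemble` helper is hypothetical, and the opcode numbers are the CPython 2.7 values, hard-coded because later interpreters number their opcodes differently.

```python
HAVE_ARGUMENT = 90

# CPython 2.7 opcode numbers for the instructions used here.
OPCODES = {
    "LOAD_CONST": 100,     # 0x64
    "PRINT_ITEM": 71,      # 0x47
    "PRINT_NEWLINE": 72,   # 0x48
    "RETURN_VALUE": 83,    # 0x53
}


def assemble(instructions):
    """Encode (mnemonic,) or (mnemonic, operand) tuples as a byte string."""
    code = bytearray()
    for name, *operand in instructions:
        op = OPCODES[name]
        code.append(op)
        if op >= HAVE_ARGUMENT:
            (value,) = operand
            # 2-byte operand, low byte first.
            code.append(value & 0xFF)
            code.append((value >> 8) & 0xFF)
    return bytes(code)


program = [
    ("LOAD_CONST", 1),
    ("PRINT_ITEM",),
    ("PRINT_NEWLINE",),
    ("LOAD_CONST", 0),
    ("RETURN_VALUE",),
]
co_code = assemble(program)
print(co_code.hex())  # 640100474864000053
```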
-
co_argcount
-
Specifies the number of positional arguments
-
Our value is 0
-
-
co_nlocals
-
Specifies the number of local variables, including positional arguments
-
Our value is 0
-
-
co_stacksize
-
Specifies the stack depth utilized by the code object
-
At any given point, we do not store more than 1 element in the stack
-
-
co_flags
-
Specifies, using bit flags, whether the function accepts a variable number of arguments, whether the function is a generator, etc.
-
-
co_varnames
-
Specifies the names of positional arguments and local variables
-
Empty tuple
-
-
co_names
-
Specifies the names of identifiers other than local variables
-
Empty tuple
-
-
co_filename
-
Specifies the file in which the function was present
-
Dummy filename test.py
-
-
co_name
-
Specifies the name of the function.
-
-
co_firstlineno
-
Specifies the first line number of the function within the file
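For comparison with the handcrafted values, the same fields can be read off a function compiled by Python itself. This sketch assumes a Python 3 interpreter; `myfunc` and its local `total` are illustrative names.

```python
import inspect


def myfunc(a, b):
    total = a + b
    return total


code = myfunc.__code__

print(code.co_argcount)   # 2: positional arguments a and b
print(code.co_nlocals)    # 3: a, b and the local variable total
print(code.co_varnames)   # ('a', 'b', 'total')
print(code.co_names)      # (): no globals or attributes are referenced
print(code.co_name)       # myfunc

# co_flags is a bit field; test individual bits with the inspect constants.
is_generator = bool(code.co_flags & inspect.CO_GENERATOR)
takes_varargs = bool(code.co_flags & inspect.CO_VARARGS)
print(is_generator, takes_varargs)  # False False

# These three depend on the interpreter and on where the code was defined.
print(code.co_stacksize, code.co_filename, code.co_firstlineno)
```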
-
-
A Python function that accepts two arguments, adds them and returns the result
-
The byte code equivalent of the following Python function
def myfunc(a, b):
    return a + b
-
Arguments / local variables can be loaded using LOAD_FAST
-
Loads the value of a local variable on to the stack
-
Instruction accepts an argument that specifies the local variable as an index into the co_varnames tuple.
-
Get hack-4.py
Var Names: ("a", "b")
Byte Code:
LOAD_FAST 0
LOAD_FAST 1
BINARY_ADD
RETURN_VALUE
-
BINARY_ADD instruction
-
Is not restricted to integers
-
Unlike the ADD instruction of a microprocessor
-
Operates on objects, provided they implement the __add__() or __radd__() magic methods
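Any object can take part in an addition once it defines these methods. A small sketch, with `Meters` as a hypothetical class:

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Handles `self + other`.
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented

    def __radd__(self, other):
        # Handles `other + self` when `other` cannot add a Meters itself.
        return self.__add__(other)


total = Meters(2) + Meters(3)   # dispatches to Meters.__add__
mixed = 4 + Meters(1)           # int gives up, so Meters.__radd__ runs
print(total.value, mixed.value)  # 5 5
```

The interpreter executes the same add instruction in both cases; the behaviour comes from the objects on the stack.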
-
-
Distinguishes Python’s byte codes from Java’s byte codes
-
Python’s byte codes are there to simplify interpretation
-
Closer to the source language













