Hacking the CPython interpreter¶
This blog post is inspired by the talk CPython Meanderings by James Powell.
Note: some of the details in this blog post are slightly out-of-date, as the Python interpreter has changed a little bit. You should still be able to follow along in spirit, though.
The CPython interpreter is not a hugely complicated software project, which means it's not too hard to add on to as a learning project. Thanks to the strong philosophy against premature optimisation, little interpreter behaviour is obscured.
Python is also largely very well-documented and has a friendly core development community which are great places to go for help. However, you can get very far with a little file searching and debugging.
If you know a little about the internals of Python already, particularly the bytecode compilation model it uses, and can follow a little C, you'll have no problem with this. I won't go into detail about Python bytecode, but here are some recommended reading links:
- The Python
dis
standard library module, which contains a reference of all the bytecode operations - Advanced Metaphors in Coding with Python, a talk by James Powell, which goes into detail about a lot of Python's internals
So what are we going to do with CPython? Well, since this is not really a practical approach to extending the language, we might as well go with something silly. So have you heard of Code Golf? It's a "sport" where you aim to write a program to complete a particular task, but with the code as short as possible. Python is not normally hugely competitive, as languages go, so I thought I'd add something to it to make it better. When golfing, best practices truly go out the window, such as correct error handling. Python, however, is not as forgiving with errors as (say) JavaScript or a Shell language, so let's "fix" that!
I'm going to add a short-circuiting inline exception catching operator, as an expression-based shorthand for try: ...
/except: ...
.
Let's define the operator's behaviour a bit more specifically:
- we'll call it
?
, since the?
character is not used anywhere in vanilla Python syntax - it will have looser precedence than
or
- it will still always print the exception when it occurs to make debugging easier
Since this feature is actually a terrible idea, I'm going to call it Exception Mishandling.
By the way, I'm calling this project Whython (because, why would I do this?), so I'll be naming things using a Py_Y
prefix.
A final warning: this post will walk you through almost exactly the steps I took, including all the troubleshooting that came with it. I hope to show you that you could have worked all of this out yourself by reading relevant documentation, and some educated guesswork informed by just a bit of knowledge of conventions used in C and Python. I will skip over some typos and other obvious mistakes I made though.
Initial setup¶
First, get yourself a copy of the Python source code. For these purposes I'll be using Python 3.9.5 taken directly from the 3.9
branch on GitHub.
1 2 3 4 |
|
Now, let's create a branch for our new feature:
1 |
|
And finally, create an initial build from here. I'm passing the --with-pydebug
flag to configure
to give us some helpful debugging information. (Again, this isn't something I magically knew - you can find a list of the options with ./configure --help
, and then scroll through to look for debugging-related ones)
1 2 3 4 |
|
Parsing¶
The first step in the Python interpreter pipeline is parsing. Let's add our ?
operator to the parser.
The Python parser, since version 3.9, uses a PEG grammar which generates a parser. Instructions for working on it are helpfully given in the devguide.
First, let's define the question mark token in Grammar/tokens
:
1 2 3 4 5 6 7 8 9 10 |
|
Now we'll add the grammar rule for a ?
expression. I don't fully understand this, to be honest, but it's largely copied from some of the existing boolean and binary operators' definitions. It also references a _Py_ExcMishandle
action, which I think will create the AST node.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
But it needs to be used somewhere else in the grammar, because otherwise it would be unreachable. Since we're adding it with lower precedence than disjunction
(an or
expression), we'll replace all other uses of disjunction
with y_exc_mishandle
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
|
We also need to define the new Y_ExcMishandle
AST node type in Python.asdl
:
1 2 3 4 5 6 7 8 9 10 |
|
This AST item is the action that gets called in the generated parser, I think.
Finally, we need to regenerate the parser code, as described in the devguide:
1 2 3 4 |
|
Testing it out¶
1 2 |
|
And just like that, it actually compiled, with few warnings and no errors! I did scan through the compiler output, though, to see how bad GCC thought my code was so far, and I found only one noticeable problem, repeated a few times:
1 2 3 |
|
Some issues in compile.c
were to be expected, because we haven't implemented any support beyond parsing yet. I didn't know what symtable.c
was until now, but it turns out it's an intermediate step in compilation that works out variable scoping.
However, an issue in ast.c
seemed more pressing. It was in the validate_expr
function, which seemed to be responsible for checking semantic issues in the syntax that couldn't be identified by the parser.
Just for fun, I decided to try our newly built Python anyway: - python --version
worked - python
alone worked and gave me a REPL with most functionality working fine
I knew for a fact simply trying to run some code with a ?
operator in it would crash the interpreter. Instead I decided to try, first of all, using the ast
module to see if this expression could be correctly parsed:
1 2 3 4 5 |
|
Yikes. I got greedy - I guess the AST validation was necessary after all.
AST validation¶
Let's find the definition of the function that errored:
1 2 3 4 |
|
and analysing the functions it calls, we see that validate_expr
was almost certainly the culprit: at the bottom of the function is a handler for if the switch-case doesn't match anything:
1 2 |
|
So, drawing heavy inspiration from the nearby expression handling code for binary operations, I came up with this:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
And parsing it now works!
1 2 3 4 5 6 7 8 9 10 11 |
|
Compiling¶
If we take the next step in the Python interpreter's pipline, and try to compile some code that uses our operator, we'll see an unexpected result:
1 2 |
|
It seemed to compile just fine! Weird, huh? Now, if we run it, we're surely not going to get a success...
1 2 3 |
|
And so the rabbit hole goes deeper. I want to see why that compiled successfully, and what bytecode it produced:
1 2 3 4 5 |
|
I think I know why it crashed now: it tried to return the top of the stack, but nothing had been pushed on to it. I'm also guessing that the reason the code is so short is because the compiler didn't know how to handle our new AST node, but it added the RETURN_VALUE
because all code must end in it, or at least it does by default.
Interestingly, if we try to compile it in exec
mode instead of eval
mode, it does crash while compiling:
1 2 3 |
|
Anyway, let's get on to dealing with compilation.
Symbol table¶
First I want to have a look at the symtable.c
we found earlier. The switch
statement our compiler was complaining about is here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
As before, we can pretty much copy from BinOp
's implementation without too much bother:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
and as far as I can tell, that's all we need to do here.
Understanding try-except¶
Now let's look into how we can compile a ? b
expressions into bytecode.
Control-flow graphs¶
The devguide page for the CPython compiler informs us of the two major remaining steps: transformation of AST into a Control Flow Graph, and then assembly into raw bytecode. The control flow graph is very close to the final bytecode, but does not have jump instructions "resolved", so-to-speak; they are described as labels only. The CFG is implemented using "basic blocks", which are just "chunks" of bytecode, but the jump instructions point to other basic blocks instead of using offset numbers. When the compiler emits an instruction, it does so by appending it to the active basic block, which is changed using compiler_use_next_block
.
By the way, I just inferred most of this information while reading the source code in compile.c
. It's surprisingly simple to follow.
Exception handling and frame blocks¶
Let's disassemble a try-except statement in the simplest case, from which we can take inspiration:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Each Python stack frame has a stack of blocks, which represent loops and try-except syntactic blocks. They are used to keep track of where to jump if an exception occurs, and where to jump with the break
and continue
statements (and the corresponding BREAK
and CONTINUE
opcodes).
Again, I learnt this by just examining CPython's code (ceval.c
, the main interpreter loop), and reading the documentation for [dis
], I didn't just magically find this out.
Here's a simplified and annotated bytecode template that we need to write for an exception mishandling expression like a ? b
, and that also explains the blocks in more detail:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
|
Back to compiling¶
All that remains is to write the code to compile it. Here's the function template we'll need:
1 2 3 4 |
|
The function returns either 0
or 1
to indicate whether compilation succeeded (0 for success, 1 for failure).
As a sanity check, we'll assert that the expression is the right kind. I'm only adding this because all the other compiler functions have it.
1 |
|
Now, to save myself the effort of writing the meat of the function from scratch, I'm going to copy the set of functions called by compiler_try_except
by getting a trace of all the functions called with a debugger.
1 2 |
|
I'll start the interpreter first, so we don't break on a whole load of things we don't care about, as it starts up:
1 2 3 4 |
|
(get back to GDB by pressing Ctrl+C)
How I'll break on the first line of compiler_try_except
(the line number may be different for you depending on what version you start from and what modifications you've made already), and type in a try-except statement that roughly corresponds to the desired behaviour of exception mishandling:
1 2 3 4 5 6 7 8 9 10 |
|
My plan from here is to just keep stepping through the compiler function using next
and copy all the functions that are called into compiler_exception_mishandling
, tweaking them as necessary:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 |
|
Here's that cleaned up a little (I removed the unexecuted branches, except the error handling ones, which I just copied), and with some of the variables adapted to use the AST types we defined):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
|
I could've also got this by copying the code in compiler_try_except
and manually simplifying bits of it.
Finally, let's add a handler for expressions of this kind to the compiler function for arbitrary expressions, in the switch-case pointed out to us by gcc
earlier:
1 2 3 4 5 6 7 8 9 10 11 |
|
Trying it out¶
Let's try out this handywork!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Looks not bad! And the compiler didn't even crash! However, there are a few things wrong compared to the bytecode template we came up with earlier: - why are there extra RETURN_VALUE
instructions? - why is there a RERAISE
instruction?
After scouring the CPython codebase for instances of RETURN_VALUE
, and a bit of Google-fu, I found that they came from the peephole optimiser, a system which makes simple optimisations to the bytecode after its main compile step. One of the things it does is replace unconditional JUMP
instructions that point to RETURN_VALUE
instructions, with RETURN_VALUE
instructions directly. We can see the code without them if we compile the expression in a single-expression context (as is used in the Python REPL) which prints the expression instead of returning it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
Now as for the RERAISE instruction, it looks like it will always be skipped, so it must be a weird side-effect of copying the code for a try-except statement. I'll come to removing that later.
Let's see what happens if we actually try to use this 1 ? 2
expression:
1 2 |
|
Wow! Just like that, it worked! Let's try the case in which an error occurs, by dividing by zero:
1 2 3 |
|
That's irritating - and worse, all it tells us is that a PyErr_Occurred
, which is vague and tells us nothing about the root cause.
I spent a long time floundering here: adding all sorts of debugging instrumentation to my copy of the interpreter to work out what exactly was going wrong, but I got nowhere.
The final insight I needed to solve the problem came from comparing the disassemblies of these two functions:
1 2 3 4 5 6 |
|
These should be identical in behaviour, but let's look at their bytecode:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
|
Besides the extra LOAD_CONST
in the first, the substantial change is the addition of a ROT_FOUR
in the working one. It turns out this is necessary.
The fix, then, was simple: insert a ROT_FOUR
instruction after the handler expression is visited:
1 2 3 4 5 6 7 8 9 10 |
|
Next steps¶
Where do I go from here?
Documentation and tests¶
This feature has no unit tests and is undocumented. Were this an official addition to Python (God help us), it would need those.
Printing error messages¶
For debugging purposes, it's probably useful to print the error tracebacks even though they're ignored.
Ignoring SystemExit
and KeyboardInterrupt
¶
It might be worth letting these two non-errors pass through silently, because they (particularly KeyboardInterrupt
) might cause some confusing and unreliable behaviour.
Conclusion¶
I hope this blog post has shown you that large codebases can be approachable, with the right tactics. "Fake it till you make it" applies, and you can achieve a lot just by guessing and copying-and-pasting existing code. Along the way, you're sure to learn something useful, so you might not need to guess as much next time!
If you want to play with this modified version of Python, you can do so here!
Update: I've been keeping my fork somewhat up-to-date with more recent versions of Python, and added another couple of features to it which I've not blogged about. Check the GitHub repo for more information.
First published: 2021-09-19
Last updated: 2023-01-03