The knight is the only piece that jumps. This chapter is about jumping across language boundaries: the same search and evaluation logic, implemented three times, in Python, C++, and Rust, then stitched back together so one program can use whichever one it needs.
Why three languages
Python is the language of thought. It is quick to prototype and easy to reshape, which is exactly what you want while an engine’s ideas are still changing every day. But it is slow where it hurts most: the move generator, the evaluator, and the transposition table run millions of times per search, and a pure-Python search manages only about 1.8k nodes per second.
The fix is not to abandon Python, however. It is to keep the experimentation in Python and rewrite only the hot loops in a language built for them. Eschess treats each language as a specialized tool for a specific phase of the engine’s lifecycle:
Python: The Architect
- Strengths: Rapid iteration, massive ML ecosystem.
- Weakness: The Global Interpreter Lock and slow hot loops.
- Eschess role: Drafting search algorithms and training the neural network via PyTorch.
C++ and Rust each get a full, line-for-line port of the Python core, and the project binds back to whichever one the task at hand needs.
The same function, three times
If three engines are going to stand in for each other, their logic has to match exactly, down to the last rounding decision. Here is one small piece of the move-ordering code, the history heuristic’s update, as it appears in Python. The C++ and Rust versions have the same shape on purpose.
    def _update_history(self, color, from_square, to_square, depth):
        # Gravity formula: pull the score toward the bonus without overflow.
        bonus = min(depth * depth, _MAX_HISTORY)
        clamped = max(-_MAX_HISTORY, min(_MAX_HISTORY, bonus))
        cur = self._history[color][from_square][to_square]
        self._history[color][from_square][to_square] += (
            clamped - cur * abs(clamped) // _MAX_HISTORY
        )

The interesting detail is the // in the last expression. Python’s // operator is floor division: it
rounds toward negative infinity. C++ and Rust integer division truncates toward zero
instead, so -7 // 2 is -4 in Python but -7 / 2 is -3 in the other two. For most inputs that
never shows, but the history score can go negative, and a single off-by-one there changes
move ordering, which changes which branches get pruned, which changes the final move. So
the ports carry an explicit floordiv() helper. The same care covers using integer centipawns
instead of floating-point evaluation, and a stable sort for move ordering so equal
keys keep their original order in every language.
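To see how a single division can split the engines apart, the sketch below emulates truncating division in Python and runs one history update under each rule. The names trunc_div and history_delta, and the value 16384 for MAX_HISTORY, are assumptions made for illustration; the project's actual constant and helper may differ.

```python
MAX_HISTORY = 16384  # assumed value, for illustration only


def floor_div(a, b):
    """Python semantics: round the quotient toward negative infinity."""
    return a // b


def trunc_div(a, b):
    """C++/Rust integer semantics: round the quotient toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def history_delta(cur, depth, div):
    """Gravity update from the text, with the division rule passed in."""
    bonus = min(depth * depth, MAX_HISTORY)
    return bonus - div(cur * abs(bonus), MAX_HISTORY)


if __name__ == "__main__":
    print(floor_div(-7, 2), trunc_div(-7, 2))
    print(history_delta(-100, 3, floor_div), history_delta(-100, 3, trunc_div))
```

With a current score of -100 at depth 3, the two rules produce updates of 10 and 9, which is exactly the off-by-one described above. A positive score of 100 gives the same update under both rules, which is why the divergence is easy to miss.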
Two ways to cross the boundary
Once the engines are written, Python needs a way to actually call them. There are two, and the project uses both.
The first is the UCI protocol. The C++ and Rust ports each compile to a standalone engine
binary that speaks the same text protocol a desktop chess GUI would. Python launches it as
a subprocess and talks over stdin and stdout: write a position, write go depth 3, read back
bestmove. It is simple, language-agnostic, and how the benchmarking harness pits engines
against each other.
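The text exchange is small enough to sketch as plain string handling, with no engine process involved. The helper names and the 0 to 63 square numbering (a1 = 0, h8 = 63) are assumptions for illustration, not the project's own move representation.

```python
def build_commands(fen, depth):
    """Lines to write to the engine's stdin for one fixed-depth search."""
    return [f"position fen {fen}", f"go depth {depth}"]


def square_index(name):
    """Map a square such as 'e4' to 0..63, with a1 = 0 and h8 = 63."""
    file = ord(name[0]) - ord("a")
    rank = int(name[1]) - 1
    return rank * 8 + file


def parse_bestmove(lines):
    """Skip info lines and decode the first bestmove reply.

    Returns (from_square, to_square, promotion), or None for the null move.
    """
    for raw in lines:
        parts = raw.split()
        if not parts or parts[0] != "bestmove":
            continue  # info lines and blanks carry no final answer
        if len(parts) < 2 or parts[1] == "0000":
            return None
        token = parts[1]
        # A fifth character, as in 'e7e8q', names the promotion piece.
        promotion = token[4] if len(token) == 5 else None
        return square_index(token[0:2]), square_index(token[2:4]), promotion
    raise RuntimeError("no bestmove line in engine output")
```

Keeping the parsing separate from the subprocess plumbing means it can be tested against canned engine output, without launching a binary.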
Driving an engine over UCI: send the position and a go command down stdin, then read info and bestmove lines back off stdout.
python/engine_backend.py · UCIBackend.get_best_move
    def get_best_move(self, board, depth=3):
        self._send(f"position fen {board.to_fen()}")
        self._send(f"go depth {depth}")
        # Skip info lines until the engine reports its choice.
        for raw in self._proc.stdout:
            line = raw.strip()
            if line.startswith("bestmove"):
                token = line.split()[1]
                # "0000" is the UCI null move: there is no move to play.
                return None if token == "0000" else _uci_to_move(token)
        raise RuntimeError("engine closed before 'bestmove'")

The second is FFI, a foreign function interface. The Rust core is compiled into a Python
extension module (eschess_native, built with PyO3 and maturin) that Python imports like any
other package. There is no subprocess and no text protocol: a Python method call lands
directly on a Rust function. This is the path the machine-learning loop uses, because it runs
the whole self-play search in Rust and only calls back into Python for batched neural-network
inference.
    try:
        import eschess_native as _native
        HAS_NATIVE = True
    except ImportError:  # extension not built; callers fall back to Python
        _native = None
        HAS_NATIVE = False

The try / except ImportError is the whole graceful-degradation story: if the native
extension was never built, HAS_NATIVE is False and every caller silently falls back to
the pure-Python path. The fast road is optional; the engine still runs without it.
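A caller-side view of that fallback might look like the sketch below. PurePythonBoard and new_board are hypothetical names introduced here, and the sketch assumes the extension exposes the PyBoard class used in the project's parity test.

```python
try:
    import eschess_native as _native
    HAS_NATIVE = True
except ImportError:  # extension not built
    _native = None
    HAS_NATIVE = False


class PurePythonBoard:
    """Hypothetical stand-in for the pure-Python board class."""

    backend = "python"


def new_board():
    """Return a native board when the extension imports, else the fallback."""
    if HAS_NATIVE:
        return _native.PyBoard()  # assumed constructor on the native module
    return PurePythonBoard()
```

Because the check happens once at import time, callers only ever branch on a boolean and never need to catch ImportError themselves.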
Proving they’re identical
Matching code by eye is not proof. Before the native ports are trusted to replace the Python core, the engines must be checked. They are verified in two ways:
- perft: count the leaf nodes of the legal move tree at a given depth from a set of known FENs, and confirm all three engines produce the reference numbers from the Chess Programming Wiki.
- byte-identical search: run a fixed-depth search and compare the scores, node counts, and principal variation across the three. They have to agree to the digit.
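The perft idea fits in a few lines once a move generator exists. The sketch below is a toy, not the project's code: it assumes a lone knight on an empty 8x8 board so that it stays self-contained, whereas a real engine would plug in its full legal move generator.

```python
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_moves(square):
    """Destinations for a lone knight on an empty 8x8 board (toy rules)."""
    file, rank = square
    return [(file + df, rank + dr) for df, dr in KNIGHT_JUMPS
            if 0 <= file + df < 8 and 0 <= rank + dr < 8]


def perft(square, depth):
    """Count the leaf nodes of the move tree `depth` plies deep."""
    if depth == 0:
        return 1
    return sum(perft(nxt, depth - 1) for nxt in knight_moves(square))


if __name__ == "__main__":
    for depth in range(3):
        print(depth, perft((0, 0), depth))
```

From the corner square a1, written here as (0, 0), the counts are 1, 2, and 12 at depths 0, 1, and 2. Any disagreement between two implementations at some depth points to a move-generation bug somewhere in the tree above it.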
Because the FFI boundary allows direct Python access to the native engine, there is a third
check that runs in the test suite: play thousands of random positions through both the native
board and the pure-Python CBoard in lockstep and assert they never diverge, promotions included.
Lockstep parity test: walk random games through the native and Python boards together and assert their FEN and legal moves match at every ply.
tests/test_native_parity.py
    import random

    def test_movegen_fen_parity():
        rng = random.Random(0)  # fixed seed keeps the random games reproducible
        for _ in range(60):
            nb, cb = en.PyBoard(), CBoard()  # native vs pure-Python
            for _ in range(60):
                assert nb.to_fen() == cb.to_fen()
                keys = sorted(tuple(k) for k in nb.legal_move_keys())
                assert keys == _py_move_keys(cb), cb.to_fen()
                if not keys:  # no legal moves: the game is over
                    break
                f, t = rng.choice(keys)
                cb.make_move(f, t, None)
                nb.make_move(f, t, -1)

What the leap buys
All of that machinery exists for one number: nodes per second. The search is the same, the position is the same, the move ordering is the same. Only the language underneath changes.
Figure: Nodes searched in 1.0 s of equal thinking time (depth 5).
Same search, same position, same move ordering. The pure-Python reference manages about 1.8k nodes per second; the native C++ and Rust ports of that exact logic clear hundreds of times more, which is the whole reason for the leap across the language boundary. Python's bar is there; it is just too short to see.
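Measuring nodes per second needs nothing more than a node counter and a clock. The sketch below is a stand-in, not the project's benchmark harness: count_nodes walks a uniform toy tree in place of a real search, and it times a fixed amount of work rather than a fixed time budget.

```python
import time


def count_nodes(depth, branching=3):
    """Toy stand-in for a search: walk a uniform tree and count every node."""
    if depth == 0:
        return 1
    return 1 + sum(count_nodes(depth - 1, branching) for _ in range(branching))


def nodes_per_second(search, *args):
    """Run `search` once and return (nodes, nodes per second)."""
    start = time.perf_counter()
    nodes = search(*args)
    elapsed = time.perf_counter() - start
    rate = nodes / elapsed if elapsed > 0 else float("inf")
    return nodes, rate


if __name__ == "__main__":
    nodes, nps = nodes_per_second(count_nodes, 8)
    print(f"{nodes} nodes, {nps:,.0f} nodes/s")
```

The same wrapper works for any backend whose search returns a node count, which keeps the comparison between languages honest.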
That gap is why the leap is worth making. Prototype an idea in Python where it is cheap to get wrong, prove the C++ and Rust ports reproduce it to the byte, and then run the real workload, benchmarking and self-play, at native speed. The knight clears the boundary the other pieces cannot.