Python 3.14 and its New JIT Compiler

Python 3.14 marks an important point in the evolution of the world’s most popular programming language. While Python has long been praised for its readability and large ecosystem, its execution speed has often been the “elephant in the room.”

With the arrival of 3.14, the CPython core development team has delivered not one, but two of the most anticipated features in recent times.

The end of the GIL

I’ve written about this before. True concurrency is now available in Python if you want it. For more details on GIL-free Python, I’ll leave a link to my article about it at the end.

The Just-In-Time (JIT) compiler 

This experimental feature is now bundled directly in official installers, and it’s what we’ll focus on here. It’s the result of years of architectural preparation done by the Python core team and others, aimed at making Python “faster by default” without breaking the C-extension ecosystem that powers everything from data science to web backends.

In this article, we’ll lift the hood of the new JIT, explore how it differentiates itself from previous optimisation efforts, and walk through some benchmarking methodology to help you decide if it’s time to try out the JIT on your workloads.

What is Python’s New Just-In-Time (JIT) compiler?

To understand the 3.14 JIT, we need to look at how Python traditionally runs. Standard Python (CPython) is an interpreter. When you run a script, your code is first compiled into bytecode, a set of instructions that the CPython virtual machine then executes one at a time.
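To make the bytecode step concrete, here is a minimal sketch that uses the standard library’s dis module to list the instructions CPython generates for a tiny function. The add function is just an illustrative stand-in, and the exact opcode names you see will differ between Python versions.

```python
import dis


def add(a, b):
    return a + b


# List the bytecode instructions CPython compiled for add().
# Exact opcode names vary between Python versions.
for instr in dis.get_instructions(add):
    print(instr.opname, instr.argrepr)
```

Each printed line is one instruction that the interpreter loop would normally dispatch; these are the units a JIT tries to replace with machine code.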

The JIT changes this flow. Instead of simply interpreting bytecode one instruction at a time, the runtime monitors which parts of your code are executed most frequently (the “hot” paths). When a function or loop is deemed “hot,” the JIT translates its bytecode into native machine code (instructions the CPU executes directly). The next time that code is invoked, the machine code runs in place of the interpreter loop. This can be a great time-saver, as we’ll see later on.

How the JIT fits into CPython

The Python 3.14 JIT is not a total rewrite. It is designed as an opt-in component that works alongside the existing interpreter. It uses a technique called “copy-and-patch,” which allows the JIT to be lightweight and portable across different CPU architectures without requiring a massive, complex compiler backend like LLVM at runtime.
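The idea behind copy-and-patch can be shown with a toy model. The sketch below is not how CPython implements it: the “stencils” here are made-up byte templates rather than real machine code, and the names STENCILS and copy_and_patch are hypothetical. It only illustrates the two steps the name refers to, copying a pre-built template and then patching an operand into a hole.

```python
import struct

# Hypothetical "stencils": pre-built byte templates with a 4-byte hole.
# Real stencils are machine code prepared ahead of time; these byte
# values are made up purely for illustration.
STENCILS = {
    "LOAD_CONST": (b"\x01\x00\x00\x00\x00", 1),  # (template, hole offset)
    "ADD_CONST": (b"\x02\x00\x00\x00\x00", 1),
}


def copy_and_patch(ops):
    """Copy each op's template into one buffer and patch its operand in."""
    code = bytearray()
    for name, operand in ops:
        template, hole = STENCILS[name]
        start = len(code)
        code += template                                     # copy
        struct.pack_into("<i", code, start + hole, operand)  # patch
    return bytes(code)


print(copy_and_patch([("LOAD_CONST", 7), ("ADD_CONST", 35)]).hex())
# prints 01070000000223000000
```

Because the templates are prepared in advance, the runtime work is reduced to memory copies and a few writes, which is why the approach can stay small.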

What Changed in Python 3.14?

Python 3.13 had a basic, experimental JIT, but it was disabled by default. If you wanted to test it, you had to clone the CPython source tree and compile it with specific experimental flags such as --enable-experimental-jit.

With Python 3.14, that changed. The JIT is now included in the official Windows and macOS installers, which means you no longer need a C compiler on your machine to experience JIT benefits. While still “experimental,” the inclusion in official binaries signals that the core team believes the JIT is stable enough for broad community testing.

Getting Python 3.14

Head over to https://www.python.org/downloads/, and you’ll see a download option for 3.14. Click that, then follow the instructions.

Alternatively, if you have the uv tool installed, you can type the following.

PS C:\ > uv python install 3.14

Enabling the JIT

By default, the JIT is disabled. This is a safety measure: because it is experimental, the Python Steering Council wants to ensure that users don’t face unexpected regressions in stability or memory usage without explicitly opting in.

To activate the JIT, you use an environment variable. This tells the CPython runtime to initialise the JIT engine upon startup.

On Windows (PowerShell):

$env:PYTHON_JIT=1
python my_script.py

On macOS/Linux (Bash/Zsh):

PYTHON_JIT=1 python my_script.py
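To confirm that the environment variable actually took effect, you can ask the interpreter itself. This is a minimal sketch that relies on the sys._jit namespace, which to my knowledge was added in CPython 3.14 and is implementation-specific; the getattr guard is there so the script still runs on older versions. The helper name jit_status is my own.

```python
import sys


def jit_status() -> str:
    # sys._jit is a CPython-specific namespace added in 3.14; it is absent
    # on older versions and other implementations, hence the getattr guard.
    jit = getattr(sys, "_jit", None)
    if jit is None:
        return "no sys._jit (not CPython 3.14+)"
    if not jit.is_available():
        return "JIT not built into this interpreter"
    return "JIT enabled" if jit.is_enabled() else "JIT available but disabled"


print(sys.version_info[:2], jit_status())
```

Run it once with and once without PYTHON_JIT=1 set to see the reported status change.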

Once enabled, CPython doesn’t JIT-compile everything immediately. It uses a tiering system. Basically, it tries to run code as cheaply as possible first, and only spends compilation/optimisation effort on the parts that prove to be hot.

  • Tier 0: Standard interpretation.
  • Tier 1: Specialised bytecode (introduced in 3.11).
  • Tier 2 (The JIT): Machine code generation for the most frequently used paths.
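The “only optimise what proves to be hot” idea can be sketched with a simple call counter. This is a toy model, not CPython’s actual mechanism: the class name ToyTieredRunner and the threshold value are hypothetical, and “tiering up” here only flips a label rather than generating any machine code.

```python
HOT_THRESHOLD = 3  # hypothetical value, chosen only for this demo


class ToyTieredRunner:
    """Runs a function 'slowly' until it has been called often enough."""

    def __init__(self, func):
        self.func = func
        self.calls = 0
        self.tier = "interpreter"

    def __call__(self, *args):
        self.calls += 1
        if self.tier == "interpreter" and self.calls >= HOT_THRESHOLD:
            # A real JIT would emit machine code here; we only flip a label.
            self.tier = "jit"
        return self.func(*args)


square = ToyTieredRunner(lambda x: x * x)
for n in range(5):
    square(n)
    print(n, square.tier)
```

The first two calls report the interpreter tier and the third onwards report the jit tier, which mirrors why short-lived code never pays the compilation cost.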

Measuring the Impact of the JIT

When testing a JIT, you can’t simply wrap a single function call in time.time(). JITs require a warm-up period: the first few iterations of a loop may be slower than normal while the runtime profiles and compiles the code, but subsequent iterations can be significantly faster.
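For in-process measurements, one common pattern is to run the function a few times untimed before starting the clock. The sketch below assumes a stand-in workload called work and a helper called time_with_warmup, both of which are illustrative names of my own, and it uses time.perf_counter() rather than time.time() because it is designed for measuring short intervals.

```python
import time


def work(n: int = 200_000) -> int:
    # Stand-in for a pure-Python hot loop (hypothetical workload).
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_with_warmup(func, warmup: int = 3, repeats: int = 5) -> list[float]:
    for _ in range(warmup):       # untimed: lets hot paths get optimised
        func()
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()  # high-resolution clock for intervals
        func()
        timings.append(time.perf_counter() - t0)
    return timings


print([f"{t:.4f}s" for t in time_with_warmup(work)])
```

Reporting every timed run, rather than a single number, also makes it easier to spot outliers caused by other activity on the machine.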

The Benchmark Suite

Below is a small benchmark suite designed to exercise different aspects of the JIT, from heavy math to heap and list manipulation.

File 1: workloads.py

This file contains three different CPU-bound tasks. 

1/ The mandelbrot function iterates the Mandelbrot formula over a pixel grid and returns a checksum of per-pixel iteration counts.

2/ The dijkstra function builds a deterministic random weighted graph and runs Dijkstra’s algorithm from node 0, returning how many nodes were finalised (visited).

3/ The levenshtein_batch function generates N deterministic random string pairs and returns the sum of their Levenshtein distances.

from __future__ import annotations

import random
import heapq

# Workload 1: Mandelbrot (CPU + math loops)
def mandelbrot(width: int = 1000, height: int = 1000, iters: int = 500) -> int:
    checksum = 0
    for y in range(height):
        cy = (y / height) * 2.4 - 1.2
        for x in range(width):
            cx = (x / width) * 3.2 - 2.2
            zx, zy, count = 0.0, 0.0, 0
            while zx * zx + zy * zy <= 4.0 and count < iters:
                zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
                count += 1
            checksum += count
    return checksum

# Workload 2: Dijkstra (heap + list + logic)
def dijkstra(n: int = 10000, edges_per_node: int = 50, seed: int = 123) -> int:
    rng = random.Random(seed)
    graph = [[] for _ in range(n)]
    for u in range(n):
        for _ in range(edges_per_node):
            v = rng.randrange(n)
            if v != u:
                graph[u].append((v, rng.randrange(1, 30)))

    dist = [10**12] * n
    dist[0] = 0
    pq = [(0, 0)]
    visited = 0

    while pq:
        d, u = heapq.heappop(pq)
        if d != dist[u]:
            continue
        visited += 1
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(pq, (nd, v))

    return visited

# Workload 3: Levenshtein distance (dynamic programming)
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def levenshtein_batch(n: int = 10000, seed: int = 7, k: int = 50) -> int:
    """
    Deterministic batch: fixed RNG seed, fixed alphabet, fixed string length.
    Returns the sum of distances.
    """
    rng = random.Random(seed)
    alphabet = "abc"
    total = 0
    for _ in range(n):
        a = "".join(rng.choices(alphabet, k=k))
        b = "".join(rng.choices(alphabet, k=k))
        total += levenshtein(a, b)
    return total

File 2: benchmark.py

This script automates comparing different workloads with JIT enabled and disabled.

import os
import time
import json
import subprocess
from pathlib import Path

PYTHON_EXE = r"C:\Users\thoma\AppData\Local\Programs\Python\Python314\python.exe"
PROJECT_DIR = Path(__file__).resolve().parent

# Original workloads (statement prints a result for sanity)
WORKLOADS = [
    ("mandelbrot", 'from workloads import mandelbrot; print(mandelbrot())'),
    ("dijkstra", 'from workloads import dijkstra; print(dijkstra())'),
    ("levenshtein_batch", 'from workloads import levenshtein_batch; print(levenshtein_batch())'),
]

N_RUNS = 10  # average of ALL runs (set to 6/10/20 as you like)
OUTFILE = PROJECT_DIR / "results_avg.json"

def run_once(stmt: str, jit_val: int) -> tuple[float, str]:
    env = os.environ.copy()
    env["PYTHON_JIT"] = str(jit_val)

    # Ensure local workloads.py is importable in subprocess
    env["PYTHONPATH"] = str(PROJECT_DIR) + (os.pathsep + env.get("PYTHONPATH", ""))

    t0 = time.perf_counter()
    p = subprocess.run(
        [PYTHON_EXE, "-c", stmt],
        env=env,
        cwd=str(PROJECT_DIR),
        capture_output=True,
        text=True,
    )
    t1 = time.perf_counter()

    if p.returncode != 0:
        raise RuntimeError(
            f"Run failed (PYTHON_JIT={jit_val})\n\n"
            f"Statement:\n{stmt}\n\n"
            f"STDOUT:\n{p.stdout}\n\nSTDERR:\n{p.stderr}"
        )

    return (t1 - t0, p.stdout.strip())

def summarize(times: list[float]) -> dict:
    return {
        "avg": sum(times) / len(times),
        "min": min(times),
        "max": max(times),
        "runs": times,
    }

def bench_workload(name: str, stmt: str) -> dict:
    results = {}
    outputs = {}

    for jit_val in (0, 1):
        times = []
        outs = []
        print(f"  PYTHON_JIT={jit_val}: running {N_RUNS} times...")
        for i in range(1, N_RUNS + 1):
            dt, out = run_once(stmt, jit_val)
            times.append(dt)
            outs.append(out)
            print(f"    run {i}/{N_RUNS}: {dt:.6f}s")

        results[jit_val] = summarize(times)
        outputs[jit_val] = outs

    avg0 = results[0]["avg"]
    avg1 = results[1]["avg"]
    speedup = avg0 / avg1 if avg1 else float("inf")
    delta_pct = (avg1 - avg0) / avg0 * 100.0 if avg0 else 0.0

    return {
        "workload": name,
        "jit0": results[0],
        "jit1": results[1],
        "speedup_jit0_over_jit1": speedup,
        "delta_pct_jit1_vs_jit0": delta_pct,
        "outputs": outputs,  # sanity: should be stable
    }

def main() -> int:
    all_results = []
    print(f"Using Python: {PYTHON_EXE}")
    print(f"Project dir: {PROJECT_DIR}")
    print(f"Runs per setting (avg of all runs): {N_RUNS}\n")

    for name, stmt in WORKLOADS:
        print(f"=== {name} ===")
        r = bench_workload(name, stmt)
        all_results.append(r)

        print(f"\n  Averages:")
        print(f"    JIT=0 avg: {r['jit0']['avg']:.6f}s (min {r['jit0']['min']:.6f}, max {r['jit0']['max']:.6f})")
        print(f"    JIT=1 avg: {r['jit1']['avg']:.6f}s (min {r['jit1']['min']:.6f}, max {r['jit1']['max']:.6f})")
        print(f"    Speedup (JIT=0 / JIT=1): {r['speedup_jit0_over_jit1']:.3f}×  (Δ={r['delta_pct_jit1_vs_jit0']:+.2f}%)\n")

        # Optional: warn if outputs vary across runs (nondeterminism)
        if len(set(r["outputs"][0])) != 1:
            print("  !! WARNING: JIT=0 output differs across runs (nondeterministic workload?)")
        if len(set(r["outputs"][1])) != 1:
            print("  !! WARNING: JIT=1 output differs across runs (nondeterministic workload?)")

    OUTFILE.write_text(json.dumps(all_results, indent=2), encoding="utf-8")
    print(f"Wrote: {OUTFILE}")
    return 0

if __name__ == "__main__":
    raise SystemExit(main())

Here are my results.

C:\Users\thoma\projects\python_jit>C:\Users\thoma\AppData\Local\Programs\Python\Python314\python.exe benchmark.py
Using Python: C:\Users\thoma\AppData\Local\Programs\Python\Python314\python.exe
Project dir: C:\Users\thoma\projects\python_jit
Runs per setting (avg of all runs): 10

=== mandelbrot ===
  PYTHON_JIT=0: running 10 times...
    run 1/10: 6.890924s
    run 2/10: 6.950737s
    run 3/10: 7.265357s
    run 4/10: 6.947150s
    run 5/10: 6.932333s
    run 6/10: 6.939378s
    run 7/10: 7.194705s
    run 8/10: 6.995550s
    run 9/10: 6.902696s
    run 10/10: 7.256164s
  PYTHON_JIT=1: running 10 times...
    run 1/10: 5.216740s
    run 2/10: 5.241888s
    run 3/10: 5.350822s
    run 4/10: 5.246767s
    run 5/10: 5.294771s
    run 6/10: 5.273295s
    run 7/10: 5.272135s
    run 8/10: 5.617062s
    run 9/10: 5.251656s
    run 10/10: 5.239060s

  Averages:
    JIT=0 avg: 7.027499s (min 6.890924, max 7.265357)
    JIT=1 avg: 5.300420s (min 5.216740, max 5.617062)
    Speedup (JIT=0 / JIT=1): 1.326×  (Δ=-24.58%)

=== dijkstra ===
  PYTHON_JIT=0: running 10 times...
    run 1/10: 0.235401s
    run 2/10: 0.227603s
    run 3/10: 0.244492s
    run 4/10: 0.232971s
    run 5/10: 0.249589s
    run 6/10: 0.232229s
    run 7/10: 0.229422s
    run 8/10: 0.238399s
    run 9/10: 0.230657s
    run 10/10: 0.235772s
  PYTHON_JIT=1: running 10 times...
    run 1/10: 0.238862s
    run 2/10: 0.239266s
    run 3/10: 0.240312s
    run 4/10: 0.231413s
    run 5/10: 0.232692s
    run 6/10: 0.233783s
    run 7/10: 0.230016s
    run 8/10: 0.237760s
    run 9/10: 0.240895s
    run 10/10: 0.246033s

  Averages:
    JIT=0 avg: 0.235653s (min 0.227603, max 0.249589)
    JIT=1 avg: 0.237103s (min 0.230016, max 0.246033)
    Speedup (JIT=0 / JIT=1): 0.994×  (Δ=+0.62%)

=== levenshtein_batch ===
  PYTHON_JIT=0: running 10 times...
    run 1/10: 2.176256s
    run 2/10: 2.171253s
    run 3/10: 2.171834s
    run 4/10: 2.170444s
    run 5/10: 2.149874s
    run 6/10: 2.162820s
    run 7/10: 2.171975s
    run 8/10: 2.199151s
    run 9/10: 2.168398s
    run 10/10: 2.167821s
  PYTHON_JIT=1: running 10 times...
    run 1/10: 1.575666s
    run 2/10: 1.612615s
    run 3/10: 1.571106s
    run 4/10: 1.584650s
    run 5/10: 1.579948s
    run 6/10: 1.582633s
    run 7/10: 1.593924s
    run 8/10: 1.573608s
    run 9/10: 1.581427s
    run 10/10: 1.578553s

  Averages:
    JIT=0 avg: 2.170983s (min 2.149874, max 2.199151)
    JIT=1 avg: 1.583413s (min 1.571106, max 1.612615)
    Speedup (JIT=0 / JIT=1): 1.371×  (Δ=-27.06%)

Interpreting the Results

As you can see, the results are a mixed bag. This is normal for an experimental JIT.

  • 10–30% Speedup: Common in “pure Python” loops (like the Mandelbrot or Levenshtein tests) where the JIT can avoid the overhead of the bytecode dispatch loop.
  • 0% Improvement: Common in I/O-bound tasks or code that heavily uses C extensions. The Dijkstra code showed no measurable change (the +0.62% difference is smaller than the run-to-run variation), most likely because its runtime is dominated by heap/tuple operations and memory-heavy, allocation-driven work that the current CPython JIT doesn’t optimise significantly, so any interpreter savings are lost in the noise.
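One rough way to check the “lost in the noise” reading is to compare the difference between the two averages with the spread of the baseline runs. The sketch below uses the Dijkstra timings copied from the output above; comparing against one standard deviation is a simple heuristic of my own choosing, not a formal significance test.

```python
import statistics

# Dijkstra timings copied from the run output above (seconds).
jit0 = [0.235401, 0.227603, 0.244492, 0.232971, 0.249589,
        0.232229, 0.229422, 0.238399, 0.230657, 0.235772]
jit1 = [0.238862, 0.239266, 0.240312, 0.231413, 0.232692,
        0.233783, 0.230016, 0.237760, 0.240895, 0.246033]

diff = statistics.mean(jit1) - statistics.mean(jit0)
spread = statistics.stdev(jit0)  # sample standard deviation of the baseline

print(f"mean difference: {diff * 1000:+.2f} ms")
print(f"baseline stdev:  {spread * 1000:.2f} ms")
print("within noise" if abs(diff) < spread else "likely a real difference")
```

For these numbers the mean difference is well below the baseline spread, which supports treating the Dijkstra result as “no change” rather than a small slowdown.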

When to Use the Python 3.14 JIT

The JIT is a powerful tool, but it is not a “magic button.” From my experience, you should try the JIT when you have…

  • CPU-Bound Logic: Your application performs heavy calculations, data processing, or complex logic in pure Python.
  • Long-Running Processes: Web servers (Gunicorn/Uvicorn) or background workers (Celery) that run for hours, allowing the JIT plenty of time to warm up and optimise hot paths.
  • Experimental Testing: You want to prepare your codebase for future versions of Python (3.15+), where the JIT will likely be more aggressive.

And avoid it when you have…

  • I/O-Bound Apps: If your app just waits for database queries or API responses, the JIT won’t help.
  • Memory-Constrained Environments: Small Lambda functions or tiny containers might suffer from the increased memory footprint of the JIT cache.
  • Short-Lived CLI Tools: A script that runs in under a second doesn’t need a JIT.

Future Directions: Beyond 3.14

The CPython core team views 3.14 as the “foundation year.” Future iterations (Python 3.15 and 3.16) are expected to include:

  • Deeper Optimisation Passes: Using the type information gathered at runtime to perform even more aggressive machine code generation.
  • Better Heuristics: Smarter decisions on when to compile, reducing the “warm-up” penalty.
  • Lower Overhead: Refining the copy-and-patch mechanism to reduce memory consumption.

Summary

Python 3.14’s JIT is more than just a performance patch. It’s a statement of intent. It shows that Python is serious about closing the performance gap with languages like Java or Go while maintaining the “batteries-included” simplicity that made it famous.

For most developers, the JIT is simply another tool worth keeping an eye on. If performance matters in your projects, it’s worth testing Python 3.14 against your existing workloads. A few benchmarks on your most important code paths might reveal performance gains where you weren’t expecting them.

Here is the link to my previous article on GIL-free Python that I mentioned at the start.



