Some simple steps to speeding up Python with Cython (2024)

Simple tutorial: speeding up Python using Cython.

Paul Norvig


January 1, 2024


I’ve been coding in Python for years, and it’s great for a lot of things. But sometimes, it just isn’t fast enough. That’s where Cython comes in—it lets you speed up your Python code by adding C extensions. I’ve used it to speed up several projects at work and it’s been a huge help. Below I’ll get into some simple steps to do so.

Introduction to Cython and its Importance in 2024

In software development, efficiency is king. Iterations need to be rapid, experiments bold, and execution flawless. Within the Python community, Cython has emerged as an indispensable tool for meeting those needs, and it matters more than ever as we tackle complex computational challenges in 2024.

Cython is essentially a superset of Python: think of it as Python plus optional static type declarations and direct access to C-level APIs. The beauty here is two-fold: firstly, you can sprinkle static typing onto your existing Python code to optimize it, and secondly, it gives you a way to integrate C-level operations seamlessly where Python’s speed falls short.

Python is my go-to language for scripting and prototyping; for many of us, it’s the bedrock of simplicity and readability. But when training a deep learning model drags through epochs that feel like centuries, or data processing staggers under the weight of massive datasets, Cython presents a lifeline.

def prime_numbers(limit):
    primes = []
    for possible_prime in range(2, limit):
        is_prime = True
        for num in range(2, possible_prime):
            if possible_prime % num == 0:
                is_prime = False
        if is_prime:
            primes.append(possible_prime)
    return primes

Admittedly quaint, this Python code is straightforward but sluggish for large values of limit. Inject Cython into the mix, and we can turbocharge it with types.

# cython_example.pyx
def prime_numbers(int limit):
    # cdef declarations must sit at function level, not inside loops
    cdef int num, possible_prime
    cdef bint is_prime
    primes = []
    for possible_prime in range(2, limit):
        is_prime = True
        for num in range(2, possible_prime):
            if possible_prime % num == 0:
                is_prime = False
        if is_prime:
            primes.append(possible_prime)
    return primes

The cdef keyword here declares C variables. Typing the function parameter and the loop variables lets Cython compile the loops down to plain C arithmetic instead of Python object operations, which often results in dramatic performance gains.

It’s not witchcraft or snake oil; Cython works its magic by reducing interpreter overhead. Instead of resolving types dynamically at runtime, it lets you declare static types and compiles the operations best suited to C into C. Moreover, if there’s existing C code that you’d love to use in Python, Cython is your bridge: it can call C functions directly and can even wrap C++ classes.

Its relevance becomes clear when you consider today’s computational workloads: datasets are getting larger and models more sophisticated. Cython offers a pragmatic solution when Python alone can’t keep up with the performance demands.

Granted, Cython requires a C compiler, and setting that up alongside your development environment is a necessary first step. But once past that initial setup, converting Python to Cython is refreshingly simple, guided by thorough documentation and a supportive community. Function by function, module by module, you can incrementally compile and tune your codebase for speed, with no need for a full rewrite.

Whether it’s the satisfaction of watching your code’s performance leap or the ease of weaving C into Python, I’m echoing a sentiment within the community: Cython isn’t just a nice-to-have, it’s a strategic must, especially in 2024. So, jump aboard the Cython Express, and let’s speed up those Python applications without reinventing the wheel.

Setting up Your Development Environment for Cython

Cython can empower your Python code with the speed of C, but before you see the benefits, you need to set up your development environment properly. I remember when I first approached this process, it seemed daunting, but it’s pretty straightforward once you get the hang of it. Let’s get your system ready to turbocharge your Python code with Cython.

Firstly, if you haven’t already, you’ll need Python installed. I’m assuming you’ve got that covered. The next step is installing Cython itself. You can do this easily using pip. I personally like to work within a virtual environment to keep dependencies tidy and isolated:

python -m venv cython_env
source cython_env/bin/activate  # On Windows use `cython_env\Scripts\activate`
pip install Cython

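With Cython installed, you’ll also require a C compiler. If you’re on Linux or macOS, chances are you already have GCC or Clang installed (on macOS, Clang comes with the Xcode Command Line Tools). If you’re on Windows, install Microsoft’s Visual C++ Build Tools; MinGW can also work, but MSVC is the better-supported option.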
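Before building anything, it can save time to confirm the toolchain is in place. The sketch below is a hypothetical helper (check_toolchain is not part of Cython); it only checks whether the Cython package is importable and which common compiler executables are on your PATH. On Windows, setuptools locates MSVC on its own, so cl not being on PATH isn’t necessarily a problem.

```python
import importlib.util
import shutil


def check_toolchain():
    """Report whether Cython is importable and which C compilers are on PATH.

    The compiler names checked here (cc, gcc, clang, cl) are common defaults,
    not an exhaustive list; setuptools may end up using a different one.
    """
    cython_installed = importlib.util.find_spec("Cython") is not None
    compilers = {name: shutil.which(name) for name in ("cc", "gcc", "clang", "cl")}
    found = [name for name, path in compilers.items() if path]
    return {"cython": cython_installed, "compilers": found}


if __name__ == "__main__":
    report = check_toolchain()
    print("Cython installed:", report["cython"])
    print("C compilers on PATH:", report["compilers"] or "none found")
```

Run it inside the activated virtual environment so it sees the same Cython install your builds will use.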

Now, the real magic of Cython is in its .pyx files. These files let you write Python code that compiles to C. To demonstrate, I’ll show you how to write a simple hello.pyx file:

print("Hello, Cython!")

To compile this .pyx file into a .c file, and then into a shared library, you’ll write a setup.py script:

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("hello.pyx"))

Run the setup script with the following command:

python setup.py build_ext --inplace

This will generate a hello.c file and a compiled hello.*.so file (the exact name includes your Python version and platform; on Windows it’s a .pyd file), which is the shared library you can import in Python.
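If you want to know the exact file name to expect, the interpreter can tell you. This sketch uses only the standard library: sysconfig’s EXT_SUFFIX is the suffix appended to extension modules built for the running Python, and importlib.machinery.EXTENSION_SUFFIXES lists every suffix the import system accepts.

```python
import importlib.machinery
import sysconfig

# Suffix appended to compiled extension modules for this interpreter,
# e.g. ".cpython-312-x86_64-linux-gnu.so" on Linux or ".cp312-win_amd64.pyd" on Windows.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")
print("expected file name:", "hello" + ext_suffix)

# All suffixes the import system will try for extension modules, in search order.
print("accepted suffixes:", importlib.machinery.EXTENSION_SUFFIXES)
```

After importing the module, hello.__file__ shows which file was actually loaded, which is a quick way to confirm you’re running the compiled version.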

OK, to test the shared library, create a small Python script in the same directory that imports the compiled hello module:

import hello

When you run that script, you should see the “Hello, Cython!” message printed out. The print runs at import time because it sits at the top level of hello.pyx.

As a best practice, add a .gitignore file that excludes build artifacts such as the build/ directory, the generated .c files (hello.c here), and the compiled .so/.pyd files.


Lastly, you may want a more complex build that involves multiple Cython modules or additional C libraries. In these cases, you’ll have to expand setup.py. Here’s an example that includes a libraries argument:

from setuptools import setup, Extension
from Cython.Build import cythonize

extensions = [
    Extension("mymodule", ["mymodule.pyx"], libraries=["mylib"]),
]

setup(ext_modules=cythonize(extensions))

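Compiling Cython code against external C libraries requires telling the compiler where to find their headers and binaries, for example through the include_dirs and library_dirs arguments of Extension. Check the documentation specific to your OS for the right paths.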

Remember to maintain readability, even when working with Cython. .pyx files let you keep familiar Python syntax, and you can drop down to C types and C calls only where you need a speed boost.

Setting up the development environment for Cython definitely becomes instinctive after a few goes. And while it does add a few steps to your development process, the time you save with a faster-running application is immeasurable. Armed with Cython, your Python code is on a whole new level of performance. Enjoy the speed!

Converting Python Code to Cython: A Step-by-Step Guide

When I first encountered Python’s sluggish performance with certain computational tasks, Cython seemed like a foreign concept. The idea of wrangling Python code into a form that could be compiled was daunting. But, with a methodical approach, converting Python code to Cython can be broken down into manageable steps. I’ll walk you through my process to make this as painless as possible.

Step one is identifying the “hotspots” in your code: the functions where performance bottlenecks occur. Profiling tools can be a lifesaver, and Python’s built-in cProfile module does a decent job at this. I usually start with a script that’s embarrassingly slow, like the following pure Python function that naively calculates Fibonacci numbers:

def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
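To confirm a function like this really is the hotspot, you can profile a single call with cProfile and summarize it with pstats. This is a minimal sketch; profile_call is a hypothetical helper name, and the call count it reports for fib grows exponentially with n.

```python
import cProfile
import io
import pstats


def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(fib, 25)
    print("fib(25) =", result)
    print(report)
```

For a whole program rather than one call, python -m cProfile -s cumulative your_script.py gives the same kind of report from the command line.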

In step two, keep your existing Python code as the reference for correctness, and create a copy with a .pyx extension; that copy is where the Cython modifications happen. I often use Jupyter Notebook with the %%cython cell magic for this, but for a standalone script, save the copy as fib.pyx.

Now, in step three, let’s add static type declarations. Typing the parameter as a C int lets Cython use C arithmetic on n instead of Python’s dynamically typed integers, avoiding that overhead. Note that a def function always returns a Python object to its caller; a true C return type comes in step four.

def fib(int n) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

The fourth step involves further optimizations, such as declaring C types within the function and converting Python data structures to Cython alternatives if applicable. For fib, we can switch to cpdef, which gives the function a C return type and lets the recursive calls skip Python’s calling overhead, and introduce typed local variables:

cpdef int fib(int n):
    # C int return type: results overflow for n >= 47
    if n < 2:
        return n
    cdef int a = fib(n - 1)
    cdef int b = fib(n - 2)
    return a + b

In step five, you need to compile this .pyx file. You would do this with a setup.py script that uses setuptools together with Cython’s cythonize function (distutils was removed from the standard library in Python 3.12). In a shell, compiling looks something like this:

python setup.py build_ext --inplace

The setup.py can be as simple as this:

from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("fib.pyx"))

After the build, you’ll have a compiled .so or .pyd file, depending on your OS. Import and run this compiled version in Python just as you would import a regular module.

Step six is where you run your newly minted Cython code. It should be significantly faster than its purely Python counterpart. I like to use timeit to time both versions for comparison.
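A small timeit harness keeps that comparison fair: run each version several rounds and keep the best round, which filters out noise from other processes. This is a sketch; best_time and fib_py are hypothetical names, and it assumes the extension built from fib.pyx is importable as fib from the current directory.

```python
import timeit


def fib_py(n):
    """Pure Python reference version (the code from step one)."""
    if n < 2:
        return n
    return fib_py(n - 1) + fib_py(n - 2)


def best_time(func, arg, number=5, repeat=5):
    """Best-of-`repeat` average seconds per call, with `number` calls per round."""
    rounds = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(rounds) / number


if __name__ == "__main__":
    py_time = best_time(fib_py, 25)
    print(f"pure Python: {py_time * 1e3:.2f} ms per call")
    try:
        import fib  # assumed: the compiled module built from fib.pyx
    except ImportError:
        print("compiled fib module not found; build it with setup.py first")
    else:
        cy_time = best_time(fib.fib, 25)
        print(f"Cython:      {cy_time * 1e3:.2f} ms per call")
        print(f"speedup:     {py_time / cy_time:.1f}x")
```

Taking the minimum rather than the mean is a deliberate choice here: slower rounds usually reflect interference, not the code itself.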

It’s important to remember that not all code benefits equally from Cythonization. Pure computational functions see the most significant boosts. You might not see game-changing improvements if you’re dealing with I/O-bound tasks or processes that are limited by something other than CPU.
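A rough way to tell which case you’re in is to compare CPU time with wall-clock time for a call. This is a heuristic sketch, not a Cython feature, and busy and waiting are hypothetical stand-in workloads: a ratio near 1 suggests CPU-bound work where Cython can help, while a ratio near 0 means most of the time goes to waiting.

```python
import time


def cpu_ratio(func, *args):
    """Call func(*args) once; return (result, CPU seconds / wall-clock seconds).

    time.process_time() counts CPU time for the whole process, so
    multithreaded native code can push the ratio above 1.0.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else 0.0)


def busy(n):
    # Hypothetical CPU-bound workload: pure arithmetic in a Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def waiting(seconds):
    # Hypothetical I/O-like workload: the process mostly sits idle.
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    print("busy:    CPU/wall = %.2f" % cpu_ratio(busy, 2_000_000)[1])
    print("waiting: CPU/wall = %.2f" % cpu_ratio(waiting, 0.2)[1])
```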

Finally, iterate. My initial conversions usually aren’t perfect. Profiling the Cython version often reveals new optimization opportunities. Refinement is key, and translating Python to Cython is as much art as it is science.

As you’ve noticed, I’ve steered clear of detailed benchmarking and advanced Cython features here; the sections that follow cover them. What I find compelling about Cython is the balance it strikes: it retains Python’s readability while reaching into C’s performance domain. Give it a try; you might just find the speed you need for your Python projects.

Performance Benchmarking: Before and After Cython Optimization

When I first heard about Cython, I was skeptical about the purported performance gains. But the proof is in the pudding, or in this case, the code benchmarks. Let’s walk through an example that demonstrates the before and after of compiling Python code with Cython.

Consider this simple Python function that calculates the sum of squares for a given range:

def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

Running this with a large value of n can take a noticeable amount of time. On my machine, using %timeit for n=1_000_000, I get:

43.2 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Now, let’s optimize this with Cython. I create a .pyx file and add C type declarations:

# cy_sum_of_squares.pyx
def sum_of_squares(int n):
    # long long for i too: with a 32-bit int, i * i overflows once i > 46340
    cdef long long i
    cdef long long total = 0
    for i in range(n):
        total += i * i
    return total

Note the cdef keywords to define C variable types; this is where Cython starts to work its magic. To compile this, I follow these steps:

  1. Write a setup.py with the Cython build instructions:
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("cy_sum_of_squares.pyx"))

  2. Build the extension in the terminal:
$ python setup.py build_ext --inplace

Now I import the optimized function:

from cy_sum_of_squares import sum_of_squares as cy_sum_of_squares

Benchmarking the Cython function yields a significant performance boost:

1.47 ms ± 29.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

That’s a drastic reduction in execution time! Here’s the deal: my initial Python code was easy to write and read, but after converting it to Cython, the speed made it worth the extra steps.

Let me emphasize the routine: identify a performance bottleneck, write a Cython version, benchmark. This cycle has become second nature in my optimization toolkit.

Another tip I’ve picked up — use profiling tools like cProfile on your Python code first to find those critical spots. Only then should you bring in Cython; don’t waste time optimizing what doesn’t need it.

Questions do come up during the process, and more than once I’ve found myself on Stack Overflow or browsing through Cython’s GitHub repository for insights. Don’t hesitate to seek out these resources when you’re in a bind.

Finally, don’t forget to run your tests. Speed is enticing, but correct results are paramount. Always validate the behavior of your optimized code.
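For sum_of_squares there’s a convenient oracle: the closed form 0² + 1² + … + (n − 1)² = (n − 1)n(2n − 1)/6. The sketch below (check and expected_sum_of_squares are hypothetical names) runs any implementation against it. A check like this is also how C integer overflow tends to surface, since a 32-bit int loop variable makes i * i wrap once it exceeds 2**31 - 1.

```python
def sum_of_squares(n):
    # Pure Python reference, same as the original function.
    total = 0
    for i in range(n):
        total += i * i
    return total


def expected_sum_of_squares(n):
    """Closed form for 0**2 + 1**2 + ... + (n - 1)**2."""
    return (n - 1) * n * (2 * n - 1) // 6


def check(func, sizes=(0, 1, 2, 10, 1_000, 1_000_000)):
    """Raise AssertionError if func disagrees with the closed form for any size."""
    for n in sizes:
        got = func(n)
        want = expected_sum_of_squares(n)
        assert got == want, f"n={n}: got {got}, expected {want}"


check(sum_of_squares)
# After building the extension, run the same check on the compiled version:
# from cy_sum_of_squares import sum_of_squares as cy_sum_of_squares
# check(cy_sum_of_squares)
```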

Optimizing with Cython is not just about a single function. Large projects can see massive gains when critical paths are Cythonized. Just remember, the real-world impact of these optimizations is cumulative — as your application scales, those milliseconds saved turn into dollars earned or experiences enhanced.

If you’re new to Cython, start small: pick a function, optimize it, and measure. Once comfortable, scaling those changes across codebases becomes a relatively straightforward task. It may be time-consuming to learn initially, but the rewards, as you’ve seen, can be significant.

Advanced Cython Features and Techniques for Maximum Speed

In this final stretch, I’m going to share with you some advanced Cython tips and tricks that I’ve personally found to be game changers when it comes to squeezing out every last drop of performance.

First up, ‘typed memoryviews’. These are Cython’s way of providing fast and efficient access to memory buffers, such as those underlying NumPy arrays, without the Python overhead. Here’s how you might use them:

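cimport cython

@cython.boundscheck(False)  # skip per-access index bounds checks
@cython.wraparound(False)   # don't handle negative indices
def process_data(double[:] array):
    cdef Py_ssize_t i
    cdef double result = 0

    for i in range(array.shape[0]):
        result += array[i] * array[i]
    return result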

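By turning off bounds checking and wraparound (negative-index) handling with the @cython.boundscheck(False) and @cython.wraparound(False) decorators, which is safe only if you’re sure your indices are within the correct range, you’ll see a speed boost, as I did when processing large arrays.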
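A double[:] memoryview accepts any object that exposes a one-dimensional buffer of C doubles: a NumPy float64 array works, and so does the standard library’s array.array("d") when NumPy isn’t around. This pure Python sketch shows the buffer metadata involved and a reference implementation (process_data_reference is a hypothetical name) you could compare the compiled process_data against. A buffer with a different item type, such as array("f"), would be rejected by the typed memoryview.

```python
from array import array


def process_data_reference(values):
    """Pure Python reference for process_data: the sum of squares of the values."""
    result = 0.0
    for x in values:
        result += x * x
    return result


data = array("d", [1.0, 2.0, 3.0])  # "d" = C double, matching double[:]
view = memoryview(data)
print(view.format, view.itemsize, view.shape)  # the metadata a typed memoryview checks
print(process_data_reference(view))
```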

Next, let’s talk about ‘inline functions’. Like in C, inline functions in Cython can offer speed benefits by inserting the function’s code at the call site, avoiding the overhead of a function call. For example:

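# cdef inline: a C-level function the C compiler may inline at call sites
# (cdef functions can be called from Cython code only, not from Python)
cdef inline int add_inline(int a, int b):
    return a + b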

This tiny function could accelerate tight loops where it’s called repeatedly, which was a noticeable improvement when I incorporated it into a computationally heavy simulation.

Another technique is ‘cdef classes’. These can be used to create extension types which are faster and more memory efficient than regular Python classes. I used these to create complex data structures without giving up on performance.

Here’s a barebones example:

cdef class Particle:
    cdef double x, y, z

    def __init__(self, double x, double y, double z):
        self.x = x
        self.y = y
        self.z = z

Creating instances of ‘Particle’ will be much faster, as I observed when handling simulations with millions of particles.
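When you benchmark Particle, it helps to compare against a fair pure-Python baseline. The closest pure-Python analogue to a cdef class with fixed fields is a class with __slots__, which drops the per-instance __dict__. This sketch is plain Python, not Cython, and the class names are hypothetical; note that the slots version still stores Python float objects, whereas the cdef class stores raw C doubles.

```python
class ParticleDict:
    """Regular Python class: attributes live in a per-instance __dict__."""

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


class ParticleSlots:
    """__slots__ class: a fixed set of attributes, no per-instance __dict__."""

    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


p_dict = ParticleDict(1.0, 2.0, 3.0)
p_slots = ParticleSlots(1.0, 2.0, 3.0)
print(hasattr(p_dict, "__dict__"), hasattr(p_slots, "__dict__"))  # True False
```

Timing instance creation for both baselines and for the compiled Particle with timeit gives a clearer picture than comparing against the plain class alone.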

One more aspect I want to touch on is the usage of ‘Cython directives’. These can drastically alter the way Cython compiles your code. For instance, ‘profile=False’ tells Cython not to include profiling hooks, which would slow your functions down. Profiling is already off by default, so setting it explicitly mainly documents intent, but I still use it religiously in production code. The directive goes in a comment at the top of the .pyx file:

#cython: profile=False

Lastly, always remember that general coding best practices apply, even when working with Cython. Profile your code to identify bottlenecks, keep your code clean and readable, and avoid premature optimization.

The world of Cython optimization is vast, but I hope these tricks serve you well as they have for me. Whether it’s handling large datasets or performing numerically intensive algorithms, applying these techniques can lead to significant performance improvements that just aren’t possible with pure Python.

Remember, Cython is a means to an end – the destination being performant Python code. Harness its capabilities wherever necessary, because at the end of the day, we all want our Python code to run a little (or a lot) faster, and Cython is one of our best tickets to that show.

For further exploration, check out the official Cython documentation or dive into the source code at the Cython GitHub repository. Happy coding, and may your Python programs run ever so swiftly!