Liberating Performance with Immutable DataFrames in Free-Threaded Python

Applying a function to every row of a DataFrame is a common operation. These operations are embarrassingly parallel: each row can be processed independently. With a multi-core CPU, many rows can be processed simultaneously.

Until recently, exploiting this opportunity in Python was not possible. Multi-threaded function application, being CPU-bound, was throttled by the Global Interpreter Lock (GIL).

Python now offers a solution: with the “experimental free-threading build” of Python 3.13, the GIL is removed, and true multi-threaded concurrency of CPU-bound operations is possible.
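Before benchmarking, it helps to confirm which interpreter is actually running. The article does not prescribe a way to check this, so the sketch below is an assumption. It reads the `Py_GIL_DISABLED` build variable through `sysconfig`. It also calls `sys._is_gil_enabled()`, a private helper added in Python 3.13, and falls back to assuming the GIL is on for older versions. The function name `describe_interpreter` is hypothetical.

```python
import sys
import sysconfig


def describe_interpreter() -> dict:
    """Report whether this CPython build supports free threading and whether
    the GIL is enabled at runtime (hypothetical helper for illustration)."""
    # Py_GIL_DISABLED is 1 on free-threaded ("t") builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13; earlier versions always
    # run with the GIL, so fall back to a function that returns True.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "version": sys.version.split()[0],
        "free_threaded_build": free_threaded_build,
        "gil_enabled": bool(is_gil_enabled()),
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

The build flag and the runtime state are reported separately because they can disagree. A free-threaded build can still run with the GIL re-enabled, for example via the `PYTHON_GIL=1` environment variable. The GIL is also re-enabled when an extension module that has not declared free-threading support is imported.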

The performance benefits are extraordinary. Leveraging free-threaded Python, StaticFrame 3.2 can perform row-wise function application on a DataFrame at least twice as fast as single-threaded execution.

For example, for each row of a square DataFrame of one million integers, we can calculate the sum of all even values with lambda s: s.loc[s % 2 == 0].sum(). When using Python 3.13t (the “t” denotes the free-threaded variant), the duration (measured with IPython %timeit) drops by more than 60%, from 21.3 ms to 7.89 ms:

# Python 3.13.5 experimental free-threading build (main, Jun 11 2025, 15:36:57) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
>>> import numpy as np; import static_frame as sf

>>> f = sf.Frame(np.arange(1_000_000).reshape(1000, 1000))
>>> func = lambda s: s.loc[s % 2 == 0].sum()

>>> %timeit f.iter_series(axis=1).apply(func)
21.3 ms ± 77.1 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit f.iter_series(axis=1).apply_pool(func, use_threads=True, max_workers=4)
7.89 ms ± 60.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Row-wise function application in StaticFrame uses the iter_series(axis=1) interface. It is followed by either apply() for single-threaded application, or apply_pool() for multi-threaded (use_threads=True) or multi-processed (use_threads=False) application.

The benefits of using free-threaded Python are robust. The outperformance is consistent across a wide range of DataFrame shapes and compositions. It is proportional on both MacOS and Linux, and it scales positively with DataFrame size.

When using standard Python with the GIL enabled, multi-threaded processing of CPU-bound work often degrades performance. As shown below, the duration of the same operation in standard Python increases from 17.7 ms with a single thread to almost 40 ms with multi-threading:

# Python 3.13.5 (main, Jun 11 2025, 15:36:57) [Clang 16.0.0 (clang-1600.0.26.6)]
>>> import numpy as np; import static_frame as sf

>>> f = sf.Frame(np.arange(1_000_000).reshape(1000, 1000))
>>> func = lambda s: s.loc[s % 2 == 0].sum()

>>> %timeit f.iter_series(axis=1).apply(func)
17.7 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit f.iter_series(axis=1).apply_pool(func, use_threads=True, max_workers=4)
39.9 ms ± 354 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

There are trade-offs when using free-threaded Python. As these examples show, single-threaded processing is slower (21.3 ms on 3.13t compared to 17.7 ms on 3.13). Free-threaded Python, in general, incurs performance overhead. This is an active area of CPython development, and improvements are expected in 3.14t and beyond.

Further, many C-extension packages like NumPy now offer pre-compiled binary wheels for 3.13t. Even so, risks such as thread contention or data races still exist.

StaticFrame avoids these risks by enforcing immutability: thread safety is implicit, eliminating the need for locks or defensive copies. StaticFrame achieves this by using immutable NumPy arrays (with flags.writeable set to False) and forbidding in-place mutation.
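The immutability approach can be illustrated with plain NumPy. This is a minimal sketch, not StaticFrame's internals, and the helper names `even_sum` and `attempt_mutation` are hypothetical. The sketch does three things:

- It marks an array read-only through `flags.writeable`.
- It shows that an in-place write is rejected.
- It lets several threads read rows of the same buffer without any locks.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# A 2D array marked read-only, mirroring the flags.writeable approach.
data = np.arange(1_000_000).reshape(1000, 1000)
data.flags.writeable = False


def even_sum(row: np.ndarray) -> int:
    # Boolean indexing allocates a new array; the shared buffer is only read.
    return int(row[row % 2 == 0].sum())


def attempt_mutation(arr: np.ndarray) -> bool:
    """Return True if NumPy rejects an in-place write."""
    try:
        arr[0, 0] = -1
    except ValueError:  # "assignment destination is read-only"
        return True
    return False


with ThreadPoolExecutor(max_workers=4) as pool:
    # Iterating a 2D array yields row views into the same read-only buffer,
    # so no data is copied before being handed to the threads.
    results = list(pool.map(even_sum, data))

print(attempt_mutation(data))  # True
print(len(results))  # 1000
```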

Extended DataFrame Performance Tests

Evaluating the performance characteristics of a complex data structure like a DataFrame requires testing many types of DataFrames. The following performance panels perform row-wise function application on nine different DataFrame types, testing all combinations of three shapes and three levels of type homogeneity.

For a fixed number of elements (e.g., 1 million), three shapes are tested: tall (10,000 by 100), square (1,000 by 1,000), and wide (100 by 10,000).

To vary type homogeneity, three categories of synthetic data are defined:
- columnar: no adjacent columns have the same type.
- mixed: groups of four adjacent columns share the same type.
- uniform: all columns are the same type.

StaticFrame allows adjacent columns of the same type to be represented as two-dimensional NumPy arrays, reducing the costs of column traversal and row formation. At the uniform extreme, an entire DataFrame can be represented by one two-dimensional array. Synthetic data is produced with the frame-fixtures package.
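The benefit of grouping same-typed adjacent columns can be sketched as follows. The `consolidate` and `make_columns` helpers are hypothetical and only approximate the idea. They are neither StaticFrame's block structure nor the frame-fixtures API. Three dtypes are cycled in runs of one column, four columns, or all columns, to mimic the columnar, mixed, and uniform categories.

```python
import numpy as np


def consolidate(columns: list[np.ndarray]) -> list[np.ndarray]:
    """Merge runs of adjacent same-dtype 1D columns into 2D blocks."""
    groups: list[list[np.ndarray]] = []
    for col in columns:
        if groups and groups[-1][0].dtype == col.dtype:
            groups[-1].append(col)  # same dtype as the previous column: extend
        else:
            groups.append([col])  # dtype changed: start a new block
    return [np.column_stack(group) for group in groups]


def make_columns(rows: int, cols: int, run: int) -> list[np.ndarray]:
    """Cycle through three dtypes, switching every `run` columns."""
    dtypes = [np.int64, np.float64, np.bool_]
    return [np.zeros(rows, dtype=dtypes[(i // run) % len(dtypes)]) for i in range(cols)]


rows, cols = 1_000, 1_000  # the square shape; tall is 10_000 x 100, wide 100 x 10_000
for label, run in (("columnar", 1), ("mixed", 4), ("uniform", cols)):
    blocks = consolidate(make_columns(rows, cols, run))
    print(label, len(blocks), blocks[0].shape)
```

Under these assumptions, the square shape yields 1,000 blocks of shape (1000, 1), 250 blocks of shape (1000, 4), and a single (1000, 1000) block. Forming a row from the uniform frame therefore touches one array rather than a thousand.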

The same function is used: lambda s: s.loc[s % 2 == 0].sum(). A more efficient implementation is possible using NumPy directly. However, this function approximates common applications where many intermediate Series are created.
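For comparison, here is one possible direct NumPy formulation. This is an assumption: the article does not say which implementation it has in mind. It replaces per-row selection with a single vectorized reduction, and both approaches produce the same per-row sums.

```python
import numpy as np

a = np.arange(1_000_000).reshape(1000, 1000)

# Per-row style (plain NumPy standing in for Series): one boolean selection,
# and therefore one intermediate array, is created for every row.
per_row = np.array([row[row % 2 == 0].sum() for row in a])

# Vectorized alternative: zero out odd values, then reduce along axis 1.
# No per-row intermediates are created.
vectorized = np.where(a % 2 == 0, a, 0).sum(axis=1)

assert np.array_equal(per_row, vectorized)
```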

Figure legends document the concurrency configuration. When use_threads=True, multi-threading is used; when use_threads=False, multi-processing is used. StaticFrame uses the ThreadPoolExecutor and ProcessPoolExecutor interfaces from the standard library and exposes their parameters. The max_workers parameter defines the maximum number of threads or processes used. A chunksize parameter is also available, but it is not varied in this study.
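The dispatch described above can be approximated with the standard library alone. `apply_pool_sketch` is a hypothetical helper written for illustration, not StaticFrame's implementation. It only shows how `use_threads`, `max_workers`, and `chunksize` map onto `ThreadPoolExecutor` and `ProcessPoolExecutor`.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_pool_sketch(
    items: Iterable[T],
    func: Callable[[T], R],
    *,
    use_threads: bool,
    max_workers: Optional[int] = None,
    chunksize: int = 1,
) -> List[R]:
    """Hypothetical pool-based apply (not StaticFrame's code)."""
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        # Executor.map preserves input order. chunksize batches work only for
        # ProcessPoolExecutor; ThreadPoolExecutor ignores it.
        return list(executor.map(func, items, chunksize=chunksize))


if __name__ == "__main__":
    rows = [list(range(i, i + 10)) for i in range(5)]
    print(apply_pool_sketch(rows, sum, use_threads=True, max_workers=4))  # [45, 55, 65, 75, 85]
```

With `use_threads=False`, both `func` and the items must be picklable. A lambda like the one used in the benchmarks would therefore need to be replaced by a module-level function.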

Multi-Threaded Function Application with Free-Threaded Python 3.13t

As shown below, the performance benefits of multi-threaded processing in 3.13t are consistent across all DataFrame types tested. Processing time is reduced by at least 50%, and in some cases by over 80%. The optimal number of threads (the max_workers parameter) is smaller for tall DataFrames. Their short rows are processed so quickly that additional thread overhead actually degrades performance.

Figure by Author.
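Because the best max_workers value depends on row width, it is worth measuring rather than guessing. The following is a rough harness, not the article's benchmark. It rests on three assumptions:

- `sweep_max_workers` is a hypothetical helper.
- It uses plain NumPy rows rather than StaticFrame Series.
- Each timing includes executor start-up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def even_sum(row: np.ndarray) -> int:
    return int(row[row % 2 == 0].sum())


def sweep_max_workers(data: np.ndarray, worker_counts=(1, 2, 4, 8), repeats=3) -> dict:
    """Best-of-`repeats` wall time, in seconds, for each max_workers value."""
    timings = {}
    for n in worker_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            with ThreadPoolExecutor(max_workers=n) as pool:
                list(pool.map(even_sum, data))  # one task per row
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


if __name__ == "__main__":
    tall = np.arange(1_000_000).reshape(10_000, 100)
    square = np.arange(1_000_000).reshape(1_000, 1_000)
    for label, frame in (("tall", tall), ("square", square)):
        print(label, sweep_max_workers(frame))
```

On a standard (GIL) build, this harness would be expected to show little or no gain from more workers. On a 3.13t build, the square shape is where additional workers are most likely to help.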

When scaling to DataFrames of 100 million elements (1e8), the outperformance improves. Processing time is reduced by over 70% for all but two DataFrame types.

The overhead of multi-threading can vary greatly between platforms. In all cases, the outperformance of free-threaded Python is proportionally consistent between MacOS and Linux, though MacOS shows marginally greater benefits. Processing 100 million elements on Linux shows similar relative outperformance.

Surprisingly, even small DataFrames of only ten thousand elements (1e4) can benefit from multi-threaded processing in 3.13t. No benefit is found for wide DataFrames, but the processing time of tall and square DataFrames can be cut in half.

Multi-Threaded Function Application with Standard Python 3.13

Prior to free-threaded Python, multi-threaded processing of CPU-bound applications resulted in degraded performance. This becomes clear when the same tests are performed with standard Python 3.13.

Multi-Processed Function Application with Standard Python 3.13

Prior to free-threaded Python, multi-processing was the only option for CPU-bound concurrency. Multi-processing, however, only delivered benefits when the amount of per-process work was sufficient to offset two costs: creating an interpreter per process and copying data between processes.

As shown here, multi-processing significantly degrades the performance of row-wise function application, with processing time rising to between two and ten times the single-threaded duration. Each unit of work is too small to make up for the multi-processing overhead.
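One way to see part of the multi-processing cost is to look at what must cross the process boundary. `ProcessPoolExecutor` pickles each task's arguments on the way out and its result on the way back. The sketch below assumes each row is sent as its own task and measures only the outbound payload for a single row of the square frame.

```python
import pickle

import numpy as np

row = np.arange(1_000)  # one row of the square 1,000 x 1,000 frame
payload = pickle.dumps(row)  # serialized form that a process pool would transmit

# The payload contains the full data buffer plus array metadata, and it must be
# unpickled (copied again) in the worker process before any work begins.
print(row.nbytes, len(payload))
```

With threads, the same row is a view into memory that the whole process already shares, so no serialization or copying is needed.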

The Status of Free-Threaded Python

PEP 703, “Making the Global Interpreter Lock Optional in CPython”, was accepted by the Python Steering Council in July of 2023 with a three-phase plan:
- In the first phase (for Python 3.13), free-threading is experimental and non-default.
- In the second phase, it becomes non-experimental and officially supported.
- In the third phase, it becomes the default Python implementation.

After significant CPython development, and support from critical packages like NumPy, the Python Steering Council accepted PEP 779, “Criteria for supported status for free-threaded Python”, in June of 2025. In Python 3.14, free-threaded Python will enter the second phase: non-experimental and officially supported. It is not yet certain when free-threaded Python will become the default, but it is clear that a trajectory is set.

Conclusion

Row-wise function application is only the beginning. Group-by operations, windowed function application, and many other operations on immutable DataFrames are similarly well-suited to concurrent execution and are likely to show comparable performance gains.
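The group-by case can be illustrated with a hedged sketch. Each group is reduced on its own thread, while all threads read the same read-only arrays. The data and the `group_sum` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

keys = np.array([0, 1, 0, 2, 1, 2, 0])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
for arr in (keys, values):
    arr.flags.writeable = False  # shared inputs are read-only


def group_sum(key: int) -> float:
    # Masking allocates new arrays; the shared inputs are never written.
    return float(values[keys == key].sum())


unique_keys = np.unique(keys).tolist()
with ThreadPoolExecutor(max_workers=3) as pool:
    sums = dict(zip(unique_keys, pool.map(group_sum, unique_keys)))

print(sums)  # {0: 11.0, 1: 7.0, 2: 10.0}
```

Re-scanning `keys` for every group costs O(n × groups). A production implementation would typically sort or factorize the keys once and hand contiguous slices to each thread.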

The work to make CPython faster has had success: Python 3.14 is said to be 20% to 40% faster than Python 3.10. Unfortunately, many people working with DataFrames have not realized these benefits, because their performance is largely bound within C extensions (be it NumPy, Arrow, or other libraries).

As shown here, free-threaded Python enables efficient parallel execution using low-cost, memory-efficient threads. It delivers a 50% to 90% reduction in processing time, even when performance is primarily bound in C-extension libraries like NumPy. With the ability to safely share immutable data structures across threads, opportunities for substantial performance improvements are now abundant.
