    Why Is My Code So Slow? A Guide to Py-Spy Python Profiling

    By ProfitlyAI | February 5, 2026 | 10 Mins Read


    The most frustrating issues to debug in data science code aren't syntax errors or logic errors. Rather, they come from code that does exactly what it's supposed to do but takes its sweet time doing it.

    Functional but inefficient code can be a major bottleneck in a data science workflow. In this article, I'll give a brief introduction and walk-through of py-spy, a powerful tool for profiling Python code. It can pinpoint exactly where your program is spending the most time, so inefficiencies can be identified and corrected.

    Example Problem

    Let's set up a simple research question to write some code for:

    “For all flights going between US states and territories, which departing airport has the longest flights on average?”

    Below is a simple Python script to answer this research question, using data retrieved from the Bureau of Transportation Statistics (BTS). The dataset consists of records for every flight within US states and territories between January and June of 2025, with information on the origin and destination airports. It's roughly 3.5 million rows.

    It calculates the Haversine distance (the shortest distance between two points on a sphere) for each flight. Then, it groups the results by departing airport to find the average distance and reports the top five.
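    For reference, this is the standard Haversine formula that the script implements, where R is the Earth's radius in km, φ₁ and φ₂ are the two latitudes, λ₁ and λ₂ are the two longitudes (all in radians), Δφ = φ₂ − φ₁, and Δλ = λ₂ − λ₁:

    d = 2R \arcsin\left(\sqrt{\sin^2\left(\tfrac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\tfrac{\Delta\lambda}{2}\right)}\right)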

    import pandas as pd
    import math
    import time


    def haversine(lat_1, lon_1, lat_2, lon_2):
        """Calculate the Haversine Distance between two latitude and longitude points"""
        lat_1_rad = math.radians(lat_1)
        lon_1_rad = math.radians(lon_1)
        lat_2_rad = math.radians(lat_2)
        lon_2_rad = math.radians(lon_2)

        delta_lat = lat_2_rad - lat_1_rad
        delta_lon = lon_2_rad - lon_1_rad

        R = 6371  # Radius of the earth in km

        return 2*R*math.asin(math.sqrt(math.sin(delta_lat/2)**2 + math.cos(lat_1_rad)*math.cos(lat_2_rad)*(math.sin(delta_lon/2))**2))


    if __name__ == '__main__':
        # Load the flight data into a dataframe
        flight_data_file = r"./data/2025_flight_data.csv"
        flights_df = pd.read_csv(flight_data_file)

        # Start timer to see how long the analysis takes
        start = time.time()

        # Calculate the haversine distance between each flight's origin and destination airport
        haversine_dists = []
        for i, row in flights_df.iterrows():
            haversine_dists.append(haversine(lat_1=row["LATITUDE_ORIGIN"],
                                             lon_1=row["LONGITUDE_ORIGIN"],
                                             lat_2=row["LATITUDE_DEST"],
                                             lon_2=row["LONGITUDE_DEST"]))

        flights_df["Distance"] = haversine_dists

        # Get the result by grouping by origin airport, taking the average flight distance and printing the top 5
        result = (
            flights_df
            .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
            .sort_values('avg_dist', ascending=False)
        )

        print(result.head(5))

        # End timer and print analysis time
        end = time.time()
        print(f"Took {end - start} s")

    Running this code gives the following output:

                                                avg_dist
    DISPLAY_AIRPORT_NAME_ORIGIN
    Pago Pago International                  4202.493567
    Guam International                       3142.363005
    Luis Munoz Marin International           2386.141780
    Ted Stevens Anchorage International      2246.530036
    Daniel K Inouye International            2211.857407
    Took 169.8935534954071 s

    These results make sense, since the airports listed are in American Samoa, Guam, Puerto Rico, Alaska, and Hawaii, respectively. These are all locations outside of the contiguous United States where one would expect long average flight distances.

    The problem here isn't the results, which are valid, but the execution time: almost three minutes! While three minutes might be tolerable for a one-off run, it becomes a productivity killer during development. Imagine this as part of a longer data pipeline. Every time a parameter is tweaked, a bug is fixed, or a cell is re-run, you're forced to sit idle while the program runs. That friction breaks your flow and turns a quick analysis into an all-afternoon affair.

    Now let's see how py-spy can help us diagnose exactly which lines are taking so long.

    What Is Py-Spy?

    To understand what py-spy is doing and the benefits of using it, it helps to compare py-spy to the built-in Python profiler cProfile.

    • cProfile: This is a Tracing Profiler, working like a stopwatch on every function call. The time between each function call and return is measured and reported. While highly accurate, this adds significant overhead, since the profiler has to constantly pause and record data, which can slow down the script considerably (a sample invocation is shown after this list).
    • py-spy: This is a Sampling Profiler, working like a high-speed camera looking at the whole program at once. py-spy sits entirely outside the running Python script and takes high-frequency snapshots of the program's state. It looks at the whole call stack to see exactly which line of code is being run and which function called it, all the way up to the top level.
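    For comparison, the tracing approach needs nothing beyond the standard library. A minimal example, assuming the script above is saved as main.py:

    python -m cProfile -s cumtime main.py

    Expect the profiled run to take noticeably longer than the unprofiled script because of the tracing overhead.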

    Running Py-spy

    In order to run py-spy on a Python script, the py-spy library must be installed in the Python environment.

    pip install py-spy

    Once the py-spy library is installed, our script can be profiled by running the following command in the terminal:

    py-spy record -o profile.svg -r 100 -- python main.py

    Here is what each part of this command is actually doing:

    • py-spy: Calls the tool.
    • record: This tells py-spy to use its “record” mode, which will continuously monitor the program while it runs and save the data.
    • -o profile.svg: This specifies the output filename and format, telling it to output the results as an SVG file called profile.svg.
    • -r 100: This specifies the sampling rate, setting it to 100 times per second. This means that py-spy will check what the program is doing 100 times per second.
    • --: This separates the py-spy command from the Python script command. It tells py-spy that everything following this flag is the command to run, not arguments for py-spy itself.
    • python main.py: This is the command to run the Python script to be profiled with py-spy, in this case running main.py.

    Note: If running on Linux, sudo privileges are often required for running py-spy, for security reasons.
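    py-spy can also attach to a Python process that is already running, which is handy when a long pipeline is dragging and you'd rather not restart it. A minimal sketch, where the process ID 12345 is just a placeholder:

    # Record a profile from an already-running Python process
    py-spy record -o profile.svg --pid 12345

    # Or show a live, top-like view of where time is currently being spent
    py-spy top --pid 12345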

    After this command has finished running, an output file profile.svg will appear, which will allow us to dig deeper into which parts of the code are taking the longest.

    Py-spy Output

    Icicle Graph output from py-spy

    Opening the output profile.svg reveals the visualization that py-spy has created of how much time our program spent in different parts of the code. This is known as an Icicle Graph (or sometimes a Flame Graph if the y-axis is inverted) and is interpreted as follows:

    • Bars: Each colored bar represents a specific function that was called during the execution of the program.
    • X-axis (Population): The horizontal axis represents the collection of all samples taken during profiling. They are grouped so that the width of a bar represents the proportion of all samples in which the program was inside the function represented by that bar. Note: This is not a timeline; the ordering does not represent when the function was called, only the total amount of time spent in it.
    • Y-axis (Stack Depth): The vertical axis represents the call stack. The top bar labeled “all” represents the entire program, and the bars beneath it represent functions called from “all”. This continues down recursively, with each bar broken down into the functions that were called during its execution. The very bottom bar shows the function that was actually running on the CPU when the sample was taken.

    Interacting with the Graph

    While the image above is static, the actual .svg file generated by py-spy is fully interactive. When you open it in a web browser, you can:

    • Search (Ctrl+F): Highlight specific functions to see where they appear in the stack.
    • Zoom: Click on any bar to zoom in on that specific function and its children, allowing you to isolate complex parts of the call stack.
    • Hover: Hovering over any bar displays the exact function name, file path, line number, and the exact proportion of time it consumed.

    The most important rule for reading the icicle graph is simply: the wider the bar, the more time was spent in that function. If a function's bar spans 50% of the graph's width, it means the program was executing that function for 50% of the total runtime.

    Diagnosis

    From the icicle graph above, we can see that the bar representing the Pandas iterrows() function is noticeably wide. Hovering over that bar when viewing the profile.svg file reveals that the true proportion for this function was 68.36%. So over two thirds of the runtime was spent in the iterrows() function. Intuitively this bottleneck makes sense, since iterrows() creates a Pandas Series object for every single row in the loop, causing massive overhead. This gives a clear target for optimizing the runtime of the script.
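    To see that overhead concretely, here is a tiny standalone sketch (using a made-up DataFrame, not the flight data) showing that iterrows() builds a brand-new Series object on every iteration:

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    for i, row in df.iterrows():
        # Each 'row' is a freshly constructed pandas Series, so with 3.5 million
        # rows this per-row construction cost dominates the runtime
        print(type(row))  # <class 'pandas.core.series.Series'>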

    Optimizing The Script

    The clearest path to optimizing this script, based on what was learned from py-spy, is to stop using iterrows() to loop over every row to calculate the haversine distance. Instead, it should be replaced with a vectorized calculation using NumPy that does the calculation for every row with just one function call. So the changes to be made are:

    • Rewrite the haversine() function to use vectorized, efficient C-level NumPy operations that allow whole arrays to be passed in rather than one set of coordinates at a time.
    • Replace the iterrows() loop with a single call to this newly vectorized haversine() function.
    import pandas as pd
    import numpy as np
    import time


    def haversine(lat_1, lon_1, lat_2, lon_2):
        """Calculate the Haversine Distance between two latitude and longitude points"""
        lat_1_rad = np.radians(lat_1)
        lon_1_rad = np.radians(lon_1)
        lat_2_rad = np.radians(lat_2)
        lon_2_rad = np.radians(lon_2)

        delta_lat = lat_2_rad - lat_1_rad
        delta_lon = lon_2_rad - lon_1_rad

        R = 6371  # Radius of the earth in km

        return 2*R*np.arcsin(np.sqrt(np.sin(delta_lat/2)**2 + np.cos(lat_1_rad)*np.cos(lat_2_rad)*(np.sin(delta_lon/2))**2))


    if __name__ == '__main__':
        # Load the flight data into a dataframe
        flight_data_file = r"./data/2025_flight_data.csv"
        flights_df = pd.read_csv(flight_data_file)

        # Start timer to see how long the analysis takes
        start = time.time()

        # Calculate the haversine distance between each flight's origin and destination airport
        flights_df["Distance"] = haversine(lat_1=flights_df["LATITUDE_ORIGIN"],
                                           lon_1=flights_df["LONGITUDE_ORIGIN"],
                                           lat_2=flights_df["LATITUDE_DEST"],
                                           lon_2=flights_df["LONGITUDE_DEST"])

        # Get the result by grouping by origin airport, taking the average flight distance and printing the top 5
        result = (
            flights_df
            .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
            .sort_values('avg_dist', ascending=False)
        )

        print(result.head(5))

        # End timer and print analysis time
        end = time.time()
        print(f"Took {end - start} s")

    Running this code gives the following output:

                                                avg_dist
    DISPLAY_AIRPORT_NAME_ORIGIN
    Pago Pago International                  4202.493567
    Guam International                       3142.363005
    Luis Munoz Marin International           2386.141780
    Ted Stevens Anchorage International      2246.530036
    Daniel K Inouye International            2211.857407
    Took 0.5649983882904053 s

    These results are identical to the results from before the code was optimized, but instead of taking nearly three minutes to process, it took just over half a second!
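    To confirm the bottleneck is actually gone, the same py-spy command from earlier can be re-run against the optimized script (the output filename here is just an example); the wide iterrows() bar should no longer appear in the new graph:

    py-spy record -o profile_optimized.svg -r 100 -- python main.py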

    Looking Ahead

    If you are reading this from the future (late 2026 or beyond), check whether you are running Python 3.15 or newer. Python 3.15 is expected to introduce a native sampling profiler in the standard library, offering similar functionality to py-spy without requiring external installation. For anyone on Python 3.14 or older, py-spy remains the gold standard.

    This article explored a tool for tackling a common frustration in data science: a script that functions as intended but is written inefficiently and takes a long time to run. An example script was presented to learn which US departure airports have the longest average flight distance according to the Haversine distance. This script worked as expected but took almost three minutes to run.

    Using the py-spy Python profiler, we were able to learn that the cause of the inefficiency was the use of the iterrows() function. By replacing iterrows() with a more efficient vectorized calculation of the Haversine distance, the runtime was reduced from three minutes down to just over half a second.

    See my GitHub Repository for the code from this article, including the preprocessing of the raw data from BTS.

    Thanks for reading!

    Data Sources

    Data from the Bureau of Transportation Statistics (BTS) is a work of the U.S. Federal Government and is in the public domain under 17 U.S.C. § 105. It is free to use, share, and adapt without copyright restriction.


