
    Organizing Code, Experiments, and Research for Kaggle Competitions

By ProfitlyAI · November 13, 2025


Tell me and I forget. Teach me and I remember. Involve me and I learn.

This saying holds true, and learning by doing is one of the most instructive ways to acquire a new skill. In the field of data science and machine learning, participating in competitions is one of the most effective ways to gain hands-on experience and sharpen your skills.

Kaggle is the world's largest data science community, and its competitions are highly respected in the industry. Many of the world's leading ML conferences (e.g., NeurIPS), organizations (e.g., Google), and universities (e.g., Stanford) host competitions on Kaggle.

Featured Kaggle competitions award medals to top performers on the private leaderboard. Recently, I participated in my very first medal-awarding Kaggle competition, the NeurIPS – Ariel Data Challenge 2025, and I was fortunate enough to earn a Silver Medal. I don't intend to share my solution in this article; if you're interested, you can check out my solution here.

What I didn't realize before participating is how much more than ML skills Kaggle tests.

Kaggle tests one's coding and software engineering skills. It stressed the ability to properly organize a codebase in order to iterate quickly and try out new ideas. It also tested the ability to track experiments and results in a clear manner.

Being part of the NeurIPS 2025 Competition Track, a research conference, also tested the ability to research and learn a new domain quickly and effectively.

All in all, this competition humbled me a lot and taught me many lessons beyond ML.

The aim of this article is to share some of these non-ML lessons with you. They all revolve around one principle: organization, organization, organization.

First, I'll convince you that clear code structure and process organization is not a waste of time or a nice-to-have, but rather essential for competing on Kaggle in particular and for any successful data science project in general. Then, I'll share some of the methods I used and lessons I learned regarding code structure and the experimentation process.

I want to begin with a note of humility. By no means am I an expert in this area; I'm still at the outset of my journey. All I hope is that some readers will find some of these lessons helpful and will learn from my pitfalls. If you have any other tips or ideas, I urge you to share them so that we can all learn together.

1 Science's Golden Tip: Organize

    It is no secret that natural scientists like to keep detailed records of their work and research process. Unclear steps may (and will) lead to incorrect conclusions and understanding. Irreproducible work is the bane of science. For us data scientists, why should it be any different?

    1.1 But Speed is Important!

    The common counterargument is that the nature of data science is fast-paced and iterative. Generally speaking, experimentation is cheap and quick; besides, who in the world prefers writing documentation over coding and building models?

    As much as I sympathize with this thought and I love quick results, I fear that this mindset is short-sighted. Remember that the final goal of any data science project is to either deliver accurate, data-supported, and reproducible insights or to build reliable and reproducible models. If fast work compromises the end goal, then it is not worth anything.

    My solution to this dilemma is to make the mundane parts of organization as simple, quick, and painless as possible. We shouldn’t seek total deletion of the organization process, but rather fix its faults to make it as efficient and productive as possible.

    1.2 Costs of Lack of Organization

Imagine this scenario with me. For each of your experiments, you have a single notebook on Kaggle that does everything from loading and preprocessing the data to training the model, evaluating it, and finally submitting it. By now, you have run dozens of experiments. Then you discover a small bug in the data loading function used in all of them. Fixing it will be a nightmare: you will have to go through each notebook, fix the bug, ensure no new bugs were introduced, and then re-run every experiment to get updated results. All of this could have been avoided with a clear code structure and reusable, modular code.

Drivendata (2022) gives a striking example of the costs of an unorganized data science project. It tells the story of a failed project that took months to complete and cost millions of dollars. The failure came down to an incorrect conclusion reached early in the project: a code bug in the data cleaning polluted the data and led to wrong insights. If the team had tracked the data sources and transformations more carefully, they would have caught the bug earlier and saved the money.

If there is one lesson to take away from this section, it is that organization is not a nice-to-have, but rather an essential part of any data science project. Without a clear code structure and process organization, we are bound to make mistakes, waste time, and produce irreproducible work.

1.3 What to Track and Organize?

    There are three main aspects that I consider worth the effort to track:

    1. Codebase
2. Experiment Results and Configurations
    3. Research and Learning

    2 The Codebase

    After all, code is the backbone of any data science project. So, there is a lesson or two to learn from software engineers here.

    2.1 Repo Structure

As long as you give serious thought to the structure of your codebase, you are doing great.

There is no one universally agreed-upon structure (nor will there ever be). So, this section is highly subjective and opinionated. I will discuss the general structure I like and use.

I like to initialize my work with the widely popular Cookiecutter Data Science (ccds) template. When you initialize a project with ccds, it creates a folder with the following structure:

    ├── LICENSE            <- Open-source license if one is chosen
    ├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
    ├── README.md          <- The top-level README for developers using this project.
    ├── data
    │   ├── external       <- Data from third party sources.
    │   ├── interim        <- Intermediate data that has been transformed.
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.
    │
    ├── docs               <- A default mkdocs project; see www.mkdocs.org for details
    │
    ├── models             <- Trained and serialized models, model predictions, or model summaries
    │
    ├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
    │                         the creator's initials, and a short `-` delimited description, e.g.
    │                         `1.0-jqp-initial-data-exploration`.
    │
    ├── pyproject.toml     <- Project configuration file with package metadata for
    │                         {{ cookiecutter.module_name }} and configuration for tools like black
    │
    ├── references         <- Data dictionaries, manuals, and all other explanatory materials.
    │
    ├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
    │   └── figures        <- Generated graphics and figures to be used in reporting
    │
    ├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
    │                         generated with `pip freeze > requirements.txt`
    │
    ├── setup.cfg          <- Configuration file for flake8
    │
    └── {{ cookiecutter.module_name }}   <- Source code for use in this project.
        │
        ├── __init__.py             <- Makes {{ cookiecutter.module_name }} a Python module
        │
        ├── config.py               <- Store useful variables and configuration
        │
        ├── dataset.py              <- Scripts to download or generate data
        │
        ├── features.py             <- Code to create features for modeling
        │
        ├── modeling
        │   ├── __init__.py
        │   ├── predict.py          <- Code to run model inference with trained models
        │   └── train.py            <- Code to train models
        │
        └── plots.py                <- Code to create visualizations

2.1.1 Environment Management

When you use ccds, you are prompted to select an environment manager. I personally prefer uv by Astral. It records every package you use in the pyproject.toml file and lets us recreate the same environment by simply running uv sync.

Under the hood, uv uses venv. I find using uv much simpler than managing virtual environments directly, because maintaining and reading pyproject.toml is far easier than requirements.txt.

Moreover, I find uv much simpler than conda: uv is built specifically for Python, whereas conda is much more generic.

    2.1.2 The Generated Module

A great part of this template is the {{ cookiecutter.module_name }} directory. In this directory, you define a Python package that contains all the important parts of your code (e.g., preprocessing functions, model definitions, inference functions).

I find using the package quite helpful, and in Section 2.3, I'll discuss what to put here and what to put in Jupyter notebooks.

2.1.3 Staying Flexible

Don't regard this structure as perfect or complete. You don't have to use everything ccds provides, and you may (and should) alter it if the project requires it. ccds gives you a great starting point to tune to your project's exact needs and demands.

    2.2 Version Control

    Git has become an absolute necessity for any project involving code. It allows us to track changes, revert to earlier versions, and, with GitHub, collaborate with team members.

    When you use Git, you basically access a time machine that can remedy any faults you introduce to your code. Today, the use of Git is non-negotiable.

    2.3 The Three Code Types

    Choosing when to use Python scripts and when to use Jupyter Notebooks is a long-debated topic in the data science community. Here I present my stance on the topic.

    I like to separate all of my code into one of three directories:

    1. The Module
    2. Scripts
    3. Notebooks

    2.3.1 The Module

    The module should contain all the important functions and classes you create.

    Its usage helps us minimize redundancy and create a single source of truth for all the important operations happening on the data.

In data science projects, some operations will be repeated across all your training and inference workflows, such as reading the data from files, transforming it, and defining models. Re-implementing these functions in every notebook or script is error-prone and extremely tedious. Using a module allows us to write the code once and import it everywhere.

    Moreover, this helps reduce errors and mistakes. When a bug in the module is discovered, you fix it once in the module, and it’s automatically fixed in all scripts and notebooks importing it.
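
To make this concrete, here is a minimal sketch of what such a module function might look like; the module name, file name, and column names are hypothetical placeholders, not the structure of my actual solution:

    # ariel_module/dataset.py -- a single source of truth for data loading.
    # "ariel_module" and the column names are hypothetical placeholders.
    from pathlib import Path

    import pandas as pd


    def load_train_data(data_dir: Path) -> pd.DataFrame:
        """Read the raw training data and apply the shared preprocessing."""
        df = pd.read_csv(data_dir / "train.csv")
        # Fix a bug here once, and every script and notebook that imports
        # this function gets the fix automatically.
        return df.dropna(subset=["target"])

A training script or an exploration notebook then simply runs from ariel_module.dataset import load_train_data instead of re-implementing the logic.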

    2.3.2 Scripts

The scripts directory contains .py files. These files are the sole source of generated outputs from the project; they are the interface for interacting with our module and code.

The two main uses for these files are training and inference. Every model used should be created by running one of these scripts, and every Kaggle submission should be produced by one as well.

Using these scripts helps make our results reproducible. To reproduce an older result (train the same model, for example), one only has to clone the same version of the repo and run the script used to produce the old results.

Since the scripts are run from the CLI, using a library to handle CLI arguments simplifies the code. I like using typer for simple scripts that don't have many config options, and hydra for complex ones (I'll discuss hydra in more depth later); see the sketch below.
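
As an illustration, here is a minimal sketch of a typer-based training script; the file name, the commented-out import, and the parameters are hypothetical:

    # scripts/train.py -- a sketch of a typer CLI for a training script.
    from pathlib import Path

    import typer


    def main(
        data_path: Path = Path("data/processed/train.parquet"),
        lr: float = 1e-3,
        n_epochs: int = 10,
        output_dir: Path = Path("models"),
    ) -> None:
        """Train a model and save it to output_dir."""
        # A real script would import the training logic from the module, e.g.:
        # from ariel_module.modeling.train import train_model
        typer.echo(f"Training on {data_path} for {n_epochs} epochs (lr={lr})")


    if __name__ == "__main__":
        typer.run(main)

typer derives the CLI from the function signature, so running python train.py --lr 1e-4 --n-epochs 20 overrides the defaults.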

    2.3.3 Notebooks

    Jupyter Notebooks are wonderful for exploration and prototyping because of the short feedback loop they provide.

    On many occasions, I start writing code in a notebook to quickly test it and figure out all mistakes. Only then would I transfer it to the module.

    However, notebooks shouldn’t be used to create final results. They are hard to reproduce and track changes in. Therefore, always use the scripts to create final outputs.
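
One small trick that smooths this notebook-to-module workflow is IPython's autoreload extension, which picks up edits to module code without restarting the kernel:

    # In the first cell of an exploration notebook:
    %load_ext autoreload
    %autoreload 2

    # Edits to the module on disk are now re-imported before each cell runs.
    # ("ariel_module" is a hypothetical module name.)
    # from ariel_module.dataset import load_train_data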

    3 Running the Codebase on Kaggle

    Using the structure discussed in the previous section, we need to follow these steps to run our code on Kaggle:

    1. Clone The Repo
    2. Install Required Packages
    3. Run one of the Scripts

    Because Kaggle provides us with a Jupyter Notebook interface to run our code and most Kaggle competitions have restrictions on internet access, submissions aren’t as straightforward as running a script on our local machine. In what follows, I will discuss how to perform each of the above steps on Kaggle.

    3.1 Cloning The Repo

    First of all, we can’t directly clone our repo from GitHub in the submission notebook because of the internet restrictions. However, Kaggle allows us to import outputs of other Kaggle notebooks into our current notebook. Therefore, the solution is to create a separate Kaggle notebook that clones our repo and installs the required packages. This notebook’s output is then imported into the submission notebook.

Most likely, you will be using a private repo. The simplest way to clone a private repo on Kaggle is to use a personal access token (PAT). You can create a PAT on GitHub by following this guide. A good practice is to create a PAT specifically for Kaggle with the minimal required permissions.

In the cloning notebook, you can use the following code to clone your repo:

    from kaggle_secrets import UserSecretsClient

    user_secrets = UserSecretsClient()
    github_token = user_secrets.get_secret("GITHUB_TOKEN")
    user = "YOUR_GITHUB_USERNAME"
    CLONE_URL = f"https://oauth2:{github_token}@github.com/{user}/YOUR_REPO_NAME.git"
    get_ipython().system(f"git clone {CLONE_URL}")

This code clones your repo into the working directory of the current notebook. It assumes that you have saved your PAT in a Kaggle secret named GITHUB_TOKEN. Make sure to enable the secret in the notebook settings before running the cell.

3.2 Installing Required Packages

In the cloning notebook, you can also install the required packages. If you are using uv, you can build your custom module and then install it together with its dependencies. First, build the module:

    cd ariel-2025 && uv build

This creates a wheel file in the dist/ directory for your module. You can then install it and all its dependencies into a custom directory by running:

    pip install /path/to/wheel/file --target /path/to/custom/dir

Make sure to replace /path/to/wheel/file and /path/to/custom/dir with the actual paths. /path/to/wheel/file will be the path to the .whl file inside the REPO_NAME/dist/ directory. /path/to/custom/dir can be any directory you like. Remember the custom directory path, because subsequent notebooks will rely on it to import your module and your project's dependencies.

I like to both download the repo and install the packages in a single notebook. I give this notebook the same name as the repo to simplify importing it later.

3.3 Running One of the Scripts

The first thing to do in any subsequent notebook is to import the notebook containing the cloned repo and installed packages. When you do this, Kaggle stores the contents of /kaggle/working/ from the imported notebook in a directory named /kaggle/input/REPO_NAME/, where REPO_NAME is the name of the repo.

Often, your scripts will create outputs (e.g., submission files) relative to their locations. By default, your code will live in /kaggle/input/REPO_NAME/, which is read-only. Therefore, you need to copy the contents of the repo to /kaggle/working/, which is the current working directory and is read-write. While this may sometimes be unnecessary, it is a good practice that causes no harm and prevents silly issues.

    cp -r /kaggle/input/REPO_NAME/REPO_NAME/ /kaggle/working/

If you run your scripts directly from /kaggle/working/REPO_NAME/scripts/, you will get import errors because Python can't find the installed packages and your module. This is easily solved by updating the PYTHONPATH environment variable. I use the following command to update it and then run my scripts:

    ! export PYTHONPATH=/kaggle/input/REPO_NAME/custom_dir:$PYTHONPATH && cd /kaggle/working/REPO_NAME/scripts && python your_script.py --arg1 val1 --arg2 val2

I usually name any notebook that runs a script after the script itself, for simplicity. Moreover, when I re-run the notebook on Kaggle, I name the version with the hash of the current Git commit to keep track of which version of the code produced the results.
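
One simple way to grab that hash (a sketch; run it in the repo before naming the Kaggle notebook version):

    import subprocess

    # Short hash of the current commit, e.g. "1a2b3c4"; paste it as the
    # version name when saving the notebook on Kaggle.
    short_hash = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()
    print(short_hash)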

3.4 Gathering Everything Together

    At the end, two notebooks are necessary:

    1. The Cloning Notebook: clones the repo and installs the required packages.
    2. The Script Notebook: runs one of the scripts.

    You may need more script notebooks in the pipeline. For example, you may have one notebook for training and another for inference. Each of these notebooks will follow the same structure as the script notebook discussed above.

    Separating each step in the pipeline (e.g. data preprocessing, training, inference) into its own notebook is useful when one step takes a long time to run and rarely changes. For example, in the Ariel Data Challenge, my preprocessing step took more than seven hours to run. If I had everything in one notebook, I would have to wait seven hours every time I tried a new idea. Moreover, time limits on Kaggle kernels would have made it impossible to run the entire pipeline in one notebook.

Each notebook then imports the previous notebook's output, runs its own step, and builds from there. A good piece of advice is to make the paths of any data files or models arguments to the scripts, so that you can easily change them when running on Kaggle or in any other environment.

    When you update your code, re-run the cloning notebook to update the code on Kaggle. Then, re-run only the necessary script notebooks to generate the new results.

    3.5 Is all this Effort Worth it?

    Absolutely yes!

    I know that the specified pipeline will add some overhead when starting your project. However, it will save you much more time and effort in the long run. You will be able to write all your code locally and run the same code on Kaggle.

    When you create a new model, all you have to do is copy one of the script notebooks and change the script. No conflicts will arise between your local and Kaggle code. You will be able to track all your changes using Git. You will be able to reproduce any old results by simply checking out the corresponding Git commit and re-running the necessary notebooks on Kaggle.

    Moreover, you will be able to develop on any machine you like. Everything is centralized on GitHub. You can work from your local machine. If you need more power, you can work from a cloud VM. If you want to train on Kaggle, you can do that too. All your code and environment are the same everywhere.

    This is such a small price to pay for such a great convenience. Once the pipeline is set up, you can forget about it and focus on what matters: researching and building models!

    4 Recording Learnings and Research

    When diving into a new domain, a huge part of your time will be spent researching, studying, and reading papers. It is easy to get lost in all the information you read, and you can forget where you encountered a certain idea or concept. To that end, it is important to manage and organize your learning.

    4.1 Readings Tracking

Rajpurkar (2023) suggests keeping a list of all the papers and articles you read. This allows you to quickly review what you have read and refer back to it when needed.

Professor Rajpurkar also suggests annotating each paper with one, two, or three stars. One-star papers are irrelevant papers, though you didn't know that before reading them. Two-star papers are relevant; three-star papers are highly relevant. This lets you quickly filter your readings later on.

You should also take notes on every paper you read. These notes should focus on how the paper relates to your project. They should be short enough to review easily, but detailed enough to capture the main ideas. In the papers list, link the reading notes to each paper for easy access.

I also like keeping notes on the papers themselves, such as highlights. If you're using a PDF reader or an e-ink device, store the annotated version of the paper for future reference and link it in your notes. If you prefer reading on paper, you can scan the annotated version and store it digitally.

4.2 Tools

    For most documents, I like using Google Docs because it allows me to access my notes from anywhere. Moreover, you can write on Google Docs in Markdown, which is my preferred writing format (I am using it to write this article).

Zotero is a great tool for managing research papers. It excels at storing and organizing papers: you can create a collection for each project and keep all the related papers there. Importing papers is very easy with the browser extension, and exporting citations in BibTeX format is simple.

5 Experiment Tracking

    In data science projects, you will often run many experiments and try many ideas. Once again, it is easy to get lost in all this mess.

    We have already made a great step forward by structuring our codebase properly and using scripts to run our experiments. Nevertheless, I want to discuss two software tools that allow us to do even better.

    5.1 Wandb

Weights and Biases (wandb), pronounced "w-and-b" (for weights and biases), "wand-b" (for being magical like a wand), or "wan-db" (for being a database), is a great tool for tracking experiments. It allows us to run multiple experiments and save all their configurations and results in one central place.

Figure 1: Wandb Dashboard. Image from Adrish Dey's Configuring W&B Projects with Hydra article.

Wandb provides a dashboard to compare the results of different experiments, the hyperparameters used, and the training curves. It also tracks system metrics such as GPU and CPU utilization.

Wandb also integrates with the Hugging Face libraries, making it easy to track experiments when using transformers.

Once you start running multiple experiments, wandb becomes an indispensable tool.
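
For context, a minimal logging loop looks roughly like the following; the project name, config values, and metric are hypothetical placeholders:

    import wandb

    # Start a run and record this experiment's configuration.
    run = wandb.init(
        project="ariel-2025",  # hypothetical project name
        config={"lr": 1e-3, "n_epochs": 10, "model": "baseline"},
    )

    for epoch in range(run.config.n_epochs):
        train_loss = 1.0 / (epoch + 1)  # stand-in for a real training step
        wandb.log({"epoch": epoch, "train_loss": train_loss})

    run.finish()

Every run then shows up in the central dashboard with its config, metrics, and system usage attached.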

    5.2 Hydra

Hydra is a tool built by Meta that simplifies configuration management. It lets you define all your configuration in YAML files and easily override it from the CLI.

It is a very flexible tool and fits many use cases. This guide discusses how to use Hydra for experiment configuration.
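
As a rough sketch (the config layout and field names are hypothetical), a hydra entry point reads a YAML config and accepts CLI overrides:

    # A minimal hydra entry point (a sketch).
    # Assumes a conf/config.yaml next to the script, e.g.:
    #   model:
    #     lr: 0.001
    #     n_epochs: 10
    import hydra
    from omegaconf import DictConfig


    @hydra.main(config_path="conf", config_name="config", version_base=None)
    def main(cfg: DictConfig) -> None:
        # cfg mirrors the YAML structure.
        print(f"Training with lr={cfg.model.lr} for {cfg.model.n_epochs} epochs")


    if __name__ == "__main__":
        main()

Any field can then be overridden at launch, e.g. python script.py model.lr=1e-4, which makes sweeping over configurations painless.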

6 The End-to-End Process

Figure 2: End-to-End Organized Kaggle Competition Process, created by the author using Mermaid.js.

Figure 2 summarizes the process discussed in this article. First, we research ideas and record our learnings. Then, we experiment with those ideas on our local machines in Jupyter notebooks. Once we have a working idea, we refactor the code into our module and create scripts to run the experiments. We run the new experiment(s) on Kaggle. Finally, we track the results of the new experiments.

Because everything is carefully tracked, we can identify our shortcomings and quickly head back to the research or development phases to fix them.

    7 Conclusion

    Disorder is the source of all evil in data science projects. If we are to produce reliable and reproducible work, we must strive for organization and clarity in our processes. Kaggle competitions are no exception.

    In this article, we discussed a technique to organize our codebase, tips to track research and learnings, and tools to track experiments. Figure 2 summarizes the proposed method.

I hope this article was helpful to you. If you have any other tips or ideas, please share them in the comments section below.

Best of luck in your next competition!

    7.1 References

    Drivendata. (2022). The 10 Rules of Reliable Data Science.

    Rajpurkar, P. (2023). Harvard CS197: AI Research Experiences. https://www.cs197.seas.harvard.edu


