Data Science Spotlight: Selected Problems from Advent of Code 2025

Advent of Code is an annual advent calendar of programming puzzles that are themed around helping Santa’s elves prepare for Christmas. The whimsical setting masks the fact that many puzzles call for serious algorithmic problem-solving, especially towards the end of the calendar. In a previous article, we discussed the importance of algorithmic thinking for data scientists even as AI-assisted coding becomes the norm. With Advent of Code 2025 having wrapped up last month, this article takes a closer look at a selection of problems from the event that are especially relevant for data scientists. We will sketch out some interesting solution approaches in Python, highlighting algorithms and libraries that can be leveraged in a wide array of real-world data science use cases.

Navigating Tachyon Manifolds with Sets and Dynamic Programming

The first problem we’ll look at is Day 7: Laboratories. We’re given a tachyon manifold in a file called input_d7.txt, as shown below:

.......S.......
...............
.......^.......
...............
......^.^......
...............
.....^.^.^.....
...............
....^.....^....
...............
...^.^...^.^...
...............
..^...^.....^..
...............
.^...^.^.....^.
...............

A tachyon beam (“|”) starts at the top of the manifold and travels downward. If the beam hits a splitter (“^”), it splits into two beams, one on either side of the splitter. Part One of the puzzle asks us to determine the number of times a beam will split given a set of initial conditions (starting point of the beam and the manifold layout). Note that simply counting the number of splitters and multiplying by two will not give the correct answer, since overlapping beams are only counted once, and some splitters are never reached by any of the beams. We can leverage set algebra to account for these constraints as shown in the implementation below:

import functools

def find_all_indexes(s, ch):
    """Return a set of all positions where character ch appears in s."""
    return {i for i, c in enumerate(s) if c == ch}

with open("input_d7.txt") as f:
    first_row = f.readline()  # row containing the initial beam ('S')
    f.readline()  # skip separator line
    rows = f.readlines()  # remaining manifold rows

beam_ids = find_all_indexes(first_row, "S")  # active beam column positions
split_counter = 0  # total number of splits

for row_index, line in enumerate(rows):
    # Only even-indexed rows contain splitters
    if row_index % 2 != 0:
        continue

    # Find splitter positions in this row
    splitter_ids = find_all_indexes(line, "^")

    # Beams that hit a splitter (intersection)
    hits = beam_ids.intersection(splitter_ids)
    split_counter += len(hits)

    # New beams created by splits (left and right)
    if hits:
        new_beams = functools.reduce(lambda acc, h: acc.union({h - 1, h + 1}), hits, set())
    else:
        new_beams = set()

    # Update active beams (add new beams, remove beams that hit splitters)
    beam_ids = beam_ids.union(new_beams).difference(splitter_ids)

print(split_counter)

We use the intersection operation to identify the splitters that are directly hit by active beams coming from above. New beams are created to the left and right of every splitter that is hit, but overlapping beams are only counted once thanks to the union operator. The set of beams resulting from each layer of splitters in the tachyon manifold is computed by passing a lambda to reduce, a higher-order function that helps to simplify the code and is often seen in functional programming. The difference operator ensures that the original beams incident on a splitter are not counted among the set of outgoing active beams.
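To make the set algebra concrete, here is a minimal sketch of a single splitter row in isolation. The helper name step and the toy column positions are illustrative assumptions, not part of the puzzle input or of the solution above.

```python
def step(beams, splitters):
    """Advance the beams past one splitter row.

    Returns (number of splits in this row, set of outgoing beam columns).
    """
    hits = beams & splitters  # beams that land on a splitter
    spawned = {c for h in hits for c in (h - 1, h + 1)}  # left/right neighbours
    return len(hits), (beams | spawned) - splitters

# Hypothetical row: beams in columns 6 and 8, splitters in columns 6, 8 and 10
count, outgoing = step({6, 8}, {6, 8, 10})
print(count, sorted(outgoing))  # 2 [5, 7, 9]
```

Both splits spawn a beam in column 7, but the union keeps it only once, and the unreached splitter in column 10 contributes nothing to the count.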

In a classical system, if a tachyon particle is sent through the manifold and encounters a splitter, the particle can only continue along one unique path to the left or right of the splitter. Part Two of the puzzle introduces a quantum version of this setup, in which a particle simultaneously goes down both the left and right paths, effectively spawning two parallel timelines. Our task is to determine the total number of timelines that exist after a particle has traversed all viable paths in such a quantum tachyon manifold. This problem can be solved efficiently using dynamic programming as shown below:

from functools import lru_cache

def count_timelines_with_dfs_and_memo(path):
    """Count distinct quantum timelines using DFS + memoization (top-down DP)"""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]

    height = len(lines)
    width = len(lines[0])

    # Find starting column
    start_col = next(i for i, ch in enumerate(lines[0]) if ch == "S")

    @lru_cache(maxsize=None)
    def dfs_with_memo(row, col):
        """Return number of timelines from (row, col) to bottom using DFS + memoization"""
        # Out of bounds horizontally
        if col < 0 or col >= width:
            return 0

        # Past the bottom row: one complete timeline
        if row == height:
            return 1

        if lines[row][col] == "^":
            # Split left and right
            return dfs_with_memo(row+1, col-1) + dfs_with_memo(row+1, col+1)
        else:
            # Continue straight down
            return dfs_with_memo(row+1, col)

    return dfs_with_memo(1, start_col)

print(count_timelines_with_dfs_and_memo("input_d7.txt"))

Recursive depth-first search with memoization is used to set up a top-down form of dynamic programming, where each subproblem is solved once and reused multiple times. Two base cases are defined: a valid timeline is not created if a particle goes out of bounds horizontally, and a complete timeline is counted once the particle moves past the bottom row of the manifold. The recursive step accounts for two cases: whenever the particle reaches a splitter, it branches into two timelines; otherwise, it continues straight down in the current timeline. Memoization (using the @lru_cache decorator) prevents recalculation of known values when multiple paths converge on the same location in the manifold.
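The same recurrence can also be evaluated bottom-up, which avoids deep recursion on tall manifolds. The following is a sketch under the same rules as the memoized search; the function name count_timelines_bottom_up and the tiny grid are hypothetical and chosen only so the result can be checked by hand.

```python
from collections import Counter

def count_timelines_bottom_up(grid):
    """Count timelines by pushing per-column counts down the grid row by row."""
    width = len(grid[0])
    counts = Counter({grid[0].index("S"): 1})  # one particle at the start column
    for row in grid[1:]:
        nxt = Counter()
        for col, n in counts.items():
            if row[col] == "^":
                # Every timeline arriving here splits into a left and a right copy
                for c in (col - 1, col + 1):
                    if 0 <= c < width:  # out-of-bounds branches are dropped
                        nxt[c] += n
            else:
                nxt[col] += n  # continue straight down
        counts = nxt
    return sum(counts.values())

# Hypothetical toy manifold (not the puzzle example)
TOY_GRID = [
    "..S..",
    ".....",
    "..^..",
    ".....",
    ".^.^.",
    ".....",
]
print(count_timelines_bottom_up(TOY_GRID))  # 4
```

The first splitter creates two timelines, and each of them hits one splitter in the next splitter row, giving four timelines in total. Two of them share the middle column, which is exactly the kind of convergence that memoization exploits in the top-down version.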

In practice, data scientists can use the tools and techniques described above in a variety of situations. The concept of beam splitting is similar in some ways to the proliferation of data packets in a complex communications network. Simulating the cascading process is a bit like modeling supply chain disruptions, epidemics, and information diffusion. At a more abstract level, the puzzle can be framed as a constrained graph traversal or path counting problem. Set algebra and dynamic programming are versatile concepts that data scientists can use to solve such seemingly difficult algorithmic problems.

Building Circuits with Nearest Neighbor Search

The next problem we’ll look at is Day 8: Playground. We’re provided with a list of triples that represent the 3D location coordinates of electrical junction boxes in a file called input_d8.txt, as shown below:

162,817,810
59,618,56
901,360,560
…

In Part One, we’re asked to successively identify and connect pairs of junction boxes that are closest together in terms of straight-line (or Euclidean) distance. Connected boxes form a circuit through which electricity can flow. The task is ultimately to report the result of multiplying together the sizes of the three largest circuits after connecting the 1000 pairs of junction boxes that are closest together. One neat solution involves using a min-heap to store pairs of junction box coordinates. The following implementation is based on an instructive video by James Peralta:

from collections import defaultdict
import heapq
from math import dist as euclidean_dist

# Load points
with open("input_d8.txt") as f:
    points = [tuple(map(int, line.split(","))) for line in f.read().split()]

k = 1000

# Build min-heap of all pairwise distances
dist_heap = [
    (euclidean_dist(points[i], points[j]), points[i], points[j])
    for i in range(len(points))
    for j in range(i + 1, len(points))
]

heapq.heapify(dist_heap)

# Take k shortest edges and build adjacency list
neighbors = defaultdict(list)
for _ in range(k):
    _, a, b = heapq.heappop(dist_heap)
    neighbors[a].append(b)
    neighbors[b].append(a)

# Use DFS to compute component size
def dfs(start, seen):
    stack = [start]
    seen.add(start)
    size = 0

    while stack:
        node = stack.pop()
        size += 1
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return size

# Compute sizes of all connected components
seen = set()
sizes = [dfs(p, seen) for p in points if p not in seen]

# Derive final answer
sizes.sort(reverse=True)
a, b, c = sizes[:3]
print("Answer:", a * b * c)

A min-heap is a binary tree in which parent nodes have values less than or equal to the values of their child nodes; this ensures that the smallest value is stored at the top of the tree and can be accessed efficiently. In the above solution, this useful property of min-heaps is used to quickly identify the nearest neighbors among the given junction boxes. The 1000 nearest pairs thus identified form the edges of a graph over the 3D points. Depth-first search is used to traverse the graph starting from a given junction box and count the number of boxes that are in the same connected graph component (i.e., circuit).
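The heap behaviour described above can be checked on a handful of points. This is a small sketch with made-up coordinates (an assumption for illustration, not puzzle data); it uses only heapq, itertools.combinations, and math.dist from the standard library.

```python
import heapq
from itertools import combinations
from math import dist

# Hypothetical junction box coordinates (not taken from the puzzle input)
points = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (5, 5, 6), (9, 9, 9)]

# One (distance, point_a, point_b) tuple per unordered pair of points
all_pairs = [(dist(a, b), a, b) for a, b in combinations(points, 2)]

heap = list(all_pairs)
heapq.heapify(heap)  # rearrange the list in place into heap order

# The root of the heap (index 0) is always the smallest element
smallest_is_root = heap[0] == min(all_pairs)

# Popping repeatedly yields pairs in ascending order of distance
two_closest = [heapq.heappop(heap) for _ in range(2)]

# heapq.nsmallest returns the same pairs without managing the heap by hand
same_result = heapq.nsmallest(2, all_pairs) == two_closest

print(two_closest)
print(smallest_is_root, same_result)  # True True
```

When only the k smallest of many pairs are needed, heapq.nsmallest(k, ...) is a compact alternative to heapifying the full list and popping k times.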

In Part Two, resource scarcity is introduced (not enough extension cables). We must now continue connecting the closest unconnected pairs of junction boxes together until they’re all part of one large circuit. The required answer is the result of multiplying together the x-coordinates of the last two junction boxes that get connected. To solve this problem, we can use a union-find data structure and Kruskal’s algorithm for building minimum spanning trees as follows:

import heapq
from math import dist as euclidean_dist

# Load points
with open("input_d8.txt") as f:
    points = [tuple(map(int, line.split(","))) for line in f.read().split()]

# Build min-heap of all pairwise distances
dist_heap = [
    (euclidean_dist(a, b), a, b)
    for i, a in enumerate(points)
    for b in points[i+1:]
]
heapq.heapify(dist_heap)

# Define functions to implement Union-Find
parent = {p: p for p in points}

def find(x):
    # Follow parent links to the root, compressing the path on the way back
    if parent[x] != x:
        parent[x] = find(parent[x])
    return parent[x]

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return False  # already in the same component
    parent[rb] = ra
    return True

# Use Kruskal's algorithm to connect points until all are in one component
edges_used = 0
last_pair = None

while dist_heap:
    _, a, b = heapq.heappop(dist_heap)
    if union(a, b):
        edges_used += 1
        last_pair = (a, b)
        if edges_used == len(points) - 1:
            break

# Derive final answer
x_product = last_pair[0][0] * last_pair[1][0]
print(x_product)

The location data is stored in a min-heap and connected graph components are built. We repeatedly take the shortest remaining edge between two points and only keep that edge if it connects two previously unconnected components; this is the basic idea behind Kruskal’s algorithm. But to do this efficiently, we need a way of quickly determining whether two points are already connected. If they are, union(a, b) returns False, and we skip the edge to avoid creating a cycle. Otherwise, we merge their graph components. Union-find is a data structure that can perform this check in nearly constant time. To use a corporate analogy, it’s a bit like asking “Who’s your boss?” repeatedly until you reach the CEO and then rewriting the value of everyone’s boss to be the name of the CEO (i.e., the root). Next time, when someone asks, “Who’s your boss?”, you can quickly answer with the CEO’s name. If the roots of two nodes differ, the respective components are merged by attaching one root to the other.
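The boss-lookup analogy can be traced on four nodes. The sketch below repeats the same find and union logic on hypothetical single-letter nodes, purely to show when a union is accepted and when it is rejected.

```python
# Hypothetical nodes; every node starts as its own root (its own "CEO")
parent = {x: x for x in "abcd"}

def find(x):
    """Return the root of x, rewriting parent links along the way (path compression)."""
    if parent[x] != x:
        parent[x] = find(parent[x])
    return parent[x]

def union(a, b):
    """Merge the components of a and b; return False if they were already connected."""
    ra, rb = find(a), find(b)
    if ra == rb:
        return False
    parent[rb] = ra  # attach one root to the other
    return True

results = [
    union("a", "b"),  # True: merges {a} and {b}
    union("c", "d"),  # True: merges {c} and {d}
    union("a", "b"),  # False: same root, this edge would close a cycle
    union("b", "d"),  # True: merges {a, b} with {c, d}
]
roots = {find(x) for x in "abcd"}
print(results, roots)  # [True, True, False, True] {'a'}
```

After the last union there is a single root, which corresponds to the stopping condition in Kruskal’s algorithm: n points are fully connected after n - 1 accepted edges.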

The circuit-building problem relates to clustering and community detection, which are important concepts to know for real-life data science use cases. For example, building graph components by identifying nearest neighbors can be part of a practical algorithm for grouping customers by similarity of preferences, detecting communities in social networks, and clustering geographical locations. Kruskal’s algorithm can be used to design and optimize networks by minimizing routing costs. Abstract concepts such as Euclidean distances, min-heaps, and union-find help us measure, prioritize, and organize data at scale.

Configuring Factory Machines with Linear Programming

Next, we’ll walk through the problem posed in Day 10: Factory. We’re given a manual for configuring factory machines in a file called input_d10.txt as shown below:

[.##.] (3) (1,3) (2) (2,3) (0,2) (0,1) {3,5,4,7}
[..##.] (0,2,3) (2,3) (0,4) (0,1,2) (1,2,3,4) {7,5,12,8,2}
[.###.#] (0,1,2,3) (0,3,4) (0,1,2,4,5) (1,2) {10,11,9,5,10,5}

Each line describes one machine. The characters in the square brackets give the number of indicator lights and their desired states (“.” means off and “#” means on). All lights will initially be off. Button wiring schematics are shown in parentheses; e.g., pressing the button with schematic “(2,3)” will flip the current states of the indicator lights at positions 2 and 3 from “.” to “#” or vice versa. The objective of Part One is to determine the minimum number of button presses needed to correctly configure the indicator lights on all given machines. An elegant solution using mixed-integer linear programming (MILP) is shown below:

import re
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Parse a single machine description line
def parse_machine(line: str):
    # Extract light pattern
    match = re.search(r"\[([.#]+)\]", line)
    if not match:
        raise ValueError(f"Invalid line: {line}")

    pattern = match.group(1)
    m = len(pattern)

    # Target vector: '#' -> 1, '.' -> 0
    target = np.fromiter((ch == "#" for ch in pattern), dtype=int)

    # Extract button wiring
    buttons = [
        [int(x) for x in grp.split(",")] if grp.strip() else []
        for grp in re.findall(r"\(([^)]*)\)", line)
    ]

    # Build toggle matrix A
    n = len(buttons)
    A = np.zeros((m, n), dtype=int)

    for j, btn in enumerate(buttons):
        for idx in btn:
            if not (0 <= idx < m):
                raise ValueError(f"Button index {idx} out of range for {m} lights")
            A[idx, j] = 1

    return A, target

# Solve all machines in the input file
def solve_d10_part1(filename):
    with open(filename) as f:
        lines = [line.strip() for line in f if line.strip()]

    total = 0

    for line in lines:
        A, target = parse_machine(line)
        m, n = A.shape

        # Objective: minimize sum(x); slack variables carry no cost
        c = np.r_[np.ones(n), np.zeros(m)]

        # Specify constraint: A x - 2 k = target
        A_eq = np.hstack([A, -2 * np.eye(m)])
        lc = LinearConstraint(A_eq, target, target)

        # Define bounds: 0 <= x <= 1 (binary), k >= 0
        lb = np.zeros(n + m)
        ub = np.r_[np.ones(n), np.full(m, np.inf)]
        bounds = Bounds(lb, ub)

        # Specify integrality: 1 marks a variable as integer
        # (in scipy's milp, 2 would mean semi-continuous, not binary)
        integrality = np.ones(n + m, dtype=int)

        res = milp(c=c, constraints=[lc], integrality=integrality, bounds=bounds)

        if not res.success:
            raise RuntimeError(f"No feasible solution for line: {line}")

        total += round(res.x[:n].sum())

    return total

print(solve_d10_part1("input_d10.txt"))

First, each machine is encoded as a matrix A in which the rows are the lights and the columns are the buttons. A[i, j] = 1 if button j toggles light i. Regular expressions are used for pattern matching on the input data. Next, we set up the optimization problem with a binary button-press vector x, integer slack variables k, and a target light pattern t. For each machine, our aim is to choose button presses x, such that x_j = 1 if the j-th button is pressed and 0 otherwise. The condition “after pressing buttons x, the lights equal target t” reflects the congruence Ax ≡ t (mod 2), but since the MILP solver cannot deal with mod 2 directly, we express the condition as Ax - 2k = t, for some vector k consisting only of non-negative integers; this reformulation works because subtracting an even number does not change parity. The integrality specification, together with the bounds, says that the first n variables (the button presses) are binary and the remaining m variables (slack) are non-negative integers. We then run the MILP solver with the objective of minimizing the number of button presses needed to reach the target state. If the solver succeeds, res.x[:n] contains the optimal button-press choices and the code adds the number of pressed buttons to a running total.
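The parity reformulation can be checked by brute force on a machine small enough to enumerate. The sketch below is not the MILP itself: it tries every binary press vector x for a hypothetical three-light machine (wiring and target are assumptions for illustration) and recovers the slack vector k from Ax - 2k = t.

```python
from itertools import product

# Hypothetical machine: 3 lights, 3 buttons, target pattern "..#"
buttons = [(0, 1), (1, 2), (0,)]
target = [0, 0, 1]

def min_presses(buttons, target):
    """Return (x, Ax, k) for a press vector x with the fewest presses, or None."""
    best = None
    for x in product((0, 1), repeat=len(buttons)):
        # Row i of A x: how many pressed buttons toggle light i
        totals = [
            sum(x[j] for j, b in enumerate(buttons) if i in b)
            for i in range(len(target))
        ]
        # Feasible iff A x and t agree in parity for every light
        if all((s - t) % 2 == 0 for s, t in zip(totals, target)):
            k = [(s - t) // 2 for s, t in zip(totals, target)]  # slack: A x - 2k = t
            if best is None or sum(x) < sum(best[0]):
                best = (x, totals, k)
    return best

x, Ax, k = min_presses(buttons, target)
print(x, Ax, k)  # (1, 1, 1) [2, 2, 1] [1, 1, 0]
```

Here all three buttons must be pressed: lights 0 and 1 are each toggled twice (so k = 1 absorbs the even surplus) and light 2 is toggled once. Enumeration grows as 2^n in the number of buttons, which is why a MILP solver is the more practical choice for larger machines.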

In Part Two, the task is to reach a target state described by the so-called “joltage” requirements, which are shown in curly braces for each machine. The joltage counters of a machine are initially set to 0, and buttons can be pressed any number of times to update the joltage levels. For example, the first machine starts with joltage values “{0,0,0,0}”. Pressing button “(3)” once, “(1,3)” three times, “(2,3)” three times, “(0,2)” once, and “(0,1)” twice produces the target state “{3,5,4,7}”. This also happens to be the fewest button presses needed to reach the target state. Our task is to compute the minimum number of button presses needed to achieve the target joltage states for all machines. Again, this can be solved using MILP as follows:

import re
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def parse_machine(line: str):
    # Extract joltage requirements
    match = re.search(r"\{([^}]*)\}", line)
    if not match:
        raise ValueError(f"No joltage requirements in line: {line}")

    target = np.fromiter((int(x) for x in match.group(1).split(",")), dtype=int)
    m = len(target)

    # Extract button wiring
    buttons = [
        [int(x) for x in grp.split(",")] if grp.strip() else []
        for grp in re.findall(r"\(([^)]*)\)", line)
    ]

    # Build A (m × n)
    n = len(buttons)
    A = np.zeros((m, n), dtype=int)

    for j, btn in enumerate(buttons):
        for idx in btn:
            if not (0 <= idx < m):
                raise ValueError(f"Button index {idx} out of range for {m} counters")
            A[idx, j] += 1

    return A, target

def solve_machine(A, target):
    m, n = A.shape

    # Minimize sum(x)
    c = np.ones(n)

    # Constraint: A x = target
    lc = LinearConstraint(A, target, target)

    # Bounds: x ≥ 0
    bounds = Bounds(np.zeros(n), np.full(n, np.inf))

    # All x are integers
    integrality = np.ones(n, dtype=int)

    res = milp(c=c, constraints=[lc], integrality=integrality, bounds=bounds)
    if not res.success:
        raise RuntimeError("No feasible solution")

    return int(round(res.fun))

def solve_d10_part2(filename):
    with open(filename) as f:
        lines = [line.strip() for line in f if line.strip()]

    return sum(solve_machine(*parse_machine(line)) for line in lines)

print(solve_d10_part2("input_d10.txt"))

Whereas Part One was a parity problem, Part Two is a counting problem. The core constraint of Part Two can be captured by the linear equation Ax = t, and no slack variables are needed. In a way, Part Two is reminiscent of the integer knapsack problem, where a knapsack must be filled with the right combination of differently weighted/sized objects.
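The press counts quoted in the walkthrough for the first machine can be verified with a few lines of arithmetic. This sketch only checks that the stated presses satisfy Ax = t; it does not prove that ten presses is the minimum, which is what the MILP solver is for.

```python
# Press counts for the first machine, as stated in the walkthrough above:
# button wiring -> number of presses
presses = {(3,): 1, (1, 3): 3, (2, 3): 3, (0, 2): 1, (0, 1): 2}
target = [3, 5, 4, 7]

counters = [0] * len(target)  # all joltage counters start at 0
for button, times in presses.items():
    for idx in button:
        counters[idx] += times  # each press adds 1 to every wired counter

total_presses = sum(presses.values())
print(counters, total_presses)  # [3, 5, 4, 7] 10
```

Each counter is simply the sum of the press counts of the buttons wired to it, which is one row of the equation Ax = t.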

Optimization problems such as these are often a feature of data science use cases in domains like logistics, supply chain management, and financial portfolio management. The underlying aim is to minimize or maximize some objective function subject to various constraints. Data scientists would also do well to master the use of modular arithmetic; see this article for a conceptual overview of modular arithmetic and an exploration of its practical use cases in data science. Finally, there is an interesting conceptual link between MILP and the notion of feature selection with regularization in machine learning. Feature selection is about choosing the smallest number of features to train a model without adversely affecting predictive performance. Using MILP is like performing an explicit combinatorial search over feature subsets with pruning and optimization. L1 regularization amounts to a continuous relaxation of MILP; the L1 penalty nudges the coefficients of unimportant features towards zero. L2 regularization relaxes the MILP constraints even further by shrinking the coefficients of unimportant features without setting them to exactly zero.

Reactor Troubleshooting with Network Analysis

The last problem we’ll look at is Day 11: Reactor. We’re provided with a dictionary representation of a network of nodes and edges in a file called input_d11.txt as shown below:

you: hhh ccc
hhh: ccc fff iii
…
iii: out

The keys and values are source and destination nodes (or devices as per the problem storyline), respectively. In the above example, node “you” is connected to nodes “hhh” and “ccc”. The task in Part One is to count the number of different paths through the network that go from node “you” to “out”. This can be done using depth-first search as follows:

from collections import defaultdict

def parse_input(filename):
    """
    Parse the input file into a directed graph.
    Each line has the format: source: dest1 dest2 ...
    """
    graph = defaultdict(list)
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            src, dests = line.split(":")
            src = src.strip()
            for d in dests.strip().split():
                graph[src].append(d.strip())
    return graph

def dfs_paths(graph, start, goal):
    """
    Generate all paths from start to goal using DFS.
    """
    stack = [(start, [start])]
    while stack:
        (node, path) = stack.pop()
        for next_node in graph.get(node, []):
            if next_node in path:
                # Avoid cycles
                continue
            if next_node == goal:
                yield path + [next_node]
            else:
                stack.append((next_node, path + [next_node]))

def solve_d11_part1(filename):
    graph = parse_input(filename)
    all_paths = list(dfs_paths(graph, "you", "out"))
    print(len(all_paths))

solve_d11_part1("input_d11.txt")

We use an explicit stack to implement the search. Each stack entry holds information about the current node and the path so far. For each neighbor, we skip it if it is already in the path, yield the completed path if the neighbor is the “out” node, or push the neighbor and the updated path onto the stack to continue our exploration of the remaining network. The search process thus enumerates all valid paths from “you” to “out” and the final code output is the count of distinct valid paths.

In Part Two, we’re asked to count the number of paths that go from “svr” to “out” via nodes “dac” and “fft”. The constraint of intermediate nodes effectively restricts the number of valid paths in the network. The following is a sample solution:

from collections import defaultdict
from functools import lru_cache

def parse_input(filename):
    graph = defaultdict(list)
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            src, dests = line.split(":")
            src = src.strip()
            dests = [d.strip() for d in dests.strip().split()]
            graph[src].extend(dests)
            # Make sure sink nodes (e.g. "out") also appear as keys
            for d in dests:
                if d not in graph:
                    graph[d] = []
    return graph

def count_paths_with_constraints(graph, start, goal, must_visit):
    must_visit = frozenset(must_visit)

    @lru_cache(maxsize=None)
    def dfs(node, seen_required):
        if node == goal:
            return 1 if seen_required == must_visit else 0

        total = 0
        for nxt in graph[node]:
            # No explicit cycle check: the full path is not tracked, so this
            # assumes the graph is a DAG (a cycle would cause endless recursion)
            new_seen = seen_required | (frozenset([nxt]) & must_visit)
            total += dfs(nxt, new_seen)
        return total

    return dfs(start, frozenset([start]) & must_visit)

def solve_d11_part2(filename):
    graph = parse_input(filename)
    must_visit = {"dac", "fft"}
    total_valid_paths = count_paths_with_constraints(graph, "svr", "out", must_visit)
    print(total_valid_paths)

solve_d11_part2("input_d11.txt")

The code builds on the logic of Part One, so that we now additionally keep track of visits to the intermediate nodes “dac” and “fft” during the depth-first search routine. As in the quantum tachyon manifold puzzle, we leverage memoization to preempt redundant computations.
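If the network is a directed acyclic graph, the constrained count can alternatively be decomposed into segments: the number of paths that pass through two required nodes in a given order is the product of the path counts of the three legs, summed over both possible orders. The sketch below illustrates this on a hypothetical toy graph; the graph, the helper names, and the DAG assumption are not taken from the puzzle input.

```python
from functools import lru_cache

# Hypothetical toy DAG (not the puzzle input)
GRAPH = {
    "svr": ("a", "dac"),
    "a": ("dac", "fft"),
    "dac": ("fft", "out"),
    "fft": ("out",),
    "out": (),
}

@lru_cache(maxsize=None)
def count_paths(src, dst):
    """Number of directed paths from src to dst; assumes GRAPH is acyclic."""
    if src == dst:
        return 1
    return sum(count_paths(nxt, dst) for nxt in GRAPH.get(src, ()))

def count_via_both(start, goal, u, v):
    """Paths from start to goal that visit both u and v, in either order."""
    u_then_v = count_paths(start, u) * count_paths(u, v) * count_paths(v, goal)
    v_then_u = count_paths(start, v) * count_paths(v, u) * count_paths(u, goal)
    return u_then_v + v_then_u

print(count_paths("svr", "out"))                   # 5 unconstrained paths
print(count_via_both("svr", "out", "dac", "fft"))  # 2 paths via both nodes
```

In a DAG at most one of the two orderings can contribute, because paths in both directions between the required nodes would form a cycle. The memoized state here is just a pair of nodes, rather than a node plus the set of required nodes seen so far.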

Problems involving network analysis are a staple of data science. Path enumeration is directly relevant to use cases concerning telecommunications, internet routing, and power grid optimization. Complex ETL pipelines are often represented as networks (e.g., directed acyclic graphs), and path counting algorithms can be used to identify critical dependencies or bottlenecks in the workflow. In the context of recommender engines powered by knowledge graphs, analyzing paths flowing through the graph can help with the interpretation of recommender responses. Such recommenders can use paths between entities to justify recommendations, making the system transparent by showing how a suggested item is connected to a user’s known preferences; after all, we can explicitly trace the reasoning.

The Wrap

In this article we have seen how the playful scenarios that form the narratives of Advent of Code puzzles can surface genuinely powerful ideas, ranging from graph search and optimization to linear programming, combinatorics, and constraint solving. By dissecting these problems and experimenting with different solution strategies, data scientists can sharpen their algorithmic instincts and build a versatile toolkit that transfers directly to practical work spanning feature engineering, model interpretability, optimization pipelines, and more. As AI-assisted coding continues to evolve, the ability to frame, solve, and critically reason about such problems will likely remain a key differentiator for data scientists. Advent of Code offers a fun, low-stakes way to keep these skills sharp. Readers are encouraged to attempt the other puzzles in the 2025 edition and experience the joy of cracking tough problems using algorithmic thinking.
