In my previous article I explained how YOLOv1 works and how to build the architecture from scratch with PyTorch. In today's article, I'm going to focus on the loss function used to train the model. I highly recommend you read my previous YOLOv1 article before reading this one, since it covers several fundamentals you need to know. Click the link at reference number [1] to get there.
What’s a Loss Perform?
I believe we all already know that the loss function is an extremely important component in deep learning (and machine learning in general), where it is used to evaluate how good our model is at predicting the ground truth. Generally speaking, a loss function should be able to take two input values, namely the target and the prediction made by the model. This function returns a large value whenever the prediction is far from the ground truth. Conversely, the loss value will be small whenever the model successfully produces a prediction close to the target.
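As a tiny illustration of this behavior (my own toy example, not tied to any particular model), here is the squared error between a fixed target and two predictions of increasing badness:

# Toy example: the further the prediction drifts from the target, the larger the loss
import torch

target = torch.tensor(3.0)
for prediction in [torch.tensor(3.1), torch.tensor(5.0)]:
    print(((prediction - target) ** 2).item())    # prints roughly 0.01, then 4.0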
Usually, a model is used for either classification or regression only. However, YOLOv1 is a bit special: it incorporates a classification task (classifying the detected objects), while the objects themselves are enclosed in bounding boxes whose coordinates and sizes are continuous numbers, hence a regression task. We typically use cross-entropy loss when dealing with a classification task, and for regression we can use something like MAE, MSE, SSE, or RMSE. But since a single prediction made by YOLOv1 involves both classification and regression at once, we need to create a custom loss function that accommodates both tasks. And here's where things start to get interesting.
Breaking Down the Components
Now let’s take a look on the loss operate itself. Beneath is what it seems like based on the unique YOLOv1 paper [2].
Yes, the above equation looks scary at a glance, and that's exactly how I felt when I first saw it. But don't worry, as you will find this equation easy as we get deeper into it. I will definitely try my best to explain everything in simple terms.
Here you can see that the loss function basically consists of five rows. Now let's go through each of them one by one.
Row #1: Midpoint Loss

The first term of the loss function focuses on evaluating the object midpoint coordinate prediction. You can see in Figure 2 above that what it essentially does is simply compare the predicted midpoint (x_hat, y_hat) with the corresponding target midpoint (x, y) by subtraction before summing the squared results of the x and y components. We do this iteratively for the two predicted bounding boxes (B) inside all cells (S) and sum the error values from all of them. In other words, what we basically do here is compute the SSE (Sum of Squared Errors) of the coordinate predictions. Assuming that we use the default YOLOv1 configuration (i.e., S=7 and B=2), the first and the second sigma iterate 49 and 2 times, respectively.
Additionally, the 1^obj variable you see here is a binary mask, whose value is 1 whenever there is an object midpoint inside the corresponding cell in the ground truth. But if there is no object midpoint contained inside, then the value is 0 instead, which cancels out all operations inside that cell, since there is indeed nothing to predict.
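To make this concrete, below is a minimal sketch (with made-up midpoint values) of what a single responsible cell contributes to row #1:

# Toy example: one cell's contribution to the midpoint loss, using made-up numbers
import torch

x, y = torch.tensor(0.4), torch.tensor(0.5)            # ground-truth midpoint
x_hat, y_hat = torch.tensor(0.35), torch.tensor(0.6)   # predicted midpoint
print((x - x_hat) ** 2 + (y - y_hat) ** 2)             # tensor(0.0125)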
Row #2: Size Loss

The focus of the second row is to evaluate the correctness of the bounding box size. I believe the variables above are quite straightforward: w denotes the width and h denotes the height, where the ones with hats are the predictions made by the model. If you take a closer look at this row, you'll notice that it is basically the same as the previous one, except that here we take the square root of the variables first before doing the remaining computation.
Using the square root is actually a very clever idea. Naturally, if we directly computed on the variables as they are (without the square root), the same inaccuracy on a small bounding box would be weighted the same as on a large bounding box. This is not a good thing, because the same deviation in the number of pixels on a small box will visually appear more misaligned from the ground truth than on a larger box. Take a look at Figure 4 below to better understand this idea. Here you can see that even though the deviation in both cases is 60 pixels on the height axis, on the smaller bounding box the error looks worse. This is because in the case of the smaller box the deviation of 60 pixels is 75% of the actual object height, whereas on the larger box it only deviates 25% from the target height.

By taking the square root of w and h, inaccuracy in the smaller box gets penalized more than in the larger one. Let's do a little bit of math to prove this. To make things simpler, I put the two examples in Figure 4 into Gemini and let it compute the height prediction error based on the equation in Figure 3. You can see in the result below that the error of the small bounding box prediction is larger than that of the large bounding box (8.349 vs 3.345).
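We can also roughly reproduce these numbers ourselves. Back-calculating from the percentages above (these exact heights are my assumption, not stated in the figure), the small and large objects are about 80 and 240 pixels tall, and both predictions overshoot by 60 pixels:

# Height-only error of row #2: (sqrt(h) - sqrt(h_hat))^2, with back-calculated heights
import torch

h_small, h_small_hat = torch.tensor(80.), torch.tensor(140.)
h_large, h_large_hat = torch.tensor(240.), torch.tensor(300.)
print(((torch.sqrt(h_small) - torch.sqrt(h_small_hat)) ** 2).item())    # roughly 8.34
print(((torch.sqrt(h_large) - torch.sqrt(h_large_hat)) ** 2).item())    # roughly 3.34

The small box is penalized far more than the large one, matching the figures quoted above up to rounding.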

Row #3: Object Loss

Moving on to the third row: this part of the YOLOv1 loss function is used to measure how confident the model is in predicting whether or not there is an object inside a cell. Whenever an object is present in the ground truth, we need to set C to the IoU of the bounding box. Assuming that the predicted box perfectly matches the target box, we essentially want our model to produce a C_hat close to 1. But if the predicted box is not quite accurate, say it has an IoU of 0.8, then we expect our model to produce a C_hat close to 0.8 as well. Just think of it like this: if the bounding box itself is inaccurate, then we should expect our model to know that the object is not perfectly contained within that box. Meanwhile, whenever an object is indeed not present in the ground truth, the variable C should be exactly 0. Again, we then sum all the squared differences between C and C_hat across all predictions made throughout the entire image to obtain the object loss of a single image.
It's worth noting that C_hat is designed to reflect two things simultaneously: the probability that the object is there (a.k.a. objectness) and the accuracy of the bounding box (IoU). This is essentially the reason that we define the ground truth C as the multiplication of the objectness and the IoU, as mentioned in the paper. By doing so, we implicitly ask the model to produce a C_hat whose value incorporates both components.

As a refresher, IoU is a metric we commonly use to measure how good our bounding box prediction is compared to the ground truth in terms of area coverage. The way to compute IoU is simply to take the ratio of the intersection of the target and predicted bounding boxes to their union, hence the name: Intersection over Union.

Row #4: No Object Loss

The so-called no-object loss is quite unique. Despite having a similar computation to the object loss in the third row, the binary mask 1^noobj causes this part to work somewhat like the inverse of the object loss. This is because the binary mask value is 1 if there is no object midpoint present inside a cell in the ground truth. Otherwise, if an object midpoint is present, then the binary mask is 0, causing the remaining operations for that single cell to be canceled out. So in short, this row returns a non-zero number whenever there is no object in the ground truth but the cell is predicted as containing an object midpoint.
Row #5: Classification Loss

The last row in the YOLOv1 loss function is the classification loss. This part of the loss function is the most straightforward, if I may say so, because what we essentially do here is just compare the actual and the predicted class, similar to what we do in a typical multi-class classification task. However, what you need to keep in mind here is that we still use the same regression loss (i.e., SSE) to compute the error. It's mentioned in the paper that the authors decided to use this regression loss for both the regression and the classification parts for the sake of simplicity.
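As a sketch (shrunk to 4 classes instead of PASCAL VOC's 20, purely for brevity), the classification loss is just the SSE between a one-hot target and the predicted class scores:

# Toy example: SSE between a one-hot class target and made-up class predictions
import torch

target_class = torch.tensor([0., 0., 1., 0.])      # ground truth: class 2
pred_class = torch.tensor([0.1, 0.0, 0.8, 0.1])    # predicted class scores
print(torch.sum((target_class - pred_class) ** 2))    # tensor(0.0600)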
Adjustable Parameters
Notice that I actually haven't discussed the λ_coord and λ_noobj parameters yet. The former is used to give more weight to the bounding box prediction, which is why it's applied to the first and the second rows of the loss function. You can go back to Figure 1 to verify this. The λ_coord parameter is by default set to a large value (i.e., 5) because we want our model to focus on the correctness of the bounding box creation. So, any small inaccuracy in the xywh prediction will be penalized 5 times more heavily than it otherwise would be.
Meanwhile, λ_noobj is used to control the no-object loss, i.e., the one in the fourth row of the loss function. It's mentioned in the paper that the authors set a default value of 0.5 for this parameter, which basically causes the no-object loss part to not be weighted as much. This is essentially because in object detection the number of objects is usually much lower than the total number of cells, causing the majority of cells to not contain an object. Thus, if we don't apply a small multiplier to this term, the no-object loss will contribute very heavily to the total loss, even though it really isn't that important. By setting λ_noobj to a small number, we can suppress the contribution of this loss.
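Putting the two multipliers together, the reweighting works like this sketch (component values are made up):

# Made-up component values, just to show how the lambdas rebalance the total
bbox_loss, object_loss, no_object_loss, class_loss = 0.2, 0.1, 2.0, 0.3
total = 5 * bbox_loss + object_loss + 0.5 * no_object_loss + class_loss
print(total)    # 2.4, with no_object_loss halved and bbox_loss amplified five-fold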
Code Implementation
I do acknowledge that our previous discussion was very mathy. Don't worry if you haven't grasped the entire idea of the loss function just yet. I believe you'll eventually understand once we get into the code implementation.
So now, let's start the code by importing the required modules as shown in Codeblock 1 below.
# Codeblock 1
import torch
import torch.nn as nn
The IoU Function
Before we get into the YOLOv1 loss, we will first create a helper to calculate IoU, which will be used inside the main YOLOv1 loss function. Take a look at Codeblock 2 below to see how I implement it.
# Codeblock 2
def intersection_over_union(boxes_targets, boxes_predictions):
    # Convert the target box from (x, y, w, h) to corner coordinates.
    box2_x1 = boxes_targets[..., 0:1] - boxes_targets[..., 2:3] / 2
    box2_y1 = boxes_targets[..., 1:2] - boxes_targets[..., 3:4] / 2
    box2_x2 = boxes_targets[..., 0:1] + boxes_targets[..., 2:3] / 2
    box2_y2 = boxes_targets[..., 1:2] + boxes_targets[..., 3:4] / 2

    # Do the same for the predicted box.
    box1_x1 = boxes_predictions[..., 0:1] - boxes_predictions[..., 2:3] / 2
    box1_y1 = boxes_predictions[..., 1:2] - boxes_predictions[..., 3:4] / 2
    box1_x2 = boxes_predictions[..., 0:1] + boxes_predictions[..., 2:3] / 2
    box1_y2 = boxes_predictions[..., 1:2] + boxes_predictions[..., 3:4] / 2

    # Corner coordinates of the overlapping region.
    x1 = torch.max(box1_x1, box2_x1)
    y1 = torch.max(box1_y1, box2_y1)
    x2 = torch.min(box1_x2, box2_x2)
    y2 = torch.min(box1_y2, box2_y2)

    # clamp(0) handles the case where the boxes don't overlap at all.
    intersection = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)    #(1)

    box1_area = torch.abs((box1_x2 - box1_x1) * (box1_y2 - box1_y1))
    box2_area = torch.abs((box2_x2 - box2_x1) * (box2_y2 - box2_y1))

    union = box1_area + box2_area - intersection + 1e-6    #(2)
    iou = intersection / union    #(3)
    return iou
The intersection_over_union() function above takes two input parameters, namely the ground truth (boxes_targets) and the predicted bounding boxes (boxes_predictions). These two inputs are basically arrays of length 4, storing the x, y, w, and h values. Note that x and y are the coordinates of the box midpoint, not the top-left corner. The bounding box information is then extracted so that we can compute the intersection (#(1)) and the union (#(2)). We can finally obtain the IoU using the code at line #(3). Additionally, at line #(2) we need to add a very small value at the end of the operation (1e-6 = 0.000001). This number is useful to prevent a division-by-zero error in case the area of the predicted bounding box is 0 for some reason.
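To see that epsilon in action, here's a quick check I added (not part of the original codeblocks) using a degenerate zero-area box, where the division would otherwise be 0/0:

# Extra check: a zero-area 'box' yields IoU 0 instead of nan thanks to the 1e-6 term
degenerate = torch.tensor([[0., 0., 0., 0.]])
print(intersection_over_union(degenerate, degenerate))    # tensor([[0.]])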
Now let’s run the intersection_over_union() operate we simply created on a number of take a look at circumstances with a view to test if it really works correctly. The three examples in Determine 11 under present intersections with excessive, medium, and low IoU (from left to proper, respectively).

All the boxes you see here have a size of 200×200 px, and what makes the three cases different is simply the area of their intersections. If you take a closer look at Codeblock 3 below, you will see that the predicted boxes (pred_{0,1,2}) are shifted by 20, 100, and 180 pixels from their respective targets (target_{0,1,2}) along both the horizontal and vertical axes.
# Codeblock 3
target_0 = torch.tensor([[0., 0., 200., 200.]])
pred_0 = torch.tensor([[20., 20., 200., 200.]])
iou_0 = intersection_over_union(target_0, pred_0)
print('iou_0:', iou_0)
target_1 = torch.tensor([[0., 0., 200., 200.]])
pred_1 = torch.tensor([[100., 100., 200., 200.]])
iou_1 = intersection_over_union(target_1, pred_1)
print('iou_1:', iou_1)
target_2 = torch.tensor([[0., 0., 200., 200.]])
pred_2 = torch.tensor([[180., 180., 200., 200.]])
iou_2 = intersection_over_union(target_2, pred_2)
print('iou_2:', iou_2)
When the above code is run, you can see that our example on the left has the highest IoU of 0.6807, followed by the one in the middle and the one on the right with scores of 0.1429 and 0.0050, a trend that is exactly what we anticipated earlier. This essentially proves that our intersection_over_union() function works well.
# Codeblock 3 Output
iou_0: tensor([[0.6807]])
iou_1: tensor([[0.1429]])
iou_2: tensor([[0.0050]])
The YOLOv1 Loss Perform
There’s truly one other factor we have to do earlier than creating the loss operate, specifically instantiating an nn.MSELoss occasion which is able to assist us compute the error values throughout all cells. Because the identify suggests, this operate by default will compute MSE (Imply Squared Error). Since we would like the error worth to be summed as an alternative of averaged, we have to set the discount parameter to sum as proven in Codeblock 4 under. Subsequent, we initialize the lambda_coord, lambda_noobj, S, B, and C parameters, which on this case I set all of them to their default values talked about within the unique paper. Right here I additionally initialize the BATCH_SIZE parameter which signifies the variety of samples we’re going to course of in a single ahead cross.
# Codeblock 4
sse = nn.MSELoss(reduction="sum")
lambda_coord = 5
lambda_noobj = 0.5
S = 7
B = 2
C = 20
BATCH_SIZE = 1
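As a quick sanity check (my own snippet, not one of the article's codeblocks), reduction="sum" really does give the plain sum of squared errors rather than their mean:

# Extra check: SSE of [1, 2] against [0, 0] should be 1^2 + 2^2 = 5
print(sse(torch.tensor([1., 2.]), torch.tensor([0., 0.])))    # tensor(5.)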
Alright, now that all prerequisite variables have been initialized, let's actually define the loss() function for the YOLOv1 model. This function is quite long, so I decided to break it down into several parts. Just make sure everything is placed inside the same cell if you want to try running this code in your own notebook.
You can see in Codeblock 5a below that this function takes two input arguments: target and prediction (#(1)). Remember that initially the output of YOLOv1 (the prediction) is a long single-dimensional tensor of length 1470, whereas the length of the target tensor is 1225. What we need to do first inside the loss() function is to reshape them into 7×7×30 (#(3)) and 7×7×25 (#(2)), respectively, so that we can easily process the information contained in both tensors.
# Codeblock 5a
def loss(target, prediction):    #(1)
    target = target.reshape(-1, S, S, C+5)            #(2)
    prediction = prediction.reshape(-1, S, S, C+B*5)  #(3)

    obj = target[..., 20].unsqueeze(3)    #(4)
    noobj = 1 - obj                       #(5)
Next, the code at lines #(4) and #(5) is how we implement the 1^obj and 1^noobj binary masks. At line #(4) we take the value at index 20 from the target tensor and store it in the obj variable. Index 20 itself corresponds to the bounding box confidence (see Figure 12), whose value is 1 if there is an object midpoint inside the cell. Otherwise, if no object midpoint is present, the value is 0. Conversely, the noobj variable I initialize at line #(5) acts as the inverse of obj, whose value is 1 if there is no object midpoint present in the grid cell.
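To make the masking concrete, here is a tiny standalone illustration I added, using a 2×2 grid instead of 7×7 purely for readability:

# Toy example (S=2 just for illustration): only cell (0, 0) holds an object midpoint
toy_confidence = torch.tensor([[[1., 0.],
                                [0., 0.]]])    # shape (1, 2, 2), playing the role of target[..., 20]
toy_obj = toy_confidence.unsqueeze(3)          # shape (1, 2, 2, 1), the 1^obj mask
toy_noobj = 1 - toy_obj                        # the 1^noobj mask: 1 wherever obj is 0
print(toy_obj.flatten().tolist())              # [1.0, 0.0, 0.0, 0.0]
print(toy_noobj.flatten().tolist())            # [0.0, 1.0, 1.0, 1.0]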

Now let’s transfer on to Codeblock 5b, the place we compute the bounding field error, which corresponds to the primary and the second rows of the loss operate. What we basically do initially is to take the xywh values from the goal tensor (indices 21, 22, 23, and 24). This may be achieved with a easy array slicing approach as proven at line #(1). Subsequent, we do the identical factor to the predicted tensor. Nonetheless, do not forget that since our mannequin generates two bounding packing containers for every cell, we have to retailer their xywh values into two separate variables: pred_bbox0 and pred_bbox1 (#(2–3)).
In Determine 12, the sliced indices are those known as x1, y1, w1, h1, and x2, y2, w2, h2. Among the many two bounding field predictions, we are going to solely take the one which finest approximates the goal field. Therefore, we have to compute the IoU between each predicted packing containers and the goal field utilizing the code at line #(4) and #(5). The expected bounding field that produces the very best IoU is chosen utilizing torch.max() at line #(6). The xywh values of one of the best bounding field prediction will then be saved in best_bbox, whereas the corresponding data of the field that has the decrease IoU might be discarded (#(8)). At traces #(7) and #(8) itself we multiply each the precise xywh and one of the best predicted xywh with obj, which is how we apply the 1^obj masks.
At this level we have already got our x and y values able to be processed with the sse operate we initialized earlier. Nonetheless, do not forget that we nonetheless want to use sq. root to w and h beforehand, which I do at line #(9) and #(10) for the goal and one of the best prediction vectors, respectively. One factor that you just want to bear in mind at line #(10) is that we must always take absolutely the worth of the numbers earlier than making use of torch.sqrt() simply to stop us from computing the sq. root of detrimental numbers. Not solely that, additionally it is crucial so as to add a really small quantity (1e-6) to make sure that we gained’t take the sq. root of 0, which is able to trigger numerical instability. Nonetheless with the identical line, we then multiply the ensuing tensor with its unique signal that we preserved earlier utilizing torch.signal().
Lastly, as we’ve got utilized torch.sqrt() to the w and h parts of target_bbox and best_bbox, we are able to now cross each tensors to the sse() operate as proven at line #(11). Observe that the loss worth saved in bbox_loss already consists of each the error from the primary and the second row of the YOLOv1 loss operate.
# Codeblock 5b
    target_bbox = target[..., 21:25]       #(1)
    pred_bbox0 = prediction[..., 21:25]    #(2)
    pred_bbox1 = prediction[..., 26:30]    #(3)

    iou_pred_bbox0 = intersection_over_union(pred_bbox0, target_bbox)    #(4)
    iou_pred_bbox1 = intersection_over_union(pred_bbox1, target_bbox)    #(5)
    iou_pred_bboxes = torch.cat([iou_pred_bbox0.unsqueeze(0),
                                 iou_pred_bbox1.unsqueeze(0)],
                                dim=0)
    best_iou, best_bbox_idx = torch.max(iou_pred_bboxes, dim=0)    #(6)

    target_bbox = obj * target_bbox    #(7)
    best_bbox = obj * (best_bbox_idx*pred_bbox1    #(8)
                       + (1-best_bbox_idx)*pred_bbox0)

    target_bbox[..., 2:4] = torch.sqrt(target_bbox[..., 2:4])    #(9)
    best_bbox[..., 2:4] = torch.sign(best_bbox[..., 2:4]) * torch.sqrt(torch.abs(best_bbox[..., 2:4]) + 1e-6)    #(10)

    bbox_loss = sse(    #(11)
        torch.flatten(target_bbox, end_dim=-2),
        torch.flatten(best_bbox, end_dim=-2)
    )
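As a quick standalone aside (run it outside the loss() function), this is why the sign-and-epsilon trick at line #(10) matters: an untrained model can output negative widths or heights, and torch.sqrt() alone would return nan for those:

# Extra check: the sign trick keeps a negative prediction finite while preserving its sign
raw_h = torch.tensor(-4.)
print(torch.sqrt(raw_h))                                          # tensor(nan)
print(torch.sign(raw_h) * torch.sqrt(torch.abs(raw_h) + 1e-6))    # tensor(-2.0000)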
The next component we will implement is the object loss. Take a look at Codeblock 5c below to see how I do that.
# Codeblock 5c
    target_bbox_confidence = goal_confidence = target[..., 20:21]    #(1)
    pred_bbox0_confidence = prediction[..., 20:21]    #(2)
    pred_bbox1_confidence = prediction[..., 25:26]    #(3)

    target_bbox_confidence = obj * target_bbox_confidence    #(4)
    best_bbox_confidence = obj * (best_bbox_idx*pred_bbox1_confidence    #(5)
                                  + (1-best_bbox_idx)*pred_bbox0_confidence)

    object_loss = sse(    #(6)
        torch.flatten(obj * target_bbox_confidence * best_iou),    #(7)
        torch.flatten(obj * best_bbox_confidence),
    )
What we first do in the codeblock above is take the value at index 20 from the target tensor (#(1)). Meanwhile, for the prediction tensor we need to take the values at indices 20 and 25 (#(2–3)), which correspond to the confidence scores of each of the two boxes generated by the model. You can go back to Figure 12 to verify this.
Next, at line #(5) I take the confidence of the box prediction that has the higher IoU. The code at line #(4) is actually not strictly necessary, because obj and target_bbox_confidence are basically the same thing. You can verify this by checking the code at line #(4) in Codeblock 5a. I do it anyway for the sake of clarity, because we essentially have both C and C_hat multiplied by 1^obj in the original equation (see Figure 6).
Afterwards, we compute the SSE between the ground truth confidence (target_bbox_confidence) and the predicted confidence (best_bbox_confidence) (#(6)). It is important to note at line #(7) that we need to multiply the ground truth confidence by the IoU of the best bounding box prediction (best_iou). This is because the paper mentions that whenever there is an object midpoint inside a cell, we want the predicted confidence to equal that IoU score. And this concludes our discussion of the object loss implementation.
Now Codeblock 5d below focuses on computing the no-object loss. The code is quite simple, since here we reuse the target_bbox_confidence and the pred_bbox{0,1}_confidence variables we initialized in the previous codeblock. These variables need to be multiplied by the noobj mask before the SSE computation is performed. Note that the errors made by the two predicted boxes need to be summed, which is the reason you see the addition operation at line #(1).
# Codeblock 5d
    no_object_loss = sse(
        torch.flatten(noobj * target_bbox_confidence),
        torch.flatten(noobj * pred_bbox0_confidence),
    )
    no_object_loss += sse(    #(1)
        torch.flatten(noobj * target_bbox_confidence),
        torch.flatten(noobj * pred_bbox1_confidence),
    )
Finally, we compute the classification loss using Codeblock 5e below, which corresponds to the fifth row of the original equation. Remember that the original YOLOv1 was trained on the 20-class PASCAL VOC dataset. This is basically the reason that we take the first 20 indices from the target and prediction tensors (#(1–2)). Then, we can simply pass the two into the sse() function (#(3)).
# Codeblock 5e
    target_class = target[..., :20]    #(1)
    pred_class = prediction[..., :20]    #(2)

    class_loss = sse(    #(3)
        torch.flatten(obj * target_class, end_dim=-2),
        torch.flatten(obj * pred_class, end_dim=-2),
    )
As we’ve got already accomplished the 5 parts of the YOLOv1 loss operate, what we have to do now’s to sum all the pieces up utilizing the next codeblock. Don’t overlook to offer weightings to bbox_loss and no_object_loss by multiplying them with their corresponding lambda parameters we initialized earlier (#(1–2)).
# Codeblock 5f
    total_loss = (
        lambda_coord * bbox_loss    #(1)
        + object_loss
        + lambda_noobj * no_object_loss    #(2)
        + class_loss
    )

    return bbox_loss, object_loss, no_object_loss, class_loss, total_loss
Test Cases
In this section I'm going to demonstrate how to run the loss() function we just created on several test cases. Now pay attention to Figure 13 below, as I will construct the following test cases based on this image.

Bounding Box Loss Example
The bbox_loss_test() function in Codeblock 6 below focuses on testing whether the bounding box loss works properly. On the lines marked with #(1) and #(2) I initialize two all-zero tensors, which I refer to as target and prediction. I set the sizes of these two tensors to 1×7×7×25 and 1×7×7×30, respectively, so that we can modify the elements intuitively. We treat the image in Figure 13 as the ground truth, hence we need to store the bounding box information in the corresponding indices of the target tensor.
The indexer [0] in the 0th axis indicates that we access the first (and the only) image in the batch (#(3)). Next, [3,3] in the 1st and 2nd axes denotes the location of the grid cell where the object midpoint is located. We slice the tensor with [21:25] because we want to update the values at these indices with [0.4, 0.5, 2.4, 3.2], which correspond to the x, y, w, and h values of the bounding box. The value at index 20, which is where the target bounding box confidence is stored, is set to 1, since the object midpoint is located inside this cell (#(4)). Next, the index that corresponds to the class cat (the class at index 7) also needs to be set to 1 (#(5)), just like how we create a one-hot encoded label in a typical classification task. You can refer back to Figure 12 to verify that the class cat is indeed at the 7th index.
# Codeblock 6
def bbox_loss_test():
    target = torch.zeros(BATCH_SIZE, S, S, (C+5))    #(1)
    prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))    #(2)

    target[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])    #(3)
    target[0, 3, 3, 20] = 1.0    #(4)
    target[0, 3, 3, 7] = 1.0    #(5)

    prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])    #(6)
    #prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.8, 4.0])    #(7)
    #prediction[0, 3, 3, 21:25] = torch.tensor([0.3, 0.2, 3.2, 4.3])    #(8)

    target = target.reshape(BATCH_SIZE, S*S*(C+5))    #(9)
    prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))    #(10)

    bbox_loss = loss(target, prediction)[0]    #(11)
    return bbox_loss

bbox_loss_test()
You’ll be able to see within the above codeblock that I ready three take a look at circumstances at line #(6–8), through which the one at line #(6) is a situation the place the anticipated bounding field midpoint and the thing measurement matches precisely with the bottom fact. In that specific case, our bbox_loss could be 1.8474e-13, which is an especially small quantity. Keep in mind that it doesn’t return precisely 0 due to the 1e-6 we added throughout the IoU and the sq. root calculations. In the meantime within the second take a look at case, I assume that the midpoint prediction is appropriate, however the field measurement is a bit too massive. In the event you attempt to run this, we can have our bbox_loss improve to 0.0600. Third, I additional enlarge the bounding field prediction and in addition shift from the precise place. And in such a case, our bbox_loss will get even bigger to 0.2385.
By the way in which, it is very important do not forget that the loss operate we outlined earlier expects the goal and prediction tensors to have the scale of 1×1225 and 1×1470, respectively. Therefore, we have to reshape them (#(9–10)) accordingly earlier than ultimately computing the loss worth (#(11)).
# Codeblock 6 Output
Case 1: tensor(1.8474e-13)
Case 2: tensor(0.0600)
Case 3: tensor(0.2385)
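We can verify case 2 by hand. Since only w and h deviate there, the bounding box loss reduces to the squared differences of the square roots (my own quick check, ignoring the negligible 1e-6 terms):

# Hand computation of case 2: w goes 2.4 -> 2.8 and h goes 3.2 -> 4.0
w, w_hat = torch.tensor(2.4), torch.tensor(2.8)
h, h_hat = torch.tensor(3.2), torch.tensor(4.0)
print(((torch.sqrt(w) - torch.sqrt(w_hat)) ** 2
       + (torch.sqrt(h) - torch.sqrt(h_hat)) ** 2).item())    # roughly 0.0600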
Object Loss Example
To check whether the object loss is correct, we need to focus on the value at index 20. What we do first in the object_loss_test() function below is similar to the previous one, namely creating the target and prediction tensors (#(1–2)) and initializing the ground truth values for cell (3, 3) (#(3–5)). Here we assume that the bounding box prediction aligns perfectly with the actual bounding box (#(6)).
# Codeblock 7
def object_loss_test():
    target = torch.zeros(BATCH_SIZE, S, S, (C+5))    #(1)
    prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))    #(2)

    target[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])    #(3)
    target[0, 3, 3, 20] = 1.0    #(4)
    target[0, 3, 3, 7] = 1.0    #(5)

    prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])    #(6)
    prediction[0, 3, 3, 20] = 1.0    #(7)
    #prediction[0, 3, 3, 20] = 0.9    #(8)
    #prediction[0, 3, 3, 20] = 0.6    #(9)

    target = target.reshape(BATCH_SIZE, S*S*(C+5))
    prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))

    object_loss = loss(target, prediction)[1]
    return object_loss

object_loss_test()
I’ve arrange three take a look at circumstances particularly for the object loss. The primary one is the case when the mannequin is completely assured that there’s a field midpoint inside the cell, or in different phrases, this can be a situation the place the arrogance is 1 (#(7)). In the event you attempt to run this, the ensuing object loss could be 1.4211e-14, which is once more a price very near zero. You may as well see within the ensuing output under that the object loss will increase to 0.0100 and 0.1600 as we lower the anticipated confidence to 0.9 and 0.6 (#(8–9)), which is precisely what we anticipated.
# Codeblock 7 Output
Case 1: tensor(1.4211e-14)
Case 2: tensor(0.0100)
Case 3: tensor(0.1600)
Classification Loss Example
Speaking of the classification loss, let's now see whether our loss function can really penalize misclassifications. Just like the previous ones, in Codeblock 8 below I prepared three test cases, where the first one is a scenario in which the model correctly gives perfect confidence to the class cat while leaving all other class probabilities at 0 (#(1)). If you try to run this, the resulting classification loss is exactly 0. Next, if you decrease the confidence for cat to 0.9 while slightly increasing the confidence for the class chair (index 8) to 0.1 as shown at line #(2), our classification loss increases to 0.0200, i.e., (1 − 0.9)² + (0 − 0.1)². The loss value gets even larger, reaching 1.2800, when I assume that the model misclassifies the cat as a chair by assigning a very low confidence to the cat (0.2) and a high confidence to the chair (0.8) (#(3)). This essentially shows that our loss function implementation is able to measure classification errors properly.
# Codeblock 8
def class_loss_test():
    target = torch.zeros(BATCH_SIZE, S, S, (C+5))
    prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))

    target[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])
    target[0, 3, 3, 20] = 1.0
    target[0, 3, 3, 7] = 1.0

    prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])
    prediction[0, 3, 3, 7] = 1.0    #(1)
    #prediction[0, 3, 3, 7:9] = torch.tensor([0.9, 0.1])    #(2)
    #prediction[0, 3, 3, 7:9] = torch.tensor([0.2, 0.8])    #(3)

    target = target.reshape(BATCH_SIZE, S*S*(C+5))
    prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))

    class_loss = loss(target, prediction)[3]
    return class_loss

class_loss_test()
# Codeblock 8 Output
Case 1: tensor(0.)
Case 2: tensor(0.0200)
Case 3: tensor(1.2800)
No Object Loss Example
Now in order to test our implementation of the no-object loss part, we're going to examine a cell that doesn't contain any object midpoint, for which I pick the grid cell at coordinate (1, 1). Since the only object in the image is the one located at grid cell (3, 3), the target bounding box confidence for coordinate (1, 1) should be set to 0, as shown at line #(1) in Codeblock 9. In fact, this step isn't strictly necessary because we already set the tensors to all zeros in the first place, but I do it anyway for clarity. Remember that this no-object loss part is activated only when the target bounding box confidence is 0 like this. Otherwise, whenever the target box confidence is 1 (i.e., there is an object midpoint inside the cell), the no-object loss part will always return 0.
Here I prepared two test cases, where the first one is when the values at indices 20 and 25 of the prediction tensor are both 0, as written at lines #(2) and #(3), namely when our YOLOv1 model correctly predicts that there is no bounding box midpoint inside the cell. The loss value increases when we use the code at lines #(4) and #(5) instead, which simulates the model somewhat thinking that there should be objects there when there actually aren't. You can see in the resulting output below that the loss value now increases to 0.1300, which is expected since 0.2² + 0.3² = 0.04 + 0.09 = 0.13.
# Codeblock 9
def no_object_loss_test():
    target = torch.zeros(BATCH_SIZE, S, S, (C+5))
    prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))

    target[0, 1, 1, 20] = 0.0    #(1)
    prediction[0, 1, 1, 20] = 0.0    #(2)
    prediction[0, 1, 1, 25] = 0.0    #(3)
    #prediction[0, 1, 1, 20] = 0.2    #(4)
    #prediction[0, 1, 1, 25] = 0.3    #(5)

    target = target.reshape(BATCH_SIZE, S*S*(C+5))
    prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))

    no_object_loss = loss(target, prediction)[2]
    return no_object_loss

no_object_loss_test()
# Codeblock 9 Output
Case 1: tensor(0.)
Case 2: tensor(0.1300)
Ending
And well, I think that's pretty much everything about the loss function of the YOLOv1 model. We have thoroughly discussed the formal mathematical expression of the loss function, implemented it from scratch, and performed tests on each of its components. Thank you very much for reading; I hope you learned something new from this article. Please let me know if you spot any errors in my explanation or in the code. See ya in my next article!
By the way, you can also find the code in my GitHub repository. Click the link at reference number [4].
References
[1] Muhammad Ardi. YOLOv1 Paper Walkthrough: The Day YOLO First Saw the World. Towards Data Science. https://towardsdatascience.com/yolov1-paper-walkthrough-the-day-yolo-first-saw-the-world/ [Accessed December 18, 2025].
[2] Joseph Redmon et al. You Only Look Once: Unified, Real-Time Object Detection. arXiv. https://arxiv.org/pdf/1506.02640 [Accessed July 25, 2024].
[3] Image originally created by the author.
[4] MuhammadArdiPutra. Regression For All - YOLOv1 Loss Function. GitHub. https://github.com/MuhammadArdiPutra/medium_articles/blob/main/Regression%20For%20All%20-%20YOLOv1%20Loss%20Function.ipynb [Accessed July 25, 2024].
