The attention mechanism is a game changer in Machine Learning. In fact, in the recent history of Deep Learning, the idea of allowing models to focus on the most relevant parts of an input sequence when making a prediction completely revolutionized the way we look at Neural Networks.
That being said, there is one controversial take that I have about the attention mechanism:
The best way to learn the attention mechanism is not through Natural Language Processing (NLP)
It's (technically) a controversial take for two reasons.
- First, people naturally use NLP cases (e.g., translation or NSP) because NLP is the reason why the attention mechanism was developed in the first place. The original goal was to overcome the limitations of RNNs and CNNs in handling long-range dependencies in language (if you haven't already, you should really read the paper Attention Is All You Need).
- Second, I also have to say that the general idea of putting the "attention" on a specific word to do translation tasks is very intuitive.
That being said, if we want to understand how attention REALLY works in a hands-on example, I believe that Time Series is the best framework to use. There are many reasons why I say that.
- Computers aren't really "made" to work with strings; they work with ones and zeros. All the embedding steps that are necessary to convert the text into vectors add an extra layer of complexity that is not strictly related to the attention idea.
- The attention mechanism, even though it was first developed for text, has many other applications (for example, in computer vision), so I like the idea of exploring attention from another angle as well.
- With time series specifically, we can create very small datasets and run our attention models in minutes (yes, including the training) without any fancy GPUs.
In this blog post, we will see how we can build an attention mechanism for time series, specifically in a classification setup. We will work with sine waves, and we will try to distinguish a normal sine wave from a "modified" sine wave. The "modified" sine wave is created by flattening a portion of the original signal. That is, at a certain location in the wave, we simply remove the oscillation and replace it with a flat line, as if the signal had briefly stopped or become corrupted.
To make things spicier, we will assume that the sine can have any frequency or amplitude, and that the location and extension (we call it length) of the "rectified" part are also parameters. In other words, the sine can be any sine, and we can put our "straight line" anywhere we like on the sine wave.
Well, okay, but why should we even bother with the attention mechanism? Why don't we use something simpler, like Feed Forward Neural Networks (FFNs) or Convolutional Neural Networks (CNNs)?
Well, because again we are assuming that the "modified" signal can be "flattened" anywhere (at any location of the time series), and it can be flattened for any length (the rectified part can have any length). This means that a standard Neural Network is not that efficient, because the anomalous "part" of the time series is not always in the same portion of the signal. In other words, if you just try to deal with this with a linear weight matrix + a non-linear function, you will get suboptimal results, because index 300 of time series 1 can be completely different from index 300 of time series 14. What we need instead is a dynamic approach that puts the attention on the anomalous part of the series. This is why (and where) the attention method shines.
This blog post will be divided into these 4 steps:
- Code Setup. Before getting into the code, I'll show the setup, with all the libraries we will need.
- Data Generation. I'll provide the code that we'll need for the data generation part.
- Model Implementation. I'll show the implementation of the attention model.
- Exploration of the results. The benefit of the attention model will be displayed through the attention scores and classification metrics used to assess the performance of our approach.
It looks like we have a lot of ground to cover. Let's get started! 🚀
1. Code Setup
Before delving into the code, let's invoke some friends that we'll need for the rest of the implementation.
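The import cell isn't reproduced in this version of the post, so below is a minimal sketch of what it plausibly contains, based on what the rest of the post uses (numpy for the waves, torch for the model, matplotlib for the plots); the exact list is an assumption.

```python
import json

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
```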
These are just default values that can be used throughout the project. What you see below is the short and sweet requirements.txt file.
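The file itself isn't embedded here; a minimal version consistent with the imports above would look like this (version pins omitted on purpose):

```text
numpy
matplotlib
torch
scikit-learn
```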
I like it when things are easy to change and modular. For that reason, I created a .json file where we can change everything about the setup. Some of these parameters are:
- The number of normal vs. abnormal time series (the ratio between the two)
- The number of time series steps (how long your time series is)
- The size of the generated dataset
- The min and max locations and lengths of the linearized part
- Much more.
The .json file looks like this.
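The original file isn't embedded here, so the snippet below is a sketch of a plausible configuration; the key names and values are assumptions, chosen to match the parameters listed above.

```json
{
  "n_samples": 10000,
  "anomaly_ratio": 0.5,
  "num_timesteps": 1000,
  "min_frequency": 1.0,
  "max_frequency": 5.0,
  "min_amplitude": 0.5,
  "max_amplitude": 2.0,
  "min_location": 100,
  "max_location": 800,
  "min_length": 50,
  "max_length": 200,
  "train_ratio": 0.7,
  "val_ratio": 0.15
}
```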
So, before going to the next step, make sure you have:
- The constants.py file in your work folder
- The .json file in your work folder or in a path that you remember
- The libraries in the requirements.txt file installed
2. Data Generation
Two simple functions build the normal sine wave and the modified (rectified) one. The code for this is found in data_utils.py:
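The listing isn't embedded in this version of the post, so here is a minimal sketch of the two functions as described; the function names and the choice of flattening the window to its first value are assumptions.

```python
import numpy as np

def generate_normal_sine(num_timesteps, amplitude, frequency):
    """Build a plain sine wave with the given amplitude and frequency."""
    t = np.linspace(0, 1, num_timesteps)
    return amplitude * np.sin(2 * np.pi * frequency * t)

def generate_rectified_sine(num_timesteps, amplitude, frequency, location, length):
    """Build a sine wave, then flatten `length` steps starting at `location`."""
    wave = generate_normal_sine(num_timesteps, amplitude, frequency)
    # Replace the chosen window with a constant (the value at the window start),
    # as if the signal had briefly stopped.
    wave[location:location + length] = wave[location]
    return wave
```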
Now that we have the basics, we can do all the backend work in data.py. This is meant to be the function that does it all:
- Receives the setup information from the .json file (that's why you need it!)
- Builds the modified and normal sine waves
- Does the train/test split and train/val/test split for the model validation
The data.py script is the following:
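Again, the script isn't embedded here, so the sketch below is an assumed implementation that does the three things listed above, reusing the config keys from the .json sketch and the two generators from data_utils.py.

```python
import json

import numpy as np

from data_utils import generate_normal_sine, generate_rectified_sine

def build_dataset(config_path):
    """Read the .json setup, generate normal/rectified waves, and split them."""
    with open(config_path) as f:
        cfg = json.load(f)

    rng = np.random.default_rng(0)
    waves, labels = [], []
    for _ in range(cfg["n_samples"]):
        amplitude = rng.uniform(cfg["min_amplitude"], cfg["max_amplitude"])
        frequency = rng.uniform(cfg["min_frequency"], cfg["max_frequency"])
        if rng.random() < cfg["anomaly_ratio"]:
            location = int(rng.integers(cfg["min_location"], cfg["max_location"]))
            length = int(rng.integers(cfg["min_length"], cfg["max_length"]))
            waves.append(generate_rectified_sine(
                cfg["num_timesteps"], amplitude, frequency, location, length))
            labels.append(1)
        else:
            waves.append(generate_normal_sine(
                cfg["num_timesteps"], amplitude, frequency))
            labels.append(0)

    X = np.array(waves, dtype=np.float32)
    y = np.array(labels, dtype=np.int64)

    # Shuffle once, then slice into train/val/test.
    idx = rng.permutation(len(X))
    n_train = int(cfg["train_ratio"] * len(X))
    n_val = int(cfg["val_ratio"] * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```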
The additional data script is the one that prepares the data for Torch (SineWaveTorchDataset), and it looks like this:
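As above, this is a sketch under the same assumptions: the class simply wraps the numpy arrays so a DataLoader can serve (wave, label) pairs in the shape the LSTM expects.

```python
import torch
from torch.utils.data import Dataset

class SineWaveTorchDataset(Dataset):
    """Serves (wave, label) pairs as tensors for the DataLoader."""

    def __init__(self, waves, labels):
        # Each wave becomes a (num_timesteps, 1) float tensor,
        # since the LSTM expects a feature dimension.
        self.waves = torch.as_tensor(waves, dtype=torch.float32).unsqueeze(-1)
        self.labels = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.waves[idx], self.labels[idx]
```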
If you want to take a look, this is a random anomalous time series:
And this is a non-anomalous time series:

Now that we have our dataset, we can worry about the model implementation.
3. Model Implementation
The implementation of the model, the training, and the loader can be found in the model.py code:
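The file isn't embedded in this version of the post, so what follows is a sketch of a model and training loop that match the description below (bidirectional LSTM, attention scores over the whole sequence, early stopping); the class and function names, and hyperparameters like the hidden size, are assumptions.

```python
import torch
import torch.nn as nn

class AttentionClassifier(nn.Module):
    """Bidirectional LSTM encoder + additive attention + linear classifier."""

    def __init__(self, hidden_size=64, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            batch_first=True, bidirectional=True)
        # One attention score per time step, computed from the LSTM output.
        self.attn = nn.Linear(2 * hidden_size, 1)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        h, _ = self.lstm(x)                         # (batch, seq_len, 2*hidden)
        scores = self.attn(h).squeeze(-1)           # (batch, seq_len)
        alpha = torch.softmax(scores, dim=1)        # attention weights
        context = (alpha.unsqueeze(-1) * h).sum(1)  # weighted sum over time
        return self.classifier(context), alpha

def train_model(model, train_loader, val_loader, epochs=20, patience=3, lr=1e-3):
    """Plain training loop with early stopping on the validation loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_val, wait = float("inf"), 0
    for _ in range(epochs):
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            logits, _ = model(xb)
            loss_fn(logits, yb).backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(xb)[0], yb).item()
                           for xb, yb in val_loader)
        if val_loss < best_val:
            best_val, wait = val_loss, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return model
```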
Now, let me take some time to explain why the attention mechanism is a game-changer here. Unlike FFNNs or CNNs, which treat all time steps equally, attention dynamically highlights the parts of the sequence that matter most for classification. This allows the model to "zoom in" on the anomalous section (regardless of where it appears), making it especially powerful for irregular or unpredictable time series patterns.
Let me be more precise here and talk about the Neural Network.
In our model, we use a bidirectional LSTM to process the time series, capturing both past and future context at each time step. Then, instead of feeding the LSTM output directly into a classifier, we compute attention scores over the entire sequence. These scores determine how much weight each time step should have when forming the final context vector used for classification. This means the model learns to focus only on the meaningful parts of the signal (i.e., the flat anomaly), no matter where they occur.
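To make that concrete: with h_t denoting the bidirectional LSTM output at time step t, the sketch above computes a score e_t = w·h_t for each step, turns the scores into weights α_t = softmax(e)_t, and forms the context vector c = Σ_t α_t h_t that goes into the classifier. The α_t values are exactly the attention scores we will plot in section 4.2.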
Now let's connect the model and the data to see the performance of our approach.
4. A practical example
4.1 Training the Model
Given the massive backend part that we developed, we can train the model with this super simple block of code.
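Since the original cell isn't shown, here is a sketch of what that call would look like with the pieces defined above (build_dataset, SineWaveTorchDataset, AttentionClassifier, train_model); the config file name and batch size are assumptions.

```python
from torch.utils.data import DataLoader

(train_X, train_y), (val_X, val_y), (test_X, test_y) = build_dataset("config.json")
train_loader = DataLoader(SineWaveTorchDataset(train_X, train_y),
                          batch_size=32, shuffle=True)
val_loader = DataLoader(SineWaveTorchDataset(val_X, val_y), batch_size=32)

model = train_model(AttentionClassifier(), train_loader, val_loader)
```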
This took around 5 minutes on the CPU to complete.
Notice that we implemented (in the backend) early stopping and a train/val/test split to avoid overfitting. We are responsible kids.
4.2 Attention Mechanism
Let's use the following function to display the attention scores together with the sine function.
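The function isn't reproduced here; the sketch below does what the plots in this section show, overlaying the attention weights (alpha) on the raw signal. The function name and the twin-axis layout are assumptions.

```python
import matplotlib.pyplot as plt
import torch

def plot_attention(model, wave):
    """Plot a wave and overlay the attention scores the model assigns to it."""
    model.eval()
    with torch.no_grad():
        x = torch.as_tensor(wave, dtype=torch.float32).reshape(1, -1, 1)
        _, alpha = model(x)                  # alpha has shape (1, seq_len)
    alpha = alpha.squeeze(0).numpy()

    fig, ax_signal = plt.subplots(figsize=(10, 4))
    ax_signal.plot(wave, color="tab:blue", label="signal")
    ax_signal.set_xlabel("time step")
    ax_signal.set_ylabel("signal")
    ax_attn = ax_signal.twinx()
    ax_attn.plot(alpha, color="tab:red", alpha=0.7, label="attention score")
    ax_attn.set_ylabel("attention")
    fig.legend(loc="upper right")
    plt.show()
```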
Let's show the attention scores for a normal time series.

As we can see, the attention scores are localized (with a sort of time shift) in the areas where there is a flat part, which are near the peaks. However, again, these are only localized spikes.
Now let's look at an anomalous time series.

As we can see here, the model recognizes (with the same time shift) the area where the function flattens out. However, this time, it is not a localized peak. It is a whole section of the signal with higher-than-usual scores. Bingo.
4.3 Classification Performance
Okay, this is all nice, but does it work? Let's implement the function to generate the classification report.
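Here is a sketch of such a function, under the same assumptions as above (the trained model returns (logits, alpha), and test_X/test_y come from build_dataset); the function name is hypothetical.

```python
import torch
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

def print_classification_report(model, test_X, test_y):
    """Run the model on the test set and print the usual classification metrics."""
    model.eval()
    with torch.no_grad():
        x = torch.as_tensor(test_X, dtype=torch.float32).unsqueeze(-1)
        logits, _ = model(x)
        probs = torch.softmax(logits, dim=1)[:, 1].numpy()
    preds = (probs >= 0.5).astype(int)

    print(f"Accuracy  : {accuracy_score(test_y, preds):.4f}")
    print(f"Precision : {precision_score(test_y, preds):.4f}")
    print(f"Recall    : {recall_score(test_y, preds):.4f}")
    print(f"F1 Score  : {f1_score(test_y, preds):.4f}")
    print(f"ROC AUC Score : {roc_auc_score(test_y, probs):.4f}")
    print("Confusion Matrix:")
    print(confusion_matrix(test_y, preds))
```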
The results are the following:
Accuracy  : 0.9775
Precision : 0.9855
Recall    : 0.9685
F1 Score  : 0.9769
ROC AUC Score : 0.9774

Confusion Matrix:
[[1002   14]
 [  31  953]]
Very high performance in terms of all the metrics. Works like a charm. 🙃
5. Conclusions
Thank you very much for reading through this article ❤️. It means a lot. Let's summarize what we learned on this journey and why this was helpful. In this blog post, we applied the attention mechanism to a classification task for time series. The classification was between normal time series and "modified" ones. By "modified" we mean that a part (a random part, with random length) has been rectified (substituted with a straight line). We found that:
- Attention mechanisms were originally developed for NLP, but they also excel at identifying anomalies in time series data, especially when the location of the anomaly varies across samples. This flexibility is difficult to achieve with traditional CNNs or FFNNs.
- By using a bidirectional LSTM combined with an attention layer, our model learns which parts of the signal matter most. We saw that a posteriori through the attention scores (alpha), which reveal which time steps were most relevant for classification. This framework provides a clean and interpretable approach: we can visualize the attention weights to understand why the model made a certain prediction.
- With minimal data and no GPU, we trained a highly accurate model (F1 score ≈ 0.98) in just a few minutes, proving that attention is accessible and powerful even for small projects.
6. About me!
Thank you again for your time. It means a lot ❤️
My name is Piero Paialunga, and I am this guy here:

I am a Ph.D. candidate at the University of Cincinnati Aerospace Engineering Department. I talk about AI and Machine Learning in my blog posts, on LinkedIn, and here on TDS. If you liked the article and want to know more about machine learning and follow my studies, you can:
A. Follow me on LinkedIn, where I publish all my stories
B. Follow me on GitHub, where you can see all my code
C. For questions, you can send me an email at [email protected]
Ciao!