What Is Universality?
We human beings are all "initialized" differently: we are born with different genetics, grow up in different households with different backgrounds, and experience different events. Nevertheless, it is fascinating to think that our brains ultimately converge on similar structures and functions. We can call this phenomenon universality.
In 2020, Olah et al. proposed three speculative claims about interpreting artificial neural networks:
- Features are the fundamental unit of neural networks.
- Features are connected by weights, forming circuits.
- Analogous features and circuits form across models and tasks.
The third claim is perhaps the most fascinating. It concerns universality and suggests that different neural networks, even when trained on independent datasets, might converge to the same underlying mechanisms.
There is a well-known example: the first layer of almost any convolutional network trained on images learns Gabor filters, which detect edges and orientations.
With the rapid development of large language models (LLMs), researchers are asking a natural question: can we observe universality in LLMs as well? If so, how can we find universal neurons?

In this blog post, we will focus on a simple experiment to identify universal neurons. More precisely, we design an experiment with two different transformers to see whether we can find any universal neurons shared between them.
Please refer to the notebook for the complete Python implementation.
Quick Recap on Transformers
Recall that transformers, and especially their main component, attention, are arguably the greatest breakthrough behind the success of modern large language models. Before their arrival, researchers had struggled for years with models such as RNNs without achieving strong performance. Transformers changed everything.
A basic transformer block consists of two key components:
- Multi-head self-attention: each token attends to all previous tokens, learning which tokens matter most for prediction.
- Feedforward MLP: after attention, each token representation is passed through a small MLP.
These two components are wrapped with residual connections (skip connections) and layer normalization.
Here, the most interesting part for us is the MLP inside each block, because it contains the "neurons" we will analyze when looking for universality.
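As an illustration, a minimal block of this kind can be sketched in PyTorch as follows. This is our own hypothetical implementation, not the notebook's code; the class and parameter names are invented for the sketch:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One block: self-attention + MLP, each wrapped with a residual
    connection and layer normalization (post-norm variant, as a sketch)."""

    def __init__(self, d_model: int, n_heads: int, d_mlp: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_mlp),  # the "neurons" we analyze live here
            nn.ReLU(),
            nn.Linear(d_mlp, d_model),
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x, causal_mask=None):
        a, _ = self.attn(x, x, x, attn_mask=causal_mask)
        x = self.ln1(x + a)            # residual + norm around attention
        x = self.ln2(x + self.mlp(x))  # residual + norm around the MLP
        return x
```

The output of the MLP's first linear layer is exactly the kind of hidden activation we will extract later.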
Experiment Setup
We designed an experiment using two tiny transformers.

Please note that our goal is not to achieve state-of-the-art performance, but to build a toy model in which we can look for evidence of universal neurons.
We define a transformer architecture that contains:
- Embedding + positional encoding
- Multi-head self-attention
- An MLP block with ReLU activation
- An output layer projecting to the vocabulary size
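The architecture above can be sketched in PyTorch as follows. This is an illustrative stand-in for the notebook's model: sizes such as `vocab_size=50` and `d_model=32` are placeholder assumptions, and we reuse PyTorch's built-in encoder layer (which bundles self-attention and a ReLU MLP with residual connections and layer norm):

```python
import torch
import torch.nn as nn

class TinyTransformer(nn.Module):
    """A toy next-token transformer (hypothetical sizes, not tuned)."""

    def __init__(self, vocab_size=50, d_model=32, n_heads=4, d_mlp=64, max_len=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)          # token embedding
        self.pos = nn.Parameter(torch.zeros(max_len, d_model))  # learned positional encoding
        # one block: self-attention + ReLU MLP, with residuals and layer norm inside
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=d_mlp,
            activation="relu", batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)  # project to vocabulary size

    def forward(self, tokens):
        seq_len = tokens.size(1)
        x = self.embed(tokens) + self.pos[:seq_len]
        # causal mask: each token may only attend to earlier positions
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        )
        x = self.block(x, src_mask=mask)
        return self.out(x)  # logits over the vocabulary
```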
We now create two independently initialized models of this tiny transformer architecture, model_a and model_b. Although they share the same architecture, the models can be considered different because of their different initial weights and their separate training runs on 10,000 different random samples. Of course, both models are trained self-supervised, learning to predict the next token given the previous tokens.
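The self-supervised objective can be sketched as a standard next-token training loop. The helper name `train_next_token`, the batch size, and the hyperparameters are our own illustrative choices, not the notebook's:

```python
import torch
import torch.nn.functional as F

def train_next_token(model, data, epochs=1, lr=1e-3):
    """Self-supervised training: predict token t+1 from tokens up to t.
    `data` is a LongTensor of shape [num_samples, seq_len]."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for seq in data.split(64):       # mini-batches of 64 sequences
            logits = model(seq[:, :-1])  # inputs: all tokens but the last
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                seq[:, 1:].reshape(-1),  # targets: the sequence shifted by one
            )
            opt.zero_grad()
            loss.backward()
            opt.step()
    return loss.item()
```

Running this loop twice, on two independent random datasets and two independently initialized models, yields model_a and model_b.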
Finding Universality with Correlation
Once both model_a and model_b are trained, we run them on a test dataset and extract the values of all MLP activations; again, these are the hidden values immediately after the first linear layer in the MLP block. We thus obtain a tensor of shape [num_samples, sequence_length, mlp_dim].
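One way to grab these hidden values, assuming we have a handle on the MLP's first linear layer as a module, is a PyTorch forward hook. This is a sketch; `collect_mlp_activations` is a hypothetical helper, not the notebook's actual function:

```python
import torch

def collect_mlp_activations(model, first_linear, batch_tokens):
    """Record the outputs of the MLP's first linear layer during a
    forward pass. Returns [num_samples, sequence_length, mlp_dim]."""
    acts = []
    hook = first_linear.register_forward_hook(
        lambda mod, inp, out: acts.append(out.detach())
    )
    with torch.no_grad():
        model(batch_tokens)  # the hook fires during this forward pass
    hook.remove()
    return torch.cat(acts, dim=0)
```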
Here is the interesting part: we now compute the Pearson correlation between corresponding neurons in model_a and model_b with the formula

$$
\rho_i = \frac{\sum_t (a_{t,i} - \bar{a}_i)(b_{t,i} - \bar{b}_i)}{\sqrt{\sum_t (a_{t,i} - \bar{a}_i)^2}\,\sqrt{\sum_t (b_{t,i} - \bar{b}_i)^2}}
$$

where $a_{t,i}$ and $b_{t,i}$ are the activations of neuron $i$ at position $t$ (flattening samples and sequence positions into one index) in model_a and model_b, and $\bar{a}_i$, $\bar{b}_i$ are their means over $t$.
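This per-neuron correlation can be computed in a few vectorized NumPy lines. A minimal sketch, assuming the activations are NumPy arrays of shape [num_samples, sequence_length, mlp_dim]:

```python
import numpy as np

def neuron_correlations(acts_a, acts_b):
    """Pearson correlation between corresponding neurons of two models.
    Inputs: [num_samples, seq_len, mlp_dim]. Returns a [mlp_dim] vector."""
    # flatten samples and positions into a single axis of observations
    a = acts_a.reshape(-1, acts_a.shape[-1])
    b = acts_b.reshape(-1, acts_b.shape[-1])
    a = a - a.mean(axis=0)  # center each neuron's activations
    b = b - b.mean(axis=0)
    denom = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return (a * b).sum(axis=0) / np.maximum(denom, 1e-12)  # guard /0
```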
We claim that if a neuron shows a high correlation, this suggests that the two models have learned a similar feature; in other words, the neuron may be universal.
However, not every high correlation implies universality. Some correlations may appear simply by chance. We therefore compare the correlations against a baseline: we apply a random rotation to the neurons of model_b, i.e., we replace the second set of neurons with randomly rotated ones.
This random rotation destroys any alignment between the two models but preserves the distribution of activations.
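A random orthogonal rotation of the neuron axis can be drawn from the QR decomposition of a Gaussian matrix. A sketch under our own naming and seeding assumptions:

```python
import numpy as np

def rotation_baseline(acts_b, seed=0):
    """Baseline: replace model_b's neurons by a random orthogonal
    rotation of them. This destroys neuron-level alignment with
    model_a while preserving the overall activation distribution."""
    d = acts_b.shape[-1]
    rng = np.random.default_rng(seed)
    # QR decomposition of a Gaussian matrix yields a random orthogonal Q
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return acts_b @ q  # rotate the neuron (last) axis
```

Because the rotation is orthogonal, activation norms are preserved, which is one concrete sense in which the distribution is left intact.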
Finally, we compute the so-called excess correlation by subtracting the baseline correlation from the actual correlation.
We flag the neurons with high excess correlation (above 0.5) as universal neurons shared between the two models.
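Putting the last two steps together, the flagging itself is a simple thresholding step. The 0.5 threshold matches the text above; the helper name is our own:

```python
import numpy as np

def universal_neurons(corr_actual, corr_baseline, threshold=0.5):
    """Excess correlation = actual minus baseline; neurons whose
    excess correlation exceeds the threshold are flagged universal."""
    excess = corr_actual - corr_baseline
    return np.flatnonzero(excess > threshold), excess
```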
Please refer to the notebook for the detailed Python implementation.
Results
Let us now take a look at the results.
First, we have a plot comparing baseline vs. actual correlations. The baseline correlations are near zero, while the actual correlations of several neurons are much higher, showing that the observed alignment is not due to random chance.

Next, we plot the distribution of excess correlations. As you can see, most neurons still have very low excess correlation. However, a subset stands far above the threshold of 0.5. These neurons (the green dots on the histogram) are identified as universal neurons.

The results of our analysis give clear evidence of universal neurons in the two independently trained transformers.
Conclusion
In this blog post, we introduced the concept of universality and analyzed two different tiny transformers. We were able to identify some universal neurons in both models: neurons that appear to capture similar features.
These findings suggest that neural networks, and LLMs in particular, can converge on similar internal mechanisms. Of course, our study focused on small models and a limited dataset, and the final result is far from state-of-the-art performance. But the method offers a way to look for universality in larger models.