of the in-progress guide on linear algebra, “A bird’s eye view of linear algebra”. This guide will put a particular emphasis on AI applications and the way they leverage linear algebra.
Linear algebra is a fundamental discipline underlying anything one can do with math. From physics to machine learning to probability theory (e.g., Markov chains), you name it. No matter what you’re doing, linear algebra is always lurking under the covers, ready to spring at you as soon as things go multi-dimensional. In my experience (and I’ve heard this from others), this was the source of a big shock between high school and university. In high school (India), I was exposed to some very basic linear algebra (mainly determinants and matrix multiplication). Then in university-level engineering education, every subject suddenly seems to assume proficiency in concepts like eigenvalues, Jacobians, etc., as if you were supposed to be born with the knowledge.
This chapter is meant to provide a high-level overview of the concepts in this discipline that are important to know, along with their most obvious applications.
The AI revolution
Almost any information can be embedded in a vector space: images, video, language, speech, biometric information and whatever else you can imagine. And all the applications of machine learning and artificial intelligence (like the recent chatbots, text-to-image, etc.) work on top of these vector embeddings. Since linear algebra is the science of dealing with high dimensional vector spaces, it’s an indispensable building block.
A lot of the techniques involve taking some input vectors from one space and mapping them to other vectors from some other space.
But why the focus on “linear” when most interesting functions are non-linear? It’s because the problem of making our models high dimensional and that of making them non-linear (general enough to capture all kinds of complex relationships) turn out to be orthogonal to each other. Many neural network architectures work by using linear layers with simple one dimensional non-linearities in between them. And there is a theorem (the universal approximation theorem) that says this kind of architecture can approximate any continuous function on a closed, bounded region as closely as we like, given enough hidden units.
Since the way we manipulate high dimensional vectors is primarily matrix multiplication, it isn’t a stretch to say it’s the bedrock of the modern AI revolution.
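To make the “linear layers with simple non-linearities in between” idea concrete, here is a minimal NumPy sketch of a two-layer network. The layer sizes, the random weights and the choice of ReLU as the non-linearity are all assumptions made for illustration; real architectures also typically add bias terms, which are left out here so that each layer is strictly linear.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical layer sizes: 4-d input, 8-d hidden layer, 3-d output.
W1 = rng.normal(size=(8, 4))  # first linear layer, maps R^4 -> R^8
W2 = rng.normal(size=(3, 8))  # second linear layer, maps R^8 -> R^3

def relu(z):
    # One dimensional non-linearity, applied to each entry independently.
    return np.maximum(z, 0.0)

def forward(x):
    # Linear layer -> element-wise non-linearity -> linear layer.
    hidden = relu(W1 @ x)
    return W2 @ hidden

x = rng.normal(size=4)  # an arbitrary input vector
y = forward(x)
print(y.shape)  # (3,)
```

All of the high dimensional work happens in the two matrix multiplications; the non-linearity itself only ever looks at one number at a time.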
I) Vector spaces
As mentioned in the previous section, linear algebra inevitably crops up when things go multi-dimensional. We start off with a scalar, which is just a number of some kind. For this article, we’ll be considering real and complex numbers for these scalars. In general, a scalar can be any object where the basic operations of addition, subtraction, multiplication and division are defined (abstracted as a “field”). Now, we want a framework to describe collections of such numbers (add dimensions). These collections are called “vector spaces”. We’ll be considering the cases where the elements of the vector space are either real or complex numbers (the former being a special case of the latter). The resulting vector spaces are called “real vector spaces” and “complex vector spaces” respectively.
The ideas in linear algebra are applicable to these “vector spaces”. The most common example is your floor, table or the computer screen you’re reading this on. These are all two-dimensional vector spaces since every point on your table can be specified by two numbers (the x and y coordinates as shown below). This space is denoted by R² since two real numbers specify it.
We can generalize R² in different ways. First, we can add dimensions. The space we live in is three dimensional (R³). Or, we can curve it. The surface of a sphere like the Earth, for example (denoted S²), is still two dimensional, but unlike R² (which is flat), it’s curved. So far, these spaces have all basically been arrays of numbers. But the idea of a vector space is more general. It’s a collection of objects where the following ideas should be well defined:
- Addition of any two of the objects.
- Multiplication of the objects by a scalar (a real number).
Not only that, but the objects should be “closed” under these operations. This means that if you apply these two operations to the objects of the vector space, you should get objects of the same type (you shouldn’t leave the vector space). For example, the set of integers isn’t a vector space because multiplication by a scalar (real number) can give us something that isn’t an integer (3*2.5 = 7.5, which isn’t an integer).
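The closure requirement can be poked at with a few lines of NumPy. This is only a sketch: the two vectors are arbitrary picks, and checking one example demonstrates the idea rather than proving it.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([-3.0, 0.5])

# Adding two vectors of R^2, or scaling one by a real number,
# gives back another pair of real numbers: we never leave R^2.
print((u + v).shape, (2.5 * u).shape)  # (2,) (2,)

# The integers fail closure under multiplication by a real scalar.
product = 3 * 2.5
print(product, product.is_integer())  # 7.5 False
```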
One of the ways to express the objects of a vector space is with vectors. Vectors require an arbitrary “basis”. An example of a basis is the compass system with directions: North, South, East and West (strictly, North and East already suffice, since South and West are just their negatives). Any direction (like “SouthWest”) can be expressed in terms of these. These are “direction vectors” but we can also have “position vectors” where we need an origin and a coordinate system intersecting at that origin. The latitude and longitude system for referencing every place on the surface of the Earth is an example. The latitude and longitude pair are one way to identify your house. But there are infinitely many other ways. Another culture might draw the latitude and longitude lines at a slightly different angle to what the standard is. And so, they’ll come up with different numbers for your house. But that doesn’t change the physical location of the house itself. The house exists as an object in the vector space and these different ways to express that location are called “bases”. Choosing one basis allows you to assign a pair of numbers to the house and choosing another one allows you to assign a different set of numbers that are equally valid.
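Here is a small sketch of the “same house, different numbers” idea on a flat plane. The location of the house and the 30 degree tilt of the second culture’s grid are made-up values for illustration.

```python
import numpy as np

point = np.array([2.0, 1.0])  # the "house", written in the standard basis

theta = np.deg2rad(30)  # hypothetical tilt of the second culture's grid lines
# Columns of B are the basis vectors of the tilted coordinate system.
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Coordinates of the same point in the tilted basis: solve B @ coords = point.
coords = np.linalg.solve(B, point)
print(coords)  # different numbers from (2, 1)

# Rebuilding the point from the new coordinates recovers the same location.
rebuilt = B @ coords
print(np.allclose(rebuilt, point))  # True
```

The pair of numbers changes with the basis, but the point they describe does not.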

Vector spaces can also be infinite dimensional. For instance, in miniature 12 of [2], the entire set of real numbers is thought of as an infinite dimensional vector space (with the rational numbers playing the role of scalars).
II) Linear maps
Now that we know what a vector space is, let’s take it to the next level and talk about two vector spaces. Since vector spaces are simply collections of objects, we can think of a mapping that takes an object from one of the spaces and maps it to an object from the other. An example of this is recent AI programs like Midjourney where you enter a text prompt and they return an image matching it. The text you enter is first converted to a vector. Then, that vector is converted to another vector in the image space via such a “mapping”.
Let V and W be vector spaces (either both real or both complex vector spaces). A function f: V -> W is said to be a ‘linear map’ if for any two vectors u, v ∈ V and any scalar c (a real number or complex number depending on whether we’re working with real or complex vector spaces) the following two conditions are satisfied:
$$f(u+v) = f(u) + f(v) \tag{1}$$
$$f(c.v) = c.f(v) \tag{2}$$
Combining the above two properties, we can get the following result about a linear combination of n vectors.
$$f(c_1.u_1 + c_2.u_2 + \dots + c_n.u_n) = c_1.f(u_1) + c_2.f(u_2) + \dots + c_n.f(u_n)$$
And now we can see where the name “linear map” comes from. If we pass to the linear map, f, a linear combination of n vectors (LHS of the equation above), this is equivalent to taking the same linear combination of the outputs of f on the individual vectors. We can apply the linear map first and then the linear combination, or the linear combination first and then the linear map. The two are equivalent.
In high school, we learn about linear equations. In two dimensional space, such an equation is represented by f(x)=m.x+c. Here, m and c are the parameters of the equation. Note that this function isn’t a linear map when c is non-zero. It fails equation (1), since f(u+v) = m.(u+v)+c while f(u)+f(v) = m.(u+v)+2c, and it fails equation (2) for the same reason: the constant c does not scale along with the input. If we set f(x)=m.x instead, then this is a linear map since it satisfies both equations.
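The two conditions are easy to test numerically. The helper below, `is_linear`, is a hypothetical name introduced for this sketch; it tries both conditions on random scalar inputs, so a `True` result is evidence rather than a proof. The values m = 2 and c = 3 are arbitrary.

```python
import numpy as np

def is_linear(f, trials=100, seed=0):
    """Numerically test conditions (1) and (2) on random real inputs."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        u, v, c = rng.normal(size=3)
        additive = np.isclose(f(u + v), f(u) + f(v))   # condition (1)
        homogeneous = np.isclose(f(c * v), c * f(v))   # condition (2)
        if not (additive and homogeneous):
            return False
    return True

m, c0 = 2.0, 3.0  # arbitrary slope and intercept
print(is_linear(lambda x: m * x))       # True
print(is_linear(lambda x: m * x + c0))  # False
```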

III) Matrices
In section I, we introduced the concept of a basis for a vector space. Given a basis for the first vector space (V) and one for the second (W), every linear map can be expressed as a matrix (for details, see here). A matrix is just a collection of vectors. These vectors can be arranged in columns, giving us a 2-d grid of numbers as shown below.

Matrices are the objects people first think of in the context of linear algebra. And for good reason. Most of the time spent practicing linear algebra is spent dealing with matrices. But it is important to remember that there are (in general) an infinite number of matrices that can represent a linear map, depending on the bases we choose for the two spaces. The linear map is hence a more general concept than the matrix one happens to be using to represent it.
How do matrices help us perform the linear map they represent (from one vector to the other)? By the matrix getting multiplied with the first vector. The result is the second vector and the mapping is complete (from first to second).
In detail, we take the dot product (sum product) of the first vector, v_1, with the first row of the matrix, and this yields the first entry of the resulting vector, v_2. Then we take the dot product of v_1 with the second row of the matrix to get the second entry of v_2, and so on. This process is demonstrated below for a matrix with 2 rows and 3 columns. The first vector, v_1, is three dimensional and the second vector, v_2, is two dimensional.

Note that the underlying linear map behind a matrix with this dimensionality (2x3) will always take a three dimensional vector, v_1, and map it to a two dimensional vector, v_2.

In general, an (n x m) matrix will map an m dimensional vector to an n dimensional one.
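The row-by-row recipe described above can be written out directly and compared against NumPy’s built-in matrix product. The entries of the matrix and the vector are arbitrary values chosen for this sketch.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 rows, 3 columns
v1 = np.array([1.0, 0.0, -1.0])   # three dimensional input

# Each entry of the result is the dot product of one row of A with v1.
v2_by_rows = np.array([np.dot(row, v1) for row in A])

# The @ operator performs the same computation in one step.
v2 = A @ v1
print(v2)  # [-2. -2.]
print(np.array_equal(v2, v2_by_rows))  # True
```

The (2 x 3) matrix takes a three dimensional vector in and hands a two dimensional vector back.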
III-A) Properties of matrices
Let’s cover some properties of matrices that’ll allow us to identify properties of the linear maps they represent.
Rank
An important property of matrices and their corresponding linear maps is the rank. We can talk about this in terms of a collection of vectors, since that’s all a matrix is. Say we have a vector, v1=[1,0,0]. The first element of the vector is the coordinate along the x-axis, the second is that along the y-axis and the third one that along the z-axis. The unit vectors along these three axes are a basis (there are many) of the three dimensional space, R³, meaning that any vector in this space can be expressed as a linear combination of those three vectors.

We can multiply this vector by a scalar, s. This gives us s.[1,0,0] = [s,0,0]. As we vary the value of s, we can get any point along the x-axis. But that’s about it. Say we add another vector to our collection, v2=[3.5,0,0]. Now, what are the vectors we can make with linear combinations of those two vectors? We get to multiply the first one with any scalar, s_1, and the second with any scalar, s_2. This gives us:
$$s_1.[1,0,0] + s_2.[3.5,0,0] = [s_1+3.5 s_2, 0, 0] = [s',0,0]$$
Here, s’ is just another scalar. So, we can still reach points only on the x-axis, even with linear combinations of both these vectors. The second vector didn’t “expand our reach” at all. The set of points we can reach with linear combinations of the two is exactly the same as the set we can reach with the first alone. So even though we have two vectors, the rank of this collection of vectors is 1 since the space they span is one dimensional. If, on the other hand, the second vector were v2=[0,1,0], then you could get any point on the x-y plane with these two vectors. So, the space spanned would be two dimensional and the rank of this collection would be 2. If the second vector were v2=[2.1,1.5,0.8], we could still span a two dimensional space with v1 and v2 (though that space would be different from the x-y plane now; it would be some other 2-d plane). And the two vectors would still have a rank of 2. If the rank of a collection of vectors is the same as the number of vectors (meaning they can together span a space of dimensionality as high as the number of vectors), then they are called “linearly independent”.
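The three cases above can be checked with `np.linalg.matrix_rank`, which estimates the rank numerically. The helper `rank_of` is a hypothetical convenience wrapper written for this sketch, not a NumPy function.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 0.0])

def rank_of(*vectors):
    # Stack the vectors as the rows of a matrix and ask NumPy for its rank.
    return int(np.linalg.matrix_rank(np.stack(vectors)))

print(rank_of(v1, np.array([3.5, 0.0, 0.0])))  # 1: both lie on the x-axis
print(rank_of(v1, np.array([0.0, 1.0, 0.0])))  # 2: together they span the x-y plane
print(rank_of(v1, np.array([2.1, 1.5, 0.8])))  # 2: they span some other plane
```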
If the vectors that make up the matrix can span an m dimensional space, then the rank of the matrix is m. But a matrix can be thought of as a collection of vectors in two ways. Since it’s a simple two dimensional grid of numbers, we can either consider all the columns as the collection of vectors or consider all the rows as the collection, as shown below. Here, we have a (3x4) matrix (three rows and four columns). It can be thought of either as a collection of 4 column vectors (each three dimensional) or 3 row vectors (each four dimensional).

Full row rank means all the row vectors are linearly independent. Full column rank means all the column vectors are linearly independent.
It turns out that the row rank and column rank of a matrix will always be the same, whether or not the matrix is square. This isn’t obvious at all and a proof is given in the Math Stack Exchange post, [3]. This means that we can talk just in terms of the rank and don’t have to bother specifying “row rank” or “column rank”. In particular, a square matrix has full row rank exactly when it has full column rank.
The linear transformation corresponding to a (3 x 3) matrix that has a rank of 2 will map everything in the 3-d space onto a lower, 2-d space (a plane sitting inside the 3-d space), much like the (2 x 3) matrix we encountered in the last section mapped three dimensional vectors to two dimensional ones.
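A quick numerical sketch of this collapse: the (3 x 3) matrix below is a made-up example whose third column is the sum of the first two, so its rank is 2. Pushing a cloud of random three dimensional vectors through it gives outputs that only span a plane.

```python
import numpy as np

# Third column = first column + second column, so the rank is 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

rng = np.random.default_rng(0)
inputs = rng.normal(size=(3, 50))  # 50 random vectors in R^3, one per column
outputs = A @ inputs               # their images under the linear map

print(np.linalg.matrix_rank(A))        # 2
print(np.linalg.matrix_rank(inputs))   # 3: the inputs fill all of R^3
print(np.linalg.matrix_rank(outputs))  # 2: the outputs all lie in one plane
print(np.linalg.matrix_rank(A.T))      # 2: rows and columns agree on the rank
```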

Notions closely related to the rank of square matrices are the determinant and invertibility.
Determinants
The determinant of a square matrix is its “measure” in a sense. Let me explain by going back to thinking of a matrix as a collection of vectors. Let’s start with just one vector. The way to “measure” it is obvious: its length. And since we’re dealing only with square matrices, the only way to have one vector is to have it be one dimensional. Which is basically just a scalar. Things get interesting when we go from one dimension to two. Now, we’re in two dimensional space. So, the notion of “measure” is no longer length, but has graduated to area. And with two vectors in that two dimensional space, it’s the area of the parallelogram they form (the determinant also carries a sign that depends on the order of the vectors; its absolute value is the area). If the two vectors are parallel to each other (e.g., both lie on the x-axis), in other words, if they are not linearly independent, then the area of the parallelogram between them becomes zero. The determinant of the matrix formed by them will be zero, and the rank of that matrix will be less than 2 (it is not full rank).

Taking it one dimension higher, we get three dimensional space. And to construct a square matrix (3x3), we now need three vectors. And since the notion of “measure” in three dimensional space is volume, the determinant of a (3x3) matrix becomes the volume contained between the vectors that make it up.

And this can be extended to spaces of any dimensionality.
Notice that we spoke about the area or the volume contained between the vectors. We didn’t specify if these were the vectors composing the rows of the square matrix or the ones composing its columns. And the somewhat surprising thing is that we don’t need to specify this because it doesn’t matter either way. Whether we take the vectors forming the rows and measure the volume between them or the vectors forming the columns, we get the same answer. This is proven in the Math Stack Exchange post [4].
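These claims can be sanity-checked with `np.linalg.det`. The matrices below are arbitrary examples picked so the areas and volumes are easy to verify by hand; since the determinant is computed in floating point, the results are compared with `np.isclose` rather than exact equality.

```python
import numpy as np

# Two vectors in the plane (as rows): base 3 along x, height 2.
P = np.array([[3.0, 0.0],
              [1.0, 2.0]])
area = abs(np.linalg.det(P))
print(np.isclose(area, 6.0))  # True: base * height = 3 * 2

# Two parallel vectors enclose no area, so the determinant vanishes.
parallel = np.array([[1.0, 0.0],
                     [3.5, 0.0]])
print(np.isclose(np.linalg.det(parallel), 0.0))  # True

# Three vectors in 3-d space: the determinant is the enclosed volume.
M = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])
volume = abs(np.linalg.det(M))
print(np.isclose(volume, 24.0))  # True: 2 * 3 * 4 for this triangular matrix

# Rows or columns: transposing the matrix does not change the determinant.
print(np.isclose(np.linalg.det(M), np.linalg.det(M.T)))  # True
```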
There are several other properties of linear maps and their corresponding matrices which are invaluable in understanding them and extracting value out of them. We’ll be delving into invertibility, eigenvalues, diagonalizability and the different transformations one can do in the coming articles (check back here for links).
References
[1] Linear map: https://en.wikipedia.org/wiki/Linear_map
[2] Matousek’s miniatures: https://kam.mff.cuni.cz/~matousek/stml-53-matousek-1.pdf
[3] Math Stack Exchange post proving row rank and column rank are the same: https://math.stackexchange.com/questions/332908/looking-for-an-intuitive-explanation-why-the-row-rank-is-equal-to-the-column-ran
[4] Math Stack Exchange post proving the determinants of a matrix and its transpose are the same: https://math.stackexchange.com/a/636198/155881