What's the use of learning mathematics? What level of mathematics do you need? Is it even important for Machine Learning? After all, we can easily build models with the widely available libraries in Python or R!

I've heard this many times from aspiring data scientists, and mathematics is often overlooked. This creates a false expectation: instead of understanding the root cause of a model's shortcomings, they reach for state-of-the-art models or try out different changes until something works. While this can work out sometimes, it's always better to know which changes will have the biggest impact on the model. If you have ever built a model for a real-life problem, you'll know that familiarity with the details goes a long way once you want to move beyond baseline performance.

However, most of this knowledge sits behind layers of advanced mathematics. Understanding methods like stochastic gradient descent or backpropagation might seem daunting, since they are built on top of multivariable calculus and probability theory.

With the right foundation, however, many of these ideas feel natural. If you are just starting out and don't have a STEM background, putting together a curriculum can be difficult. In this post, my goal is to present a road map that gives you a head start while keeping things simple; the aim is not to cover everything.

## Fundamentals

Most of Machine Learning is built upon four pillars, which also underpin solutions to most real-world business problems. Many algorithms in Machine Learning are written using these pillars. They are

- Statistics
- Probability
- Calculus
- Linear Algebra

### Statistics

Statistics is a field of mathematics that is universally agreed to be a prerequisite for a deeper understanding of machine learning.

Although statistics is a large field with many esoteric theories and findings, the core concepts and notation taken from the field are required for machine learning practitioners.

Statistics used in Machine Learning is broadly divided into two categories, based on the type of analysis performed on the data: Descriptive Statistics and Inferential Statistics.

a) Descriptive Statistics

- Concerned with describing and summarizing the characteristics of a dataset.
- It typically works on a small dataset.
- Descriptive statistics consists of two basic categories of measures: **measures of central tendency (mean, median, mode) and measures of variability or spread (range, standard deviation, variance)**.
- Measures of central tendency describe the center of a data set.
- Measures of variability or spread describe the dispersion of data within the set.
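The measures above can be computed with Python's standard library alone; here is a minimal sketch on a made-up sample:

```python
# Descriptive statistics on a small, made-up sample using the stdlib.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(data)          # 5
median = statistics.median(data)      # 4.5
mode = statistics.mode(data)          # 4 (most frequent value)

# Measures of variability (population versions).
variance = statistics.pvariance(data)   # 4
spread = statistics.pstdev(data)        # 2.0
data_range = max(data) - min(data)      # 7

print(mean, median, mode)
print(variance, spread, data_range)
```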

b) Inferential Statistics

- Methods of making decisions or predictions about a population based on sample information/data.
- Draws a representative sample from the population.
- Uses analyses that incorporate the sampling error.
- It works on a large dataset.
- Compares, tests, and predicts future outcomes.
- The end results are expressed as probability scores.
- The specialty of inferential statistics is that it draws conclusions about the population beyond the data available.
- Hypothesis tests, sampling distributions, Analysis of Variance (ANOVA), etc., are the tools used in Inferential Statistics.
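To make the idea of sampling error concrete, here is a rough sketch that estimates a population mean from a sample and attaches a margin of error. It uses a normal (z) approximation, and the sample values are made up for illustration:

```python
# Inferential statistics sketch: a 95% confidence interval for a
# population mean, from a sample, using only the standard library.
import math
import statistics

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)      # sample standard deviation

# The standard error captures the sampling error mentioned above.
standard_error = sample_sd / math.sqrt(n)

# 95% interval under the normal approximation (z ≈ 1.96).
margin = 1.96 * standard_error
low, high = sample_mean - margin, sample_mean + margin
print(f"mean = {sample_mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

For small samples like this one, a t-distribution would give a slightly wider (more honest) interval; the z approximation keeps the sketch short.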

Statistics plays a crucial role in Machine Learning algorithms. The role of a Data Analyst in industry is to draw conclusions from data, and for this they require, and depend on, statistics.

### Probability

Probability deals with predicting the likelihood of future events, while statistics involves the analysis of the frequency of past events.

Almost everyone has an intuitive understanding of degrees of probability, which is why we use words like "probably" and "unlikely" in daily speech. Here, however, we want a way to make quantitative claims about those degrees.

In probability theory, an event is a set of outcomes of an experiment to which a probability is assigned. If E denotes an event, then P(E) denotes the probability that E will happen. A single run of the experiment, in which E may occur (success) or may not occur (failure), is called a **trial**.
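One way to see the connection between trials and P(E) is to simulate many trials and count successes. A toy sketch, where the event E is "a fair die shows an even number":

```python
# Estimating P(E) empirically: each loop iteration is one trial.
import random

random.seed(0)                # fixed seed for reproducibility
trials = 100_000
successes = sum(1 for _ in range(trials) if random.randint(1, 6) % 2 == 0)

p_estimate = successes / trials
print(p_estimate)             # close to the true P(E) = 3/6 = 0.5
```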

Some of the basic concepts required in probability are as follows

**Joint Probability**: The probability that events A and B both occur, denoted P(A ∩ B). When A and B are independent of each other, it factorizes as P(A ∩ B) = P(A) · P(B).

**Conditional Probability**: The probability that event A happens, given that an event B has already happened, denoted

P(A|B) = P(A ∩ B) / P(B)

When A and B are not independent, it is often useful to compute the conditional probability.

**Bayes theorem**: A relationship between the conditional probabilities of two events, applied to estimate unknown probabilities and to make decisions on the basis of new sample information. The theorem's popularity comes from its effectiveness at revising a set of old probabilities (the prior) in light of additional information to derive a set of new probabilities (the posterior).
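A worked example of revising a prior with Bayes' theorem, using hypothetical numbers: a test for a condition with 1% prevalence, 95% sensitivity, and a 5% false-positive rate.

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
# A = "has the condition", B = "test is positive". Numbers are made up.
p_condition = 0.01                 # prior P(A)
p_pos_given_condition = 0.95       # P(B|A), the sensitivity
p_pos_given_healthy = 0.05         # P(B|not A), the false-positive rate

# Total probability of a positive test, P(B).
p_pos = (p_pos_given_condition * p_condition
         + p_pos_given_healthy * (1 - p_condition))

# Posterior P(A|B): the revised probability after seeing a positive test.
p_condition_given_pos = p_pos_given_condition * p_condition / p_pos
print(round(p_condition_given_pos, 3))   # ≈ 0.161
```

Note how the posterior (about 16%) is far below the test's 95% sensitivity: the low prior dominates, which is exactly the kind of revision Bayes' theorem formalizes.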

### Calculus

It is a branch of mathematics that studies the rate of change of quantities (which can be interpreted as slopes of curves) and the length, area, and volume of objects. Calculus is mainly concerned with limits, derivatives, integrals, and functions. It is divided into two branches, Differential Calculus and Integral Calculus. It is used in the backpropagation algorithm to train deep Neural Networks.

Calculus is mainly used in optimizing Machine Learning and Deep Learning algorithms, helping to develop fast and efficient solutions. The concepts of calculus appear in algorithms like Gradient Descent and Stochastic Gradient Descent (SGD), and in optimizers like Adam, RMSprop, Adadelta, etc.
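The core idea behind gradient descent is just the derivative: step in the direction opposite to the slope. A minimal sketch minimizing f(x) = (x − 3)², whose derivative is f′(x) = 2(x − 3):

```python
# Minimal gradient descent: repeatedly step against the gradient.
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)   # move opposite to the slope
    return x

# f(x) = (x - 3)^2 has derivative 2*(x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))   # ≈ 3.0
```

SGD and optimizers like Adam build on this same update rule, adding noisy (mini-batch) gradients and adaptive step sizes.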

Data scientists mainly use calculus when building Deep Learning and Machine Learning models, tweaking details and optimizing the model and the data to produce better outputs.

### Linear Algebra

Linear Algebra mostly focuses on computation. It plays a critical role in understanding the theory behind Machine Learning and is likewise central to Deep Learning. It gives us better insight into how the algorithms really work, and empowers us to make better decisions. It mostly deals with vectors and matrices.

- A scalar is a single number (magnitude only).
- A vector is an array of numbers arranged in order, represented as a row or a column, and accessed with a single index. It has magnitude as well as direction.
- A matrix is a 2D array of numbers, accessed with two indices (row and column).
- A tensor is an array of numbers with more than two dimensions, arranged in a grid with a variable number of axes.
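To make the indexing story concrete, here is a small dependency-free sketch of these objects and the two most common operations on them:

```python
# Scalars, vectors, and matrices in plain Python.
scalar = 3                       # a single number
vector = [1, 2, 3]               # one index: vector[i]
matrix = [[1, 2], [3, 4]]        # two indices: matrix[row][col]

# Dot product of two vectors of equal length.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Matrix-vector product: one dot product per row of the matrix.
def matvec(m, v):
    return [dot(row, v) for row in m]

print(dot([1, 2, 3], [4, 5, 6]))   # 32
print(matvec(matrix, [1, 1]))      # [3, 7]
```

In practice NumPy replaces these hand-written loops with fast vectorized equivalents (`np.dot`, the `@` operator).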

The NumPy package in Python is used for numerical operations on the dataset. This library carries out fundamental operations on vectors and matrices, such as addition, subtraction, multiplication, and division, and returns the resulting values. Data Scientists and Machine Learning Engineers often rely on Linear Algebra when devising their own algorithms for working with data.

### Mathematical Notations

Also refer to this link for more notations.

I highly recommend watching this series of videos for a deeper understanding.

### Conclusion

The algorithms we use to build an AI model have mathematical functions hidden underneath, expressed as programming code (Python/R, etc.). These algorithms can be applied to a variety of problems, from the Boolean satisfiability problem to matrix problems like object detection and much more. The final stage is to find the algorithm that best suits the problem. This is where the mathematical functions available in the programming language help us: they let us compare algorithms using metrics like correlation, specificity, sensitivity, and F1 score. These functions also help us check whether the selected model is overfitting or underfitting our data.
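The comparison metrics named above all come from the confusion matrix of a classifier. A short sketch with hypothetical counts:

```python
# Computing sensitivity, specificity, precision, and F1 score from
# confusion-matrix counts. The counts below are made up for illustration.
tp, fp, fn, tn = 40, 10, 5, 45

sensitivity = tp / (tp + fn)     # recall / true-positive rate
specificity = tn / (tn + fp)     # true-negative rate
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(round(sensitivity, 3), round(specificity, 3), round(f1, 3))
```

Comparing these numbers on training versus held-out data is one quick way to spot overfitting: a large gap between the two is the warning sign.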

For AI enthusiasts, mathematics is a vital aspect to focus on, and it is critical to build a solid foundation in math. Every single idea you learn in Machine Learning, every small calculation you write or execute to solve a problem, is directly or indirectly connected to mathematics.

The concepts of math implemented in AI build on the fundamentals we learn in eleventh and twelfth grade. At that stage we acquire the theoretical knowledge; in Machine Learning we finally see it put to use. The best way to get comfortable with the mathematics is to take a Machine Learning algorithm, find a use case, and work through and understand the math behind it.

An understanding of math is paramount if we want to develop AI solutions for real problems. A thorough knowledge of mathematical concepts also sharpens our problem-solving skills.

Well, if you liked the post, consider subscribing to the blog to get instant updates by email.