Supervised Machine Learning, Simplified
AI, But Simple Issue #10
Quick Note: The next 2 issues will come out on July 29 and August 12 (EST), as the team has a modified schedule. Thanks for all the support!
Supervised learning is a type of machine learning where an algorithm is trained on a labeled dataset. In this approach, the dataset consists of input-output pairs, where the input data is associated with the correct output (label).
The goal of supervised learning is to learn a mapping from inputs to outputs, which can then be used to predict the output for new, unseen inputs.
There are two main types of supervised learning: classification and regression.
An ML problem is a regression problem when the output variable is a numerical value, such as “weight” or “cost”.
An ML problem is a classification problem when the output variable is a category, such as “large” or “small”.
Classification
Classification is a supervised learning task where the goal is to predict the categorical label of a given input based on labeled training data.
Each input is associated with one or more predefined categories or classes, and the objective of the classification algorithm is to assign the correct category to new, unseen inputs.
If there are only two possible classes, it is a binary classification problem (think yes or no).
If there are more than two possible classes, it is a multi-class classification problem.
Algorithms
Logistic Regression
One of the first and most common classification algorithms is logistic regression.
It’s fairly simple, performs decently well for binary classification, and is computationally inexpensive.
Instead of fitting a line to points to predict continuous data, we fit an S-shaped curve (the logistic/sigmoid curve) to points that fall into one of two possible categories.
The curve will tell you the probability of a point being categorized as yes or no.
For example, imagine fitting a logistic curve to data on age versus disease status. The curve tells you the probability of a person having the disease at a given age; if the curve crosses 0.5 at about 35-40 years, a person in that age range has roughly a 50 percent chance of having the disease.
Here are some more facts about logistic regression:
It uses the logistic/sigmoid function
Its output is the probability that a certain input belongs to a class (between 0 and 1)
You can set different thresholds to either categorize or not categorize that input into a certain class (for instance, greater than 0.4)
If you want to dive deeper into the inner workings of logistic regression, feel free to check out this great video by StatQuest.
There is also a lesser-known version of logistic regression, called multi-class (multinomial) logistic regression, which extends it to multiple classes.
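To make this concrete, here's a minimal sketch of binary logistic regression in scikit-learn. The synthetic dataset from make_classification and the 0.4 threshold are just illustrative assumptions:

```python
# A minimal sketch of binary logistic regression with scikit-learn,
# using a synthetic dataset as a stand-in.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns class probabilities (between 0 and 1)
probs = model.predict_proba(X_test)[:, 1]

# Apply a custom threshold (0.4 here, purely as an example) instead of 0.5
preds = (probs > 0.4).astype(int)
print(preds[:10])
```

Lowering the threshold below 0.5 categorizes more inputs as "yes", which trades precision for recall.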
CART Classifier (Decision Tree)
Decision trees are supervised learning methods that predict the value of a target variable by learning simple decision rules inferred from input data features.
The tree part of the name is quite literal; decision tree architectures resemble binary trees in computer science.
A decision tree can be seen as a piecewise-constant approximation of the target function.
Decision trees fit a dataset by applying a stack of if-then-else decision rules; for a concrete example, see the sketch below.
Because of this piecewise nature and architecture, decision tree fitted boundaries or lines look jagged and rectangular.
Decision trees are popular since they are easy to understand and interpret, and they are fairly computationally inexpensive.
They also perform decently (but not amazingly) on most tasks.
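Here's a minimal decision tree classifier sketch in scikit-learn; the Iris dataset and the max_depth value are just assumptions for the example:

```python
# A minimal sketch of a CART-style decision tree classifier in scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth limits how many if-then-else splits can stack, which controls
# how jagged the piecewise-constant decision boundaries get
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict(X[:5]))
```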
Support Vector Machine (SVM) Classifier
SVMs are a popular choice for linearly separable data and remain efficient even in high-dimensional spaces.
They are also very versatile, as most libraries with SVMs have a kernel option that modifies the algorithm (more on this below).
SVMs work by finding a separating hyperplane (decision boundary) where the distance between itself and the closest data points for both categories is maximized.
Remember that kernel option mentioned earlier? Using a kernel (which is just a function), we can model non-linear relationships as well (one example is the RBF kernel).
If you want to learn about SVMs and get a basic intuition around them, feel free to check out this super helpful video.
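Here's a minimal sketch of the kernel option in scikit-learn's SVC; the synthetic two-feature dataset is just a stand-in:

```python
# A minimal sketch of an SVM classifier; the kernel argument switches
# between a linear decision boundary and a non-linear (RBF) one.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2,
                           n_redundant=0, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)  # linear separating hyperplane
rbf_svm = SVC(kernel="rbf").fit(X, y)        # non-linear decision boundary

print(rbf_svm.predict(X[:5]))
```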
Ensemble Models
Ensemble models are used to improve the accuracy of machine learning models. In a “wisdom of the crowd” fashion, if we take multiple models trained on different portions of the dataset (or even on the same portion) and combine their outputs using different techniques, we end up with a much more accurate model.
Random Forest (Classifier)
Random forest models are very well known in the ML community, as they perform quite well even without hyper-parameter tuning.
Random forest builds multiple decision trees and merges them to get a more accurate and stable prediction.
Specifically, it splits the dataset up into random subsamples, builds a decision tree for each subsample, fits each tree, then aggregates their outputs to produce a final output.
The random forest classifier has decent accuracy but can be improved.
That’s why we can use boosted ensemble models, which are more state-of-the-art and perform better:
AdaBoost Classifier
XGBoost Classifier
CatBoost Classifier (one of the fastest)
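As a rough sketch (not a benchmark), here's how a random forest and a boosted ensemble look in scikit-learn. XGBClassifier and CatBoostClassifier come from the third-party xgboost and catboost packages, but follow the same fit/predict pattern:

```python
# A rough sketch comparing a random forest with a boosted ensemble,
# on a synthetic stand-in dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Random forest: trees built on random subsamples, outputs aggregated by vote
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# AdaBoost: trees trained sequentially, each focusing on the previous mistakes
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

print(rf.predict(X[:3]), ada.predict(X[:3]))
```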
Metrics
Metrics in ML are essential for evaluating and comparing the performance of models. They show how well a model performs and can help improve a model.
Different tasks (classification or regression) require different metrics.
Classification metrics are based on the comparison of correct predictions versus incorrect predictions.
Accuracy (the proportion of predictions the model got right)
Confusion Matrix
True positives (TP): predicted yes, actually yes
True negatives (TN): predicted no, actually no
False positives (FP): predicted yes, actually no (known as a "Type I error")
False negatives (FN): predicted no, actually yes (known as a "Type II error")
Precision (the fewer FPs, the greater the precision; think of precision as “don't say yes when it's no”)
Sensitivity/Recall (the fewer FNs, the greater the recall; think of recall as “minimize saying no when it's yes”)
Specificity (the fewer FPs, the greater the specificity; think of specificity as “correctly identify the actual no's”)
Often, the sensitivity and specificity of a test are inversely related
Area Under the Receiver Operating Characteristic Curve (AUC-ROC)
ROC
The ROC curve is a graphical representation of a classifier's performance.
It plots the true positive rate (sensitivity) against the false positive rate (FP/(FP+TN)) across different classification thresholds.
AUC is the area under that curve; the closer it is to 1, the better
F1 Score
Harmonic mean of precision and recall (performance metric): F1 = 2 × (precision × recall) / (precision + recall)
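Here's a minimal sketch computing these metrics with scikit-learn; the labels, predictions, and probabilities are made up for illustration:

```python
# A minimal sketch of the classification metrics above, computed with
# scikit-learn on hypothetical labels and predictions.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # fewer FP -> higher
print("recall:", recall_score(y_true, y_pred))        # fewer FN -> higher
print("specificity:", tn / (tn + fp))  # no direct sklearn helper for this one
print("F1:", f1_score(y_true, y_pred))
print("AUC-ROC:", roc_auc_score(y_true, y_prob))  # uses probabilities
```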
Regression
Regression is a supervised learning task in machine learning where the goal is to produce a continuous numerical output based on input data. The output is still a prediction, just a value rather than a category (think predicting a house price).
There are multiple types of regression in ML, notably simple regression, multiple regression, and nonlinear regression.
Simple regression is used to predict a continuous dependent variable (output) based on a single independent variable, while multiple regression is used to predict a continuous dependent variable based on multiple independent variables.
Nonlinear regression is just regression where the relationship between the dependent variable and independent variable(s) follows a nonlinear pattern.
Algorithms
CART Regressor (Decision Tree)
As explained earlier, decision trees work well for data with some non-linearity, and the tree splits on different criteria to generate a prediction.
When it comes to regression, the prediction will be a numerical value instead of a class.
Decision trees work pretty well for regression and are simple.
The splits are made on different input features (for example, in house value prediction, a split might be on square footage).
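As an illustration, here's a minimal CART regressor sketch; the house features and prices are purely hypothetical:

```python
# A minimal sketch of a CART regressor: the prediction is a numerical
# value (here, a made-up house price).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical features: [square footage, number of bedrooms]
X = np.array([[1000, 2], [1500, 3], [2000, 3], [2500, 4], [3000, 5]])
y = np.array([200_000, 280_000, 340_000, 410_000, 500_000])  # prices

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(reg.predict([[1800, 3]]))  # piecewise-constant prediction
```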
Linear Regression
Linear regression is a fundamental statistical method to generate a numerical prediction for a given input in a linear fashion.
It’s one of the most popular and simple first steps for ML learners
The goal of linear regression is to find the best-fitting straight line (regression line) that predicts the dependent variable based on the values of the independent variables.
We adjust the coefficients of the variables and the bias using an optimization method (such as ordinary least squares or gradient descent).
If there are multiple independent variables, we call it Multiple Linear Regression, whereas if there is only one independent variable, we call it Simple Linear Regression.
Multiple Linear Regression moves the fit into higher dimensions (a hyperplane is used instead of a line).
Linear regression is very simple and computationally efficient, but not very accurate, especially in more complicated scenarios.
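For a concrete picture, here's a minimal simple linear regression sketch in scikit-learn, assuming a toy dataset that roughly follows y = 2x:

```python
# A minimal sketch of simple linear regression: fit a line y = w*x + b.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # single independent variable
y = np.array([2.1, 4.0, 6.2, 7.9])          # roughly y = 2x

line = LinearRegression().fit(X, y)
print(line.coef_, line.intercept_)  # fitted coefficient (slope) and bias
print(line.predict([[5.0]]))        # should be close to 10
```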
Polynomial Regression
Polynomial regression follows the same logic as linear regression, aside from the fact that the regression line becomes a curve.
It’s not seen very often, but it can be useful depending on the target function.
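Here's a minimal sketch of one common way to do polynomial regression in scikit-learn: expand the inputs into polynomial features, then fit an ordinary linear model to them. The toy data follows y = x^2 + 1:

```python
# A minimal sketch of polynomial regression via feature expansion.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 2.0, 5.0, 10.0, 17.0])  # exactly y = x^2 + 1

curve = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
curve.fit(X, y)
print(curve.predict([[5.0]]))  # should be close to 26
```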
Support Vector Machine (SVM) Regressor
As mentioned earlier, SVM is a popular choice for linear relationships and is efficient up to high dimensions.
It can model linear relationships using a hyperplane.
You can use kernels like RBF for non-linear relationships as well.
SVM regressors are less well known but can still perform well in the regression space. Instead of using the hyperplane as a decision boundary, SVR fits it to the data, trying to keep as many points as possible within a margin (the epsilon tube) around the line.
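Here's a minimal SVR sketch, assuming a toy sine-shaped target; epsilon sets the width of the margin around the fitted curve:

```python
# A minimal sketch of support vector regression with an RBF kernel.
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0, 5, 50).reshape(-1, 1)
y = np.sin(X).ravel()  # a simple non-linear target

# epsilon: points within this distance of the fit incur no loss
svr = SVR(kernel="rbf", epsilon=0.1).fit(X, y)
print(svr.predict([[2.5]]))
```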
Ensemble Models
In regression, the ensemble’s prediction may be an average or a weighted average instead of a majority vote.
Random Forest Regressor
The random forest algorithm can be used in regression as well. It has good accuracy but can be improved.
Boosted ensemble models that improve performance significantly also exist for regression:
Boosting Ensemble Models
AdaBoost Regressor
XGBoost Regressor
CatBoost Regressor (regarded as one of the best)
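As a rough sketch, here's the regression side of the same ensembles in scikit-learn; XGBRegressor and CatBoostRegressor come from third-party packages but expose the same fit/predict interface:

```python
# A rough sketch of ensemble regressors: the final prediction is an
# average over trees rather than a majority vote.
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor

# Synthetic stand-in dataset, just for illustration
X, y = make_regression(n_samples=500, n_features=5, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ada = AdaBoostRegressor(n_estimators=100, random_state=0).fit(X, y)

print(rf.predict(X[:2]), ada.predict(X[:2]))
```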
Metrics
In regression, metrics are based on the residuals of the fitted curve.
Residuals are the differences between the predicted values and the observed values in the dataset; they are often visualized as vertical lines between each data point and the fitted curve.
R^2 score
"How much better does this model compare to the baseline model?"
An R^2 of 0.5 indicates that the model explains 50% of the variability in the outcome data (the other 50% remains unexplained)
Mean squared error (MSE)
The mean of the squared residuals (distances from the fitted line)
RMSE (root mean squared error) is the square-rooted version, which puts the error back into the units of the target
Mean absolute error (MAE)
Since the residuals are not squared before summing, large errors don't carry as much extra weight as they do in MSE
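Here's a minimal sketch computing these regression metrics with scikit-learn on hypothetical values:

```python
# A minimal sketch of the regression metrics above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # observed values (made up)
y_pred = np.array([2.8, 5.3, 6.6, 9.4])  # model predictions (made up)

mse = mean_squared_error(y_true, y_pred)
print("R^2:", r2_score(y_true, y_pred))
print("MSE:", mse)
print("RMSE:", np.sqrt(mse))  # square root of MSE, in the target's units
print("MAE:", mean_absolute_error(y_true, y_pred))
```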
If you enjoy our content, consider supporting us so we can keep doing what we do. Please share this with a friend!
If you like, you can also donate to our team to push out better newsletters every week!
That’s it for this week’s issue of AI, but simple. See you next week!
—AI, but simple team