Supervised Machine Learning, Simplified
AI, But Simple Issue #10
Quick Note: The next 2 issues will come out on July 29 and August 12 (EST), as the team has a modified schedule. Thanks for all the support!
Supervised learning is a type of machine learning where an algorithm is trained on a labeled dataset. In this approach, the dataset consists of input-output pairs, where the input data is associated with the correct output (label).
The goal of supervised learning is to learn a mapping from inputs to outputs, which can then be used to predict the output for new, unseen inputs.
There are two main types of supervised learning: classification and regression.
An ML problem is a regression problem when the output variable is a numerical value, such as “weight” or “cost”.
An ML problem is a classification problem when the output variable is a category, such as “large” or “small”.
Classification
Classification is a supervised learning task where the goal is to predict the categorical label of a given input based on labeled training data.
Each input is associated with one or more predefined categories or classes, and the objective of the classification algorithm is to assign the correct category to new, unseen inputs.
If there are only two possible classes, it is a binary classification problem (think yes or no).
If there are more than two possible classes, it is a multi-class classification problem.
Algorithms
Logistic Regression
One of the first and most common classification algorithms is logistic regression.
It’s fairly simple, performs decently well for binary classification, and is computationally inexpensive.
Instead of fitting a line to points to predict continuous data, we fit an S-shaped curve (the logistic/sigmoid curve) to points that fall into one of two possible categories.
The curve will tell you the probability of a point being categorized as yes or no.
For example, imagine fitting a logistic curve to data on age versus disease status. The curve tells you the probability of a person having the disease at a given age; if the curve crosses 0.5 at about 35-40 years, a person in that age range has roughly a 50 percent chance of having the disease.
Here are some more facts about logistic regression:
It uses the logistic/sigmoid function
Its output is the probability that a certain input belongs to a class (between 0 and 1)
You can set different thresholds to either categorize or not categorize that input into a certain class (for instance, greater than 0.4)
If you want to dive deeper into the inner workings of logistic regression, feel free to check out this great video by StatQuest.
There is also a lesser-known version of logistic regression, called multi-class (multinomial) logistic regression, which extends it to multiple classes.
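To make this concrete, here's a minimal sketch of binary logistic regression in scikit-learn. The synthetic dataset from make_classification and the 0.4 threshold are just illustrative assumptions:

```python
# A minimal sketch of binary logistic regression with scikit-learn,
# using a synthetic dataset as a stand-in.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns class probabilities (between 0 and 1)
probs = model.predict_proba(X_test)[:, 1]

# Apply a custom threshold (0.4 here, purely as an example) instead of 0.5
preds = (probs > 0.4).astype(int)
print(preds[:10])
```

Lowering the threshold below 0.5 categorizes more inputs as "yes", which trades precision for recall.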
CART Classifier (Decision Tree)
Decision trees are supervised learning methods that predict the value of a target variable by learning simple decision rules inferred from input data features.
The tree part of the name is quite literal; decision tree architectures resemble binary trees in computer science.
A decision tree can be seen as a piecewise-constant approximation of the target function.
Decision trees fit a dataset by applying a stack of if-then-else decision rules; for a concrete example, see the sketch below.
Because of this piecewise nature and architecture, decision tree fitted boundaries or lines look jagged and rectangular.
Decision trees are popular since they are easy to understand and interpret, and they are fairly computationally inexpensive.
They also perform decently (but not amazingly) on most tasks.
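Here's a minimal decision tree classifier sketch in scikit-learn; the Iris dataset and the max_depth value are just assumptions for the example:

```python
# A minimal sketch of a CART-style decision tree classifier in scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth limits how many if-then-else splits can stack, which controls
# how jagged the piecewise-constant decision boundaries get
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict(X[:5]))
```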
Support Vector Machine (SVM) Classifier
SVMs are a popular choice for linearly separable data and remain efficient even in high-dimensional spaces.
They are also very versatile, as most libraries with SVMs have a kernel option that modifies the algorithm (more on this below).
SVMs work by finding a separating hyperplane (decision boundary) where the distance between itself and the closest data points for both categories is maximized.
Remember that kernel option mentioned earlier? Using a kernel (which is just a function), we can model non-linear relationships as well (one example is the RBF kernel).
If you want to learn about SVMs and get a basic intuition around them, feel free to check out this super helpful video.
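Here's a minimal sketch of the kernel option in scikit-learn's SVC; the synthetic two-feature dataset is just a stand-in:

```python
# A minimal sketch of an SVM classifier; the kernel argument switches
# between a linear decision boundary and a non-linear (RBF) one.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2,
                           n_redundant=0, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)  # linear separating hyperplane
rbf_svm = SVC(kernel="rbf").fit(X, y)        # non-linear decision boundary

print(rbf_svm.predict(X[:5]))
```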
Ensemble Models
Ensemble models are used to improve the accuracy of machine learning models. In a “wisdom of the crowd” fashion, if we take multiple models trained on different portions of the dataset (or even on the same portion) and combine their outputs using different techniques, we end up with a much more accurate model.
Random Forest (Classifier)
Random forest models are very well known in the ML community, as they perform quite well even without hyper-parameter tuning.
Random forest builds multiple decision trees and merges them to get a more accurate and stable prediction.
Specifically, it splits the dataset up into random subsamples, builds a decision tree for each subsample, fits each tree, then aggregates their outputs to produce a final output.
The random forest classifier has decent accuracy but can be improved.
That’s why we can use boosted ensemble models, which are more state-of-the-art and perform better:
AdaBoost Classifier
XGBoost Classifier
CatBoost Classifier (one of the fastest)
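As a rough sketch (not a benchmark), here's how a random forest and a boosted ensemble look in scikit-learn. XGBClassifier and CatBoostClassifier come from the third-party xgboost and catboost packages, but follow the same fit/predict pattern:

```python
# A rough sketch comparing a random forest with a boosted ensemble,
# on a synthetic stand-in dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Random forest: trees built on random subsamples, outputs aggregated by vote
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# AdaBoost: trees trained sequentially, each focusing on the previous mistakes
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

print(rf.predict(X[:3]), ada.predict(X[:3]))
```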
Metrics
Metrics in ML are essential for evaluating and comparing the performance of models. They show how well a model performs and can help improve a model.
Different tasks (classification or regression) require different metrics.
Classification metrics are based on the comparison of correct predictions versus incorrect predictions.
Accuracy (the proportion of predictions the model got right)
Confusion Matrix
True positives (TP): predicted yes, actually yes
True negatives (TN): predicted no, actually no
False positives (FP): predicted yes, actually no (known as a "Type I error")
False negatives (FN): predicted no, actually yes (known as a "Type II error")
Precision (the fewer FPs, the greater the precision; think of precision as “don't say yes when it's no”)
Sensitivity/Recall (the fewer FNs, the greater the recall; think of recall as “minimize saying no when it's yes”)
Specificity (the fewer FPs, the greater the specificity; think of specificity as “correctly identify the actual no's”)
Often, the sensitivity and specificity of a test are inversely related
Area Under the Receiver Operating Characteristic Curve (AUC-ROC)
ROC
The ROC curve is a graphical representation of a classifier's performance.
It plots the true positive rate (sensitivity) against the false positive rate (FP/(FP+TN)) across different classification thresholds.
AUC is the area under that curve; the closer it is to 1, the better
F1 Score
Harmonic mean of precision and recall (performance metric): F1 = 2 × (precision × recall) / (precision + recall)
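Here's a minimal sketch computing these metrics with scikit-learn; the labels, predictions, and probabilities are made up for illustration:

```python
# A minimal sketch of the classification metrics above, computed with
# scikit-learn on hypothetical labels and predictions.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # fewer FP -> higher
print("recall:", recall_score(y_true, y_pred))        # fewer FN -> higher
print("specificity:", tn / (tn + fp))  # no direct sklearn helper for this one
print("F1:", f1_score(y_true, y_pred))
print("AUC-ROC:", roc_auc_score(y_true, y_prob))  # uses probabilities
```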
Regression
Regression is a supervised learning task in machine learning where the goal is to produce a continuous numerical output based on input data. The output is still a prediction, just a value rather than a category (think predicting a house price).
There are multiple types of regression in ML, notably simple regression, multiple regression, and nonlinear regression.
Simple regression is used to predict a continuous dependent variable (output) based on a single independent variable, while multiple regression is used to predict a continuous dependent variable based on multiple independent variables.
Nonlinear regression is just regression where the relationship between the dependent variable and independent variable(s) follows a nonlinear pattern.
Algorithms
CART Regressor (Decision Tree)
As explained earlier, decision trees work well for data with some non-linearity, and the tree splits on different criteria to generate a prediction.
When it comes to regression, the prediction will be a numerical value instead of a class.
Decision trees work pretty well for regression and are simple.
The splits are made on different input features (for example, in house value prediction, a split might be on square footage).
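As an illustration, here's a minimal CART regressor sketch; the house features and prices are purely hypothetical:

```python
# A minimal sketch of a CART regressor: the prediction is a numerical
# value (here, a made-up house price).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical features: [square footage, number of bedrooms]
X = np.array([[1000, 2], [1500, 3], [2000, 3], [2500, 4], [3000, 5]])
y = np.array([200_000, 280_000, 340_000, 410_000, 500_000])  # prices

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(reg.predict([[1800, 3]]))  # piecewise-constant prediction
```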
Linear Regression
Linear regression is a fundamental statistical method to generate a numerical prediction for a given input in a linear fashion.
It’s one of the most popular and simple first steps for ML learners
The goal of linear regression is to find the best-fitting straight line (regression line) that predicts the dependent variable based on the values of the independent variables.
We adjust the coefficients of the variables and the bias using an optimization method (such as ordinary least squares or gradient descent).
If there are multiple independent variables, we call it Multiple Linear Regression, whereas if there is only one independent variable, we call it Simple Linear Regression.
Multiple Linear Regression moves the fit into higher dimensions (a hyperplane is used instead of a line).
Linear regression is very simple and computationally efficient, but not very accurate, especially in more complicated scenarios.
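For a concrete picture, here's a minimal simple linear regression sketch in scikit-learn, assuming a toy dataset that roughly follows y = 2x:

```python
# A minimal sketch of simple linear regression: fit a line y = w*x + b.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # single independent variable
y = np.array([2.1, 4.0, 6.2, 7.9])          # roughly y = 2x

line = LinearRegression().fit(X, y)
print(line.coef_, line.intercept_)  # fitted coefficient (slope) and bias
print(line.predict([[5.0]]))        # should be close to 10
```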
Polynomial Regression
Polynomial regression follows the same logic as linear regression, aside from the fact that the regression line becomes a curve.
It’s not seen very often, but it can be useful depending on the target function.
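Here's a minimal sketch of one common way to do polynomial regression in scikit-learn: expand the inputs into polynomial features, then fit an ordinary linear model to them. The toy data follows y = x^2 + 1:

```python
# A minimal sketch of polynomial regression via feature expansion.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 2.0, 5.0, 10.0, 17.0])  # exactly y = x^2 + 1

curve = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
curve.fit(X, y)
print(curve.predict([[5.0]]))  # should be close to 26
```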
Support Vector Machine (SVM) Regressor
As mentioned earlier, SVM is a popular choice for linear relationships and is efficient up to high dimensions.
It can model linear relationships using a hyperplane.
You can use kernels like RBF for non-linear relationships as well.
SVM regressors are less well known but can still perform well in the regression space. Instead of using the hyperplane as a decision boundary, SVR fits it to the data, trying to keep as many points as possible within a margin (the epsilon tube) around the line.
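Here's a minimal SVR sketch, assuming a toy sine-shaped target; epsilon sets the width of the margin around the fitted curve:

```python
# A minimal sketch of support vector regression with an RBF kernel.
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0, 5, 50).reshape(-1, 1)
y = np.sin(X).ravel()  # a simple non-linear target

# epsilon: points within this distance of the fit incur no loss
svr = SVR(kernel="rbf", epsilon=0.1).fit(X, y)
print(svr.predict([[2.5]]))
```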
Ensemble Models
In regression, the ensemble’s prediction may be an average or a weighted average instead of a majority vote.
Random Forest Regressor
The random forest algorithm can be used in regression as well. It has good accuracy but can be improved.
Boosted ensemble models that improve performance significantly also exist for regression:
Boosting Ensemble Models
AdaBoost Regressor
XGBoost Regressor
CatBoost Regressor (regarded as one of the best)
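As a rough sketch, here's the regression side of the same ensembles in scikit-learn; XGBRegressor and CatBoostRegressor come from third-party packages but expose the same fit/predict interface:

```python
# A rough sketch of ensemble regressors: the final prediction is an
# average over trees rather than a majority vote.
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor

# Synthetic stand-in dataset, just for illustration
X, y = make_regression(n_samples=500, n_features=5, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ada = AdaBoostRegressor(n_estimators=100, random_state=0).fit(X, y)

print(rf.predict(X[:2]), ada.predict(X[:2]))
```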
Metrics
In regression, metrics are based on the residuals of the fitted curve.
Residuals are the differences between the predicted values and the observed values in the dataset; they are often visualized as vertical lines between each data point and the fitted curve.
R^2 score
"How much better does this model compare to the baseline model?"
An R^2 of 0.5 indicates that the model explains 50% of the variability in the outcome data (the other 50% remains unexplained)
Mean squared error (MSE)
The mean of the squared residuals (distances from the fitted line)
RMSE (root mean squared error) is the square-rooted version, which puts the error back into the units of the target
Mean absolute error (MAE)
Since the residuals are not squared before summing, large errors don't carry as much extra weight as they do in MSE
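Here's a minimal sketch computing these regression metrics with scikit-learn on hypothetical values:

```python
# A minimal sketch of the regression metrics above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # observed values (made up)
y_pred = np.array([2.8, 5.3, 6.6, 9.4])  # model predictions (made up)

mse = mean_squared_error(y_true, y_pred)
print("R^2:", r2_score(y_true, y_pred))
print("MSE:", mse)
print("RMSE:", np.sqrt(mse))  # square root of MSE, in the target's units
print("MAE:", mean_absolute_error(y_true, y_pred))
```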
If you enjoy our content, consider supporting us so we can keep doing what we do. Please share this with a friend!
If you like, you can also donate to our team to push out better newsletters every week!
That’s it for this week’s issue of AI, but simple. See you next week!
—AI, but simple team