17 Algorithms Machine Learning Engineers Need to Know


Flavors of Machine Learning

Machine Learning Flavors
Image — Machine Learning Flavors Source : https://in.mathworks.com
  • Classification techniques predict categorical responses, for example, whether an email is genuine or spam, or whether a tumor is cancerous or benign. Classification models classify input data into categories. Typical applications include medical imaging, image and speech recognition, and credit scoring.
  • Regression techniques predict continuous responses, for example, changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.
  • Clustering is the most common unsupervised learning technique. It is used for exploratory data analysis to find hidden patterns or groupings in data.

Choosing the right algorithm

Choosing right algoirthm
Image — Choosing right algorithm Source — https://in.mathworks.com
  1. Support vector machines find the boundary that separates classes by as wide a margin as possible. When the two classes can’t be clearly separated, the algorithms find the best boundary they can.It is able to run fairly quickly. Where it really shines is with feature-intense data, like text or genomic. In these cases SVMs are able to separate classes more quickly and with less overfitting than most other algorithms, in addition to requiring only a modest amount of memory.
Support Vector Machines
Image — Support Vector Machines
  1. Discriminant Analysis is a supervised learning technique that can be used for classifying numerical variables in conjunction with a single categorical target. The method is useful for feature selection because it identifies the combination of features or parameters that best separates the groups.
Discriminant analysis
Image — Discriminant analysis
  1. Bayesian methods have a highly desirable quality: they avoid overfitting. They do this by making some assumptions beforehand about the likely distribution of the answer. Another byproduct of this approach is that they have very few parameters.
  2. Nearest Neighbor algorithm is a method for classifying objects based on the closest training examples in the feature space. Nearest Neighbor is a type of instance-based learning, or lazy learning where the function is only approximated locally and all computation is deferred until classification.
  3. Neural networks and perceptrons Neural networks are brain-inspired learning algorithms covering multiclass, two-class, and regression problems. input features are passed forward (never backward) through a sequence of layers before being turned into outputs. In each layer, inputs are weighted in various combinations, summed, and passed on to the next layer. This combination of simple calculations results in the ability to learn sophisticated class boundaries and data trends, seemingly by magic. Many-layered networks of this sort perform the “deep learning” that fuels so much tech reporting and science fiction.
Neural networks
Image — Neural networks
  1. Linear regression fits a line (or plane, or hyperplane) to the data set. It’s a workhorse, simple and fast, but it may be overly simplistic for some problems.
Linear regression
Image — Linear regression / Source — Azure
  1. The Support Vector Regression (SVR) uses the same principles as the SVM for classification, with only a few minor differences. First of all, because output is a real number it becomes very difficult to predict the information at hand, which has infinite possibilities. In the case of regression, a margin of tolerance (epsilon) is set in approximation to the SVM which would have already requested from the problem. But besides this fact, there is also a more complicated reason, the algorithm is more complicated therefore to be taken in consideration.
  2. Gaussian Process Regression (GPR) provides a different way of characterizing functions that does not require committing to a particular function class, but instead to the relation that different points on the function have to each other.It can be used to characterize parameterized functions as a special case, but offers much more flexibility.
  3. Ensemble methods is basically to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.Two families of ensemble methods are usually distinguished:
  • In averaging methods, the driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any of the single base estimator because its variance is reduced.
  • By contrast, in boosting methods, base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble.
  1. Logistic regression is actually a powerful tool for two-class and multiclass classification. It’s fast and simple. The fact that it uses an ‘S’-shaped curve instead of a straight line makes it a natural fit for dividing data into groups. Logistic regression gives linear class boundaries, so when you use it, make sure a linear approximation is something you can live with.
Logistic regression
Image — Logistic regression / Source — Azure
  1. Trees, forests, and jungles Decision forests (regression, two-class, and multiclass), decision jungles (two-class and multiclass), and boosted decision trees (regression and two-class) are all based on decision trees, a foundational machine learning concept. There are many variants of decision trees, but they all do the same thing subdivide the feature space into regions with mostly the same label. These can be regions of consistent category or of constant value, depending on whether you are doing classification or regression.
Decision tree
Image — Decision tree
  1. Clustering algorithms : Clustering can be considered the most important unsupervised learning problem; so, as every other problem of this kind, it deals with finding a structure in a collection of unlabeled data.
    A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters.
    We can show this with a simple graphical example:
Clustering algorithms (example)
Image- Clustering algorithms (example) Source :https://home.deib.polimi.it
  1. Clustering algorithms (K-Means) k-means clustering is a method of vector quantization that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.The most common algorithm uses an iterative refinement technique. Due to its ubiquity it is often called the k-means algorithm; it is also referred to as Lloyd’s algorithm.
Convergence of k-means
Image — Convergence of k-means / Source -wikipedia
  1. Hierarchical clustering Given a set of N items to be clustered, and an N*N distance (or similarity) matrix, the basic process of hierarchical clustering
  2. Start by assigning each item to a cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters the same as the distances (similarities) between the items they contain.
  3. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one cluster less.
  4. Compute distances (similarities) between the new cluster and each of the old clusters.
  5. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N. (*)
  6. Step 3 can be done in different ways, which is what distinguishes single-linkage from complete-linkage and average-linkage clustering.
    This kind of hierarchical clustering is called agglomerative because it merges clusters iteratively. There is also a divisive hierarchical clustering which does the reverse by starting with all objects in one cluster and subdividing them into smaller pieces. Divisive methods are not generally available, and rarely have been applied.
  7. Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters. This method is frequently used in pattern recognition. It is based on minimization of the following objective function:
Fuzzy C-Means Clustering function
Image — Fuzzy C-Means Clustering function / Source : Fuzzy C-Means Clustering
  1. Clustering as a Mixture of Gaussians There’s another way to deal with clustering problems: a model-based approach, which consists in using certain models for clusters and attempting to optimize the fit between the data and the model.
  2. In practice, each cluster can be mathematically represented by a parametric distribution, like a Gaussian (continuous) or a Poisson (discrete). The entire data set is therefore modelled by a mixture of these distributions. An individual distribution used to model a specific cluster is often referred to as a component distribution.
  3. A mixture model with high likelihood tends to have the following traits:
  • component distributions have high “peaks” (data in one cluster are tight);
  • the mixture model “covers” the data well (dominant patterns in the data are captured by component distributions).
Mixture of Gaussians
Image -Mixture of Gaussians / Source : https://home.deib.polimi.it
  1. Hidden Markov Model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states.A hidden Markov model can be considered a generalization of a mixture model where the hidden variables (or latent variables), which control the mixture component to be selected for each observation, are related through a Markov process rather than independent of each other.The Hidden Markov Model (HMM) is a variant of a finite state machine having a set of hidden states, Q, an output alphabet (observations), O, transition probabilities, A, output (emission) probabilities, B, and initial state probabilities, Π. The current state is not observable. Instead, each state produces an output with a certain probability (B). Usually the states, Q, and outputs, O, are understood, so an HMM is said to be a triple, ( A, B, Π ).
Hidden Markov Models
Image — Hidden Markov Models / Source http://www.shokhirev.com



Author | Digital Solution Architect | Blog https://upnxtblog.com | My K8s Book https://a.co/d/79wyfKs

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store