A Guide to Choosing the Right Machine Learning Technique

Helen Ristov
Jun 23, 2022

Executives today face new data challenges: complex forecasts and machine learning are no longer luxury items but staples of efficient business management. Over the last decade we have seen overwhelming growth not only in new types of models but also in the open ecosystem, which is churning out a second wave of distributed computing, one that could drastically alter the world we know today.

What we saw in the 80s was a cultural shift from mainframe computing to the personal computer, driven by the introduction of the microprocessor. This effectively decentralized control away from the main technology companies and made the functions of high-performance computers available to everyone. You might have heard the iconic phrase that the modern computer is just a “bicycle for the mind,” and it was just that: a tool that allows your mind to be more productive and effective. That change, although seen as disruptive, was not met with as much fear as what is happening today, especially around artificial intelligence. If we want to navigate this jarring and, for some, “scary” experience, we must deal with governance that lags behind a new regime, one that hasn’t been fully established yet. The goal should be to encourage, not stifle, innovation, and history has shown that the open-source stomping ground is one of the best ways to improve a product through the clash of opinions and information. It is important that we preserve this free exchange of information while remaining wary of some of the negative consequences.

I’d like to offer some ways to reduce the noise in a world that is only growing more complex and increasing in entropy. Adding complexity to a system is very easy; reducing it is far more difficult. Many people only have their eureka moment after someone has elucidated a problem by reducing it to its essential elements. The path forward is always obvious once it has been explained by someone who has already made the mistakes, so this is my attempt to write a guide that can help you save time and avoid some of the common pitfalls when identifying the appropriate machine learning technique for your particular business problem!

To handle the increasing variety of problems, many techniques have been developed, each with its own special use. Today, there are three main types of machine learning, and I will briefly describe each.

Supervised Learning

The house of supervised learning broadly covers all prediction models where the target variable, or “what you are trying to predict,” is known. The data needs to be labelled accurately, and the training dataset is divided into features and the target variable. We use these features to build a model that tries to predict the target. There are various ways to measure model accuracy or prediction error, which is simply the difference between the model’s estimate and the actual value of what you are trying to predict. All of your standard regression techniques fall into this category.
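To make the idea of features, target, and prediction error concrete, here is a minimal sketch of a one-feature regression using only the Python standard library. The data (ad spend as the feature, sales as the target) and the function names are hypothetical and chosen purely for illustration; in practice you would likely reach for an established library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

def mean_absolute_error(actual, predicted):
    """Average size of the gap between the actual and the predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical labelled data: feature = ad spend, target = sales.
train_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
train_sales = [3.1, 4.9, 7.2, 8.8, 11.0]

# Held-out examples the model has not seen, used to measure prediction error.
test_spend = [6.0, 7.0]
test_sales = [13.1, 14.8]

model = fit_line(train_spend, train_sales)
test_predictions = [predict(model, x) for x in test_spend]
test_mae = mean_absolute_error(test_sales, test_predictions)
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} test MAE={test_mae:.3f}")
```

Measuring the error on examples that were held out of training, rather than on the training data itself, gives a more honest picture of how the model will behave on new cases.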

Unsupervised Learning

Unsupervised learning has the advantage of working with unstructured or unlabeled data. These are the methods you would use to find latent structure or other associations in your data. The algorithms essentially mine your dataset for patterns and rules that might be lurking in it. Once these patterns are detected, you can summarize the resulting groupings to extract valuable insights. An example would be creating marketing clusters and customer profiles that help you manage your campaigns.
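As an illustration of the marketing-cluster example, the following is a minimal sketch of k-means clustering written with only the standard library. The customer data (annual spend, visits per month), the choice of two clusters, and the starting centroids are all assumptions made for this example; k-means is only one of many unsupervised methods.

```python
from math import dist

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing cluster centers."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's center in place
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical, unlabeled customers: (annual spend, visits per month).
customers = [(200, 1), (220, 2), (250, 1), (900, 8), (950, 9), (1000, 10)]

# Start from two of the observed customers (an arbitrary choice for this sketch).
centroids, labels = k_means(customers, [customers[0], customers[3]])
print(labels)
print(centroids)
```

No target variable is supplied anywhere: the two groups (occasional low spenders and frequent high spenders) emerge from the data alone. In a real analysis the features would normally be scaled first so that spend does not dominate the distance calculation.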

Reinforcement Learning

This method trains a machine to learn and make decisions that maximize a reward and minimize risk. A computer could, for example, be trained to play chess by evaluating board positions: it considers the candidate moves available, scores the resulting positions with an evaluation function, and chooses the best one. This is also a branch of artificial intelligence. Software agents learn the ideal behavior to maximize performance, and reward feedback is used to reinforce the desired behavior. Autonomous vehicles also use these methods to make decisions in real time.
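The reward-feedback loop can be sketched with one of the simplest reinforcement learning setups, an epsilon-greedy multi-armed bandit. This is a toy illustration, far simpler than chess or driving: the three actions, their average rewards, the noise level, and the function name `run_bandit` are all hypothetical.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from noisy reward feedback."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # the agent's current belief per action
    pulls = [0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(true_means))
        else:
            # Exploit: pick the action currently believed to be best.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # The environment returns a noisy reward the agent cannot see in advance.
        reward = rng.gauss(true_means[action], 0.05)
        pulls[action] += 1
        # Nudge the running average for this action toward the observed reward.
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return estimates, pulls

# Hypothetical average rewards for three actions; the agent never sees these.
estimates, pulls = run_bandit([0.2, 0.5, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, pulls)
```

The agent is never told which action is correct. It only observes rewards, and behavior that pays off is reinforced by being chosen more often.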

The various applications can be broadly mapped across these categories. Here is a depiction of only a fraction of them, as including all of them would easily create an explosive kaleidoscope.

Now that we have a better understanding of the different types of machine learning, how do we tackle the challenge of selecting the appropriate application for our business problem? This can be a daunting task, but it can be simplified by considering these key questions:

What is the domain of the machine learning application and what is it supposed to do?

This is at the heart of the decision. The purpose will drive the value and help make your machine learning application not only viable but also useful. I’ve often told people that you can build a fantastic engine, but without a chassis it’s not going anywhere. You must think about how your model will gain and drive user acceptance. Understanding the problems within your department is the starting point for identifying your purpose statement. You don’t just want to build a better model, but a differentiated one.

Understand your data. What does your data look like?

Garbage in, garbage out. Spending time understanding your data is an essential step for all machine learning applications. Dig into the weeds and get a solid understanding of your data. Is it structured or unstructured? What are your data sources and formats? Run an exploratory data analysis (EDA) report to detect any anomalies and outliers in your data. The EDA is often overlooked, yet it is critical for ensuring data quality.

Examples of Models and Data Structures Supported

It helps to use tools and packages that run summary reports on all the variables in your dataset as part of your EDA. You should check distributions, min/max values, and missing and distinct values. Graphs and charts help you spot anomalies or outliers that can skew your estimates, and the report should also distinguish numeric from categorical variables. A sample report is shown below:
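As a rough illustration of the checks such a report performs (a minimal sketch, not the output of any particular EDA package), the snippet below summarizes each variable in a small table of records. The column names and values are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, median

def summarize(rows):
    """Build a per-column summary: type, missing and distinct counts, basic stats."""
    report = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [v for v in values if v is not None]
        is_numeric = all(isinstance(v, (int, float)) for v in present)
        summary = {
            "type": "numeric" if is_numeric else "categorical",
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        if is_numeric and present:
            summary.update(
                min=min(present),
                max=max(present),
                mean=mean(present),
                median=median(present),
            )
        report[column] = summary
    return report

# Hypothetical customer records with a few gaps and one suspicious age.
rows = [
    {"age": 34, "region": "north", "spend": 120.5},
    {"age": 29, "region": "south", "spend": None},
    {"age": None, "region": "north", "spend": 87.0},
    {"age": 41, "region": None, "spend": 310.0},
    {"age": 120, "region": "east", "spend": 95.25},
]

report = summarize(rows)
for column, summary in report.items():
    print(column, summary)
```

Even this small summary surfaces questions worth asking before modelling: an age of 120 stands out in the max column, and every variable has at least one missing value.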

What are the evaluation criteria for your model?

The effectiveness of the model is a combination of model performance and business fit. There are various measures available to gauge model performance, and they differ for each type of machine learning technique. Next, and equally important, is identifying KPIs that tie your model performance to overall business goals like revenue. You will want to show not only that your model does a good job of predicting certain events but also how it is integrated with the business. Give yourself some liberty to explore and define different measures of value. Do you want to drive incremental promotions and sales through certain channels? Include that in your performance reports. Finally, establish a baseline against which your model will be evaluated. Eventually someone will ask you to prove how your model adds incremental value, so plan ahead for how you will set up your test and control scenarios or champion and challenger models.
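One simple way to express incremental value against a baseline is to compare a test group, where the model drives the decision, with a control group, where it does not. The sketch below uses made-up campaign numbers and hypothetical function names; a real evaluation would also check that the difference is statistically meaningful.

```python
def conversion_rate(conversions, customers):
    """Share of customers in a group who converted."""
    return conversions / customers

def incremental_lift(test_rate, control_rate):
    """Return (absolute lift, relative lift) of the test group over control."""
    absolute = test_rate - control_rate
    relative = absolute / control_rate
    return absolute, relative

# Hypothetical campaign results.
# Control: customers targeted without the model (the baseline).
# Test: customers targeted using the model's scores.
group_size = 1000
control_rate = conversion_rate(50, group_size)
test_rate = conversion_rate(65, group_size)

absolute, relative = incremental_lift(test_rate, control_rate)
incremental_conversions = absolute * group_size

print(f"control={control_rate:.1%} test={test_rate:.1%}")
print(f"absolute lift={absolute:.1%} relative lift={relative:.0%}")
print(f"incremental conversions per {group_size} customers: {incremental_conversions:.0f}")
```

Reporting the lift alongside the usual accuracy measures connects model performance to a business KPI and answers the incremental-value question directly.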

The purpose, data, and evaluation criteria form a three-pronged approach for determining the machine learning technique best suited to your business problems. At the end of the day, a successful model will be relevant, useful, and robust. Congratulations, you are now equipped with the knowledge needed to adopt machine learning, and you know just enough to be dangerous when showing how it can benefit your organization. 🙂
