Modeling with Machine Learning Algorithms
Introduction
Machine learning algorithms are at the heart of data mining, data science, and artificial intelligence (AI). As SAS defines it, machine learning is “a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.” This definition highlights the synergy between machine learning and AI, enabling groundbreaking technologies like Tesla’s self-driving systems, online recommendations from Amazon and Netflix, fraud detection by banks, and generative AI tools like ChatGPT, Copilot, and Google Gemini. In this context, machine learning methods are powerful tools for discovering hidden patterns in data to make predictions.

Supervised vs. Unsupervised Learning
Machine learning algorithms can be broadly categorized into supervised learning and unsupervised learning. The choice between them depends on whether your dataset includes a target variable and on your research goals.
Supervised Learning:
Supervised learning requires a target variable, and in data mining it is most often applied to structured data. The goal is to develop models that predict this target variable from input predictors. Examples include linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, and neural networks. These methods are widely used for applications such as predicting customer behavior, detecting fraud, and forecasting outcomes.
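As a minimal sketch of the idea, the snippet below fits a one-predictor linear regression by least squares and uses it to predict the target for a new input. The data, the variable names (hours_studied, exam_score), and the helper functions are invented for illustration; they are not taken from any particular tool.

```python
# Toy supervised learning: predict a target (y) from one input predictor (x).
# All data and names here are hypothetical.

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the fitted line to a new predictor value."""
    intercept, slope = model
    return intercept + slope * x


hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]      # input predictor
exam_score = [52.0, 54.0, 56.0, 58.0, 60.0]    # target variable (known outcomes)

model = fit_simple_linear_regression(hours_studied, exam_score)
prediction = predict(model, 6.0)               # prediction for an unseen input
```

The key point is that the known exam_score values guide the fit; without a target variable there would be nothing to predict.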
Unsupervised Learning:
Unlike supervised learning, unsupervised learning works with data that has no target variable. The focus is on uncovering hidden structure within the dataset, such as grouping similar items or identifying underlying patterns. Methods such as cluster analysis, principal component analysis, and multidimensional scaling fall under this category.
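For contrast, here is a small sketch of cluster analysis using a one-dimensional version of k-means (Lloyd's algorithm). No target variable is supplied; the algorithm groups the values purely by similarity. The purchase_amounts data and the choice of starting centers are assumptions made for this illustration.

```python
# Toy unsupervised learning: group values into clusters without any target.
# All data and names here are hypothetical.

def k_means_1d(values, centers, iterations=10):
    """Cluster 1-D values around the given starting centers."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = sum(members) / len(members)
    return labels, centers


purchase_amounts = [12.0, 15.0, 14.0, 95.0, 102.0, 98.0]
start = [min(purchase_amounts), max(purchase_amounts)]  # simple deterministic start
labels, centers = k_means_1d(purchase_amounts, start)
```

Here the small and large purchases end up in separate groups, which is the kind of hidden structure unsupervised methods aim to reveal.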
Transforming data into decisions: The power of modeling and configuration in machine learning

Algorithms
The following are algorithms we cover

Modeling
Modeling in machine learning creates a computational representation of a system using data, enabling algorithms to learn patterns and relationships. Once trained, models predict, classify, or generate insights from new data, turning raw data into actionable results.
The modeling process typically involves:
- Selecting the Algorithm: Choosing an appropriate machine learning algorithm based on the problem type and data characteristics.
- Training the Model: Feeding the algorithm with labeled or unlabeled data to adjust its parameters and optimize its performance.
- Evaluating the Model: Measuring the model’s accuracy, precision, recall, or other metrics on a validation dataset that was not used for training, to check that it generalizes well to unseen data.
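The three steps above can be sketched end to end with a deliberately simple algorithm. The example below is an illustration under stated assumptions: the "algorithm" is a single cut-off rule on one predictor, the training and validation data are made up, and the function names are hypothetical.

```python
# Select -> train -> evaluate, using a one-rule threshold classifier.
# All data and names here are hypothetical.

def train_threshold_classifier(xs, ys):
    """Pick the cut-off that maximizes training accuracy (predict 1 when x >= cut-off)."""
    best_cut, best_correct = None, -1
    for cut in sorted(set(xs)):
        correct = sum((x >= cut) == bool(y) for x, y in zip(xs, ys))
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def evaluate(cut, xs, ys):
    """Compute accuracy, precision, and recall on a labeled dataset."""
    preds = [1 if x >= cut else 0 for x in xs]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, ys))  # true positives
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, ys))  # false positives
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, ys))  # false negatives
    accuracy = sum(p == y for p, y in zip(preds, ys)) / len(ys)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Training data: used to fit the model.
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
train_y = [0, 0, 0, 1, 1, 1]

# Validation data: held out, used only for evaluation.
valid_x = [2.5, 4.0, 6.5, 9.0]
valid_y = [0, 1, 1, 1]

cut = train_threshold_classifier(train_x, train_y)
metrics = evaluate(cut, valid_x, valid_y)
```

The model fits the training data perfectly but misses one positive case in the validation set, which is why evaluation is done on data held out from training.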
Model Configuration
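Model configuration involves setting the hyperparameters, architecture, and training rules that govern how a machine learning model learns. Unlike model parameters, which are estimated from the data during training, these settings are chosen by the analyst. They determine how the model processes data and how it is structured and trained, and they strongly influence its accuracy and suitability for the task.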
Key aspects of model configuration include:
- Hyperparameter Tuning: Adjusting hyperparameters like learning rate, regularization strength, number of layers in a neural network, or the number of trees in a random forest. These are set before training and significantly affect the model’s performance.
- Feature Engineering: Deciding which features (variables) to include in the model and how to preprocess them (e.g., normalization, encoding categorical variables).
- Model Architecture: For complex algorithms like neural networks, defining the number of layers, neurons, activation functions, and other architectural details.
- Loss Function: Choosing the function the model minimizes to measure its prediction error during training (e.g., mean squared error for regression, cross-entropy for classification).
- Optimization Method: Selecting an optimization algorithm (e.g., gradient descent, Adam) to minimize the loss function.
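To make these configuration choices concrete, the sketch below trains a one-weight linear model with plain gradient descent. It is an illustration only: the config dictionary, its keys, and the toy data are assumptions, the loss is mean squared error with an optional L2 penalty, and the optimizer is basic gradient descent rather than Adam.

```python
# Configuration choices made explicit: hyperparameters, loss, and optimizer.
# All data and names here are hypothetical.

config = {
    "learning_rate": 0.05,  # hyperparameter: step size of each update
    "l2_strength": 0.0,     # hyperparameter: regularization penalty on the weight
    "epochs": 200,          # hyperparameter: number of passes over the data
}


def mse_loss(w, xs, ys, l2_strength):
    """Loss function: mean squared error plus an L2 penalty on the weight."""
    n = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n + l2_strength * w ** 2


def train(xs, ys, config):
    """Optimization method: gradient descent on the loss above."""
    w = 0.0  # model parameter: learned from the data, not set by hand
    n = len(xs)
    for _ in range(config["epochs"]):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        grad += 2 * config["l2_strength"] * w
        w -= config["learning_rate"] * grad
    return w


xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]  # generated from y = 3x

w = train(xs, ys, config)

# Stronger regularization shrinks the learned weight toward zero.
w_regularized = train(xs, ys, {**config, "l2_strength": 1.0})

# A crude form of hyperparameter tuning: compare final loss across learning rates.
losses = {
    lr: mse_loss(train(xs, ys, {**config, "learning_rate": lr}), xs, ys, 0.0)
    for lr in (0.001, 0.05)
}
```

With the same number of epochs, the smaller learning rate stops well short of the best weight, so its final loss is higher. In practice, candidate settings would be compared on a validation set rather than on the training data.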

