Glossary

There are currently 19 names in this directory beginning with the letter R.

R-squared or coefficient of determination

R-squared measures the proportion of the variance in the target variable that is explained collectively by the predictors.
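
The definition above can be illustrated with a minimal sketch, assuming the common formula of one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample values are hypothetical, chosen only for illustration.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the target around its mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted target values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.5, 8.5]
r2 = r_squared(actual, predicted)
```

A value close to 1 indicates that the predictions account for most of the variance in the target variable.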

Random forest

Random forest is a machine learning algorithm that combines the predictions of multiple decision trees to make more accurate and robust predictions. It falls under the category of ensemble learning, where multiple models are combined to form a more powerful predictive model.
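
As a rough illustration of how predictions from several trees might be combined, the sketch below uses majority voting for class labels and averaging for numeric targets. The helper names and the per-tree predictions are hypothetical; the trees themselves are assumed to be trained already.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def average_prediction(tree_predictions):
    """Return the mean of numeric predictions from the trees."""
    return sum(tree_predictions) / len(tree_predictions)

# Hypothetical outputs from five already-trained trees for one record.
class_votes = ["churn", "stay", "churn", "churn", "stay"]
numeric_votes = [10.0, 12.0, 11.0, 13.0, 14.0]

combined_class = majority_vote(class_votes)
combined_value = average_prediction(numeric_votes)
```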

Random subset selection

Each decision tree in the random forest is built using a random subset of the training data. This process, known as bootstrap aggregating or “bagging,” involves randomly sampling data points with replacement to create diverse subsets for training each tree.
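
Sampling with replacement can be sketched as follows. This is only an illustration: the function name `bootstrap_sample`, the row identifiers, and the choice of three trees are assumptions, not part of any particular library.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so duplicates are possible."""
    return [rng.choice(rows) for _ in rows]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
training_rows = list(range(10))   # hypothetical row identifiers

# One bootstrap sample per tree; three trees assumed here.
samples = [bootstrap_sample(training_rows, rng) for _ in range(3)]
```

Because rows are drawn with replacement, each sample typically repeats some rows and omits others, which is what makes the trees differ from one another.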

Random variable selection

The random forest algorithm selects a random subset of input variables for each decision tree (many implementations draw a fresh subset at every split within a tree). This technique, known as variable bagging or variable subsampling, introduces additional randomness and helps to reduce the dominance of any single variable in the ensemble.
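
A minimal sketch of drawing a random subset of variables for each tree is shown below. The variable names, the subset size `k`, and the function name are hypothetical.

```python
import random

def random_feature_subset(feature_names, k, rng):
    """Pick k distinct variables at random, without replacement."""
    return rng.sample(feature_names, k)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
features = ["age", "income", "tenure", "region"]  # hypothetical predictors

# One subset of two variables for each of three trees (sizes assumed).
subsets = [random_feature_subset(features, 2, rng) for _ in range(3)]
```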

Reactive research

Reactive research focuses on examining events that have already happened and identifying the root causes of a specific incident or accident. Based on the identified root causes, organizations can then form appropriate strategies to mitigate the risk.

Receiver Operating Characteristic (ROC) chart

The ROC chart displays the true positive rate (sensitivity) on the vertical axis and the false positive rate (1 − specificity) on the horizontal axis, illustrating the trade-off between sensitivity and specificity.
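
The points plotted on an ROC chart can be computed by sweeping a classification threshold over the predicted scores. The sketch below is one simple way to do this; the labels, scores, and function name are hypothetical, and ties or missing classes are not handled.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs.

    labels: 1 for positive, 0 for negative; scores: higher means more positive.
    One point per distinct threshold, preceded by the origin (0, 0).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Records scoring at or above the threshold are classified positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical true labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
```

Lowering the threshold moves the curve from (0, 0) toward (1, 1): more positives are caught, at the cost of more false alarms.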

Reconstruction

PCA allows the inverse transformation of the lower-dimensional data back into the original high-dimensional space. Although some information is lost during dimensionality reduction, the reconstruction can still be useful for interpretation or other downstream tasks.

Recursive partitioning

Recursive partitioning denotes an iterative procedure involving the division of data into partitions, followed by further splitting along each branch.
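
The idea of splitting and then splitting again along each branch can be sketched with a toy example. This sketch simply splits a sorted list at its midpoint, which is an assumption made for brevity; real tree algorithms choose each split by a criterion such as impurity reduction. All names and values are hypothetical.

```python
def partition(values, depth=0, max_depth=2, min_size=2):
    """Recursively split a sorted list in half and return a nested dict."""
    # Stop when the tree is deep enough or a split would leave tiny branches.
    if depth == max_depth or len(values) < 2 * min_size:
        return {"leaf": values}
    mid = len(values) // 2
    return {
        "threshold": values[mid],  # values below go left, the rest go right
        "left": partition(values[:mid], depth + 1, max_depth, min_size),
        "right": partition(values[mid:], depth + 1, max_depth, min_size),
    }

tree = partition(sorted([8, 1, 5, 3, 9, 2, 7, 4]))
```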

Regression coefficients

Regression coefficients are the values assigned to each predictor (Xi), indicating the magnitude and direction of its influence on the target variable. A positive coefficient means that an increase in the predictor is associated with an increase in the target variable, holding the other predictors constant; a negative coefficient means that an increase in the predictor is associated with a decrease.
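
To make the sign and magnitude of a coefficient concrete, the sketch below fits a regression with a single predictor using ordinary least squares. The function name and data are hypothetical, and a model with several predictors would need a matrix-based solver instead.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line through x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data in which the target falls as the predictor rises.
x = [1.0, 2.0, 3.0, 4.0]
y = [9.0, 7.0, 5.0, 3.0]
intercept, slope = fit_simple_regression(x, y)
```

Here the slope is negative, so each one-unit increase in the predictor corresponds to a decrease in the target variable.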

Regression method

Regression is a popular method that estimates the relationship between one dependent variable (target variable) and one or more independent variables (predictors).

Regression model

See Association model

Relative Squared Error (RSE)

RSE is a common metric for association models. It compares the model’s squared prediction errors with those of a simple baseline model that always predicts the mean of the actual values; lower values indicate a greater improvement over the baseline.
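
A minimal sketch of this comparison, assuming the common definition of RSE as the model's sum of squared errors divided by the baseline's sum of squared errors, is shown below. The function name and sample values are hypothetical.

```python
def relative_squared_error(actual, predicted):
    """Return the model's squared error divided by the mean baseline's squared error."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # The baseline model always predicts the mean of the actual values.
    baseline_error = sum((a - mean_actual) ** 2 for a in actual)
    return model_error / baseline_error

# Hypothetical actual and predicted target values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.5, 8.5]
rse = relative_squared_error(actual, predicted)
```

Under this definition, a value below 1 means the model has a smaller squared error than the mean baseline.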

Replacement

Replacement is a data modification technique used to correct errors, reassign values, or remove incorrect information in the data.

Representative sample

A representative sample is a sample that accurately reflects the characteristics and diversity of the population.

Residual by predicted chart

The residual by predicted chart shows not only how close the actual values are to the predicted values but also any patterns in the residuals.

Residuals

Residuals measure the difference between actual values and predicted values in a dataset.

Robustness

The random forest algorithm is known for its robustness against overfitting, noisy data, and outliers. By aggregating multiple decision trees, it can capture complex relationships and handle noisy datasets more effectively.

Root Mean Squared Error (RMSE)

RMSE is a common metric for association models. It measures the typical size of the difference between the predicted values and the actual values in a dataset, computed as the square root of the mean squared difference and expressed in the same units as the target variable.
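
The calculation can be sketched in a few lines: square each error, average the squares, and take the square root. The function name and sample values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Return the square root of the mean squared prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical actual and predicted target values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.5, 8.5]
error = rmse(actual, predicted)
```

Because the errors are squared before averaging, large misses weigh more heavily than small ones.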

Root node

A root node is at the beginning of a tree. It represents the initial decision based on a specific feature that has the most significant impact on the target variable. Starting from the root node, the dataset is partitioned based on different features, and subsequently, these subgroups are further divided at each decision node located beneath the root node.