Glossary


There are currently 20 names in this directory beginning with the letter D.

Data de-identification

Data de-identification is the process of removing or altering identifying information from a dataset to protect the privacy and anonymity of individuals.
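
As an illustration, the sketch below removes direct identifiers from a few records and replaces a linking key with a salted hash. The field names, the salt, and the helper `deidentify` are hypothetical, chosen only for this example. Salted hashing is a form of pseudonymization and may not be sufficient on its own to guarantee anonymity.

```python
import hashlib

def deidentify(records, direct_identifiers, pseudonymize_key, salt):
    """Drop direct identifiers and replace one key with a salted hash."""
    cleaned = []
    for record in records:
        # Remove fields that identify a person directly (e.g. name, email).
        row = {k: v for k, v in record.items() if k not in direct_identifiers}
        # Replace the linking key with a truncated salted SHA-256 pseudonym.
        raw = (salt + str(record[pseudonymize_key])).encode("utf-8")
        row[pseudonymize_key] = hashlib.sha256(raw).hexdigest()[:12]
        cleaned.append(row)
    return cleaned

# Hypothetical records, for illustration only.
patients = [
    {"patient_id": 101, "name": "Ann Lee", "email": "ann@example.com", "age": 34},
    {"patient_id": 102, "name": "Bo Chen", "email": "bo@example.com", "age": 51},
]
deidentified = deidentify(patients, {"name", "email"}, "patient_id", salt="demo-salt")
```

The non-identifying field (`age`) is kept so the data stays useful for analysis, while the same `patient_id` always maps to the same pseudonym for a given salt.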

Data mining

Data mining is a method of using machine learning algorithms to detect unknown patterns involving relationships among variables within large datasets and to predict outcomes of interest, which can lead to informed business decisions.

Data mining model

See Predictive model

Data mining process

The data mining process refers to the steps involved in discovering meaningful patterns, relationships, and insights from large datasets using various techniques and algorithms.

Data modification

Data modification involves preparing and transforming the raw data to make it suitable for training predictive models.

Data science

Data science is a broad, multidisciplinary field that focuses on capturing and extracting knowledge from data and communicating the outcomes. It includes data mining as an essential part of analyzing data, along with other components such as data collection and management, data treatment, data visualization, computer programming, and artificial intelligence applications.

Data scientist

A data scientist is a professional who performs one or more tasks involving analytics, data collection and treatment, data mining, machine learning, and programming.

Data visualization

Data visualization is the graphical representation of data and information through charts, graphs, and other visual elements to help understand patterns, trends, and insights in a more intuitive and accessible way.

Decision matrix

A decision matrix is a table that presents the costs of misclassifications, including the costs of false positives and the costs of false negatives.
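
A minimal sketch of how such a table might be used to score a classifier follows. The cost values, labels, and the helper `total_misclassification_cost` are assumptions made for illustration, not values from any real application.

```python
# Hypothetical misclassification costs; keys are (actual, predicted).
cost_matrix = {
    ("positive", "positive"): 0.0,
    ("positive", "negative"): 50.0,  # false negative
    ("negative", "positive"): 10.0,  # false positive
    ("negative", "negative"): 0.0,
}

def total_misclassification_cost(actual, predicted, costs):
    """Sum the cost of each (actual, predicted) pair."""
    return sum(costs[(a, p)] for a, p in zip(actual, predicted))

actual = ["positive", "negative", "negative", "positive"]
predicted = ["positive", "positive", "negative", "negative"]

# One false positive (10.0) plus one false negative (50.0).
total_cost = total_misclassification_cost(actual, predicted, cost_matrix)
```

Assigning a higher cost to false negatives than to false positives, as assumed here, makes a model that misses positive cases look worse than one that raises extra alarms.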

Decision node

When a sub-node is divided into additional sub-nodes, it is referred to as a decision node.

Decision tree

A decision tree is a logical, rule-based method that presents a hierarchical structure of variables, including the root node, parent nodes, and child nodes.

Deep learning

Deep learning refers to complex neural networks with many hidden layers. Breakthroughs in deep learning have led to the AI boom.

Dendrograms

A dendrogram is a tree-like diagram that shows the hierarchical relationship between clusters in hierarchical clustering. The height of each branch represents the distance between clusters at that level.

Density plot

A density plot is a graphical representation of the distribution of continuous data, providing an estimate of the underlying probability density function, often using smoothed curves.
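
The smoothed curve is often obtained with kernel density estimation. The sketch below is one possible version using a Gaussian kernel in plain Python; the sample values, the bandwidth of 0.5, and the helper name `gaussian_kde` are assumptions for illustration.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centered on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical continuous data.
samples = [1.0, 1.5, 2.0, 2.2, 3.1, 4.0]
density = gaussian_kde(samples, bandwidth=0.5)

# Evaluate on a grid from -3.0 to 9.0; these (x, y) pairs are what a
# plotting library would draw as the density curve.
grid = [i * 0.1 for i in range(-30, 91)]
curve = [(x, density(x)) for x in grid]

# The area under an estimated density should be close to 1.
area = sum(y for _, y in curve) * 0.1
```

A larger bandwidth gives a smoother curve, and a smaller one follows the individual data points more closely.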

Dependent variable

See Target variable

Descriptive statistics

Descriptive statistics is a branch of statistics that involves the summarization and presentation of data to provide a clear and concise understanding of its main characteristics, such as measures of central tendency, dispersion, and distributions.
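
For instance, the common summary measures can be computed with Python's standard `statistics` module. The data values below are made up for illustration.

```python
import statistics

# Hypothetical sample of observations.
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    # Measures of dispersion
    "range": max(values) - min(values),
    "population_stdev": statistics.pstdev(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```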

Dimensionality reduction

Once the principal components have been identified, principal component analysis (PCA) can be used to transform the high-dimensional data into lower-dimensional data while still retaining as much of the original variation as possible. The number of principal components retained determines the dimensionality of the new data.
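
The projection step can be sketched with NumPy as follows. This is one common way to compute PCA (via the singular value decomposition of the centered data), not necessarily the only one. The random data and the helper name `pca_reduce` are assumptions for illustration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its first n_components principal components."""
    # Center each variable so the components capture variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Share of the total variance explained by each retained component.
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio[:n_components]

# Hypothetical data: 100 observations of 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

X_reduced, explained_ratio = pca_reduce(X, n_components=2)
```

Here `X_reduced` has two columns, one per retained component, and `explained_ratio` indicates how much of the original variation each of them keeps.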

Directed Acyclic Graph (DAG)

The graphical structure of a Bayesian network is represented by a DAG, which is a directed graph without cycles (no directed path that starts and ends at the same node). The absence of cycles ensures that the network does not contain feedback loops and allows for efficient probabilistic inference.
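
Whether a candidate structure is acyclic can be checked with a topological sort. The sketch below uses `graphlib` from the Python standard library (Python 3.9 or later). The node names and the helper `is_dag` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def is_dag(parents):
    """parents maps each node to the set of its parent nodes."""
    try:
        # A topological order exists only if the graph has no cycles.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

# Hypothetical Bayesian network structure: Rain -> Sprinkler -> WetGrass,
# plus a direct edge Rain -> WetGrass.
network = {
    "Rain": set(),
    "Sprinkler": {"Rain"},
    "WetGrass": {"Rain", "Sprinkler"},
}

# A structure with a feedback loop (A -> B -> A) is not a valid DAG.
with_cycle = {"A": {"B"}, "B": {"A"}}

print(is_dag(network))     # True
print(is_dag(with_cycle))  # False
```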

Distance or similarity measures

A distance or similarity measure quantifies the similarity or dissimilarity between pairs of objects. These measures are used to determine the proximity between objects and form the basis for clustering algorithms. Common measures include Euclidean distance, Manhattan distance, and cosine similarity.
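
The three measures named above can be written in a few lines of plain Python. The vectors `a` and `b` are made-up examples. Note that for the two distances a smaller value means the objects are closer, while for cosine similarity a larger value means they are more alike.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical objects described by three numeric variables.
a = [1, 2, 3]
b = [4, 6, 3]

print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7
print(cosine_similarity(a, b))
```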

Diversity

The primary strength of ensemble models lies in the diversity of their individual models. By using different learning algorithms or training on different subsets of the data, the ensemble captures various patterns and reduces the risk of overfitting.