Glossary
D
Data de-identification
Data de-identification is the process of removing or altering identifying information from a dataset to protect the privacy and anonymity of individuals.
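As a rough illustration of removing and altering identifying fields, the sketch below drops a name, replaces an identifier with a salted hash, and coarsens an exact age into a band. The record layout, the field names, and the `deidentify` helper are all hypothetical, and salted hashing on its own is only pseudonymization; it should not be read as a complete de-identification procedure.

```python
import hashlib


def deidentify(record, salt):
    """Return a copy of a record with direct identifiers removed or altered.

    Hypothetical helper for illustration: the field names are assumptions.
    """
    # Replace the real identifier with a salted SHA-256 digest (pseudonym).
    digest = hashlib.sha256((salt + record["patient_id"]).encode("utf-8")).hexdigest()
    return {
        "pseudo_id": digest[:12],
        # Generalize the exact age into a ten-year band.
        "age_band": f"{(record['age'] // 10) * 10}s",
        # Keep the analytic variable; the name is simply not copied over.
        "diagnosis": record["diagnosis"],
    }


record = {"name": "Jane Doe", "patient_id": "P-1001", "age": 47, "diagnosis": "asthma"}
clean = deidentify(record, salt="example-salt")
print(clean["age_band"])  # 40s
```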
Data mining
Data mining is a method of using machine learning algorithms to detect unknown patterns involving relationships among variables within large datasets in order to predict outcomes of interest, which can lead to informed business decisions.
Data mining process
The data mining process refers to the steps involved in discovering meaningful patterns, relationships, and insights from large datasets using various techniques and algorithms.
Data modification
Data modification involves preparing and transforming the raw data to make it suitable for training predictive models.
Data science
Data science is a broader, multidisciplinary field that focuses on capturing and extracting knowledge from data and communicating the outcomes. Data science includes data mining as an essential part of analyzing data, along with other components such as data collection and management, data treatment, data visualization, computer programming, and artificial intelligence applications.
Data scientist
A data scientist is a professional who performs a task or a combination of tasks involving analytics, data collection and treatment, data mining, machine learning, and programming.
Data visualization
Data visualization is the graphical representation of data and information through charts, graphs, and other visual elements to help understand patterns, trends, and insights in a more intuitive and accessible way.
Decision matrix
A decision matrix is a table that presents the costs of misclassifications, including the costs of false positives and the costs of false negatives.
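A small sketch can show how such costs are applied to a set of predictions. The cost values, the example labels, and the function name `total_misclassification_cost` are hypothetical, chosen only to illustrate the idea for a binary classifier where 1 is the positive class.

```python
# Hypothetical misclassification costs (assumed values, not from any dataset).
COST_FALSE_POSITIVE = 10.0
COST_FALSE_NEGATIVE = 50.0


def total_misclassification_cost(actual, predicted, cost_fp, cost_fn):
    """Sum the cost of every false positive and false negative."""
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0 and p == 1:    # false positive
            total += cost_fp
        elif a == 1 and p == 0:  # false negative
            total += cost_fn
    return total


actual = [1, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1]
cost = total_misclassification_cost(
    actual, predicted, COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE
)
print(cost)  # 60.0
```

Here one false positive and one false negative occur, so the total is 10 + 50.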
Decision node
When a sub-node is divided into additional sub-nodes, it is referred to as a decision node.
Decision tree
Decision tree is a logical rule-based method that presents a hierarchical structure of variables, including the root node, parent nodes, and child nodes.
Deep learning
Deep learning is a type of machine learning that uses complex neural networks with many hidden layers. Breakthroughs in deep learning have contributed to the recent AI boom.
Dendrograms
A dendrogram is a tree-like diagram that shows the hierarchical relationship between clusters in hierarchical clustering. The height of each branch represents the distance between clusters at that level.
Density plot
A density plot is a graphical representation of the distribution of continuous data, providing an estimate of the underlying probability density function, often using smoothed curves.
Descriptive statistics
Descriptive statistics is a branch of statistics that involves the summarization and presentation of data to provide a clear and concise understanding of its main characteristics, such as measures of central tendency, dispersion, and distributions.
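For illustration, the measures mentioned above can be computed with Python's standard `statistics` module. The sample values are made up for the example.

```python
import statistics

# Made-up sample used only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(data)      # central tendency
median = statistics.median(data)  # central tendency, robust to outliers
spread = statistics.pstdev(data)  # dispersion (population standard deviation)
value_range = max(data) - min(data)

print(mean)    # 5.0
print(median)  # 4.5
print(spread)
print(value_range)  # 7.0
```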
Dimensionality reduction
Dimensionality reduction transforms high-dimensional data into lower-dimensional data while retaining as much of the original variation as possible. With principal component analysis (PCA), once the principal components have been identified, the data are projected onto them; the number of principal components retained determines the dimensionality of the new data.
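The following is a minimal sketch of reducing data to a single dimension. It approximates only the first principal component using power iteration on the sample covariance matrix, which is a simplification chosen here to avoid external libraries; in practice a library PCA routine would normally be used. The function name and the toy data are hypothetical, and the sketch assumes the data have nonzero variance.

```python
import math


def first_principal_component(rows, iterations=100):
    """Approximate the first principal component and project the data onto it."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance and normalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # One score per row: the lower-dimensional representation.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy two-dimensional data lying on the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
component, scores = first_principal_component(rows)
print(component)
print(scores)
```

Each two-dimensional row is replaced by a single score along the direction of greatest variance.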
Directed Acyclic Graph (DAG)
The graphical structure of a Bayesian network is represented by a DAG, which is a directed graph without cycles (no directed path that starts and ends at the same node). The absence of cycles ensures that the network does not contain feedback loops and allows for efficient probabilistic inference.
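One way to check the no-cycles property is a topological sort: if every node can be removed in order of having no remaining incoming edges, the graph is acyclic. The sketch below uses Kahn's algorithm; the function name `is_dag` and the small rain/sprinkler network are hypothetical examples.

```python
from collections import deque


def is_dag(nodes, edges):
    """Return True if the directed graph has no cycles (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Start from nodes with no incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a cycle never reach indegree zero, so they are never visited.
    return visited == len(nodes)


nodes = ["Rain", "Sprinkler", "WetGrass"]
edges = [("Rain", "Sprinkler"), ("Rain", "WetGrass"), ("Sprinkler", "WetGrass")]
print(is_dag(nodes, edges))                           # True
print(is_dag(nodes, edges + [("WetGrass", "Rain")]))  # False
```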
Distance or similarity measures
A distance or similarity measure quantifies the dissimilarity or similarity between pairs of objects: a smaller distance, or a larger similarity, indicates that two objects are closer to each other. These measures are used to determine the proximity between objects and form the basis for clustering algorithms. Common measures include Euclidean distance, Manhattan distance, and cosine similarity.
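As a brief sketch, the three measures named above can be written in a few lines of plain Python. The function names and the sample points are illustrative, and the cosine function assumes neither vector is all zeros.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


p = [1.0, 2.0]
q = [4.0, 6.0]
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
print(cosine_similarity(p, q))
```

Note that the first two are distances (smaller means more alike), while cosine similarity runs the other way (larger means more alike).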

