We live in a Big Data world, one in which everything is quantified. As a result, customer-centric professionals (e.g., in customer experience and customer success) are increasingly using the practice of data science to extract value from these data. As the field of data science evolves, more terms are being used to define the process by which value is extracted from data. Consequently, it is not surprising that the less data-savvy customer professional can get lost in the esoteric terminology.
I believe that a good first step for customer-centric professionals to leverage customer data is to understand the terms data professionals use when extracting insights from their data. I put together a glossary of terms that you will commonly hear when data professionals discuss data science. Bookmark this page as I will be continually updating it with additional data science terms and definitions. If you have suggestions for terms you want defined, please leave them in the comments section.
An algorithm is a self-contained sequence of specified actions performed to solve a mathematical problem. Algorithms perform calculation, data processing, and/or objective tasks.
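As a concrete illustration, here is a minimal sketch of one classic algorithm, Euclid's method for finding the greatest common divisor of two whole numbers. The function name `greatest_common_divisor` is chosen here for illustration only.

```python
def greatest_common_divisor(a, b):
    """Euclid's algorithm: repeat a fixed sequence of steps until the remainder is zero."""
    while b != 0:
        # Replace the pair (a, b) with (b, remainder of a divided by b).
        a, b = b, a % b
    return abs(a)

result = greatest_common_divisor(48, 18)
print(result)
```

Each pass through the loop is one of the "specified actions" in the definition: the same steps, applied in the same order, always lead to the answer.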
Artificial Intelligence (AI)
Artificial Intelligence is a field in computer science that focuses on developing computer systems to perform tasks that usually require human intelligence, including visual perception, speech recognition, decision-making, and translation between languages.
In a data warehouse, atomic data is the lowest level of detail. In business, for example, the renewal of an individual service contract is recorded as a piece of atomic data.
Behavioral analytics focuses on observing how consumers behave while interacting with a product or service to help us predict what they are likely to do in the future.
Intentionally collecting data in a specific way to reveal insights about what is happening now and what has happened in the past. Reports with tables and visualizations can give important insight, and they can also reveal what missing data should be collected for a better picture.
Big Data is a phenomenon that results from the quantification of everything. It is not one thing; it covers six broad areas: 1) the 3 Vs, including the size (i.e., volume), speed (i.e., velocity) and complexity (i.e., variety) of the data and the technology to process it all, 2) analytic approaches, 3) the people who analyze/process the data, 4) data integration, 5) communicating the results and 6) ethical/security/privacy considerations.
Cloud computing is the practice of using a network of remote servers to store, manage, and process data.
Cognitive bias refers to people’s tendency to acquire and process information by filtering it through their own preferences and beliefs. Cognitive biases can lead to errors in decisions and judgment.
Correlation analysis is a statistical technique used to study the strength of the linear relationship between two variables. Depending on the two variables being studied, correlations can be either positive (as one variable increases, the other increases) or negative (as one variable increases, the other decreases). Correlation analysis does not determine causality, but a strong correlation may indicate that a causal relationship is worth investigating.
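To make the idea concrete, here is a minimal sketch that computes the Pearson correlation coefficient, the most common measure of linear relationship, using only the Python standard library. The customer data are made up for illustration, and the function name `pearson_r` is an assumption of this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Return the Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-variation of the two variables around their means.
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)

# Hypothetical data: satisfaction ratings and support tickets for five customers.
satisfaction = [9, 7, 6, 4, 2]
support_tickets = [0, 1, 2, 4, 5]

r = pearson_r(satisfaction, support_tickets)
print(round(r, 2))
```

A value near -1, as in this made-up sample, means that higher satisfaction goes with fewer support tickets. It does not tell us which one drives the other.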
Data mining is the process of locating, munging, and extracting information from large amounts of data.
Data munging is the process of exploring and transforming data into an appropriate form for additional analytics.
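As an illustration of transforming data into an analysis-ready form, the sketch below cleans a few messy records. The field names and values are hypothetical, and real projects often use a library such as pandas for this work; plain Python is used here to keep the example self-contained.

```python
# Hypothetical raw records, as they might arrive from a spreadsheet export.
raw_rows = [
    {"customer": " Acme Corp ", "spend": "1,200.50", "region": "west"},
    {"customer": "Globex", "spend": "", "region": "EAST"},
    {"customer": "initech", "spend": "980", "region": "East "},
]

def clean_row(row):
    """Trim whitespace, standardize casing, and convert spend to a number."""
    spend_text = row["spend"].replace(",", "").strip()
    return {
        "customer": row["customer"].strip().title(),
        # Keep missing spend as None rather than guessing a value.
        "spend": float(spend_text) if spend_text else None,
        "region": row["region"].strip().lower(),
    }

clean_rows = [clean_row(row) for row in raw_rows]
for row in clean_rows:
    print(row)
```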
Data science is a catch-all phrase for extracting insights from data. The goal of data science is to derive empirically based insights to augment and enhance human decisions and algorithms. The skills needed to successfully practice data science revolve around three general areas: 1) subject matter expertise, 2) technology/programming and 3) statistics/math. In data-intensive projects, the application of these three skills helps you ask the right questions, access the right data and analyze the data to answer the questions, respectively.
Deep learning is a class of machine learning algorithms that are modeled after the information processing and communication patterns of the brain. Deep learning uses layers of units or nodes for feature extraction and transformation, with each layer using the output of the previous layer as input. Higher-level features are derived from lower-level features to form a hierarchical representation.
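The layering idea can be sketched in a few lines. The example below passes three input values through two small layers, where the second layer takes the first layer's output as its input. The weights are hand-picked for illustration; a real network would learn them from data, and the helper names `dense_layer` and `relu` are assumptions of this sketch.

```python
def relu(values):
    """A common activation: negative values become zero."""
    return [max(0.0, v) for v in values]

def dense_layer(inputs, weights, biases):
    """One layer: each unit computes a weighted sum of its inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        for unit_weights, bias in zip(weights, biases)
    ]

# Hypothetical, hand-picked weights (a trained network would learn these).
layer1_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
layer1_biases = [0.0, 0.1]
layer2_weights = [[1.0, -1.0]]
layer2_biases = [0.2]

inputs = [1.0, 2.0, 3.0]
hidden = relu(dense_layer(inputs, layer1_weights, layer1_biases))  # lower-level features
output = dense_layer(hidden, layer2_weights, layer2_biases)        # built from the hidden layer
print(hidden, output)
```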
Descriptive analytics is the most basic type of analytics that businesses use to extract insight from their data. Its purpose is to summarize statistical trends in the data, not necessarily to provide predictions.
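A minimal sketch of descriptive analytics, using Python's built-in `statistics` module to summarize a made-up series of monthly contract renewals:

```python
import statistics

# Hypothetical data: contract renewals per month over six months.
monthly_renewals = [120, 135, 128, 150, 142, 160]

summary = {
    "count": len(monthly_renewals),
    "mean": statistics.mean(monthly_renewals),
    "median": statistics.median(monthly_renewals),
    "stdev": statistics.stdev(monthly_renewals),  # sample standard deviation
    "min": min(monthly_renewals),
    "max": max(monthly_renewals),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The summary describes what has already happened. It makes no claim about next month.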
Empirical / Empiricism
Verifiable by means of observation or experiment / the philosophy that all knowledge is derived from experience.
The Empirical Enterprise
Companies that make business decisions based on information that is derived through empiricism.
ETL is short for Extract, Transform and Load. These three functions are combined into a single tool that helps companies extract data from source systems and bring it into a data warehouse.
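The three steps can be sketched with the Python standard library alone. In this illustration, an inline CSV string stands in for the source system and an in-memory SQLite database stands in for the data warehouse; the table and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a source system export (hypothetical data).
SOURCE_CSV = """customer_id,plan,monthly_fee
1,basic,10.00
2,PREMIUM,25.50
3,Basic,10.00
"""

def extract(csv_text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: convert types and standardize plan names."""
    return [
        (int(row["customer_id"]), row["plan"].strip().lower(), float(row["monthly_fee"]))
        for row in rows
    ]

def load(records, connection):
    """Load: write the cleaned records into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS subscriptions "
        "(customer_id INTEGER PRIMARY KEY, plan TEXT, monthly_fee REAL)"
    )
    connection.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", records)
    connection.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(SOURCE_CSV)), warehouse)

plan_counts = dict(
    warehouse.execute("SELECT plan, COUNT(*) FROM subscriptions GROUP BY plan")
)
print(plan_counts)
```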
General Data Protection Regulation (GDPR)
The GDPR was designed “to harmonize data privacy laws across Europe, to protect and empower all EU citizens data privacy and to reshape the way organizations across the region approach data privacy.” The regulation was adopted on 27 April 2016 and became enforceable on 25 May 2018.
Internet of Things (IoT)
The Internet of Things is the interconnection of physical devices that are embedded with electronics, software, sensors, actuators, and network connectivity, which enable them to send and receive data.
Machine learning uses statistics/math to allow computers to find hidden insights (i.e., make predictions) without being explicitly programmed where to look. Iterative in nature, machine learning algorithms continually learn from data. The more data they ingest, the better they get at making predictions. Based on math, statistics and probability, algorithms find connections among variables that help optimize important organizational outcomes.
As the amount of data continues to grow, businesses will be leveraging the power of machine learning to quickly sift through data to find hidden patterns. Data scientists, who are simply unable to sift through the sheer volume of data manually, adopt machine learning capabilities to quickly uncover insights, for example to identify which customers are at risk of churning and the reasons why.
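To show what "learning from data" can look like, here is a minimal sketch of a churn model: a one-variable logistic regression trained by gradient descent in plain Python. The data, the learning rate, and the number of training steps are all made up for illustration; in practice a data scientist would use an established library and far more data.

```python
import math

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, steps=2000):
    """Iteratively adjust weight and bias to reduce prediction error."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias

# Hypothetical history: support tickets filed, and whether the customer churned (1) or not (0).
tickets = [0, 1, 2, 5, 6, 8]
churned = [0, 0, 0, 1, 1, 1]

weight, bias = train_logistic(tickets, churned)

def churn_probability(ticket_count):
    return sigmoid(weight * ticket_count + bias)

print(round(churn_probability(0), 3), round(churn_probability(8), 3))
```

Nobody told the model that many tickets signal churn risk. It found that connection in the data through repeated small corrections.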
Metadata is a set of data that describes other data.
The purpose of predictive analytics is to predict the future or to estimate data that we don’t have.
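One simple way to do this, shown here as a sketch rather than a recommended method, is to fit a straight line to past observations with ordinary least squares and extend it one step forward. The monthly ticket counts are made up, and the function name `fit_line` is an assumption of this example.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: support tickets received in months 1 through 5.
months = [1, 2, 3, 4, 5]
tickets = [12, 15, 19, 22, 27]

slope, intercept = fit_line(months, tickets)
forecast_month_6 = slope * 6 + intercept  # a value we have not observed yet
print(round(forecast_month_6, 1))
```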
Prescriptive analytics 1) helps us determine what decision we need to make to take advantage of a future opportunity or mitigate a future risk and 2) shows the implications of each decision option. You can think of prescriptive analytics as a way of applying human judgement to the results of descriptive and predictive analytics. Human judgement takes the form of business rules, and these rules can be codified into the analytics process.
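Codifying business rules can be as simple as the sketch below, which turns a predicted churn probability into a recommended action. The thresholds, contract values, and actions are hypothetical; a real business would set them from its own judgement and data.

```python
def recommend_action(churn_probability, annual_contract_value):
    """Apply hypothetical business rules to a predicted churn probability."""
    if churn_probability >= 0.7 and annual_contract_value >= 50_000:
        return "assign a customer success manager"
    if churn_probability >= 0.7:
        return "send a retention offer"
    if churn_probability >= 0.4:
        return "schedule a check-in call"
    return "no action needed"

# Hypothetical customers: (predicted churn probability, annual contract value).
customers = {
    "Acme Corp": (0.85, 120_000),
    "Globex": (0.75, 8_000),
    "Initech": (0.45, 30_000),
    "Umbrella": (0.10, 60_000),
}

actions = {
    name: recommend_action(probability, value)
    for name, (probability, value) in customers.items()
}
for name, action in actions.items():
    print(f"{name}: {action}")
```

The prediction says what is likely to happen. The rules say what to do about it.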
The scientific method is a body of techniques for objectively investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. The scientific method includes the collection of empirical evidence, subject to specific principles of reasoning. Through trial and error, the scientific method helps us uncover the reasons why variables are related to each other and the underlying processes that drive the observed relationships.
Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. The American Statistical Association defines statistics as “the science of learning from data, and of measuring, controlling, and communicating uncertainty.”
Don’t forget to bookmark this page as I will be continually updating it with new terms and definitions.