1.1 Data Analytics Overview
- Definition: The process of examining raw data to find patterns, draw conclusions, and support decision-making.
- Importance:
  - Informed Decisions: Moves from “gut feeling” to data-driven choices.
  - Efficiency: Identifies bottlenecks in business processes.
  - Customer Insight: Reveals customer behavior and preferences.
1.2 Types of Data Analytics
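The first four types are often viewed as a maturity model (from simple to complex); visual analytics is usually treated as a way of presenting the others rather than as a fifth stage.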
| Type | Question Answered | Focus |
| --- | --- | --- |
| Descriptive | What happened? | Historical data, reports, dashboards. |
| Diagnostic | Why did it happen? | Finding root causes, data drilling/mining. |
| Predictive | What will happen? | Forecasting, trends, machine learning. |
| Prescriptive | How can we make it happen? | Optimization, simulation, “what-if” analysis. |
| Visual | What does the data look like? | Graphs, charts, interactive storytelling. |
1.3 Life Cycle & Quality
- Data Analytics Life Cycle:
  - Discovery: Define the business objectives.
  - Preparation: Clean and transform the data.
  - Model Planning: Choose algorithms.
  - Model Building: Build and run the models.
  - Communicate Results: Present findings, typically through visualization.
  - Operationalize: Deploy the model.
- Quality vs. Quantity: More data is not always better. Quality (accuracy, completeness, consistency) beats quantity (volume) when the extra volume is “noisy” or biased.
- Measurement: Assigning numbers to observations (Scales: Nominal, Ordinal, Interval, Ratio).
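The three quality dimensions named above (accuracy, completeness, consistency) can be checked with very little code. The following is only a minimal sketch in plain Python: the records, the field names, and the plausible age range of 0 to 120 are all hypothetical, chosen for illustration.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "country": "IN"},
    {"id": 2, "age": None, "country": "IN"},   # missing age
    {"id": 3, "age": 29, "country": "US"},
    {"id": 3, "age": 29, "country": "US"},     # duplicate id
    {"id": 4, "age": 210, "country": "US"},    # implausible age
]

def completeness(rows, field):
    """Share of rows where `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def duplicate_ids(rows):
    """Ids that occur more than once (a consistency problem)."""
    seen, dupes = set(), set()
    for r in rows:
        if r["id"] in seen:
            dupes.add(r["id"])
        seen.add(r["id"])
    return dupes

def out_of_range(rows, field, low, high):
    """Rows whose `field` is present but outside [low, high] (an accuracy check)."""
    return [r for r in rows
            if r[field] is not None and not (low <= r[field] <= high)]

age_completeness = completeness(records, "age")     # 4 of 5 rows have an age
dupes = duplicate_ids(records)                      # ids seen more than once
bad_ages = out_of_range(records, "age", 0, 120)     # assumed plausible range
```

Checks like these are usually run during the Preparation stage, before any model is planned.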
1.4 Data Types & Statistics
- Data Types:
  - Qualitative (Categorical): Nominal (Labels, e.g., Color) and Ordinal (Ordered, e.g., Ratings).
  - Quantitative (Numerical): Discrete (Counts, e.g., 5 people) and Continuous (Measurements, e.g., 5.5 kg).
- Measures of Central Tendency:
  - Mean: Average (sum of the values divided by their count).
  - Median: Middle value of the sorted data.
  - Mode: Most frequent value.
- Measures of Dispersion:
  - Range: Max – Min.
  - Variance: Average of squared differences from the Mean (for a sample, divide by n − 1 instead of n).
  - Standard Deviation: Square root of variance; indicates how spread out data is.
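As a worked illustration of the measures above, here is a short sketch using Python's standard `statistics` module. The data set is made up for the example; the population versions (`pvariance`, `pstdev`) divide by n, while `variance` divides by n − 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

# Central tendency
mean = statistics.mean(data)       # 40 / 8 = 5
median = statistics.median(data)   # middle of sorted data: (4 + 5) / 2 = 4.5
mode = statistics.mode(data)       # 4 appears three times

# Dispersion
data_range = max(data) - min(data)            # 9 - 2 = 7
pop_variance = statistics.pvariance(data)     # 32 / 8 = 4 (divide by n)
pop_std = statistics.pstdev(data)             # sqrt(4) = 2
sample_variance = statistics.variance(data)   # 32 / 7 (divide by n - 1)
```

The squared differences from the mean are 9, 1, 1, 1, 0, 0, 4, 16, which sum to 32; that sum is the numerator in both variance figures.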
1.5 Sampling & Probability Concepts
- Sampling Funnel: The process of narrowing down a Population (the whole group) to a Sample (the subgroup we actually study).
- Central Limit Theorem (CLT): States that when many independent samples of sufficiently large size (a common rule of thumb is n ≥ 30) are drawn from a population with finite variance, the means of those samples follow an approximately Normal Distribution (Bell Curve), whatever the shape of the population.
- Confidence Interval (CI): A range of values, computed from a sample, that is expected to contain the true population parameter at a stated confidence level (e.g., 95%).
- Sampling Variation: The natural difference between results from different samples taken from the same population.
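The ideas above can be tried out with a small simulation. This is a sketch under stated assumptions, not a rigorous demonstration: the population is a hypothetical skewed (exponential) one, the sample size of 50 and the 2,000 repeats are arbitrary choices, and the confidence interval uses the normal critical value 1.96 rather than a t-value.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: exponential, so clearly not bell-shaped.
population = [random.expovariate(1.0) for _ in range(20_000)]

def sample_means(pop, n, repeats):
    """Draw `repeats` random samples of size n and return each sample's mean."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(repeats)]

n = 50
means = sample_means(population, n, repeats=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# CLT: the sample means cluster around the population mean, and their
# spread (the standard error) is close to pop_sd / sqrt(n).
mean_of_means = statistics.fmean(means)
spread_of_means = statistics.stdev(means)
expected_spread = pop_sd / n ** 0.5

# Sampling variation: two samples from the same population rarely agree exactly.
first_mean, second_mean = means[0], means[1]

# Approximate 95% confidence interval for the population mean from ONE sample.
sample = random.sample(population, n)
m = statistics.fmean(sample)
se = statistics.stdev(sample) / n ** 0.5
ci_low, ci_high = m - 1.96 * se, m + 1.96 * se
```

Plotting `means` as a histogram should show a roughly bell-shaped curve even though the population itself is strongly skewed.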
Quick Recall Keywords for Exams:
- GIGO: “Garbage In, Garbage Out” (referring to Data Quality).
- Bell Curve: Visual representation of the Normal Distribution (CLT).
- Root Cause: The goal of Diagnostic Analytics.
- Outlier: A data point that falls far outside the normal range (affects the Mean heavily).
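To see why an outlier affects the Mean heavily while the Median barely moves, consider this small sketch. The salary figures are hypothetical.

```python
import statistics

salaries = [30, 32, 35, 36, 38]      # hypothetical values, in thousands
with_outlier = salaries + [400]      # one extreme value added

mean_before = statistics.mean(salaries)          # 171 / 5 = 34.2
median_before = statistics.median(salaries)      # 35

mean_after = statistics.mean(with_outlier)       # 571 / 6, roughly 95.2
median_after = statistics.median(with_outlier)   # (35 + 36) / 2 = 35.5
```

A single extreme value nearly triples the mean here, while the median shifts by only 0.5.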




