17 IMPORTANT TERMS TO KNOW BEFORE GETTING INTO DATA SCIENCE/ANALYTICS
1. Data Lake
A large storage system that holds raw, unprocessed data from multiple sources in any format (structured, semi‑structured, unstructured).
2. Data Warehouse
A centralized storage system that holds cleaned, structured, and organized data specifically prepared for analytics and reporting.
3. ETL (Extract, Transform, Load)
A process in which data is extracted from source systems, transformed (cleaned, validated, and reshaped) into a consistent format, and loaded into a data warehouse or database.
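The three ETL steps can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions: the CSV text and the `orders` table are hypothetical, and an in-memory SQLite database stands in for the warehouse. Real ETL jobs read from production systems and load into dedicated warehouse platforms.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: CSV text with messy names and one bad amount
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,n/a
3,carol,7.25
"""

def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, parse amounts, and drop rows that fail parsing."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall())
# [('Alice', 10.5), ('Carol', 7.25)]
```

Keeping each step in its own function mirrors how ETL is usually reasoned about: each stage can be tested and swapped independently.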
4. Data Mining
The process of discovering patterns, correlations, and insights from large datasets using statistics and machine learning.
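One classic data-mining pattern is "items that are often bought together". The sketch below is a brute-force version of the support counting used in association-rule mining (not the full Apriori algorithm); the shopping baskets and the `frequent_pairs` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction)
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear in at least `min_support` of all baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair a canonical order, e.g. ('bread', 'milk')
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(transactions, min_support=0.6))
# {('bread', 'milk'): 0.6, ('butter', 'milk'): 0.6}
```

Counting every pair is fine for a toy example; on large datasets, mining algorithms prune candidates so they never count pairs that cannot be frequent.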
5. Machine Learning (ML)
A field of AI in which systems learn patterns from data and use them to make predictions or decisions, instead of being explicitly programmed with rules for each case.
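A minimal sketch of "learning from data": simple linear regression fitted by ordinary least squares. The hours/scores data is hypothetical. The point is that the slope and intercept are computed from examples rather than written by hand.

```python
# Hypothetical training data: hours studied -> exam score
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

def fit_line(xs, ys):
    """Learn a slope and intercept from the data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(hours, scores)
# Nobody hard-coded this rule; the numbers came from the data
print(round(slope, 2), round(intercept, 2))   # 4.1 47.7
print(round(slope * 6.0 + intercept, 1))       # predicted score for 6 hours: 72.3
```

Libraries such as scikit-learn provide this and far more sophisticated learners, but the principle of estimating parameters from examples is the same.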
6. Artificial Intelligence (AI)
The ability of machines to perform tasks that normally require human intelligence, such as reasoning, learning, and problem-solving.
7. Big Data
Datasets so large, fast-moving, or varied that traditional tools cannot store or process them efficiently. Big Data is commonly described by the "5 Vs": Volume, Velocity, Variety, Veracity, and Value.
8. Data Science (DS)
A discipline that combines statistics, programming, and domain knowledge to extract meaningful insights from data.
9. Data Visualization
The representation of data through graphs, charts, dashboards, and plots to make information easy to understand.
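At its core, a chart maps values to visual properties such as bar length. Day-to-day work usually relies on libraries like matplotlib or on BI dashboards; to stay dependency-free, this sketch draws a plain-text bar chart instead. The `text_bar_chart` helper and the sales figures are hypothetical.

```python
def text_bar_chart(data, width=20):
    """Render label/value pairs as horizontal bars scaled to the largest value."""
    peak = max(data.values())
    rows = []
    for label, value in data.items():
        # Bar length is proportional to the value: the largest gets `width` marks
        bar = "#" * round(value / peak * width)
        rows.append(f"{label:<8}{bar} {value}")
    return "\n".join(rows)

# Hypothetical monthly sales figures
monthly_sales = {"Jan": 120, "Feb": 90, "Mar": 150}
chart = text_bar_chart(monthly_sales)
print(chart)
```

Even in this crude form, the dip in February is obvious at a glance, which is the whole point of visualization.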
10. Data Governance
A set of policies and procedures ensuring data quality, security, privacy, and compliance across an organization.
11. Data Pipelines
Automated flows that move, transform, and process data from one system to another reliably and continuously.
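A pipeline is a chain of stages where the output of one step becomes the input of the next. This minimal sketch uses Python generators so each record flows through every stage as it arrives; the stage names and the sensor data are hypothetical. Production pipelines add scheduling, retries, and monitoring on top of this idea.

```python
from typing import Iterable, Iterator

def read_source(lines: Iterable[str]) -> Iterator[str]:
    """Stage 1: pull raw lines from a source (here, any iterable of strings)."""
    for line in lines:
        yield line.strip()

def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Stage 2: turn 'sensor,celsius' text into records, skipping blank lines."""
    for line in lines:
        if not line:
            continue
        sensor, celsius = line.split(",")
        yield {"sensor": sensor, "celsius": float(celsius)}

def add_fahrenheit(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 3: enrich each record with a derived field."""
    for rec in records:
        yield {**rec, "fahrenheit": rec["celsius"] * 9 / 5 + 32}

def run_pipeline(lines):
    # Stages are chained; generators process one record at a time
    return list(add_fahrenheit(parse(read_source(lines))))

raw = ["s1,20\n", "\n", "s2,100\n"]
for record in run_pipeline(raw):
    print(record)
```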
12. API (Application Programming Interface)
A set of rules that allows software systems to communicate and share data with each other.
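To make the idea concrete, this sketch runs a tiny web API and a client in the same Python process using only the standard library. The `/products/<id>` route, the `ProductAPI` handler, and the product data are hypothetical. The client knows nothing about the server's code; it relies only on the agreed rules (the URL format and the JSON response).

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical data that the "server" system exposes
PRODUCTS = {"1": {"name": "keyboard", "price": 25.0}}

class ProductAPI(BaseHTTPRequestHandler):
    """The API contract: GET /products/<id> returns that product as JSON."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "products" and parts[1] in PRODUCTS:
            status, body = 200, PRODUCTS[parts[1]]
        else:
            status, body = 404, {"error": "not found"}
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

# Port 0 asks the operating system for any free port
server = HTTPServer(("127.0.0.1", 0), ProductAPI)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "client" only needs the URL and the response format
url = f"http://127.0.0.1:{server.server_address[1]}/products/1"
with urlopen(url) as response:
    product = json.loads(response.read())
print(product)  # {'name': 'keyboard', 'price': 25.0}

server.shutdown()
server.server_close()
```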
13. Cloud Computing
Delivering computing services, such as servers, storage, databases, and ML tools, over the internet instead of on local machines.
14. Model (in ML/AI)
A mathematical system that has learned patterns from data and can make predictions or decisions.
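A model is the object that keeps what was learned and reuses it on new inputs. This sketch uses a nearest-centroid classifier (chosen here only because it is short): "training" stores one average point per class, and "prediction" picks the closest one. The class name and the data points are hypothetical.

```python
from math import dist

class NearestCentroid:
    """A tiny classifier that learns one average point (centroid) per class."""

    def fit(self, points, labels):
        sums, counts = {}, {}
        for (x, y), label in zip(points, labels):
            sx, sy = sums.get(label, (0.0, 0.0))
            sums[label] = (sx + x, sy + y)
            counts[label] = counts.get(label, 0) + 1
        # The learned "pattern" is nothing more than these centroids
        self.centroids = {
            lab: (sx / counts[lab], sy / counts[lab]) for lab, (sx, sy) in sums.items()
        }
        return self

    def predict(self, point):
        # Assign the class whose centroid is closest (Euclidean distance)
        return min(self.centroids, key=lambda lab: dist(point, self.centroids[lab]))

model = NearestCentroid().fit(
    [(1, 1), (2, 1), (8, 9), (9, 8)], ["small", "small", "large", "large"]
)
print(model.centroids)        # {'small': (1.5, 1.0), 'large': (8.5, 8.5)}
print(model.predict((2, 2)))  # small
```

Once fitted, the model can be saved and reused without the original training data, which is what "deploying a model" usually means.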
15. Algorithm
A step-by-step set of rules or instructions used to solve a problem or compute a result.
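Binary search is a classic example of an algorithm: a fixed sequence of steps that finds an item in a sorted list by repeatedly halving the range still to be searched.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1

print(binary_search([2, 5, 8, 12, 16, 23], 12))  # 3
```

Because each step discards half of the remaining items, a list of a million sorted values needs at most about 20 comparisons.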
16. Vector Database
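A database optimized for storing vector embeddings (numeric representations of text, images, or other data) and searching them by similarity. It is commonly used in AI applications such as semantic search and retrieval for LLMs.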
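The core operation of a vector database is "find the stored vectors most similar to this query". This minimal sketch does an exact, brute-force search with cosine similarity over an in-memory dictionary. The three-dimensional embeddings are hypothetical; real embeddings come from an embedding model and typically have hundreds or more dimensions. Production vector databases use approximate nearest-neighbour indexes to stay fast at scale.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """Hypothetical in-memory store: id -> embedding, searched exhaustively."""

    def __init__(self):
        self.items = {}

    def add(self, item_id, embedding):
        self.items[item_id] = embedding

    def search(self, query, k=2):
        scored = [(cosine(query, emb), item_id) for item_id, emb in self.items.items()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item_id for _, item_id in scored[:k]]

store = TinyVectorStore()
store.add("cat", [0.9, 0.1, 0.0])
store.add("kitten", [0.85, 0.15, 0.05])
store.add("car", [0.0, 0.2, 0.95])
print(store.search([1.0, 0.1, 0.0], k=2))  # ['cat', 'kitten']
```

In an LLM retrieval setup, the returned ids would point to text passages that are then added to the model's prompt.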
17. LLM (Large Language Model)
An advanced AI model trained on massive amounts of text, capable of understanding, generating, and reasoning with human language (for example, GPT-based models).