Commit 043ad72

Getting Started with ML Docs
1 parent f252a9a commit 043ad72

11 files changed

+849
-1
lines changed

docs/ai-ml/machine-learning/index.mdx

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/machine-learning/fundamentals/data-splitting.mdx

Whitespace-only changes.

docs/machine-learning/fundamentals/ml-workflow.mdx

Whitespace-only changes.

docs/machine-learning/fundamentals/types-of-learning.mdx

Whitespace-only changes.
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
1+
---
2+
title: "What is Machine Learning (ML)?"
3+
sidebar_label: "What is ML?"
4+
description: "Define Machine Learning, its key characteristics, and how it differs from traditional programming."
5+
tags:
6+
[
7+
machine-learning,
8+
ml,
9+
definition,
10+
ai,
11+
traditional-programming,
12+
data-driven,
13+
algorithms,
14+
]
15+
---
16+
17+
Machine Learning is a subset of Artificial Intelligence (AI) that focuses on building systems capable of learning patterns and making decisions or predictions directly from data, rather than following static, explicitly programmed instructions.
18+
19+
## The Formal Definition
20+
21+
A widely accepted, formal definition of Machine Learning was provided by computer scientist **Tom M. Mitchell** in 1997:
22+
23+
> A computer program is said to learn from **experience ($E$)** with respect to some class of **tasks ($T$)** and **performance measure ($P$)**, if its performance at tasks in $T$, as measured by $P$, improves with experience $E$.
24+
25+
Let's break down this concept with a simple example: **Spam Filtering**.
26+
27+
| Component | Description | Spam Filtering Example |
28+
| :--- | :--- | :--- |
29+
| **Task ($T$)** | The problem the ML system is trying to solve. | Classifying an email as "Spam" or "Not Spam (Ham)". |
30+
| **Experience ($E$)** | The data the ML system uses to train itself. | A large dataset of historical emails labeled as either spam or ham. |
31+
| **Performance ($P$)** | A metric used to evaluate the system's success. | **Accuracy:** The percentage of emails correctly classified. |
32+
33+
:::tip
34+
The core idea is that the program's ability to classify new, unseen emails gets better the more labeled examples it processes. The program *learns* the rules itself.
35+
:::
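
To make $T$, $E$, and $P$ concrete, here is a minimal sketch of a word-scoring spam filter in plain Python. The emails, the scoring scheme, and the function names are all invented for illustration; a real filter would use a proper model and far more data. It shows accuracy on unseen emails rising as the program sees more labeled examples.

```python
# Toy illustration of Mitchell's T / E / P. All emails below are invented.
from collections import Counter

TRAIN = [  # Experience (E): labeled historical emails
    ("win free prize now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("free money claim now", "spam"),
    ("lunch on friday", "ham"),
    ("claim your prize", "spam"),
    ("project report attached", "ham"),
]
TEST = [  # unseen emails, used only to measure Performance (P)
    ("free prize inside", "spam"),
    ("monday project meeting", "ham"),
    ("claim your money", "spam"),
    ("friday lunch agenda", "ham"),
]


def train(examples):
    """Score each word: +1 per spam occurrence, -1 per ham occurrence."""
    scores = Counter()
    for text, label in examples:
        for word in text.split():
            scores[word] += 1 if label == "spam" else -1
    return scores


def classify(scores, text):
    """Task (T): label an email as spam when its words lean spam overall."""
    total = sum(scores[word] for word in text.split())
    return "spam" if total > 0 else "ham"


def accuracy(scores, examples):
    """Performance (P): fraction of emails classified correctly."""
    correct = sum(classify(scores, text) == label for text, label in examples)
    return correct / len(examples)


# More experience -> better performance: 0.5, 0.75, 1.0, 1.0 on this toy data.
for n in (0, 1, 3, 6):
    print(n, accuracy(train(TRAIN[:n]), TEST))
```

Note that nobody wrote a rule such as "emails containing *prize* are spam"; the word scores come entirely from the labeled examples.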
36+
37+
## ML vs. Traditional Programming
38+
39+
This is the most important distinction to grasp when starting out: Machine Learning changes who writes the rules. In traditional software, the programmer writes them; in ML, an algorithm derives them from data.
40+
41+
42+
43+
<Tabs>
44+
<TabItem value="traditional" label="Traditional Programming" default>
45+
46+
In traditional programming, you (the programmer) write explicit **Rules** (algorithms, logic, conditions) that process **Data** to produce an **Answer**.
47+
48+
```mermaid
49+
graph LR
50+
A[Data] --> B(Rules/Program);
51+
B --> C[Answer];
52+
```
53+
54+
**Example (Temperature Conversion):**
55+
You explicitly write the formula: `Fahrenheit = (Celsius * 9/5) + 32`. The computer executes this static rule.
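
As a small sketch, that rule could be written as a Python function (the function name is our own choice for illustration):

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Apply the hand-written rule; nothing here is learned from data."""
    return celsius * 9 / 5 + 32


print(celsius_to_fahrenheit(100))  # 212.0
```

The behavior is fixed by the programmer: showing the function more temperatures never changes the formula.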
56+
57+
</TabItem>
58+
<TabItem value="ml" label="Machine Learning">
59+
60+
In Machine Learning, you feed the system the **Data** and the desired **Answers** (Labels), and the learning algorithm generates the **Rules** (the Model) that map the input to the output.
61+
62+
```mermaid
63+
graph LR
64+
A[Data] --> B(ML Algorithm);
65+
C[Answers/Labels] --> B;
66+
B --> D[Rules/Model];
67+
```
68+
69+
**Example (Predicting House Price):**
70+
You feed it past house data (size, location) along with the final sale prices. The ML algorithm fits a mathematical model (the "Rule") that predicts the price of a *new* house based on its features.
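
A minimal sketch of this idea, assuming a single feature (size) and a straight-line model fitted by ordinary least squares. The numbers are made up (sizes in square meters, prices in thousands) and the function name is hypothetical; real house-price models use many more features.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from (x, y) pairs with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data + Answers go in (invented values) ...
sizes = [50, 80, 100, 120]
prices = [170, 260, 320, 380]

# ... and the Rule (the model parameters) comes out.
slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Use the learned rule on a house the algorithm has never seen."""
    return slope * size + intercept


print(slope, intercept, predict_price(90))
```

Here the programmer never typed the pricing formula; `fit_line` recovered it from the examples.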
71+
72+
</TabItem>
73+
</Tabs>
74+
75+
## Key Characteristics of Machine Learning
76+
77+
* **Data-Driven:** ML models learn from data, so their quality depends on having enough representative, high-quality examples.
78+
* **Automatic Pattern Discovery:** The system discovers patterns and correlations in the data without a programmer hand-coding the rules.
79+
* **Generalization:** A good ML model makes accurate predictions on data it has never seen before, not only on the examples it was trained on.
80+
* **Iterative Process:** Developing an ML model is a cyclical process of data collection, training, evaluation, and refinement.
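
Generalization is usually checked by holding back some data during training and measuring performance only on those held-out examples. The sketch below assumes a tiny invented 2D dataset and a 1-nearest-neighbor classifier, chosen only because it fits in a few lines.

```python
import math

# Invented points: class "A" sits near (0, 0), class "B" near (5, 5).
DATA = [
    ((0.0, 0.2), "A"), ((0.3, 0.1), "A"), ((0.1, 0.4), "A"), ((0.5, 0.5), "A"),
    ((5.0, 5.2), "B"), ((5.3, 4.9), "B"), ((4.8, 5.1), "B"), ((5.5, 5.5), "B"),
]

# Hold out every fourth example; the model never sees these while "training".
test_set = DATA[3::4]
train_set = [example for i, example in enumerate(DATA) if i % 4 != 3]


def predict(point):
    """1-nearest-neighbor: copy the label of the closest training point."""
    nearest = min(train_set, key=lambda example: math.dist(example[0], point))
    return nearest[1]


correct = sum(predict(point) == label for point, label in test_set)
test_accuracy = correct / len(test_set)
print(test_accuracy)
```

Accuracy measured on `test_set` estimates how the model behaves on new data; accuracy on `train_set` would not, because those points were memorized.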
81+
82+
## Where is ML Used?
83+
84+
Machine Learning is the engine behind many everyday technologies:
85+
86+
| Domain | Application | ML Task |
87+
| :--- | :--- | :--- |
88+
| **E-commerce** | Recommendation Systems (e.g., "People who bought X also bought Y") | Classification / Ranking |
89+
| **Healthcare** | Tumor detection in X-rays or MRIs | Image Segmentation / Classification |
90+
| **Finance** | Fraud detection in credit card transactions | Anomaly Detection / Classification |
91+
| **Speech** | Voice assistants (Siri, Alexa) | Speech Recognition / Natural Language Processing (NLP) |
92+
| **Transportation** | Self-driving cars | Computer Vision / Reinforcement Learning |
Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
1+
---
2+
title: Introduction to Machine Learning
3+
sidebar_label: Introduction
4+
description: "A comprehensive introduction to the Machine Learning Tutorial structure, purpose, and key learning outcomes for CodeHarborHub learners."
5+
tags:
6+
[
7+
machine-learning,
8+
ml,
9+
introduction,
10+
ai,
11+
data-science,
12+
tutorial,
13+
codeharborhub,
14+
roadmap,
15+
ml-engineer,
16+
]
17+
---
18+
19+
Welcome to the **CodeHarborHub Machine Learning Tutorial**! This is your official gateway into the transformative world of Artificial Intelligence, data analysis, and predictive modeling.
20+
21+
:::info
22+
Machine Learning is not just about complex algorithms; it is about building systems that learn from data to make decisions or predictions *without* being explicitly programmed for every outcome.
23+
:::
24+
25+
## Why Machine Learning Now?
26+
27+
Demand for ML skills is growing across industries, from finance and healthcare to entertainment and autonomous technology. Learning ML gives you a skill set that is valuable in all of them.
28+
29+
### What You Will Learn
30+
31+
This tutorial provides a complete, structured roadmap to transform you into a proficient ML practitioner. By the end, you will master:
32+
33+
1. **Foundations:** The mathematical and statistical bedrock of ML.
34+
2. **Core Algorithms:** Implementing models like Linear Regression, Support Vector Machines, and K-Means.
35+
3. **Deep Learning:** Building advanced Neural Networks (CNNs, RNNs, Transformers).
36+
4. **Practical Workflow:** Handling real-world data, evaluating models, and deploying solutions (MLOps).
37+
5. **Coding:** Writing efficient, production-ready Python code using libraries like NumPy, Pandas, and Scikit-learn.
38+
39+
## Tutorial Structure Overview
40+
41+
This curriculum is designed as a deep, sequential progression. We move from the absolute basics (Math and Programming) to advanced deployment strategies.
42+
43+
<Tabs>
44+
<TabItem value="foundation" label="Foundations" default>
45+
### The Bedrock of ML
46+
This initial stage ensures you have the solid academic footing required for understanding the algorithms.
47+
48+
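* **Mathematics:** Linear Algebra (Vectors, Matrices, Tensors) and Calculus (Derivatives, Gradients). For instance, the **Gradient Descent** optimization algorithm repeatedly updates each parameter $\theta_j$ by stepping against the partial derivative of the cost function $J(\theta)$, scaled by the learning rate $\alpha$: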
49+
$$
50+
\theta_{j} := \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta)
51+
$$
52+
* **Statistics & Probability:** Concepts like probability distributions, conditional probability, and data visualization.
53+
* **Programming Fundamentals:** Mastering Python, NumPy, and Pandas.
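
The gradient descent update from the Mathematics bullet can be sketched in a few lines of plain Python. This example assumes a straight-line model $\theta_0 + \theta_1 x$ and the mean-squared-error cost $J(\theta) = \frac{1}{2m}\sum_i (\theta_0 + \theta_1 x_i - y_i)^2$; the data, learning rate, and step count are arbitrary choices for illustration.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=2000):
    """Fit y = theta0 + theta1 * x by repeatedly applying the update rule."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: theta_j := theta_j - alpha * dJ/dtheta_j
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Invented data generated from y = 2x + 1, so the ideal answer is (1, 2).
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 4), round(theta1, 4))
```

If `alpha` is set too high the updates overshoot and diverge; if it is too low, convergence takes many more steps.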
54+
</TabItem>
55+
<TabItem value="core_ml" label="ML & Deep Learning Core">
56+
### Algorithms and Architectures
57+
Here, you start building models and diving into neural networks.
58+
59+
* **ML Core:** Supervised, Unsupervised, and Reinforcement Learning paradigms.
60+
* **Data Engineering:** Preprocessing data, handling missing values, and the critical step of **Feature Engineering**.
61+
* **Deep Learning:** Understanding Perceptrons, Backpropagation, and specialized networks (CNNs for images, RNNs/Transformers for text).
62+
</TabItem>
63+
<TabItem value="advanced_ml" label="Advanced & Production">
64+
### Real-World Application
65+
The final stage focuses on specialized fields and moving models into production.
66+
67+
* **NLP:** Tokenization, Embeddings, and Attention Mechanisms for text processing.
68+
* **Explainable AI (XAI):** Tools like LIME and SHAP to interpret complex model decisions.
69+
* **MLOps:** The engineering discipline of deploying, monitoring, and maintaining ML models in a reliable and reproducible way (CI/CD, Model Versioning).
70+
</TabItem>
71+
</Tabs>
72+
73+
---
74+
75+
## The Machine Learning Engineer Role
76+
77+
Understanding the role helps you align your learning goals.
78+
79+
| Aspect | ML Engineer | AI Engineer |
80+
| :--- | :--- | :--- |
81+
| **Primary Focus** | Production-level implementation, deployment, MLOps, scalability, data pipelines. | Research, development of novel AI models (especially Deep Learning/Generative AI), fine-tuning large models. |
82+
| **Core Skills** | Python, Cloud (AWS/Azure/GCP), Docker, CI/CD, Scikit-learn, TensorFlow/PyTorch, **Data Engineering**. | Strong math/research background, Deep Learning frameworks, model optimization, **State-of-the-Art** techniques. |
83+
| **Goal** | Make models reliably work in production at scale. | Create new intelligence capabilities or highly specialized models. |
84+
85+
:::success
86+
This tutorial provides a strong foundation for **both** roles, with a dedicated focus on the practical implementation skills needed for the **ML Engineer** track.
87+
:::
88+
89+
## Types of Machine Learning
90+
91+
```mermaid
92+
mindmap
93+
root((Machine Learning))
94+
Supervised Learning
95+
Regression
96+
Classification
97+
Unsupervised Learning
98+
Clustering
99+
Dimensionality Reduction
100+
Reinforcement Learning
101+
Reward Systems
102+
Agents & Environment
103+
```
104+
105+
<Tabs>
106+
<TabItem value="Supervised Learning" label="Supervised Learning" default>
107+
Learn from labeled data (input → correct output).
108+
Examples:
109+
* House price prediction
110+
* Spam detection
111+
* Disease prediction
112+
</TabItem>
113+
114+
<TabItem value="Unsupervised Learning" label="Unsupervised Learning">
115+
Find hidden patterns in data without labels.
116+
Examples:
117+
* Customer segmentation
118+
* Anomaly detection
119+
* Data clustering
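
As a rough sketch of clustering, here is a one-dimensional k-means loop in plain Python. The spending values, the starting centroids, and the function name are invented for illustration; note that no labels are supplied anywhere.

```python
def k_means_1d(values, centroids, max_iters=100):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment step: index of the closest centroid for each value.
        labels = [
            min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [v for v, label in zip(values, labels) if label == j]
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical monthly spending of six customers.
spending = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, labels = k_means_1d(spending, centroids=[1.0, 11.0])
print(centroids, labels)  # [1.5, 10.0] [0, 0, 0, 1, 1, 1]
```

The algorithm separates low spenders from high spenders on its own, which is the essence of customer segmentation.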
120+
</TabItem>
121+
122+
<TabItem value="Reinforcement Learning" label="Reinforcement Learning">
123+
Learn through rewards and penalties.
124+
Examples:
125+
* Robotics
126+
* Game AI
127+
* Autonomous vehicles
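
A very small sketch of learning from rewards is the epsilon-greedy strategy on a two-armed bandit. To keep the example easy to follow, it assumes each arm pays a fixed, invented reward; real problems have noisy rewards and many states. The function name and parameters are our own.

```python
import random


def run_bandit(payouts, steps=1000, epsilon=0.1, seed=0):
    """Estimate each arm's value from rewards while mostly exploiting the best one."""
    rng = random.Random(seed)
    values = [0.0] * len(payouts)  # current reward estimate per arm
    counts = [0] * len(payouts)    # how often each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(payouts))  # explore: try a random arm
        else:
            arm = max(range(len(payouts)), key=lambda a: values[a])  # exploit
        reward = payouts[arm]  # assumption: fixed payout per arm
        counts[arm] += 1
        # Incremental average of the rewards seen so far for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts


values, counts = run_bandit(payouts=[0.2, 0.8])
best_arm = values.index(max(values))
print(best_arm, values, counts)
```

The agent is never told which arm is better; it discovers this by acting, observing rewards, and updating its estimates.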
128+
</TabItem>
129+
</Tabs>
130+
131+
## Tools You Will Use
132+
133+
<Tabs>
134+
<TabItem value="python" label="Python" default>
135+
Python is the primary language for ML due to its simplicity and rich ecosystem.
136+
</TabItem>
137+
138+
<TabItem value="libraries" label="Libraries">
139+
- NumPy
140+
- Pandas
141+
- Matplotlib / Seaborn
142+
- Scikit-Learn
143+
- TensorFlow
144+
- PyTorch
145+
</TabItem>
146+
147+
<TabItem value="notebooks" label="Notebooks">
148+
Jupyter Notebooks help you write code, visualize results, and document your workflow.
149+
</TabItem>
150+
</Tabs>
151+
152+
## Ready to Begin?
153+
154+
Start by learning the fundamental definition of Machine Learning and the core concepts that define this field.
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
1+
---
2+
title: "ML Engineer vs. AI Engineer"
3+
sidebar_label: "MLE vs. AIE"
4+
description: "A clear comparison of the Machine Learning Engineer, AI Engineer, and Data Scientist roles, focusing on responsibilities, tools, and project scope."
5+
tags:
6+
[
7+
ml-engineer,
8+
ai-engineer,
9+
data-scientist,
10+
comparison,
11+
roles,
12+
career-path,
13+
ai,
14+
ml,
15+
]
16+
---
17+
18+
Job titles in the Artificial Intelligence (AI) domain often overlap, which leads to confusion. While job descriptions vary widely by company, we can define the typical focus area of three core roles: **Data Scientist (DS)**, **Machine Learning Engineer (MLE)**, and **AI Engineer (AIE)**.
19+
20+
21+
## 1. Data Scientist (DS): The Statistician & Modeler
22+
23+
The DS role is primarily focused on **discovery and experimentation**.
24+
25+
* **Goal:** To answer business questions using data, uncover patterns, and build predictive models in an experimental environment (e.g., Jupyter Notebooks).
26+
* **Focus:** **What** is the data telling us, and **why**? They are the experts in statistical modeling and analysis.
27+
* **Key Responsibilities:**
28+
* Statistical analysis and hypothesis testing.
29+
* Developing novel modeling approaches.
30+
* Data visualization and storytelling with data.
31+
* Communicating insights to stakeholders.
32+
* **Tools:** Python, R, Pandas, Scikit-learn, statistical packages.
33+
34+
## 2. Machine Learning Engineer (MLE): The Production Expert
35+
36+
The MLE role is the bridge between the experimental DS model and the production system.
37+
38+
* **Goal:** To turn high-performing models into reliable, scalable services that can serve real users in production.
39+
* **Focus:** **How** do we integrate this model into the product pipeline? They are system-level engineers specializing in ML.
40+
* **Key Responsibilities:**
41+
* Designing and implementing robust data pipelines.
42+
* Deploying models with containerization and orchestration tools (Docker, Kubernetes).
43+
* Monitoring model performance (drift detection, latency).
44+
* Optimizing model code for speed and efficiency.
45+
* **Tools:** Python, Cloud Platforms (AWS, Azure, GCP), Docker, Kubernetes, CI/CD, MLflow/DVC.
46+
47+
## 3. AI Engineer (AIE): The Advanced Modeler & Specialist
48+
49+
The AIE role is often used interchangeably with MLE, but when distinct, it typically focuses on **cutting-edge AI domains**.
50+
51+
* **Goal:** To work with and advance complex, high-impact AI systems, particularly in Deep Learning, NLP, and Computer Vision.
52+
* **Focus:** **What** state-of-the-art model should we use? They specialize in specific deep learning architectures.
53+
* **Key Responsibilities:**
54+
* Implementing and fine-tuning large, complex models (e.g., Transformers, LLMs, Generative Models).
55+
* Optimizing GPU/TPU utilization for training large neural networks.
56+
* Researching and adopting new AI architectures.
57+
* **Tools:** PyTorch, TensorFlow, Hugging Face, distributed training frameworks.
58+
59+
## Comparison Table
60+
61+
| Feature | Data Scientist (DS) | ML Engineer (MLE) | AI Engineer (AIE) |
62+
| :--- | :--- | :--- | :--- |
63+
| **Primary Output** | Insights, Reports, Experimental Models | Production-Ready ML Services/APIs | Specialized Deep Learning Systems |
64+
| **Core Skill** | Statistics, Modeling, Domain Knowledge | Software Engineering, MLOps, System Design | Deep Learning, Advanced AI Architectures |
65+
| **Project Stage** | Exploration & Proof-of-Concept | Deployment & Maintenance | Research & Implementation of Advanced Models |
66+
| **Typical Stack** | Python/R, Jupyter, Scikit-learn | Python, Docker, Kubernetes, Cloud SDKs | Python, PyTorch/TensorFlow, GPUs/TPUs |
67+
68+
:::important
69+
**CodeHarborHub's Focus:** This tutorial is geared towards the **Machine Learning Engineer** skillset. We will give you the *modeling foundation* of a Data Scientist and the *engineering discipline* of a Software Engineer, emphasizing the MLOps skills needed for real-world production.
70+
:::
