Machine Learning Series: Episode 1.75


What does the complete ML workflow look like?

The Complete Journey

1. Prepare Data (labeled examples)

2. Define Model (layers) --> THIS IS WHAT TINYTORCH IS BUILDING

3. Training Loop (forward → loss → backward → optimize)

4. Save Model (frozen weights)

5. Deploy (inference only)
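
In code form, the same journey is just a handful of calls. The names below (prepare_data, train, get_weights) are placeholders for the pieces built in the rest of this post, not real TinyTorch functions:

# High-level skeleton of the workflow (placeholder function names)
data = prepare_data()                            # 1. labeled examples
model = SimpleModel()                            # 2. layers (defined in Step 2)
train(model, data)                               # 3. forward → loss → backward → optimize
save_model(get_weights(model), 'model_v1.pth')   # 4. frozen weights
print(predict(8))                                # 5. inference only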

Step 1: Prepare Sample Data

First, we need labeled data - inputs with correct answers.

# Example: Simple classification problem
# Predicting if a number is even or odd
# (Tensor, Linear, relu, and friends are assumed to come from your TinyTorch build)

# Training data: (input, correct_label)
training_data = [
    (Tensor([2.0]), Tensor([1.0, 0.0])),  # 2 is even → [1, 0]
    (Tensor([3.0]), Tensor([0.0, 1.0])),  # 3 is odd → [0, 1]
    (Tensor([4.0]), Tensor([1.0, 0.0])),  # 4 is even → [1, 0]
    (Tensor([5.0]), Tensor([0.0, 1.0])),  # 5 is odd → [0, 1]
    # ... more examples
]

# What we're teaching:
# - Input: A number
# - Output: [probability_even, probability_odd]
# - Label: The correct answer [1, 0] or [0, 1]

Why this matters: the model learns patterns from these examples, and more (and more varied) examples generally mean better learning.
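
To get more examples without typing them by hand, a small helper can generate them. This is a sketch, assuming Tensor accepts a Python list exactly as in the snippet above:

# Hypothetical helper: build a larger training set of (input, one-hot label) pairs
def make_training_data(numbers):
    data = []
    for n in numbers:
        # even → [1, 0], odd → [0, 1]
        label = [1.0, 0.0] if n % 2 == 0 else [0.0, 1.0]
        data.append((Tensor([float(n)]), Tensor(label)))
    return data

training_data = make_training_data(range(2, 50))  # 48 labeled examples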

Step 2: Define Model Architecture (Layers)

We build the model using layers - the building blocks.

# Simple neural network using TinyTorch
class SimpleModel:
    def __init__(self):
        # Layer 1: Input (1 feature) → Hidden (4 neurons)
        self.layer1 = Linear(in_features=1, out_features=4)
        # Layer 2: Hidden (4 neurons) → Output (2 classes: even/odd)
        self.layer2 = Linear(in_features=4, out_features=2)

    def parameters(self):
        # Expose every learnable weight so the optimizer can update them
        # (needed by SGD(model.parameters(), ...) in Step 3)
        return [self.layer1.weight, self.layer1.bias,
                self.layer2.weight, self.layer2.bias]

    def eval(self):
        # Evaluation mode is a no-op here (no dropout/batch-norm);
        # kept so the later inference code mirrors the usual convention
        pass

    def forward(self, x):
        # Forward pass through the layers
        x = self.layer1(x)       # Linear transform: 1 feature → 4 hidden values
        x = relu(x)              # Activation function (non-linearity)
        x = self.layer2(x)       # Linear transform: 4 → 2 class scores
        return x

model = SimpleModel()

What’s happening:

  • Layers transform data: input → hidden → output
  • Each layer has learnable weights (initialized randomly)
  • The model structure defines how data flows
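
A quick way to see those randomly initialized weights, assuming each Linear layer exposes .weight and .bias tensors with a NumPy-backed .data attribute (the same attributes the save code in Step 5 relies on):

# Peek at the learnable parameters before any training
print(model.layer1.weight.data.shape)  # the 1 → 4 transform (exact layout depends on the implementation)
print(model.layer2.weight.data.shape)  # the 4 → 2 transform
print(model.layer1.weight.data)        # random values - these are what training will adjust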

Step 3: Training Loop - All Concepts Together

This is where autograd, backpropagation, and optimizers work together.

# Initialize optimizer (manages weight updates)
optimizer = SGD(model.parameters(), lr=0.01)

# Loss function (measures "wrongness")
def loss_function(prediction, target):
    # Cross-entropy: penalizes wrong predictions
    return cross_entropy_loss(prediction, target)

# Training loop
for epoch in range(100):  # Train for 100 epochs (full passes over the data)
    total_loss = 0

    for input_data, correct_label in training_data:
        # ============================================
        # FORWARD PASS
        # ============================================
        # Model makes prediction using current weights
        prediction = model.forward(input_data)
        # e.g. prediction = [0.6, 0.4]: a higher score for "even" than for "odd"

        # Compute loss: how wrong is the prediction?
        loss = loss_function(prediction, correct_label)
        # High loss = model is wrong
        # Low loss = model is correct

        # ============================================
        # BACKPROPAGATION (Autograd)
        # ============================================
        # This is where the magic happens!
        loss.backward()

        # What autograd does:
        # 1. Tracks computation graph (input → layer1 → layer2 → loss)
        # 2. Computes gradients for each weight
        # 3. Stores gradients in weight.grad

        # After backward():
        # - layer1.weight.grad = "how much does layer1.weight affect loss?"
        # - layer2.weight.grad = "how much does layer2.weight affect loss?"

        # ============================================
        # OPTIMIZATION
        # ============================================
        # Update weights based on gradients
        optimizer.step()
        # What this does:
        # - Uses gradients to update weights
        # - Moves weights in direction that reduces loss
        # - Formula: weight = weight - learning_rate * gradient

        # Reset gradients for next iteration
        optimizer.zero_grad()

        total_loss += loss.item()

    # Print progress
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss / len(training_data)}")

How the Concepts Connect

1. FORWARD PASS
   input → layers → prediction → loss

   Model uses current weights to make prediction
   Loss measures how wrong it is

2. BACKPROPAGATION (Autograd)
   loss.backward()

   Computes gradients for all weights
   Tracks dependencies through computation graph

3. OPTIMIZATION
   optimizer.step()

   Updates weights using gradients
   Moves toward lower loss

4. REPEAT
   Next iteration: model should be slightly better
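
To watch this cycle converge without any framework at all, here is the same forward → loss → backward → step loop on a one-weight "model" that tries to make w equal 3, with the gradient computed by hand instead of by autograd:

# Minimize loss = (w - 3)^2 by gradient descent
w = 0.0       # randomly "initialized" weight
lr = 0.1      # learning rate

for step in range(10):
    loss = (w - 3.0) ** 2      # forward pass + loss
    grad = 2.0 * (w - 3.0)     # "backward": d(loss)/dw, done by hand
    w = w - lr * grad          # "optimizer.step()"
    print(f"step {step}: w = {w:.3f}, loss = {loss:.3f}")

# w creeps toward 3.0 and the loss shrinks every iteration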

Step 4: Model Evaluation

After training, check if the model learned:

# Test the trained model
test_cases = [
    (Tensor([6.0]), [1.0, 0.0]),  # Should predict even
    (Tensor([7.0]), [0.0, 1.0]),  # Should predict odd
]

model.eval()  # Set to evaluation mode (no training)

for input_data, expected in test_cases:
    with no_grad():  # No gradients needed for inference
        prediction = model.forward(input_data)
        print(f"Input: {input_data.data}, Prediction: {prediction.data}, Expected: {expected}")

Step 5: Save the Trained Model

Once training is complete, save the learned weights:

# Save model weights (the learned parameters)
model_state = {
    'layer1_weight': model.layer1.weight.data,
    'layer1_bias': model.layer1.bias.data,
    'layer2_weight': model.layer2.weight.data,
    'layer2_bias': model.layer2.bias.data,
}

save_model(model_state, 'model_v1.pth')
print("Model saved! Weights are frozen.")

What we’re saving: The learned weights that make the model accurate. These are static - they won’t change.
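
save_model and load_model are treated as given here. If your TinyTorch build doesn't have them, a minimal stand-in using Python's pickle would do the same job (this is a sketch, not the framework's actual API):

import pickle

# Hypothetical stand-in for save_model / load_model
def save_model(state, path):
    with open(path, 'wb') as f:
        pickle.dump(state, f)

def load_model(path):
    with open(path, 'rb') as f:
        return pickle.load(f)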

Step 6: Deploy Model (Inference Only)

In production, we load the saved model and use it for predictions:

# Load the saved weights back into a fresh model
model = SimpleModel()
state = load_model('model_v1.pth')
model.layer1.weight.data = state['layer1_weight']
model.layer1.bias.data = state['layer1_bias']
model.layer2.weight.data = state['layer2_weight']
model.layer2.bias.data = state['layer2_bias']
model.eval()  # Evaluation mode: no training

# Production inference
def predict(input_number):
    """
    Make prediction on new data.
    Model is frozen - no learning happens here.
    """
    input_tensor = Tensor([input_number])

    with no_grad():  # No gradients needed (no training)
        prediction = model.forward(input_tensor)

        # Convert to readable output
        if prediction.data[0] > prediction.data[1]:
            return "even"
        else:
            return "odd"

# Use the deployed model
print(predict(8))   # "even"
print(predict(9))   # "odd"
print(predict(10))  # "even"

Key points:

  • Model weights are frozen (static)
  • Only forward pass (no backward, no gradients)
  • Fast inference, no training overhead
  • Same input → same output (deterministic)
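
A quick sanity check that inference really leaves the model untouched, assuming NumPy-backed .data as in the earlier snippets:

import numpy as np

# Weights before and after inference should be identical
before = model.layer1.weight.data.copy()
predict(12)
assert np.array_equal(before, model.layer1.weight.data), "weights changed during inference!"

# Deterministic: the same input always gives the same answer
print(predict(9) == predict(9))  # True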

The Complete Picture

Training Phase (Dynamic)

Data → Model → Prediction → Loss
                 ↓
        Backward (Autograd)
                 ↓
             Gradients
                 ↓
   Optimizer Updates Weights
                 ↓
    Repeat (model improves)

Deployment Phase (Static)

New Input → Model (frozen weights) → Prediction
           (no learning, no gradients)

Real-World Analogy

Training:

  • Like a student learning: makes mistakes, gets feedback (loss), adjusts (optimizer), improves over time

Deployment:

  • Like a graduated student taking an exam: uses learned knowledge (frozen weights), no learning during the exam, just applying what was learned

Summary: How Everything Connects

  1. Data provides examples with correct answers
  2. Layers define the model structure
  3. Forward pass makes predictions using current weights
  4. Loss measures how wrong predictions are
  5. Backpropagation (Autograd) computes gradients automatically
  6. Optimizer updates weights to reduce loss
  7. Training loop repeats until model is accurate
  8. Save model freezes the learned weights
  9. Deploy uses frozen model for inference only

The framework (TinyTorch/PyTorch) handles backpropagation and optimization (points 5-6 above) automatically, so you can focus on the data, the model architecture, and the training loop. That's the power of ML frameworks - they abstract the complex math so you can build systems.