Machine Learning Series: Episode 1.8
If frameworks handle the math, what does the ML engineer do? Here’s the breakdown:
The ML Engineer’s Role
Frameworks handle the math; ML engineers focus on design, data, and systems.
1. Model Architecture Design
Decide what the model looks like:
# ML Engineer decides:
class MyModel:
    def __init__(self):
        # How many layers?
        self.layer1 = Linear(784, 256)  # Why 256? Not 128? Not 512?
        self.layer2 = Linear(256, 128)  # Why add this layer?
        self.layer3 = Linear(128, 10)   # Why this structure?

    def forward(self, x):
        x = relu(self.layer1(x))  # Why ReLU? Not tanh? Not sigmoid?
        x = dropout(x, p=0.2)     # Why dropout? Why 0.2?
        x = relu(self.layer2(x))
        return self.layer3(x)
Decisions:
- How many layers?
- How many neurons per layer?
- Which activation functions?
- Where to add dropout, batch norm, etc.?
Each of these choices affects model capacity, training stability, and final accuracy.
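One way to make the "why 256, not 512?" question concrete is to count parameters. The sketch below assumes plain fully connected layers with one bias vector each; `count_linear_params` is a hypothetical helper, not part of any framework.

```python
def count_linear_params(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


# The 784 -> 256 -> 128 -> 10 network from the example above
baseline = count_linear_params([784, 256, 128, 10])
# Same depth, but a 512-unit first hidden layer
wider = count_linear_params([784, 512, 128, 10])

print(baseline, wider)  # 235146 468874
```

Doubling the first hidden layer roughly doubles the parameter count, which means more capacity but also more memory, more compute, and more room to overfit.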
2. Data Engineering
Prepare and manage data:
# ML Engineer's job:
# - Collect data
# - Clean data (handle missing values, outliers)
# - Label data (supervised learning)
# - Split data (train/validation/test)
# - Preprocess data (normalize, augment)
# - Create data loaders
def prepare_data():
    # Clean and preprocess
    data = load_raw_data()
    data = remove_outliers(data)
    data = normalize(data)
    # Split appropriately
    train_data, val_data, test_data = split_data(data)
    # Create one loader per split
    train_loader = DataLoader(train_data, batch_size=32)
    val_loader = DataLoader(val_data, batch_size=32)
    test_loader = DataLoader(test_data, batch_size=32)
    return train_loader, val_loader, test_loader
Why it matters: Garbage in, garbage out. Bad data → bad model.
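A detail that is easy to miss when splitting and normalizing: if normalization statistics are computed on the full dataset, information from the validation and test sets leaks into training. A minimal sketch of one common way to avoid that, using only the standard library; the helper names, split fractions, and toy data are assumptions for illustration.

```python
import random
import statistics


def split_data(rows, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of rows and cut it into train/val/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_frac)
    n_test = int(len(rows) * test_frac)
    val = rows[:n_val]
    test = rows[n_val:n_val + n_test]
    train = rows[n_val + n_test:]
    return train, val, test


def fit_normalizer(train_values):
    """Compute mean and standard deviation from the training split only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # avoid dividing by zero
    return mean, std


def normalize(values, mean, std):
    """Shift and scale values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


values = [float(v) for v in range(100)]  # toy one-feature dataset
train, val, test = split_data(values)
mean, std = fit_normalizer(train)
train_n = normalize(train, mean, std)
val_n = normalize(val, mean, std)    # reuses training statistics
test_n = normalize(test, mean, std)  # reuses training statistics
```

The fixed seed makes the split reproducible, so two experiments can be compared on the same validation set.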
3. Loss Function Selection
Choose what to optimize:
# ML Engineer decides:
# - Classification? → Cross-entropy loss
# - Regression? → MSE loss
# - Imbalanced classes? → Weighted loss
# - Multiple objectives? → Custom loss
loss_fn = CrossEntropyLoss() # Why this? Not MSE?
# or
loss_fn = WeightedCrossEntropyLoss(weights=[0.3, 0.7]) # Custom for imbalanced data
Impact: The loss function defines what “good” means.
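To see what class weighting actually does, here is a per-example sketch of weighted cross-entropy in plain Python. It is an illustration of the idea, not any framework's implementation; the function names are hypothetical, and frameworks may reduce or normalize across a batch differently.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits, target, class_weights):
    """Negative log-likelihood of the target class, scaled by its weight."""
    probs = softmax(logits)
    return -class_weights[target] * math.log(probs[target])


weights = [0.3, 0.7]   # same illustrative weights as above
logits = [0.0, 0.0]    # the model is undecided between the two classes
loss_common = weighted_cross_entropy(logits, target=0, class_weights=weights)
loss_rare = weighted_cross_entropy(logits, target=1, class_weights=weights)
```

With identical predictions, a mistake on class 1 costs more than a mistake on class 0, so the optimizer is pushed harder to get the higher-weighted class right.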
4. Hyperparameter Tuning
Tune settings that affect training:
# ML Engineer experiments with:
learning_rate = 0.001 # Try 0.01, 0.0001, etc.
batch_size = 32 # Try 16, 64, 128
num_epochs = 100 # When to stop?
optimizer = Adam() # Or SGD? Or RMSprop?
weight_decay = 0.0001 # Regularization strength
Process: Experiment, measure, iterate.
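"Experiment, measure, iterate" can be as simple as a grid search. The sketch below is a minimal version; `validation_loss` is a made-up toy function standing in for "train a model with these settings and return its validation loss".

```python
import itertools


def validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for 'train a model, return its validation loss'."""
    # Toy bowl-shaped surface with its minimum at lr=0.001, batch_size=64.
    return (learning_rate - 0.001) ** 2 * 1e6 + (batch_size - 64) ** 2 / 1e4


search_space = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "batch_size": [16, 32, 64, 128],
}

results = []
for lr, bs in itertools.product(*search_space.values()):
    results.append({
        "learning_rate": lr,
        "batch_size": bs,
        "val_loss": validation_loss(lr, bs),
    })

best = min(results, key=lambda r: r["val_loss"])
print(best)
```

Grid search costs one full training run per combination (12 here), which is why random search or smarter strategies are often preferred once the search space grows.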
5. Training Loop Orchestration
Structure the training process:
# ML Engineer designs the training loop:
def train_model():
    best_loss = float("inf")
    for epoch in range(num_epochs):
        # Training phase
        model.train()
        for batch in train_loader:
            loss = train_step(batch)
        # Validation phase
        model.eval()
        val_loss = validate()
        # Checkpoint on improvement (the basis for early stopping)
        if val_loss < best_loss:
            best_loss = val_loss
            save_checkpoint()
        # Learning rate scheduling?
        scheduler.step()
        # Logging and monitoring (loss is from the last training batch)
        log_metrics(epoch, loss, val_loss)
Responsibilities:
- When to validate
- When to save checkpoints
- When to stop training
- How to monitor progress
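The "when to stop training" decision is often handled with patience-based early stopping. A minimal sketch, assuming "improvement" means validation loss dropping below the best value seen so far; the class name and the loss values are made up for illustration.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0  # improvement: a checkpoint would be saved here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improving, then drifting upward
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)  # 5 0.6
```

Training stops at epoch 5, three epochs after the best validation loss at epoch 2. The patience value is itself a hyperparameter: too small and noisy validation curves trigger a premature stop, too large and compute is wasted.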
6. Evaluation and Debugging
Assess and fix issues:
# ML Engineer analyzes:
# - Is the model overfitting? (train loss ↓, val loss ↑)
# - Is it underfitting? (both losses high)
# - What mistakes is it making?
# - Where does it fail?
# Debugging techniques:
# - Visualize predictions
# - Analyze confusion matrix
# - Check gradient flow
# - Monitor weight distributions
Skills: Diagnose why a model isn’t working and fix it.
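A confusion matrix is usually the fastest way to answer "what mistakes is it making?". The sketch below builds one in plain Python and picks out the most common error; the labels and predictions are invented for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j] = count of examples with true label i predicted as j."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def most_confused_pair(matrix, labels):
    """Find the off-diagonal cell with the most errors."""
    cells = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j
    ]
    return max(cells)  # (count, true_label, predicted_label)


labels = ["cat", "dog", "fox"]  # made-up labels and predictions
y_true = ["cat", "cat", "dog", "dog", "fox", "fox", "fox"]
y_pred = ["cat", "dog", "dog", "dog", "dog", "dog", "fox"]

matrix = confusion_matrix(y_true, y_pred, labels)
worst = most_confused_pair(matrix, labels)
print(matrix)  # [[1, 1, 0], [0, 2, 0], [0, 2, 1]]
print(worst)   # (2, 'fox', 'dog')
```

Here the model most often mistakes foxes for dogs, which points at a specific fix (more fox examples, better features for that pair) rather than a vague "accuracy is low".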
7. Feature Engineering
Decide what inputs to use:
# ML Engineer creates features:
def extract_features(raw_data):
    features = []
    features.append(raw_data['age'])
    features.append(raw_data['income'] / raw_data['age'])  # Custom feature (assumes age > 0)
    features.extend(one_hot_encode(raw_data['category']))  # Encode categorical, keep the vector flat
    return features
# Or in deep learning:
# - How to preprocess images?
# - What data augmentation to use?
# - How to handle text? (tokenization, embeddings)
Impact: Better features often beat a better algorithm.
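For a version of the feature extractor above that actually runs, here is a self-contained sketch. The category list, the `safe_ratio` helper, and the two-argument `one_hot_encode` are assumptions added for illustration.

```python
def one_hot_encode(value, categories):
    """Return a list with 1.0 at the position of `value`, 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def safe_ratio(numerator, denominator, default=0.0):
    """Divide, falling back to `default` when the denominator is zero."""
    return numerator / denominator if denominator else default


CATEGORIES = ["basic", "plus", "premium"]  # hypothetical category values


def extract_features(row):
    """Turn one raw record into a flat list of floats."""
    features = [float(row["age"]), safe_ratio(row["income"], row["age"])]
    features.extend(one_hot_encode(row["category"], CATEGORIES))
    return features


vector = extract_features({"age": 40, "income": 80000, "category": "plus"})
print(vector)  # [40.0, 2000.0, 0.0, 1.0, 0.0]
```

Raising on unknown categories is a deliberate choice here: silently encoding an unseen value as all zeros tends to hide data problems until production.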
8. Production Deployment
Deploy models to production:
# ML Engineer handles:
# - Model serialization
# - API design
# - Performance optimization
# - Monitoring and logging
# - A/B testing
# - Model versioning
@app.route('/predict', methods=['POST'])
def predict():
    input_data = request.json
    prediction = model.inference(input_data)
    return jsonify(prediction)
Considerations:
- Latency
- Throughput
- Resource usage
- Reliability
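Latency and throughput are easier to reason about once they are measured. A rough sketch using only the standard library; `fake_predict` is a hypothetical stand-in for a real model call, and the numbers it produces depend entirely on the machine.

```python
import statistics
import time


def measure_latency_ms(predict_fn, payload, runs=200):
    """Call predict_fn repeatedly and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms):
    """Report median and tail latency plus implied single-worker throughput."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    mean_ms = statistics.fmean(latencies_ms)
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "requests_per_second": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def fake_predict(payload):
    """Hypothetical stand-in for a real model call."""
    return sum(x * x for x in payload)


report = summarize(measure_latency_ms(fake_predict, list(range(1000))))
print(report)
```

Tail latency (p95, p99) usually matters more than the average, because it is what the slowest users actually experience.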
9. Problem Formulation
Frame business problems as ML problems:
Business: "We want to reduce customer churn"
↓
ML Engineer translates:
- What's the input? (customer data)
- What's the output? (churn probability)
- What's the task type? (classification or regression?)
- How do we measure success? (accuracy? precision? revenue impact?)
Skill: Translating real-world problems into ML tasks.
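The "how do we measure success?" question matters more than it looks. In the sketch below, a model that never predicts churn still reaches 95% accuracy on made-up data where 5 of 100 customers churn, while catching none of them.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up data: 5 of 100 customers churn
y_true = [1] * 5 + [0] * 95
always_stay = [0] * 100  # baseline that never predicts churn
baseline = classification_metrics(y_true, always_stay)
print(baseline)  # {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0}
```

For a churn problem, recall on churners (or a metric tied to retained revenue) is usually closer to what the business cares about than raw accuracy.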
10. Domain Expertise
Apply domain knowledge:
- Healthcare: Understand medical data, regulations
- Finance: Understand risk, regulations
- E-commerce: Understand user behavior, business metrics
Value: Domain knowledge guides better decisions.
The Analogy
Framework (PyTorch/TinyTorch):
- Like a programming language
- Provides tools and abstractions
- Handles low-level details
ML Engineer:
- Like a software engineer
- Uses tools to solve problems
- Makes architectural decisions
- Writes the application logic
What ML Engineers Don’t Do (Usually)
- Manually compute gradients (framework handles it)
- Implement backpropagation from scratch (framework handles it)
- Write optimization algorithms (framework provides optimizers)
- Write low-level tensor operations (framework provides them)
What They Do Instead
- Design model architectures
- Prepare and manage data
- Tune hyperparameters
- Debug training issues
- Deploy to production
- Monitor and maintain systems
- Translate business problems to ML
The Value Proposition
Frameworks: Handle the math automatically
ML Engineers: Make decisions that affect model performance
Success depends on the engineer’s decisions (architecture, data, hyperparameters), not on the framework alone.
Summary
ML engineers are:
- Architects: Design model structures
- Data scientists: Prepare and understand data
- Experimenters: Tune hyperparameters
- Debuggers: Diagnose and fix issues
- Engineers: Deploy and maintain systems
- Problem solvers: Translate business needs to ML solutions
The framework is a tool; the engineer uses it to build solutions. As with a carpenter and a hammer, the skill lies in knowing how to use the tool effectively.