Machine Learning System

Here are some notes on the components of a machine learning system.

1. System foundations

1.1. ML Systems

An ML system contains three components:

Model.
Infrastructure.
Data.

1.2. Differences from traditional software

Traditional software crashes visibly while ML systems can degrade silently without triggering alerts.
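ML systems therefore need more monitoring than traditional software, not less: uptime and error rates are not enough, so input data and prediction quality must be tracked as well.
ML systems do not necessarily fail faster; their failures are often gradual and harder to notice.
ML systems have no built-in error recovery: a wrong prediction looks like a normal response unless something explicitly checks it.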

1.3. The Core Insight of the Bitter Lesson

Systems Over Algorithms: This insight suggests that systems engineering, rather than just algorithmic development, has become the determinant of AI success.

1.4. Historical evolution

1.4.1. Symbolic AI Era

Example problem for STUDENT, an early program that solved algebra word problems written in English:

Problem: "If the number of customers Tom gets is twice the square of 20% of the number of advertisements he runs, and the number of advertisements is 45, what is the number of customers Tom gets?"

Solution approach:

STUDENT would:
Parse the English text
Convert it to algebraic equations
Solve the equation: n = 2(0.2 × 45)²
Provide the answer: 162 customers
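
The last two steps can be checked with a few lines of Python. This covers only the arithmetic, not how STUDENT parsed English, and the variable names are illustrative.

```python
# Arithmetic behind the STUDENT example (variable names are illustrative).
advertisements = 45
fraction = 0.20  # "20% of the number of advertisements"

# "twice the square of 20% of the number of advertisements"
customers = 2 * (fraction * advertisements) ** 2

print(round(customers))  # 162
```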

1.4.2. Expert System Era

STUDENT could only solve problems in one narrow domain (algebra word problems).
Expert systems took a more general approach: encode a human expert's knowledge as explicit IF-THEN rules that the system applies.

Rule Example from MYCIN:
IF
The infection is primary-bacteremia
The site of the culture is one of the sterile sites
The suspected portal of entry is the gastrointestinal tract
THEN
There is suggestive evidence (0.7) that the organism is Bacteroides
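
A rule like this can be stored as data: a set of premises plus a conclusion with a certainty factor. The sketch below is a toy illustration in Python, not MYCIN's actual implementation (MYCIN was written in Lisp). The `Rule` class, the fact keys, and the `apply_rule` helper are all hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    premises: dict      # fact name -> value that must hold
    conclusion: tuple   # (fact name, concluded value)
    certainty: float    # strength of the evidence, e.g. 0.7

def apply_rule(rule, facts):
    """Return (conclusion, certainty) if every premise matches, else None."""
    for name, required in rule.premises.items():
        if facts.get(name) != required:
            return None
    return rule.conclusion, rule.certainty

# The rule above, encoded with hypothetical fact names.
bacteremia_rule = Rule(
    premises={
        "infection": "primary-bacteremia",
        "culture_site_is_sterile": True,
        "portal_of_entry": "gastrointestinal tract",
    },
    conclusion=("organism", "bacteroides"),
    certainty=0.7,
)

patient = {
    "infection": "primary-bacteremia",
    "culture_site_is_sterile": True,
    "portal_of_entry": "gastrointestinal tract",
}
print(apply_rule(bacteremia_rule, patient))
```

If any premise does not match the patient facts, `apply_rule` returns `None` and the rule contributes no evidence.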

1.4.3. Statistical Learning Era

Hand-written yes/no rules are too rigid for ambiguous, real-world inputs, so the field moved to statistical learning, which estimates probabilities from data.
This was the pivotal transition away from rule-based systems toward methods that learn from data.

Rule-based (1980s):
IF contains("viagra") OR contains("winner") THEN spam

Statistical (1990s):
P(spam|word) = (frequency in spam emails) / (total frequency)

Combined using Naive Bayes:
P(spam|email) ∝ P(spam) × ∏ P(word|spam)
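
The Naive Bayes formula can be sketched in plain Python. This is a minimal illustration under stated assumptions: the training emails are made up, add-one (Laplace) smoothing is used so unseen words do not produce a zero probability, and scores are summed in log space instead of multiplying raw probabilities. None of these choices come from the notes above.

```python
import math
from collections import Counter

# Toy training data, made up for illustration.
spam_emails = ["winner winner claim prize", "cheap viagra winner"]
ham_emails = ["meeting agenda attached", "lunch meeting tomorrow"]

def word_counts(emails):
    return Counter(word for email in emails for word in email.split())

spam_counts = word_counts(spam_emails)
ham_counts = word_counts(ham_emails)
vocabulary = set(spam_counts) | set(ham_counts)

def log_score(email, counts, prior):
    """log P(class) + sum of log P(word|class), with add-one smoothing."""
    total = sum(counts.values())
    score = math.log(prior)
    for word in email.split():
        # Counter returns 0 for unseen words; smoothing keeps the log finite.
        score += math.log((counts[word] + 1) / (total + len(vocabulary)))
    return score

def is_spam(email):
    prior_spam = len(spam_emails) / (len(spam_emails) + len(ham_emails))
    spam_score = log_score(email, spam_counts, prior_spam)
    ham_score = log_score(email, ham_counts, 1 - prior_spam)
    return spam_score > ham_score

print(is_spam("winner prize"))      # True
print(is_spam("meeting tomorrow"))  # False
```

Comparing the two class scores is enough to classify, which is why the formula only needs proportionality (∝) and not the exact normalized probability.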

1.4.4. Shallow Learning Era

Characterized by classical machine learning algorithms and heavy reliance on human-engineered features.

=> The focus was on feature engineering and feature selection.

1.4.5. Deep Learning Era

Utilizes multiple hierarchical layers to automatically discover patterns and representations from raw data.

=> The model learns its features and parameters automatically instead of relying on hand-engineered features.

1.5. Core Engineering Challenges

1.5.1. Data Drift & Distribution Shifts

Data Drift: the system ingests large volumes of data, and correct and incorrect data arrive mixed together. For example, for the same question A, two sources may give different answers, B and C => Deciding which one is right takes domain knowledge and careful review. More generally, data drift means the input data gradually moves away from what the model was trained on.
Distribution Shifts: the data distribution changes, for example seasonally => A clothing recommender sees different demand in summer than in winter, so the data it learns from must follow the season.

=> In both cases, the data patterns change over time.
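
One simple way to notice that data patterns have changed is to compare a recent window of a feature against a reference window. The sketch below measures how far the live mean has moved, in units of the reference standard deviation. It is a minimal illustration, not a production drift detector: the temperature feature, the sample values, and the threshold of 2.0 are all assumptions.

```python
import statistics

def mean_shift(reference, live):
    """Shift of the live mean from the reference mean, in reference std devs."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    return abs(statistics.mean(live) - ref_mean) / ref_std

def has_drifted(reference, live, threshold=2.0):
    # The threshold is an arbitrary illustrative value, not a recommended setting.
    return mean_shift(reference, live) > threshold

# Hypothetical feature: daily temperature (°C) seen by a clothing recommender.
summer = [29, 31, 30, 32, 28, 30, 31]
late_summer = [28, 30, 29, 31, 30, 29, 30]
winter = [4, 6, 3, 5, 7, 4, 5]

print(has_drifted(summer, late_summer))  # False: small shift
print(has_drifted(summer, winter))       # True: large shift
```

A check like this only looks at one feature's mean; real systems usually compare full distributions across many features.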

1.5.2. Model Challenges

Infrastructure costs: where to host the model, how to confirm it is learning the right patterns (every training run costs compute time), and how long to train it.
Generalization gap: 99% accuracy in experiments, 75% accuracy in production.

1.5.3. System Challenges

One service for model prediction.
One service for constraints and ad hoc rules.
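
The split between a prediction service and a rules service can be sketched as two functions, with the rules layer having the final say. Everything here is hypothetical: the function names, the fact keys, the fixed scores, and the promotion bonus are placeholders, and a real system would run these as separate services behind an API.

```python
def predict_score(features):
    """Stand-in for the model service: returns a hypothetical relevance score."""
    # A real service would call a trained model; this is a fixed toy formula.
    return 0.8 if features.get("clicked_before") else 0.3

def apply_rules(item, score):
    """Stand-in for the rules service: constraints override the model."""
    if item.get("out_of_stock"):
        return 0.0  # hard constraint: never recommend unavailable items
    if item.get("promoted"):
        return min(1.0, score + 0.1)  # ad hoc business rule
    return score

def recommend_score(item, features):
    return apply_rules(item, predict_score(features))

print(recommend_score({"out_of_stock": True}, {"clicked_before": True}))  # 0.0
```

Keeping the rules outside the model means a constraint can change without retraining.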

1.5.4. Ethical Considerations

Bias: a model may be biased, for example a CV-screening model that picks up on demographic attributes.
Sensitive data.
Black boxes: the model returns an output without exposing the reasoning behind it.

1.6. The Five Engineering Disciplines

Data Engineering: quality assurance, scale management, and detection of data drift and distribution shift.
Training Systems: implement parallel training to speed up optimization, and balance training costs against model quality.
Deployment Infrastructure
Operations and Monitoring
Ethics and Governance

2. ML Systems

2.1. Cloud ML - High computation

Machine learning models are trained and run on powerful cloud servers (like AWS, Azure, Google Cloud, etc.).

Typical workflow:

Data collected from devices → sent to cloud
Model is trained, updated, and sometimes run (inference) in the cloud
Results sent back to devices or applications

Advantages:

Virtually unlimited compute power (GPUs, TPUs)
Easier to scale and manage large models
Centralized data management and updates

Disadvantages:

High latency (data must travel to/from the cloud)
Requires constant internet connection
Privacy concerns (data leaves the device)

☁️ Use Cloud ML when:

You need massive compute power (e.g., training GPT, BERT, or large CNNs)
You’re handling big centralized datasets
You need scalability and orchestration (multiple models, services)
You can tolerate some latency and have constant internet

Examples:

Training an image classification model on millions of images
Running a recommendation system backend
AI SaaS APIs (like ChatGPT, Google Vision API)

2.2. Edge ML (Edge Machine Learning) - Real time

ML models are deployed close to where data is generated — on devices like IoT gateways, routers, smart cameras, or local servers.

Typical workflow:

Model trained in the cloud or locally
Deployed to “edge” devices for real-time inference

Advantages:

Low latency (no cloud round-trip)
Better privacy (data stays local)
Works even with limited connectivity

Disadvantages:

Limited hardware resources
Updating models can be harder
May need optimized models (quantization, pruning)
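
Quantization can be illustrated with a few lines of plain Python. The sketch below maps float weights to 8-bit integers with a single scale factor (symmetric quantization). It is a simplified illustration: the weights are made up, and real toolchains typically add calibration data, per-channel scales, and quantized activations.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]

weights = [0.52, -1.27, 0.03, 0.89]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)  # [52, -127, 3, 89]
```

Each weight now needs 1 byte instead of 4, at the cost of a rounding error of at most half the scale per weight.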

🏠 Use Edge ML when:

You need real-time predictions near the data source
Data is sensitive or large (not practical to send to cloud)
The environment has limited connectivity
You’re doing local aggregation before sending to the cloud

Examples:

Smart factory detecting machine anomalies locally
Traffic camera detecting congestion
Retail edge server analyzing foot traffic

2.3. Mobile ML (On-device ML)

ML models run directly on mobile devices like smartphones and tablets.

Typical workflow:

Model trained in cloud → optimized → deployed to mobile app
Inference happens locally (e.g., image recognition, voice command)

Advantages:

Instant response (low latency)
Offline operation
Better privacy (no cloud data transfer)

Disadvantages:

Limited CPU/GPU and battery power
Smaller models required
Harder to update frequently

Use Mobile ML when:

You want fast, private, and offline inference on users’ phones
Model size is small enough for mobile deployment
You want to enhance user experience without cloud dependency
Battery life is important but manageable

Examples:

Face recognition in iPhone Photos
Google Translate offline mode or voice recognition.

2.4. Tiny ML (Tiny Machine Learning)

Machine learning deployed on ultra-low-power microcontrollers (MCUs) and embedded devices — often with <1MB RAM and no OS.

Typical workflow:

Model trained in cloud
Quantized and compressed
Deployed to MCU (e.g., Arduino, STM32)
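
A back-of-the-envelope size check shows why the quantization step matters on a device with under 1 MB of memory. The parameter count below is a hypothetical example, and the estimate counts only the weights, ignoring activations, runtime overhead, and the fact that many MCUs keep weights in flash and not in RAM.

```python
def model_size_bytes(num_parameters, bytes_per_parameter):
    """Storage needed for the weights alone."""
    return num_parameters * bytes_per_parameter

MEMORY_BUDGET = 1 * 1024 * 1024  # 1 MB budget, matching the figure above
num_parameters = 500_000         # hypothetical small model

float32_size = model_size_bytes(num_parameters, 4)  # 2,000,000 bytes
int8_size = model_size_bytes(num_parameters, 1)     # 500,000 bytes

print(float32_size <= MEMORY_BUDGET)  # False
print(int8_size <= MEMORY_BUDGET)     # True
```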

Advantages:

Extremely low power consumption (can run on batteries for months)
Real-time local inference
Ideal for IoT and sensor applications

Disadvantages:

Very constrained memory and compute
Only simple models (e.g., small CNNs, decision trees)
Hard to debug and update

⚙️ Use Tiny ML when:

Device has very limited memory and power (microcontrollers)
You need always-on, ultra-low power sensing
Connectivity may not exist at all
You want real-time, on-sensor intelligence

Examples:

Predictive maintenance on industrial sensors
Keyword detection (“Hey Alexa”)
Environmental monitoring in remote locations

November 6, 2025