Machine Learning System

Here are some notes on the components of Machine Learning Systems.

1. System foundations

1.1. ML Systems

An ML system contains three components:

  • Model.

  • Infrastructure.

  • Data.

1.2. Differences from traditional software

  1. Traditional software crashes visibly while ML systems can degrade silently without triggering alerts.

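  2. ML systems require more monitoring than traditional software, because a model can keep returning well-formed but wrong outputs after its input data changes.

  3. ML systems do not necessarily fail faster; they often degrade gradually as the data they see drifts away from the training data.

  4. Neither kind of system has built-in error recovery: traditional software handles explicit errors such as exceptions, while an ML model's mistakes are usually plausible-looking predictions that raise no error at all.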
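
Point 1 is why ML systems need monitoring on their outputs, not just on crashes. A minimal sketch of such a check, assuming ground-truth labels arrive some time after each prediction; `AccuracyMonitor`, the window size and the 5-point tolerance are hypothetical choices, not a standard API:

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical monitor: alert when rolling accuracy drops below a baseline."""

    def __init__(self, baseline: float, window: int = 100, tolerance: float = 0.05):
        self.baseline = baseline              # accuracy measured at deployment time
        self.tolerance = tolerance            # allowed drop before alerting
        self.results = deque(maxlen=window)   # most recent correct/incorrect flags

    def record(self, prediction, label) -> None:
        self.results.append(prediction == label)

    def rolling_accuracy(self) -> float:
        # No labelled feedback yet: treat the model as healthy.
        return sum(self.results) / len(self.results) if self.results else 1.0

    def should_alert(self) -> bool:
        # Only judge once the window is full, to avoid noisy early alerts.
        if len(self.results) < self.results.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline=0.95, window=10)
# The model keeps answering without any exception, but 4 of the last 10 answers are wrong.
for pred, label in [(1, 1)] * 6 + [(1, 0)] * 4:
    monitor.record(pred, label)
print(monitor.rolling_accuracy(), monitor.should_alert())  # 0.6 True
```

Nothing in this loop ever crashes; the degradation is only visible because the monitor compares predictions with labels.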

1.3. The Core Insight of the Bitter Lesson

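  • Systems Over Algorithms: Rich Sutton's essay "The Bitter Lesson" (2019) observes that, over decades of AI research, general methods that scale with computation (search and learning) have outperformed approaches built on hand-coded human knowledge. This insight suggests that systems engineering, rather than just algorithmic development, has become a key determinant of AI success.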

1.4. Historical evolution

1.4.1. Symbolic AI Era

  • Example: STUDENT (Daniel Bobrow, 1964) solved algebra word problems stated in English.
Problem: "If the number of customers Tom gets is twice the square of 20% of the number of advertisements he runs, and the number of advertisements is 45, what is the number of customers Tom gets?"
  • Approach: STUDENT would:
1. Parse the English text
2. Convert it to algebraic equations
3. Solve the equation: n = 2(0.2 × 45)²
4. Provide the answer: 162 customers

1.4.2. Expert System Era

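  • Symbolic systems such as STUDENT could only solve problems in one narrow domain (algebra word problems).

  • Expert systems took a different route: they encoded a human expert's knowledge of one specific domain as a large set of IF-THEN rules.

Rule example from MYCIN, a 1970s Stanford expert system for diagnosing bacterial infections: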
IF
  The infection is primary-bacteremia
  AND the site of the culture is one of the sterile sites
  AND the suspected portal of entry is the gastrointestinal tract
THEN
  Found suggestive evidence (0.7) that the infection is bacteroid
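
The rule can be sketched as data plus a check that fires it. This is a minimal Python sketch, not MYCIN's original code: it assumes the MYCIN-style convention that AND-ed premises take the weakest (minimum) certainty, which is then multiplied by the rule's certainty factor (0.7 here), and it simplifies certainties to the range [0, 1]. The fact strings and the `Rule` class are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    premises: list[str]   # all must hold (logical AND)
    conclusion: str
    cf: float             # the rule's certainty factor, e.g. 0.7


def fire(rule: Rule, facts: dict[str, float]) -> Optional[tuple[str, float]]:
    """Return (conclusion, certainty) if every premise is a known fact, else None."""
    if not all(p in facts for p in rule.premises):
        return None
    # AND of premises: take the weakest evidence, then scale by the rule's CF.
    premise_cf = min(facts[p] for p in rule.premises)
    return rule.conclusion, rule.cf * premise_cf


bacteroid_rule = Rule(
    premises=[
        "infection is primary-bacteremia",
        "culture site is sterile",
        "portal of entry is gastrointestinal tract",
    ],
    conclusion="infection is bacteroid",
    cf=0.7,
)

# Facts supplied by the physician, each with a certainty in [0, 1].
facts = {
    "infection is primary-bacteremia": 1.0,
    "culture site is sterile": 1.0,
    "portal of entry is gastrointestinal tract": 1.0,
}
print(fire(bacteroid_rule, facts))  # ('infection is bacteroid', 0.7)
```

If one premise is only half certain (0.5), the conclusion drops to 0.7 × 0.5 = 0.35: the expert's knowledge lives entirely in hand-written rules and numbers.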

1.4.3. Statistical Learning Era

  • Hand-written yes/no rules are brittle: they cannot express uncertainty and break on inputs their authors did not anticipate. Statistical learning addresses this by estimating probabilities from data.

  • This was the pivotal transition away from rule-based systems toward methods that learn from data.

Rule-based (1980s):
IF contains("viagra") OR contains("winner") THEN spam

Statistical (1990s):
P(spam|word) = (frequency of the word in spam emails) / (total frequency of the word)

Combined using Naive Bayes:
P(spam|email) ∝ P(spam) × ∏ P(word|spam)   (product over the words in the email)
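
The two approaches can be contrasted in a few lines. A minimal sketch assuming whitespace tokenization, add-one (Laplace) smoothing for P(word|class), and log-probabilities to avoid numeric underflow; the four training emails are made-up toy data, and "ham" is the usual name for non-spam:

```python
import math
from collections import Counter


def rule_based_is_spam(email: str) -> bool:
    """1980s-style hand-written rule from the example above."""
    text = email.lower()
    return "viagra" in text or "winner" in text


def train_naive_bayes(emails, labels):
    """Count word occurrences per class ("spam" or "ham")."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter(labels)
    for email, label in zip(emails, labels):
        word_counts[label].update(email.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def predict(email, word_counts, class_counts, vocab):
    """Return the class maximizing log P(class) + sum of log P(word | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        score = math.log(class_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in email.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            p_word = (word_counts[label][word] + 1) / (n_words + len(vocab))
            score += math.log(p_word)
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up toy training data.
emails = ["win money now", "claim your free prize now",
          "meeting at noon tomorrow", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(emails, labels)

print(rule_based_is_spam("free money prize"))       # False: no keyword matches
print(predict("free money prize", *model))          # spam
print(predict("project meeting tomorrow", *model))  # ham
```

The hand-written rule misses "free money prize" because neither keyword appears, while the learned model flags it from word statistics alone.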

1.4.4. Shallow Learning Era

  • Characterized by classical machine learning algorithms and heavy reliance on human-engineered features.

=> Success depended on feature engineering: humans designed and selected the features the model learned from.

1.4.5. Deep Learning Era

  • Utilizes multiple hierarchical layers to automatically discover patterns and representations from raw data

=> Features are learned automatically: the network's parameters (weights) are fitted to the raw data instead of being designed by hand.

1.5. Core Engineering Challenges

1.5.1. Data Quality, Data Drift & Distribution Shifts

  • Data quality at scale: large datasets mix correct and incorrect examples for the same problem. For example, for question A, two sources may give conflicting answers B and C; deciding which one is right requires domain knowledge and careful review.

  • Data drift / distribution shift: the data a model sees in production changes over time, for example seasonally. A clothing recommender trained on summer data will see very different purchase patterns in winter, so its data (and the model) must be refreshed.

=> The data patterns change over time, so a model trained once on a static snapshot gradually becomes stale.
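
One common way to detect this kind of drift is to compare a feature's distribution in production against the training data. A minimal sketch using the Population Stability Index (PSI), one of several possible drift metrics; the temperature feature and the samples are hypothetical, and the thresholds in the comments are a widely used rule of thumb rather than a fixed standard:

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Bins are defined on the training range; values outside it
            # are clamped into the first or last bin.
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1
        # A tiny floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature: outdoor temperature (°C) when a customer shops.
summer = [float(t) for t in range(20, 40)]  # training data collected in summer
winter = [float(t) for t in range(0, 20)]   # production traffic in winter

# Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift.
print(psi(summer, summer))         # 0.0
print(psi(summer, winter) > 0.25)  # True
```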

1.5.2. Model Challenges

  • Infrastructure costs: where to host the model, how long training will take, and how to confirm early that the model is learning the right patterns, since every training run costs compute time.

  • Generalization gap: a model can be 99% accurate in experiments but only 75% accurate in production.

1.5.3. System Challenges

  • One service for model prediction.

  • One service for constraints and ad-hoc business rules applied on top of the model's output.
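
A minimal sketch of how the two services might cooperate, reusing the clothing-recommendation example from 1.5.1. The fixed scores stand in for a real model, and the two business rules (item in stock, item matches the season) are hypothetical; in production each function would usually sit behind its own API:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    name: str
    in_stock: bool
    season: str  # "summer", "winter" or "all"


def model_scores(items: list[Item]) -> dict[str, float]:
    """Stand-in for the prediction service: a relevance score per item."""
    # Hypothetical fixed scores; a real service would call the trained model.
    fake = {"wool coat": 0.91, "linen shirt": 0.85, "rain jacket": 0.40, "scarf": 0.30}
    return {item.name: fake.get(item.name, 0.0) for item in items}


def apply_rules(items: list[Item], scores: dict[str, float],
                current_season: str, top_k: int = 2) -> list[str]:
    """Stand-in for the rules service: hard constraints the model knows nothing about."""
    allowed = [
        item for item in items
        if item.in_stock                              # rule 1: never recommend out-of-stock items
        and item.season in (current_season, "all")    # rule 2: match the current season
    ]
    allowed.sort(key=lambda item: scores[item.name], reverse=True)
    return [item.name for item in allowed[:top_k]]


catalog = [
    Item("wool coat", in_stock=False, season="winter"),
    Item("linen shirt", in_stock=True, season="summer"),
    Item("rain jacket", in_stock=True, season="all"),
    Item("scarf", in_stock=True, season="winter"),
]
print(apply_rules(catalog, model_scores(catalog), current_season="winter"))
# ['rain jacket', 'scarf']
```

The model's top pick (the wool coat) never reaches the user because a rule removes it, which is why the rules are kept as a separate, independently editable service.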

1.5.4. Ethical Considerations

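  • Bias: a model may learn demographic bias, e.g., a model that screens CVs (résumés) can pick up bias from the demographic information they contain.

  • Sensitive data: training data may contain personal information that must be protected.

  • Black boxes: models return predictions without explaining the reasoning behind them, which makes their decisions hard to audit.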
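
One simple, partial check for the bias problem is to compare how often the model selects candidates from each group. A minimal sketch with made-up screening decisions; the ratio computed here is known as the disparate impact ratio, and a common heuristic (the US "four-fifths rule") flags values below 0.8. Equal rates alone do not prove a model is fair:

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Fraction of candidates accepted (prediction == 1), per demographic group."""
    accepted, total = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        total[group] += 1
        accepted[group] += pred
    return {g: accepted[g] / total[g] for g in total}


def disparate_impact_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal selection rates."""
    return min(rates.values()) / max(rates.values())


# Hypothetical screening decisions (1 = invite to interview) for two groups.
preds = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
rates = selection_rates(preds, groups)
print(rates)                          # {'A': 0.8, 'B': 0.2}
print(disparate_impact_ratio(rates))  # 0.25
```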

1.6. The Five Engineering Disciplines

  1. Data Engineering: quality assurance, scale management, and detection of data drift and distribution shifts.

  2. Training Systems: implement parallel (distributed) training to speed it up, and balance training cost against model quality.

  3. Deployment Infrastructure

  4. Operations and Monitoring

  5. Ethics and Governance

2. ML Systems

2.1. Cloud ML - High computation

  • Machine learning models are trained and run on powerful cloud servers (like AWS, Azure, Google Cloud, etc.).
  1. Typical workflow:
  • Data collected from devices → sent to cloud

  • Model is trained, updated, and sometimes run (inference) in the cloud

  • Results sent back to devices or applications

  2. Advantages:
  • Virtually unlimited compute power (GPUs, TPUs)

  • Easier to scale and manage large models

  • Centralized data management and updates

  3. Disadvantages:
  • High latency (data must travel to/from the cloud)

  • Requires constant internet connection

  • Privacy concerns (data leaves the device)

  4. ☁️ Use Cloud ML when:
  • You need massive compute power (e.g., training GPT, BERT, or large CNNs)

  • You’re handling big centralized datasets

  • You need scalability and orchestration (multiple models, services)

  • You can tolerate some latency and have constant internet

  5. Examples:
  • Training an image classification model on millions of images

  • Running a recommendation system backend

  • AI SaaS APIs (like ChatGPT, Google Vision API)

2.2. Edge ML (Edge Machine Learning) - Real time

  • ML models are deployed close to where data is generated — on devices like IoT gateways, routers, smart cameras, or local servers.
  1. Typical workflow:
  • Model trained in the cloud or locally

  • Deployed to “edge” devices for real-time inference

  2. Advantages:
  • Low latency (no cloud round-trip)

  • Better privacy (data stays local)

  • Works even with limited connectivity

  3. Disadvantages:
  • Limited hardware resources

  • Updating models can be harder

  • May need optimized models (quantization, pruning)

  4. 🏠 Use Edge ML when:
  • You need real-time predictions near the data source

  • Data is sensitive or large (not practical to send to cloud)

  • The environment has limited connectivity

  • You’re doing local aggregation before sending to the cloud

  5. Examples:
  • Smart factory detecting machine anomalies locally

  • Traffic camera detecting congestion

  • Retail edge server analyzing foot traffic
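
Pruning, listed under the disadvantages above, shrinks a model by removing weights that contribute little. A minimal sketch of unstructured magnitude pruning on a plain list of weights; real toolchains (for example PyTorch's `torch.nn.utils.prune`) operate on tensors and usually fine-tune the model afterwards, and zeroed weights only save memory when they are stored in a sparse format:

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]  # hypothetical weights of one layer
print(magnitude_prune(layer, sparsity=0.5))
# [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```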

2.3. Mobile ML (On-device ML)

  • ML models run directly on mobile devices like smartphones and tablets.
  1. Typical workflow:
  • Model trained in cloud → optimized → deployed to mobile app

  • Inference happens locally (e.g., image recognition, voice command)

  2. Advantages:
  • Instant response (low latency)

  • Offline operation

  • Better privacy (no cloud data transfer)

  3. Disadvantages:
  • Limited CPU/GPU and battery power

  • Smaller models required

  • Harder to update frequently

  4. Use Mobile ML when:
  • You want fast, private, and offline inference on users’ phones

  • Model size is small enough for mobile deployment

  • You want to enhance user experience without cloud dependency

  • Battery life is important but manageable

  5. Examples:
  • Face recognition in iPhone Photos

  • Google Translate offline mode, or on-device voice recognition.
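
The "optimized" step in the mobile workflow commonly includes quantization: storing weights as 8-bit integers instead of 32-bit floats. A minimal sketch of symmetric per-tensor int8 quantization, one of several schemes; the weights and the 250,000-parameter model size are hypothetical. The same arithmetic explains why Tiny ML models must be quantized and compressed to fit in under 1MB of RAM:

```python
from __future__ import annotations


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: each weight w is stored as q with w ≈ q * scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


def model_size_bytes(n_params: int, bytes_per_param: int) -> int:
    return n_params * bytes_per_param


weights = [0.3, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
print(q)  # [38, -127, 32, 0]
# Rounding error per weight is at most scale / 2.
print(max(abs(w - d) for w, d in zip(weights, dequantize(q, scale))) <= scale / 2)  # True

print(model_size_bytes(250_000, 4))  # 1000000 bytes as float32
print(model_size_bytes(250_000, 1))  # 250000 bytes as int8
```

The 4x size reduction is what lets the same model fit on a phone with room to spare, or under a microcontroller's memory budget.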

2.4. Tiny ML (Tiny Machine Learning)

  • Machine learning deployed on ultra-low-power microcontrollers (MCUs) and embedded devices — often with <1MB RAM and no OS.
  1. Typical workflow:
  • Model trained in cloud

  • Quantized and compressed

  • Deployed to MCU (e.g., Arduino, STM32)

  2. Advantages:
  • Extremely low power consumption (can run on batteries for months)

  • Real-time local inference

  • Ideal for IoT and sensor applications

  3. Disadvantages:
  • Very constrained memory and compute

  • Only simple models (e.g., small CNNs, decision trees)

  • Hard to debug and update

  4. ⚙️ Use Tiny ML when:
  • Device has very limited memory and power (microcontrollers)

  • You need always-on, ultra-low power sensing

  • Connectivity may not exist at all

  • You want real-time, on-sensor intelligence

  5. Examples:
  • Predictive maintenance on industrial sensors

  • Keyword detection (“Hey Alexa”)

  • Environmental monitoring in remote locations
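
The "Use ... when" lists of the four paradigms can be condensed into a rough decision helper. A minimal sketch under simplifying assumptions: `Requirements` and `choose_paradigm` are hypothetical names, the checks are ordered from the most to the least constrained device, and the 1024 KB threshold mirrors the <1MB RAM figure given for Tiny ML. Real projects also weigh cost, privacy and model size, and often combine paradigms (for example, training in the cloud and running inference on the edge):

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    ram_kb: int               # memory available on the target device
    runs_on_phone: bool       # inference must happen inside a mobile app
    realtime_near_data: bool  # no time for a round-trip to the cloud
    reliable_internet: bool   # constant connectivity can be assumed


def choose_paradigm(req: Requirements) -> str:
    """Pick where inference should run, from most to least constrained device."""
    if req.ram_kb < 1024:  # microcontroller-class device (< 1MB RAM)
        return "Tiny ML"
    if req.runs_on_phone:
        return "Mobile ML"
    if req.realtime_near_data or not req.reliable_internet:
        return "Edge ML"
    return "Cloud ML"


# Hypothetical requirements for some of the examples listed above.
keyword_sensor = Requirements(ram_kb=256, runs_on_phone=False,
                              realtime_near_data=True, reliable_internet=False)
offline_translate = Requirements(ram_kb=6_000_000, runs_on_phone=True,
                                 realtime_near_data=True, reliable_internet=False)
traffic_camera = Requirements(ram_kb=4_000_000, runs_on_phone=False,
                              realtime_near_data=True, reliable_internet=True)
recommender = Requirements(ram_kb=64_000_000, runs_on_phone=False,
                           realtime_near_data=False, reliable_internet=True)
print(choose_paradigm(keyword_sensor))     # Tiny ML
print(choose_paradigm(offline_translate))  # Mobile ML
print(choose_paradigm(traffic_camera))     # Edge ML
print(choose_paradigm(recommender))        # Cloud ML
```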

November 6, 2025