Design Strava

This is a low-level design for Strava.

1. Features

  • Strava is a fitness tracking application that allows users to record and share their physical activities, primarily focusing on running and cycling, with their network.

  • It provides detailed analytics on performance and routes, and allows social interaction among users.

2. Requirements

2.1. Functional requirements

  1. Users should be able to start, pause, stop, and save their runs and rides.

  2. While running or cycling, users should be able to view activity data, including route, distance, and time.

  3. Users should be able to view details about their own completed activities as well as the activities of their friends.

Below the Line (Out of Scope)

  • Adding or deleting friends (friend management).

  • Authentication and authorization.

  • Commenting or liking runs.

2.2. Non-functional requirements

  1. The system should be highly available (availability » consistency).

  2. The app should function in remote areas without network connectivity.

  3. The app should provide the athlete with accurate and up-to-date local statistics during the run/ride.

  4. The system should scale to support 10 million concurrent activities.

Below the Line (Out of Scope)

  • Compliance with data privacy regulations like GDPR.

  • Advanced security measures

3. High Level Design

3.1. Entity

  1. User: Represents a person using the app. Contains profile information and settings.

  2. Activity: Represents an individual running or cycling activity. Includes activity type (run/ride), start time, end time, route data (GPS coordinates), distance, and duration.

  3. Route: A collection of GPS coordinates recorded during the activity (this could also just be a field on the Activity entity).

  4. Friend: Represents a connection between users for sharing activities (note: friend management is out of scope, but the concept is necessary for sharing).

3.3. Users should be able to start, pause, stop, and save their runs and rides.

A user is going to start by opening the app and clicking the “Start Activity” button:

  1. The client app will make a POST request to /activities to create a new activity, specifying whether it’s a run or a ride.

  2. The Activity Service will create the activity in the database and return the activity object to the client.

  3. If the user opts to pause or resume their activity, they’ll make a PATCH request to /activities/:activityId with the updated state and the Activity Service will update the activity in the database accordingly.

  4. When the activity is over, the user will click the “Save Activity” button. This will trigger a PATCH request to /activities/:activityId with the state set to “COMPLETE”.
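The start/pause/resume/save flow above amounts to a small state machine. The following is a minimal sketch of how the Activity Service might validate each PATCH before persisting it. The status names other than COMPLETE, the `TRANSITIONS` table, and the `apply_event` helper are illustrative assumptions, not part of the design above.

```python
from datetime import datetime, timezone

# Hypothetical transition table: current status -> {event: next status}.
TRANSITIONS = {
    "NOT_STARTED": {"START": "IN_PROGRESS"},
    "IN_PROGRESS": {"PAUSE": "PAUSED", "STOP": "COMPLETE"},
    "PAUSED": {"RESUME": "IN_PROGRESS", "STOP": "COMPLETE"},
    "COMPLETE": {},
}


def apply_event(activity: dict, event: str, timestamp: datetime) -> dict:
    """Reject invalid transitions; otherwise update status and append to the event log."""
    allowed = TRANSITIONS[activity["status"]]
    if event not in allowed:
        raise ValueError(f"cannot {event} an activity that is {activity['status']}")
    activity["status"] = allowed[event]
    activity["statusUpdateEvents"].append(
        {"event": event, "timestamp": timestamp.isoformat()}
    )
    return activity


activity = {"id": 123, "type": "RUN", "status": "NOT_STARTED", "statusUpdateEvents": []}
apply_event(activity, "START", datetime(2025, 10, 25, 8, 0, tzinfo=timezone.utc))
apply_event(activity, "PAUSE", datetime(2025, 10, 25, 8, 20, tzinfo=timezone.utc))
print(activity["status"])  # PAUSED
```

Keeping every event in the log, rather than only the latest status, is what makes it possible to reconstruct active time later.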

Activity {
  id: 123,
  type: "RUN",
  startTime: "2025-10-25T08:00:00Z",
  status: "STOPPED",
  statusUpdateEvents: [
    { event: "START", timestamp: "2025-10-25T08:00:00Z" },
    { event: "PAUSE", timestamp: "2025-10-25T08:20:00Z" },
    { event: "RESUME", timestamp: "2025-10-25T08:25:00Z" },
    { event: "STOP", timestamp: "2025-10-25T09:00:00Z" }
  ]
}
  • When the user clicks “Stop Activity”, we can calculate the elapsed time by summing the durations between each pair of timestamps, excluding pauses. In the example above, the elapsed time would be 55 minutes (20 minutes + 35 minutes).

  • The Activity Service computes the total active time by summing all intervals between:

START → PAUSE

RESUME → PAUSE (if there are multiple pauses)

RESUME → STOP (the final interval)

```python
# (PAUSE - START) + (STOP - RESUME)
# (08:20 - 08:00) + (09:00 - 08:25) = 20 + 35 = 55 minutes total
```
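
As a minimal sketch of that calculation, the function below walks a `statusUpdateEvents` list and sums every interval that opens with START or RESUME and closes with PAUSE or STOP. The function name `active_time` is an assumption for illustration; the event shape is taken from the Activity example above.

```python
from datetime import datetime, timedelta


def active_time(events: list) -> timedelta:
    """Sum the intervals that begin at START/RESUME and end at PAUSE/STOP."""
    total = timedelta()
    running_since = None  # timestamp of the most recent unclosed START/RESUME
    for e in events:
        # Replace the trailing "Z" so fromisoformat also works on older Python versions.
        ts = datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
        if e["event"] in ("START", "RESUME"):
            running_since = ts
        elif e["event"] in ("PAUSE", "STOP") and running_since is not None:
            total += ts - running_since
            running_since = None
    return total


events = [
    {"event": "START", "timestamp": "2025-10-25T08:00:00Z"},
    {"event": "PAUSE", "timestamp": "2025-10-25T08:20:00Z"},
    {"event": "RESUME", "timestamp": "2025-10-25T08:25:00Z"},
    {"event": "STOP", "timestamp": "2025-10-25T09:00:00Z"},
]
print(active_time(events))  # 0:55:00
```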

3.4. While running or cycling, users should be able to view activity data, including route, distance, and time.

Here is how this would work:

  1. The client app will record the user’s GPS coordinates at a constant interval, let’s say 2 seconds for a bike ride and 5 seconds for a run. To do this, we’ll utilize the built-in location services provided by both iOS and Android:
  • For iOS: We’ll use the Core Location framework, specifically the CLLocationManager class. We can set up location updates using startUpdatingLocation() method and implement the locationManager(_:didUpdateLocations:) delegate method to receive location updates.

  • For Android: We’ll use the Google Location Services API, part of Google Play Services. We can use the FusedLocationProviderClient class and call requestLocationUpdates() method to receive periodic location updates.

  2. The client app will then send these new coordinates to our /activities/:activityId endpoint.

  3. The Activity Service will update the Route table in the database for this activity with the new coordinates and the time the coordinates were recorded.

  4. We will also update the distance field by calculating the distance between the new coordinate and the previous one using the Haversine formula.
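
The distance update in the last step can be sketched as follows. This is a minimal version of the Haversine formula under the usual spherical-Earth approximation; the helper names `haversine_m` and `add_point` and the in-memory `route` list are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def add_point(route: list, distance_m: float, point: tuple) -> float:
    """Append a (lat, lon) point to the route and return the updated cumulative distance."""
    if route:
        distance_m += haversine_m(*route[-1], *point)
    route.append(point)
    return distance_m


route, distance = [], 0.0
for p in [(37.7749, -122.4194), (37.7750, -122.4194), (37.7751, -122.4193)]:
    distance = add_point(route, distance, p)
```

Because each update only needs the previous point, the running total can be maintained incrementally without rereading the whole route.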

Note: These updates can be delivered either through the REST API (with polling) or over a WebSocket connection.

3.5. Users should be able to view details about their own completed activities as well as the activities of their friends.

The full flow is:

  1. User navigates to the activities list page on the app.

  2. Client makes a GET request to /activities?mode={USER|FRIENDS}&page={page}&pageSize={pageSize} to get the list of activities.

  3. The list of activities is rendered in the UI and the user clicks on an activity to view details.

  4. Client makes a GET request to /activities/:activityId to get the full activity details to render on a new details page.

Note: Add as many API endpoints as needed, as long as each one makes sense.

4. Potential Deep Dives

4.1. The system should be highly available (availability » consistency).

4.2. The app should function in remote areas without network connectivity. The app should provide the athlete with accurate and up-to-date local statistics during the run/ride.

  • The key insight is that, so long as we don’t support real-time sharing of activities, we can record activity data locally, directly on the client’s device, and only sync it back to the server when the activity completes and/or the user is back online.

  • When the app is reopened or the activity is resumed, we first check local storage for any saved data and load it into our in-memory buffer before continuing to record new data.

=> Use a local database on the device.
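
A minimal sketch of this record-locally, sync-later pattern is shown below. Python's `sqlite3` stands in for whatever on-device database the mobile client would really use, and the `LocalActivityStore` class, its table layout, and the `upload` callback are all assumptions for illustration.

```python
import sqlite3


class LocalActivityStore:
    """On-device buffer for GPS points (sqlite3 is a stand-in for the device database)."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS points ("
            "activity_id TEXT, ts REAL, lat REAL, lon REAL, synced INTEGER DEFAULT 0)"
        )

    def record(self, activity_id: str, ts: float, lat: float, lon: float) -> None:
        """Persist a point immediately so it survives an app restart."""
        self.db.execute(
            "INSERT INTO points (activity_id, ts, lat, lon) VALUES (?, ?, ?, ?)",
            (activity_id, ts, lat, lon),
        )
        self.db.commit()

    def load(self, activity_id: str) -> list:
        """Reload saved points, e.g. when the app is reopened mid-activity."""
        return self.db.execute(
            "SELECT ts, lat, lon FROM points WHERE activity_id = ? ORDER BY ts",
            (activity_id,),
        ).fetchall()

    def sync(self, activity_id: str, upload) -> int:
        """Send unsynced points via `upload`; mark them synced only if it does not raise."""
        rows = self.db.execute(
            "SELECT rowid, ts, lat, lon FROM points "
            "WHERE activity_id = ? AND synced = 0 ORDER BY ts",
            (activity_id,),
        ).fetchall()
        if not rows:
            return 0
        upload([r[1:] for r in rows])  # if this raises (offline), rows stay unsynced
        self.db.executemany(
            "UPDATE points SET synced = 1 WHERE rowid = ?", [(r[0],) for r in rows]
        )
        self.db.commit()
        return len(rows)


store = LocalActivityStore()
store.record("a1", 0.0, 37.7749, -122.4194)
store.record("a1", 5.0, 37.7750, -122.4194)
sent = []
store.sync("a1", sent.extend)  # `sent.extend` plays the role of the network call
```

Local statistics (distance, pace) can be computed from `load()` alone, so the athlete sees up-to-date numbers regardless of connectivity.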

4.3. The system should scale to support 10 million concurrent activities.

  • Traffic: Let’s start by looking at our database. With ~100M DAU each doing one activity per day, we add ~100M new activities each day. Over a year, that’s ~36,500M, or ~36.5B, activities.

  • Storage: (8 bytes + 8 bytes + 8 bytes) per GPS point (coordinates plus timestamp) * ~600 points per activity = ~15KB per activity => 15KB * 36.5B = 547.5TB of data each year.
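
The estimate above can be reproduced with a few lines of arithmetic. The constant names are assumptions for illustration; the figures are the ones used in the bullets, with the per-activity size rounded up to 15KB as in the text.

```python
BYTES_PER_POINT = 8 + 8 + 8          # three 8-byte fields per recorded point
POINTS_PER_ACTIVITY = 600            # figure used in the estimate above
ACTIVITIES_PER_DAY = 100_000_000     # ~100M DAU, one activity each

bytes_per_activity = BYTES_PER_POINT * POINTS_PER_ACTIVITY   # 14_400 bytes, ~15KB
activities_per_year = ACTIVITIES_PER_DAY * 365               # 36_500_000_000

rounded_tb = 15_000 * activities_per_year / 1e12             # using the rounded 15KB
exact_tb = bytes_per_activity * activities_per_year / 1e12   # using the unrounded 14.4KB

print(rounded_tb)  # 547.5
print(exact_tb)    # 525.6
```

The difference between the two figures is only rounding; either way the order of magnitude is roughly half a petabyte per year.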

  • Idea:

  1. We can shard our database to split the data across multiple database instances. We can shard by the time that the activity was completed since the majority of our queries will want to see recent activities.

  2. We could introduce data tiering. The chances that someone wants to view a run from several years ago are pretty low. To reduce storage costs, we could move older data to cheaper storage tiers:

  • Hot data (recent activities) stays in fast, expensive storage
  • Warm data (3-12 months old) moves to slower, cheaper storage
  • Cold data (>1 year old) moves to archival storage like S3
  3. We can introduce caching if needed. If we find that we are frequently querying the same activities, we can introduce caching to reduce the read load on our database. This would not be a priority for me, as read throughput should be pretty low, but it’s the first thing we would do if load times become unacceptable.
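
The time-based sharding and tiering ideas above can be sketched as two small functions. The monthly shard granularity, the 90-day approximation of "3 months", and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def shard_key(completed_at: datetime) -> str:
    """Time-based shard key: one bucket per calendar month (assumed granularity)."""
    return completed_at.strftime("%Y-%m")


def storage_tier(completed_at: datetime, now: datetime) -> str:
    """Map an activity's age to a tier: hot (<3 months), warm (3-12 months), cold (>1 year)."""
    age = now - completed_at
    if age < timedelta(days=90):      # "3 months" approximated as 90 days
        return "hot"
    if age <= timedelta(days=365):
        return "warm"
    return "cold"


now = datetime(2025, 10, 25, tzinfo=timezone.utc)
print(shard_key(datetime(2025, 10, 1, tzinfo=timezone.utc)))          # 2025-10
print(storage_tier(datetime(2025, 3, 1, tzinfo=timezone.utc), now))   # warm
```

A background job could run `storage_tier` over completed activities periodically and move any whose tier has changed.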

4.4. How can we support realtime sharing of activities with friends?

  • As the server gets these updates, they’ll be persisted in the database and broadcast to all of the user’s friends.

Solution: WebSocket + Pub/Sub system, or polling.
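
To make the fan-out concrete, here is a minimal in-memory sketch of the Pub/Sub half. In a real deployment the broker would be an external system and each callback would write to a friend's open WebSocket; the `InMemoryPubSub` class and the channel naming scheme are assumptions for illustration.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Toy broker: channel name -> list of subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel: str, callback) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: dict) -> int:
        """Deliver the message to every subscriber and return how many received it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = InMemoryPubSub()
bob_inbox, carol_inbox = [], []

# Each friend subscribes to the athlete's channel (hypothetical naming scheme).
broker.subscribe("activity-updates:alice", bob_inbox.append)
broker.subscribe("activity-updates:alice", carol_inbox.append)

# The server publishes each location update after persisting it.
delivered = broker.publish(
    "activity-updates:alice", {"activityId": 123, "lat": 37.7749, "lon": -122.4194}
)
print(delivered)  # 2
```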

4.5. How can we expose a leaderboard of top athletes?

  • Use a Redis sorted set, which is backed by a skip list (ordered by score) and a hash table (member to score).

Skip list store:

Level 3:   50 ─────────────── 80
Level 2:   50 ────── 70 ───── 80
Level 1:   50  60  70  80

Hash Table store:

Member	Score
alice	50
bob	80
carol	60
dave	70

Time complexity:

  • Insert/Delete: O(log N) - average, but worst case still O(N).

  • Range query: O(log N + M), that is, O(log N) to find the starting point plus M items to return.
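
The two-structure layout above (ordered store plus hash table) can be mimicked in a few lines. This is an in-memory stand-in for illustration only, not how Redis is implemented: a sorted Python list replaces the skip list, so inserts cost O(N) rather than O(log N). The `Leaderboard` class and its method names are assumptions; with Redis itself the equivalent operations would be the ZADD and ZREVRANGE commands.

```python
import bisect
from itertools import islice


class Leaderboard:
    """Toy sorted set: a dict for member -> score plus a sorted list for ordering."""

    def __init__(self):
        self._scores = {}   # hash table: member -> score
        self._sorted = []   # ordered store: ascending list of (score, member)

    def add(self, member: str, score: float) -> None:
        """Insert a member or update its score, keeping both structures in sync."""
        old = self._scores.get(member)
        if old is not None:
            index = bisect.bisect_left(self._sorted, (old, member))
            del self._sorted[index]
        self._scores[member] = score
        bisect.insort(self._sorted, (score, member))

    def score(self, member: str):
        """O(1) lookup through the hash table."""
        return self._scores.get(member)

    def top(self, n: int) -> list:
        """Return the n highest-scoring (member, score) pairs, best first."""
        return [(m, s) for s, m in islice(reversed(self._sorted), n)]


board = Leaderboard()
for member, score in [("alice", 50), ("bob", 80), ("carol", 60), ("dave", 70)]:
    board.add(member, score)
print(board.top(3))  # [('bob', 80), ('dave', 70), ('carol', 60)]
```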

Alternative: a red-black tree, which is a self-balancing binary search tree, supports the same operations:

Structure	Sorted traversal?	Range query speed	Insert/Delete	Memory use	Notes
Heap	 No	O(N)	O(log N)	Compact	Can't do sorted scans
Red-black Tree	 Yes	O(log N + M)	O(log N)	Moderate	Complex rotations
Skip List	 Yes	O(log N + M)	O(log N)	Moderate
Last Updated On October 25, 2025