Principal Software Architect

1. Design Principles

2. Trade-off

3. Focus Point

3.1. Layer Design

  • Give each layer a single focus; a higher layer depends only on the layers below it.

3.2. Modular Design

  • Each module is an independently developed component.

  • Each module can interconnect with other modules.

3.3. Domain-Driven Design

  • Focus on business core.

  • Use bounded contexts to separate domains.

3.4. Event-Driven Architecture

  • Communicate via events, not direct calls.
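
To illustrate "events, not direct calls", here is a minimal in-process sketch in Python. The EventBus class, the "order_placed" event name, and the payload fields are hypothetical; a real system would more likely publish to a message broker.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher does not know which components are listening.
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
sent_emails: List[str] = []
reserved_items: List[str] = []

# Two independent consumers react to the same event.
bus.subscribe("order_placed", lambda event: sent_emails.append(event["customer"]))
bus.subscribe("order_placed", lambda event: reserved_items.extend(event["items"]))

bus.publish("order_placed", {"customer": "alice@example.com", "items": ["book", "pen"]})
```

The order code only publishes; adding a third consumer would not require changing it.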

3.5. Service-Oriented Design

  • Design microservices that call one another.

3.6. User-Centered Design

  • Design based on user interaction.

4. Sample Architecture

4.1. SaaS Platform Architecture

  • Microservices with API Gateway.

  • Authenticate with OAuth2, SSO.

  • RBAC.

  • Background job processing (emails, reports).

  • Stripe or Razorpay integration for billing.

  • Logging, Monitoring using ELK, Grafana.

4.2. eCommerce Platform

  • Modular Services: Catalog, Cart, Order, Payment, Delivery.

  • Redis-based caching for product search.

  • Event-driven checkout and order placement.

  • External Integration: Payment Gateway, shipping API.

  • Elasticsearch for search and filtering.

  • CDN for media assets.

4.3. Banking/Fintech Architecture

  • Hexagonal Architecture: strong boundary enforcement.

  • Encryption at rest and in transit: KMS, TLS.

  • Real-time fraud detection using asynchronous processing.

  • Event sourcing and audit logging.

  • Mobile-first client apps with biometric auth.

  • KYC, RBI,…

5. Design Security

5.1. Principle of Least Privilege

  • A read-only database replica should have read access only.

5.2. Fallback Securely

  • Always have a fallback mechanism, and sanitize error messages so they do not leak internal details.

5.3. Use Multiple Layers of Defense

  • Multiple security layers: firewall, authentication, encryption, rate limiting.

5.4. Use Secure Platform Defaults

  • The default configuration of the systems should be the most secure one.

5.5. Minimize attack surface

  • Do not expose admin APIs on a public network.

  • Always route traffic through an API gateway.

5.6. Separation of Duties

  • No single person or component has complete control.

5.7. Don’t trust user input

  • Never trust input from the client side; validate it again on the server.
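
A minimal sketch of server-side validation in Python. The SignupRequest shape, the username pattern, and the age range are assumptions made up for this example, not rules from any particular framework.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: 3 to 20 lowercase letters, digits, or underscores.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,20}")


@dataclass(frozen=True)
class SignupRequest:
    username: str
    age: int


def parse_signup(raw: dict) -> SignupRequest:
    """Validate untrusted input and return a typed value, or raise ValueError."""
    username = raw.get("username")
    if not isinstance(username, str) or not USERNAME_RE.fullmatch(username):
        raise ValueError("invalid username")

    age = raw.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or not 13 <= age <= 120:
        raise ValueError("invalid age")

    return SignupRequest(username=username, age=age)
```

Allow-listing what is accepted (a strict pattern and range) tends to be safer than trying to block known-bad input.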

5.8. Keep it simple

  • Use OAuth2 instead of building a custom token system.

  • Avoid over-engineering security features.

5.9. Auditability and Logging

  • Log access to critical endpoints, permission escalations and login attempts.

5.10. Open Design

  • Do not depend on the secrecy of the security design; use well-reviewed libraries and protocols.

5.11. Security Threat Modeling

  • Use STRIDE or DREAD for threat modeling.

5.12. Principles

  • Authentication: Username & Password, MFA, OAuth.

  • Authorization: RBAC, GBAC, PBAC, ACL, Scopes.

  • Encryption: Symmetric, Asymmetric, Hashing, Digital Signatures.

  • Compliance: GDPR, HIPAA, SOC 2.
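
As a rough illustration of the RBAC item under Authorization, a deny-by-default permission check might look like this in Python. The role names and permission strings are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


def is_allowed(roles: Iterable[str], permission: str) -> bool:
    """Return True only if at least one of the user's roles grants the permission."""
    # Unknown roles map to an empty set, so the default answer is "deny".
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Usage: is_allowed(["viewer"], "delete") evaluates to False, while is_allowed(["viewer", "admin"], "delete") evaluates to True.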

5.13. Execution

  • Frontend: Cookie consent UX.

  • API Gateway: Enforce geo-IP restriction, throttle suspicious access.

  • Backend: Encrypt sensitive fields, write audit logs.

  • Data Layer: Use KMS, environment flag.

  • Infrastructure: Use infrastructure as code for auditing.

  • Compliance: Use compliance-checking tools, e.g. AWS Config, GCP Security Command Center.

6. Design Scalability

6.1. Context

  • Scalability does not just mean adding more servers; it means a design that handles more users, more data, and more complexity.

6.2. Scaling

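  • Vertical Scaling: add resources to a single machine; fits monolithic, non-distributed applications and relies on faster hardware (Moore’s Law).

  • Horizontal Scaling: add more machines; scales much further, but works best with stateless services.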

  • Load balancing: Round robin, least connections, IP hash.

  • Caching: client-side cache, CDN (edge cache), mem-cache, distributed cache.

  • Sharding: Range-based, Hash-based, Geo-based.
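
Two of the techniques above, round-robin load balancing and hash-based sharding, can be sketched in a few lines of Python. The server names and keys are hypothetical.

```python
import hashlib
from typing import List, Sequence


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers: List[str] = list(servers)
        self._next = 0

    def pick(self) -> str:
        server = self._servers[self._next]
        self._next = (self._next + 1) % len(self._servers)
        return server


def shard_for(key: str, num_shards: int) -> int:
    """Hash-based sharding: map a key to a shard index deterministically."""
    # A stable hash is used because Python's built-in hash() of a str
    # can differ between processes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

One known limitation of plain modulo hashing: changing the number of shards remaps most keys, which is why consistent hashing is often preferred when shards are added or removed.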

6.3. Microservices and Domain-Driven Design

  • Identify the bounded contexts, e.g. Driver Management and Ride Matching are separate bounded contexts.

  • Use a shared ubiquitous language within each context, e.g. Transaction, Authorization, Refund.

  • Group entities into aggregates, e.g. Customer, RideDetails, PaymentInfo.

  • Apply an ACL (anti-corruption layer) for inter-service communication.
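
Reading ACL in the last bullet as the DDD anti-corruption layer (an assumption), the idea can be sketched in Python as a translator that keeps an external model out of the domain. The provider payload fields (amount_minor, cur, state, metadata) are invented for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Payment:
    """Domain model used inside our bounded context."""
    ride_id: str
    amount: Decimal
    currency: str
    succeeded: bool


def translate_provider_payment(raw: dict) -> Payment:
    """Anti-corruption layer: convert the provider's shape into the domain model."""
    return Payment(
        ride_id=raw["metadata"]["ride"],
        # The hypothetical provider reports amounts in minor units (cents).
        amount=Decimal(raw["amount_minor"]) / Decimal(100),
        currency=raw["cur"].upper(),
        succeeded=raw["state"] == "OK",
    )
```

If the provider changes its field names, only the translator changes; the rest of the domain code keeps working with Payment.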

6.4. Event-driven

  • Technology: Kafka, RabbitMQ (AMQP, Scheduling Queue), Pub/Sub System.

  • Use Kafka when you need real-time stream processing; messages are shared among the consumers in a group.

  • Use RabbitMQ when you need a message routing mechanism; each message is delivered to an individual consumer.

  • Use a cloud-native pub/sub service for event-driven systems.

6.5. Serverless

  • Example: a login flow with AWS Lambda + API Gateway + DynamoDB.

  • Example: scheduled tasks, DynamoDB -> S3 storage.

  • Use serverless for event-driven, short-lived jobs.

6.6. Eventual Consistency

  • Each replica holds the data, but replicas are not synchronized immediately; they converge over time.

  • Asynchronous Messaging.

  • Idempotency.

  • Outbox Pattern.

  • Saga Pattern.

  • Event Sourcing.
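
Of the items above, idempotency is the easiest to show in code. The sketch below assumes each message carries a unique "id" field and keeps processed IDs in memory; a real consumer would usually persist them in the same transaction as the state change.

```python
from typing import Set


class PaymentConsumer:
    """Consumer that tolerates duplicate deliveries of the same message."""

    def __init__(self) -> None:
        self.processed_ids: Set[str] = set()
        self.balance = 0

    def handle(self, message: dict) -> bool:
        """Apply the message once; return False if it was already processed."""
        message_id = message["id"]
        if message_id in self.processed_ids:
            return False  # duplicate delivery: ignore it
        self.balance += message["amount"]
        self.processed_ids.add(message_id)
        return True


consumer = PaymentConsumer()
message = {"id": "msg-1", "amount": 50}
consumer.handle(message)
consumer.handle(message)  # redelivered, e.g. after a retry; has no effect
```

This matters because asynchronous brokers commonly guarantee at-least-once delivery, so duplicates are expected rather than exceptional.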

7. Design Operation

7.1. DevOps

  • Design for CI/CD: canary deployments, blue-green deployments, rolling updates.

  • Use IaC (infrastructure as code) to manage networks, servers, and databases.

  • Store sensitive environment variables as secrets.

7.2. Observability

  • Logging

    • Log levels: debug, info, warn, error, fatal.
    • Aggregation: Elasticsearch, Kibana.
    • Context fields: requestID, userID, traceID.
  • Monitoring: Metrics, Dashboards, Alerts

    • Infrastructure: CPU, memory, disk, network.
    • Application: request rate, number of requests, latencies.
    • Business: orders processed, sign-ups, conversions.
    • Dashboards: Grafana, Datadog.
    • Alerts: SLO-based alerts (e.g. 95% of API responses should be < 300ms)
  • Distributed Tracing

    • Trace context propagation: e.g. an X-Request-ID header.
    • Tools: OpenTelemetry, Jaeger, AWS X-Ray.
    • Trace: DB Call, API Call, External services.
  • Goals: support root cause analysis (RCA) and verify SLAs.
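
The logging fields listed above (requestID, userID, traceID) are easiest to aggregate when each log line is structured. A minimal sketch with Python's standard logging module follows; the "orders" logger name and the JSON layout are assumptions for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Values passed via `extra=` become attributes on the record.
            "requestID": getattr(record, "requestID", None),
            "userID": getattr(record, "userID", None),
            "traceID": getattr(record, "traceID", None),
        }
        return json.dumps(entry)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("orders")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


stream = io.StringIO()
logger = build_logger(stream)
logger.info("order created", extra={"requestID": "req-1", "userID": "u-7", "traceID": "t-9"})
```

With the same traceID on every line a request produces, a log aggregator can filter by that one field to reconstruct the request's path.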

7.3. Handle Failure

  • Retries: use exponential backoff (wait 1s -> 2s -> 4s between retries) and set a maximum number of attempts.

  • Circuit Breakers: do not route traffic to failed services; isolate the failure.

  • Backpressure: slow down incoming requests.

    • Using message queues.
    • Return HTTP Status code 429
    • Leaky Bucket, sliding window, queue threshold.
    • Load shedding: dropping non-critical traffic
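
The retry and circuit-breaker bullets can be sketched together in Python. All names here (retry_with_backoff, CircuitBreaker, the default thresholds) are hypothetical; production code would normally add jitter to the delays and use a maintained resilience library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T], max_attempts: int = 4,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a failing call, doubling the wait each time: 1s, 2s, 4s, ..."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call a failing service."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

The sleep and clock parameters exist only so the behavior can be tested without real waiting.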

7.4. Reliability: SLOs, SLIs, and Error Budgets

  • Overview: an SLO (service level objective) is a target set on an SLI, and the error budget is the failure allowance that target leaves.

  • SLI: measurements of the system

    • % of successful requests.
    • % of requests under 200ms.
    • Number of requests per second.
    • % of failed API calls.
    • % of data not lost.
  • SLO: reliability goals agreed with the stakeholders for a period of time.

  • Error budgets: the allowable amount of failure in a given period.
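
The arithmetic behind an error budget is simple enough to sketch in Python. The function names are hypothetical, and the SLO is passed as a string so that a value like "99.9" is parsed exactly rather than as a binary float.

```python
import math
from fractions import Fraction


def allowed_failures(slo_percent: str, total_requests: int) -> int:
    """Number of failed requests the SLO tolerates in the period."""
    budget_fraction = 1 - Fraction(slo_percent) / 100
    return math.floor(budget_fraction * total_requests)


def budget_remaining(slo_percent: str, total_requests: int, failed_requests: int) -> int:
    """Failures still allowed; a negative value means the budget is spent."""
    return allowed_failures(slo_percent, total_requests) - failed_requests
```

For example, a 99.9% success SLO over 1,000,000 requests leaves a budget of 1,000 failed requests; after 250 failures, 750 remain.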

7.5. Diagram to codebase

  • Translate the architecture to maintainable code.

  • Enforcing Boundaries: Modular Monorepos, ADRs, ArchUnit

    • Modular Monorepos: multiple modules kept as services in the same repo, each of which can be deployed independently.
    • ADR (Architecture Decision Record): a short document recording how and why an architecture decision was made.
    • ArchUnit: used to write tests about the dependencies of a module.
  • Validation: static validation (code structure), runtime validation, dependency validation, infrastructure conformance.
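
ArchUnit itself is a Java library; the sketch below is not its API but a Python analogue of the same idea, a test that fails when a module imports from a layer it must not depend on. The package name app.infrastructure is hypothetical.

```python
import ast
from typing import Iterable, List, Set


def imported_modules(source: str) -> Set[str]:
    """Collect the module names imported by a piece of Python source."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


def violations(source: str, forbidden_prefixes: Iterable[str]) -> List[str]:
    """Return imports that fall under any forbidden package."""
    prefixes = list(forbidden_prefixes)
    return sorted(
        module for module in imported_modules(source)
        if any(module == p or module.startswith(p + ".") for p in prefixes)
    )


# Example: the domain layer must not depend on the infrastructure layer.
domain_source = "import json\nfrom app.infrastructure.db import Session\n"
found = violations(domain_source, ["app.infrastructure"])
```

Running a check like this in CI turns the boundary from a diagram convention into a failing build.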

8. Design Performance

8.1. Caching

  • Client-side caching: Using HTTP Cache-control, Cache static resources e.g. image, scripts.

  • Edge Caching: GET APIs with cacheable responses, or static content in a CDN.

  • Application/Database caching: Redis, Memcached, for frequently queried data.
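
Application-level caching usually follows the cache-aside pattern: check the cache, fall back to the source on a miss, then store the result with a time-to-live. The in-memory TTLCache below is a hypothetical stand-in for Redis or Memcached.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory cache-aside helper with a per-entry time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = loader(key)  # miss or expired: go to the source of truth
        self._store[key] = (now + self.ttl, value)
        return value
```

The TTL bounds how stale a cached value can get, which is the usual trade-off against load on the database.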

8.2. CDN

  • Reduce latency by serving content from the geographically closest servers.

  • Help mitigate DDoS attacks.

8.3. Async Processing

  • Email/SMS.

  • Data transformation.

  • Video/Image Processing.

  • Background analytics.

  • Use graceful fallbacks to make sure every request is handled.

8.4. Frontend Performance

  • Minimize critical rendering path: load critical resources first, lazy load for images and non-critical content.

  • Bundle optimization: tree-shaking unused JavaScript, code-splitting with tools like Webpack or Vite, compress assets using Gzip.

  • Image Optimization: serve WebP format, compress images, use responsive image techniques (load images dynamically per device, srcset, picture).

  • Reduce HTTP requests: combine CSS/JS files where appropriate, cache assets with proper headers.

  • Leverage browser caching and CDNs

  • Monitor and Analyze: Google Lighthouse, Sentry, monitor FCP, TTI, LCP.

8.5. Backend Performance

  • Optimize algorithms and logic: reduce unnecessary loops, choose efficient data structures, avoid blocking in async environments.

  • Reduce network overhead: compress API responses, pagination, avoid over-fetching using GraphQL (only fetch the necessary fields).

  • Connection Pooling: manage HTTP and DB connections with pools, adjust thread pool size based on system capacity.

  • Async & Parallel processing: using non-blocking I/O, offload heavy tasks to queue and worker threads.

  • Memory & CPU Profiling: Prometheus; spot memory leaks, bottlenecks, CPU-hogging routines.

  • Caching responses: use Memcached or Redis; return HTTP 304 (Not Modified) when the client’s cached copy is still valid.
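
The HTTP 304 case from the last bullet can be sketched in Python as a framework-free function. The respond function and its tuple return shape are assumptions for this example, and it handles only a single strong ETag; a real If-None-Match header may carry a list or weak validators.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), skipping the body if the client copy is current."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""  # client cache is still valid
    return 200, {"ETag": etag}, body


# First request: full response. Second request: the client sends the ETag back.
status, headers, payload = respond(b'{"price": 10}')
status_again, _, payload_again = respond(b'{"price": 10}', if_none_match=headers["ETag"])
```

The saving is in bandwidth and client parsing; the server still has to produce or look up the current representation to compute the tag.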

8.6. Database Performance

  • Indexing: index the columns used in WHERE, JOIN, ORDER BY,… Use composite indexes when appropriate.

  • Query optimization: use EXPLAIN or EXPLAIN ANALYZE to profile queries, avoid the N + 1 query problem in ORMs (fix with a join), and replace a subquery with a join if it is faster => go from N + 1 queries to 2 queries (join + select).

  • Connection Management: set a maximum connection pool size, use read replicas to scale reads.

  • Sharding & Partitioning: split large tables into smaller units, distribute write load across shards.

  • Materialized Views & Denormalization: precompute complex joins or aggregations into a stored table. A regular view stores no data and always reflects the current tables (use it to restrict access or hide certain columns or rows); a materialized view stores precomputed results, which can be stale until refreshed (eventual consistency).
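
The N + 1 query problem mentioned under query optimization can be reproduced with Python's built-in sqlite3 module. The authors/books schema is invented for this example, and the fix shown here uses a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
CREATE INDEX idx_books_author ON books(author_id);
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [row[0] for row in rows]
    return result, queries


def titles_joined(conn):
    """Same data fetched with a single JOIN."""
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

With two authors the loop version issues three queries and the joined version one; the gap grows linearly with the number of authors. Note that an inner JOIN omits authors without books, so a LEFT JOIN would be needed if those must appear.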

Last Updated On August 20, 2025