Framework Thinking - Computer Science Interview

Here are some core concepts for answering computer-science foundation interview questions, organized in a What / Why / Details / Pros & Cons / Use case framework.

1. gRPC

1. What?

  • A high-performance, open-source RPC framework using HTTP/2 and Protocol Buffers (protobuf) for strongly-typed APIs and bi-directional streaming.

2. Why?

  • Efficient binary protocol, low-latency calls, streaming, strong typing, and automatic client/server stub generation — great for microservices.

3. Details

  • Uses HTTP/2 features (multiplexing, header compression, streams). IDL = .proto files → codegen. Supports unary, server-streaming, client-streaming, bidi-streaming. Interceptors/middleware for auth, retries, tracing.
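The contract-first workflow can be sketched with a minimal `.proto` file (service and message names below are illustrative, not from any real API); running `protoc` over it generates client/server stubs per language:

```proto
syntax = "proto3";

package demo;

service Metrics {
  // unary RPC: one request, one response
  rpc GetPoint (PointRequest) returns (Point);
  // server-streaming RPC: one request, a stream of responses
  rpc StreamPoints (PointRequest) returns (stream Point);
}

message PointRequest { string series = 1; }
message Point { int64 ts = 1; double value = 2; }
```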

4. Pros / Cons

  • Pros: fast, compact wire format, streaming, codegen, contract-first.
  • Cons: less friendly for browsers (needs proxy), more ops complexity, debugging binary payloads harder, versioning can be tricky.

5. Use case

  • Inter-service comms in microservices, low-latency RPC, real-time streaming (metrics, logs, chat), internal APIs with strict contracts.

2. HTTP (REST / HTTP/1.1 / HTTP/2)

1. What?

  • Application-layer protocol used for web APIs (text/JSON commonly). REST is an architectural style on top of HTTP.

2. Why?

  • Universal, browser-native, human-readable (JSON), easy to debug and cache, wide ecosystem.

3. Details

  • HTTP verbs (GET/POST/PUT/DELETE), status codes, headers for caching/auth.
  • HTTP/2 adds multiplexing and header compression; HTTP/3 (QUIC) adds UDP-based transport.
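The verb/status-code/header mechanics can be shown end to end with Python's stdlib; this self-contained sketch runs a throwaway server in a thread (the `/health` path and header values are illustrative):

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)                           # status code
        self.send_header("Content-Type", "application/json")
        self.send_header("Cache-Control", "max-age=60")   # cache-friendly header
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass

# port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/health")                            # verb + path
resp = conn.getresponse()
status, cache, body = resp.status, resp.getheader("Cache-Control"), resp.read()
server.shutdown()
print(status, cache, body)
```

Note the response is plain text/JSON and fully inspectable, which is exactly the "easy to debug" property the notes mention.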

4. Pros / Cons

  • Pros: ubiquitous, easy to consume, cache-friendly, stateless.
  • Cons: higher overhead vs binary RPCs, less efficient for streaming (HTTP/2 helps), JSON can be verbose.

5. Use case

  • Public APIs, web frontends, integrations, when compatibility & human debugging matter.

3. Redis Caching

1. What?

  • In-memory key-value store supporting strings, hashes, lists, sets, sorted sets, and TTL—used as cache, session store, pub/sub, and more.

2. Why?

  • Sub-millisecond lookups to reduce DB load and speed reads; flexible data structures for complex caching patterns.

3. Details

  • Common patterns: cache-aside, write-through, write-back, TTL-based expiry, distributed locks (Redlock), LRU eviction. Persistence options: RDB/AOF (but often run as cache without persistence).
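The cache-aside pattern can be sketched with plain dicts standing in for Redis and the database (a real setup would use a `redis` client; key names and TTL are illustrative):

```python
import time

cache = {}                             # stand-in for Redis
DB = {"user:1": {"name": "Ada"}}       # stand-in for the primary database
TTL = 60                               # seconds

def get_user(key):
    entry = cache.get(key)
    if entry and entry[1] > time.time():       # cache hit, not expired
        return entry[0]
    value = DB.get(key)                        # miss: read the DB...
    cache[key] = (value, time.time() + TTL)    # ...and populate the cache
    return value

def update_user(key, value):
    DB[key] = value          # write to the source of truth first
    cache.pop(key, None)     # then invalidate (not update) the cache entry

print(get_user("user:1"))
update_user("user:1", {"name": "Grace"})
print(get_user("user:1"))
```

Invalidating rather than updating on write keeps the next read authoritative, at the cost of one extra cache miss.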

4. Pros / Cons

  • Pros: ultra-fast, rich data types, simple ops, pub/sub.
  • Cons: memory cost, eventual data loss (if persistence not configured), complexity scaling across clusters (resharding), consistency when combined with DB.

5. Use case

  • Session stores, leaderboard (sorted sets), frequently-read but infrequently-changed objects, rate limiting, caching DB query results or computed payloads.

4. Kafka (Distributed Log / Messaging)

1. What?

  • Distributed, fault-tolerant publish-subscribe streaming platform built around append-only logs (topics, partitions).

2. Why?

  • High-throughput, durable message store for event-driven architectures and stream processing with consumer groups for scale.

3. Details

  • Producers write to topics; topics partitioned for parallelism; consumers in consumer-groups share partitions. Guarantees: configurable acks, ordering per partition, retention policies. Works well with stream processors (Kafka Streams, ksqlDB) and connectors (Kafka Connect).
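Per-partition ordering follows from key-based partitioning; a sketch of the idea (Kafka's Java client actually uses murmur2, md5 here is only a stand-in to make the point deterministic):

```python
import hashlib

NUM_PARTITIONS = 6

def partition_for(key: bytes) -> int:
    # hash the key, mod the partition count: same key -> same partition,
    # which is what gives Kafka its per-key ordering guarantee
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % NUM_PARTITIONS

p1 = partition_for(b"user-42")
p2 = partition_for(b"user-42")
print(p1, p2)
```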

4. Pros / Cons

  • Pros: scalable throughput, durable, decouples producers/consumers, replayable history.
  • Cons: operational complexity, tuning (partitioning/retention), cross-datacenter replication needs extra tooling (MirrorMaker, Confluent Replicator), at-least-once semantics unless carefully handled.

5. Use case

  • Event sourcing, audit logs, metrics pipeline, async processing, data integration, streaming ETL.

5. CDC (Change Data Capture) & DB Synchronization

1. What?

  • CDC captures changes (inserts/updates/deletes) from a source DB and streams them (often into Kafka or other sinks) to keep other systems in sync.

2. Why?

  • Avoids expensive polling and full-table reads; enables real-time sync and event-driven architectures while preserving source-of-truth.

3. Details

  • Implementations: DB binlog (MySQL), WAL (Postgres logical decoding), Oracle redo. Tools: Debezium, Maxwell, etc. Important: schema evolution handling, ordering, transactional boundaries, idempotency, and semantics (exactly-once vs at-least-once).
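Idempotent application of a change stream can be sketched as follows; the event shape loosely follows Debezium's `op`/`after` convention, but field names here are illustrative:

```python
sink = {}  # stand-in for a cache, search index, or replica table

def apply(event):
    key = event["key"]
    if event["op"] in ("c", "u"):    # create / update -> upsert
        sink[key] = event["after"]
    elif event["op"] == "d":         # delete -> remove
        sink.pop(key, None)

events = [
    {"op": "c", "key": 1, "after": {"id": 1, "name": "Ada"}},
    {"op": "u", "key": 1, "after": {"id": 1, "name": "Grace"}},
    {"op": "d", "key": 1, "after": None},
]
for e in events:
    apply(e)
    apply(e)  # at-least-once re-delivery is harmless: upsert/delete are idempotent
print(sink)
```

Idempotency is what lets a CDC consumer tolerate the at-least-once delivery most pipelines provide.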

4. Pros / Cons

  • Pros: near-real-time replication, low load on primary, enables analytics/streaming pipelines.
  • Cons: complex mapping to downstream schemas, drift, initial snapshot cost, ordering/duplication issues, operational complexity.

5. Use case

  • Replicate OLTP changes to data warehouses, update caches/search indexes, build event-driven microservices, auditing and compliance.

6. High-throughput architecture (e.g., Twitter feed)

1. What?

  • Design patterns to serve very large volumes of reads/writes with low latency — often use fan-out, caching, sharding, and async processing.

2. Why?

  • To deliver timelines/feeds to millions with acceptable latency and cost.

3. Details

  • Fan-out-on-write vs fan-out-on-read tradeoff: precompute timelines (fast reads, expensive writes) vs assemble on demand (cheap writes, heavier reads).
  • Supporting patterns: sharded storage, CDN + cache, rate limiting, backpressure, write-ahead logs, message queues for background work, denormalized data.
  • Consistency: eventually consistent timelines are usually acceptable for feeds.
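Both strategies can be sketched side by side with in-memory dicts (user names and the follow graph are made up for illustration):

```python
from collections import defaultdict

follows = {"bob": ["alice"], "carol": ["alice"]}   # follower -> followees
posts = defaultdict(list)                          # author -> posts
timelines = defaultdict(list)                      # precomputed (fan-out-on-write)

followers_of = defaultdict(list)                   # invert the follow graph
for follower, followees in follows.items():
    for followee in followees:
        followers_of[followee].append(follower)

def publish(author, text):
    posts[author].append(text)
    for follower in followers_of[author]:  # write amplification: one insert per follower
        timelines[follower].append(text)

def read_on_demand(user):
    # fan-out-on-read: cheap writes, heavier reads (merge at query time)
    return [p for f in follows.get(user, []) for p in posts[f]]

publish("alice", "hello")
print(timelines["bob"], read_on_demand("carol"))
```

The write loop in `publish` is exactly the cost that explodes for high-follower accounts, which is why hybrids route those through the read path instead.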

4. Pros / Cons

  • Precompute pros: fast reads. Cons: heavy write work and storage.
  • On-read pros: storage-efficient. Cons: higher read latency and more complex queries. In practice hybrids are common: fan-out-on-write for typical accounts, fan-out-on-read for high-follower accounts whose writes would fan out too widely.

5. Use case

  • Social feeds, notification systems, recommendation result delivery, any high-read-volume personalized content.

7. Scalability (horizontal vs vertical, sharding)

1. What?

  • Ability to handle increased load by adding resources vertically (bigger machines) or horizontally (more machines); sharding partitions data across nodes.

2. Why?

  • To meet growth and avoid single points of failure; horizontal scaling provides cost-effective throughput increases.

3. Details

  • Stateless services → horizontal easy (load balancer).
  • Stateful → partitioning/sharding (range, hash, or directory-based). Use consistent hashing for rebalancing, leader/follower for replication. Autoscaling + health checks + circuit breakers.
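Consistent hashing's key property, that adding a node moves only a fraction of keys, can be sketched with a small ring using virtual nodes (node names and vnode count are illustrative):

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring with virtual nodes (sketch)."""
    def __init__(self, nodes, vnodes=100):
        # each physical node owns many points on the ring for even spread
        self._points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [p for p, _ in self._points]

    def node_for(self, key: str) -> str:
        # walk clockwise to the first point at or after the key's hash
        i = bisect.bisect(self._keys, h(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in ("k1", "k2", "k3", "k4")}
ring2 = Ring(["node-a", "node-b", "node-c", "node-d"])  # add one node
moved = sum(before[k] != ring2.node_for(k) for k in before)
print(before, moved)  # only keys landing on node-d's points move
```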

4. Pros / Cons

  • Horizontal pros: resilient, cost-effective. Cons: complexity (distributed algorithms, data partitioning).
  • Vertical pros: simpler. Cons: single point of failure, cost and limits.

5. Use case

  • APIs, databases (sharded clusters), caches (clustered Redis), message systems (Kafka partitions).

8. Data Consistency (strong vs eventual, transactions)

1. What?

  • Models defining how and when different nodes see updates: strong (linearizability/serializability), eventual, causal, etc.

2. Why?

  • Different applications need different guarantees: banking (strong), social feeds (eventual).

3. Details

  • CAP tradeoffs: consistency, availability, partition tolerance. Distributed transactions, two-phase commit (2PC) — heavy and blocking; sagas for long-running workflows; techniques for causal consistency and vector clocks.
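The saga alternative to blocking 2PC can be sketched as a sequence of local steps, each paired with a compensation that runs in reverse order on failure (step names are illustrative):

```python
log = []

def step(name, ok=True):
    def do():
        if not ok:
            raise RuntimeError(f"{name} failed")
        log.append(name)
    def undo():
        log.append(f"undo:{name}")
    return do, undo

def run_saga(steps):
    done = []
    try:
        for do, undo in steps:
            do()
            done.append(undo)
        return True
    except RuntimeError:
        for undo in reversed(done):  # compensate completed steps in reverse
            undo()
        return False

ok = run_saga([step("reserve-stock"), step("charge-card", ok=False), step("ship")])
print(ok, log)
```

Unlike 2PC, no lock is held across steps; correctness comes from compensations being safe to run after the fact.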

4. Pros / Cons

  • Strong pros: predictable correctness. Cons: high latency, reduced availability.
  • Eventual pros: high availability and performance. Cons: temporary anomalies, conflict resolution needed.

5. Use case

  • Strong: financial transfers, inventory decrements.
  • Eventual: social media, analytics, caches.

9. MQ Fan-out Optimization (message brokers, routing)

1. What?

  • Patterns to efficiently deliver messages from producers to many consumers (fan-out) using message brokers, topics, partitions, and pub/sub routing.

2. Why?

  • To broadcast events without coupling producers to many consumers and to scale consumers independently.

3. Details

  • Use topics with consumer groups, partitioning to parallelize, compacted topics for state, use of routing keys and exchanges (RabbitMQ), or topic-partitions (Kafka). For large fan-out, use hierarchical brokers, SSE/webhooks, or push to CDN edge caches for static content.
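The consumer-group fan-out model, every group sees every message once, while members of a group split the partitions, can be sketched in a few lines (partition count, keys, and group names are illustrative):

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4
partitions = defaultdict(list)  # partition id -> messages (the "topic")

def publish(key: str, msg: str):
    pid = int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS
    partitions[pid].append(msg)

def assign(group_members):
    # each group independently divides all partitions among its members
    out = defaultdict(list)
    for pid in range(NUM_PARTITIONS):
        out[group_members[pid % len(group_members)]].append(pid)
    return out

publish("order-1", "created")
publish("order-2", "created")

analytics = assign(["a1", "a2"])  # two members share 4 partitions
billing = assign(["b1"])          # one member takes all 4

# a group collectively reads every partition, so it sees every message once
group_msgs = [m for pids in analytics.values() for pid in pids for m in partitions[pid]]
print(analytics, billing, group_msgs)
```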

4. Pros / Cons

  • Pros: decoupling, replayability, elasticity.
  • Cons: duplicate deliveries, increased network usage, backpressure management, expensive fan-out to many endpoints (webhooks can fail).

5. Use case

  • Notifications, live updates, analytics pipelines, multi-subscriber event distribution.

10. Cache Invalidation & Read/Write Strategies

1. What?

  • Techniques to keep cache and DB consistent (invalidate or update caches) and strategies for read/writes: cache-aside, write-through, write-back.

2. Why?

  • To avoid stale reads, reduce DB load, and maintain correctness.

3. Details

  • Cache-aside: app reads cache → miss → read DB and populate cache; writes update DB and invalidate cache.
  • Write-through: writes go to cache then persisted to DB synchronously.
  • Write-back: write to cache, flush later (fast but risky).
  • Invalidation patterns: explicit delete, versioning, TTL, write-notify (pub/sub).
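Write-through can be sketched against dict stand-ins for the cache tier and the database (key names are illustrative); contrast with cache-aside, where writes only invalidate:

```python
cache, db = {}, {}   # stand-ins for the cache tier and the database

def write_through(key, value):
    cache[key] = value   # write the cache...
    db[key] = value      # ...and persist synchronously: reads are never stale

def read(key):
    if key in cache:
        return cache[key]          # hit
    value = db.get(key)
    if value is not None:
        cache[key] = value         # populate on miss
    return value

write_through("sku:9", {"price": 10})
print(read("sku:9"), db["sku:9"])
```

The synchronous DB write is what buys read-after-write consistency, and also what adds the write latency listed under cons.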

4. Pros / Cons

  • Cache-aside pros: simple, control over DB writes. Cons: race conditions, stale window.
  • Write-through pros: consistent reads. Cons: write latency.
  • Write-back pros: fast writes. Cons: data loss risk on crashes.

5. Use case

  • Cache-aside: read-heavy apps (product pages).
  • Write-through: read-after-write consistency needed.
  • TTL + versioning: session caches, ephemeral data.

11. Read/Write Strategies & CPU Bottleneck Reduction

1. What?

  • Approaches to distribute load between reads and writes and to reduce CPU-bound hotspots.

2. Why?

  • To maximize throughput and keep latency low while preventing single-node CPU saturation.

3. Details

  • Read replicas for scaling reads; leader for writes.
  • CQRS: split read model and write model. Batch writes, async processing, rate limiting, circuit breakers. Reduce CPU by caching, memoization, near-data processing, efficient algorithms, vectorized ops, offload heavy work to background workers/GPUs.
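The memoization point can be shown with `functools.lru_cache`; the "expensive" function here is a trivial stand-in for real CPU-heavy work:

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=1024)   # memoize a CPU-heavy pure function
def render_badge(user_id: int) -> str:
    global calls
    calls += 1             # count real computations for illustration
    return f"<svg>user {user_id}</svg>"   # imagine expensive rendering here

for _ in range(1000):
    render_badge(42)       # 999 of these are served from the cache
print(calls)
```

This only works for pure functions of their arguments; stale results are the price if the underlying data changes.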

4. Pros / Cons

  • Pros: better utilization and elasticity.
  • Cons: complexity, eventual consistency, staleness.

5. Use case

  • Analytics pipelines, high-read OLTP apps, CPU-heavy image processing, API gateways.

12. Elasticsearch Usage (search & analytics)

1. What?

  • Distributed search and analytics engine built on Lucene for full-text search, aggregations, and near-real-time indexing.

2. Why?

  • Fast text search, scalable indices, aggregations, geospatial queries, and inverted index optimizations.

3. Details

  • Index → shards → replicas. Mapping defines field types. Queries: full-text (match), filters (term), aggregations. Near real-time: refresh interval affects visibility. Consider bulk indexing, mapping templates, analyzers, ILM.
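The inverted index underneath full-text search can be sketched in a few lines (Lucene adds analyzers, positions, relevance scoring, and on-disk segments on top of this core idea):

```python
from collections import defaultdict

docs = {1: "quick brown fox", 2: "lazy brown dog"}

# term -> set of doc ids
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():   # trivial "analyzer"
        index[term].add(doc_id)

def match(term):          # term lookup, like a match/term query
    return index.get(term, set())

print(match("brown"), match("brown") & match("fox"))  # AND = set intersection
```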

4. Pros / Cons

  • Pros: powerful search, good aggregations, horizontal scale.
  • Cons: consistency/refresh delay, memory-heavy, operational overhead, tricky mapping, complex relevance tuning.

5. Use case

  • Site search, logs/metrics (ELK), analytics dashboards, autocomplete, faceted search.

13. Algorithmic Example — Median from Stream

1. What?

  • Maintain median of a stream efficiently.

2. Why?

  • Common interview problem testing heaps and balancing state.

3. Details

  • Use two heaps: max-heap for lower half, min-heap for upper half. Balance sizes so difference ≤1. Median is either top of one or average of tops.
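The two-heap approach in Python (`heapq` is min-heap only, so the lower half stores negated values):

```python
import heapq

class StreamingMedian:
    def __init__(self):
        self.lo = []   # max-heap (negated values) for the lower half
        self.hi = []   # min-heap for the upper half

    def add(self, x):
        heapq.heappush(self.lo, -x)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))  # keep halves ordered
        if len(self.hi) > len(self.lo):                   # rebalance: size diff <= 1
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]                    # odd count: top of the bigger heap
        return (-self.lo[0] + self.hi[0]) / 2     # even count: average of tops

m = StreamingMedian()
meds = []
for x in [5, 1, 9, 3]:
    m.add(x)
    meds.append(m.median())
print(meds)
```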

4. Pros / Cons

  • Pros: O(log n) per insert, O(1) median query.
  • Cons: memory grows with stream; heavy memory if stream unbounded (use summarization/sketches).

5. Use case

  • Realtime stats, streaming analytics, online median calculators.

14. SMTP vs HTTP

1. What?

  • SMTP: Simple Mail Transfer Protocol for email delivery.
  • HTTP: web protocol for hypertext transfer and APIs.

2. Why?

  • SMTP handles store-and-forward, retries, queuing; HTTP handles request-response for resources.

3. Details

  • SMTP: push-based, commands (HELO, MAIL FROM, RCPT TO, DATA), queuing, retries. Uses ports 25/587/465.
  • HTTP: stateless request-response, verbs/status codes, ports 80/443.
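A typical SMTP session looks like the following exchange (C = client, S = server; hostnames and addresses are made up):

```
S: 220 mail.example.com ESMTP
C: HELO client.example.org
S: 250 mail.example.com
C: MAIL FROM:<alice@example.org>
S: 250 OK
C: RCPT TO:<bob@example.com>
S: 250 OK
C: DATA
S: 354 End data with <CRLF>.<CRLF>
C: Subject: hi
C:
C: Hello Bob.
C: .
S: 250 OK: queued
C: QUIT
S: 221 Bye
```

Note the server queues the message for later delivery ("250 OK: queued") rather than delivering inline, which is the store-and-forward behavior HTTP lacks.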

4. Pros / Cons

  • SMTP pros: reliable delivery; cons: spam, complex headers.
  • HTTP pros: simple, synchronous, easy for REST; cons: not designed for store-and-forward or multi-hop mail delivery.

5. Use case

  • SMTP: sending emails, mail servers.
  • HTTP: web APIs, webpages, webhooks.

15. gRPC Internals (brief)

1. What?

  • Mechanics beneath gRPC — HTTP/2 streams, protobuf serialization, flow control, and status codes mapping.

2. Why?

  • Understanding internals helps with tuning, debugging, and designing fallbacks.

3. Details

  • gRPC uses HTTP/2 streams/multiplexing; messages framed with length prefixes; protobuf for compact serialization; OK/status codes map to HTTP; keepalive, window sizes, flow control; interceptors wrap client/server calls.
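The length-prefix framing is simple enough to sketch: each gRPC message on the wire is a 1-byte compressed flag, a 4-byte big-endian length, then the protobuf payload (the payload bytes below are an arbitrary stand-in):

```python
import struct

def frame(payload: bytes, compressed: bool = False) -> bytes:
    # gRPC length-prefixed message framing:
    # 1-byte compressed flag + 4-byte big-endian length + payload
    return struct.pack(">BI", int(compressed), len(payload)) + payload

def unframe(data: bytes):
    compressed, length = struct.unpack(">BI", data[:5])
    return bool(compressed), data[5:5 + length]

wire = frame(b"\x0a\x03abc")   # stand-in bytes for a serialized protobuf message
compressed, payload = unframe(wire)
print(compressed, payload, len(wire))
```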

4. Pros / Cons

  • Pros: efficient transport.
  • Cons: complexity in proxies/load balancers that don’t fully support HTTP/2.

5. Use case

  • Low-latency microservices requiring streaming or advanced flow control.

16. Asynchronous Processing, Eventual Consistency, Fault Tolerance

1. What?

  • Async processing decouples producers and consumers. Eventual consistency accepts temporary divergence between replicas. Fault tolerance is designing the system to keep operating despite failures.

2. Why?

  • Scales systems, smooths bursts, increases availability and resilience.

3. Details

  • Patterns: message queues, worker pools, retries, DLQs, idempotency tokens, circuit breakers, bulkheads, sagas, leader election. Observability: tracing, metrics, alerting.
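Two of these patterns, retries with exponential backoff and idempotency tokens, can be combined in a short sketch (the flaky handler and token names are contrived for illustration; real systems add jitter and a DLQ):

```python
import time

def retry(fn, attempts=4, base=0.01):
    # retry with exponential backoff between attempts
    for i in range(attempts):
        try:
            return fn()
        except RuntimeError:
            if i == attempts - 1:
                raise
            time.sleep(base * 2 ** i)

seen = set()
def handle(token, work):
    # idempotency token: re-delivered messages are processed at most once
    if token in seen:
        return "skipped"
    seen.add(token)
    return work()

fails = [RuntimeError, RuntimeError, None]   # fail twice, then succeed
def flaky():
    e = fails.pop(0)
    if e:
        raise e()
    return handle("msg-1", lambda: "done")

result = retry(flaky)
dup = handle("msg-1", lambda: "done")   # simulated re-delivery
print(result, dup)
```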

4. Pros / Cons

  • Pros: scalable, resilient, performant.
  • Cons: complexity, harder correctness reasoning, testing harder.

5. Use case

  • Background jobs, email sending, order processing, tolerant systems.

17. Fault Tolerance Techniques

1. What?

  • Methods that allow a system to continue operating despite failures.

2. Why?

  • Keep availability and provide graceful degradation.

3. Details

  • Replication, consensus (Raft/Paxos), retries/backoff, circuit breakers, fallbacks, bulkheading, health checks, graceful degradation, redundancy, automated failover.
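A circuit breaker with a fallback can be sketched in a few lines (threshold, cooldown, and the failing service are illustrative):

```python
import time

class CircuitBreaker:
    """Minimal sketch: open after N failures, probe again after a cooldown."""
    def __init__(self, threshold=3, cooldown=0.05):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.cooldown:
                return fallback()             # open: fail fast, degrade gracefully
            self.opened_at = None             # half-open: let one call probe
        try:
            result = fn()
            self.failures = 0
            return result
        except RuntimeError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()  # trip the breaker
            return fallback()

cb = CircuitBreaker()
def down():
    raise RuntimeError("service unavailable")

results = [cb.call(down, lambda: "fallback") for _ in range(4)]
print(results, cb.opened_at is not None)
```

Once open, the breaker stops hammering the failing dependency, which is the graceful-degradation behavior the notes describe.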

4. Pros / Cons

  • Pros: higher uptime and resilience.
  • Cons: added latency, cost, more complexity.

5. Use case

  • Critical services (payments, auth), multi-region deployments, high-SLA systems.
November 11, 2025