Framework Thinking - Computer Science Interview
Here are some concepts and frameworks for answering computer-science foundation interview questions.
1. gRPC
1. What?
- A high-performance, open-source RPC framework using HTTP/2 and Protocol Buffers (protobuf) for strongly-typed APIs and bi-directional streaming.
2. Why?
- Efficient binary protocol, low-latency calls, streaming, strong typing, and automatic client/server stub generation — great for microservices.
3. Details
- Uses HTTP/2 features (multiplexing, header compression, streams). IDL = .proto files → codegen. Supports unary, server-streaming, client-streaming, and bidi-streaming. Interceptors/middleware for auth, retries, tracing.
4. Pros / Cons
- Pros: fast, compact wire format, streaming, codegen, contract-first.
- Cons: less friendly for browsers (needs proxy), more ops complexity, debugging binary payloads harder, versioning can be tricky.
5. Use case
- Inter-service comms in microservices, low-latency RPC, real-time streaming (metrics, logs, chat), internal APIs with strict contracts.
2. HTTP (REST / HTTP/1.1 / HTTP/2)
1. What?
- Application-layer protocol used for web APIs (text/JSON commonly). REST is an architectural style on top of HTTP.
2. Why?
- Universal, browser-native, human-readable (JSON), easy to debug and cache, wide ecosystem.
3. Details
- HTTP verbs (GET/POST/PUT/DELETE), status codes, headers for caching/auth.
- HTTP/2 adds multiplexing and header compression; HTTP/3 (QUIC) adds UDP-based transport.
4. Pros / Cons
- Pros: ubiquitous, easy to consume, cache-friendly, stateless.
- Cons: higher overhead vs binary RPCs, less efficient for streaming (HTTP/2 helps), JSON can be verbose.
5. Use case
- Public APIs, web frontends, integrations, when compatibility & human debugging matter.
3. Redis Caching
1. What?
- In-memory key-value store supporting strings, hashes, lists, sets, sorted sets, and TTL—used as cache, session store, pub/sub, and more.
2. Why?
- Sub-millisecond lookups to reduce DB load and speed reads; flexible data structures for complex caching patterns.
3. Details
- Common patterns: cache-aside, write-through, write-back, TTL-based expiry, distributed locks (Redlock), LRU eviction. Persistence options: RDB/AOF (but often run as cache without persistence).
4. Pros / Cons
- Pros: ultra-fast, rich data types, simple ops, pub/sub.
- Cons: memory cost, eventual data loss (if persistence not configured), complexity scaling across clusters (resharding), consistency when combined with DB.
5. Use case
- Session stores, leaderboard (sorted sets), frequently-read but infrequently-changed objects, rate limiting, caching DB query results or computed payloads.
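The rate-limiting use case above is often built on Redis (typically with a Lua script for atomicity), but the underlying algorithm is just a token bucket. A minimal in-process sketch, with names and the injectable `now` parameter purely illustrative:

```python
import time
from typing import Optional

class TokenBucket:
    """Token-bucket rate limiter: refills `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: int, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(float(self.capacity), self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In Redis the same state (token count + last-refill timestamp) would live in a hash keyed per client, updated atomically.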
4. Kafka (Distributed Log / Messaging)
1. What?
- Distributed, fault-tolerant publish-subscribe streaming platform built around append-only logs (topics, partitions).
2. Why?
- High-throughput, durable message store for event-driven architectures and stream processing with consumer groups for scale.
3. Details
- Producers write to topics; topics partitioned for parallelism; consumers in consumer-groups share partitions. Guarantees: configurable acks, ordering per partition, retention policies. Works well with stream processors (Kafka Streams, ksqlDB) and connectors (Kafka Connect).
4. Pros / Cons
- Pros: scalable throughput, durable, decouples producers/consumers, replayable history.
- Cons: operational complexity, tuning (partitioning/retention), cross-datacenter replication needs extra tooling (MirrorMaker, Confluent Replicator), at-least-once semantics unless idempotency/transactions are handled carefully.
5. Use case
- Event sourcing, audit logs, metrics pipeline, async processing, data integration, streaming ETL.
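The "ordering per partition" guarantee above comes from keyed partitioning: all messages with the same key land in the same partition. A sketch of the idea (Kafka's Java client actually uses murmur2 on the key bytes; md5 here is just a stable stand-in):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Stable key -> partition mapping: every message with this key goes
    to the same partition, so per-key ordering is preserved."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

This is why choosing the partition key (user id, order id, ...) is a core design decision: it fixes both the ordering domain and the parallelism.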
5. CDC (Change Data Capture) & DB Synchronization
1. What?
- CDC captures changes (inserts/updates/deletes) from a source DB and streams them (often into Kafka or other sinks) to keep other systems in sync.
2. Why?
- Avoids expensive polling and full-table reads; enables real-time sync and event-driven architectures while preserving source-of-truth.
3. Details
- Implementations: DB binlog (MySQL), WAL (Postgres logical decoding), Oracle redo logs. Tools: Debezium, Maxwell, etc. Important: schema evolution handling, ordering, transactional boundaries, idempotency, and delivery semantics (exactly-once vs at-least-once).
4. Pros / Cons
- Pros: near-real-time replication, low load on primary, enables analytics/streaming pipelines.
- Cons: complex mapping to downstream schemas, drift, initial snapshot cost, ordering/duplication issues, operational complexity.
5. Use case
- Replicate OLTP changes to data warehouses, update caches/search indexes, build event-driven microservices, auditing and compliance.
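The idempotency/ordering points above in miniature: a consumer that applies CDC events to a replica can track the last log position (LSN) seen per primary key and skip duplicates or stale replays. A minimal sketch with an illustrative event shape:

```python
def apply_changes(state, events):
    """Idempotently apply CDC events to a replica keyed by primary key.
    Each event: (lsn, op, pk, row). Duplicate or stale events (lsn <=
    last seen for that key) are skipped, so replays are safe."""
    last_lsn = {}
    for lsn, op, pk, row in events:
        if lsn <= last_lsn.get(pk, -1):
            continue  # duplicate delivery or out-of-order replay: ignore
        last_lsn[pk] = lsn
        if op == "delete":
            state.pop(pk, None)
        else:  # insert/update as upsert
            state[pk] = row
    return state
```

Real pipelines also have to handle the initial snapshot and schema changes, which this sketch ignores.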
6. High-throughput architecture (e.g., Twitter feed)
1. What?
- Design patterns to serve very large volumes of reads/writes with low latency — typically combining fan-out, caching, sharding, and async processing.
2. Why?
- To deliver timelines/feeds to millions with acceptable latency and cost.
3. Details
- Fan-out-on-write vs fan-out-on-read tradeoffs: precompute timelines (fast reads, expensive writes) vs compute on demand (cheaper writes, heavier reads). Use sharded storage, CDN + cache, rate limiting, backpressure, write-ahead logs, message queues for background work, denormalized data. Eventually consistent timelines are usually acceptable.
4. Pros / Cons
- Fan-out-on-write pros: fast reads. Cons: heavy write amplification and storage.
- Fan-out-on-read pros: storage-efficient, cheap writes. Cons: higher read latency, complex queries. Hybrid approaches are common (precompute for typical users, merge-on-read for high-follower accounts).
5. Use case
- Social feeds, notification systems, recommendation result delivery, any high-read-volume personalized content.
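The fan-out-on-write half of the tradeoff above, as a toy sketch (class and method names are illustrative): each post is pushed into every follower's materialized timeline, making reads O(1) at the cost of write amplification.

```python
from collections import defaultdict

class FanOutOnWrite:
    """Precompute each follower's timeline at post time: fast reads,
    write cost proportional to the author's follower count."""
    def __init__(self):
        self.followers = defaultdict(set)   # author -> set of followers
        self.timelines = defaultdict(list)  # user -> posts, newest last

    def follow(self, user, author):
        self.followers[author].add(user)

    def post(self, author, text):
        # Write amplification: one append per follower.
        for user in self.followers[author]:
            self.timelines[user].append((author, text))

    def timeline(self, user):
        return self.timelines[user]  # O(1): already materialized
```

Fan-out-on-read would instead store one copy per author and merge followed authors' posts at read time.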
7. Scalability (horizontal vs vertical, sharding)
1. What?
- Ability to handle increased load by adding resources vertically (bigger machines) or horizontally (more machines); sharding partitions data across nodes.
2. Why?
- To meet growth and avoid single points of failure; horizontal scaling provides cost-effective throughput increases.
3. Details
- Stateless services → horizontal easy (load balancer).
- Stateful → partitioning/sharding (range, hash, or directory-based). Use consistent hashing for rebalancing, leader/follower for replication. Autoscaling + health checks + circuit breakers.
4. Pros / Cons
- Horizontal pros: resilient, cost-effective. Cons: complexity (distributed algorithms, data partitioning).
- Vertical pros: simpler. Cons: single point of failure, cost and limits.
5. Use case
- APIs, databases (sharded clusters), caches (clustered Redis), message systems (Kafka partitions).
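The consistent-hashing point above can be sketched with a ring of virtual nodes; this is a minimal illustration, not any particular library's implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: each node owns many points
    on a ring; a key maps to the first node point at or after its hash,
    so adding/removing a node remaps only that node's share of keys."""
    def __init__(self, nodes, vnodes=100):
        points = []
        for node in nodes:
            for i in range(vnodes):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wraps around).
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]
```

Contrast with naive `hash(key) % N`, where changing N remaps almost every key.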
8. Data Consistency (strong vs eventual, transactions)
1. What?
- Models defining how and when different nodes see updates: strong (linearizability/serializability), eventual, causal, etc.
2. Why?
- Different applications need different guarantees: banking (strong), social feeds (eventual).
3. Details
- CAP theorem: under a network partition you trade consistency against availability. Distributed transactions: two-phase commit (2PC) is heavy and blocking; sagas suit long-running workflows; causal consistency can be tracked with vector clocks.
4. Pros / Cons
- Strong pros: predictable correctness. Cons: high latency, reduced availability.
- Eventual pros: high availability and performance. Cons: temporary anomalies, conflict resolution needed.
5. Use case
- Strong: financial transfers, inventory decrements.
- Eventual: social media, analytics, caches.
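The vector clocks mentioned above in miniature: compare two clocks to decide whether one update happened-before the other or they are concurrent (needing conflict resolution), and merge them element-wise.

```python
def vc_merge(a, b):
    """Element-wise max of two vector clocks (used after reconciling)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def vc_compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent'."""
    keys = a.keys() | b.keys()
    le = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    ge = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
    if le and ge:
        return "equal"
    if le:
        return "before"
    if ge:
        return "after"
    return "concurrent"  # neither dominates: conflicting writes
```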
9. MQ Fan-out Optimization (message brokers, routing)
1. What?
- Patterns to efficiently deliver messages from producers to many consumers (fan-out) using message brokers, topics, partitions, and pub/sub routing.
2. Why?
- To broadcast events without coupling producers to many consumers and to scale consumers independently.
3. Details
- Use topics with consumer groups, partitioning to parallelize, compacted topics for state, use of routing keys and exchanges (RabbitMQ), or topic-partitions (Kafka). For large fan-out, use hierarchical brokers, SSE/webhooks, or push to CDN edge caches for static content.
4. Pros / Cons
- Pros: decoupling, replayability, elasticity.
- Cons: duplicate deliveries, increased network usage, backpressure management, expensive fan-out to many endpoints (webhooks can fail).
5. Use case
- Notifications, live updates, analytics pipelines, multi-subscriber event distribution.
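The consumer-group mechanics above, sketched: every group sees every message, but within one group each partition is owned by exactly one member (round-robin here as an illustration; real brokers use pluggable assignors).

```python
def assign_partitions(partitions, members):
    """Round-robin partition assignment within one consumer group:
    every partition goes to exactly one member, so work is split
    without duplicate processing inside the group."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment
```

Fan-out across groups is free (each group keeps its own offsets); fan-out inside a group is bounded by the partition count.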
10. Cache Invalidation & Read/Write Strategies
1. What?
- Techniques to keep cache and DB consistent (invalidate or update caches) and strategies for read/writes: cache-aside, write-through, write-back.
2. Why?
- To avoid stale reads, reduce DB load, and maintain correctness.
3. Details
- Cache-aside: app reads cache → miss → read DB and populate cache; writes update DB and invalidate cache.
- Write-through: writes go to cache then persisted to DB synchronously.
- Write-back: write to cache, flush later (fast but risky).
- Invalidation patterns: explicit delete, versioning, TTL, write-notify (pub/sub).
4. Pros / Cons
- Cache-aside pros: simple, control over DB writes. Cons: race conditions, stale window.
- Write-through pros: consistent reads. Cons: write latency.
- Write-back pros: fast writes. Cons: data loss risk on crashes.
5. Use case
- Cache-aside: read-heavy apps (product pages).
- Write-through: read-after-write consistency needed.
- TTL + versioning: session caches, ephemeral data.
11. Read/Write Strategies & CPU Bottleneck Reduction
1. What?
- Approaches to distribute load between reads and writes and to reduce CPU-bound hotspots.
2. Why?
- To maximize throughput and keep latency low while preventing single-node CPU saturation.
3. Details
- Read replicas for scaling reads; leader for writes.
- CQRS: split read model and write model. Batch writes, async processing, rate limiting, circuit breakers. Reduce CPU by caching, memoization, near-data processing, efficient algorithms, vectorized ops, offload heavy work to background workers/GPUs.
4. Pros / Cons
- Pros: better utilization and elasticity.
- Cons: complexity, eventual consistency, staleness.
5. Use case
- Analytics pipelines, high-read OLTP apps, CPU-heavy image processing, API gateways.
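The "reduce CPU by caching/memoization" point above in miniature, using the stdlib:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Memoized Fibonacci: each subproblem is computed once, turning an
    exponential-time recursion into linear time."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

The same idea scales up to caching expensive computed payloads per request key in Redis instead of in-process.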
12. Elasticsearch Usage (search & analytics)
1. What?
- Distributed search and analytics engine built on Lucene for full-text search, aggregations, and near-real-time indexing.
2. Why?
- Fast text search, scalable indices, aggregations, geospatial queries, and inverted index optimizations.
3. Details
- Index → shards → replicas. Mapping defines field types. Queries: full-text (match), filters (term), aggregations. Near real-time: refresh interval affects visibility. Consider bulk indexing, mapping templates, analyzers, ILM (index lifecycle management).
4. Pros / Cons
- Pros: powerful search, good aggregations, horizontal scale.
- Cons: consistency/refresh delay, memory-heavy, operational overhead, tricky mapping, complex relevance tuning.
5. Use case
- Site search, logs/metrics (ELK), analytics dashboards, autocomplete, faceted search.
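The "inverted index" mentioned above is the core structure under Lucene/Elasticsearch: a map from term to the set of documents containing it. A toy sketch (real engines add analyzers, term frequencies, and scoring):

```python
from collections import defaultdict

def build_index(docs):
    """Toy inverted index: term -> set of doc ids containing that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND query: intersect the postings sets of every query term."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()
```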
13. Algorithmic Example — Median from Stream
1. What?
- Maintain median of a stream efficiently.
2. Why?
- Common interview problem testing heaps and balancing state.
3. Details
- Use two heaps: max-heap for lower half, min-heap for upper half. Balance sizes so difference ≤1. Median is either top of one or average of tops.
4. Pros / Cons
- Pros: O(log n) per insert, O(1) median query.
- Cons: memory grows with stream; heavy memory if stream unbounded (use summarization/sketches).
5. Use case
- Realtime stats, streaming analytics, online median calculators.
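The two-heap scheme above, sketched with `heapq` (Python only has a min-heap, so the lower half is stored negated):

```python
import heapq

class StreamMedian:
    """Running median via two heaps: a max-heap (negated values) for the
    lower half, a min-heap for the upper half, sizes kept within one."""
    def __init__(self):
        self.lo = []  # max-heap via negation: lower half
        self.hi = []  # min-heap: upper half

    def add(self, x):
        heapq.heappush(self.lo, -x)
        # Rebalance so that max(lo) <= min(hi).
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2
```

Each `add` is O(log n); `median` is O(1).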
14. SMTP vs HTTP
1. What?
- SMTP: Simple Mail Transfer Protocol for email delivery.
- HTTP: web protocol for hypertext transfer and APIs.
2. Why?
- SMTP handles store-and-forward, retries, queuing; HTTP handles request-response for resources.
3. Details
- SMTP: push-based, commands (HELO, MAIL FROM, RCPT TO, DATA), queuing, retries. Uses ports 25/587/465.
- HTTP: stateless request-response, verbs/status codes, ports 80/443.
4. Pros / Cons
- SMTP pros: reliable delivery; cons: spam, complex headers.
- HTTP pros: simple, synchronous, easy for REST; cons: not designed for store-and-forward or multi-hop mail delivery.
5. Use case
- SMTP: sending emails, mail servers.
- HTTP: web APIs, webpages, webhooks.
15. gRPC Internals (brief)
1. What?
- Mechanics beneath gRPC — HTTP/2 streams, protobuf serialization, flow control, and status codes mapping.
2. Why?
- Understanding internals helps with tuning, debugging, and designing fallbacks.
3. Details
- gRPC uses HTTP/2 streams/multiplexing; messages framed with length prefixes; protobuf for compact serialization; OK/status codes map to HTTP; keepalive, window sizes, flow control; interceptors wrap client/server calls.
4. Pros / Cons
- Pros: efficient transport.
- Cons: complexity in proxies/load balancers that don’t fully support HTTP/2.
5. Use case
- Low-latency microservices requiring streaming or advanced flow control.
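The length-prefix framing above can be sketched directly: gRPC's wire format frames each message as a 1-byte compressed flag plus a 4-byte big-endian length, then the payload. A minimal illustration of that layout:

```python
import struct

def frame(payload: bytes, compressed: bool = False) -> bytes:
    """gRPC-style message frame: 1-byte compression flag +
    4-byte big-endian length prefix + payload."""
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload

def unframe(buf: bytes):
    """Split a byte stream back into (compressed, payload) messages."""
    msgs, i = [], 0
    while i < len(buf):
        flag, length = struct.unpack_from(">BI", buf, i)
        i += 5  # header size
        msgs.append((bool(flag), buf[i:i + length]))
        i += length
    return msgs
```

In real gRPC the payload bytes are protobuf-serialized messages carried on an HTTP/2 stream.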
16. Asynchronous Processing, Eventual Consistency, Fault Tolerance
1. What?
- Async processing decouples producers and consumers. Eventual consistency accepts temporary divergence. Fault tolerance is designing systems to survive failures.
2. Why?
- Scales systems, smooths bursts, increases availability and resilience.
3. Details
- Patterns: message queues, worker pools, retries, DLQs, idempotency tokens, circuit breakers, bulkheads, sagas, leader election. Observability: tracing, metrics, alerting.
4. Pros / Cons
- Pros: scalable, resilient, performant.
- Cons: complexity, harder to reason about correctness, harder to test.
5. Use case
- Background jobs, email sending, order processing, tolerant systems.
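Several of the patterns above (retries, DLQs, idempotency tokens) combined in one toy consumer loop; the message shape and names are illustrative:

```python
def process_with_retries(messages, handler, max_attempts=3):
    """At-least-once consumer sketch: dedupe by idempotency key, retry
    each message up to max_attempts, park permanent failures in a DLQ."""
    seen, dlq, results = set(), [], []
    for key, payload in messages:
        if key in seen:
            continue  # duplicate delivery: already processed
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(handler(payload))
                seen.add(key)
                break
            except Exception:
                if attempt == max_attempts:
                    dlq.append((key, payload))  # dead-letter queue
    return results, dlq
```

A production version would persist the `seen` set, add exponential backoff between attempts, and alert on DLQ growth.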
17. Fault Tolerance Techniques
1. What?
- Methods that allow a system to continue operating despite failures.
2. Why?
- Keep availability and provide graceful degradation.
3. Details
- Replication, consensus (Raft/Paxos), retries/backoff, circuit breakers, fallbacks, bulkheading, health checks, graceful degradation, redundancy, automated failover.
4. Pros / Cons
- Pros: higher uptime and resilience.
- Cons: added latency, cost, more complexity.
5. Use case
- Critical services (payments, auth), multi-region deployments, high-SLA systems.
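The circuit breaker listed above, as a minimal sketch (the injectable `now` parameter is illustrative; real implementations also distinguish a half-open state with limited trial traffic):

```python
class CircuitBreaker:
    """Minimal circuit breaker: opens after `threshold` consecutive
    failures, allows a trial call again after `reset_after` seconds."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, now):
        if self.opened_at is not None and now - self.opened_at < self.reset_after:
            raise RuntimeError("circuit open")  # fail fast, skip downstream
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now  # trip the breaker
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Failing fast while open gives the downstream service time to recover instead of hammering it with doomed requests.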