Framework Thinking - Computer Science Interview
Here are some concepts and frameworks for answering computer-science foundation interview questions.
1. gRPC
1. What?
- A high-performance, open-source RPC framework using HTTP/2 and Protocol Buffers (protobuf) for strongly-typed APIs and bi-directional streaming.
2. Why?
- Efficient binary protocol, low-latency calls, streaming, strong typing, and automatic client/server stub generation — great for microservices.
3. Details
- Uses HTTP/2 features (multiplexing, header compression, streams). IDL = .proto files → codegen. Supports unary, server-streaming, client-streaming, and bidi-streaming. Interceptors/middleware for auth, retries, tracing.
4. Pros / Cons
- Pros: fast, compact wire format, streaming, codegen, contract-first.
- Cons: less friendly for browsers (needs proxy), more ops complexity, debugging binary payloads harder, versioning can be tricky.
5. Use case
- Inter-service comms in microservices, low-latency RPC, real-time streaming (metrics, logs, chat), internal APIs with strict contracts.
2. HTTP (REST / HTTP/1.1 / HTTP/2)
1. What?
- Application-layer protocol used for web APIs (text/JSON commonly). REST is an architectural style on top of HTTP.
2. Why?
- Universal, browser-native, human-readable (JSON), easy to debug and cache, wide ecosystem.
3. Details
- HTTP verbs (GET/POST/PUT/DELETE), status codes, headers for caching/auth.
- HTTP/2 adds multiplexing and header compression; HTTP/3 (QUIC) adds UDP-based transport.
4. Pros / Cons
- Pros: ubiquitous, easy to consume, cache-friendly, stateless.
- Cons: higher overhead vs binary RPCs, less efficient for streaming (HTTP/2 helps), JSON can be verbose.
5. Use case
- Public APIs, web frontends, integrations, when compatibility & human debugging matter.
3. Redis Caching
1. What?
- In-memory key-value store supporting strings, hashes, lists, sets, sorted sets, and TTL—used as cache, session store, pub/sub, and more.
2. Why?
- Sub-millisecond lookups to reduce DB load and speed reads; flexible data structures for complex caching patterns.
3. Details
- Common patterns: cache-aside, write-through, write-back, TTL-based expiry, distributed locks (Redlock), LRU eviction. Persistence options: RDB/AOF (but often run as cache without persistence).
4. Pros / Cons
- Pros: ultra-fast, rich data types, simple ops, pub/sub.
- Cons: memory cost, eventual data loss (if persistence not configured), complexity scaling across clusters (resharding), consistency when combined with DB.
5. Use case
- Session stores, leaderboard (sorted sets), frequently-read but infrequently-changed objects, rate limiting, caching DB query results or computed payloads.
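The rate-limiting use case above is often built on Redis (typically with a Lua script for atomicity), but the underlying algorithm is just a token bucket. A minimal in-process sketch, with names and the injectable `now` parameter purely illustrative:

```python
import time
from typing import Optional

class TokenBucket:
    """Token-bucket rate limiter: refills `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: int, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(float(self.capacity), self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In Redis the same state (token count + last-refill timestamp) would live in a hash keyed per client, updated atomically.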
4. Kafka (Distributed Log / Messaging)
1. What?
- Distributed, fault-tolerant publish-subscribe streaming platform built around append-only logs (topics, partitions).
2. Why?
- High-throughput, durable message store for event-driven architectures and stream processing with consumer groups for scale.
3. Details
- Producers write to topics; topics partitioned for parallelism; consumers in consumer-groups share partitions. Guarantees: configurable acks, ordering per partition, retention policies. Works well with stream processors (Kafka Streams, ksqlDB) and connectors (Kafka Connect).
4. Pros / Cons
- Pros: scalable throughput, durable, decouples producers/consumers, replayable history.
- Cons: operational complexity, tuning (partitioning/retention), cross-datacenter replication needs extra tooling (MirrorMaker, Confluent Replicator), at-least-once semantics unless idempotency/transactions are handled carefully.
5. Use case
- Event sourcing, audit logs, metrics pipeline, async processing, data integration, streaming ETL.
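The "ordering per partition" guarantee above comes from keyed partitioning: all messages with the same key land in the same partition. A sketch of the idea (Kafka's Java client actually uses murmur2 on the key bytes; md5 here is just a stable stand-in):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Stable key -> partition mapping: every message with this key goes
    to the same partition, so per-key ordering is preserved."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

This is why choosing the partition key (user id, order id, ...) is a core design decision: it fixes both the ordering domain and the parallelism.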
5. CDC (Change Data Capture) & DB Synchronization
1. What?
- CDC captures changes (inserts/updates/deletes) from a source DB and streams them (often into Kafka or other sinks) to keep other systems in sync.
2. Why?
- Avoids expensive polling and full-table reads; enables real-time sync and event-driven architectures while preserving source-of-truth.
3. Details
- Implementations: DB binlog (MySQL), WAL (Postgres logical decoding), Oracle redo logs. Tools: Debezium, Maxwell, etc. Important: schema evolution handling, ordering, transactional boundaries, idempotency, and delivery semantics (exactly-once vs at-least-once).
4. Pros / Cons
- Pros: near-real-time replication, low load on primary, enables analytics/streaming pipelines.
- Cons: complex mapping to downstream schemas, drift, initial snapshot cost, ordering/duplication issues, operational complexity.
5. Use case
- Replicate OLTP changes to data warehouses, update caches/search indexes, build event-driven microservices, auditing and compliance.
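The idempotency/ordering points above in miniature: a consumer that applies CDC events to a replica can track the last log position (LSN) seen per primary key and skip duplicates or stale replays. A minimal sketch with an illustrative event shape:

```python
def apply_changes(state, events):
    """Idempotently apply CDC events to a replica keyed by primary key.
    Each event: (lsn, op, pk, row). Duplicate or stale events (lsn <=
    last seen for that key) are skipped, so replays are safe."""
    last_lsn = {}
    for lsn, op, pk, row in events:
        if lsn <= last_lsn.get(pk, -1):
            continue  # duplicate delivery or out-of-order replay: ignore
        last_lsn[pk] = lsn
        if op == "delete":
            state.pop(pk, None)
        else:  # insert/update as upsert
            state[pk] = row
    return state
```

Real pipelines also have to handle the initial snapshot and schema changes, which this sketch ignores.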
6. High-throughput architecture (e.g., Twitter feed)
1. What?
- Design patterns to serve very large volumes of reads/writes with low latency — typically combining fan-out, caching, sharding, and async processing.
2. Why?
- To deliver timelines/feeds to millions with acceptable latency and cost.
3. Details
- Fan-out-on-write vs fan-out-on-read tradeoffs: precompute timelines (fast reads, expensive writes) vs compute on demand (cheaper writes, heavier reads). Use sharded storage, CDN + cache, rate limiting, backpressure, write-ahead logs, message queues for background work, denormalized data. Eventually consistent timelines are usually acceptable.
4. Pros / Cons
- Fan-out-on-write pros: fast reads. Cons: heavy write amplification and storage.
- Fan-out-on-read pros: storage-efficient, cheap writes. Cons: higher read latency, complex queries. Hybrid approaches are common (precompute for typical users, merge-on-read for high-follower accounts).
5. Use case
- Social feeds, notification systems, recommendation result delivery, any high-read-volume personalized content.
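The fan-out-on-write half of the tradeoff above, as a toy sketch (class and method names are illustrative): each post is pushed into every follower's materialized timeline, making reads O(1) at the cost of write amplification.

```python
from collections import defaultdict

class FanOutOnWrite:
    """Precompute each follower's timeline at post time: fast reads,
    write cost proportional to the author's follower count."""
    def __init__(self):
        self.followers = defaultdict(set)   # author -> set of followers
        self.timelines = defaultdict(list)  # user -> posts, newest last

    def follow(self, user, author):
        self.followers[author].add(user)

    def post(self, author, text):
        # Write amplification: one append per follower.
        for user in self.followers[author]:
            self.timelines[user].append((author, text))

    def timeline(self, user):
        return self.timelines[user]  # O(1): already materialized
```

Fan-out-on-read would instead store one copy per author and merge followed authors' posts at read time.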
7. Scalability (horizontal vs vertical, sharding)
1. What?
- Ability to handle increased load by adding resources vertically (bigger machines) or horizontally (more machines); sharding partitions data across nodes.
2. Why?
- To meet growth and avoid single points of failure; horizontal scaling provides cost-effective throughput increases.
3. Details
- Stateless services → horizontal easy (load balancer).
- Stateful → partitioning/sharding (range, hash, or directory-based). Use consistent hashing for rebalancing, leader/follower for replication. Autoscaling + health checks + circuit breakers.
4. Pros / Cons
- Horizontal pros: resilient, cost-effective. Cons: complexity (distributed algorithms, data partitioning).
- Vertical pros: simpler. Cons: single point of failure, cost and limits.
5. Use case
- APIs, databases (sharded clusters), caches (clustered Redis), message systems (Kafka partitions).
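The consistent-hashing point above can be sketched with a ring of virtual nodes; this is a minimal illustration, not any particular library's implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: each node owns many points
    on a ring; a key maps to the first node point at or after its hash,
    so adding/removing a node remaps only that node's share of keys."""
    def __init__(self, nodes, vnodes=100):
        points = []
        for node in nodes:
            for i in range(vnodes):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wraps around).
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]
```

Contrast with naive `hash(key) % N`, where changing N remaps almost every key.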
8. Data Consistency (strong vs eventual, transactions)
1. What?
- Models defining how and when different nodes see updates: strong (linearizability/serializability), eventual, causal, etc.
2. Why?
- Different applications need different guarantees: banking (strong), social feeds (eventual).
3. Details
- CAP theorem: under a network partition you trade consistency against availability. Distributed transactions: two-phase commit (2PC) is heavy and blocking; sagas suit long-running workflows; causal consistency can be tracked with vector clocks.
4. Pros / Cons
- Strong pros: predictable correctness. Cons: high latency, reduced availability.
- Eventual pros: high availability and performance. Cons: temporary anomalies, conflict resolution needed.
5. Use case
- Strong: financial transfers, inventory decrements.
- Eventual: social media, analytics, caches.
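The vector clocks mentioned above in miniature: compare two clocks to decide whether one update happened-before the other or they are concurrent (needing conflict resolution), and merge them element-wise.

```python
def vc_merge(a, b):
    """Element-wise max of two vector clocks (used after reconciling)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def vc_compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent'."""
    keys = a.keys() | b.keys()
    le = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    ge = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
    if le and ge:
        return "equal"
    if le:
        return "before"
    if ge:
        return "after"
    return "concurrent"  # neither dominates: conflicting writes
```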
9. MQ Fan-out Optimization (message brokers, routing)
1. What?
- Patterns to efficiently deliver messages from producers to many consumers (fan-out) using message brokers, topics, partitions, and pub/sub routing.
2. Why?
- To broadcast events without coupling producers to many consumers and to scale consumers independently.
3. Details
- Use topics with consumer groups, partitioning to parallelize, compacted topics for state, use of routing keys and exchanges (RabbitMQ), or topic-partitions (Kafka). For large fan-out, use hierarchical brokers, SSE/webhooks, or push to CDN edge caches for static content.
4. Pros / Cons
- Pros: decoupling, replayability, elasticity.
- Cons: duplicate deliveries, increased network usage, backpressure management, expensive fan-out to many endpoints (webhooks can fail).
5. Use case
- Notifications, live updates, analytics pipelines, multi-subscriber event distribution.
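The consumer-group mechanics above, sketched: every group sees every message, but within one group each partition is owned by exactly one member (round-robin here as an illustration; real brokers use pluggable assignors).

```python
def assign_partitions(partitions, members):
    """Round-robin partition assignment within one consumer group:
    every partition goes to exactly one member, so work is split
    without duplicate processing inside the group."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment
```

Fan-out across groups is free (each group keeps its own offsets); fan-out inside a group is bounded by the partition count.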
10. Cache Invalidation & Read/Write Strategies
1. What?
- Techniques to keep cache and DB consistent (invalidate or update caches) and strategies for read/writes: cache-aside, write-through, write-back.
2. Why?
- To avoid stale reads, reduce DB load, and maintain correctness.
3. Details
- Cache-aside: app reads cache → miss → read DB and populate cache; writes update DB and invalidate cache.
- Write-through: writes go to cache then persisted to DB synchronously.
- Write-back: write to cache, flush later (fast but risky).
- Invalidation patterns: explicit delete, versioning, TTL, write-notify (pub/sub).
4. Pros / Cons
- Cache-aside pros: simple, control over DB writes. Cons: race conditions, stale window.
- Write-through pros: consistent reads. Cons: write latency.
- Write-back pros: fast writes. Cons: data loss risk on crashes.
5. Use case
- Cache-aside: read-heavy apps (product pages).
- Write-through: read-after-write consistency needed.
- TTL + versioning: session caches, ephemeral data.
11. Read/Write Strategies & CPU Bottleneck Reduction
1. What?
- Approaches to distribute load between reads and writes and to reduce CPU-bound hotspots.
2. Why?
- To maximize throughput and keep latency low while preventing single-node CPU saturation.
3. Details
- Read replicas for scaling reads; leader for writes.
- CQRS: split read model and write model. Batch writes, async processing, rate limiting, circuit breakers. Reduce CPU by caching, memoization, near-data processing, efficient algorithms, vectorized ops, offload heavy work to background workers/GPUs.
4. Pros / Cons
- Pros: better utilization and elasticity.
- Cons: complexity, eventual consistency, staleness.
5. Use case
- Analytics pipelines, high-read OLTP apps, CPU-heavy image processing, API gateways.
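The "reduce CPU by caching/memoization" point above in miniature, using the stdlib:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Memoized Fibonacci: each subproblem is computed once, turning an
    exponential-time recursion into linear time."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

The same idea scales up to caching expensive computed payloads per request key in Redis instead of in-process.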
12. Elasticsearch Usage (search & analytics)
1. What?
- Distributed search and analytics engine built on Lucene for full-text search, aggregations, and near-real-time indexing.
2. Why?
- Fast text search, scalable indices, aggregations, geospatial queries, and inverted index optimizations.
3. Details
- Index → shards → replicas. Mapping defines field types. Queries: full-text (match), filters (term), aggregations. Near real-time: refresh interval affects visibility. Consider bulk indexing, mapping templates, analyzers, ILM (index lifecycle management).
4. Pros / Cons
- Pros: powerful search, good aggregations, horizontal scale.
- Cons: consistency/refresh delay, memory-heavy, operational overhead, tricky mapping, complex relevance tuning.
5. Use case
- Site search, logs/metrics (ELK), analytics dashboards, autocomplete, faceted search.
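The "inverted index" mentioned above is the core structure under Lucene/Elasticsearch: a map from term to the set of documents containing it. A toy sketch (real engines add analyzers, term frequencies, and scoring):

```python
from collections import defaultdict

def build_index(docs):
    """Toy inverted index: term -> set of doc ids containing that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND query: intersect the postings sets of every query term."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()
```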
13. Algorithmic Example — Median from Stream
1. What?
- Maintain median of a stream efficiently.
2. Why?
- Common interview problem testing heaps and balancing state.
3. Details
- Use two heaps: max-heap for lower half, min-heap for upper half. Balance sizes so difference ≤1. Median is either top of one or average of tops.
4. Pros / Cons
- Pros: O(log n) per insert, O(1) median query.
- Cons: memory grows with stream; heavy memory if stream unbounded (use summarization/sketches).
5. Use case
- Realtime stats, streaming analytics, online median calculators.
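The two-heap scheme above, sketched with `heapq` (Python only has a min-heap, so the lower half is stored negated):

```python
import heapq

class StreamMedian:
    """Running median via two heaps: a max-heap (negated values) for the
    lower half, a min-heap for the upper half, sizes kept within one."""
    def __init__(self):
        self.lo = []  # max-heap via negation: lower half
        self.hi = []  # min-heap: upper half

    def add(self, x):
        heapq.heappush(self.lo, -x)
        # Rebalance so that max(lo) <= min(hi).
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2
```

Each `add` is O(log n); `median` is O(1).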
14. SMTP vs HTTP
1. What?
- SMTP: Simple Mail Transfer Protocol for email delivery.
- HTTP: web protocol for hypertext transfer and APIs.
2. Why?
- SMTP handles store-and-forward, retries, queuing; HTTP handles request-response for resources.
3. Details
- SMTP: push-based, commands (HELO, MAIL FROM, RCPT TO, DATA), queuing, retries. Uses ports 25/587/465.
- HTTP: stateless request-response, verbs/status codes, ports 80/443.
4. Pros / Cons
- SMTP pros: reliable delivery; cons: spam, complex headers.
- HTTP pros: simple, synchronous, easy for REST; cons: not designed for store-and-forward or multi-hop mail delivery.
5. Use case
- SMTP: sending emails, mail servers.
- HTTP: web APIs, webpages, webhooks.
15. gRPC Internals (brief)
1. What?
- Mechanics beneath gRPC — HTTP/2 streams, protobuf serialization, flow control, and status codes mapping.
2. Why?
- Understanding internals helps with tuning, debugging, and designing fallbacks.
3. Details
- gRPC uses HTTP/2 streams/multiplexing; messages framed with length prefixes; protobuf for compact serialization; OK/status codes map to HTTP; keepalive, window sizes, flow control; interceptors wrap client/server calls.
4. Pros / Cons
- Pros: efficient transport.
- Cons: complexity in proxies/load balancers that don’t fully support HTTP/2.
5. Use case
- Low-latency microservices requiring streaming or advanced flow control.
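The length-prefix framing above can be sketched directly: gRPC's wire format frames each message as a 1-byte compressed flag plus a 4-byte big-endian length, then the payload. A minimal illustration of that layout:

```python
import struct

def frame(payload: bytes, compressed: bool = False) -> bytes:
    """gRPC-style message frame: 1-byte compression flag +
    4-byte big-endian length prefix + payload."""
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload

def unframe(buf: bytes):
    """Split a byte stream back into (compressed, payload) messages."""
    msgs, i = [], 0
    while i < len(buf):
        flag, length = struct.unpack_from(">BI", buf, i)
        i += 5  # header size
        msgs.append((bool(flag), buf[i:i + length]))
        i += length
    return msgs
```

In real gRPC the payload bytes are protobuf-serialized messages carried on an HTTP/2 stream.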
16. Asynchronous Processing, Eventual Consistency, Fault Tolerance
1. What?
- Async processing decouples producers and consumers. Eventual consistency accepts temporary divergence. Fault tolerance is designing systems to survive failures.
2. Why?
- Scales systems, smooths bursts, increases availability and resilience.
3. Details
- Patterns: message queues, worker pools, retries, DLQs, idempotency tokens, circuit breakers, bulkheads, sagas, leader election. Observability: tracing, metrics, alerting.
4. Pros / Cons
- Pros: scalable, resilient, performant.
- Cons: complexity, harder to reason about correctness, harder to test.
5. Use case
- Background jobs, email sending, order processing, tolerant systems.
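Several of the patterns above (retries, DLQs, idempotency tokens) combined in one toy consumer loop; the message shape and names are illustrative:

```python
def process_with_retries(messages, handler, max_attempts=3):
    """At-least-once consumer sketch: dedupe by idempotency key, retry
    each message up to max_attempts, park permanent failures in a DLQ."""
    seen, dlq, results = set(), [], []
    for key, payload in messages:
        if key in seen:
            continue  # duplicate delivery: already processed
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(handler(payload))
                seen.add(key)
                break
            except Exception:
                if attempt == max_attempts:
                    dlq.append((key, payload))  # dead-letter queue
    return results, dlq
```

A production version would persist the `seen` set, add exponential backoff between attempts, and alert on DLQ growth.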
17. Fault Tolerance Techniques
1. What?
- Methods that allow a system to continue operating despite failures.
2. Why?
- Keep availability and provide graceful degradation.
3. Details
- Replication, consensus (Raft/Paxos), retries/backoff, circuit breakers, fallbacks, bulkheading, health checks, graceful degradation, redundancy, automated failover.
4. Pros / Cons
- Pros: higher uptime and resilience.
- Cons: added latency, cost, more complexity.
5. Use case
- Critical services (payments, auth), multi-region deployments, high-SLA systems.
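The circuit breaker listed above, as a minimal sketch (the injectable `now` parameter is illustrative; real implementations also distinguish a half-open state with limited trial traffic):

```python
class CircuitBreaker:
    """Minimal circuit breaker: opens after `threshold` consecutive
    failures, allows a trial call again after `reset_after` seconds."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, now):
        if self.opened_at is not None and now - self.opened_at < self.reset_after:
            raise RuntimeError("circuit open")  # fail fast, skip downstream
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now  # trip the breaker
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Failing fast while open gives the downstream service time to recover instead of hammering it with doomed requests.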