Behavioural Interview Story - Amazon

Some stories prepared for Amazon leadership principles.

🌟 STAR Framework – Fill-in-the-Blanks Template

Situation (the context)

At [company/organization/school], I was working on [project/task]. The situation was [describe context, challenge, or problem briefly].

Task (what I was assigned)

My responsibility was to [explain your role]. The goal/objective was to [what needed to be achieved or solved].

Action (how I did it)

To address this, I [step 1 you took]. I also [step 2 you took]. Additionally, I [step 3 or collaboration/leadership action].

Result (how much money it saved the company)

As a result, [explain the outcome]. This led to [quantifiable result or recognition]. From this experience, I learned [personal or professional takeaway].

Take notes

Tell me about a time you stepped up into a leadership role → Onelink System end-to-end.
Tell me about a time you received tough feedback → The Tracking Hub tool was difficult to use.
Tell me about a time you failed to meet a commitment you had promised → Missed deadline.
What’s your most recent innovation? → Scan Everything Hackathon
Tell me about a time when you took on something outside of your area of responsibility → Boilerplate template.
Describe a time when you sacrificed short-term goals for long-term success → Refactor code.
Tell me about a time you had to quickly adjust your work priorities to meet changing demands. → During the Tet holiday, users could not access the mini app because of a memory leak in Redis, so I fixed that first.
Tell me about the failures in your work → Rushed testing, did not handle parsing errors when consuming from Kafka, and the service kept restarting.
Tell me about a time when you had a complex problem that required a lot of in-depth analysis → Deep dive into OneLink tracking.
Describe a time when you had to act quickly to resolve a critical issue → CDN downtime, used memcache as an immediate hotfix.
Describe a time you identified inefficiency and took ownership to fix it. → Consolidated the 3 mini app tools.
Describe a time you made a decision without having enough data → Did not have enough data to know whether the traffic came from users or bots.
Describe a project you delivered under tight deadlines. → Scan QR with Miss Universe tracking campaign.
Describe a time you had to communicate complex findings to non-technical stakeholders → Educated users on how to use the OneLink tools.
Describe a time you prioritized tasks when a bug happened → Redis ran out of memory, needed to change the cache strategy to write-through.
Tell me about a plan you use to learn new things → Learn AWS.
Could you tell me about a time where you were working on a project where you were working with another person. Over time, that person lessened their involvement in the project and you had to take on more responsibility → A frontend teammate was reassigned to another urgent task, so I handled both frontend and backend.
Tell me about a time when you had to communicate a change in direction that you anticipated people would have concerns with → Fire events to AppsFlyer and sync them to Meta and TikTok Event Manager, rather than firing only to Meta.
Tell me about a time where you were thrown into a project you had no experience in. → OneLink Solution
Describe a situation where you made an important business decision without consulting your manager → During a peak-traffic campaign, Redis crashed and I decided to switch to memcaching so that requests would not fall through to the database.
Tell me about a time where you were the first one to take action on something → OneLink drop rate: researched and dove deep into the mobile side.
Tell me about a time when you had a disagreement with a colleague or manager → Forward events to AppsFlyer and sync to other platforms; my manager raised the cost, but I convinced him with the reusability and ease of operation.
Tell me about a time when you had a conflict with a colleague → I wanted a /v2 endpoint while he wanted to keep using the old endpoint. I insisted on staying backward compatible with the old endpoint: Zalopay is a mobile application, so we cannot change the code in old versions once they are submitted to the store, and we roll out by percentage. Even though only about 10% of users would be impacted by the change, I still wanted backward compatibility.
Describe a time you needed to do something ad-hoc for short-term business needs rather than long-term design → Workaround on the scanQR event to track scanQR events for the Miss Universe campaign.
Tell me about an interesting metric that you designed to identify and eventually improve something → Top-K most-used mini apps in Zalopay, so that we can bundle them as locally cached resources when building the native mobile app before submitting it to the store.
Tell me about a time when you were 75% through a project and had to pivot strategy → Tracking Hub development was 75% done, with events forwarded to TikTok, when we realized the campaign also had to be tracked in Meta ⇒ changed to firing events into AppsFlyer and syncing to both platforms.

Follow-up questions

Benefits
- To begin with, the main benefits are [benefit 1], [benefit 2], and [benefit 3].
- These advantages help to [impact or goal], which makes this approach appealing.
Challenges
- However, despite these benefits, there are several challenges that need to be considered.
- The key challenges include [challenge 1], [challenge 2], and [challenge 3]. These could potentially [risk or negative outcome] if not handled carefully.
Options
- To address these challenges, I explored several options to find the most balanced solution.
- The main options were:
  - Option A: [description + benefit]
  - Option B: [description + benefit]
  - Option C: [description + benefit]
Trade-offs
- After evaluating these options, I considered the trade-offs involved before making a decision.
- Choosing [selected option] meant accepting [specific trade-off], but it offered [key advantage or long-term gain].
- Overall, the decision balanced [short-term vs long-term / cost vs benefit / speed vs quality] effectively.
Lesson learned
- From this experience, I learned [lesson 1] — for example, [specific insight or behavioral change].
- I also realized the importance of [lesson 2], which has since helped me [how you apply it going forward]. As a result, I now approach similar situations with [improved mindset or method].

1. Tell me about a time you stepped up into a leadership role

Situation:

At Zalopay, I took ownership of the OneLink product used to manage and track marketing campaigns.
When a user clicks a Zalopay Onelink, it checks if the Zalopay or Zalo mini-app is already installed. If yes, it opens the app directly at the right feature. If not, it automatically redirects the user to the app store to download the app.
As the main point of contact (PIC), I worked directly with the marketing team to understand their requirements and ensure the system supported their needs effectively.

Task:

My task was to own the end-to-end development and improvement of the OneLink system, from system design through implementation and operation.
This meant ensuring reliability, scalability, and usability for multiple marketing use cases while managing cross-functional communication and delivery timelines.

Action:

To address it, I held regular syncs with the marketing team to gather feedback, define clear objectives, and prioritize impactful requests.
Agreed on clear timelines with them, broke user stories into subtasks, estimated each one, and monitored progress.
Executed end-to-end: system design, database design, sequence diagrams, and implementation of the backend microservices.
Identified potential issues early and adjusted the design to improve performance, scalability, and user experience.
Facilitated collaboration with the DevOps teams to ensure smooth deployment and integration.

Result:

Successfully delivered an improved version of OneLink that has been used in more than 1000 campaigns this year.
Established clear communication with the marketing team and product owner, which deepened my understanding of the business.

2. Tell me about a time you received tough feedback

Situation:

I received feedback from my manager regarding an internal back-office tool I had developed for the marketing team.
While the tool worked technically as expected, the user flow was complicated, making it difficult for the operations team to use effectively in their daily work.
At first, as a fresher backend engineer with little experience, I focused on the scalability of the design and did not think enough about the client-side experience in the frontend. After actively listening to my manager's feedback and reflecting on it, I agreed and created a task to improve the tool.

Task:

My task was to understand the root cause of the usability issues and improve the tool’s user experience so that non-technical users in the marketing and operations teams could use it easily and efficiently.

Action:

Scheduled feedback sessions with the operations team to observe how they interacted with the tool and identify specific pain points.
Researched and mapped the user journey, then simplified the workflow by reducing unnecessary steps and improving input fields and navigation flow.
Collaborated with a designer to enhance the UI/UX, ensuring the interface was intuitive for business users.
Conducted usability testing sessions before rollout to confirm improvements.

Result:

The revised version of the tool reduced operation time by around 30% and significantly decreased user confusion.
Received positive feedback from both the marketing and operations teams.
Learned the importance of viewing problems from the end-user’s perspective, not just a technical one — a lesson that has since shaped how I design internal tools.

3. Tell me about a time you failed to meet a commitment you had promised

Situation:

I was responsible for delivering the campaign tracking system to track traffic performance for a major marketing event called “Scan Everything with Miss Universe.” One of our company's employees became Miss Universe Vietnam this year, and we invited her to be an ambassador for the marketing campaign around the scan QR feature.
The campaign had a tight launch timeline, but due to unexpected technical issues and communication gaps, I missed the expected delivery deadline for testing, impacting the marketing team’s preparation schedule.

Task:

My task was to identify the root cause of the delay, communicate transparently with stakeholders, and have a clear next action plan to ensure the campaign tracking was completed accurately and as quickly as possible without compromising data quality.

Action:

Immediately informed the marketing team and my manager about the delay, and presented a clear recovery plan outlining the next steps.
Investigated the root cause and found that we had misunderstood the campaign requirements — initially, we were sending events only to TikTok Event Manager, but the campaign was actually intended for both Meta and TikTok. We then revised the approach to forward events to Appsflyer, a third-party platform integrated with both channels, ensuring synchronization across platforms.
Prioritized delivering the core feature (forwarding the scanQR event) first, while scheduling the additional configuration tool for general use cases in a later phase.
Worked extra hours to complete the setup as quickly as possible and keep the delay to a minimum.

Result:

Delivered the campaign tracking setup within 24 hours after the missed deadline, allowing the event to proceed smoothly.
Built stronger trust with the marketing team through transparent communication and accountability.
Implemented a standardized campaign setup checklist, which later helped prevent similar issues in future events.

4. What’s your most recent invention?

Situation:

I led the development of the “Scan Everything” solution, a new feature that allows users to scan real-world objects, such as a cup of coffee or a movie poster, and receive personalized recommendations from our internal mini apps. The idea aimed to create cross-sell opportunities between mini apps and generate affiliate revenue from partner merchants.

Task:

My task was to design and implement the technical foundation for this system, including image recognition, semantic similarity search, and recommendation logic.
I also needed to research emerging technologies, particularly vector databases and image-based semantic search, to make the system accurate and scalable.

Action:

Conducted research and prototyping using vector databases (e.g., Milvus, FAISS) to store and retrieve image embeddings efficiently.
Implemented a feature extraction pipeline using deep learning models to encode image features.
Designed a ranking and recommendation service that mapped recognized objects to related products or mini apps.
Collaborated with product and marketing teams to integrate the feature into the user journey and measure engagement metrics.

Result:

The solution won Top 2 in the company’s internal Hackathon, and the business team approved it for production rollout.
Demonstrated a new engagement channel through cross-selling between mini apps and partner merchants.
The project was later adopted as a proof-of-concept for AI-driven personalization across the platform.
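
The retrieval step from the Action above (store image embeddings, then find the closest ones) can be sketched as follows. This is only an illustration, not the hackathon implementation: the 3-dimensional embeddings are made-up values, the names `cosine_similarity`, `top_k`, and `CATALOG` are hypothetical, and a vector database such as Milvus or FAISS would replace the brute-force loop at real scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, treat as "not similar"
    return dot / (norm_a * norm_b)

def top_k(query, catalog, k=2):
    """Return the names of the k catalog items closest to the query embedding."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Toy embeddings (assumed values, for illustration only).
CATALOG = {
    "coffee-voucher": [0.9, 0.1, 0.0],
    "movie-ticket": [0.0, 0.2, 0.9],
    "bubble-tea": [0.8, 0.3, 0.1],
}

# Pretend this vector is the embedding of a scanned cup of coffee.
result = top_k([1.0, 0.2, 0.0], CATALOG, k=2)
```

Talking point for follow-ups: brute force is linear in the catalog size, which is the reason to move to an approximate nearest-neighbor index once the catalog grows.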

5. Tell me about a time when you took on something outside of your area of responsibility

Situation:

While my main responsibility was backend development, I noticed that new team members, especially juniors, struggled to onboard when starting new mini-app projects.
Each project required repetitive setup work — configuring the frontend and backend frameworks, CI/CD pipelines, and Docker environments — which slowed down productivity.

Task:

I decided to take initiative to create a reusable boilerplate template that would automate and standardize this setup process, even though it was outside my formal scope of work.

Action:

Designed and implemented a full-stack boilerplate with pre-configured frontend (React) and backend (Golang) frameworks.
Set up CI/CD pipelines for automated testing and deployment, integrated with the company pipeline.
Added Dockerfile and environment configuration to ensure consistent local and production setups.
Documented the setup process and conducted an onboarding session for the team to introduce how to use it.

Result:

Reduced project setup time from days to just a few hours.
Helped junior developers onboard faster and focus on building business logic instead of setup.
The boilerplate became a standard starting point for all new mini-apps in the team.

6. Describe a time when you sacrificed short-term goals for long-term success

Situation:

During a project to enhance our internal marketing analytics platform, I discovered that the legacy codebase had several performance bottlenecks and potential security risks.
While the team was under pressure to deliver new features quickly, I realized that continuing development on top of unstable code would create technical debt and future maintenance problems.

Task:

I proposed to pause feature development temporarily and focus on refactoring critical parts of the system — improving maintainability, performance, and security — even though it would delay short-term deliverables.

Action:

Conducted a code audit to identify key modules causing slow queries and unsafe data handling.
Refactored core components, optimized database queries, and implemented input validation and secure data access patterns.
Worked with my manager and product owner to redefine priorities and communicate the long-term benefits of the refactor to stakeholders.
Ensured that the refactored version had comprehensive unit tests to prevent regressions and speed up future development.

Result:

Although feature delivery was delayed by one sprint, the refactor led to a 35% improvement in API response time and significantly reduced error rates in production.
The cleaner, more modular codebase accelerated future development and simplified onboarding for new team members.
This experience reinforced the importance of balancing short-term speed with long-term technical health, ensuring sustainable growth and system reliability.

7. Tell me about a time you had to quickly adjust your work priorities to meet changing demands.

Situation:

While I was working on a project scheduled for the sprint release, a critical production bug suddenly appeared in one of the mini app platform services.
The issue caused system instability, and real users could not access the mini app.

Task:

Although I was focused on completing planned feature development, I needed to pause my current work and reprioritize to help diagnose and resolve the production incident as quickly as possible. My task was to identify the root cause, implement a hotfix, and stabilize the system to minimize business impact.

Action:

Quickly joined the incident response channel and coordinated with the DevOps and QA teams to assess the scope of the issue.
Used logs and metrics to trace the root cause to a memory leak in the caching layer after a recent deployment ⇒ requests read directly from the database, which crashed the database.
Implemented a hotfix that switched the service to a new Redis instance, then restarted the service.
Communicated status updates to stakeholders and arranged a post-incident review to follow up on the remaining issues.

Result:

The issue was resolved within two hours, restoring system stability and minimizing customer impact.
My quick prioritization and collaboration helped the team contain the issue.
Afterward, I updated our incident response checklist and monitoring alerts, improving our ability to respond to similar issues faster in the future.

8. Tell me about the failures in your work

Situation:

When developing a consumer service responsible for consuming messages from Kafka and storing them in a database to monitor OneLink traffic, I failed to properly handle exceptions during data parsing because I rushed through the testing phase.
As a result, when unexpected data formats appeared, the service crashed repeatedly. This caused multiple restarts and led to message loss, impacting downstream analytics temporarily.

Task:

My task was to quickly stabilize the service and recover lost data where possible.

Action:

Investigated the root cause using service logs and identified the code where an unhandled parsing exception triggered the crash loop.
Added exception handling around the parsing step to gracefully skip malformed messages while logging them for later review, returning the error to the fail-safe mechanism.
Implemented a dead-letter queue (DLQ) to capture failed messages instead of discarding them.
Added unit and integration tests to cover different data formats and error scenarios.
Documented the incident in our post-mortem and shared key takeaways with the team.

Result:

Restored service stability within a few hours and successfully reprocessed most of the lost messages from backup.
The improved error handling and DLQ mechanism prevented similar crashes in later releases.
Learned an important lesson about defensive programming and the importance of anticipating edge cases in distributed systems.
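
The fix described in the Action above (catch the parsing error, keep consuming, park the bad message in a dead-letter queue) can be sketched as below. It is a simplified stand-in under stated assumptions: plain Python lists play the role of the Kafka topic, the database, and the DLQ, and `parse_event`, `consume`, and the required `event_id` field are hypothetical names chosen for the example.

```python
import json

def parse_event(raw):
    """Parse one raw message; raises ValueError when the message is malformed."""
    event = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(event, dict) or "event_id" not in event:
        raise ValueError("missing event_id")
    return event

def consume(messages, store, dead_letters):
    """Process every message; malformed ones go to the DLQ instead of crashing the loop."""
    for raw in messages:
        try:
            store.append(parse_event(raw))
        except ValueError as err:
            # Keep the original payload and the reason so it can be replayed later.
            dead_letters.append({"raw": raw, "error": str(err)})

stored, dlq = [], []
consume(['{"event_id": "a1"}', "not-json", '{"foo": 1}'], stored, dlq)
```

The point to stress in the interview: one bad message no longer stops the consumer, and nothing is silently dropped because every failure is kept for review.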

9. Tell me about a time when you had a complex problem that required a lot of in-depth analysis

Situation:

While monitoring the traffic funnel for OneLink, the marketing team reported a significant drop rate between the web click stage and the mobile app open event.
Since OneLink tracks users across multiple platforms — from a web page click to a mini app launch inside a mobile app — identifying where the tracking was failing required deep cross-platform investigation.

Task:

My task was to analyze the full event flow across multiple systems — including backend tracking services and both Android and iOS client implementations — to locate the root cause of the event loss and propose a fix to restore full funnel visibility.

Action:

Started by analyzing backend logs and event metrics to confirm where the drop occurred in the funnel.
The mobile team confirmed that they were firing all the events correctly, but when I analyzed the historical data I determined that the drop was related to event_id in the mobile steps. This led me to dive deeper into the native source code to identify the root cause.
Collaborated with the mobile teams to review both Android and iOS codebases, focusing on the logic that triggered tracking events after mini app launch. During this process, I also learned the basic tracking mechanisms in Android and iOS to better debug the issue.
Identified that the root cause was related to the iOS app lifecycle — specifically, events tied to onStart and onResume were not properly fired. After discussions, the mobile team updated the logic to ensure data was sent at all necessary steps.

Result:

Successfully identified and fixed the issue, which recovered over 20% of previously missing tracking events in the funnel.
Improved cross-platform reliability of OneLink’s analytics pipeline and strengthened the marketing team's trust in the tracking system.
The investigation also led to the creation of a cross-team diagnostic checklist for tracking issues, reducing debugging time in future incidents.

10. Describe a time when you had to act quickly to resolve a critical issue.

Situation:

During a high-traffic period, we received multiple reports that users were unable to access the mini app.
After initial checks, we discovered that the issue was caused by a CDN downtime affecting static assets and API endpoints used by the mini app. This caused service disruption across several campaigns during peak usage.

Task:

My task was to quickly investigate and restore service availability, minimize user impact, and identify long-term preventive measures to avoid similar CDN-related incidents in the future.

Action:

Joined the incident response immediately, coordinated with the infrastructure and DevOps teams to confirm that the root cause was a CDN outage.
Worked with the backend team to cache critical configuration files and assets locally to reduce dependency on external CDN availability.
After stabilization, we conducted a post-mortem analysis and proposed improvements, including alerts and real-time health monitoring.
Documented the recovery steps, shared them with both the engineering and operations teams, and proposed multi-CDN fallback support as a future solution.

Result:

Restored access within one hour, minimizing campaign and revenue impact during the downtime.
Earned recognition from leadership for demonstrating ownership and quick problem-solving under pressure.
Strengthened system resilience and improved our incident response process for future high-severity events.
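
The local-caching mitigation from the Action above can be sketched as a "remote first, last good copy on failure" lookup. This is a minimal illustration under assumptions: `load_with_fallback`, `healthy_cdn`, and `broken_cdn` are hypothetical names, and a dict stands in for whatever local store held the cached configuration and assets.

```python
def load_with_fallback(key, fetch_remote, local_cache):
    """Try the remote source first; on a network failure serve the last good local copy."""
    try:
        value = fetch_remote(key)
        local_cache[key] = value  # refresh the local copy on every success
        return value
    except OSError:  # ConnectionError and TimeoutError are OSError subclasses
        if key in local_cache:
            return local_cache[key]
        raise  # nothing cached yet, so surface the outage

def healthy_cdn(key):
    """Stand-in for a working CDN request."""
    return f"config-for-{key}"

def broken_cdn(key):
    """Stand-in for a CDN outage."""
    raise ConnectionError("CDN unreachable")

cache = {}
first = load_with_fallback("mini-app", healthy_cdn, cache)
second = load_with_fallback("mini-app", broken_cdn, cache)  # served from the local copy
```

The trade-off worth mentioning: the fallback may serve stale data during an outage, which is usually acceptable for configuration and static assets.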

11. Describe a time you identified inefficiency and took ownership to fix it.

Situation:

Our team managed 3 different tools for mini app management. In the legacy system, configuring internal mini apps, configuring merchant mini apps, managing the SDK, and managing mini app versioning were spread across multiple tools and services.
This fragmentation made the onboarding process for new projects and team members complex and time-consuming, as they had to switch between multiple systems and environments to complete even basic workflows.

Task:

I took the initiative to streamline the onboarding and management process by consolidating all three tools into a single unified platform that could handle the entire lifecycle, from setup to monitoring and deployment, in one place.

Action:

Conducted a workflow analysis to identify overlapping features and pain points across the three existing tools.
Designed a centralized architecture that integrated configuration, deployment, and monitoring into a single tool.
Implemented the platform with a migration plan that ensured backward compatibility with existing projects.
Collaborated with frontend, backend, and DevOps teams to unify authentication, CI/CD, and data access.
Created detailed documentation and onboarding guides, then trained the team on the new workflow.

Result:

Reduced onboarding time for new mini apps by over 50% and simplified maintenance.
Improved developer productivity and consistency across projects.
The unified tool became the standard platform for all future mini app development.
Received recognition from both engineering leadership and product teams for inventing and simplifying a key internal process.

Notes: You can do the planning with ChatGPT.

12. Describe a time you made a decision without having enough data

Situation:

During a major marketing campaign, we observed a sudden surge in traffic to our OneLink service, about 5x the normal traffic. The incoming requests were far higher than our usual peak load.
At that time, we did not have enough real-time data to determine whether this spike was from legitimate user traffic or a potential DDoS attack.
Any delay in response could risk service instability or downtime.

Task:

As the on-call engineer, I needed to quickly decide how to protect system stability while minimizing the risk of blocking real user requests. When I checked the user-agent logs in the API gateway, the requests still used the same format as user agents from real users (Mozilla Firefox,…).
The challenge was to make a decision under uncertainty, with incomplete visibility into the nature of the traffic.

Action:

Reviewed past campaign traffic metrics to estimate a safe threshold for expected user load.
Decided to apply rate limiting at the API gateway layer using those historical baselines.
Closely monitored error rates and latency to ensure legitimate users were not significantly affected.
In parallel, coordinated with the infrastructure and security teams to collect and analyze traffic patterns to confirm the source ⇒ In the end I determined that the surge came from the same 4 IPs, and we concluded it was bot traffic, because around 60000 requests in 5 minutes from a single IP does not make sense for real users.

Result:

The system remained stable throughout the traffic surge, with no downtime or cascading failures.
Data later confirmed that the spike was a mix of legitimate campaign traffic and some bot noise — validating that rate limiting was the right precaution.
Afterward, we implemented real-time anomaly detection and auto-scaling policies to avoid similar manual decisions in future incidents.
The experience strengthened my ability to make data-informed decisions quickly under pressure, even with limited information.
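
The rate-limiting decision in the Action above can be illustrated with a fixed-window counter. This is a sketch under assumptions, not the gateway's actual configuration: limiting per client IP is an assumption made for the example, `FixedWindowLimiter` is a hypothetical name, and the limit of 3 requests is deliberately tiny so the behavior is easy to follow (the 300-second window mirrors the 5-minute span mentioned above).

```python
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per client within each fixed time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (ip, window index) -> requests seen. Old windows are never purged here;
        # a real limiter would expire them.
        self.counts = defaultdict(int)

    def allow(self, ip, now):
        window = int(now // self.window_seconds)
        key = (ip, window)
        if self.counts[key] >= self.limit:
            return False  # over the threshold for this window: reject
        self.counts[key] += 1
        return True

limiter = FixedWindowLimiter(limit=3, window_seconds=300)
decisions = [limiter.allow("10.0.0.1", now=t) for t in (0, 1, 2, 3)]
other_ip = limiter.allow("10.0.0.2", now=3)       # a different client is unaffected
next_window = limiter.allow("10.0.0.1", now=300)  # the counter resets in the next window
```

A useful follow-up answer: a threshold derived from historical baselines protects the system from a single noisy source while leaving normal users well under the limit.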

13. Describe a project you delivered under tight deadlines.

Situation:

I was assigned to implement a tracking hub to monitor user engagement for the “Scan with Miss Universe” campaign — a marketing event with a very tight launch timeline.
The tracking system needed to collect and unify events from multiple sources, give the marketing team an operational event-tracking solution, and forward the events to TikTok Event Manager and Meta Event Manager before the campaign went live.

Task:

My task was to design, develop, and deploy the tracking system within a two-week timeline, ensuring accurate event collection, scalability under heavy traffic, and clear data visualization for non-technical stakeholders.

Action:

Quickly gathered requirements from the marketing and data teams to define key metrics and event flows.
Designed a lightweight but extensible data pipeline using Kafka for event streaming and Redis for quick aggregation.
Coordinated with DevOps to set up monitoring and alerting to handle expected traffic spikes.
Conducted end-to-end testing and worked with the marketing team to monitor test traffic in Platform Events in the days before the real campaign started.

Result:

Successfully delivered the tracking hub on time, just one day before the campaign launch.
The system handled millions of events per day without downtime and provided accurate, real-time insights.
The marketing team used the data to optimize campaign performance in real time, increasing engagement rates.
The success of this project led to the tracking hub becoming a reusable framework for future campaigns.

14. Describe a time you had to communicate complex findings to non-technical stakeholders.

Situation:

I noticed that many users, especially from the marketing and operations teams, struggled to configure campaign flows in the OneLink tool. The setup process involved several technical parameters, which caused frequent mistakes and delayed campaign launches.
To make it easier for non-technical users, I decided to create clearer learning resources and examples to help them understand the flow.

Task:

My task was to educate users on how to correctly configure OneLink flows and reduce the dependency on engineers for campaign setup. I needed to make the training materials simple, visual, and easy to follow.

Action:

Created step-by-step documentation with screenshots explaining each configuration field and its purpose.
Recorded short demo videos showing real examples of campaign setup from start to finish.
Collected common mistakes and FAQs from users and added them into a troubleshooting section.
Hosted a training session with the marketing team to walk through the process and answer live questions.

Result:

After the training, setup errors dropped by more than 60%, and users were able to configure new campaigns independently.
Reduced engineering support time, allowing developers to focus more on product improvements.
Received positive feedback from the marketing team for making the tool more accessible and user-friendly.
The videos and guides became part of the official onboarding material for new team members.

15. Describe a time you prioritized tasks when a bug happened

Situation:

While implementing the Tracking Hub system, a critical issue occurred on the mini app platform — Redis ran out of memory. As a result, the system started reading directly from the database, leading to a spike in DB connections.
If left unresolved, it could have caused a database crash and disrupted user transactions during a high-traffic period. It required an immediate hotfix to prevent system downtime.

Task:

My responsibility was to quickly analyze the issue, prioritize the fix, and coordinate with cross-functional teams to stabilize the system while minimizing impact on active users and ongoing development.

Action:

Temporarily paused all non-critical development tasks to focus on incident resolution.
Used monitoring tools to identify memory usage patterns and confirm the Redis eviction policy was not properly configured.
Worked with the DevOps team to switch to another Redis cluster and change the caching strategy to write-through, instead of cache-aside with a time to live (TTL).
After stabilizing the system, conducted a post-incident review and updated documentation to prevent recurrence.

Result:

Restored system stability within 30 minutes, preventing a potential database outage and transaction loss.
Improved Redis configuration and monitoring thresholds, reducing future memory incidents by over 70%.
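
The write-through strategy named in the Action above can be sketched as follows, in case a follow-up question asks how it differs from cache-aside. This is a toy model under assumptions: two dicts stand in for Redis and the database, and `WriteThroughStore`, the key `mini-app:42`, and the `db_reads` counter are hypothetical names added for illustration.

```python
class WriteThroughStore:
    """Every write updates the database and the cache together."""

    def __init__(self):
        self.db = {}
        self.cache = {}
        self.db_reads = 0  # counts how often a read had to hit the database

    def write(self, key, value):
        self.db[key] = value     # persist first
        self.cache[key] = value  # then keep the cache in sync with the write

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to the database and repopulate the cache.
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

store = WriteThroughStore()
store.write("mini-app:42", {"version": "1.0.3"})
value = store.read("mini-app:42")  # served from the cache, no database read
missing = store.read("mini-app:99")  # unknown key: one database read, returns None
```

With cache-aside plus a TTL, an entry disappears when it expires and the next read goes to the database; with write-through, the cache is filled at write time, so reads for written keys do not depend on a prior miss.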

16. Tell me about a plan you use to learn new things

Situation:

As our systems were gradually moving toward cloud-based infrastructure, I realized that gaining deeper knowledge of AWS services would help me design more scalable and efficient solutions.
However, I had limited hands-on experience with AWS, so I needed a structured plan to learn effectively while balancing ongoing work.

Task:

My goal was to build a solid foundation in AWS architecture and services, gain hands-on experience through real projects, and eventually apply this knowledge to improve our backend and deployment workflows.

Action:

I started by defining a learning roadmap — beginning with AWS Cloud Practitioner to cover the basics, then advancing to AWS Solutions Architect – Associate for deeper understanding.
Allocated 1 hour daily for study using AWS Skill Builder, documentation, and tutorial projects.
Built small practice projects, such as deploying a web app using EC2, S3, and CloudFront, and setting up serverless APIs with Lambda and API Gateway.
Set up a personal knowledge base to record lessons learned and commands I frequently used.
Joined AWS communities and online forums to discuss problems, get feedback, and learn best practices from others.

Result:

Gained practical experience with core AWS services and improved my understanding of cloud infrastructure design.
Successfully deployed a small-scale project using AWS, reducing hosting cost and improving scalability.
My new knowledge allowed me to contribute to cloud discussions within my team and support infrastructure improvements with confidence.
I’m now preparing for the AWS Certified Solutions Architect exam to formally validate my skills.

17. Could you tell me about a time when you were working on a project with another person who, over time, lessened their involvement, so that you had to take on more responsibility?

Situation:

During the development of our internal OneLink analytics tool, I was working with another engineer responsible for the frontend integration while I focused on the backend and data pipeline.
Midway through the project, my teammate became less involved due to shifting priorities and other urgent assignments. This created a risk of delayed delivery since our timeline was tied to a marketing campaign launch.

Task:

I needed to ensure the project stayed on schedule while maintaining code quality and functionality, even though I had to take on additional responsibilities beyond my original backend scope.

Action:

I first reassessed the project timeline and scope, identifying critical features that must be completed for the campaign.
Took ownership of several frontend tasks, including API integration and UI testing, despite not being the initial lead for that part.
Set up daily syncs with the marketing team to clarify requirements and confirm progress.
Documented the full implementation flow so that my teammate could easily catch up when available again.

Result:

Delivered the OneLink analytics tool on time, meeting the marketing campaign deadline.
The solution performed reliably in production, supporting thousands of tracking events per day.
My manager recognized my ownership and flexibility, and this experience later helped me become the main PIC (person in charge) of the OneLink product.

18. Describe a situation where you made an important business decision without consulting your manager

Situation:

During a high-traffic marketing campaign, our backend system experienced a Redis crash due to memory overload. This caused the service to start reading directly from the database, leading to a spike in DB connections and a high risk of system outage.
My manager was unavailable at that moment, but the issue required immediate action to protect the database and maintain service continuity.

Task:

I needed to quickly stabilize the system and prevent further database overload, while ensuring that user transactions and campaign tracking remained unaffected.

Action:

After checking the metrics and logs, I decided to temporarily switch the caching layer from Redis to Memcached, which had more available capacity and could handle our read-heavy workload.
Updated the configuration to redirect traffic to Memcached and deployed the change immediately.
Monitored system performance and confirmed that database connections dropped significantly after the switch.
Once the system was stable, I documented all steps taken and informed my manager and DevOps team about the change.
Later, we added automatic failover and alerting mechanisms for Redis to prevent similar incidents.
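
A rough sketch of what an automatic cache fallback could look like. It assumes in-memory stand-ins for Redis and Memcached; `DictCache`, `CacheUnavailable`, and `read_with_fallback` are hypothetical names, and the real switch during the incident was a configuration change rather than this code.

```python
class CacheUnavailable(Exception):
    """Raised by a cache client when its backend cannot be reached."""


class DictCache:
    """In-memory stand-in for a cache client (Redis or Memcached)."""

    def __init__(self, healthy=True):
        self.data = {}
        self.healthy = healthy

    def get(self, key):
        if not self.healthy:
            raise CacheUnavailable()
        return self.data.get(key)

    def set(self, key, value):
        if not self.healthy:
            raise CacheUnavailable()
        self.data[key] = value


def read_with_fallback(key, primary, fallback, load_from_db):
    """Try the primary cache, then the fallback cache, and only then the database."""
    for cache in (primary, fallback):
        try:
            value = cache.get(key)
            if value is None:
                value = load_from_db(key)
                cache.set(key, value)
            return value
        except CacheUnavailable:
            continue  # this cache layer is down; try the next one
    return load_from_db(key)
```

The point of the design is that a cache outage degrades to the second cache instead of sending every read straight to the database.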

Result:

Successfully prevented a potential database crash during the campaign.
Restored system stability within 15 minutes, with zero impact on user transactions.
My quick decision demonstrated ownership and sound technical judgment under pressure.
The Memcached fallback strategy was later adopted as a standard contingency plan for high-load scenarios.

19. Tell me about a time where you were the first one to take action on something

Situation:

While monitoring OneLink’s traffic funnel, I noticed a sudden drop in conversion rate from web clicks to mini app opens. No one had reported it yet, but the data clearly indicated a tracking or event issue that could affect marketing performance.

Task:

I needed to investigate the cause proactively, before the marketing team escalated the issue, and ensure that tracking accuracy was restored as soon as possible.

Action:

Started by analyzing backend logs and event metrics to pinpoint where the drop occurred.
Discovered that certain mobile events weren’t firing correctly due to a parameter mismatch in the Android SDK.
Contacted the mobile team immediately and collaborated to debug the issue directly in their native code.
Proposed a quick fix and tested the event flow end-to-end across both Android and iOS.
Informed the marketing team about the issue and its resolution, and added monitoring alerts to prevent recurrence.

Result:

Fixed the issue within a few hours, restoring accurate event tracking.
Prevented potential data loss and misattribution in campaign analytics.
My proactive action built trust with the marketing and mobile teams, and I became the main point of contact for tracking reliability going forward.

20. Tell me about a time when you had a disagreement with a colleague

Situation:

During the development of a new feature in the ZaloPay mini app platform, I worked closely with another engineer on the backend API. We had a disagreement about how to roll out an API change — I wanted to create a new /v2 endpoint for the updated logic, while he preferred to reuse the existing endpoint for simplicity.
However, since ZaloPay is a mobile application, we can’t instantly update all users after deployment — updates roll out gradually by percentage, meaning older app versions would still call the old API. Changing it directly would break backward compatibility and potentially affect about 10% of active users.

Task:

I needed to convince my colleague that introducing a new /v2 endpoint was the safer long-term decision, even though it required a bit more initial work, to ensure system stability and smooth rollout for all app versions.

Action:

Explained my reasoning using data from release rollout metrics, showing the risk to users who hadn’t updated yet.
Highlighted that backend changes must remain backward compatible during gradual mobile releases, since store approvals and user adoption can’t be controlled immediately.
Proposed a clear migration plan: maintain /v1 for old clients and introduce /v2 for new logic, then deprecate the old version once adoption reached 100%.
Took initiative to implement the versioning mechanism, add monitoring for both endpoints, and document the API change for the entire team.
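
A simplified sketch of the versioning idea, assuming a plain dictionary as the router. The `/v1/apps` and `/v2/apps` paths, the handler names, and the response shapes are hypothetical; the actual endpoint is not named in these notes.

```python
def list_apps_v1(apps):
    # Original response shape that older app builds still expect: a list of names.
    return [app["name"] for app in apps]


def list_apps_v2(apps):
    # Updated logic is served from a new path, so /v1 callers see no change.
    return [{"id": app["id"], "name": app["name"]} for app in apps]


ROUTES = {"/v1/apps": list_apps_v1, "/v2/apps": list_apps_v2}
call_counts = {path: 0 for path in ROUTES}


def handle(path, apps):
    handler = ROUTES.get(path)
    if handler is None:
        return 404, None
    call_counts[path] += 1  # per-version traffic shows when /v1 can be retired
    return 200, handler(apps)
```

Counting calls per version is one simple way to see when traffic on the old endpoint has dropped far enough to deprecate it.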

Result:

We rolled out the new API smoothly without impacting any existing users.
Reduced potential production risk during rollout and made future version upgrades easier.
My colleague later acknowledged that the /v2 approach was more scalable and maintainable.
The solution became our standard API versioning practice across the backend team.

21. Describe a time you needed to do something ad-hoc for short-term business needs rather than long-term design.

Situation

During a major marketing campaign launch, we needed the Tracking Hub System, which tracked user journeys from ads to mini apps, to track a new event: “ScanQR with the Miss Universe”.
The proper long-term solution would involve updating multiple backend services, SDKs, and data pipelines — but that process would take weeks, while the campaign was scheduled to go live in two days.

Task

I needed to deliver a quick, stable workaround that allowed marketing to track conversions for this campaign without delaying the launch, while still ensuring data accuracy and minimal risk to production.

Action

I implemented an ad-hoc middleware in the backend that temporarily intercepted and enriched requests with the required parameter before forwarding to the tracking service.
Coordinated closely with the marketing and QA teams to test edge cases and validate that metrics were correctly recorded.
Documented the temporary solution and added monitoring alerts to detect any anomalies.
After the campaign ended, I led a refactor to integrate this logic properly into the main service for long-term maintainability.
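
A minimal sketch of such an enrichment middleware, assuming requests are plain dicts and the tracking service is reached through a `forward` callable. The function name and the parameter name and value used below are hypothetical.

```python
def enrich_campaign_event(forward, param_name, param_value):
    """Wrap `forward` so each request carries param_name before it is sent on."""

    def middleware(request):
        params = dict(request.get("params", {}))  # copy, so the caller's dict is untouched
        params.setdefault(param_name, param_value)  # never overwrite a caller-supplied value
        return forward({**request, "params": params})

    return middleware


# Usage: wrap the existing forwarding function once, then call the wrapper instead.
# tracked = enrich_campaign_event(send_to_tracking, "campaign_event", "scan_qr")
# tracked({"path": "/track", "params": {"user_id": "u1"}})
```

Because the wrapper sits in front of the existing call, it can be removed after the campaign without touching the tracking service itself.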

Result

The campaign launched on time and successfully tracked millions of conversions without errors.
Business goals were met without compromising user experience.
Later, the temporary patch evolved into a standardized parameter handling feature, reducing setup time for future campaigns.

22. Tell me about an interesting metric that you designed to identify and eventually improve something

Situation

In ZaloPay, we host hundreds of mini apps inside the platform.

When the mobile app starts, it dynamically fetches mini app metadata and assets from the server — which sometimes led to slow startup times and degraded user experience, especially during high traffic periods or poor network conditions.

We wanted to optimize performance but needed a data-driven way to decide which mini apps should be preloaded or cached locally in the native app before release.

Task

My task was to design a metric and process to identify the top K most frequently used mini apps based on real user behavior, so that the mobile team could bundle their metadata and assets directly into the app build for faster access.
⇒ Send the usage events to Firebase to save storage and computing cost.

Action

Analyzed backend logs and event tracking from OneLink and mini app platforms.
Built a daily pipeline to aggregate and rank mini apps by their Usage Score.
Worked with the mobile team to cache the top K mini apps locally during build time, ensuring they loaded instantly without hitting the network.
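
A small sketch of the ranking step. These notes do not define the Usage Score, so the sketch assumes it is simply the number of "open" events per mini app; the event fields `type` and `mini_app_id` are hypothetical as well.

```python
from collections import Counter


def top_k_mini_apps(events, k):
    """Rank mini apps by open count (an assumed Usage Score) and return the top k ids."""
    scores = Counter(e["mini_app_id"] for e in events if e["type"] == "open")
    return [app_id for app_id, _ in scores.most_common(k)]
```

A real score would likely weigh more signals (for example unique users or recency), but the top K selection would work the same way.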

Result

Reduced average mini app load time by ~35% for the top-used apps.
Improved user retention and satisfaction scores, especially in regions with slower networks.
The metric became part of our release checklist, automatically updating which mini apps were included in local cache for each app version.

23. Tell me about a time when you were 75% through a project and had to pivot strategy.

Situation

We were about 75% done developing a new campaign tracking system for the marketing team, designed to send data directly to TikTok Event Manager.

However, just before the launch, we learned that the campaign would also run on Meta (Facebook) — and TikTok integration alone wouldn’t support multi-platform attribution.

Task

I had to quickly adjust the integration plan to support both TikTok and Meta tracking, without missing the campaign launch deadline. The goal was to ensure unified event tracking across all platforms while minimizing disruption to the existing codebase.

Action

Reassessed the architecture and realized that sending data only to TikTok wasn’t scalable.
Proposed a new plan to route events through AppsFlyer, a third-party service already integrated with both TikTok and Meta.
Coordinated with both backend and marketing teams to validate the new data flow and update configurations.
Conducted quick end-to-end testing to confirm events were correctly attributed across platforms.
Communicated the pivot clearly to all stakeholders and updated the delivery timeline.

Result

Delivered the revised system on time for the campaign launch with full cross-platform tracking.
The new architecture became the standard approach for all future campaigns, simplifying setup for marketing and reducing engineering overhead.
The pivot not only solved the immediate problem but also improved long-term scalability of our tracking infrastructure.

⇒ Use STAR for the following questions too.

24. Tell me about a time you found a mistake in your team and how you handled it

Situation

While reviewing event tracking data for a major marketing campaign in the OneLink system, I noticed a sudden drop in conversion rates that didn’t align with traffic volume. After digging deeper, I found that one of our developers had accidentally missed an event-binding step in the backend — causing tracking data from iOS devices to not be logged correctly.

Task

My goal was to confirm the root cause, minimize data loss, and prevent the same mistake from happening again — all without blaming anyone on the team.

Action

I reproduced the issue locally and confirmed the missing event-binding logic.
Informed the team immediately and helped prepare a hotfix patch to restore correct tracking within hours.
Backfilled partial data from logs to recover missing metrics as much as possible.
Afterward, I proposed adding a unit test and event validation checklist to our CI/CD pipeline to automatically detect missing tracking fields before deployment.
Conducted a short retrospective session with the team to share what happened and focus on process improvement rather than fault.
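
A sketch of the kind of check such a validation step could run in CI. The required field names are assumptions for illustration; the real tracking schema is not listed in these notes.

```python
# Hypothetical list of fields every tracking event must carry.
REQUIRED_FIELDS = {"event_name", "platform", "user_id", "timestamp"}


def missing_tracking_fields(event):
    """Return the required fields that are absent or empty in a tracking event."""
    return sorted(f for f in REQUIRED_FIELDS if event.get(f) in (None, ""))
```

A unit test can call this on sample events from each platform and fail the build when the returned list is not empty.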

Result

The issue was resolved within the same day, minimizing impact on marketing analytics.
The new validation process prevented similar issues in future releases.
The team’s trust and collaboration improved — everyone appreciated the focus on learning and prevention rather than blame.

October 13, 2025