TL;DR
A customer emailed saying they could see another user's data. What followed was 72 hours of crisis management: finding the bug, assessing the damage, notifying affected users, and trying to rebuild trust. The bug was a caching issue that took 10 minutes to fix. The aftermath took months.
The Email
It arrived at 2:47 PM on a Wednesday. The subject line was simple: "Seeing someone else's data."
"Hi, I logged into my dashboard and I'm seeing a company called 'Acme Corp' and their project data. I'm not Acme Corp. I'm really concerned about what's happening here. Please respond ASAP."
My stomach dropped. I read it three times, hoping I'd misunderstood. I hadn't.
The First 30 Minutes
I immediately tried to reproduce the issue. I couldn't. My test accounts showed the correct data. But I knew customers don't make up things like this.
I responded to the customer within 5 minutes:
"Thank you for reporting this. We're treating this as a critical issue and investigating immediately. Can you tell me exactly what you saw and the steps you took? Please don't share any details publicly while we investigate."
Then I started digging through logs.
Finding the Bug
After 45 minutes of investigation, I found it. The issue was in our caching layer.
We cached user dashboard data to improve performance. The cache key was supposed to be based on the user ID. But due to a bug in a recent deployment, under certain conditions, the cache key was being generated incorrectly.
When two users logged in within milliseconds of each other, there was a race condition where they could receive each other's cached data.
The trigger: The bug only manifested when two users logged in at nearly the same time AND the cache was empty AND specific conditions were met. That's why it was so hard to reproduce. It probably happened fewer than 20 times total, but each time was a data exposure.
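The exact way the key was miscomputed isn't described above. This is a hypothetical sketch of one common way this class of bug arises: a key function that reads shared, request-global state instead of the user ID it was handed. The names (`_shared_state`, `buggy_key`, `safe_key`, `simulate`) are made up. The two "concurrent" requests are interleaved by hand so the outcome is deterministic. The sketch also mirrors the "cache was empty" condition, because the leak happens while the first request is populating the cache.

```python
# Hypothetical sketch: NOT the actual code from this incident.
# A dict shared by all requests stands in for any mutable global
# (module variable, singleton, misused thread-local) a key function might read.
_shared_state = {"user_id": None}

database = {"alice": "Alice's projects", "bob": "Bob's projects"}


def buggy_key(_user_id: str) -> str:
    # Ignores its argument and uses whoever wrote _shared_state last.
    return f"dashboard:{_shared_state['user_id']}"


def safe_key(user_id: str) -> str:
    # Derived only from the caller's own user ID, so other requests can't affect it.
    return f"dashboard:{user_id}"


def simulate(key_fn):
    cache = {}  # start with an empty cache
    # Alice's request misses the cache and loads her data.
    _shared_state["user_id"] = "alice"
    alice_data = database["alice"]
    # Bob's request starts in between and overwrites the shared value.
    _shared_state["user_id"] = "bob"
    # Alice's request now stores its result under whatever key key_fn builds.
    cache[key_fn("alice")] = alice_data
    # Bob's request checks the cache, falling back to the database on a miss.
    return cache.get(key_fn("bob"), database["bob"])
```

With `buggy_key`, Alice's data is written under Bob's key and served to Bob. With `safe_key`, Bob misses the cache and gets his own data. The fix is to make the key a pure function of the request's own user ID.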
The 72-Hour Timeline
- Customer email received. Initial investigation begins.
- Identified the caching race condition. Deployed immediate fix (disabled problematic cache).
- Analyzed logs to determine how many users were affected and what data was exposed.
- Called our lawyer to understand notification requirements.
- Wrote and rewrote the customer notification email multiple times.
- Sent individual emails to all affected users.
- Called each affected customer to answer questions personally.
- Documented everything that happened and what we changed.
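How well you can scope the damage depends on what your logs record. The following sketch assumes a hypothetical log schema in which every dashboard response records both the requesting user and the account whose data was served. `DashboardLogRecord` and `find_exposures` are illustrative names. Any record where those two differ is a cross-account exposure. Grouping the records by the account whose data leaked gives the list of users to notify and the data types to mention.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DashboardLogRecord:
    # Hypothetical schema: who asked, whose data came back, and what it contained.
    timestamp: str
    requester_id: str
    served_owner_id: str
    data_types: tuple[str, ...]


def find_exposures(records):
    """Group cross-account responses by the account whose data was exposed."""
    exposures = defaultdict(lambda: {"viewers": set(), "data_types": set(), "times": []})
    for rec in records:
        if rec.requester_id != rec.served_owner_id:
            entry = exposures[rec.served_owner_id]
            entry["viewers"].add(rec.requester_id)      # who saw the data
            entry["data_types"].update(rec.data_types)  # what they could see
            entry["times"].append(rec.timestamp)        # when it happened
    return dict(exposures)
```

If your logs don't record whose data was served, this kind of after-the-fact scoping is much harder. That alone is a good reason to log it.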
The Hardest Part: Customer Notification
Writing the notification email was agonizing. I went through at least ten drafts.
The final version was direct and honest:
"We discovered a bug that may have briefly exposed your dashboard data to another user. We believe this affected your account on [date]. The data potentially visible included [specific data types]. We fixed the issue within 2 hours of discovery. We're deeply sorry this happened."
Some customers were understanding. Others were furious. One enterprise customer immediately scheduled a call with their legal team. Two customers canceled their subscriptions on the spot.
What I Learned About Crisis Communication
1. Speed Matters, But Accuracy Matters More
I wanted to email customers immediately. But sending incorrect information would have made things worse. We took time to fully understand the scope before communicating.
2. Be Specific About What Was Exposed
Vague statements like "some data may have been accessed" make customers imagine the worst. We listed exactly what data types were affected. It's uncomfortable, but it builds trust.
3. Own It Completely
No deflecting. No passive "a bug occurred," as if it appeared from nowhere. "We introduced a bug" makes it clear we're taking responsibility.
4. Explain What Changed
Customers want to know it won't happen again. We detailed the specific changes we made to prevent similar issues.
The uncomfortable truth: How you handle a breach matters as much as preventing one. Customers who felt we handled it well became some of our strongest advocates. Those who felt we weren't transparent enough never trusted us again.
Long-term Changes
Technical Changes
- Cache key review: Audited all caching to ensure user isolation
- Race condition testing: Added automated tests for concurrent access scenarios
- Deployment review: Any change touching caching requires extra review
- Monitoring: Alerts for unusual data access patterns
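A concurrency test can't prove a race is absent, but it can make one far more likely to surface. This minimal sketch uses `threading.Barrier` to release many simulated logins at the same instant against an empty cache, then checks that every user got only their own data. `DashboardCache`, `run_concurrent_logins`, and the loader are hypothetical stand-ins. In a real suite you would point the harness at your actual caching code.

```python
import threading


class DashboardCache:
    """Toy read-through cache keyed by user ID (a stand-in for real cache code)."""

    def __init__(self, loader):
        self._loader = loader  # function: user_id -> dashboard data
        self._data = {}
        self._lock = threading.Lock()

    def get(self, user_id):
        key = ("dashboard", user_id)  # derived only from this caller's ID
        with self._lock:
            if key in self._data:
                return self._data[key]
        value = self._loader(user_id)  # cache miss: load outside the lock
        with self._lock:
            # If another request for the same user filled the entry first, keep it.
            return self._data.setdefault(key, value)


def run_concurrent_logins(cache, user_ids):
    """Start every 'login' at the same moment against the cache; collect results."""
    barrier = threading.Barrier(len(user_ids))
    results = {}
    results_lock = threading.Lock()

    def login(uid):
        barrier.wait()  # all threads block here until every one has arrived
        value = cache.get(uid)
        with results_lock:
            results[uid] = value

    threads = [threading.Thread(target=login, args=(uid,)) for uid in user_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def test_concurrent_logins_are_isolated(rounds=50, users=20):
    for _ in range(rounds):
        cache = DashboardCache(loader=lambda uid: f"data-for-{uid}")  # fresh, empty cache
        user_ids = [f"user-{i}" for i in range(users)]
        results = run_concurrent_logins(cache, user_ids)
        for uid in user_ids:
            assert results[uid] == f"data-for-{uid}", (uid, results[uid])
```

Running many rounds matters. A bug that only appears under a particular interleaving can easily slip through a single pass.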
Process Changes
- Incident response plan: Documented procedures so we don't make decisions in a panic
- Legal relationship: Established ongoing relationship with a lawyer who understands data privacy
- Customer communication templates: Pre-written templates for various incident scenarios
The Customer Who Reported It
The customer who sent that first email became one of our most valuable relationships. We offered them a year of free service. They declined, saying they just wanted to know we took it seriously.
Six months later, they introduced us to two other companies who became customers. When I asked why, they said: "You handled a bad situation the right way. That tells me more about your company than a sales pitch ever could."
Do I have to notify customers about a data breach?
It depends on your jurisdiction and the nature of the data. Many places have mandatory breach notification laws (GDPR, CCPA, state laws in the US). Consult with a lawyer to understand your specific requirements.
How quickly should I notify affected users?
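As soon as you have accurate information about what happened. Under GDPR, you must notify the relevant supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of a breach. Affected individuals must be told without undue delay when the breach is likely to result in a high risk to them. Balance speed with accuracy, but don't delay unreasonably.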
Should I offer compensation to affected users?
Consider it, but focus first on clear communication and demonstrating you've fixed the problem. Some companies offer free service or credits. Consult with legal before making broad offers.
How do I prevent caching-related data leaks?
Always include user-specific identifiers in cache keys. Test concurrent access scenarios. Consider using per-user cache namespaces. Review caching logic in code reviews, especially when dealing with user data.
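One way to make user-specific keys hard to get wrong is to have the cache API require a user ID on every call and keep each user's entries in a separate namespace. This is a minimal in-memory sketch. `PerUserCache` and its methods are hypothetical, and the class adds no locking, so treat it as single-threaded as written. With a shared cache such as Redis, the same idea is usually expressed as a per-user key prefix.

```python
from collections.abc import Hashable
from typing import Any


class PerUserCache:
    """In-memory cache where every entry lives in a per-user namespace.

    A key can't be built without a user ID, so forgetting the user-specific
    part of a key raises an error instead of silently sharing data.
    """

    def __init__(self) -> None:
        self._namespaces: dict[str, dict[Hashable, Any]] = {}

    @staticmethod
    def _check_user(user_id: str) -> str:
        # Reject missing or empty IDs rather than falling back to a shared bucket.
        if not isinstance(user_id, str) or not user_id:
            raise ValueError("cache access requires a non-empty user_id")
        return user_id

    def get(self, user_id: str, key: Hashable, default: Any = None) -> Any:
        namespace = self._namespaces.get(self._check_user(user_id), {})
        return namespace.get(key, default)

    def set(self, user_id: str, key: Hashable, value: Any) -> None:
        self._namespaces.setdefault(self._check_user(user_id), {})[key] = value

    def invalidate_user(self, user_id: str) -> None:
        # Dropping the namespace clears everything cached for this user at once.
        self._namespaces.pop(self._check_user(user_id), None)
```

Because lookups can only reach the caller's own namespace, a wrong or missing ID produces a miss or an exception rather than another user's data.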
Find Issues Before Customers Do
Scan your application for data isolation and access control vulnerabilities.