Our first MVP, built with Ruby on Rails over a few weekend hackathons, had been up and running for a while and was accepting user signups. But after validating that there was a lot of user demand, we decided to fully rebuild the product with a new codebase to set ourselves up for future growth.
Let’s just say we’re fairly embarrassed by the results, but also a little proud at the same time. This blogpost is about all the fuckups that happened during our first week of launching PingPong v2 for our private customers, and the bumpy road we travelled down.
We decided to launch because we had some great features in the pipeline that our customers were excited to use for some upcoming user research sprints. So we opted to launch a little earlier than planned, without attempting to comprehensively plan a launch procedure and test the whole product. This is a common situation: when you’re on a short runway and have limited capacity in your team, you want to make sure you’re as lean as possible and embrace failure in the most manageable way.
We already had a few customers. These customers are pretty open and failure tolerant. The reason for this is that PingPong’s offering is valuable and unique. You can’t easily find users to research from South Korea, Malaysia, Indonesia, Mexico or Japan. Look at Pivotal’s Trello board to see for yourself how complex it is to run a serious user research process. Especially this step, which can take days of work to complete. PingPong makes this process far easier for researchers and simplifies the most time consuming and painful parts of the user research process, which means we have very loyal early customers.
So what can go wrong?
We tested every piece of core functionality, of course, but there are always issues that will only come up in production when you’re exposing the product to many more complex use cases. Let’s see what happened…
#1 The password reset doesn’t work
The new version had been deployed and the app was up and running, so we sent a newsletter to our existing testers asking them to reset their passwords. Eight hours later, someone sent me this:
My reaction: 🤔
Okay, let’s investigate what happened.
Maybe the email service is down? Nope. This would have been too easy. Maybe the users mistyped their emails? Nope. I request a password reset on my researcher account. It works perfectly. Okay, let’s try with my tester account… Oh, this really doesn’t work! Bug reproduced: it should be easy from here on.
After a little research, we realised what was happening: the user database had been imported from the old first version of the app. Of course, we didn’t store users’ raw passwords, and we didn’t import the hashed strings either (we were using a different hashing algorithm and salt). The safest and simplest way to migrate all the users to a totally new codebase and app was to email them all and ask them to reset and create a new password.
So, having imported the user database, we set all user password fields to “unusable” passwords by default. To understand what’s going on, you need to know that our main technology stack is Django, one of the most popular Python web frameworks. Django doesn’t allow users with an “unusable” password to reset their passwords, and this is actually intended to be a feature. The other factor that played a critical role here is another Django feature: even if the password reset request doesn’t succeed and no e-mail goes out, the user (or the attacker!) still sees a successful “Password reset sent” message. This prevents users (and hackers) from guessing whether an email address is registered with the product.
We fixed this quickly by overriding the default Django password reset view and rolling the change out to production.
Then I manually sent out the password reset e-mail to the people who had already requested one (we tracked those requests)… with a 10 hour delay. Fortunately, we don’t plan to migrate our entire backend stack again anytime soon!
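To make the failure mode concrete, here is a minimal sketch in plain Python of the two behaviours that combined against us. It is a simplified model, not Django’s actual code and not our production fix: the `User` class, `request_reset` and the `require_usable_password` flag are hypothetical names, and only `has_usable_password()` mirrors a real method on Django’s user model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class User:
    email: str
    password_hash: Optional[str]  # None models an "unusable" password
    is_active: bool = True

    def has_usable_password(self) -> bool:
        return self.password_hash is not None


def request_reset(
    users: List[User], email: str, require_usable_password: bool = True
) -> Tuple[str, List[str]]:
    """Return the message shown to the visitor and the e-mails actually sent."""
    recipients = [
        u.email
        for u in users
        if u.email.lower() == email.lower()
        and u.is_active
        # Default behaviour: imported users with unusable passwords are skipped.
        and (u.has_usable_password() or not require_usable_password)
    ]
    # The confirmation is identical whether or not anything was sent,
    # so nobody can probe which addresses are registered.
    return "Password reset sent", recipients
```

With the default, a migrated tester sees the success message but gets no e-mail; relaxing the check (the hypothetical `require_usable_password=False`) lets the reset go out while the visible message stays the same.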
Costs: 2 hours of work and a little confusion for a few testers who received their password reset email with a 10 hour delay.
#2 PingPong spams you 14 times with the same e-mail
The first user testing session had been successfully booked and went well. We have quite a few transactional e-mails in place. One of them is the “Review tester” email which is sent to the researcher after a test session is completed. The e-mail reminds the researcher to review the session and mark it as done (this is how we track completion and reviews).
The problem was that the researcher received the review e-mail 14 times.
Uhmm, what? Let’s figure out what’s going on. That took some time, because we had to carefully replicate the bug. Originally, we used Redis as a quick solution for a temporary message broker. We figured that there might be a bug somewhere here and that we should move to a dedicated message broker instead, which would support exactly what we needed.
We set up a cloud-hosted RabbitMQ instance on cloudamqp.com, which now works like a charm. We never pinned down the root cause, but the switch fixed the problem fairly quickly and our code also improved a lot along the way.
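Since we never found the root cause, the following is not a description of what was wrong or of what we shipped. It is just a sketch of one general-purpose guard against duplicate transactional e-mails, whatever the broker does: remember which (e-mail type, session) pairs have already been sent. The class and method names are hypothetical, and the in-memory set stands in for whatever shared store a real deployment would need.

```python
from typing import Callable, Set, Tuple


class ReviewEmailSender:
    """Sends the 'Review tester' e-mail at most once per session."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send  # callable that actually delivers the e-mail
        # Assumption: a real system would keep this in a database or cache
        # shared by all workers, not in process memory.
        self._sent_keys: Set[Tuple[str, int]] = set()

    def send_review_email(self, session_id: int, researcher_email: str) -> bool:
        key = ("review_tester", session_id)
        if key in self._sent_keys:
            return False  # duplicate task delivery: do nothing
        self._sent_keys.add(key)
        self._send(researcher_email, f"Please review session {session_id}")
        return True
```

If the same task were delivered 14 times, only the first call would reach the researcher’s inbox.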
Costs: 6 hours of work and a disappointed researcher. We compensated the researcher for their bumpy experience by giving them a discount on their session.
#3 Overlapping bookings
When you’re scheduling calls between 2 people, there’s one thing you want to make sure never happens: booking sessions on top of each other. Well, we managed to do just that.
After a bunch of testing and research, we found 2 issues here:
First, if you had 2 completely separate user testing types running in parallel (let’s say you’re testing 2 products at the same time), PingPong didn’t take bookings made on the other test into account.
Second, our Office 365 calendar integration (which should actually have prevented double-booking of test sessions) wasn’t flawless and didn’t recognise some sessions in the researcher’s calendar.
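The first issue comes down to which bookings the availability check looks at. A minimal sketch, assuming a hypothetical `Booking` record and `is_slot_free` helper (not our actual models): the check filters by researcher only, so sessions from every test count.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Booking:
    researcher_id: int
    test_id: int
    start: datetime
    end: datetime


def overlaps(a_start: datetime, a_end: datetime,
             b_start: datetime, b_end: datetime) -> bool:
    # Half-open intervals: back-to-back sessions do not clash.
    return a_start < b_end and b_start < a_end


def is_slot_free(bookings: Iterable[Booking], researcher_id: int,
                 start: datetime, end: datetime) -> bool:
    # Deliberately ignores test_id: a researcher can only be in one
    # session at a time, no matter which test it belongs to.
    return not any(
        b.researcher_id == researcher_id and overlaps(b.start, b.end, start, end)
        for b in bookings
    )
```

Busy slots pulled from an external calendar could be fed into the same check as extra intervals, provided the integration reports them reliably.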
This was by far the most complex bug we ran into. The reason for this is reproducibility. Whenever we run into a bug, the first and most important thing is to reproduce that bug. That allows us to understand and observe all the interfering elements.
To do this, we had to run through both the Google and Office 365 calendar integrations. Only the Office 365 calendar turned out to be affected, so we dug deeper there and fixed the issue.
Costs: 10 hours of work and an overbooked session that I had to reschedule.
+1 Onboarding a new developer instead of fixing issues
This isn’t a bug. It’s just a bad decision that cost us quite a lot of time. We figured out we’d need a frontend dev who could also improve the app UI.
For some reason, we made the decision to bring him on board the same week we went live, rather than waiting another week and a half. We didn’t anticipate we’d run into so many bugs and we were overoptimistic.
This was a 2-day lesson which didn’t cost that much in the end but it could have been worse.
In future, we won’t onboard new people when we’re in a critical stage.
Costs: a delay of approximately 1 day in fixing these bugs.
So, was it worth launching so early?
I see too many startups over-thinking and overdoing their product launch. And I believe launching too late holds you back from focusing on product-market fit and iterating based on real user feedback.
We knew we’d run into problems. We also knew we could address them quickly. Sometimes writing tests and covering all possible test cases simply isn’t worth the effort.
The most important takeaway is this: Imperfection and errors are accepted when you’re working on an early stage product. If your product provides enough value, and your bug isn’t entirely ruining your user’s day, they’ll hopefully forgive you.