
Bazaarvoice Engineering was built on the foundation of empowering our talented teams to own every aspect of service delivery to customers, beginning to end. For our biggest event of the year, the Black Friday / Cyber Monday weekend, each engineering team builds its projected traffic loads, its load testing plan, and its code freeze plan, and each is responsible for executing those plans and reporting the results for readiness. This preparation begins more than 7 months before Black Friday.
The work begins in April
While many people are thinking about their trip to Disneyland or the beach, Bazaarvoice Engineering is beginning the preparations for our biggest event of the year: Black Friday / Cyber Monday. There are many details to work through and much preparation in store. This is when our large-scale systems planning efforts kick off, as we reassess overall system capacity in light of projected traffic growth at ecommerce and brand sites throughout our network. Our teams analyze detailed traffic patterns, web traffic growth, user-generated content growth, storage capacity, and more. In addition to our own systems, we assess the third-party systems we integrate with and meet with their leadership teams to help them understand the improvements needed before the big holiday traffic arrives.
Coming out of the initial assessment, it is clear that we have a number of major projects to complete to be ready for the holidays ahead. So while the Texas summer is starting to heat up in May, we kick off our weekly Black Friday planning meetings and the teams get to work. There are major infrastructure projects to scale up what we call “Display”, the systems that serve up the front-end requests from half a billion unique visitors a month. There is more work to do to further improve the data feeds we send our customers each day. There are new load testing tools to build for brand new parts of our platform. New real-time dashboards are planned to improve our ability to quickly visualize the current state of all our services. Finally, there are improved processes and communication channels, both before and during the peak season, that need to be planned and rolled out through the company and to our customers. Cancel those vacations!
The calm before the storm
With record traffic projected for 2014, and with a significant number of new services in production, it is clear we need an expanded load testing capability. We build out a massive new virtual test environment in the cloud where all the teams will deploy and scale out their services, just as we will in production when the time comes. The new test tools can simulate very precisely the actual traffic patterns from shoppers, and, like a fire hose, we crank the volume up and test all the services together. In addition to the load testing, each team participates in “Game Day”, where we intentionally kill targeted services to ensure we can detect, react to, and recover from failures throughout the system. Each team publishes all their results and signs off – we are ready.
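To make the "crank the volume up" idea concrete, here is a minimal sketch of a stepped load ramp. It is an illustration only, not the tooling our teams built: the function name `ramp_schedule`, the 5,000 requests-per-second baseline, and the six-step ramp are all hypothetical values chosen for the example.

```python
def ramp_schedule(baseline_rps, peak_multiplier, steps):
    """Return target request rates rising linearly from the baseline
    to baseline * peak_multiplier over the given number of steps."""
    if steps < 2:
        raise ValueError("need at least two steps to ramp")
    peak = baseline_rps * peak_multiplier
    increment = (peak - baseline_rps) / (steps - 1)
    return [round(baseline_rps + i * increment) for i in range(steps)]


# Hypothetical numbers: ramp from an assumed 5,000 rps baseline to 6x that rate.
schedule = ramp_schedule(baseline_rps=5000, peak_multiplier=6, steps=6)
for step, target in enumerate(schedule, start=1):
    print(f"step {step}: drive {target} requests per second")
```

A load generator would hold each target rate long enough to watch the dashboards settle before moving to the next step.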
As Black Friday approaches, we are already spinning up new virtual servers in the cloud weeks ahead of time. One thing we learned last year: even if you have reserved capacity in the cloud, if you try to spin up servers just prior to Black Friday, there still may not be capacity available. Even as we enjoy our team potluck Thanksgiving lunch the week prior, we already see periods of traffic doubling.
“Game Day”
Thanksgiving is a wonderful time of the year, filled with family, food, football, and, for the Bazaarvoice Engineering team, system performance graphs, on-call schedules, pager alerts, and chat rooms. Command central is our “Incident Room” in our HipChat tool. As the rest of the country is enjoying turkey and kickoffs, our game day is online. Apparently, the country enjoys a little online shopping with their football. Make that a lot of shopping.
Our monitoring is paying off: we catch many issues and adjust capacity before any customer-facing problems are seen. On Thanksgiving Day, the chat room is full of engineers from across the teams, ready should an issue occur. Each hour, the requests per second seem to hit a new record. And, inevitably, there is an issue on Thanksgiving evening on one of the many services, but thanks to our system design, the error rate stays below 0.1%. Still, the engineers work until after midnight on Thanksgiving night to completely resolve the issue.
As the holiday weekend progresses, we see the traffic grow to over 6x the normal volume, and when the traffic to our big data platform hits 30,000 requests per second, we all get excited. In the chat room, our incident manager periodically posts graphs from our dashboards, and the team chats with holiday greetings and entertaining GIFs to keep the mood light. Black Friday is here and it is bigger than ever.
Each hour, we send out health status emails to the company for those not in the chat room, and we can see a significant increase in page views and API traffic over the same time last year. A few issues continue to come up through the weekend, but very few are customer-visible and the teams respond quickly, adding service resources, switching on new services, or increasing limits. The Engineering team works with the DevOps Support team and our outstanding Customer Service team to make sure the pre-planned communication process is working and everyone is informed along the way. As we say at BV – one team, one dream! Meanwhile, the traffic to our big data platform blows through 37,000 requests per second and eventually peaks north of 50,000 rps.
384 million page views
By the time Cyber Monday rolls around, the increased traffic feels like the new norm. The team is physically back in the office, but you can tell the stress of the “Holiday” is tempered by the relief that we did it. So, what did we do exactly? Well, on Black Friday alone, we served up 384 million page views to more than 73 million unique visitors. We saw over 222 million unique visitors during this holiday period, which is up a whopping 38% from last year, and we served up 7.7 billion overall impressions, which is up an amazing 42% from last year.
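For readers who want to put those growth rates in context, the short sketch below backs out the approximate prior-year figures they imply. These are derived estimates, not reported numbers: the helper name `implied_prior` is made up for this example, and the results are only as precise as the rounded percentages quoted above.

```python
def implied_prior(current, growth_pct):
    """Estimate last year's value from this year's value and the
    year-over-year growth percentage: prior = current / (1 + growth)."""
    return current / (1 + growth_pct / 100)


# Inputs are the rounded figures quoted in this post.
visitors_prior = implied_prior(222, 38)      # millions of unique visitors
impressions_prior = implied_prior(7.7, 42)   # billions of impressions

print(f"Implied prior-year unique visitors: ~{visitors_prior:.0f} million")
print(f"Implied prior-year impressions: ~{impressions_prior:.1f} billion")
```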
In our never-ending desire to improve, we’ve already held a retrospective and come up with ideas for how we can improve next year, but I’m very proud of how this team planned, prepared, executed, responded, and delivered a world-class service for our customers. Truly this is the most customer-focused and dedicated team I’ve had the privilege to lead in my career, and we are already looking forward to great things to come in 2015.