As a consumer, you’ve no doubt ordered something online by tapping into the e-commerce functionality of BigCommerce. The aptly named platform enables many big-name brands—Ben & Jerry’s, Crest, GE, Harley-Davidson, and tens of thousands of others—and is hugely important to the world of e-commerce as we know it.
A modern open source-based data platform that provides omnichannel commerce solutions to a variety of enterprise businesses, BigCommerce has to flawlessly meet ever-changing customer demands. But the company's engineering team was tied up managing batch processing, which meant that merchants were not running their businesses on real-time data.
This is the story of how a leading e-commerce platform strategized a move to fully managed Confluent Cloud for a more nimble approach to customer data and a lot of time saved on operational overhead.
BigCommerce once relied on a Hadoop MapReduce-based system for analytics and insights. Although the company had self-managed Kafka clusters, they were used only for ad hoc analyses, not for merchant-facing analytics. Because the MapReduce-based system processed data in batches, merchants had to wait eight hours to get analytics reports. From time to time, jobs would fail, and manual interventions caused further delays. The system was hard to manage and scale. Five years ago, BigCommerce built a new data platform on Apache Kafka for real-time merchant analytics so that they could capture various e-commerce events (visits, orders, and more).
In addition, managing its own Kafka cluster increased BigCommerce's maintenance burden. Instead of focusing on delivering innovative new services, team members were bogged down by software patches, blind spots in data-related infrastructure, and systems updates. As a result, the company couldn't provide merchants with real-time analytics and insights to make critical business decisions.
Specifically, BigCommerce dealt with the following:
Designing how many nodes were needed and providing storage for each node
Provisioning the cluster
Handling seasonal traffic surges and scaling (especially holiday spikes, which often forced the company to freeze operations in order to expand Kafka clusters and add nodes)
Adding nodes, a complex process that required changes to Terraform scripts and rebalancing of the cluster
Keeping up with software upgrades and patching
BigCommerce had a part-time team member managing all the operational aspects of Kafka, and three other engineers using a portion of their time on Kafka operations—an expensive setup. “It was not the best use of our time,” says Mahendra Kumar, VP of data and software engineering. “We wanted to focus on building more features and more functionality, and get to market faster. But this additional overhead, in terms of managing, came in the way of that.”
This model was acceptable for a time, but merchants today demand real-time analytics and insights so they can run retail businesses competitively in a digital-first world. They need to know how business is doing in the moment, who's coming to their site, what they're looking at, what they're adding to their carts, and what they're actually buying.
BigCommerce needed a platform that would allow them to stream and process data in real time and give merchants access to their analytics right away, without a heavy operational burden. Since the team was already using Kafka, they needed a way to get data out of Kafka in real time for processing and ETL. This would allow them to build a better data model for reporting dashboards.
In addition to meeting these needs, Confluent held a few distinct advantages for BigCommerce. The company's storefront-serving infrastructure was already on Google Cloud, and the team had been contemplating moving their data platform to Google Cloud as well to save on data-transfer costs. Confluent made that move possible.
Confluent also meant an immediate upgrade to their Kafka version and security. And it gave their small data team of nine people their time back so they could focus on their primary area of expertise and business focus: building a better product. “Confluent is designed for scale and resilience,” says Kumar.
Confluent specifically offered:
A cloud-agnostic platform that could accommodate the shift to Google Cloud for the data platform
Deep expertise in Kafka from the people who first created the technology
The ability to lead the change from self-managed Kafka to cloud-native, fully managed Kafka
Pre-built connectors to various data sources
Stronger security and better compliance, always kept up to date
By moving from open source Kafka and batch-based processing to Confluent on Google Cloud, BigCommerce was able to build out all of their e-commerce analytics and e-commerce insights reports. These reports give merchants snapshots of store performance and of product and customer trends to help drive sales, and show which channel brings in the most customers and the highest lifetime value (LTV), among other things.
For BigCommerce, the Confluent Cloud migration strategy had to be flexible enough that operations would never have to shut down. In whiteboarding the strategy with Confluent, the team decided to pump data into both their open source Kafka setup and the new Confluent setup for a period of time. That allowed them to scale in a test environment, starting with a low volume of data to ensure things would work smoothly. Then, they began to pump in production-level data.
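The dual-write approach described above can be sketched roughly as follows. This is a minimal illustration and not BigCommerce's actual implementation: the names `EventSink`, `InMemorySink`, `DualWriter`, and `managed_percent` are hypothetical, and the in-memory sink stands in for whatever real Kafka producer client would be used. Every event always goes to the existing cluster, while a configurable percentage of keys is also sent to the new cluster so that volume can be ramped up gradually.

```python
import zlib
from dataclasses import dataclass, field
from typing import Protocol


class EventSink(Protocol):
    """Anything that can accept an event (hypothetical interface)."""

    def send(self, topic: str, key: str, value: bytes) -> None: ...


@dataclass
class InMemorySink:
    """Stand-in for a real Kafka producer; it only records what it was sent."""

    name: str
    events: list = field(default_factory=list)

    def send(self, topic: str, key: str, value: bytes) -> None:
        self.events.append((topic, key, value))


@dataclass
class DualWriter:
    """Writes every event to the legacy sink and a share of them to the new one."""

    legacy: EventSink
    managed: EventSink
    managed_percent: int = 0  # assumed ramp-up knob, 0 (off) to 100 (full dual write)

    def send(self, topic: str, key: str, value: bytes) -> None:
        # The existing pipeline keeps receiving everything, so nothing is lost
        # if the new cluster misbehaves during the trial period.
        self.legacy.send(topic, key, value)

        # A stable hash of the key decides whether the event is mirrored, so the
        # same key is consistently included or excluded at a given percentage.
        bucket = zlib.crc32(key.encode("utf-8")) % 100
        if bucket < self.managed_percent:
            self.managed.send(topic, key, value)
```

In this sketch, a team would start with a small `managed_percent`, compare what the two sinks received, and raise the value to 100 before switching consumers over to the new cluster.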
With this migration strategy, BigCommerce experienced zero downtime, no data loss, and the confirmed ability to auto-scale. Now, they can scale seamlessly (for instance, during and after the Cyber Monday rush) with low operational overhead and maintenance. A scaling situation that once might have required the team to estimate spikes in traffic a month in advance is now automatic on Confluent. Capacity is also right-sized, so BigCommerce doesn't overspend on an overestimated spike.
With such a tight strategy, the move from open source Kafka to Confluent took only five months.
Today, BigCommerce relies on Confluent to power its analytics pipeline. In 2022, the entire Black Friday/Cyber Monday spike was handled on Confluent, and BigCommerce's engineering team no longer has to get involved to add or move nodes. Everything simply scales seamlessly, which saves the team at least a month's worth of time in Kafka operations. And providing real-time analytics to merchants is now possible.
So, why use Kafka or Confluent for merchant analytics?
Ability to stream and process data in real time so merchants can make intelligent merchandising and marketing decisions
Pre-built, fully managed connectors to and from various data sources and sinks that were needed to power the analytics
Lowered total cost of ownership and reduced operational burden associated with merchant analytics
By design, Confluent offers scale and resiliency, as well as fault tolerance. Those benefits, in addition to Confluent being fully managed, made all the difference for BigCommerce, its community of merchants, and, ultimately, the millions of consumers affected by these behind-the-scenes tech-stack decisions.
Interested in learning more about BigCommerce's journey? Check out this webinar to see how BigCommerce migrated 1.6 billion events a day from Kafka to Confluent in just five months, saving 20+ hours a week in Kafka management.