
Easy and Instant Insurance Quotes Using Data Streaming with Confluent Cloud

Get started with Confluent, for free


Technology is Redefining the Insurance Industry

The insurance industry, like other traditional sectors, is no stranger to rapid, technology-driven change in consumer expectations. Companies that don’t keep pace risk an eroding customer base and revenue lost to more nimble and innovative competitors. We’re seeing auto insurance giants race to provide ultra-personalized discounts by monitoring driving behavior via mobile app data and in-car IoT devices. Within the realm of home insurance, drones are becoming a technically and financially viable option to aid in damage assessments from above without waiting for a human agent to arrive on scene. Health insurance providers are starting to take an active role in their customers’ lives, making proactive recommendations around diet and exercise. And across the board, every insurance company, big and small, is rushing to give its customers the best possible AI chatbot experience so claims can be filed end-to-end without ever picking up the phone and listening to that dreaded hold music. These trends are well documented, including in an interesting piece from McKinsey, but the rate of implementation and delivery of these technology-focused advances has only begun to peak in the past one to two years.

While technology is enabling product and customer experience improvements, a similar race for automation is occurring behind the scenes. Insurance providers have to balance user experience improvements with updates to their backend systems, including:

  • Real-time data integration between cloud-native platforms and legacy infrastructure and databases

  • Data-driven fraud detection and prevention

  • Analytics for dynamic pricing, claims prediction, customer retention, etc. 

  • ML-powered risk analysis to support rapid underwriting decisions

  • Regulatory and compliance reporting

Apache Kafka is frequently part of the critical infrastructure to deliver on both the backend and client-facing innovations. This is especially true in insurance, where 10 of 10 top providers are already using Kafka today.

Apache Kafka usage across industries

We’re in the Age of Insurtechs

As established providers race to modernize, there is also a set of newer companies carving out a space for themselves. A cousin to the more frequently mentioned superset category of Fintechs, Insurtech companies are combining deep insurance industry expertise with modern technology like AI and IoT to offer a more personalized product and a seamless digital experience. According to market research like this report sponsored by industry heavyweights such as MAPFRE and Confluent user Generali, the global Insurtech market has experienced exponential growth in recent years, while simultaneously being relatively underrepresented from an investments perspective. It seems a near certainty we’ll continue to see pioneers cropping up in this space for years to come. There are even wholly new markets being made for insurance products that didn’t exist in the recent past. A great example of this is cyber insurance, which protects against liability and damages caused by data breaches and hacks. The nascent cyber insurance market is also spurring growth for cyber-security companies like SecurityScorecard, which provides the data used to underwrite these policies. No matter the type of insurance, the shift toward a data-driven business has significant network effects on peripheral data and software providers. 

So What’s the Problem? 

In many cases, these Insurtech upstarts are digital-native and cloud-first. They may not have the same level of tech debt as a more established corporation, but many challenges remain. Here are two of the most common challenges that we see from Confluent customers in the real world:

  • “How can I get data from where it sits today to all these new services my team is building around AI/ML, chatbots, real-time web/mobile experiences, etc.?”

and

  • “How can I make decisions and perform calculations in near real time across data sets originating from a variety of sources?”

These are real questions and concerns I’ve personally dealt with while helping customers adopt Confluent Cloud. The particular customers I’ve been working with are Insurtechs that provide a digital marketplace for insurance. A prospective client can navigate to the Insurtech’s website, fill out a simple form about what type of insurance they’re interested in, and immediately get an array of provider options with associated quotes to compare, which have been aggregated across a number of providers. Then, when the prospective client is ready to purchase, they can buy the policy through the Insurtech’s marketplace, but the policy actually gets underwritten by that separate insurance provider. 

I chose to highlight the two challenges above because they come up time and time again. Getting data to where it needs to go in a fast, scalable, and durable way while simultaneously acting on that data as it passes through the system is Apache Kafka’s sweet spot. And implementing it in the public cloud, with enterprise security and SLAs that line up with the requirements of a highly regulated insurance industry, is where Confluent Cloud shines, especially when you need this solution implemented yesterday and time to market is critical.

How One Insurtech Tackled Its Monolith

For one legacy organization I worked with, the goal was simple – stream data out of an existing monolithic database and make it available to any number of downstream consumers in order to give both customers and agents immediate access to the latest information. For example, a consumer application calculating a quote for car insurance would need rapid access to the data the prospective customer submits to the Insurtech’s web application. The current state looked something like this:

Current state 3-tier app with monolith DB

In this situation, there’s an existing three-tier application, but the development team is now being told to reinvent the experience for customers and agents so that automated, real-time insurance quotes can be prepared for prospective clients. The issue is that the database is already heavily loaded, and building a large number of modern microservices that make frequent calls into the DB is just not going to work. The team needed to keep the existing architecture in place because it runs the core platform, but it had to be extended to unblock new services.

To summarize, the company needed to overcome the following challenges:

  • Traditional architecture does not easily scale

  • Monolithic DB is already under heavy load

  • DB isn’t optimized for search or analytics

  • Modern services need to consume real-time events

  • Prospective customers are expecting an immediate response

  • Cumbersome to add any new features to existing application

  • Existing platform does not have AI/ML capabilities or stream processing functionality

Attempting to roll out new microservices

So, how can all these “modern” Go and Python microservices gain access to the data they need quickly so they can pass it up to their JavaScript frontend for display in a UI? This is where Confluent Cloud comes in, allowing the business to create a streaming data pipeline. Confluent Cloud is a cloud-native, fully managed, end-to-end data streaming platform that encompasses serverless Kafka, Connectors to data sources and sinks, stream processing, governance, and more.

I recommend several other great resources for information on the subject of data streaming in the insurance industry:

Bringing in Data Streaming with Confluent Cloud

Using a pre-built source connector for change data capture

Through CDC (aka “change data capture”) Connectors, Confluent Cloud plugs directly into the existing database and streams all new row-level changes to topics. Then, all microservices need to do is subscribe to topics containing the data they need.

These CDC Connectors are pre-built, pluggable, and easily configurable. All it takes is a few lines of JSON to launch the Connector from the API (the example below uses the Debezium PostgreSQL Connector). You can also use Confluent’s GUI console or, most popularly, the Terraform Provider.

{
  "connector.class": "PostgresCdcSource",
  "name": "InsuranceDB_CDC_Connector",
  "kafka.auth.mode": "KAFKA_API_KEY",
  "kafka.api.key": "****************",
  "kafka.api.secret": "*******************************",
  "database.hostname": "insurdb.<host-id>.us-east-2.rds.amazonaws.com",
  "database.port": "5432",
  "database.user": "postgres",
  "database.password": "**************",
  "database.dbname": "postgres",
  "database.server.name": "cdc",
  "table.include.list":"prod.USERS,prod.CLAIMS",
  "plugin.name": "pgoutput",
  "output.data.format": "JSON",
  "tasks.max": "1"
}
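
On the consuming side, a microservice mostly needs to interpret each change event it reads from a topic. The following is a minimal Python sketch, assuming the connector is configured to emit the full Debezium envelope (`before`, `after`, `op`) as JSON; the actual message shape depends on connector settings. The `apply_change` helper, the in-memory `users` dict, and the `id` key column are hypothetical names used only for illustration, and the Kafka consumer loop itself is left out.

```python
import json

# Assumed event shape (Debezium envelope):
#   {"op": "c" | "u" | "d" | "r", "before": {...} | null, "after": {...} | null}

def apply_change(table: dict, raw_event: str, key_field: str = "id") -> dict:
    """Apply one row-level change event to a local, in-memory copy of a table."""
    event = json.loads(raw_event)
    op = event["op"]
    if op in ("c", "r", "u"):      # create, snapshot read, update
        row = event["after"]
        table[row[key_field]] = row
    elif op == "d":                # delete: only the old row state is present
        row = event["before"]
        table.pop(row[key_field], None)
    else:
        raise ValueError(f"unexpected op: {op!r}")
    return table


# Example: an insert followed by an update to the same (hypothetical) USERS row.
users = {}
apply_change(users, '{"op": "c", "before": null, "after": {"id": 1, "email": "a@example.com"}}')
apply_change(users, '{"op": "u", "before": {"id": 1, "email": "a@example.com"}, "after": {"id": 1, "email": "b@example.com"}}')
```

Keeping the handler a pure function of the event makes it easy to unit test without a running Kafka cluster; the consumer loop only has to pass each message value through it.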

Going Back to the Future 

CDC for sourcing data is not the only type of Confluent Connector out there. In many cases, customers want to use Connectors to “sink” data to downstream systems directly instead of consuming data with custom microservices. The Insurtech customer I work with doesn’t only want to consume data via Golang code – that is just the first step. Eventually, they’d like that data to make its way into Elasticsearch, a Snowflake data warehouse, and a Google Cloud Storage or AWS S3 bucket. This is all possible through Confluent’s ecosystem of supported Sink Connectors.

Fan-out of streaming data to cloud native platforms such as search, data lake, etc.

There are limitless possibilities to bring new datasets into the fold and power increasingly innovative end-user features. In the future, this same data streaming platform that started out doing basic CDC can be used to bring in data from IoT sensors, medical devices, automobiles, smartphones, or other sources. You can even use this streaming data to populate a knowledge base used to train and deploy ML models, or deliver real-time data to a vector database behind a cutting-edge GenAI chatbot.

Future possibilities using Kafka for IoT and GenAI workloads

Providing Users with Real-Time Insurance Quotes

Before we get too far ahead of ourselves, let’s re-focus on the problem this particular customer was trying to solve in the near term. They wanted to provide an insurance quote in real time to a prospective customer who came to the website or mobile app and submitted a request. This might sound relatively simple – can’t we just give them a simple price for whatever amount of insurance coverage they asked for? (hint: no!) 

Real-time insurance quotes refer to the process of obtaining insurance price estimates or premium costs instantly, often within seconds or minutes, based on the most up-to-date and accurate information provided by the applicant. These quotes are generated using algorithms and databases that take into account various factors to determine the potential risk associated with insuring an individual or property. The goal is to provide customers with immediate and accurate pricing information to help them make informed decisions about purchasing insurance coverage. Real-time insurance quotes typically involve inputting specific details about the insured entity, such as personal information (age, gender, location), details about the property or vehicle being insured (make, model, year), and any other relevant information that could impact the risk assessment and subsequent premium calculation. This data is processed by the insurance company's systems, which use historical data, actuarial tables, and predictive modeling to generate a quote that reflects the potential cost of coverage.

This Insurtech company already solved the initial challenge a couple steps back with CDC and Kafka – data coming in directly from an application and/or data that is being persisted in the monolith database can now stream out to Kafka topics in Confluent Cloud. They’ve successfully unlocked their data and made it available on a real-time stream for anyone who wants to consume it! But what next? Well, the underwriting process for delivering a quote back to a customer actually has a significant number of steps.

Non-exhaustively, some of the major parts of the process are as follows for an auto insurance example:

  1. Validate incoming data (correct format, missing fields, meets minimum age, etc.)

  2. Fraud detection (check timestamps, location, etc.)

  3. Add a new entry into the CUSTOMER database table (w/ f_name, l_name, email, etc.)

  4. External web service call to Credit Reporting Agency for credit check

  5. External web service call to state DMV/RMV for driving record

  6. Enrich incoming data with reference data on:

     a. Home address neighborhood data (crime rates, etc.)

     b. Vehicle information (vehicle price, make/model data, etc.)

  7. Use ML or an actuarial model to analyze risk and calculate a weighted score

  8. Based on coverage requested and risk score, calculate quote for insurance premium
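
To make steps 7 and 8 concrete, here is a minimal Python sketch of a weighted risk score feeding a premium calculation. Every weight, the assumed 300 to 850 credit-score range, and the deductible discount are invented illustration values rather than actuarial figures, and the function names are hypothetical.

```python
# Illustration-only weights per incident type; not real actuarial values.
WEIGHTS = {"accidents": 0.15, "traffic_violations": 0.05, "claims": 0.10}


def risk_score(driving_history: dict, credit_score: int) -> float:
    """Step 7: combine driving record and credit into a score clamped to [0, 1]."""
    score = sum(WEIGHTS[k] * driving_history.get(k, 0) for k in WEIGHTS)
    # Lower credit adds risk; assumes scores fall in a 300-850 range.
    score += 0.3 * (850 - credit_score) / 550
    return round(min(max(score, 0.0), 1.0), 4)


def annual_premium(base_rate: float, score: float, deductible: int) -> float:
    """Step 8: scale a base rate by risk, discounting higher deductibles (max 20%)."""
    deductible_discount = min(deductible / 10000, 0.2)
    return round(base_rate * (1 + score) * (1 - deductible_discount), 2)


score = risk_score({"accidents": 1, "traffic_violations": 2, "claims": 0}, credit_score=750)
premium = annual_premium(base_rate=1200.0, score=score, deductible=500)
```

In a marketplace setting, the same enriched request would be scored once per provider, each with its own weights and base rate.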

In the past, the major issue was that completing all these steps would take way more time than the average attention span of our modern buyer. Not to mention, the Insurtech company solving this challenge isn’t getting just one quote, but showing quotes in their marketplace from a variety of providers. Between scheduled batch processing, error-prone file transfers, and manual data entry into risk models, it could be hours or more likely days before a potential customer gets any information back.

As you can see, there is still a lot of feature development that may have to go into a real-time insurance quote application beyond extracting the “request for quote” payload from the monolith DB. Also, that initial request message itself can contain many different fields, for example:

{
  "personal_info": {
    "name": "John Doe",
    "age": 35,
    "gender": "Male",
    "marital_status": "Married",
    "occupation": "Software Engineer",
    "address": "123 Main St, Anytown, USA"
  },
  "vehicle_info": {
    "make": "Toyota",
    "model": "Camry",
    "year": 2018,
    "vin": "1NXBR32E48Z123456"
  },
  "driving_history": {
    "accidents": 1,
    "traffic_violations": 2,
    "claims": 0
  },
  "coverage_options": {
    "type": "Full Coverage",
    "limits": {
      "bodily_injury": {
        "per_person": 100000,
        "per_accident": 300000
      },
      "property_damage": 50000
    }
  },
  "deductible": 500,
  "mileage": 10000,
  "credit_score": 750,
  "additional_drivers": [
    {
      "name": "Jane Doe",
      "age": 33,
      "gender": "Female",
      "driving_history": {
        "accidents": 0,
        "traffic_violations": 1,
        "claims": 0
      }
    }
  ]
}
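
A first validation pass over a message like the one above might look like the following minimal Python sketch. The function name, the minimum age of 16, and the specific checks are assumptions for illustration; a production validator would enforce each provider's own rules.

```python
REQUIRED_SECTIONS = ("personal_info", "vehicle_info", "driving_history", "coverage_options")


def validate_quote_request(request: dict, min_age: int = 16) -> list:
    """Return a list of problems found; an empty list means the request passed."""
    errors = []
    for name in REQUIRED_SECTIONS:
        if not isinstance(request.get(name), dict):
            errors.append(f"missing section: {name}")

    person = request.get("personal_info")
    age = person.get("age") if isinstance(person, dict) else None
    if not isinstance(age, int) or age < min_age:
        errors.append(f"applicant must be at least {min_age}")

    vehicle = request.get("vehicle_info")
    vin = vehicle.get("vin") if isinstance(vehicle, dict) else None
    if not (isinstance(vin, str) and len(vin) == 17 and vin.isalnum()):
        errors.append("VIN must be 17 alphanumeric characters")
    return errors
```

Returning every problem at once, rather than failing on the first, lets the web form show the applicant all the fields that need fixing in a single round trip.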

Each incoming message carries a decent chunk of data, and more may be needed later for lookups and enrichment. We’ve yet to solve an outstanding problem: how do we take a stream of real-time data and transform, enrich, aggregate, and validate each message as it moves through our pipeline? And similarly, how can the data in each message be used to do the real-time risk scoring necessary before providing a quote? The solution to this challenge comes in the form of a stream processing tool such as Confluent’s ksqlDB. This tool enables you to write simple SQL-like statements that perform actions continuously over a stream of data. The architecture ends up looking something like this:

Using Confluent Cloud as an end-to-end streaming data pipeline for real-time insurance quotes

In the diagram above, requests come directly from a client-facing application. Then, ksqlDB is used to perform an initial validation of the data. ksqlDB is also used to enrich the incoming stream of requests with lookup data from an internal DB that’s loaded into a topic via a CDC Connector. In the second phase, a custom application makes external calls to get info on credit score and driving record. Once there’s a fully enriched stream of requests, ksqlDB (or your own custom Python/etc. code, if you prefer) generates a dynamic risk score for each request.
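
The second phase, where a custom application calls out to external services, benefits from issuing those calls concurrently so the slower one bounds the latency. The Python sketch below illustrates the idea with `asyncio`; both lookup functions are stand-ins that simulate a network call, and their names and returned fields are hypothetical rather than any real agency's API.

```python
import asyncio


async def fetch_credit_score(name: str) -> dict:
    """Stand-in for a credit reporting agency call (hypothetical response)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"credit_score": 750}


async def fetch_driving_record(name: str) -> dict:
    """Stand-in for a state DMV lookup (hypothetical response)."""
    await asyncio.sleep(0.01)
    return {"accidents": 1, "traffic_violations": 2}


async def enrich_request(request: dict) -> dict:
    """Run both lookups concurrently and merge the results into the request."""
    name = request["personal_info"]["name"]
    credit, record = await asyncio.gather(
        fetch_credit_score(name), fetch_driving_record(name)
    )
    return {**request, **credit, "driving_history": record}


enriched = asyncio.run(enrich_request({"personal_info": {"name": "John Doe"}}))
```

The enriched dictionary would then be produced back to a Kafka topic for the scoring step to pick up.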

ksqlDB is extremely simple to use – real-time enrichment only takes a couple lines of SQL for a simple JOIN. See below for a generic example of a stream-table JOIN. Confluent has a set of amazing tutorials at the Developer webpage here to help you understand the possibilities.

-- Stream-table join: each quote event is enriched with the current customer row.
CREATE STREAM quotes_enriched
     WITH (kafka_topic='quotes_enriched', value_format='json') AS
     SELECT quotes.customer_id AS id, valid_to_date, premium_amount
     FROM quotes
     LEFT JOIN customers ON quotes.customer_id = customers.id;
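
For readers more comfortable with application code than SQL, the semantics of a stream-table LEFT JOIN can be sketched in plain Python. This is only an illustration of the join logic, not how ksqlDB is implemented: the table is modeled as a static dict keyed by customer id (a real table changes over time), and the `email` column is a hypothetical customer field.

```python
def left_join_stream_table(quotes, customers_by_id):
    """Yield each quote event enriched with the matching customer row, if any."""
    for quote in quotes:
        customer = customers_by_id.get(quote["customer_id"])
        yield {
            "id": quote["customer_id"],
            "premium_amount": quote["premium_amount"],
            # LEFT JOIN semantics: keep the quote even when no customer matches.
            "email": customer["email"] if customer else None,
        }


customers = {1: {"id": 1, "email": "john@example.com"}}
quotes = [
    {"customer_id": 1, "premium_amount": 1200.0},
    {"customer_id": 2, "premium_amount": 950.0},
]
enriched_quotes = list(left_join_stream_table(quotes, customers))
```

The second quote has no matching customer, so it still flows through with `email` set to `None`, which mirrors the null columns a LEFT JOIN produces.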

Confluent Cloud combines the power of Kafka with Connectors for change data capture and ksqlDB for stream processing, creating an end-to-end solution that enabled this company’s developers to deliver the real-time insurance quotes feature.

Once this solution is in place, it’s very straightforward to fan out the data pipeline to other downstream services. The Insurtechs I’ve been working with generally need to send an async email alert to their prospective customer once a quote has been generated, while simultaneously sinking all data into a data warehouse for things like:

  • Historical trend analysis

  • Populating internal dashboards for agents

  • Improving experience personalization

  • Improving product positioning and pricing

  • and more…

In addition to ksqlDB, another highly respected and widely adopted stream processing engine for the enrichment, validation, and calculation steps is Apache Flink. As this new functionality makes its way into Confluent Cloud as a fully managed offering, the possibilities for real-time analytics and stateful stream processing increase considerably.

Adding Flink into the mix and sending data to a data warehouse and email service

That’s All for Now

The business impact of this use case is improved customer engagement and satisfaction, which increases revenue while also achieving operational efficiencies.

Whether you’re building systems at an up-and-coming cyber insurance startup or a hundred-year-old life insurance titan, the power of data streaming is vital to technology modernization and innovation. When it comes to data, steady helps win you the race, but slow surely does not. At Confluent, we see some companies that are brand new to Kafka and want to get in on the action, and just as often we work with Kafka veterans who need a more complete data streaming platform and fewer hours spent self-managing the software.

I’d highly recommend taking a look at Confluent’s full library of solutions and use cases. And before you read too much more, give it a try yourself with Confluent Cloud’s free trial and launch your very own cluster on AWS, Google Cloud, or Azure.

Feel free to connect with me on LinkedIn or email me at kmoynihan@confluent.io with any questions along the way. Happy streaming!

  • Kyle Moynihan is a Senior Solutions Engineer at Confluent where he helps Mid-Market companies in New York City design and implement data streaming and event-driven architectures. Prior to Confluent, he spent 6 years as a Cloud Architect at Oracle where he focused on customer adoption of Oracle Cloud Infrastructure (OCI). Kyle holds a BS in Economics from the University of Michigan - Ann Arbor. When he’s not at work, Kyle enjoys snowboarding, traveling, and watching Boston sports.
