
Beyond Boundaries: Leveraging Confluent for Secure Inter-Organizational Data Sharing

Written by Alex Stuart and Will Stolton

Data is one of a company’s most valuable assets. Its value is often limited, however, by the challenge of sharing it across organizational boundaries in a secure, reliable, and scalable way. Traditional approaches to inter-organizational data sharing compound the problem: flat file transfers, API calls, and proprietary solutions each pose their own challenges, from security concerns to limited scalability and heavy development burden.

For organizations with advanced data maturity, data streaming offers a more effective way to share value across their ecosystems. Empowered with Confluent’s data streaming platform, companies across industries can confidently share data in real time with third parties.

In this blog, we’ll explain how.

We'll explore the strategic value of inter-organizational data sharing, address the critical technical concerns faced by both data providers and recipients, and demonstrate how Confluent enables secure, scalable solutions that transform cross-boundary data exchange.

Want to learn how to build data products to enable data sharing and other use cases? Read Shift Left: Unifying Operations and Analytics With Data Products.

Contextualizing the Value of Inter-Organizational Data Sharing

Inter-organizational data sharing creates significant business value through increased operational efficiency, new revenue opportunities, and improved customer experiences. By exchanging data across organizational boundaries, companies gain access to data-driven insights that would be difficult or costly to develop independently.

For example, imagine that you work for a financial services firm managing funds for your customers. Initially, you might implement a simple enrichment process to ensure that user IDs in trade streams have portfolio types attached, enabling proper downstream reporting and services. By enriching this data at the source—a practice known as the “shift-left pattern”—you avoid the added cost and inefficiency of duplicating this process across multiple services.
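To make the enrichment step concrete, here is a minimal Python sketch of a stream-table join: each trade event is looked up against a table of portfolio types keyed by user ID before being passed downstream. The field names (`user_id`, `portfolio_type`) and the in-memory dictionary standing in for a Kafka-backed table are hypothetical; in practice this join would typically run in a stream processor such as Flink.

```python
from typing import Iterable, Iterator


def enrich_trades(
    trades: Iterable[dict], portfolio_types: dict[str, str]
) -> Iterator[dict]:
    """Attach a portfolio type to each trade (a simple stream-table join).

    `portfolio_types` stands in for a table keyed by user ID (hypothetical).
    Trades with an unknown user ID are tagged "UNKNOWN" rather than dropped,
    so downstream consumers can still see and handle them.
    """
    for trade in trades:
        enriched = dict(trade)  # copy so the input event is not mutated
        enriched["portfolio_type"] = portfolio_types.get(trade["user_id"], "UNKNOWN")
        yield enriched


if __name__ == "__main__":
    table = {"u1": "RETIREMENT", "u2": "GROWTH"}
    stream = [
        {"trade_id": 1, "user_id": "u1", "amount": 250.0},
        {"trade_id": 2, "user_id": "u3", "amount": 90.5},
    ]
    for event in enrich_trades(stream, table):
        print(event)
```

Because the enrichment happens once, at the source, every downstream service reads `portfolio_type` from the same enriched stream instead of repeating the lookup.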

As you map the customer journey through your system, you discover that other departments, such as Risk Management, require this same enriched information. You begin developing additional streaming data products that multiple internal applications can use.

A data product strategy allows you to build data products for use and reuse across multiple systems, applications, and even lines of business.

Eventually, you recognize that certain valuable data originates outside of your organization. This external data could significantly enhance the customer experience or reduce organizational risk, but collecting it yourself would be expensive or result in delays. Data like tagged blockchain activity from cryptocurrency transactions or real-time stock trading information from industry peers can inform your risk allocation decisions, but gaining secure access poses a significant roadblock.

Simultaneously, you identify external stakeholders that require your data. Regulatory bodies may require the most current information possible, and supplying it promptly helps you avoid compliance issues or penalties. Or perhaps your company acquires a business with complementary products, and integrating its customer data would provide a holistic view of your combined customer base.

Similar scenarios exist across numerous industries, including logistics, healthcare, gaming, telecommunications, and more. Three key trends emerge that drive the expansion of data sharing:

  • Creating new revenue streams from existing data assets

  • Reducing resources spent on activities outside of your core business expertise

  • Minimizing delays in necessary processes such as regulatory reporting

These business imperatives all point toward one strategic priority: facilitating broader, more seamless data sharing beyond traditional organizational boundaries.

By implementing robust data sharing capabilities, organizations can transform data from a siloed asset into a dynamic resource that generates value throughout their business ecosystems. Companies that excel at data sharing often see benefits in customer satisfaction, operational efficiency, and competitive advantage.

The Technical Challenges of Inter-Organizational Data Sharing

As you start to consider inter-organizational data sharing, there are two opposing forces to be reckoned with. From the data producer’s perspective, security is the biggest concern. Data recipients, on the other hand, will be most concerned with the reliability of a data sharing service. Let's explore both perspectives and the technical challenges they face.

Security Challenges for the Data Producer to Solve

The data may contain your own customers’ private data or your organization’s intellectual property, so it’s crucial that sharing data outside of your business doesn’t also introduce new security threats as you open up your network.

Historically, this need was met by sending large batches of data via methods like SFTP. Limiting sharing to a single bulk transfer may have made security easier, but as data volumes grew and the demand for faster insight increased, the approach became untenable. Yet you can’t simply give the consuming party direct access to your database: that opens an obvious new attack vector and also makes it harder to manage the load on systems your own applications depend on.

If you opt for a message queue instead, you end up creating a separate point-to-point connection for each consumer, which makes every change to a data flow harder to roll out. As is often the case with point-to-point architectures, the result soon becomes ungovernable from both a monitoring and a data modeling perspective.

Reliability Requirements for the Data Recipient

Consumers will have downstream applications and analytical systems that depend on receiving data that’s correctly formatted and available. Otherwise, they risk letting down their own customers and failing to meet their own requirements.

Many of us have experienced a similar expectation as retail consumers: when we buy a physical product, we expect it to match the specifications defined by the manufacturer and advertised by the retailer. Data consumers have the same expectations. For a data product to work as designed, its producer must ensure its quality so that recipients can trust an asset that comes from outside their organization.

If you wanted to collect static data from another business, you might use a REST API. This method works well for data at rest but becomes challenging as you move to data in motion.

First, the synchronous nature of REST calls can limit scalability. It may require several serialization steps on either side of the API gateway, and it raises the question of what happens when a session is interrupted: you may need to work out, often manually, which messages were missed during the interruption. You also have to hope that the producer’s queue wasn’t configured as fire-and-forget once messages reached the edge of its domain; otherwise, the whole process may need to be reset between both parties.

Second, because you have no insight into what the other organization is producing until you make the call, you depend on it keeping the structure of the data stable. If it changes a column or removes a field, the services on your side of the flow may suddenly fail.
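This failure mode can be stated precisely for Avro-style schemas: when a consumer still using an older reader schema receives data written with a newer schema, every field the reader expects must either be present in the writer’s schema or declare a default value (what Schema Registry calls forward compatibility). The Python sketch below checks only that rule, under simplifying assumptions: schemas are reduced to field names plus a has-default flag, and type changes are ignored. It illustrates the rule; it is not Schema Registry’s actual compatibility checker.

```python
def missing_required_fields(
    reader_fields: dict[str, bool], writer_fields: set[str]
) -> list[str]:
    """Return fields an old consumer needs that a new producer schema no longer supplies.

    reader_fields maps each field in the consumer's (old) schema to True if
    the field declares a default value, False otherwise.
    writer_fields is the set of field names in the producer's (new) schema.
    Extra writer fields are ignored, as in Avro schema resolution.
    """
    return sorted(
        name
        for name, has_default in reader_fields.items()
        if name not in writer_fields and not has_default
    )


if __name__ == "__main__":
    consumer_schema = {"trade_id": False, "user_id": False, "venue": True}
    # The producer drops "user_id" and "venue" and adds "account_ref".
    producer_schema = {"trade_id", "account_ref"}
    print(missing_required_fields(consumer_schema, producer_schema))  # ['user_id']
```

A non-empty result means existing consumers would fail on the new data; "venue" is not reported because the reader can fall back to its default.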

Inter-Organizational Data Sharing With Confluent

Confluent offers multiple ways of sharing data across organizational boundaries, depending on factors such as networking requirements, type or number of consuming applications, and replication needs. Regardless of method, Confluent enables organizations to maintain control over any data they share and its associated infrastructure while ensuring that recipients have reliable access to well-governed streams of real-time data.

Now we’ll take a closer look at three methods: Stream Sharing, Cluster Linking, and Apache Kafka® or REST API client applications. Here you’ll find a high-level comparison of these methods (including their recommended use in data sharing scenarios), followed by a more in-depth explanation of each pattern. Other patterns are also possible, depending on the requirements of a specific use case. (Contact us for help identifying the right one.)

A high-level comparison of three methods for inter-organizational data sharing with Confluent

Stream Sharing

Confluent Stream Sharing, a component of Stream Governance, is the simplest way to securely share data from a single Kafka topic with external parties. Setting up sharing is straightforward: Producers simply activate Stream Sharing in their organizations and enter the recipients’ email addresses through Confluent Cloud's interface.

For data providers, robust security measures ensure complete control over shared data through authenticated sharing, layered encryption controls, and fine-grained access management to guarantee that data is accessed only by intended users. Providers maintain the ability to instantly revoke access to shared topics when needed.
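As a conceptual illustration only, the access model described here (read-only grants per recipient and topic, revocable at any time) can be sketched in Python. The `ShareRegistry` class and its methods are hypothetical and say nothing about how Stream Sharing is implemented; they only show why revocation takes effect immediately when every read is checked against the current set of grants.

```python
from dataclasses import dataclass, field


@dataclass
class ShareRegistry:
    """Hypothetical registry of read-only topic grants to external recipients."""

    # Each grant is a (recipient, topic) pair; there is no write permission at all.
    grants: set[tuple[str, str]] = field(default_factory=set)

    def share(self, recipient: str, topic: str) -> None:
        self.grants.add((recipient, topic))

    def revoke(self, recipient: str, topic: str) -> None:
        # discard() makes revoking an absent grant a harmless no-op.
        self.grants.discard((recipient, topic))

    def can_read(self, recipient: str, topic: str) -> bool:
        # Checked on every read, so a revocation applies to the very next request.
        return (recipient, topic) in self.grants


if __name__ == "__main__":
    registry = ShareRegistry()
    registry.share("partner@example.com", "trades.enriched")
    print(registry.can_read("partner@example.com", "trades.enriched"))  # True
    registry.revoke("partner@example.com", "trades.enriched")
    print(registry.can_read("partner@example.com", "trades.enriched"))  # False
```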

Recipients experience frictionless data consumption with no transformation or integration work required. They can accept a share with a few clicks in the Confluent Cloud console or via the CLI/API, then consume data directly from the Kafka topic with any Kafka client. The platform ensures high-quality data exchange by enforcing schemas that guarantee the compatibility and consistency of shared data. This secure, streamlined approach allows recipients to reliably access and use mission-critical data streams with minimal setup overhead.

How organizations can use Stream Sharing to provide external data recipients with read-only access to a Kafka topic in the Confluent data streaming platform

One organization that’s using Stream Sharing is Allium, a New York-based startup that facilitates access to high-quality, real-time blockchain data for analytical applications. Allium uses Stream Sharing to share previews of its blockchain data streams, allowing its customers and prospects to evaluate it for specific use cases.

Cluster Linking

Another method of inter-organizational data sharing with Confluent is based on Cluster Linking, a service for replicating data from one Kafka cluster to another in a reliable, scalable way. While Cluster Linking is commonly used for disaster recovery, geo-replication, and workload migration, you can also use it to share data between organizations in real time.

Currently, the most common data sharing pattern with Cluster Linking uses a gateway cluster: a public cluster that acts as a buffer between data providers and recipients, giving the provider a degree of isolation for its data infrastructure and complete control over the shared data. Selected topics are replicated from the provider’s internal, private cluster to the gateway cluster, and from there to the recipient’s environment. In addition to replicating data with Cluster Linking, you can replicate schemas with Schema Linking so that the consumer also receives the metadata, making the data stream even more reliable.
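The key property of the gateway pattern is that only explicitly selected topics ever leave the provider’s private cluster. The Python sketch below models that selection step with a hypothetical allow-list of literal topic names and prefixes, followed by an exclusion list. It is a simplified stand-in for the topic filtering you would configure on a real cluster link, and the topic names and filter structure are illustrative assumptions.

```python
def topics_to_mirror(
    private_topics: list[str],
    include_literal: set[str],
    include_prefixes: tuple[str, ...],
    exclude: set[str],
) -> list[str]:
    """Select which private-cluster topics are replicated to the gateway.

    A topic is mirrored if it is named explicitly or starts with an allowed
    prefix, unless it is explicitly excluded. Everything else stays private.
    """
    selected = []
    for topic in private_topics:
        # str.startswith accepts a tuple; an empty tuple matches nothing.
        included = topic in include_literal or topic.startswith(include_prefixes)
        if included and topic not in exclude:
            selected.append(topic)
    return selected


if __name__ == "__main__":
    private = ["payments.raw", "shared.usage", "shared.usage.pii", "billing.summary"]
    print(topics_to_mirror(private, {"billing.summary"}, ("shared.",), {"shared.usage.pii"}))
    # ['shared.usage', 'billing.summary']
```

Defaulting to "not shared" means a new internal topic never reaches the gateway by accident; someone has to add it to the allow-list.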

How organizations can use Cluster Linking to replicate data from selected Kafka topics between data providers and recipients

This pattern is a robust, scalable way of sharing data securely between organizations, and customers use it in a variety of scenarios, especially in the financial services and telecommunications sectors. One example is a major mobile network operator (MNO) that shares customer usage data for billing purposes with a mobile virtual network operator partner (i.e., a company that leases use of the MNO’s physical network infrastructure).

Kafka or REST API Client Applications

A third common pattern for data sharing has the recipient communicate directly with the provider’s cluster through either a Kafka client or a REST API client. This allows flexible, application-level integration and suits scenarios with multiple recipient organizations that have different access requirements.

For data providers, this pattern offers robust security through comprehensive authentication options, including Simple Authentication and Security Layer (SASL) and mutual TLS (mTLS). mTLS extends standard TLS by requiring the client and the server to authenticate each other with certificates, providing stronger protection for data in transit.
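On the client side, these mechanisms come down to a few configuration properties. The sketch below builds two configuration dictionaries using property names from librdkafka-based clients such as confluent-kafka-python: one for SASL over TLS and one for mTLS. The broker address, credentials, file paths, and consumer group name are placeholders. The code only assembles the configuration; you would pass it to a client such as `confluent_kafka.Consumer`, which is deliberately not imported so the sketch runs without a broker.

```python
def sasl_ssl_config(bootstrap: str, username: str, password: str) -> dict[str, str]:
    """Consumer config authenticating with SASL/PLAIN over an encrypted TLS connection."""
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",  # TLS encryption plus SASL authentication
        "sasl.mechanisms": "PLAIN",  # e.g. an API key and secret as username/password
        "sasl.username": username,
        "sasl.password": password,
        "group.id": "partner-consumer",  # hypothetical consumer group name
    }


def mtls_config(bootstrap: str, ca: str, cert: str, key: str) -> dict[str, str]:
    """Consumer config for mutual TLS: the client also presents a certificate."""
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SSL",  # TLS; the client certificate below makes it mutual
        "ssl.ca.location": ca,  # CA bundle used to verify the broker's certificate
        "ssl.certificate.location": cert,  # client certificate presented to the broker
        "ssl.key.location": key,  # private key for the client certificate
        "group.id": "partner-consumer",  # hypothetical consumer group name
    }


if __name__ == "__main__":
    cfg = sasl_ssl_config("broker.example.com:9092", "API_KEY", "API_SECRET")
    print(cfg["security.protocol"])
    # With confluent-kafka installed: Consumer(cfg).subscribe(["shared.topic"])
```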

While these authentication mechanisms secure the connection itself, complete data sharing security requires additional layers of protection. Stream processing with Apache Flink® enables fine-grained control over what data is shared by filtering out sensitive information before it is made available for external consumption. This content-level security, combined with encryption of sensitive data (for example, client-side field level encryption, which keeps selected fields encrypted both in transit and at rest), provides robust protection for inter-organizational data sharing.
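In practice, the filtering step would be a Flink SQL statement or Flink job that reads an internal topic and writes a sanitized topic for external consumption. To keep the example self-contained, the sketch below expresses the same per-event logic in plain Python rather than Flink: it keeps only an allow-listed set of fields and pseudonymizes the customer identifier with a keyed hash. The field names, the allow-list, and the choice of HMAC-SHA-256 for pseudonymization are illustrative assumptions.

```python
import hashlib
import hmac

# Hypothetical allow-list: only these fields may leave the organization.
SHAREABLE_FIELDS = ("usage_mb", "plan", "event_time")


def sanitize(event: dict, secret: bytes) -> dict:
    """Return a copy of `event` that is safe to publish to an external topic."""
    shared = {k: event[k] for k in SHAREABLE_FIELDS if k in event}
    # Replace the raw customer ID with a stable keyed hash, so the recipient
    # can correlate events per customer without learning the real ID.
    shared["customer_ref"] = hmac.new(
        secret, event["customer_id"].encode(), hashlib.sha256
    ).hexdigest()
    return shared


if __name__ == "__main__":
    raw = {"customer_id": "c-42", "name": "Ada", "usage_mb": 512, "plan": "basic"}
    print(sanitize(raw, secret=b"provider-held-key"))
```

An allow-list is used rather than a deny-list so that any new sensitive field added upstream stays private until someone explicitly decides to share it.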

For additional security, many organizations implement the gateway cluster pattern for stronger isolation between external data product consumers and their internal infrastructures. This further enhances security, provides more control, and ensures that external consumers don’t become "noisy neighbors" to their critical systems.

How organizations can directly exchange data using a Kafka client or REST API client

For recipients, this approach trades the ease of setup of the previous patterns for the greater flexibility that comes with interacting at the client level, while still benefiting from the security and functionality available by default in Confluent Cloud. Kafka client applications can produce and consume events directly from the provider’s cluster, while REST API clients can either produce and consume via a REST proxy or only produce via a REST API.
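For the REST route, the following sketch builds, but does not send, a produce request using only the Python standard library. It assumes Confluent’s REST API v3 produce endpoint (`/kafka/v3/clusters/{cluster_id}/topics/{topic}/records`) with a JSON-typed key and value and basic authentication using an API key and secret; the endpoint host, cluster ID, and credentials are placeholders, and the path and body shape should be checked against the current API reference before use.

```python
import base64
import json
import urllib.request


def build_produce_request(
    rest_endpoint: str,
    cluster_id: str,
    topic: str,
    api_key: str,
    api_secret: str,
    key: str,
    value: dict,
) -> urllib.request.Request:
    """Build (but do not send) a REST v3 produce request for a single record."""
    url = f"{rest_endpoint}/kafka/v3/clusters/{cluster_id}/topics/{topic}/records"
    body = {
        "key": {"type": "JSON", "data": key},
        "value": {"type": "JSON", "data": value},
    }
    # Basic auth header built from the (placeholder) API key and secret.
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_produce_request(
        "https://rest.example.com", "lkc-123", "orders", "API_KEY", "API_SECRET",
        key="order-1", value={"status": "shipped"},
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```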

The reliability of shared data is supported not only by Kafka’s ability to replay events after a pipeline failure (with ordering preserved within each partition) but also by Confluent Cloud features such as Stream Lineage, which lets providers visualize and monitor the status of real-time streams.
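Replay works because a Kafka topic partition is an ordered, retained log and each consumer tracks its position as an offset. The toy Python model below, which is not a Kafka client, shows how a consumer that commits its offset only after processing each record can restart after a failure and resume from its last committed position, continuing in order rather than losing events. The `PartitionLog` class and the commit-after-processing policy are illustrative assumptions.

```python
from typing import Optional


class PartitionLog:
    """Toy append-only log standing in for a single Kafka topic partition."""

    def __init__(self) -> None:
        self.records: list[str] = []

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset: int) -> list[tuple[int, str]]:
        # Records are retained after being read, so any earlier offset
        # can be re-read, always in the original order.
        return [(i, self.records[i]) for i in range(offset, len(self.records))]


def consume(
    log: PartitionLog, committed: int, fail_at: Optional[int] = None
) -> tuple[list[str], int]:
    """Process records starting at `committed`, committing after each one.

    Returns the processed records and the new committed offset. If `fail_at`
    is given, processing stops before that offset to simulate a crash.
    """
    processed: list[str] = []
    for offset, record in log.read_from(committed):
        if offset == fail_at:
            break  # simulated failure: later records are not processed
        processed.append(record)
        committed = offset + 1  # commit position = next offset to read
    return processed, committed


if __name__ == "__main__":
    log = PartitionLog()
    for event in ["trade-1", "trade-2", "trade-3", "trade-4"]:
        log.append(event)
    first, position = consume(log, committed=0, fail_at=2)
    print(first, position)  # ['trade-1', 'trade-2'] 2
    second, position = consume(log, committed=position)
    print(second, position)  # ['trade-3', 'trade-4'] 4
```

Committing after processing gives at-least-once delivery: a crash between processing a record and committing its offset means that record is processed again on restart, so downstream handling should be idempotent.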

Confluent: Your Bridge to Inter-Organizational Data Exchange

Inter-organizational data sharing represents the next frontier in the real-time data ecosystem. As organizations increasingly build robust internal streaming platforms, we're witnessing a natural progression: Partners and stakeholders now expect the same real-time access to information that internal teams enjoy. This creates a powerful flywheel effect across business networks, driving adoption and innovation.

When addressing these emerging demands, success begins with understanding the unique challenges for both data producers and consumers. At a high level, data providers are most concerned with security and governance, and recipients are most concerned with the reliability of real-time data streams.

Confluent enables organizations to share real-time data in a scalable, secure way, ensuring that these concerns are addressed. Our platform offers multiple patterns for inter-organizational data sharing, with each suited to different requirements. Solutions range from single-topic, read-only sharing with Stream Sharing to more sophisticated architectures based on Cluster Linking for inter-enterprise replication.

When considered alongside additional features such as stream processing with Flink, robust authorization and authentication, and Stream Governance, Confluent provides a solid foundation for inter-organizational data sharing. Companies implementing these solutions with Confluent can maximize the value of their data assets, build deeper commercial partnerships, and better meet regulatory reporting requirements where necessary (e.g., post-trade reporting).


Apache®, Apache Kafka®, Kafka®, Apache Flink®, and Flink® are registered trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by the use of these marks. All other trademarks are the property of their respective owners.

  • Alex Stuart is a DSP Specialist at Confluent, ensuring customers have guidance for solving complex, real-time data challenges by "Shifting Left" and optimizing their Data Analytics architectures. Just like the data he works with, Alex is always “in motion” as a dedicated runner, running community leader, and well-travelled globetrotter.

  • Will Stolton is a Product Marketing Manager at Confluent, focusing on core data streaming. He previously worked in product marketing at Snowplow, having transitioned from financial auditing with BDO.
