Distributed systems are collections of independent components located on different machines that communicate with one another in order to operate as a single unit.
In this complete introduction, learn how distributed systems work, see real-world examples and basic architectures, weigh the benefits and disadvantages, and explore common solutions for real-time distributed streaming.
Founded by the original creators of Apache Kafka, Confluent is a complete data streaming platform for real-time data integration, processing, and analytics that connects 120+ data sources.
Closely related to distributed computing and distributed databases, a distributed system is a collection of independent components located on different machines that exchange messages with each other in order to achieve common goals.
As a result, the distributed system appears to the end user as a single interface or computer. The goal is for the system as a whole to make the most of its combined resources and information while tolerating failures: if one component fails, the service remains available.
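The idea that one failure should not take the whole service down can be sketched as client-side failover: the caller tries each replica in turn and returns the first successful answer, so the end user still sees one working service. This is a minimal sketch, assuming hypothetical in-process functions `replica_a` and `replica_b` that stand in for remote machines; real systems would add timeouts, health checks, and retries.

```python
from typing import Callable, List, Sequence


def replica_a(key: str) -> str:
    # Hypothetical replica that is currently down.
    raise ConnectionError("replica_a unreachable")


def replica_b(key: str) -> str:
    # Hypothetical healthy replica holding the same data.
    return f"value-for-{key}"


def read_with_failover(replicas: Sequence[Callable[[str], str]], key: str) -> str:
    """Try each replica in order and return the first successful response."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica(key)
        except ConnectionError as exc:
            # Record the failure and fall through to the next replica.
            errors.append(exc)
    # Only give up when every replica has failed.
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


if __name__ == "__main__":
    print(read_with_failover([replica_a, replica_b], "user:42"))
```

Because `replica_a` fails, the call is served by `replica_b`; the caller never has to know which machine answered.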
Today, data is more distributed than ever, and modern applications no longer run in isolation. The vast majority of products and applications rely on distributed systems.
The most important functions of distributed computing are:
Modern distributed systems have evolved to include autonomous processes that might run on the same physical machine, but interact by exchanging messages with each other.
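Message passing between autonomous components can be illustrated with two "nodes" on the same machine talking over a TCP connection on the loopback interface. This is a minimal sketch, assuming JSON as the wire format and hypothetical message fields (`type`, `body`, `received`); two threads in one Python process stand in for what would normally be separate processes or machines.

```python
import json
import socket
import threading


def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer closes (or shuts down) its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def serve_once(server_sock: socket.socket) -> None:
    """Node B: accept one connection, read a JSON message, send back a reply."""
    conn, _ = server_sock.accept()
    with conn:
        request = json.loads(recv_all(conn).decode("utf-8"))
        reply = {"type": "ack", "received": request["body"]}
        conn.sendall(json.dumps(reply).encode("utf-8"))


def send_message(port: int, body: str) -> dict:
    """Node A: connect to node B, send one JSON message, and wait for the reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(json.dumps({"type": "greeting", "body": body}).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell node B no more data is coming
        return json.loads(recv_all(sock).decode("utf-8"))


def run_demo() -> dict:
    # Node B listens on a free port chosen by the OS (port 0).
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        node_b = threading.Thread(target=serve_once, args=(server_sock,))
        node_b.start()
        reply = send_message(port, "hello from node A")
        node_b.join()
    return reply


if __name__ == "__main__":
    print(run_demo())
```

Neither node shares memory with the other; all coordination happens through the messages sent over the socket, which is the property that lets the same code work when the nodes run on different machines.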
Networks
One of the earliest examples of distributed systems emerged in the 1970s, when Ethernet was invented and local area networks (LANs) were created. For the first time, computers could send messages to other machines on the same local network. Peer-to-peer networks followed, then e-mail, and the Internet as we know it remains the biggest and ever-growing example of a distributed system. As networking expanded from local networks to the global Internet, distributed systems evolved from LAN-based to Internet-based.
Telecommunication networks
Telephone and cellular networks are also examples of distributed networks. Telephone networks have been around for over a century and started as an early example of a peer-to-peer network. Cellular networks are distributed networks with base stations physically distributed across areas called cells. As telephone networks have evolved toward VoIP (Voice over IP), they continue to grow in complexity as distributed networks.
Distributed Real-time Systems
Many industries use real-time systems that are distributed locally and globally. Airlines use flight control systems, Uber and Lyft use dispatch systems, manufacturing plants use automation control systems, and logistics and e-commerce companies use real-time tracking systems.
Parallel Processing
There used to be a distinction between parallel computing and distributed systems. Parallel computing focused on running software on multiple threads or processors that accessed the same data and memory, whereas distributed systems meant separate machines, each with its own processors and memory. With the rise of modern operating systems, processors, and cloud services, distributed computing now also encompasses parallel processing.
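The split, process-in-parallel, and merge pattern shared by parallel and distributed processing can be sketched as a small map-reduce word count. This is a minimal sketch, assuming a thread pool stands in for separate processors or machines; for CPU-heavy work in Python, a process pool or a cluster would be the more realistic choice, but the shape of the computation stays the same.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_words(chunk: List[str]) -> Counter:
    """Map step: each worker counts the words in its own chunk of lines."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines: List[str], workers: int = 4) -> Counter:
    """Split the input, count chunks in parallel, then merge the partial counts."""
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)  # reduce step: add each worker's counts
    return total


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(parallel_word_count(text, workers=2).most_common(3))
```

Each worker only sees its own slice of the data, and the final result is assembled from independent partial results, which is why the same approach scales from threads on one machine to many machines.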
Distributed artificial intelligence
Distributed artificial intelligence uses large-scale computing power and parallel processing to learn from and process very large data sets using multiple agents.
Distributed Database Systems
A distributed database is a database that is spread over multiple servers and/or physical locations. The data can be replicated (copied to several systems), partitioned (split so that each system holds only part of it), or both.
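Deciding where each record lives is the core of partitioning and replication. The sketch below hashes a key to pick a primary node and places copies on the next nodes in the list. It is a minimal sketch, assuming hypothetical node names and simple modulo hashing; production databases typically use consistent hashing or range partitioning so that adding a node moves less data.

```python
import hashlib
from typing import List


def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process, so use a digest
    # that every node computes identically.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def placement(key: str, nodes: List[str], replication_factor: int = 2) -> List[str]:
    """Return the nodes storing `key`: a hashed primary followed by replicas."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and the number of nodes")
    primary = stable_hash(key) % len(nodes)
    # Replicas go on the following nodes, wrapping around the list.
    return [nodes[(primary + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    cluster = ["db-1", "db-2", "db-3"]  # hypothetical node names
    for k in ["order:1001", "order:1002", "customer:7"]:
        print(k, "->", placement(k, cluster))
```

Any node (or client) can compute the same placement without asking a central coordinator, and losing one node still leaves at least one copy of each key when the replication factor is 2 or more.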
Most popular applications use a distributed database and need to be aware of whether that distributed database system is homogeneous or heterogeneous.
In a homogeneous distributed database, every system uses the same database management system and data model. Such databases are easier to manage, and their performance can be scaled by adding new nodes and locations.
Heterogeneous distributed databases allow for multiple data models and different database management systems. Gateways translate the data between nodes. Heterogeneous systems usually arise as a result of merging applications and systems.
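A gateway's job in a heterogeneous setup can be illustrated as a pair of translation functions between two data models. This is a minimal sketch, assuming a hypothetical relational node that stores customers as `(customer_id, name, city)` rows and a hypothetical document node that stores them as nested records with an `_id` field; real gateways also handle type mapping, query translation, and schema changes.

```python
from typing import Any, Dict, Tuple

# Hypothetical row format on the relational node: (customer_id, name, city)
Row = Tuple[int, str, str]
# Hypothetical document format on the document node
Document = Dict[str, Any]


def row_to_document(row: Row) -> Document:
    """Gateway: translate a relational row into the document node's model."""
    customer_id, name, city = row
    return {"_id": str(customer_id), "name": name, "address": {"city": city}}


def document_to_row(doc: Document) -> Row:
    """Gateway in the other direction: flatten a document into a relational row."""
    return (int(doc["_id"]), doc["name"], doc["address"]["city"])


if __name__ == "__main__":
    doc = row_to_document((7, "Ada", "London"))
    print(doc)
    print(document_to_row(doc))
```

Keeping both directions in one place means neither database has to understand the other's data model; each node talks only to the gateway.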
Distributed systems must have a network that connects all components (machines, hardware, or software) together so they can transfer messages to communicate with each other.
To understand this, let’s look at types of distributed architectures, pros, and cons.
Distributed applications and processes typically use one of four architecture types below:
Client-server:
In the early days, distributed systems architecture consisted of a server acting as a shared resource, such as a printer, database, or web server. It had multiple clients (for example, users behind computers) that decided when to use the shared resource, how to use and display it, and how to change data and send it back to the server. Code repositories such as git are a good example: the intelligence lies with the developers who commit changes to the code.
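The client-server split described above, where the server only holds the shared resource and clients decide when to fetch, edit, and send data back, can be sketched in memory. This is a minimal sketch with hypothetical `Server` and `Client` classes and no real networking; the version check that rejects writes based on a stale copy is loosely analogous to git refusing a push that is not based on the latest commit, not a description of how git works internally.

```python
from dataclasses import dataclass
from typing import Tuple


class StaleWriteError(Exception):
    """Raised when a client pushes changes based on an outdated copy."""


@dataclass
class Server:
    """The shared resource: it stores the data and a version number, nothing more."""
    data: str = ""
    version: int = 0

    def fetch(self) -> Tuple[int, str]:
        return self.version, self.data

    def push(self, base_version: int, new_data: str) -> int:
        # Reject writes that were not based on the current version.
        if base_version != self.version:
            raise StaleWriteError(f"based on v{base_version}, server is at v{self.version}")
        self.data = new_data
        self.version += 1
        return self.version


class Client:
    """Clients decide when to fetch, how to change the data, and when to send it back."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.base_version, self.local = server.fetch()

    def edit(self, text: str) -> None:
        self.local += text  # the change happens locally, on the client

    def push(self) -> None:
        self.base_version = self.server.push(self.base_version, self.local)


if __name__ == "__main__":
    shared = Server()
    alice, bob = Client(shared), Client(shared)
    alice.edit("A")
    alice.push()
    bob.edit("B")
    try:
        bob.push()  # Bob's copy predates Alice's change
    except StaleWriteError as err:
        print("rejected:", err)
```

After a rejection, the client fetches the latest version, reapplies its change, and pushes again; the server never merges anything itself.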
Today, distributed systems architecture has evolved with web applications into:
The ultimate goal of a distributed system is to enable the scalability, performance and high availability of applications.
Major benefits include:
Every engineering decision has trade-offs. Complexity is the biggest disadvantage of distributed systems: there are more machines, more messages, and more data passed between more parties, which leads to issues with:
Confluent is the only data streaming platform for any cloud, on-prem, or hybrid cloud environment. Connect 120+ data sources with enterprise-grade scalability, security, and integrations for real-time visibility across all your distributed systems.