Also known as distributed computing, distributed systems are a collection of independent components located on different systems, communicating in order to operate as a single unit.
In this complete introduction, learn how distributed systems work, some real-world examples, basic architectures, the benefits and disadvantages, and common solutions for distributed messaging/streaming.
Founded by the original creators of Apache Kafka, Confluent is a complete data streaming platform for real-time data integration, stream processing, and analytics that connects 120+ data sources.
A distributed system, also known as distributed computing or (when it stores data) a distributed database, is a collection of independent components located on different machines that exchange messages with one another to accomplish common tasks.
As a result, the distributed system appears to the end user as a single interface or a single computer. The goal is for the system as a whole to make the most of its resources and information while preventing outages: if one component fails, the overall availability of the service is unaffected.
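One simple way a client can keep a service available when a single node fails is to try several replicas in turn. The sketch below assumes a caller-supplied `request` function (hypothetical; in practice it would be an HTTP or RPC call) that raises `ConnectionError` when a node is unreachable. It is an illustration of failover, not how any particular product implements it.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every replica in the list failed to answer."""


def call_with_failover(replicas: Sequence[str], request: Callable[[str], str]) -> str:
    """Try each replica in order and return the first successful response.

    `request` is a hypothetical stand-in for whatever actually talks to a node.
    """
    errors = []
    for replica in replicas:
        try:
            return request(replica)
        except ConnectionError as exc:
            # This replica is down or unreachable: record why and try the next one.
            errors.append((replica, exc))
    raise AllReplicasFailed(errors)


# Simulated cluster for the example: node "a" is down, "b" and "c" are healthy.
DOWN = {"a"}


def fake_request(node: str) -> str:
    if node in DOWN:
        raise ConnectionError(f"{node} unreachable")
    return f"answer from {node}"


print(call_with_failover(["a", "b", "c"], fake_request))  # answer from b
```

From the user's point of view the request still succeeds; only when every replica is down does the failure become visible.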
Today, data is more distributed than ever, and modern applications no longer run in isolation. The vast majority of products and applications rely on distributed systems.
The most important functions of distributed computing are:
Modern distributed systems have evolved to include autonomous processes that might run on the same physical machine, but interact by exchanging messages with each other.
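To make "autonomous processes that interact by exchanging messages" concrete, here is a minimal sketch using Python's standard `queue` and `threading` modules. Threads stand in for processes to keep the example self-contained; the worker shares nothing with the main program except its two message queues, and the names (`worker-1`, `job-1`) are illustrative.

```python
import queue
import threading


def worker(name: str, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """An autonomous worker: it only reacts to messages arriving in its inbox
    and communicates results by sending messages to its outbox."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel message: shut down
            break
        outbox.put(f"{name} handled {msg}")


requests: queue.Queue = queue.Queue()
replies: queue.Queue = queue.Queue()
t = threading.Thread(target=worker, args=("worker-1", requests, replies))
t.start()

for job in ["job-1", "job-2"]:
    requests.put(job)
requests.put(None)  # ask the worker to stop once the queue is drained
t.join()

results = [replies.get() for _ in range(2)]
print(results)  # ['worker-1 handled job-1', 'worker-1 handled job-2']
```

Swapping the queues for `multiprocessing` queues or network sockets keeps the same shape while moving the worker into a separate process or onto another machine.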
Networks
One of the earliest examples of a distributed system came in the 1970s, when Ethernet was invented and local area networks (LANs) were created. For the first time, computers could send messages to other machines on the same local network. Peer-to-peer networks, e-mail, and then the Internet as we know it followed, and the Internet remains the biggest and still-growing example of a distributed system. As networks expanded from local to global reach, distributed systems evolved from LAN-based to Internet-based.
Telecommunication networks
Telephone and cellular networks are also examples of distributed networks. Telephone networks have been around for over a century and started out as an early example of a peer-to-peer network. Cellular networks are distributed networks with base stations physically distributed across areas called cells. As telephone networks have evolved to VoIP (Voice over IP), they have continued to grow in complexity as distributed networks.
Distributed Real-time Systems
Many industries use real-time systems that are distributed locally and globally. Airlines use flight control systems, Uber and Lyft use dispatch systems, manufacturing plants use automation control systems, and logistics and e-commerce companies use real-time tracking systems.
Parallel Processing
There used to be a distinction between parallel computing and distributed systems. Parallel computing focused on running software on multiple threads or processors that accessed the same data and memory. Distributed systems meant separate machines, each with its own processors and memory. With the rise of modern operating systems, multicore processors, and cloud services, distributed computing now also encompasses parallel processing.
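Whether the workers are threads, processes, or separate machines, parallel processing usually follows the same scatter/gather shape: split the data, let each worker process its share, then combine the partial results. A minimal sketch with Python's `concurrent.futures`, assuming a simple sum of squares as the workload:

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk: list[int]) -> int:
    # Each worker computes a result over its own slice of the data.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: list[int], workers: int = 4) -> int:
    # Split the input into roughly equal, non-overlapping chunks.
    size = max(1, -(-len(data) // workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Scatter the chunks to the workers, then gather and combine the results.
        return sum(pool.map(partial_sum, chunks))


print(parallel_sum_of_squares(list(range(10))))  # 285
```

In standard CPython builds, threads do not speed up CPU-bound work like this because of the global interpreter lock; `ProcessPoolExecutor` (or separate machines) applies the same pattern with real parallelism.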
Distributed artificial intelligence
Distributed artificial intelligence uses large-scale computing power and parallel processing to learn from and process very large data sets using multiple agents.
Distributed Database Systems
A distributed database is a database that is spread across multiple servers and/or physical locations. The data can be replicated (copies of the same data kept on several nodes), partitioned (each node holds a subset of the data), or both.
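A common way to spread data across servers combines partitioning (each node is the primary owner of a subset of keys) with replication (each key is also copied to other nodes). The sketch below assumes a fixed, hypothetical list of node names and uses a hash of the key to pick the owners; it is illustrative, not the placement scheme of any specific database.

```python
import hashlib


def owners(key: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Return the nodes that should store `key`.

    The key's hash picks a primary node (partitioning); the next
    `replicas - 1` nodes in the list also keep a copy (replication).
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster
for k in ["user:1", "user:2", "order:42"]:
    print(k, "->", owners(k, nodes))
```

Plain modulo hashing moves most keys when a node is added or removed; production systems typically use consistent hashing or explicit partition maps to limit that reshuffling.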
Most popular applications use a distributed database and need to be aware of whether that distributed database system is homogeneous or heterogeneous.
In a homogeneous distributed database, every node runs the same database management system and uses the same data model. Such databases are easier to manage, and their performance can be scaled by adding new nodes and locations.
Heterogeneous distributed databases allow multiple data models and different database management systems. Gateways translate data between nodes. Heterogeneous databases usually arise from merging applications and systems.
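A gateway in a heterogeneous setup essentially maps each node's native format onto one shared data model. In this sketch both node formats are hypothetical: node A returns documents with its own field names, node B returns relational-style tuples, and the gateway converts both into a common `Customer` record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Common data model the gateway exposes to callers."""
    customer_id: int
    name: str


def from_node_a(row: dict) -> Customer:
    # Hypothetical node A: a document store with its own field names.
    return Customer(customer_id=int(row["cust_id"]), name=row["full_name"])


def from_node_b(row: tuple) -> Customer:
    # Hypothetical node B: a relational table returning (id, first, last) tuples.
    cid, first, last = row
    return Customer(customer_id=int(cid), name=f"{first} {last}")


def gateway_query(node_a_rows: list[dict], node_b_rows: list[tuple]) -> list[Customer]:
    """Translate results from both nodes and merge them into one uniform list."""
    merged = [from_node_a(r) for r in node_a_rows] + [from_node_b(r) for r in node_b_rows]
    return sorted(merged, key=lambda c: c.customer_id)


merged = gateway_query([{"cust_id": "2", "full_name": "Ada Lovelace"}],
                       [("1", "Alan", "Turing")])
print(merged)
```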
Distributed systems must have a network that connects all components (machines, hardware, or software) together so they can transfer messages to communicate with each other.
To understand this, let’s look at the types of distributed architectures and their pros and cons.
Distributed applications and processes typically use one of four architecture types below:
Client-server:
In the early days, distributed systems architecture consisted of a server acting as a shared resource, such as a printer, database, or web server. Multiple clients (for example, users working at computers) decided when to use the shared resource, how to use and display it, and when to change data and send it back to the server. Code repositories such as git are a good example: the intelligence lies with the developers who commit changes to the code.
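A minimal client-server sketch using only Python's standard library. The server owns the shared resource, here a hypothetical "upper-casing service" standing in for a database or web server, and each client decides when to send a request and what to do with the reply. The server runs in a background thread on localhost so the example is self-contained.

```python
import socket
import socketserver
import threading


class UpperCaseHandler(socketserver.StreamRequestHandler):
    """Server side: reads one line from a client and sends back a reply."""

    def handle(self) -> None:
        line = self.rfile.readline().decode("utf-8").strip()
        self.wfile.write((line.upper() + "\n").encode("utf-8"))


def request(host: str, port: int, text: str) -> str:
    """Client side: connect, send one line, and return the server's reply."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((text + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()


# Port 0 asks the operating system for any free port.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperCaseHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = request(host, port, "hello from a client")
print(reply)  # HELLO FROM A CLIENT
server.shutdown()
server.server_close()
```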
Today, distributed systems architecture has evolved with web applications into:
The ultimate goal of a distributed system is to make applications scalable, performant, and highly available. The biggest benefits include:
– Low latency: machines located geographically closer to users can serve them faster.
– Fault tolerance: if one server or data center fails, others can continue serving users.
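Both benefits come together in how requests are often routed: send each user to the closest region that is currently healthy. The sketch below assumes hypothetical region names and round-trip times measured from one user's location.

```python
def pick_region(latency_ms: dict[str, float], healthy: set[str]) -> str:
    """Route a user to the healthy region with the lowest measured latency."""
    candidates = {r: ms for r, ms in latency_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical round-trip times measured from one user's location.
latency = {"eu-west": 18.0, "us-east": 95.0, "ap-south": 160.0}

print(pick_region(latency, healthy={"eu-west", "us-east", "ap-south"}))  # eu-west
# If the nearby data center fails, the user is served by the next-closest one.
print(pick_region(latency, healthy={"us-east", "ap-south"}))  # us-east
```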
Every engineering decision has trade-offs. Complexity is the biggest disadvantage of distributed systems. There are more machines, more messages, and more data being passed between more parties, which leads to issues with:
Synchronizing the order of changes to data and the state of the application across a distributed system is challenging, especially when nodes are starting, stopping, or failing.
Messages may not be delivered to the right nodes, or may arrive in the wrong order, which leads to a breakdown in communication and functionality.
More intelligence, monitoring, logging, and load-balancing functions need to be added to gain visibility into the operation and failures of the distributed system.
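A standard technique for the ordering problem is to have each sender number its messages and let the receiver buffer anything that arrives early. This sketch assumes a hypothetical convention in which one sender stamps its messages with consecutive sequence numbers starting at 0; it restores order and drops duplicates, but it does not recover lost messages, which additionally requires acknowledgements and retransmission.

```python
class ReorderBuffer:
    """Deliver one sender's messages in order even if the network reorders them.

    Assumes consecutive sequence numbers starting at 0 (a hypothetical
    convention for this sketch).
    """

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: dict[int, str] = {}

    def receive(self, seq: int, payload: str) -> list[str]:
        """Accept one message; return every message that is now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate of a message already seen: drop it
        self.pending[seq] = payload
        delivered = []
        # Release the longest run of consecutive messages now held.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


buf = ReorderBuffer()
print(buf.receive(1, "b"))  # [] because message 0 has not arrived yet
print(buf.receive(0, "a"))  # ['a', 'b']
print(buf.receive(0, "a"))  # [] because it is a duplicate
print(buf.receive(2, "c"))  # ['c']
```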
Confluent is the only data streaming platform for any cloud, on-prem, or hybrid cloud environment. Connect 120+ data sources with enterprise grade scalability, security, and integrations for real-time visibility across all your distributed systems.