A database is an organized collection of information that is stored electronically to be maintained, accessed, and analyzed efficiently. It can store various types of data, including text, numbers, images, videos, and files.
A Database Management System (DBMS) is the software used to manage and interact with the database, enabling users to store, retrieve, and edit data. The combination of the DBMS and the data it manages is often referred to as a “database system,” or simply a “database.”
A DBMS lets developers create, modify, retrieve, and maintain information in a database. Database administrators (DBAs) use a DBMS to control users’ access to the database and to perform security auditing. A DBMS can provide a wide range of capabilities, including:
Storing, retrieving, and modifying data (with efficient data handling and management)
Managing access through Access Control Lists (ACLs) and Role-Based Access Control (RBAC)
Safeguarding against data loss by simplifying backup and snapshot processes and by providing recovery tools to fully or partially restore databases
Continuously monitoring the database to automatically tune performance or alert developers and administrators to take recommended actions
The database as we know it today dates back to the 1960s when the use of computers became popular. Below are some of the main milestones in the history of databases.
In 1970, IBM computer scientist Edgar Codd published his paper “A Relational Model of Data for Large Shared Data Banks.” This paper introduced the relational model behind the term “relational database” and established a new way to store and access data.
Following Codd’s paper, Michael Stonebraker and Eugene Wong at the University of California, Berkeley created INGRES (Interactive Graphics and Retrieval System), a relational database system that used the QUEL query language. In 1974, IBM began work on its own relational database prototype, System R, which used SEQUEL, the language later renamed Structured Query Language (SQL).
Relational databases grew in popularity during the 1980s, and SQL became the standard language for querying and managing the data. Database Management Systems (DBMSes) became essential tools for handling data storage, retrieval, and security for multiple users.
The rise of the internet in the 1990s fueled the next round of growth in the database industry. The relational database management systems (RDBMSes) of the time, designed to manage the data of a single organization, weren’t prepared to handle the volume of data that web applications were generating. As performance declined and maintenance costs increased, developers looked for an alternative and found one in MySQL, an open-source relational database.
This period also saw the need to organize data more efficiently, leading to advancements in database architecture and the management of structured and unstructured data.
The term NoSQL (now often read as “not only SQL”) was coined in 1998 and initially referred to databases that used query languages other than SQL. As the internet continued to grow, however, there was a need for a new kind of database that could store unstructured and semi-structured data. This led to the emergence of NoSQL databases, which became popular due to their speed and flexibility in handling large amounts of unstructured data.
NoSQL databases support different data models, including document, key-value, graph, and column-family. They also provide solutions for modern applications that require scalability and fast access to data.
In recent years, organizations have increasingly adopted cloud-native and purpose-built databases. They are moving away from on-premises and legacy databases to improve agility and scalability and to decrease total cost of ownership.
Modern databases now support hybrid cloud computing platforms and integrated data stores with both structured and unstructured data. These advancements help manage distributed data across multiple users and systems. They also ensure data security and compliance.
Learn more about how Confluent can help you with your database modernization journey by visiting our solution page.
SQL is a programming language used to communicate with relational databases. The American National Standards Institute (ANSI) adopted SQL as a standard in 1986, and it remains the standard language for relational database management systems. SQL statements are used to add, remove, modify, and query data, and they can also be used to grant permissions to users or roles. Popular RDBMSes that use SQL include Oracle Database, Microsoft SQL Server, IBM Db2, MySQL, PostgreSQL, Microsoft Access, and Ingres.
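As a minimal sketch of the four basic statement types mentioned above, the following uses Python's built-in `sqlite3` module with an in-memory database. The `customers` table and its rows are hypothetical, chosen only for illustration. SQLite has no user accounts, so granting permissions is not shown.

```python
import sqlite3

# In-memory SQLite database; the "customers" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Add rows (INSERT).
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York"), ("Edgar", "San Jose")],
)

# Modify a row (UPDATE).
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Arlington", "Grace"))

# Remove a row (DELETE).
conn.execute("DELETE FROM customers WHERE name = ?", ("Edgar",))

# Query the remaining rows (SELECT).
rows = conn.execute("SELECT name, city FROM customers ORDER BY name").fetchall()
print(rows)

conn.close()
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid SQL injection when values come from user input.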
There are various types of databases that are designed and built for different purposes. When choosing a database, it’s important to consider how the data will be used, so that you choose the best database for your use case.
A relational database is an organized collection of structured data items that have predefined relationships among them. Data is stored in tables made up of rows (tuples) and columns (attributes). Each row in a table has a primary key, which is a unique identifier that distinguishes it from the other rows in that table. The primary key of a row in one table can be stored as a foreign key in a row of a different table to indicate the relationship between the two tables. Relational databases are an ideal solution when data is structured and has a predefined schema.
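The relationship between primary and foreign keys can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables below are hypothetical, and the `PRAGMA foreign_keys = ON` line is specific to SQLite, which only enforces foreign keys when that setting is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement

# Hypothetical schema: each order points at exactly one customer.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        item        TEXT NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    )
    """
)

conn.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (item, customer_id) VALUES ('keyboard', 1)")

# Follow the foreign key back to the customer row with a JOIN.
joined = conn.execute(
    """
    SELECT o.item, c.name
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    """
).fetchall()

# An order that references a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders (item, customer_id) VALUES ('mouse', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(joined, orphan_rejected)
```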
SQL: SQL is used to store, manipulate, and manage data in a relational database.
Transactions: A database transaction consists of one or more SQL statements and is treated as a single unit of work that either completes as a whole or not at all. In the relational database world, a transaction ends in one of two ways: its changes are saved to the database (COMMIT) or they are undone (ROLLBACK).
ACID Compliance: Relational databases are primarily optimized for transactional operations. To ensure data integrity, all transactions must be ACID compliant. ACID stands for Atomicity, Consistency, Isolation, and Durability.
Atomicity
Atomicity means that either every operation in a transaction succeeds or none of them is applied. "All or nothing" is the guiding principle here.
Consistency
Consistency ensures that a transaction, whether it completes successfully or is aborted, doesn’t invalidate the database’s state. In other words, the data written to the database can only bring the database from one valid state to another.
Isolation
Multiple transactions can be executed simultaneously and in parallel on a single table. Isolation is how relational databases can maintain data consistency while concurrent transactions are executed. Isolation ensures that concurrent execution of transactions leaves the database in the same state as if all those transactions were executed sequentially.
Durability
Durability guarantees that once a transaction is completed and its data is committed to the database, the changes are permanent and written to non-volatile storage. This means that in the case of a power outage or any other system failure, the committed state is not lost.
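The commit-or-rollback behavior and the "all or nothing" principle described above can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are hypothetical. The `CHECK` constraint stands in for a consistency rule, and the credit is deliberately written before the debit so that a failed transfer has a partial change to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # valid: both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # violates CHECK: both updates undone
print(ok, failed, balances(conn))
```

In the second call the credit to `bob` succeeds first, but the debit from `alice` violates the constraint, so the whole transaction is rolled back and neither balance changes.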
To simplify, NoSQL refers to any database that doesn't use SQL as its primary data access language. NoSQL databases rose in popularity among developers in the late 2000s when the Internet was on the rise and storage cost was decreasing significantly. Developers didn’t need to define complex data models. Rather, they had the flexibility to store any structured, semi-structured, or unstructured data. Each kind of NoSQL database has its own set of unique capabilities but overall they can be summarized as having:
Flexible schemas: Unlike their relational counterparts, NoSQL databases don’t require a schema in order to store data. Developers have the flexibility to store huge amounts of data and adapt quickly as application requirements change.
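Ease of scalability: NoSQL databases are mostly distributed systems in which several machines (nodes) work together as a single cluster. You can scale up by increasing the resources of each node or scale out by adding more nodes to the cluster. They can also replicate data to increase redundancy and improve availability. It’s generally easier and more cost-efficient to scale NoSQL databases than relational ones.
BASE: NoSQL databases often follow a more relaxed consistency model called “BASE,” whereas relational databases follow a stricter one called “ACID.” BASE is the acronym for:
Basically Available: The system keeps responding to requests, even if some nodes are unavailable or some data is stale.
Soft state: The state of the system may change over time, even without new input, as updates propagate between nodes.
Eventual consistency: If no new updates are made, all replicas eventually converge to the same value.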
Overall, the BASE model favors availability (since scalability is important) over consistency.
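The trade-off of availability over consistency can be illustrated with a toy model in plain Python. This is only a sketch and not how any particular NoSQL database replicates data: the `ToyCluster` class and its `write`, `read`, and `sync` methods are hypothetical names invented for this example.

```python
class ToyCluster:
    """Toy model of replication: writes are acknowledged by one replica
    immediately and reach the other replicas only when sync() runs."""

    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # updates not yet applied to the other replicas

    def write(self, key, value):
        # The write is accepted right away (availability) ...
        self.replicas[0][key] = value
        # ... and queued for later propagation (soft state).
        self.pending.append((key, value))

    def read(self, key, replica=0):
        # Any replica can answer, even if its copy is stale.
        return self.replicas[replica].get(key)

    def sync(self):
        # Apply queued updates everywhere (eventual consistency).
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


cluster = ToyCluster()
cluster.write("cart:42", "book")

stale = cluster.read("cart:42", replica=1)  # replica 1 has not seen the write yet
cluster.sync()
fresh = cluster.read("cart:42", replica=1)  # replicas have now converged

print(stale, fresh)
```

Between the write and the sync, a read from the second replica returns no value. Once the update has propagated, every replica returns the same answer.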
There are several types of NoSQL databases, but the most popular are document (MongoDB), key-value (Redis), wide-column (Apache Cassandra), and graph (Neo4j) databases.
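To make the four data models concrete, here is a rough sketch of how each one shapes data, using plain Python structures rather than any real database client. All names and values are hypothetical, and real systems add indexing, persistence, and their own query languages on top of these shapes.

```python
# Document model: self-describing records; fields may differ per document.
documents = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace", "tags": ["admin"], "address": {"city": "Arlington"}},
]

# Key-value model: an opaque value looked up by its key.
key_value = {"session:42": "user=1;ttl=3600"}

# Wide-column model: row key -> column name -> value; rows may have
# different sets of columns.
wide_column = {
    "user:1": {"profile:name": "Ada"},
    "user:2": {"profile:name": "Grace", "stats:logins": 7},
}

# Graph model: nodes connected by labeled edges.
edges = [("Ada", "FOLLOWS", "Grace"), ("Grace", "FOLLOWS", "Edgar")]


def neighbors(node, relation):
    """Return the nodes reachable from `node` over edges labeled `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


# Flexible schema: the two documents do not share the same set of fields.
fields_per_document = [sorted(doc) for doc in documents]

print(fields_per_document)
print(key_value["session:42"])
print(sorted(wide_column["user:2"]))
print(neighbors("Ada", "FOLLOWS"))
```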