Databases - DJ Sipe's Notes

# Relational (SQL) Databases

In a relational database you model your data as discrete tables that represent your problem domain. Each table represents a single entity, such as a "user" or an "order". Relationships between entities are defined by foreign keys in the tables.

![[example-relational-tables.excalidraw.svg]]

Modeling your data this way has its benefits. For one, it's easier to decompose your problem domain into small, normalized entities. You can also enrich your data by creating relationships between tables that you query only when you need information about that relationship. For example, the `orders` table can have a foreign key referencing the `users` table. This lets you write queries that pull very targeted sets of data over the network, and it helps optimize storage on the database host by reducing redundant data.

The flip side to all of this is performance. Relational databases become harder to scale as your dataset grows. `JOIN`s across large tables are computationally expensive and time consuming. Also, as the number of concurrent connections to the database increases, so too does lock contention (deadlocks and other blocking) as the database works to enforce atomic multi-table transactions.

# Document (NoSQL) Databases

A NoSQL database doesn't store data in relational tables. Instead it uses models such as documents, key-value pairs, graphs, or column families to store and retrieve data. NoSQL databases are well suited to large volumes of unstructured or semi-structured data, and to applications that need to grow and handle a lot of traffic. Many of them scale horizontally across machines while keeping latency and resource consumption low. They typically achieve this by limiting or dropping support for `JOIN`s and by relaxing the consistency guarantees that relational databases provide (see [[#CAP Theorem]] below).

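
To make the contrast between the two models concrete, here is a minimal sketch using Python's built-in `sqlite3` module for the relational side and a plain nested `dict` as a stand-in for a document store. The schema, the sample rows, and names like `user_document` are illustrative assumptions, not taken from a real system.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        item    TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders (id, user_id, item) VALUES (?, ?, ?)",
    [(10, 1, "keyboard"), (11, 1, "monitor")],
)

# Relational read: the user's name is stored once and combined
# with the orders at query time through a JOIN.
relational_rows = conn.execute("""
    SELECT users.name, orders.item
    FROM orders
    JOIN users ON users.id = orders.user_id
    ORDER BY orders.id
""").fetchall()

# Document-style read: the same data denormalized into one nested
# record, so a single lookup returns the user and their orders.
user_document = {
    "_id": 1,
    "name": "Ada",
    "orders": [
        {"id": 10, "item": "keyboard"},
        {"id": 11, "item": "monitor"},
    ],
}
document_rows = [
    (user_document["name"], order["item"])
    for order in user_document["orders"]
]
```

Both reads produce the same name and item pairs. The relational version pays for the `JOIN` at query time but stores each fact once. The document version avoids the `JOIN` but has to keep the embedded copies in sync itself.
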
## CAP Theorem

[CAP Theorem](https://en.wikipedia.org/wiki/CAP_theorem), also known as Brewer's Theorem, describes the trade-offs that must be made when designing distributed computer systems. "CAP" stands for Consistency, Availability, and Partition tolerance. According to the theorem, a distributed system can guarantee at most two of these three properties at the same time. Because network partitions can't be ruled out in practice, the real choice is usually between consistency and availability while a partition is in effect.

Consistency means that all nodes in the system see the same data at the same time. Availability means that every request for data receives a response, without guaranteeing that it's the most up-to-date data. Partition tolerance is the system's ability to continue functioning when network partitions occur.

![[CAP Theorum.excalidraw.svg]]
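
The trade-off can be sketched with a toy, in-process model of two replicas. This is not a real distributed system: `TwoNodeCluster`, `Replica`, and `Unavailable` are hypothetical names, and the `partitioned` flag only simulates a network split. In "CP" mode the cluster refuses writes during a partition to keep the replicas identical. In "AP" mode it keeps accepting writes and lets the replicas diverge.

```python
class Unavailable(Exception):
    """Raised when a CP-style cluster refuses a request during a partition."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeCluster:
    """Toy model: two replicas and a flag that simulates a network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        if not self.partitioned:
            # Healthy network: replicate to both nodes before returning.
            self.a.data[key] = value
            self.b.data[key] = value
            return
        if self.mode == "CP":
            # Give up availability: refuse the write rather than diverge.
            raise Unavailable("cannot replicate during a partition")
        # AP: give up consistency and accept the write on the reachable node.
        replica.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


# CP: the write is rejected, so both replicas still agree.
cp = TwoNodeCluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
try:
    cp.write(cp.a, "x", 2)
    cp_write_accepted = True
except Unavailable:
    cp_write_accepted = False
cp_values = (cp.read(cp.a, "x"), cp.read(cp.b, "x"))

# AP: the write succeeds, but the replicas now disagree.
ap = TwoNodeCluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)
ap_values = (ap.read(ap.a, "x"), ap.read(ap.b, "x"))
```

In this sketch the CP cluster ends with both replicas holding the old value, while the AP cluster ends with one replica holding the new value and the other holding the stale one. Real systems sit between these extremes and reconcile diverged replicas once the partition heals.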