Intro to Relational Databases¶
TopChef relies on a relational database for long-term storage. Relational databases implement a relational algebra for storing and querying data from the database. The most common abstraction for considering relational storage systems is that of cross-indexed tables. Each table has a set of columns associated with it. Each column in a table holds a variable of a given type. For instance, the table
| Students | |
|---|---|
| first_name | String |
| last_name | String |
| student_id | Integer |
Is a table with the name Student, and three columns, that represent
First Name, Last Name, and Student ID.
The databases that TopChef uses allow users to manipulate them through a
domain-specific programming language called Structured Query Language, or
SQL. There is an ANSI standard for SQL, but different database products will
have different features, resulting in many different SQL “dialects”. This is
one of the reasons why this project uses SQLAlchemy.
In order to create the table above, we would send the query
CREATE TABLE Students (
STRING first_name,
STRING last_name,
INTEGER student_id
);
and in order to fill the table up, we may use an INSERT query like
INSERT INTO Students ("First Name", "Last Name", "Student ID") VALUES
("Woll", "Smoth", "123456");
The table will now look something like
| first_name | last_name | student_id |
|---|---|---|
| Woll | Smoth | 123456 |
Each row of data in this table is called a record.
Unfortunately, we’ve made a mistake entering our data. The student written
as Woll Smoth is really named Will Smith. Fortunately, SQL allows us
to change database records using UPDATE queries. The UPDATE query
that will fix this problem is
UPDATE students SET first_name = "Will", last_name = "Smith"
WHERE student_id = 123456;
The table will now look like
| first_name | last_name | student_id |
|---|---|---|
| Will | Smith | 123456 |
We can then retrieve this data at any time using a SELECT query, like
SELECT * FROM Students WHERE student_id = 123456;
The use of the * is a wildcard character standing in for all columns.
Great! We’ve built out our first table, but it doesn’t really tell us much.
Let’s say we need to record what residence Will Smith lives in. We can do
this by introducing another column with the name of the residence. The query
to do this is an ALTER TABLE query, and it would look something like
ALTER TABLE Students ADD "Residence Name" STRING;
This approach will eventually get cumbersome, as the same information will get repeated over and over again. This will be a drag on storage space, but it will also be bad for the consistency of the database. If something about the database were to change, we would have to change it for EVERY student in that table. We would also have to know all the information for the residence when creating new students, which would be quite wasteful.
Lastly, this represents a design problem, as the data between students and residences is obviously related, but we haven’t communicated that relation in any way. To solve this, we create a second table in our database as follows
| Residences | |
|---|---|
| residence_id | Integer |
| name | String |
| address | String |
And we alter our “Students” table to be the following
| Students | |
|---|---|
| first_name | String |
| last_name | String |
| student_id | Integer |
| residence_id | Integer |
Since the student and residence records share a record, we say that these tables are related. This is a way of writing down a one-to-many relation between residences and students, where one residence can have many students.
Now, let’s say we want to select all the students that live in a particular
residence. This is where the relational algebra defines an operation called
a JOIN. There are several types of joins, but in our case, we will want
to take an INNER JOIN. ``JOIN``s must be defined on some logical
condition.
In order to select all the students that live in a residence “REV”, we could send a query like
SELECT * FROM Students INNER JOIN Residences ON
Students.residence_id = Residences.residence_id
WHERE Residences.name = "REV";
Note the use of the member access operator . in order to differentiate
between columns that have the same name in different tables.
It would be useful at this point to introduce the concept of primary keys. A
primary key for a database record is the minimum amount of data to uniquely
identify a record in our database. In this case, it would be useful to
define the student_id as the primary key in our Students table, and
the residence_id as the primary key of the Residences table. We
could also define a “composite key” as a primary key that spans multiple
columns. The entry in each column of the composite key doesn’t have to be
unique, but the combination would have to be unique. It may be tempting to
define first_name and last_name as the composite key of the
Students table, but we would run into some problems if a second Will
Smith decided to enroll in our school.
The uniqueness of primary keys allows us to define foreign keys as well. In
the case of the residence_id column in the Students table, we can
define this column as a foreign key using the query
ALTER TABLE Students
ADD FOREIGN KEY (residence_id) REFERENCES Residences.residence_id
In a CREATE TABLE query, this would be
CREATE TABLE Students (
STRING first_name,
STRING last_name,
INTEGER student_id,
INTEGER residence_id,
FOREIGN KEY (residence_id) REFERENCES Residences.residence_id
);
Composite keys come in handy when expressing many-to-many relations. In the case of many-to-many relations, we use a design pattern called an association table. Let’s say we had some classes in our school as well. In that case, the classes may look something like the table below. Primary keys are identified using bolded text.
| Classes | |
|---|---|
| class_id | Integer |
| name | String |
The association table will look like
| Classes To Students | |
|---|---|
| class_id | Integer |
| student_id | Integer |
We can use the same JOIN syntax discussed above, but we will now need to
make two JOIN opreations, one to JOIN our Students table to the
Classes To Students association table, and one to join the result of the
previous join to the Classes table. This process of removing repeated
data is called normalization, and it’s a major part of database design.
Extra References¶
W3Schools has an excellent SQL tutorial. The Flask Mega-Tutorial also database design in some detail, applying it to the design of a small blogging application.