How to Insert Rows in a Table With a Foreign Key
In this tutorial you can learn how to insert data in tables that has a foreign key constraint, and as such relies on a key provided from another table.
By. Jacob
Edited: 2020-08-13 17:01
Part of normalizing databases is to limit data repetition in tables, since the tables would otherwise grow very large; a good example of this would be a HTTP request log, a log stored in database tables rather than static .txt files.
If for example, you decide to store the User Agent string of your visitors, then this alone is actually enough to cause your database table to grow into the gigabyte'(s) level fairly fast depending on traffic. It can also happen if you rarely clean up data; in reality you would want to delete data that is older than a pre-determined age so that you do not waste space.
In order to drastically limit the amount of space used up by such a logging feature, a decent approach would be to store user agent strings in a separate table, and then add a Foreign Key to the main log-table. The downside of such normalization is that certain operations might become slightly slower, while other operations might become faster — but it of course also depends on your use of indexes.
It is generally faster to perform a select on an int column than it is to perform one on a string column. When using an int column, it might also be possible to use it as part of an index, which is going to speed up some types of queries.
But, if the user_agent column refers to an integer in another table, how do you know which key to use in your insert queries? The answer is surprisingly simple — you first perform a select query.
Note that I am using PHP and MySQL for this tutorial.
Inserting data into a table that has a foreign key
There is more than one way to insert data into tables with foreign key constraints, but this is one of the simplest. The below is a step-by-step guide that is very beginner friendly.
Note. the following assumes we got two tables, their layout is not extremely important, as we will just be focusing on how to store the user agent.
The table names are request_log and user_agents.
- The request_log table is created with a foreign key constraint on the ua_id column.
- The contents of individual ua_id fields must correspond with the unique id column in the user_agents table.
- Selects are performed via the ua_hash column with a unique index; the column contains a md5 hash for speedy look ups.
1. In my case, the first select query can look like this:
$result = $db->prepared_query->("select id from user_agents where ua_hash = ?",
[md5($user_agent)]
);
2.Now I will have to check if the query failed and whether anything was actually returned:
if (false === $result) {
// Do something if the query fails
// This should not happen, so we should probably just log the event..
}
if ($result->num_rows === 0) {
// If the query returned nothing, we are dealing with an unknown user agent,
// and we should first insert a new row in the "user_agents" table before we continue
$result = $db->prepared_query->("insert into user_agent (ua_hash, ua_string) values(?, ?)",
[md5($user_agent), $user_agent]
);
if(false === $result) {
// If the query failed, be sure to log this event...
}
}
Note. The ua_hash is just for faster look ups, and you want to add a index on this column.
3. Now we can finally insert a new row in the request_log table:
// Make an associative array
$ua_row = $result->fetch_assoc();
// Now we can finally send an insert to the "request_log" table
$result = $db->prepared_query->("insert into request_log (ip_address, ua_id) values(?, ?)",
[$ip_address, $ua_row['id']]
);
Another way to insert a row would be by using a subquery, but keep in mind that such a query might fail if a user agent does not already exist in the user_agents table.
Other considerations
This is of course only pseudo code of an actual implementation. You would also want to have a column for the timestamp in the request_log table, and you would probably also want to store the IP address more efficiently.
Normalization does not come entirely without down sides. For example, queries might become more complex, and you can not easily read your tables simply by performing a select * from table_name anymore. Many of us have probably been told to always normalize our database tables, but in reality this is often premature, and it adds an unnecessary level of complexity that is unwanted while just learning about databases — and indeed, also sometimes while working on real-world applications.
I am not saying you should not do it, because if you are already fairly sharp minded with this stuff, it probably would not take you long to gain a decent level of understanding. However, data-repetition does not really matter until it matters.
Tell us what you think: