In my previous post, I explained what NULL does and does not mean, how and why the rule of NULL propagation forces any expression involving NULL to result in NULL. I also mentioned that comparisons involving NULL do not result in NULL but, rather, in a new “truth value” called “Unknown”. But there’s a lot more to be written about Unknown and it’s effect on logical expressions, and that’s exactly what this blog post (and the next one as well) will be about.
Most software developers will be intimately familiar with what is popularly known as Boolean logic but should actually be called a two-valued Boolean algebra – a Boolean algebra that is defined on the two truth values True and False, and the three operators AND, OR, and NOT. However, with the introduction of Unknown as a third truth value, some rules of Boolean algebra are violated, so the original two-valued Boolean algebra has to be replaced by a new, non-Boolean three-valued algebra for logical operations. I’ll assume that most readers already know the basics of this three-valued logic, but I’ll briefly cover them anyway (in this post) before moving on to the more interesting side effects (mostly in a future post).
Truth tables for AND, OR, and NOT
Since we have already established that a proposition such as WHERE Age > 35 can result in Unknown, the logical next step is to define how this affects the result of propositions such as WHERE Age > 35 AND Country = ‘
I believe that simple, easy-to-follow examples are the best way to illustrate a complex subject, so that is exactly how I will explain the logic behind the truth tables for AND, OR, and NOT – by using a simple example. I’ll just stick with my previous example that revolves around my undisclosed age, since that saves me the hassle of not disclosing other personal information.
Let’s look at the AND operator first. In two-valued logic, True AND True evaluates to True, whereas True AND False, False AND True, and False AND False all evaluate to False. The additional truth value of Unknown means that we have to fill in five new cells, for True AND Unknown, False AND Unknown, Unknown AND True, Unknown AND False, and Unknown AND Unknown.
WHERE Age > 35 AND Country = ‘
Many readers will probably already know that I’m Dutch (and those that didn’t, do now), so my country of residence is definitely not the
Congratulations – you have just filled in two of the missing cells in the truth table for AND: both Unknown AND False and False AND Unknown evaluate to False.
WHERE Age > 35 AND Country = ‘
Assuming you know that “
For the truth table, we can now fill in that Unknown AND True (and True AND Unknown) evaluate to Unknown.
WHERE Age > 35 AND Age < 45
This leaves us with just one missing cell in the truth table for AND, for Unknown AND Unknown. The proposition above illustrates this one. Since you don’t know my age, you don’t know if I’m over 35, nor if I’m under 45 – so you have obviously no way of telling whether I’m both over 35 and under 45.
The only logical outcome of Unknown AND Unknown is Unknown.
The truth table of OR also starts with the four cells know from two-valued logic: True OR True, True OR False, and False or True evaluate to True, and False OR False evaluates to False. The same five new cells that were added to the table for AND are added here as well, and we’ll use similar examples to populate them.
WHERE Age > 35 OR Country = ‘
Since I still haven’t moved to the
Unknown OR False and False OR Unknown both evaluate to Unknown.
WHERE Age > 35 OR Country = ‘
Here’s an example where you don’t have to fear the wrath of the shooter. No matter what my age is, you can safely answer that, since I’m living in the
Both Unknown OR True and True OR Unknown evaluate to True.
WHERE Age > 35 OR Age < 30
(Those who have ever seen me will have to stretch their imagination a bit. Just pretend that you really can’t tell for sure that I’m definitely not under 30, to make this a proper example of Unknown OR Unknown, ’kay?)
Not much to be said about this example – Unknown OR Unknown is still Unknown.
Now that the truth tables for AND and OR are complete, we only have the table for NOT left. This one grows from two cells (NOT True = False, and NOT False = True) to three, the third cell being for NOT Unknown. So, here’s the example:
WHERE NOT Age > 35
You don’t know my age. That’s why you don’t know whether I’m over 35. So, how can you possibly know whether I am not over 35? Duh!
NOT Unknown is, of course, Unknown.
So, to wrap things up, here are the completed truth tables for the operators AND, OR, and NOT in three-valued logic
The truth table for AND:
| True | Unknown | False |
True | True | Unknown | False |
Unknown | Unknown | Unknown | False |
False | False | False | false |
The truth table for OR:
| True | Unknown | False |
True | True | True | True |
Unknown | True | Unknown | Unknown |
False | True | Unknown | false |
The truth table for NOT:
True | False |
Unknown | Unknown |
False | True |
Some oddness
If you ever forget the logic tables, you can use examples similar to these to easily reconstruct them on the spot. However, do take care that you use independent propositions. If you don’t, you can easily get at incorrect conclusions, as illustrated by these propositions:
WHERE Age > 35 AND Age < 30
WHERE Age > 35 OR Age < 45
This is a fine example of the difference between humans and computers. Humans think; computers only do as told. You do not need to know my age in order to tell everyone who asks you that Hugo Kornelis is definitely not both over 35 and under 30, or that he definitely is either over 35 or under
CHECK, the odd one out
Once a proposition has been evaluated to a single truth value, the computer has to determine what to do with the result. In most all places where the SQL language rules allow a logical expression, the defined behavior is to treat True one way, and both False and Unknown in another way. For instance, a row is included in the result set of a query if the WHERE clause evaluates to True, but excluded if it evaluates to either False or Unknown. Same goes for the decision to include or exclude a group of rows based on the HAVING clause – it has to evaluate to True; both Unknown and False means that the group is out. The query processor prefers to be safe rather than sorry – if it can’t tell for sure that you want the row included, it won’t be.
For logical expressions in IF and WHILE statements and in CASE expressions, the same logic holds. A condition statement introduced with IF will be executed if the condition is True whereas the ELSE part (if any) is executed on both False and Unknown; a WHILE block will be repeated as long as the expression evaluates to True and execution stops on False and Unknown; and in a CASE expression, the WHEN that evaluates to True determines the result.
There is one notable exception to this – the CHECK constraint. If a row is inserted or updated that causes the CHECK constraint to evaluate to True, the modification is accepted; if it evaluates to False, it is rejected – but if it evaluates to Unknown, the modification is accepted! This is an unexpected change from the accustomed, but not one that has been made without reason – since Unknown usually results from a NULL value in the constrained column, rejecting rows if the CHECK constraint if the proposition evaluates to Unknown would merely duplicate the effect of a NOT NULL constraint. With the definition of a CHECK constraint as it is, illogical as it might seem, it’s very easy to define both an optional age column that will only accept ages over 35 or no age at all, and equally easy to define a mandatory age column that forces you to enter an age of over 35:
CREATE TABLE DemoPersons
(PersonID int NOT NULL
,AgeOptional tinyint NULL CHECK (AgeOptional > 35)
,AgeMandatory tinyint NOT NULL CHECK (AgeMandatory > 35)
,CHECK (AgeOptional = AgeMandatory)
);
Note the third check constraint, which ensures that both age columns are populated with the same age – but only if both age columns are populated at all, since this expression also results in Unknown if either of the age columns is NULL.
More oddness
The different treatment of Unknown in a CHECK constraint versus in a WHERE clause (or anywhere else, for that matter) can result in some interesting behavior. Let’s add some rows to illustrate this.
INSERT INTO DemoPersons (PersonID, AgeOptional, AgeMandatory)
SELECT 1, 36, 36
UNION ALL
SELECT 2, NULL, 37
UNION ALL
SELECT 3, 38, 38
UNION ALL
SELECT 4, 39, 39
UNION ALL
SELECT 5, NULL, 40;
Now, since the check constraint excludes row with an age of 35 or less, the query below should return 0, right? Well, no. It returns 2.
SELECT (SELECT COUNT()
FROM DemoPersons)
– (SELECT COUNT()
FROM DemoPersons
WHERE AgeOptional > 35);
It appears odd, on first sight, that a table with a CHECK constraint that only allows ages of over 35 still allows for rows that do not have an age of over 35 – but it does, in fact, make perfect sense. Remember that rows for which AgeOptional > 35 is Unknown are admitted by the CHECK constraint, but rejected by the WHERE clause!
In the next post on NULL, I’ll show more seemingly odd behavior of NULL, and explain why this behavior is, in fact, not odd at all. I’ll also answer the question if null if null is null null null is null – I bet you can’t wait!
5 Comments. Leave new
Beautifully done. This is a great discussion of three-valued logic. How about tackling Belief Logic and SSAS Data Mining 🙂
Good information Hugo. The only danger is that by trying to relate three-value logic to everyday examples of what we know and don’t know someone might infer that the truth value "unknown" is directly related to its usual meaning in English. You gave one example where that isn’t the case: "WHERE Age > 35 AND Age < 30". An even simpler example would be "WHERE Age = Age". Again, we don’t have to know your age to know that the value of Age is equal to itself. The value of this predicate is not unknown in mathematical terms or in everyday experience. So it would be safer and less counter-intuitive if SQL named the third truth value as something else altogether – the common English meaning of the word "unknown" clearly does not apply.
Note to all – while rereading this post in preparation of the third part of this series (do I hear someone say "at last"?), I noted a disturbing error.
In the second paragraph after the heading "CHECK, the odd one out" I had mistakenly types WHERE instead of WHILE. Not once but twice. Aaarrgghh!!!
Since this is not just an amusing type, but actually an error that changes what I want to say, I just went in and corrected my mistake. If my error has confused you, then please know that I am sorry, and accept my apologies.
The fourth (and final) part of this series about NULL is now available at http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/30/what-if-null-if-null-is-null-null-null-is-null.aspx.
Thank you so much! Very good explanation that helped me understand what wasn’t explained very well in my lecture.