Today I ran into a little bit of SQL that, on first glance, didn’t make a bunch of sense. Before I show you the SQL I’ll give you a basic idea of the two tables being queried:
AREA
areaid | int NOT NULL
areaname | varchar2(100) NOT NULL
FACILITY
facilityid | int NOT NULL
facilityname | varchar2(100) NOT NULL
areaid | int NULL
There is actually a bit more to it but these columns are the important ones. The query that had been written was trying to find all areas that had no facilities within them:
SELECT ar.* FROM area ar WHERE ar.areaid NOT IN (SELECT f.areaid FROM facility f)
That query returned no rows at all. Meanwhile the following query returned one row:
SELECT ar.* FROM area ar WHERE NOT EXISTS (SELECT f.areaid FROM facility f.areaid = ar.areaid)
Honestly, I’ve given you a big hint to figuring this one out already since I titled this thing Beware the Null and I showed you which columns could be NULL and which couldn’t. But here is the problem.
Any SQL comparision that involves a NULL will evaluate to FALSE
That means all of the following examples would return no rows:
SELECT * FROM area WHERE areaid NOT IN (1,2,NULL);
SELECT * FROM area WHERE areaid IN (1,2,NULL);
SELECT 1 as myRow FROM dual WHERE 1=null;
SELECT 1 as myRow FROM dual WHERE 1 != null;
SELECT 1 as myRow FROM dual WHERE null = null;
All of those return no rows because any comparison operator used with a null evaluates to false (IN, NOT IN, =, !=, etc).
So, how do I fix the NOT IN query to work the way I expected it to?
SELECT ar.* FROM area ar WHERE ar.areaid NOT IN (SELECT f.areaid FROM facility f WHERE f.areaid IS NOT NULL)
Comments
Bryan Price
As an aside, I notice that having a NULL in an IN query is ok, but in a NOT IN query it’s bad.
Thus the statement
SELECT ar.* FROM area WHERE areaid IN ( 1 , 2 , NULL )
will come up with results if areaid = 1 or areaid = 2, but
SELECT ar.* FROM area WHERE areaid NOT IN ( 1 , 2 , NULL )
will get no results in any case, even if there was an areaid=3 in there.
Bill
Thanks for the additional clarification Chris!
Chris
The key to this behaviour is the meaning of NULL. NULL does not mean “not existing”, it means “unknown, if it even exists”.
If the value is unknown, you can not know whether it should be in the subselect result set. But, NOT IN selects a specified resultset.
And if a value is unknown, it can not be part of that result set. ;-)
BTW Doug’s solution is way faster than NOT IN, at least when it comes to bigger subselects… NOT IN can be a real performance killer.
Bill
Thanks Doug, you are correct and your query will return areas with no facilities.
Doug Giles
A left join will give you the results that you’re looking for:
SELECT ar . *
FROM area ar
LEFT JOIN facility fa ON ( ar.areaid = fa.areaid )
WHERE (fa.facilityid IS NULL)
That query should return the areas that do not have a facility.
Bill
Because that won’t work either:
I edited your code a little to work with Oracle:
SELECT ar.*
FROM area ar
INNER JOIN facility fa
ON (ar.areaid = fa.areaid)
WHERE (fa.facilityid IS NULL)
And it too returns no rows.. Where I am trying to find the rows in the area table that don’t exist in the facility table.
Beyond the fact that your query returns no records (because the facility ID can’t be NULL) I don’t even see how I could convert your query to get what I wanted; I really don’t want a join of the two tables, I want to find the differences between the two so to speak.
Rick
Why not a straight-up join?
SELECT ar.*
FROM area AS ar
INNER JOIN facility AS fa
ON (ar.areaid = fa.areaid)
WHERE (fa.facilityid IS NULL)