In some cases, NULL values prevent statements from using their column's index, which costs full table scan and impacts performance severely in a loop.
Let's see a case of comparing using index and full table scan.
NOT NULL - Using Index
First of all, we want to count NOT NULL of MANAGER_ID.
SQL> select count(*) cnt from hr.employees where manager_id is not null;
CNT
----------
106
Then we check the execution plan.
SQL> set linesize 85;
SQL> set pagesize 1000;
SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID f67nzh2nk7pu0, child number 1
-------------------------------------
select count(*) cnt from hr.employees where manager_id is not null
Plan hash value: 3393209858
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX FULL SCAN| EMP_MANAGER_IX | 106 | 424 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MANAGER_ID" IS NOT NULL)
19 rows selected.
As we can see, the statement is using the single-column index EMP_MANAGER_IX, which is not bad.
IS NULL - Full Table Scan
On the other hand, it becomes different when we want to count NULL.
SQL> select count(*) cnt from hr.employees where manager_id is null;
CNT
----------
1
SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID bp859wff0gpsp, child number 1
-------------------------------------
select count(*) cnt from hr.employees where manager_id is null
Plan hash value: 1756381138
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| EMPLOYEES | 1 | 4 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MANAGER_ID" IS NULL)
19 rows selected.
This time, it goes TABLE ACCESS FULL. TABLE ACCESS FULL means that it performs a full table scan instead of index scan, which may cost a lot when the table is really big.
Rationale
This is because the single-column index EMP_MANAGER_IX does not contains NULL values of MANAGER_ID, so it has no way to know which field is NULL. As a result, a full table scan applies.
That's why the NUM_ROWS of indexes are sometimes smaller than their depended tables.
SQL> select num_rows from all_indexes where owner = 'HR' and index_name = 'EMP_MANAGER_IX';
NUM_ROWS
----------
106
SQL> select num_rows from all_tables where owner = 'HR' and table_name = 'EMPLOYEES';
NUM_ROWS
----------
107
The difference is real.
Solution
To solve it, we need a composite index, which is able to contain NULL values of MANAGER_ID.
SQL> create index hr.emp_mgr_id_x on hr.employees (manager_id, 1);
Index created.
Since the second position of the composite index has no importance at all, we provide a dummy column as a company.
Let's see the new execution plan.
SQL> select count(*) cnt from hr.employees where manager_id is null;
CNT
----------
1
SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID bp859wff0gpsp, child number 1
-------------------------------------
select count(*) cnt from hr.employees where manager_id is null
Plan hash value: 2770830432
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX RANGE SCAN| EMP_MGR_ID_X | 1 | 4 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("MANAGER_ID" IS NULL)
19 rows selected.
Although SQL_ID is the same, PLAN_HASH_VALUE has changed, because the composite index has been used in the execution plan.
Let's summarize the index transition.
Done!