Tuesday, November 26, 2013

Search Results for: select count(*) ---- Is count(col) better than count(*)?


One question, that surfaces sometimes in my courses (especially in Performance Tuning) is, whether there is a benefit in avoiding count(*) in favor of count(col). The short answer is: No.
Dealing with Oracle DBAs and Developers has tought me, though, that these guys usually are not satisfied with the short answer; they require some kind of proof – which is a good thing, in my view. So here we go:
SQL> select count(*) from sales;

 COUNT(*)
----------
 14701488

Elapsed: 00:00:33.30
SQL> select count(cust_id) from sales;

COUNT(CUST_ID)
--------------
 14701488

Elapsed: 00:00:04.06
SQL> select count(*) from sales;

 COUNT(*)
----------
 14701488

Elapsed: 00:00:34.49
SQL> select count(cust_id) from sales;

COUNT(CUST_ID)
--------------
 14701488

Elapsed: 00:00:04.20
I think the above observation is responsible for the appearing of the myth that count(col) is superior. In fact, right now it is faster (about 10 times!) as the count(*). I did the two selects twice to show that caching has not much to say here. Unfortunately, the second select with count(col) is faster, but not necessarily correct! Watch it:
SQL> update sales set cust_id=null where rownum<2;
1 row updated.
Elapsed: 00:00:00.23
SQL> commit;
Commit complete.
Elapsed: 00:00:00.01
SQL> select count(*) from sales;
 COUNT(*)
----------
 14701488
Elapsed: 00:00:34.84
SQL> select count(cust_id) from sales;
COUNT(CUST_ID)
--------------
 14701487
Elapsed: 00:00:03.73
The count(cust_id) is still faster by far – but it shows a wrong result, should you be interested in the number of rows of the table  You probably now see already the point I am after: There is an index on the cust_id column that is used for count(cust_id), but not for count(*). If the column counted does not contain any NULL values, the result is identical, but the runtime is faster. The origin of the myth! The point is: If you would declare the indexed column as NOT NULL, the optimizer would know that it can use the index for the count(*):
SQL> update sales set cust_id=1  where rownum<2;
1 row updated.
Elapsed: 00:00:00.10
SQL> commit;
Commit complete.
Elapsed: 00:00:00.10
SQL> alter table sales modify (cust_id NOT NULL);
Table altered.
Elapsed: 00:00:38.72
SQL> set autotrace on explain
SQL> select count(*) from sales;
 COUNT(*)
----------
 14701488
Elapsed: 00:00:03.81
Execution Plan
----------------------------------------------------------
Plan hash value: 2989616816
-----------------------------------------------------------------------------------
| Id  | Operation             | Name              | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                   |     1 |  8499   (1)| 00:01:42 |
|   1 |  SORT AGGREGATE       |                   |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| SALES_CUST_ID_IDX |    14M|  8499   (1)| 00:01:42 |
-----------------------------------------------------------------------------------
SQL> alter table sales modify (cust_id NULL);
Table altered.
Elapsed: 00:00:00.27
SQL> select count(*) from sales;
 COUNT(*)
----------
 14701488
Elapsed: 00:00:36.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1047182207
--------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 | 19398   (1)| 00:03:53 |
|   1 |  SORT AGGREGATE    |       |     1 |            |          |
|   2 |   TABLE ACCESS FULL| SALES |    14M| 19398   (1)| 00:03:53 |
--------------------------------------------------------------------
As a general rule of thumb, you should always declare columns NOT NULL if you know that NULL values can’t be in that columns, in order to make the optimizer aware of that important information also. By the way, if you do count(1) instead of count(*), the outcome is the same:
SQL>  alter table sales modify (cust_id NOT NULL);
Table altered.
SQL> select count(1) from sales;
 COUNT(1)
----------
 14701488
Elapsed: 00:00:03.12
Execution Plan
----------------------------------------------------------
Plan hash value: 2989616816
-----------------------------------------------------------------------------------
| Id  | Operation             | Name              | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                   |     1 |  8499   (1)| 00:01:42 |
|   1 |  SORT AGGREGATE       |                   |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| SALES_CUST_ID_IDX |    14M|  8499   (1)| 00:01:42 |
-----------------------------------------------------------------------------------

No comments:

Post a Comment