Skip to content
Home » MySQL » Select Group Rows Correctly

Select Group Rows Correctly

Aggregate Functions

You may notice that, when we selected the maximum value (via MAX()) and with other columns, they looked like in the same row, but in fact, they are in different rows. What's wrong with it? Let's see an example.

We have a table fruits which contains price information.

mysql> select * from fruits;
+----+--------+-------+------+
| id | fruit  | price | city |
+----+--------+-------+------+
|  1 | apple  |    20 | NYC  |
|  2 | banana |    10 | LOS  |
|  3 | cherry |    30 | NYC  |
|  4 | apple  |    30 | LOS  |
+----+--------+-------+------+
4 rows in set (0.00 sec)

Now, we'd like to know the fruits with the highest price no matter which city they are. So we expect cherry in NYC and apple in LOS should be returned. We execute the statement contains MAX().

mysql> select id, fruit, max(price) price, city from fruits;
+----+-------+-------+------+
| id | fruit | price | city |
+----+-------+-------+------+
|  1 | apple |    30 | NYC  |
+----+-------+-------+------+
1 row in set (0.00 sec)

It shows the highest price $30 fruit apple in NYC and we know apple in NYC is $20. Apparently, the result mixed up the rows. But don't misunderstand MySQL, this is not a bug, this is a normal behavior, a correct result. You just misused group functions.

The reason is that MAX() is an aggregate (group) function which implies a group by clause there. Group functions do not like row functions, they take a group of rows as a whole to compute the result and they don't care individual rows.

So don't expect atomic rows will be returned in group by statements, no matter what the group by is explicit or implicit.

The simplest way to solve the problem is actually an alternative, we don't use group functions and use sorting technique instead. Try the statement:

mysql> select * from fruits order by price desc limit 2;
+----+--------+-------+------+
| id | fruit  | price | city |
+----+--------+-------+------+
|  3 | cherry |    30 | NYC  |
|  4 | apple  |    30 | LOS  |
+----+--------+-------+------+
2 rows in set (0.00 sec)

The result is correct. But the question is you may not know the exact number of rows to limit.

Solution

A real cure to solve the problem is to use a subquery to identify the highest value.

mysql> select * from fruits where price = (select max(price) from fruits);
+----+--------+-------+------+
| id | fruit  | price | city |
+----+--------+-------+------+
|  3 | cherry |    30 | NYC  |
|  4 | apple  |    30 | LOS  |
+----+--------+-------+------+
2 rows in set (0.00 sec)

This is what we want, and the best thing is you don't need to specify the returned number, it's clean and clear.

I have to say there's a drawback in the statement. It may take a while to complete the statement if the table is big enough, although we see no difference in our case. This is all because you're using group functions, they are slow in general. If you know the exact returned number, I would recommend the first solution which is much faster.

Leave a Reply

Your email address will not be published. Required fields are marked *