Wednesday, December 17, 2008

MySQL Vs PostgreSQL: key differences in queries

Coming from a MySQL background, and trying PostgreSQL for the first time, I am experiencing some key differences in query syntax. Some queries that used to work fine under MySQL now produce errors under PostgreSQL. Generally, PostgreSQL's query syntax is tighter, more strict to ANSI SQL. For anyone moving from MySQL to PostgreSQL, these differences will be helpful to know. So I'll post them in groups as I go deeper in PostgreSQL with time.

The first difference I encountered is that aliasing in PostgreSQL requires an explicit 'as'. This means that the following query, that used to work under MySQL, won't work under PostgreSQL:
SELECT count(id) count FROM users;

Instead, an explicit 'as' is needed before the alias:
SELECT count(id) AS count FROM users;


Another difference is the 'group by' issue. In MySQL queries were allowed to group the results by a subset of, not necessarily all, the selected columns. for example, the following query works under MySQL:
SELECT users.name, users.id, count(telephones.id)
FROM users LEFT JOIN telephones
ON telephones.user_id = users.id
GROUP BY users.id;

Such a query doesn't work under PostgreSQL. When grouping, all selected columns (except aggregated ones) must appear in the group-by clause:
SELECT users.name, users.id, count(telephones.id)
FROM users LEFT JOIN telephones
ON telephones.user_id = users.id
GROUP BY users.id, users.name;

This limitation makes it hard to build a query to return results that are distinct based on specific columns. i.e. This way I can't use 'group by' to return distinct results based on users.id only.
However, PostgreSQL comes with a good feature that will help you return distinct results based on some of, not all, the selected columns; 'distinct on' can be used as follows:
SELECT DISTINCT ON (users.id) users.name, users.id
FROM users LEFT JOIN telephones
ON telephones.user_id = users.id;

Nevertheless, It is worth a note that distinct-on clauses can't be used with order-by clauses in case the columns list in both clauses are different. i.e. The following query won't work because the distinct-on columns list is different than the order-by colmns list:
SELECT DISTINCT ON (users.id) users.name, users.id
FROM users LEFT JOIN telephones
ON telephones.user_id = users.id
ORDER BY users.name;


3 comments:

Regina said...

For this one you can wrap it in a subselect

SELECT u.*
FROM (
SELECT DISTINCT ON (users.id) users.name, users.id
FROM users LEFT JOIN telephones
ON telephones.user_id = users.id
ORDER BY users.id) As u
ORDER BY u.name

Tom said...

The issue with the group by is not PostgreSQL, it's doing the right thing.

It is MySQL that has a relaxed attitude towards it and does a guess at the aggregate column you want if excluded from the group by.

The behavior in MSSQL Server is exactly the same and whilst I enjoy the relaxed nature of MySQL it is not 'correct'.

You can get around the problem by making sure all your selected columns use correct aggregate functions.

I think the behavior of MySQL leads to a lot of newbies to SQL not full understanding what a group-by is doing.

Haitham Mohammad said...

@Tom: Engineering-wise, you're right.

I didn't say that MySql is better than PostgreSql. Neither did I say the opposite. I am just trying to list the differences that will typically face anyone who's migrating from MySql to PostgreSql.. maybe it can be helpful for anyone out there.