Thursday, November 20, 2008

Full Text Search: PostgreSQL beats MySQL

Ever since I can remember, we've been relying on MySQL as the database backend for our open source solutions developed at eSpace. MySQL's popularity is unquestionnable, being one of the most widely used DBMSs in the open source community. However, now with my eyes wide open, I can say that I prefer PostgreSQL over MySQL. And for your surprise, what shapes up my opinion is not some performance benchmark or some detailed 'versus' report. I'd choose PostgreSQL for one single hell of a feature: Full Text Search.

For the first glance, some people could be amazed by my reason. Some people would argue that PostgreSQL is too much powerful for many more important reasons. I agree with those, but let's stick to our context. Others will start to mumble, MySQL does have Full Text Search among its set of features. Yes, of course it does, but let me highlight the difference.

In MySQL, A full text search query, that searches for the phrase "database systems" in the title and body of an 'articles' table, looks like this:
SELECT * FROM articles
WHERE MATCH (title, body)
AGAINST ('database systems');

Now, in order to enable full text search in a table column, A full text index must have been created for this column. That shouldn't be a problem. The problem begins when you know that, In MySQL, a full text index can only be created on tables that use MyISAM storage engine. The problem with MyISAM tables is that they are not transactional, meaning that you can't perform COMMITs and ROLLBACKs on such tables. And that's why most decent MySQL applications rely on InnoDB/BDB as a storage engine, because they are transcational.

See? You simply have to lose one of the most important features given by any DBMS, and jeopardize your data consistency, just to enable text search. Of course, there are some hacks to work around this conflict, including some forms of replication, but it's still just not good.

On contrary, in PostgreSQL, Full Text Search is such a relief. The same full text search query looks like this in PostgreSQL:
SELECT * FROM articles
WHERE to_tsvector('english', title || body)
@@ to_tsquery('database | systems');

This query simply uses 'english' language configuration to search the concatenation of title and body for 'database' or 'systems'. PostgreSQL supports a single storage engine, so you don't change anything. Furthermore, this query will run nice and easy even if you didn't create text indices for the searchable columns. As you can expect, creating the text indices will massively speed up the search for large data volumes, but it's not mandatory. Full text search is enabled by default and you don't have to give up on being transactional to use it. For me, it's the first time that one single feature makes all that difference, changing my preference between two technology alternatives.

No comments: