Thursday, April 24, 2008

PostgreSQL: In place UPDATE

Disclaimer: MVCC is at the heart of PostgreSQL. Any attempts to hack it may leave your data inconsistent. Please use the trick below only if you are sure about what you are doing. Some may find it cool, but for many it will be an ugly hack.

In one of my previous posts, I talked about MVCC and VACUUM in Postgres. There is a nice document authored by Tom which explains it in more detail. One of the main complaints about MVCC is that it creates multiple versions of a row. That not only bloats the table, but also requires periodic maintenance activity to reclaim the dead tuples.

Recently somebody posted a use case on pgsql-general mailing list where the user wants to UPDATE all rows in a particular table. Its a simple update where a particular column is updated with a new value. The user is worried about the table bloat which in this case would be 100% because each row will have two versions at the end of the update operation. He can update a bunch of rows, run VACUUM, then update the next bunch and continue. This would avoid table bloat, but requires multiple passes over the table and indexes for each VACUUM.

The user further stated that he is not bothered about the transactional atomicity of the data. So he is fine if there is a failure or crash mid-way and some of the rows are updated while others are not. He would of course not want to have data corruption.

I thought about this for a while and it occurred to me that this case can possibly be handled by using BEFORE UPDATE triggers and heap_inplace_update() function. So here is a quick example. Its probably not a good idea to publish such foot gun techniques. But at certain times, you are left with no other option, but to use such hacks to solve your problem.

Before we start, here are few necessary conditions I can think of:
  • The updated tuple should be of the same size as the old tuple. IOW you should be updating only fixed size columns. Also, you must not be changing NULL values to non-NULL and vice versa.
  • There must not be any index involving any of the columns being updated. This would include predicates and expressions for partial/expression indexes.
  • There must not be any foreign key or check constraints on the columns being updated.
  • You are not using this on a system catalog table.
  • You should be absolutely sure that you don't need transaction atomicity and isolation for this update operation.
  • You should understand the technique and validate your use case against this. If you are not sure, please don't use it.
  • AFAIR heap_inplace_update() is available only 8.2 onwards.

If your use case meets these requirements, you can proceed further.

Write a BEFORE UPDATE trigger in C, something like this:

PG_FUNCTION_INFO_V1(inplace_update_trigger);
extern Datum inplace_update_trigger(PG_FUNCTION_ARGS);

Datum
inplace_update_trigger(PG_FUNCTION_ARGS)
{
 TriggerData *trigdata = (TriggerData *)fcinfo->context;
  trigdata->tg_newtuple->t_self = trigdata->tg_trigtuple->t_self;
  heap_inplace_update(trigdata->tg_relation, trigdata->tg_newtuple);
  return NULL;
}

Compile the function and generate an shared object (say trigger.so) from it.

Now, create a BEFORE UPDATE trigger on the table you want to update in-place.

CREATE OR REPLACE FUNCTION inplace_update_trigger()
RETURNS TRIGGER
AS 'trigger.so', 'inplace_update_trigger'
LANGUAGE C STRICT;

CREATE TRIGGER inplace_update_trigger BEFORE UPDATE ON test
 FOR EACH ROW EXECUTE PROCEDURE  inplace_update_trigger();

You are now ready for the in-place update operation. I would recommend acquiring and holding an EXCLUSIVE lock on the relation to ensure that there are no concurrent operations on the table.

When an UPDATE hits the table, the BEFORE UPDATE trigger overwrites the existing tuple with the new version. Since the existing copy is overwritten, any failure or crash after this won't rollback the change. So you should be ready to handle this case i.e. you should know how to undo the changes for the affected rows or retry the operation for the remaining tuples. Remember you may not have any easy way to know which tuples were successfully updated before failure/crash. Of course, you can do some clever tricks to store that information explictly in the rows.

The BEFORE UPDATE trigger returns NULL, so the actual UPDATE operation is skipped by the executor. So you would see "UPDATE 0" as the status of the operation. But remember, your existing rows have been updated in-place.

Once again I remind you: It's a hack. It compromises PostgreSQL MVCC and transaction atomicity/isolation properties. Use it at your own risk :-)

Thursday, April 10, 2008

Kudos to Tom Lane

Sometime I wonder how Tom Lane manages to work so hard! Many years back, I asked the same question to my Professor at IIT and he replied, "I enjoy what I do, so I never feel like working hard.". May be the same is true about Tom. He is so passionate about this work that he probably never gets tired, well at least that's my perception :-)

I have seen him jumping from one thread to another, one list to another quite effortlessly. And I am sure that requires tons of knowledge about the project you are working on, PostgreSQL in this case. He rarely gets things wrong in the first attempt and that requires extraordinary quality of work. Not only does he work on his action items, more so he helps bunch of other developers to see that their work is completed and accepted in a quality form. That really requires great patience. And we are fan of his writing skills. I wonder if I should start collecting his quotes - that will make a great collection ;-)

So you may ask why I am praising the guy in public domain ? Because I think he deserves that. Its my way to say "Thank you" for all the efforts he puts into to keep the PostgreSQL community thriving and make Postgres such a wonderful product. Its hard to imagine PostgreSQL community without Tom. Long live Tom!

Tuesday, April 8, 2008

PostgreSQL MVCC and VACUUM

PostgreSQL uses MVCC (multi-version concurrency control) to provide transactional semantics. In the simplest form, MVCC lets each transaction work on a different copy of a row in a table. What it means is whenever a row is updated, a new version of the row is created. The existing readers continue to see the old version of the row whereas new transactions see the new version. This is rather a very simplified presentation of MVCC. In the real world, there can be many versions of the row and a transaction can see one of those versions. The visibility depends on the transaction start time and the transaction isolation level.

The great advantage of MVCC is that readers don't wait for the writers to finish their work. Similarly, writers don't wait for the readers because they work on independent versions of the row. The downside is though that the table may get bloated with the multiple versions of a row. Sooner or later all but one version become invisible to all current or new transactions. The invisible versions are said to be DEAD rows. A maintenance activity called as VACUUM is required to recover these dead rows. The VACUUM scans the entire table, collect the DEAD rows and removes the corresponding index entries. This requires multiple passes over the table. As you can imagine, this can conflict with the normal database operations and can considerably slow down the operations, especially for heavily updated, large tables.

One may argue why we just don't remove the DEAD rows as and when we find them during normal database operations. The problem with that is we also need to cleanup the corresponding index entries. Otherwise the index entries may point nowhere, thus creating dangling references or even worse may point to unrelated rows (if the DEAD row is replaced by some other row), thus giving wrong results.
Finally, it's not possible to reach to the index entries from the heap tuples, making it extremely difficult to remove the dead rows easily.

PostgreSQL has the facility of AutoVacuum which is like a daemon service which periodically checks the tables and automatically vacuums them based on certain configuration parameters. This greatly simplifies the maintenance task.

You can find PostgreSQL documentation here.

Tuesday, April 1, 2008

Open source development!

I got associated with PostgreSQL development since last year
and half. What a amazing experience it has been! I had never
been a part of any open source project before that, so it was a
very refreshing change for me. Now I wonder if I would ever
be able to work in a proprietary software development projects.
That's not as scary as you might think because open source
software are going great guns and I believe they would do
so for next many years and decades.

So you may wonder what do I like so much about open source
and community driven software development. I think there are
quite a few reasons it excites me. First, I get to interact with so
many people who IMHO are much smarter than me. That gives
me opportunity to learn from them. Second, I like the
open communication that goes on the mailing lists. I get a chance
(like everybody else) to express my opinions freely. That also
teaches me how to work in a remote/distributed environment and
the experience is unparalleled in today globalized world. Third, it
brings me a lot of recognition. Remember, all the work I am
doing is available for everybody to see and measure. One can
easily review my work and also see reviews done by others.
I believe that would help me establish credibility over
a period of time.

You may ask, how do I earn my livelihood by working on
a Free and Open Source Software (FOSS) ? Well, thats not so
difficult. For example, with companies like EnterpriseDB
getting solidly behind the open source software projects like
PostgreSQL, its not hard to earn some $s while working on
your favorite technology. Of course, you need to be committed
and focused on your work. There is no getting away from
hard work :-)