Apr42010

Recursive queries in PostgreSQL

Filed under: postgresql 

PostgreSQL now supports recursive queries. This is handy for a few things, most notably recursing graphs.

For instance, assume we create a table accounts that looks something like this:

CREATE TABLE accounts (
  id SERIAL PRIMARY KEY NOT NULL,
  accounts_id INTEGER REFERENCES accounts,
  name TEXT
);

INSERT INTO accounts (accounts_id, name)
VALUES (NULL, 'Account 1');
INSERT INTO accounts (accounts_id, name)
VALUES (NULL, 'Account 2');
INSERT INTO accounts (accounts_id, name)
VALUES (
  (SELECT id FROM accounts WHERE name='Account 1'),
  'Account 3'
);
INSERT INTO accounts (accounts_id, name)
VALUES (
  (SELECT id FROM accounts WHERE name='Account 3'),
  'Account 4'
);


SELECT * FROM accounts;

 id | accounts_id |   name
----+-------------+-----------
  1 |             | Account 1
  2 |             | Account 2
  3 |           1 | Account 3
  4 |           3 | Account 4
(4 rows)

This lets us have accounts that in turn have subaccounts. This type of organization has many applications, such as resellers. The problem is, how to find out things like "what accounts are subaccounts of account x?"

Usually you'd resort to traversal in a procedural language, but with recursive queries you don't have to:

WITH RECURSIVE subaccounts (id) AS (
    SELECT * FROM accounts WHERE accounts_id = 1
  UNION ALL
    SELECT a.* FROM accounts a, subaccounts
    WHERE a.accounts_id = subaccounts.id
)
SELECT id, name, accounts_id
FROM subaccounts
ORDER BY accounts_id, name;

 id |   name    | accounts_id
----+-----------+-------------
  3 | Account 3 |           1
  4 | Account 4 |           3
(2 rows)


1 comments Leave a comment


Dec52006

PostgreSQL stomps MySQL

Filed under: postgresql mysql benchmarks 

For all those (myself included) who thought that PostgreSQL wasn't as fast as MySQL for most stuff, Jonathan Ellis points to some recent benchmarks that show PostgreSQL to pretty much stomp MySQL all around, especially on SMP/multi-core processors.

If you don't bother reading the articles, simply note this single item:

We see, for instance, that adding a second Woodcrest in MySQL 5.0.20a - costing a good 851 dollars - only yields a 6% performance increase.

One of the comments in Jonathan's blog points to this thread: Single PostgreSQL server faster than MySQL cluster.

Personally I switched from MySQL some years ago for the features, finding performance to be adequate if not stellar. After recent experiences with both however I'd say that PostgreSQL now wins on all the following:

  1. ease of installation, configuration, tuning and maintenance
  2. features and extensibility
  3. performance
  4. stability

If you're a programmer, you'll find PostgreSQL to be the most amazing database around, bar none. The ability to program server-side procedures in a plethora of languages (including Python, Ruby, Perl, Tcl and more), terrific client libaries and API's for dozens of languages, and the MVCC approach (vs locking) makes development against PostgreSQL an absolute pleasure.

I chose PostgreSQL for these features in spite of performance doubts. Now that PostgreSQL clearly wins in the performance category as well, choosing it over MySQL is a no-brainer.



0 comments Leave a comment


Mar242010

Extracting consecutive sequences from PostgreSQL

Filed under: postgresql 

I've been pondering a way to efficiently extract sequences of consecutive values from a table without resorting to iteration in an external programming language. My specific problem is a database of thousands of phone numbers (technically ANI's), in varying block sizes:

SELECT ani
FROM phone_numbers
ORDER BY ani;

    ani
------------
 5551230000
 5551230001
 5551230002
 5551230003
 5551230004
 ...
 5554560000
 5554560001
 5554560002
 5554560003
 5554560004
 ...
 5559990000
 5559990001
 5559990002
 5559990003
 5559990004
 ...

Now generally we buy ANI's in blocks of 100 (or multiples of) from CLEC's, but not always. We have some stray blocks with 10, 20, or 50 ANIs. Anyway, sometimes we need to get an inventory of numbers and it's much nicer to have a list of ranges like:

5551230000 - 5551230099
5554560000 - 5554560099
5559990000 - 5559990099

rather than a giant list containing each individual number. The problem is, how to extract these consecutive lists of ANI's when you can't iterate? Well, first we need to figure out how to group these numbers. If the block sizes were uniform, we could perhaps group by some prefix such as:

SELECT substr(ani, 1, 8) AS prefix, count(*)
FROM phone_numbers
GROUP BY prefix
ORDER BY prefix;

Unfortunately this assumes that there are 100 ANI's in each block, something that simply isn't true.

What we really need is a way to somehow index the row number of the query. Oracle has an extension called ROWNUM that contains the index of the row in the current result set:

rownum |    ani
-------+-----------
     0 | 5551230000
     1 | 5551230001
     2 | 5551230002
     ...

etc.

Now, if we had this, we could easily group our results using the difference of the ANI and the ROWNUM (ani - rownum = delta where delta remains constant over each sequence):

rownum |    ani      | delta
-------+-------------+-----------
     0 | 5551230000  | 5551230000
     1 | 5551230001  | 5551230000
     2 | 5551230002  | 5551230000
    ...
    10 | 5554560000  | 5554559990
    11 | 5554560001  | 5554559990
    ...

At this point it would be a simple matter of grouping by delta.

Unfortunately PostgreSQL doesn't have this feature. Well, not exactly.

PostgreSQL supports a feature called window functions and these will be just what we need, since this provides us with a function row_number() that is basically the equivalent of Oracle's ROWNUM (except it can only be used in the context of a window).

Here's an example:

SELECT ani, row_number() OVER block
FROM phone_numbers
WINDOW block AS (ORDER BY ani);

   ani     | row_number
-----------+-----------
5551230000 |          0
5551230001 |          1
5551230002 |          2
5551230003 |          3
5551230004 |          4
...

That's what we need! Now let's finish it out:

WITH indexed AS (
  SELECT
    ani,
    ani::numeric - row_number() OVER block AS delta
  FROM phone_numbers
  WINDOW block AS (ORDER BY ani)
)
SELECT
  min(ani),
  max(ani),
  count(*)
FROM indexed
GROUP BY delta
ORDER BY min;

    min     |    max     | count
------------+------------+-------
 5551230000 | 5551230099 |   100
 5554560000 | 5554560009 |    10
 5558880000 | 5558880024 |    25
 5559990000 | 5559990009 |    10
 ...

Looks good! It also happens to be very fast for the amount of data I'm dealing with (returns instantly with around 5000 ANI's). I should note that my actual query was slightly more complicated since I also needed to partition based on CLEC, but I didn't want to cloud the logic with those details.

As an aside, I'm simply using the WITH clause as a convenience method for describing a subselect, although it can do much, much more (see WITH RECURSIVE).

The credit for this approach (at least, the first I've seen it) belongs to Rob Farley from this thread.



0 comments Leave a comment


Aug192010

Recursive Queries in PostgreSQL redux

Filed under: postgresql 

Thought I'd follow up on a previous post regarding recursive queries in PostgreSQL. I had opportunity to use this particular query again, and I thought I'd share a couple small enhancements:

CREATE FUNCTION subaccounts(INTEGER) RETURNS SETOF accounts AS $$

DECLARE
  row accounts%rowtype;

BEGIN
  FOR row IN (
    WITH RECURSIVE subs(id) AS (
      SELECT * FROM accounts WHERE id = $1
    UNION ALL
      SELECT a.* FROM accounts a, subs
      WHERE a.accounts_id = subs.id
    )
    SELECT *
    FROM subs
  ) LOOP RETURN NEXT row;
  END LOOP;
END;

$$ LANGUAGE 'plpgsql';

The key features here are

  1. Refactored into a function, so you can pass it an id. This allows you to select any branch from the tree.
  2. Included the parent record in the result set. This turns out to be more useful to me.

Here's an example usage scenario:

SELECT * FROM accounts;

 id | accounts_id |   name
----+-------------+-----------
  1 |             | Account 1
  2 |             | Account 2
  3 |           1 | Account 3
  4 |           3 | Account 4
(4 rows)

SELECT * FROM subaccounts(1);

 id | accounts_id |   name
----+-------------+-----------
  1 |             | Account 1
  3 |           1 | Account 3
(2 rows)


0 comments Leave a comment




Copyright © 2007, Cliff Wells