I've been pondering a way to efficiently extract sequences of consecutive values from a table without resorting to iteration in an external programming language. My specific problem is a database of thousands of phone numbers (technically ANI's), in varying block sizes:
SELECT ani
FROM phone_numbers
ORDER BY ani;
ani
------------
5551230000
5551230001
5551230002
5551230003
5551230004
...
5554560000
5554560001
5554560002
5554560003
5554560004
...
5559990000
5559990001
5559990002
5559990003
5559990004
...
Now generally we buy ANI's in blocks of 100 (or multiples of) from CLEC's, but not always. We have some stray blocks with 10, 20, or 50 ANIs. Anyway, sometimes we need to get an inventory of numbers and it's much nicer to have a list of ranges like:
5551230000 - 5551230099
5554560000 - 5554560099
5559990000 - 5559990099
rather than a giant list containing each individual number. The problem is, how to extract these consecutive lists of ANI's when you can't iterate? Well, first we need to figure out how to group these numbers. If the block sizes were uniform, we could perhaps group by some prefix such as:
SELECT substr(ani, 1, 8) AS prefix, count(*)
FROM phone_numbers
GROUP BY prefix
ORDER BY prefix;
Unfortunately this assumes that there are 100 ANI's in each block, something that simply isn't true.
What we really need is a way to somehow index the row number of the query. Oracle has an extension called ROWNUM that contains the index of the row in the current result set:
rownum | ani
-------+-----------
0 | 5551230000
1 | 5551230001
2 | 5551230002
...
etc.
Now, if we had this, we could easily group our results using the difference of the ANI and the ROWNUM (ani - rownum = delta where delta remains constant over each sequence):
rownum | ani | delta
-------+-------------+-----------
0 | 5551230000 | 5551230000
1 | 5551230001 | 5551230000
2 | 5551230002 | 5551230000
...
10 | 5554560000 | 5554559990
11 | 5554560001 | 5554559990
...
At this point it would be a simple matter of grouping by delta.
Unfortunately PostgreSQL doesn't have this feature. Well, not exactly.
PostgreSQL supports a feature called window functions and these will be just what we need, since this provides us with a function row_number() that is basically the equivalent of Oracle's ROWNUM (except it can only be used in the context of a window).
Here's an example:
SELECT ani, row_number() OVER block
FROM phone_numbers
WINDOW block AS (ORDER BY ani);
ani | row_number
-----------+-----------
5551230000 | 0
5551230001 | 1
5551230002 | 2
5551230003 | 3
5551230004 | 4
...
That's what we need! Now let's finish it out:
WITH indexed AS (
SELECT
ani,
ani::numeric - row_number() OVER block AS delta
FROM phone_numbers
WINDOW block AS (ORDER BY ani)
)
SELECT
min(ani),
max(ani),
count(*)
FROM indexed
GROUP BY delta
ORDER BY min;
min | max | count
------------+------------+-------
5551230000 | 5551230099 | 100
5554560000 | 5554560009 | 10
5558880000 | 5558880024 | 25
5559990000 | 5559990009 | 10
...
Looks good! It also happens to be very fast for the amount of data I'm dealing with (returns instantly with around 5000 ANI's). I should note that my actual query was slightly more complicated since I also needed to partition based on CLEC, but I didn't want to cloud the logic with those details.
As an aside, I'm simply using the WITH clause as a convenience method for describing a subselect, although it can do much, much more (see WITH RECURSIVE).
The credit for this approach (at least, the first I've seen it) belongs to Rob Farley from this thread.