One of the latest exciting additions to PostGIS 2.1 is a complete C implementation of ST_DumpPoints
which is a patch contributed by Nathan Wagner. Past versions of ST_DumpPoints piggy backed on
ST_Dump and plpgsql for recursion so this has been a long-awaited todo. This function is dear to my heart because it is a sub-step of many geoprocessing workflows. What's dearer to my heart though is getting rid of the memory copy bottlenecks that make plpgsql processes inherently
slower than C-native ones. That's probably too much to ask. In theory this new ST_DumpPoints version should be faster, but how much.
There was also some intimation that the new ST_DumpPoints implementation
has improvements over the C-based ST_Dump. I decided to put both of these theories to the test.
The trivial lets Dump Points from multipoint collection
If you had a multipoint collection, in prior versions of PostGIS you'd be better off
using ST_Dump instead of ST_DumpPoints, but in 2.1.0, this is not the case. Why was it faster to use ST_Dump -- ST_Dump is a pure C and ST_DumpPoints is just going to call ST_Dump anyway and it apparently does it less efficiently than
I had hoped, probably because of unnecessary memory copy calls.
First we'll create a dummy table with a single row multipoint collection.
CREATE TABLE miscpoints(geom geometry(MULTIPOINTZ));
INSERT INTO miscpoints(geom)
SELECT ST_Collect(ST_MakePoint(x,y,z))
FROM generate_series(1,100) As x
CROSS JOIN generate_series(200,300) As y
CROSS JOIN generate_series(-10,10) As z;
Dumping points from a multipoint with ST_DumpPoints
Now to test ST_DumpPoints under both 2.1 and 2.0 (I'm not bothering with 1.5 since I don't have that readily accessible and should be in theory same speed as 2.0)
This is also testing under PostgreSQL 9.2 32-bit on a windows 7 box with not much memory. PostGIS versions tested are:
2.1.0 SVN r10609
2.0.2 SVN r10315
SELECT (gd).path[1] As pos, (gd).geom As geom
FROM (SELECT ST_DumpPoints(geom) As gd
FROM miscpoints) As foo;
SELECT count(*)
FROM
(SELECT (gd).path[1] As pos, (gd).geom As geom
FROM (SELECT ST_DumpPoints(geom) As gd
FROM miscpoints) As foo
) as foo;
As you can see, the new implementation is clearly faster. It's about 10 times faster and the speed between the 2 versions is a bit dwarfed if you needed
to transfer these records somewhere like to pgAdmin so in practice you may only see a 2-fold to 5 fold improvement.
SELECT (gd).path[1] As pos, (gd).geom As geom
INTO miscdp
FROM (SELECT ST_DumpPoints(geom) As gd
FROM miscpoints) As foo;
Dumping to another table is not so bad
Dumping points from a multipoint with ST_Dump
ST_DumpPoints seems to be faster than it's older ST_Dump sibling as demonstrated here. In 2.0 you see ST_Dump is faster than using ST_DumpPoints for this use-case and shockingly
much much faster. However in 2.1.0, the tides have changed, and ST_DumpPoints even for points is faster.
SELECT count(*)
FROM
(SELECT (gd).path[1] As pos, (gd).geom As geom
FROM (SELECT ST_Dump(geom) As gd FROM miscpoints) As foo
) as foo;
Dumping Points from Huge Multipolygons
One use case of dump points is to expand a multipolygon so you can feed it to something like PL/R to be reconstituted. This is one one of the reasons
why the path variable is important. So I tried this with US State 2011 tiger boundaries (top 3 states with most points) which you can determine by running this query.
SELECT stusps
,ST_NPoints(geom) As npoints
, ROW_NUMBER() OVER(ORDER BY ST_NPoints(geom) DESC) As n
FROM tl_2011_us_state
ORDER BY n LIMIT 3;
Now to test the dumping speeds
SELECT stusps
, (gd).path[1] As polyp
, (gd).path[2] As ringp
, (gd).path[3] As pointp
, (gd).geom as pt
FROM (
SELECT stusps, geom, ST_DumpPoints(geom) as gd
FROM tl_2011_us_state
WHERE stusps IN('TX', 'MN', 'VA') ) As foo;
SELECT count(*)
FROM (
SELECT stusps
, (gd).path[1] As polyp
, (gd).path[2] As ringp
, (gd).path[3] As pointp
, (gd).geom as pt
FROM (
SELECT stusps, geom, ST_DumpPoints(geom) as gd
FROM tl_2011_us_state
WHERE stusps IN('TX', 'MN', 'VA') ) As foo ) As foo;
So for this use-case, 2.1.0 ST_DumpPoints is about 8 times faster, but again the differences between the two speeds will be mostly drowned by network
effects.