Since the dawn of PostGIS, PostGIS users have needed to mutilate their geometries in often painful and horrific ways.
Why is ST_ClipByBox2D function useful, because its a much faster way of mutilating your geometries by a rectangular mold than using ST_Intersection. Why would you want to mutilate your geometries?
There are many reasons, but I'll give you one: As your geometry approaches the area of your bounding box and as your bounding box size decreases, the more efficient your spatial index becomes.
You can consider this article, Map dicing redux of the article I wrote (eons ago) - Map Dicing and other stuff which describes the same approach with much older technology. Though I will be using more or less the same dataset Massachusetts TOWNSSURVEY_POLYM (its newer so you can't really compare) and I tried to simplify my exercise a bit (not resorting to temp tables and such), my focus in this article will be to compare the speed between the new ST_ClipByBox2D approach and the old ST_Intersection approach. The spoiler for those who don't have the patience for this exercise is that using ST_ClipByBox2D at least on my sample data set on my puny Windows 7 64-bit desktop using PostgreSQL 9.4beta2 64-bit
was about 4-5 times faster than using ST_Intersection. There was a downside to this speedup. With the ST_Intersection approach, I had no invalid polygons. In the case of ST_ClipByBox2D, I had one invalid polygon. So as noted in the docs, use with caution. We'd be very interested in hearing other people's experiences with it.
One other benefit that ST_ClipByBox2D has over ST_Intersection which I didn't test out is that although ST_Intersection doesn't work with invalid geometries, ST_ClipByBox2D can.
ST_Tile usage (can be replaced with ST_PixelAsPolygons)
The code in both uses ST_Tile of an empty raster to create square boxes. This is expressed as a common table expression called grid. Sadly I realized that using a CTE instead of just inlining was about 50-100ms slower (seemed consistently). You probably also gain performance by using a temp table and indexing that. Boxes don't need to be square, they can be rectangular too. The 10, -10 you see is in this case saying each pixel is 10 meters by 10 meters (negative because in reverse of geometric space) per pixel. For this I could have just kept it at 1, -1 and it would be fine. The empty raster pixel width/height I set to 100x100 (and also experimented with 200x200) which would resolve to
1000x1000 meters, 2000x2000 meters respectively in Massachusetts State Plane geometric space. I have the timings in the comments of the chopping.
Now some people may wonder why I didn't opt for using ST_PixelAsPolygons raster function.
I found the code slightly more unpleasant to look at, and the speed was slightly worse -- 29932 ms for my 1000x1000m (using ST_PixelAsPolygons) vs. 29092 ms (using ST_Tile). I suspect the reason might be the unneeded pixel value returned and the ST_AddBand call. ST_Tile works happily with an empty raster, but ST_PixelAsPolygons seems to require at least one band.
If you would like to compare replace the WITH grid AS (..)
WITH grid As (
SELECT ( ST_PixelAsPolygons(
ST_AddBand(
ST_MakeEmptyRaster(ceiling( (ST_XMax(ext) - ST_Xmin(ext))/1000 )::integer,
ceiling((ST_YMax(ext) - ST_Ymin(ext))/1000)::integer, ST_XMin(ext),
ST_YMax(ext), 1000, -1000,0,0, 26986), '1BB'::text,1,0) )
).geom As b
FROM (SELECT ST_Extent(geom) As ext from towns) As f )
ST_ClipByBox Code: Much faster
DROP TABLE IF EXISTS towns_grid_clipbox;
CREATE TABLE towns_grid_clipbox(id serial primary key,
town varchar(50), geom geometry(MUltiPolygon, 26986));
-- 9351 rows affected, 9920 ms execution time. (2000 m x 2000 m tiles)
-- 29117 rows affected, 29092 ms execution time. (1000 m x 1000 m tiles)
INSERT INTO towns_grid_clipbox(town, geom)
WITH grid As (
SELECT ST_Tile(
ST_MakeEmptyRaster(
ceiling( (ST_XMax(ext) - ST_Xmin(ext))/10 )::integer,
ceiling((ST_YMax(ext) - ST_Ymin(ext))/10)::integer,
ST_XMin(ext), ST_YMax(ext), 10, -10,0,0, 26986), 100,100
)::geometry As b
FROM (SELECT ST_Extent(geom) As ext from towns) As f )
SELECT t.town, ST_Multi(ST_ClipByBox2D(t.geom, grid.b::box2d) )
FROM towns AS t INNER JOIN grid ON ST_Intersects(t.geom, grid.b) ;
ST_Intersection Code: Slower
DROP TABLE IF EXISTS towns_grid_inter;
CREATE TABLE towns_grid_inter(id serial primary key,
town varchar(50),
geom geometry(Multipolygon, 26986));
-- 9351 rows affected, 42544 ms execution time. (2000 m,2000 m tile)
-- 29117 rows affected, 129649 ms execution time (1000 m, 1000 m tile)
INSERT INTO towns_grid_inter(town, geom)
WITH grid As (
SELECT ST_Tile(
ST_MakeEmptyRaster(
ceiling( (ST_XMax(ext) - ST_Xmin(ext))/10 )::integer,
ceiling((ST_YMax(ext) - ST_Ymin(ext))/10)::integer,
ST_XMin(ext), ST_YMax(ext), 10, -10,0,0, 26986), 100,100
)::geometry As b
FROM (SELECT ST_Extent(geom) As ext from towns) As f )
SELECT t.town, ST_Multi(ST_Intersection(t.geom, grid.b) )
FROM towns AS t INNER JOIN grid ON ST_Intersects(t.geom, grid.b) ;
Validity Test
Though clip box was much faster, it produced one invalid geometry where as the ST_Intersection approach produced no invalid geometries. As such you probably want to check the validaty of your geometries after and run an ST_MakeValid on the invalid ones.
SELECT t.id, t.town, (d).reason, ST_AsText((d).location)
from towns_grid_clipbox AS t, ST_IsValidDetail(t.geom) As d
WHERE NOT ST_IsValid(geom) ;
id | town | reason | st_astext
------+----------+------------------------+-----------------------------------------
15444 | WEYMOUTH | Ring Self-intersection | POINT(247863.732500002 889768.12460001)