Finished json layer in openlayers
Home » » Polygon Dissolve and Generalise
This write up outlines a solution developed to dissolve and generalise a landuse polygon dataset into something suitable for a lightweight vector layer.
The raw data comprises 200mb of shape file in wgs84 projection, and incudes numerous offshore islands including those on both sides of the 180 degree longitude boundary. The shapes possess reasonable topological integrity, but are highly noded.
For our purpose we just want to show which is conservation estate and which is not. So we dont need or want to retain the boundarys of individual parks, and the emphasis is on minimum data size. Hence the less lines the better, which means dissolving the shapes into the fewest possible.
Secondly the process of simplifying data while maintaining topology is not straightforward in postigs at present. See stackoverflow for discussion on how to do this while retaining topological integrity. In particular st_simplify is not aware of shared boundarys and st_simplifypreservetopology is no smarter. All the latter does is, when simplifyiing shapes down to almost nothing, is try not to leave invalid shapes like polygons with two sides and the like. Frankly i think this should be standard behavior of st_simplify, but, alas, moving on.
Just remember that simplify should always be used with care on polygons, as it has a tendancy to leave gaps and overlaps between adjacent features. This naturally creates a catch 22 situation. To dissolve adjacent parks into one means doing it unsimplified so as to avoid the slithers that would be created by simplification. OTOH doing it unsimplified means doing one of the most memory and cpu intensive operation that postgis possesses on a large and noisy dataset,
This workaround was adapted from the methodology found in a linfiniti article. Basically it involves simplifying somewhat first, then buffering the boundarys enough to cover the gaps that were formed between adjacent shapes, then unioning the resulting shapes into fewer larger shapes.
So methodology in hand, one might expect this to be quickly achieved via a quick simplify, buffer, union, and dump. However this dataset shows that in particular the union step is quite problematic in postgis after you exceed a certain complexity. My attempts in this case either crashed postgis from memory exhaustion (st_union) or threatened to take days or weeks to complete using st_memunion. Various experiments over several weeks (you might know this as 'postgis drama') showed that theres some sort of internal logic threshold where things go to custard. A union on the whole dataset is still going a day later, but on a 1/6th of it takes only 20 seconds.
Not being in a position to help develop postgis, clearly another strategy was needed. The forums all said break the data into subsets and dissolve those. I manually made 6 postgis layers from the 2 islands, called them wk_a thru wk_f. But in hindsight could have just used the conservancy attribute of which there are 11.
Anyway here is the procedure that worked.
First prepare the whole dataset.
-- remove attributes and buffer to 50m
create temp table doc_land2 as
select
st_buffer(st_simplifypreservetopology(st_buffer(st_buffer(the_geom,-50),100),10),0) as the_geom
from doc_land;
So we buffer inwards to remove the exterior noisiness, and outwards to remove the interior noisiness, but leaving it buffered out for the duration of the union. To avoid too many simplifications we do a single light simplification on the buffer results in preparation for unioning. Buffer 0 just to be safe -- union does not tolerate invalids at all well. Then split the data into 6 tables. After that its good to go.
-- disolve, and repeat for a - f
create temp table wk_a2 as
select
ST_Union(the_geom) as the_geom
from wk_a;
-- merge 6 parts back together
create temp table doc_land3 as
select the_geom from wk_a2
union all
select the_geom from wk_b2
union all
select the_geom from wk_c2
union all
select the_geom from wk_d2
union all
select the_geom from wk_e2
union all
select the_geom from wk_f2;
At this point we have a table containing 6 shapes each containing a couple thousand multipolgons. There are still some adjoining shapes on the edges of the sets which need dissolving. Theres a bit an iterative dance to prevent union choking again.
-- multipart to isolate smaller shapes
create temp table doc_land4 as
select
(st_dump(the_geom)).geom as the_geom
from doc_land3;
-- remove small isolated parks, less than 100ha
-- this removes 2/3 of the shapes
alter table doc_land4 add area float;
update doc_land4 set area= st_area(the_geom);
delete from doc_land4 where area <1000000
-- redissolve to merge the overlapping parks on the boundarys of the six sets
-- now quicker because theres lots less shapes
create temp table doc_land5 as
select
(st_dump(ST_Union(the_geom))).geom as the_geom
from doc_land4;
-- unbuffer
create temp table doc_land6 as
select
st_buffer(st_simplifypreservetopology(st_buffer(the_geom,-50),10),0) as the_geom
from doc_land5;
-- dump again to seperate any new little islands from the unbuffer
create temp table doc_land7 as
select
(st_dump(the_geom)).geom as the_geom
from doc_land6;
Lastly the process has left quite a number of tiny interior rings, which we remove, again using 100ha as a threshold. To do this google turned up a neat little function from Regina. It removes rings under a certain size.
-- only works on polygons,not multipolygons
CREATE OR REPLACE FUNCTION filter_rings(geometry,float) RETURNS geometry AS
$$ SELECT ST_BuildArea(ST_Collect(a.geom)) as final_geom
FROM ST_DumpRings($1) AS a
WHERE a.path[1] = 0 OR
(a.path[1] > 0 AND ST_Area(a.geom) > $2)
$$
LANGUAGE 'sql' IMMUTABLE;
create table my_doc_lands as
select
filter_rings(the_geom,1000000) as the_geom
from doc_land7;
Before (200MB)
After (3MB)
Add back in some attributes, indexes and what not, and bobs your uncle. Stay tuned for more fun and games with postgis. Remember nothing in life is ever simple ;)
Another Method
After trying the pseudo topology method suggested by Paul Ramsey and also spelled out here it is clear that while basically sound, his sql has a few things missing:
Here is a more complete version, tested on a small dataset: NZ regions.
create table polys AS select (ST_Dump(the_geom)).geom AS the_geom FROM multipolys;
create table rings AS select(ST_DumpRings(the_geom)).geom AS the_geom FROM polys;
create table ringlines AS select(ST_boundary(the_geom) AS the_geom FROM rings;
create index ringlines_gist on ringlines using GIST (the_geom);
create table mergedmultiline as select ST_Union(the_geom) AS the_geom FROM ringlines ;
create table mergedlines as select ST_linemerge(the_geom) AS the_geom FROM mergedmultiline ;
create table lines as select (ST_dump(the_geom)).geom AS the_geom FROM mergedlines;
Adjacent Polygons
The resulting intersections
This gives you one line per shared polygon edge. You can then proceed with simplification and polygonization as in the original sql, if you wish to reassemble it. Or just use it as lines as you use case dictates.
There is also an alternative method, which i possibly quicker. It has another upside in that it removes the non shared lines. For the likes of regions, you can just render the lines between regions and forget the country outline, which you probably already have in another layer. (If you really need the non shared lines aswell you can union all the shapes and then use st_boundary, or st_exteriorring on the resulting supershape).
Start as per Paul:
create table polys AS
select (ST_Dump(the_geom)).geom AS the_geom
FROM multipolys;
create table rings AS
select(ST_DumpRings(the_geom)).geom AS the_geom
FROM polys;
This gets us a single polygon per feature with any internal rings as seperate features. Now instead of unioning the lines, well find the intersections of the polygons.
alter table rings add gid serial;
create index rings_gist on rings using GIST (the_geom);
create table sharedlines AS
select ST_linemerge(ST_intersection(a.the_geom, b.the_geom)) AS the_geom
from rings a, rings b
where a.gid>b.gid and ST_touches(a.the_geom, b.the_geom);
The intersection of two adjoining polygons is sensibly enough a linestring. However the intersection produces multilines, so linemerge the multilines into lines.
So now we have a single linestring where there were previously a pair of polygon edges. You can now simplify to your hearts content. And then if you want to rebuild the polygons you can do so with the rest of Pauls sql. If you wanted the nonshared line then add that in first, then:
CREATE TABLE newcollection AS SELECT ST_Polygonize(the_geom) AS the_geom FROM sharedlines;
CREATE TABLE newsingles AS SELECT (ST_Dump(the_geom)).geom FROM newcollection;
CREATE TABLE newpolys AS SELECT new.the_geom, old.attr FROM newsingles new, origpolys old WHERE ST_Contains(new.the_geom, ST_PointOnSurface(old.the_geom));
Remember that for this methodology to work, the data must be immaculately noded in a topological sense. Any slithers or overlaps will result in odd shapes or broken lines in the intersections.
HTH.