Web Applications Development, Open Source CMS, Webspaces

Polygon Dissolve and Generalise

Finished json layer in openlayers

Home » » Polygon Dissolve and Generalise

This write up outlines a solution developed to dissolve and generalise a landuse polygon dataset into something suitable for a lightweight vector layer.

The raw data comprises 200mb of shape file in wgs84 projection, and incudes numerous offshore islands including those on both sides of the 180 degree longitude boundary. The shapes possess reasonable topological integrity, but are highly noded.

For our purpose we just want to show which is conservation estate and which is not. So we dont need or want to retain the boundarys of individual parks, and the emphasis is on minimum data size. Hence the less lines the better, which means dissolving the shapes into the fewest possible.

Secondly the process of simplifying data while maintaining topology is not straightforward in postigs at present. See stackoverflow for discussion on how to do this while retaining topological integrity. In particular st_simplify is not aware of shared boundarys and st_simplifypreservetopology is no smarter. All the latter does is, when simplifyiing shapes down to almost nothing, is try not to leave invalid shapes like polygons with two sides and the like. Frankly i think this should be standard behavior of st_simplify, but, alas, moving on.

Just remember that simplify should always be used with care on polygons, as it has a tendancy to leave gaps and overlaps between adjacent features. This naturally creates a catch 22 situation. To dissolve adjacent parks into one means doing it unsimplified so as to avoid the slithers that would be created by simplification. OTOH doing it unsimplified means doing one of the most memory and cpu intensive operation that postgis possesses on a large and noisy dataset,

This workaround was adapted from the methodology found in a linfiniti article. Basically it involves simplifying somewhat first, then buffering the boundarys enough to cover the gaps that were formed between adjacent shapes, then unioning the resulting shapes into fewer larger shapes.

So methodology in hand, one might expect this to be quickly achieved via a quick simplify, buffer, union, and dump. However this dataset shows that in particular the union step is quite problematic in postgis after you exceed a certain complexity. My attempts in this case either crashed postgis from memory exhaustion (st_union) or threatened to take days or weeks to complete using st_memunion. Various experiments over several weeks (you might know this as 'postgis drama') showed that theres some sort of internal logic threshold where things go to custard. A union on the whole dataset is still going a day later, but on a 1/6th of it takes only 20 seconds.

Not being in a position to help develop postgis, clearly another strategy was needed. The forums all said break the data into subsets and dissolve those. I manually made 6 postgis layers from the 2 islands, called them wk_a thru wk_f. But in hindsight could have just used the conservancy attribute of which there are 11.

Anyway here is the procedure that worked.

First prepare the whole dataset.

open in qgis in wgs84, delete all off shore islands outside of the nztm zone, and save as NZTM, import into postgis.
remove attributes
reduce the number of vertices in the data by an initial amount in order to reduce the time taken to do all the succeding steps.
remove the narrow corridors both internal and external that were a feature of the dataset. The former are access strips retained in private ownership, while the latter are riparian margins. For our purposes these are noise. This is achieved using erosion and dilation

-- remove attributes and buffer to 50m

create temp table doc_land2 as

select

st_buffer(st_simplifypreservetopology(st_buffer(st_buffer(the_geom,-50),100),10),0) as the_geom

from doc_land;

So we buffer inwards to remove the exterior noisiness, and outwards to remove the interior noisiness, but leaving it buffered out for the duration of the union. To avoid too many simplifications we do a single light simplification on the buffer results in preparation for unioning. Buffer 0 just to be safe -- union does not tolerate invalids at all well. Then split the data into 6 tables. After that its good to go.

-- disolve, and repeat for a - f

create temp table wk_a2 as

select

ST_Union(the_geom) as the_geom

from wk_a;

-- merge 6 parts back together

create temp table doc_land3 as

select the_geom from wk_a2

union all

select the_geom from wk_b2

union all

select the_geom from wk_c2

union all

select the_geom from wk_d2

union all

select the_geom from wk_e2

union all

select the_geom from wk_f2;

At this point we have a table containing 6 shapes each containing a couple thousand multipolgons. There are still some adjoining shapes on the edges of the sets which need dissolving. Theres a bit an iterative dance to prevent union choking again.

-- multipart to isolate smaller shapes

create temp table doc_land4 as

select

(st_dump(the_geom)).geom as the_geom

from doc_land3;

-- remove small isolated parks, less than 100ha

-- this removes 2/3 of the shapes

alter table doc_land4 add area float;

update doc_land4 set area= st_area(the_geom);

delete from doc_land4 where area <1000000

-- redissolve to merge the overlapping parks on the boundarys of the six sets

-- now quicker because theres lots less shapes

create temp table doc_land5 as

select

(st_dump(ST_Union(the_geom))).geom as the_geom

from doc_land4;

-- unbuffer

create temp table doc_land6 as

select

st_buffer(st_simplifypreservetopology(st_buffer(the_geom,-50),10),0) as the_geom

from doc_land5;

-- dump again to seperate any new little islands from the unbuffer

create temp table doc_land7 as

select

(st_dump(the_geom)).geom as the_geom

from doc_land6;

Lastly the process has left quite a number of tiny interior rings, which we remove, again using 100ha as a threshold. To do this google turned up a neat little function from Regina. It removes rings under a certain size.

-- only works on polygons,not multipolygons

CREATE OR REPLACE FUNCTION filter_rings(geometry,float) RETURNS geometry AS

$$ SELECT ST_BuildArea(ST_Collect(a.geom)) as final_geom

FROM ST_DumpRings($1) AS a

WHERE a.path[1] = 0 OR

(a.path[1] > 0 AND ST_Area(a.geom) > $2)

LANGUAGE 'sql' IMMUTABLE;

create table my_doc_lands as

select

filter_rings(the_geom,1000000) as the_geom

from doc_land7;

Before (200MB)

After (3MB)

Add back in some attributes, indexes and what not, and bobs your uncle. Stay tuned for more fun and games with postgis. Remember nothing in life is ever simple ;)

Another Method

After trying the pseudo topology method suggested by Paul Ramsey and also spelled out here it is clear that while basically sound, his sql has a few things missing:

dump rings doesnt work on multipolys so you need to dump first
ommits the step to convert the poly to a linestring
the union step chops up the lines into indiv segments, requiring a linemerge before simplifying will work
union in this process badly needs an index, otherwise it will take all week

Here is a more complete version, tested on a small dataset: NZ regions.

create table polys AS select (ST_Dump(the_geom)).geom AS the_geom FROM multipolys;

create table rings AS select(ST_DumpRings(the_geom)).geom AS the_geom FROM polys;

create table ringlines AS select(ST_boundary(the_geom) AS the_geom FROM rings;

create index ringlines_gist on ringlines using GIST (the_geom);

create table mergedmultiline as select ST_Union(the_geom) AS the_geom FROM ringlines ;

create table mergedlines as select ST_linemerge(the_geom) AS the_geom FROM mergedmultiline ;

create table lines as select (ST_dump(the_geom)).geom AS the_geom FROM mergedlines;

Adjacent Polygons

The resulting intersections

This gives you one line per shared polygon edge. You can then proceed with simplification and polygonization as in the original sql, if you wish to reassemble it. Or just use it as lines as you use case dictates.

There is also an alternative method, which i possibly quicker. It has another upside in that it removes the non shared lines. For the likes of regions, you can just render the lines between regions and forget the country outline, which you probably already have in another layer. (If you really need the non shared lines aswell you can union all the shapes and then use st_boundary, or st_exteriorring on the resulting supershape).

Start as per Paul:

create table polys AS

select (ST_Dump(the_geom)).geom AS the_geom

FROM multipolys;

create table rings AS

select(ST_DumpRings(the_geom)).geom AS the_geom

FROM polys;

This gets us a single polygon per feature with any internal rings as seperate features. Now instead of unioning the lines, well find the intersections of the polygons.

alter table rings add gid serial;

create index rings_gist on rings using GIST (the_geom);

create table sharedlines AS

select ST_linemerge(ST_intersection(a.the_geom, b.the_geom)) AS the_geom

from rings a, rings b

where a.gid>b.gid and ST_touches(a.the_geom, b.the_geom);

The intersection of two adjoining polygons is sensibly enough a linestring. However the intersection produces multilines, so linemerge the multilines into lines.

So now we have a single linestring where there were previously a pair of polygon edges. You can now simplify to your hearts content. And then if you want to rebuild the polygons you can do so with the rest of Pauls sql. If you wanted the nonshared line then add that in first, then:

CREATE TABLE newcollection AS SELECT ST_Polygonize(the_geom) AS the_geom FROM sharedlines;

CREATE TABLE newsingles AS SELECT (ST_Dump(the_geom)).geom FROM newcollection;

CREATE TABLE newpolys AS SELECT new.the_geom, old.attr FROM newsingles new, origpolys old WHERE ST_Contains(new.the_geom, ST_PointOnSurface(old.the_geom));

Remember that for this methodology to work, the data must be immaculately noded in a topological sense. Any slithers or overlaps will result in odd shapes or broken lines in the intersections.

HTH.

Polygon Dissolve and Generalise

www.webspaces.net.nz