GIS Howtos
Making NZ Road Points as an exercise to learn postgis.
Ive been working with the LINZ topo vector lately, and been learning POSTGIS along the way. Below ill show how i converted a polyline (polyline in qgis, linestring in postgis) roading layer into a point layer for search related purposes.
First after you hit the limit of qgis`s geoprocessing tools you need to gett your head around postgis. My basic requirement was to find the mid points of the road centerlines, something qgis cant do.
This layer has feature attribute called type: metalled, unmetalled, sealed, and another attribute for highway number. Hence you can divide the layer into 4 layers, highways, sealed, metalled, and unmetalled, and you have a pretty nice roading heirarchy. About 80% of them have names too.
Install POSTGIS/POSTGRES by following this:
[www.gpsfiledepot.com/tutorials/installing-and-setting-up-postgresql-with-postgis/]
Next off theres good reading here:
[http://workshops.opengeo.org/postgis-intro/]
And the manual
[http://www.postgis.org/docs/reference.html]
Postgis stores both the attibutes and geometry in the same table. Geometry is just another column type, albeit with special tools. Unlike shapefiles there must be a unique integer column, ie the serial primary key.
Ok first off you need to get to grips with a workflow, after the comfortable world of ftools gui. Theres 3 qgis plugins , rt sql, postgis manager, and spit. I found rt sql unstable with large datasets so skipped that. I found postgis manager useful for adding and deleting tables, adding columns, indexes etc. But its query tool is very rudimentary, so i recommend using pgadmin3, it remembers all your querys, has a button to cancel a query that looks like it wont finish in your lifetime, buttons to check the syntax and optimisation of your query before you run it, and many other useful things. Finally dont assume you have to grasp all the nuances of postgis at the outset, if you know a scripting language like python /perl/ php use it to handle the more complex tasks. Postgis is easy to pickup the basics but after that it can get quite mind boggling.
Use spit to import your shapefiles, its easy enough.
After that to execute a typical geoprocessing task instead of using ftools to create a output shape layer, create a new table from other tables using syntax like so:
create table mynewtable as
select
row_number() over (order by gid)::int AS my_id,
other attributes you want,
the_geom
from
tables you want
That row_number() stranger is a postgis windowing function that creates an auto_incremented id primary key attribute, without which qgis will most likely complain, unless your data already has a suitable key. After that you can load the table as a layer in qgis, and you are away laughing.
Now back to road mid points. Heres what i did after much trial and error.
Postgis has a function ST_Line_Interpolate_Point which does pretty much exactly what we want. But first theres a bunch of complications.
1. We need to bin all the roads without names, mostly dirt farm tracks and the like, and a small handful of roads without both a name and provider id. Tip: this unique id is called road_name_ in all that follows.
-- remove all those roads with no names to r2
create table r2 as
select gid,road_name_,name,road_highw,the_geom
from nzroads
where name is null and road_name_ is null
-- load the rest to r3
create table r3 as
select gid,road_name_,name,road_highw,road_surfa,the_geom
from nzroads
where not ( name is null and road_name_ is null)
-- take out malformed name/ids to r4 (20 or so)
create table r4 as
select gid,road_name_,name,road_highw,road_surfa,the_geom
from r3
where name is null or road_name_ is null or road_name_ <`1000000000000` or road_name_ ~* `[a-z]`
delete from r3
where name is null or road_name_ is null or road_name_ <`1000000000000` or road_name_ ~* `[a-z]`
2. Now the unique roads are in fact singlepart in qgis jargon, ie: spread over several features, (as the road surface changes.)
-- multipart the singlepart linestrings from r3
create table r5 as
select
row_number() over (order by road_name_)::int AS gid,
count(gid) as nlines,
road_name_,
(st_collect(the_geom)) as the_geom
from r3
group by road_name_
3. Now that each road is on its own row as MULTILINESTRINGS, we try to merge the metalled to the sealed. For this purpose we dont care about the surface transitions, just unique roadnames.
-- linemerge the multis so they can be labelled as one feature
create table r6 as
select
row_number() over (order by road_name_)::int AS gid,
road_name_,
(st_linemerge(the_geom)) as the_geom
from r5
4. However by looking at the output in postgis manager in the table viewer, we can see that we have a mix of LINESTRINGs and MULTILINESTRINGs. Which can only mean one thing our merge wasnt entirely sucessful. SO after studying the MULTIs we can see two reasons,
a) some roads have a small y where they intersect another road, so three lines all called Blogg Road, clearly they wont merge.
b) some end and start nodes are not exactly the same and linemerge doesnt consider them the same line.
This is where it gets interesting. First we look at sn_snaptogrid, but if you think about it this will fail in rare cases where points sit near the cut off point, eg:
select
st_astext(
st_snaptogrid( st_geomfromtext(`MULTILINESTRING((0 0,100.49 200),(100.51 200, 300 300))`),1 )
)
-- gives MULTILINESTRING((0 0,100 200),(101 200,300 300))
People say that postgis 2.0 will have a function to fix this, but meanwhile we need another strategy. We could write a script to search in a radius around each end point, but because of the Y problem its a lot of work for nothing. So, instead i retry an idea i initially tried in qgis. Buffer the roads to 1m, then find the center of that. But st_centroid is a fairly unsmart thing, however st_pointonsurface comes to our rescue. It returns a point within the polygon, and somewhere near the middle of the biggest part. Its not super precise but good enough. Like so:
-- first separate the resulting single and multis to r7 and r8
-- to save cpu on the much larger set of simpler cases
create table r8 as
select
row_number() over (order by road_name_)::int AS gid,
road_name_,
the_geom
from r6
where GeometryType(`the_geom`) = `MULTILINESTRING`
create table r7 as
select
row_number() over (order by road_name_)::int AS gid,
road_name_,
the_geom
from r6
where GeometryType(`the_geom`) = `LINESTRING`
-- the multis are the trickiest
-- buffer to 1m, and merge by id
create table r9 as
select
row_number() over (order by road_name_)::int AS gid,
road_name_,
ST_Buffer(the_geom, 1) as the_geom
from r8
5. Its at this point that i change my mind and decide if a unique road has two quite separate parts, ie further apart than our 1m buffer, then it should have two labels. Think about those roads that are divided by a river or a rail line.
-- re-multipart so we can label the sep parts independently
create table r10 as
select
row_number() over (order by road_name_)::int AS gid,
road_name_,
(st_dump(the_geom)).geom as the_geom
from r9
6. Finally we get nearer the crux of it. Lets find our two sets of mid points.
-- label the resultant spaghetti polys using pointonsurface, a close enough approximation to line mid point
create table r11 as
select
row_number() over (order by road_name_)::int AS gid,
road_name_,
st_pointonsurface(the_geom) as the_geom
from r10
-- label the easier single linestring midpoints
create table r12 as
select
row_number() over (order by road_name_)::int AS gid,
road_name_,
ST_Line_Interpolate_Point(the_geom,0.5) as the_geom
from r7
-- merge both sets of points, r11 and r12
create table r13 as
SELECT * FROM r11
UNION ALL
SELECT * FROM r12
7. Now we need to join back in our nice attributes.
-- join road names back in from r3
-- is there a way to do this join in one step?
create table r14 as
select distinct on (road_name_) * from r3
order by road_name_
create table r15 as
SELECT
r13.gid,
r14.name,
r14.road_name_,
r14.road_highw,
r14.road_surfa,
r13.the_geom
FROM r13 LEFT JOIN r14 ON r14.road_name_ = r13.road_name_
order by r13.road_name_
-- and lastly add in region and council attributes from stats nz region and tla shapefiles
-- this one will take a while, about a half hour all up for 60K records.
create table roads_merged as
select
r13.gid,
r13.road_name_ as linzid
r13.name as name,
regions.regcode as regcode,
tlas.tacode as tacode,
regions.name as region,
tlas.name as tla,
r13.the_geom
from r13
left join regions on st_intersects(r13.the_geom,regions.the_geom)
left join tlas on st_intersects(r13.the_geom,tlas.the_geom)
The reason i used left join was because i wanted to be sure that all roads fell within the council polygons, (inner join would be quicker otherwise). And so there were about 10 roads with NULL tla attribute. Studying them its immediately obvious why, the line mid point was on a bridge over water which was no mans land as far as tlas are concerned (apparently). So 3 mins of hand editing and we finally have our complete road center points, eminently suitable for a search tool.
Oh dont forget to add primary keys, and spatial indexes to your finished layer, and update the geometry metadata table.