In our last attempt we used a recursive SQL function to do a nearest neighbor search in PostgreSQL. The problem with that solution is that it requires us to create a helper function for each dataset we want to scan for nearest neighbors.
The reason I chose this instead of a PLPGSQL solution was the following:
- Easier to prototype a solution in SQL than PLPGSQL
- PostgreSQL planner doesn't allow set returning PLPGSQL functions in the select part of a statement (has to be in the from) and currently the only way to return sets of records from a function where the function parameters dynamically depend on inputs from a from clause is putting it in the select. I described my annoyance with this in Solving the Nearest Neighbor Problem in PostGIS
- SQL functions are basically treated like inline functions for all intensive purposes which means whats going on inside the function is pretty transparent to the planner and the planner can use that information to optimize. With any other language, the planner will pretty much treat the function as a blackbox and only utilize the volatile, stable, immutable predicates to determine if it can use a cached answer.
Now what can PLPGSQL or any other PL language in PostgreSQL provide us that our trusty SQL can't? It allows us to write dynamic SQL statements and allows us procedural flow control which is very important in order to create an all purpose solution. If dynamic SQL statements and procedural flow were possible in the regular SQL language function, I would not have had to resort to writing a recursive function. So the problem now is to try to trick the planner into allowing us to use a PLPGSQL set returning argument dependency in the select part.
Originally I had thought that having the planner have visibility into the function would be useful, but since all the difficult sql processing is
happening within the confines of the function, I don't think it helps and actually could be harmful.
So my plan is to use PLPGSQL for the processing and try to compensate for PGSQL's incompatibilites by using the Coated pill AKA Trojan Horse approach.
If the above doesn't work, I'll have to resort to using a bound C function which is a bit out of the scope of my knowledge at this point.