Geospatial data in Mongo
I'm currently building a simple web API using Clojure and Monger. The API currently takes a longitude and latitude and will return a vector of locations that fall within a predefined distance of that point. (Note: this will change).
The code itself is pretty trivial, it's less than 30 lines of code. I get all of the locations from the database, then filter out all of the locations that don't fall within the predefined distance. The distance calculation is more of an approximation since it's all going to be displayed using Google maps. The calculation uses Pythagoras's theorem, which for geographical coordinates that are close to each other will be a reasonably accurate approximation and that is good enough for this application. I then memoize the call to the database since there are only about 2500 locations.
I'm getting reasonably good performance out of this. According to the time function the first call to lookup and filter the locations takes about 0.8 msecs and subsequent calls take about 0.05 msecs. But when we start adding more data it's definitely going to get worse.
What I want to do is use Mongo's indexing to do all of this work for me. Apparently using Mongo's geospatial indexes I can get back all of the locations that fall within a bounding box. Not only would this make a better API since we'll be getting back points within a box and the locations will be displayed in a box (Google Maps). It also makes my code trivial! I can cut 30 lines of code down to about 1!
From what I've read Mongo uses B-trees to do the indexing but the ideal representation of geospatial data is R-trees, but for just a few thousand locations I don't think this will be an issue. I am interested to see how the performance compares to filtering memoized data.
I'll post more when I've done some more experimentation.
The code itself is pretty trivial, it's less than 30 lines of code. I get all of the locations from the database, then filter out all of the locations that don't fall within the predefined distance. The distance calculation is more of an approximation since it's all going to be displayed using Google maps. The calculation uses Pythagoras's theorem, which for geographical coordinates that are close to each other will be a reasonably accurate approximation and that is good enough for this application. I then memoize the call to the database since there are only about 2500 locations.
I'm getting reasonably good performance out of this. According to the time function the first call to lookup and filter the locations takes about 0.8 msecs and subsequent calls take about 0.05 msecs. But when we start adding more data it's definitely going to get worse.
What I want to do is use Mongo's indexing to do all of this work for me. Apparently using Mongo's geospatial indexes I can get back all of the locations that fall within a bounding box. Not only would this make a better API since we'll be getting back points within a box and the locations will be displayed in a box (Google Maps). It also makes my code trivial! I can cut 30 lines of code down to about 1!
From what I've read Mongo uses B-trees to do the indexing but the ideal representation of geospatial data is R-trees, but for just a few thousand locations I don't think this will be an issue. I am interested to see how the performance compares to filtering memoized data.
I'll post more when I've done some more experimentation.