CouchDB makes a great data cache (with help from Ruby)
Tuesday, January 12, 2010 at 5:02PM
I was working on pulling in some data from this nasty web service (slow, complex, unreliable) where the data is not structured the way I want it to be (oh yeah, and there’s that too…). To get all the data I need, there are about seven queries I have to make. Not wanting my main app to deal with all these queries and with storing the data in its relational (and rigid) structure, I thought this seemed like a good use case for CouchDB.
The semi-structured data format in CouchDB allows me to layer data on top of existing documents as needed. Each query adds more data to the existing data and in the end gives me the full picture of the objects I am after. “Progressive data enhancement” if you will.
Once I’ve got this full picture of the data I can sit my CouchDB database between my main app and this ugly, awful web service. My main app will hit CouchDB’s nice, clean, RESTful (!!!) interface for the data I want. A small Ruby script (on a cron, perhaps) deals with all the aforementioned nastiness, grabs data from the web service and loads it into CouchDB. This setup encapsulates the idiosyncrasies of the outside world in one place and allows my main app to operate in a reality distortion field of my own design. Sweet!
Ultimately, this data will end up (at least part of it) in a relational DB, but setting up the needed tables, columns, etc. requires a lot of upfront work to build out all that good relational-ness. At this point, it’s pretty exploratory. As I add more web service calls I’d like to immediately see that data and use it in my app, not have to write a migration that I may or may not have to roll back if the additional data element is wrong or not needed. I’m trying to keep it agile over here!
So, I have my code query the remote web service and I get back a hash for each object I want to put in my CouchDB database (conveniently, I designed this code to turn ugly SOAP mapping objects into plain old arrays of hashes). The initial web service query returned over 12,000 objects, so iterating over each one to stuff it into CouchDB seemed silly. Luckily, CouchDB can handle bulk document creation (using the CouchRest gem this is exposed as ‘bulk_save’). The first time I did this it worked fine and returned all the new IDs it created. After some thought about how to get the rest of my data in there, I decided to use custom document IDs instead of the autogenerated UUIDs Couch assigns. Using the default ID would mean I’d have to hit Couch to look up the ID for each object the web service returns just to get the document. That extra lookup would be tolerable because Couch is very fast, but there’s no need for it.
CouchDB bulk create allows you to specify your own unique id to use instead of the auto assigned/generated one. In my situation this would allow me to just blindly post to Couch with my own ID in the URL and have the new data from my other queries go directly to the document. The progressive data enhancement I was after without the redundant lookup to get the right document first.
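A minimal sketch of the layering idea in plain Ruby, with no CouchDB connection involved. The helper name `enhance` and the sample field `name` are hypothetical, not from the original code. One caveat worth stating: CouchDB rejects an update to an existing document unless the request carries that document's current `_rev`, so the sketch keeps `_id` and `_rev` from the stored document and ignores any copies that arrive with the new data.

```ruby
# Hypothetical helper: layer newly fetched fields onto an existing document.
# The bookkeeping fields (_id, _rev) of the existing document are left untouched.
def enhance(existing_doc, new_fields)
  protected_keys = %w[_id _rev]
  additions = new_fields.reject { |key, _| protected_keys.include?(key.to_s) }
  existing_doc.merge(additions) # returns a new hash; existing_doc is not mutated
end

doc   = { "_id" => "1", "_rev" => "1-abc", "a" => "1" }   # as stored in CouchDB
extra = { "name" => "Widget", "_rev" => "stale" }          # from a later query

enhanced = enhance(doc, extra)
```

The merged hash is what would be written back to the document with the same ID.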
What I needed to do was wipe out the database that held the documents with the Couch-generated IDs, then modify the hashes I get back from the web service to include "_id" => "myuniqueid", so that when I did my bulk_save CouchDB would not create IDs for me but just use the ones given to it.
Here’s (finally) where Ruby comes in.
Let’s say that here is the array of hashes I get back from the webservice:
vals = [{:a => "1"}, {:a => "2"}, {:a => "3"}]
(where :a is the key for the web service unique id value)
I need to modify each hash in this array to add an "_id" key holding the value of :a, so we want the first element to look like:
{"_id" => "1",:a => "1"}
for example…
Ruby #map to the rescue!
vals.map { |v| v["_id"] = v[:a] }  # mutates each hash in place; #each would work just as well here
Gives us:
vals
=> [{"_id"=>"1", :a=>"1"}, {"_id"=>"2", :a=>"2"}, {"_id"=>"3", :a=>"3"}]
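The same transformation can also be written without mutating the original hashes. This is only a sketch: the helper name `with_couch_ids` is hypothetical, and the duplicate check is an added assumption, on the grounds that two documents with the same `_id` in one bulk request would conflict.

```ruby
# Hypothetical helper: return copies of the records with "_id" set from id_key.
def with_couch_ids(records, id_key)
  seen = {}
  records.map do |record|
    id = record.fetch(id_key).to_s # raises KeyError if the record has no id value
    raise ArgumentError, "duplicate id: #{id}" if seen.key?(id)
    seen[id] = true
    record.merge("_id" => id)      # merge returns a new hash
  end
end

vals = [{ :a => "1" }, { :a => "2" }, { :a => "3" }]
docs = with_couch_ids(vals, :a)
```

Here `docs` holds the documents ready for saving, while `vals` stays exactly as the web service code returned it.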
With our hashes formatted so that the web service’s ID becomes the CouchDB ID, bulk_save will create all the documents we need, and we can access them (using this sample data) like:
curl -X GET 'http://127.0.0.1:5984/test_db/1'
Should give you:
{"_id":"1","_rev":"1-10085f96b70ddbb6155710a391194304","a":"1"}
(your rev id will be different of course)
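For anyone not using CouchRest, the bulk step corresponds to CouchDB's `_bulk_docs` endpoint, which takes a JSON body of the form `{"docs": [...]}`. Below is a rough sketch using only Ruby's standard library. The helper names `bulk_docs_body` and `bulk_save_raw` are hypothetical, and it assumes a CouchDB at the URL from the curl example that accepts unauthenticated writes.

```ruby
require "json"
require "net/http"
require "uri"

# Build the JSON body the _bulk_docs endpoint expects: {"docs": [...]}.
# Symbol keys such as :a are serialized as plain strings.
def bulk_docs_body(docs)
  JSON.generate("docs" => docs)
end

# Hypothetical helper: POST all documents in a single request and
# return CouchDB's parsed JSON response.
def bulk_save_raw(db_url, docs)
  uri = URI.parse("#{db_url}/_bulk_docs")
  request = Net::HTTP::Post.new(uri.request_uri, "Content-Type" => "application/json")
  request.body = bulk_docs_body(docs)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

docs = [{ "_id" => "1", :a => "1" }, { "_id" => "2", :a => "2" }]
body = bulk_docs_body(docs)

# Needs a running CouchDB, so it is left commented out:
# bulk_save_raw("http://127.0.0.1:5984/test_db", docs)
```

Keeping the body builder separate from the HTTP call makes it easy to inspect the payload before sending 12,000 documents at once.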
That’s all there is to it!
