Being Awesome with the MongoDB Ruby Driver

by Ethan Gunderson on December 21, 2010


This guest post is by Ethan Gunderson, who is a software developer living in Chicago. By day he is a developer at Obtiva, where he helps clients deliver projects and be more awesome. By night, he is part of the gathers.us team, a co-organizer of ChicagoDB, and contributes when he can to the MongoDB community. You can find him at ethangunderson.com, or on Twitter as @ethangunderson.

MongoDB is fast becoming one of the more popular and widely used NoSQL databases, and rightfully so. Its flexible document model, powerful query language and sexy scaling options are enough to pique any developer's interest. While most Ruby developers may jump right into the warm embrace of Mongoid and MongoMapper, the Active Record replacements, doing so often robs them of a valuable learning experience.

The MongoDB Ruby driver is not only simple to use, but it will also get you familiar with how queries look and how they operate. Armed with this knowledge, moving to an ORM becomes much easier. You'll not only understand what is abstracted away, but you'll also be able to spot bad and inefficient generated queries, making performance troubleshooting a snap. To help you hit the ground running, we'll build up some of the common queries you would find in a typical blog. Let's get started!

Installation

Since the driver is just a gem, installation is simple:

(sudo) gem install mongo

There is one more piece to install, bson_ext. Essentially, this is a collection of C extensions used to increase the speed of serialization. While this is optional, I recommend that you install it.

(sudo) gem install bson_ext

Now that we have everything we need installed, let's hop into some code and see what we can do.

Getting Started

First things first, we need a database connection. For the sake of simplicity, we’ll be using localhost with the default port.

require 'mongo'
include Mongo

db = Connection.new.db('ruby-learning')

The next thing we’ll need is a place to store all of our posts. Let’s go ahead and get a post collection started.

posts = db.collection('posts')

That's it! Notice that there are two things we *didn't* do that are kind of cool: we didn't create the database or the collection. In fact, neither one exists yet at the database level, and won't until we insert some data.

Inserting & Updating Documents

Let’s get our blog rolling with a high quality post.

new_post = { :title => "RubyLearning.com, it's awesome", :content => "This is a pretty sweet way to learn Ruby", :created_on => Time.now }
post_id = posts.insert(new_post)

So, what did we just do? MongoDB stores data as documents, which are sets of key/value pairs that map nicely to Ruby's Hash. After creating a hash with our data, we inserted it into the posts collection, and in return, we received the post's ObjectId (its _id). Pretty simple, right? Updating that document is just as simple.

post = posts.find( :_id => post_id ).first
# documents come back from the driver with string keys
post['author'] = "Ethan Gunderson"
posts.update( { :_id => post_id }, post )

Using the ObjectId we got back from our insert, we find that same document again. After changing the data as we see fit, we issue an update. An update takes two arguments: the first is the selector used to find the document (just the ObjectId in our case), and the second is the new document, which replaces the stored one in its entirety.
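Why fetch the document first? Because an update whose second argument is a plain document replaces everything except _id. The snippet below is a pure-Ruby, in-memory sketch of that rule using ordinary hashes (no server or driver involved); `replace_update` is a hypothetical helper, not part of the mongo gem.

```ruby
# Hypothetical helper mimicking a replacement-style update: everything except
# _id is replaced by the new document.
def replace_update(stored, new_doc)
  { '_id' => stored['_id'] }.merge(new_doc.reject { |k, _| k == '_id' })
end

stored = { '_id' => 1, 'title' => 'RubyLearning.com', 'content' => 'Learn Ruby' }

# Sending only the field we want to change drops every other field.
partial = replace_update(stored, { 'author' => 'Ethan Gunderson' })
p partial # only _id and author survive

# Sending the whole (modified) document keeps the other fields intact.
full = replace_update(stored, stored.merge('author' => 'Ethan Gunderson'))
p full
```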

While this works, it's kind of silly if you think about it. We query the database for our document, change a small amount of information, and send the entire document back. There's gotta be a better way! Luckily, there is. MongoDB has update operators (also called modifiers), and one of them is $set, which, as I'm sure you can guess, sets the value of a field without touching the rest of the document.

posts.update( { :_id => post_id }, '$set' => { :author => 'Ethan Gunderson' } )

Here, we supply our selector just as in our previous update, but instead of supplying the entire document, we only set the values we wish to change. Now, instead of issuing two queries against the database, we accomplish the same task with one, and with less code to boot.
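For comparison, here is an equally hypothetical in-memory sketch of what $set does: it merges only the named top-level fields into the stored document and leaves the rest alone. `apply_set` is made up for illustration and ignores dotted paths and the other operators.

```ruby
# Hypothetical helper mimicking '$set': only the listed fields are written;
# every other field in the stored document is left alone.
def apply_set(stored, update)
  changes = update.fetch('$set', {})
  # Field names are strings once stored, so symbol keys are stringified here.
  stored.merge(changes.map { |k, v| [k.to_s, v] }.to_h)
end

stored = { '_id' => 1, 'title' => 'RubyLearning.com', 'content' => 'Learn Ruby' }
updated = apply_set(stored, { '$set' => { :author => 'Ethan Gunderson' } })
p updated # title and content are untouched, author is added
```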

Now let's take care of the post index page.

all_posts = posts.find

If you run this, you'll probably notice a problem: the results come back in the collection's natural order rather than newest first, which is how most blogs list their posts. Let's change our query to account for this.

posts.find.sort( [['_id', -1]] )

This query has a couple of interesting points. Firstly, note that we are sorting on _id. Since MongoDB's ObjectIds begin with a timestamp, sorting on _id gives us roughly chronological order (the timestamp has one-second resolution). This can even remove the need for a separate created_at timestamp! Secondly, the sort specification is an array of [field, direction] pairs, even though we're only sorting on one field here; the array form is what lets you sort on several fields in order.
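To make the ObjectId ordering concrete, here is a pure-Ruby sketch that reads the timestamp out of an ObjectId's 24-character hex form and sorts some ids newest first. The ids are made up, and `object_id_time` is a hypothetical helper; the bson gem offers `BSON::ObjectId#generation_time` for the same purpose.

```ruby
# Pure-Ruby sketch: the first 4 bytes (8 hex characters) of an ObjectId are a
# big-endian count of seconds since the Unix epoch.
def object_id_time(hex_id)
  Time.at(hex_id[0, 8].to_i(16)).utc
end

# Made-up ObjectId hex strings for illustration only.
ids = %w[
  4d10b2e00000000000000001
  4d10b2f00000000000000002
  4d10b2d00000000000000003
]

# Because the timestamp leads and the hex strings are fixed-width, sorting the
# strings in descending order lists the newest ids first (to one-second resolution).
newest_first = ids.sort.reverse
newest_first.each { |id| puts "#{id} -> #{object_id_time(id)}" }
```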

So, that wasn’t so hard, but pretty naïve. What happens when we have 1,000 posts? We don’t want to torture our visitors with a ridiculous page load time, so let’s trim that back.

posts.find.sort( [['_id', -1]] ).limit(5)

Again, this was pretty simple, and by now you should be noticing a pattern: building up relatively complex queries is just a matter of chaining methods together. To take this example further, here's what a pagination query might look like, fetching the second page of five posts.

posts.find.sort( [['_id', -1]] ).limit(5).skip(5)
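The arithmetic behind that pagination query is worth spelling out: page n of size k skips (n - 1) * k documents and limits the result to k. Below is a plain-Ruby sketch where `page_window` is a hypothetical helper and an array stands in for the sorted cursor.

```ruby
# Hypothetical helper: turn a 1-based page number into skip/limit values.
def page_window(page, per_page)
  raise ArgumentError, 'page must be >= 1' if page < 1
  { skip: (page - 1) * per_page, limit: per_page }
end

# Stand-in for the sorted result set: post numbers 12 down to 1 (newest first).
sorted_posts = (1..12).to_a.reverse

window = page_window(2, 5)
# With the driver, the same values would go to .skip(...) and .limit(...) on the cursor.
page_two = sorted_posts.drop(window[:skip]).take(window[:limit])
p window   # skip 5, limit 5
p page_two # the 6th through 10th newest posts
```

Keep in mind that the server still has to walk past the skipped documents, so very deep pages get slower as the skip value grows.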

Tags

Another common element to blogs is the concept of tags. To accomplish this in our example, we’ll be adding an array of tags to our blog post.

posts.update( { :_id => post_id }, '$set' => { :tags => ['mongo', 'nosql', 'awesome'] } )

Now let's find posts with a specific tag.

posts.find( :tags => 'mongo' )

It really doesn't get simpler than that, folks. Because tags is an array, MongoDB matches any post whose tags array contains 'mongo'. Now that the basic implementation is out of the way, how do we find posts that match more than one tag? To accomplish this, we'll use another query operator called $all. As you can imagine, $all selects documents whose array field contains all of the elements in the supplied array.

posts.find( :tags => { '$all' => ['mongo', 'awesome'] } ) 
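The two array-matching rules above can be modeled in a few lines of plain Ruby. This is a simplified, in-memory sketch with a hypothetical `tag_match?` helper, not how the server evaluates queries, and it ignores other cases such as matching the whole array exactly.

```ruby
# Simplified sketch of two matching rules for array fields (not driver code):
# - a plain value matches if the array contains it
# - '$all' matches if the array contains every listed value
def tag_match?(doc_tags, condition)
  if condition.is_a?(Hash) && condition.key?('$all')
    (condition['$all'] - doc_tags).empty?
  else
    doc_tags.include?(condition)
  end
end

docs = [
  { 'title' => 'A', 'tags' => %w[mongo nosql awesome] },
  { 'title' => 'B', 'tags' => %w[mongo ruby] },
  { 'title' => 'C', 'tags' => %w[ruby awesome] }
]

by_tag = docs.select { |d| tag_match?(d['tags'], 'mongo') }.map { |d| d['title'] }
by_all = docs.select { |d| tag_match?(d['tags'], { '$all' => %w[mongo awesome] }) }.map { |d| d['title'] }
p by_tag # A and B
p by_all # only A
```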

To round out our tags feature, let's build a query that lists all the unique tags in our system. There are a couple of ways to skin this particular cat. Since we don't need to do any aggregation and the query needs to be fast, we'll use distinct. If we also needed a count of how often each tag occurs, Map/Reduce may be a better option.

posts.distinct('tags')
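Here is a plain-Ruby sketch of what distinct does with an array field: each element counts on its own, so the tags are flattened before duplicates are removed. It also shows the per-tag counts that distinct does not give you; the sample documents are made up.

```ruby
docs = [
  { 'title' => 'A', 'tags' => %w[mongo nosql awesome] },
  { 'title' => 'B', 'tags' => %w[mongo ruby] },
  { 'title' => 'C', 'tags' => %w[ruby awesome] }
]

# Each array element is considered separately, so flatten before de-duplicating.
all_tags = docs.flat_map { |d| d['tags'] }
distinct_tags = all_tags.uniq

# The aggregation distinct does NOT do: how often each tag occurs.
tag_counts = all_tags.each_with_object(Hash.new(0)) { |tag, counts| counts[tag] += 1 }

p distinct_tags
p tag_counts
```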

Indexing

Now that our blog is starting to grow in complexity, we'll need to start thinking about adding proper indexes. You may have noticed that our tags implementation queries on a field that is not indexed. Let's fix that.

posts.create_index('tags')
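When the indexed field holds arrays, MongoDB builds a multikey index with one entry per array element. The sketch below imitates that idea with a plain Ruby hash standing in for the real B-tree; the document ids and tags are made up.

```ruby
# Sketch of a multikey index: one entry per array element, each pointing
# back at the documents that contain it.
docs = { 1 => %w[mongo nosql awesome], 2 => %w[mongo ruby], 3 => %w[ruby awesome] }

tag_index = Hash.new { |h, k| h[k] = [] }
docs.each do |id, tags|
  tags.uniq.each { |tag| tag_index[tag] << id }
end

# Finding posts by tag is now a single lookup instead of a scan over every document.
p tag_index['mongo'] # ids of posts tagged mongo
```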

And there we have it. In a relatively short amount of time, we've built up many of the common queries you would see in a standard blog. While I've only scratched the surface of what you can accomplish with the MongoDB Ruby driver, I hope I've shown you its power. I've included some more learning material and references below to continue your learning. Of course, if you have any questions or feedback, feel free to leave them in the comments section of this post. Thanks!

References


Posted by Ethan Gunderson

Comments

Kyle Banker December 21, 2010 at 10:06 am

Great post, Ethan! And thanks for emphasizing the driver’s ease of use. Learning a driver is a great way to learn MongoDB (and a very good thing to do before starting to use an ORM).

Kyle


Ethan Gunderson December 23, 2010 at 12:10 am

Glad you enjoyed it, and thanks for putting so much effort into the driver’s usability!

Learning the driver first is something that I've been pushing for a while now, so I'm glad to see that it has 10gen employee approval as well. :)

Ethan


Buddy Lindsey December 21, 2010 at 10:57 am

This is awesome. I have been looking at MongoDB from afar, kind of wanting to try it but nervous to do so. After reading this and seeing how simple it is, I think I might just have to dive in and see what it is like.

Thanks.


Glenn Goodrich December 21, 2010 at 7:02 pm

Ethan,

Do you have rules of thumb for when you use MongoDB vs an RDBMS? Or is it all-Mongo-all-the-time?

Great article, btw….thanks!

Glenn


Ethan Gunderson December 22, 2010 at 11:53 pm

Glenn,

I'm a big believer in using the right tool for the job. Sometimes that's Mongo, other times it's not. I suggest doing some up-front analysis of your problem space, and picking the datastore that solves it best. This really isn't any different from what most people do for the rest of their technology stack (Should I use Sinatra, or Rails?), but it seems to get lost when picking a datastore.

Hope that helps!
Ethan


Tony December 21, 2010 at 6:25 pm

Great Post! I’ve been playing with the mongo ruby driver on and off for a few weeks now and this really helped solidify some concepts.


Ethan Gunderson December 22, 2010 at 11:55 pm

Glad you enjoyed it!
Ethan


Kevin Taylor December 21, 2010 at 10:50 pm

Thanks for the clear introduction, Ethan.

Now I have no excuse for not playing with MongoDB.


Tim Linquist December 22, 2010 at 3:18 am

Great post. I’d like to point out Mongoid exposes a lot more of the native driver functionality than the other popular MongoMapper. For our needs this is important but maybe not an issue for everyone.


Ethan Gunderson December 22, 2010 at 11:59 pm

Thanks for the tip Tim!

I've used both MongoMapper and Mongoid in projects before, and feel that both are pretty solid. They each definitely have their own sets of pros and cons though, so for anyone deciding between the two, make sure to do a little research beforehand.

Ethan


Eric Lubow December 25, 2010 at 10:26 pm

I've spent a lot of time with the Mongo Ruby driver and find it extremely easy to use. One thing I haven't found documentation for is doing an $or query on an array field with Ruby.

Here is the query in Javascript: {$or:[ {field:{$in:[X]}} , {field:{$size:0}}]}

Here is the same query in Ruby, where the $or is added to the query hash: query['$or'] = [{'field' => {'$in' => [42]}}, {'field' => {'$size' => 0}}]


Daniel January 22, 2014 at 2:54 pm

Thanks for your post. Just a short question on indexes:
When do you call the #create_index method? Once on startup of the application, every time I add a post, or once in an application's lifetime using a separate rake task?

Thanks,
Dan
