Sunday, September 16, 2012

PyMongo and MongoDB

In the last post I mentioned that JSON is how a lot of information on the web is serialized and exchanged. As part of a movement away from a pull model (SNMP) toward a push model (syslog, sFlow), many network vendors are picking up JSON as the way to bubble information up to network engineers.

So now that you have a JSON object, how do you persist it? You can obviously pickle and shelve it, but that is not recommended for long-term storage, and it does not scale very well. Any SQL flavor will do the job, but to be honest, coming up with a database schema is kind of a pain in the butt, especially in the development stage. I want a simple database that lets me persist my object the way I view it and then gets out of the way. It would also be nice if it is proven to scale, so I don't need to change much by the time I am done.
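
For comparison, the pickle/shelve route mentioned above looks roughly like this. It is a minimal sketch using only the Python 3 standard library; the sample payload and the temporary file location are made up for illustration.

```python
import json
import os
import shelve
import tempfile

# A JSON payload as it might arrive from a device (illustrative data).
payload = '{"Name": "R1", "ASN": "65001"}'
device = json.loads(payload)

# shelve pickles each value into a dbm-style file on disk.
path = os.path.join(tempfile.mkdtemp(), "devices")
with shelve.open(path) as db:
    db[device["Name"]] = device

# Reopen the shelf to confirm the object survived the round trip.
with shelve.open(path) as db:
    restored = db["R1"]
```

This works for a quick script, but the file is only reachable by key from Python on one machine, which is the scaling limit mentioned above.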

In comes MongoDB. It is pretty awesome that it matches the object model as I write the script, so it doesn't require a mind shift. It scales, as many web-scale companies are using it. More importantly, it just lets me store the object and "gets out of my way" (I say that in a positive sense, of course). No offense, but I have enough of a headache trying to solve the problem at hand without needing to worry about databases. MongoDB also has the added benefits of being cross-platform and batteries-included, both of which are pretty Pythonic.

Note that I am barely scratching the surface of MongoDB's capabilities, especially the query portion; I just want to share my experience so far. Leave me a comment if you have interesting ideas or use cases.

Here is how to get started:

1. Go to the MongoDB site and install the package for the OS you are using. I have installed it on both Win7 and Mac, and it was pretty easy.
MongoDB Site

2. Install PyMongo. On both Win7 and Mac, I just used easy_install.
PyMongo

3. (Optional) You can watch the tutorial given by the primary developer of PyMongo during PyCon 2012.
PyCon2012 MongoDB Tutorial Video

4. (Optional) Book: MongoDB and Python
MongoDB and Python: Patterns and processes for the popular document-oriented database

Here are some of the things I have tried:

0. Start MongoDB using the binary with default port on localhost:
$ ./mongod

1. Import PyMongo and create a connection:

>>> import pymongo
>>> from pymongo import Connection
>>> connection = Connection()
>>> connection
Connection('localhost', 27017)

2. Connect to the 'network' database (it will just be created if it does not exist already):
>>> db = connection.network
>>> 
>>> db
Database(Connection('localhost', 27017), u'network')
>>>

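3. Connect to the 'switches' collection (again, it will just be created if it does not exist already). MongoDB only creates a collection on the first insert, and the rest of this walkthrough inserts into 'posts' instead, which is why 'switches' never shows up in the collection list: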
>>> collection = db['switches']
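
The `Connection` class used above was removed in PyMongo 3.0 in favor of `MongoClient`. Here is a minimal sketch of steps 1 to 3 for Python 3 and a newer PyMongo, assuming a `mongod` listening on localhost:27017. The helper names `get_collection` and `demo` are my own, not part of PyMongo.

```python
def get_collection(client, db_name="network", coll_name="switches"):
    """Return a collection handle using dictionary-style access.

    Nothing is created on the server until the first document is inserted.
    """
    return client[db_name][coll_name]


def demo():
    # Imported here so the helper above can be tried without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)  # assumes a local mongod
    switches = get_collection(client)
    print(switches.full_name)
    print(client["network"].list_collection_names())
```

Call `demo()` with `mongod` running to see the handle and the current collection list.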

4. Create one document and insert it into a 'posts' collection:
>>> post = {"Name": "R1", "ASN": "65001"}
>>> posts = db.posts
>>> posts.insert(post)
ObjectId('5056039ac4c1b468ddbbe27f')
>>> db.collection_names()
[u'system.indexes', u'posts']
>>> 
>>> posts.find_one()
{u'_id': ObjectId('5056039ac4c1b468ddbbe27f'), u'Name': u'R1', u'ASN': u'65001'}
>>> posts.find_one({"ASN": "65001"})
{u'_id': ObjectId('5056039ac4c1b468ddbbe27f'), u'Name': u'R1', u'ASN': u'65001'}
>>>
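
On PyMongo 4.x the `insert` method no longer exists; `insert_one` is the replacement for a single document. A small sketch of the same step, where `add_device` and `find_by_asn` are hypothetical helper names and `collection` is any PyMongo collection handle:

```python
def add_device(collection, name, asn):
    """Insert one device document and return the _id MongoDB assigned."""
    result = collection.insert_one({"Name": name, "ASN": asn})
    return result.inserted_id


def find_by_asn(collection, asn):
    """Return the first document whose ASN matches, or None if there is none."""
    return collection.find_one({"ASN": asn})
```

For example, `add_device(db.posts, "R1", "65001")` followed by `find_by_asn(db.posts, "65001")` mirrors the session above.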

3. Multiple inserts: 

>>> new_posts = [{"Name": "R2", "ASN": "65002"},
... {"Name": "R3", "ASN": "65003"}]
>>> posts.insert(new_posts)
[ObjectId('5056045ac4c1b468ddbbe280'), ObjectId('5056045ac4c1b468ddbbe281')]
>>> 
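
Since the data usually arrives as JSON, the bulk insert can start from the raw text. This is a sketch under the assumption that the payload is a JSON array of objects; `load_devices` is a made-up name, and `insert_many` is the newer PyMongo call for inserting a list.

```python
import json


def load_devices(collection, json_text):
    """Parse a JSON array of device objects and insert them in one call."""
    devices = json.loads(json_text)
    if not isinstance(devices, list):
        raise ValueError("expected a JSON array of objects")
    if not devices:
        return []  # insert_many rejects an empty list
    result = collection.insert_many(devices)
    return result.inserted_ids
```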

4. Query all the documents: 
>>> for post in posts.find():
...     post
...
{u'_id': ObjectId('5056039ac4c1b468ddbbe27f'), u'Name': u'R1', u'ASN': u'65001'}
{u'_id': ObjectId('5056045ac4c1b468ddbbe280'), u'Name': u'R2', u'ASN': u'65002'}
{u'_id': ObjectId('5056045ac4c1b468ddbbe281'), u'Name': u'R3', u'ASN': u'65003'}
>>> 

5. Query for particular match: 
>>> for post in posts.find({"Name": "R2"}):
...     print("Found: ", post)
...
('Found: ', {u'_id': ObjectId('5056045ac4c1b468ddbbe280'), u'Name': u'R2', u'ASN': u'65002'})
>>>
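
Queries can go a little further than an exact match. As a sketch, the hypothetical helper below uses MongoDB's `$in` operator to match several ASNs at once and a projection to return only the `Name` field:

```python
def names_for_asns(collection, asns):
    """Return the Name of every document whose ASN is in the given list."""
    query = {"ASN": {"$in": list(asns)}}
    projection = {"Name": 1, "_id": 0}  # keep Name, drop the _id field
    return [doc["Name"] for doc in collection.find(query, projection)]
```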

6. Trying out the count method: 
>>> posts.count()
3
>>>
>>> posts.find({"Name": "R1"}).count()
1
>>> 
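
In PyMongo 4.x both `Collection.count()` and `Cursor.count()` are gone, and `count_documents` takes their place. It always needs a filter, so an empty dict counts everything. A sketch with a hypothetical wrapper name:

```python
def count_devices(collection, query=None):
    """Count documents matching query; with no query, count all of them."""
    return collection.count_documents(query if query is not None else {})
```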

7. Try to remove an entry: 
>>> for post in posts.find():
...     post
...
{u'_id': ObjectId('5056039ac4c1b468ddbbe27f'), u'Name': u'R1', u'ASN': u'65001'}
{u'_id': ObjectId('5056045ac4c1b468ddbbe280'), u'Name': u'R2', u'ASN': u'65002'}
{u'_id': ObjectId('5056045ac4c1b468ddbbe281'), u'Name': u'R3', u'ASN': u'65003'}
>>>
>>> posts.remove({"Name": "R2"})
>>> for post in posts.find():
...     post
...
{u'_id': ObjectId('5056039ac4c1b468ddbbe27f'), u'Name': u'R1', u'ASN': u'65001'}
{u'_id': ObjectId('5056045ac4c1b468ddbbe281'), u'Name': u'R3', u'ASN': u'65003'}
>>> 
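
Newer PyMongo splits `remove` into `delete_one` and `delete_many`, and both report how many documents were deleted. A sketch, with `remove_device` as a hypothetical helper name:

```python
def remove_device(collection, name):
    """Delete every document with the given Name and return how many went away."""
    result = collection.delete_many({"Name": name})
    return result.deleted_count
```

Checking the returned count is a cheap way to notice a typo in the name, since deleting zero documents is not an error.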

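8. Try to update the R1 entry. Note that without an operator such as $set, the second argument replaces the whole document, so the Name and ASN fields are gone afterwards: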
>>> posts.update({"Name": "R1"}, {"DataCenter": "SJC"})
>>> for post in posts.find():
...     post
...
{u'DataCenter': u'SJC', u'_id': ObjectId('5056039ac4c1b468ddbbe27f')}
{u'_id': ObjectId('5056045ac4c1b468ddbbe281'), u'Name': u'R3', u'ASN': u'65003'}
>>> 
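
To add a field while keeping the existing ones, the update needs the `$set` operator. Here is a sketch using `update_one` from newer PyMongo versions; `set_datacenter` is a made-up helper name:

```python
def set_datacenter(collection, name, datacenter):
    """Add or change the DataCenter field on one device, keeping other fields."""
    result = collection.update_one(
        {"Name": name},
        {"$set": {"DataCenter": datacenter}},  # $set touches only this field
    )
    return result.modified_count
```

With this, R1 would keep its Name and ASN and gain `DataCenter: "SJC"`. Newer PyMongo also has `replace_one` for the cases where replacing the whole document is really what you want.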

9. Test that the data persists. Exit out of Python and start again:
$ python
>>> from pymongo import Connection
>>> connection = Connection()
>>> connection
Connection('localhost', 27017)
>>> db = connection.network
>>> db
Database(Connection('localhost', 27017), u'network')
>>> posts = db.posts
>>> posts
Collection(Database(Connection('localhost', 27017), u'network'), u'posts')
>>> db.collection_names()
[u'system.indexes', u'posts']
>>> for post in posts.find():
...     post
...
{u'_id': ObjectId('5056039ac4c1b468ddbbe27f'), u'Name': u'R1'}
{u'_id': ObjectId('5056045ac4c1b468ddbbe281'), u'Name': u'R3', u'ASN': u'65003'}
>>> 

So there you have it, a quick five-minute introduction to PyMongo with a network flavor. I will share more experiences as I work with PyMongo/MongoDB more. So far it looks to be the one I will use if I ever need a database in my projects.



