MongoDB…Ops Stuff

In the previous two posts we have seen some basic querying and how to leverage the querying mechanism to get up and running. Now we are off in the wild world, and we also need to do some more complicated stuff.

Creating Indexes
The API surface is really smooth here. It lets us specify the sort order of the index keys (ascending or descending) and whether the index is built in the foreground or in the background.

public void CreateIndex()
{
    var QuestionConnectionHandler = new MongoConnectionHandler<Question>("MongoDBDemo");
    // Ascending index on Difficulty, built in the background so reads and writes are not blocked
    QuestionConnectionHandler.MongoCollection.EnsureIndex(
        IndexKeys.Ascending("Difficulty"), IndexOptions.SetBackground(true));
}

Dropping all the indexes (except the mandatory one on _id) is just as easy:

QuestionConnectionHandler.MongoCollection.DropAllIndexes();

What if you want to see what is going on under the hood? You ask the database to explain its plan.

var linqPlan = QuestionConnectionHandler.MongoCollection.AsQueryable()
                  .Where(q => q.Difficulty >= 3).Explain();
// or, if you build queries with the query builder instead of LINQ
var query = Query<Question>.GTE(q => q.Difficulty, 3);
var explainPlan = QuestionConnectionHandler.MongoCollection
                          .FindAs<Question>(query).Explain();

Now, if only we could get some stats about our collection and its indexes, all wrapped in a nice syntax.

    var stats = QuestionConnectionHandler.MongoCollection.GetStats();
    Console.WriteLine("Namespace : {0}", stats.Namespace);
    Console.WriteLine("DataSize : {0}", stats.DataSize);
    Console.WriteLine("Index Count : {0}", stats.IndexCount);
    // IndexSizes.Keys is a plain IEnumerable<string>, so loop over it explicitly
    foreach (var indexName in stats.IndexSizes.Keys)
    {
        Console.WriteLine(indexName);
    }
    var size = QuestionConnectionHandler.MongoCollection.GetTotalDataSize();
    Console.WriteLine("The total datasize for this collection is {0}", size);

A routine task is to get all the collections in a database and all the databases on the server itself. Easy peasy!!

    var collections = QuestionConnectionHandler.MongoCollection.Database.GetCollectionNames();
    Console.WriteLine("\nThe following collections are present in the database");
    foreach (var collectionName in collections) Console.WriteLine(collectionName);
    var client = new MongoClient(@"mongodb://localhost");
    var server = client.GetServer();
    var databases = server.GetDatabaseNames().ToList();
    Console.WriteLine("\nAll the databases in the server");
    databases.ForEach(Console.WriteLine);

MongoDB C# Driver Part 3

In the previous post we saw some simple queries. It is time to move on to something more concrete and realistic. There are basically two ways of querying MongoDB with the driver. The first, as I showed last time, is LINQ. To use LINQ we first move into the Queryable world by calling AsQueryable() on the collection, and then write the actual query.
Be careful not to pull all the documents down locally and then perform operations on them. What we really want is to offload all of our querying to MongoDB and only work with the results. The driver provides an IQueryable implementation (in MongoDB.Driver.Linq), so a query written against AsQueryable() is translated and executed on the server.

	var result = UserConnectionHandler.MongoCollection.AsQueryable()
                                      .Where(u => u.Reputation > reputation);
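Why does staying on the IQueryable matter? Here is a minimal sketch that runs without MongoDB. The hypothetical DemoUser class stands in for the post's User, and an in-memory list plays the part of the collection. Because the filter is written against IQueryable<T>, the lambda is captured as an expression tree. A LINQ provider (such as the driver's) can translate that tree into a server-side query. Compiled code, by contrast, can only run in your own process.

```csharp
using System.Linq;

// Hypothetical stand-in for the post's User class (no MongoEntity base, so no driver is needed).
public class DemoUser
{
    public string Name { get; set; }
    public int Reputation { get; set; }
}

public static class QueryableDemo
{
    // Same filter as above. Because the parameter is IQueryable<T>, the compiler turns the lambda
    // into an expression tree: data a LINQ provider can inspect and translate, instead of a
    // delegate that can only be executed locally.
    public static IQueryable<DemoUser> HighReputation(IQueryable<DemoUser> source, int reputation)
    {
        return source.Where(u => u.Reputation > reputation);
    }
}
```

Inspect query.Expression on the result and you will find an unexecuted Where call wrapping the source. Nothing runs until the query is enumerated, for example by Count() or ToList(). If the source had been an IEnumerable<DemoUser>, the same lambda would compile to a Func<DemoUser, bool>, and every document would have to be pulled down first.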

The alternative to this form of querying is the query builder, which produces an IMongoQuery (backed by a BsonDocument). The way to build up such a query is shown below. Note that the first argument is a lambda selecting the property, and the second is the value to filter on. The query builders live in the MongoDB.Driver.Builders namespace.

var query = Query<User>.GT(u => u.Reputation, reputation);
var result = UserConnectionHandler.MongoCollection.FindAs<User>(query);

Specifying queries by hand is a bit more work, so I prefer LINQ, but both options are available.
Another interesting querying mechanism is regular expressions. They were kind of hard to locate in the API (or maybe I just didn’t know where to look). BsonRegularExpression is in the MongoDB.Bson namespace, and Query.Matches is in MongoDB.Driver.Builders.

public void UserNameStartsWith(string searchKey)
{
    // Regex.Escape (System.Text.RegularExpressions) makes characters such as '.' or '*' in the key match literally
    var pattern = "^" + Regex.Escape(searchKey);
    var query = Query.Matches("Name", new BsonRegularExpression(pattern));
    var result = UserConnectionHandler.MongoCollection.Find(query);
    Console.WriteLine("We found {0} Users whose name starts with {1}", result.Count(), searchKey);
}
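Whatever goes into the pattern is interpreted as a regular expression. That means metacharacters in searchKey change its meaning: a key of "a.b" also matches "axb". The sketch below checks this with .NET's own Regex class rather than against MongoDB. MongoDB uses PCRE, but '.' means the same thing there. PrefixPattern is a hypothetical helper that contrasts splicing the raw key into the pattern with passing it through Regex.Escape first.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper comparing two ways of building a "starts with" pattern.
public static class PrefixPattern
{
    // The key is spliced in as-is, so any regex metacharacters in it are active.
    public static string Unescaped(string searchKey)
    {
        return string.Format("^{0}", searchKey);
    }

    // Regex.Escape turns '.', '*', '(' and friends into literals before anchoring.
    public static string Escaped(string searchKey)
    {
        return "^" + Regex.Escape(searchKey);
    }
}
```

With the key "a.b", the unescaped pattern matches "axb-user" but the escaped one does not. Both match "a.b-user".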

Be aware that Select does not result in fewer fields being returned from the server. The entire document is pulled back and passed to the native Select method, so the projection is performed client side. Keep the filtering on the IQueryable from the MongoDB.Driver.Linq namespace so that it still runs on the server. When you need to limit the fields that come over the wire, use SetFields() to bring back only selected fields from the database.

var query = Query.Matches("Name", new BsonRegularExpression("^" + Regex.Escape(searchKey)));
// Only Name and Reputation (plus _id, which is included by default) come back from the server
var result = UserConnectionHandler.MongoCollection.Find(query)
                .SetFields(Fields<User>.Include(u => u.Name, u => u.Reputation));

MongoDB C# Driver Part 2

Having done the initial work for talking to MongoDB, we can now create some POCO classes and then do some querying on top of them. As usual, my model is Questions and Users.

public class Question : MongoEntity
    {
        public string Text { get; set; }
        public string Answer { get; set; }
        public DateTime CreatedOn { get; set; }
        public int Difficulty { get; set; }
    }
public class User : MongoEntity
    {
        public string Name { get; set; }
        public int Reputation { get; set; }
    }

Finally, we get down to some real stuff.

public class SimpleQueries
    {
        protected readonly MongoConnectionHandler<User> UserConnectionHandler;
        protected readonly MongoConnectionHandler<Question> QuestionConnectionHandler;

        public SimpleQueries()
        {
            UserConnectionHandler = new MongoConnectionHandler<User>("MongoDBDemo");
            QuestionConnectionHandler = new MongoConnectionHandler<Question>("MongoDBDemo");
        }

        public void CreateQuestion(Question question)
        {
            //// Save the entity with safe mode (WriteConcern.Acknowledged)
            var result = QuestionConnectionHandler.MongoCollection.Save<Question>(question, 
                                 new MongoInsertOptions { WriteConcern = WriteConcern.Acknowledged});

            if (!result.Ok)
            {
                Console.WriteLine(result.LastErrorMessage);
            }
            else
            {
                Console.WriteLine("Insertion was successful");
            }
        }

        public void CreateUser(User user)
        {
            //// Save the entity with safe mode (WriteConcern.Acknowledged)
            var result = UserConnectionHandler.MongoCollection.Save<User>(user, 
                              new MongoInsertOptions { WriteConcern = WriteConcern.Acknowledged });

            if (!result.Ok)
            {
                Console.WriteLine(result.LastErrorMessage);
            }
            else
            {
                Console.WriteLine("Insertion was successful");
            }
        }

        public void GetAllQuestions()
        {
            var cursor = QuestionConnectionHandler.MongoCollection.AsQueryable();
            var resultSet = cursor.ToList();

            Console.WriteLine("Writing out all the questions");
            foreach (var result in resultSet)
            {
                Console.WriteLine("Text : {0},  Answer : {1}", result.Text, result.Answer);
            }
        }

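        public ObjectId GetOneQuestion()
        {
            var question = QuestionConnectionHandler.MongoCollection.AsQueryable().FirstOrDefault();

            // FirstOrDefault returns null when the collection is empty
            if (question == null)
            {
                return ObjectId.Empty;
            }

            Console.WriteLine(question.Id);
            return question.Id;
        }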

        public void DeleteQuestion(ObjectId id)
        {
            var result = QuestionConnectionHandler.MongoCollection.Remove(
                Query<Question>.EQ(e => e.Id, id), RemoveFlags.None, WriteConcern.Acknowledged);

            if (!result.Ok)
            {
                Console.WriteLine(result.ErrorMessage);
            }
            else
            {
                Console.WriteLine("Delete Operation OK : {0}", result.Ok);
            }
        }
    }

Now that we have some capabilities in our application, we can query away.

//Seed Data
var question = new Question { Text = "Who are you ?", Answer = "I am MongoDB.",
                              CreatedOn = DateTime.Now, Difficulty = 3 };
var user = new User {Name = "Ashutosh", Reputation = 100};
var queries = new SimpleQueries();
queries.CreateQuestion(question);
queries.CreateUser(user);
queries.GetAllQuestions();
var id = queries.GetOneQuestion();
queries.DeleteQuestion(id);

If all is well then you will see some output and the world will be a better place.

MongoDB C# Driver Part 1

There are several drivers available for C#. I do not plan to go through all of them here. Since the official driver now has LINQ support (although it is not complete yet), we will go with it.
Basic setup: get the driver from NuGet. It should add two DLLs to your project:
1. MongoDB.Bson
2. MongoDB.Driver

We will get to what does what later. For now assume that we only want to get some data in and out of MongoDB.
Let’s connect to MongoDB now (the code below is just quick and dirty; we will see a better version later).

//MongoDB should be running by now, and assuming you have inserted some documents in there
var client = new MongoClient(@"mongodb://localhost");
var server = client.GetServer();
var database = server.GetDatabase("YourDataBaseName");
var mongoCollection = database.GetCollection("SomeCollectionName");
//Getting all the documents
//FindAll returns a cursor over the BsonDocuments in the collection
foreach (var document in mongoCollection.FindAll())
{
    Console.WriteLine(document);
}

It is time for some explanation.

What is MongoClient?
MongoClient is the standard way of accessing the driver. I have dabbled with the Python driver (pymongo), and it is the same there. I believe the change was made to keep the drivers for different languages in sync.
Reading up on this told me that the SafeMode settings were dropped in favour of WriteConcern, and that ReadPreference should be used instead of SlaveOk. These settings used to live on MongoServerSettings; the new ones are on MongoClientSettings, as is the IPv6 setting. A MongoClient also uses acknowledged writes by default, unlike the older MongoServer.Create, which defaulted to unacknowledged ones.

What is MongoServer?
The server manages the life cycle of the server proxies, gives access to databases and handles some of the connection management. There is more to say about the server, but I will stop short for now.

Notice that I did not use the generic GetCollection here. The generic version is also available, and we will put it to use soon.

Every document in MongoDB has an _id field, and by default its value is an ObjectId (ObjectId resides in MongoDB.Bson). The driver maps a property named Id to _id.
So we can have an interface that takes care of this, and all our types can implement it.

public interface IMongoEntity
{
    ObjectId Id { get; set; }
}
public class MongoEntity : IMongoEntity
{
   public ObjectId Id { get; set; }
}

Refinement: obtaining a better MongoDB handler (you can do much better than what I will show you here, but that depends to a large extent on your taste).

public class MongoConnectionHandler<T> where T : IMongoEntity
{
    public MongoCollection<T> MongoCollection { get; private set; }
    private const string ConnectionString = @"mongodb://localhost";

    public MongoConnectionHandler(string databaseName)
    {
        var client = new MongoClient(ConnectionString);
        var server = client.GetServer();
        var database = server.GetDatabase(databaseName);
        // Collection name = lower-cased type name + "s" (Question -> "questions");
        // ToLowerInvariant avoids culture-specific casing surprises
        MongoCollection = database.GetCollection<T>(typeof(T).Name.ToLowerInvariant() + "s");
    }
}

Having databaseName as a parameter is very subjective. If you only have a single database, you might want to just put it directly in the method call. Otherwise it increases the burden on the caller and spreads the database name all over the code base. Other options that come to mind straightaway are an enum of database names or a configuration file. The same goes for ConnectionString: put it somewhere configurable. The collections in my database have plural names (questions, users), so the handler appends an extra “s” to the lower-cased type name (this took me quite a while to figure out).
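The enum idea can be sketched in a few lines. DatabaseName and CollectionNames below are hypothetical helpers, not driver types. The second one centralises the lower-case-plus-"s" naming convention so that it lives in a single place.

```csharp
// Hypothetical: one enum member per database, so the name is not a magic string.
public enum DatabaseName
{
    MongoDBDemo
}

// Hypothetical: the handler's naming convention in one place.
public static class CollectionNames
{
    // Lower-cased type name plus "s", e.g. Question -> "questions".
    // ToLowerInvariant avoids culture-specific casing (such as the Turkish dotless i).
    public static string For<T>()
    {
        return typeof(T).Name.ToLowerInvariant() + "s";
    }
}
```

The handler could then be created with new MongoConnectionHandler<Question>(DatabaseName.MongoDBDemo.ToString()) and call CollectionNames.For<T>() for the collection name.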
So we are set up nicely to go forward and do some more interesting work with MongoDB.

MongoDB : Shell

MongoDB stores documents in BSON format, a binary serialisation format for JSON-like documents.
It has more types than JSON (for example dates, binary data, and separate 32-bit and 64-bit integers), which enables better integration with the languages that support these types.
Languages like Perl and JavaScript have a smaller type system, which can cause problems.
ObjectId, which uniquely identifies a document in MongoDB, is also part of the BSON specification.
The _id field is the primary key, and by default it is of type ObjectId.
It is immutable.
ObjectId (12 bytes) = 4-byte timestamp + 3-byte machine id + 2-byte process id + 3-byte counter
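To make the layout concrete, here is a small sketch that slices a 24-character ObjectId hex string according to the byte sizes listed above. ObjectIdParts is a hypothetical helper; in real code MongoDB.Bson.ObjectId already exposes a CreationTime property. Later server versions replaced the machine and process id bytes with a 5-byte random value, so treat everything after the timestamp as opaque.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: splits an ObjectId's hex form by the legacy layout (2 hex chars per byte).
public static class ObjectIdParts
{
    // The first 4 bytes are a big-endian count of seconds since the Unix epoch.
    public static long TimestampSeconds(string hex)
    {
        if (hex == null || hex.Length != 24)
            throw new ArgumentException("An ObjectId is 12 bytes, i.e. 24 hex characters.", nameof(hex));
        return long.Parse(hex.Substring(0, 8), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }

    public static DateTime CreationTimeUtc(string hex)
    {
        return new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddSeconds(TimestampSeconds(hex));
    }

    public static string MachineId(string hex) { return hex.Substring(8, 6); }  // 3 bytes
    public static string ProcessId(string hex) { return hex.Substring(14, 4); } // 2 bytes
    public static string Counter(string hex) { return hex.Substring(18, 6); }   // 3 bytes
}
```

For "507f1f77bcf86cd799439011", TimestampSeconds returns 1350508407, which is 2012-10-17 21:13:27 UTC.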

Shell
Note: This is not a comprehensive list of what MongoDB can do. These are just bits that I find interesting.

MongoDB does not have a query language like SQL. It has its own wire protocol with opcodes for the different operations, and the queries themselves are documents. The shell is basically a JavaScript interpreter.
Here is a rundown of some operators and tidbits about them.

$regex is the operator for passing in Perl-like (PCRE) regular expressions.
$exists is for existence checks.
$or is a prefix operator, unlike most of the other operators.
For querying arrays we can directly write property: value in the search criteria; it matches when the array contains that value.
The $all operator matches a property that contains all of the supplied values:
Property : {$all : [value1 , value2]}
$in matches when the field's value equals any of the values in the supplied array.
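The array rules above can be summarised as set checks. This is an in-memory illustration in C# (ArrayMatching is hypothetical), not a description of how the server evaluates queries. It also ignores edge cases such as an empty value list or nested arrays.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory illustration of the array matching rules.
public static class ArrayMatching
{
    // { tags: "mongodb" }: an array field matches when it contains the value.
    public static bool ContainsValue(IEnumerable<string> field, string value)
    {
        return field.Contains(value);
    }

    // { tags: { $all: [v1, v2] } }: every supplied value must be present.
    public static bool All(IEnumerable<string> field, IEnumerable<string> values)
    {
        var present = new HashSet<string>(field);
        return values.All(present.Contains);
    }

    // { tags: { $in: [v1, v2] } }: at least one supplied value must be present.
    public static bool In(IEnumerable<string> field, IEnumerable<string> values)
    {
        var present = new HashSet<string>(field);
        return values.Any(present.Contains);
    }
}
```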

Multiple filters on the same property must be in the same sub-document, for example { age: { $gt: 20, $lt: 30 } }. If you write { age: { $gt: 20 }, age: { $lt: 30 } } instead, the JavaScript parser keeps only the last literal, so your last filter wins and overrides everything else.

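// returns documents matching either clause (the field names are just examples)
db.users.find({ $or: [
                { name: "Ashutosh" },
                { reputation: { $gte: 100 } }
              ]
            })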

Upsert: This one is kind of unique and took me by surprise the first time (a pleasant one, mind you).
db.collection.update(<query>, <update>, <upsert>, <multi>)
Newer shells also accept an options document: db.collection.update(<query>, <update>, { upsert: <boolean>, multi: <boolean> })
From the docs, on the upsert parameter:

“Optional. If set to true, creates a new document when no document matches the query criteria. The default value is false, which does not insert a new document when no match is found. The syntax for this parameter depends on the MongoDB version. See Upsert Parameter.”

This is a rather useful option, one which I would dearly love SQL Server to have.

Quips and Quirks
The empty document selector {} matches every document in the collection, so it has the effect of selecting all of them.

By default, update modifies only the first document that matches the selector, not all of them. This holds even for the empty selector {}, so passing {} to update will not touch the whole collection. To affect all the matching documents, specify multi: true in the options of the update call.

db.superAwesome.update(
                     { name: "awesome" },
                     { $inc : { age: -1 } },
                     { multi: true }
                   )
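The interplay of upsert and multi is easier to see in a toy model. The sketch below is an in-memory imitation of the update semantics described above, written for illustration only. UpdateSemantics is hypothetical and is neither the driver nor the server. Documents are dictionaries, the query is a predicate and the update is a mutation. One simplification: on an upsert the real server also seeds the new document with the equality fields from the query, whereas here the update has to set them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of update(query, update, { upsert, multi }).
public static class UpdateSemantics
{
    // Returns the number of documents modified or inserted.
    public static int Update(
        List<Dictionary<string, object>> collection,
        Func<Dictionary<string, object>, bool> query,
        Action<Dictionary<string, object>> update,
        bool upsert = false,
        bool multi = false)
    {
        var matches = collection.Where(query).ToList();
        if (matches.Count == 0)
        {
            if (!upsert) return 0;
            // Upsert: nothing matched, so insert a new document and apply the update to it.
            var created = new Dictionary<string, object>();
            update(created);
            collection.Add(created);
            return 1;
        }

        // Without multi, only the first matching document is modified.
        var targets = multi ? matches : matches.Take(1).ToList();
        targets.ForEach(update);
        return targets.Count;
    }
}
```

With two "awesome" documents, an update without multi decrements only the first one's age, and multi: true decrements both. An update whose query matches nothing changes nothing unless upsert is set, in which case it adds one document.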

If you want to remove all the documents in a collection, use drop() instead of remove(). It is faster, since remove() deletes the documents one at a time. Further, remove() leaves the collection's metadata (indexes etc.) in place, whereas drop() removes all of that as well.
remove() is not isolated: a multi-document remove can interleave with other operations, although each individual document is removed atomically.

If you want to find out about the last error in the database, use the getLastError command through runCommand. It also gives information about the last write performed on the connection: its n property holds the number of documents affected.

getLastError
db.runCommand({ getLastError: 1 })

Entity Framework Code First

The ADO.NET team came out with its first offering for Code First development in the Entity Framework. This is an alternative to the existing Database-first and Model-first approaches. The suite now has three alternatives, and any of them can be chosen depending on the requirements.

Briefly explained below are the three approaches:

Database-first: Allows you to plug an existing database into the application. Any change to the database also changes the data access layer in the application.

Model-first: Allows you to create a data model (domain model) on the designer surface through entities and the relationships among them. A database can be generated from the model, and any changes in the model can be pushed down to the database by re-creating the database.

Code-first: Allows you to create a set of classes to model your business domain. These classes can be configured with attributes (data annotations) or in code, and the tables are created from them. Changes in these classes can trigger automatic changes in the database, which can be controlled by code.

Code-first borrows two important concepts from the dynamic language paradigm: Convention over Configuration and the fluent API. Before we dive into the topic, here are a few basic terms that will be used:

Entity: Any business object that needs to be mapped to persistent storage (e.g. sales representative, account, commission).

Mapping: The act of determining how entities and their relationships will be persisted in the database.

Relationship Mapping: The act of determining how the relationships (association, aggregation, composition) will be mapped to the database.

Association: The definition of the relationship between two entities (e.g. a foreign key).

Navigation Property: A property that allows you to traverse from one end of the association to the other. Navigation properties are optional, but they can only exist where there is an association.

Association Set: The logical grouping of association instances of the same type. It is not a data modeling construct.

Association End Multiplicity: The number of entities that can be at either end of the association: 1 (exactly one), 0..1 (zero or one) or * (zero, one or many).

The implementation is done as follows:

· When the multiplicity is singular (1), define a navigational property.

· When the multiplicity is plural (many) then use ICollection<T> of the Target entity involved.

Directionality of relationships: Relationships can be uni-directional or bi-directional. Uni-directional means that only one of the entities involved is aware of the other, i.e. the navigation property exists at only one end. Bi-directional means that both entities are aware of each other, and navigation properties exist at both ends.
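Here are the multiplicity and directionality rules expressed as POCO classes, using the sales representative and account examples from the terms above. The classes and property names are hypothetical. SalesRepresentative and Account form a bi-directional one-to-many relationship. Commission points at Account, but Account knows nothing about commissions, so that relationship is uni-directional. The SalesRepresentativeId and AccountId properties match the principal keys' names, which Code First's conventions pick up as foreign keys. Marking navigation properties virtual lets EF create lazy-loading proxies. The classes themselves need no EF reference.

```csharp
using System.Collections.Generic;

// Hypothetical entities. Bi-directional one-to-many: each side has a navigation property.
public class SalesRepresentative
{
    public int SalesRepresentativeId { get; set; }
    public string Name { get; set; }

    // Multiplicity "many": ICollection<T> of the target entity.
    public virtual ICollection<Account> Accounts { get; set; } = new List<Account>();
}

public class Account
{
    public int AccountId { get; set; }
    public string Number { get; set; }

    // Foreign key named after the principal's key property.
    public int SalesRepresentativeId { get; set; }

    // Multiplicity "1": a single reference navigation property.
    public virtual SalesRepresentative SalesRepresentative { get; set; }
}

// Uni-directional: Commission knows its Account, but Account has no Commissions collection.
public class Commission
{
    public int CommissionId { get; set; }
    public decimal Amount { get; set; }
    public int AccountId { get; set; }
    public virtual Account Account { get; set; }
}
```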

A Simple Model

First, add the Entity Framework package to the Visual Studio project. The new way to do this is through NuGet.

Figure 1 Adding Entity Framework Through NuGet Command Line

In the Package Manager Console (which is PowerShell), run the following:

PM> Install-Package EntityFramework

The package manager will look for the package in the official NuGet package source, download it and add a reference to it in your project. The final message will be similar to this (the version number can differ; the last part is your project name):

Successfully added ‘EntityFramework 4.1.10715.0’ to EFCodeFirst

The alternative is to use the Add Library Package Reference dialog.

Figure 2 Adding Entity Framework through NuGet Web Interface

Note: NuGet is not available for Visual Studio 2010 C# Express edition; there we will need the standalone installer for Entity Framework.

Now, create a simple POCO class:


using System;
using System.ComponentModel.DataAnnotations;

namespace EFCodeFirst
{
    public class UserDetail
    {
        // [Key] is not needed here: by convention EF treats UserDetailId as the primary key
        public int UserDetailId { get; set; }

        [Required]
        [MaxLength(10, ErrorMessage = "UserName must be 10 characters or less"), MinLength(5)]
        public String Name { get; set; }

        [Required]
        [MaxLength(20, ErrorMessage = "Password must be between 6 and 20 characters"), MinLength(6)]
        public String Password { get; set; }

        public String UserRole { get; set; }
        public DateTime DateOfCreation { get; set; }
    }
}

Firstly, convention over configuration gives us UserDetailId as the primary key: EF picks a property named Id, or the class name followed by Id, as the default primary key. Of course, if our requirements differ, we can use the [Key] data annotation to mark a property as the key explicitly. Similarly, foreign keys can also be marked in the POCO class.
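The key convention is simple enough to imitate with reflection. KeyConvention is a simplified, hypothetical sketch of the rule described above, not EF's implementation. It looks for a property named Id, then for one named after the class plus Id, both case-insensitively; this sketch prefers Id when both exist. It ignores [Key] and the fluent API, which EF lets override the convention. Customer and Order are hypothetical sample classes.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical imitation of the "Id or <ClassName>Id" key convention.
public static class KeyConvention
{
    public static PropertyInfo FindKey(Type entityType)
    {
        var props = entityType.GetProperties(BindingFlags.Public | BindingFlags.Instance);
        return props.FirstOrDefault(p => string.Equals(p.Name, "Id", StringComparison.OrdinalIgnoreCase))
            ?? props.FirstOrDefault(p => string.Equals(p.Name, entityType.Name + "Id", StringComparison.OrdinalIgnoreCase));
    }
}

// Hypothetical sample entities.
public class Customer
{
    public int CustomerId { get; set; }   // matches "<ClassName>Id"
    public string Name { get; set; }
}

public class Order
{
    public int Id { get; set; }           // matches "Id"
    public int CustomerId { get; set; }   // not a key for Order: its name does not match Order
}
```

FindKey(typeof(Customer)) returns CustomerId, FindKey(typeof(Order)) returns Id, and a type with neither property yields null.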

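The annotations on the properties let us follow DRY (Don't Repeat Yourself): the rules are declared once, on the model, and EF validates them before saving. Some of them are also pushed down to the tables in the database. [Required] makes the column NOT NULL and [MaxLength] sets the column length, while MinLength is only checked by EF's validation.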

Next, we will create a context for managing the entities.

using System.Data.Entity;

namespace EFCodeFirst
{
    public class MyDatabaseContext : DbContext
    {
        public MyDatabaseContext()
        {
            // Drop and re-create the database whenever the model changes (development only)
            Database.SetInitializer(new DropCreateDatabaseIfModelChanges<MyDatabaseContext>());
        }

        public DbSet<UserDetail> UserDetails { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            // Map the UserDetail entity to a table called "Users"
            modelBuilder.Entity<UserDetail>().ToTable("Users");
        }
    }
}

DropCreateDatabaseIfModelChanges is really handy in development, where the model often changes rapidly. It drops and re-creates the database whenever the model changes, which also deletes the data, so it is not something to use in production. To customize the database that the code creates, we can also specify the connection parameters through the Database.DefaultConnectionFactory property.

How does Entity Framework know that the model has changed?

· When we first use the entities, the framework creates the database (the default provider is SQL Server Express; we can change it to SQL Server Compact or other providers).

· When it creates the database (if this succeeds), it generates metadata for the model and stores it in the database.

· The next time, if we change something in the model (e.g. add a new entity, or modify properties or relationships), it compares the hash of the new model against the hash of the existing model.

· If the hashes do not match, it realizes that the model has changed, drops the database and re-creates it with the new model.

· If the creation of the database fails (e.g. a foreign key is not specified properly), no hash is generated for the model. We will have to drop the database manually and then re-run the code after fixing the errors.

Figure 3 EDM Meta Data Table
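The hash check described in the list above can be sketched as follows. This illustrates the idea only. EF computes its own hash over its full model description. The hypothetical ModelHash below instead hashes a simplified shape (entity name plus sorted public property names and types) with SHA-256. UserDetailV1 and UserDetailV2 are hypothetical before-and-after versions of one entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

// Hypothetical model hash: entity name -> public property names and types, hashed with SHA-256.
public static class ModelHash
{
    public static string Compute(IDictionary<string, Type> entities)
    {
        // Sort everything so the hash does not depend on declaration or insertion order.
        var shape = string.Join("|", entities
            .OrderBy(e => e.Key, StringComparer.Ordinal)
            .Select(e => e.Key + ":" + string.Join(",", e.Value
                .GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .OrderBy(p => p.Name, StringComparer.Ordinal)
                .Select(p => p.Name + " " + p.PropertyType.FullName))));

        using (var sha = SHA256.Create())
        {
            var bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(shape));
            return BitConverter.ToString(bytes).Replace("-", "");
        }
    }

    // A stored hash that differs from the current one means the model has changed.
    public static bool HasChanged(string storedHash, IDictionary<string, Type> entities)
    {
        return storedHash != Compute(entities);
    }
}

// Two hypothetical versions of the same entity, as if before and after adding a property.
public class UserDetailV1
{
    public int UserDetailId { get; set; }
    public string Name { get; set; }
}

public class UserDetailV2
{
    public int UserDetailId { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}
```

Hashing UserDetailV1 twice under the name "UserDetail" gives the same value. UserDetailV2, with one extra property under the same entity name, gives a different value. That difference is the signal an initializer like DropCreateDatabaseIfModelChanges acts on.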

DbSet represents the collection of entities of a given type in the context. The overridden OnModelCreating method is where the real control over the mapping resides. This is where we create our mappings, specify the foreign keys, map the POCO classes to tables, and so on.

In the above snippet the entity UserDetail has been mapped to a table called “Users”. The parameter of the ToTable(“TableName”) method is the name of the table in the database. This line of code borrows the concept of a fluent API (code that flows like method1().method2().method3()…) from the dynamic languages. A fluent API is much more than method chaining, though, and requires more thought in the construction of the API.
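To show what "more than method chaining" can mean, here is a hypothetical, stripped-down configuration builder. EntityConfig<T> is not EF's DbModelBuilder. Each method returns the builder so that calls chain. HasKey takes a lambda rather than a string, so a renamed property is caught by the compiler instead of at run time.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical fluent builder: each call records a setting and returns the same builder.
public class EntityConfig<T>
{
    public string TableName { get; private set; } = typeof(T).Name;
    public string KeyName { get; private set; }

    public EntityConfig<T> ToTable(string tableName)
    {
        TableName = tableName;
        return this;
    }

    // The lambda (u => u.UserDetailId) is read as an expression tree to get the property name.
    public EntityConfig<T> HasKey<TKey>(Expression<Func<T, TKey>> key)
    {
        var member = key.Body as MemberExpression;
        if (member == null)
            throw new ArgumentException("Expected a property access such as u => u.Id.", nameof(key));
        KeyName = member.Member.Name;
        return this;
    }
}
```

For example, new EntityConfig<UserDetail>().ToTable("Users").HasKey(u => u.UserDetailId) reads much like the EF line in the context above.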

The final thing would be to use this context and create an instance of the entity and save it to the database.


using System;
using System.Data.Entity.Validation;
using System.Linq;

namespace EFCodeFirst
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var context = new MyDatabaseContext())
            {
                try
                {
                    context.UserDetails.Add(new UserDetail
                    {
                        Name = "Ashutosh",
                        Password = "Shyamu",
                        UserRole = "Administrator",
                        DateOfCreation = DateTime.Now
                    });
                    // Validation runs here; nothing is written unless every entity is valid
                    context.SaveChanges();

                    var result = from u in context.UserDetails select u;
                    foreach (var temp in result)
                    {
                        Console.WriteLine(temp.UserDetailId + "\n" + temp.Name);
                    }
                }
                catch (DbEntityValidationException dbe)
                {
                    // One entry per invalid entity, each holding one or more property errors
                    foreach (var entityErrors in dbe.EntityValidationErrors)
                    {
                        Console.WriteLine("An error has occurred");
                        foreach (var error in entityErrors.ValidationErrors)
                        {
                            Console.WriteLine("Property :- " + error.PropertyName + "\nMessage :- " + error.ErrorMessage);
                        }
                    }
                }
            }
            Console.ReadKey();
        }
    }
}

We create an instance of the context and use it to add a new UserDetail. It is imperative to call the SaveChanges() method, since only then do the validations kick in and the entity get pushed down to the database. If any entity fails validation, SaveChanges() throws a DbEntityValidationException, and we see the error messages that were defined as annotations on the entity's properties.

When we run the code a new database will be created with the “Users” table.


Figure 4 Users Table

If we run a simple SQL SELECT query against the Users table, we will see the following.

Figure 5 an instance of UserDetail Entity in the Users Table

We did not specify the primary key when we initialized the object, yet a key was assigned. This is because, by default, EF configures an integer primary key as an identity (auto-increment) column. We can turn this off if our requirements differ, e.g. with [DatabaseGenerated(DatabaseGeneratedOption.None)].

Now, assume we had initialized the object as follows:

new UserDetail
{
    Name = "Ashu",          // only 4 characters: violates MinLength(5)
    Password = "Shyamu",
    UserRole = "Administrator",
    DateOfCreation = DateTime.Now
};


Figure 6 Exception caught when validation rules were violated

· The validation error would be caught, and we would see the message for the violated annotation on the Name property. Here that is the default MinLength message, since the custom ErrorMessage is only set on MaxLength.

· These validation errors can also be surfaced in a view in a web application or a Windows application.

· The validation messages can also be localized (through ErrorMessageResourceType and ErrorMessageResourceName), giving us messages in different languages as well.
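The same annotations can be exercised without a database. Below is a minimal sketch that uses the DataAnnotations Validator directly; UserDetailInput is a hypothetical copy of UserDetail's annotated properties. It shows that "Ashu" fails only the MinLength(5) rule, and that the message is the attribute's default one, because the custom ErrorMessage was set on MaxLength.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical copy of UserDetail's annotated properties, validated outside EF.
public class UserDetailInput
{
    [Required]
    [MaxLength(10, ErrorMessage = "UserName must be 10 characters or less"), MinLength(5)]
    public string Name { get; set; }

    [Required]
    [MaxLength(20, ErrorMessage = "Password must be between 6 and 20 characters"), MinLength(6)]
    public string Password { get; set; }
}

public static class UserDetailValidation
{
    public static List<ValidationResult> Validate(UserDetailInput input)
    {
        var results = new List<ValidationResult>();
        // validateAllProperties: true checks every attribute, not just [Required].
        Validator.TryValidateObject(input, new ValidationContext(input), results, validateAllProperties: true);
        return results;
    }
}
```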