NoSQL

Background

There has been a lot of debate about the advantages and disadvantages of non-relational NoSQL databases compared with more traditional relational database management systems (RDBMS) such as MySQL.

In brief, NoSQL databases have a less strict data model than relational databases and can provide a simple record storage mechanism, for example for applications whose data consists of key-value pairs. They are designed to work with large amounts of data and to scale easily, so that they can store millions of records.
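To make the key-value idea concrete, here is a minimal sketch in JavaScript. It is a toy in-memory store, not the API of any real NoSQL product; the class name KeyValueStore and the sample keys and records are made up for illustration. It shows that records are looked up by key and that two records need not share the same fields.

```javascript
// Toy in-memory key-value store: a sketch of the idea only,
// not the API of any real NoSQL database.
class KeyValueStore {
  constructor() {
    this.records = new Map(); // key -> value
  }

  // Store (or overwrite) the value held under a key.
  put(key, value) {
    this.records.set(key, value);
  }

  // Return the value for a key, or undefined if the key is unknown.
  get(key) {
    return this.records.get(key);
  }
}

const store = new KeyValueStore();

// Records under different keys do not have to share a schema.
store.put("user:1", { name: "Alan", email: "alan@example.com" });
store.put("user:2", { name: "Grace", languages: ["COBOL"] });

console.log(store.get("user:1"));
console.log(store.get("user:2"));
```

A relational table would need a fixed set of columns agreed up front; here each value is free to carry whatever fields it needs.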

Read about NoSQL on Wikipedia at

https://en.wikipedia.org/wiki/NoSQL

and some interesting articles at

http://slashdot.org/topic/bi/sql-vs-nosql-which-is-better/

http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772

Two solutions in particular are of interest:

  • mongoDB
  • Apache Cassandra

mongoDB

mongoDB (the name is derived from the term “humongous”) is an open-source document database (document store) and is often referred to as the most popular NoSQL database.

http://www.mongodb.org/

mongoDB stores structured data as BSON documents, which consist of an ordered list of elements. Each element has a field name, a type and a value. Field names are strings; the available value types include strings, integers and doubles, among others. BSON documents are designed to be efficient in both storage space and scan speed.

Here is an example document, written in mongo shell syntax:

var mydoc = {
    // ObjectId() and NumberLong() are helpers provided by the mongo shell
    _id: ObjectId("5099803df3f4948bd2f98391"),
    name: { first: "Alan", last: "Turing" },
    birth: new Date('Jun 23, 1912'),
    death: new Date('Jun 07, 1954'),
    contribs: [ "Turing machine", "Turing test", "Turingery" ],
    views: NumberLong(1250000)
};

http://docs.mongodb.org/manual/core/document/
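The "field name, type and value" structure can be approximated in plain JavaScript. The sketch below does not use BSON or any mongoDB driver; the helper describeElements is hypothetical and its type labels are informal names chosen for illustration, not official BSON type identifiers. It simply walks a document in field order and reports each element.

```javascript
// Describe each field of a plain JavaScript object as a
// (name, type, value) element, in the order the fields were defined.
// The type labels are informal, not official BSON type names.
function describeElements(doc) {
  return Object.entries(doc).map(([name, value]) => {
    let type;
    if (Array.isArray(value)) type = "array";
    else if (value instanceof Date) type = "date";
    else if (value === null) type = "null";
    else if (typeof value === "object") type = "embedded document";
    else type = typeof value; // "string", "number", "boolean", ...
    return { name, type, value };
  });
}

// A cut-down version of the example document, using only plain
// JavaScript values so that it runs outside the mongo shell.
const doc = {
  name: { first: "Alan", last: "Turing" },
  birth: new Date(Date.UTC(1912, 5, 23)), // months are zero-based: 5 = June
  contribs: ["Turing machine", "Turing test", "Turingery"],
  views: 1250000
};

for (const element of describeElements(doc)) {
  console.log(element.name, "->", element.type);
}
```

Note that a nested object such as name is itself a document, which is how mongoDB can hold structured records without a separate table for each level.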

I will be using this database for the Landscape project.

Apache Cassandra

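Apache Cassandra is an open-source distributed database management system. It provides a NoSQL solution whose data model builds on the key-value approach: rows are located by key, and each row holds a set of named columns (often described as a wide-column store).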

http://cassandra.apache.org/

Cassandra’s data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.
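The phrase "log-structured updates" can be illustrated with a toy example. The sketch below is only a picture of the general idea (writes are appended rather than changed in place, and a read keeps the newest value for each column); it is not how Cassandra is actually implemented, and the class AppendOnlyStore, its methods and the sample data are all invented for illustration.

```javascript
// Toy log-structured store: writes are appended, never updated in place.
// An illustration of the general idea only, not Cassandra's implementation.
class AppendOnlyStore {
  constructor() {
    this.log = []; // entries in arrival order
  }

  // Append a new version of a column value for a row key.
  write(rowKey, column, value, timestamp) {
    this.log.push({ rowKey, column, value, timestamp });
  }

  // Rebuild a row by keeping the newest value seen for each column.
  read(rowKey) {
    const newest = new Map(); // column -> newest log entry
    for (const entry of this.log) {
      if (entry.rowKey !== rowKey) continue;
      const current = newest.get(entry.column);
      if (!current || entry.timestamp >= current.timestamp) {
        newest.set(entry.column, entry);
      }
    }
    const row = {};
    for (const [column, entry] of newest) row[column] = entry.value;
    return row;
  }
}

const events = new AppendOnlyStore();
events.write("user:1", "name", "Alan", 1);
events.write("user:1", "city", "London", 2);
events.write("user:1", "city", "Manchester", 3); // an update is just another append

console.log(events.read("user:1"));
```

Because a write only appends, it does not need to find and modify an existing record first, which is the intuition behind the write performance claimed for this style of storage.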

According to Planet Cassandra, Cassandra is well suited to managing large amounts of structured, semi-structured, and unstructured data across multiple data centers and the cloud. It is said to deliver linear scalability and performance across many commodity servers with no single point of failure, and to provide a powerful dynamic data model designed for maximum flexibility and fast response times (ref: http://www.planetcassandra.org/).

Other Links

http://www.10gen.com/leading-nosql-database