Cassandra is a distributed storage system that draws on many recent advances in distributed storage, fault tolerance, replication, and failure detection. It stores data in a multi-dimensional map, following a strictly NoSQL approach. The values of the map are sets called column families, which come in two types: simple column families and super column families, where a super column family is a column family nested within a column family.
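A rough sketch of the two column-family shapes described above, using plain Python dictionaries. The layout and field names are illustrative only, not Cassandra's actual on-disk format or API:

```python
# Simple column family: row key -> {column name -> value}
simple_cf = {
    "user42": {"name": "Ada", "email": "ada@example.com"},
}

# Super column family: row key -> {super column -> {column -> value}},
# i.e. a column family nested within a column family.
super_cf = {
    "user42": {
        "address": {"city": "London", "zip": "EC1"},
        "work":    {"company": "Babbage & Co."},
    },
}
```

Lookups then take either two keys (row, column) or three (row, super column, column).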
Cassandra’s load balancing differs in some ways from other existing ring implementations such as Chord or Amazon’s Dynamo. Load balancing in Cassandra revolves around analyzing load information on the ring and moving lightly loaded nodes along the ring to alleviate the workload of heavily loaded nodes. The time taken to identify faults within the Cassandra ring is minimized using accrual failure detection. This technique calculates a suspicion level for the failure of each monitored node in the system. The basic idea is to express failure detection as a value that adjusts itself dynamically based on network and load conditions.
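The accrual idea can be sketched in a few lines. This is a simplified illustration, not Cassandra's implementation: it assumes heartbeat inter-arrival times are exponentially distributed with the observed mean, whereas real phi-accrual detectors typically fit a normal distribution over a sliding window.

```python
import math

class AccrualFailureDetector:
    """Sketch of accrual failure detection: instead of a binary alive/dead
    verdict, emit a suspicion level (phi) that grows the longer a heartbeat
    is overdue relative to past arrival history."""

    def __init__(self):
        self.intervals = []        # observed inter-arrival times (seconds)
        self.last_heartbeat = None

    def heartbeat(self, now):
        # Record the gap since the previous heartbeat from this node.
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        # Suspicion level: phi = -log10(P(heartbeat is merely late)).
        # Under slow networks or heavy load, observed intervals grow,
        # so the same silence yields a lower phi -- the threshold adapts.
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        p_later = math.exp(-elapsed / mean)  # chance the heartbeat is still coming
        return -math.log10(p_later) if p_later > 0 else float("inf")
```

The application then picks a phi threshold (rather than a fixed timeout) at which it treats the node as failed.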
The Cassandra system, albeit highly scalable and fault tolerant, has some drawbacks.
Load balancing, which involves splitting the keys, is an expensive process.
In a 150-node cluster holding 50TB+ of data, each node (assuming equal load) is responsible for roughly 333GB of data. When a new node is added, half of this must be transferred to the new node (~167GB). Depending on the current load on the node and on network usage, this may take considerable time.
(credit: Guru Prasad)
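A quick back-of-envelope check of that example. The cluster figures come from the text; the link speeds are hypothetical, and real transfers would be slower still because the source node keeps serving its regular load during the move:

```python
# Cluster figures from the example above.
total_data_gb = 50 * 1000            # 50 TB across the cluster
nodes = 150
per_node_gb = total_data_gb / nodes  # data owned by each node at equal load
to_move_gb = per_node_gb / 2         # half streams to the newly added node

# Time to stream that amount at a few assumed effective link speeds.
for gbit_per_s in (1.0, 0.5):
    gb_per_s = gbit_per_s / 8        # convert gigabits to gigabytes per second
    hours = to_move_gb / gb_per_s / 3600
    print(f"{gbit_per_s} Gbit/s -> {hours:.2f} h")
```

Even at a full, uncontended 1 Gbit/s this is a multi-hundred-gigabyte, sub-hour-to-hours operation per node added, which is why key-splitting is called out as expensive.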
BigTable is a non-relational, columnar, sparse, multi-dimensional key-value storage system for storing petabytes of data. Data is addressed in the form (row, column, timestamp); the timestamp allows multiple versions of the same row/column cell to coexist. Columns are grouped into column families: the set of column families is fixed, but a family can contain any number of columns.
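The (row, column, timestamp) map can be sketched as a small in-memory model. This is an illustration of the data model only, with made-up names, not BigTable's API or storage format:

```python
from collections import defaultdict

class SparseTable:
    """Sparse, multi-dimensional map: (row, column, timestamp) -> value.
    Cells are versioned by timestamp; reads return newest versions first."""

    def __init__(self):
        # cells[(row, "family:qualifier")] = {timestamp: value}
        self.cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        self.cells[(row, column)][timestamp] = value

    def get(self, row, column, n_versions=1):
        versions = self.cells.get((row, column), {})
        # Newest timestamps first, mirroring BigTable's versioned cells.
        newest = sorted(versions.items(), reverse=True)[:n_versions]
        return [value for _, value in newest]

    def gc_old_versions(self, keep=1):
        # BigTable can garbage-collect stale versions; here we keep
        # only the newest `keep` timestamps per cell (keep >= 1).
        for versions in self.cells.values():
            for ts in sorted(versions)[:-keep]:
                del versions[ts]
```

Sparsity falls out for free: a cell that was never written simply has no entry, and any column name under a known family can be used without schema changes.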
BigTable uses Chubby, a lock service based on Paxos, to store the root tablet location, schema information, bootstrap information, the set of live tablet servers, and authorization details. The master takes care of assigning tablets to tablet servers, balancing tablet-server load, and garbage collection of files in GFS. BigTable relies on GFS for durability. Clients communicate directly with the tablet servers, which makes the single-master distributed design effective. The Chubby lock service is used effectively: each tablet server holds a lock file with an expiry value, which the tablet server must refresh periodically.
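The lock-expiry mechanism is essentially a lease. A minimal sketch of the pattern, with illustrative names rather than Chubby's actual API:

```python
class Lease:
    """A lock that lapses unless its holder refreshes it in time."""

    def __init__(self, duration, now):
        self.duration = duration
        self.expires_at = now + duration

    def refresh(self, now):
        # Periodic renewal by the tablet server pushes the expiry forward.
        self.expires_at = now + self.duration

    def alive(self, now):
        # A lapsed lease is how the master concludes the tablet server
        # is dead and its tablets can safely be reassigned.
        return now < self.expires_at
```

The expiry is what makes the scheme safe: a partitioned or crashed tablet server stops renewing, its lock lapses, and the master can reassign its tablets without risking two servers serving the same tablet.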
Bigtable provides good control over data locality, and its data model can satisfy many application requirements. It is interesting to see that writes are faster than reads; random reads are especially affected, since an entire SSTable block has to be read from disk, which may hurt some application workloads. The inability to support transactions across row keys is a limitation.
The authors of BigTable set out to build a distributed storage solution for managing very large amounts of structured data, in particular for Google services such as web indexing, Google Earth, and Google Finance. The result, BigTable, is designed to scale to petabytes of data across thousands of servers while providing a simple data model that gives clients dynamic control over data layout and format.
Given the goals of this project at Google (wide applicability, scalability, high performance, and high availability), they chose not to follow the traditional relational data model, opting instead for a basic data model that gives clients the ability to control data layout and format. Clients can control the locality of their data through their schemas and can choose whether to serve data out of memory or from disk.
BigTable is indexed by row and column names, which are arbitrary strings, and treats the data itself as uninterpreted strings as well. The assumption is that clients serialize their structured data into strings for the keys and/or values in the tables. The authors also consider time an important attribute of the data: each cell in the table is timestamped, with the option of having BigTable remove old versions. They also use commit logs for reliability and recovery during failures. Many of the design choices (made initially or after revision) were in the spirit of keeping designs simple, which they claim was the most important lesson they learned from this project. Overall, even given the possible difficulty new users may have picking up the unusual interface, they demonstrate good performance in their evaluation and report a decent number of current users, indicating that the design works well in practice.