Build a Scalable, Production-Ready Data Store with Bitnami's Cassandra Cluster

Vikram Vaswani


Apache Cassandra is an open source distributed database management system, providing high availability with no single point of failure. However, setting up a Cassandra cluster for use in a production environment, with all the attendant requirements of security, scalability and high availability, can be a complex task.

That's where the Cassandra Cluster by Bitnami comes in. This solution, now available in the Azure Marketplace and Google Cloud Console, lets you deploy a multi-node, production-ready Cassandra cluster in just a few clicks.

Topology

The Cassandra Cluster is configured following current best practices for security and scalability. The solution stores your data across a configurable number of Cassandra nodes with full replication support, providing a fault-tolerant, decentralized solution designed for zero downtime.

Deploying the Solution

To deploy the Cassandra Cluster, sign in to the Google Cloud Console or Azure Marketplace, choose the number of nodes you need and deploy the cluster. Once the cluster is successfully deployed, you can log in to the primary node via SSH (Microsoft Azure instructions and Google Cloud Platform instructions) and run the nodetool status command to check the status of each node in the cluster, as shown below:

[Image: Status]

This command will list the IP addresses and details of the other members of the cluster. "UN" status indicates that the node is "up and normal". If you see output similar to the image above, your cluster is good to go!
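If you want to check node status from a script rather than by eye, the output of nodetool status can be parsed. The following is a minimal sketch, not part of the Bitnami solution: the sample output is illustrative (the addresses, loads and host IDs are made up), and the function names node_states and unhealthy_nodes are hypothetical.

```python
import re

# Illustrative sample only: addresses, loads and host IDs below are made up.
SAMPLE_STATUS = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.4  103.52 KiB  256     66.7%             11111111-1111-1111-1111-111111111111  rack1
UN  10.0.0.5  98.11 KiB   256     65.9%             22222222-2222-2222-2222-222222222222  rack1
DN  10.0.0.6  101.70 KiB  256     67.4%             33333333-3333-3333-3333-333333333333  rack1
"""

# A node line starts with a two-letter code:
# first letter Up/Down, second letter Normal/Leaving/Joining/Moving.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_output):
    """Map each node address to its two-letter status code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line)
        if match:
            code, address = match.groups()
            states[address] = code
    return states


def unhealthy_nodes(status_output):
    """Return the sorted addresses of nodes that are not 'UN' (up and normal)."""
    states = node_states(status_output)
    return sorted(addr for addr, code in states.items() if code != "UN")


if __name__ == "__main__":
    print(node_states(SAMPLE_STATUS))
    print(unhealthy_nodes(SAMPLE_STATUS))
```

On a real node, you would pass in the text captured from running nodetool status instead of the sample string.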

Understanding the Default Network Configuration and Security

The default ports for Cassandra are 9042 (client port), 7000 (inter-node transport port) and 7199 (JMX port). For security reasons, these ports are not open for external connections by default. To allow external access to the cluster, you can use virtual network peering, an SSH tunnel or an IP address whitelist. Refer to our documentation (Microsoft Azure instructions and Google Cloud Platform instructions) for more information on these options.
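Once you have set up one of these access options, you may want to confirm which ports are actually reachable from your machine. The sketch below is one way to do that with plain TCP connection attempts; the helper names port_is_open and check_ports are hypothetical, and the 127.0.0.1 address is a placeholder that assumes an SSH tunnel forwarding the ports to your local machine.

```python
import socket

# Default Cassandra ports as listed in this article.
CASSANDRA_PORTS = {"client": 9042, "transport": 7000, "jmx": 7199}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports=CASSANDRA_PORTS, timeout=2.0):
    """Return a dict mapping each port name to whether it is reachable."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}


if __name__ == "__main__":
    # Placeholder address: replace with your node's address or tunnel endpoint.
    for name, reachable in check_ports("127.0.0.1").items():
        print(f"{name}: {'open' if reachable else 'closed'}")
```

A successful TCP connection only shows that the port is reachable. It does not confirm that Cassandra is the service answering on it.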

The main Cassandra configuration file is at /opt/bitnami/cassandra/conf/cassandra.yaml, and the Cassandra logs are stored at /opt/bitnami/cassandra/logs/cassandra.log.

By default, the Cassandra Cluster is configured with n1-standard-1 instances on Google Cloud Platform (1 vCPU, 3.75 GB RAM) and D1 v2 instances on Microsoft Azure (1 vCPU, 3.5 GB RAM). Of course, you can change these default instance types when deploying the solution, and you can also add nodes to the cluster later.

Understanding Data Replication

The Cassandra Cluster by Bitnami stores replicas on multiple nodes to ensure reliability and fault tolerance. It uses multiple virtual machines in a ring topology. This "masterless architecture" means that any node can accept any request (read, write, or delete) and route it to the correct node, even if the data is not stored on the node that received the request.

When you create a keyspace in the cluster, you can also specify the replication factor. For example, a replication factor of 2 means that two copies of each row are stored, each on a different node. If any of the cluster nodes fails, you can still access the information on the other nodes, so long as the keyspace was replicated.
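To make the idea of a replication factor concrete, here is a simplified model of replica placement on a ring. It is only a sketch under stated assumptions: each node gets a single token, tokens come from an MD5 hash rather than Cassandra's own partitioner, and replicas are simply the next nodes clockwise from the key's position. The Ring class and the node names are hypothetical.

```python
import bisect
import hashlib


def token_for(value):
    """Hash a string to an integer position on the ring (MD5, for illustration only)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: one token per node, replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.replication_factor = replication_factor
        # Sort nodes by token so that list order matches clockwise order.
        self.entries = sorted((token_for(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.entries]

    def replicas(self, partition_key):
        """Return the nodes that hold a copy of the row with this key."""
        # The first node whose token is >= the key's token owns the key;
        # the modulo wraps around the end of the ring.
        start = bisect.bisect_left(self.tokens, token_for(partition_key))
        count = len(self.entries)
        return [self.entries[(start + i) % count][1]
                for i in range(self.replication_factor)]

    def readable_after_failure(self, partition_key, failed_node):
        """True if at least one replica survives the failure of failed_node."""
        return any(node != failed_node for node in self.replicas(partition_key))


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
    print(ring.replicas("user:42"))
    print(ring.readable_after_failure("user:42", "node-a"))
```

With a replication factor of 2 in this model, every row remains readable after any single node failure, because its two copies always sit on different nodes.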

You can see Cassandra replication in action by connecting to any node of the cluster and creating a keyspace with replication factor 2. Create a table and insert some sample data. If you then connect to any other node of the cluster, you should see the same table and data already present on it.
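The steps above might look like the following CQL statements, generated here by a small Python helper so they can be pasted into cqlsh. This is a sketch only: the keyspace name demo, the users table and its columns are hypothetical, and the SimpleStrategy replication class is an assumption, since the article does not say which strategy to use.

```python
def write_statements(keyspace="demo", replication_factor=2):
    """CQL to run on the first node: create a keyspace, a table and one row."""
    # SimpleStrategy is an assumption; a multi-datacenter cluster would
    # typically use a different replication class.
    replication = "{'class': 'SimpleStrategy', 'replication_factor': %d}" % replication_factor
    return [
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {replication};",
        f"CREATE TABLE IF NOT EXISTS {keyspace}.users (id int PRIMARY KEY, name text);",
        f"INSERT INTO {keyspace}.users (id, name) VALUES (1, 'alice');",
    ]


def read_statement(keyspace="demo"):
    """CQL to run on a different node to confirm the data is visible there."""
    return f"SELECT id, name FROM {keyspace}.users;"


if __name__ == "__main__":
    print("-- Run on the first node:")
    for statement in write_statements():
        print(statement)
    print("-- Run on any other node:")
    print(read_statement())
```

If replication is working, the SELECT statement on the second node should return the row inserted on the first.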

Collecting Metrics

You can use the nodetool info command to keep track of server metrics, such as the load, cache hit rate and memory usage. Here's an example:

[Image: Server metrics]
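If you want to feed these metrics into your own monitoring scripts, the "key : value" lines printed by nodetool info are straightforward to parse. The sketch below assumes that layout; the sample text is illustrative (its values are made up and real output contains more fields), and the helper names are hypothetical.

```python
import re

# Illustrative sample only: values are made up and real output has more fields.
SAMPLE_INFO = """\
ID                     : 11111111-1111-1111-1111-111111111111
Gossip active          : true
Load                   : 103.52 KiB
Uptime (seconds)       : 3600
Heap Memory (MB)       : 250.00 / 1024.00
Data Center            : datacenter1
Rack                   : rack1
Key Cache              : entries 10, size 1 KiB, capacity 50 MiB, 20 hits, 30 requests, 0.667 recent hit rate, 14400 save period in seconds
"""


def parse_info(info_output):
    """Split each 'key : value' line into a dictionary entry."""
    metrics = {}
    for line in info_output.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            metrics[key.strip()] = value.strip()
    return metrics


def heap_memory_mb(metrics):
    """Return (used, total) heap memory in MB."""
    used, total = metrics["Heap Memory (MB)"].split("/")
    return float(used), float(total)


def key_cache_hit_rate(metrics):
    """Return the recent key cache hit rate, or None if it is not reported."""
    match = re.search(r"(\S+) recent hit rate", metrics["Key Cache"])
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    info = parse_info(SAMPLE_INFO)
    print(info["Load"])
    print(heap_memory_mb(info))
    print(key_cache_hit_rate(info))
```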

You can use nodetool tablestats to examine table and keyspace performance, including read and write counts, latency and space consumed, as shown below:

[Image: Table metrics]

For more information, you can also use nodetool tpstats for thread pool statistics and nodetool netstats for network performance data.

Sounds interesting? Get started by launching the Bitnami Cassandra Cluster using the links below, and then tweet @bitnami and tell us what you think!