Worried about Big Data? Scale Up with Bitnami's Secure, Fault-Tolerant PostgreSQL Cluster

Vikram Vaswani

Vikram Vaswani


Share this article

PostgreSQL is a popular open source RDBMS with support for foreign keys, joins, views, triggers and stored procedures. To make it quick and efficient to use PostgreSQL in a production environment, Bitnami now lets you deploy a scalable PostgreSQL cluster with just a few clicks on Google Cloud Platform and Microsoft Azure.

Topology

Bitnami's PostgreSQL with Replication solution is pre-configured according to current best practices for security and scalability. It supports replication out of the box and uses multiple virtual machines to replicate data from the master node to a configurable number of slave nodes. It's also highly flexible - you can scale the cluster up or down by adding or removing nodes even after the initial deployment.

Deploying the Solution

To deploy the PostgreSQL cluster,browse to the Google Cloud Console or Azure Marketplace page for the solution, choose the number of nodes you need and deploy the cluster.

Once the cluster is successfully deployed, you can log in to the primary node via SSH (Microsoft Azure instructions and Google Cloud Platform instructions) and start the PostgreSQL command-line client. Enter your PostgreSQL administrative password when prompted and then run the query shown below:

$ psql -U postgres
psql (10.3)
Type "help" for help.
postgres=# SELECT client_addr, state FROM pg_stat_replication;

This query will list the IP addresses of the other members of the PostgreSQL cluster. If you see output similar to the image below, your cluster is good to go!

Cluster status

Understanding the Default Network Configuration and Security

The PostgreSQL cluster operates on the standard port 5432. For security reasons, this port is not open for external connections by default. To allow a client application running in a different network to use the cluster, you can use virtual network peering, an SSH tunnel or an IP address whitelist. Our documentation (Microsoft Azure instructions and Google Cloud Platform instructions) has more information on these options.

The PostgreSQL configuration settings are located in the /opt/bitnami/postgresql/conf/postgresql.conf file, while the client authentication configuration file is located at /opt/bitnami/postgresql/data/pg_hba.conf. Logs are stored in the /opt/bitnami/postgresql/logs/postgresql.log file.

By default, Bitnami's PostgreSQL with Replication solution is configured with n1-standard-2 instances on Google Cloud Platform (2 vCPU, 7.5 GB RAM), and D1 V2 instances on Microsoft Azure (1 vCPU, 3.5 GB RAM). Of course, you can change these default instance types when deploying the solution, and you can also add nodes to the cluster later (Microsoft Azure instructions and Google Cloud Platform instructions).

Understanding Data Replication

A key feature of Bitnami's PostgreSQL with Replication solution is that it comes pre-configured to provide a horizontally scalable and fault-tolerant deployment. When deploying the solution, you can specify the number of nodes you need. By default, the solution is configured with 3 nodes (one master and two slaves).

Data automatically replicates from the master node to all slave nodes. The master node receives all write operations, while the slave nodes repeat the operations performed by the master node on their own copies of the data set and are used for read operation. This model improves the overall performance of the solution. It also simplifies disaster recovery, because a copy of the data is maintained on each node in the cluster.

The PostgreSQL with Replication solution is configured to use streaming replication by default, which is asynchronous. This approach produces a more fault-tolerant deployment, as it keeps standby servers up-to-date and therefore results in a smaller window of disruption if the master node fails and needs to be replaced by a standby.

Collecting Metrics

PostgreSQL comes with a built-in statistics collection module, which you can use to dynamically track the performance of your cluster. A number of pre-defined views are available to track different metrics and indicators of server activity, including replication lag, index performance, transaction counts, user-defined function execution and more.

To illustrate, here's a simple query that returns counts of records scanned, returned, inserted and deleted from the database, to give you an overview of database activity:

postgres=# SELECT tup_fetched, tup_returned, tup_inserted, tup_deleted FROM pg_stat_database;

Metrics

If this sounds interesting to you, why not try it now? Launch the PostgreSQL with Replication solution now on Google Cloud Platform or Microsoft Azure, then tweet @bitnami and tell us what you think!