Add Scalable, High-Performance Search to your App with the Bitnami Elasticsearch Cluster

Vikram Vaswani


The Bitnami Elasticsearch Cluster is now available in the Azure Marketplace and on Google Cloud Platform. This new solution lets you deploy Elasticsearch in a multi-node cluster that supports data replication and federated indexing, all in just a few clicks.

Cluster topology

What does this give you? In a word, scalability. With the Bitnami Elasticsearch Cluster, you can extend your application's real-time search capabilities to hundreds of thousands of documents without breaking a sweat. Add Bitnami's automatic updates and support for multiple cloud platforms, and you've got something that's ideal for high-traffic websites or mission-critical usage.

Deploying the Elasticsearch cluster

If you're new to Elasticsearch, why not try it out now? Sign in to the Google Cloud Console or Azure Marketplace, search for "bitnami elasticsearch cluster", choose the number of nodes you need (the default is 3) and deploy the cluster. Once the cluster is successfully deployed, you can log in to the primary node via SSH (Microsoft Azure instructions and Google Cloud Platform instructions) and run the command below to check that everything is working:

$ curl -XGET 'localhost:9200/_cat/health?v&pretty'

Cluster status

If the cluster status is "green", that means your cluster is in good health and you can start experimenting with it!
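
If you want to check the status from a script rather than by eye, it can help to ask for the status value alone. The sketch below assumes the cat API's h parameter for selecting columns; the names ES_HOST, cluster_status and cluster_is_green are illustrative helpers, not part of the Bitnami image.

```shell
# Hypothetical helper variable: where Elasticsearch is listening.
ES_HOST="${ES_HOST:-localhost:9200}"

# Print only the cluster status (green, yellow or red).
# h=status asks the cat API for the status column alone.
cluster_status() {
  curl -s "${ES_HOST}/_cat/health?h=status" | tr -d '[:space:]'
}

# Exit status is 0 only when the cluster reports green.
cluster_is_green() {
  [ "$(cluster_status)" = "green" ]
}
```

With these functions loaded in your shell, something like cluster_is_green && echo "ready" could gate the rest of a setup script.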

Experimenting with Elasticsearch

As you've seen, Elasticsearch provides a REST API for common operations. You can create indices, update documents, query data and perform other tasks by sending requests to the API with the correct HTTP verbs and JSON parameters. A good way to see this in action is by loading the cluster up with some data, then running sample queries on it with curl.

Several sample data sets are available for Elasticsearch; for this blog post, we'll use the Shakespeare data set, which consists of the complete works of William Shakespeare. In this data set, each line of dialogue from the Bard's plays is represented as a separate document, with metadata fields for the play name, speaker name, act and scene. In all, the data set contains more than 110,000 documents.

Here's an example of a document from the Shakespeare data set:

{
  "type": "line",
  "line_id": 8,
  "play_name": "Henry IV",
  "speech_number": 1,
  "line_number": "1.1.5",
  "speaker": "KING HENRY IV",
  "text_entry": "No more the thirsty entrance of this soil"
}

As stated in the official Elasticsearch documentation, your first step should be to create an index and specify a field mapping for these documents. Use the following command, which sets up the speaker and play_name fields as keyword fields:

$ curl -XPUT 'localhost:9200/shakespeare?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "doc": {
      "properties": {
        "speaker": {"type": "keyword"},
        "play_name": {"type": "keyword"}
      }
    }
  }
}'

If all goes well, Elasticsearch will respond with an acknowledgement and create an index named shakespeare. You can then download and import the data set (this may take some time):

$ wget https://download.elastic.co/demos/kibana/gettingstarted/shakespeare_6.0.json
$ curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json
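
The bulk endpoint expects newline-delimited JSON in which each document is preceded by an action line, and the file must end with a newline. Assuming the downloaded file follows that two-lines-per-document layout, a quick sanity check before (or after) importing might look like the sketch below. The helper name bulk_doc_count is illustrative.

```shell
# Estimate how many documents a bulk file holds, assuming every document
# is an action line followed by a source line (two lines per document).
bulk_doc_count() {
  lines=$(wc -l < "$1")
  echo $(( lines / 2 ))
}
```

Running head -n 2 shakespeare_6.0.json shows the first action/source pair, and bulk_doc_count shakespeare_6.0.json gives a number you can later compare against the document count reported by the index.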

You can verify that the documents were added to the index by checking index status:

$ curl -XGET 'localhost:9200/_cat/indices?v&pretty'

Here's the output:

Cluster data
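
If you only need the document count, for example to confirm that the import finished, you can narrow the same cat endpoint to one index and one column. This is a minimal sketch that assumes the docs.count column of the cat indices API; ES_HOST and index_doc_count are illustrative names.

```shell
# Hypothetical helper variable: where Elasticsearch is listening.
ES_HOST="${ES_HOST:-localhost:9200}"

# Print the number of documents in the index named by "$1".
index_doc_count() {
  curl -s "${ES_HOST}/_cat/indices/$1?h=docs.count" | tr -d '[:space:]'
}
```

Calling index_doc_count shakespeare should then print a single number.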

Once your data is indexed, you can start running queries on it. For example, let's say you want to identify this famous quote fragment using full-text search (can you identify the play faster than Elasticsearch?):

$ curl -XPOST 'localhost:9200/shakespeare/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "text_entry": {
        "query": "by any other name",
        "operator": "and"
      }
    }
  }
}'

Full-text query
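
In the query above, "operator": "and" tells Elasticsearch to return only lines that contain every word of the fragment; the default, "or", would match lines containing any one of them. If you plan to try several fragments, a small wrapper can save typing. The sketch below is one way to do it. It also adds a _source filter so that each hit shows only the fields needed to identify the quote. ES_HOST and search_lines are illustrative names.

```shell
# Hypothetical helper variable: where Elasticsearch is listening.
ES_HOST="${ES_HOST:-localhost:9200}"

# Search text_entry for lines that contain every word in "$1".
# Caveat: "$1" is pasted into the JSON as-is, so it must not contain
# double quotes or backslashes.
search_lines() {
  body=$(printf '{"_source":["play_name","speaker","text_entry"],"query":{"match":{"text_entry":{"query":"%s","operator":"and"}}}}' "$1")
  curl -s -XPOST "${ES_HOST}/shakespeare/_search?pretty" \
    -H 'Content-Type: application/json' -d "$body"
}
```

For example, search_lines "by any other name" sends the same match query as the command above.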

"Hamlet" is well known for its soliloquies, so it should come as no surprise that it's the longest play. Most of the lines are spoken by the lead character… but do you know exactly how many? Or who the second most talkative character is? Elasticsearch can answer these questions with an aggregation:

$ curl -XPOST 'localhost:9200/shakespeare/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "match": {
      "play_name": "Hamlet"
    }
  },
  "aggs": {
    "speakers": {
      "terms": {
        "field": "speaker"
      }
    }
  }
}'

Aggregation
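
A terms aggregation returns its buckets sorted by document count, and by default only the top 10. To answer "who is the second most talkative character?" you need only the first two buckets, which the size option inside terms controls. Here is a minimal sketch of a reusable version; ES_HOST and top_speakers are illustrative names.

```shell
# Hypothetical helper variable: where Elasticsearch is listening.
ES_HOST="${ES_HOST:-localhost:9200}"

# List the speakers with the most lines in play "$1".
# "$2" is the number of buckets to return (10 if omitted).
# Caveat: "$1" is pasted into the JSON as-is, so avoid double quotes.
top_speakers() {
  body=$(printf '{"size":0,"query":{"match":{"play_name":"%s"}},"aggs":{"speakers":{"terms":{"field":"speaker","size":%d}}}}' "$1" "${2:-10}")
  curl -s -XPOST "${ES_HOST}/shakespeare/_search?pretty" \
    -H 'Content-Type: application/json' -d "$body"
}
```

For example, top_speakers Hamlet 2 asks for just the two most talkative characters.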

Testing redundancy and retrieving node information

Data in the Bitnami Elasticsearch Cluster is replicated across nodes, so that if a single node fails, the cluster will continue to operate normally. To see this in action, turn off one of the nodes in the cluster (not the one you are logged in to) using your cloud control panel and then run the query below, which counts the lines in which Shakespeare used the insult "knave". Even though one of the nodes is not operational, you should still see an accurate result:

$ curl -XPOST 'localhost:9200/shakespeare/_count?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "text_entry": "knave"
    }
  }
}'

Query result with one node down
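
While a node is down, the shard copies it held are no longer available, and the cluster status will typically drop from "green" to "yellow" until those copies are assigned elsewhere or the node returns. One way to watch this, sketched here under the assumption that the cat shards API and its state column are available, is to count the shards reported as UNASSIGNED. ES_HOST and unassigned_shards are illustrative names.

```shell
# Hypothetical helper variable: where Elasticsearch is listening.
ES_HOST="${ES_HOST:-localhost:9200}"

# Print how many shards of index "$1" are currently unassigned.
# grep -c exits non-zero when the count is 0, hence the "|| true".
unassigned_shards() {
  curl -s "${ES_HOST}/_cat/shards/$1?h=state" | grep -c 'UNASSIGNED' || true
}
```

Running unassigned_shards shakespeare before and after stopping a node shows whether any copies are waiting to be reassigned.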

You can obtain details of the nodes in the cluster using the command below. The response includes each node's name, IP address, host name, index buffer and roles, along with operating system and process information:

$ curl -XGET 'localhost:9200/_nodes?pretty'

Here's a sample of the output:

Cluster node information

You can also use the Elasticsearch API to keep track of cluster statistics, such as index sizes, cache operations, memory usage and load, search times and more. Use the following command to see available metrics:

$ curl -XGET 'localhost:9200/_nodes/stats?pretty'

Here's a sample of the output:

Cluster metrics
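
The full statistics response is long, so in practice you may want to request a single metric group and trim the response. The sketch below assumes the jvm metric group of the node stats API and the generic filter_path response filter; ES_HOST and node_heap_usage are illustrative names.

```shell
# Hypothetical helper variable: where Elasticsearch is listening.
ES_HOST="${ES_HOST:-localhost:9200}"

# Show each node's name and JVM heap usage (as a percentage).
# filter_path trims the JSON response down to the listed fields.
node_heap_usage() {
  curl -s "${ES_HOST}/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent&pretty"
}
```

Calling node_heap_usage should return a compact JSON object with one entry per node.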

Of course, these examples are just the tip of the iceberg. Elasticsearch lets you run many different types of search queries on your documents and gain new actionable insights from them. And the Bitnami Elasticsearch Cluster gives you a solid, secure and scalable foundation on which to run it all.

Find out more about configuring and using the Bitnami Elasticsearch Cluster in our docs and tweet @bitnami to tell us what you liked (or didn't like) about it!