Performance Benchmarking ElasticSearch (ELK) with Rally

Introduction

ELK (Elasticsearch, Logstash and Kibana) is one of the most widely used logging and monitoring solutions today, primarily because it is open source and offers a broad range of features, the latest being Elastic APM (Application Performance Monitoring).

For any product you set up and use in a landscape, performance decides its fate. A logging and monitoring solution that is slow or unstable will not last. That is why performance benchmarking is essential: it tells you whether your solution is set up with optimal configurations, and the results should reflect that.

Anyone who has used ELK in a large landscape knows that if settings such as shard size, thread pools, JVM options, index size and rollover are not tuned properly, the Elasticsearch stack can become a nightmare to manage and operate. So we need a tool that can run the benchmarks for us and certify how the stack is performing.

ES Rally


ES Rally is an open source benchmarking tool developed by Elastic. The Elasticsearch development team uses it to run their nightly benchmarking tests. It can help with the following tasks:

  • Setup and teardown of an Elasticsearch cluster for benchmarking
  • Management of benchmark data and specifications even across Elasticsearch versions
  • Running benchmarks and recording results
  • Finding performance problems by attaching so-called telemetry devices
  • Comparing performance results

It also has well-organised documentation, which helps a lot in understanding its features and how to use them.

Requirement

Benchmarking should be based on the use cases that have been implemented in ELK.

For me it was Elastic APM, so I wanted to benchmark the indexing of APM data and its search performance:

  1. APM data indexing performance
  2. APM data search performance
  3. Search error rate and exceptions

For me, the purpose of ES Rally was twofold:

  • Certify new ELK builds from a performance perspective
  • Compare performance before and after any configuration change made to Elasticsearch for optimization

Installing Rally

Rally installation depends on the OS flavour being used. It is not supported on Windows as of now, but Linux-based platforms are supported.

Setup is pretty straightforward. Follow the documentation to install and set up Rally:

https://esrally.readthedocs.io/en/stable/install.html

After installation, run the command below for a basic configuration. Options for advanced configuration are available as well.

esrally configure

Run the command below to validate that the setup works as expected:

esrally list tracks

Using Default Tracks

A “race” in Rally is the execution of a benchmarking experiment.

A “track” is a specification of one or more benchmarking scenarios with a specific document corpus. It defines for example the involved indices, data files and the operations that are invoked.

Rally ships with a few predefined tracks, which the esrally list tracks command above prints.

To start a race you have to define the track and challenge to run. For example:

esrally --distribution-version=6.0.0 --track=geopoint --challenge=append-fast-with-conflicts

For details on the predefined tracks, see the Track Reference in the Rally documentation.

Custom Track for indexing APM sample data

The Elastic team provides a custom track to index sample APM data, but it had issues with recent Elasticsearch versions. So I created a new custom track, using that track as a reference and fixing the compatibility issues.

It works with ES 7.6.x or later. The track downloads the pre-stored events from AWS S3 and indexes them into the referenced ES cluster.

However, if you already have APM event data at a different path, you can update the location in the track.json file.

To run this custom track, clone the Git repo into the .rally/benchmarks/tracks path where Rally is installed, then run:

esrally --track-path="/root/.rally/benchmarks/tracks/rally-apm-data" --target-hosts=<ES Node IP>:9200,<ES Node IP>:9200,<ES Node IP>:9200 --pipeline=benchmark-only --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'<username>',timeout:120,basic_auth_password:'<password>'" --user-tag="name-env:<Name>-<Env>"
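
The command above is long and easy to mistype, so it can help to assemble it in a script. Below is a minimal sketch, assuming a hypothetical helper named build_esrally_command; it only uses the flags that appear in this post. The host addresses and credentials are placeholders, and the optional subcommand argument is an assumption for Rally releases that expect an explicit subcommand such as race.

```python
import shlex


def build_esrally_command(track_path, hosts, user, password, user_tag,
                          challenge=None, track_params=None, subcommand=None):
    """Assemble the esrally argument list used in this post (hypothetical helper)."""
    client_options = (
        "use_ssl:true,verify_certs:false,"
        f"basic_auth_user:'{user}',timeout:120,"
        f"basic_auth_password:'{password}'"
    )
    cmd = ["esrally"]
    if subcommand:  # assumption: some Rally releases expect e.g. "race" here
        cmd.append(subcommand)
    cmd += [
        f"--track-path={track_path}",
        f"--target-hosts={','.join(hosts)}",
        "--pipeline=benchmark-only",
        f"--client-options={client_options}",
    ]
    if challenge:
        cmd.append(f"--challenge={challenge}")
    if track_params:
        cmd.append(f"--track-params={track_params}")
    cmd.append(f"--user-tag={user_tag}")
    return cmd


if __name__ == "__main__":
    cmd = build_esrally_command(
        track_path="/root/.rally/benchmarks/tracks/rally-apm-data",
        hosts=["10.0.0.1:9200", "10.0.0.2:9200"],  # placeholder addresses
        user="elastic",                             # placeholder credentials
        password="changeme",
        user_tag="name-env:demo-dev",
    )
    # Print a shell-quoted form; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting problems with the nested quotes in --client-options.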

This downloads the compressed data files from S3, decompresses them and then ingests them into the specified ES nodes. The sample data is from 2018.


Once the data is indexed, it can be verified by logging into Kibana and checking the APM indices: apm-span, apm-error and apm-transaction.

Fetching latest APM data for indexing

To index your own APM data, use the Python script fetch_data.py provided in the _tools folder.

Provide the relevant ES host details and update the document counts as required:

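from elasticsearch import Elasticsearch

# Connect to the source cluster; replace {host}, {user} and {password} with your own values.
es = Elasticsearch(["https://{host}:9200"], http_auth=('{user}', '{password}'), use_ssl=True, verify_certs=False)
# Number of documents to fetch per APM event type.
events = {"error": 1000000,
          "transaction": 2000000,
          "span": 4000000}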

This generates JSON event files, which then need to be compressed and uploaded to a shared location that the track can reference.

Note that once new data files are generated, the corresponding values need to be updated in track.json:

"document-count": 4000000,
"uncompressed-bytes": 4349496039,
"compressed-bytes": 253661240

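The three values above can be computed rather than typed by hand. The following is a minimal sketch, assuming the generated file holds one JSON document per line and that bz2 is used for compression; corpus_stats is a hypothetical helper name, not part of fetch_data.py.

```python
import bz2
import os
import shutil


def corpus_stats(json_path):
    """Compress a newline-delimited JSON file and return the sizes track.json needs."""
    compressed_path = json_path + ".bz2"  # assumption: bz2 compression
    with open(json_path, "rb") as src, bz2.open(compressed_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Count non-empty lines, assuming one document per line.
    with open(json_path, "rb") as f:
        document_count = sum(1 for line in f if line.strip())
    return {
        "document-count": document_count,
        "uncompressed-bytes": os.path.getsize(json_path),
        "compressed-bytes": os.path.getsize(compressed_path),
    }
```

Calling corpus_stats on each generated event file returns a dictionary whose keys match the track.json fields shown above.
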
Custom Track for search queries on APM data

To benchmark the cluster, we need to execute search queries against the Elasticsearch nodes and measure metrics such as latency and response time.

I created a custom track that executes search queries specific to APM data, covering two use cases:

1) Execute search queries on live APM data

2) Execute search queries on the sample data ingested in the previous section

The custom track has two challenges, one for each of these use cases, and the search queries can be modified as required in the operations/querying JSON files.

Create a file params-file.json to provide the runtime of the test in seconds.

The example below sets a runtime of 30 minutes:

{
"query_time_period": 1800
}
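
Since the value is given in seconds, a small script can generate the file from a duration in minutes. This is a minimal sketch; write_params_file is a hypothetical helper, and only the query_time_period key comes from the track.

```python
import json


def write_params_file(path, minutes):
    """Write a Rally track-params file with the runtime converted to seconds."""
    params = {"query_time_period": int(minutes * 60)}
    with open(path, "w") as f:
        json.dump(params, f, indent=2)
    return params


if __name__ == "__main__":
    # 30 minutes gives the same value as the example above.
    print(write_params_file("params-file.json", 30))
```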

Command to run the test on live data:

esrally --track-path=/root/.rally/benchmarks/tracks/rally-apm-search/eventdata --target-hosts=<ES Node IP>:9200,<ES Node IP>:9200,<ES Node IP>:9200 --pipeline=benchmark-only --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'<username>',timeout:120,basic_auth_password:'<password>'" --challenge=apm-search-queries-livedata --track-params=./params-file.json --user-tag="name-env:<Name>-<Env>"

Command to run the test on sample data:

esrally --track-path=/root/.rally/benchmarks/tracks/rally-apm-search/eventdata --target-hosts=<ES Node IP>:9200,<ES Node IP>:9200,<ES Node IP>:9200 --pipeline=benchmark-only --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'<username>',timeout:120,basic_auth_password:'<password>'" --challenge=apm-search-queries-sampledata --track-params=./params-file.json --user-tag="dc-env:<Name>-<Env>"

Explanation of the Search Challenge

Below is the challenge JSON, which shows the total number of clients and the interval between runs of each query.

These can be customized as required, and the search queries themselves can be updated in the operations/querying JSON files.

{
  "parallel": {
    "warmup-time-period": 0,
    "time-period": {{ p_query_time_period }},
    "clients": 8, //Number of parallel search threads
    "tasks": [
      {
        "operation": "search-services-1h-timestamp",
        "target-interval": 30 //time interval of query run in seconds
      },
      {
        "operation": "search-opbeans-transactions-1h-timestamp",
        "target-interval": 30
      },
      {
        "operation": "search-opbeans-spans-1h-timestamp",
        "target-interval": 30
      },
      {
        "operation": "search-spandetails-1h-timestamp",
        "target-interval": 30
      },
      {
        "operation": "search-services-24h-timestamp",
        "target-interval": 60
      },
      {
        "operation": "search-opbeans-transactions-24h-timestamp",
        "target-interval": 60
      },
      {
        "operation": "search-opbeans-spans-24h-timestamp",
        "target-interval": 60
      },
      {
        "operation": "search-spandetails-24h-timestamp",
        "target-interval": 60
      }
    ]
  }
}
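
To get a feel for the load this challenge generates, the intervals can be turned into a rough request count. The sketch below assumes each task is served by a single client that keeps to its schedule, so it issues one request every target-interval seconds for the whole time period; real runs can deviate if queries take longer than the interval.

```python
def expected_requests(time_period_s, tasks):
    """Rough request count per task: one request every target-interval seconds."""
    return {name: time_period_s // interval for name, interval in tasks.items()}


# Operation names and intervals copied from the challenge above.
tasks = {
    "search-services-1h-timestamp": 30,
    "search-opbeans-transactions-1h-timestamp": 30,
    "search-opbeans-spans-1h-timestamp": 30,
    "search-spandetails-1h-timestamp": 30,
    "search-services-24h-timestamp": 60,
    "search-opbeans-transactions-24h-timestamp": 60,
    "search-opbeans-spans-24h-timestamp": 60,
    "search-spandetails-24h-timestamp": 60,
}

counts = expected_requests(1800, tasks)  # 30 minute run
total = sum(counts.values())
print(total)  # 4 * 60 + 4 * 30 = 360
```

With the 30 minute runtime from params-file.json, this works out to roughly 360 queries in total, so the challenge models a light, steady dashboard-style load rather than a stress test.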

Result Analysis in Kibana

To export Rally metrics and results to Elasticsearch, configure Rally with the --advanced-config flag and provide the reporting datastore details:

esrally configure --advanced-config

Or edit the rally.ini file and update the reporting section:

[reporting]
datastore.type = elasticsearch
datastore.host = <ElasticNode IP>
datastore.port = 9200
datastore.secure = True
datastore.user = <Username>
datastore.ssl.verification_mode = none
datastore.password = <Password>
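
If several benchmark machines need the same settings, the reporting section can also be written with a script. This is a minimal sketch using Python's configparser; set_reporting_datastore is a hypothetical helper, the keys are the ones listed above, and the file location (commonly rally.ini under the .rally directory in the user's home) is an assumption to verify for your installation. Note that configparser drops any comments in the file when it rewrites it.

```python
import configparser


def set_reporting_datastore(ini_path, host, user, password, port=9200):
    """Point the [reporting] section of a rally.ini file at an Elasticsearch metrics store."""
    # interpolation=None keeps '%' characters in passwords intact.
    config = configparser.ConfigParser(interpolation=None)
    config.read(ini_path)
    if not config.has_section("reporting"):
        config.add_section("reporting")
    config["reporting"].update({
        "datastore.type": "elasticsearch",
        "datastore.host": host,
        "datastore.port": str(port),
        "datastore.secure": "True",
        "datastore.user": user,
        "datastore.ssl.verification_mode": "none",
        "datastore.password": password,
    })
    with open(ini_path, "w") as f:
        config.write(f)


# Example call (path and credentials are placeholders):
# set_reporting_datastore("/root/.rally/rally.ini", "10.0.0.5", "rally", "secret")
```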

At the end of a race, Rally stores all metrics records in its metrics store (ES in our case), in the indices rally-metrics-*. The key metrics are:

  • latency: Time period between submission of a request and receiving the complete response. It also includes wait time, i.e. the time the request spends waiting until it is ready to be serviced by Elasticsearch.
  • service_time: Time period between start of request processing and receiving the complete response. This metric can easily be mixed up with latency but does not include waiting time. This is what most load testing tools refer to as "latency".
  • throughput: Number of operations that Elasticsearch can perform within a certain time period, usually per second.
  • error rate: Percentage of requests that failed during execution.
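
The difference between latency and service_time is easiest to see with a small simulation. The sketch below is not Rally code; it assumes a single client that is scheduled to send one request every target_interval seconds and can only start a request once the previous one has finished. Latency is measured from the scheduled time, service time from the actual start.

```python
def simulate_single_client(target_interval, service_times):
    """Return service_time and latency per request for one client on a fixed schedule."""
    results = []
    client_free_at = 0.0
    for i, service_time in enumerate(service_times):
        scheduled_at = i * target_interval
        # The request waits if the client is still busy with the previous one.
        started_at = max(scheduled_at, client_free_at)
        finished_at = started_at + service_time
        client_free_at = finished_at
        results.append({
            "service_time": service_time,
            "latency": finished_at - scheduled_at,  # waiting time + service time
        })
    return results


# One request per second; the second request is slow and delays the ones after it.
for r in simulate_single_client(1.0, [0.5, 2.0, 0.5, 0.5]):
    print(r)
```

In this run the third and fourth requests each take 0.5 s to service, but their latency is 1.5 s and 1.0 s because they had to wait behind the slow second request. That waiting is what service_time hides and latency exposes.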

Rally does not provide a dashboard by default, so we built a custom dashboard to view Rally results in Kibana:

https://github.com/Abmun/rally-apm-search/blob/master/Rally-Results-Dashboard.ndjson

[Figure: Rally Kibana Dashboard]