How to Migrate ElasticSearch Data Using Logstash

Learn how to migrate a data cluster in ElasticSearch with a new method for purposes like data backup during a system upgrade.

Leona Zhang

Jun. 08, 18 · Tutorial

Engineers often need to migrate data in ElasticSearch, for example to back up data or to upgrade a system. There are as many methods as there are reasons to perform the migration: you can use elasticsearch-dump, snapshots, or the reindex API. In this article, we will introduce a new method that quickly migrates an ElasticSearch cluster using Logstash.

I hope this explanation helps you understand the theory behind using Logstash to migrate data. In essence, the operation uses Logstash to read data from the source ElasticSearch cluster and then write it into the target ElasticSearch cluster. The following section outlines the exact steps.

Steps to Migrate ElasticSearch Using Logstash

Step 1: Create a data sync conf file in the Logstash directory

vim ./logstash-5.5.3/es-es.conf

Step 2: Ensure identical names. When configuring the conf file, make sure the index name in the output matches the index name in the input, so that the data lands in an identically named index in the target cluster. Refer to the configuration below.

input {
    elasticsearch {
        hosts => ["********your host**********"]
        user => "*******"
        password => "*********"
        index => "logstash-2017.11.07"
        size => 1000        # number of documents fetched per scroll batch
        scroll => "1m"      # how long each scroll context is kept alive
    }
}
# the filter section is optional
filter {
}
output {
    elasticsearch {
        hosts => ["***********your host**************"]
        user => "********"
        password => "**********"
        index => "logstash-2017.11.07"
    }
}

Step 3: Run Logstash. Once you have configured the conf file, run Logstash:

bin/logstash -f es-es.conf

Sometimes running this command generates the following error message:

[FATAL][logstash.runner] Logstash could not be started because there is already another instance using the configured data directory. If you wish to run multiple instances, you must change the "path.data" setting.

This is because Logstash does not allow multiple instances to share the same path.data. Therefore, when you start it, add "--path.data PATH" to the command to give each instance its own data directory.

bin/logstash -f es-es.conf --path.data ./logs/
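The two commands above can be combined with a configuration check, so that a typo in the conf file is caught before any data is read. The following is a minimal sketch, not part of the original procedure: `run_migration` and the `LOGSTASH_BIN` variable are hypothetical names introduced here, and the sketch assumes your Logstash version supports the `--config.test_and_exit` flag.

```shell
# Hypothetical helper, not part of Logstash: validate a pipeline file, then run it.
# LOGSTASH_BIN is an assumed variable; it defaults to the relative path used above.
run_migration() {
    conf="$1"        # pipeline file, for example es-es.conf
    data_dir="$2"    # data directory reserved for this instance
    logstash_bin="${LOGSTASH_BIN:-bin/logstash}"

    # --config.test_and_exit only parses the configuration and reports errors
    "$logstash_bin" -f "$conf" --config.test_and_exit || return 1

    # a separate --path.data avoids the "another instance" error shown above
    "$logstash_bin" -f "$conf" --path.data "$data_dir"
}
```

From the Logstash directory, you would call it as `run_migration es-es.conf ./logs/`.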

If all goes as intended, you can use the following command to view the corresponding index in the target ElasticSearch cluster:

curl -u username:password host:port/_cat/indices
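Listing the index shows that it exists, but not that every document arrived. One way to check is to compare document counts in both clusters with the `_cat/count` API. The following is a rough sketch under stated assumptions: `SRC_URL`, `DST_URL`, `ES_USER`, and `ES_PASS` are placeholder variables introduced here, both clusters are assumed to accept the same credentials, and the function names are hypothetical.

```shell
# Assumed placeholders (not from the article): SRC_URL, DST_URL, ES_USER, ES_PASS.
doc_count() {
    # $1 = cluster URL, $2 = index name
    # _cat/count with h=count prints only the document count column
    curl -s -u "$ES_USER:$ES_PASS" "$1/_cat/count/$2?h=count" | tr -d '[:space:]'
}

compare_counts() {
    index="$1"
    src=$(doc_count "$SRC_URL" "$index")
    dst=$(doc_count "$DST_URL" "$index")
    # an empty source count means the request failed, so treat it as a mismatch
    if [ -n "$src" ] && [ "$src" = "$dst" ]; then
        echo "OK $index: $src documents in both clusters"
    else
        echo "MISMATCH $index: source=$src target=$dst"
        return 1
    fi
}
```

For example, `compare_counts logstash-2017.11.07` checks the index used above. The target count can lag briefly until recently written documents become searchable, so a mismatch right after the run may resolve on a second check.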

Let us now look at a sample use case.

Migrating ElasticSearch Data Using Logstash Sample Use Case

Many customers who run their own self-built ElasticSearch clusters have been paying close attention to Alibaba Cloud ElasticSearch. They want to use it but have difficulty migrating their data from their own ElasticSearch to Alibaba Cloud ElasticSearch. The following explains how to use Logstash to quickly migrate index data from a self-built ElasticSearch cluster to the cloud.

The logic behind this solution is quite simple: in principle, you could configure a separate es-to-es conf file for each index. However, this is a cumbersome process. Logstash lets you do the same job with a single configuration. Before I explain how, let me introduce three core concepts of Logstash.

Metadata: The concept of metadata was introduced in Logstash 1.5. The @metadata field holds information that describes an event, and you can read and change it whenever you need to. However, it is not written to the output, so it does not affect the event results. The metadata survives throughout the entire operation of the input, filter, and output plug-ins. The Logstash documentation on the @metadata field covers this in more detail.
Docinfo: A parameter in the ElasticSearch input plug-in which is set to false by default. The description on the official website is, "If set, include ElasticSearch document information such as index, type, and the id in the event." That means once this parameter is set to true, the metadata records the index, type, and id information as [@metadata][_index], [@metadata][_type], and [@metadata][_id]. This also means that you can use the index, type, and id values at any point in the entire lifecycle of the event.
*: The index parameter in the ElasticSearch input plug-in supports the wildcard character "*" to match all indexes.

Because of the way metadata works, the output can "inherit" the index and type information from the input. This lets you create index, type, and id information in the target cluster that is identical to that in the source cluster.

If at any point in the process you want to see and debug the metadata information, add the following setting to the output:

stdout { codec => rubydebug { metadata => true } }

Use the following configuration code:

input {
    elasticsearch {
        hosts => ["yourhost"]
        user => "**********"
        password => "*********"
        index => "*"        # the wildcard makes the input read every index
        size => 1000
        scroll => "1m"
        codec => "json"
        docinfo => true     # store _index, _type, and _id under [@metadata]
    }
}
# the filter section is optional
filter {
}

output {
    elasticsearch {
        hosts => ["yourhost"]
        user => "********"
        password => "********"
        index => "%{[@metadata][_index]}"
        # reuse the source type and id so documents keep their identity
        document_type => "%{[@metadata][_type]}"
        document_id => "%{[@metadata][_id]}"
    }
    stdout { codec => rubydebug { metadata => true } }
}

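After you run the command, Logstash creates an index in the target cluster for each index it reads from the source cluster, then gradually migrates the data inside the indexes. Note that Logstash copies documents, not index settings or mappings: the target cluster infers the mappings dynamically unless the indexes or matching index templates already exist there.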
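To confirm that every source index now exists in the target cluster, you can compare the index lists returned by `_cat/indices`. This is a minimal sketch rather than a definitive check: `SRC_URL`, `DST_URL`, `ES_USER`, and `ES_PASS` are placeholder variables introduced here, and `list_indices` and `missing_indices` are hypothetical helper names.

```shell
# Assumed placeholders (not from the article): SRC_URL, DST_URL, ES_USER, ES_PASS.
list_indices() {
    # h=index restricts the _cat/indices output to the index name column
    curl -s -u "$ES_USER:$ES_PASS" "$1/_cat/indices?h=index" | tr -d ' ' | sort
}

missing_indices() {
    src_list=$(list_indices "$SRC_URL")
    dst_list=$(list_indices "$DST_URL")
    for index in $src_list; do
        # -x matches the whole line, -F treats the name literally, -q suppresses output
        printf '%s\n' "$dst_list" | grep -qxF -- "$index" || echo "$index"
    done
}
```

Running `missing_indices` prints one line per source index that is absent from the target. No output means every index name was found.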

When you run the migration for real, I recommend deleting the following setting from the output so that your screen is not flooded with metadata information:

stdout { codec => rubydebug { metadata => true } }

Conclusion

I hope this article helped you understand how you can migrate ElasticSearch data using Logstash. I have also described the core concepts of Logstash that you should be aware of before you start the migration process.

Published at DZone with permission of Leona Zhang. See the original article here.
