CockroachDB CDC With Hadoop Ozone S3 Gateway and Docker Compose - Part 4
This is the fourth tutorial post on CockroachDB and Docker Compose. Today, we'll evaluate the Hadoop Ozone object store for CockroachDB object-store sink viability.
Today, we're going to evaluate the Hadoop Ozone object store for CockroachDB object-store sink viability. A word of caution: this article only explores the art of the possible, so please use the ideas here at your own risk! First, some background: Hadoop Ozone is a new object store the Hadoop community is working on. It exposes an S3 API backed by HDFS and can scale to billions of files on-premises!
You can find the older posts here: Part 1, Part 2, Part 3.
- Information on CockroachDB can be found here.
- Information on Docker Compose can be found here.
- Information on Hadoop Ozone can be found here.
- Download ozone 0.4.1 distro
wget -O hadoop-ozone-0.4.1-alpha.tar.gz https://www-us.apache.org/dist/hadoop/ozone/ozone-0.4.1-alpha/hadoop-ozone-0.4.1-alpha.tar.gz
tar xvzf hadoop-ozone-0.4.1-alpha.tar.gz
- Modify the compose file for Ozone to include CRDB
cd ozone-0.4.1-alpha/compose
Notice the plethora of compose recipes available here!
We will focus on the ozones3 recipe, as we need the S3 gateway. As a homework exercise, try ozones3-haproxy once you're done with this tutorial; I can see a lot of interesting use cases with that!
cd ozones3
Edit the docker-compose.yaml file and add CockroachDB:
crdb:
  image: cockroachdb/cockroach:v21.2.3
  container_name: crdb-1
  ports:
    - "26257:26257"
    - "8080:8080"
  command: start-single-node --insecure
  volumes:
    - ${PWD}/cockroach-data/data:/cockroach/cockroach-data:rw
The whole docker-compose file should look like so now:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
  datanode:
    image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
    volumes:
      - ../..:/opt/hadoop
    ports:
      - 9864
    command: ["ozone","datanode"]
    env_file:
      - ./docker-config
  om:
    image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
    volumes:
      - ../..:/opt/hadoop
    ports:
      - 9874:9874
    environment:
      ENSURE_OM_INITIALIZED: /data/metadata/om/current/VERSION
    env_file:
      - ./docker-config
    command: ["ozone","om"]
  scm:
    image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
    volumes:
      - ../..:/opt/hadoop
    ports:
      - 9876:9876
    env_file:
      - ./docker-config
    environment:
      ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
    command: ["ozone","scm"]
  s3g:
    image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
    volumes:
      - ../..:/opt/hadoop
    ports:
      - 9878:9878
    env_file:
      - ./docker-config
    command: ["ozone","s3g"]
  crdb:
    image: cockroachdb/cockroach:v21.2.3
    container_name: crdb-1
    ports:
      - "26257:26257"
      - "8080:8080"
    command: start-single-node --insecure
    volumes:
      - ${PWD}/cockroach-data/data:/cockroach/cockroach-data:rw
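Before bringing the stack up, it's worth validating the edited file. A minimal sketch, assuming docker-compose is on your PATH and you are in the ozones3 directory:

```shell
# Sanity-check the edited docker-compose.yaml before starting the stack.
# If docker-compose is not installed, the check is skipped.
if command -v docker-compose >/dev/null 2>&1; then
  if docker-compose config --quiet; then
    compose_check="ok"
  else
    compose_check="invalid"
  fi
else
  compose_check="skipped: docker-compose not installed"
fi
echo "$compose_check"
```

`docker-compose config --quiet` only validates the configuration and prints nothing on success, so any YAML indentation mistakes introduced while adding the crdb service surface immediately.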
- Start docker-compose with CRDB and Ozone.
By default, Ozone starts with a single data node; we're going to start it with 3 data nodes at once.
docker-compose up -d --scale=datanode=3
Creating network "ozones3_default" with the default driver
Creating ozones3_s3g_1      ... done
Creating ozones3_om_1       ... done
Creating ozones3_datanode_1 ... done
Creating ozones3_datanode_2 ... done
Creating ozones3_datanode_3 ... done
Creating crdb-1             ... done
Creating ozones3_scm_1      ... done
- Check logs for om and s3g.
docker logs ozones3_s3g_1
docker logs ozones3_om_1
This is to make sure everything works and that both the S3 gateway and the Ozone Manager are up.
2020-01-06 16:30:42 INFO BaseHttpServer:207 - HTTP server of S3GATEWAY is listening at http://0.0.0.0:9878
2020-01-06 16:30:50 INFO BaseHttpServer:207 - HTTP server of OZONEMANAGER is listening at http://0.0.0.0:9874
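Instead of eyeballing the logs, you can poll the two HTTP endpoints from the log lines above until they answer. A small sketch, assuming curl is installed; the helper name wait_for_http is our own, not part of any tool:

```shell
# Poll an HTTP URL until it responds with a success status, or give up
# after a number of one-second retries (default 30).
wait_for_http() {
  url=$1
  retries=${2:-30}
  i=0
  while [ "$i" -lt "$retries" ]; do
    # -f makes curl fail on HTTP errors; -sS keeps it quiet except on error
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage once the compose stack is starting:
#   wait_for_http http://localhost:9878/ && echo "s3g is up"
#   wait_for_http http://localhost:9874/ && echo "om is up"
```

This is handy when scripting the tutorial end to end, since the gateway takes a few seconds to bind its port after `docker-compose up`.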
- Browse the UI.
Ozone exposes a few UIs via HTTP, specifically:
- HDFS Storage Container Manager: http://localhost:9876/#!/
- Gateway: http://localhost:9878/static/
After the bucket is created, you can browse to it:
http://localhost:9878/ozonebucket?browser
- Create a bucket.
aws s3api --endpoint http://localhost:9878/ create-bucket --bucket=ozonebucket
{ "Location": "http://localhost:9878/ozonebucket" }
- Upload a file to the bucket.
touch test
aws s3 --endpoint http://localhost:9878 cp test s3://ozonebucket/test
artem@Artems-MBP ozones3 % aws s3 --endpoint http://localhost:9878 cp test s3://ozonebucket/test
upload: ./test to s3://ozonebucket/test
You can browse the bucket using the UI; hit refresh if necessary.
http://localhost:9878/ozonebucket?browser
You can also use the AWS CLI:
aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket
artem@Artems-MBP ozones3 % aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket
2020-01-06 12:59:39          0 test
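To put a little more traffic through the gateway, you can loop a few uploads and list them back. A sketch, assuming the compose stack from this tutorial is up and the aws CLI is installed (it is skipped otherwise); the test-N.txt file names are just for illustration:

```shell
# Smoke-test the S3 gateway: upload a handful of small files, then list
# the bucket. Skipped when the aws CLI or the gateway is unavailable.
if command -v aws >/dev/null 2>&1 \
   && curl -fsS -o /dev/null http://localhost:9878/ 2>/dev/null; then
  for i in 1 2 3; do
    echo "payload $i" > "test-$i.txt"
    aws s3 --endpoint http://localhost:9878 cp "test-$i.txt" "s3://ozonebucket/test-$i.txt"
  done
  aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket
  smoke="ran"
else
  smoke="skipped"
fi
echo "$smoke"
```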
- Set up a changefeed in CRDB to point to Ozone.
The steps here are not much different from the Minio changefeed described in the previous post.
Access the cockroach CLI.
docker exec -it crdb-1 ./cockroach sql --insecure
SET CLUSTER SETTING cluster.organization = '<organization name>';
SET CLUSTER SETTING enterprise.license = '<secret>';
SET CLUSTER SETTING kv.rangefeed.enabled = true;
CREATE DATABASE cdc_demo;
SET DATABASE = cdc_demo;
CREATE TABLE office_dogs (
    id INT PRIMARY KEY,
    name STRING);
INSERT INTO office_dogs VALUES
   (1, 'Petee'),
   (2, 'Carl');
UPDATE office_dogs SET name = 'Petee H' WHERE id = 1;
- Create an Ozone-specific changefeed.
CREATE CHANGEFEED FOR TABLE office_dogs INTO 'experimental-s3://ozonebucket/dogs?AWS_ACCESS_KEY_ID=dummy&AWS_SECRET_ACCESS_KEY=dummy&AWS_ENDPOINT=http://ozones3_s3g_1:9878' with updated;
root@:26257/cdc_demo> CREATE CHANGEFEED FOR TABLE office_dogs INTO 'experimental-s3://ozonebucket/dogs?AWS_ACCESS_KEY_ID=dummy&AWS_SECRET_ACCESS_KEY=dummy&AWS_ENDPOINT=http://ozones3_s3g_1:9878' with updated;
        job_id
+--------------------+
  518597966522974209
(1 row)

Time: 20.3764ms
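You can also check on the changefeed job from outside the SQL shell by querying SHOW JOBS through docker exec. A sketch, assuming the crdb-1 container from this compose file is running (skipped otherwise):

```shell
# Inspect changefeed job status via the cockroach CLI inside the container.
if docker ps --format '{{.Names}}' 2>/dev/null | grep -q '^crdb-1$'; then
  docker exec crdb-1 ./cockroach sql --insecure \
    -e "SELECT job_id, job_type, status FROM [SHOW JOBS] WHERE job_type = 'CHANGEFEED';"
  jobs_check="ran"
else
  jobs_check="skipped"
fi
echo "$jobs_check"
```

A healthy changefeed reports a status of `running`; if the sink endpoint is unreachable you'll see it stuck in retries here.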
The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY parameters are set to dummy values just to make the changefeed work; Ozone needs to run in Kerberos mode to enforce real AWS secrets, see this.
At this point, go back to the S3 UI and make sure the dogs directory was created. Indeed, the directory is there, and if you browse to the farthest child directory, you will notice the JSON file.
Again, modifying the rows in the table will produce new files on the filesystem.
UPDATE office_dogs SET name = 'Beathoven' WHERE id = 1;
Clicking on the file will open a new browser tab with the following data:
{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}
We can also confirm the files are there with the CLI:
artem@Artems-MBP ozones3 % aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket/dogs/2020-01-06/
2020-01-06 13:05:45        191 202001061805395834869000000000000-aa12c96bd4b5919c-1-2-00000000-office_dogs-1.ndjson
2020-01-06 13:09:28         99 202001061808465775434000000000001-aa12c96bd4b5919c-1-2-00000001-office_dogs-1.ndjson
aws s3 cp --quiet --endpoint http://localhost:9878 s3://ozonebucket/dogs/2020-01-06/202001061808465775434000000000001-aa12c96bd4b5919c-1-2-00000001-office_dogs-1.ndjson /dev/stdout
{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}
Hope you enjoyed this tutorial and come back for more! Please share your feedback in the comments.
Published at DZone with permission of Artem Ervits. See the original article here.