Using CockroachDB Dedicated With Starburst Galaxy
In this article, we're going to look at connecting Starburst Galaxy, a Trino SaaS service running with CockroachDB Dedicated.
Join the DZone community and get the full member experience.
Join For FreePrevious Articles
I've written about Presto and CockroachDB in the past, you may find the article below:
Data federation with CockroachDB and Presto
Motivation
I had a customer evaluate CockroachDB Serverless for their workload and one of the requirements was connecting it to Starburst Galaxy. In this article, we're going to discuss the challenges and the workarounds necessary to integrate Galaxy with Cockroach Cloud.
High-level Steps
- Start a 9-node multi-region cluster (CockroachDB Dedicated)
- Create a Galaxy Account
- Spin up a Trino cluster
- Conclusion
Step-by-Step Instructions
Start a 9-Node Multi-region Cluster (CockroachDB Dedicated)
I am using a multi-region CockroachDB Dedicated cluster spanning us-east-1, us-east-2 and us-west-2 regions in AWS. You can get a 30-day trial of CockroachDB Dedicated following this link.
While we're in the Cockroach Cloud console, let's capture the required info for the next step:
Click "Connect" > select "IP Allowlist" or "AWS PrivateLink" > select user, region and database > click "Next"
On the next page, select "Connection parameters"
The host displayed is pointing to a regional load-balanced endpoint. If this is a multi-region cluster, you need to repeat the step for each region where you want Galaxy to communicate with the CockroachDB cluster.
You can click "Back" once and from the "Region" drop-down, select the region of interest
Once you select the region, you can repeat the steps above to capture the connection parameters
Here's what the West 2 region connection parameters look like
Create a Galaxy Account
You can get a free Galaxy account using the following link. Once you sign up for an account and log in, you will be greeted with an option to create a catalogue.
We are going to use CockroachDB Dedicated for catalogue. Click on configure a catalog
button.
Select PostgreSQL
Here you will enter your CockroachDB Dedicated information
Enter your desired catalogue name and description > host, which is the host you captured in the previous step, i.e. artem-mr-7xw.aws-us-east-1.cockroachlabs.cloud
> port, i.e. 26257 > database name > user > password
Once complete, click "Test connection"
If the next step fails and you're greeted with an error like below
You must add the IP CIDR from Galaxy to the CockroachDB IP Allowlist
Once that's complete, click "Test connection" again
Click "Connect catalog"
On the next page, set access roles
Click "Save access control"
On the next page, you must choose an existing cluster or create a new cluster
I don't have an existing cluster and will create one
When you click "Add to cluster" you will get a pop up
I will click "Query my data" to start browsing my data from within Galaxy
At this point, you may see a spinning wheel. I prefer to filter my criteria and will select a specific schema in the query editor
At this point, you can query the data
At this point, you can create additional catalogues for each participating region in CockroachDB and have the associated Galaxy endpoint talk with the regional CockroachDB cluster.
Conclusion
In conclusion, I'd like to say this is a fairly straightforward setup except for one hang-up. Galaxy requires CIDR IP Allowlisted database services and it is a hard requirement which disqualifies our CockroachDB Serverless offering as there is currently no IP Allowlisting option available. Hopefully, you will find this tutorial useful.
Published at DZone with permission of Artem Ervits. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments