Integrating Salesforce With Google BigQuery for Cortex Framework Deployment
Step-by-step process of connecting your Salesforce instance with Google BigQuery using Cloud Composer DAGs that are provided by Google Cortex Framework.
In this document, I am going to put together a step-by-step process of connecting your Salesforce instance to Google BigQuery using the Cloud Composer DAGs provided by the Google Cortex Framework.
Steps To Be Performed on the Salesforce Account
For this setup, I’ve used a free Salesforce Developer account. You can get one by signing up on the Salesforce Developer site.
1. Create a Profile in Salesforce
Create a profile that has the following permissions:
- Apex REST Services and API Enabled.
- View All permissions for all objects that you would like to replicate.
- Ideally, no permissions related to user interface login.
For this setup, I am using the profile System Administrator. The recommended approach is to create a profile specifically for this task with the above-mentioned permissions. Below is how and where you can create and update the permissions.
Log in to your Salesforce account with a user that has permission to create profiles, then go to Setup. In the left-hand menu, go to Administration → Users → Profiles.
Permissions on the objects you want to replicate can be managed under Field-Level Security.
2. Create or Use an Existing Salesforce User
The next step is to create or use an existing user in Salesforce and assign it the profile created in Step #1. Note down the following details:
- Username
- Password
- Security Token
Caution: If you don’t have the security token, you can use the steps below to generate it. However, any existing applications that use the current security token for authentication will need to be updated with the new token. If you are on a production system, please confirm with your administrator before doing this!
- Profile Avatar → Settings → My Personal Information → Reset My Security Token
3. Create a Connected App
The next step is to create a Connected App, which will be used for establishing the connection to the outside world, i.e., the Cortex Framework’s SFDC → RAW ingestion module. This will be done with the profile and user ID you created in the above steps.
- Setup → App Manager → New Connected App → Provide the App Name (in this case GCPBQ)
- In the API (Enable OAuth Settings) → Select Oauth Scopes (In this case Full Access) → Callback URL (In this case OAuth/callback.)
- Enable the checkboxes ‘Require Secret for Web Server Flow’ and ‘Require Secret for Refresh Token Flow.’
- You will need the Consumer Key/Client ID in later steps. To retrieve it, click ‘Manage Consumer Details’ → a new page will open and ask for a verification code, which will be sent to your email address.
4. Assign the Connected App
The last step here is to assign the connected app you created above to the profile created in Step #1.
- Setup → Profile → Edit → Connected App Access → checkbox your connected app (in this case GCPBQ)
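Before moving on to the Google Cloud side, you can optionally sanity-check the user, Connected App, and security token with Salesforce’s OAuth username-password flow. All values below are placeholders; note that the security token is appended directly to the password, and your org’s OAuth policies must allow this flow:

# Sanity check: request an OAuth token using the Connected App credentials.
curl -s https://login.salesforce.com/services/oauth2/token \
  -d "grant_type=password" \
  -d "client_id=<consumer key>" \
  -d "client_secret=<consumer secret>" \
  -d "username=<username>" \
  -d "password=<password><security token>"

A JSON response containing an access_token indicates the credentials and Connected App are working.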
Steps To Be Performed on Your Google Cloud Account
1. Deploy the Cortex Framework for Salesforce Data
Deploy the Cortex Framework for Salesforce data with Test Data set to false. Follow the steps mentioned in the readme of this git repo. This will place the generated DAGs in the GCS bucket you specified in config.json. These DAGs will later be used for extracting data from your Salesforce instance to your BigQuery ‘RAW’ dataset.
Make sure to deploy to a target bucket that is NOT your Cloud Composer DAG bucket. This way, you can inspect the generated DAGs to ensure correctness before they actually get executed.
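At a high level, the deployment looks roughly like the sketch below. Treat it as illustrative only; file locations, flags, and substitutions can differ between Cortex releases, so the repo’s README is the source of truth:

# Clone the Cortex Data Foundation repo and configure it.
git clone https://github.com/GoogleCloudPlatform/cortex-data-foundation.git
cd cortex-data-foundation
# Edit the config.json described in the README: set the test-data flag to false,
# enable the SFDC workload, and point the target bucket at a bucket that is NOT
# your Composer DAG bucket.
# Kick off the deployment via Cloud Build (project and log bucket are placeholders).
gcloud builds submit . --project <source project id> \
  --substitutions=_GCS_BUCKET=<bucket for deployment logs>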
2. Create a Cloud Composer Instance
After creating the Cloud Composer environment, install the dependencies Salesforce requires so that the Python scripts in the DAGs provided by the Data Foundation framework can execute. Refer to this link for details.
Here is the command to install the required dependencies:
$ gcloud composer environments update <ENVIRONMENT_NAME> \
    --location <LOCATION> \
    --update-pypi-package <PACKAGE_NAME><EXTRAS_AND_VERSION>
- For Airflow version 1.10, use "apache-airflow-backport-providers-salesforce>=2021.3.3"
- For Airflow version 2.x, use "apache-airflow-providers-salesforce~=5.2.0"
You can find the required libraries based on your composer version in this document.
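For example, on an Airflow 2.x environment the command could look like this (the environment name and location are placeholders):

$ gcloud composer environments update my-composer-env \
    --location us-central1 \
    --update-pypi-package "apache-airflow-providers-salesforce~=5.2.0"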
3. Create a Secret to Store Your Salesforce Connection Information
Create a secret to store your Salesforce connection information in the Google Cloud Secret Manager. Follow this document for details.
The secret name must use the prefix airflow-connections-. In this case, it should be named airflow-connections-salesforce-conn.
The value of this secret is:
http://<username>:<password>@https%3A%2F%2F<instance-name>.lightning.force.com?client_id=<client_id>&security_token=<security-token>
Replace the username, password, and security token from Step #2 in the section “Steps to be performed in the Salesforce account.”
To get the instance name of your Salesforce account, refer to this document.
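As a sketch, the secret can also be created from the command line. All values below are hypothetical; remember to URL-encode special characters in the username and password (e.g., @ becomes %40):

# Create the secret with the connection URI as its value.
echo -n "http://jdoe%40example.com:MyPassw0rd@https%3A%2F%2Fmy-instance.lightning.force.com?client_id=3MVG9abc&security_token=AbC123" | \
  gcloud secrets create airflow-connections-salesforce-conn \
    --data-file=- --replication-policy=automatic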
4. Create the Connection in the Composer Environment for BigQuery
Details about different connections for the composer settings are mentioned in this document.
- Go to the Airflow webserver UI.
- Go to Admin → Connections → Add a new record.
Provide the following details:
- Connection Id: sfdc_cdc_bq
- Connection Type: Google Cloud
- Project ID: <your project ID where the bq datasets are created>
- Keyfile JSON: <JSON keyfile for the service account, which has the BigQuery Data Editor and BigQuery Job User roles>
Refer to this document on how to create the JSON Key for the service account.
Similarly, you need to create another connection, sfdc_reporting_bq.
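If you still need to create the service account and its JSON key, a minimal sketch with gcloud looks like this (the account name cortex-bq-writer is hypothetical; substitute your own project ID):

# Create a service account and grant the two BigQuery roles.
gcloud iam service-accounts create cortex-bq-writer --project <your project ID>
gcloud projects add-iam-policy-binding <your project ID> \
  --member "serviceAccount:cortex-bq-writer@<your project ID>.iam.gserviceaccount.com" \
  --role "roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding <your project ID> \
  --member "serviceAccount:cortex-bq-writer@<your project ID>.iam.gserviceaccount.com" \
  --role "roles/bigquery.jobUser"
# Download the JSON keyfile to paste into the Airflow connection.
gcloud iam service-accounts keys create keyfile.json \
  --iam-account cortex-bq-writer@<your project ID>.iam.gserviceaccount.com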
5. Enable Secret Manager as a Backend in Cloud Composer
Refer to this doc for more details regarding Secret Manager.
Go to your Composer instance, then go to Airflow Configuration Overrides and maintain the below two overrides (both under the secrets section):
1. Key: backend
Value: airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
2. Key: backend_kwargs
Value: {"project_id": "<gcp project id>", "connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "sep": "-"}
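The same overrides can also be applied from the command line; below is a sketch. The ^~^ prefix switches gcloud’s list delimiter to ~ so the commas inside the JSON survive (see gcloud topic escaping); environment name, location, and project ID are placeholders:

gcloud composer environments update my-composer-env --location us-central1 \
  --update-airflow-configs='^~^secrets-backend=airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend~secrets-backend_kwargs={"project_id": "my-gcp-project", "connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "sep": "-"}'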
6. Allow the Composer Service Account to Access the Secrets
Look up the service account for your Composer instance and make sure it has the Secret Manager Secret Accessor role.
- The default is the GCE service account.
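For example, to grant the role on just the secret created earlier (the member email is a placeholder):

# Allow the Composer service account to read the one connection secret.
gcloud secrets add-iam-policy-binding airflow-connections-salesforce-conn \
  --member "serviceAccount:<composer service account email>" \
  --role "roles/secretmanager.secretAccessor"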
7. Copy Generated Files
After inspecting the generated files and confirming their correctness, copy them from the generated DAG bucket to your actual Cloud Composer DAG bucket.
Once the Composer instance is created, use the below commands to copy the DAGs and data from the output bucket specified in the config.json file during the Cortex deployment to the Composer buckets:
gsutil -m cp -r gs://<output bucket>/dags/ gs://<composer dag bucket>/
gsutil -m cp -r gs://<output bucket>/data/ gs://<composer sql bucket>/
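To sanity-check the copy before anything gets scheduled, you can list the destination, for example:

gsutil ls -r gs://<composer dag bucket>/dags/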
8. Go Into Airflow UI and Start the RAW DAGs
You can navigate to the Airflow UI from your Cloud Composer instance (it will open in a new page). You’ll see all the DAGs that were copied in the previous step. You can adjust the frequency of the DAGs based on your requirements, and they will run per the schedule.
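If you prefer the command line, you can also unpause and trigger a DAG through gcloud. The DAG ID below is hypothetical, so use the IDs you actually see in the Airflow UI; the subcommands shown are Airflow 2.x syntax:

# Unpause and trigger one of the copied extraction DAGs.
gcloud composer environments run my-composer-env --location us-central1 \
  dags unpause -- salesforce_raw_to_bq
gcloud composer environments run my-composer-env --location us-central1 \
  dags trigger -- salesforce_raw_to_bq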
9. Check Out Your BigQuery RAW Datasets
Once the DAGs are executed, you’ll be able to see the data in RAW and CDC datasets within your BigQuery in the target project.
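A quick way to verify is to count rows in one of the replicated tables (project, dataset, and table names are placeholders; Cortex names the datasets per your config.json):

bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM `<project>.<raw dataset>.<table>`'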
This document focuses only on the integration option using the Salesforce API. Refer to the Cortex Framework for Salesforce blog to see what analytics content is available out of the box for you to use to unlock richer insights and make smart business decisions.
Google Cloud is a top choice for customers seeking data-driven insights, and data integration is a first step in the journey. Massive scalability, reliability, security, and AI-rich features make Google Cloud an ideal platform to accomplish your business goals and objectives and pave the way for an innovation-driven expedition.
DISCLAIMER: The opinions and viewpoints are solely mine in this article and do not reflect my employer's official position or views. This article should not be considered an official endorsement or statement from my employer.