External Data Operations on Salesforce Analytics Using Mulesoft Salesforce Analytics Connector Part 2
This article is a continuation of the series dedicated to Salesforce Analytics Integration using Mulesoft's Salesforce Analytics Connector.
If you have missed Part 1 of the series, make sure you read it first to become familiar with the various terms and grasp the basic concepts of Salesforce Analytics integration.
Let's walk through the various scenarios that can come into the picture when loading data into the Salesforce Analytics Cloud system.
Scenario 1: Creating a New Dataset/Adding to an Existing Dataset and Adding Records to the Dataset
This is a sample metadata file that I have created for our example:
{
  "fileFormat": {
    "charsetName": "UTF-8",
    "fieldsDelimitedBy": ",",
    "fieldsEnclosedBy": "\"",
    "linesTerminatedBy": "\n",
    "numberOfLinesToIgnore": 1
  },
  "objects": [{
    "connector": "CSV",
    "fullyQualifiedName": "sampledataforWave_csv",
    "label": "Sample Data for Wave",
    "name": "sampledataforWave_csv",
    "fields": [{
      "fullyQualifiedName": "Field1",
      "name": "Field1",
      "type": "Text",
      "label": "Field1"
    }, {
      "fullyQualifiedName": "Field2",
      "name": "Field2",
      "type": "Text",
      "label": "Field2"
    }, {
      "fullyQualifiedName": "Field3",
      "name": "Field3",
      "type": "Date",
      "label": "Field3",
      "format": "MM/dd/yyyy",
      "firstDayOfWeek": -1,
      "fiscalMonthOffset": 0,
      "isYearEndFiscalYear": true
    }, {
      "fullyQualifiedName": "Field4",
      "name": "Field4",
      "type": "Text",
      "label": "Field4"
    }, {
      "fullyQualifiedName": "Field5",
      "name": "Field5",
      "type": "Text",
      "label": "Field5"
    }]
  }]
}
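For reference, a CSV data part that conforms to this metadata would look something like the snippet below (the values are made up purely for illustration; note that the first line is the header row skipped by numberOfLinesToIgnore and that Field3 follows the MM/dd/yyyy format):
Field1,Field2,Field3,Field4,Field5
"Field1Value1","Field2Value1","01/15/2018","Field4Value1","Field5Value1"
"Field1Value2","Field2Value2","01/16/2018","Field4Value2","Field5Value2"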
a. Uploading Records in Multiple Batches
<sub-flow name="salesforce-analytics-batchappend-Sub_Flow">
<set-variable variableName="dataSetContainerName" value="${dataSetContainerName}" doc:name="Variable : DataSetContainerName" doc:description="DataSet Container Name - Salesforce ID or Developer Name of the App in which Dataset is to be created"/>
<enricher source="#[payload]" target="#[flowVars.datasetname]" doc:name="Message Enricher" doc:description="Get the Salesforce ID of the Dataset Created in a variable.">
<sfdc-analytics:create-data-set config-ref="Salesforce_Analytics_Cloud__Basic_authentication" operation="APPEND" description="Sample data Set" label="Data Set 2" dataSetName="demodataset2" edgemartContainer="#[flowVars.dataSetContainerName]" type="metadata\sampledataforWave.json:RELATIVE" doc:name="Salesforce Analytics Cloud : Create DataSet"/>
</enricher>
<dw:transform-message doc:name="Create Sample Data for DataSet">
<dw:set-payload><![CDATA[%dw 1.0
%output application/java
%var sampleSize = 10000
---
(1 to sampleSize) map {
"Field1" : "Field1Value" ++ "$",
"Field2" : "Field2Value" ++ "$",
"Field3" : now as :date,
"Field4" : "Field4Value" ++ "$",
"Field5" : "Field5Value" ++ "$"
}]]></dw:set-payload>
</dw:transform-message>
<batch:execute name="salesforce-analytics-appBatch" doc:name="Batch Execute"/>
</sub-flow>
<batch:job name="salesforce-analytics-appBatch">
    <batch:process-records>
        <batch:step name="Batch_Step">
            <batch:commit size="1000" doc:name="Batch Commit">
                <sfdc-analytics:upload-external-data config-ref="Salesforce_Analytics_Cloud__Basic_authentication" type="metadata\sampledataforWave.json:RELATIVE" dataSetId="#[flowVars.datasetname]" doc:name="Salesforce Analytics Cloud : Upload Data Part">
                    <sfdc-analytics:payload ref="#[payload]"/>
                </sfdc-analytics:upload-external-data>
            </batch:commit>
        </batch:step>
    </batch:process-records>
    <batch:on-complete>
        <sfdc-analytics:start-data-processing config-ref="Salesforce_Analytics_Cloud__Basic_authentication" dataSetId="#[flowVars.datasetname]" doc:name="Salesforce Analytics Cloud : Trigger Data Processing" doc:description="Trigger the processing of the data that has been uploaded in parts so far. Once data processing is triggered, the status can be monitored in the Data Manager."/>
    </batch:on-complete>
</batch:job>
The Edgemart Container used here is "SharedApp." The SharedApp's Developer Name is configured in a system property, which is collected in a flow variable. The "Create Data Set" operation is then invoked with the Edgemart Container obtained above, using the APPEND sub-operation.
When creating a new dataset: the name of the dataset provided in the "Create Data Set" operation should be unique across the organization.
When appending to an existing dataset: the existing dataset's name needs to be used; if the same name is not provided, a different dataset will be created.
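The sub-flows above also reference a global connector configuration named "Salesforce_Analytics_Cloud__Basic_authentication" and the dataSetContainerName system property. A minimal, hypothetical sketch of these two pieces is shown below; the credentials, property file, and exact attribute names are assumptions and may vary with the connector version used in your project:
# mule-app.properties (picked up by Mule 3; the value is hypothetical)
dataSetContainerName=SharedApp

<!-- Hypothetical global connector configuration for basic authentication -->
<sfdc-analytics:config name="Salesforce_Analytics_Cloud__Basic_authentication" username="${sfdc.username}" password="${sfdc.password}" securityToken="${sfdc.securityToken}" doc:name="Salesforce Analytics Cloud: Basic Authentication"/>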
The "Create Data Set" operation is invoked inside a Message Enricher in order to collect the Salesforce ID of the InsightsExternalData object in a variable, which is needed in later operations for uploading data and triggering data processing. A Transform Message component is used here to generate some sample data, but in actual use cases, we would be passing/transforming the data from some other source(s). Make sure that the actual data being passed is aligned properly with the metadata JSON. For example, notice that for Field3, which is of type Date in the metadata JSON, a DataWeave :date object (java.util.Date) is passed. Similarly, for Text types, a String is passed.
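For instance, if the source system delivers the date as a plain string, it can be coerced to a :date in DataWeave 1.0 before handing the payload to the connector. A small sketch, assuming hypothetical source field names:
%dw 1.0
%output application/java
---
// Map hypothetical source records onto the fields defined in the metadata JSON
payload map {
"Field1" : $.customerName,
"Field2" : $.region,
"Field3" : $.orderDate as :date {format: "MM/dd/yyyy"},
"Field4" : $.product,
"Field5" : $.status
}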
The transformed data is passed on to the Batch Job, which has only one Batch Step with a Batch Commit, which contains the Salesforce Analytics connector with the "Upload External Data" operation. This operation uses the transformed data and the Salesforce ID received from the previous operation to create the various data parts associated with the same parent record. The size of each data part is controlled by the Batch Commit size. Though it is configured to 1000 here, it can be customized to some other value as per requirement, keeping in mind that the maximum size of an InsightsExternalDataPart record is 10 MB. After the batch job is completed, data processing is triggered in the on-complete phase using the "Start Data Processing" operation. This initiates the creation of the Salesforce Analytics job, which will take care of adding the records from the created data parts into the actual dataset.
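The sub-flow itself needs to be called from a parent flow. A minimal, hypothetical sketch of such a caller is shown below; the HTTP listener is only an assumed trigger, and any inbound endpoint or poller would work just as well:
<flow name="salesforce-analytics-batchappend-Main_Flow">
    <!-- Hypothetical trigger for the load; replace with the event that should start it -->
    <http:listener config-ref="HTTP_Listener_Configuration" path="/loadDataset" doc:name="HTTP"/>
    <flow-ref name="salesforce-analytics-batchappend-Sub_Flow" doc:name="Invoke Batch Append Sub Flow"/>
    <set-payload value="Dataset load triggered" doc:name="Set Payload"/>
</flow>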
b. Uploading Records in One Batch
<sub-flow name="salesforce-analytics-append-dataset-Sub_Flow">
<set-variable variableName="dataSetContainerName" value="${dataSetContainerName}" doc:name="Variable : DataSetContainerName" doc:description="DataSet Container Name - Salesforce ID or Developer Name of the App in which Dataset is to be created"/>
<dw:transform-message doc:name="Create Sample Data for DataSet">
<dw:set-payload><![CDATA[%dw 1.0
%output application/java
%var sampleSize = 1000
---
(1 to sampleSize) map {
"Field1" : "Field1Value" ++ "$",
"Field2" : "Field2Value" ++ "$",
"Field3" : now as :date,
"Field4" : "Field4Value" ++ "$",
"Field5" : "Field5Value" ++ "$"
}]]></dw:set-payload>
</dw:transform-message>
<sfdc-analytics:upload-external-data-into-new-data-set-and-start-processing config-ref="Salesforce_Analytics_Cloud__Basic_authentication" type="metadata\sampledataforWave.json" operation="APPEND" description="Sample Data Set 1" label="Data Set 1" dataSetName="demodataset1" edgemartContainer="#[flowVars.dataSetContainerName]" doc:name="Salesforce Analytics Cloud : Create,Upload and Start Processing">
<sfdc-analytics:payload ref="#[payload]"/>
</sfdc-analytics:upload-external-data-into-new-data-set-and-start-processing>
</sub-flow>
This approach creates just one data part on the InsightsExternalData record and is ideal for scenarios where the amount of data to be loaded is low. The "Upload External Data into new Dataset and Start Processing" operation is used for this. The configuration of the Edgemart Container, Operation, and Type parameters is the same as when uploading in batches. The payload is prepared, the connector is invoked, and the rest is taken care of by the connector.
When creating a new dataset: the name of the dataset provided in the "Upload External Data into new Dataset and Start Processing" operation should be unique across the organization.
When appending to an existing dataset: the existing dataset's name needs to be used; if the same name is not provided, a different dataset will be created.
Scenario 2: Overwriting the Dataset With a New Set of Data
For overwriting the dataset, the dataset's name needs to be configured in the "Create Data Set" or "Upload External Data into new Dataset and Start Processing" operation, whichever is used, and the OVERWRITE sub-operation needs to be selected. All other configurations are the same as in the APPEND scenario. The batch variant is shown below; a sketch of the single-batch variant follows it.
<sub-flow name="salesforce-analytics-batch-overwrite-Sub_Flow">
<set-variable variableName="dataSetContainerName" value="${dataSetContainerName}" doc:name="Variable : DataSetContainerName" doc:description="DataSet Container Name - Salesforce ID or Developer Name of the App in which Dataset is to be created"/>
<enricher source="#[payload]" target="#[flowVars.datasetname]" doc:name="Message Enricher" doc:description="Get the Salesforce ID of the Dataset Created in a variable.">
<sfdc-analytics:create-data-set config-ref="Salesforce_Analytics_Cloud__Basic_authentication" operation="OVERWRITE" description="Sample data Set" label="Data Set 2" dataSetName="demodataset2" edgemartContainer="#[flowVars.dataSetContainerName]" type="metadata\sampledataforWave.json" doc:name="Salesforce Analytics Cloud : Overwrite DataSet"/>
</enricher>
<dw:transform-message doc:name="Create Sample Data for DataSet">
<dw:set-payload><![CDATA[%dw 1.0
%output application/java
%var sampleSize = 10000
---
(1 to sampleSize) map {
"Field1" : "Field1Value" ++ "$",
"Field2" : "Field2Value" ++ "$",
"Field3" : now as :date,
"Field4" : "Field4Value" ++ "$",
"Field5" : "Field5Value" ++ "$"
}]]></dw:set-payload>
</dw:transform-message>
<batch:execute name="salesforce-analytics-appBatch" doc:name="Batch Execute"/>
</sub-flow>
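For the single-batch approach from Scenario 1b, the only change is the sub-operation on the connector. A sketch, assuming the same sample-data preparation shown earlier, would end with:
<sfdc-analytics:upload-external-data-into-new-data-set-and-start-processing config-ref="Salesforce_Analytics_Cloud__Basic_authentication" type="metadata\sampledataforWave.json" operation="OVERWRITE" description="Sample Data Set 1" label="Data Set 1" dataSetName="demodataset1" edgemartContainer="#[flowVars.dataSetContainerName]" doc:name="Salesforce Analytics Cloud : Overwrite, Upload and Start Processing">
    <sfdc-analytics:payload ref="#[payload]"/>
</sfdc-analytics:upload-external-data-into-new-data-set-and-start-processing>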
The source code of the above scenarios can be found here.
Please note that this post is applicable only to Mule 3. For Mule 4, we have a Salesforce Analytics module instead of a connector; I will be doing a post on it later.