External Data Operations on Salesforce Analytics Using Mulesoft Salesforce Analytics Connector Part 3
This post is a continuation of the series dedicated to Salesforce Analytics Integration using Mulesoft's Salesforce Analytics Connector.
If you have missed Part 1 and Part 2 of the series, make sure you read them first.
We already covered some scenarios in the previous posts. In this post, we will use the UPSERT and DELETE operations on a Salesforce Analytics dataset.
Scenario 3: Creating a Dataset With a Primary Key and Inserting/Updating Records in the Dataset
Here is a sample metadata file I have created for this example:
{
"fileFormat": {
"charsetName": "UTF-8",
"fieldsDelimitedBy": ",",
"fieldsEnclosedBy": "\"",
"linesTerminatedBy": "\n",
"numberOfLinesToIgnore": 1
},
"objects": [{
"connector": "CSV",
"fullyQualifiedName": "uniquedataforWave_csv",
"label": "Unique Data for Wave",
"name": "uniquedataforWave_csv",
"fields": [{
"fullyQualifiedName": "CustomerID",
"name": "CustomerID",
"type": "Text",
"label": "Customer ID"
}, {
"fullyQualifiedName": "PurchaseID",
"name": "PurchaseID",
"type": "Text",
"label": "Purchase ID"
}, {
"fullyQualifiedName": "ItemsPurchased",
"name": "ItemsPurchased",
"type": "Text",
"label": "Items Purchased",
"isMultiValue": true,
"multiValueSeparator" : ";"
}, {
"fullyQualifiedName": "PurchaseAmount",
"label": "Purchase Amount",
"name": "PurchaseAmount",
"defaultValue": "0",
"type": "Numeric",
"precision": 10,
"scale": 2,
"format": "$#,##0.00"
}, {
"fullyQualifiedName": "PurchaseDate",
"name": "PurchaseDate",
"type": "Date",
"label": "Purchase Date",
"format": "MM/dd/yyyy",
"firstDayOfWeek": -1,
"fiscalMonthOffset": 0,
"isYearEndFiscalYear": true
}, {
"fullyQualifiedName": "PrimaryKeyField",
"name": "PrimaryKeyField",
"type": "Text",
"label": "PrimaryKeyField",
"isUniqueId": true,
"description": "Combination of Purchase ID and Purchase Date"
}]
}]
}
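For reference, a single uploaded data part conforming to this metadata would translate into CSV roughly like the following (an illustrative sketch only; the connector builds the CSV from the Java maps you pass to it, using the delimiter, multi-value separator, and date format declared above):
CustomerID,PurchaseID,ItemsPurchased,PurchaseAmount,PurchaseDate,PrimaryKeyField
CustomerID1,PurchaseID1,Item1;Item7;Item9,1.1,06/23/2018,PurchaseID1-23/06/2018
CustomerID2,PurchaseID2,Item2,2.2,06/24/2018,PurchaseID2-24/06/2018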
Please note that I have declared a primary key, "PrimaryKeyField", by setting the attribute "isUniqueId" to true. This primary key enables us to upsert and delete records in an existing dataset. To create the dataset and upload data into Salesforce Wave with this metadata, we will use the UPSERT operation; APPEND cannot be used when a primary key is declared. For an UPSERT operation, Wave identifies the records that are already present in the dataset and updates them with the new values, while records that are not already present in the dataset are inserted.
<sub-flow name="salesforce-analytics-batch-upsert-Sub_Flow">
<set-variable variableName="dataSetContainerName" value="${dataSetContainerName}" doc:name="Variable : DataSetContainerName" doc:description="DataSet Container Name - Salesforce ID or Developer Name of the App in which Dataset is to be created"/>
<enricher source="#[payload]" target="#[flowVars.datasetid]" doc:name="Message Enricher" doc:description="Get the Salesforce ID of the Dataset Created in a variable.">
<sfdc-analytics:create-data-set config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" operation="UPSERT" description="Sample data Set" label="Data Set 2" dataSetName="demodataset2" edgemartContainer="#[flowVars.dataSetContainerName]" type="metadata\uniqueDataForWave.json:RELATIVE" doc:name="Salesforce Analytics Cloud : Upsert DataSet"/>
</enricher>
<dw:transform-message doc:name="Create Sample Data for DataSet">
<dw:set-payload><![CDATA[%dw 1.0
%output application/java
---
[
{"CustomerID":"CustomerID1","PurchaseID":"PurchaseID1","ItemsPurchased":["Item1","Item7","Item9"],"PurchaseAmount":1.1,"PurchaseDate":"2018-06-23"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID1-23/06/2018"},
{"CustomerID":"CustomerID2","PurchaseID":"PurchaseID2","ItemsPurchased":["Item2"],"PurchaseAmount":2.2,"PurchaseDate":"2018-06-24"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID2-24/06/2018"},
{"CustomerID":"CustomerID3","PurchaseID":"PurchaseID3","ItemsPurchased":["Item6","Item10"],"PurchaseAmount":3.3,"PurchaseDate":"2018-06-25"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID3-25/06/2018"},
{"CustomerID":"CustomerID4","PurchaseID":"PurchaseID4","ItemsPurchased":["Item4","Item2"],"PurchaseAmount":4.4,"PurchaseDate":"2018-06-26"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID4-26/06/2018"},
{"CustomerID":"CustomerID5","PurchaseID":"PurchaseID5","ItemsPurchased":["Item9","Item1"],"PurchaseAmount":5.5,"PurchaseDate":"2018-06-27"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID5-27/06/2018"},
{"CustomerID":"CustomerID6","PurchaseID":"PurchaseID6","ItemsPurchased":["Item6"],"PurchaseAmount":6.6,"PurchaseDate":"2018-06-28"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID6-28/06/2018"},
{"CustomerID":"CustomerID7","PurchaseID":"PurchaseID7","ItemsPurchased":["Item2","Item4"],"PurchaseAmount":7.7,"PurchaseDate":"2018-06-29"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID7-29/06/2018"},
{"CustomerID":"CustomerID8","PurchaseID":"PurchaseID8","ItemsPurchased":["Item1"],"PurchaseAmount":8.8,"PurchaseDate":"2018-06-30"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID8-30/06/2018"},
{"CustomerID":"CustomerID9","PurchaseID":"PurchaseID9","ItemsPurchased":["Item9","Item6"],"PurchaseAmount":9.9,"PurchaseDate":"2018-07-01"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID9-01/07/2018"},
{"CustomerID":"CustomerID10","PurchaseID":"PurchaseID10","ItemsPurchased":["Item1","Item8","Item9","Item4"],"PurchaseAmount":10.1,"PurchaseDate":"2018-07-02"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID10-02/07/2018"}
]]]></dw:set-payload>
</dw:transform-message>
<batch:execute name="salesforce-analytics-appBatch2" doc:name="Batch Execute"/>
</sub-flow>
<batch:job name="salesforce-analytics-appBatch2">
<batch:process-records>
<batch:step name="Parts_Upload_Batch_Step">
<batch:commit size="5" doc:name="Batch Commit">
<sfdc-analytics:upload-external-data config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" type="metadata\uniqueDataForWave.json:RELATIVE" dataSetId="#[flowVars.datasetid]" doc:name="Salesforce Analytics Cloud : Upload Data Part">
<sfdc-analytics:payload ref="#[payload]"/>
</sfdc-analytics:upload-external-data>
</batch:commit>
</batch:step>
</batch:process-records>
<batch:on-complete>
<sfdc-analytics:start-data-processing config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" dataSetId="#[flowVars.datasetid]" doc:name="Salesforce Analytics Cloud : Trigger Data Processing" doc:description="Trigger the processing of the data which was uploaded in parts till now. Once the data processing is triggered, the status can be monitored in Data Manager"/>
</batch:on-complete>
</batch:job>
For the sake of illustration, I have kept the batch commit size at 5; in actual use cases, the size should be much larger. We have to decide on a size keeping in mind that the "upload-external-data" operation creates an InsightsExternalDataPart object, which can accommodate up to 10 MB of data.
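As a rough sketch of that sizing decision (the average record size below is a hypothetical figure you would measure for your own data), the upper bound on records per part can be estimated in DataWeave and then rounded down to leave headroom for CSV overhead:
%dw 1.0
%output application/java
%var avgRecordSizeBytes = 500
%var partLimitBytes = 10 * 1024 * 1024
---
{ maxRecordsPerPart: partLimitBytes / avgRecordSizeBytes }
With a 500-byte record, that works out to roughly 20,000 records per part, so a commit size of, say, 10,000 leaves comfortable headroom.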
Also, note that the sample data created in the DataWeave transformation above consists entirely of unique records. After the batch job completes, a dataset gets created and the records are inserted into the Analytics dataset. If the dataset was already present, the records get upserted (updated if a record with the same primary key already exists; inserted if it was not present before).
For a dataset declared with a primary key, we have to be aware of how upsert actually works in Salesforce Analytics. As mentioned in the previous posts, after the "start-data-processing" operation completes via Mulesoft's connector, a job gets created in Salesforce Analytics, which takes care of the actual upload/delete/append/overwrite. For an upsert operation, Salesforce Analytics does not allow duplicate records, i.e. records with the same primary key; the entire job will fail in such cases.
For example, if 2 out of 10 records were passed with the same primary key, "PurchaseID1-23/06/2018," then after the Mule batch job completes, the Salesforce Analytics job will fail, complaining: "Something went wrong while executing the digest node: Primary Key validation has failed for dimension PrimaryKeyField. It contains the following non-unique value(s): PurchaseID1-23/06/2018." Even the eight unique records will be rejected. So, design the interface so that only unique values are passed to Wave in one particular invocation (it does not matter whether the duplicate records are scattered across different data parts or sit in the same data part).
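One defensive option (a minimal sketch, assuming the payload is the list of purchase records built earlier) is to de-duplicate on the primary key in a DataWeave step before executing the batch job, keeping only the first occurrence of each key:
%dw 1.0
%output application/java
---
payload distinctBy $.PrimaryKeyField
Whether silently dropping duplicates like this or routing them to an error handler is appropriate depends on your interface design.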
Also, keep in mind that every invocation of "start-data-processing" creates a Salesforce Analytics job, which is a costly operation: a Salesforce organization has a limited daily quota for these external data jobs.
Scenario 4: Deleting Records in the DataSet
To delete records from the dataset, the dataset's name needs to be configured in the "Create Data Set" or "Upload external data into new data set and start processing" operation (whichever is used), and the DELETE operation needs to be selected.
<sub-flow name="salesforce-analytics-batch-delete-Sub_Flow">
<set-variable variableName="dataSetContainerName" value="${dataSetContainerName}" doc:name="Variable : DataSetContainerName" doc:description="DataSet Container Name - Salesforce ID or Developer Name of the App in which Dataset is to be created"/>
<enricher source="#[payload]" target="#[flowVars.datasetid]" doc:name="Message Enricher" doc:description="Get the Salesforce ID of the Dataset Created in a variable.">
<sfdc-analytics:create-data-set config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" operation="DELETE" description="Sample data Set" label="Data Set 2" dataSetName="demodataset2" edgemartContainer="#[flowVars.dataSetContainerName]" type="metadata\uniqueDataForWave.json:RELATIVE" doc:name="Salesforce Analytics Cloud : Delete DataSet"/>
</enricher>
<dw:transform-message doc:name="Create Sample List of Records to be Deleted">
<dw:set-payload><![CDATA[%dw 1.0
%output application/java
---
[
{"PrimaryKeyField":"PurchaseID1-23/06/2018"},
{"PrimaryKeyField":"PurchaseID2-24/06/2018"},
{"PrimaryKeyField":"PurchaseID5-27/06/2018"},
{"PrimaryKeyField":"PurchaseID7-29/06/2018"},
{"PrimaryKeyField":"PurchaseID10-02/07/2018"}
]]]></dw:set-payload>
</dw:transform-message>
<batch:execute name="salesforce-analytics-appBatch2" doc:name="Batch Execute"/>
</sub-flow>
<batch:job name="salesforce-analytics-appBatch2">
<batch:process-records>
<batch:step name="Parts_Upload_Batch_Step">
<batch:commit size="10000" doc:name="Batch Commit">
<sfdc-analytics:upload-external-data config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" type="metadata\uniqueDataForWave.json:RELATIVE" dataSetId="#[flowVars.datasetid]" doc:name="Salesforce Analytics Cloud : Upload Data Part">
<sfdc-analytics:payload ref="#[payload]"/>
</sfdc-analytics:upload-external-data>
</batch:commit>
</batch:step>
</batch:process-records>
<batch:on-complete>
<sfdc-analytics:start-data-processing config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" dataSetId="#[flowVars.datasetid]" doc:name="Salesforce Analytics Cloud : Trigger Data Processing" doc:description="Trigger the processing of the data which was uploaded in parts till now. Once the data processing is triggered, the status can be monitored in Data Manager"/>
</batch:on-complete>
</batch:job>
Note that, for delete, we have to pass the primary key as the only field; using it, Analytics will delete the record if it exists. As with upsert, we have to pass a unique set of primary keys here as well.
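As an illustrative sketch (assuming the incoming payload is a list of records that already carry a PrimaryKeyField value), the delete payload can be reduced to just the primary key and de-duplicated in DataWeave:
%dw 1.0
%output application/java
---
(payload map { PrimaryKeyField: $.PrimaryKeyField }) distinctBy $.PrimaryKeyField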
The source code of the above scenarios can be found here.
Please note that this post applies only to Mule 3. For Mule 4, there is a Salesforce Analytics module instead of a connector; I will cover it in a later post.