Data Migration from AWS DocumentDB to Atlas on AWS
AWS Database Migration Service provides heterogeneous migrations between different platforms. See the migration scenario between DocumentDB and MongoDB Atlas on AWS.
AWS Database Migration Service (DMS) provides heterogeneous migrations between different database platforms. The source database remains fully operational during the migration, which minimizes downtime for applications that rely on it. DMS supports Amazon DocumentDB as both a source and a target endpoint, but it supports MongoDB only as a source endpoint, so it cannot write to MongoDB Atlas through a MongoDB target endpoint.
The following steps replicate data from DocumentDB to Atlas using DMS. Be aware that this workaround configuration, which uses the DocumentDB target engine to write to Atlas, may not be supported by AWS.
Here is the architecture diagram:
Notes
A. Security group: The DocumentDB cluster, the EC2 instance, and the DMS replication instance all share the same security group for simplicity. The required inbound rules are:
Protocol   Port    Source
TCP        27017   <security group name>
TCP        22      <your local IP address for SSH access>
The outbound rule is:
Protocol   Port    Destination
ALL        ALL     0.0.0.0/0
B. TLS: According to AWS Premium Support: "By default, TLS configuration is enabled for Amazon DocumentDB clusters. To disable the TLS configuration, create a new custom Amazon DocumentDB cluster parameter group. Set the tls parameter to disabled, and then modify the cluster to use the new cluster parameter group. For more information, see Managing Amazon DocumentDB cluster parameter groups."
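If you prefer to script the parameter-group change, the sketch below builds the request for the DocumentDB `ModifyDBClusterParameterGroup` API, which the AWS SDK for JavaScript v3 exposes as `ModifyDBClusterParameterGroupCommand` in `@aws-sdk/client-docdb`. It only assembles the parameters and assumes the custom group already exists; the group name `docdb40-tls-disabled` is a hypothetical placeholder. Because `tls` is a static parameter, the change applies only after the cluster is modified to use the group and its instances are rebooted.

```javascript
// Sketch: request parameters for the DocumentDB ModifyDBClusterParameterGroup
// API that turn TLS off in a custom cluster parameter group.
// Nothing is sent to AWS here; the group name is a hypothetical placeholder.
function buildDisableTlsRequest(parameterGroupName) {
  // Default parameter groups (e.g. default.docdb4.0) cannot be modified,
  // which is why a custom group is needed in the first place.
  if (parameterGroupName.startsWith("default.")) {
    throw new Error("use a custom cluster parameter group, not a default one");
  }
  return {
    DBClusterParameterGroupName: parameterGroupName,
    Parameters: [
      {
        ParameterName: "tls",
        ParameterValue: "disabled",
        // tls is a static parameter: it takes effect after the instances reboot.
        ApplyMethod: "pending-reboot",
      },
    ],
  };
}

// Hypothetical custom group created from the docdb4.0 family.
const request = buildDisableTlsRequest("docdb40-tls-disabled");
console.log(JSON.stringify(request, null, 2));
```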
C. Atlas primary host: Because AWS DMS cannot resolve DNS SRV records (used by mongodb+srv:// connection strings), you must provide the FQDN of the Atlas primary node as the server name in the target endpoint configuration. The replication task will fail if a new primary is elected in the Atlas replica set. In that case, stop the task, reconfigure the target endpoint with the new primary's hostname, and then resume the task. You can't modify the target endpoint settings while the task is running.
Alternatively, you can create a sharded cluster with a single shard in Atlas and use the hostname and port of one of its mongos processes for the target endpoint configuration, so the endpoint does not depend on which replica set member is primary.
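To find the current primary (when first creating the target endpoint, or after a re-election), one option is to run `db.hello()` in `mongosh` against the Atlas cluster (`db.isMaster()` on older shells); its `primary` field holds the primary as `host:port`. The helper below is a sketch that splits that value into the server name and port that the DMS target endpoint expects. The sample response is hypothetical and trimmed to the fields used.

```javascript
// Sketch: turn the `primary` field of a db.hello() / db.isMaster() response
// ("host:port") into the server name and port for the DMS target endpoint.
function primaryToEndpoint(helloResponse) {
  const primary = helloResponse.primary;
  if (!primary) {
    throw new Error("response does not name a primary; is this a replica set?");
  }
  const sep = primary.lastIndexOf(":");
  if (sep === -1) {
    // No port given; assume MongoDB's default port.
    return { serverName: primary, port: 27017 };
  }
  // Split on the last colon so only the trailing port is separated.
  return {
    serverName: primary.slice(0, sep),
    port: Number(primary.slice(sep + 1)),
  };
}

// Hypothetical db.hello() output from mongosh, trimmed to the fields used here.
const hello = {
  hosts: [
    "cluster1-shard-00-00.2psdk.mongodb.net:27017",
    "cluster1-shard-00-01.2psdk.mongodb.net:27017",
    "cluster1-shard-00-02.2psdk.mongodb.net:27017",
  ],
  primary: "cluster1-shard-00-02.2psdk.mongodb.net:27017",
};

const endpoint = primaryToEndpoint(hello);
console.log(`Server name: ${endpoint.serverName}, port: ${endpoint.port}`);
```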
D. Multiple databases: DMS supports migrating multiple databases in a single task. As stated in the AWS Database Migration Service User Guide:
"In the AWS Management Console for AWS DMS, leave Database name empty under Endpoint configuration on the Create endpoint page."
"For each database that you want to migrate from this Amazon DocumentDB source endpoint, specify the name of each database as the name of a schema in the table-mapping for the task using either the guided input in the console or directly in JSON." For example:
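{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "Customers",
        "table-name": "%"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "selection",
      "rule-id": "2",
      "rule-name": "2",
      "object-locator": {
        "schema-name": "Orders",
        "table-name": "%"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "selection",
      "rule-id": "3",
      "rule-name": "3",
      "object-locator": {
        "schema-name": "Inventory",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}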
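A JSON object can hold each key only once, so several databases cannot be listed as repeated `object-locator` entries inside a single rule; each database needs its own selection rule with a unique `rule-id`. When many databases are involved, a short script can generate the mapping instead of writing it by hand. This is a sketch: the database names are the ones from the example, and the output is meant to be pasted into the task's JSON editor or passed as the `TableMappings` string of the DMS API.

```javascript
// Sketch: build a DMS table-mapping document with one selection rule per
// source database (schema), each including all of its collections ("%").
function buildSelectionRules(databases) {
  return {
    rules: databases.map((db, i) => ({
      "rule-type": "selection",
      "rule-id": String(i + 1), // rule IDs must be unique within the mapping
      "rule-name": String(i + 1),
      "object-locator": {
        "schema-name": db,
        "table-name": "%",
      },
      "rule-action": "include",
    })),
  };
}

const mapping = buildSelectionRules(["Customers", "Orders", "Inventory"]);
console.log(JSON.stringify(mapping, null, 2));
```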
Steps
1. Create a DocumentDB Cluster
- Create a DocumentDB cluster:
- Open the Amazon DocumentDB console.
- Click Clusters in the left panel.
- Click Create.
- Configure the following settings:
- Configuration:
- Cluster identifier: <name of the DocumentDB cluster>
- Engine Version: 4.0.0
- Instance class: db.t3.medium (free trial eligible)
- Number of instances: 1
- Authentication:
- Master username: <name of the master user>
- Master password/Confirm master password: <password of the master user>
- Enable “Show advanced settings”.
- Network settings:
- Virtual Private Cloud: <use default or any specific VPC>
- Subnet group: <use default or any specific subnet group>
- VPC security groups: <the security group described in Note A>
- Cluster options:
- Port: 27017
- Cluster parameter group: default.docdb4.0
- Keep the default values for the other settings.
- Click Create cluster. See Note B if you want to disable TLS on the DocumentDB cluster.
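For repeatable setups, the console settings above map onto the DocumentDB `CreateDBCluster` and `CreateDBInstance` APIs (`CreateDBClusterCommand` and `CreateDBInstanceCommand` in `@aws-sdk/client-docdb`). The sketch only builds the two requests; the identifiers, password, security group ID, and subnet group name are hypothetical placeholders. The instance is a separate request because a cluster created through the API has no instances until one is added.

```javascript
// Sketch: step 1's console settings as DocumentDB API request parameters.
// Nothing is sent to AWS; every value passed in below is hypothetical.
function buildDocDbRequests({ clusterId, username, password, securityGroupId, subnetGroup }) {
  const cluster = {
    DBClusterIdentifier: clusterId,
    Engine: "docdb",
    EngineVersion: "4.0.0",
    MasterUsername: username,
    MasterUserPassword: password,
    Port: 27017,
    VpcSecurityGroupIds: [securityGroupId], // the shared group from Note A
    DBSubnetGroupName: subnetGroup,
    DBClusterParameterGroupName: "default.docdb4.0", // TLS stays enabled
  };
  // "Number of instances: 1" becomes one instance attached to the cluster.
  const instances = [
    {
      DBInstanceIdentifier: `${clusterId}-1`,
      DBInstanceClass: "db.t3.medium",
      Engine: "docdb",
      DBClusterIdentifier: clusterId,
    },
  ];
  return { cluster, instances };
}

const { cluster, instances } = buildDocDbRequests({
  clusterId: "mdb-ddb-db",
  username: "masteruser",
  password: "change-me",
  securityGroupId: "sg-0123456789abcdef0",
  subnetGroup: "default",
});
console.log(cluster, instances);
```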
2. Create an EC2 Instance to Load the Sample Data
Amazon EC2 instances and other AWS services deployed in the same Amazon VPC can access DocumentDB directly. To access your Amazon DocumentDB resources from outside the cluster's VPC, you need to use SSH tunneling.
Create an EC2 instance with the same network settings (VPC, subnet, and security group) as the DocumentDB cluster. Once the EC2 instance is ready, install the mongo shell and load some sample data into the DocumentDB cluster, following steps 4 to 6 of the Amazon DocumentDB Get Started guide.
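From the EC2 instance, the shell connects with a standard `mongodb://` URI. The sketch below builds one; it assumes the CA bundle downloaded from AWS was saved as `global-bundle.pem`, and the host and credentials are hypothetical. The `replicaSet=rs0`, `readPreference=secondaryPreferred`, and `retryWrites=false` options follow the connection examples in the Amazon DocumentDB documentation.

```javascript
// Sketch: build a mongodb:// URI for connecting to the DocumentDB cluster from
// an EC2 instance in the same VPC. Host, credentials, and CA file are hypothetical.
function docDbUri({ host, port = 27017, user, password, tls = true, caFile }) {
  const params = new URLSearchParams({
    replicaSet: "rs0", // DocumentDB clusters present themselves as replica set rs0
    readPreference: "secondaryPreferred",
    retryWrites: "false", // DocumentDB does not support retryable writes
  });
  if (tls) {
    // Only needed while TLS is enabled on the cluster (the default; see Note B).
    params.set("tls", "true");
    if (caFile) params.set("tlsCAFile", caFile);
  }
  // Percent-encode credentials so characters such as @ or : do not break the URI.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${credentials}@${host}:${port}/?${params.toString()}`;
}

const uri = docDbUri({
  host: "mdb-ddb-db.cluster-c50ma7ye6nnl.ap-southeast-2.docdb.amazonaws.com",
  user: "masteruser",
  password: "change-me",
  caFile: "global-bundle.pem",
});
console.log(uri);
```

The resulting string can then be passed to the shell, for example `mongosh "<uri>"`.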
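DMS reads ongoing changes (CDC) from DocumentDB change streams, so run the following command to enable change streams on the source database:
db.adminCommand({
  modifyChangeStreams: 1,
  database: "<source database name>",
  collection: "",  // an empty string enables change streams for every collection in the database
  enable: true
});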
3. Create a Replication Subnet Group
- Open the AWS Database Migration Service console.
- Click Subnet groups in the left panel.
- Click Create subnet group.
- Specify the VPC and the subnet(s) in which the replication instance will be created.
- Click Create subnet group.
4. Create a DMS Replication Instance
- In the same console, click Replication instances in the left panel.
- Click Create replication instance.
- Configure the following settings:
- Replication instance configuration:
- Name: <name of replication instance>
- Descriptive Amazon Resource Name (ARN): Leave it blank
- Description: <description of instance>
- Instance class: dms.t3.small
- Engine version: 3.4.6
- Allocated storage (GiB): 50
- VPC: <use default or any specific VPC>
- Multi AZ: Single-AZ
- Publicly accessible: Selected (add the instance's public IP address to the Atlas IP access list in step 8)
- Advanced security and network configuration:
- Replication subnet group: <subnet group created on step 3>
- VPC security group(s): <the security group described in Note A>
- Keep the default values for the other settings.
- Click Create.
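Steps 3 and 4 correspond to the DMS `CreateReplicationSubnetGroup` and `CreateReplicationInstance` APIs (`CreateReplicationSubnetGroupCommand` and `CreateReplicationInstanceCommand` in `@aws-sdk/client-database-migration-service`). The sketch below only assembles their request parameters from the settings listed above; the names, subnet IDs, and security group ID are hypothetical, and the subnet group has to exist before the instance request that references it is sent.

```javascript
// Sketch: steps 3 and 4 as request parameters for the DMS
// CreateReplicationSubnetGroup and CreateReplicationInstance APIs.
// Names and IDs passed in below are hypothetical placeholders.
function buildDmsInfraRequests({ name, subnetIds, securityGroupId }) {
  const subnetGroup = {
    ReplicationSubnetGroupIdentifier: `${name}-subnets`,
    ReplicationSubnetGroupDescription: "Subnets for the DocumentDB to Atlas migration",
    SubnetIds: subnetIds,
  };
  const instance = {
    ReplicationInstanceIdentifier: name,
    ReplicationInstanceClass: "dms.t3.small",
    EngineVersion: "3.4.6",
    AllocatedStorage: 50, // GiB
    MultiAZ: false, // Single-AZ
    // Gives the instance a public IP, which is then added to the Atlas access list.
    PubliclyAccessible: true,
    ReplicationSubnetGroupIdentifier: subnetGroup.ReplicationSubnetGroupIdentifier,
    VpcSecurityGroupIds: [securityGroupId], // the shared group from Note A
  };
  return { subnetGroup, instance };
}

const { subnetGroup, instance } = buildDmsInfraRequests({
  name: "docdb-to-atlas",
  subnetIds: ["subnet-0aaa1111bbbb2222c", "subnet-0ddd3333eeee4444f"],
  securityGroupId: "sg-0123456789abcdef0",
});
console.log(subnetGroup, instance);
```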
5. Import DocumentDB Certificate (Optional)
If TLS is enabled on the DocumentDB cluster (the default), follow the steps in the AWS documentation to import the Amazon DocumentDB CA certificate into DMS so that DMS can connect to the cluster using TLS.
6. Create a DocumentDB Source Endpoint
- In the same console, click Endpoints in the left panel.
- Click Create endpoint.
- Configure the following settings:
- Endpoint type: Source endpoint
- Endpoint configuration:
- Endpoint identifier: <name of the endpoint>
- Descriptive Amazon Resource Name (ARN): Leave it blank
- Source engine: Amazon DocumentDB
- Access to endpoint database: Provide access information manually
- Server name: <server name of the DocumentDB cluster>, e.g., mdb-ddb-db.cluster-c50ma7ye6nnl.ap-southeast-2.docdb.amazonaws.com
- Port: 27017
- Secure Socket Layer (SSL) mode: verify-full (or none if TLS has been disabled; see Note B)
- CA certificate: <the DocumentDB certificate imported in step 5>
- User name: <name of the master user>
- Password: <password of the master user>
- Authentication source: admin
- Database name: <replicated database name>
- Metadata mode: document
- _id as a separate column: Leave unchecked
- Expand Test endpoint connection (optional):
- VPC: <select the VPC the replication instance created on>
- Replication instance: <select the one created on Step 4>
- Click Run test.
- If the test fails, troubleshoot the connection before continuing.
- Click Create endpoint.
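The source endpoint can also be created through the DMS `CreateEndpoint` API (`CreateEndpointCommand` in `@aws-sdk/client-database-migration-service`). The sketch below only assembles the request. It assumes `docdb` as the engine name for Amazon DocumentDB and maps "Metadata mode: document" to `DocDbSettings.NestingLevel: "none"` and the unchecked "_id as a separate column" box to `ExtractDocId: false`. All identifiers, the host name, and the certificate ARN are hypothetical.

```javascript
// Sketch: step 6 as a DMS CreateEndpoint request for a DocumentDB source.
// Nothing is sent to AWS; all values in the usage example are hypothetical.
function buildSourceEndpoint({ id, serverName, username, password, databaseName, certificateArn }) {
  const tls = Boolean(certificateArn); // no certificate means TLS was disabled (Note B)
  return {
    EndpointIdentifier: id,
    EndpointType: "source",
    EngineName: "docdb",
    ServerName: serverName,
    Port: 27017,
    Username: username,
    Password: password,
    DatabaseName: databaseName,
    SslMode: tls ? "verify-full" : "none",
    ...(tls ? { CertificateArn: certificateArn } : {}),
    DocDbSettings: {
      NestingLevel: "none", // document metadata mode
      ExtractDocId: false, // do not copy _id into a separate column
    },
  };
}

const source = buildSourceEndpoint({
  id: "docdb-source",
  serverName: "mdb-ddb-db.cluster-c50ma7ye6nnl.ap-southeast-2.docdb.amazonaws.com",
  username: "masteruser",
  password: "change-me",
  databaseName: "Customers",
  certificateArn: "arn:aws:dms:ap-southeast-2:111122223333:cert:EXAMPLE",
});
console.log(source);
```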
7. Import Atlas Certificate
- Download and save the Atlas root CA certificate on your local drive.
- In the same console, click Certificates in the left panel.
- Click Import certificate.
- Configure the following settings:
- Certificate configuration:
- Certificate identifier: ISRG-Root-X1
- Import certificate file: <the certificate file you saved above>
- Click Import certificate.
8. Update the IP Access List
Update the IP access list of the Atlas project with the public IP address of the DMS replication instance.
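If you manage Atlas programmatically, the same change can be made through the Atlas Administration API. The sketch assumes the v2 endpoint `POST /api/atlas/v2/groups/{groupId}/accessList` with a versioned `Accept` header, and it only builds the request: the project ID and IP address are hypothetical, and authentication (an API key or service account) is deliberately left out. Check the current Atlas Administration API reference before relying on the exact path and version header.

```javascript
// Sketch: request for adding the replication instance's public IP to the
// Atlas project IP access list (Atlas Administration API v2, assumed).
// Project ID and IP are hypothetical; authentication is not shown.
function buildAccessListRequest(projectId, publicIp) {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(publicIp)) {
    throw new Error(`not an IPv4 address: ${publicIp}`);
  }
  return {
    method: "POST",
    url: `https://cloud.mongodb.com/api/atlas/v2/groups/${projectId}/accessList`,
    headers: {
      "Content-Type": "application/json",
      Accept: "application/vnd.atlas.2023-01-01+json",
    },
    // The endpoint takes an array, so several entries could be added at once.
    body: JSON.stringify([
      { ipAddress: publicIp, comment: "AWS DMS replication instance" },
    ]),
  };
}

const accessListRequest = buildAccessListRequest("5f0c1d2e3a4b5c6d7e8f9a0b", "203.0.113.10");
console.log(accessListRequest);
```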
9. Create an Atlas Target Endpoint
- In the same console, click Endpoints in the left panel.
- Click Create endpoint.
- Configure the following settings:
- Endpoint type: Target endpoint
- Endpoint configuration:
- Endpoint identifier: <name of the endpoint>
- Descriptive Amazon Resource Name (ARN): Leave it blank.
- Target engine: Amazon DocumentDB (DMS does not offer MongoDB as a target engine, so the DocumentDB engine is pointed at Atlas)
- Access to endpoint database: Provide access information manually.
- Server name: <primary server name of the Atlas replica set>, e.g., cluster1-shard-00-02.2psdk.mongodb.net. The replication task will fail if a new primary is elected (see Note C).
- Port: 27017
- Secure Socket Layer (SSL) mode: verify-full
- CA certificate: ISRG-Root-X1
- User name: <name of the Atlas database user>
- Password: <password of the Atlas database user>
- Database name: <target database name>
- Expand Test endpoint connection (optional):
- VPC: <the VPC the replication instance was created in>
- Replication instance: <select the one created in step 4>
- Click Run test. If the test fails, troubleshoot the connection before continuing.
- Click Create endpoint.
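Like the source, the target endpoint can be expressed as a `CreateEndpoint` request. This sketch mirrors the console settings: the engine name `docdb` is used even though the server is an Atlas host, the server name must be a plain host name (the current primary, per Note C) rather than a `mongodb+srv://` connection string, and `certificateArn` stands for the ARN DMS assigned to the imported ISRG-Root-X1 certificate. All values in the usage example are hypothetical.

```javascript
// Sketch: step 9 as a DMS CreateEndpoint request that points the DocumentDB
// target engine at an Atlas host. All example values are hypothetical.
function buildTargetEndpoint({ id, serverName, port = 27017, username, password, databaseName, certificateArn }) {
  // DMS cannot resolve SRV records, so reject connection strings outright.
  if (serverName.includes("://")) {
    throw new Error("use the primary's host name, not a connection string");
  }
  return {
    EndpointIdentifier: id,
    EndpointType: "target",
    EngineName: "docdb", // DocumentDB engine, Atlas server
    ServerName: serverName,
    Port: port,
    Username: username,
    Password: password,
    DatabaseName: databaseName,
    SslMode: "verify-full",
    CertificateArn: certificateArn, // the imported ISRG-Root-X1 certificate
  };
}

const target = buildTargetEndpoint({
  id: "atlas-target",
  serverName: "cluster1-shard-00-02.2psdk.mongodb.net",
  username: "atlasuser",
  password: "change-me",
  databaseName: "Customers",
  certificateArn: "arn:aws:dms:ap-southeast-2:111122223333:cert:EXAMPLE2",
});
console.log(target);
```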
10. Create a Full Load + CDC Replication Task
- In the same console, click Database migration tasks in the left panel.
- Click Create task.
- Configure the following settings:
- Task configuration:
- Task identifier: <name of the replication task>
- Descriptive Amazon Resource Name (ARN): Leave it blank
- Replication instance: <select the one created on step 4>
- Source database endpoint: <select the endpoint created on step 6>
- Target database endpoint: <select the endpoint created on step 9>
- Migration type: Migrate existing data and replicate ongoing changes
- Task settings:
- Editing mode: Wizard
- Target table preparation mode: Drop tables on target
- Stop task after the full load completes: Don't stop
- Include LOB columns in replication: Limited LOB mode
- Maximum LOB size (KB): 32
- Enable validation: Leave unchecked
- Enable CloudWatch logs: Leave unchecked
- Table mappings:
- Editing mode: Wizard
- Click on Add new selection rule
- Schema: Enter a schema
- Source name: <replicated database name from source>
- Table name: <replicated collection name from source>
- Action: Include
- You can get more information about DMS selection rules from the official documentation.
- Premigration assessment:
- Enable premigration assessment run: Leave unchecked
- Migration task startup configuration:
- Start migration task: Automatically on create
- Click Create task.
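The task can likewise be described as a DMS `CreateReplicationTask` request (`CreateReplicationTaskCommand` in `@aws-sdk/client-database-migration-service`). The sketch below maps the wizard choices onto DMS task-settings keys: `TargetTablePrepMode: "DROP_AND_CREATE"` for "Drop tables on target", `LimitedSizeLobMode` with `LobMaxSize` (in KB) for limited LOB mode, and the two `StopTaskCachedChanges*` flags for "Don't stop". The ARNs, names, and collections are hypothetical. Unlike the console's "Automatically on create" option, the API does not start the task by itself; a separate `StartReplicationTask` call with `StartReplicationTaskType: "start-replication"` would be needed.

```javascript
// Sketch: step 10 as a DMS CreateReplicationTask request (full load + CDC).
// ARNs, names, and collections in the usage example are hypothetical.
function buildReplicationTask({ id, instanceArn, sourceArn, targetArn, selections }) {
  // One selection rule per database/collection pair (schema = database).
  const tableMappings = {
    rules: selections.map(({ database, collection }, i) => ({
      "rule-type": "selection",
      "rule-id": String(i + 1),
      "rule-name": String(i + 1),
      "object-locator": { "schema-name": database, "table-name": collection },
      "rule-action": "include",
    })),
  };
  const settings = {
    TargetMetadata: {
      SupportLobs: true,
      FullLobMode: false,
      LimitedSizeLobMode: true, // "Limited LOB mode"
      LobMaxSize: 32, // KB
    },
    FullLoadSettings: {
      TargetTablePrepMode: "DROP_AND_CREATE", // "Drop tables on target"
      StopTaskCachedChangesApplied: false, // "Don't stop" after the full load
      StopTaskCachedChangesNotApplied: false,
    },
    ValidationSettings: { EnableValidation: false },
    Logging: { EnableLogging: false }, // CloudWatch logs off
  };
  return {
    ReplicationTaskIdentifier: id,
    ReplicationInstanceArn: instanceArn,
    SourceEndpointArn: sourceArn,
    TargetEndpointArn: targetArn,
    MigrationType: "full-load-and-cdc",
    // Both documents are passed to the API as JSON strings.
    TableMappings: JSON.stringify(tableMappings),
    ReplicationTaskSettings: JSON.stringify(settings),
  };
}

const task = buildReplicationTask({
  id: "docdb-to-atlas-task",
  instanceArn: "arn:aws:dms:ap-southeast-2:111122223333:rep:EXAMPLE",
  sourceArn: "arn:aws:dms:ap-southeast-2:111122223333:endpoint:SOURCE",
  targetArn: "arn:aws:dms:ap-southeast-2:111122223333:endpoint:TARGET",
  selections: [{ database: "Orders", collection: "orders" }],
});
console.log(task);
```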
11. Monitor the Migration Task
Once the task has started, you can monitor the full-load and CDC progress of each collection on the task's Table statistics tab.
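The same per-collection figures are available from the DMS `DescribeTableStatistics` API (`DescribeTableStatisticsCommand` in `@aws-sdk/client-database-migration-service`). The sketch below assumes you already have its `TableStatistics` array and summarizes it; the sample records are made up, and treating any state other than `"Table completed"` as still pending is a simplifying assumption.

```javascript
// Sketch: summarize the TableStatistics array returned by the DMS
// DescribeTableStatistics API (the data behind the Table statistics tab).
function summarizeTableStats(tableStatistics) {
  return tableStatistics.map((t) => ({
    collection: `${t.SchemaName}.${t.TableName}`,
    state: t.TableState,
    fullLoadRows: t.FullLoadRows,
    cdcChanges: t.Inserts + t.Updates + t.Deletes, // changes applied by CDC
  }));
}

// Collections whose state is anything other than "Table completed".
function pendingCollections(tableStatistics) {
  return tableStatistics
    .filter((t) => t.TableState !== "Table completed")
    .map((t) => `${t.SchemaName}.${t.TableName}`);
}

// Hypothetical sample data, trimmed to the fields used above.
const stats = [
  { SchemaName: "Customers", TableName: "profiles", TableState: "Table completed", FullLoadRows: 1000, Inserts: 12, Updates: 3, Deletes: 1 },
  { SchemaName: "Orders", TableName: "orders", TableState: "Full load", FullLoadRows: 250, Inserts: 0, Updates: 0, Deletes: 0 },
];
console.table(summarizeTableStats(stats));
console.log("Still loading:", pendingCollections(stats));
```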