Auto Remediation of GuardDuty Findings for a Compromised ECS Cluster in AWSVPC Network Mode
The solution described in this blog helps quarantine an EC2 instance, and the ECS cluster running on it, in the event of a malware attack.
Summary
It is of utmost importance for enterprises to protect their IT workloads, running either on AWS or other clouds, against a broad range of malware (including computer viruses, worms, spyware, botnet software, ransomware, etc.).
Amazon GuardDuty Malware Protection helps customers detect such malicious files without installing an agent. Once findings are received, customers need to automate the necessary remediation actions. When ECS/MaliciousFile finding types are received for Amazon ECS clusters running on Amazon EC2 instances, there is more than one way to remediate, depending on the network mode of the ECS tasks in the cluster.
When tasks run in bridge or host network mode, remediation is relatively simple: attach a security group with no inbound or outbound rules to the underlying EC2 instance. Remediation becomes more complex when tasks run in awsvpc network mode. This blog shows how to use AWS Lambda and Amazon EventBridge to automatically isolate an infected ECS cluster running on EC2 instances in awsvpc network mode.
Prerequisites
Two AWS accounts in AWS Organizations, one as the root account and the other as a member account
GuardDuty enabled on both accounts, with the root account assigned as the GuardDuty admin account
GuardDuty Malware Protection enabled on both accounts
Two AWS CLI profiles, one for the root account and one for the member account, created on the machine where the steps in this blog will be carried out; both are configured with a user that has the AdministratorAccess policy
Limitations
GuardDuty Malware Protection runs once in 24 hours, so automatic remediation can be triggered up to 24 hours after the malicious file is placed. This is not a near real-time solution.
Target Architecture
The GuardDuty-Tester project will be used to simulate a malicious actor in the ECS cluster. The CloudFormation stack provided with that project sets up the following infrastructure in the member account.
Amazon VPC with 1 private and 1 public subnet.
ECS cluster running on EC2 instances with default networking mode in the private subnet and a bastion host in the public subnet.
The following steps are required to run the ECS cluster in awsvpc networking mode.
Edit the guardduty-tester.template as per the instructions given below.
In the definition of the taskdefinition resource (Type: 'AWS::ECS::TaskDefinition'), add the following NetworkMode configuration.
YAML
NetworkMode: 'awsvpc'
ExecutionRoleArn:
  Fn::GetAtt: ECSExecutionRole.Arn
TaskRoleArn:
  Fn::GetAtt: TaskInstanceIAMRole.Arn
RequiresCompatibilities:
  - EC2
In the definition of the service resource (Type: 'AWS::ECS::Service'), add the following NetworkConfiguration.
YAML
NetworkConfiguration:
  AwsvpcConfiguration:
    SecurityGroups:
      - !Ref RedTeamSecurityGroup
    Subnets:
      - !Ref PrivateSubnet
    AssignPublicIp: DISABLED
Add the code snippet below to create the role ECSExecutionRole.
YAML
ECSExecutionRole:
  Type: AWS::IAM::Role
  Properties:
    Path: /
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: "Allow"
          Action: "sts:AssumeRole"
          Principal: { "Service": "ecs-tasks.amazonaws.com" }
    ManagedPolicyArns:
      - 'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly'
      - 'arn:aws:iam::aws:policy/CloudWatchLogsFullAccess'
Add the code snippet given below to create the role TaskInstanceIAMRole.
YAML
TaskInstanceIAMRole:
  Type: AWS::IAM::Role
  Properties:
    Path: /
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: "Allow"
          Action: "sts:AssumeRole"
          Principal: { "Service": "ecs-tasks.amazonaws.com" }
    ManagedPolicyArns:
      - 'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly'
      - 'arn:aws:iam::aws:policy/CloudWatchLogsFullAccess'
    Policies:
      - PolicyName: ECSTaskRole
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: "Allow"
              Action:
                - "cloudformation:List*"
                - "cloudformation:Describe*"
                - "cloudformation:Get*"
              Resource: "*"
            - Effect: "Allow"
              Action:
                - "cloudwatch:PutMetricData"
              Resource: "*"
            - Effect: "Allow"
              Action:
                - "ecs:DescribeTaskDefinition"
                - "ecs:DescribeTasks"
              Resource: "*"
            - Effect: "Allow"
              Action:
                - "ec2:DescribeSubnets"
              Resource: "*"
Add the code snippet below to create the ECSCrossAccountRole, which will be assumed by the Remediation Lambda function to modify the security groups during remediation.
YAML
ECSCrossAccountRole:
  Type: AWS::IAM::Role
  Properties:
    Path: /
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: "Allow"
          Action: "sts:AssumeRole"
          Principal: { "AWS": "arn:aws:sts::<admin_account_no>:assumed-role/<remediation lambda role>/<remediation lambda name>" }
    Policies:
      - PolicyName: ECSCrossAccountPolicy
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: "Allow"
              Action:
                - "ecs:ListServices"
                - "ecs:UpdateService"
              Resource: "*"
            - Effect: "Allow"
              Action:
                - "ec2:CreateSecurityGroup"
                - "ec2:ModifyNetworkInterfaceAttribute"
                - "ec2:RevokeSecurityGroupEgress"
                - "ec2:RevokeSecurityGroupIngress"
                - "ec2:DescribeNetworkInterfaces"
                - "ec2:DescribeSecurityGroupRules"
                - "ec2:DeleteSecurityGroup"
                - "ec2:DescribeSecurityGroups"
              Resource: "*"
Cross-account role to be assumed from a Lambda function running in an Admin account.
As part of the remediation actions, the following components need to be created in the Admin account:
EventBridge rule to capture “ECS/MaliciousFile” findings and trigger the Lambda function.
Lambda function to assume the cross-account role and isolate the infected instances.
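The EventBridge rule in the list above can be sketched as follows. This is a minimal illustration, not the article's own deployment code: the function names, the rule name, and the target Id are hypothetical, and events_client is assumed to behave like boto3.client("events"). The pattern fields (source aws.guardduty, detail-type "GuardDuty Finding", and the finding type under detail.type) follow the standard shape of GuardDuty events.

```python
import json


def build_finding_pattern(finding_types=("Execution:ECS/MaliciousFile",)):
    """Return an EventBridge event pattern matching the given GuardDuty finding types."""
    return {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {"type": list(finding_types)},
    }


def create_finding_rule(events_client, rule_name, lambda_arn):
    """Create (or update) the rule and point it at the remediation Lambda.

    events_client is assumed to behave like boto3.client("events");
    rule_name and lambda_arn are placeholders supplied by the caller.
    """
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_finding_pattern()),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "remediation-lambda" is an arbitrary, hypothetical target Id.
        Targets=[{"Id": "remediation-lambda", "Arn": lambda_arn}],
    )
```

The Lambda function also needs a resource-based permission that allows events.amazonaws.com to invoke it; that step is left out of this sketch.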
Unlike the other ECS network modes (host, where the host's network is used directly, or bridge, where Docker's built-in virtual network is used), tasks running in awsvpc network mode are each allocated their own elastic network interface (ENI) and a primary private IPv4 address. Since these ENIs are created by AWS, the security groups associated with them cannot be changed directly. Hence the EC2 approach of quarantining the ECS cluster and its tasks does not work for this configuration. To quarantine these tasks, one has to iterate through the security groups associated with each ENI and explicitly remove their inbound and outbound rules. The section below describes the steps for achieving this.
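The decision described above can be expressed as a small pure function. This is only a sketch of the reasoning, under the assumption that eni is one entry of the NetworkInterfaces list returned by the EC2 describe_network_interfaces call; the function name and the returned dictionary layout are hypothetical.

```python
def plan_eni_quarantine(eni, quarantine_sg_id):
    """Decide how to isolate a single ENI.

    eni: one item from describe_network_interfaces()["NetworkInterfaces"].
    quarantine_sg_id: ID of a security group with no rules.
    """
    eni_id = eni["NetworkInterfaceId"]
    if not eni.get("RequesterManaged", False):
        # Customer-managed ENI: its security groups can simply be replaced.
        return {
            "action": "swap_security_groups",
            "eni": eni_id,
            "groups": [quarantine_sg_id],
        }
    # AWS-managed ENI: leave the association alone and plan to strip
    # the rules of every security group currently attached to it.
    return {
        "action": "revoke_rules",
        "eni": eni_id,
        "groups": [g["GroupId"] for g in eni.get("Groups", [])],
    }
```

One design consequence worth keeping in mind: revoking the rules of a security group affects every resource that uses that group, not only the infected task.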
A simulated malicious actor logs into the Bastion Host and simulates placing malicious files within the ECS cluster. Please follow Steps 1, 2, and 3 in the README.md file of the GuardDuty-Tester project to simulate this.
If the pre-requisite steps are successfully implemented, then the following steps will happen automatically.
GuardDuty Malware Protection scans the member account, discovers the presence of a malicious file, and reports it as an Execution:ECS/MaliciousFile finding. The screenshots below validate this.
The finding is pushed to GuardDuty in the Admin account.
GuardDuty findings in the Admin account trigger a CloudWatch Event.
The CloudWatch Event triggers a rule to invoke the Remediation Lambda.
The Remediation Lambda performs the following steps.
Assumes a role in the member account which has all the required permissions
Python
import os

import boto3

# Assume the cross-account role in the member account.
sts_connection = boto3.client('sts')
account_no = os.getenv('CHILD_ACCOUNT')
acct_b = sts_connection.assume_role(
    RoleArn=f"arn:aws:iam::{account_no}:role/<role-name>",
    RoleSessionName="cross_acct_lambda"
)
# Do not print acct_b: the response contains temporary secret credentials.
ACCESS_KEY = acct_b['Credentials']['AccessKeyId']
SECRET_KEY = acct_b['Credentials']['SecretAccessKey']
SESSION_TOKEN = acct_b['Credentials']['SessionToken']
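As an alternative to passing the three credential values to every client, they can be bundled once and handed to a boto3 session. The helper below is a hypothetical convenience, not part of the original Lambda; it only reshapes the assume_role response into the keyword arguments that boto3.Session accepts.

```python
def session_kwargs_from_assume_role(assume_role_response):
    """Map an STS assume_role response to boto3.Session keyword arguments."""
    creds = assume_role_response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Assumed usage inside the Lambda (requires boto3, so left as comments here):
#   session = boto3.Session(**session_kwargs_from_assume_role(acct_b))
#   ecs = session.client("ecs")
#   ec2 = session.client("ec2")
```

Creating every client from one session keeps the temporary credentials in a single place, which makes it harder to log or mix them up by accident.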
Gets the list of services running on the ECS Cluster.
Python
# The cluster ARN comes from the GuardDuty finding carried in the event.
cluster = event_dict.get('detail').get('resource').get('ecsClusterDetails').get('arn')
print('cluster:', cluster)
ecs = boto3.client(
    'ecs',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN
)
response = ecs.list_services(
    cluster=cluster,
    launchType='EC2',
    schedulingStrategy='REPLICA'
)
for service in response.get('serviceArns'):
    print('service:', service)
    # Read the VPC, subnet, and ENI of each network interface in the finding.
    for networkInterface in event_dict.get('detail').get('resource').get('instanceDetails').get('networkInterfaces'):
        vpc_id = networkInterface.get('vpcId')
        subnet = networkInterface.get('subnetId')
        eni_id = networkInterface.get('networkInterfaceId')
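A single list_services call returns only one page of results (10 service ARNs when maxResults is not set), so a cluster with more services needs pagination. The sketch below assumes ecs_client behaves like boto3.client("ecs"); the function name is hypothetical.

```python
def list_all_services(ecs_client, cluster_arn):
    """Collect every service ARN in the cluster, following pagination."""
    paginator = ecs_client.get_paginator("list_services")
    arns = []
    for page in paginator.paginate(
        cluster=cluster_arn,
        launchType="EC2",
        schedulingStrategy="REPLICA",
    ):
        arns.extend(page.get("serviceArns", []))
    return arns
```

The returned list can then be iterated in place of response.get('serviceArns').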
Creates a security group named sg_quarantine with no inbound and outbound rules, or reuses it if it already exists in the VPC. Note that a newly created security group still carries the default allow-all outbound rule until that rule is revoked.
Python
# Create the EC2 client using the assumed-role credentials.
ec2 = boto3.client(
    'ec2',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN,
)
response = ec2.describe_security_groups(
    Filters=[
        {
            'Name': 'vpc-id',
            'Values': [vpc_id]
        }
    ]
)
print('response sg', response)
# Look for an existing quarantine group in this VPC.
security_group_id = ""
for sg in response.get('SecurityGroups'):
    if sg.get('GroupName') == 'sg_quarantine':
        security_group_id = sg.get('GroupId')
        print('quarantined sg', security_group_id)
if not security_group_id:
    print('new sg created')
    response = ec2.create_security_group(
        GroupName='sg_quarantine',
        Description='quarantine security group',
        VpcId=vpc_id
    )
    security_group_id = response['GroupId']
    print('new quarantined sg', security_group_id)
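The find-or-create step can also be written as one helper that filters by group name on the server side and removes the default outbound rule of a freshly created group. This is a sketch under stated assumptions: ec2_client behaves like boto3.client("ec2"), the VPC is IPv4-only (an IPv6-enabled VPC would also get a default ::/0 egress rule), and the function name is hypothetical.

```python
QUARANTINE_SG_NAME = "sg_quarantine"


def ensure_quarantine_sg(ec2_client, vpc_id, name=QUARANTINE_SG_NAME):
    """Return the ID of a rule-free quarantine security group in the VPC."""
    found = ec2_client.describe_security_groups(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "group-name", "Values": [name]},
        ]
    )
    groups = found.get("SecurityGroups", [])
    if groups:
        return groups[0]["GroupId"]

    created = ec2_client.create_security_group(
        GroupName=name,
        Description="quarantine security group",
        VpcId=vpc_id,
    )
    group_id = created["GroupId"]
    # A new VPC security group starts with an allow-all IPv4 egress rule;
    # revoke it so the group really has no outbound rules.
    ec2_client.revoke_security_group_egress(
        GroupId=group_id,
        IpPermissions=[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    )
    return group_id
```

Because the helper returns the existing group when one is found, repeated invocations of the Lambda for the same VPC do not fail on a duplicate group name.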
Python
def remove_all_permission(security_group_id, ec2):
    """Revoke every inbound and outbound rule of the given security group."""
    response = ec2.describe_security_group_rules(
        Filters=[
            {
                'Name': 'group-id',
                'Values': [security_group_id]
            },
        ],
        DryRun=False
    )
    print('sg rules', response)
    for rule in response.get('SecurityGroupRules'):
        sg_ruleid = rule.get('SecurityGroupRuleId')
        if rule.get('IsEgress'):
            revoke_response = ec2.revoke_security_group_egress(
                DryRun=False,
                GroupId=security_group_id,
                SecurityGroupRuleIds=[sg_ruleid]
            )
        else:
            revoke_response = ec2.revoke_security_group_ingress(
                DryRun=False,
                GroupId=security_group_id,
                SecurityGroupRuleIds=[sg_ruleid]
            )
        print('rule response', revoke_response)
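The rule removal can be batched: both revoke calls accept a list of rule IDs, so the rules can be split by direction and revoked with at most two calls per security group. The sketch below assumes ec2_client behaves like boto3.client("ec2"); both function names are hypothetical, and it also paginates describe_security_group_rules for groups with many rules.

```python
def partition_rule_ids(rules):
    """Split SecurityGroupRules entries into (ingress_ids, egress_ids)."""
    ingress, egress = [], []
    for rule in rules:
        target = egress if rule.get("IsEgress") else ingress
        target.append(rule["SecurityGroupRuleId"])
    return ingress, egress


def revoke_all_rules(ec2_client, security_group_id):
    """Revoke all rules of a group; return (ingress_count, egress_count)."""
    paginator = ec2_client.get_paginator("describe_security_group_rules")
    rules = []
    for page in paginator.paginate(
        Filters=[{"Name": "group-id", "Values": [security_group_id]}]
    ):
        rules.extend(page.get("SecurityGroupRules", []))

    ingress, egress = partition_rule_ids(rules)
    if ingress:
        ec2_client.revoke_security_group_ingress(
            GroupId=security_group_id, SecurityGroupRuleIds=ingress
        )
    if egress:
        ec2_client.revoke_security_group_egress(
            GroupId=security_group_id, SecurityGroupRuleIds=egress
        )
    return len(ingress), len(egress)
```

Skipping the call when a direction has no rules avoids sending an empty rule list to the API.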
Associates the security group with no inbound/outbound rule with the ECS service using the below code block.
Python
from botocore.exceptions import ClientError


def update_sg_service(security_group_id, cluster, service, subnet, ecs):
    try:
        print('service mapping of sg to be changed', service, subnet)
        # Point the service's awsvpc configuration at the quarantine group.
        response = ecs.update_service(
            cluster=cluster,
            service=service,
            networkConfiguration={
                'awsvpcConfiguration': {
                    'subnets': [subnet],
                    'securityGroups': [security_group_id]
                }
            }
        )
        print('response after s mapping change:', response)
    except ClientError as e:
        print('exception while service remediation', e)
Gets the list of network interfaces associated with the tasks.
Iterates through the list of network interfaces
Iterates through the list of security groups associated with each network interface.
Removes all inbound and outbound permissions associated with each security group.
Python
def update_sg_eni(security_group_id, ec2, eni_id):
    print('eni mapping to be changed:', eni_id)
    try:
        response = ec2.describe_network_interfaces(
            DryRun=False,
            NetworkInterfaceIds=[eni_id]
        )
        eni = response.get('NetworkInterfaces')[0]
        print("Owner", eni.get('RequesterManaged'))
        if not eni.get('RequesterManaged'):
            # ENI is not AWS-managed: replace its groups with the quarantine group.
            eni_response = ec2.modify_network_interface_attribute(
                DryRun=False,
                Groups=[security_group_id],
                NetworkInterfaceId=eni_id
            )
            print('response after eni mapping change:', eni_response)
        else:
            # ENI is AWS-managed: strip the rules of every attached group instead.
            for group in eni.get('Groups'):
                print('before removing all permissions')
                remove_all_permission(group.get('GroupId'), ec2)
    except ClientError as e:
        print('exception while eni remediation', e)
The following screenshots, taken after the Remediation Lambda has run successfully, validate that the ECS cluster has been completely quarantined.
1. Shows the ECS cluster created by the Tester project and highlights the VPC, Subnet, and associated Security Group.
2. Please note that the Security Group ID in the above screenshot is actually the ID of the sg_quarantine group, created with no inbound and outbound rules.
3. Search for guardduty-tester-RedTeamSecurityGroup under EC2 -> Security Groups, and you should see that it has no inbound or outbound rules.
Conclusion
The solution described in this blog helps quarantine an EC2 instance, and the ECS cluster running on it, in the event of a malware attack. The auto-remediation helps check the spread of the malware within the network, and the quarantined instance can later be inspected in more detail or used for forensic analysis.