Verifying Infrastructure Using Ansible With Very Little Knowledge of It
Take a look at how you can verify a Linux-based environment with Ansible's standup module.
Introduction
Ansible has been a popular tool for verifying Linux-based infrastructure environments. The standup module makes this extremely easy, as it demands very little knowledge of Ansible itself.
An infrastructure environment, whether it is provisioned in a public or private cloud or built on bare-metal machines in a data center, needs some amount of verification before applications are deployed on it, however ephemeral the infrastructure may be. Typically, such verifications are needed during various stages of provisioning and configuring a cluster of machines, and later when that cluster is available as the infrastructure for running software systems.
The verification checks are needed to ensure that:
- The infrastructure is provisioned with the required compute and storage capacities. While the use of machine images and automated provisioning processes minimizes many oversights in this regard, just as newly built software is tested before it is put in use, the outcome of Infrastructure-as-Code (IaC) should be verified as well, and performing that step manually would be incongruous and should be avoided.
- The baseline software applications are running on the machines and are configured correctly. Baseline software tends to be baked into a machine image, so its availability is assured when the machines are brought up. However, its configuration and running state have to be verified.
- The network access is open. The internal network configuration of the newly built environment, such as connectivity between machines, load balancers, and proxy servers, is part of the provisioning process and has to be verified. A newly created environment may also depend on external network access, such as access to a shared data repository in a different network or subnet, to configuration management services, and to other application instances that need to be integrated.
- The applications are running fine. While the focus here is to verify the infrastructure, the methods used to check on the baseline applications could be used to verify the main application as well — after a new deployment as well as periodically to check runtime health.
All but the last of the tasks discussed above are performed once. The last step, verifying applications deployed in a newly stood-up environment, also overlaps with general monitoring requirements.
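For a sense of what these checks look like in practice, here are a couple of illustrative, ad-hoc spot checks of the kind the standup module is meant to automate (the commands are generic examples, not part of the module):

```shell
# Compute capacity: number of CPUs available on the machine
nproc

# Storage capacity: free space (in KB) on the root filesystem
df -P / | awk 'NR==2 {print $4}'
```

Running such commands by hand on every machine does not scale, which is exactly the gap the module fills.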
The Standup Module
Ansible can be used for both provisioning and configuring infrastructure. There are related Ansible modules available, especially those helping with provisioning infrastructure in AWS, Azure, and VMWare. There are also modules available for configuring popular baseline software such as Docker and RabbitMQ.
However, each verification step has to be coded as a task or role in Ansible. The standup module we will explore here provides a framework for automating the verification steps. Using it, you can define a suite of verification steps in a YAML file and run those checks with Ansible against a cluster of machines and the software running on them, both baseline software and applications.
The advantages of using this module are numerous:
- There is no need to know Ansible to write test cases, as the tests are specified using simple system commands and YAML.
- Robust error checking and flexible options to determine what counts as a valid test result.
- An option to heal the state of an environment if an issue is identified.
- Hooks to integrate a verification process with monitoring applications, so the statuses and metrics generated by the verification checks can be used to roll out standard monitoring features.
Nomenclature
The terminology used in the module must be well understood to configure verification tests effectively.
Cluster
A group of machines, typically in the same network, created for running one or more applications.
Role
Software role of a machine in a cluster, such as API server, web server, etc., that determines which baseline software and application components would get installed on it. A machine can have multiple roles.
Checks File
The checks that are executed and evaluated by the standup module are specified in a YAML file. Multiple checks can be present in a checks file, and each check can be tagged with one or more role names. These tags help execute a test on only the relevant subset of machines in the cluster.
A sample checks file is shown below:
```yaml
# CentOS smoke checks for standup module.
---
title: Test suite for verifying standup module on CentOS.
checks:
  - name: "Test common 1"
    description: Check if the OS is CentOS
    command: cat /etc/os-release | grep ^NAME | grep CentOS
  - name: "Test web 1"
    description: Check if Apache service is running
    roles: web
    command: sudo service httpd status | grep -v "not running"
    heal: sudo service httpd start
  - name: "Test web 2"
    description: Check if there is any Apache activity
    roles: web
    command: ls /var/log/httpd/access_log
  - name: "Test db 1"
    description: Check if mysql service is running
    roles: db
    command: ps -ef | grep mariadb | wc -l
    heal: sudo service mariadb start
    output_compare:
      type: number
      value: 3
      operator: EQ
  - name: "Test web 3"
    description: Verify if there are more than 2 Apache processes running
    command: ps -ef | grep httpd | grep -v color | wc -l
    output_compare:
      type: number
      value: 2
      operator: GT
```
A check has the following attributes and options:
- name: A label for the check. Required.
- description: A short description of the check. Required.
- roles: One or more tags indicating the role of a machine, delimited by commas. Optional.
- command: The system command to run on the machine. The status of a check is determined from the result of executing this command. Both the OS exit status ($? == 0) and pattern checks on the output are supported. Required.
- ignore_status: The exit status of the check command is ignored and always treated as success. Usually the verification is then based on the output of the check. Default: false.
- heal: If the heal action is enabled as one of the module options (heal_state=true), this command is run to heal the state when the check command fails. If the heal command succeeds, the check command runs again to verify that the status improved. Optional.
- output_compare: The output of the check command is optionally evaluated using:
  - type: str or number, indicating string or numeric comparison of the output.
  - value: The reference value against which the output is compared.
  - operator: One of GE, EQ, GT, LE, and LT. Only EQ is supported for the str type; all operators are supported for the number type.
An Ansible task that uses the standup module is marked successful if all the checks executed from the input checks file are run successfully.
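To make the output_compare semantics concrete, here is a minimal Python sketch of the evaluation logic described above. This is not the module's actual source code; the function name is hypothetical, and only the type and operator names come from the documentation:

```python
# Hypothetical sketch of output_compare evaluation; not the module's real code.
def compare_output(output: str, type_: str, value, operator: str) -> bool:
    """Return True if the command output passes the configured comparison."""
    if type_ == "str":
        # Only EQ is supported for string comparison.
        if operator != "EQ":
            raise ValueError("str type supports only the EQ operator")
        return output.strip() == str(value)
    if type_ == "number":
        ops = {
            "EQ": lambda a, b: a == b,
            "GT": lambda a, b: a > b,
            "GE": lambda a, b: a >= b,
            "LT": lambda a, b: a < b,
            "LE": lambda a, b: a <= b,
        }
        return ops[operator](float(output.strip()), float(value))
    raise ValueError(f"unknown comparison type: {type_!r}")
```

For example, the "Test web 3" check above passes when `compare_output("3\n", "number", 2, "GT")` evaluates to True.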
Using the Standup Module
Once a Checks File is prepared for verifying an environment, it can be invoked from an Ansible role or directly from a playbook as follows:
```yaml
- name: Run only db and web related checks
  standup:
    checks_file_path: verify-cluster-checks.yml
    roles: web,db

- name: Run all the checks
  standup:
    checks_file_path: verify-cluster-checks.yml
```
Note that the checks defined in a Checks File can be executed selectively when the tests are tagged with role names.
The only development involved is the creation of one or more Checks Files to cover all the test cases needed for the verification. Very little knowledge of Ansible is required, as the framework provided by the standup module encapsulates the complexity of running the test commands and checking their outputs.
Installation
Drop the related Python code module, standup.py, under a directory named library, where the playbook and related roles are stored. The latest version of the code can be downloaded from here.
Any of the supported methods for using a custom Ansible module from a playbook can be used for the installation as well. Please note that the standup module is not a standard Ansible module yet; the Python code has to be installed on the control machine from which the playbook is executed.
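Ansible automatically discovers custom modules placed in a library directory adjacent to the playbook, so a typical project layout (file and role names here are illustrative assumptions) might look like:

```
site.yml
verify-cluster-checks.yml
library/
    standup.py
roles/
    web/
    db/
```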
Documentation
The latest documentation on standup module is available here for reference. If you are skilled in developing Ansible modules, extending this module for your specific requirements is also a possibility, even though that is rarely required.
Integrating With Monitoring Systems
The verification of infrastructure and monitoring runtime systems have some overlaps. In practice, that is rarely addressed and taken advantage of, mainly because the former is usually accomplished using custom scripts or playbooks and the latter is covered by features of standard monitoring systems.
However, off-the-shelf monitoring systems are not very useful for monitoring applications, as that task is custom in nature. The integration options provided by monitoring systems, such as checking a REST API endpoint or running a SQL query and recording the results, are used to extend them to support application monitoring.
Similar customizations could be rolled out using the standup module's robust yet simple method of executing and validating system commands. The tests in a role or playbook can be configured to run REST API calls or SQL queries against an environment and post the outputs, including the status, to the monitoring system. For the latter part, the options provided by the monitoring system, usually a REST API, can be used.
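A check of this kind might look like the following fragment of a Checks File. The endpoint URL and the expected HTTP status are placeholders chosen for illustration; only the check attributes themselves come from the module's documentation:

```yaml
# Hypothetical application health check; the URL is a placeholder.
- name: "Test api 1"
  description: Check that the application health endpoint returns HTTP 200
  roles: api
  command: curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health
  output_compare:
    type: number
    value: 200
    operator: EQ
```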
As monitoring is an ongoing activity, the playbook used to post metrics and events has to be run periodically. Scheduling it from a process management tool like Jenkins would take care of that requirement. The related article listed in the references provides details on how to set up an Ansible playbook to run from Jenkins.
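Outside of Jenkins, a plain cron schedule on the control machine works as well. The following crontab fragment is illustrative only; the playbook path and log location are assumptions:

```shell
# Run the verification playbook every 15 minutes (paths are placeholders)
*/15 * * * * ansible-playbook /opt/verify/verify-cluster.yml >> /var/log/verify-cluster.log 2>&1
```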
Opinions expressed by DZone contributors are their own.