As I mentioned at the end of yesterday’s Terraform post, one of the two outstanding tasks with my test setup is to kick off an initial configuration of the VM.

Azure supports cloud-init for this. My plan is to use cloud-init to launch this environment’s in-band configuration management tool, Ansible, to perform the VM’s initial configuration.

The first step is to create a cloud-init configuration. While this can be done in-line in Terraform, an alternative approach (which I find preferable, as it separates the cloud-init configuration from the Terraform configuration) is to use the cloud-init provider’s cloudinit_config data source.

The first thing to do is to add it to the list of required providers:

terraform {
    required_providers {
        #...
        cloudinit = {
            source = "hashicorp/cloudinit"
            version = ">=2.2.0"
        }
    }
}

After adding the new provider, it is necessary to re-run terraform init.

Next, I created a cloud-init configuration file called cloud-init-ansible.yaml:

# Configure proxy server so install packages etc. will work behind proxy
# based on https://bugs.launchpad.net/cloud-init/+bug/1089405/comments/15
bootcmd:
- |
  cloud-init-per once env sh -c "mkdir -p /etc/systemd/system/cloud-config.service.d &&
  mkdir -p /etc/systemd/system/cloud-final.service.d && { cat > /etc/cloud/env <<-EOF
  http_proxy=http://192.168.1.10:7890
  https_proxy=http://192.168.1.10:7890
  no_proxy=localhost,127.0.0.1
  EOF
  } && { cat > /etc/systemd/system/cloud-config.service.d/override.conf <<-EOF
  [Service]
  EnvironmentFile=/etc/cloud/env
  EOF
  } && { cat > /etc/systemd/system/cloud-final.service.d/override.conf <<-EOF
  [Service]
  EnvironmentFile=/etc/cloud/env
  EOF
  } && systemctl daemon-reload"

runcmd:
  # Can't rely on the packages module for these; install them with yum here so the tools the later commands depend on are definitely present
  - [ yum, -y, install, python3, python-virtualenv, git ]
  - [ mkdir, -p, /opt/ansible/virtualenv, /opt/ansible/configuration ]
  - [ chown, azureuser, /opt/ansible/virtualenv, /opt/ansible/configuration ]
  - [ chgrp, azureuser, /opt/ansible ]
  - [ chmod, "0750", /opt/ansible ]
  - [ su, -c, "git clone ${ansible_source} /opt/ansible/configuration", azureuser ]
  - [ su, -c, "virtualenv -ppython3 /opt/ansible/virtualenv", azureuser ]
  - [ su, -c, "/opt/ansible/virtualenv/bin/pip install --upgrade pip", azureuser ]
  - [ su, -c, "/opt/ansible/virtualenv/bin/pip install ansible", azureuser ]
  - [ su, -c, "/opt/ansible/virtualenv/bin/ansible-galaxy install azure.azure_modules", azureuser ]
  - [ su, -c, "/opt/ansible/virtualenv/bin/pip install -r $HOME/.ansible/roles/azure.azure_modules/files/requirements-azure.txt", azureuser ]
  - [ su, -c, bash -c "export AZURE_SUBSCRIPTION_ID=${azure_subscription_id}; export AZURE_CLIENT_ID=${azure_client_id}; export AZURE_SECRET='${azure_client_secret}'; export AZURE_TENANT=${azure_tenant_id}; /opt/ansible/virtualenv/bin/ansible-playbook -i /opt/ansible/configuration/Ansible/inventory/all.azure_rm.yml --limit $HOSTNAME -c local /opt/ansible/configuration/Ansible/site.yml", azureuser ]
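
The ${ansible_source} and ${azure_*} placeholders above are not cloud-init syntax; Terraform’s templatefile function substitutes them before cloud-init ever sees the file. As it happens, Python’s string.Template uses the same ${name} form for simple references, so the substitution step can be sketched like this (purely an illustration of the behaviour, not what Terraform runs internally):

```python
from string import Template

# A fragment of a runcmd line, with a Terraform-style placeholder.
line = "git clone ${ansible_source} /opt/ansible/configuration"

# templatefile() replaces each ${name} with the value from the map passed
# as its second argument; string.Template behaves the same way here.
rendered = Template(line).substitute(
    ansible_source="https://deployuser:deploykey@git.host.domain.tld/group/repo"
)
print(rendered)
# → git clone https://deployuser:deploykey@git.host.domain.tld/group/repo /opt/ansible/configuration
```

Note that Terraform only interpolates the ${…} form, which is why the bare $HOME and $HOSTNAME references in the file pass through to the shell untouched.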

I then created a data source for the cloud-init configuration, in a file called cloud-init.tf:

data "cloudinit_config" "ansible" {
  gzip = true
  base64_encode = true

  part {
    filename = "cloud-init-ansible"
    content_type = "text/cloud-config"
    content = templatefile(
        "${path.module}/cloud-init-ansible.yaml",
        {
            ansible_source = "https://deployuser:deploykey@git.host.domain.tld/group/repo"
            azure_subscription_id = var.azure_subscription_id
            azure_client_id = var.azure_client_id
            azure_client_secret = var.azure_client_secret
            azure_tenant_id = var.azure_tenant_id
        }
    )
  }
}
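
The gzip and base64_encode flags matter because Azure expects custom_data to be base64-encoded and limits its size. Roughly, the rendered value is the part content compressed and then encoded, as in this Python sketch (the real provider wraps each part in a MIME multipart message first, so its exact output differs):

```python
import base64
import gzip

# A stand-in for the template-rendered cloud-config part content.
cloud_config = "#cloud-config\nruncmd:\n  - [ mkdir, -p, /opt/ansible ]\n"

# Approximately what gzip = true and base64_encode = true produce.
rendered = base64.b64encode(gzip.compress(cloud_config.encode())).decode()

# cloud-init on the VM reverses both steps to recover the configuration.
decoded = gzip.decompress(base64.b64decode(rendered)).decode()
assert decoded == cloud_config
```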

And finally added it to my VM’s custom_data:

resource "azurerm_linux_virtual_machine" "vm-1" {
  #...
  custom_data = data.cloudinit_config.ansible.rendered
  #...
}

To test, I ran terraform plan followed by terraform apply.

With hindsight, I think there are better approaches than running Ansible locally on the newly built VM. I come from using SaltStack in a client-server setup, where I can install the client, connect it to the server and trigger a configuration run from the client side. Ansible, in contrast, is designed as an agent-less tool run from a central system that connects to, and then configures, each client. Knowing this, downloading a copy of the Ansible configuration to each machine and telling it to configure itself feels like an anti-pattern. However, due to the restrictions on approved software at work, I have little choice over the tools I can use, so I need to work with Ansible.

Part of my thinking was that I want the machines to be self-updating, as some are routinely turned off (although I now have Ansible playbooks to turn on those that are off, and to turn off again any that were automatically turned on, at the end of the run). Reflecting on working through this piece, I think a better option for the initial configuration would be to use a local-exec provisioner (with a remote-exec timing trick to wait until the VM is reachable) to run Ansible from the system Terraform was run on. That creates another challenge, however: I am currently testing Terraform in the Azure Cloud Shell, but the VMs are provisioned in a private network, so Ansible has no way to connect to them from there; conversely, the private network has no internet access, so Terraform cannot reach the Azure APIs from inside it.
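
The local-exec idea might look something like the following sketch (the null_resource, the inventory path and the connection details are made up for illustration, and this assumes the machine running Terraform can reach the VM over SSH):

```hcl
resource "null_resource" "ansible" {
  depends_on = [azurerm_linux_virtual_machine.vm-1]

  # The remote-exec timing trick: this blocks until SSH is up on the VM.
  provisioner "remote-exec" {
    inline = ["echo ready"]

    connection {
      type = "ssh"
      host = azurerm_linux_virtual_machine.vm-1.private_ip_address
      user = "azureuser"
    }
  }

  # Run Ansible from the machine Terraform itself runs on.
  provisioner "local-exec" {
    command = "ansible-playbook -i Ansible/inventory/all.azure_rm.yml --limit ${azurerm_linux_virtual_machine.vm-1.name} Ansible/site.yml"
  }
}
```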

Other approaches I considered were either to write a bash script that runs Terraform and then launches Ansible from a central command system to configure the new machine (after writing this, I found a very similar approach documented online), or to launch Terraform from Ansible and then add the newly created VMs to a group before applying the rest of the configuration.

With any of these approaches, I think the ongoing updating will have to be done by running Ansible centrally, either automated (e.g. with cron) or manually.
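
For the automated variant, something as simple as an /etc/cron.d entry on the central system would do (a sketch only; the user, paths and schedule are made up):

```
# Re-apply the Ansible configuration across the estate every night at 03:00
0 3 * * *  ansible  /opt/ansible/virtualenv/bin/ansible-playbook -i /opt/ansible/configuration/Ansible/inventory/all.azure_rm.yml /opt/ansible/configuration/Ansible/site.yml
```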