Getting started with Ansible and Azure
This post started, like most of my posts, with me making notes as I went along trying to accomplish a technical task. It quickly became a bit of a rant as I encountered numerous issues with Ansible’s Azure integration.
Beginning with a moan
When I started this post, I thought I had found a minor problem that I would resolve (or find a workaround for) and carry on. It was not to be, and in the end (as you will see, if you continue reading from the next heading) I adopted a completely different approach.
The first thing I tried to do was to deploy a VM - this is a regular task, as I create a new VM for every user (so they have their own sandboxed workstation, amongst other reasons). Automating this process better, particularly to reduce duplication between the deployment and configuration scripts, will significantly reduce the maintenance overhead (and free up my time).
Almost immediately, I hit a slight snag with the azure.azcollection.azure_rm_networkinterface module. In order to create the VM, I first needed to create a NIC for it with the IP configuration I desired. Initially I thought this might just be unclear documentation, rather than a bug. The documentation for the option create_with_security_group says:
Whether a security group should be be created with the NIC. If this flag set to True and no security_group set, a default security group will be created.
To me, this reads that if set to True a new security group will be created. This is not the behaviour I wanted, as I wanted to associate an existing security group. The documentation for the option security_group says:
An existing security group with which to associate the network interface. If not provided, a default security group will be created when create_with_security_group=true. It can be the name of security group. Make sure the security group is in the same resource group when you only give its name. It can be the resource id. It can be a dict contains security_group’s name and resource_group.
This seems straightforward enough - if create_with_security_group is enabled then a new security group will be created (“Whether a security group should be be created” (sic)) and security_group on its own would just associate an existing security group. However, this is not what happens - unless create_with_security_group is set to True, the interface’s network security group is set to None. That’s right, the existing group is removed if there is one. No matter what I set for security_group, with create_with_security_group set to False the NSG on the interface is removed.
I suspect the documentation for create_with_security_group should read:
Whether a security group should be associated with the NIC. If this flag is set to False the NIC will have any associated security group removed. If this flag is set to True and no security_group set, a default security group will be created.
This seems to be an old issue from before the collection was spun out of Ansible’s bundled modules.
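As a sketch of the combination that the behaviour described above seems to require for attaching an existing NSG, a playbook might look like the following. All resource names (example-rg, example-vnet, example-subnet, example-nsg, example-nic, ipconfig1) are hypothetical placeholders, and it assumes Azure credentials are already configured for the collection on the control node.

```yaml
- name: Create a NIC associated with an existing NSG (illustrative sketch)
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create the network interface
      azure.azcollection.azure_rm_networkinterface:
        resource_group: example-rg        # hypothetical resource group
        name: example-nic                 # hypothetical NIC name
        virtual_network: example-vnet     # hypothetical existing virtual network
        subnet_name: example-subnet       # hypothetical existing subnet
        # Despite its name, this has to be true; with false the NIC was
        # observed to end up with no NSG at all.
        create_with_security_group: true
        # Existing NSG to associate. Per the documentation this can be a
        # name, a resource id, or a dict with name and resource_group.
        security_group:
          name: example-nsg
          resource_group: example-rg
        ip_configurations:
          - name: ipconfig1
            primary: true
            private_ip_allocation_method: Dynamic
```

The dict form of security_group is used here only because it makes the resource group explicit; the plain name form should be equivalent when the NSG lives in the same resource group as the NIC.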
However, once I had resolved that, with create_with_security_group set to True and security_group correctly defined, the module started throwing an exception that I was unable to resolve: Error creating or updating network interface nicName - Parameter 'FlowLog.target_resource_id' can not be None. (I was testing this with an existing VM and its current configuration, so there should have been no change.) I opened an issue for this and started thinking about another approach to enable me to get on with my work. (Update on the ticket: the developers originally responded “works for me”; however, another user said they have the same problem in their environment and there are now duplicate issue reports for it.)
I have also repeatedly encountered problems with conflicting versions of Microsoft’s Python modules between the Ansible collection and the azure-cli tool.
Another approach (“plan B”)
Having failed to get the Ansible “native” Azure modules working properly, I came up with another plan - use ansible.builtin.command to do the operations via the az azure-cli command. This is less than satisfactory for a number of reasons:
- It is not idempotent - even for commands such as deployment with a template which matches the existing environment, az will effectively replace existing resources
- The tasks involved will always show changed even if no changes are actually made (Ansible cannot tell if anything changed, so assumes all commands have side-effects, and I used temporary files which are always created and deleted, resulting in changes)
- It seems like a hack to use command instead of Ansible native modules (not least because of the above properties of doing so)
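To illustrate the shape of such a task, here is a minimal sketch. The resource group name and template path are hypothetical, and it assumes az is installed and already logged in on the control node. Because no changed_when condition is set, ansible.builtin.command reports every run as changed, which is the second problem listed above.

```yaml
- name: Deploy a VM template with azure-cli (illustrative sketch)
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Run the deployment through az
      ansible.builtin.command:
        # argv avoids shell quoting issues; each element is one argument
        argv:
          - az
          - deployment
          - group
          - create
          - --resource-group
          - example-rg                      # hypothetical resource group
          - --template-file
          - /tmp/example-vm-template.json   # hypothetical template path
      register: az_deploy                   # keeps az's output for later tasks
```

Adding changed_when: false would silence the changed status, but it would hide real changes just as effectively, so it does not make the task any more idempotent.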
Sanity prevails (“plan C”)
In the end, I decided to separate the deployment (infrastructure-as-code) from configuration (configuration management/desired state configuration). Conceptually this has some advantages, logically separating the in-band OS configuration from the out-of-band “bare metal” (if we can call it that in the cloud/VM world) creation. However, it also means that the infrastructure is not maintained in the same way. The configuration management tools should be run regularly and thus prevent configuration drift, but without good discipline or robust CI/CD pipelines to keep everything consistent with the definition, there is a higher risk of the infrastructure drifting from the defined (and, presumably, desired) state.