Following on from setting up my first VM and turning it into a Proxmox cluster, I wanted to set up Ceph as a decentralised, shared storage infrastructure, which should allow more seamless migration of VMs between hosts.

Creating some storage

I had already set up some bits of Ceph when I installed Proxmox on my first system and then added 2 more nodes when I turned it into a Proxmox cluster. Ceph requires dedicated storage, and on install Proxmox had used all of the disk, although it allocated the bulk (around 220GB) to VM storage.

This process has to be done on each node.

I started by deleting the “lvm-thin” storage through the UI on the nodes not running my Domain Controller VM (the VM’s disk lives in that storage on its node, but the space is unused on the others). I then manually created a logical volume for Ceph’s storage (you can get the number of free extents to pass to -l with vgdisplay pve):

lvcreate -l 56150 -n ceph-osd pve
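
The extent count above is specific to my disks; on another node the right value is the “Free PE” figure reported by vgdisplay (the grep is just a convenience to pick that line out):

# Find the number of free physical extents in the "pve" volume group;
# the "Free  PE / Size" value is what gets passed to lvcreate -l
vgdisplay pve | grep -i "free"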

Add the storage to Ceph

Ceph has native support for LVM volumes; however, the Proxmox UI only exposes physical disks, and pveceph also refused to work with LVM volumes, so adding them to Ceph has to be done manually using Ceph’s own tools:

Clear any previous data on the volume (as per the Proxmox documentation):

ceph-volume lvm zap pve/ceph-osd

I ran into problems with the keyrings being missing (I presume pveceph would have created them if it had worked), so they had to be created manually (using advice from someone else who had encountered the same issue). On one of my nodes only one of the two files was missing:

ceph auth get client.bootstrap-osd > /etc/pve/priv/ceph.client.bootstrap-osd.keyring
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
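
Since only some of the files may be missing on a given node, a small loop like this (just a sketch, using the same paths and auth entity as above) recreates only the ones that are absent or empty:

# Recreate only the bootstrap-osd keyrings that are missing or empty
for f in /etc/pve/priv/ceph.client.bootstrap-osd.keyring \
         /var/lib/ceph/bootstrap-osd/ceph.keyring; do
    [ -s "$f" ] || ceph auth get client.bootstrap-osd > "$f"
done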

Then add the volume to the Ceph cluster:

ceph-volume lvm create --data pve/ceph-osd
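
At this point the new OSD should show up as up/in; the standard Ceph status commands are an easy way to confirm (the output will obviously differ per cluster):

# Confirm the new OSD is up and in, and check overall cluster health
ceph osd tree
ceph -s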

Adding the 3rd node

Once this was all done, I added an “RBD” storage type called “ceph-rbd” to Proxmox using the Ceph “data_pool” as the back end. I selected both “Disk image” and “Container” for the content.
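
For reference, I believe the same storage definition can be added from the command line with pvesm (I did it through the UI, so this is untested; the storage and pool names match the ones above):

# Add an RBD-backed storage called "ceph-rbd" using the "data_pool" pool,
# allowing both disk images and container volumes
pvesm add rbd ceph-rbd --pool data_pool --content images,rootdir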

I then triggered a migration of the VM from the node it was on to one of the others, moving the storage to the ceph-rbd storage in the process. Once this was complete, I simply repeated the above instructions on the 3rd node.
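
Again I used the UI for this, but I believe qm can do the same migration with a storage move from the shell (the VM ID and target node here are placeholders, not my actual values):

# Live-migrate VM 100 to node pve2, moving its local disk(s) onto ceph-rbd
qm migrate 100 pve2 --online --with-local-disks --targetstorage ceph-rbd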

It took about 30 minutes for Ceph to rebalance the data across the nodes (whilst migrating VMs between nodes, as I was having a play with that at the same time). Once complete, live-migrating VMs took only as long as it takes to snapshot and copy the VM’s memory - not quick over a 1Gb Ethernet network, but a lot faster than copying the VM’s disk as well!
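
For anyone wanting to keep an eye on the rebalance while it runs, the usual Ceph commands work here too (nothing specific to this setup):

# Check recovery/rebalance progress; ceph -w streams cluster events live
ceph -s
ceph -w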