How I Fixed: Failed To Delete Snapshot in Virtualbox

Man alive. I hate stuff like this. Virtualbox is a great piece of software, but it does some whacky things.

Recently I migrated my infrastructure from Virtual Machines to Docker.

I replaced a Digital Ocean VPS with a local Virtualbox VM for running my private GitLab. The primary reason for this is that Docker images take up a chunk of space, and a low tier DO droplet just doesn’t cut it in terms of disk space.

I had a spare 120gb SSD laying around so figured: hey, why not use that and 4x my usable disk space for GitLab? Sounds like a good idea, right?

Actually, it took a lot of effort. But in the end, it worked. I decided to use thin provisioning and make the virtual box image think it had a 2tb disk, when in reality it was sharing the same SSD with another Virtualbox machine that runs my GitLab CI multi-runner instance.

Ok, so the whys-and-wherefores of that set up are for a forthcoming / different post.

What I didn’t expect is for my disk to fill up in less than 2 weeks. I mean, I knew my Docker images took up a chunk of space, but I had purposefully mounted a totally different disk for GitLab backups, and disabled container backups along the way. How could it be that within 2 weeks I had 98% disk utilisation?

Well, it turns out: snapshots.

Or more specifically, one single 90.1gb snapshot:

du -h
84G	./rancher-node-2/Snapshots
88G	./rancher-node-2

What I had done is taken a “base snapshot” just after creating the VM, and then promptly forgotten about said snapshot entirely.

2 weeks later, I log on today and try to hit my GitLab, but got a 503 error:

Fun times.

A bit of digging showed me that both my “rancher-node-2” VM, and the GitLab CI Multi-Runner VM were in a paused state.

A little further digging showed I had 2gb of disk space left. And that’s where I found out about the snapshot.

Ok, so simple solution – delete the snapshot.

Yeah, if only:

So, that’s not enough free disk space to delete a file then? Heh, not quite. Apparently deleting a snapshot also involves merging snapshots, or some such – I didn’t dive into the technicals.

But still, seems daft.

Anyway, the advice I found out there on the ‘net was to have at least as much disk space again in order to do the delete. In other words, if you have a 10gb VM, and a 20gb snapshot, in order to delete the snapshot you’d need a 60gb disk. But of course!

Sadly, I don’t have another spare 240gb disk laying around. I don’t use large disks anymore as I’ve lost two 2tb disks (old spinny stuff, but still) in recent years and the data loss is mildly irritating to put it politely. I stick to smaller disks so if data loss does occur, it isn’t as bad. In theory.

Fortunately, I did have a spare 100gb or so on a different partition. But on the face of it, that doesn’t seem that useful, right?

My Solution

This may seem a little unorthodox but here goes.

To begin with, I tried to simply clone the existing VM. Doing a full clone gives the option to disregard any snapshots.

I moved my second VM off the 120gb disk freeing up about 18gb or so.

I tried to clone, it took a very long time, and then it promptly failed:

Don’t be fooled by that timer, it took a lot longer than that.

Anyway, that didn’t work, so I came up with a more geeky plan.

I moved the snapshot file from my 120gb disk. This freed up a huge amount of space:

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       110G  3.4G  101G   4% /mnt/kingston-120-second

Then, I symlinked the snapshot back into place:

➜  Snapshots ln -s "/media/chris/Data/Virtual Machines/{43895f1b-1b8a-4eab-9d47-40627ccca33f}.vdi" ./{43895f1b-1b8a-4eab-9d47-40627ccca33f}.vdi
➜  Snapshots ls -la
total 12
drwx------ 2 chris chris 4096 Apr 30 20:37 .
drwxrwxr-x 4 chris chris 4096 Apr 30 20:10 ..
lrwxrwxrwx 1 chris chris   77 Apr 30 20:37 {43895f1b-1b8a-4eab-9d47-40627ccca33f}.vdi -> /media/chris/Data/Virtual Machines/{43895f1b-1b8a-4eab-9d47-40627ccca33f}.vdi

Symlinks seem scary. Here’s how I remember the syntax:

It’s just like the copy command.

ln {path to source} {path to become my symlink}

# just like 'cp'

cp {path to copy from} {path to new file}

I tried to clone the VM at this point, but this again failed with an out of disk space error.

Instead, I then tried to delete the snapshot.

This consumed nearly all the disk space, but finally worked. Hoorah, right? Not quite.

There was still a downside. My .vdi file was now at 97.3gb. I could boot the VM and see that inside the VM I was only using 46gb. Hmm.

What I had to do was to somehow shrink the disk back down to as close to 46gb as I could. This was a little involved, and took a while.

I did the following:

chris@rancher-node-2:~$ sudo dd if=/dev/zero | pv | sudo dd of=/bigemptyfile bs=4096k

dd: error writing '/bigemptyfile': No space left on device                                                         ]
2017+63027194 records in
2017+63027193 records out
2103230164992 bytes (2.1 TB, 1.9 TiB) copied, 5693.08 s, 369 MB/s
1.91TiB 1:34:53 [ 352MiB/s] [           <=>                                                                        ]

chris@rancher-node-2:~$ Connection to 192.168.0.37 closed by remote host.
Connection to 192.168.0.37 closed.

I can’t say this is my own solution – I found it on StackOverflow 🙂

As you can see, this command ran until it failed. It never consumed any disk space on my physical hard disk – which is nice, as as I say, I thin provisioned this disk so that wouldn’t have worked out so well.

Still, once this process failed, I wasn’t done.

I then ran:

vboxmanage modifyhd rancher-node-2/rancher-node-2.vdi --compact
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%

This took about 10 minutes, but after finishing I was down to a 56gb .vdi file. Good enough.

Finally, remember to delete the bigemptyfile :

rm /bigemptyfile

Published by

Code Review

CodeReviewVideos is a video training site helping software developers learn Symfony faster and easier.

5 thoughts on “How I Fixed: Failed To Delete Snapshot in Virtualbox”

  1. I also had the problem. My dynamic vdi had suddenly grown until the hosts hdd was completely full. Neither I could remove snapshots nor I could do anything else. My vdi internally just hat around 80GB but all in all the VM used around 400GB. Unfortunately I had no external HDD big enough to get around it, only 500GB ones. Also the method with ln -s did not work. (Maybe vbox was too old?)
    After hours of moving files around, I ended up with the following solution:
    I mounted another disk
    mount /media/anotherdisk /dev/sdX
    moved the vm there
    rsync -aviP –remove-source-files /vbox/hugevm /media/anotherdisk/
    then removed and created the original dir (double check before!)
    rm -rf /vbox/hugevm && mkdir /vbox hugevm
    then I mounted the dir there
    mount –bind /media/anotherdisk/hugevm /vbox/hugevm
    At this point the VM was functional again, so I then cloned the actual state (to the original HDD).
    Additionally I used your bigemptyfile method and compacted the hdd.
    At the end I successfully ended with a usage of around 80GB space (and a full backup on my external Drive)
    Thanks!

    1. Oof, sounds like a fun way to spend a few hours.

      Honestly I hate situations like this – all I want to do is get on with the tasks at hand, not spend precious hours of my life fighting computer problems.

      Really glad you got this sorted though, and thanks for sharing your approach.

  2. i had to used a symlink instead of a hardlink (so ln -s etc etc), but everything else worked, as you described 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.