How I Fixed: Failed To Delete Snapshot in Virtualbox

Man alive. I hate stuff like this. Virtualbox is a great piece of software, but it does some whacky things.

Recently I migrated my infrastructure from Virtual Machines to Docker.

I replaced a Digital Ocean VPS with a local Virtualbox VM for running my private GitLab. The primary reason for this is that Docker images take up a chunk of space, and a low tier DO droplet just doesn’t cut it in terms of disk space.

I had a spare 120gb SSD laying around so figured: hey, why not use that and 4x my usable disk space for GitLab? Sounds like a good idea, right?

Actually, it took a lot of effort. But in the end, it worked. I decided to use thin provisioning and make the virtual box image think it had a 2tb disk, when in reality it was sharing the same SSD with another Virtualbox machine that runs my GitLab CI multi-runner instance.

Ok, so the whys-and-wherefores of that set up are for a forthcoming / different post.

What I didn’t expect is for my disk to fill up in less than 2 weeks. I mean, I knew my Docker images took up a chunk of space, but I had purposefully mounted a totally different disk for GitLab backups, and disabled container backups along the way. How could it be that within 2 weeks I had 98% disk utilisation?

Well, it turns out: snapshots.

Or more specifically, one single 90.1gb snapshot:

What I had done is taken a “base snapshot” just after creating the VM, and then promptly forgotten about said snapshot entirely.

2 weeks later, I log on today and try to hit my GitLab, but got a 503 error:

Fun times.

A bit of digging showed me that both my “rancher-node-2” VM, and the GitLab CI Multi-Runner VM were in a paused state.

A little further digging showed I had 2gb of disk space left. And that’s where I found out about the snapshot.

Ok, so simple solution – delete the snapshot.

Yeah, if only:

So, that’s not enough free disk space to delete a file then? Heh, not quite. Apparently deleting a snapshot also involves merging snapshots, or some such – I didn’t dive into the technicals.

But still, seems daft.

Anyway, the advice I found out there on the ‘net was to have at least as much disk space again in order to do the delete. In other words, if you have a 10gb VM, and a 20gb snapshot, in order to delete the snapshot you’d need a 60gb disk. But of course!

Sadly, I don’t have another spare 240gb disk laying around. I don’t use large disks anymore as I’ve lost two 2tb disks (old spinny stuff, but still) in recent years and the data loss is mildly irritating to put it politely. I stick to smaller disks so if data loss does occur, it isn’t as bad. In theory.

Fortunately, I did have a spare 100gb or so on a different partition. But on the face of it, that doesn’t seem that useful, right?

My Solution

This may seem a little unorthodox but here goes.

To begin with, I tried to simply clone the existing VM. Doing a full clone gives the option to disregard any snapshots.

I moved my second VM off the 120gb disk freeing up about 18gb or so.

I tried to clone, it took a very long time, and then it promptly failed:

Don’t be fooled by that timer, it took a lot longer than that.

Anyway, that didn’t work, so I came up with a more geeky plan.

I moved the snapshot file from my 120gb disk. This freed up a huge amount of space:

Then, I symlinked the snapshot back into place:

Symlinks seem scary. Here’s how I remember the syntax:

It’s just like the copy command.

I tried to clone the VM at this point, but this again failed with an out of disk space error.

Instead, I then tried to delete the snapshot.

This consumed nearly all the disk space, but finally worked. Hoorah, right? Not quite.

There was still a downside. My .vdi file was now at 97.3gb. I could boot the VM and see that inside the VM I was only using 46gb. Hmm.

What I had to do was to somehow shrink the disk back down to as close to 46gb as I could. This was a little involved, and took a while.

I did the following:

I can’t say this is my own solution – I found it on StackOverflow 🙂

As you can see, this command ran until it failed. It never consumed any disk space on my physical hard disk – which is nice, as as I say, I thin provisioned this disk so that wouldn’t have worked out so well.

Still, once this process failed, I wasn’t done.

I then ran:

This took about 10 minutes, but after finishing I was down to a 56gb .vdi file. Good enough.

Finally, remember to delete the  bigemptyfile :

Shares

Published by

Code Review

Code Review

CodeReviewVideos is a video training site helping software developers learn Symfony faster and easier.

Leave a Reply

Your email address will not be published. Required fields are marked *