A Tweak For ELK On Docker

One annoyance I’ve hit whilst running ELK on Docker is that, after rebooting my system, the same error keeps returning:

 * Starting periodic command scheduler cron
   ...done.
 * Starting Elasticsearch Server
   ...done.
waiting for Elasticsearch to be up (1/30)
waiting for Elasticsearch to be up (2/30)
waiting for Elasticsearch to be up (3/30)
...
waiting for Elasticsearch to be up (29/30)
waiting for Elasticsearch to be up (30/30)
Couln't start Elasticsearch. Exiting.
Elasticsearch log follows below.
[2017-07-14T08:36:42,337][INFO ][o.e.n.Node               ] [] initializing ...
[2017-07-14T08:36:42,437][INFO ][o.e.e.NodeEnvironment    ] [71cahpZ] using [1] data paths, mounts [[/var/lib/elasticsearch (/dev/sde2)]], net usable_space [51.1gb], net total_space [146.6gb], spins? [possibly], types [ext4]
[2017-07-14T08:36:42,438][INFO ][o.e.e.NodeEnvironment    ] [71cahpZ] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-07-14T08:36:42,463][INFO ][o.e.n.Node               ] node name [71cahpZ] derived from node ID [71cahpZ4SjeKKAoH8X5dYg]; set [node.name] to override
[2017-07-14T08:36:42,463][INFO ][o.e.n.Node               ] version[5.3.0], pid[64], build[3adb13b/2017-03-23T03:31:50.652Z], OS[Linux/4.4.0-45-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_121/25.121-b13]
[2017-07-14T08:36:43,910][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [aggs-matrix-stats]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [ingest-common]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [lang-expression]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [lang-groovy]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [lang-mustache]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [lang-painless]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [percolator]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [reindex]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [transport-netty3]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] loaded module [transport-netty4]
[2017-07-14T08:36:43,911][INFO ][o.e.p.PluginsService     ] [71cahpZ] no plugins loaded
[2017-07-14T08:36:45,703][INFO ][o.e.n.Node               ] initialized
[2017-07-14T08:36:45,703][INFO ][o.e.n.Node               ] [71cahpZ] starting ...
[2017-07-14T08:36:45,783][WARN ][i.n.u.i.MacAddressUtil   ] Failed to find a usable hardware address from the network interfaces; using random bytes: 58:01:4e:51:11:f3:c9:da
[2017-07-14T08:36:45,835][INFO ][o.e.t.TransportService   ] [71cahpZ] publish_address {172.20.0.3:9300}, bound_addresses {0.0.0.0:9300}
[2017-07-14T08:36:45,839][INFO ][o.e.b.BootstrapChecks    ] [71cahpZ] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-07-14T08:36:45,840][ERROR][o.e.b.Bootstrap          ] [71cahpZ] node validation exception
bootstrap checks failed
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2017-07-14T08:36:45,842][INFO ][o.e.n.Node               ] [71cahpZ] stopping ...
[2017-07-14T08:36:45,909][INFO ][o.e.n.Node               ] [71cahpZ] stopped
[2017-07-14T08:36:45,909][INFO ][o.e.n.Node               ] [71cahpZ] closing ...
[2017-07-14T08:36:45,914][INFO ][o.e.n.Node               ] [71cahpZ] closed

The core of the fix to this problem is helpfully included in the output:

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

And on Ubuntu fixing this is a one-liner:

sudo sysctl -w vm.max_map_count=262144

Only, this is not persisted across a reboot.

To fix this permanently, I needed to do:

sudo vim /etc/sysctl.d/60-elasticsearch.conf

And add in the following line:

vm.max_map_count=262144
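Files under /etc/sysctl.d can be reloaded with sudo sysctl --system, so the new value applies without a reboot. To confirm the value is actually high enough for Elasticsearch, a quick check can be scripted; this sketch reads from /proc directly, so it doesn’t need root:

```shell
# Sketch: verify vm.max_map_count meets the minimum Elasticsearch
# requires (262144), reading /proc directly so no root is needed.
required=262144
current=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)
if [ "$current" -ge "$required" ]; then
    echo "ok: vm.max_map_count=$current"
else
    echo "too low: vm.max_map_count=$current (need >= $required)"
fi
```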

I am not claiming credit for this fix. I found it here. I’m just sharing it because I know that in future I will need to do this again, and it’s easiest if I know where to start looking 🙂

Another important step, if persisting data, is to ensure the Elasticsearch data folder is owned by 991:991.
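For example, if the data directory is bind-mounted from the host, something along these lines (the path and variable names here are hypothetical; 991:991 is the elasticsearch UID:GID inside the container image):

```shell
# Hypothetical host path to bind-mount into the container at
# /var/lib/elasticsearch.
ES_DATA="$(mktemp -d)/esdata"
mkdir -p "$ES_DATA"
# chown needs root; fall back gracefully when run unprivileged
chown -R 991:991 "$ES_DATA" 2>/dev/null || echo "re-run chown as root"
owner=$(stat -c '%u:%g' "$ES_DATA")
echo "owner: $owner"
```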

Centrifugo Docker Compose Config

    centrifugo:
        image: centrifugo/centrifugo:1.7.3
        environment:
          - CENTRIFUGO_SECRET=potato
          - CENTRIFUGO_ADMIN_PASSWORD=potato
          - CENTRIFUGO_ADMIN_SECRET=potato
        command: centrifugo --web
        ports:
          - "8569:8000"
        networks:
          crv_network:
            aliases:
              - crv_centrifugo
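Note that the snippet references a crv_network network, which must also exist as a top-level key somewhere in the same compose file; a minimal definition would look something like this (the bridge driver choice is an assumption):

```yaml
networks:
    crv_network:
        driver: bridge
```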

Admittedly this is not likely to be useful to many, but hopefully it will save someone some time in the future.

This is just stuff I found throughout various GitHub tickets that was good enough to get Centrifugo up and running on http://127.0.0.1:8569/.

How I Fixed: \Swift_Message::newInstance() not found

I had a recent requirement to override the Mailer class provided as part of FOSUserBundle.

There’s a protected method in this class as follows:

    /**
     * @param string       $renderedTemplate
     * @param array|string $fromEmail
     * @param array|string $toEmail
     */
    protected function sendEmailMessage($renderedTemplate, $fromEmail, $toEmail)
    {
        // Render the email, use the first line as the subject, and the rest as the body
        $renderedLines = explode("\n", trim($renderedTemplate));
        $subject = array_shift($renderedLines);
        $body = implode("\n", $renderedLines);

        $message = \Swift_Message::newInstance()
            ->setSubject($subject)
            ->setFrom($fromEmail)
            ->setTo($toEmail)
            ->setBody($body);

        $this->mailer->send($message);
    }

Seems fairly straightforward.

Notice that PhpStorm sees nothing wrong with this method.

Being the lazy dev, I started by copy / pasting the entire contents of this class to form the basis of my own Mailer implementation. Cue confusion: in my copy, PhpStorm flagged newInstance() as undefined.

I’ve used this \Swift_Message::newInstance() code before, so I know it works. A quick check of the Symfony docs, and the SwiftMailer docs both seemed to confirm that what I was trying to do was correct:

(I checked the docs for Symfony 3.2, 3.3, and 3.4)

I thought it was just PhpStorm being a bit weird, but then I ran my code and saw things like this:

Attempted to call an undefined method named “newInstance” of class “Swift_Message”.

Diving through the code did indeed turn up no references to newInstance.

Quite odd.

Anyway, sending an email – whilst important – wasn’t super urgent, so I commented out the code and added it to my GitLab issues list.

Whilst browsing Twitter later that evening, I was reminded that being hasty to try new and shiny things was probably the cause of my problems.

Sure enough, it turned out I’m using dev-master of SwiftMailer in my project. A quick glance at the changelog:

6.0.0 (2017-05-19)
------------------

 * added Swift_Transport::ping()
 * removed Swift_Mime_HeaderFactory, Swift_Mime_HeaderSet, Swift_Mime_Message, Swift_Mime_MimeEntity,
   and Swift_Mime_ParameterizedHeader interfaces
 * removed Swift_MailTransport and Swift_Transport_MailTransport
 * removed Swift_Encoding
 * removed the Swift_Transport_MailInvoker interface and Swift_Transport_SimpleMailInvoker class
 * removed the Swift_SignedMessage class
 * removed newInstance() methods everywhere
 * methods operating on Date header now use DateTimeImmutable object instead of Unix timestamp;
   Swift_Mime_Headers_DateHeader::getTimestamp()/setTimestamp() renamed to getDateTime()/setDateTime()
 * bumped minimum version to PHP 7.0

This looks like the culprit:

removed newInstance() methods everywhere

Fixing this is really simple:

$message = \Swift_Message::newInstance()
    ->setSubject('My important message subject')
    ->setFrom($this->supportEmail)
    ->setTo($user->getEmailCanonical())
    ->setBody($body, 'text/html')
;

becomes:

$message = (new \Swift_Message('My important subject here'))
    ->setFrom($this->mailingFromAddress, $this->mailingFromName)
    ->setTo($user->getEmailCanonical())
    ->setBody($body, 'text/html')
;

And as is often the case, when the provided objects / methods are used properly, things do work 🙂
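The root cause, though, was tracking dev-master. Pinning SwiftMailer to a tagged release in composer.json avoids this class of surprise; a constraint along these lines (version number illustrative):

```json
{
    "require": {
        "swiftmailer/swiftmailer": "^6.0"
    }
}
```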

Update – There is already an open PR to fix this in the docs: https://github.com/swiftmailer/swiftmailer/issues/925

How I Fixed: Failed To Delete Snapshot in VirtualBox

Man alive. I hate stuff like this. VirtualBox is a great piece of software, but it does some whacky things.

Recently I migrated my infrastructure from Virtual Machines to Docker.

I replaced a Digital Ocean VPS with a local Virtualbox VM for running my private GitLab. The primary reason for this is that Docker images take up a chunk of space, and a low tier DO droplet just doesn’t cut it in terms of disk space.

I had a spare 120gb SSD lying around, so I figured: hey, why not use that and 4x my usable disk space for GitLab? Sounds like a good idea, right?

Actually, it took a lot of effort. But in the end, it worked. I decided to use thin provisioning and make the VirtualBox image think it had a 2tb disk, when in reality it was sharing the same SSD with another VirtualBox machine that runs my GitLab CI multi-runner instance.
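Thin provisioning works because a sparse file can have a huge apparent size while occupying almost no blocks on disk, which is how a much larger virtual disk can live on a 120gb SSD. A quick illustration of the same effect (1gb used here just to keep the demo small):

```shell
# Sparse file: apparent size 1G, but no blocks are allocated until
# real data is written -- the same trick thin provisioning relies on.
tmp=$(mktemp -d)
truncate -s 1G "$tmp/disk.img"
apparent_kb=$(du --apparent-size -k "$tmp/disk.img" | cut -f1)
actual_kb=$(du -k "$tmp/disk.img" | cut -f1)
echo "apparent=${apparent_kb}K actual=${actual_kb}K"
rm -r "$tmp"
```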

Ok, so the whys-and-wherefores of that setup are for a forthcoming / different post.

What I didn’t expect is for my disk to fill up in less than 2 weeks. I mean, I knew my Docker images took up a chunk of space, but I had purposefully mounted a totally different disk for GitLab backups, and disabled container backups along the way. How could it be that within 2 weeks I had 98% disk utilisation?

Well, it turns out: snapshots.

Or more specifically, one single 90.1gb snapshot:

du -h
84G	./rancher-node-2/Snapshots
88G	./rancher-node-2

What I had done was to take a “base snapshot” just after creating the VM, and then promptly forget about said snapshot entirely.

Fast-forward 2 weeks: today I tried to hit my GitLab, and got a 503 error:

Fun times.

A bit of digging showed me that both my “rancher-node-2” VM, and the GitLab CI Multi-Runner VM were in a paused state.

A little further digging showed I had 2gb of disk space left. And that’s where I found out about the snapshot.

Ok, so simple solution – delete the snapshot.

Yeah, if only:

So, that’s not enough free disk space to delete a file then? Heh, not quite. Apparently deleting a snapshot also involves merging snapshots, or some such – I didn’t dive into the technicals.

But still, seems daft.

Anyway, the advice I found out there on the ‘net was to have at least as much disk space again in order to do the delete. In other words, if you have a 10gb VM, and a 20gb snapshot, in order to delete the snapshot you’d need a 60gb disk. But of course!
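The arithmetic behind that rule of thumb, sketched out:

```shell
# Rule of thumb from the advice above: to delete a snapshot you need
# roughly as much free space again as the VM + snapshot already occupy.
vm_gb=10
snapshot_gb=20
used_gb=$((vm_gb + snapshot_gb))
needed_disk_gb=$((used_gb * 2))
echo "need roughly a ${needed_disk_gb}gb disk"
```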

Sadly, I don’t have another spare 240gb disk lying around. I don’t use large disks anymore, as I’ve lost two 2tb disks (old spinny stuff, but still) in recent years, and the data loss was mildly irritating, to put it politely. I stick to smaller disks so that if data loss does occur, it isn’t as bad. In theory.

Fortunately, I did have a spare 100gb or so on a different partition. But on the face of it, that doesn’t seem that useful, right?

My Solution

This may seem a little unorthodox but here goes.

To begin with, I tried to simply clone the existing VM. Doing a full clone gives the option to disregard any snapshots.

I moved my second VM off the 120gb disk freeing up about 18gb or so.

I tried to clone; it took a very long time, and then it promptly failed:

Don’t be fooled by that timer; it took a lot longer than that.

Anyway, that didn’t work, so I came up with a more geeky plan.

I moved the snapshot file from my 120gb disk. This freed up a huge amount of space:

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       110G  3.4G  101G   4% /mnt/kingston-120-second

Then, I symlinked the snapshot back into place:

➜  Snapshots ln -s "/media/chris/Data/Virtual Machines/{43895f1b-1b8a-4eab-9d47-40627ccca33f}.vdi" ./{43895f1b-1b8a-4eab-9d47-40627ccca33f}.vdi
➜  Snapshots ls -la
total 12
drwx------ 2 chris chris 4096 Apr 30 20:37 .
drwxrwxr-x 4 chris chris 4096 Apr 30 20:10 ..
lrwxrwxrwx 1 chris chris   77 Apr 30 20:37 {43895f1b-1b8a-4eab-9d47-40627ccca33f}.vdi -> /media/chris/Data/Virtual Machines/{43895f1b-1b8a-4eab-9d47-40627ccca33f}.vdi

Symlinks seem scary. Here’s how I remember the syntax:

It’s just like the copy command.

ln {path to source} {path to become my symlink}

# just like 'cp'

cp {path to copy from} {path to new file}
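A quick way to convince yourself, in a throwaway directory (the file names here are arbitrary):

```shell
tmp=$(mktemp -d)
echo "hello" > "$tmp/source.txt"
# ln -s {source} {symlink}: the argument order mirrors cp
ln -s "$tmp/source.txt" "$tmp/link.txt"
content=$(cat "$tmp/link.txt")   # reads through the link
target=$(readlink "$tmp/link.txt")
echo "$content (link -> $target)"
rm -r "$tmp"
```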

I tried to clone the VM at this point, but this again failed with an out of disk space error.

Instead, I then tried to delete the snapshot.

This consumed nearly all the disk space, but finally worked. Hoorah, right? Not quite.

There was still a downside. My .vdi file was now at 97.3gb. I could boot the VM and see that inside the VM I was only using 46gb. Hmm.

What I had to do was to somehow shrink the disk back down to as close to 46gb as I could. This was a little involved, and took a while.

I did the following:

chris@rancher-node-2:~$ sudo dd if=/dev/zero | pv | sudo dd of=/bigemptyfile bs=4096k

dd: error writing '/bigemptyfile': No space left on device                                                         ]
2017+63027194 records in
2017+63027193 records out
2103230164992 bytes (2.1 TB, 1.9 TiB) copied, 5693.08 s, 369 MB/s
1.91TiB 1:34:53 [ 352MiB/s] [           <=>                                                                        ]

chris@rancher-node-2:~$ Connection to 192.168.0.37 closed by remote host.
Connection to 192.168.0.37 closed.

I can’t say this is my own solution – I found it on StackOverflow 🙂

As you can see, this command ran until it failed. It never consumed any disk space on my physical hard disk – which is nice because, as I say, I thin provisioned this disk, so that wouldn’t have worked out so well otherwise.

Still, once this process failed, I wasn’t done.

I then ran:

vboxmanage modifyhd rancher-node-2/rancher-node-2.vdi --compact
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%

This took about 10 minutes, but once it finished I was down to a 56gb .vdi file. Good enough.

Finally, remember to delete the bigemptyfile:

rm /bigemptyfile