How I Fixed: docker: invalid reference format With Makefile

I’m a big fan of makeshift Makefiles. Largely because I don’t seem to have the brain capacity to remember long-winded commands. Or maybe, I just don’t like typing out long-winded commands. One of the two reasons, for sure.

I was using a Makefile to set up a local DNS server, to fix a problem where several related but separate Docker projects couldn’t talk to one another, when I came across this Stack Overflow post, which was very helpful.

The solution from the post calls for running a Docker command to spin up a local DNS server, which then allows all the disparate services to talk to one another via hostname, rather than IP address.

I dutifully created a new Makefile entry:

start_dev_dns:
	@docker run \
		--rm \ 
		--hostname=dns.mageddo \
		-p 5380:5380 \
		-v /var/run/docker.sock:/var/run/docker.sock \
		-v /etc/resolv.conf:/etc/resolv.conf \
		defreitas/dns-proxy-server

It’s hard to spot the fault here. That command looks right to me.

I copy / pasted the command directly from the Stack Overflow post, added the --rm flag to ensure the resulting container gets removed when I kill it, and then tried to run the command:

 ➜ myproject.whatever make start_dev_dns
docker: invalid reference format.
See 'docker run --help'.
make: *** [Makefile:8: start_dev_dns] Error 125

Boo.

Anyway, after a bit of head scratching, it turns out the issue was an extra space after the line continuation slash.

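Really hard to display this one in a blog post. But essentially, if you see this error and you’re in a similar situation, check you don’t have any rogue spaces after the \ at the end of each line. In the snippet above, the offender is the space after the \ on the --rm line. As far as I can tell, the backslash then escapes the space instead of the line break, so the shell hands Docker a lone space as an argument, which Docker tries to read as the image name.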
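Since a trailing space is invisible on screen, a quick grep can point at it. This is a minimal bash sketch, assuming a grep that supports -E and POSIX character classes; the function name is my own invention:

```shell
# Print every line (prefixed with its line number) where a backslash is
# followed only by spaces or tabs before the end of the line.
# Defaults to a file called Makefile in the current directory.
find_broken_continuations() {
    local file="${1:-Makefile}"
    grep -nE '\\[[:space:]]+$' "$file"
}
```

Running find_broken_continuations Makefile prints only the offending lines, and grep exits non-zero when there are none, so the function can also act as a pre-commit check.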

Docker Ate My Harddisk – No space left on device

I’ve recently been forced to migrate from Rancher v1.x to using docker-compose to manage my production Docker containers.

In many ways it’s actually a blessing in disguise. Rancher was a nice GUI, but under the hood it was a total black box. Also, fairly recently they migrated to v2, and no true transition path was provided – other than “LOL, reinstall?”

Anyway, one thing that Rancher was doing for me – at least, I think – was making sure the log files didn’t eat up all my hard disk space.

This one just completely caught me by surprise, as I am yet to get my monitoring setup back up and running on this particular box:

➜ ssh chris@1.22.33.44 # obv not real

Welcome to chris server
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
System information as of Wed May 13 12:28:24 CEST 2020
System load: 3.37

Usage of /: 100.0% of 1.77TB

Memory usage: 13%
Swap usage: 0%
Processes: 321
Users logged in: 0

=> / is using 100.0% of 1.77TB

Yowser.

There’s a pretty handy command to drill down into exactly what is eating up all your disk space – this isn’t specific to Docker either:

chris@chris-server / # du -h --max-depth=1
16G ./export
607M ./lib
60M ./boot
13M ./bin
5.1M ./etc
48K ./tmp
180K ./root
8.0K ./media
2.3T ./var
4.0K ./mnt
0 ./sys
4.0K ./srv
28K ./home
0 ./dev
59G ./docker
1.2G ./usr
13M ./sbin
0 ./proc
8.0K ./snap
4.0K ./lib64
316M ./run
16K ./lost+found
16K ./opt
2.4T .

The culprit here being /var with its 2.3T used… of 1.8T file system? Yeah.. idk.
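With this many entries it’s easy to miss the big one, so sorting the output helps. A small bash sketch, assuming GNU coreutils (du --max-depth and sort -h); the function name and defaults are mine:

```shell
# Show the largest entries directly under a directory, biggest first.
# sort -h understands the human-readable suffixes (K, M, G, T) that du -h prints.
biggest_dirs() {
    local dir="${1:-.}"
    local count="${2:-10}"
    du -h --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}
```

Something like biggest_dirs / 5 would list the total first and the heaviest top-level directories right after it.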

Anyway you can keep drilling down with the disk usage command until you isolate the culprit. But as this is Docker related, I’ll save you the bother:

chris@chris-server /var/lib/docker/containers # du -h --max-depth=1
270G ./b3ec5a04079d5b3060d5575e011dd5e482950a794d52b4470659823dc5d5a6be
1.9M ./4b1eff96be524bdb20e87448520a30329a5e42bb36f9ba523f458d87cb8b5dc7
44K ./fc7552b499aca543a9bc6d8ef223a09d1ed21c4d054aa150b29c1d4023c151d9
40K ./43ef97d1a135f7404ca4a0330fb0ae6310ef87316358324a05bf6a65cc8d06b0
64K ./a0155805cc59d65b5815352f97939f12a4e87e30662eafa253ae8395eb553e0c
44K ./b4fd3f35324278dc832a5e98fe5a3e5655fecaf61d4de8c5e8080f8af63c22fe
40K ./b37593cd946407872e71c57af6d30b1c85c30fc42a8e075aad542fa820fbdc97
192K ./cb1e3eac9d1c1d997471f915d42fd0f999454fbee772ddc451f28e60ff1a4d22
281G ./3dd2886e627555cd6db45f68e0d1d9e520a6ab4ad443a2e43c70498b51b81fc4
1.9M ./552a0d7a7dd13afa65050381104e86395b2510c6437f650782260664ba195deb
40K ./111da287b97d0eff6b710cd7a9f0d80b7febd03c4ee30db8e2c0519e9a506c8a
40K ./47112f1fd4b0399c506fc1318ba2c0d03502b3ad731169f43a317a09556b2a6d
40K ./afeb8b8c2fbc918caaf199ae0aaca78f68e91412f9a7bc389963c6c15d6e2832
1.9M ./cb3c71ddb7181f9bbd3803cc644f36c05eedd922d4529c62354c68dedfc3bc02
624K ./c7ba5a354cdb2ab6f31fe443c04b72fad2a2e8107bc42a875422c4d80a164144
212K ./5d2b649bdb9eca4f0e14304e4ffad22b7d50d5c099a72072862a1135488105c6
56K ./781635651e0890695e1002639b5167cebf56e78fdeed3e839fc91c0681d9edd2
28M ./65aa3252acc8cf7a7af9301a0df9c117670c602f08bdb0f256eb37382a3eb859
68K ./8c8bf1febde03cf957cae5b695baf70427f6e2dec53d07d3a495ecbf2c9a3ac0
44K ./990279783af4cfacaa2e71c18eba0e6870d472dfd11e261e40a512779783fef5
40K ./37dfd1291842da4debff4c07c58d60b4efcd9e060ebfe29f1508bf418c284ca8
304K ./34c7bfcf01a6ca8e830e6108056eb0df70d4fe88eda699f3e3aa067c1afca2da
40K ./c7be4d913f95ffc040d72374bb77ad37a31846a80f40404c8b82bd5ec725d99f
1.9M ./af173be6ea5d961be308596a28e8039a20e38bc0facf623a65ee0894f28920e2
44K ./3cb4c8b263ca4c802b6df699510374929df50784b25ab8e63802ee8f2054ffc9
328K ./574599803c9143e2f87cabcc211fa3ee47f28f1e765298141f081fa1e457ced6
281G ./6bfcad1f93a7fffa8f0e2b852a401199faf628f5ed7054ad01606f38c24fc568
108K ./e5656136a35ffb242590abf50c203f5478e09bf954a6ca682cb391c11328d251
1.1M ./0150dafffbdbb830dc6ab158913eb8bd4003bf07e280579421aaeb29a7ac5623
271G ./0b5947f687fa47a70b71869468d851e4aeb3857a599a2d7eba8cf58f3b8d6bda
40K ./d014ae83b6314bcad0a159ae3cfad03a8589d4631e7c66a68ad55f1ad722f2fe
52K ./a85b7ab40a097b1548a8dea2826277742815a5e9b0c3c803b065504a2c526bb4
588K ./b5e0a0a4900e1c6fd066f5ae8d534bb832708f0caf917839c578d7debaea3783
52K ./53dd57dcbbca5248456423ebb4f0499b5d15eb5f4be7c0821979a7c880dbaa89
1.1T .

Son of a diddly.

Basically, this wasn’t caused by Docker directly. This was caused by my shonky migration.

The underlying issue here is that some of the Docker containers I run are Workers. They are little Node apps that connect to RabbitMQ, pull a job down, and do something with it.

When the brown stuff hits the twirly thing, they log out a bit of info to help me figure out what went wrong. Fairly standard stuff, I admit.

However, in this new setup, there was no limit to what was getting logged. I guess previously Rancher had enforced some max filesize limits or was helpfully rotating logs periodically.

In this case, the first port of call was to truncate a log. This might not actually be safe, but seeing as it’s my server and it’s not mission critical, I just truncated one of the huge logs:

/var/lib/docker/containers/6bfcad1f93a7fffa8f0e2b852a401199faf628f5ed7054ad01606f38c24fc568 # ls -la
total 294559628
drwx------ 4 root root 4096 May 9 10:26 .
drwx------ 36 root root 12288 May 9 16:50 ..
-rw-r----- 1 root root 301628608512 May 13 12:44 6bfcad1f93a7fffa8f0e2b852a401199faf628f5ed7054ad01606f38c24fc568-json.log
drwx------ 2 root root 4096 May 2 10:14 checkpoints
-rw------- 1 root root 4247 May 9 10:26 config.v2.json
-rw-r--r-- 1 root root 1586 May 9 10:26 hostconfig.json
-rw-r--r-- 1 root root 34 May 9 10:25 hostname
-rw-r--r-- 1 root root 197 May 9 10:25 hosts
drwx------ 3 root root 4096 May 2 10:14 mounts
-rw-r--r-- 1 root root 38 May 9 10:25 resolv.conf
-rw-r--r-- 1 root root 71 May 9 10:25 resolv.conf.hash

truncate --size 0 6bfcad1f93a7fffa8f0e2b852a401199faf628f5ed7054ad01606f38c24fc568-json.log

That freed up about 270GB. Top lols.

Anyway, I had four of these workers running, so that’s where all my disk space had gone.
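Rather than eyeballing the du output, the log files themselves can be ranked by size. A rough bash sketch, assuming the default json-file log driver, GNU sort, and root access to /var/lib/docker; both function names are hypothetical:

```shell
# List the largest Docker json-file logs under a containers directory, biggest first.
largest_container_logs() {
    local root="${1:-/var/lib/docker/containers}"
    local count="${2:-5}"
    find "$root" -name '*-json.log' -exec du -h {} + 2>/dev/null | sort -rh | head -n "$count"
}

# Map a container ID (the directory name) back to the container's name.
container_name() {
    docker inspect --format '{{.Name}}' "$1"
}
```

Passing one of the long directory names to container_name should reveal which service is doing the shouting.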

Not Out Of The Woods Just Yet

There are two further issues to address though at this point:

Firstly, I needed to update the Docker image to set the proper path to the RabbitMQ instance. This would stop the log file spam. Incidentally, within the space of truncating and then running a further ls -la, the log was already at 70MB. That’s some aggressive connecting.

This would have been nicer as an environment variable – you shouldn’t need to do a rebuild to fix a parameter. But that’s not really the point here. Please excuse my crappy setup.

Secondly, and more importantly, I needed a way to make sure Docker never misbehaves in this way again.

Fortunately, docker-compose has a solution to this problem.

Here’s a small sample from my revised config:

version: '3.7'

x-logging:
  &default-logging
  options:
    max-size: '12m'
    max-file: '5'
  driver: json-file

services:

    db:
        image: someimage:version
        environment:
          BLAH: 'blah'
        logging: *default-logging

    worker:
        image: path.to.my/worker:version
        depends_on:
          - rabbitmq
        logging: *default-logging

OK, obviously a bit stripped down, but the gist is that I borrowed the config directly from the Docker Compose docs.

The one thing that I had to do was to put the x-logging declaration above the services declaration. The order matters because a YAML anchor (&default-logging) has to be defined before any alias (*default-logging) that refers to it.

Once done, restarting all the Docker containers in this project (with the revised Docker image for the workers) not only resolved the log spam, but helpfully removed all the old containers – and associated huge log files – as part of the restart process.
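To double check that the limits actually landed on the recreated containers, the logging config can be read back with docker inspect. A minimal bash sketch, assuming the docker CLI is on the path; the function name is made up:

```shell
# Print the name and logging config (driver plus options) of every
# running container, so one started without limits stands out.
show_log_config() {
    local id
    for id in $(docker ps --quiet); do
        docker inspect --format '{{.Name}} {{json .HostConfig.LogConfig}}' "$id"
    done
}
```

Each line should show the driver under Type and, for the services using the shared logging block, the max-size and max-file values under Config.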

Another fine disaster averted.

A Makefile To Run Makefiles

Here’s a small helper thingy that I use to help manage multiple docker-compose setups in production. But this will work with anything that uses Makefiles.

The idea is that I have a helpful Makefile per docker-compose project, as kinda mentioned in this post.

In short, using a Makefile per project allows me to hide some long-winded commands, which makes kick-starting each environment much easier than it might otherwise be.

The problem is that I have one Makefile per project directory, and some projects have several services. An example might be a project with:

  • www
  • api
  • management console
  • demo site

And so on.

Whilst it’s nice to have one command per service, it does still mean I have to log on to the server, cd to each dir, then run the make start command. And in some cases this needs to be done in a particular order, so that dependent services are up before workers try to connect, and so on.

A better way is to have one Makefile in the project root dir, which then calls the make start command in each sub dir. Something like this:

touch /docker/myproject.com/Makefile
vim /docker/myproject.com/Makefile

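# Recipe lines must start with a tab, not spaces.
# $(MAKE) is the conventional way to call make recursively.
start:
	$(MAKE) start --directory /docker/myproject.com/api && \
		$(MAKE) start --directory /docker/myproject.com/admin && \
		$(MAKE) start --directory /docker/myproject.com/www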

This way I can now just run one command on the project root dir, and it will take care of calling all the sub tasks that kick start the project.
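The same ordering could also be written as a shell loop, which may be easier to extend as the list of services grows. A bash sketch only; the start_all name is hypothetical, and the paths simply mirror the Makefile above:

```shell
# Run "make start" in each service directory, in the order given,
# and stop at the first one that fails.
start_all() {
    local root="$1"
    shift
    local service
    for service in "$@"; do
        echo "Starting ${service}..."
        make start --directory "${root}/${service}" || return 1
    done
}

# Example: start_all /docker/myproject.com api admin www
```

Returning early on failure keeps the behaviour of the && chain: if api won’t start, admin and www are never attempted.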

How I Fixed: argument “$entityManager” of method “__construct()” references class “Doctrine\ORM\EntityManager” but no such service exists

Ok, mega crazy title. And honestly, this is just the tip of the iceberg. Allow me to set the scene:

Lately I have had email conversations, read threads on Hacker News, and even had a forum post challenging how and why I do things the way I do.

The summary of the email conversations being why I persist with Symfony / PHP generally, when other, “better” solutions exist. And the same can be said for the linked forum post.

And then yesterday I saw that linked Hacker News thread:

It was at about ~320 comments when I read it. The top reply was the most interesting for me:

There’s a bit more to it than that, and the thread itself is worth a read. There’s basically 400+ different suggested ways to “get a web app up quickly in 2018”. I’d disagree with a bunch of them, but then, they are the way I do things.

Wait, what?

Yeah, I’d disagree with Docker + Ansible + Terraform + nginx + (Symfony/Rails/Go/etc) + Postgres, etc, being quick to get up and running.

Sure, once you know the drill / have projects to copy / paste from, it can be quick, relative to the first time you had to learn and implement all this stuff. But it’s not quick quick. It still takes me ages.

And so I challenged myself: Just how quickly could I get a typical project up and running for myself? The perfect question for a Saturday night.

My Setup

The setup I most typically use is:

  • Terraform for spinning up a server
  • Ansible for prep’ing the box
  • Docker for running stuff
  • GitLab for code hosting + CI
  • nginx for my web server
  • Symfony / PHP 7 for the code
  • Postgres for the DB

This is a lot of stuff, and it’s not super quick to set up.

This is why I started by mentioning the email / forum conversations whereby people ask: is Symfony / PHP the best tool of choice?

Well, maybe not. I don’t know. I just know I’m more productive with Symfony and PHP generally than everything else – though JavaScript is a close second.

Over the past few years I’ve tried other setups. It’s hard to invest time in learning another stack when the end result may be basically identical – what did I gain from the time invested? Could that time have been better invested elsewhere? Hard questions to answer.

But yeah, Node and more recently, Golang have been stronger contenders than usual for my attention. Anyway, that’s a bit of a digression.

The Problem

As mentioned above, that’s my stack. Learning it all took ages (years?), but as each project is, from an infrastructure point of view, very similar, I can now spin up a new environment very quickly.

My challenge was to find out how quickly. I got most of the core stuff up and running in ~1.5 hours.

I didn’t get the Behat testing environment set up in that time. Because I hit on an issue.

I wanted a simple JSON API as the outcome of this process. By simple I mean basically CRUD.

With the basic stack up and running, I created a basic entity (one property), and updated the DB accordingly. Doctrine was used for DB interactivity. Again, very typical for my projects.

In order to get data out of the DB, I needed to create a repository. There’s an awesome post on this by Tomas Votruba called How to use Repository with Doctrine as Service in Symfony.

As a side note here: if you haven’t already, I would highly recommend reading Tomas’ blog, as it’s jam packed with things you’d likely find very useful and interesting. Also, check out his GitHub projects, with Rector in particular being incredible.

I followed the linked article, and hit upon the following:

Cannot resolve argument $temporaryEmailRepository of "App\Controller\TemporaryEmailController::cget()": Cannot autowire service "App\Repository\TemporaryEmailRepository": argument "$entityManager" of method "__construct()" references class "Doctrine\ORM\EntityManager" but no such service exists. Try changing the type-hint to one of its parents: interface "Doctrine\ORM\EntityManagerInterface", or interface "Doctrine\Common\Persistence\ObjectManager".

What was weird to me at this point is that I’ve followed this article before, but never hit upon any problems.

Anyway, I did as I was told – I switched up the code to reference the EntityManagerInterface instead:

<?php

declare(strict_types=1);

namespace App\Repository;

use App\Entity\TemporaryEmail;
use Doctrine\ORM\EntityManagerInterface;
use Doctrine\ORM\EntityRepository;

final class TemporaryEmailRepository
{
    /**
     * @var EntityRepository
     */
    private $repository;

    /**
     * TemporaryEmailRepository constructor.
     *
     * @param EntityManagerInterface $entityManager
     */
    public function __construct(EntityManagerInterface $entityManager)
    {
        $this->repository = $entityManager->getRepository(TemporaryEmail::class);
    }

    /**
     * @return array
     */
    public function findAll(): array
    {
        return $this->repository->findAll();
    }
}

This is a really simple class.

For complete clarity, here’s basically the rest of the app at this point:

<?php

namespace App\Controller;

use App\Repository\TemporaryEmailRepository;
use FOS\RestBundle\Controller\Annotations;
use FOS\RestBundle\Controller\FOSRestController;

class TemporaryEmailController extends FOSRestController
{
    /**
     * @Annotations\Get("/")
     *
     * @param TemporaryEmailRepository $temporaryEmailRepository
     *
     * @return \FOS\RestBundle\View\View
     */
    public function cget(TemporaryEmailRepository $temporaryEmailRepository)
    {
        return $this->view([
            'data' => $temporaryEmailRepository->findAll(),
        ]);
    }
}

And the entity:

<?php

namespace App\Entity;

use Doctrine\ORM\Mapping as ORM;
use Symfony\Component\Validator\Constraints as Assert;

/**
 * @ORM\Entity(repositoryClass="App\Repository\TemporaryEmailRepository")
 * @ORM\Table(name="temporary_email")
 */
class TemporaryEmail implements \JsonSerializable
{
    /**
     * @ORM\Column(type="guid")
     * @ORM\Id
     * @ORM\GeneratedValue(strategy="UUID")
     */
    private $id;

    /**
     * @ORM\Column(type="string", name="domain", unique=true, nullable=false)
     * @Assert\Url()
     * @var string
     */
    private $domain;

    /**
     * @return mixed
     */
    public function getId()
    {
        return $this->id;
    }

    /**
     * @return string
     */
    public function getDomain(): string
    {
        return $this->domain;
    }

    /**
     * @param string $domain
     *
     * @return TemporaryEmail
     */
    public function setDomain($domain): self
    {
        $this->domain = $domain;

        return $this;
    }

    /**
     * @return array
     */
    public function jsonSerialize(): array
    {
        return [
            'id'     => $this->id,
            'domain' => $this->domain,
        ];
    }
}

This is basically a generated entity with a couple of tweaks. It’s not the final form, so don’t take this as good practice, or whatever.

The purpose of what this class is supposed to do is also not relevant here, but will be discussed in a future video.

Anyway, the problem is evident in the code above. If you can spot it, then good stuff 🙂

If not, keep reading.

So with three records in the DB, all the connectivity setup, things looking decent, I sent in a request to my only endpoint – GET /.

And it didn’t work. I hit a 504 Gateway Timeout error from nginx.

2018/06/03 20:02:19 [error] 7#7: *25 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.18.0.1, server: temporary-email.dev, request: "GET / HTTP/1.1", upstream: "fastcgi://172.18.0.3:9000", host: "0.0.0.0:807

Very confusing, overall. I mean, this is basically copy / paste from a different project that works just fine. Only, I’ve renamed the project name. What the heck?

I hit refresh a few times, you know, to make sure the computer wasn’t lying to me. And then everything started going unresponsive. Very odd. I’ve just bumped the system from 16GB to 32GB, and all I have is a few Docker containers running, a browser with admittedly too many open tabs, and one instance of PHPStorm. Surely this couldn’t be taxing the system. htop told me a different story:

Yeah, I know, that swap size is ridiculous. Forgive me.

The nginx logs weren’t really that helpful. I needed to look at the PHP log output, which in this case is achieved via docker logs:

docker logs php_api_te

[03-Jun-2018 20:06:12] WARNING: [pool www] child 6 said into stderr: "NOTICE: PHP message: PHP Fatal error:  Maximum execution time of 60 seconds exceeded in /var/www/api.temporary-email.dev/src/Repository/TemporaryEmailRepository.php on line 23"

Line 23 of TemporaryEmailRepository is:

public function __construct(EntityManagerInterface $entityManager)

I mucked around a bit, trying out injecting the ObjectManager instead, but hit the same issue.

Then I wondered if it was the act of injecting itself, or actually using the injected code (durr). So I commented out the call:

    /**
     * TemporaryEmailRepository constructor.
     *
     * @param EntityManagerInterface $entityManager
     */
    public function __construct(EntityManagerInterface $entityManager)
    {
        // $this->repository = $entityManager->getRepository(TemporaryEmail::class);
    }

Reloading now, I was no longer seeing the massive RAM spike, and looking at what that call was doing pushed me down the right lines.

I’ll admit, it took me much longer than I’d have liked to realise my mistake:

/**
- * @ORM\Entity(repositoryClass="App\Repository\TemporaryEmailRepository")
+ * @ORM\Entity()
 * @ORM\Table(name="temporary_email")
 */
class TemporaryEmail implements \JsonSerializable

Now, I’m not 100% certain on the conclusion here, but this is my best guess.

I believe I had created a circular reference. I’d injected the Entity Manager into the repo. Immediately I’d asked for the entity. The entity has an annotation pointing at the repo, which triggered the endless loop.

Anyway, removing the repositoryClass attribute fixed it up. Kinda obvious in hindsight.
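Going by the best guess above, any entity whose repositoryClass points at a class that doesn’t extend Doctrine’s EntityRepository is worth a second look. Here is a rough bash heuristic for spotting those. It is only a sketch, and it assumes annotation mapping, an App\ namespace mapped to src/, and entities living in src/Entity; the function name is invented:

```shell
# List repositoryClass targets that do not appear to extend a Doctrine
# repository base class. This is a text search, not real PHP analysis.
check_repository_classes() {
    local src_dir="${1:-src}"
    grep -rhoE 'repositoryClass="[^"]+"' "$src_dir/Entity" 2>/dev/null \
        | sed -E 's/^repositoryClass="//; s/"$//' \
        | sort -u \
        | while IFS= read -r class; do
            # App\Repository\Foo -> src/Repository/Foo.php
            rel="${class#App\\}"
            file="$src_dir/${rel//\\//}.php"
            if [ -f "$file" ] && ! grep -qE 'extends[[:space:]]+(Service)?EntityRepository' "$file"; then
                echo "$class does not extend EntityRepository ($file)"
            fi
        done
}
```

Run from the project root, it would have flagged TemporaryEmailRepository here, since that class wraps a repository rather than extending one.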

The Conclusion

I’m convinced I could get an environment up faster than this. Without hitting this issue I believe I would be at the ~2 hour mark to go from idea to having a solid setup that’s good to write code in a sane, reproducible, reliable / testable way.

I think back to 10+ years ago, where I’d be up and running so much faster. PHP is essentially a scripting language. With shared hosting, you’d have the DB ready, the web server ready, you just needed to write a bit of code, connect to the DB, push the code up somehow (FTP :)) and bonza, you’re up and running.

Looking at it that way now, I’m amazed how far I’ve come. There’s a massive overhead with using frameworks – time spent learning (which never stops, unless your framework of choice goes EOL), patching, managing all this stuff, learning new ways to make things better… is it all worth it? I think so.

I think the biggest takeaway for me lately is that whilst within the last ~5 years I’ve shipped a lot less code to prod than in the 5 years preceding this, the code I do ship is more stable, and maintainable.

Nagging in my mind, however, is this: what’s the point of this slow, methodical approach if it takes so long that I either don’t bother with entire ideas, or by the time I’ve shipped them, I’m so burned out by the seeming complexity of the whole thing that I lose interest in taking them further?

Anyway, I appreciate this is half helpful, half rant. I just needed to blog it and get these thoughts out of my head.

My GitLab Runner Config.toml [Example]

I hit on an annoying issue this week, and I’m not sure of the root cause.

Last week I bumped GitLab from 10.6 to 10.8, and somehow broke my GitLab CI Runner.

Somewhere, I have a backup of the config.toml file I was using. I run my GitLab CI Runner in a Docker container. I only run one, as it’s only for my projects. And one is enough.

Somehow, the Runner borked. And annoyingly, I didn’t have a record of the running version (never use :latest unless you like uncertainty), and recreating it without the config.toml file has been a pain.
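To avoid being caught out like this again, both the config file and the exact image can be captured from the running container. A minimal bash sketch, assuming the runner’s config lives at /etc/gitlab-runner/config.toml inside the container; the function name, container name and backup directory are placeholders of mine:

```shell
# Save the runner's config.toml and record which image the container runs,
# so the setup can be recreated later.
backup_runner() {
    local container="$1"
    local dest="${2:-./gitlab-runner-backup}"
    mkdir -p "$dest" || return 1
    docker cp "${container}:/etc/gitlab-runner/config.toml" "${dest}/config.toml" || return 1
    # .Config.Image is the image reference the container was created from;
    # .Image is the resolved image ID, which identifies the exact build even for :latest.
    docker inspect --format '{{.Config.Image}} {{.Image}}' "$container" > "${dest}/image.txt"
}

# Example: backup_runner my-gitlab-runner /root/backups/gitlab-runner
```

Running it before any GitLab upgrade would leave both the config and the image ID to hand if the Runner falls over afterwards.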

So for my own future reference, here is my current GitLab Runner config.toml file:

user@8818901c05c8:/# cat /etc/gitlab-runner/config.toml

concurrent = 1
check_interval = 0

[[runners]]
  name = "runner-1"
  url = "https://my.gitlab.url"
  token = "{redacted}"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "docker:dind"
    privileged = true
    pull_policy = "if-not-present"
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock","/cache"]
    shm_size = 0
  [runners.cache]
    insecure = false

FWIW this isn’t perfect. I’m currently hitting a major issue whereby GitLab CI Pipeline stages with multiple jobs in the stage are routinely failing. It’s very frustrating. It’s also not scheduled for a fix until v11, afaik.