The last 4 weeks have been an epic journey. I started off with next to no Docker experience and have finished up with a working Docker-in-production setup.
Along the way I have learned an absolute ton of new stuff. But oh my Lord have I missed writing some code. Any code. Like, I have written absolutely no code in the last 4 weeks, and it’s killing me.
But anyway, I want to burst the bubble. Deploying Docker is not simple. At least, my experience has been anything but plain sailing.
Take for instance the obvious stuff, like having to learn the fundamentals of Docker – from Dockerfiles through to volumes and networking. You need to know this stuff for development, let alone production.
And then there’s the less obvious. The “gotchas”. The ‘enjoyable’ headscratchers that left me stumped for days on end. Allow me to share a couple of comedy errors:
I wanted to run my API and my WordPress site on one of my Rancher hosts. This simply means I have a couple of Digital Ocean droplets that could potentially host / run my Docker containers, and I specifically wanted one of them to run both the API and the WordPress site.
Now, both sites are secured by SSL – a freebie cert from LetsEncrypt. So far, so good.
Both sites using SSL means both need access to port 443.
My initial impression was that I could simply publish port 443 from both containers and be good to go, right?
Only one of them can use port 443. The other will simply not come online, and the error isn’t very obvious.
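To make that concrete, here’s a hypothetical compose file (not my actual config, image names are placeholders) showing the trap: two services both publishing host port 443. Whichever starts first binds the port; the second one fails with Docker’s rather terse “port is already allocated”.

```yaml
version: '2'
services:
  api:
    image: my-api:latest     # placeholder image
    ports:
      - "443:443"            # first service up binds host port 443
  wordpress:
    image: wordpress:4.7
    ports:
      - "443:443"            # second service fails to start:
                             # "Bind for 0.0.0.0:443 failed: port is already allocated"
```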
No problem, Rancher has a Load Balancer. Let’s use that.
So I get the Load Balancer up, and with a bit of effort, both of my sites are online and I feel good about things.
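For the curious, the fix looks roughly like this in rancher-compose terms – syntax from memory for the Rancher 1.x era, and the hostnames are placeholders. The load balancer owns port 443 on the host and routes to the right container based on the requested hostname:

```yaml
lb:
  scale: 1
  lb_config:
    port_rules:
      - hostname: api.example.com      # placeholder domains
        source_port: 443
        target_port: 443
        service: api
      - hostname: blog.example.com
        source_port: 443
        target_port: 443
        service: wordpress
```

Because only the load balancer publishes 443, the two sites no longer fight over the port.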
“Where’s the comedy error?” you might ask. Good question.
This load balancer, it’s pretty useful. Shortly after getting the websites online I’m feeling fairly adventurous and decide to migrate from GitLab on a regular, plain old standalone, Ansible-managed Digital Ocean Droplet to a fully Dockerised, Rancher-managed ‘stack’.
What could possibly go wrong?
Well, it turns out… everything.
Sure, my backups worked, but they kinda didn’t. My Droplet used GitLab CE, but my Docker image was built from source. I don’t know specifically why, but I couldn’t get the two to play ball. No major loss, just 2 years’ worth of my GitLab history down the drain.
I soldiered on. I got GitLab up and running, but this was when the real fun started.
To get my CI pipeline going I needed to host my Docker images inside GitLab. That’s cool, GitLab has a Docker Registry feature baked in as of a late minor release of the 8.x.x branch, and I was rocking GitLab 9.0.5. Also, this was something that was working on my trusty old Droplet.
Of course, this all went totally pear shaped.
It took me a couple of days just to get the SSL certificates to play ball. All the while, my GitLab server wouldn’t show me any of the Docker images I’d uploaded. Fun times. I mean, they were there, but GitLab was just having none of it. Were there any helpful error messages? Of course not.
Ok, I swear the comedy errors are coming really soon.
Anyway, GitLab and the Docker Registry need a bunch of open ports: 80, 443, 5000… and 22.
Sweet, I know how to work with ports, so I’ll just add a bunch more into my Load Balancer and everything “just works”, right? Well, after about 3 more days, yes, everything “just worked”.
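If you’re attempting the same, the knobs in question look something like this for the Omnibus flavour of GitLab (my source-built image wired this up differently, but conceptually it’s the same; the hostname is a placeholder):

```ruby
# /etc/gitlab/gitlab.rb
external_url 'https://gitlab.example.com'

# Serve the built-in Container Registry on its own port
registry_external_url 'https://gitlab.example.com:5000'

# The port advertised in git-over-SSH clone URLs
gitlab_rails['gitlab_shell_ssh_port'] = 22
```

Every port named there then has to be reachable through whatever sits in front of GitLab – in my case, the Rancher Load Balancer.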
However, for some reason that totally escapes me now, I needed to SSH into the specific Rancher Host that had my GitLab instance on it. Not the container, the virtual machine / Droplet.
Lo-and-behold, no matter what I tried this just wouldn’t work. I was absolutely stumped.
In Linux-land, particularly on the server, it’s fairly uncommon to reboot. It rarely works quite as well as the usual “turn it off and on again” routine / meme would suggest.
But I tried it. And here’s the really weird part:
I could get in. Boom, I was in. But was it a fluke? I immediately logged out, and sure enough, I couldn’t log in. Same thing. Permission denied (publickey).
5 hours later I realised the mistake. I had redirected port 22 on the Load Balancer to my GitLab server so I wouldn’t end up with funky looking URLs for my git repos :/
Yep, port 22 wasn’t going to my Rancher Host. Instead, Rancher’s Load Balancer was forwarding port 22 on to my GitLab container.
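In hindsight, a safer layout would have been to keep port 22 for the host and forward a different port (say, 2222) to the GitLab container, then hide the ugly port from git URLs with a client-side SSH config. A sketch, with hypothetical hostnames and IPs:

```
# ~/.ssh/config on my workstation
Host gitlab.example.com
    User git
    Port 2222            # load balancer forwards 2222 -> GitLab container's 22

Host rancher-host-1
    HostName 203.0.113.10
    Port 22              # plain SSH still reaches the Droplet itself
```

With that in place, `git clone git@gitlab.example.com:group/repo.git` still works without a funky port in the URL, and you can still SSH into the box underneath.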
5 hours. Seriously. I wish I was joking.
It’s this sort of thing that’s surprisingly difficult to Google for.
I promised a couple of comedy errors, so here’s the second:
I had gone full-scale cheapskate on my Rancher setup and opted for the 1GB Droplets. 1GB to run multiple instances of MySQL, nginx, PHP, RabbitMQ, Graylog, and GitLab.
I wanted to push the limits of what Docker could do, and it turns out, Java inside a container is still Java. It will chew through a gig of ram in no time.
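At the time of writing (Java 8 era), the JVM sizes its default heap from the host’s RAM rather than the container’s cgroup limit, so the memory cap has to be stated twice – once for Docker, once for the JVM. A sketch in compose terms, with the image and environment variable names from memory (treat them as assumptions):

```yaml
graylog:
  image: graylog2/server          # image name from memory
  mem_limit: 512m                 # the cgroup cap Docker enforces...
  environment:
    # ...which the JVM happily ignores, hence the explicit heap cap too
    GRAYLOG_SERVER_JAVA_OPTS: "-Xms256m -Xmx256m"
```

Skip the second half of that and the JVM will balloon past the limit and get OOM-killed, repeatedly.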
I wasn’t overly concerned by that. For you see, I had a cunning plan. And if Baldrick has taught us anything about cunning plans, it’s that they cannot fail.
Being the resourceful young man that I am, I provisioned a third node. A more beefy node. A node that was in fact, my very own computer. Chris, you genius!
Using Rancher’s “scheduling” feature I forced GitLab off the 1GB Droplets to run exclusively on my machine. All was good.
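For reference, the scheduling bit is just a label on the service, matched against a label set on the host. The `beefy=true` label here is my own invention, and the image name is hypothetical; the `io.rancher.scheduler` prefix is from the Rancher 1.x docs:

```yaml
gitlab:
  image: my-registry/gitlab:9.0.5   # my source-built image (name hypothetical)
  labels:
    # only schedule on hosts labelled beefy=true (i.e. my own computer)
    io.rancher.scheduler.affinity:host_label: beefy=true
```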
Until yesterday when my ISP decided to soil itself:
[Embedded tweet from Code Review (@CodeReviewVids), April 20, 2017]
Anyway, during that downtime some gremlins crawled into the system and started tearing apart important and sadly not very backed up pieces of my shiny infrastructure.
Late in the day, my ISP’s hard working network gophers fixed whatever network-related mishap had broken the Internet, and I got back online. Around that time I noticed my own computer was showing as “disconnected” in Rancher. A bit odd, seeing as I was online on this very box.
At this point I should mention that I had migrated to GitLab midway through my journey into Rancher. Most of my containers had been provisioned from images hosted on the original standalone GitLab I had been using for the last two years, whose registry was available via port 4567.
My Dockerised GitLab’s registry used port 5000 instead.
Wouldn’t it have been a silly move to reprovision a bunch of my containers at this point, given that my Docker GitLab was playing up and my old server was no longer available? Yep. Still, I did it anyway. Whoops. I managed to take down two-thirds of my infrastructure. Bad times.
You see, the new Registry container on port 5000 wasn’t up, and couldn’t come up. It needed access to a volume, but the volume was available via NFS on the other node, and connectivity to that node had been lost during the ISP-related downtime.
Not quite understanding the problem with my SSH / port 22 issue, I’d rebooted the other server anyway, and NFS decided not to come back online automatically. Compounded problems, anyone?
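If you’re serving volumes over NFS, it’s worth making the mount survive reboots explicitly rather than hoping. A minimal /etc/fstab sketch, with hypothetical hostname and paths:

```
# /etc/fstab on the client node
# _netdev waits for networking before mounting; soft stops processes
# hanging forever if the server is unreachable
nfs-server:/export/gitlab  /mnt/gitlab  nfs  _netdev,auto,soft  0  0
```

An automounter (autofs, or a systemd automount unit) is arguably the more robust option, since it will also retry after the server comes back.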
Of course, I’d deleted my old GitLab droplet by this point, so the old 4567 registry wasn’t available either… ughh.
Anyway, I’m sure that, reading this, it all sounds… if not obvious, then not that bad. And in isolation, I guess it isn’t. But like a decent financial savings plan, the real wins are found in compounding – and in my case it wasn’t so much wins as losses. Many, many small details stacked up to turn “this will only take me, at most, 2 weeks” into “over 4 weeks and just about getting there”.
It’s been a total mission.
I tell you this as I get so disheartened when I see others say things like this are easy.
It’s not easy. Very little of “development” is easy.
That said, I firmly believe if you have a stubborn persistence, there is actual enjoyment to be had in here somewhere 🙂
Ok, that’s enough ranting from me.
This week saw three new videos added to the site:
Again, like last week I will start you off here and let you follow through to the next two videos in the series.
We’re almost ready to get started with the Security portion of this course, where – in my opinion – things get a lot more interesting.
I will leave you this week by saying thank you to everyone who has been in touch – whether via email, comments, or similar – as ever, I am extremely grateful for all your feedback. Thank you.
Have a great weekend, and happy coding.