First Pass At Implementing Visit

We now have:

That should be enough to make a first stab at creating a real, working, well tested bit of software.

Let’s have a go.

Entry Point

In the proof of concept we slapped everything right there inside the index.ts file.

That does work, but it locks us in to consuming our link checker in a very specific way. A better approach would be to think of the link checker as a kind of library. A standalone package almost, that we can call from the command line, or a REST endpoint, or via a GUI, or really anything that we need.

In this particular instance I do want to create a command line app.

I want to allow the user to provide a URL when they run the script, and that URL will be used as the link to be checked.

We can achieve this using Node’s process.argv array:

// index.ts

import { visit } from "./visit";

const defaultHref = "https://codereviewvideos.com";

const url = process.argv[2] ?? defaultHref;

(async () => {
  try {
    const journey = await visit(url);
    console.log(journey);
  } catch (e) {
    console.error(e);
  }
})();

There are extra libraries you can use to make working with command line arguments far nicer.

But really all we care about is a string. It should be a valid URL, but it might not be. Fortunately, we have that covered.

Let’s step through, line by line, and cover off what’s happening.

Line 3

We’re going to put our URL checker into a re-usable library. I’m calling that the Visitor, and it will export a visit function.

Therefore we are importing that function on line 3. It doesn’t exist yet, but we will fix that in a moment.

Line 5-7

We set up a default, known valid URL on line 5.

I’m going with the variable name of href rather than url, because it might not be a URL. There’s no true distinction here; I’m simply trying to keep the concepts distinct for humans reading the code.

On line 7 we look at the command line arguments used when running the index.ts script.

In Node.js, process.argv is an array that contains the command-line arguments passed to the script when it is run. The first element of the array is the path to the Node.js executable, and the second element is the path to the script file. The remaining elements of the array are any additional command-line arguments passed to the script.

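For example, if you run the command node script.js arg1 arg2 arg3 in the command line, the value of process.argv in the script file would be something like ['/usr/local/bin/node', '/path/to/script.js', 'arg1', 'arg2', 'arg3']. Node resolves the first two entries to absolute paths, so the exact values depend on your machine.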

Here we use nullish coalescing (??) to say: if no URL argument was passed when calling the script, fall back to the defaultHref.
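To see the fallback in isolation, here is a minimal sketch. The resolveHref helper is hypothetical and not part of the project; it takes the argv array as a parameter so the behaviour can be checked without running a real script. The paths in the arrays are made up too. Note that ?? only falls back on null or undefined, so an empty string argument is passed through untouched.

```typescript
// Hypothetical helper, for illustration only: picks the first user-supplied
// argument, or falls back when none was given.
const resolveHref = (argv: string[], fallback: string): string =>
  argv[2] ?? fallback;

const defaultHref = "https://codereviewvideos.com";

// No extra argument: argv[2] is undefined, so we get the default.
const noArg = resolveHref(["/usr/bin/node", "/app/index.js"], defaultHref);

// An argument was given: it wins over the default.
const withArg = resolveHref(
  ["/usr/bin/node", "/app/index.js", "https://example.com"],
  defaultHref
);

// An empty string is not nullish, so ?? keeps it rather than falling back.
const emptyArg = resolveHref(["/usr/bin/node", "/app/index.js", ""], defaultHref);

console.log({ noArg, withArg, emptyArg });
```

That last case is why the early "Invalid URL" check matters: an empty argument still reaches visit.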

Line 9-16

This code is using an immediately invoked async function expression (IIFE) to execute some asynchronous code.

The code inside the IIFE is an async function that is immediately invoked, so it runs as soon as the JavaScript engine encounters it. Inside the async function, it’s using a try / catch block to handle any errors that might occur.

The first line inside the try block, const journey = await visit(url);, is awaiting the completion of an asynchronous function visit(url), which is our as yet unwritten library code. The resolved value of this function is then assigned to the variable journey.

The next line, console.log(journey) will log out the value of the journey variable to the console. This seems like a good way to ensure our application is behaving, for now.

The catch block, catch (e) { console.error(e); }, catches any errors that occur in the try block and logs them to the console.

No Tests

This index.ts file is the only file I am not going to cover with unit tests.

The reason is that I have further plans for this code and this will not be the eventual way I consume the visit function.

But for now, this serves a good enough purpose.

Or to put it another way, if a call to index.ts doesn’t work, then I’ll know straight away anyway, so I am happy to leave this uncovered.

Implementing Visit

From the proof of concept we know a few things about how we expect the visit function to behave:

  1. Should fail and return early if:
    • given a bad URL
    • the URL is valid but does not begin with http or https
  2. Should fail and return a bad request object if the fetch process fails for an unexpected reason
  3. Should return and exit the function if the current request was a 2xx response code / ok
  4. Should call the visit function / itself again if the current request redirected

Call me a pessimist, but I like to start with the unhappy paths when writing tests.

Mostly I write tests when I expect the software to be complex enough to have more than two outcomes. If the total outcomes are one happy path and one sad path, the chances of the software being useful enough to stick around for the long term, or serious enough to need a test suite in the first place, are both fairly low.

For real projects though, I try to think of everything that can (and likely will, at some point) go wrong.

And then I write the tests for them.

In the list above, I’d start with #1. How would that look?

Tests For Fail Early And Return

Here’s where I’d start:

import { visit } from "./visit";

const validUrl = "https://some.valid.url";

describe("visit", () => {
  afterEach(() => {
    jest.resetAllMocks();
  });

  test("should return early if given an invalid URL", async () => {
    expect(await visit("")).toEqual([
      {
        status: -1,
        ok: false,
        redirected: false,
        headers: {},
        url: "",
        statusText: "Invalid URL",
      },
    ]);
  });

  test("should only proceed when working with http and https urls", async () => {
    expect(await visit("tel:+1-303-499-7111")).toEqual([
      {
        status: -1,
        ok: false,
        redirected: false,
        headers: {},
        url: "tel:+1-303-499-7111",
        statusText: `Unsupported protocol: "tel:"`,
      },
    ]);
  });
});

There’s a bit of repetition in there.

Keen-eyed readers may recall four of those properties from the badRequest object we set up previously:

export const badRequest = {
  status: -1,
  ok: false,
  redirected: false,
  headers: {},
};

And that raises the question of whether or not to spread in the ...badRequest rather than hardcoding the values.

It’s a good question. On the one hand it reduces lines in the tests. But on the other I think I prefer being explicit as to exactly what shape I expect in the eventuality of a bad response.

So I’m going to stick with the explicit / verbose test cases, until such time as I change my mind or find reason to do otherwise.
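For comparison, here is roughly what the spread-based alternative would look like next to the explicit one. This is only a sketch: badRequest is copied inline from the snippet above so the example stands alone, and JSON.stringify is a crude stand-in for Jest's toEqual.

```typescript
// Copied from the badRequest snippet above so this example stands alone.
const badRequest = {
  status: -1,
  ok: false,
  redirected: false,
  headers: {},
};

// The explicit version, as used in the tests.
const explicitExpectation = {
  status: -1,
  ok: false,
  redirected: false,
  headers: {},
  url: "",
  statusText: "Invalid URL",
};

// The terser version: spread the shared properties, then add the rest.
const spreadExpectation = {
  ...badRequest,
  url: "",
  statusText: "Invalid URL",
};

// Both describe the same shape. JSON.stringify is a crude stand-in for
// Jest's toEqual here, and only works because the key order matches.
const sameShape =
  JSON.stringify(explicitExpectation) === JSON.stringify(spreadExpectation);

console.log(sameShape);
```

One thing to note: with the spread version, a change to badRequest changes what the test expects too, whereas the explicit version would fail and flag the change.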

First Pass At The visit Implementation

With those two test cases, we can have a stab at making them pass by improving on the proof of concept code:

import { VisitedURL } from "./types";
import { badRequest } from "./bad-request";
import { isValidUrl } from "./is-valid-url";

export const visit = async (
  href: string,
  requests: VisitedURL[] = []
): Promise<VisitedURL[]> => {
  if (!isValidUrl(href)) {
    return [
      ...requests,
      {
        ...badRequest,
        url: href,
        statusText: `Invalid URL`,
      },
    ];
  }

  const hrefToUrl = new URL(href);

  if (!["http:", "https:"].includes(hrefToUrl.protocol)) {
    return [
      ...requests,
      {
        ...badRequest,
        url: href,
        statusText: `Unsupported protocol: "${hrefToUrl.protocol}"`,
      },
    ];
  }

  return [];
};

This gives us two passing tests.

Let’s cover the interesting lines.

Line 8

We defined the custom type of VisitedURL in the proof of concept.

I’ve ported that concept to this implementation, copying the type to our types.d.ts file:

export type VisitedURL = {
  url: string;
  status: number;
  statusText: string;
  ok: boolean;
  redirected: boolean;
  headers: Record<string, string>;
};

export type FetcherResponse = {
  url: string;
  status: number;
  statusText: string;
  ok: boolean;
  headers: Record<string, string>;
};

There’s some repetition creeping in there. I will come back to address that a little later.

What Line 8 says is we will return an array of VisitedURL shaped objects.

Because this is an async function, it must return a Promise of some type of value. That could be a Promise<string> if we were returning a string. Or a Promise<boolean> if we return true or false.

But in this case it’s an array of VisitedURL objects.
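As a quick illustration of that rule, here is a sketch with two throwaway async functions. The names getGreeting and getFlag are made up for this example and have nothing to do with the link checker.

```typescript
// Hypothetical examples: the declared return type of an async function
// is always a Promise wrapping the value that gets returned.
const getGreeting = async (): Promise<string> => "hello";
const getFlag = async (): Promise<boolean> => true;

// Calling an async function hands back a Promise straight away...
const pendingGreeting = getGreeting();
console.log(pendingGreeting instanceof Promise);

// ...and the wrapped value is only available once it resolves.
pendingGreeting.then((greeting) => console.log(greeting.toUpperCase()));
getFlag().then((flag) => console.log(flag));
```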

Line 9-18

Pretty straightforward.

  if (!isValidUrl(href)) {
    return [
      ...requests,
      {
        ...badRequest,
        url: href,
        statusText: `Invalid URL`,
      },
    ];
  }

We created the isValidUrl in a previous step.

All we do here is make use of the boolean return value.

It either returns false, and we return immediately, appending our current badRequest to any existing requests we are aware of.

Or, the URL is valid, and so this block is skipped over.

The one brain bender in this entire function is the use of recursion. All we need to know at this point is that requests will always be an array.

It will either be an empty array (on the first invocation), or it will be an array of N elements, where N is the number of times the function has previously been called.

It is empty on the first invocation because line 7 – requests: VisitedURL[] = [] – initialises it as an empty array.

Hopefully that makes sense.
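If the recursion is hard to picture, here is a stripped-down, synchronous sketch of the same accumulator pattern. Nothing here touches the network: the redirects table and the follow function are invented for this illustration, and simply stand in for whatever fetch-and-redirect logic visit ends up with.

```typescript
// Hypothetical redirect table: each key redirects to its value.
const redirects: Record<string, string> = {
  "http://a.example": "http://b.example",
  "http://b.example": "http://c.example",
};

// Same shape as visit: the second argument defaults to an empty array on
// the first call, and grows by one entry each time the function recurses.
const follow = (href: string, visited: string[] = []): string[] => {
  const soFar = [...visited, href];
  const next = redirects[href];

  // No further redirect: return everything collected so far.
  if (next === undefined) {
    return soFar;
  }

  // Otherwise call ourselves again, passing the accumulated list along.
  return follow(next, soFar);
};

const journey = follow("http://a.example");
console.log(journey);
// journey is ["http://a.example", "http://b.example", "http://c.example"]
```

The caller only ever passes one argument. The second argument exists purely so each recursive call can hand its history to the next one.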

Line 20

Line 20 is a little odd, inasmuch as we already made a new instance of URL as part of isValidUrl.

That’s true.

const hrefToUrl = new URL(href);

But in that case we threw away the outcome on success. Personally I think the creation of a new URL is so cheap, computationally, that there is no harm in doing this.

Your opinion may differ. That’s fine. But for me, I’d rather an is... function returned a boolean.

An alternative is that isValidUrl returns false or URL. But I’m not a fan of that.

Easier to have a bit of repetition in the form of creating a new URL, and at this point we absolutely know that this string will be a valid URL.
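To make the two options concrete, here is a sketch of both. The isValidUrl body below is an assumption about what the earlier helper looks like (a try / catch around new URL), not a copy of it, and toUrlOrFalse is a hypothetical name for the "false or URL" alternative.

```typescript
// Assumed implementation: new URL throws on input it cannot parse.
const isValidUrl = (href: string): boolean => {
  try {
    new URL(href);
    return true;
  } catch {
    return false;
  }
};

// Hypothetical alternative: hand back the parsed URL, or false on failure.
const toUrlOrFalse = (href: string): URL | false => {
  try {
    return new URL(href);
  } catch {
    return false;
  }
};

// With the boolean version the caller parses again after the check.
const candidate = "https://codereviewvideos.com";
if (isValidUrl(candidate)) {
  console.log(new URL(candidate).hostname);
}

// With the union version every caller has to narrow the result first.
const parsed = toUrlOrFalse(candidate);
if (parsed !== false) {
  console.log(parsed.hostname);
}
```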

What we don’t know though, is whether that URL is one we can handle. That’s the job of …

Line 22-31

This is very similar to lines 9-18:

  if (!["http:", "https:"].includes(hrefToUrl.protocol)) {
    return [
      ...requests,
      {
        ...badRequest,
        url: href,
        statusText: `Unsupported protocol: "${hrefToUrl.protocol}"`,
      },
    ];
  }

We saw on line 20 we have a known valid string that will allow us to create a new URL instance.

But that URL may have the wrong protocol.

The only protocols we can visit are http and https. All others, of which there are many, are irrelevant to us.

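Side note: is this a protocol, or a scheme? I looked into this for a while and found the answers inconclusive. For what it’s worth, RFC 3986 calls this part of a URI the scheme, while the URL object exposes it as the protocol property (trailing colon included), so both names turn up.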

Basically if we get given a URI / href that looks like: “tel:+44-1772-112233” then we can’t really do very much with it. We can’t link check a phone number. It is a valid URI – try it, especially on your mobile phone, it will open up the dialing app – but for our purposes it’s basically unsupported.

Could we extract the logic though?

!["http:", "https:"].includes(hrefToUrl.protocol)

We did this for isValidUrl, so why not for checking the scheme / protocol?

Well, simply because I’m only doing it here. If this was needed elsewhere then extracting would be a good move. For now, it’s as easy to put this here as anywhere else. So here it stays.
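For reference, this is roughly what the extracted version could look like if it were ever needed. isSupportedProtocol is a hypothetical name, and the function is not part of the project as it stands.

```typescript
// Hypothetical helper: true only for URLs we can actually fetch.
// Note that URL.protocol includes the trailing colon.
const isSupportedProtocol = (url: URL): boolean =>
  ["http:", "https:"].includes(url.protocol);

const samples = [
  "https://codereviewvideos.com",
  "http://codereviewvideos.com",
  "tel:+44-1772-112233",
  "mailto:someone@example.com",
];

// Log each sample's protocol alongside whether we would visit it.
for (const sample of samples) {
  const url = new URL(sample);
  console.log(url.protocol, isSupportedProtocol(url));
}
```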

Line 33

This one is temporary:

return [];

We have our two if conditionals, but our function’s type definition (as per line 8) says we must return an array of VisitedURLs.

That array can be empty. So by returning an empty array we satisfy the function’s type declaration, even though it’s not the true outcome we want.

That will come next.

Next Steps

That’s two tests passing and a working function.

But it’s far from complete, so let’s continue adding tests and logic.
