The Proof of Concept

It’s really easy for me to work on some code in the background, then put it up here as though I went from “oh, that’s an interesting idea” to “and here is the finished product” without a whole load of trial and error in between.

For me, seeing that trial and error process is often even more valuable than only seeing the end result. I guess that’s why so many teachers give extra marks for showing your workings.

Rather than me going piece by piece, I’m going to show the final “ok, that works” code that I had before I began re-doing everything in a tested manner. And this will lead on to how and why I extract things in the next few parts of this section.

My starting point was my usual project setup approach.

// index.ts

type VisitedURL = {
  url: string;
  status: number;
  statusText: string;
  ok: boolean;
  redirected: boolean;
  headers: Record<string, string>;
};

const badRequest = {
  status: -1,
  ok: false,
  redirected: false,
  headers: {},
};

let previousController: AbortController | undefined = undefined;

const fetcher = async (href: string) => {
  console.log(`fetcher`, href);
  const currentController = new AbortController();

  if (previousController) {
    previousController.abort();
  }

  previousController = currentController;

  const { url, status, statusText, ok, headers } = await fetch(href, {
    method: "head",
    redirect: "manual",
    signal: currentController.signal,
  });

  if (ok) {
    currentController.abort();
    previousController = undefined;
  }

  console.log(`headers`, headers);

  const location = headers.get("location");
  const redirected = null !== location;

  let nextLocation = undefined;
  try {
    if (redirected) {
      const hrefToUrl = new URL(location);
      console.log(`hr`, hrefToUrl);
    }
  } catch (e) {}

  return { url, status, statusText, ok, headers };
};

const visit = async (
  href: string,
  requests: VisitedURL[] = []
): Promise<VisitedURL[]> => {
  console.log(`href`, href);
  const hrefToUrl = new URL(href);
  console.log(`hrefToUrl`, hrefToUrl);

  if (!["http:", "https:"].includes(hrefToUrl.protocol)) {
    return [
      ...requests,
      {
        ...badRequest,
        url: href,
        statusText: `Unsupported protocol: "${hrefToUrl.protocol}"`,
      },
    ];
  }

  try {
    const { url, status, statusText, ok, headers } = await fetcher(href);

    const newLocation = headers.get("location");
    const redirected = null !== newLocation;

    if (redirected) {
      const hrefToUrl = new URL(newLocation);
      console.log(`hr1`, hrefToUrl);
    }

    const updated = [
      ...requests,
      {
        url,
        status,
        statusText,
        ok,
        redirected,
        headers: Object.fromEntries(headers),
      },
    ];

    if (ok) {
      return updated;
    }

    if (!redirected) {
      return updated;
    }

    console.log(`newLocation`, newLocation);

    return await visit(newLocation, updated);
  } catch (e) {
    console.error(e);

    return [
      ...requests,
      {
        ...badRequest,
        url: href,
        statusText: `An unhandled error occurred: ${e}`,
      },
    ];
  }
};

// const href = "https://a.bad.url";
const href = "https://aka.ms/new-console-template";
// const href = "https://codereviewvideos.com";
// const href = "http://codereviewvideos.com";
// const href = "http://codereviewvideos.com/444444";
// const href = "https://codereviewvideos.com/typescript-tuple";
// const href = "tel:+1-303-499-7111";
// const href = "mailto:someone@example.com";

(async () => {
  try {
    const journey = await visit(href);
    console.log(journey);
  } catch (e) {
    console.error(e);
  }
})();

There are some mistakes in here.

And there is some evidence of me using console.log for debugging. I’m trying to move away from this, but it is a hard habit to kill.

The following are the parts that, to me, are the most interesting.

Defining VisitedURL

Early on, even with proof-of-concept or small project ideas, I will look to define some types. These start out as rough and fluid ideas of what I ultimately want, and slowly but surely take on more rigidity as they firm up.

One of the challenges in this code is working with fetch.

Because fetch is native to Node, it is super easy to rely on it directly, and as such my code is then tied to whatever implementation details fetch exposes.

Mostly, the response from a fetch call uses JavaScript primitive types for the returned values:

const { url, status, statusText, ok, redirected, headers } = await fetch('http://example.com')

// where

url: string;
status: number;
statusText: string;
ok: boolean;
redirected: boolean;

But one thing I specifically care about here is the response headers. And they are of type Headers, not a plain object.

Even during prototyping I was aware that relying very specifically on Headers in this way would make my life hard(er) when coming to write a more robust implementation.

That is why I took a stab at converting Headers to a standard JS object:

type VisitedURL = {
  url: string;
  status: number;
  statusText: string;
  ok: boolean;
  redirected: boolean;
  headers: Record<string, string>;
};

Basically saying my object would contain keys and values where both would be strings.

This would require a conversion step, but hopefully that would be trivial.
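
Here is a minimal sketch of what that conversion step could look like. The proof of concept above calls Object.fromEntries(headers) directly; this version walks the Headers with forEach so each key and value is copied explicitly. The helper name headersToRecord is hypothetical, and the sketch assumes a runtime where Headers is a global (Node 18 or later).

```typescript
// Hypothetical helper name; not part of the original proof of concept.
const headersToRecord = (headers: Headers): Record<string, string> => {
  const record: Record<string, string> = {};
  // Headers normalises names to lower case, so the keys come out lower-cased.
  headers.forEach((value, key) => {
    record[key] = value;
  });
  return record;
};

// Example input only; these header values are made up.
const example = headersToRecord(
  new Headers({
    Location: "https://example.com/next",
    "Content-Type": "text/html",
  })
);
console.log(example);
```

One thing to be aware of is that the resulting keys are lower-cased, so lookups need to use "location" rather than "Location".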

Extracting The fetch Process

Part of the fun of this task is learning more about what happens in the redirection journey.

You can absolutely ignore all of this extra stuff if you don’t care how a link is redirected, only that it is successfully redirected.

But where is the learning in that?

let previousController: AbortController | undefined = undefined;

const fetcher = async (href: string) => {
  console.log(`fetcher`, href);
  const currentController = new AbortController();

  if (previousController) {
    previousController.abort();
  }

  previousController = currentController;

  const { url, status, statusText, ok, headers } = await fetch(href, {
    method: "head",
    redirect: "manual",
    signal: currentController.signal,
  });

  if (ok) {
    currentController.abort();
    previousController = undefined;
  }

  console.log(`headers`, headers);

  const location = headers.get("location");
  const redirected = null !== location;

  let nextLocation = undefined;
  try {
    if (redirected) {
      const hrefToUrl = new URL(location);
      console.log(`hr`, hrefToUrl);
    }
  } catch (e) {}

  return { url, status, statusText, ok, headers };
};

It turns out that when we set redirect: "manual", the first gotcha is that the request will kinda just… hang.

The request will sit in progress until the request timeout value is hit, which I think is 30 seconds. I haven’t been able to track down the exact default timeout value for the Node version of fetch.

This is a problem because, when run, the program will hang waiting for each of the redirected requests to time out before finally finishing.

The attempt above uses the concept of an AbortController to abort the previous request, then create a new one and repeat as many times as there are redirects.

Aborting the previous request did solve the problem, but this approach wasn’t the final code I went with. Thankfully. There is, I believe, a major gotcha here in that if multiple callers hit fetcher, they could inadvertently mess up each other’s requests, because previousController is a single shared variable that fetcher closes over. Maybe I’m wrong here, but I felt like this was bad all the same.
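
One way to sidestep the shared previousController is to give every call its own AbortController. The sketch below is only that, a sketch: the names FetchLike, fetchOnce and timeoutMs are made up for illustration, the fetch implementation is passed in so it can be swapped for a stub, and it assumes that aborting once the status and headers have arrived is enough to stop the request lingering. It is not necessarily what the finished code looks like.

```typescript
// Hypothetical types and names, for illustration only.
type FetchLike = (
  input: string,
  init?: { redirect?: "manual"; signal?: AbortSignal }
) => Promise<{
  status: number;
  ok: boolean;
  headers: { get(name: string): string | null };
}>;

const fetchOnce = async (
  href: string,
  fetchImpl: FetchLike,
  timeoutMs = 5000 // assumed value, not taken from the original code
) => {
  // Each call owns its controller, so concurrent callers cannot abort each other.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const { status, ok, headers } = await fetchImpl(href, {
      redirect: "manual",
      signal: controller.signal,
    });
    return { status, ok, location: headers.get("location") };
  } finally {
    clearTimeout(timer);
    // Only the status and headers are needed, so cancel whatever is left.
    controller.abort();
  }
};
```

With a real request, something like fetchOnce(href, fetch) should work on a Node version that ships fetch. Passing the implementation in also means a test can hand over a stub instead of hitting the network.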

Losing My Head

The second thing of interest in this code was the use of method: "head".

I thought I had been very clever here by not using get. After all, head won’t download all the content of the page. However, it turns out head doesn’t follow redirects the same way for some reason. A new discovery for me, but one that meant this approach didn’t work for all links.

Already Extracting

With an eye to the future I had already begun thinking about extracting the fetch process.

I figured if I was going to write tested code around this I was almost certainly going to need to hide fetch away at some point. Mocking becomes much easier when you own the things you mock.

However I strongly suspected I couldn’t mask away fetch entirely, at least not without writing some fairly convoluted code that would be unnecessary for a project of this size. Keep the Java out of JavaScript, and all that.

Recursion

Elixir was the first language that really got me playing around with recursion.

This problem appears to lend itself quite well to the idea of a recursive function call.

Regardless of whether we are looking at the first link in a redirection chain, or the 59th, the idea is always the same.

We visit the link, and whatever info we gather along the way we append to the array of requests.

If the current request has a header called location then we know to look there next.

We can then call the exact same function / itself and pass in the new location to visit and the current array of request information.

const visit = async (
  href: string,
  requests: VisitedURL[] = []
): Promise<VisitedURL[]> => {
  
  ...

  return await visit(newLocation, updated);
};

I thought this was quite elegant, and was feeling quite smug about this.

As we shall see in the C# section of this series, I soon got myself a slap in the face over this.

Always be humble 🙂
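
To make the recursive idea above easier to poke at without touching the network, here is a stripped-down sketch. Everything in it is an assumption for illustration: the names Hop, Lookup and follow are hypothetical, the lookup function stands in for the real request, and the maxHops guard and the relative location handling are my additions rather than anything in the proof of concept.

```typescript
// Hypothetical sketch; these names do not appear in the original code.
type Hop = { url: string; status: number; location: string | null };
type Lookup = (
  href: string
) => Promise<{ status: number; location: string | null }>;

const follow = async (
  href: string,
  lookup: Lookup,
  hops: Hop[] = [],
  maxHops = 10 // assumed limit so a redirect loop cannot recurse forever
): Promise<Hop[]> => {
  const { status, location } = await lookup(href);
  const updated = [...hops, { url: href, status, location }];

  // Base cases: nowhere further to go, or the hop limit has been reached.
  if (location === null || updated.length >= maxHops) {
    return updated;
  }

  // A location header may be relative, so resolve it against the current URL.
  const next = new URL(location, href).toString();

  // Same function, new location, and the journey so far.
  return follow(next, lookup, updated, maxHops);
};
```

The shape is the same as visit: each call appends one entry, then either returns the accumulated array or hands it to the next call.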
