Code Review Videos > Broken Link Checker > TypeScript > Resolve The Next Location

Resolve The Next Location

In the previous step we saw that whilst the code ran, and appeared to work, there were two issues to address when given a real world use case.

  1. The visit function appeared to hang even though the result was seemingly correct.
  2. We unexpectedly encountered location headers that could be just a path, not a fully qualified URL as initially anticipated.

It is that second issue that we shall address now.

Here’s the code we have that handles the location redirection:

    if (result.headers.location) {
      const updatedRequests = [...requests, { ...result, redirected: true }];

      return await visit(result.headers.location, updatedRequests);
    }Code language: TypeScript (typescript)

Clearly we need a more robust implementation here. We could start whacking in more conditionals and whatever, trying to handle the problem inside the visit function. But really that whole process needs extracting and hiding behind a single function call.

Starting With Tests

I’m going to suggest we move the logic out of visit and into a new function.

Here’s some starting tests, given what we currently have:

import { getNextLocation } from "./get-next-location";

const validUrl = "https://some.valid.url";

describe("getNextLocation", () => {
  afterEach(() => {
    jest.clearAllMocks();
  });

  test("should return null if headers['location'] is undefined", () => {
    expect(getNextLocation({})).toEqual(null);
  });

  test("should return null if headers['location'] is an empty string", () => {
    expect(getNextLocation({ location: "" })).toEqual(null);
  });
});Code language: TypeScript (typescript)

We’re going to create a function called getNextLocation that takes the headers object and tries to figure out if there is another URL to visit.

export const getNextLocation = (
  headers: { [key: string]: string }
) => {
  const location = headers["location"];

  if (!location) {
    return null;
  }

  return null;
};Code language: JavaScript (javascript)

The function takes in one parameter, an object called headers, which has a key that is a string, and the value of that key is also a string.

We saw this is how the fetch call actually returns data, so we are making use of this knowledge:

  {
    url: 'https://aka.ms/new-console-template',
    status: 301,
    statusText: 'Moved Permanently',
    ok: false,
    headers: {
      'cache-control': 'max-age=0, no-cache, no-store',
      connection: 'keep-alive',
      'content-length': '0',
      date: 'Mon, 23 Jan 2023 16:02:21 GMT',
      expires: 'Mon, 23 Jan 2023 16:02:21 GMT',
      location: 'https://learn.microsoft.com/dotnet/core/tutorials/top-level-templates',
      pragma: 'no-cache',
      'request-context': 'appId=cid-v1:7d63747b-487e-492a-872d-762362f77974',
      server: 'Kestrel',
      'strict-transport-security': 'max-age=31536000 ; includeSubDomains',
      'x-response-cache-status': 'True'
    },
    redirected: true
  },Code language: CSS (css)

Each one of those headers is a string key, and a string value. Hence the type information in the function parameters matches this.

This passes, but doesn’t actually do anything useful.

Let’s keep going.

Returning Any Location Header Information

Returning null proves our function behaves, but it doesn’t actually do anything of use.

Let’s add in another test to keep iterating towards what we want to happen:

  test("should return the headers['location'] value if available", () => {
    expect(
      getNextLocation({
        location: validUrl,
      })
    ).toEqual(validUrl);
  });Code language: PHP (php)

OK, so if the function is given a headers object that has a key called location, and that key has something in it, then we expect that value to be returned.

Again, to make this pass:

export const <em>getNextLocation </em>= (headers: { [key: string]: string }) => {
  const location = headers["location"];

  if (!location) {
    return null;
  }

  return location;
};Code language: TypeScript (typescript)

Sadly, this is very naive and doesn’t solve our problem.

We can see this by passing any junk in, and we will get that junk spat back out:

  test("should return the headers['location'] value if available", () => {
    expect(
      getNextLocation({
        location: 'burger burger',
      })
    ).toEqual('burger burger');
  });Code language: PHP (php)

It still passes, but of course it’s nonsense.

Returning Only Valid Location Header Information

Fortunately we already have a URL validator.

Let’s chuck that into the mix:

  test("should return null if the headers['location'] value if not a valid URL", () => {
    expect(
      getNextLocation({
        location: "burger burger",
      })
    ).toEqual(null);
  });

  test("should return the headers['location'] value if available", () => {
    expect(
      getNextLocation({
        location: validUrl,
      })
    ).toEqual(validUrl);
  });Code language: TypeScript (typescript)

One passes, one fails:

Let’s update the code:

export const getNextLocation = (headers: { [key: string]: string }) => {
  const location = headers["location"];

  if (!location) {
    return null;
  }

  if (isValidUrl(location)) {
    return location;
  }

  return null;
};Code language: TypeScript (typescript)

Sweet, both pass.

What About Paths?

When giving the visit function a real world road test, we just so happened to stumble upon an example where Microsoft did some redirecting that used paths, not full URLs.

Will our code work properly now that we’ve made the changes above? Let’s add another test to find out:

  test("should return a full URL if the headers['location'] value is only a path", () => {
    expect(
      getNextLocation({
        location: "/a/path/to/somewhere",
      })
    ).toEqual("hmmm");
  });Code language: TypeScript (typescript)

OK, so we figured out an issue here when writing out the test.

Our getNextLocation function will be given some headers. That’s good. We can figure out the location, if available, and return that… if it’s a valid URL.

But when the location is a path, then what?

Let’s look again at the response when the location is a path:

  {
    url: 'https://learn.microsoft.com/dotnet/core/tutorials/top-level-templates',
    status: 301,
    statusText: 'Moved Permanently',
    ok: false,
    headers: {
      'akamai-cache-status': 'Miss from child, Miss from parent',
      'cache-control': 'no-cache, no-store',
      connection: 'keep-alive',
      'content-length': '0',
      date: 'Mon, 23 Jan 2023 16:17:37 GMT',
      expires: 'Mon, 23 Jan 2023 16:17:37 GMT',
      location: '/en-us/dotnet/core/tutorials/top-level-templates',
      nel: '{"report_to":"network-errors","max_age":604800,"success_fraction":0.01,"failure_fraction":1.0}',
      'report-to': '{"group":"network-errors","max_age":604800,"endpoints":[{"url":"https://mdec.nelreports.net/api/report?cat=mdocs"}]}',
      'request-context': 'appId=cid-v1:8f3babe3-1612-4642-87ca-e9e867ad0935',
      'set-cookie': 'ARRAffinity=6932f15aacbcddad59baccd7ce183bb8bb314c0aee6f8ecbfd618e801c02bb4c;Path=/;HttpOnly;Secure;Domain=learn.microsoft.com, ARRAffinitySameSite=6932f15aacbcddad59baccd7ce183bb8bb314c0aee6f8ecbfd618e801c02bb4c;Path=/;HttpOnly;SameSite=None;Secure;Domain=learn.microsoft.com, original_req_url=https://learn.microsoft.com/dotnet/core/tutorials/top-level-templates; expires=Mon, 23-Jan-2023 16:17:42 GMT; secure; HttpOnly; SameSite=Lax',
      'strict-transport-security': 'max-age=31536000; includeSubDomains; preload',
      'x-azure-ref': '0IbPOYwAAAADdKm97QQWPSapdOXSLvEYaTE9OMjFFREdFMTcxOQA3MTY4OTIwZS05ZjViLTRhNjItYjE2ZS1kNWJlNjNjZTYxZTc=',
      'x-content-type-options': 'nosniff',
      'x-datacenter': 'eus',
      'x-frame-options': 'SAMEORIGIN',
      'x-ua-compatible': 'IE=edge',
      'x-xss-protection': '1; mode=block'
    },
    redirected: true
  },Code language: CSS (css)

There’s a heck of a lot of information in there, but in the context of what we are trying to achieve, most of it is noise.

We’re only passing in the headers object to getNextLocation.

Is there anything in the headers that can help us work out the full URL?

It doesn’t look like it.

We do have the original URL though. In that example above, we do have:

url: 'https://learn.microsoft.com/dotnet/core/tutorials/top-level-templates'

At a total guess, could we take the first part: https://learn.microsoft.com and concatenate this with the path?

node dist/v2/index.js https://learn.microsoft.com/en-us/dotnet/core/tutorials/top-level-templates

[
  {
    url: 'https://learn.microsoft.com/en-us/dotnet/core/tutorials/top-level-templates',
    status: 200,
    statusText: 'OK',
    ok: true,
    headers: {
      'akamai-cache-status': 'RefreshHit from child, RefreshHit from parent',
      'cache-control': 'public, max-age=598',
      connection: 'keep-alive',
      'content-encoding': 'gzip',
      'content-length': '13190',
      'content-type': 'text/html',
      date: 'Mon, 23 Jan 2023 16:32:03 GMT',
      etag: '"XBEYuDmondeByB1nBAVBBgZVcH8/PHFG7itIJ4FrUv4="',
      expires: 'Mon, 23 Jan 2023 16:42:01 GMT',
      nel: '{"report_to":"network-errors","max_age":604800,"success_fraction":0.01,"failure_fraction":1.0}',
      'report-to': '{"group":"network-errors","max_age":604800,"endpoints":[{"url":"https://mdec.nelreports.net/api/report?cat=mdocs"}]}',
      'request-context': 'appId=cid-v1:8f3babe3-1612-4642-87ca-e9e867ad0935',
      'set-cookie': 'ARRAffinity=0717d172aa605ff8313096b84695912b9aad91d616d460cfaa931ab50b308cc9;Path=/;HttpOnly;Secure;Domain=learn.microsoft.com, ARRAffinitySameSite=0717d172aa605ff8313096b84695912b9aad91d616d460cfaa931ab50b308cc9;Path=/;HttpOnly;SameSite=None;Secure;Domain=learn.microsoft.com',
      'strict-transport-security': 'max-age=31536000; includeSubDomains; preload',
      vary: 'Accept-Encoding',
      'x-azure-ref': '0dhjLYwAAAABDoGYhBE2UQr9FPtxbusxrTE9OMjFFREdFMTcxMQA3MTY4OTIwZS05ZjViLTRhNjItYjE2ZS1kNWJlNjNjZTYxZTc=',
      'x-content-type-options': 'nosniff',
      'x-datacenter': 'eus',
      'x-frame-options': 'SAMEORIGIN',
      'x-rendering-stack': 'Dynamic',
      'x-ua-compatible': 'IE=edge',
      'x-xss-protection': '1; mode=block'
    },
    redirected: false
  }
]Code language: JavaScript (javascript)

That looks good!

It also looks like it will require a bit of a refactor.

Let’s build this into our test suite.

Concat The Path

I’m going to change the getNextLocation function to take an additional parameter: the current URL.

We will need to change all the tests to ensure the getNextLocation function has the correct signature. For now I’ll go with:

  test("should return null if headers['location'] is undefined", () => {
    expect(getNextLocation("", {})).toEqual(null);
  });Code language: TypeScript (typescript)

Note the empty string as the first argument.

As such our implementation code becomes:

import { isValidUrl } from "./is-valid-url";

export const getNextLocation = (
  href: string,
  headers: { [key: string]: string }
) => {
  const location = headers["location"];

  if (!location) {
    return null;
  }

  if (isValidUrl(location)) {
    return location;
  }

  return null;
};

Code language: TypeScript (typescript)

Looking at the code, the only possible way we would ever care about the path is if the isValidUrl check returns false.

With that in mind, here are the tests I think are relevant:

  describe("isValidUrl(location) === false", () => {
    test("should return null if unable to create a valid new URL(currentUrl)", () => {
      expect(
        getNextLocation(validUrl, {
          location: "",
        })
      ).toEqual(null);
    });

    test("should return null if current url is the same as the guessedNextLocation", () => {
      expect(
        getNextLocation(`${validUrl}/`, {
          location: `/`,
        })
      ).toEqual(null);
    });

    test("should return the guessedNextLocation", () => {
      expect(
        getNextLocation(`${validUrl}/`, {
          location: `/123`,
        })
      ).toEqual(`${validUrl}/123`);
    });
  });Code language: JavaScript (javascript)

I’ve nested everything in a describe block as this indicates that a pre-condition has occurred relevant to all tests.

Then I’ve added a new beforeEach step that will only apply to these three tests which ensures that the pre-condition is setup for every test in this block.

The first test is straightforward enough. If the location is an empty string, there’s nowhere to go, so return null.

The second test feels like an edge case. What if the path is /?

So the given URL is https://codereviewvideos.com, and the path is /, then when the two are concatenated, the end result is https://codereviewvideos.com/. That’s the same URL with regards to this implementation, so I don’t want to keep visiting the same URL over and over.

As such, if the two URLs match, return null.

Lastly we have the actual situation we are encountering with the Microsoft URL. That is the path is added to the base URL, and we get a brand new URL to visit. In that case, we should return that new URL as it would be “interesting” to us.

Building The Next URL

The implementation to make these tests pass is as follows:

import { isValidUrl } from "./is-valid-url";

export const getNextLocation = (
  currentUrl: string,
  headers: { [key: string]: string }
) => {
  const location = headers["location"];

  if (!location) {
    return null;
  }

  if (isValidUrl(location)) {
    return location;
  }

  if (!isValidUrl(currentUrl)) {
    return null;
  }

  const { origin: currentOrigin, pathname: currentPathname } = new URL(
    currentUrl
  );

  const guessedNextLocation = `${currentOrigin}${location}`;

  if (guessedNextLocation === `${currentOrigin}${currentPathname}`) {
    return null;
  }

  return guessedNextLocation;
};
Code language: TypeScript (typescript)

Everything from line 17 to line 29 is new.

Lines 17-19 ensure that if we provide a current URL, that it is indeed a valid URL. This stops any funky business happening on the next step, which is…

Line 21 uses destructuring assignment to extract properties from a known valid URL object. It creates a new URL object by passing in the currentUrl parameter and then it extracts the origin and pathname properties from the URL object and assigns them to new variables currentOrigin and currentPathname respectively.

Line 25 is where the concatenation takes place.

Lines 27-29 compare the URL we just guessed at – very similar to our manual Microsoft link test earlier – to the known valid current URL.

If they match, we return null.

Otherwise, we know we have a new URL to visit.

Clean Up

Previously we put in this test:

  test("should return a full URL if the headers['location'] value is only a path", () => {
    expect(
      getNextLocation("", {
        location: "/a/path/to/somewhere",
      })
    ).toEqual("hmmm");
  });Code language: TypeScript (typescript)

This test fails because, at the time, we didn’t know enough to write a proper test.

I’m going to remove this, as it was covered – in my opinion – in the “should return the guessedNextLocation” test.

That gives us a passing test suite once more, and a function we can use back inside visit (maybe?) to figure out what URL to call next.

We’re almost there!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.