The ultimate goal of this little utility we are coding is to give us a clear visual overview of exactly what is happening when we visit a link that redirects.
What we want is to be able to give our code a starting URL, and get back an array with one entry for every URL that was actually visited, up to and including the final URL.
So far we have the ability to visit a URL, but because we set up our fetch call to explicitly require manually handling a redirect, right now things kinda come to a halt if the requested URL does redirect.
Here’s the fetcher code for reference:
import { FetcherResponse } from "./types";
export const fetcher = async (href: string): Promise<FetcherResponse> => {
const { url, status, statusText, ok, headers } = await fetch(href, {
redirect: "manual",
});
const headersObject = Object.fromEntries(headers);
return { url, status, statusText, ok, headers: headersObject };
};
And here’s what happens if we visit a link that should redirect:
We have a couple of issues here:
- We hardcoded redirected: false.
- We’re not actually following the redirect.
Hopefully we can fix them both in one go.
A Recursive Approach
The process of fetching a URL is always the same.
It doesn’t matter whether we provided the URL, or the URL was provided from a location header in the response.
We start by calling visit with a URL:
export const visit = async (
href: string,
requests: VisitedURL[] = []
): Promise<VisitedURL[]> => {
There is a second parameter called requests, which is an array of previous requests. If we don’t provide a value here, it defaults to an empty array. When calling visit manually, we would likely never pass in a second parameter.
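The ./types file isn't shown in this post, so the following is only a guess at the shapes involved, inferred from the test data further down. It also shows the default parameter in action, using a hypothetical visitStub that stands in for visit and never touches the network:

```typescript
// Hypothetical shapes, inferred from the test data in this post.
// The real ./types file may well differ.
type FetcherResponse = {
  url: string;
  status: number;
  statusText: string;
  ok: boolean;
  headers: Record<string, string>;
};

// A visited URL is a fetcher response plus a flag saying whether it redirected.
type VisitedURL = FetcherResponse & { redirected: boolean };

// Hypothetical stand-in for visit. The default value for requests means
// callers can simply write visitStub("https://...").
const visitStub = async (
  href: string,
  requests: VisitedURL[] = []
): Promise<VisitedURL[]> => {
  // Pretend every URL answers 200 OK, so this stub never recurses.
  const result: FetcherResponse = {
    url: href,
    status: 200,
    statusText: "OK",
    ok: true,
    headers: {},
  };
  // Return everything visited so far, plus this final request.
  return [...requests, { ...result, redirected: false }];
};

visitStub("https://some.valid.url").then((journey) => console.log(journey.length)); // 1
```

Because visitStub always reports a 200, it never calls itself; it is only here to show the signature and how the requests array is carried along.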
Where this parameter becomes useful is if the visit function determines there is another URL that it has to visit. It can then append the current request information to the requests array, and call itself with the new URL and that updated array of requests.
That should mean we can use a recursive approach to find the full link journey, regardless of how many redirects are actually involved.
That said, we do not want to follow redirects indefinitely. So we will add in a bit of logic to ensure we stop at some high value, such as after 50 redirects. That should almost never happen in the real world, unless something has gone very wrong, for example a redirect loop.
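To see the recursion and the cap working together without any network calls, here is a minimal sketch over a hypothetical in-memory table of redirects. The names followChain, redirects and MAX_REDIRECTS are made up for this illustration; the real visit works with fetcher responses rather than plain strings:

```typescript
// Hypothetical in-memory "web": each key redirects to its value.
// URLs missing from the map are treated as final destinations.
const redirects = new Map<string, string>([
  ["https://a.example", "https://b.example"],
  ["https://b.example", "https://c.example"],
  ["https://loop.example", "https://loop.example"], // redirects to itself forever
]);

// Hypothetical cap, matching the "stop after 50" idea above.
const MAX_REDIRECTS = 50;

const followChain = (href: string, visited: string[] = []): string[] => {
  // Record the current URL in a new array, leaving the caller's array alone.
  const journey = [...visited, href];
  const next = redirects.get(href);

  // No redirect: this is the final URL, so the journey is complete.
  if (next === undefined) {
    return journey;
  }

  // Following next would be redirect number journey.length; refuse past the cap.
  if (journey.length > MAX_REDIRECTS) {
    throw new Error(`Gave up after ${MAX_REDIRECTS} redirects at ${href}`);
  }

  // Recurse with the new URL and the accumulated journey.
  return followChain(next, journey);
};

// Logs the three URLs in the order they were visited.
console.log(followChain("https://a.example"));
```

The loop entry is the case the cap guards against: without it, followChain would keep recursing until the call stack overflowed.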
How Do We Know When We Should Redirect?
There are at least a couple of ways we could determine if our fetch request encountered a redirect.
One way is to look at the HTTP response code. The 3xx codes are the redirection class, although not every one of them gives us somewhere to go next (304 Not Modified, for example).
A better way, for us at least, is to look in the response headers. These contain a bunch of information that is very specific to the current request, but the one header we care about is location.
If there is a location header then we have a possible next URL to visit.
That is the check we shall make.
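Here is a small sketch contrasting the two checks. Both helper names, isRedirectStatus and getNextUrl, are hypothetical, and the code assumes lowercase header names, which is what Object.fromEntries gives us from a fetch Headers object:

```typescript
// Plain object of header names to values, as produced by the fetcher.
type HeaderMap = Record<string, string>;

// Option 1: the status code. Anything from 300 to 399 is a
// redirection-class response, but some carry no location to follow.
const isRedirectStatus = (status: number): boolean =>
  status >= 300 && status < 400;

// Option 2: the location header. If it is present, we have a next URL.
const getNextUrl = (headers: HeaderMap): string | undefined =>
  headers.location;

console.log(isRedirectStatus(307), getNextUrl({ location: "https://next.url" })); // true https://next.url
console.log(isRedirectStatus(304), getNextUrl({ etag: "abc" })); // true undefined
```

The 304 line shows why the status code alone isn't enough for us. One caveat with the header-only check, though: a location header can also accompany some non-redirect responses, such as 201 Created, so a stricter version could require both a 3xx status and a location header.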
Let’s write a test to cover this:
test("should recursively call the visit function if a valid location header exists", async () => {
const validUrl = "https://some.valid.url";
const nextUrl = "https://next.url";
const fetcherSpy = jest
.spyOn(Fetcher, "fetcher")
.mockResolvedValueOnce({
ok: false,
url: validUrl,
status: 307,
statusText: "Temporary Redirect",
headers: { location: nextUrl, c: "d" },
})
.mockResolvedValueOnce({
ok: true,
url: nextUrl,
status: 200,
statusText: "OK",
headers: { a: "b" },
});
expect(await visit(validUrl)).toEqual([
{
ok: false,
url: validUrl,
status: 307,
statusText: "Temporary Redirect",
headers: { location: nextUrl, c: "d" },
redirected: true,
},
{
ok: true,
url: nextUrl,
status: 200,
statusText: "OK",
headers: { a: "b" },
redirected: false,
},
]);
expect(fetcherSpy).toHaveBeenCalledTimes(2);
expect(fetcherSpy.mock.calls).toEqual([[validUrl], [nextUrl]]);
});
There’s a lot happening here, so let’s break it down.
Like in previous tests, jest.spyOn(Fetcher, "fetcher") creates a spy on the fetcher method of the Fetcher object.
We said above that the way the visit function should work is by recursively calling itself. We’re going to set up our test so that we make an initial call to visit with the URL in the variable validUrl. This happens in the expect(await visit(validUrl)) assertion.
Internally, this will call our fetcher.
By chaining two mockResolvedValueOnce calls, we tell our fetcherSpy how it should respond to the first call to fetcher (a 307 Temporary Redirect), and then how it should respond to the second call (a 200 OK).
In the first mocked response we return some fake, but real looking, data. The most important part for this particular test is the location header, which points at nextUrl.
Internally our visit function will need to be updated to contain logic that says, hey, I just noticed a location header, let’s use that as the new URL and call the visit function again.
We then assert that the fetcher was called twice. As visit invokes the fetcher once per call, this confirms that visit did indeed call itself recursively:
expect(fetcherSpy).toHaveBeenCalledTimes(2);
Next, we explicitly check that our fetcher function was called with the expected URLs, first validUrl and then nextUrl:
expect(fetcherSpy.mock.calls).toEqual([[validUrl], [nextUrl]]);
// a good way of figuring out this stuff is to:
console.log(fetcherSpy.mock.calls);
// add this in your unit test code, and it will dump out as part of your test output
And in the toEqual assertion we cover off the anticipated response data that we should get from our visit function, if everything behaves the way we would like:
[
{
ok: false,
url: validUrl,
status: 307,
statusText: "Temporary Redirect",
headers: { location: nextUrl, c: "d" },
redirected: true,
},
{
ok: true,
url: nextUrl,
status: 200,
statusText: "OK",
headers: { a: "b" },
redirected: false,
},
]
Right now though, this test fails:
We can no longer get away with hardcoding the redirected value to false.
Let’s work now to make this pass.
Implementing The Recursive visit Call
Here’s a first pass at making this test go green:
try {
const result = await fetcher(href);
if (result.headers.location) {
const updatedRequests = [...requests, { ...result, redirected: true }];
return await visit(result.headers.location, updatedRequests);
}
return [
...requests,
{
...result,
redirected: false,
},
];
} catch (e) {
// removed for brevity
}
There are two changes here.
The first is that if we got a location header on the response then:
- Create a new array of updatedRequests by taking any previous requests, and adding in the current result along with a redirected value of true. We know this must be true, or we wouldn’t have the location header.
- Then, call the visit function again recursively.
The second is that I forgot to include any previous requests when returning the result if we didn’t redirect.
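A note on the first change: building updatedRequests with spread syntax, rather than pushing onto requests, means each recursive call gets a fresh array and the caller's array is never modified. A tiny standalone illustration, using a hypothetical Visit type in place of the real one:

```typescript
// Hypothetical minimal record type, just for this illustration.
type Visit = { url: string; redirected: boolean };

const requests: Visit[] = [{ url: "https://a.example", redirected: true }];
const result = { url: "https://b.example" };

// Same pattern as in visit: copy the old entries, then append the new one.
const updatedRequests: Visit[] = [...requests, { ...result, redirected: true }];

console.log(requests.length); // 1, the original array is untouched
console.log(updatedRequests.length); // 2
```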
That should now pass:
I’m really not keen on that conditional. It spills out implementation details from the fetcher in a way that makes me unhappy.
We could refactor that, but let’s try using the implementation right now and see if it really does work.
Road Test
Previously we set up our code so that we can either provide a URL from the command line, or it will call the default:
// index.ts
import { visit } from "./visit";
const defaultHref = "https://codereviewvideos.com";
const url = process.argv[2] ?? defaultHref;
(async () => {
try {
const journey = await visit(url);
console.log(journey);
} catch (e) {
console.error(e);
}
})();
We will need to compile and run the code.
# from your project root dir
node ./node_modules/.bin/tsc
This should, if you are following along with the way I’ve been working, spit out lots of JavaScript files in your ./dist directory.
You can then call the index.js file using Node.
I will use:
node dist/v2/index.js https://codereviewvideos.com/typescript-tuple
It’s a little hard to see.
What’s happening here is…
It is working!
Hurrah.
However, it sort of … hangs.
Whilst the code does what we expect, it doesn’t exit / return in a timely manner.
That’s one out of one. What if we try another URL:
node dist/v2/index.js https://aka.ms/new-console-template
Well, that doesn’t work, even though it looks like it should:
That one is interesting because I never expected that a location header would contain anything other than a fully qualified URL:
https://aka.ms/new-console-template
location: 'https://learn.microsoft.com/dotnet/core/tutorials/top-level-templates'
location: '/en-us/dotnet/core/tutorials/top-level-templates'
OK, so it kinda works. But there are bugs. Let’s continue on, and fix them.
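For the relative location problem, one possible fix (not necessarily the one this series ends up using) is to resolve the header against the URL we just requested, using the standard URL constructor with a base. resolveLocation is a hypothetical helper name:

```typescript
// Hypothetical helper: turn a possibly-relative location header into an
// absolute URL, resolved against the URL that returned it.
const resolveLocation = (location: string, currentUrl: string): string =>
  new URL(location, currentUrl).href;

// The URL that answered with the path-only location in the example above.
const current = "https://learn.microsoft.com/dotnet/core/tutorials/top-level-templates";

// A fully qualified location ignores the base and comes back unchanged.
console.log(resolveLocation(current, "https://aka.ms/new-console-template"));

// A path-only location picks up the scheme and host of the current URL.
console.log(resolveLocation("/en-us/dotnet/core/tutorials/top-level-templates", current));
// https://learn.microsoft.com/en-us/dotnet/core/tutorials/top-level-templates
```

Inside visit, that would mean passing something like new URL(result.headers.location, href).href to the recursive call instead of the raw header value.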