We now have:
- a way to make HTTP requests
- a means of validating URLs
- and a way to model bad requests
That should be enough to make a first stab at creating a real, working, well tested bit of software.
Let’s have a go.
Entry Point
In the proof of concept we slapped everything right there inside the index.ts file.
That does work, but it locks us in to consuming our link checker in a very specific way. A better approach would be to think of the link checker as a kind of library. A standalone package almost, that we can call from the command line, or a REST endpoint, or via a GUI, or really anything that we need.
In this particular instance I do want to create a command line app.
I want to allow the user to provide a URL when they run the script, and that URL will be used as the link to be checked.
We can achieve this using Node’s process.argv array:
    // index.ts

    import { visit } from "./visit";

    const defaultHref = "https://codereviewvideos.com";

    const url = process.argv[2] ?? defaultHref;

    (async () => {
      try {
        const journey = await visit(url);
        console.log(journey);
      } catch (e) {
        console.error(e);
      }
    })();
There are extra libraries you can use to make working with command line arguments far nicer.
But really all we care about is a string. It should be a valid URL, but it might not be. Fortunately, we have that covered.
Let’s step through, line by line, and cover off what’s happening.
Line 3
We’re going to put our URL checker into a re-usable library. I’m calling that the Visitor, and it will export a visit function.
Therefore we are importing that function on line 3. It doesn’t exist yet, but we will fix that in a moment.
Line 5-7
We set up a default, known valid URL on line 5.
I’m using href rather than url in the name, because the string we are given might not be a URL. There’s no true distinction here; I’m simply trying to keep the concepts distinct for humans reading the code.
On line 7 we look at the command line arguments used when running the index.ts script.
In Node.js, process.argv is an array that contains the command-line arguments passed to the script when it is run. The first element of the array is the path to the Node.js executable, and the second element is the path to the script file. The remaining elements of the array are any additional command-line arguments passed to the script.
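For example, if you run the command node script.js arg1 arg2 arg3 on the command line, the value of process.argv in the script file would be something like ['/usr/local/bin/node', '/path/to/script.js', 'arg1', 'arg2', 'arg3']. The first two entries are absolute paths, so they will differ from machine to machine.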
Here we use nullish coalescing (??) to say: if no URL argument was passed when calling the script, fall back to the defaultHref.
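To make the argument handling a bit more concrete, here is a minimal sketch. The resolveHref helper and the example paths are hypothetical and not part of the project. It takes the argument array as a parameter, so it can be tried out without actually running a script.

```typescript
// Hypothetical helper, for illustration only: picks the href to check from an
// argv-style array, falling back to a default when no argument was given.
const resolveHref = (argv: string[], fallback: string): string =>
  // argv[0] is the node executable and argv[1] is the script path, so the
  // first user-supplied argument lives at index 2.
  argv[2] ?? fallback;

const fallbackHref = "https://codereviewvideos.com";

// No argument supplied: argv[2] is undefined, so ?? uses the fallback.
const noArg = resolveHref(["/usr/bin/node", "/app/index.js"], fallbackHref);

// Argument supplied: it wins over the fallback.
const withArg = resolveHref(
  ["/usr/bin/node", "/app/index.js", "https://example.com"],
  fallbackHref
);

// An empty string is neither null nor undefined, so ?? keeps it.
// With || it would have been replaced by the fallback instead.
const emptyArg = resolveHref(
  ["/usr/bin/node", "/app/index.js", ""],
  fallbackHref
);
```

The empty string case is the one place where ?? and || behave differently. Here the empty string is passed through untouched, and it is left to the URL validation to reject it.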
Line 9-16
This code is using an immediately invoked async function expression (IIFE) to execute some asynchronous code.
The function is defined and called in one go, so it runs as soon as the JavaScript engine reaches it. We need the wrapper because await is only allowed inside an async function (or at the top level of an ES module). Inside the async function, a try / catch block handles any errors that might occur.
The first line inside the try block, const journey = await visit(url);, awaits the completion of the asynchronous function visit(url), which is our as yet unwritten library code. The value that the returned promise resolves to is then assigned to the variable journey.
The next line, console.log(journey), will log out the value of the journey variable to the console. This seems like a good way to ensure our application is behaving, for now.
The catch block, catch (e) { console.error(e); }, catches any errors that occur in the try block and logs them to the console.
No Tests
This index.ts file is the only file I am not going to cover with unit tests.
The reason is that I have further plans for this code, and this will not be the eventual way I consume the visit function.
But for now, this serves a good enough purpose.
Or to put it another way, if a call to index.ts doesn’t work, then I’ll know straight away anyway, so I am happy to leave this uncovered.
Implementing Visit
From the proof of concept we know a few things about how we expect the visit function to behave:
- Should fail and return early if:
  - given a bad URL
  - the URL is valid but does not begin with http or https
- Should fail and return a bad request object if the fetch process fails for an unexpected reason
- Should return and exit the function if the current request was a 2xx response code / ok
- Should call the visit function / itself again if the current request redirected
Call me a pessimist, but I like to start with the unhappy paths when writing tests.
Mostly I write tests when I expect the software to be complex enough to have more than two outcomes. If there are only two outcomes, one happy path and one sad path, then the software is unlikely to be useful enough to stick around for the long term, or serious enough to need a test suite in the first place.
For real projects though, I try to think of everything that can (and likely will, at some point) go wrong.
And then I write the tests for them.
In the list above, I’d start with the first item. How would that look?
Tests For Fail Early And Return
Here’s where I’d start:
    import { visit } from "./visit";

    const validUrl = "https://some.valid.url";

    describe("visit", () => {
      afterEach(() => {
        jest.resetAllMocks();
      });

      test("should return early if given an invalid URL", async () => {
        expect(await visit("")).toEqual([
          {
            status: -1,
            ok: false,
            redirected: false,
            headers: {},
            url: "",
            statusText: "Invalid URL",
          },
        ]);
      });

      test("should only proceed when working with http and https urls", async () => {
        expect(await visit("tel:+1-303-499-7111")).toEqual([
          {
            status: -1,
            ok: false,
            redirected: false,
            headers: {},
            url: "tel:+1-303-499-7111",
            statusText: `Unsupported protocol: "tel:"`,
          },
        ]);
      });
    });
There’s a bit of repetition in there.
Keen-eyed readers may recall four of those properties from the badRequest object we set up previously:
    export const badRequest = {
      status: -1,
      ok: false,
      redirected: false,
      headers: {},
    };
And that raises the question of whether or not to spread in the ...badRequest rather than hardcoding the values.
It’s a good question. On the one hand it reduces lines in the tests. But on the other I think I prefer being explicit as to exactly what shape I expect in the eventuality of a bad response.
So I’m going to stick with the explicit / verbose test cases, until such time as I change my mind or find reason to do otherwise.
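To see the two options side by side, here is a small sketch. It reuses the badRequest shape shown above and the expectation from the first test case; the variable names explicit, spread and sameShape are only there for the comparison.

```typescript
// Same shape as the badRequest object shown above.
const badRequest = {
  status: -1,
  ok: false,
  redirected: false,
  headers: {},
};

// Explicit: every expected property is spelled out in the test.
const explicit = {
  status: -1,
  ok: false,
  redirected: false,
  headers: {},
  url: "",
  statusText: "Invalid URL",
};

// Spread: shorter, but the expectation now changes whenever badRequest does.
const spread = {
  ...badRequest,
  url: "",
  statusText: "Invalid URL",
};

// As things stand, both describe exactly the same value.
const sameShape = JSON.stringify(explicit) === JSON.stringify(spread);
```

The difference only shows up if badRequest is edited later. A spread expectation would quietly follow the change, whereas the explicit one would fail and force a decision.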
First Pass At The visit Implementation
With those two test cases, we can have a stab at making them pass by improving on the proof of concept code:
    import { VisitedURL } from "./types";
    import { badRequest } from "./bad-request";
    import { isValidUrl } from "./is-valid-url";

    export const visit = async (
      href: string,
      requests: VisitedURL[] = []
    ): Promise<VisitedURL[]> => {
      if (!isValidUrl(href)) {
        return [
          ...requests,
          {
            ...badRequest,
            url: href,
            statusText: `Invalid URL`,
          },
        ];
      }

      const hrefToUrl = new URL(href);

      if (!["http:", "https:"].includes(hrefToUrl.protocol)) {
        return [
          ...requests,
          {
            ...badRequest,
            url: href,
            statusText: `Unsupported protocol: "${hrefToUrl.protocol}"`,
          },
        ];
      }

      return [];
    };
This gives two passes.
Let’s cover the interesting lines.
Line 8
We defined the custom type of VisitedURL in the proof of concept.
I’ve ported that concept to this implementation, copying the type to our types.d.ts file:
    export type VisitedURL = {
      url: string;
      status: number;
      statusText: string;
      ok: boolean;
      redirected: boolean;
      headers: Record<string, string>;
    };

    export type FetcherResponse = {
      url: string;
      status: number;
      statusText: string;
      ok: boolean;
      headers: Record<string, string>;
    };
There’s some repetition creeping in there. I will come back to address that a little later.
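For the curious, one possible way to remove that repetition is to build VisitedURL on top of FetcherResponse using an intersection type. This is only a sketch of the idea under that assumption, and not necessarily the approach this series ends up taking.

```typescript
type FetcherResponse = {
  url: string;
  status: number;
  statusText: string;
  ok: boolean;
  headers: Record<string, string>;
};

// Everything a FetcherResponse has, plus the redirected flag.
type VisitedURL = FetcherResponse & {
  redirected: boolean;
};

// The compiler still insists on all six properties being present.
const example: VisitedURL = {
  url: "https://some.valid.url",
  status: 200,
  statusText: "OK",
  ok: true,
  redirected: false,
  headers: {},
};
```

The resulting VisitedURL accepts exactly the same objects as the hand-written version, so nothing that consumes the type would need to change.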
What Line 8 says is we will return an array of VisitedURL shaped objects.
Because this is an async function, it must return a Promise of some type of value. That could be a Promise<string> if we were returning a string. Or a Promise<boolean> if we return true or false.
But in this case it’s an array of VisitedURL objects.
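A quick sketch of what that means in practice. The three functions below are throwaway stand-ins (visitStub in particular has none of the real logic); they exist only to show how the return type lines up with what the function body returns.

```typescript
type VisitedURL = {
  url: string;
  status: number;
  statusText: string;
  ok: boolean;
  redirected: boolean;
  headers: Record<string, string>;
};

// Each async function returns a Promise wrapping whatever its body returns.
const returnsString = async (): Promise<string> => "hello";
const returnsBoolean = async (): Promise<boolean> => true;

// Stand-in for visit: same return type, but no real logic.
const visitStub = async (): Promise<VisitedURL[]> => [];

// Calling the function hands back a Promise straight away...
const pending = visitStub();
const isPromise = pending instanceof Promise;

// ...and the array only becomes available once that Promise resolves.
pending.then((requests) => {
  console.log(Array.isArray(requests));
});
```

This is why the entry point has to await the call to visit before it can log the result.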
Line 9-18
Pretty straightforward.
    if (!isValidUrl(href)) {
      return [
        ...requests,
        {
          ...badRequest,
          url: href,
          statusText: `Invalid URL`,
        },
      ];
    }
We created isValidUrl in a previous step.
All we do here is make use of the boolean return value.
It either returns false, and we return immediately, appending our current badRequest to any existing requests we are aware of.
Or, the URL is valid, and so this block is skipped over.
The one brain bender in this entire function is the use of recursion. All we need to know at this point is that requests will always be an array.
It will either be an empty array (on the first invocation), or it will be an array of N elements, where N is the number of times the function has previously been called.
It is empty on the first invocation because line 7 (requests: VisitedURL[] = []) initialises it as an empty array.
Hopefully that makes sense.
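If not, the pattern may be easier to see in isolation. This toy example is synchronous and makes no network requests. The follow function and the redirects lookup table are hypothetical stand-ins, not part of the link checker.

```typescript
// Hypothetical lookup table standing in for real redirects:
// each key "redirects" to its value.
const redirects: Record<string, string> = {
  "http://a.example": "http://b.example",
  "http://b.example": "http://c.example",
};

// Each call appends the current href to the list it was given, then either
// recurses with the longer list or returns it.
const follow = (href: string, seen: string[] = []): string[] => {
  const updated = [...seen, href];
  const next = redirects[href];
  return next === undefined ? updated : follow(next, updated);
};

const journey = follow("http://a.example");
// journey is ["http://a.example", "http://b.example", "http://c.example"]
```

The first call starts with the default empty array, the second receives one element, and the third receives two, which is the same growth we expect from requests.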
Line 20
Line 20 is a little odd, in as much as we have already made a new instance of URL as part of isValidUrl.
That’s true.
    const hrefToUrl = new URL(href);
But in that case we threw away the outcome on success. Personally I think the creation of a new URL is so cheap, computationally, that there is no harm in doing this.
Your opinion may differ. That’s fine. But for me, I’d rather an is... function returned a boolean.
An alternative is that isValidUrl returns either false or a URL instance. But I’m not a fan of that.
Easier to have a bit of repetition in the form of creating a new URL, and at this point we absolutely know that this string will be a valid URL.
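For comparison, here are both designs side by side. The body of isValidUrl is an assumption on my part for this sketch (a try / catch around new URL), since the real one was written in an earlier step, and parseUrlOrFalse is a hypothetical name for the alternative.

```typescript
// Assumed implementation: true when the URL constructor accepts the string.
const isValidUrl = (href: string): boolean => {
  try {
    new URL(href);
    return true;
  } catch {
    return false;
  }
};

// Hypothetical alternative: hand back the parsed URL, or false on failure.
const parseUrlOrFalse = (href: string): URL | false => {
  try {
    return new URL(href);
  } catch {
    return false;
  }
};

const href = "https://some.valid.url/path";

// Boolean version: the caller parses again once it knows the string is valid.
const viaBoolean = isValidUrl(href) ? new URL(href).protocol : "n/a";

// Union version: one parse, but every caller has to narrow the result first.
const parsed = parseUrlOrFalse(href);
const viaUnion = parsed === false ? "n/a" : parsed.protocol;
```

Both routes arrive at the same protocol. The boolean version costs one extra parse, and in exchange the function name keeps its promise of answering a yes / no question.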
What we don’t know though, is whether that URL is one we can handle. That’s the job of …
Line 22-31
This is very similar to lines 9-18:
    if (!["http:", "https:"].includes(hrefToUrl.protocol)) {
      return [
        ...requests,
        {
          ...badRequest,
          url: href,
          statusText: `Unsupported protocol: "${hrefToUrl.protocol}"`,
        },
      ];
    }
We saw on line 20 we have a known valid string that will allow us to create a new URL instance.
But that URL may have the wrong protocol.
The only protocols we can visit are http and https. All others, of which there are many, are irrelevant to us.
Side note: is this a protocol, or a scheme? I looked into this for a while and found the answers inconclusive.
Basically if we get given a URI / href that looks like: “tel:+44-1772-112233” then we can’t really do very much with it. We can’t link check a phone number. It is a valid URI – try it, especially on your mobile phone, it will open up the dialing app – but for our purposes it’s basically unsupported.
Could we extract the logic though?
    !["http:", "https:"].includes(hrefToUrl.protocol)
We did this for isValidUrl, so why not for checking the scheme / protocol?
Well, simply because I’m only doing it here. If this was needed elsewhere then extracting would be a good move. For now, it’s as easy to put this here as anywhere else. So here it stays.
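Should it ever be needed elsewhere, the extracted version might look something like this. The hasSupportedProtocol helper and the supportedProtocols list are hypothetical names for this sketch, and the mailto address is made up.

```typescript
// The protocol property includes the trailing colon, which is why the
// allow-list contains "http:" and "https:" rather than "http" and "https".
const supportedProtocols = ["http:", "https:"];

// Hypothetical helper, for illustration only. It assumes the href has already
// passed validation, because new URL throws on a string it cannot parse.
const hasSupportedProtocol = (href: string): boolean =>
  supportedProtocols.includes(new URL(href).protocol);

// What the protocol property looks like for a few different hrefs.
const protocols = [
  "https://codereviewvideos.com",
  "tel:+44-1772-112233",
  "mailto:someone@example.com",
].map((href) => new URL(href).protocol);
// protocols is ["https:", "tel:", "mailto:"]
```

All three strings are valid URLs as far as the URL constructor is concerned, but only the first would get past the protocol check.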
Line 33
This one is temporary:
    return [];
We have our two if conditionals, but our function’s type definition (as per line 8) says we must return an array of VisitedURL objects.
That array can be empty. So by returning an empty array we satisfy the function’s type declaration, even though it’s not the true outcome we want.
That will come next.
Next Steps
That’s two tests passing and a working function.
But it’s far from complete, so let’s continue adding tests and logic.