The Proof of Concept

After working on the TypeScript code for long enough to have an idea of how I might end up solving the problem, I couldn’t help but switch to Rider and have a bash at writing a C# implementation.

This was my first attempt at writing real C# code of similar complexity to the sort of side project problem I would usually solve in Node / JavaScript / TypeScript.

I have yet to learn how to write unit tests that involve mocks for C# code, so that too is another part of this learning process and will come in the next sections.

Whilst I know this code isn’t the best, I am nevertheless pleased with producing something that works and (mostly) solves a problem in a language I am still very unfamiliar with.

Here’s what I came up with:

using System.Net.Http.Headers;
using System.Text.Json;
using LinkVisitor;


// var href = "http://codereviewvideos.com";
// var href = "tel:+1-303-499-7111";
// var href = "bad input";
// var href = "mailto:someone@example.com";
var href = "https://aka.ms/new-console-template";

var client = HttpClientFactory.Create();
var linkVisitor = new LinkVisitor.LinkVisitor(client);


var x = await linkVisitor.Visit(href);
Console.WriteLine(
    JsonSerializer.Serialize(
        x,
        new JsonSerializerOptions
        {
            WriteIndented = true,
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        }
    )
);


// Console.WriteLine($"URL: {href}");
// Console.WriteLine($"Status: {(int)response.StatusCode}");
// Console.WriteLine($"Status code: {response.StatusCode}");
// Console.WriteLine($"IsSuccessStatusCode: {response.IsSuccessStatusCode}");
// // Console.WriteLine($"Redirected: {response.}");
// Console.WriteLine("Headers:");
// foreach (var header in response.Headers)
// {
//     Console.WriteLine($"{header.Key}={header.Value.First()}");
// }


namespace LinkVisitor
{
    public static class HttpClientFactory
    {
        public static HttpClient Create()
        {
            return new HttpClient(new HttpClientHandler()
            {
                AllowAutoRedirect = false
            });
        }
    }

    public struct VisitedUrl
    {
        public string Url { get; }
        public int Status { get; }
        public string StatusText { get; }
        public bool Ok { get; }
        public bool Redirected { get; }
        public SortedDictionary<string, string> Headers { get; }

        public VisitedUrl(string url, int status, string statusText, bool ok, bool redirected)
        {
            Url = url;
            Status = status;
            StatusText = statusText;
            Ok = ok;
            Redirected = redirected;
            Headers = new SortedDictionary<string, string>();
        }

        public VisitedUrl(string url, int status, string statusText, bool ok, bool redirected,
            HttpHeaders httpHeaders) : this(url, status, statusText, ok, redirected)
        {
            Headers = new SortedDictionary<string, string>();
            foreach (var header in httpHeaders)
            {
                Headers.Add(header.Key.ToLower(), header.Value.First());
            }
        }
    }

    public class LinkVisitor
    {
        private readonly HttpClient _httpClient;

        public LinkVisitor(HttpClient httpClient)
        {
            _httpClient = httpClient;
        }

        public async Task<List<VisitedUrl>> Visit(string href, List<VisitedUrl> visitedUrls = null)
        {
            var url = new Uri(href);

            visitedUrls ??= new List<VisitedUrl>();

            if (!Array.Exists(new[] { "http", "https" }, schema => schema == url.Scheme))
            {
                visitedUrls.Add(
                    new VisitedUrl(
                        href,
                        -1,
                        $"Unsupported protocol: \"{url.Scheme}\"",
                        false,
                        false
                    )
                );
                return visitedUrls;
            }

            var response = await _httpClient.GetAsync(href);
            var newLocation = response.Headers.Location;
            var redirected = newLocation != null;

            visitedUrls.Add(
                new VisitedUrl(
                    href,
                    (int)response.StatusCode,
                    response.StatusCode.ToString(),
                    true,
                    redirected,
                    response.Headers
                )
            );


            if (newLocation == null)
            {
                return visitedUrls;
            }

            return await Visit(newLocation.ToString(), visitedUrls);
        }
    }
}

This code works, but it has several bugs I believe I have now solved in the TypeScript variant.

There are a few things that confuse me and / or need addressing, which I hope will be fixed (or at least, better understood) once I create a test driven implementation.

Command Line Usage

At this point I am still none the wiser as to how to receive input from the command line, so that I can pass the URL in when running the program outside of the IDE. For that matter, I have no idea how to actually run the built project outside of Rider. All things yet to learn.

This was actually the very last thing I added in on the Node / TypeScript approach, so I am happy to leave all of this until the very end.
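For reference, a console project written with top-level statements (the style used in the listing above) exposes the command-line arguments as an implicit `args` array, and the `dotnet` CLI can run the project without an IDE. A minimal sketch, assuming the .NET SDK is installed; the `ResolveHref` helper and the fallback URL are hypothetical and only there for illustration:

```csharp
using System;

// Top-level statements receive the command-line arguments as an implicit `args` array.
// From the project directory, outside the IDE: dotnet run -- https://example.com
var href = ResolveHref(args, "https://aka.ms/new-console-template");
Console.WriteLine($"Checking: {href}");

// Hypothetical helper: use the first argument when present, otherwise a fallback URL.
static string ResolveHref(string[] arguments, string fallback)
{
    return arguments.Length > 0 ? arguments[0] : fallback;
}
```

Everything after the `--` is passed through to the program rather than being interpreted by `dotnet run` itself.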

JSON Output Makes Rider Unhappy

Looking into how to make C# output JSON, I found several different approaches. I remember one called Newtonsoft Json.NET. There was another which, after some digging, appeared to be for much older versions of C#.

And then there is JsonSerializer.Serialize.

By default, the property names created are in PascalCase. God knows why. I’ve come across APIs out there in the real world – thankfully not many – that use this approach and I guess it is a dead giveaway that someone is using the Microsoft stack behind the scenes.

Basically this:

{
    "Url": "https://learn.microsoft.com/dotnet/core/tutorials/top-level-templates",
    "Status": 301,
    "StatusText": "Moved",
    "Ok": true,
    "Redirected": true
}

// versus

{
    "url": "https://learn.microsoft.com/dotnet/core/tutorials/top-level-templates",
    "status": 301,
    "statusText": "Moved",
    "ok": true,
    "redirected": true
}

Also the default output is not pretty printed. Which is to say { "it": "is", "all": "inlined" }.

Fortunately sorting both of these issues is fairly obvious (and verbose):

Console.WriteLine(
    JsonSerializer.Serialize(
        myCSharpVariableGoesHere,
        new JsonSerializerOptions
        {
            WriteIndented = true,
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        }
    )
);

The problem is that in order to get JsonSerializer to output data, I needed to explicitly define which properties I wanted to be output. It won’t just output / serialize everything.

OK, I quite like that.

This meant making properties public and providing a get method. Again, very easy to do, but then Rider gets a bit unhappy:

Because the getters are never explicitly used in my code, Rider isn’t able to understand that they are used when serializing the struct data to JSON.

It’s not the end of the world, and I wonder if this problem goes away when test code is written to interact with the struct… but right now it does irk me that things appear wrong.
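To make the "explicit properties" behaviour concrete: by default System.Text.Json serializes public properties that have a getter and skips fields. A small sketch of that rule, where the `Sample` type and its members are made up purely for illustration:

```csharp
using System;
using System.Text.Json;

var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
var json = JsonSerializer.Serialize(new Sample("https://example.com", 200, 3), options);
Console.WriteLine(json); // {"url":"https://example.com","status":200}

// Hypothetical type for illustration only.
public struct Sample
{
    public string Url { get; }     // public property with a getter: serialized
    public int Status { get; }     // public property with a getter: serialized
    public readonly int Attempts;  // public field: skipped unless IncludeFields = true

    public Sample(string url, int status, int attempts)
    {
        Url = url;
        Status = status;
        Attempts = attempts;
    }
}
```

The getters are only ever called by the serializer through reflection, which is presumably why static analysis cannot see a caller for them.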

Structurally Sound?

Speaking of struct, is that the correct type to use here?

I’m not intending to persist this data off to anywhere. Really these structures are just to hold data for the length of the program invocation.

Prior to this exercise I wasn’t even sure structs could have a constructor. And yet here I have two.

One is for the simplest of scenarios where no headers are present at the point of creation. This would be for an error scenario.

The other is more involved and transfers the given HttpHeaders into a SortedDictionary, which then gives lovely alphabetical output in the resulting JSON.

So I feel this is right, but I’m not 100% sure.
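One alternative that might be worth weighing is a `record` (C# 9 and later): a reference type with value-style equality and a compact constructor syntax. This is only a sketch of the option, not a claim that it is the better choice here, and `VisitedUrlRecord` is a hypothetical name:

```csharp
using System;
using System.Text.Json;

var first = new VisitedUrlRecord("https://example.com", 200, "OK", true, false);
var second = new VisitedUrlRecord("https://example.com", 200, "OK", true, false);

// Records compare by value, so two instances holding the same data are equal.
Console.WriteLine(first == second); // True

// The positional parameters become public properties, so they are serialized.
Console.WriteLine(JsonSerializer.Serialize(first));

// Hypothetical record mirroring the header-less VisitedUrl constructor.
public record VisitedUrlRecord(string Url, int Status, string StatusText, bool Ok, bool Redirected);
```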

I Miss Type Aliases

An annoyance already, but potentially an indication I am “doing it wrong”.

My gripe here is that, for example, I’ve declared the following:

public SortedDictionary<string, string> Headers { get; }

Fine.

Now, I then create a new instance of this SortedDictionary in my constructor code:

public VisitedUrl(string url, int status, string statusText, bool ok, bool redirected,
    HttpHeaders httpHeaders) : this(url, status, statusText, ok, redirected)
{
    Headers = new SortedDictionary<string, string>();
    foreach (var header in httpHeaders)
    {
        Headers.Add(header.Key.ToLower(), header.Value.First());
    }
}

If the implementation changes to SortedDictionary<string, int> I now have to update two places.

In TypeScript I could create a type alias for this:

type MyAwesomeHeaders = Record<string, string>

And then I could refer to that type rather than the more specific definition:

const headers: MyAwesomeHeaders = { ... }
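C# does have something close to this: a `using` alias directive, which gives a closed generic type a shorter name within a single file. A minimal sketch, where `HeaderMap` and `BuildHeaders` are hypothetical names chosen for illustration:

```csharp
using System;
// Alias scoped to this file; the target type has to be written fully qualified.
using HeaderMap = System.Collections.Generic.SortedDictionary<string, string>;

var headers = BuildHeaders();
Console.WriteLine(string.Join(", ", headers.Keys)); // content-type, location

// Both the return type and the construction refer to the alias,
// so the underlying type is spelled out in one place only.
static HeaderMap BuildHeaders()
{
    var map = new HeaderMap();
    map.Add("location", "https://example.com/");
    map.Add("content-type", "text/html");
    return map;
}
```

Unlike a TypeScript `type`, the alias is not exported, so each file that wants it needs its own `using` line. Separately, since C# 9 a target-typed `new()` lets an assignment such as `Headers = new();` omit the type name altogether, which also removes the second place to update.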

I tried something which didn’t work:

public struct VisitedUrlHeaders
{
    public List<KeyValuePair<string, string>> headers { get;  }

    public VisitedUrlHeaders(HttpHeaders httpHeaders) : this()
    {
        headers = new List<KeyValuePair<string, string>>();

        foreach (var header in httpHeaders)
        {
            headers.Add(new KeyValuePair<string, string>(header.Key, header.Value.First()));
        }
    }

    public VisitedUrlHeaders()
    {
        headers = new List<KeyValuePair<string, string>>();
    }
}

public struct VisitedUrl
{
    public string Url { get;  }
    public int Status { get;  }
    public string StatusText { get;  }
    public  bool Ok { get;  }
    public  bool Redirected { get;  }
    public  VisitedUrlHeaders Headers { get;  }

    public VisitedUrl(string url, int status, string statusText, bool ok, bool redirected, VisitedUrlHeaders headers)
    {
        Url = url;
        Status = status;
        StatusText = statusText;
        Ok = ok;
        Redirected = redirected;
        Headers = headers;
    }
}

But what happened here to make it “not work” was that, when serialized to JSON, I got:

{
    "url": "https://learn.microsoft.com/dotnet/core/tutorials/top-level-templates",
    "status": 301,
    "statusText": "Moved",
    "ok": true,
    "redirected": true,
    "headers": {
       "headers": [
           ...
        ]
    }
}

As best I could work out this was because the headers property existed on both, so the output was right… but also wrong from my point of view.

I couldn’t figure out how to treat the nested data as though it were part of the first structure. I’m sure it’s possible, but I am also guessing it’s not the right approach.

For my simple problem I could get away with a simpler data structure, but on a larger project that likely wouldn’t be an acceptable solution.
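One way to flatten a wrapper type like this with System.Text.Json is a custom `JsonConverter<T>` that writes the inner collection directly, so the wrapper contributes no extra level of nesting. A sketch under stated assumptions: `HeaderBag` and `HeaderBagConverter` are hypothetical names, and only serialization is handled:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

var bag = new HeaderBag(new List<KeyValuePair<string, string>>
{
    new KeyValuePair<string, string>("content-type", "text/html"),
    new KeyValuePair<string, string>("location", "https://example.com/"),
});
var json = JsonSerializer.Serialize(new { url = "https://example.com", headers = bag });
Console.WriteLine(json);
// {"url":"https://example.com","headers":{"content-type":"text/html","location":"https://example.com/"}}

// Hypothetical wrapper type; the attribute tells the serializer to use the converter below.
[JsonConverter(typeof(HeaderBagConverter))]
public struct HeaderBag
{
    public List<KeyValuePair<string, string>> Items { get; }
    public HeaderBag(List<KeyValuePair<string, string>> items) { Items = items; }
}

public class HeaderBagConverter : JsonConverter<HeaderBag>
{
    public override HeaderBag Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => throw new NotSupportedException("Deserialization is out of scope for this sketch.");

    // Writes each pair as a JSON property instead of emitting a nested "items" array.
    public override void Write(Utf8JsonWriter writer, HeaderBag value, JsonSerializerOptions options)
    {
        writer.WriteStartObject();
        if (value.Items != null)
        {
            foreach (var pair in value.Items)
            {
                writer.WriteString(pair.Key, pair.Value);
            }
        }
        writer.WriteEndObject();
    }
}
```

Whether this is worth the extra class compared with exposing a plain dictionary is a judgment call; for a small tool the simpler data structure is likely enough.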

Default To null?

On the first call to the LinkVisitor.Visit method the List<VisitedUrl> visitedUrls variable will be … well, it will be something.

But what?

An empty List would be one possible thing it could be.

Or, failing that, null.

I couldn’t figure out how to inline the creation of a new, empty List, so went for null instead.

But this creates problems:

Hovering over the text with the yellow background gives:

Cannot convert null literal to non-nullable reference type

I’m not entirely sure what that means.

Interestingly though, it does seem to work.

Then inside the code I need to null check. If visitedUrls is null then I want to instantiate a new List...

But when I do this, the code is greyed out – as though the code will never run. Again, hovering over this code inside Rider gives an error:

'??' left operand is never null according to nullable reference types' annotations

I’m unsure what is happening here.
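Both messages look like the work of C# nullable reference types, which recent project templates enable by default. With that feature on, `List<VisitedUrl>` is read as "never null", so a `null` default contradicts the annotation and the `??=` appears redundant to the analyzer. Appending `?` to the type marks the parameter as allowed to be null. Inlining `new List<VisitedUrl>()` as the default is not possible because default parameter values have to be compile-time constants. A minimal sketch, with a hypothetical `Collect` function standing in for `Visit`:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var seen = Collect("first");
Collect("second", seen);
Console.WriteLine(string.Join(", ", seen)); // first, second

// The trailing `?` marks the parameter as allowed to be null, so the `null`
// default and the `??=` below no longer contradict the annotation.
static List<string> Collect(string item, List<string>? visited = null)
{
    visited ??= new List<string>();
    visited.Add(item);
    return visited;
}
```

The warnings do not change runtime behaviour, which would explain why the original code still works despite them.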

Possibly though, all of this goes away if I do away with the recursive implementation.

And that leads me on to…

Tail recursive call can be replaced with loop

No doubt about it, I felt 🤓 pretty smart 🤓 when I created a recursive solution to the problem in the TypeScript code.

WebStorm didn’t complain, either.

But when I repeated this approach in C#, Rider was far less impressed:

The refactoring suggestion is:

Tail recursive call can be replaced with loop

OK, fine.

It turns out that people far smarter than I could ever hope to be have less positive feelings about recursion. I was really surprised by this as recursion is a huge part of languages like Elixir, and learning that language (even to my basic level) was what really got me looking for recursive solutions to problems in the first place.

If I follow the suggested refactoring I get this:

public async Task<List<VisitedUrl>> Visit(string href, List<VisitedUrl> visitedUrls = null)
{
    while (true)
    {
        var url = new Uri(href);

        visitedUrls ??= new List<VisitedUrl>();

        if (!Array.Exists(new[] { "http", "https" }, schema => schema == url.Scheme))
        {
            visitedUrls.Add(
                new VisitedUrl(
                    href,
                    -1,
                    $"Unsupported protocol: \"{url.Scheme}\"",
                    false,
                    false
                )
            );
            return visitedUrls;
        }

        var response = await _httpClient.GetAsync(href);
        var newLocation = response.Headers.Location;
        var redirected = newLocation != null;

        visitedUrls.Add(
            new VisitedUrl(
                href,
                (int)response.StatusCode,
                response.StatusCode.ToString(),
                true,
                redirected,
                response.Headers
            )
        );

        if (newLocation == null)
        {
            return visitedUrls;
        }

        href = newLocation.ToString();
    }
}

And that while(true) block scares me.

Yet the code does work. Although truthfully, I’m not sure why or how. The return statements must cancel the loop somehow… honestly, I’d need to dig into that further to grasp it. I tend to stay well clear of while and do because I’ve murdered servers in the distant past by abusing them.
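On the `while (true)` worry: a `return` exits the whole method, which necessarily ends any loop it sits inside, so the loop only runs until one of the exit conditions is hit. The sketch below shows the same shape without any HTTP involved. `Follow`, the in-memory redirect table, and the `maxHops` guard are hypothetical additions; the guard is one way to make sure a redirect cycle cannot spin forever:

```csharp
using System;
using System.Collections.Generic;

var redirects = new Dictionary<string, string> { ["a"] = "b", ["b"] = "c" };
Console.WriteLine(string.Join(" -> ", Follow("a", redirects))); // a -> b -> c

// Walks the redirect table until there is nowhere left to go or the hop limit is hit.
static List<string> Follow(string start, Dictionary<string, string> table, int maxHops = 10)
{
    var visited = new List<string>();
    var current = start;

    while (true)
    {
        visited.Add(current);

        // `return` leaves the method, and with it the loop.
        if (!table.TryGetValue(current, out var next) || visited.Count > maxHops)
        {
            return visited;
        }

        current = next;
    }
}
```

With a cyclic table such as `a -> b -> a`, the hop limit is what eventually triggers the `return`.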

Anyway, that’s all the interesting stuff.

Time to get testing!
