Tags: CMS12 Optimizely/Episerver

Optimizely CMS Link Validation Job now finally uses GET instead of HEAD!

Over the years I have worked with many Optimizely (formerly Episerver) CMS sites, some with several hundred thousand content items. The more content items a site has, the harder it becomes to keep the number of broken links at zero!

One factor that has made this even more challenging is that the Link Validation job used only HTTP HEAD to check external links, and not all websites respond properly to HTTP HEAD requests. As a result, the Link Status report contained a lot of links that were not actually broken.

Optimizely World is one of the many websites that refuse to respond to HTTP HEAD, and because of this all links to Optimizely World have been marked as broken on this blog – until today!

Sometime in May 2023, I had a conversation with a nice guy at Optimizely Support about ways to improve link validation in Optimizely CMS, and the result was bug CMS-27283. For the record, I did not write the description.

Today the update is live with the release of EPiServer.CMS.Core 12.15.1. The Link Validation job now always uses HTTP GET to check external links. I updated my site, ran the Link Validation job again, and almost all of the reported broken links disappeared!

To understand how this works, I looked at the LinkValidator class in the EPiServer.LinkAnalyzer.Internal namespace and noticed that it always uses HTTP GET, not just as a fallback when HTTP HEAD fails:

HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, url);
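To illustrate why a HEAD-only check flags working links as broken, here is a minimal C# sketch. It is not Optimizely's LinkValidator: RefusesHeadHandler and LinkCheckSketch.IsAliveAsync are hypothetical names, and the handler is an in-process stand-in for a site that rejects HEAD (assumed here to answer 405 Method Not Allowed) while serving GET normally, so nothing touches the network. The User-Agent value is copied from the requests captured with ncat; the use of HttpCompletionOption.ResponseHeadersRead is an assumption, not something confirmed about the real job.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a site that rejects HEAD but serves GET.
// It answers in-process, so no real network traffic is involved.
public sealed class RefusesHeadHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Assumption: the site answers HEAD with 405 Method Not Allowed.
        var status = request.Method == HttpMethod.Head
            ? HttpStatusCode.MethodNotAllowed
            : HttpStatusCode.OK;
        return Task.FromResult(new HttpResponseMessage(status) { RequestMessage = request });
    }
}

// Hypothetical helper; not the LinkValidator API.
public static class LinkCheckSketch
{
    public static async Task<bool> IsAliveAsync(HttpClient client, HttpMethod method, string url)
    {
        using var request = new HttpRequestMessage(method, url);
        // Same User-Agent as the requests captured with ncat.
        request.Headers.TryAddWithoutValidation("User-Agent", "EPiServer Link Checker");

        // ResponseHeadersRead completes once the status line and headers arrive,
        // so a GET check does not have to buffer the whole page body.
        using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        return response.IsSuccessStatusCode;
    }
}
```

With this handler, a HEAD check reports the link as dead while a GET check reports it as alive, which is the kind of false positive a HEAD-only job would put in the Link Status report.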

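To make sure I didn't miss anything, I also started ncat in an Ubuntu terminal under WSL2, listening on port 8080. The --keep-open option makes ncat keep accepting connections after the first client disconnects, so every request from the job is captured: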

ncat -l 8080 --keep-open

I then added a link to http://localhost:8080 on a page and ran the Link Validation job on my local machine. This was the output:

GET /robots.txt HTTP/1.1
Host: localhost:8080
User-Agent: EPiServer Link Checker
traceparent: 00-1fb46d709816cdf2a351b463d0cdcf5d-ef8898c4ddcd6edb-00

GET / HTTP/1.1
Host: localhost:8080
User-Agent: EPiServer Link Checker
traceparent: 00-1fb46d709816cdf2a351b463d0cdcf5d-3206e52890bc1beb-00

As the output shows, the Link Validation job first requests robots.txt to check that it is allowed to visit the link, and only then sends an HTTP GET request for the link itself.
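The robots.txt step can be sketched roughly as follows. This is a deliberately simplified, hypothetical matcher (RobotsSketch is not an Optimizely API): it reads only User-agent and Disallow lines and uses plain prefix matching, ignoring Allow rules, wildcards and the other details of RFC 9309. How the real job matches its "EPiServer Link Checker" user agent against robots.txt groups is an assumption here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified robots.txt check; not Optimizely's implementation.
public static class RobotsSketch
{
    private sealed class Group
    {
        public List<string> Agents { get; } = new List<string>();
        public List<string> Disallows { get; } = new List<string>();
    }

    // robots.txt lives at the site root, matching the first request in the ncat output.
    public static Uri RobotsUrlFor(Uri link) => new Uri(link, "/robots.txt");

    // Returns false if a Disallow rule in the applicable group is a prefix of path.
    public static bool IsAllowed(string robotsTxt, string userAgent, string path)
    {
        var groups = new List<Group>();
        var readingAgents = false;

        foreach (var rawLine in robotsTxt.Split('\n'))
        {
            // Strip comments and surrounding whitespace (including a trailing '\r').
            var line = rawLine.Split('#')[0].Trim();
            var colon = line.IndexOf(':');
            if (colon < 0) continue; // blank or malformed line

            var field = line.Substring(0, colon).Trim().ToLowerInvariant();
            var value = line.Substring(colon + 1).Trim();

            if (field == "user-agent")
            {
                // Consecutive User-agent lines share one group of rules.
                if (!readingAgents) groups.Add(new Group());
                readingAgents = true;
                groups[groups.Count - 1].Agents.Add(value.ToLowerInvariant());
            }
            else
            {
                readingAgents = false;
                // An empty Disallow value means "allow everything", so skip it.
                if (field == "disallow" && value.Length > 0 && groups.Count > 0)
                    groups[groups.Count - 1].Disallows.Add(value);
            }
        }

        // Prefer groups naming our agent; otherwise fall back to "*" groups.
        var ua = userAgent.ToLowerInvariant();
        var specific = groups
            .Where(g => g.Agents.Any(a => a.Length > 0 && a != "*" && ua.Contains(a)))
            .ToList();
        var applicable = specific.Count > 0
            ? specific
            : groups.Where(g => g.Agents.Contains("*")).ToList();

        return !applicable
            .SelectMany(g => g.Disallows)
            .Any(d => path.StartsWith(d, StringComparison.Ordinal));
    }
}
```

For a link such as http://localhost:8080/, RobotsUrlFor gives http://localhost:8080/robots.txt; in this sketch, the GET for the link itself would only follow if IsAllowed returns true.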

Nice work, Optimizely! 🎉