There is a great gem called html-proofer by Garen Torikian that checks HTML files for problems such as invalid markup, broken images, dead hyperlinks, and bad favicons.
When html-proofer validates external links, it makes real HTTP requests using Typhoeus. It decides whether a link is external by looking for http in the URL. For internal links it simply checks whether the target file exists.
This means that, throughout your site, you should use relative links for internal links, so that html-proofer does not make external network calls and slow down the test build. This is good practice anyway, because it also keeps your HTML pages smaller.
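To make the distinction concrete, here is a simplified Ruby sketch of the decision described above. It is not html-proofer's actual code; the method names `external_link?` and `internal_link_ok?` and the `site_dir` argument are made up for illustration.

```ruby
# Simplified sketch, NOT html-proofer's real implementation.
# Method names and arguments are hypothetical.

# A link counts as external when it starts with http:// or https://.
def external_link?(href)
  href.match?(%r{\Ahttps?://}i)
end

# An internal link is resolved against the generated site directory
# and only checked for existence on disk; no network call is made.
def internal_link_ok?(site_dir, href)
  File.exist?(File.join(site_dir, href.sub(%r{\A/}, "")))
end
```

Under these assumptions, `/about.html` is checked with a cheap file lookup, while `https://example.com/about.html` would trigger an HTTP request.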
Canonical Link Issue
The issue is that there is one place where absolute URLs are appropriate. Google recommends using absolute URLs to help it identify the canonical URL for a piece of content.
Avoid errors: use absolute paths rather than relative paths with the rel="canonical" link element.
Use this structure: https://www.example.com/dresses/green/greendress.html
Not this structure: /dresses/green/greendress.html
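Google recommends placing the canonical URL in a link tag in the page's head, like this.
If you do what Google recommends, html-proofer reports an error for any post that has not yet been published on the live site: the absolute canonical URL is treated as an external link, so html-proofer requests a page that does not exist yet. Here is an example: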
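The documented form is a link element with rel="canonical" and an absolute href. As a rough sketch, assuming a hypothetical helper named `canonical_link_tag` and using Google's example.com URL, such a tag could be built in Ruby like so:

```ruby
# Hypothetical helper for illustration; not part of Jekyll or html-proofer.
# Joins the site's base URL and a page path into an absolute canonical URL.
def canonical_link_tag(site_url, page_path)
  absolute = "#{site_url.chomp('/')}/#{page_path.sub(%r{\A/}, '')}"
  %(<link rel="canonical" href="#{absolute}">)
end

puts canonical_link_tag("https://www.example.com", "/dresses/green/greendress.html")
# prints: <link rel="canonical" href="https://www.example.com/dresses/green/greendress.html">
```

The important part is that the href carries the full scheme and domain rather than a bare path.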
There is a closed GitHub issue covering this canonical URL problem. Understandably, support is unlikely to be added for this edge case. Fortunately, a workaround is to use the url_swap option (renamed swap_urls in html-proofer 4).
Using url_swap to replace the domain of the site, in my case tongueroo.com, with a blank string makes html-proofer treat these URLs as relative links, so it only checks that the file exists instead of making an actual HTTP request.
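The effect of the swap can be sketched in plain Ruby. This is a stand-in for what the option does conceptually (a regular expression replaced via gsub), not html-proofer's own code; the `SWAPS` constant, the `swap_url` method, and the example post path are all hypothetical.

```ruby
# Hypothetical stand-in for the swap step; html-proofer's internals differ.
# Maps a pattern matching the site's own domain to an empty string.
SWAPS = { %r{\Ahttps?://tongueroo\.com} => "" }.freeze

# Applies every pattern => replacement pair to the URL in turn.
def swap_url(url, swaps = SWAPS)
  swaps.reduce(url) { |result, (pattern, replacement)| result.gsub(pattern, replacement) }
end

puts swap_url("https://tongueroo.com/articles/example/")
# prints: /articles/example/
```

After the swap the canonical URL looks like an ordinary internal path, so only a file existence check is needed. In html-proofer 3.x, a mapping of this shape (regular expression to replacement string) is what gets passed as the url_swap option; check the README for the name used by your version.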