The following details technical SEO auditing processes using Screaming Frog and covers the following main areas:
- Technical SEO Elements
- Technical Auditing Process
- Using Screaming Frog
- Inconsistent Domains
- Orphan Pages
- Core Web Vitals
- Tech Audit Planning
SEO Tech Auditing
A technical SEO audit analyses all relevant technical factors that influence the performance of a website. The goal of a technical SEO audit is to ensure that the answers to the questions and queries people search for on Google can be found in the content of your website, and that nothing stands in the way.
How is this done within the world of technical SEO? Well, the marvellously crafted content sitting on your website, waiting to inform and direct, needs to be easy to locate, and it needs a contextual flow within a sensible website structure. Technical SEO works toward achieving the latter part.
This article covers the main technical SEO elements, then walks through a technical SEO auditing process using Screaming Frog.
Technical SEO Elements
A good place to begin your investigation/analysis is with a technical SEO audit. The very purpose of the technical SEO audit is to discover hindrances resulting in the website’s content not being seen as important, authoritative, or useful.
From a technical perspective, issues usually occur when the website’s structure is poor because of broken or missing links, redirects that have not been fixed over time, and other elements that we’ll look at later.
It is these types of issues that make content difficult to find quickly. Every redirect placed on a website counts as a crawl, because it is still requested via a URL. Bots will only crawl the website so many times daily, so the easier it is for them to find the content, the quicker the website will be indexed.
Redirects can be placed for many reasons, but keeping the number of redirects low, or fixing them quickly, will aid the indexation process.
All websites contain something called a head section. The head section is where we place meta detail which summarises, at a top level, what the content of the page is about.
There are several elements we can place within the head section, from schema tags and bot directives to more important areas which aid the ranking of a web page. The important areas to consider for how you want the page to be found are the Title Tag and Meta Description tags. Both of these tags can be populated with relevant and informative detail that the potential customer can see on the search engine results page.
Schema, or structured data code used on web page elements, is an extension to existing HTML and allows us to give more detail of what page content is about.
Structured data can be applied to several elements on a website’s page. These elements include product detail, business information, events and even further contact information for people featured on the website.
You can check on existing schema within GSC, or one of the many online validation tools to ensure any schema present across the website is operating as it should. If it’s not present, look for opportunities where it can be added and get to work placing it on site.
Multiple Page Versions
Google treats different URLs as unique pages, and occasionally multiple URL versions may appear for the same page. Typical examples of how the same page may be represented to search engines such as Google are versions with and without www, versions served over both http and https, and versions with and without a trailing slash.
All of these URL versions will show the exact same page, and all of them will be treated as unique URLs and indexed as such. However, with the page being unchanged, we now have multiple versions of the same page, resulting in the same page competing against itself, along with the website’s competitors.
It’s essential that we force a single preferred version for each URL to ensure duplication of this nature does not occur.
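As a rough illustration of what a single preferred version means in practice, the sketch below collapses several variants of one page into one URL. The example.com URLs are hypothetical, and the preferences chosen here (HTTPS, no www, trailing slash) are assumptions for the example only; use whichever convention your own website has settled on.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferences for this sketch: HTTPS, no "www.", trailing slash on pages.
DEFAULT_DOCUMENTS = ("index.html", "index.htm", "index.php")


def preferred_url(url: str) -> str:
    """Collapse common variants of a URL into one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    # Treat /folder/index.html as the same page as /folder/
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
            break
    # Add a trailing slash to page-style paths, but leave files (e.g. .pdf) alone
    last_segment = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit(("https", host, path, parts.query, ""))


# Hypothetical variants of the same page
variants = [
    "http://example.com/services",
    "http://www.example.com/services/",
    "https://example.com/services/index.html",
    "https://WWW.example.com/services",
]
unique_targets = {preferred_url(u) for u in variants}
```

If the set holds more than one entry for what should be a single page, those variants are candidates for a redirect or canonical tag pointing at the preferred version.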
Page experience relates directly to page speed and is a core part of the Core Web Vitals process, which assesses page speed across all web pages on the website.
Web pages that are slower than competitors’ pages but have similar content factors will underperform against them and so rank lower.
Core Web Vitals processes focus on ensuring a good user experience, so we want our website’s pages to load quickly so that a good user experience can start.
Technical SEO Auditing Process
If the website has an existing GSC (Google Search Console) account, then the coverage area within GSC is a good place to begin the technical auditing process.
At present the coverage section covers four main categories: Error, Valid with Warnings, Valid and Excluded.
We’ll look at each section below:
If any errors are present, then resolving these should be the immediate focus.
There are several types of errors which may be flagged here, such as redirect errors, 404 response codes, blocked URLs and any redirects which conflict with URLs present in the XML sitemap. The fixes for these common issues are listed below:
Cause - Redirects which point to pages that no longer exist.
Solution – Either update the redirect to point to the correct page or remove the link if feasible.
Cause - Pages that are blocked via the robots.txt file, or carry a NoIndex robots directive on the page, but are live within the XML sitemap.
Solution – If you do not want the page to be indexed then remove it from the XML sitemap. If you do, then remove the block directive from the robots.txt file.
Submitted URLs 404’ing
Cause - A URL in the XML sitemap is present, but the page is not live on the website.
Solution – Place a redirect on the URL to a valid page and remove it from the XML sitemap.
Submitted URL Redirected
Cause - URLs that are included in an XML sitemap but are being redirected.
Solution – Remove the URL from the XML sitemap.
The above issues are easily resolved. Once the work is done, mark the error as fixed in GSC, and Google will report back on whether the issue is fixed or still present.
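The sitemap-related fixes above can be sketched as a small triage script. This is a minimal example, assuming you already have the sitemap URLs, a status code for each URL (for instance from a crawl export) and the text of the robots.txt file. The function name, the example.com URLs and the sample data are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

REDIRECT_CODES = {301, 302, 307, 308}


def triage_sitemap_urls(sitemap_urls, status_codes, robots_txt, user_agent="Googlebot"):
    """Suggest an action for each sitemap URL, following the fixes listed above."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # parses the text directly, no network request
    actions = {}
    for url in sitemap_urls:
        status = status_codes.get(url)
        if not robots.can_fetch(user_agent, url):
            actions[url] = "blocked: remove from sitemap, or lift the robots.txt block"
        elif status == 404:
            actions[url] = "404: redirect to a valid page and remove from sitemap"
        elif status in REDIRECT_CODES:
            actions[url] = "redirected: remove from sitemap"
        else:
            actions[url] = "ok"
    return actions


# Hypothetical sample data
robots_txt = "User-agent: *\nDisallow: /private/\n"
sitemap_urls = [
    "https://example.com/",
    "https://example.com/private/offer",
    "https://example.com/old-page",
    "https://example.com/moved",
]
status_codes = {
    "https://example.com/": 200,
    "https://example.com/private/offer": 200,
    "https://example.com/old-page": 404,
    "https://example.com/moved": 301,
}
actions = triage_sitemap_urls(sitemap_urls, status_codes, robots_txt)
```

The sketch only checks robots.txt rules; an on-page NoIndex directive would need the page HTML or response headers, which is left out here to keep the example short.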
Valid with Warnings
Errors found in this section stem from one issue where you may see ‘Indexed Though Blocked by Robots.txt’.
This occurs when pages that are already indexed have a blocking directive placed upon them. There are several reasons this may occur. For example, pages may be launched from a staging area when they aren’t meant for indexation, so the directive may be placed later, after indexation has occurred.
Resolve this by using the URL Removal Tool in GSC: place the URLs you want removed in the tool and submit them.
There are two grouped details that you’ll usually find in this section –
Submitted and Indexed
URLs listed here are fine. Ensure the count is similar in size to the website’s page count. If it’s way off, double check the XML sitemap, or sitemaps if a WordPress plugin is being used.
Indexed, not submitted in sitemap
URLs here could be ones that lead to PDFs, or other pages that aren’t required to be placed in the XML sitemap. If URLs are listed that should be in the XML sitemap, then check it and update it to reflect the website’s pages.
URLs listed here are not being shown in search results. Some of these may not need actioning, but it’s worthwhile investigating the following elements:
When Google finds a web page which isn’t present in the XML sitemap and it returns a 404, its URL will be present in this section. Resolve this list by removing the 404s: redirect links to the correct page, or to a more suitable page.
This section is not updated often, so you may find previously fixed URLs are still present. Be patient and they will eventually resolve, provided redirects have been placed correctly.
Soft 404s occur when a page returns a 200 status code, but the page is of poor quality (low word count, HTML errors). Either remove these pages or redirect the links to them to a more suitable page.
Discovered Not Indexed
URLs here are usually from new pages. Check the list and, if any pages you deem important are present, click on them and submit them via GSC.
Crawled Not Indexed
URLs listed here are still under consideration by Google, as it decides what is best to do with them.
Crawl the Website
A paid version of Screaming Frog is required for this next section. This tool will aid in gathering further page speed insights and crawl data.
Initially, run the crawl, which lists all URLs in the Internal tab (the default tab), and let’s have a look for any crawl errors.
Click on the toggle button to the right of the search bar (top right of the output lists). Change first box to Status Code, next to Equals, then enter ‘404’ and hit enter.
You will now get a full list of pages that are returning a 404 response. Highlight all of these in Screaming Frog and in the second pane (below the output view) select ‘Inlinks’. This will list the pages that point to each 404 error page.
The URLs in the second pane listed are URLs which either need to be redirected or internal links that need to be updated.
Export this data:
This data now lists URLs which contain links that point to 404 error pages. The list is split between images and hyperlinks, with anchor detail given where present on the page.
Internally Redirected Links
Edit the filter as above, but in the second pane select ‘All Link Types’ above the list and choose ‘Hyperlink’. This will give us a list of the actual links on the website that redirect to another page.
Export this list of URLs which link to redirected pages, so the links can then be updated to point to their actual destination without any redirect implementation.
The exported list will contain a ‘From’ column, which is the URL containing the redirecting link, and a ‘To’ column, which lists the URL the link points to but which is 301’d when clicked, landing on another URL.
It would be helpful to have a list of the actual targets for these links too, so we know where to point the updated links.
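One way to build that list of actual targets is to follow each redirect through to the end of its chain. The sketch below assumes you have a mapping of redirecting URLs to their immediate targets (for example, taken from a redirect export). The function name, the paths and the sample data are hypothetical.

```python
def final_destination(url, redirects, max_hops=10):
    """Follow a redirect chain to its end.

    Returns the final URL, or None if the chain loops or exceeds max_hops.
    """
    seen = {url}
    current = url
    hops = 0
    while current in redirects:
        if hops >= max_hops:
            return None  # chain is too long to be worth following
        current = redirects[current]
        hops += 1
        if current in seen:
            return None  # redirect loop
        seen.add(current)
    return current


# Hypothetical redirect map: redirecting URL -> immediate target
redirects = {
    "/old-services": "/services-2020",
    "/services-2020": "/services",
    "/loop-a": "/loop-b",
    "/loop-b": "/loop-a",
}

# Hypothetical (From, To) pairs from the exported list
links = [("/about", "/old-services"), ("/blog/post-1", "/loop-a")]

# For each link, record where it should point once updated
updates = [(source, target, final_destination(target, redirects)) for source, target in links]
```

A result of None flags a loop or an overlong chain, which is worth fixing at the redirect level before updating the links themselves.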
Inconsistent domain conventions, created by internal links that do not follow a consistent process, may lead to link equity not being as impactful as it could be, so they are worth identifying.
Domains have several versions, so internal links should match the URL consistently, as it appears in the browser address bar.
Examples of inconsistent domains:
Using Screaming Frog to root out these inconsistencies is achieved by removing all filters from the main ‘Internal’ tab and then clicking on the ‘Address’ column. You’ll then find all the URLs are listed alphabetically, which groups any URLs on a different protocol or subdomain together and makes them easier to spot.
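For a larger crawl, the same check can be sketched in a few lines against an exported list of addresses. This example assumes the most common protocol and host combination is the preferred one, which may not hold for every website. The function names and example.com URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def origin(url):
    """Return the protocol and host part of a URL, e.g. https://www.example.com."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}"


def find_inconsistent(urls):
    """Return the most common origin and the URLs that do not use it."""
    if not urls:
        return None, []
    counts = Counter(origin(u) for u in urls)
    preferred = counts.most_common(1)[0][0]  # assumption: majority origin is the preferred one
    outliers = [u for u in urls if origin(u) != preferred]
    return preferred, outliers


# Hypothetical list of internal link targets from a crawl export
crawled = [
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/contact/",
    "http://www.example.com/about/",
    "https://example.com/blog/",
]
preferred, outliers = find_inconsistent(crawled)
```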
Orphaned pages are web pages which are present and crawlable but have no internal links pointing to them from any area of the main website.
Screaming Frog can help in discovering orphaned pages, but at times it may not find all of them. To help with discovering all orphaned pages use GSC too, along with any other crawling tools you have at your disposal.
With your crawled site in Screaming Frog, select ‘Crawl Analysis’, then select the ‘Search Console’ view in the right panel and you will see the list of any orphan pages.
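The idea behind combining sources can be sketched as a simple comparison: any URL known from GSC or the XML sitemap that has no internal links pointing to it in the crawl is a candidate orphan. This is a minimal sketch with hypothetical names and sample data, not a description of how Screaming Frog works internally.

```python
def find_orphans(known_urls, inlink_counts):
    """Return known URLs that have no internal links pointing to them.

    known_urls: URLs gathered from GSC, the XML sitemap or other tools.
    inlink_counts: mapping of URL -> number of internal inlinks found by the crawl.
    """
    return sorted(url for url in set(known_urls) if inlink_counts.get(url, 0) == 0)


# Hypothetical sample data
known_urls = [
    "https://example.com/",
    "https://example.com/services/",
    "https://example.com/landing-2019/",
    "https://example.com/old-offer/",
]
inlink_counts = {
    "https://example.com/": 42,
    "https://example.com/services/": 17,
    "https://example.com/old-offer/": 0,
}
orphans = find_orphans(known_urls, inlink_counts)
```

URLs missing from the crawl data entirely are treated as having zero inlinks, since the crawler never reached them through a link.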
SEO Content Meta Elements
Screaming Frog also provides key meta data detail that can be exported or viewed from within the Screaming Frog console.
Once the crawl has been performed, scroll down within the right panel to the Page Titles section. Here you will find the important on-page ranking factors listed.
Core Web Vitals
For more detail about performing Core Web Vitals processes, including the advanced detail, check out our blog post – Core Web Vitals explained.
Within the above Core Web Vitals post, I go into detail but for a quick round-up:
Connect Screaming Frog to Google’s PageSpeed Insights API. The results are similar to what is provided in GSC, but by exporting this data from Screaming Frog you can create a working document and filter to areas where issues exist.
The report contains information detailing Core Web Vitals elements, such as slow LCP or CLS issues. Images will also be detailed along with the specific issues relating to them.
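Filtering the working document to problem areas could look something like the sketch below. It uses Google's published thresholds for LCP (good up to 2.5 seconds, poor above 4 seconds) and CLS (good up to 0.1, poor above 0.25). The field names `url`, `lcp_seconds` and `cls` are hypothetical and will not match the column headings of a real export, so rename them to suit your own document.

```python
# Google's published Core Web Vitals thresholds
LCP_GOOD_S, LCP_POOR_S = 2.5, 4.0
CLS_GOOD, CLS_POOR = 0.1, 0.25


def rate(value, good, poor):
    """Bucket a metric value into good / needs improvement / poor."""
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


def flag_pages(rows):
    """Return (url, LCP rating, CLS rating) for every page that is not good on both."""
    flagged = []
    for row in rows:
        lcp = rate(row["lcp_seconds"], LCP_GOOD_S, LCP_POOR_S)
        cls = rate(row["cls"], CLS_GOOD, CLS_POOR)
        if lcp != "good" or cls != "good":
            flagged.append((row["url"], lcp, cls))
    return flagged


# Hypothetical rows, as might be loaded from an exported report
rows = [
    {"url": "/", "lcp_seconds": 1.8, "cls": 0.05},
    {"url": "/services/", "lcp_seconds": 3.2, "cls": 0.02},
    {"url": "/gallery/", "lcp_seconds": 5.1, "cls": 0.30},
]
flagged = flag_pages(rows)
```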
With the above technical audit performed on the website you are working with, you will now have a list of issues which you can order by priority.
I use four levels of importance: Critical for a severe issue, followed by High, Medium and Low.
Critical issues are typically flagged when areas of the website, or the whole of it, are being prevented from being crawled, and pages are not being indexed because of this.
A good summary may look like this for the other levels of importance:
- High: issues classed here are usually negatively impacting the website
- Medium: issues may be neutral and are not typically helping website performance
- Low: these could be the vanity points; though helpful to implement, they are not critical to current website performance
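If the audit findings are kept in a working document, ordering them by these four levels can be sketched as below. The issues and the priorities assigned to them here are purely illustrative assumptions; how you classify each finding depends on the website in question.

```python
# The four levels of importance, most urgent first
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def prioritise(issues):
    """Sort issues from Critical down to Low, keeping the original order within a level."""
    return sorted(issues, key=lambda issue: PRIORITY_ORDER[issue["priority"].lower()])


# Hypothetical findings with illustrative priorities
issues = [
    {"issue": "Schema missing on product pages", "priority": "Low"},
    {"issue": "Internal links pointing to 404 pages", "priority": "High"},
    {"issue": "robots.txt blocking the whole website", "priority": "Critical"},
    {"issue": "Internal links passing through redirects", "priority": "Medium"},
]
ordered = prioritise(issues)
```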
The above detail is perfect for when you are required to perform a technical audit against most websites. Many eCom websites utilise faceted navigation. This topic is for another post or page and I’m working on that now.
I will also be producing a pdf version of the above, which will contain a live report and a practical example. So check back soon!
If you require help or need a technical SEO audit performed, then please get in touch. Contact Matt
There is absolutely no obligation with this, and I'll not bombard you afterwards with work needs. You'll get a free report on your website's current state. If you want to action any findings or question any aspects, then I'm sure you will be back in touch.