How (and why) to find and fix your duplicate content
Quentin Muhlert
Nov 12, 2024
If you’re getting started in digital marketing, you may have heard about duplicate content in a few different contexts. It’s often mentioned in discussions of so-called keyword cannibalization, where two pages on your site contain similar, though not identical, content and therefore “compete” for the same search volume to the detriment of your site as a whole.
Here, on the other hand, we’re talking in more general terms: typically one-for-one duplication, or what Google calls “substantive blocks” of content that closely match, rather than pages that simply serve similar purposes.
At one point, copying content from competitors was a common black-hat SEO practice, though this is rarely an issue today. Instead, we more commonly see multiple versions of the same page that haven’t been handled correctly. Taking a T-shirt as an example, there might be a separate page for each color, or additional versions for regions that share a language but use different currencies. It’s easy to see how, in cases like this, near-identical content can multiply across your site.
So why is this a problem? Google aims to give its users the best possible search experience. If you search for “yellow T-shirt,” it won’t show you the same page three times: once with US-dollar pricing, once with Canadian dollars, and once with British pounds. Google will rank just one of these pages, but which one? Rather than answering that question for you, it simply prioritizes pages with unique information. In extreme cases, duplicate content may be seen as an attempt to manipulate search rankings, leaving the entire site liable to be demoted in search results.
We therefore need to be aware of, and minimize, the amount of duplicate content on our sites. But product variations and regional pricing aren’t the only causes. Here’s a quick list of some of the most common, and sometimes less obvious, sources of duplicate content we’ve seen over the years:
Syndicated content: Written a guest article somewhere? Many authors republish the same piece on their own site; consider simply linking to the original instead.
Parameter handling problems: as with the T-shirt example above, it’s easy to end up with more (sometimes many more) pages than you need for what is effectively one piece of content.
Canonical problems: we’ll dig into these a little later on, and Google also publishes a guide to resolving canonical issues when they occur.
Page structure problems: the same content living in pages under different categories, for example exampleshop.com/accessories/socks and exampleshop.com/undergarments/socks.
URL rewriting problems: Do you use a trailing slash? Include the “www”? Use https or http? Whichever you choose, consistency matters: each page should resolve to one URL.
Missing or incorrect hreflang tags: Have pages for different countries, especially ones that share a language? You’ll need a proper internationalization strategy. Hreflang tags should be in place on every page that has regional or language alternates, or else Google won’t know which version to rank in which country.
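Several of the problems above, parameter handling and URL rewriting in particular, come down to many URLs resolving to what is really one page. As a rough illustration, the Python sketch below normalizes a list of crawled URLs and groups the ones that collapse to the same form. It assumes, purely for illustration, that the site’s preferred form is https, without “www”, and without a trailing slash, and that the hypothetical IGNORED_PARAMS set lists tracking parameters that never change page content. The names normalize_url and group_duplicates are made up for this example and are not part of any SEO tool.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed never to change page content on this site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def normalize_url(url: str) -> str:
    """Reduce a URL to one form under this sketch's assumed house rules:
    https, lowercase host without 'www.', no trailing slash (except the root),
    ignored parameters dropped, remaining parameters sorted."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    # Keep content-changing parameters (e.g. color), but in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit(("https", host, path, urlencode(query), ""))


def group_duplicates(urls):
    """Return {normalized URL: [original URLs]} for forms reached by 2+ URLs."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    crawled = [
        "http://www.exampleshop.com/accessories/socks/",
        "https://exampleshop.com/accessories/socks?utm_source=newsletter",
        "https://exampleshop.com/undergarments/socks",
    ]
    print(group_duplicates(crawled))
```

Here the first two URLs are grouped together, since they differ only in scheme, “www”, trailing slash, and a tracking parameter. The /undergarments/socks page is not caught: the same content at a genuinely different path needs a content comparison rather than URL rules.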
Two obvious questions remain: how can we find all instances of duplicate content on our sites, and what can we do about them?
The first is the simpler one to answer. One option is to crawl your site with Screaming Frog and let it sniff out duplicate content for you. To do this effectively, you’ll need to tell it to look for near-duplicates as well as exact matches; Screaming Frog’s own tutorial covers the process in more detail.
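To give a feel for what near-duplicate detection involves, here is a minimal Python sketch of one classic approach: split each page’s text into overlapping three-word “shingles” and compare pages by Jaccard similarity (shared shingles divided by all distinct shingles). This is not a description of how Screaming Frog works internally. The 0.9 threshold, the shingle length, and the function names are assumptions chosen for illustration, and the page text is assumed to have already been extracted from the HTML.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles over all distinct shingles (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str], threshold: float = 0.9, k: int = 3):
    """Yield (url_a, url_b, score) for every page pair at or above threshold."""
    sets = {url: shingles(body, k) for url, body in pages.items()}
    for (url_a, set_a), (url_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            yield url_a, url_b, score
```

Comparing every pair of pages grows quadratically with the size of the site, which is why tools built for large crawls typically rely on approximations such as MinHash instead of exact pairwise comparison.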
Second, we can monitor the Page Indexing report in Google Search Console and watch for new entries, or a rising count, under the “Duplicate without user-selected canonical” status. Essentially, this means that Google has found duplicate content and chosen which page it considers the original. If it hasn’t identified the correct page, you can explicitly mark the canonical yourself or, better yet, solve the problem by making sure the pages in question are substantially different from one another. Google’s guide to the Page Indexing report gives an overview of the full process.
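Marking the canonical yourself means adding a link element with rel="canonical" to the page’s head, and for regional variants like the T-shirt pages this usually goes hand in hand with hreflang alternates. The Python sketch below builds both for one page using one common pattern: each regional page canonicalizes to itself and lists every version, including itself, as an hreflang alternate, with an optional x-default. The URLs, the VARIANTS mapping, and the head_tags function are hypothetical; check the exact setup for your own site against Google’s documentation.

```python
from html import escape
from typing import Optional

# Hypothetical regional versions of one product page (the T-shirt example).
# Keys are hreflang values (language-region); values are absolute URLs.
VARIANTS = {
    "en-us": "https://exampleshop.com/us/yellow-t-shirt",
    "en-ca": "https://exampleshop.com/ca/yellow-t-shirt",
    "en-gb": "https://exampleshop.com/uk/yellow-t-shirt",
}


def head_tags(current_url: str, variants: dict[str, str],
              default_url: Optional[str] = None) -> str:
    """Build <link> tags for one page: a self-referencing canonical plus the
    full, reciprocal set of hreflang alternates (every version lists every
    version, including itself), and an optional x-default."""
    if current_url not in variants.values():
        raise ValueError("current_url must be one of the listed variants")
    lines = [f'<link rel="canonical" href="{escape(current_url)}">']
    for lang, url in sorted(variants.items()):
        lines.append(
            f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        )
    if default_url:
        lines.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(default_url)}">'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(head_tags(VARIANTS["en-ca"], VARIANTS))
```

Generating the tags from a single mapping, rather than hand-editing each page, keeps the hreflang set identical across all versions, which is exactly where manual setups tend to drift.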
Ultimately, fixing duplicate content can be tricky, and exactly which steps you need will depend heavily on your site and the precise problems you’re encountering. As with many issues, general maintenance is important; in our experience, an ounce of prevention is worth more than a pound of cure. It’s good practice to keep a regular eye on key areas like canonicals, parameters, and URL rewriting. We’d recommend crawling your site at regular intervals and setting a schedule for reviewing your GSC reports. Doing so will help you catch issues as they emerge, saving time and effort and avoiding bigger problems down the line.
Of course, if you find yourself in the realm of said bigger problems, or just want to ensure you never reach that point, it may be wise to call in the professionals. And the friendly, experienced team here at Muhlert Digital are always happy to help.