Duplicate Content

Duplicate content is identical (or nearly identical) content that appears at more than one URL, whether on the same domain or across domains. When this occurs, search engines will typically choose one version to serve rather than show the same content multiple times, and the version they choose may not be the original or the "best" one.

Overarching Types of Duplicate Content

Exact Duplicates

A page that is 100% identical to another page; the two differ only by URL.

Near Duplicates

A page that differs from another page by only a small amount, for example a block of text, an image, or the order of the content.
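
One common way to put a number on "a small amount" is to compare the overlapping word sequences (shingles) of two pages. The sketch below is only an illustration, not how any search engine actually measures similarity; the function names and the three-word shingle size are assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in text (case and punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets: 1.0 = identical, 0.0 = disjoint."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)
```

A score of 1.0 corresponds to an exact duplicate, while a score close to 1.0 suggests a near duplicate. Where to draw the line is a judgment call for each site.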

Cross-Domain Duplicates

Exact or near duplicates that appear across domains or subdomains.

Duplicate <title> Tags

Two or more pages containing the exact same page title.

Impacts of Duplication

  • Without unique content, it is difficult for a search engine to determine which version of the duplicate content is the most authoritative, and therefore which version to include in or exclude from its index.
  • It also represents a lost opportunity, since unique content is a key factor for SEO and rankings.
  • Some duplication may also be viewed as "thin content" or a site trying to game the algorithm.
  • Valuable backlink equity from external websites is diluted, because some of it flows to duplicate pages instead of the authoritative page. This raises the duplicates' potential page authority and puts them in competition with the authoritative page.

Sources of Duplication

  • Pagination
  • Duplicate URLs (session IDs and tracking parameters)
  • Facets
  • Sorts
  • Cross-categories (example: Having one Jeans page listed under the Pants category page and having another Jeans page listed under the Denim category page.)
  • Cross-domain and subdomain duplication
  • Protocol (HTTP vs. HTTPS URLs) and hostname (www vs. non-www) versions of the domain
  • Printer-friendly and/or mobile-friendly versions of content
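
Several of the sources above (session IDs, tracking parameters, protocol and www variants) produce URLs that differ only superficially. As a rough sketch, such URLs can be collapsed to a single normalized form to see which ones serve the same content. The parameter names in IGNORED_PARAMS and the preference for HTTPS and the non-www hostname are assumptions made for illustration; a real site would substitute its own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session and tracking parameters that do not change page content.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url):
    """Collapse protocol, www, and tracking-parameter variants into one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: the non-www hostname is the preferred one
    # Keep only parameters that affect content, in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # Assumption: HTTPS is the preferred protocol; fragments are dropped.
    return urlunsplit(("https", host, path, urlencode(query), ""))
```

URLs that normalize to the same string are candidates for consolidation with one of the solutions described later in this document.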

Identifying Duplicate Content

  • Searching via site: searches
  • Google Search Console under Search Appearance > HTML Improvements, looking specifically for duplicate page titles and meta descriptions.
  • Navigating through the domain to find duplicate forms of the same content
  • Crawling the site using SEO tools, such as ScreamingFrog or DeepCrawl, to identify duplicate page titles and/or content.
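
The kind of check a crawler performs can be sketched in a few lines: group crawled pages by page title and by a hash of their body text, then report any group with more than one URL. This is a minimal illustration and does not describe how ScreamingFrog or DeepCrawl work internally; the (url, title, body) input format and the function name are assumptions.

```python
import hashlib
from collections import defaultdict


def group_duplicates(pages):
    """pages: iterable of (url, title, body) tuples from a crawl.

    Returns (duplicate_titles, duplicate_bodies): a dict mapping each shared
    title to its URLs, and a list of URL groups whose body text is identical.
    """
    by_title = defaultdict(list)
    by_body = defaultdict(list)
    for url, title, body in pages:
        by_title[title.strip().lower()].append(url)
        # Collapse whitespace and case so trivial formatting differences are ignored.
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_body[digest].append(url)
    duplicate_titles = {t: urls for t, urls in by_title.items() if len(urls) > 1}
    duplicate_bodies = [urls for urls in by_body.values() if len(urls) > 1]
    return duplicate_titles, duplicate_bodies
```

Hashing only catches exact duplicates; near duplicates need a similarity measure rather than an equality check.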

Possible Duplication Solutions

  • 301 redirects
  • rel="canonical" tags
  • Meta robots tags (e.g. noindex,follow or noindex,nofollow)
  • Robots.txt disallow statements
  • rel="prev/next" tags (for pagination)
  • Google & Bing parameter handling features (in Google Search Console and Bing Webmaster Tools)
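
When auditing which of these solutions a page already uses, it helps to read the rel="canonical" and meta robots values out of its HTML. The following is a small sketch built on Python's standard html.parser module; the class and function names are illustrative, and it assumes the page's HTML has already been fetched.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the rel="canonical" URL and the meta robots directives of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = attr.get("href")
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.robots = (attr.get("content") or "").lower()


def read_signals(html):
    """Return (canonical_url, robots_directives); either may be None if absent."""
    parser = HeadSignals()
    parser.feed(html)
    return parser.canonical, parser.robots
```

For example, a duplicate page that returns a canonical URL different from its own address is already pointing search engines at another version.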

When to use Canonical tags vs. 301 Redirects?

  • Both canonical tags and 301 redirects transfer equity from one URL to another and generally prevent the first URL (the one that carries the tag or that redirects) from being indexed. However, with canonical tags, pages can still be accessed by users (and crawlers), while redirects send clients to the other page.
    Therefore, when resolving duplication, the choice of implementation depends on the answers to questions such as: Do the duplicate pages need to be accessed by users? Do they serve a specific purpose, such as tracking sessions or sorting products?
  • If used, canonical tags should be placed on duplicate pages and point to the authoritative URL for the content.
  • If canonical tags are used, self-referential canonical tags can (and should) be placed on authoritative pages.
  • Redirects should be used when there is a significant number of instances (duplicate pages) or a significant amount of equity to be salvaged. For example, if a site has duplication caused by using both lowercase and uppercase letters in URLs, it is recommended to 301 redirect all URLs containing uppercase letters to their lowercase equivalents and to update internal links to point to the authoritative pages.
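
The uppercase-to-lowercase example can be sketched as a small function that decides whether a requested URL should be 301 redirected. This is an illustration only: the function name is hypothetical, and in practice the rule would usually live in the web server or application configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(url):
    """Return (301, target_url) if the path contains uppercase letters, else None."""
    parts = urlsplit(url)
    lowered_path = parts.path.lower()
    if lowered_path == parts.path:
        return None  # already the authoritative (lowercase) form
    # The query string is left untouched because parameter values may be case-sensitive.
    target = urlunsplit(
        (parts.scheme, parts.netloc.lower(), lowered_path, parts.query, parts.fragment)
    )
    return 301, target
```

For instance, lowercase_redirect("https://example.com/Jeans/Slim-Fit") returns (301, "https://example.com/jeans/slim-fit"), while a URL that is already lowercase returns None and is served normally.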