What is Duplicate Content? Explaining SEO Penalties, Disadvantages, and Solutions


    Duplicate content refers to the situation where multiple pages contain “exactly the same” or “highly similar” content.

    Many people may associate the term “duplicate” with “copied content” unlawfully taken from other sites.

    However, the duplicate content that poses an issue for SEO is not maliciously copied content but rather the kind of duplication that occurs naturally in site management.

    Also, many web administrators believe “duplicate content = penalty,” but in reality, duplication that occurs naturally is not penalized.

    As such, “duplicate content” in SEO is often misunderstood.

    Since duplicate content can also affect SEO, web administrators should use this article to better understand how to handle it properly.

    Topics in this Article

    • What is Duplicate Content?
    • Is Duplicate Content a Penalty in SEO?
    • SEO Drawbacks of Duplicate Content [Why You Should Address It]
    • How to Deal with Duplicate Content
    • How to Check for Duplicate Content
    • Common Questions about Duplicate Content and SEO

    Web administrators with questions like “What are the standards for duplicate content?” or “Is it okay to reuse template text?” should definitely check this out.

    What is Duplicate Content?

    Duplicate content refers to the situation where multiple pages contain “exactly the same” or “highly similar” content.

    Duplicate content can occur both within your own site and between different sites.

    Google defines duplicate content as follows:

    When a single page can be accessed via multiple URLs or when content from different pages is similar (for example, when a page has both mobile and PC versions with different URLs), Google considers these pages as duplicate versions of the same page.

    Source: Consolidating Duplicate URLs and Using Canonical Tags | Google Search Central

    In simple terms, this is when “different URLs display the same (or similar) content.”

    As mentioned earlier, Google’s use of the term “duplicate” does not typically refer to malicious “copied content” or “spam content.”

    Duplicate content can be categorized into the following three types:

    Three Types of Duplicate Content

    • Completely Identical Duplicate Content
    • Partially Identical Duplicate Content
    • Cross-domain Duplicate Content

    Let’s explain each type.

    Completely Identical Duplicate Content

    “Completely identical” duplicate content occurs when two pages are 100% the same.

    The only difference is the URL.

    This often happens due to technical factors, like whether the URL includes “www” or whether it uses “http” or “https.”

    Partially Identical Duplicate Content

    “Partially identical” duplicate content refers to cases where two pages differ slightly in content.

    Only part of the text, images, or design differs. For example, product pages on an e-commerce site that differ only in their images are considered partially identical duplicate content.

    Cross-domain Duplicate Content

    “Cross-domain” duplicate content refers to when the same content is published on different websites.

    For example, this happens when content is distributed through another media platform (such as Yahoo! JAPAN) or when content is reposted without permission.

    Is Duplicate Content a Penalty in SEO?

    To get straight to the point, unless there is a spammy intent to manipulate search results, duplicate content does not incur a penalty.

    Google explains this in their official documentation as follows:

    It’s normal for sites to have some duplicate content, and this does not violate Google’s Spam Policies.

    Source: What is URL Canonicalization | Google Search Central

    When duplicate content is detected, Google will display one version as the “canonical” in search results while trying to avoid showing the other.

    This is not a penalty but rather an algorithmic adjustment.

    The idea is: “Having the same content in search results is inconvenient for users, so let’s focus on one version.”

    While a site that is largely made up of duplicate content can be problematic, penalties are not imposed for naturally occurring duplicate content on a typical site.

    Note: Malicious Copied Content is Subject to Penalty

    Content copied from other sites without permission is subject to penalties. Avoid mass-producing pages purely for ranking purposes without adding unique value. Reposting content without permission is also legally unacceptable.

    Is It Okay to Reuse Text and Images? [Does Duplication Have Negative Effects?]

    When managing a website, you often find yourself reusing some text or images.

    This type of duplication does not incur a penalty, nor is it a negative ranking factor.

    For example, even if you reuse text from one page on your homepage or another page, it won’t result in lower rankings.
    Reference: Google: Duplicate Content is Not a Negative Ranking Factor – Search Engine Journal

    However, it’s important to note that the more duplication there is, the less original content a page might have.

    Google values unique content and tends to rank pages that offer it more highly.

    When a page is created using partially duplicated content, there may be fewer original elements, making it harder to rank higher.

    Instead of viewing duplication itself as a negative, think of it as “duplicate parts are ignored, so it’s crucial to add unique elements beyond those parts.”

    SEO Disadvantages of Duplicate Content [Why It Should Be Addressed]

    As mentioned earlier, duplicate content does not directly lower your Google ranking.

    However, leaving it unaddressed can cause SEO issues.

    This section explains three reasons why addressing duplicate content is important for SEO.

    SEO Disadvantages of Duplicate Content

    • Disadvantage 1: Link equity gets diluted
    • Disadvantage 2: Wasting crawler resources
    • Disadvantage 3: Intended pages may not appear in search results

    Disadvantage 1: Link Equity Gets Diluted

    The first disadvantage is that link equity can become diluted.

    For example, let’s assume “Page A” and “Page B” are duplicate content.

    In this case, both “Page A” and “Page B” may acquire backlinks, which can split the link equity between the two.

    By designating one as the canonical page, you can consolidate the link equity and efficiently boost your Google ranking.
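
    The consolidation idea can be illustrated with a deliberately simplified sketch. It assumes backlink counts stand in for “link equity” and that equity simply adds up once duplicates point to one canonical URL; Google's actual signal handling is not public and is certainly not a plain sum. The function name, URLs, and counts are all hypothetical.

```python
from collections import Counter
from typing import Dict


def consolidate_backlinks(
    backlinks: Dict[str, int], canonical_of: Dict[str, str]
) -> Dict[str, int]:
    """Sum backlink counts per canonical URL.

    backlinks:    URL -> number of backlinks it has earned (hypothetical data)
    canonical_of: duplicate URL -> the canonical URL chosen for it
    """
    totals: Counter = Counter()
    for url, count in backlinks.items():
        # URLs without an entry in canonical_of count as their own canonical.
        totals[canonical_of.get(url, url)] += count
    return dict(totals)


# Hypothetical example: Page A and Page B are duplicates of each other.
split = {"https://example.com/page-a": 6, "https://example.com/page-b": 4}
merged = consolidate_backlinks(
    split, {"https://example.com/page-b": "https://example.com/page-a"}
)
```

    In this toy model, the six and four backlinks that were split across the two pages are all credited to Page A once it is designated as the canonical page.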

    Disadvantage 2: Wasting Crawler Resources

    The second disadvantage is wasting crawler resources.

    In SEO, it’s crucial to get new or updated content crawled as quickly as possible.

    If too much time is spent crawling duplicate content, it can delay the crawling of more important pages.

    While this may not be a problem for small sites, larger sites may experience reduced crawl frequency on important pages, causing indexing issues.

    Disadvantage 3: Intended Pages May Not Appear in Search Results

    The third disadvantage is that the intended page might not appear in search results.

    When Google detects duplicate content, it automatically selects one page as the “canonical” version, while the other version is suppressed from search results.

    If you want a specific page to appear in search results, you’ll need to perform “URL canonicalization” to notify Google of the preferred URL.

    What is URL Canonicalization?

    URL canonicalization is the process of selecting and consolidating the correct URL that Google should evaluate. Methods include 301 redirects and canonical tags.

    How to Handle Duplicate Content

    Here is the process for handling duplicate content.

    Process for Addressing Duplicate Content

    1. Identify duplicate content
    2. Select a canonical page
    3. Take necessary actions

    Here are five common methods for addressing duplicate content:

    Methods to Address Duplicate Content

    • Method 1: 301 Redirect
    • Method 2: Setting Up Canonical Tags
    • Method 3: Annotation Settings
    • Method 4: Reducing Duplicate Content
    • Method 5: Request Action from Content Syndication Sites

    Not all methods are appropriate in every case; the right approach depends on the cause of the duplication.

    Make sure to review each and take appropriate action based on your site’s duplication issues.

    Method 1: 301 Redirect

    A 301 redirect permanently forwards requests for one URL to another URL that you specify.

    It is used when you want to display only the canonical page to users.

    For example, if duplication occurs due to the following reasons, you can use a 301 redirect to address it:

    • Presence or absence of “www”
    • “http” vs “https”
    • Presence or absence of “/index.html” at the end of the URL
    • Presence or absence of a trailing “/” at the end of the URL

    For instance, if there is duplication between “http://example.com” and “https://example.com”, you should redirect the “http page (non-canonical)” to the “https page (canonical).”
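
    As a minimal sketch of the redirect decision, the function below maps each of the URL variants listed above to a single canonical form. The policy it encodes (https, no “www”, no “/index.html”, no trailing slash except on the root) is an assumption chosen for illustration, and the function name is hypothetical. In practice the same rule is usually written in the web server or CDN configuration.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed site policy for this sketch: https, bare domain, clean paths.
CANONICAL_SCHEME = "https"


def canonical_redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if `url` is already canonical."""
    parts = urlsplit(url)

    # "www" vs. no "www": this sketch treats the bare domain as canonical.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # "/index.html" vs. directory URL: drop the file name.
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]

    # Trailing "/": keep it only for the site root.
    path = path.rstrip("/") or "/"

    # Fragments are never sent to the server, so they are dropped here.
    target = urlunsplit((CANONICAL_SCHEME, host, path, parts.query, ""))
    return None if target == url else target
```

    With this policy, “http://www.example.com/shop/index.html” would be redirected to “https://example.com/shop”, while “https://example.com/shop” needs no redirect. A server applying the rule would answer non-canonical requests with status 301 and the returned URL in the Location header.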

    Method 2: Setting Up Canonical Tags

    Canonical tags are used when you want to show both pages to users.

    By using canonical tags, you can leave both pages up while informing Google which page is the canonical one (the one you want to be evaluated).

    For example, use this when duplication occurs for reasons such as:

    • Highly similar product pages on an e-commerce site (e.g., different colors)
    • Different URLs for PC and mobile versions
    • Presence or absence of URL parameters
    • Having a print version of the web page

    Add a link element with rel="canonical" to the head of the non-canonical page, and set its href to the URL of the canonical page.
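
    For illustration, here is a small sketch that builds the tag to place in the head of a non-canonical page. The helper name and the example URLs are hypothetical; the only fixed part is the link element with rel="canonical" and an href pointing at the canonical page. An absolute URL is used because it leaves no ambiguity about which page is meant.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link> element that goes inside <head> on a non-canonical page."""
    # escape() protects characters such as "&" and quotes inside the attribute value.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Hypothetical example: a colour-variant page points at the main product page.
tag_for_red_variant = canonical_link_tag("https://example.com/shirts/oxford")
```

    Here the page for the red variant stays available to users, and the tag tells Google that the main product page is the one to evaluate.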

    Method 3: Annotation Settings

    Annotation refers to the process of informing Google that different URLs exist for different devices.

    If your URLs differ between the PC and mobile versions, set up canonical and alternate tags.

    Here’s how:

    • On the PC version, add an “alternate” tag to indicate that a mobile version exists
    • On the mobile version, add a “canonical” tag to indicate that the PC version is the canonical URL

    Reference: Mobile-First Indexing Best Practices | Google Search Central
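
    The two tags can be sketched as a pair of helpers, one for each version of the page. The helper names and URLs are hypothetical, and the media query is only an example value; use whatever breakpoint matches your own mobile site.

```python
from html import escape

# Example breakpoint only; adjust to the mobile site's real layout.
MOBILE_MEDIA_QUERY = "only screen and (max-width: 640px)"


def desktop_head_tag(mobile_url: str) -> str:
    """Tag for the PC page: announces that a mobile version exists."""
    return (
        f'<link rel="alternate" media="{MOBILE_MEDIA_QUERY}" '
        f'href="{escape(mobile_url, quote=True)}">'
    )


def mobile_head_tag(desktop_url: str) -> str:
    """Tag for the mobile page: names the PC page as the canonical URL."""
    return f'<link rel="canonical" href="{escape(desktop_url, quote=True)}">'


# Hypothetical pair of URLs for the same page.
pc_tag = desktop_head_tag("https://m.example.com/page-1")
mobile_tag = mobile_head_tag("https://www.example.com/page-1")
```

    The annotations only work as a pair: each PC page points to its own mobile counterpart, and that mobile page points back to the same PC page.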

    However, Google recommends responsive design, which adapts to all devices rather than having separate URLs for each, so you might want to consider this approach.

    Method 4: Reducing Duplicate Content

    If you’ve created multiple similar content pieces, it’s a good idea to reduce them by:

    • Merging two pieces of content into one
    • Adding original content

    For instance, if a travel site has separate pages for two cities and the content is similar, you can either combine them into one page or add original content to each page.

    Method 5: Request Action from Content Syndication Sites

    Syndicating your content to external media is referred to as “content syndication.”

    You can prevent duplication issues from content syndication by discussing it with the distribution sites in advance.

    Ask the distribution site to add a “noindex” tag to their articles.

    If you’re wondering, “Won’t adding a noindex tag make syndicating to external sites pointless?”, it still holds value for driving traffic via social media and internal links within the media.
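
    Once a partner says the tag is in place, it can be worth confirming. The following is a rough sketch that checks a page's HTML for a robots meta tag containing “noindex”, using only Python's standard library. The class and function names are hypothetical, and the check is deliberately narrow: it does not look at the X-Robots-Tag HTTP header or at crawler-specific meta tags, which can also carry the directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # The content attribute is a comma-separated list such as "noindex, follow".
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with "noindex"."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

    Passing the HTML of a syndicated copy to has_noindex() gives a quick yes or no before relying on the arrangement.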

    Note

    Previously, it was recommended to add a “rel=canonical” tag to the distribution site’s article, canonicalizing the original article as the best method to resolve duplication. However, in May 2023, Google updated its documentation regarding syndicated content. It now recommends blocking the indexing of syndicated content without using canonical tags.
    Reference: Troubleshooting Canonicalization Issues | Google Search Central

    To counter duplication from content syndication, you can also try to get your page indexed faster than the distribution site or simply avoid syndicating the content altogether.

    If your content has been reprinted on another site without your permission, use the following methods to resolve the issue.

    If Your Content is Reposted Without Permission

    You can file a copyright infringement claim with Google. Submit your application using the form below.
    Report Copyright Infringement: Web Search – Google

    How to Check for Duplicate Content

    You can check for duplicate content using “Google Search Console.”

    1. Click on “Pages” under “Indexing” in the left panel of the dashboard.

    2. If there is duplicate content, a message will appear under “Why pages weren’t indexed.” By clicking on it, you can check which URLs are causing duplication.

    For detailed information on each message, refer to Google’s help page on the “Page Indexing Report.”

    You can also use the “URL Inspection Tool” to investigate which page Google considers the canonical version.
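
    Search Console shows what Google has detected. As a rough local complement (not a Google tool and not part of the workflow above), completely identical pages among URLs you have already fetched can be grouped by hashing their text. The function name and the sample data are hypothetical, and the sketch only catches exact matches after whitespace and case are normalized; it says nothing about partially identical pages.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_exact_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose page text is identical after light normalization.

    pages: URL -> extracted page text (fetching and text extraction are out of scope).
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        # Collapse runs of whitespace and ignore case so trivial differences do not matter.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups with two or more URLs are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical sample: the http and https versions serve the same text.
sample_pages = {
    "http://example.com/": "Welcome to  our shop",
    "https://example.com/": "welcome to our shop",
    "https://example.com/about": "About us",
}
duplicate_groups = group_exact_duplicates(sample_pages)
```

    Each group returned is a set of candidates for choosing one canonical URL and applying a redirect or canonical tag to the rest.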

    Common Questions About Duplicate Content and SEO

    Here are some common questions regarding duplicate content and SEO. We hope you find them helpful.

    1. What is the percentage that qualifies content as duplicate?

    There is no specific percentage that qualifies content as duplicate.

    John Mueller from Google responded as follows in a Twitter interaction:

    User: Is there a percentage that defines duplicate content? For example, should we aim for at least 72.6% of a page to be unique compared to other pages on the site? Does Google measure this percentage?

    Mueller: There is no such number (how would you even measure that?).

    2. Is it okay for terms and conditions or disclaimers to be duplicated?

    Duplicate content in terms and conditions, disclaimers, or similar standard legal texts is not a problem.

    The same goes for shipping and payment information on e-commerce sites.

    Google understands that these kinds of templates are standard across the web and handles them appropriately.

    It only becomes an issue if you have numerous pages made up entirely of such templates without any unique content. Otherwise, there’s no need to worry.
    Reference: How does required duplicate content (terms and conditions, etc.) affect search? – YouTube (2013)

    3. Does providing the same content in different formats count as duplication?

    If the formats differ, such as a YouTube video and a blog post, the same content will not be considered duplicate.

    They will be evaluated as separate content pieces.

    Providing information in different formats allows you to reach a broader audience, and Google actually encourages content distribution across multiple formats.
    Reference: Google: Same Content in Different Formats is Not Duplicate – Search Engine Journal

    4. Is it duplicate content if we publish the same content on both our e-commerce site and Rakuten?

    If you run your own e-commerce site and publish the same content on Rakuten or Amazon, it will be considered duplicate content.

    In a 2013 talk, Google’s Matt Cutts addressed this issue.

    It will be considered duplicate content. While it’s not a penalty, one of the versions is unlikely to appear in search results.

    Source: How to Avoid Duplicate Content for Your E-commerce Site on Rakuten – International SEO Blog

    For strategies to avoid duplication when selling the same product across multiple platforms, refer to Kenichi Suzuki’s blog above.

    5. Is it okay to use noindex to handle duplicate content?

    Some people use the noindex tag to avoid duplicate content, but Google does not recommend it.

    This is because noindex removes the page from the index entirely instead of consolidating its value onto another page.

    For example, if you want to boost the ranking of Page A but Page B (which has duplicate content) is ranking higher, you should use rel="canonical" on Page B instead of noindex.

    This will consolidate the ranking signals onto Page A.

    If you’re absolutely certain that the page in question holds no value, you can use noindex to handle duplicate content, but it should be a last resort.

    [Summary] Duplicate Content is Not a Penalty! Implement Measures to Win in SEO

    As long as duplicate content isn’t malicious, it won’t result in penalties.

    That said, it can still be detrimental to SEO for the following reasons:

    SEO Disadvantages of Duplicate Content

    • Disadvantage 1: Link equity is divided
    • Disadvantage 2: Wasted crawler resources
    • Disadvantage 3: The intended page may not appear in search results

    If you find duplicate content on your site, you can use the methods introduced in this article to address it.

    Methods to Address Duplicate Content

    • Method 1: Implement 301 redirects
    • Method 2: Use canonical tags
    • Method 3: Set up annotations
    • Method 4: Delete or consolidate similar content
    • Method 5: Request adjustments from content distribution platforms

    If you’re struggling with ranking even after resolving duplicate content issues, feel free to contact us for assistance.