What is Duplicate Content?

Duplicate content is content that appears on the Internet in more than one place (URL). When identical content lives at multiple URLs, it is difficult for search engines to decide which version is more relevant to a given search query. To provide the best search experience, search engines rarely show multiple duplicate pieces of content and are therefore forced to choose the version that is most likely to be the original, or the best.

The Three Biggest Issues with Duplicate Content



  1. Search engines don't know which version(s) to include/exclude from their indices
  2. Search engines don't know whether to direct the link metrics (trust, authority, anchor text, link juice, etc.) to one page, or keep them separated between multiple versions
  3. Search engines don't know which version(s) to rank for query results

When duplicate content is present, site owners suffer rankings and traffic losses, and search engines provide less relevant results.


Duplicate Content Examples:


1. URL Parameters


URL parameters such as click tracking and some analytics code can cause duplicate content issues.
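For example (using hypothetical URLs), a tracking parameter can make the same page available at two addresses:

http://www.example.com/widgets
http://www.example.com/widgets?source=newsletter

Both URLs return identical content, so search engines may index them as two separate, duplicate pages.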



2. Printer-Friendly


Printer-friendly versions of content can cause duplicate content issues when multiple versions of the pages get indexed.
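For instance (hypothetical URLs), the same article might be reachable at both a standard and a printer-friendly address:

http://www.example.com/articles/raincoat-guide
http://www.example.com/print/articles/raincoat-guide

If both versions are crawlable and indexable, the printer-friendly copy competes with the original as duplicate content.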



3. Session IDs


Session IDs are a common duplicate content creator. This occurs when each user that visits a website is assigned a different session ID that is stored in the URL.
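For example (hypothetical URLs), two visitors to the same page might generate:

http://www.example.com/shop?sessionid=12345
http://www.example.com/shop?sessionid=67890

Every new session creates another URL for identical content, which can produce an effectively unlimited number of duplicates.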


SEO Best Practice


Whenever content on a site can be found at multiple URLs, it should be canonicalized for search engines. This can be accomplished by using a 301 redirect to the correct URL, by using the rel=canonical tag (see below) or, in some cases, by using the Parameter Handling tool in Google Webmaster Central.

301 Redirect


In many cases, the best way to combat duplicate content is to set up a 301 redirect from the "duplicate" page to the original content page. When multiple pages with the potential to rank well are combined into a single page, they no longer compete with one another and instead create a stronger relevancy and popularity signal overall. This positively impacts their ability to rank well in the search engines.
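At the HTTP level, a 301 redirect is simply a response status plus a Location header pointing to the canonical URL. A simplified exchange, using hypothetical URLs, looks like this:

GET /page.html HTTP/1.1
Host: example.com

HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/page.html

How the redirect is configured depends on the web server or platform (for example, Apache, nginx, or a CMS plugin), but the result is the same: visitors and search engine crawlers are sent to the canonical URL, and ranking signals are consolidated there.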



Rel="canonical"


Another option for dealing with duplicate content is to utilize the rel=canonical tag. The rel=canonical tag passes the same amount of link juice (ranking power) as a 301 redirect, and often takes up much less development time to implement.

The tag is part of the HTML head of a web page. The link tag itself isn't new; like nofollow, it simply uses a new rel parameter. For example:

<link href="http://www.example.com/canonical-version-of-page/" rel="canonical" />

This tag tells Bing and Google that the given page should be treated as though it were a copy of the URL www.example.com/canonical-version-of-page/ and that all of the links and content metrics the engines apply should actually be credited toward the provided URL.


The following examples show capitalization errors that cause duplicate content:


  1. http://www.simplyhired.com/a/jobs/list/q-software+developer
  2. http://www.simplyhired.com/a/jobs/list/q-Software+developer
  3. http://www.simplyhired.com/a/jobs/list/q-software+Developer

The only difference between these URLs is the capitalization of the words "software" and "developer." A search engine would see all three of these URLs as different pages and would treat them as duplicate content. By implementing the rel="canonical" tag on the 2nd and 3rd instances, pointing back to the 1st URL, the search engines would know to treat all of those pages as if they were URL #1.
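In this case, URLs #2 and #3 would each carry a canonical tag in their HTML head pointing to URL #1:

<link rel="canonical" href="http://www.simplyhired.com/a/jobs/list/q-software+developer" />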

noindex, follow


The meta robots tag with the values "noindex, follow" can be implemented on pages that shouldn't be included in a search engine's index. This allows search engine bots to crawl the links on the specified page while keeping the page itself out of their indices. This works particularly well with pagination issues.
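For example, page two of a paginated archive (a hypothetical URL such as http://www.example.com/blog/page/2/) could include the following tag in its head, so its links are still crawled but the page itself stays out of the index:

<meta name="robots" content="noindex, follow" />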

Parameter Handling in Google Webmaster Tools


Google Webmaster Tools allows you to set the preferred domain of your site and handle various URL parameters differently. The main drawback to these methods is that they only work for Google. Any change you make here will not affect Bing or any other search engine's settings.

Set Preferred Domain


This should be set for all sites. It is a simple way to tell Google whether a given site should be shown with or without a www in the search engine result pages.

Additional Methods for Removing Duplicate Content



  1. Maintain consistency when linking internally throughout a website (see the example after this list). For example, if a webmaster determines that the canonical version of a domain is www.example.com/, then all internal links should go to http://www.example.com/example.html rather than http://example.com/example.html (notice the absence of www).
  2. When syndicating content, make sure the syndicating website adds a link back to the original content. See Dealing With Duplicate Content for more information.
  3. Minimize similar content. For example, rather than having one page about raincoats for boys and another page about raincoats for girls that share 95% of the same content, consider expanding those pages to include distinct, relevant content for each URL. Alternatively, a webmaster could combine the two pages into a single page that is highly relevant for children's raincoats.
  4. Remove duplicate content from search engines’ indices by noindex-ing with meta robots or through removal via Webmaster Tools (Google and Bing).
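As a sketch of point 1 (using the hypothetical page from that example), every internal link should use the chosen canonical domain:

<a href="http://www.example.com/example.html">Example page</a>

Mixing www and non-www versions in internal links spreads link signals across two URLs for the same page.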

Rel=Canonical Code Sample


<head>
  <link rel="canonical" href="https://moz.com/blog/" />
</head>

Meta Robots Code Sample


<head>
  <meta name="robots" content="noindex, follow" />
</head>

Related Tools


Xenu Link Sleuth
Xenu's Link Sleuth (TM) checks websites for broken links and reports other helpful SEO metrics.

External Resources


Duplicate content - Google Technical Support
Google's official documentation on duplicate content.
Parameter Handling in Google Webmaster Tools
Search Engine Land's coverage on parameter handling.

Related Guides


The Beginner's Guide to SEO
Moz’s comprehensive guide to the practice of search engine optimization for those unfamiliar with the subject.

Contact Us

Tronic Global
http://www.tronicglobal.com
165 – C, Ekta Enclave, Peera Garhi, New Delhi
info@tronicglobal.com
+91 – 9599871018
+91-11252 76470


