Learn the impact of duplicate URLs

30/11/2019

Some sites make the same content available at different URLs by using session IDs or other URL parameters. A session ID is an identifier appended to a URL that creates a customized view of the page for the visitor who corresponds to that ID. For example, session IDs allow a shopping site to differentiate between customers so that each person can see what is in their shopping cart while browsing the site catalogue.

URL parameters, meanwhile, are less tied to identifying individual customers. For example, when a customer searches for "puppies" on a pet store site, she can filter or sort the results by age, breed, coloring, and price range. Each combination of filters produces a different URL, because the filters append new parameters to the original URL to change what the customer sees, even though the resulting pages typically contain similar or duplicate results.
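To make the effect of filters concrete, here is a minimal Python sketch of how a search page might build its URLs. The search path, the `q` parameter, and the `breed` and `sort` filter names are assumptions made up for illustration; a real site could use entirely different names.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical pet store search page; the path and parameter names are illustrative.
BASE_URL = "http://www.example.com/search"


def build_filtered_url(query, **filters):
    """Return the search URL for a query plus any filter or sort parameters."""
    params = {"q": query, **filters}
    return f"{BASE_URL}?{urlencode(params)}"


plain = build_filtered_url("puppies")
by_breed = build_filtered_url("puppies", breed="beagle")
by_breed_sorted = build_filtered_url("puppies", breed="beagle", sort="price")

# Three distinct URLs that would list largely overlapping results.
for url in (plain, by_breed, by_breed_sorted):
    print(url, parse_qs(urlsplit(url).query))
```

Every extra filter or sort option multiplies the number of distinct URLs that lead to similar content.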

 Example

The following URLs point to the same content: a collection of green dresses, although some of these pages might be organized or filtered slightly differently.

 http://www.example.com/products/women/dresses/green.htm
 http://www.example.com/products/women?category=dresses&color=green
 http://example.com/shop/index.php?product_id=32&highlight=green+dress&cat_id=1&sessionid=123&affid=431

When Google detects duplicate content, such as the pages in the example above, a Google algorithm groups the duplicate URLs into one cluster and selects what it considers the best URL to represent the cluster in search results (for example, Google might select the URL with the most content). Google then tries to consolidate what it knows about the URLs in the cluster, such as link popularity, onto that one representative URL, which ultimately improves the accuracy of page ranking and results in Google Search.
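The idea of clustering can be illustrated with a rough Python sketch. This is not Google's algorithm, which is not described here. It assumes that the `sessionid` and `affid` parameters never change page content, and it picks the shortest URL as the representative purely as a stand-in heuristic. The function names and sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these parameters never change what the page shows.
IGNORED_PARAMS = {"sessionid", "affid"}


def cluster_key(url):
    """Normalize a URL by dropping ignored parameters and sorting the rest."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def cluster_urls(urls):
    """Group URLs that normalize to the same key."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[cluster_key(url)].append(url)
    return dict(clusters)


def pick_representative(cluster):
    """Stand-in heuristic (an assumption): prefer the shortest URL."""
    return min(cluster, key=len)


urls = [
    "http://example.com/shop/index.php?product_id=32&cat_id=1&sessionid=123&affid=431",
    "http://example.com/shop/index.php?cat_id=1&product_id=32&sessionid=456",
    "http://example.com/shop/index.php?product_id=32&cat_id=1",
    "http://example.com/shop/index.php?product_id=33&cat_id=1",
]

clusters = cluster_urls(urls)
for key, members in clusters.items():
    print(pick_representative(members), "represents", len(members), "URL(s)")
```

The first three URLs differ only in session or affiliate parameters and in parameter order, so they fall into one cluster. The fourth names a different product and stays separate.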

However, when Google can't find all the URLs in a cluster or is unable to select the representative URL that you prefer, you can use the URL Parameters tool to give Google information about how to handle URLs containing specific parameters.

Use the URL Parameters tool with caution. If you incorrectly tell Google which content is duplicate and should not be crawled, Google might stop crawling pages that you want to be available on Google Search.

For example, if you tell Google to crawl URLs with the food parameter only when the value is food=savory, Google might not crawl a URL that contains food=sweet. As a result, the pages on your site about sweets might not be findable in Google Search.
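The consequence of such a rule can be shown with a small Python sketch. It models the rule as a hypothetical allow-list of parameter values and does not reflect how the URL Parameters tool is implemented. The `ALLOWED_VALUES` table, the `should_crawl` function, and the recipe URLs are assumptions for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical rule: crawl URLs with a `food` parameter only when its value is "savory".
ALLOWED_VALUES = {"food": {"savory"}}


def should_crawl(url):
    """Return False if any restricted parameter carries a value outside its allow-list."""
    params = parse_qs(urlsplit(url).query)
    for name, allowed in ALLOWED_VALUES.items():
        values = params.get(name)
        if values is not None and not set(values) <= allowed:
            return False
    # URLs without a restricted parameter are unaffected by the rule.
    return True


for url in (
    "http://www.example.com/recipes?food=savory",
    "http://www.example.com/recipes?food=sweet",
    "http://www.example.com/recipes",
):
    print(url, "->", "crawl" if should_crawl(url) else "skip")
```

Under this rule the food=sweet page is skipped even though it holds different content, which is exactly the kind of mistake the caution above is about.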

* Source: Google Search Console