Index Coverage Status report

30/11/2019

Use this report to learn which of your pages have been indexed, and how to fix pages that could not be indexed. Each bar in the graph represents the total number of URLs in a particular status (valid, error, and so on) as known by Google.

Sharing the report

You can share issue details by clicking the Share button on the page. This link grants access only to the current page, plus any validation history pages for this issue, to anyone with the link. It does not grant access to other pages for your resource, or enable the shared user to perform any actions on your property or account. You can revoke the link at any time by disabling sharing for this page.

What to look for

Ideally you should see a gradually increasing count of valid indexed pages as your site grows.

If you see a spike in indexing errors, this might be caused by a change in your template that introduces a new error, or you might have submitted a sitemap that includes URLs that are blocked to crawling (for example, by robots.txt or noindex, or a login requirement).
If you see a drop in total indexed pages without corresponding errors, this might mean that you are blocking access to your existing pages (via robots.txt, 'noindex', or requiring auth). If that is not the issue, look at the excluded issues, sorted by number of pages affected, to see what might be causing this drop.
If you have a significant number of pages that are not indexed and you think they should be, look at excluded URLs for clues. You might be blocking many of your pages with robots.txt rules or noindex directives.
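
One way to look for accidental robots.txt blocks is to test your submitted URLs against your rules locally. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and URL list are hypothetical, and the standard-library parser does not necessarily match Googlebot's rule matching in every detail (for example, wildcard handling), so treat the result as a hint rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and sitemap URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

SITEMAP_URLS = [
    "https://www.example.com/shoes",
    "https://www.example.com/private/report",
]


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


print(blocked_urls(ROBOTS_TXT, SITEMAP_URLS))
# ['https://www.example.com/private/report']
```

Any URL this prints that also appears in a submitted sitemap is a candidate for the "blocked by robots.txt" errors described later in this document.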

How are these URLs found? Google discovers URLs through many means, most commonly by following links from crawled pages or by sitemaps. Sometimes these links are wrong (and can lead to 404s on your site). Sometimes the page existed, and has disappeared. But once Google has heard of a URL it will continue to try to crawl it for a while. That's natural; if you want to prevent that, you can block indexing, block access, or use a 301 redirect (where appropriate).

Top level report

The top level report shows the index status of all pages that Google has attempted to crawl on your site, grouped by status and reason.

Status

Each page can have one of the following general status classes:

Error: The page has not been indexed. See the specific error type description below to learn more, and how to fix the error. You should concentrate on these issues first.
Warning: Page is indexed, or was until recently, and has an issue that you should be aware of.
Excluded: The page is not included in the index for reasons that you typically cannot affect. The page might be in an intermediate stage of the indexing process, or is deliberately excluded by you (for example by a noindex directive) and is therefore behaving as expected.
Valid: The page was indexed.

Reason

Each status (valid, warning, error, excluded) can have a specific reason. Data in the table is grouped by reason; each row can describe one or more URLs. See Status type descriptions below for a description of each status type, and how to handle it.

Validation

The status of a user-initiated validation flow for this issue. You should prioritize issues that are failed or not started.

URL discovery dropdown filter

The dropdown filter above the chart enables you to filter index results by the mechanism through which Google discovered the URL. The following values are available:

All known pages [Default] - Show all URLs discovered by Google through any means.
All submitted pages - Show only pages submitted in a sitemap, whether through Search Console, a robots.txt file, or a sitemap ping.
Specific sitemap URL - Show only URLs listed in a specific sitemap that was submitted using Search Console. If it is a sitemap index, all URLs in any included sitemaps are reported.

A URL is considered submitted by a sitemap even if it was also discovered through some other mechanism (for example, by organic crawling from another page).

Drilldown report by status and reason

Clicking on a row in the top level report shows details for a specific status type. The drilldown report contains the following information:

A graph showing URLs by general status (valid, error, warning, excluded).
A table showing the URLs by status type, and the last time that URL was crawled.

Important: Seeing a URL marked with an issue that you've already fixed? Perhaps you fixed the issue AFTER the last Google crawl. Therefore, if you see a URL with an issue that you have fixed, be sure to check the crawl date for that URL:

If the URL was recrawled after your fix, we couldn't confirm your fix. Check and confirm your fix and wait for a recrawl.
If the URL was crawled before the fix, either wait for Google to recrawl the page, or click "start fixing" (if displayed) and fix the issue using the issue management flow.

Troubleshooting your pages

See if you can find any correspondence between the total number of indexing errors or total indexed count and the sparkline for a specific error as a clue to which issue might be affecting your total error or total indexed page count.
Fix issues:
1. The table of URLs, grouped by status and reason, is sorted by a combination of severity, number of affected pages, and whether the issue is currently being validated. We recommend addressing issues in the default order shown.
2. If there is an increase in errors, look for frequency spikes in the row that happened at the same time as any error spikes in the top chart, and click the row to learn more in the drilldown report (described next).
3. Click an error row to get to the drilldown page with more information (see below). Read the description about the specific error type to learn how to handle it best.
4. Fix all instances of each reason, and request validation by clicking Validate Fix in the drilldown for that reason. Read more about validation.
5. You'll get notifications as your validation proceeds, but you can check back after a few days to see whether your error count has gone down.
Periodically remove the filter for excluded URLs, sort them by number of affected pages, and scan them for any unwanted issues.

Fixing server errors

A server error means that Googlebot couldn't access your URL, the request timed out, or your site was busy. As a result, Googlebot was forced to abandon the request.

Testing server connectivity

You can use the URL Inspection tool to see if you can reproduce a server error reported by the Index Coverage Status report.

Fixing server connectivity errors

Reduce excessive page loading for dynamic page requests.
A site that delivers the same content for multiple URLs is considered to deliver content dynamically (e.g. www.example.com/shoes.php?color=red&size=7 serves the same content as www.example.com/shoes.php?size=7&color=red). Dynamic pages can take too long to respond, resulting in timeout issues. Or, the server might return an overloaded status to ask Googlebot to crawl the site more slowly. In general, we recommend keeping parameters short and using them sparingly. If you're confident about how parameters work for your site, you can tell Google how we should handle these parameters.
Make sure your site's hosting server is not down, overloaded, or misconfigured.
If connection, timeout, or response problems persist, check with your web hoster and consider increasing your site's ability to handle traffic.
Check that you are not inadvertently blocking Google.
You might be blocking Google due to a system level issue, such as a DNS configuration issue, a misconfigured firewall or DoS protection system, or a content management system configuration. Protection systems are an important part of good hosting and are often configured to automatically block unusually high levels of server requests. However, because Googlebot often makes more requests than a human user, it can trigger these protection systems, causing them to block Googlebot and prevent it from crawling your website. To fix such issues, identify which part of your website's infrastructure is blocking Googlebot and remove the block. The firewall may not be under your control, so you may need to discuss this with your hosting provider.
Control search engine site crawling and indexing wisely.
Some webmasters intentionally prevent Googlebot from reaching their websites, perhaps using a firewall as described above. In these cases, usually the intent is not to entirely block Googlebot, but to control how the site is crawled and indexed. If this applies to you, check the following:
- To control Googlebot's crawling of your content, use a robots.txt file and configure URL parameters.
- If you're worried about rogue bots using the Googlebot user-agent, you can verify whether a crawler is actually Googlebot.
If you would like to change how frequently Googlebot crawls your site, you can request a change in Googlebot's crawl rate. Hosting providers can verify ownership of their IP addresses too.
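
To illustrate the point above about multiple URLs serving the same content, here is a small sketch that normalizes the order of query parameters so that equivalent URLs compare equal. The `https://` scheme is an assumption added to the example URLs from the text, and the function name is hypothetical. This is only one way a site might detect such duplicates in its own logs, not a description of how Google handles parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_query_order(url):
    """Return url with its query parameters sorted by name, then by value."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


a = normalize_query_order("https://www.example.com/shoes.php?color=red&size=7")
b = normalize_query_order("https://www.example.com/shoes.php?size=7&color=red")
print(a == b)  # True
```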

Fixing 404 errors

Most 404 errors don't affect your site's ranking in Google, so you can safely ignore them. Typically, they are caused by typos, site misconfigurations, or by Google's increased efforts to recognize and crawl links in embedded content such as JavaScript. Here are some pointers to help you investigate and fix 404 errors:

Decide if it's worth fixing. Many (most?) 404 errors are not worth fixing. Sort your 404s by priority and fix the ones that need to be fixed. You can ignore the other ones, because 404s don't harm your site's indexing or ranking. Here's why:
- If it is a deleted page that has no replacement or equivalent, returning a 404 is the right thing to do.
- If it is a bad URL generated by a script, or one that has never existed on your site, it's probably not a problem you need to worry about. It might bother you to see it on your report, but you don't need to fix it, unless the URL is a commonly misspelled link (see below).
See where the invalid links live. Click a URL to see Linked from these pages information. Your fix will depend on whether the link is coming from your own or from another site:
1. Fix links from your own site to missing pages, or delete them if appropriate.
  - If the content has moved, add a redirect.
  - If you have permanently deleted content without intending to replace it with newer, related content, let the old URL return a 404 or 410. Currently Google treats 410s (Gone) the same as 404s (Not found). Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic. Such pages are called soft 404s, and can be confusing to both users and search engines.
  - If the URL is unknown: You might occasionally see 404 errors for URLs that never existed on your site. These unexpected URLs might be generated by Googlebot trying to follow links found in JavaScript, Flash files, or other embedded content, or possibly that exist only in a sitemap. For example, your site may use code like this to track file downloads in Google Analytics:
```
  <a href="helloworld.pdf" onClick="_gaq.push(['_trackPageview','/download-helloworld']);">Hello World PDF</a>
```
    When Googlebot sees this code, it might try to crawl the URL http://www.example.com/download-helloworld, even though it's not a real page. In this case, the link may appear as a 404 (Not Found) error in the Crawl Errors report. Google is working to prevent this type of crawl error. This error has no effect on the crawling or ranking of your site.
2. Fix misspelled links from other sites with 301 redirects. For example, a misspelling of a legitimate URL (www.example.com/redshoos instead of www.example.com/redshoes) probably happened when someone linking to your site simply made a typo. In this case, you can capture that misspelled URL by creating a 301 redirect to the correct URL. You can also contact the webmaster of a site with an incorrect link, and ask for the link to be updated or removed.
Ignore the rest of the errors. Don't create fake content, redirect to your homepage, or use robots.txt to block those URLs—all of these things make it harder for us to recognize your site’s structure and process it properly. We call these soft 404 errors. Note that clicking This issue is fixed in the Crawl Errors report only temporarily hides the 404 error; the error will reappear the next time Google tries to crawl that URL. (Once Google has successfully crawled a URL, it can try to crawl that URL forever. Issuing a 300-level redirect will delay the recrawl attempt, possibly for a very long time.) Note that submitting a URL removal request using the URL removal tool will not remove the error from this report.

If you don't recognize a URL on your site, you can ignore it. These errors occur when someone browses to a non-existent URL on your site - perhaps someone mistyped a URL in the browser, or someone mistyped a link URL. However, you might want to catch some of these mistyped URLs as described in the list above.
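
The status-code advice in this section can be summarized in a short sketch: moved or commonly misspelled URLs get a 301 redirect, permanently deleted content returns 410 (or 404), and everything else returns a plain 404 rather than a redirect to the homepage. The path tables and the `resolve` function below are hypothetical; a real site would derive them from its own server or CMS configuration.

```python
# Hypothetical URL tables, for illustration only.
LIVE = {"/redshoes", "/catalog"}
MOVED = {"/redshoos": "/redshoes", "/old-catalog": "/catalog"}
DELETED = {"/discontinued-sandals"}


def resolve(path):
    """Return (status_code, redirect_target_or_None) for a request path."""
    if path in LIVE:
        return 200, None
    if path in MOVED:
        return 301, MOVED[path]  # content moved, or a common misspelling
    if path in DELETED:
        return 410, None         # permanently removed, no replacement
    return 404, None             # unknown URL: do not redirect to the homepage


for p in ["/redshoes", "/redshoos", "/discontinued-sandals", "/no-such-page"]:
    print(p, resolve(p))
```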

Status reasons

Here are the possible reasons for each of your pages.

Submitted vs not submitted

Any time you see an index result that uses the word "Submitted", it means that you have explicitly asked Google to index the URL by submitting it in a sitemap.
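
For reference, a sitemap is an XML file that lists the URLs you want to submit. The sketch below builds a minimal one with Python's standard library, using the namespace from the sitemaps.org protocol; the URLs and the `build_sitemap` name are illustrative assumptions, and real sitemaps often carry additional optional fields.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/redshoes",
]))
```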

Error

Pages with errors have not been indexed.

Server error (5xx): Your server returned a 500-level error when the page was requested. See Fixing server errors.

Redirect error: The URL had a redirect error. It could be one of the following types: a redirect chain that was too long; a redirect loop; a redirect URL that eventually exceeded the maximum URL length; or a bad or empty URL in the redirect chain.
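
To make these redirect failure types concrete, the sketch below walks a table of redirects and classifies the outcome. The `redirects` mapping, the function name, and the `MAX_HOPS` limit are assumptions for illustration; this document does not state the limits Google actually applies, and the URL-length case is omitted for brevity.

```python
MAX_HOPS = 5  # assumed limit for this sketch only


def check_redirects(start, redirects, max_hops=MAX_HOPS):
    """Follow start through the redirects map and classify the outcome."""
    seen = {start}
    url = start
    hops = 0
    while url in redirects:
        target = redirects[url]
        if not target:
            return "bad or empty URL in chain"
        if target in seen:
            return "redirect loop"
        hops += 1
        if hops > max_hops:
            return "chain too long"
        seen.add(target)
        url = target
    return "ok"


print(check_redirects("/a", {"/a": "/b", "/b": "/a"}))  # redirect loop
print(check_redirects("/a", {"/a": "/b"}))              # ok
```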

Submitted URL blocked by robots.txt: You submitted this page for indexing, but the page is blocked by robots.txt. Try testing your page using the robots.txt tester.

Submitted URL marked ‘noindex’: You submitted this page for indexing, but the page has a 'noindex' directive either in a meta tag or HTTP header. If you want this page to be indexed, you must remove the tag or HTTP header.

Submitted URL seems to be a Soft 404: You submitted this page for indexing, but the server returned what seems to be a soft 404.

Submitted URL returns unauthorized request (401): You submitted this page for indexing, but Google got a 401 (not authorized) response. Either remove authorization requirements for this page, or else allow Googlebot to access your pages by verifying its identity.
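
If you choose to let Googlebot through an access check, you first need to confirm that a request really comes from Googlebot. A commonly described approach is a reverse DNS lookup on the requesting IP, a check of the resulting host name, and a forward lookup to confirm it maps back to the same IP. The sketch below follows that approach; the host name suffixes are an assumption to verify against Google's current guidance, the lookup functions are injectable so the logic can be exercised without network access, and the default forward lookup handles IPv4 only.

```python
import socket

# Assumed suffix list; confirm against Google's current documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check the host name suffix, then forward-confirm it."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The host name must resolve back to the original IP address.
        return ip in forward_lookup(host)
    except OSError:
        # DNS failures are treated as "not verified".
        return False
```

Checking the user-agent string alone is not enough, because any client can send a Googlebot user-agent.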

Submitted URL not found (404): You submitted a non-existent URL for indexing. See Fixing 404 errors.

Submitted URL has crawl issue: You submitted this page for indexing, and Google encountered an unspecified crawling error that doesn't fall into any of the other reasons. Try debugging your page using the URL Inspection tool.

Warning

Pages with a warning status might require your attention, and may or may not have been indexed, according to the specific result.

Indexed, though blocked by robots.txt: The page was indexed, despite being blocked by robots.txt (Google always respects robots.txt, but this doesn't help if someone else links to it). This is marked as a warning because we're not sure if you intended to block the page from search results. If you do want to block this page, robots.txt is not the correct mechanism to avoid being indexed. To avoid being indexed you should either use 'noindex' or prohibit anonymous access to the page using auth. You can use the robots.txt tester to determine which rule is blocking this page. Because of the robots.txt, any snippet shown for the page will probably be sub-optimal. If you do not want to block this page, update your robots.txt file to unblock your page.
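
Since 'noindex' can be set either in a meta tag or in an HTTP header, it can help to check both places when auditing a page. The sketch below does this with Python's standard `html.parser`; the function and class names are hypothetical, and the substring match is a simplification (for example, it does not interpret header values scoped to a specific user agent).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.directives.append((attrs.get("content") or "").lower())


def has_noindex(html, headers):
    """True if noindex appears in a robots meta tag or an X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value.lower()
    return "noindex" in header_value or any("noindex" in d for d in parser.directives)


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page, {}))                                  # True
print(has_noindex("<p>hello</p>", {"X-Robots-Tag": "noindex"}))  # True
print(has_noindex("<p>hello</p>", {}))                        # False
```

Keep in mind that Googlebot can only see a 'noindex' directive if the page is not blocked by robots.txt.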

Valid

Pages with a valid status have been indexed.

Submitted and indexed: You submitted the URL for indexing, and it was indexed.

Indexed, not submitted in sitemap: The URL was discovered by Google and indexed. We recommend submitting all important URLs using a sitemap.

Indexed; consider marking as canonical: The URL was indexed. It has duplicate URLs, but we consider this one to be canonical. It is not explicitly marked as canonical, and so we recommend explicitly marking it as canonical.

Excluded

These pages are typically not indexed, and we think that was your intention.

Excluded by ‘noindex’ tag: When Google tried to index the page it encountered a 'noindex' directive and therefore did not index it. If you do not want this page indexed, congratulations! If you do want this page to be indexed, you should remove that 'noindex' directive.

Blocked by page removal tool: The page is currently blocked by a URL removal request. If you are a verified site owner, you can use the URL removals tool to see who submitted a URL removal request. Removal requests are only good for a specified period of time (see the linked documentation). After that period, Googlebot may go back and index the page even if you do not submit another index request. If you don't want the page indexed, use 'noindex', require authorization for the page, or remove the page.

Blocked by robots.txt: This page was blocked to Googlebot with a robots.txt file. You can verify this using the robots.txt tester. Note that this does not mean that the page won't be indexed through some other means. If Google can find other information about this page without loading it, the page could still be indexed (though this is less common). To ensure that a page is not indexed by Google, remove the robots.txt block and use a 'noindex' directive.

Blocked due to unauthorized request (401): The page was blocked to Googlebot by a request for authorization (401 response). If you do want Googlebot to be able to crawl this page, either remove authorization requirements, or allow Googlebot to access your page.

Crawl anomaly: An unspecified anomaly occurred when fetching this URL. This could mean a 4xx- or 5xx-level response code; try fetching the page using Fetch as Google to see if it encounters any fetch issues. The page was not indexed.

Crawled - currently not indexed: The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling.

Discovered - currently not indexed: The page was found by Google, but not crawled yet. Typically, Google tried to crawl the URL but the site was overloaded; therefore Google had to reschedule the crawl. This is why the last crawl date is empty on the report.

Alternate page with proper canonical tag: This page is a duplicate of a page that Google recognizes as canonical. This page correctly points to the canonical page, so there is nothing for you to do.

Duplicate without user-selected canonical: This page has duplicates, none of which is marked canonical. We think this page is not the canonical one. You should explicitly mark the canonical for this page. Inspecting this URL should show the Google-selected canonical URL.

Duplicate non-HTML page: This non-HTML page (for example, a PDF file) is a duplicate of another page that Google has marked as canonical. Typically only the canonical URL will be shown in Google Search. If you like, you can specify a canonical page using the Link HTTP header in a response.
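
As an illustration of the Link header mentioned above, the sketch below formats a header value that marks a canonical URL for a non-HTML file. The URL and function name are hypothetical; how you attach the header depends on your web server or framework.

```python
def canonical_link_header(canonical_url):
    """Return a (name, value) pair for an HTTP Link header marking the canonical URL."""
    return "Link", f'<{canonical_url}>; rel="canonical"'


name, value = canonical_link_header("https://www.example.com/downloads/white-paper.pdf")
print(f"{name}: {value}")
# Link: <https://www.example.com/downloads/white-paper.pdf>; rel="canonical"
```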

Duplicate, Google chose different canonical than user: This page is marked as canonical for a set of pages, but Google thinks another URL makes a better canonical. Google has indexed the page that we consider canonical rather than this one. We recommend that you explicitly mark this page as a duplicate of the canonical URL. This page was discovered without an explicit crawl request. Inspecting this URL should show the Google-selected canonical URL.

Not found (404): This page returned a 404 error when requested. Google discovered this URL without any explicit request or sitemap. Google might have discovered the URL as a link from another site, or possibly the page existed before and was deleted. Googlebot will probably continue to try this URL for some period of time; there is no way to tell Googlebot to permanently forget a URL, although it will crawl it less and less often. 404 responses are not a problem, if intentional. If your page has moved, use a 301 redirect to the new location. See Fixing 404 errors.

Page removed because of legal complaint: The page was removed from the index because of a legal complaint.

Page with redirect: The URL is a redirect, and therefore was not added to the index.

Queued for crawling: The page is in the crawling queue; check back in a few days to see if it has been crawled.

Soft 404: The page request returns what we think is a soft 404 response. This means that it returns a user-friendly "not found" message without a corresponding 404 response code. We recommend returning a 404 response code for truly "not found" pages, or adding more information to the page to let us know that it is not a soft 404.
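
If you want to scan your own pages for likely soft 404s, a rough heuristic is to flag responses that return status 200 but whose text reads like an error page. The sketch below is only that: the phrase list and function name are assumptions, and it does not reflect how Google actually detects soft 404s.

```python
# Assumed phrases that suggest an error page; adjust for your own site.
NOT_FOUND_PHRASES = ("page not found", "no longer available", "does not exist")


def looks_like_soft_404(status_code, body_text):
    """Flag a 200 response whose text reads like a 'not found' page."""
    if status_code != 200:
        return False  # a real 404 or 410 is not a soft 404
    text = body_text.lower()
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)


print(looks_like_soft_404(200, "Sorry, page not found."))  # True
print(looks_like_soft_404(404, "Sorry, page not found."))  # False
print(looks_like_soft_404(200, "Red shoes, size 7"))       # False
```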

Submitted URL dropped: You submitted this page for indexing, but it was dropped from the index for an unspecified reason.

Duplicate, submitted URL not selected as canonical: The URL is one of a set of duplicate URLs without an explicitly marked canonical page. You explicitly asked this URL to be indexed, but because it is a duplicate, and Google thinks that another URL is a better candidate for canonical, Google did not index this URL. Instead, we indexed the canonical that we selected. (Google only indexes the canonical in a set of duplicates.) The difference between this status and "Google chose different canonical than user" is that here you have explicitly requested indexing. Inspecting this URL should show the Google-selected canonical URL.

About validation

After you fix all instances of a specific issue on your site, you can ask Google to validate your changes. If all known instances are gone, the issue is marked as fixed in the status table and dropped to the bottom of the table. Search Console tracks the validation state of the issue as a whole, as well as the state of each instance of the issue. When all instances of the issue are gone, the issue is considered fixed. (For actual states recorded, see Issue validation state and Instance validation state.)

More about issue lifetime...

An issue's lifetime extends from the first time any instance of that issue was detected on your site until 90 days after the last instance was marked as gone from your site. If ninety days pass without any recurrences, the issue is removed from the report history.

The issue's first detected date is the first time the issue was detected during the issue's lifetime, and does not change. Therefore:

If all instances of an issue are fixed, but a new instance of the issue occurs 15 days later, the issue is marked as open, and "first detected" date remains the original date.
If the same issue occurs 91 days after the last instance was fixed, the previous issue was closed, and so this is recorded as a new issue, with the first detected date set to "today".
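
The two cases above amount to a simple date comparison, sketched below. The function name is hypothetical, and the behavior at exactly 90 days is an assumption, since the text only gives the 15-day and 91-day examples.

```python
from datetime import date, timedelta

ISSUE_LIFETIME = timedelta(days=90)


def first_detected_after_recurrence(first_detected, last_fixed, recurrence):
    """Return the 'first detected' date reported when an issue reappears."""
    if recurrence - last_fixed > ISSUE_LIFETIME:
        return recurrence   # the old issue was closed; this is a new issue
    return first_detected   # the issue is reopened; the original date is kept


first = date(2019, 1, 1)
fixed = date(2019, 3, 1)
print(first_detected_after_recurrence(first, fixed, fixed + timedelta(days=15)))  # 2019-01-01
print(first_detected_after_recurrence(first, fixed, fixed + timedelta(days=91)))  # 2019-05-31
```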

Basic validation flow

Here is an overview of the validation process after you click Validate Fix for an issue. This process can take several days, and you will receive progress notifications by email.

When you click Validate Fix, Search Console immediately checks a few pages.
- If the current instance exists in any of these pages, validation ends, and the validation state remains unchanged.
- If the sample pages do not have the current error, validation continues with state Started. If validation finds other unrelated issues, these issues are counted against that other issue type and validation continues.
Search Console works through the list of known URLs affected by this issue. Only URLs with known instances of this issue are queued for recrawling, not the whole site. Search Console keeps a record of all URLs checked in the validation history, which can be reached from the issue details page.
When a URL is checked:
1. If the issue is not found, the instance validation state changes to Passing. If this is the first instance checked after validation has started, the issue validation state changes to Looking good.
2. If the URL is no longer reachable, the instance validation state changes to Other (which is not an error state).
3. If the instance is still present, issue state changes to Failed and validation ends. If this is a new page discovered by normal crawling, it is considered another instance of this existing issue.
When all error and warning URLs have been checked and the issue count is 0, the issue state changes to Passed. Important: Even when the number of affected pages drops to 0 and issue state changes to Passed, the original severity label will still be shown (Error or Warning).
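
The flow above can be read as a small state machine. The sketch below models it under simplifying assumptions: the initial sample check is skipped, each URL yields one of three hypothetical results ("fixed", "unreachable", "present"), and the issue moves to Looking good on the first instance found fixed. It is an illustration of the described states, not Search Console's actual implementation.

```python
def run_validation(results):
    """Walk (url, result) pairs and return (issue_state_history, instance_states).

    result is "fixed", "unreachable", or "present" (hypothetical labels).
    """
    issue_states = ["Started"]
    instance_states = {}
    for url, result in results:
        if result == "present":
            instance_states[url] = "Failed"
            issue_states.append("Failed")
            return issue_states, instance_states  # validation ends here
        if result == "unreachable":
            instance_states[url] = "Other"        # not an error state
        else:
            instance_states[url] = "Passing"
            if issue_states[-1] == "Started":
                issue_states.append("Looking good")
    issue_states.append("Passed")                 # every URL checked, none failing
    return issue_states, instance_states


print(run_validation([("/a", "fixed"), ("/b", "unreachable")]))
print(run_validation([("/a", "fixed"), ("/b", "present"), ("/c", "fixed")]))
```

In the second call, "/c" is never checked, because validation ends as soon as one instance is found to be still present.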

Even if you never click "start validation" Google can detect fixed instances of an issue. If Google detects that all instances of an issue have been fixed during its regular crawl, it will change the issue state to "N/A" on the report.

When is an issue considered "fixed" for a URL or item?

An issue is marked as fixed for a URL or item when either of the following conditions is met:

When the URL is crawled and the issue is no longer found on the page. For an AMP tag error, this can mean that you either fixed the tag or that the tag has been removed (if the tag is not required). During a validation attempt, it will be considered as "passed."
If the page is not available to Google for any reason (page removed, marked noindex, requires authentication, and so on), the issue will be considered as fixed for that URL. During a validation attempt, it is counted in the "other" validation state.

Revalidation

When you click Revalidate for a failed validation, validation restarts for all failed instances, plus any new instances of this issue discovered through normal crawling.

You should wait for a validation cycle to complete before requesting another cycle, even if you have fixed some issues during the current cycle.

Instances that have passed validation (marked Passed) or are no longer reachable (marked Other) are not checked again, and are removed from the history when you click Revalidate.

Validation history

You can see the progress of a validation request by clicking the validation details link in the issue details page.

Entries in the validation history page are grouped by URL for the AMP report and Index Status report. In the Mobile Usability and Rich Result reports, items are grouped by the combination of URL + structured data item (as determined by the item's Name value). The validation state applies to the specific issue that you are examining. You can have one issue labeled "Passed" on a page, but other issues labeled "Failed", "Pending," or "Other".

Issue validation state

The following validation states apply to a given issue:

Not started: There are one or more pages with an instance of this issue that you have never begun a validation attempt for. Next steps:
1. Click into the issue to learn the details of the error. Inspect the individual pages to see examples of the error on the live page using the AMP Test. (If the AMP Test does not show the error on the page, it is because you fixed the error on the live page after Google found the error and generated this issue report.)
2. Click "Learn more" on the details page to see the details of the rule that was violated.
3. Click an example URL row in the table to get details on that specific error.
4. Fix your pages and then click Validate fix to have Google recrawl your pages. Google will notify you about the progress of the validation. Validation takes anywhere from a few days up to about two weeks, so please be patient.
Started: You have begun a validation attempt and no remaining instances of the issue have been found yet. Next step: Google will send notifications as validation proceeds, telling you what to do, if necessary.
Looking good: You started a validation attempt, and all issue instances that have been checked so far have been fixed. Next step: Nothing to do, but Google will send notifications as validation proceeds, telling you what to do.
Passed: All known instances of the issue are gone (or the affected URL is no longer available). You must have clicked "Validate fix" to get to this state (if instances disappeared without you requesting validation, state would change to N/A). Next step: Nothing more to do.
N/A: Google found that the issue was fixed on all URLs, even though you never started a validation attempt. Next step: Nothing more to do.
Failed: A certain threshold of pages still contain this issue, after you clicked "Validate." Next steps: Fix the issue and revalidate.

Instance validation state

After validation has been requested, every known issue instance is assigned one of the following validation states for a specific issue (the states Passed and Other are not used in the Index Status report):

Pending validation: Queued for validation. The last time Google looked, this issue instance existed.
Passed: Google checked for the issue instance and it no longer exists. Can reach this state only if you explicitly clicked Validate for this issue instance.
Failed: Google checked for the issue instance and it's still there. Can reach this state only if you explicitly clicked Validate for this issue instance.
Other: Google couldn't reach the URL hosting the instance, or (for structured data) couldn't find the item on the page any more. Considered equivalent to Passed.

Note that the same URL can have different states for different issues. For example, if a single page has both issue X and issue Y, issue X can be in validation state Passed and issue Y on the same page can be in validation state Pending.

Known issues

The following are known issues in this beta version of the new Search Console. No need to report them to us, but we'd love your feedback on any other features or issues you spot. Use the Feedback mechanism in the navigation bar.

Indexing data isn't updated daily, and so the data may be a few days delayed, and some data-points are interpolated.
Charts should cover the last 90 days, but currently might show less.
The sitemaps dropdown filter includes only sitemaps submitted using Search Console or robots.txt directives.
The status list is being refined, and might change, for example:
- Items labeled Error mix different types of responses (4xx/5xx).
- You can ignore “Dropped for unspecified reasons” or “Other” items.
Clicking on a specific reason row now directs you to tools in the old Search Console; we hope to do better in the future.
The mobile experience is still a work in progress.
Property Sets and mobile app properties are not yet supported.

* Source: Google Search Console