{"id":821,"date":"2019-09-09T06:00:10","date_gmt":"2019-09-09T13:00:10","guid":{"rendered":"https:\/\/technicalseo.com\/insights\/?p=821"},"modified":"2019-09-05T08:20:25","modified_gmt":"2019-09-05T15:20:25","slug":"robots-txt-ambiguities","status":"publish","type":"post","link":"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/","title":{"rendered":"Blocked or not? Wildcards, typos and other robots.txt ambiguities"},"content":{"rendered":"\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"1024\" height=\"605\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/robotstxt-testing-tool-1024x605.jpg\" alt=\"robots.txt Testing Tool\" class=\"wp-image-824\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/robotstxt-testing-tool-1024x605.jpg 1024w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/robotstxt-testing-tool-300x177.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/robotstxt-testing-tool-768x453.jpg 768w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/robotstxt-testing-tool.jpg 1594w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Older than most of the search engines we use today, the robots.txt file is a dinosaur of the web. Although there are many articles discussing the protocol and how to use the file, in this post we will cover some ambiguities, corner cases and other undocumented scenarios.<\/p>\n\n\n\n<h2>Why should you care?<\/h2>\n\n\n\n<p>The robots.txt is a convenient, yet complex tool for SEO that can easily cause unexpected results if not handled carefully. While it makes sense  to avoid any typos and ambiguous or conflicting rules when using Allow and Disallow directives, we should also know how search engines behave in these situations. 
<\/p>\n\n\n\n<h2>robots.txt: the puffer fish of the web<\/h2>\n\n\n\n<div class=\"wp-block-columns has-2-columns\">\n<div class=\"wp-block-column\">\n<p>robots.txt is like a puffer fish: delicious, yet toxic. It is safe to eat as long as the SEO chef knows how to cook it. They know the file format, the syntax and the rules of grouping. They understand the order of precedence for user-agents and how URL matching based on path values works.<\/p>\n\n\n\n<p>The Robots Exclusion Protocol (REP) is celebrating 25 years and its specification is finally being formalized <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/webmasters.googleblog.com\/2019\/07\/rep-id.html\" target=\"_blank\">with some new rules<\/a>. Yet, we can still observe misuse and confusion in the SEO and webmaster communities, which, unfortunately, result in sub-optimal performance in search for some websites. <\/p>\n\n\n\n<p>So, let\u2019s clarify and expand on some specific puzzling rules.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"792\" height=\"612\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/puffer-fish.png\" alt=\"\" class=\"wp-image-832\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/puffer-fish.png 792w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/puffer-fish-300x232.png 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/puffer-fish-768x593.png 768w\" sizes=\"(max-width: 792px) 100vw, 792px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<ul><li>The trailing wildcard is ignored<\/li><li>The&nbsp;order of precedence for rules with wildcards is undefined<\/li><li>The&nbsp;path value must start with \u2018\/\u2019 to designate the root<\/li><li>Google doesn\u2019t support the handling of&nbsp;<code>&lt;field&gt;<\/code>&nbsp;elements with simple errors or typos<\/li><\/ul>\n\n\n\n<hr 
class=\"wp-block-separator\"\/>\n\n\n\n<p>Quick note before we start: some screenshots in this post show our <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/technicalseo.com\/tools\/robots-txt\/\" target=\"_blank\">robots.txt validator and testing tool<\/a>, which was built based on the original <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.robotstxt.org\/robotstxt.html\" target=\"_blank\">robots.txt documentation<\/a> and behaves like the following tools and libraries (also used in our research):<\/p>\n\n\n\n<ul><li>Google Search Console\u2019s <a rel=\"noreferrer noopener\" aria-label=\"Google Search Console\u2019s robots.txt Tester (opens in a new tab)\" href=\"https:\/\/www.google.com\/webmasters\/tools\/robots-testing-tool\" target=\"_blank\">robots.txt Tester<\/a><\/li><li><a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/search.google.com\/test\/mobile-friendly\" target=\"_blank\">Mobile-Friendly Test<\/a> and GSC\u2019s URL Inspection Tool, which return an error if the URL is blocked by robots.txt<\/li><li>Google <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/github.com\/google\/robotstxt\" target=\"_blank\">robots.txt Parser and Matcher Library<\/a> that <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/webmasters.googleblog.com\/2019\/07\/repp-oss.html\" target=\"_blank\">is now open source<\/a><\/li><\/ul>\n\n\n\n<p>In addition to those tools, we\u2019ve also used <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/ziyuan.baidu.com\/robots\/index\" target=\"_blank\">Baidu\u2019s robots.txt Tester<\/a>, and confirmed the results of our tests on live URLs (crawled or not) with server log files. 
<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2>1) The trailing wildcard is ignored<\/h2>\n\n\n\n<p>According to Google\u2019s <a href=\"https:\/\/developers.google.com\/search\/reference\/robots_txt\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">robots.txt specifications<\/a>, the trailing wildcard is ignored. <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"856\" height=\"119\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/01-trailing-wildcard-ignored.jpg\" alt=\"\" class=\"wp-image-839\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/01-trailing-wildcard-ignored.jpg 856w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/01-trailing-wildcard-ignored-300x42.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/01-trailing-wildcard-ignored-768x107.jpg 768w\" sizes=\"(max-width: 856px) 100vw, 856px\" \/><figcaption> <em>Source: <a href=\"https:\/\/developers.google.com\/search\/reference\/robots_txt\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">https:\/\/developers.google.com\/search\/reference\/robots_txt<\/a><\/em> <\/figcaption><\/figure>\n\n\n\n<p>But is it, really? Well, yes and no. It is \u201cignored\u201d in the sense that there is an \u201cimplicit\u201d * wildcard at the end of every path that does not end with $ (which is the other wildcard that explicitly designates the end of the URL). <br>Therefore, <code>\/fish = \/fish*<\/code> when it comes to simply matching the URL with the path. <\/p>\n\n\n\n<p>However, the trailing wildcard is not ignored when totaling the path length. This becomes important when Disallow and Allow rules are used concurrently and, while different, both match a particular URL (or set of URLs). 
<\/p>\n\n\n\n<p>The length of the path is extremely important as it\ndetermines the order of precedence when multiple lines (rules) match the URL. <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"1024\" height=\"153\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/02-path-length-1024x153.jpg\" alt=\"\" class=\"wp-image-841\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/02-path-length-1024x153.jpg 1024w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/02-path-length-300x45.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/02-path-length-768x115.jpg 768w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/02-path-length.jpg 1043w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>In the example above, the URL would be blocked from crawling\nsince the Disallow statement is longer than the (also matching) Allow directive.<\/p>\n\n\n\n<p>In the specification, Google states that \u201cin case of conflicting rules [\u2026] the least restrictive rule is used.\u201d In other words, if both matching paths have the same length, the Allow rule will be used. 
<\/p>\n\n\n\n<p>As seen in an example in Google\u2019s documentation\u2026<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"856\" height=\"107\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/03-allow-trumps-disallow.jpg\" alt=\"\" class=\"wp-image-842\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/03-allow-trumps-disallow.jpg 856w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/03-allow-trumps-disallow-300x38.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/03-allow-trumps-disallow-768x96.jpg 768w\" sizes=\"(max-width: 856px) 100vw, 856px\" \/><figcaption> <em>Source: <a rel=\"noreferrer noopener\" href=\"https:\/\/developers.google.com\/search\/reference\/robots_txt\" target=\"_blank\">https:\/\/developers.google.com\/search\/reference\/robots_txt<\/a><\/em> <\/figcaption><\/figure>\n\n\n\n<p>\u2026 and clearly stated in the REP draft. <\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" width=\"592\" height=\"162\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/04-allow-trumps-disallow-rep.jpg\" alt=\"\" class=\"wp-image-843\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/04-allow-trumps-disallow-rep.jpg 592w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/04-allow-trumps-disallow-rep-300x82.jpg 300w\" sizes=\"(max-width: 592px) 100vw, 592px\" \/><figcaption> <em>Source: <a href=\"https:\/\/tools.ietf.org\/html\/draft-rep-wg-topic-00#section-2.2.2\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">https:\/\/tools.ietf.org\/html\/draft-rep-wg-topic-00#section-2.2.2<\/a><\/em> <\/figcaption><\/figure><\/div>\n\n\n\n<p>Now, back to our trailing wildcard. 
If one were to be appended to the \u201closing\u201d statement in our previous examples, the verdicts would be different.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"1024\" height=\"153\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/05-allow-trumps-disallow-1024x153.jpg\" alt=\"\" class=\"wp-image-844\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/05-allow-trumps-disallow-1024x153.jpg 1024w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/05-allow-trumps-disallow-300x45.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/05-allow-trumps-disallow-768x115.jpg 768w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/05-allow-trumps-disallow.jpg 1043w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>With the trailing wildcard, the path for the Allow directive is now <em>as long as<\/em> the path for the Disallow rule, and therefore takes precedence. 
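<\/p>\n\n\n\n<p>As a text-only sketch of the same idea, here is a set of hypothetical rules (not the ones shown in the screenshots) that assumes the length-based precedence described in this section:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Hypothetical rules, tested against the URL \/fish\/salmon\nUser-agent: *\n# Path length: 6 characters\nDisallow: \/fish\/\n# Path length: 6 characters with the trailing *, 5 without it\nAllow: \/fish*<\/code><\/pre>\n\n\n\n<p>Under that assumption, both paths match and tie at six characters, so the Allow rule should win; without the trailing *, the Allow path would be one character shorter and the URL would be expected to be blocked. 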
<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"1024\" height=\"153\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/06-wildcard-path-length-1024x153.jpg\" alt=\"\" class=\"wp-image-845\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/06-wildcard-path-length-1024x153.jpg 1024w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/06-wildcard-path-length-300x45.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/06-wildcard-path-length-768x115.jpg 768w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/06-wildcard-path-length.jpg 1043w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>With the trailing wildcard, the path for the Disallow directive is now <em>longer<\/em> and so is being used, <strong>as the trailing wildcard is not \u201cignored\u201d when determining the most \u201cspecific\u201d match based on length.<\/strong><\/p>\n\n\n\n<div class=\"wp-block-columns has-3-columns\">\n<div class=\"wp-block-column\">\n<div class=\"wp-block-image aligncenter\"><figure class=\"aligncenter\"><img loading=\"lazy\" width=\"360\" height=\"277\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/wildcard-length-test-1.jpg\" alt=\"\" class=\"wp-image-854\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/wildcard-length-test-1.jpg 360w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/wildcard-length-test-1-300x231.jpg 300w\" sizes=\"(max-width: 360px) 100vw, 360px\" \/><figcaption>Search Console&#8217;s robots.txt Tester<\/figcaption><\/figure><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"360\" height=\"298\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/wildcard-length-test-2.jpg\" alt=\"\" class=\"wp-image-855\" 
srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/wildcard-length-test-2.jpg 360w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/wildcard-length-test-2-300x248.jpg 300w\" sizes=\"(max-width: 360px) 100vw, 360px\" \/><figcaption>Search Console&#8217;s URL Inspection Tool<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"360\" height=\"276\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/wildcard-length-test-3.jpg\" alt=\"\" class=\"wp-image-856\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/wildcard-length-test-3.jpg 360w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/wildcard-length-test-3-300x230.jpg 300w\" sizes=\"(max-width: 360px) 100vw, 360px\" \/><figcaption>Google&#8217;s Mobile-Friendly Test<\/figcaption><\/figure>\n<\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2>2) The&nbsp;order of precedence for rules with wildcards is undefined<\/h2>\n\n\n\n<p>Google&#8217;s documentation used to state that &#8220;the order of precedence for rules with wildcards is undefined&#8221;.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"1024\" height=\"188\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/order-precedence-wildcards-undefined-1024x188.jpg\" alt=\"\" class=\"wp-image-889\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/order-precedence-wildcards-undefined-1024x188.jpg 1024w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/order-precedence-wildcards-undefined-300x55.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/order-precedence-wildcards-undefined-768x141.jpg 768w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/order-precedence-wildcards-undefined.jpg 1050w\" 
sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>Source: <a rel=\"noreferrer noopener\" href=\"https:\/\/developers.google.com\/search\/reference\/robots_txt\" target=\"_blank\">https:\/\/developers.google.com\/search\/reference\/robots_txt<\/a><\/em> <em>(updated since this screenshot has been taken)<\/em><\/figcaption><\/figure>\n\n\n\n<p>While that statement has been removed in a recent update of the documentation (July 2019), one of the <em>sample situations <\/em>still shows &#8220;undefined&#8221; as the verdict.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"856\" height=\"107\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/verdict-undefined.jpg\" alt=\"\" class=\"wp-image-892\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/verdict-undefined.jpg 856w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/verdict-undefined-300x38.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/verdict-undefined-768x96.jpg 768w\" sizes=\"(max-width: 856px) 100vw, 856px\" \/><figcaption><em>Source: <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/developers.google.com\/search\/reference\/robots_txt\" target=\"_blank\">https:\/\/developers.google.com\/search\/refere<\/a><a href=\"https:\/\/developers.google.com\/search\/reference\/robots_txt\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"nce\/robots_txt (opens in a new tab)\">nce\/robots_txt<\/a><\/em><\/figcaption><\/figure>\n\n\n\n<p>What does &#8220;undefined&#8221; mean for Googlebot? As far as we know, a URL can or cannot be crawled. In this situation, is \/page.html allowed or disallowed?<\/p>\n\n\n\n<p>As it turns out and according to the tools at our disposal, including Google&#8217;s own parser library, nothing is &#8220;undefined&#8221;. The most specific rule is used. 
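<\/p>\n\n\n\n<p>A text-only illustration, using hypothetical rules modeled on the question above and assuming that path length decides which rule is the most specific:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Hypothetical rules, tested against the URL \/page.html\nUser-agent: *\n# Path length: 5 characters\nAllow: \/page\n# Path length: 7 characters\nDisallow: \/*.html<\/code><\/pre>\n\n\n\n<p>Both rules match the URL, and the Disallow path is the longer of the two, so \/page.html would be expected to be blocked rather than left &#8220;undefined&#8221;. 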
<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"1024\" height=\"153\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-01-1024x153.jpg\" alt=\"\" class=\"wp-image-893\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-01-1024x153.jpg 1024w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-01-300x45.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-01-768x115.jpg 768w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-01.jpg 1043w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>And in case of conflicting rules, the &#8220;least restrictive&#8221; rule (i.e. Allow) is used.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"1024\" height=\"153\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-02-1024x153.jpg\" alt=\"\" class=\"wp-image-894\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-02-1024x153.jpg 1024w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-02-300x45.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-02-768x115.jpg 768w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/undefined-02.jpg 1043w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2>3) The&nbsp;path value must start with &#8216;\/&#8217; to designate the root<\/h2>\n\n\n\n<p>While the documentation also states that &#8220;the path value must start with &#8216;\/&#8217; to designate the root&#8221;, paths starting with the &#8216;*&#8217; wildcard are also taken into consideration. 
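<\/p>\n\n\n\n<p>For instance, a hypothetical rule of the following shape, written without a leading \/, would still be expected to match, assuming the behavior we observed in the tools listed earlier:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Hypothetical rule: the path starts with * instead of \/\nUser-agent: *\nDisallow: *.pdf<\/code><\/pre>\n\n\n\n<p>With this rule in place, a URL such as \/guide.pdf would be expected to be blocked, just as it would be with <code>Disallow: \/*.pdf<\/code>. 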
<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"1024\" height=\"153\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-01-1024x153.jpg\" alt=\"\" class=\"wp-image-897\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-01-1024x153.jpg 1024w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-01-300x45.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-01-768x115.jpg 768w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-01.jpg 1043w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>The path value must start with &#8220;\/&#8221; to designate the root&#8230; or with a * wildcard<\/em><\/figcaption><\/figure>\n\n\n\n<p>Such paths, however, lose a little bit in strength due to the missing character. &nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"1024\" height=\"154\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-02-1024x154.jpg\" alt=\"\" class=\"wp-image-898\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-02-1024x154.jpg 1024w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-02-300x45.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-02-768x116.jpg 768w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/path-start-02.jpg 1043w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>The starting &#8220;\/&#8221; also counts when evaluating the path length<\/em><\/figcaption><\/figure>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2>4) Google doesn&#8217;t support the handling of&nbsp;<code>&lt;field&gt;<\/code>&nbsp;elements with simple errors or typos<\/h2>\n\n\n\n<p>This new statement in the documentation is in direct opposition 
to the production code shared on GitHub.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"819\" height=\"50\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/typos-01.jpg\" alt=\"\" class=\"wp-image-901\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/typos-01.jpg 819w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/typos-01-300x18.jpg 300w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/typos-01-768x47.jpg 768w\" sizes=\"(max-width: 819px) 100vw, 819px\" \/><figcaption><em>Source: <a rel=\"noreferrer noopener\" href=\"https:\/\/developers.google.com\/search\/reference\/robots_txt\" target=\"_blank\">https:\/\/developers.google.com\/search\/reference\/robots_tx<\/a><\/em><\/figcaption><\/figure>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" width=\"596\" height=\"547\" src=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/typos-02.jpg\" alt=\"\" class=\"wp-image-902\" srcset=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/typos-02.jpg 596w, https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/08\/typos-02-300x275.jpg 300w\" sizes=\"(max-width: 596px) 100vw, 596px\" \/><figcaption><em>Source: <a href=\"https:\/\/github.com\/google\/robotstxt\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"https:\/\/github.com\/google\/robotstxt (opens in a new tab)\">https:\/\/github.com\/google\/robotstxt<\/a><\/em><\/figcaption><\/figure><\/div>\n\n\n\n<p>According to this code, Google still accepts simple errors and typos such as a missing hyphen between &#8220;user&#8221; and &#8220;agent&#8221;, an extraneous &#8220;s&#8221; in &#8220;dissallow&#8221; or even a missing colon between the directive and the path (e.g. 
<code>disallow page.html<\/code>).<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>The time and effort spent by Google and others on making the robots.txt protocol an Internet standard, updating the documentation and making their production code open source is truly appreciated. <br>However, some confusion and inconsistencies still exist, and testing remains the only reliable way to be sure of how a given rule behaves. <\/p>\n\n\n\n<p>Additionally, not much has changed. Nothing is &#8220;undefined&#8221;, the length of the path determines how strong the directive is, and wildcards (*) count. You can use them to your advantage (simply add an explicit * at the end of the path to make it stronger). Directives with paths starting with a * also work, and (some) typos still appear to be supported (only by Google, not by Bing or Baidu).<br><br>The only big change is quite ironic: the noindex rule, which was never officially supported, is now <a rel=\"noreferrer noopener\" aria-label=\"officially not supported (opens in a new tab)\" href=\"https:\/\/webmasters.googleblog.com\/2019\/07\/a-note-on-unsupported-rules-in-robotstxt.html\" target=\"_blank\">officially not supported<\/a>. If you&#8217;re currently relying on it, make sure to implement another way to prevent indexing before September 1, 2019.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Special thanks to Merkle&#8217;s Connie Xin for her help and research for this article.<br>If you have any questions, let me know on Twitter: <a rel=\"noreferrer noopener\" aria-label=\"@hermesma (opens in a new tab)\" href=\"https:\/\/twitter.com\/hermesma\" target=\"_blank\">@hermesma<\/a>.<br><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Older than most of the search engines we use today, the robots.txt file is a dinosaur of the web. 
Although [&hellip;]<\/p>\n<a href=\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/\" class=\"read-more\">Read more <span class=\"mdi mdi-trending-neutral\"><\/span><\/a>","protected":false},"author":7,"featured_media":825,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v16.6.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Blocked or not? Wildcards, typos and other robots.txt ambiguities | TechnicalSEO.com<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Blocked or not? Wildcards, typos and other robots.txt ambiguities | TechnicalSEO.com\" \/>\n<meta property=\"og:description\" content=\"Older than most of the search engines we use today, the robots.txt file is a dinosaur of the web. 
Although [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/\" \/>\n<meta property=\"og:site_name\" content=\"TechnicalSEO.com\" \/>\n<meta property=\"article:published_time\" content=\"2019-09-09T13:00:10+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-09-05T15:20:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/cropped-robotstxt-testing-tool.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1594\" \/>\n\t<meta property=\"og:image:height\" content=\"897\" \/>\n<meta name=\"twitter:card\" content=\"summary\" \/>\n<meta name=\"twitter:creator\" content=\"@hermesma\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/technicalseo.com\/insights\/#organization\",\"name\":\"Merkle Inc.\",\"url\":\"https:\/\/technicalseo.com\/insights\/\",\"sameAs\":[],\"logo\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/technicalseo.com\/insights\/#logo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/04\/merkle-bug-192.png\",\"contentUrl\":\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/04\/merkle-bug-192.png\",\"width\":192,\"height\":192,\"caption\":\"Merkle Inc.\"},\"image\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/#logo\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/technicalseo.com\/insights\/#website\",\"url\":\"https:\/\/technicalseo.com\/insights\/\",\"name\":\"TechnicalSEO.com\",\"description\":\"Blog, Podcast &amp; 
Presentations\",\"publisher\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/technicalseo.com\/insights\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/cropped-robotstxt-testing-tool.jpg\",\"contentUrl\":\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/cropped-robotstxt-testing-tool.jpg\",\"width\":1594,\"height\":897},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/#webpage\",\"url\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/\",\"name\":\"Blocked or not? Wildcards, typos and other robots.txt ambiguities | 
TechnicalSEO.com\",\"isPartOf\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/#primaryimage\"},\"datePublished\":\"2019-09-09T13:00:10+00:00\",\"dateModified\":\"2019-09-05T15:20:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Insights\",\"item\":\"https:\/\/technicalseo.com\/insights\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Blog\",\"item\":\"https:\/\/technicalseo.com\/insights\/blog\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Blocked or not? Wildcards, typos and other robots.txt ambiguities\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/#webpage\"},\"author\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/#\/schema\/person\/fb596e49364f81ed59cf70aaeb7ffdd3\"},\"headline\":\"Blocked or not? 
Wildcards, typos and other robots.txt ambiguities\",\"datePublished\":\"2019-09-09T13:00:10+00:00\",\"dateModified\":\"2019-09-05T15:20:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/#webpage\"},\"wordCount\":1197,\"publisher\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/#organization\"},\"image\":{\"@id\":\"https:\/\/technicalseo.com\/insights\/blog\/robots-txt-ambiguities\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/cropped-robotstxt-testing-tool.jpg\",\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/technicalseo.com\/insights\/#\/schema\/person\/fb596e49364f81ed59cf70aaeb7ffdd3\",\"name\":\"Hermes Ma\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/technicalseo.com\/insights\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/hermes-ma-150x150.jpg\",\"contentUrl\":\"https:\/\/technicalseo.com\/insights\/wp-content\/uploads\/2019\/07\/hermes-ma-150x150.jpg\",\"caption\":\"Hermes Ma\"},\"sameAs\":[\"https:\/\/twitter.com\/hermesma\"],\"url\":\"https:\/\/technicalseo.com\/insights\/author\/hma\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/posts\/821"}],"collection":[{"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/comments?post=821"}],"version-history":[{"count":65,"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/posts\/821\/revisions"}],"predecessor-version":[{"id":997,"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/posts\/821\/revisions\/997"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/media\/825"}],"wp:attachment":[{"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/media?parent=821"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/categories?post=821"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/technicalseo.com\/insights\/wp-json\/wp\/v2\/tags?post=821"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}