Robots.txt to Disallow or Block Subdomains for Google and Other Search Engines

If you are here, you are looking for a guide to disallow or block your subdomain (subdomain.example.com) for Google and other search engines such as Bing, DuckDuckGo and Yahoo. Follow the instructions below to stop search engine bots and crawlers from crawling your subdomains.

What is a subdomain?

A subdomain is an extension of your domain: a part of a larger domain in the Domain Name System (DNS) hierarchy. The hostname in a URL (Uniform Resource Locator) is divided into multiple parts, and a subdomain is one of these parts. Subdomains are used to organize and categorize different sections or functions of a website or network. They are added to the left of the main domain name and separated from it by a dot (period).

Example of a subdomain?

amazon.com (Main domain)
aws.amazon.com (A subdomain for Amazon's cloud business)

Do you want to disallow all of your subdomains, or just one of them? Either way, follow the steps below one by one.

To keep in mind – how search engines treat your subdomains

Search engines such as Google, Bing and DuckDuckGo treat each subdomain as a separate site. A robots.txt file applies only to the host it is served from, so a file on the root domain does not cover its subdomains, and a file on one subdomain does not cover the others. You have to upload a separate robots.txt file to each domain or subdomain, and within each file you can disallow the whole host, a directory, or a wildcard pattern.
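The one-file-per-host rule can be illustrated with a minimal Python sketch. The helper name `robots_url` and the page paths are hypothetical, chosen only for illustration; the hostnames are the ones used in the case list of this guide.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URLs on three different hosts.
pages = [
    "https://theproche.com/about",
    "https://dev.theproche.com/login",
    "https://blog.theproche.com/wp-admin/index.php",
]

for page in pages:
    # Each host resolves to its own robots.txt location.
    print(page, "->", robots_url(page))
```

Each page maps to a robots.txt on its own host, which is why blocking dev.theproche.com requires a file uploaded to that subdomain rather than to the root domain.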

Follow the steps below – case based

theproche.com (Don't want to block)
dev.theproche.com (Want to block)
staging.theproche.com (Want to block)
blog.theproche.com (Want to block comments and admin section)

theproche.com (Don’t want to block)

If you want your root domain to be open to all crawlers, you do not need to do anything with your robots.txt file. If no robots.txt is uploaded, search engines treat the whole site as allowed by default. Alternatively, you can upload the following file, which allows everything explicitly.

User-agent: *
Disallow: 
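As a rough check of how a parser reads an empty Disallow line, here is a small sketch using Python's standard `urllib.robotparser`. The helper `allowed`, the test URL, and the use of an empty string as a stand-in for a host without any rules are assumptions made for illustration; real search engines use their own parsers.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="Googlebot"):
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"  # the file shown above
NO_RULES = ""  # stand-in for a host that publishes no rules

url = "https://theproche.com/any/page.html"  # hypothetical page
print("explicit allow-all:", allowed(ALLOW_ALL, url))
print("no rules at all:   ", allowed(NO_RULES, url))
```

Under these assumptions both cases come out as allowed, which matches the default behaviour described above.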

dev.theproche.com (You want to block this subdomain)

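Placing the robots.txt file below at dev.theproche.com/robots.txt blocks the whole subdomain from being crawled.

User-agent: *
Disallow: /

If you also want to keep pages out of search results, add the noindex meta tags below to the <head> section of each page's HTML; they do not belong in robots.txt. Keep in mind that a crawler can only read a noindex tag on a page it is allowed to crawl, so the tag has no effect on URLs that robots.txt already blocks.

<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">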
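The two mechanisms in this section can be checked separately with a short Python sketch built on the standard library. The names `crawl_allowed`, `RobotsMetaFinder` and `has_noindex`, as well as the sample page, are hypothetical helpers written for this illustration and are not part of any search engine's tooling.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def crawl_allowed(robots_txt, url, agent="Googlebot"):
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


class RobotsMetaFinder(HTMLParser):
    """Collect the content of robots/googlebot meta tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            self.directives[name] = (attr.get("content") or "").lower()


def has_noindex(html):
    """Return True if any robots meta tag in html contains noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in value for value in finder.directives.values())


# Hypothetical page carrying the meta tag from this section.
page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print("crawl allowed:", crawl_allowed(BLOCK_ALL, "https://dev.theproche.com/index.html"))
print("has noindex:  ", has_noindex(page))
```

The first check concerns robots.txt on the host, the second concerns the HTML of an individual page, so they are configured in different places.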

blog.theproche.com (Only want to block comments and admin section)

If you run a website with an admin section that holds sensitive information, we recommend blocking every directory that you do not want crawled and scraped by web bots.

Bloggers often worry about spam comments, which can hurt their website's reputation, so we recommend blocking the comments directory as well. Use the following code to do so. This file blocks only the listed directories for all bots, not the whole website.

User-agent: *
Disallow: /wp-admin/
Disallow: /comments/
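To see which URLs these two rules affect, here is a minimal sketch with Python's standard `urllib.robotparser`. The sample paths and the `Bingbot` user agent string are assumptions picked for illustration, and a real crawler may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

BLOG_RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(BLOG_RULES.splitlines())

# Hypothetical paths on the blog subdomain.
paths = ["/", "/my-first-post/", "/wp-admin/options.php", "/comments/feed/"]

results = {
    path: parser.can_fetch("Bingbot", "https://blog.theproche.com" + path)
    for path in paths
}

for path, ok in results.items():
    print(f"{path:<25} {'allowed' if ok else 'blocked'}")
```

Only the paths under /wp-admin/ and /comments/ are reported as blocked; the home page and ordinary posts stay crawlable.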

If your website is custom coded, ask your developer about its information architecture so that you know which directories to list.

Where do I find my robots.txt file?

Your robots.txt file is located in the root directory of your website, which in most cases is the public_html folder of your Linux server. To check whether you have a robots.txt file on your server, open your website in a browser and add /robots.txt to the URL. The path will look like the one below. If the file exists it will open; otherwise you will get a 404 Not Found page.

www.example.com/robots.txt
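The same check can be scripted. The sketch below uses Python's standard `urllib.request`; the function names `fetch_status` and `robots_txt_exists` are hypothetical, and it assumes the site is served over HTTPS.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url):
    """Return the HTTP status code for url (performs a real request)."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is on the exception.
        return err.code


def robots_txt_exists(host, get_status=fetch_status):
    """Return True if host serves /robots.txt, False if it answers 404."""
    status = get_status(f"https://{host}/robots.txt")
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected HTTP status {status} for {host}")
```

Calling `robots_txt_exists("www.example.com")` sends a real request to that host. The `get_status` parameter exists only so the logic can be tried without network access.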

How do I create a robots.txt file?

You can use a plain-text editor such as Notepad or Notepad++ to create a robots.txt file. Paste the code given above into the file, save it as robots.txt, and upload it to the root directory of your server.
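If you would rather generate the file with a script than type it by hand, a minimal Python sketch could look like this. The function name `write_robots_txt` is hypothetical, and the example writes into a temporary directory; in practice you would point it at the folder you upload from.

```python
import tempfile
from pathlib import Path

# Rules that block the whole host, as used for dev.theproche.com above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def write_robots_txt(directory, rules):
    """Write rules to robots.txt inside directory and return the file path."""
    path = Path(directory) / "robots.txt"
    path.write_text(rules, encoding="utf-8")
    return path


# Demonstration in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    created = write_robots_txt(tmp, BLOCK_ALL)
    print(created.name)
    print(created.read_text(encoding="utf-8"))
```

The file must be named exactly robots.txt and saved as plain text before it is uploaded.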

Read this article: Robots.txt File to Disallow the Whole Website