Web site owners use the /robots.txt file to give
instructions about their site to web robots; this is known as the Robots
Exclusion Protocol.
It works like this: a robot wants to visit a Web site
URL, say http://www.example.com/welcome.html. Before it does so, it
first checks for http://www.example.com/robots.txt, and finds:
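User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The
"Disallow: /" tells the robot that it should not visit any pages on the site.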
A robots.txt file lives at the root of your site. So,
for the site www.example.com, the robots.txt file lives at
www.example.com/robots.txt. robots.txt is a plain text file that follows
the Robots Exclusion Standard. A robots.txt file consists of one or more
rules. Each rule blocks (or allows) access for a given crawler to
a specified file path on that site.
Here is a simple
robots.txt file with two rules, explained below:
# First Rule
User-agent: Googlebot
Disallow: /nogooglebot/

# Second Rule
User-agent: *
Allow: /

Sitemap: http://www.example.com/sitemap.xml
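In this example, the first rule tells the crawler named Googlebot not to
crawl any URL that starts with http://www.example.com/nogooglebot/. The
second rule allows all other crawlers to access the whole site, which is
also the default behaviour. The Sitemap line tells crawlers where to find
the site's sitemap file.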
There are two
important considerations when using /robots.txt:
robots can ignore your /robots.txt. Especially malware
robots that scan the web for security vulnerabilities, and email address
harvesters used by spammers, will pay no attention.
the /robots.txt file is a publicly available file.
Anyone can see what sections of your server you don't want robots to
use.
So don't try to use /robots.txt to hide information.
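As an illustration of the check described at the start of this page, here is
a minimal sketch of how a polite crawler might consult /robots.txt before
fetching a page, using Python's standard-library urllib.robotparser module;
the crawler name "MyCrawler" is a hypothetical placeholder, and the URLs are
the example.com placeholders used above:

import urllib.robotparser

# Point the parser at the site's robots.txt and download it.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# Ask whether our user agent may fetch a given URL.
url = "http://www.example.com/welcome.html"
if rp.can_fetch("MyCrawler", url):
    print("Allowed to fetch", url)
else:
    print("Blocked by robots.txt:", url)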
See also:
Can I block just bad robots?
Why did this robot ignore my /robots.txt?
What are the security implications of /robots.txt?
How to create a /robots.txt file
Where to put it - in the top-level directory of your web
server.
See also:
What program should I use to create /robots.txt?
How do I use /robots.txt on a virtual host?
How do I use /robots.txt on a shared host?
What to put in it
The "/robots.txt" file is a text file,
with one or more records. Usually it contains a single record looking like
this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded.
Note that you need a separate "Disallow:"
line for every URL prefix you want to exclude - you cannot say
"Disallow: /cgi-bin/ /tmp/" on a single line. Also, you
may not have blank lines in a record, as they are used to delimit
multiple records.
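For example, a sketch of a file with two records, separated by a blank line
(reusing the Googlebot name from the earlier example), might look like this:

User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

Here the first record applies only to Googlebot, and the second applies to
all other robots.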
Note also that globbing and regular expressions are
not supported in either the User-agent or Disallow lines. The '*' in the
User-agent field is a special value meaning "any robot".
Specifically, you cannot have lines like "User-agent: *bot*",
"Disallow: /tmp/*" or "Disallow: *.gif".