List of Web Crawlers and Their User Agents

Sameer Memon

What are Web crawlers or Web Spiders?

In simple terms, web crawlers (also called web spiders) are bots or programs that search engines use to collect details about your website and index it. They can read every kind of content, such as text, images, links on pages, and sitemaps. They browse websites automatically and gather information in order to index them.

Here, we share a list of web crawlers used by different search engines and tools. This list will help you write a better robots.txt file for your website by allowing or blocking the relevant user agents.

Types of Web Crawlers

SearchBots: These are the bots used by search engines to crawl websites, view images and links, and index them.

Some common SearchBots: Googlebot (used by Google), Bingbot (used by Bing), and Slurp (used by Yahoo).

CommercialBots: These are bots used by SEO services to produce SEO reports for a particular website so that you can fix SEO issues on the site. For example, AhrefsBot (used by ahrefs.com) and SemrushBot (used by semrush.com).

Feed Fetcher Bots: These bots collect thumbnails and titles of content to display on their own platforms. For example, Facebook external hit (used by Facebook) and Twitterbot (used by Twitter).

Monitoring Bots: These bots check the status and performance of websites, such as uptime and pingbacks. For example, WordPress (pingback), used by WordPress. (Not covered in this post.)
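The categories above can be expressed as a simple lookup. The sketch below is only an illustration: the tokens are taken from the examples in this post, `wordpress` as a pingback marker is an assumption, and `classify_bot` is a hypothetical helper name, not part of any library.

```python
# Lowercase user-agent tokens (from the examples in this post) mapped to
# the bot categories described above. The "wordpress" entry is an assumption.
BOT_CATEGORIES = {
    "googlebot": "SearchBot",
    "bingbot": "SearchBot",
    "slurp": "SearchBot",
    "ahrefsbot": "CommercialBot",
    "semrushbot": "CommercialBot",
    "facebookexternalhit": "Feed Fetcher Bot",
    "twitterbot": "Feed Fetcher Bot",
    "wordpress": "Monitoring Bot",
}


def classify_bot(user_agent: str) -> str:
    """Return the category of the first known token found in the string."""
    ua = user_agent.lower()
    for token, category in BOT_CATEGORIES.items():
        if token in ua:
            return category
    return "Unknown"


print(classify_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(classify_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Keep in mind that a user-agent string is supplied by the client and can be faked, so a substring match like this is useful for log analysis rather than for security decisions.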

List of web crawlers and their User-agents

1. GoogleBot

What is Googlebot?

Googlebot is the most active of the good bots. Google uses it to read the content of your website and index it. It visits your website regularly and goes through all your content.

User-Agent

Googlebot

User-Agent string

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Googlebot example in robots.txt

Below is an example showing how to stop Googlebot from crawling the page https://example.com/exnoindex/donotindexthis.html

User-agent: Googlebot
Disallow: /exnoindex/donotindexthis.html

If you want to block Googlebot from crawling your entire website, use the lines below in your robots.txt

User-agent: Googlebot
Disallow: /
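To sanity-check rules like the ones above before deploying them, you can feed them to Python's standard `urllib.robotparser` module. This is a minimal sketch, assuming the rules are held in a string (`ROBOTS_TXT` is a name chosen for illustration); the parser's matching is simpler than what real crawlers do, so treat the result as a rough check.

```python
from urllib.robotparser import RobotFileParser

# Rules under test, copied from the first Googlebot example above.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /exnoindex/donotindexthis.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = "https://example.com/exnoindex/donotindexthis.html"
allowed = "https://example.com/index.html"

print(parser.can_fetch("Googlebot", blocked))  # False: the rule matches this path
print(parser.can_fetch("Googlebot", allowed))  # True: no rule matches
print(parser.can_fetch("Bingbot", blocked))    # True: no group applies to Bingbot
```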

Apart from Googlebot, Google uses more than 9 user agents for different crawling purposes.

Below is a list of web crawlers used by Google.

| User Agent | Crawler Details | Full User-Agent String |
|---|---|---|
| Mediapartners-Google | Used for Google AdSense | Mediapartners-Google |
| AdsBot-Google-Mobile | Used to show ads on mobile apps (Android) | Mozilla/5.0 (Linux; Android 5.0; SM-G920A) AppleWebKit (KHTML, like Gecko) Chrome Mobile Safari (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html) |
| AdsBot-Google-Mobile | Used to show ads on mobile apps (iPhone) | Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html) |
| AdsBot-Google | Used to show ads on the web | AdsBot-Google (+http://www.google.com/adsbot.html) |
| Googlebot-Image, Googlebot | Used to crawl images from websites | Googlebot-Image/1.0 |
| Googlebot-News, Googlebot | Used to crawl news | In 2011, Google declared that Googlebot will be used to crawl news. However, Googlebot-News will still respect the robots.txt of the website. |
| Googlebot-Video, Googlebot | Used to index videos from websites and YouTube | Googlebot-Video/1.0 |
| Google Favicon | Shows your favicon in Google search results | Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 Google Favicon |

Crawlers used by Google

You can find details of the rest of the bots in Google's crawler documentation.

2. Bingbot

What is Bingbot?

Bingbot is the web crawler Bing uses to crawl website content and images and index them in its search engine. It replaced MSNbot in 2010.

User-Agent

Bingbot

User-Agent string

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
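All three Bingbot strings above carry the same `compatible; bingbot/2.0` token, even though the browser part differs. As a rough sketch, that token can be pulled out with a regular expression. The pattern and the helper name `extract_bot` are assumptions for illustration; bots that do not use the `compatible; name/version` form will not match.

```python
import re

# Matches a "name/version" product token that follows "compatible;".
BOT_TOKEN = re.compile(r"compatible;\s*([A-Za-z][\w.-]*)/([\d.]+)")


def extract_bot(user_agent: str):
    """Return (name, version) for a 'compatible; name/version' token, else None."""
    match = BOT_TOKEN.search(user_agent)
    if match is None:
        return None
    return match.group(1), match.group(2)


desktop = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
           "bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36")
print(extract_bot(desktop))  # ('bingbot', '2.0')
print(extract_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # None
```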

Below is the list of all web crawlers used by Bing:

| User Agent | Crawler Details | Full User-Agent String |
|---|---|---|
| Bingbot | Used to crawl website contents | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 |
| | | Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
| | | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
| AdIdxBot | Used by Bing Ads. It crawls ads and follows the links in them | Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm) |
| | | Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm) |
| | | Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 530) like Gecko (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm) |
| BingPreview | Used to generate previews of the website for Bing | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 |
| | | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
| MicrosoftPreview | Generates snapshots for Microsoft products | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MicrosoftPreview/2.0; +https://aka.ms/MicrosoftPreview) Chrome/W.X.Y.Z Safari/537.36 |
| | | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; MicrosoftPreview/2.0; +https://aka.ms/MicrosoftPreview) |

Crawlers used by Bing

Bingbot example in robots.txt

Use the rule below in your robots.txt to prevent a particular page from being crawled by Bing

User-agent: Bingbot
Disallow: /exnoindex/donotindexthis.html

If you want to block Bing from crawling your entire website, use the lines below in your robots.txt

User-agent: Bingbot 
Disallow: /

You can use the Robots.txt tester to validate your robots.txt file. Find more detail about creating robots.txt for Bing.
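Before reaching for an online tester, a few common mistakes can be caught locally. The sketch below is not an official validator: `lint_robots` is a hypothetical helper, and the list of accepted field names (including the non-standard `crawl-delay`) is an assumption. It only flags misspelled field names and paths that do not start with `/`.

```python
# Field names this sketch accepts; "crawl-delay" is non-standard but common.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(text: str) -> list:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif (field.lower() in {"allow", "disallow"}
              and value and not value.startswith(("/", "*"))):
            problems.append(f"line {number}: path '{value}' should start with '/'")
    return problems


sample = "Useragent: Bingbot\nDisallow: football\nUser-agent: Bingbot\nDisallow: /\n"
for problem in lint_robots(sample):
    print(problem)
```

In this sample the first two lines are reported (a misspelled `Useragent` field and a path without a leading slash), while the last two lines pass.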

3. Slurpbot

Slurp is the web crawler used by Yahoo. Yahoo gets its search results from both Slurp and Bing's crawlers. Although the majority of Yahoo results are powered by Bing, you should allow Slurp if you want your website to appear in Yahoo mobile search results.

Apart from search, Slurp also collects content from sites for inclusion in properties like Yahoo News, Yahoo Finance, and Yahoo Sports.

User-agent

Slurp

User-Agent string

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Example of robots.txt rules that allow crawling:

User-agent: Slurp
Allow: /

Read more documentation on Slurp

4. DuckDuckBot

Like other search engines, DuckDuckGo uses a web crawler, known as DuckDuckBot. DuckDuckGo has become quite a popular search engine because it doesn’t track users and respects their privacy. DuckDuckBot respects robots.txt rules as well.

User-agent

DuckDuckBot

User-Agent string

DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)

Read more about DuckDuckBot

5. Baiduspider

As Google doesn’t operate in China, Baidu is the most used search engine there, and Baiduspider is the official name of Baidu's crawler.

Like any other search engine crawler, Baiduspider visits your website, reads your content, and indexes it based on relevancy.

User-agent

Baiduspider

User-Agent string

Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

Just like Google and Bing, Baidu uses multiple bots for different content. List of all the crawlers of Baidu:

| User Agent | Crawler Details |
|---|---|
| Baiduspider-image | Baidu Image Search |
| Baiduspider | Baidu Web/Mobile Search |
| Baiduspider-video | Baidu Video Search |
| Baiduspider-cpro | Baidu Union Search |
| Baiduspider-news | Baidu News Search |
| Baiduspider-favo | Baidu Bookmark Search |
| Baiduspider-ads | Baidu Business Search |

Crawlers used by Baidu

Read more about Baidu Spider

6. Yandex Bot

Yandex Bot is the Yandex search engine crawler that visits your website and helps it get indexed in Yandex search results.

Yandex is the largest search engine in Russia. So if your target audience is in Russia or Russian-speaking countries, you probably don’t want to block Yandex.

User-agent

YandexBot

User-Agent string

Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

| User Agent | Crawler Details | Full User-Agent String | Follows robots.txt? |
|---|---|---|---|
| YandexAccessibilityBot | Downloads pages to check their accessibility for users. | Mozilla/5.0 (compatible; YandexAccessibilityBot/3.0; +http://yandex.com/bots) | No |
| YandexAdNet | The Yandex advertising network robot. | Mozilla/5.0 (compatible; YandexAdNet/1.0; +http://yandex.com/bots) | Yes |
| YandexBlogs | The blog search robot that indexes post comments. | Mozilla/5.0 (compatible; YandexBlogs/0.99; robot; +http://yandex.com/bots) | Yes |
| YandexBot | Detects site mirrors. | Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots) | Yes |
| YandexFavicons | Downloads the site’s favicon file to display in search results. | Mozilla/5.0 (compatible; YandexFavicons/1.0; +http://yandex.com/bots) | No |

Crawlers used by Yandex

Apart from these, Yandex uses many other bots.
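Since the table above marks some Yandex bots as not following robots.txt, it helps to know which ones a robots.txt rule can actually reach. The sketch below simply encodes the "follows robots.txt" column from that table; the names `YANDEX_BOTS` and `split_by_robots_support` are made up for illustration.

```python
# "Follows robots.txt?" column from the Yandex table above.
YANDEX_BOTS = {
    "YandexAccessibilityBot": False,
    "YandexAdNet": True,
    "YandexBlogs": True,
    "YandexBot": True,
    "YandexFavicons": False,
}


def split_by_robots_support(bots: dict) -> tuple:
    """Return (bots that follow robots.txt, bots that do not), both sorted."""
    follows = sorted(name for name, ok in bots.items() if ok)
    ignores = sorted(name for name, ok in bots.items() if not ok)
    return follows, ignores


follows, ignores = split_by_robots_support(YANDEX_BOTS)
print("robots.txt rules apply to:", follows)
print("robots.txt rules are ignored by:", ignores)
```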

7. MJ12bot

MJ12bot is the web crawler for Majestic, a UK-based search engine that operates in 13 languages and in 60+ countries. It helps hundreds of thousands of businesses get their websites online.

It respects robots.txt.

User-agent

MJ12bot

8. Sogou Spider

Sogou is a Chinese search engine, launched in 2004, that had an Alexa rank of 121 as of 2010. It indexes about 10 billion web pages. Sogou Spider is the web crawler Sogou.com uses to read website content and index it.

User-agent

Sogou web spider

User-Agent string

Sogou Web Spider mobile user agent

MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 4.4.2; zh-cn; MB200 Build/GRJ22; CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 (compatible; Sogou web spider/4.0 ; +http://www.sogou.com/docs/help/webmasters.htm#07)

Sogou Web Spider desktop user agent

Sogou web spider/4.0 (+http://www.sogou.com/docs/help/webmasters.htm#07)

9. Exabot

Exabot is Exalead's web crawler. It collects data from around the world and supplies it to search engines. The data Exabot collects is added to Exalead's main index and thereby included in Exalead's search results.

User-agent

Exabot

Example of robots.txt to prevent indexing of pages from a particular directory (for example, football):

User-agent: Exabot
Disallow: /football/

10. Alexa crawler

Alexa was retired on May 1, 2022. It was an American web traffic analysis company owned by Amazon. Its key metric, popularly known as Alexa Rank, was based on the estimated number of daily visitors to a website.

11. Soso Spider

Soso Spider is the automated web crawler for the Soso search engine, owned by Tencent Holdings Limited, the company famous for QQ. Soso is the 13th most visited website in China and 36th in the world, with over 20 million page views daily.

User-agent

Sosospider
Sosospider+

User-Agent string

Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)

12. Pinterestbot

Pinterestbot is the crawler Pinterest uses to download product images from your website’s catalog. It also downloads product metadata, including price, availability, and description.

It also checks the authenticity of the website linked under pinned images.

User-agent

Pinterestbot

User-Agent string

Pinterest/0.2 (+https://www.pinterest.com/bot.html) 

Mozilla/5.0 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html) 

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)

You can stop Pinterest from crawling your site by using the rules below in robots.txt

user-agent: Pinterestbot 
disallow: /

PinterestBot respects robots.txt rules.

13. SemrushBot

SemrushBot is the search bot that Semrush uses to collect SEO data about your site for analytics, including on-page SEO, backlinks, content analysis, and more.

It constantly crawls your website to keep its data up to date. If you do not use any Semrush tools and do not intend to in the future, it is wise to block this bot.

Semrush uses different bots for different tools:

| User Agent | Crawler Details |
|---|---|
| SiteAuditBot | To find different SEO and technical issues. |
| SemrushBot-BA | For the backlink audit tool. |
| SemrushBot-SI | On-Page SEO Checker tool and similar tools. |
| SemrushBot-SWA | Checking URLs on your site for the SWA tool. |
| SemrushBot-CT | Content Analyzer and Post Tracking tools |
| SplitSignalBot | SplitSignal tool |
| SemrushBot-COUB | Content Outline Builder tool |

Crawlers used by Semrush

Semrush follows robots.txt rules. You can block these crawlers by adding rules to your robots.txt file:

User-agent: SemrushBot
Disallow: /
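With several tool-specific user agents to cover, writing each group by hand gets repetitive. Here is a small sketch that generates one group per user agent from the names listed above. Whether a single SemrushBot group also covers the tool-specific bots is not confirmed here, so the sketch plays it safe; `build_block_rules` is a hypothetical helper name.

```python
# User agents from the Semrush table above, plus the main SemrushBot.
SEMRUSH_BOTS = [
    "SemrushBot", "SiteAuditBot", "SemrushBot-BA", "SemrushBot-SI",
    "SemrushBot-SWA", "SemrushBot-CT", "SplitSignalBot", "SemrushBot-COUB",
]


def build_block_rules(user_agents, path="/"):
    """Return robots.txt text with one Disallow group per user agent."""
    groups = [f"User-agent: {agent}\nDisallow: {path}" for agent in user_agents]
    return "\n\n".join(groups) + "\n"


print(build_block_rules(SEMRUSH_BOTS))
```

The output can be pasted into robots.txt as is. Passing a different `path`, such as `/private/`, restricts the block to that directory instead of the whole site.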

14. Dotbot

Similar to Semrush, Moz uses Dotbot to find SEO and technical issues on a website. Moz is an SEO tool used for keyword research, backlink discovery, and more.

Data collected by Dotbot can be accessed only through a Moz Pro account, so if you plan to use a Moz Pro membership, you can allow Dotbot to crawl your site. Otherwise, simply block it to save bandwidth.

User-agent

dotbot

Block Moz from crawling your site:

User-agent: dotbot
Disallow: /

15. AhrefsBot

Again, Ahrefs is a marketing tool used for link building and website SEO audits. AhrefsBot scrapes your website data to provide audit reports, including technical issues on your website. These reports can then be used to improve your website's SEO.

Again, if you are not planning to use the Ahrefs marketing tool, you can block its bot:

User-agent

AhrefsBot

Block Ahrefs bot from crawling your site:

User-agent: AhrefsBot
Disallow: /

Find more detail on Ahrefsbot

16. Facebook external hit

Facebook external hit is the web crawler Facebook uses to gather metadata such as thumbnails, titles, and descriptions of a post. Whenever you paste a link from a website into Facebook, the crawler hits the website and collects metadata to show to Facebook users.

You should not block this bot if you plan to share your posts on Facebook.

User-agent

facebookexternalhit

User-Agent string

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

Example in robots.txt (this blocks the crawler, so use it only if you do not share your posts on Facebook):

User-agent: facebookexternalhit
Disallow: /

Read more on the Facebook crawler.

17. archive.org_bot

The Wayback Machine, run by the Internet Archive, saves copies of websites in its database of around 150 billion web pages. It uses archive.org_bot to take snapshots of web pages, books, and other online material, which are then stored and can be accessed by anyone through its website.

I personally block this bot.

User-agent

archive.org_bot

Example in robots.txt

User-agent: archive.org_bot
Disallow: /

Conclusion

With this, we have come to the end of our web crawler list. I hope it helps you allow the user agents you need and block those that consume your bandwidth and provide no value to you.

You should now be able to distinguish between good and bad bots and design a better robots.txt file for your website.

Sameer Memon is a passionate writer with 3 years of experience in blogging. With a strong background in blogs, SEO, and social media marketing, Sameer has been creating engaging content on various topics for a wide range of clients. As a dedicated and driven individual, Sameer takes pride in delivering well-researched and thoughtfully written articles that provide readers with valuable insights and information. He constantly strives to stay up to date with the latest industry trends and techniques to produce content that is not only informative but also engaging and easy to read. When Sameer isn't busy writing, he enjoys gaming. He also loves to explore new places to gain fresh perspectives and inspiration for his writing. If you're interested in learning more about Sameer and his work, you can follow him on social media.