Training crawlers | User agent + IP range

Common Crawl

CCBot

About this crawler

Faurya tracks CCBot as a public-content crawler operated by Common Crawl.

Usually means: The crawler is collecting public web content that may be used for model training, datasets, or product improvement.

User agent

CCBot

Faurya matches this as a user-agent token, not as a full browser string. Providers may add surrounding version or product text, so the token only needs to appear somewhere in the request's user-agent header.
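As a rough illustration of token matching, the sketch below checks whether a user-agent string contains a known crawler token. It is not Faurya's implementation: the function name `match_crawler_token`, the `CRAWLER_TOKENS` tuple, and the case-insensitive comparison are all assumptions made for the example, and the sample user-agent strings are illustrative.

```python
# Hypothetical token list; only "CCBot" comes from this directory entry.
CRAWLER_TOKENS = ("CCBot",)


def match_crawler_token(user_agent, tokens=CRAWLER_TOKENS):
    """Return the first token contained in the user-agent string, or None.

    The comparison is case-insensitive, which is an assumption of this
    sketch rather than documented behaviour.
    """
    lowered = user_agent.lower()
    for token in tokens:
        if token.lower() in lowered:
            return token
    return None


# The token is found even when version or product text surrounds it.
print(match_crawler_token("CCBot/2.0 (https://commoncrawl.org/faq/)"))
# An ordinary browser string contains no crawler token.
print(match_crawler_token("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Matching on a substring rather than the whole header means a version bump in the crawler's user agent does not break detection, at the cost of trusting a value that any client can set. That is why the IP check described below matters.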

IP verification

Verified when user agent and IP range both match

Faurya first matches the crawler user-agent token, then compares the request IP with published ranges from the crawler operator when those ranges exist.

User agent

Request user agent contains CCBot.

Request IP

Server-side tracking forwards the crawler IP seen by your backend.

Range check

The IP is compared with published CIDR ranges or JSON range endpoints.
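The three steps above can be sketched with Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions, not Faurya's code: the function names, the status strings, and the fallback to `"user_agent_only"` when no ranges are published are hypothetical, and the CIDR blocks used are reserved documentation ranges, not Common Crawl's real ones.

```python
import ipaddress


def ip_in_ranges(request_ip, cidr_ranges):
    """Return True if request_ip falls inside any of the CIDR ranges."""
    try:
        ip = ipaddress.ip_address(request_ip)
    except ValueError:
        # A malformed address can never be verified.
        return False
    for cidr in cidr_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if ip.version == network.version and ip in network:
            return True
    return False


def verify_crawler(user_agent, request_ip, token, cidr_ranges):
    """Classify a request using hypothetical status names."""
    # Step 1: the user agent must contain the crawler token.
    if token not in user_agent:
        return "no_match"
    # Steps 2 and 3: compare the IP seen by the backend with the
    # published ranges, when the operator publishes any.
    if not cidr_ranges:
        return "user_agent_only"
    if ip_in_ranges(request_ip, cidr_ranges):
        return "verified"
    return "unverified"


# Placeholder ranges reserved for documentation (RFC 5737, RFC 3849).
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]

print(verify_crawler("CCBot/2.0", "192.0.2.17", "CCBot", EXAMPLE_RANGES))
print(verify_crawler("CCBot/2.0", "198.51.100.5", "CCBot", EXAMPLE_RANGES))
```

In a real deployment the range list would be loaded from whatever CIDR list or JSON endpoint the crawler operator publishes, and `request_ip` should be the address your backend actually observed, since a spoofed user agent alone would otherwise pass the first step.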