Your first crawl

When you create a bot, MimicBot automatically enqueues a Temporal workflow that crawls the bot's clientUrl domain, extracts content, chunks it for retrieval, and indexes it. No manual trigger is needed.

Status transitions

A bot moves through three states during its first crawl:

Status     Meaning
draft      Bot created, crawl not started yet. Usually lasts under 5 seconds.
indexing   Crawl in progress. Depending on site size, can take from seconds to tens of minutes.
ready      Crawl finished and content is searchable. The widget will now answer questions.
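
If you want a script to wait for the first crawl to finish, the status transitions above can be polled in a loop. The following is a minimal sketch, not an official client: this page does not document a status endpoint, so fetch_status is a hypothetical callable that you supply, and the timeout and interval defaults are arbitrary choices.

```python
import time
from typing import Callable

# The three states named in the status table.
KNOWN_STATUSES = ("draft", "indexing", "ready")


def wait_until_ready(
    fetch_status: Callable[[], str],
    timeout_s: float = 1800.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> list:
    """Poll fetch_status() until it returns "ready".

    fetch_status is a hypothetical callable; how you read a bot's
    status is an assumption of this sketch. Returns the distinct
    statuses observed, in order.
    """
    deadline = clock() + timeout_s
    seen = []
    while True:
        status = fetch_status()
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected status: {status!r}")
        # Record a status only when it differs from the previous one.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "ready":
            return seen
        if clock() >= deadline:
            raise TimeoutError(f"bot still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use, leave them at their defaults.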

How the crawl works

  1. botIndexingWorkflow in worker-ts orchestrates the crawl.
  2. HTML fetching and action extraction are delegated to a Python activity in worker-py.
  3. Each discovered page becomes a row in the pages table with a chunked vector index.
  4. Legacy bot_actions (forms, newsletter signups) are discovered and stored with status review; you activate them manually later.
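
To make the chunking in step 3 concrete, here is a generic illustration of splitting page text into overlapping word windows. This page does not describe how MimicBot chunks content, so the function name, the window size, and the overlap are all assumptions chosen for the example.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into chunks of up to `size` words, each sharing
    `overlap` words with the previous chunk.

    Illustrative only; not MimicBot's actual chunker.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Example: 3-word windows that overlap by 1 word.
print(chunk_words("one two three four five", size=3, overlap=1))
# prints ['one two three', 'three four five']
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of indexing some words twice.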

Trigger a manual re-crawl

If you update your site content and want the bot to see the changes, re-crawl manually:

curl -X POST https://api.mimicbot.app/api/bots/{botId}/crawl \
-H "Authorization: Bearer $MIMICBOT_TOKEN"

The response is 202 Accepted — the crawl runs asynchronously.
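
The same request can be sent from a script. This is a minimal sketch using only the Python standard library; the URL, method, and header come from the curl example above, while the function names are made up for illustration.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.mimicbot.app"  # base URL taken from the curl example


def build_recrawl_request(bot_id: str, token: str) -> urllib.request.Request:
    """Build the POST /api/bots/{botId}/crawl request without sending it."""
    safe_id = urllib.parse.quote(bot_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/api/bots/{safe_id}/crawl",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def trigger_recrawl(bot_id: str, token: str) -> None:
    """Send the request and confirm the documented 202 Accepted response."""
    with urllib.request.urlopen(build_recrawl_request(bot_id, token)) as resp:
        if resp.status != 202:
            raise RuntimeError(f"expected 202 Accepted, got {resp.status}")


# Usage, with a placeholder bot id and the token read from the environment:
#   import os
#   trigger_recrawl("your-bot-id", os.environ["MIMICBOT_TOKEN"])
```

urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so an invalid token or unknown bot id surfaces as an exception rather than a silent failure.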

Limits

  • crawlConfig.maxPages — defaults to 500, maximum 5000.
  • crawlConfig.respectRobotsTxt — defaults to true. Pages disallowed by robots.txt are skipped.
  • Scheduled crawls (daily, weekly) are not yet implemented. Manual re-crawl is the only supported mode today.
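
The limits above can be checked client-side before you save a configuration. The sketch below applies the documented defaults and uses Python's standard robots.txt parser to show what "skipped" means for a list of URLs. The function names are hypothetical, and rejecting an out-of-range maxPages (rather than clamping it) and the lower bound of 1 are assumptions; this page does not say how the server handles such values.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

DEFAULT_MAX_PAGES = 500   # documented default
MAX_PAGES_CEILING = 5000  # documented maximum


def normalize_crawl_config(config: Optional[dict] = None) -> dict:
    """Fill in documented defaults and reject an out-of-range maxPages.

    Rejecting instead of clamping is an assumption of this sketch.
    """
    config = dict(config or {})
    max_pages = config.setdefault("maxPages", DEFAULT_MAX_PAGES)
    # bool is a subclass of int, so exclude it explicitly.
    is_int = isinstance(max_pages, int) and not isinstance(max_pages, bool)
    if not is_int or not 1 <= max_pages <= MAX_PAGES_CEILING:
        raise ValueError(
            f"maxPages must be an integer from 1 to {MAX_PAGES_CEILING}"
        )
    config.setdefault("respectRobotsTxt", True)
    return config


def allowed_urls(urls: list, robots_txt: str, user_agent: str = "*") -> list:
    """Return only the URLs that robots_txt permits for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]
```

For example, with a robots.txt containing "User-agent: *" and "Disallow: /private/", allowed_urls keeps https://example.com/ and drops https://example.com/private/report.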

Next

→ Verify installation